SimilarWeb Data Methodology

In this guide, we’ll show you how SimilarWeb gathers data from a wide variety of sources to provide our users with digital market intelligence insights on millions of websites and apps from over 190 countries around the world.

SimilarWeb’s data methodology takes a multidimensional approach, drawing on four primary data sources to offer a holistic and reliable view of your and your competitors’ digital profiles:

  • Global Panel Data
  • Data Partnerships
  • Public Data Sources
  • Direct Measurement Data

Who Provides SimilarWeb’s Global Panel Data?

Global panel data is derived from a broad panel based on consumer products. A dedicated product team at SimilarWeb is responsible for building and partnering with the hundreds of high-value consumer products that make up the panel.

As users benefit from the products, they contribute to the panel seamlessly and anonymously. SimilarWeb does not incentivize users to download or use our products, which eliminates certain biases. Because these utility and entertainment products provide real value to consumers, retention is high, which keeps the panel consistent and representative. The products are installed on hundreds of millions of devices in 190+ countries, making this the largest and most diverse panel in the industry.

Our panel is roughly split 50/50 between desktop and mobile. 

What Kind of Data Partnerships Does SimilarWeb Have?

SimilarWeb seeks and maintains data partnerships to supplement insights about digital behavior, particularly from operating systems and/or devices that may be underrepresented in the panel.

We are unable to disclose the names of our data partners due to confidentiality agreements in place. SimilarWeb always complies with local laws and only collects data from countries that allow such partnerships.

Data Partnerships FAQ

Where does this kind of data come from?

SimilarWeb partners with Internet Service Providers (ISPs) around the world that provide anonymous, aggregated log data. This data supplements the insights about digital behavior collected from consumer products, worldwide and across all platforms.

ISPs aggregate usage behavior across websites and apps, not users.
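
As a rough illustration of what aggregating by website rather than by user means, the sketch below counts visits per site and discards the subscriber identifier. The record layout and the names subscriber_id, site, and aggregate_by_site are assumptions made for this example, not the format SimilarWeb or its partners use.

```python
from collections import Counter


def aggregate_by_site(log_records):
    """Count visits per site, keeping no per-subscriber information."""
    visits = Counter()
    for record in log_records:
        # Only the site is read; the subscriber identifier is never stored.
        visits[record["site"]] += 1
    return dict(visits)


# Hypothetical raw log entries, for illustration only.
raw_logs = [
    {"subscriber_id": "a1", "site": "example.com"},
    {"subscriber_id": "b2", "site": "example.com"},
    {"subscriber_id": "a1", "site": "example.org"},
]

site_totals = aggregate_by_site(raw_logs)
print(site_totals)
```

The output contains only site names and totals, so nothing in it can be traced back to an individual subscriber.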

Which Public Data Sources Does SimilarWeb Use?

SimilarWeb’s public data sources are an aggregation of online information available to the public. Similar to how search engines like Google index the web, SimilarWeb employs an automated technique for capturing and indexing public data from over a billion websites and app pages every month.

With this indexed public data, SimilarWeb constructs relationship webs between sites and apps, enabling us to categorize every website and every interaction and to map out the taxonomy of sites and their subdomains.

SimilarWeb is the only Market Intelligence company on the planet that provides a full breakdown across all digital channels.

How Does SimilarWeb Leverage Direct Measurement Data?

SimilarWeb leverages a growing dataset of hundreds of thousands of websites and apps that share their directly measured data, including Google Analytics, Adobe Analytics, app developer data, and others, to test and calibrate the reliability of the insights produced from other sources.

Many companies connect the direct measurement tools they use on their site to their SimilarWeb account for greater precision and understanding of their performance in relation to their competitors.

Unlike conventional, purely linear panel methodologies, direct measurement data enables SimilarWeb’s team to calibrate for bias scientifically and to move from sample to estimation, transforming all of the data sources into intelligent estimates across all sites.
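
To make the idea of moving from sample to estimation concrete, here is a minimal sketch of one simple calibration technique, a ratio estimator. It is not SimilarWeb’s actual model, which is not described in this article. The site names, the numbers, and the functions fit_scale_factor and estimate_visits are all assumptions made for illustration.

```python
def fit_scale_factor(panel_visits, measured_visits):
    """Ratio estimator: measured total divided by panel total,
    computed over sites where both figures are available."""
    shared = [site for site in panel_visits if site in measured_visits]
    panel_total = sum(panel_visits[site] for site in shared)
    if panel_total == 0:
        raise ValueError("no overlapping sites with panel visits")
    measured_total = sum(measured_visits[site] for site in shared)
    return measured_total / panel_total


def estimate_visits(panel_visits, scale):
    """Apply the calibrated scale factor to every site seen in the panel."""
    return {site: count * scale for site, count in panel_visits.items()}


# Hypothetical panel counts (sample) and directly measured visits (ground truth).
panel = {"a.example": 120, "b.example": 80, "c.example": 50}
measured = {"a.example": 12_500, "b.example": 7_500}

scale = fit_scale_factor(panel, measured)
estimates = estimate_visits(panel, scale)
print(scale, estimates["c.example"])
```

Here c.example has no direct measurement, so its estimate comes entirely from the factor learned on the two sites that do. A single global factor is the simplest possible choice; a production system would likely calibrate separately by country, device, and category.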

Direct Measurement Data FAQ

Why do companies connect their direct measurement tools to their SimilarWeb account?

Using a third-party measurement tool like SimilarWeb offers sites independent verification of the metrics they use to attract advertising or investment.

You can learn more about connecting your Google Analytics account to SimilarWeb here.

Direct Measurement Data Alone Is Not Enough

Even though different direct measurement tools use similar technology, there are often significant discrepancies in the results they provide. Even within Google’s family of products, there are discrepancies in the metrics they use.

When customers switch from one direct measurement tool to another, they often see their numbers change by 20-30%, practically overnight. One reason for this discrepancy is “bot” traffic: traffic that comes from robots crawling and scraping the site, in contrast to real users. Bot traffic should be excluded from direct measurement analytics. However, each direct measurement tool detects and excludes bot traffic in a different way, which means that statistics are not aligned across tools.
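
The toy example below shows how two tools that apply different bot-detection rules to the same traffic can report different visit counts. Both rules, the field names, and the sample hits are hypothetical. Real analytics products use their own, more elaborate detection logic.

```python
# Hypothetical hits recorded on the same site during the same period.
hits = [
    {"user_agent": "Mozilla/5.0", "requests_per_minute": 3},
    {"user_agent": "Mozilla/5.0", "requests_per_minute": 4},
    {"user_agent": "Mozilla/5.0", "requests_per_minute": 2},
    {"user_agent": "Mozilla/5.0", "requests_per_minute": 200},  # scraper posing as a browser
    {"user_agent": "ExampleBot/1.0", "requests_per_minute": 30},
]


def tool_a_is_bot(hit):
    """Assumed rule for tool A: flag only self-declared bots."""
    return "bot" in hit["user_agent"].lower()


def tool_b_is_bot(hit):
    """Assumed rule for tool B: tool A's rule plus a request-rate threshold."""
    return tool_a_is_bot(hit) or hit["requests_per_minute"] > 100


def count_human_visits(hits, is_bot):
    """Count the hits that the given rule does not classify as bot traffic."""
    return sum(1 for hit in hits if not is_bot(hit))


visits_a = count_human_visits(hits, tool_a_is_bot)
visits_b = count_human_visits(hits, tool_b_is_bot)
relative_change = (visits_b - visits_a) / visits_a
print(visits_a, visits_b, f"{relative_change:.0%}")
```

With this sample data, switching from tool A to tool B lowers the reported visits even though the underlying traffic has not changed.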

Companies already using direct measurement data can benefit enormously from the contextual market intelligence that SimilarWeb provides for benchmarking performance and optimizing acquisition.
