Everything you need to know about Similarweb’s Data Version Update


Introduction and main takeaways

At Similarweb, our commitment to delivering the most comprehensive and valuable insights into the digital world drives us to continually enhance our data offerings. We are excited to announce the upcoming release of our newest data version, officially launching on July 28, 2024.

This new data version will bring significant improvements to the accuracy of web data estimations, expanded domain coverage, and algorithm updates that reflect the latest advancements in our web methodologies, including industry changes like Google’s transition to GA4. These enhancements are the result of extensive research and development efforts aimed at elevating our data quality.

With the new data powering Similarweb’s Web Intelligence solution, you can expect the highest standards of coverage and accuracy, invaluable web insights, and unparalleled visibility into your industry, market, and competitors.

What does this mean for you? You will benefit from more precise data for better benchmarking and decision-making, broader insights from an expanded range of domains, and enhanced visibility into more countries. This update ensures you stay ahead with the most accurate and complete representation of the web.

This powerful and exciting new data release will include: 


30M+ New Websites: Increased global coverage by adding millions of new domains and improving our data worldwide.


Enhanced Web Accuracy: Improved accuracy of core metrics, enabling more precise website traffic estimations.


Improved Web Insights: A more accurate view of key insights, including category analysis, traffic sources, and segments.


Historical Data Access: A full 5-year historical rerun for all core traffic and engagement metrics.  Note: Access to historical data will be based on your package. 

Alignment across solutions: All web data across Similarweb solutions and channels will be seamlessly updated. This includes the platform, API, and our free tools.


Similarweb data methodology refresher

First, a refresher on Similarweb's data collection methodology. Our methodology uses a multidimensional approach to ensure that every data point we estimate is both reliable and representative of a site’s traffic and usage. This commitment remains firmly intact, and our methodology continues to rely on four primary sources that allow us to create a comprehensive view of the digital world:

  • Direct Measurement Data: Direct input from websites and businesses

  • Global Panel Data: Aggregated from millions of users globally

  • Data Partnerships: Collaboration with data partners enriches our coverage

  • Public Data Sources: Web scraping and publicly available data help provide a broad market view

To learn more, visit Similarweb Data Methodology. 

So, what's different in this new data release? Read on.


What's changing in this data version and why

The updated data version represents significant algorithm improvements and coverage enhancements. Historical data is rerun with the same improved algorithms as new data, keeping the two consistent for accurate comparisons and better insights.

We've made several important updates across three key pillars of our web methodology: Proprietary Data, Learning Sets, and Web Algorithms. Here’s a quick overview of what’s changing and why:

1. Proprietary data:

  • New data points: We’ve integrated additional data points from both new and existing sources.
  • Long-tail domains: We’ve developed a new method to better estimate traffic for smaller sites, leading to a significant increase in coverage.
  • Bot traffic filtering: Improved identification and filtration of bot-generated traffic to ensure more accurate input.

2. Learning set:

  • Transition to GA4: We’ve adapted our algorithms and methodology to Google Analytics 4 (GA4), aligning how we measure the web with how many of our clients measure their own traffic and with current market standards.

3. Algorithms:

  • Updated Algorithms: We’ve updated our Desktop and Mobile Web algorithms for better accuracy.
  • Improved Marketing Mix (MMX) Algorithm: We’ve improved our MMX algorithm to better identify the traffic sources of websites.
  • Enhanced Tracking: Improved mechanisms for tracking seasonality and month-over-month trends.
  • Localized Data Models: Improved coverage and accuracy in key countries like Brazil, India, Japan, and Russia.

What this means for you:

  • Expanded Coverage: You can expect more reliable data estimations and broader geographical coverage, which significantly broadens the scope of web analytics you can access with Similarweb.
  • Improved Precision and Accuracy: New advanced filtering and improved algorithms ensure your strategic decisions are based on the most precise and reliable metrics available. We've seen measurable enhancements in the metrics, as verified against direct measurements and real numbers from first-party publishers.
  • Adaptive Algorithms: The data is fine-tuned to keep pace with dynamic web and mobile usage trends and with industry shifts such as GA4. This ensures that the data you rely on is not only the most current and accurate available, but also predictive and insightful.
  • Historical Data: The new data will include five years of historical data rerun with the improved algorithms.


What metrics will be impacted?

The new and improved data version will impact all web-related metrics and will include a historical reinstatement of the following metrics:

The scope of impact will vary based on a website's size, geographic region, and other parameters, including:

  • Overall traffic volumes: Some sites may see traffic numbers rise or fall depending on size and region.
  • Website traffic trends: For existing websites, trends will likely remain consistent. New websites will display data from July 2024.

  • Availability of data by country: There will be more robust coverage of websites in smaller countries, which previously displayed N/A.

In addition, coverage will be expanded across many regions. Some examples of the data coverage boost by country include:

[Image: Data update country coverage]


Impact on Keyword Analysis

Our keyword research tools are backed by our powerful Search 3.0 data, which is moving out of open beta alongside this data version change. The data version change helps increase the accuracy and granularity of:

  • Monthly search volume 
  • Last 28 days search volume
  • Click (visitor) data 

Users may see figures increase or decrease from rollout (July 28) onward, though we estimate the change to be relatively minor for most countries.

Impact on Segment Analysis

Generally, the changes in Segment Analysis Traffic & Engagement metrics will be consistent with the changes observed in Website Analysis metrics.
In addition to these changes, the introduction of new data sources, starting with the July 28 data (which will be available in August), will lead to the following updates:

  • Segment Share and Monthly Visits (All Traffic filter) - In 95% of cases, the data may show slight variations, but the overall trend will remain consistent.
  • Unique Visitors (All Traffic filter) - In 25% of cases, there will be changes in the trendline. Please be aware of this when performing year-over-year analysis.
  • Other Traffic & Engagement metrics will not be impacted by the new data sources.

It’s important to note that any Segment Analysis query performed through the platform or API after the data version change will utilize updated data across the entire 37 months of available historical data. This applies to all segment queries, regardless of when the segment was created.
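
For teams that store segment exports locally, one practical consequence is that anything pulled before the data version change may no longer match what the platform or API returns today. The sketch below is only an illustration: the cache layout (a mapping from segment name to the date the data was pulled) and the function name are hypothetical, and the cutover date is the July 28, 2024 launch date given in this article.

```python
from datetime import date

# Launch date of the new data version, as stated in this article.
DATA_VERSION_CUTOVER = date(2024, 7, 28)


def find_stale_pulls(cached_pulls):
    """Return names of cached segment pulls made before the cutover, sorted by name.

    cached_pulls is a hypothetical local record:
    {segment name: date the data was pulled}.
    """
    return sorted(
        name
        for name, pulled_on in cached_pulls.items()
        if pulled_on < DATA_VERSION_CUTOVER
    )


# Made-up example cache, for illustration only.
cached_pulls = {
    "checkout-pages": date(2024, 6, 15),
    "blog-section": date(2024, 8, 5),
    "pricing-pages": date(2024, 7, 27),
}

stale = find_stale_pulls(cached_pulls)
print("Re-pull these segments:", stale)
```

Re-pulling the flagged segments keeps every month of a stored trendline on the same data version.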


Impact on Conversion Analysis

Website Visits will be updated to reflect the new estimations shown in Website Analysis. Please note that Conversion Rate and Converted Visits will not be retroactively adjusted. As a result, starting in July 2024, users may notice a slight data shift of up to 5%.


Which features/datasets will NOT be impacted by the update?

Still have questions? We’re happy to help! Reach out to your account manager or submit a request through our contact form.



Frequently Asked Questions

Q: Which countries will be impacted?

A: The new data version will impact all available countries with additional domain coverage and improved accuracy.

Q: Where will the greatest impact be seen - on large websites or smaller websites?

A: In terms of coverage, the data for smaller sites will see the main impact. Accuracy improvements will be more evident with medium websites (>50k monthly visits) and some larger websites (>1M monthly visits).

Q: Has your data methodology changed?

A: No. Our data methodology remains the same, but we improved every component of the methodology to provide better estimation capabilities and improved coverage on both desktop and mobile.

Q: Will I be able to see the old data after you apply the new version?

A: Yes, we will keep the old version of our data (only core web metrics and marketing channels as listed in this article) accessible via Batch API for a period of 3 months after the new version is released, and then we’ll no longer retain it.
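
If you export the old version during that three-month window, a side-by-side comparison with the new version shows how each domain moved. The following is a minimal sketch, not Similarweb tooling: it assumes both versions have already been saved as CSV with hypothetical "domain" and "visits" columns, and all figures are made up.

```python
import csv
import io


def load_visits(csv_text):
    """Map domain -> visits from CSV text with 'domain' and 'visits' columns (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["domain"]: float(row["visits"]) for row in reader}


def compare_versions(old, new):
    """Return the percent change per domain in the new export.

    Domains with no usable baseline in the old export map to None.
    """
    changes = {}
    for domain, new_visits in new.items():
        old_visits = old.get(domain)
        if old_visits is None or old_visits == 0:
            changes[domain] = None  # newly covered domain: nothing to compare against
        else:
            changes[domain] = round((new_visits - old_visits) / old_visits * 100, 1)
    return changes


# Made-up exports standing in for the old and new data versions.
OLD_EXPORT = """domain,visits
example.com,120000
example.org,48000
"""

NEW_EXPORT = """domain,visits
example.com,126000
example.org,45600
example.net,9000
"""

changes = compare_versions(load_visits(OLD_EXPORT), load_visits(NEW_EXPORT))
for domain, pct in changes.items():
    print(domain, "new coverage" if pct is None else f"{pct:+.1f}%")
```

Domains that appear only in the new export are reported as new coverage rather than as a percent change, since they have no baseline in the old version.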

Q: Will SEO and PPC data change?

A: Search 3.0 powers our keyword research tools, and this data will be impacted by this version change. Because Search 3.0 also blends in SERP data, the impact of this change will be minimal, though the accuracy of search volume and click metrics will improve.

Q: How will the data version impact segments being pulled through the API?

A: With each API pull, users will receive data from the latest data version. Please note that the first request after a data version update may experience a slight delay.

Q: A challenge we’ve had with Segment Analysis is not picking up enough data because the traffic size is too small - how is this changing with the new data?

A: Segments will have wider coverage and will be able to pick up on smaller sites.
