Introduction and Main Takeaways
At Similarweb, our commitment to delivering the most comprehensive and valuable insights into the digital world drives us to continually enhance our data offerings. That is why we released a new data version that officially launched on July 28, 2024.
This new data version brings significant improvements to the accuracy of web data estimations, expanded domain coverage, and algorithm updates that reflect the latest advancements in our web methodologies, including industry changes like Google’s transition to GA4. These enhancements are the result of extensive research and development efforts aimed at elevating our data quality.
With the new data powering Similarweb’s Web Intelligence solution, you can expect the highest standards of coverage and accuracy, invaluable web insights, and unparalleled visibility into your industry, market, and competitors.
What does this mean for you? You will benefit from more precise data for better benchmarking and decision-making, broader insights from an expanded range of domains, and enhanced visibility into more countries. This update ensures you stay ahead with the most accurate and complete representation of the web.
This powerful and exciting new data release will include:
- 30M+ New Websites: Increased global coverage by adding millions of new domains and improving our data globally.
- Enhanced Web Accuracy: Improve the accuracy of core metrics, enabling higher precision of website traffic estimations.
- Improved Web Insights: Provide a more accurate view of key insights, including category analysis, traffic sources & segments.
- Historical Data Access: A full 5-year historical rerun for all core traffic and engagement metrics. Note: Access to historical data will be based on your package.
- Alignment across solutions: All web data across Similarweb solutions and channels will be seamlessly updated. This includes the platform, API, and our free tools.
Similarweb Data Methodology Refresher
First, a refresher on Similarweb's data collection methodology. Our methodology uses a multidimensional approach to ensure that every data point we estimate is both reliable and representative of a site’s traffic and usage. This commitment remains firmly intact, and our methodology continues to rely on four primary sources that allow us to create a comprehensive view of the digital world:
- Direct Measurement Data: Direct input from websites and businesses
- Global Panel Data: Aggregated from millions of users globally
- Data Partnerships: Collaboration with data partners enriches our coverage
- Public Data Sources: Web scraping and publicly available data help provide a broad market view
To learn more, visit Similarweb Data Methodology.
So, what's different in this new data release? Read on.
What changed in this data version and why
The updated data version represents significant algorithm improvements and coverage enhancements. Historical data is rerun with the same improved algorithms as new data, so the two stay consistent, which supports accurate comparisons and better insights.
We've made several important updates across three key pillars of our web methodology: Proprietary Data, Learning Sets, and Web Algorithms. Here’s a quick overview of what’s changing and why:
1. Proprietary data:
- New data points: We’ve integrated additional data points from both new and existing sources.
- Long-tail domains: We’ve developed a new method to better estimate traffic for smaller sites, leading to a significant increase in coverage.
- Bot traffic filtering: Improved identification and filtration of bot-generated traffic to ensure more accurate input.
2. Learning set:
- Transition to GA4: We’ve adapted our algorithms and methodology to Google Analytics 4 (GA4), aligning how we measure the web with how many of our clients measure their own traffic and with current market standards.
3. Algorithms:
- Updated Algorithms: We’ve updated our Desktop and Mobile Web algorithms for better accuracy.
- Improved Marketing Mix (MMX) Algorithm: Improved our MMX algorithm to better identify traffic sources to websites.
- Enhanced Tracking: Improved mechanisms for tracking seasonality and month-over-month trends.
- Localized Data Models: Improved coverage and accuracy in key countries like Brazil, India, Japan, and Russia.
What this means for you:
- Expanded coverage: You can expect more reliable and accurate estimations along with broader geographical coverage, which significantly widens the scope of web analytics you can access with Similarweb.
- Improved Precision and Accuracy: New advanced filtering and improved algorithms ensure your strategic decisions are based on the most precise and reliable metrics available. We've seen measurable enhancements in the metrics, as verified against direct measurements and real numbers from first-party publishers.
- Adaptive Algorithms: The data is fine-tuned to keep pace with dynamic web and mobile usage trends and with industry shifts, such as the transition to GA4. This ensures that the data you rely on is not only the most current and accurate data available, but also predictive and insightful.
- Historical Data: The new data will include five years of historical data rerun with the improved algorithms.
What metrics were impacted?
The new and improved data version will impact all web-related metrics and will include a historical reinstatement of the following metrics:
- Visits
- Pages per visit
- Bounce rate
- Page views
- Global Rank
- Average visit duration
- Deduplicated Audience
- Geography - Top websites by country
- Device Share
- Marketing Channels
- Unique Visitors
The scope of impact will vary based on a website's size, geographic region, and other parameters, including:
- Overall traffic volumes: Some sites may see traffic numbers rise or fall depending on size and region.
- Website traffic trends: For existing websites, trends will likely remain consistent. New websites will display data from July 2024.
- Availability of data by country: There will be more robust coverage of websites in smaller countries, which previously displayed N/A.
In addition, coverage will be expanded across many regions. Some examples of the data coverage boost by country include:
Impact on Keyword Analysis
Our keyword research tools are backed by our powerful Search 3.0 data, which alongside this data version change is now moving out of open beta. The data version change helps increase the accuracy and granularity of:
- Monthly search volume
- Last 28 days search volume
- Click (visitor) data
Users may see figures either increase or decrease from the rollout date (July 28), though we estimate the change to be relatively minor for most countries.
Impact on Segment Analysis
Generally, the changes in Segment Analysis Traffic & Engagement metrics will be consistent with the changes observed in Website Analysis metrics.
In addition to these changes, the introduction of new data sources starting from July 28 data (which will be available in August) will lead to the following updates:
- Segment Share and Monthly Visits (All Traffic filter) - In 95% of cases, the data may show slight variations, but the overall trend will remain consistent.
- Unique Visitors (All Traffic filter) - In 25% of cases, there will be changes in the trendline. Please be aware of this when performing year-over-year analysis.
- Other Traffic & Engagement metrics will not be impacted by the new data sources.
It’s important to note that any Segment Analysis query performed through the platform or API after the data version change will utilize updated data across the entire 37 months of available historical data. This applies to all segment queries, regardless of when the segment was created.
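The year-over-year caution for Unique Visitors can be turned into a simple check. The sketch below is illustrative only: it assumes monthly data points are keyed by the first day of the month, takes the July 28, 2024 boundary from the text above, and treats July 2024 as falling before the change. The function names are hypothetical and are not part of any Similarweb API.

```python
from datetime import date

# Assumption: the boundary comes from "starting from July 28 data" above.
# The effective boundary for a given segment may differ.
SOURCE_CHANGE = date(2024, 7, 28)


def year_before(month_start: date) -> date:
    """Return the same calendar month one year earlier.

    Expects the first day of a month, so the day always exists.
    """
    return month_start.replace(year=month_start.year - 1)


def yoy_straddles_change(month_start: date, cutoff: date = SOURCE_CHANGE) -> bool:
    """True when a month and its year-ago counterpart sit on opposite
    sides of the cutoff, so the comparison mixes data sources."""
    prior = year_before(month_start)
    return prior < cutoff <= month_start


# August 2024 vs. August 2023 crosses the boundary; August 2025 vs. August 2024 does not.
print(yoy_straddles_change(date(2024, 8, 1)))
print(yoy_straddles_change(date(2025, 8, 1)))
```

A flagged comparison is not necessarily wrong. It only marks the month pairs where a change in the Unique Visitors trendline may come from the new data sources rather than from real audience movement.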
Impact on Conversion Analysis
Website Visits will be updated to reflect the new estimations shown in Website Analysis. Please note that Conversion Rate and Converted Visits will not be retroactively adjusted. As a result, starting in July 2024, users may notice a slight data shift of up to 5%.
Impact on PPC Spend Metric
You may notice discrepancies when comparing July and August data to previous months. We are currently working to resolve this and will backdate the data to January 2023 for consistency. The update, including historical data, will be available by September 2024 and will apply to both the platform and API.
For further questions, please reach out to your dedicated account manager or product support.
Which features/datasets are NOT impacted by the update?
- Company Data (Firmographics)
- Technologies (Technographics)
Still have questions? We’re happy to help! Reach out to your account manager or submit a request through our contact form.
Frequently Asked Questions
Q: Which countries will be impacted?
A: The new data version will impact all available countries with additional domain coverage and improved accuracy.
Q: Where will the greatest impact be seen - on large websites or smaller websites?
A: In terms of coverage, the data for smaller sites will see the main impact. Accuracy improvements will be more evident with medium websites (>50k monthly visits) and some larger websites (>1M monthly visits).
Q: Has your data methodology changed?
A: No. Our data methodology remains the same, but we improved every component of the methodology to provide better estimation capabilities and improved coverage on both desktop and mobile.
Q: Will I be able to see the old data after you apply the new version?
A: Yes, we will keep the old version of our data (only core web metrics and marketing channels as listed in this article) accessible via Batch API for a period of 3 months after the new version is released, and then we’ll no longer retain it.
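If you export the same metric from both the old and the new data version during that window, a per-domain comparison shows how much each site moved. The following is a minimal sketch under stated assumptions: the two dictionaries stand in for exports you have already loaded, and their shape (domain mapped to monthly visits) is hypothetical rather than the Batch API's actual response format.

```python
from typing import Dict, Optional


def percent_change(old: Optional[float], new: Optional[float]) -> Optional[float]:
    """Percent change from old to new, or None when it cannot be computed
    (for example, a domain that had no estimate in one of the versions)."""
    if old is None or new is None or old == 0:
        return None
    return (new - old) * 100.0 / old


def compare_versions(
    old_visits: Dict[str, float], new_visits: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Map every domain seen in either version to its percent change."""
    domains = sorted(set(old_visits) | set(new_visits))
    return {d: percent_change(old_visits.get(d), new_visits.get(d)) for d in domains}


# Hypothetical numbers, for illustration only.
old = {"example.com": 1000.0, "example.org": 200.0}
new = {"example.com": 1100.0, "example.org": 150.0, "example.net": 50.0}
for domain, change in compare_versions(old, new).items():
    print(domain, "newly covered or not comparable" if change is None else f"{change:+.1f}%")
```

Domains that appear only in the new version come back as `None` rather than as an infinite increase, which keeps newly covered sites separate from sites whose estimates changed.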
Q: Will SEO and PPC data change?
A: Search 3.0 powers our keyword research tools, and this data will be impacted by this version change. Because Search 3.0 also blends in SERP data, the impact of this change will be minimal, though the accuracy of search volume and click metrics will improve.
Q: How will the data version impact segments being pulled through the API?
A: With each API pull, users will receive data from the latest data version. Please note that the first request after a data version update may experience a slight delay.
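One way to absorb that first-request delay in a script is to retry on timeout with growing waits. This is a generic sketch, not Similarweb client code: `fetch` is a placeholder for whatever function performs your API call, and the assumption that a slow request surfaces as Python's built-in `TimeoutError` may not match your HTTP library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on TimeoutError with exponentially growing waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the timeout to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the wait can be swapped out in tests. In normal use, `call_with_retry(my_fetch)` is enough, where `my_fetch` is your own zero-argument function.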
Q: A challenge we’ve had with Segment Analysis is not picking up enough data because the traffic size is too small - how is this changing with the new data?
A: Segments will have wider coverage and will be able to pick up on smaller sites.