Major error in per country page views stats corrected

[Image: before fix, Sep 09 – Jul 10; after fix, Nov 09 – Oct 10]

Unfortunately I need to announce yet another major correction to the page view stats, this time caused by a bug in the wikistats scripts. Reports that show page views per region and per language were meant to exclude automated requests by crawlers, spiders and bots. Due to the bug, these automated requests were not excluded in recent reports. Thanks to Ziko for detecting this. My apologies for any confusion caused.

Affected reports are:
Page Views Per Country Overview/Breakdown/Trends
Page Views Per Wikipedia Language Breakdown

Reports on page edits were hardly affected: very few crawlers request edit pages.

Most crawlers operate from the United States, so the share of page views from the US was considerably overreported. In addition, the share of crawler requests in total requests has apparently risen over the last year, which skewed the quarterly trend reports as well.
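To illustrate the effect, here is a minimal sketch of how leaving crawler requests in the counts inflates one country's share. The request records, the `is_crawler` flag and the `country_shares` helper are all hypothetical and invented for this example; they are not wikistats code or data.

```python
from collections import Counter

# Hypothetical sample of (country_code, is_crawler) request records.
# The counts are invented for illustration and are not wikistats data.
requests = (
    [("US", True)] * 30      # crawler traffic, mostly from the US
    + [("US", False)] * 30
    + [("DE", True)] * 2
    + [("DE", False)] * 25
    + [("JP", False)] * 13
)

def country_shares(records, exclude_crawlers):
    """Return each country's percentage share of page views."""
    counts = Counter(
        country
        for country, is_crawler in records
        if not (exclude_crawlers and is_crawler)
    )
    total = sum(counts.values())
    return {country: round(100 * n / total, 1) for country, n in counts.items()}

# The bug corresponds to exclude_crawlers=False, the fix to exclude_crawlers=True.
before = country_shares(requests, exclude_crawlers=False)
after = country_shares(requests, exclude_crawlers=True)

for country in sorted(before):
    print(f"{country}: {before[country]}% before fix, {after[country]}% after fix")
```

In this made-up sample the US share drops once crawlers are excluded, and the shares of the other countries rise correspondingly, which is the same direction of change as in the corrected reports.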

Here are a few examples of the effects of the correction:

North vs South

The ratio of page views for North and South went from 83% – 17% before the fix to 80% – 20% now.

Europe vs North America

The imbalance between views and edits from Europe vs North America shrank. The ratio of page views for Europe and North America went from 36% – 39% before the fix to 40% – 32% now. As said, the share of edits hardly changed: Europe 52%, North America 24%.

Breakdown per country

The breakdown of page views from the US was much affected; other countries were affected much less or hardly at all. The share of page views from the US to the English Wikipedia went from 78% to 92%.

Quarterly trends

Before the correction, a few countries showed a significant rise in page views to the English Wikipedia from quarter to quarter. This growth disappeared or lessened considerably after the fix, again indicating that the share of crawler requests in total requests is growing over time.

Breakdown per language

This report has been most affected: for many Wikipedias, readership from the US moved several steps down in rank. An example of a massive shift: before the fix, 21% of page views for the Hungarian Wikipedia came from the US; after the fix, a mere 0.6%.

Note: for comparison, the faulty reports are still online.

4 Responses to Major error in per country page views stats corrected

  1. Ziko van Dijk says:

    Hi Erik,
    Great news – I am always amazed about the multitude of factors that have to be considered.

    Alas, it means that I may have to adjust my maps. 🙂 Mostly, I suppose, this one about Europe.

    I will read the new numbers with big interest.

    Kind regards
    Ziko

    http://zikoblog.wordpress.com/2010/10/13/the-wikipedia-country-and-language-version-map-of-europe/

  2. Nemo says:

    Another strange thing Ilario noticed a month ago: the Netherlands was reported to generate 4,9 % of edits to the Italian Wikipedia, versus 0,9 % from Switzerland. Now it’s below 0,5 % (after USA, Germany and France).
    http://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguageBreakdown.htm
    http://stats.wikimedia.org/archive/squid_reports/2010-10-err/SquidReportPageEditsPerLanguageBreakdown.htm

  3. Erik says:

    @Nemo: although few, there are some crawlers that harvest page content in edit mode. One reason could be to collect infobox parameters or template names. Actually, the best way to do this is to access the page in raw mode.

  4. Pingback: Wikimedia Blog : Bei den Deutschen funktioniert sogar die Wikipedia!
