Page views anomaly in October resolved

For about 4 weeks, starting September 28, page view stats for Wikimedia wikis were highly inflated. Close inspection revealed this was an artifact caused by a software change.

New code for handling page banners generated extra requests to the server in a format that caused them to be counted as page views. This has been fixed.

Fortunately, the inflated page view counts could be corrected by subtracting the views recorded for the identified non-page requests.
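The correction amounts to subtracting one set of counts from another. Below is a minimal sketch of that idea. It assumes per-title request counts in the four-field `project title count bytes` layout of the hourly pagecounts files. The title `Hypothetical_banner_request` is a hypothetical placeholder, because the post does not say which URLs the banner code requested. This is an illustration, not the actual Wikistats code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical placeholder: the post does not name the URLs that the
# banner code requested.
NON_PAGE_TITLES = {"Hypothetical_banner_request"}


def corrected_totals(lines: Iterable[str], non_pages: Set[str]) -> Dict[str, int]:
    """Per project: total requests minus requests for identified non-pages.

    Assumes each line has the four fields "project title count bytes".
    """
    raw = defaultdict(int)       # everything the logs counted
    non_page = defaultdict(int)  # the part identified as non-page requests
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # skip malformed lines
        project, title, count = fields[0], fields[1], int(fields[2])
        raw[project] += count
        if title in non_pages:
            non_page[project] += count
    # Patch the inflated totals by subtracting the non-page requests.
    return {project: raw[project] - non_page[project] for project in raw}


sample = [  # made-up numbers, for illustration only
    "en Main_Page 120 0",
    "en Hypothetical_banner_request 80 0",
    "de Hauptseite 50 0",
]
print(corrected_totals(sample, NON_PAGE_TITLES))
```

Keeping raw and non-page counts separate makes the size of the correction visible per wiki. This mirrors the idea of patching totals rather than recounting them.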

[Chart: Wikipedia page views change]

[Table: Wikipedia page views change]

See also the up-to-date Wikipedia page views table.

Trend over 3 years

The line plot below, which was used to analyze the anomaly described above, shows page views on the major Wikipedias since we started collecting data.

Two observations:

1) The red line shows the peak in page views for the German Wikipedia caused by the public outcry over a court order that temporarily blocked, or at least tried to block, access to the German Wikipedia (in fact only the ‘wikipedia.de’ alias was blocked).

2) The plot also shows when the previous anomaly occurred, how long it lasted, and roughly how much data was lost. In the full-sized plot you can clearly see how the weekly pattern (lower traffic on weekends) was disrupted from late November 2009 until mid-July 2010. In that case only the results from April to July could be patched, using logs that were still available.

[Line plot: page views on major Wikipedias, 2008-2010]
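One way to put a number on the disruption of the weekly pattern described in observation 2 is to compare weekend and weekday traffic week by week. The sketch below assumes a hypothetical input of daily page view totals keyed by date. It is an illustration only, not how the plot was produced. Under the normal pattern (lower traffic on weekends) the ratio stays below 1. A ratio drifting towards 1 suggests that the pattern has been flattened.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Tuple


def weekend_ratio_by_week(daily: Dict[date, int]) -> Dict[Tuple[int, int], float]:
    """Map each ISO week (year, week) to mean weekend views / mean weekday views."""
    weekday = defaultdict(list)
    weekend = defaultdict(list)
    for day, views in daily.items():
        iso_year, iso_week, iso_weekday = day.isocalendar()
        bucket = weekend if iso_weekday >= 6 else weekday  # 6 = Sat, 7 = Sun
        bucket[(iso_year, iso_week)].append(views)
    ratios = {}
    # Only weeks that contain both weekday and weekend data get a ratio.
    for week in sorted(weekday.keys() & weekend.keys()):
        mean_weekend = sum(weekend[week]) / len(weekend[week])
        mean_weekday = sum(weekday[week]) / len(weekday[week])
        if mean_weekday > 0:  # avoid dividing by zero on empty data
            ratios[week] = mean_weekend / mean_weekday
    return ratios


# Made-up example: one week, Monday 7 June 2010 to Sunday 13 June 2010
start = date(2010, 6, 7)
example = {start + timedelta(days=i): (80 if i >= 5 else 100) for i in range(7)}
print(weekend_ratio_by_week(example))
```

Plotting these ratios over time would show a period like late November 2009 to mid-July 2010 as a stretch where the ratio no longer dips.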

Side story

Incidentally, the page view stats page also shows that Chinese page views have dropped considerably since May 2010.

[update] Members of the Chinese Wikipedia community explained to me that the probable cause is a change, made in May, to the access guidelines for search engines (robots.txt). It was meant to better distinguish between the zh-tw and zh-cn link versions. Several weeks ago these guidelines were re-examined and updated, which should have a positive effect on traffic volume in the near future.

[Chart: Chinese Wikipedia page views]

This entry was posted in Wikimedia View(er)s, Wikistats Production, Wikistats Reports.

5 Responses to Page views anomaly in October resolved

  1. Jörn says:

    Hi Erik,
    Nice stats. As this post is about a page view anomaly, maybe this question fits here:
    As a side effort for my master's thesis I needed the most viewed Wikipedia pages. While compiling them (the results are not published yet), I found a weird phenomenon:
    I noticed that since June there has been a sudden increase in requests for the pages “initial” and “Initial” on the English Wikipedia:
    Month (initial, Initial views):
    May (259, 551) (I didn’t forget a K or M here!)
    June (307K, 270K)
    July (148K, 132K)
    August (1.5M, 1.3M)
    September (4.9M, 4.6M)
    October (5.5M, 5.2M)
    November (4.5M, 4.3M)

    Folks in the wikipedia-en IRC channel couldn’t explain it either, but told me you might be interested and might have an idea.

    To put these numbers into perspective: the “initial” page is now consistently among the top 10 most viewed pages (after filtering out Main_Page, errorpage, etc.).

    My current guess is that some browser feature (such as a quick search) or a search engine refers to these pages by default.
    I found some references to “Initial_D_(film)”, but while initial and Initial skyrocket, views of the film page stay pretty low. Perhaps you have more ideas, insight, or access to logs and can shed some light on this?

    Jörn

    PS: To get these figures I used Domas Mituzas’ raw access logs from http://dammit.lt/wikistats

  2. Jörn says:

    erratum:
    July (1.5M, 1.3M)
    (sorry, missed a digit there.)

  3. Erik says:

    Jörn, thanks for your interesting observation. I wonder how initial and Initial can have such similar view counts. MediaWiki ignores the case of the first letter, so this is not about redirection. Why would a browser feature or search engine refer to variant URLs in equal amounts? To me this only adds to the mystery.

  4. Jörn says:

    Hi Erik,
    The similar view counts are easy to explain: the raw access logs count every HTTP GET. As every request for initial is redirected to Initial, this indicates that we’re actually dealing with a lot of requests for initial.

    But the puzzling question remains: why is the redirect target never loaded for so many of these requests?

  5. Jörn says:

    Well, I’ve got news:
    Domas investigated: according to the referrers, these hits are caused by http://dj-sa.com/ads/forex/forex.htm. In the source of that page you’ll find a style attribute referring to http://en.wikipedia.org/wiki/initial. We don’t know why.
    The URL indicates that it’s some kind of advertisement, probably for the chat hosted on that domain.

    Lessons learned: don’t believe in access logs 😉
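Rankings like the ones in the first comment were compiled from Domas Mituzas’ raw access logs. Here is a minimal sketch of such a ranking. It assumes gzip-compressed hourly files whose lines have the four fields `project title count bytes`. The exclusion set only mirrors the entries named in the comment (`Main_Page`, `errorpage`), since the rest of the “etc.” is not given. This is an illustration, not Jörn’s actual script.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Entries the first comment says were filtered out before ranking;
# the full list is not given, so extend it as needed.
EXCLUDED = {"Main_Page", "errorpage"}


def read_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines from gzip-compressed hourly count files (assumed format)."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            yield from f


def top_pages(lines: Iterable[str], project: str = "en", n: int = 10,
              excluded=EXCLUDED) -> List[Tuple[str, int]]:
    """Sum request counts per title for one project and return the n largest.

    Titles are compared exactly as logged, so "initial" and "Initial"
    are counted as separate entries.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] != project or not fields[2].isdigit():
            continue  # other project, or a malformed line
        title = fields[1]
        if title not in excluded:
            counts[title] += int(fields[2])
    return counts.most_common(n)


sample = [  # made-up numbers, for illustration only
    "en initial 5 0",
    "en Initial 4 0",
    "en Main_Page 100 0",
    "de Initial 50 0",
    "en initial 2 0",
    "en Foo 3 0",
]
print(top_pages(sample, n=2))
```

For a monthly figure, one would pass `read_lines(...)` over all of that month’s hourly files. Because every logged GET is counted, a redirect and its target each show up as a separate hit, which is why both spellings can appear in the ranking.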
