Wikimedia traffic analyzed

For many years Wikimedians have been producers of knowledge, largely for consumption by the general public, right? Just for a change let us turn the tables and let our consumers be producers, so that they enlighten us by their very act of consumption. Like a raytracer that paints a landscape by reversing time, sending light rays back to their point of origin, let us draw the contours of our reader base by tracing our visitors back to where they came from, and studying by what means they arrived (an approach that others with a more prosaic inclination might call traffic analysis).

Wikimedia receives so much traffic that it becomes an embarrassment of riches: so much traffic data that we can only keep logs for any length of time in condensed form, or by discarding most of the data and storing just a random sample. In fact, a 1:1000 sample of the squid log (squid ~ public interface server), where 999 out of every 1000 server requests are discarded, still provides ample data for detailed analysis.
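To make the sampling idea concrete, here is a minimal Python sketch of a 1:1000 random sample and of scaling the sampled counts back up to estimated totals. It is not the actual squid log pipeline: the function names, the per-request random draw, and the use of a host name as the request key are all assumptions made for illustration.

```python
import random
from collections import Counter

SAMPLE_RATE = 1000  # keep roughly 1 request in every 1000 (the 1:1000 ratio above)


def sample_requests(requests, rate=SAMPLE_RATE, seed=None):
    """Yield each request with probability 1/rate (hypothetical sampler)."""
    rng = random.Random(seed)
    for request in requests:
        if rng.randrange(rate) == 0:
            yield request


def estimate_totals(sampled, rate=SAMPLE_RATE):
    """Scale sampled counts back up to estimated full-traffic counts."""
    counts = Counter(sampled)
    return {key: n * rate for key, n in counts.items()}


if __name__ == "__main__":
    # Made-up traffic: each request is represented only by its target host.
    traffic = ["upload.wikimedia.org"] * 600_000 + ["en.wikipedia.org"] * 400_000
    print(estimate_totals(sample_requests(traffic, seed=1)))
```

Every estimate is a multiple of 1000 by construction, and the smaller the true count, the noisier its estimate becomes, which is one reason to treat rare entries in such reports with care.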

Allow me to present a new set of reports that can enlighten us about our user base:

  • Requests by target and mime type Fact: Wikimedia’s upload server (images and other binaries) handles twice as many requests as even our largest Wikipedia.
  • Requests by origin Fact: Yahoo sends us over twice as many image requests as Google.
  • Request methods Fact for techies: almost 4 billion daily server requests are divided almost equally between hits and misses.
  • Requested scripts and skins Fact: index.php with action=edit is issued roughly 6 million times a day, and action=submit only 1 million times a day, so it seems only 1 in 6 edit requests leads to a database update (previews?).
  • A survey of operating systems, brands and versions Fact: less than 1.5% of our traffic is issued from a Linux machine. With almost 0.9% of traffic iPhone is a remarkable runner-up, given its short existence.
  • A survey of browsers and other clients, brands and versions Fact: not surprisingly Microsoft Internet Explorer (MSIE) is still dominant in the (or our) browser market with 58%, but Firefox does remarkably well with almost 31%.
  • An overview of all kinds of traffic where Google plays a part, either by referring people to a Wikimedia project or by crawling Wikimedia pages themselves, to feed their search engines. Fact: Google is somehow involved in half of our daily external page requests (external = the request does not originate from another Wikimedia page).

A word of caution, to counterbalance the lighthearted introduction above: although we reach a large share of the general public, we should be cautious about extrapolating the figures in these reports and interpreting them as absolute statements of fact about the web at large.

This entry was posted in Wikimedia View(er)s, Wikistats Reports.

13 Responses to Wikimedia traffic analyzed

  1. Sky2042 says:

    It seems IE 6.0 is missing under the browser stats – non mobile version list… Though the stats are still there (or rather, the hole they should be in is still there).

  2. Erik says:

    @Sky2042 thanks for your quick feedback

    It is fixed now. The count results file contained:

    -,MSIE 6.0,6675301,17.62%

    -,MSIE/6.0,1,0.00%

    The second entry was normalized for presentation into

    -,MSIE 6.0,1,0.00%

    and thus overwrote the first, which with just 1 request then failed the notoriety filter 🙂
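The failure described in this comment can be sketched in a few lines of Python. This is not the actual report script: the `normalize` rule and both function names are hypothetical, and only the two count rows come from the comment. The point is that counts for labels that normalize to the same key should be added together rather than assigned.

```python
from collections import Counter


def normalize(label):
    """Hypothetical presentation rule: 'MSIE/6.0' becomes 'MSIE 6.0'."""
    return label.replace("/", " ")


def overwrite_counts(rows):
    """Faulty pattern: a later row replaces an earlier one with the same key."""
    totals = {}
    for label, count in rows:
        totals[normalize(label)] = count
    return dict(totals)


def merge_counts(rows):
    """Safer pattern: rows that normalize to the same key are summed."""
    totals = Counter()
    for label, count in rows:
        totals[normalize(label)] += count
    return dict(totals)


rows = [("MSIE 6.0", 6675301), ("MSIE/6.0", 1)]
print(overwrite_counts(rows))  # {'MSIE 6.0': 1}
print(merge_counts(rows))      # {'MSIE 6.0': 6675302}
```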

  3. André Engels says:

    The action=edit vs. action=submit difference is not caused by previews; those actually have the opposite effect, because previews also use action=submit. So the difference must come from people opening the edit screen but breaking off _before_ reviewing (or from bots and similar, perhaps).

  4. Splarka says:

    Query: for the browser agent stats, do you factor in the extra queries required by MSIE users to download IE60Fixes.css/IE70Fixes.css/etc? Basically anything in /skins/* that ends in Fixes.css or .js

  5. Erik says:

    @Splarka no, actually not. Good point. For browser stats all server requests are treated equally, as single requests. I may have to rethink that part.

    But in this case I don’t think it would make much difference: the browser stats page shows 2,187,984,000 MSIE requests per day and the scripts stats page shows just 23,212,000 ie70fixes and 10,759,000 ie60fixes requests. Altogether maybe 2% for the files you mention, no doubt because of the browser cache.
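The arithmetic behind the "maybe 2%" estimate can be checked with a short Python sketch. The MSIE total is read here as 2,187,984,000, which is an assumption about the figure quoted in the comment; the two Fixes counts are taken as given, and the function name is made up for illustration.

```python
def fixes_share(fix_requests, browser_requests):
    """Fraction of a browser's daily requests spent on the *Fixes files."""
    return sum(fix_requests) / browser_requests


msie_requests = 2_187_984_000  # assumed reading of the MSIE total in the comment
ie70fixes = 23_212_000
ie60fixes = 10_759_000

share = fixes_share([ie70fixes, ie60fixes], msie_requests)
print(f"{share:.2%}")  # 1.55%
```

Under that reading the extra stylesheet requests come to about 1.55% of MSIE traffic, in line with the rough figure in the comment.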

  6. Andrew Lih says:

    Erik, great work as always. And to André Engels’ comment…

    We might have stumbled onto the explanation at Wiki Wednesday in San Francisco last night when we talked about usability.

    We were talking about your 1 million page saves a day, and then we talked about usability with Danny@wikia. Most likely, many of those 5 million “abandoned editing operations” were in fact people who saw the WikiMarkup, went “eek!” and were intimidated by the prospect of changing things?

    -Andrew

  7. Waldir says:

    @André Engels good point. See also http://twitter.com/MarkDilley/status/1436317684, which reports that the save rate more than doubles with WYSIWYG (i.e., some people may be scared away by complex wiki markup).

  8. Erik says:

    Andrew, after André’s explanation I cannot think of any explanation other than yours, so maybe we have a great metric here to monitor usability improvements. 🙂

    By the way did you see Wikipedia Weekly stats on previous post?

  9. Brya says:

    Actually there are reasons to hit the edit button, other than an intent to edit, such as 1) to take a look at the underlying text (“how is this coded?”) or 2) for reading in a different font. Also, 3) it is a required step in making a copy.

  10. Erik says:

    @Brya, agreed. Another reason is to see the list of templates. There are also (external) bots that only access pages in edit mode. I will follow up on this.

  11. Pingback: Infodisiac » Wikipedia page views, a global perspective (2)

  12. Nemo says:

    Image requests from Yahoo have dramatically decreased: perhaps it was replaced by Bing?

    Also, at some point (September 2009? http://lists.wikimedia.org/pipermail/pywikipedia-l/2009-September/005998.html ) screen-scraping bots were forced to use the API. From September to December 2009, action=edit decreased from 7 to 5 million and submit from 1 to 0.7 million: http://stats.wikimedia.org/archive/squid_reports/2009-09/SquidReportScripts.htm http://stats.wikimedia.org/archive/squid_reports/2009-12/SquidReportScripts.htm
    Now we have 5 million edit and 0.65 million submit requests http://stats.wikimedia.org/wikimedia/squids/SquidReportScripts.htm , so it’s more or less the same.
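The figures in this thread can be compared as submit-to-edit ratios with a small Python sketch. The daily request counts (in millions) are the rounded numbers quoted in the post and in the comment above; the snapshot labels and the function name are made up for illustration.

```python
def submit_share(edit_millions, submit_millions):
    """Submits per edit-screen request, as a fraction."""
    return submit_millions / edit_millions


# (action=edit, action=submit) in millions of requests per day, as quoted above.
snapshots = {
    "original post": (6.0, 1.0),
    "2009-09": (7.0, 1.0),
    "2009-12": (5.0, 0.7),
    "latest report": (5.0, 0.65),
}

for label, (edits, submits) in snapshots.items():
    print(f"{label}: {submit_share(edits, submits):.1%}")
```

On these rounded figures the share stays between roughly 13% and 17% in every snapshot, which matches the observation that the ratio is more or less the same before and after the bots moved to the API.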

  13. Pingback: On edits and submits | Infodisiac
