For many years Wikimedians have been producers of knowledge, largely for consumption by the general public. Just for a change, let us turn the tables and let our consumers be producers, enlightening us by their very act of consumption. Like a raytracer that paints a landscape by reversing time, sending light rays back to their point of origin, let us draw the contours of our reader base by tracing our visitors back to where they came from and studying by what means they arrived (an approach that others with a more prosaic inclination might call traffic analysis).
Wikimedia receives so much traffic that it becomes an embarrassment of riches: so much traffic data that we can only keep logs for any length of time in condensed form, or by discarding most of the data and storing just a random sample. In fact a 1:1000 sample of the squid log (squid ~ public interface server), where 999 out of every 1000 server requests are discarded, still provides ample data for detailed analysis.
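To make the idea of a 1:1000 sample concrete, here is a minimal sketch in Python. It is not the actual squid logging setup, which this post does not describe. The function names `sample_requests` and `estimate_totals`, the synthetic request labels, and the choice of keeping each request independently with probability 1/1000 are all assumptions made for illustration.

```python
import random
from collections import Counter

SAMPLE_RATE = 1000  # keep roughly 1 request in every 1000 (the 1:1000 sample)


def sample_requests(lines, rate=SAMPLE_RATE, seed=None):
    """Yield each log line with probability 1/rate.

    Hypothetical sampler: the real logging mechanism may select lines
    differently (for example, every 1000th request).
    """
    rng = random.Random(seed)
    for line in lines:
        # randrange(rate) returns 0 with probability 1/rate
        if rng.randrange(rate) == 0:
            yield line


def estimate_totals(sampled_counts, rate=SAMPLE_RATE):
    """Scale counts seen in the sample back up to estimated full-traffic counts."""
    return {key: count * rate for key, count in sampled_counts.items()}


if __name__ == "__main__":
    # Synthetic stand-in for a request log: one made-up target label per request.
    requests = ["upload"] * 200_000 + ["en.wikipedia"] * 100_000
    sampled = Counter(sample_requests(requests, seed=42))
    print("sampled counts:", dict(sampled))
    print("estimated totals:", estimate_totals(sampled))
```

The estimates carry sampling noise that shrinks, in relative terms, as the counts grow, which is why a sample this sparse can still support detailed breakdowns at Wikimedia's traffic volume.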
Allow me to present a new set of reports that can enlighten us about our user base:
- Requests by target and MIME type. Fact: Wikimedia’s upload server (images and other binaries) handles twice as many requests as even our largest Wikipedia.
- Requests by origin. Fact: Yahoo sends us over twice as many image requests as Google.
- Request methods. Fact for techies: almost 4 billion daily server requests are divided almost equally between hits and misses.
- Requested scripts and skins. Fact: index.php with action=edit is issued roughly 6 million times a day, and action=submit only 1 million times a day, so it seems only 1 in 6 edit requests leads to a database update (previews?).
- A survey of operating systems, brands and versions. Fact: less than 1.5% of our traffic comes from Linux machines. With almost 0.9% of traffic, the iPhone is a remarkable runner-up, given its short existence.
- A survey of browsers and other clients, brands and versions. Fact: not surprisingly, Microsoft Internet Explorer (MSIE) is still dominant in the (or our) browser market with 58%, but Firefox does remarkably well with almost 31%.
- An overview of all kinds of traffic where Google plays a part, either by referring people to a Wikimedia project or by crawling Wikimedia pages themselves to feed their search engines. Fact: Google is somehow involved in half of our daily external page requests (external = the request does not originate from another Wikimedia page).
A word of caution, to counterbalance the lighthearted introduction above: although we reach a large share of the general public, we should be cautious about extrapolating the figures in these reports and interpreting them as absolute statements of fact about the web at large.