Wikistats editor counts are broken (update: recovery complete)

Wikistats editor counts are too low for some languages, for all reported months. The issue has made it into the German Wikipedia’s “Kurier” newsletter (18.1), and is discussed at this page.

First, there has not been a definition change or re-evaluation of editor counts. The current counts are simply wrong. Let me explain why all months display lower counts: on every monthly run of Wikistats nearly all data are regenerated from the dumps. This is on purpose. This way new functionality and (rare) bug fixes apply to all months since the creation of the wiki. The downside is that when stats scripts or dumps are broken, reports will show wrong data for all months, which is what happened now (a hybrid situation with data retention and update runs would complicate processing further and easily be a source of errors itself). I put up a notice on the stats pages.

I am investigating. So far test runs have been inconclusive. My first priority is to find out whether this is caused by a change in the dumps, a change in the scripts, or a config change on the server. The last is most likely. In the past month Wikistats data and scripts were moved to a new server, with a number of modifications to overall configuration and shell scripts.

I will keep you posted on any new findings. My apologies for the inconvenience and confusion caused by this mishap.

Update June 17

The problem has been analyzed and fixed. A few weeks ago, during a substantial overhaul of the stats scripts (to add new metrics), an error crept in that was not caught during tests. As a result far too many articles were flagged as redirects, which led to far too low counts for articles and editors: redirect pages are not counted as articles and, for consistency, are not taken into account for edits and editors (*).

It will still take about 7-10 days to reprocess all dumps, delayed slightly further by scheduled maintenance on the stats server, which is still ongoing.

Thanks so much for your patience.

*: This is in itself a point that could be debated, but this is how it works or is supposed to work. For one, it prevents skewing of the edits-per-article metric.
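To make the effect of the bug concrete, here is a minimal sketch of the counting rule described above. It is not the actual Wikistats code: the page records, the `is_redirect` field and the function name are assumptions made up for this illustration.

```python
def count_articles_and_editors(pages):
    """Count articles and distinct editors, skipping redirect pages.

    `pages` is a hypothetical list of dicts with the keys
    'is_redirect' (bool) and 'editors' (one user name per edit).
    """
    articles = 0
    editors = set()
    for page in pages:
        if page["is_redirect"]:
            # Redirects are not articles, and for consistency their
            # edits and editors are ignored as well.
            continue
        articles += 1
        editors.update(page["editors"])
    return articles, len(editors)


# Made-up sample data: two real articles and one redirect.
pages = [
    {"title": "A", "is_redirect": False, "editors": ["ann", "bob", "ann"]},
    {"title": "B", "is_redirect": False, "editors": ["cho"]},
    {"title": "C", "is_redirect": True, "editors": ["dan"]},
]
correct = count_articles_and_editors(pages)

# Simulate the bug: article B is wrongly flagged as a redirect.
buggy_pages = [dict(p, is_redirect=p["is_redirect"] or p["title"] == "B") for p in pages]
buggy = count_articles_and_editors(buggy_pages)

print(correct, buggy)
```

With this sample the correct run finds 2 articles and 3 editors, while the run with the wrongly flagged page finds only 1 article and 2 editors. Because every month is recomputed from the dumps, the same undercount shows up in all months.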

Update June 26

All 800+ wikis have been updated now.







Posted in uncategorized | 1 Comment

Wikimedia Usage Share By Browser

The following breakdowns of non-mobile and mobile traffic are based on our squid logs.

Note how share of mobile usage peaks every year around Christmas.
Note that ‘mobile’ here means ‘traffic from mobile devices’, not ‘traffic to our mobile site’.

See also the Wikipedia article on usage share by browser.

Posted in uncategorized | 1 Comment

Wikipedia is still Wikimedia’s largest project

Since early 2008 we have published traffic stats per Wikimedia wiki in detailed reports.

In the past 4 years the relative proportion of page requests per project did not change much. All projects received more traffic, more or less in proportion to their earlier size. Wikipedia’s share rose from 95.7% in July 2008 to 96.4% in January 2012. Most of this relative gain was at the expense of Wiktionary, which fell from 1.5% in July 2008 to 1.0% in January 2012.

Here are updated diagrams about the breakdown of page requests per project, first in absolute numbers, second in share of overall total:






Posted in uncategorized | 3 Comments

Wikipedia Readers

Two years ago Wikimedia board member Stu West published the first version of the map above. I was asked to produce an updated version for presentations.

Countries are colored by monthly Wikipedia page views per internet user (2011 Q4). Overlaid on the map are monthly unique visitors per region (Dec 2011).

Data used:
Country coloring: Wikipedia lists internet users per country & population per country
+ page views per country -> aggregate data
Unique visitors per region: data kindly supplied by internet research company comScore


1) comScore publishes one UV count for Middle East & Africa combined (40 M in map above).
2) In many countries the number of internet users grows rapidly from year to year. Not all data in the Wikipedia lists are quite up to date. This can influence the shown ratio significantly. I hope some day we will have a more up-to-date data feed from e.g. the World Bank, which publishes tons of metrics via a very flexible API.
3) Like Stu I used this neat mapping tool.
4) With the current coloring it is a bit difficult to see which countries have 15 views per internet user and which have 25. No large countries rise above 17 views per user. For details, check page views per country.

Posted in Nice Charts | 3 Comments

Some SOPA blackout stats

Here are a few plots and data I collected for the SOPA blackout of 18 January. Wikimedia Foundation is working on wider coverage of the event.

Note that on the second day hourly hits to the Special:CongressLookup page exceeded hits to SOPA_initiative/Learn_More. Probably part of the demand came from external referrers.

The huge dip in CongressLookup hits on the first day occurred during hours when most US citizens were asleep. Of course the CongressLookup page did not make much sense for non-US citizens.

Here is a list of most visited SOPA related pages on different Wikipedias during the blackout (24 hrs only).


The overall number of page requests to English Wikipedia during the blackout was not particularly high or low.

Posted in Nice Charts, Wikimedia View(er)s | Leave a comment

Wikipedia views visualized

In May 2011 I presented a new visualization tool which can play back all 400,000 Wikipedia edits for a random day, and show where and when these edits occurred, and for which language wiki. The tool also shows static maps for an at-a-glance view of the global distribution of edits for a full day.

Recently I added two new maps:

Page views

Global distribution of page views on all Wikipedias combined

This map (in the tool, press 4) shows the global distribution of page views for all Wikipedias combined. Clearly certain parts of the world are better reached than others. No surprises here, but with this map you can examine this disparity in considerable detail.

Global page views and population density - split screen

The data shown are normalized for area, which allows direct comparison with a population density map (by SEDAC). In split screen mode you can see both page views and population density side by side: press ‘d’ (for density) repeatedly.

Mobile share

This map (press 5) shows the percentage of requests originating from mobile devices.

Percentage of page views from mobile devices


Note that these requests are not necessarily directed to our mobile site. In fact roughly half of these requests go to the main site. As the coloring shows, this percentage is quite different for different language projects. The English Wikipedia receives a far larger share of its traffic via mobile devices than most other language projects. If you zoom in on Europe you can clearly see how the UK stands out against e.g. Germany and France, where economic conditions are roughly comparable.

Extra large screenshots: Population density 2010 (SEDAC); Global distribution of edits on English Wikipedia; Global distribution of page views on all Wikipedias combined; Global page views and population density (split screen)

>> Animation <<


Detection of mobile devices is done by scanning for certain keywords in the agent string, as contained in the metadata which our servers receive for every request.

Page views are per square kilometer. For each page view log record from our 1:1000 sampled log the IP address is translated into latitude and longitude, using the free Maxmind database. Views are accumulated for a whole month per small region (here 1/8 degree squared), averaged per day, corrected for projection distortion (any projection of a 3D globe to a 2D surface produces substantial distortions) and colored for intensity.

Cycle with ‘d’ between page views [V], split screen [V|D], split screen opposite [D|V], population density [D].

Known issue: the page views and population density screens do not pan in sync, therefore there is no split screen at larger zoom levels. But you can still alternate between both full screen views.

Compare the global share of views from a mobile device as shown here (~12%) with the share of mobile views to our mobile site (~6%) shown in our monthly report (column Wikipedia Mobile, top-right percentage).
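The accumulation step described above can be sketched roughly as follows. This is an illustration under assumptions, not the code behind the tool: the IP lookup is left out (points are assumed to be geocoded already), all names are invented, and instead of correcting for a particular map projection the sketch divides by the true area of each grid cell on a sphere, which is one way to get views per square kilometer.

```python
import math
from collections import defaultdict

CELL_DEG = 1 / 8          # grid cell size in degrees, as in the text
SAMPLING_FACTOR = 1000    # the log is sampled 1:1000
EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def cell_of(lat, lon):
    """Return the (row, column) index of the 1/8 degree cell containing a point."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def cell_area_km2(lat_index):
    """Area of a grid cell on a sphere; cells shrink towards the poles."""
    lat_south = math.radians(lat_index * CELL_DEG)
    lat_north = math.radians((lat_index + 1) * CELL_DEG)
    dlon = math.radians(CELL_DEG)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat_north) - math.sin(lat_south))


def daily_views_per_km2(sampled_points, days_in_month):
    """Turn one month of sampled (lat, lon) page views into daily views per km2."""
    counts = defaultdict(int)
    for lat, lon in sampled_points:
        counts[cell_of(lat, lon)] += 1
    return {
        cell: n * SAMPLING_FACTOR / days_in_month / cell_area_km2(cell[0])
        for cell, n in counts.items()
    }


# Three sampled views near latitude 0, longitude 0 in a 30-day month.
density = daily_views_per_km2([(0.05, 0.05)] * 3, days_in_month=30)
print(density)
```

A cell at the equator covers a bit under 200 square kilometers, one at 60 degrees latitude about half of that, which is why raw counts per cell cannot be compared directly with a population density map.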

Posted in Nice Charts, Wikimedia View(er)s | 1 Comment

Variations on the English Wikipedia Main Page

The WMF servers receive a lot of unserviceable page requests. To illustrate this, with what is most likely an extreme example, here is a list of page requests received in July 2011 which target any article with a title starting with ‘Main_Page’. Clearly most faulty requests come from buggy software, not directly from users.

Posted in Wikimedia View(er)s | Leave a comment

Saving lifetimes

One day Jobs came into the cubicle of Larry Kenyon, an engineer who was working on the Macintosh operating system, and complained that it was taking too long to boot up. Kenyon started to explain, but Jobs cut him off. “If it could save a person’s life, would you find a way to shave ten seconds off the boot time?” he asked. Kenyon allowed that he probably could. Jobs went to a whiteboard and showed that if there were five million people using the Mac, and it took ten seconds extra to turn it on every day, that added up to three hundred million or so hours per year that people would save, which was the equivalent of at least one hundred lifetimes saved per year [1]. “Larry was suitably impressed, and a few weeks later he came back and booted up twenty-eight seconds faster” (‘Steve Jobs’, by Walter Isaacson, page 123).

Personally I still blame Microsoft for not introducing thousands separators into the dir output until MS-DOS 6. With hundreds of millions of users in the early 90’s, every quarter of a second wasted, several times a day, to read an 8-9 digit file size, added up to a waste of lifetimes comparable to the above.

How does this translate to Wikimedia? With over 15 billion page views each month [2], each 1/10 second shaved off page loading time saves humanity 1,500,000,000 seconds each month, which is very close to the total waking time of a 70-year-old person (70*365*16*3600 seconds). So the awesome dedication of the small Wikimedia operations team (staff AND volunteers) has not only saved Wikimedia tons of hardware. It saves tens to hundreds of lifetimes a year!

Of course all the ingenuity in the world only goes so far to accommodate Wikimedia’s ever increasing traffic. That’s why Wikimedia’s annual fundraiser, which is about to launch, is so vital to keep access to all our content fast, all over the world.
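For those who like to check the arithmetic, here is a small sketch that redoes both calculations, the Mac boot time example (see footnote 1) and the Wikimedia one. The only inputs are the figures quoted in this post; the 16 waking hours per day and the variable names are the assumptions of this back-of-the-envelope estimate.

```python
# One 'lifetime' = the waking seconds of a 70 year old, at 16 waking hours a day.
WAKING_SECONDS_PER_LIFETIME = 70 * 365 * 16 * 3600  # 1,471,680,000


def lifetimes_saved_per_year(seconds_saved_per_year):
    """Convert a yearly time saving in seconds into 'lifetimes'."""
    return seconds_saved_per_year / WAKING_SECONDS_PER_LIFETIME


# Mac example: 5 million users each save 10 seconds per day.
mac = lifetimes_saved_per_year(5_000_000 * 10 * 365)

# Wikimedia: 15 billion page views per month, each 0.1 second faster.
wikimedia = lifetimes_saved_per_year(15_000_000_000 * 12 * 0.1)

print(f"Mac boot time: {mac:.1f} lifetimes per year")
print(f"Wikimedia:     {wikimedia:.1f} lifetimes per year per 0.1 s saved")
```

Both come out at roughly 12 lifetimes per year. So every tenth of a second shaved off page loading time is worth about one lifetime per month, and a full second would be worth more than a hundred lifetimes per year.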


1 This is actually an overstatement: if every user saves 10 seconds per day this is roughly an hour per year. A 70 year old has been awake for 400,000 hours. Five million people saving an hour per year equates to 12 lifetimes.

2 The total number of file requests (images, scripts, etc.) is even an order of magnitude larger (see image).

Posted in uncategorized | Leave a comment

Summary Reports for all Wikimedia Wikis

These weeks I am performing long overdue maintenance on Wikistats. This includes fixing bugs (e.g. Wikibooks and Wikiversity reports were broken for many months). This also includes automation, removing manual steps from the production process. I am also making good on a long-standing promise to publish summaries for all Wikimedia wikis.

These summaries were originally introduced for the monthly India Report Card, with content and layout suggestions from Wikimedia researcher Mani Pande. Hopefully they serve a wide audience. Hopefully they help to quickly assess fundamentals for any Wikimedia wiki, without getting avalanched by too many details from unwieldy tables.

Where to find these summaries

There are sets of summaries, also known as report cards, per project: Wikipedia, Wiktionary, Wikibooks, Wikinews, Wikiquote, Wikiversity, Wikisource and Other Projects.

For Wikipedia there are also sets of summaries per region: Africa, Americas, Asia, Europe, India, Oceania, and also for Artificial Languages.

Finally for every wiki there is also a new ‘Summary’ link in the project sitemaps: Wikipedia, Wiktionary, Wikibooks, Wikinews, Wikiquote, Wikiversity, Wikisource and Other Projects.

I am open to suggestions on what else to include in these summaries. Yet their very purpose is to offer a quick at-a-glance overview, which puts some constraints on which information to add.

Update 27 Sep: I added extra charts and metrics for Commons.


Posted in Nice Charts, Wikimedia Edit(or)s, Wikimedia View(er)s, Wikistats Reports | 6 Comments

Wikipedia Mobile Traffic II

Three months ago I blogged about mobile traffic to Wikipedia. I explained how we track two different metrics: on one hand traffic to our mobile site, on the other hand traffic from mobile devices (as detected from the so-called agent string).

While preparing my presentation for Wikimania Haifa, which shows a visualization of global page views (more on that soon), it dawned on me that the chart I presented in that earlier blog post actually shows incomparable metrics. They are not wrong, but comparing them is comparing apples and oranges.

Above is the updated plot. Both existing lines are unchanged. I added a new line.

The issue is this: the blue line shows the ratio of page views to our mobile site, based on page views only, aka html requests. At present our mobile site serves 6% of our page requests. (BTW read more on recent plans to redirect even more traffic to our mobile site).

The red line shows the ratio of requests that originate from a mobile device (to any of our sites), based on all traffic: not only html requests but also image and script files. There is a caveat here: many handheld clients (app/browser) do not retrieve a full Wikipedia page, but only the html file and just a few of the image and script files. This skews the ratio, and not a little bit!

The new purple line shows the ratio of page views from handheld devices, disregarding all non-html file types. The difference is striking. It turns out at least 15% of our page views come from mobile devices. I say at least, as we do not factor in API calls yet. My colleague Nimish Gautam thinks this might drive the ratio further upward (to be continued).

It is not possible to generate the new metric for traffic older than 3 months. WMF only keeps request logs for a short period due to privacy considerations. Although somewhat confusing without this explanation, I will keep the red line for a while, to allow for long-term trend assessments.
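To show why the red line understates mobile usage, here is a toy sketch of the three ratios. Everything in it is made up for illustration: the record layout, the keyword list and the sample traffic are assumptions, not our real log format or detection rules.

```python
from typing import NamedTuple


class Request(NamedTuple):
    to_mobile_site: bool  # was the request sent to the mobile site?
    mime_type: str        # 'text/html' counts as a page view
    agent: str            # user agent string sent by the client


# Hypothetical keyword list; the real detection uses its own set of keywords.
MOBILE_KEYWORDS = ("mobile", "android", "iphone", "blackberry")


def from_mobile_device(agent):
    """Guess from the agent string whether a request came from a mobile device."""
    agent = agent.lower()
    return any(keyword in agent for keyword in MOBILE_KEYWORDS)


def mobile_shares(requests):
    """Return the (blue, red, purple) ratios described in the text."""
    page_views = [r for r in requests if r.mime_type == "text/html"]
    blue = sum(r.to_mobile_site for r in page_views) / len(page_views)
    red = sum(from_mobile_device(r.agent) for r in requests) / len(requests)
    purple = sum(from_mobile_device(r.agent) for r in page_views) / len(page_views)
    return blue, red, purple


def page_load(to_mobile_site, agent, extra_files):
    """One html request plus a number of image/script requests."""
    html = [Request(to_mobile_site, "text/html", agent)]
    extras = [Request(to_mobile_site, "image/png", agent)] * extra_files
    return html + extras


DESKTOP = "Mozilla/5.0 (Windows NT 6.1) Firefox/5.0"
PHONE = "Mozilla/5.0 (iPhone; CPU iPhone OS 4_3 like Mac OS X) Mobile Safari"

# 8 desktop page loads with 9 extra files each, and 2 phone page loads
# (one to the mobile site, one to the main site) with 1 extra file each.
sample = (
    page_load(False, DESKTOP, 9) * 8
    + page_load(True, PHONE, 1)
    + page_load(False, PHONE, 1)
)
blue, red, purple = mobile_shares(sample)
print(f"blue {blue:.1%}, red {red:.1%}, purple {purple:.1%}")
```

In this toy sample 2 of 10 page views come from a phone (purple, 20%), but because phones fetch far fewer images and scripts they account for only 4 of 84 requests overall (red, under 5%). Only one of the two phone page views goes to the mobile site (blue, 10%).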




Posted in uncategorized | 3 Comments