Wikipedia page reads, breakdown by region

As we know, people in some regions of the world have easier access to Wikipedia than in others. The majority of reads come from the so-called Global North (*). Is this imbalance between North and South diminishing? It is not so easy to get the needle moving, as a whole range of regional differences come into play. Average internet speed and latency differ widely between regions. In some regions internet access is almost ubiquitous, at any time or place, at home and at work, via desktop, mobile or tablet, yet in large parts of the world many can only access the internet via shared computers (schools, cyber cafes). The saying goes that the second billion internet users will use a mobile phone as their main access point, a true game changer. I hope and expect Wikipedia Zero will vastly speed up this development.

WMF’s monthly report card shows trends per region on reach and unique visitors (data from comScore), but those metrics are only part of the story. For one, comScore uses indirect measurement (a consequence of WMF’s strict privacy policy). Also, a metric like unique visitors does not take total activity per user into account: one page view per month or a thousand both count as one unique visitor (UV).
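To make the difference between unique visitors and page views concrete, here is a minimal Python sketch. The visitor ids and counts are made up for illustration; they are not comScore or Wikimedia data.

```python
from collections import Counter

# Hypothetical page views in one month: one list entry per page view,
# labelled with the id of the visitor who made it.
page_views = ["casual"] * 1 + ["regular"] * 40 + ["heavy"] * 1000

views_per_visitor = Counter(page_views)

unique_visitors = len(views_per_visitor)             # each visitor counts once
total_page_views = sum(views_per_visitor.values())   # every view counts

print(unique_visitors)   # 3
print(total_page_views)  # 1041
```

The casual reader and the heavy reader weigh the same in the first number, while the second number is dominated by the heavy reader.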

Fortunately we can also count page views directly, and break these down by region, target wiki, mobile or main site. Wikistats has many reports on this, e.g. page views/edits per region, page views per platform and target wiki.

Here is another set of charts. This time emphasis is not on absolute trends, but on relative content consumption per region. Again focusing on: do we see a shift in global distribution of page reads?

Please remember that mobile in the charts is about traffic to the mobile site, not traffic from mobile devices! A considerable part of web access from phones and tablets is to the main site.

The chart above shows that Africa still has a long way to go to gain equal access to the internet: for a continent with about 15% of the world’s population, 1.4% of Wikipedia page views is low, but still one and a half times as much as 3 years ago.

[Charts: Asia; Australia; Central America; Europe; North America; Oceania; South America; Sub Saharan Africa; North vs South; Main site vs Mobile site]

A page request is defined here as any request for HTML content (MIME type ‘text/html’). So it includes non-existing pages (e.g. 404s), and maybe other cruft. Unlike in some other reports, we do distinguish between human and bot page requests here. (**)

The data source for all these charts is one file, extracted from the same 1:1000 sampled server logs we already use for other reports. There is a rudimentary Perl script (***) to extract cross sections from these data and produce a CSV file ready for import into a spreadsheet, so as to produce charts like those above. Over time we may feed some of the results into our monthly report card.
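The extraction step could look roughly like the sketch below. It is written in Python rather than Perl and is not the actual wikistats script: the record layout (month, region, sampled request count), the function names and the sample numbers are all assumptions made for illustration. It scales the 1:1000 sample back up, computes each region's share of the monthly total, and renders the result as CSV.

```python
import csv
import io
from collections import defaultdict

SAMPLE_RATE = 1000  # logs are sampled 1:1000, so one sampled request stands for ~1000


def regional_shares(records):
    """Return {month: {region: (estimated_requests, share_percent)}}.

    `records` is an iterable of (month, region, sampled_count) tuples;
    this layout is hypothetical, not the real log format.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for month, region, sampled_count in records:
        totals[month][region] += sampled_count * SAMPLE_RATE

    shares = {}
    for month, per_region in totals.items():
        month_total = sum(per_region.values())
        shares[month] = {
            region: (count, 100.0 * count / month_total if month_total else 0.0)
            for region, count in per_region.items()
        }
    return shares


def to_csv(shares):
    """Render the shares as CSV text, one row per month and region."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["month", "region", "estimated_requests", "share_percent"])
    for month in sorted(shares):
        for region in sorted(shares[month]):
            count, share = shares[month][region]
            writer.writerow([month, region, count, f"{share:.1f}"])
    return out.getvalue()


# Made-up sample counts, for illustration only.
sample = [
    ("2012-09", "Europe", 600),
    ("2012-09", "Africa", 20),
    ("2012-09", "Asia", 380),
]
print(to_csv(regional_shares(sample)))
```

Because the charts are about relative consumption, the share column is the one that matters; the scaled-up request estimate is only as good as the sampling.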

Of course our upcoming data beast Kraken will soon take care of data collection, with better resolution than ever, more flexible breakdowns, and faster availability. So consider this data stream not strategic, but rather a way of putting legacy to good use to fill a void.

Disclaimer: some of the anomalies that occurred over time in our data have been filtered out (those data points are blanked). And we had some serious data collecting mishaps over the years. For this reason data before 2010 are omitted altogether (****).

* = Not the same as the geographical north; in fact WMF uses its own breakdown of North vs South.
** = We will overhaul definitions and move to standardized metrics, which in itself brings a new challenge: to somehow integrate old and new metrics in one timeline, until the new metrics have acquired enough history.
*** = Low on documentation, with lots of room for improvement in filter and aggregation options.
**** = We have chosen to include the first half of 2010 despite the major server under-reporting we faced at that time, because these charts are about relative rather than absolute traffic numbers. This is despite the possibility that server overload affected some times of the day, and hence some regions, somewhat more than others.

Posted in Nice Charts, Wikimedia View(er)s, Wikistats Reports | 1 Comment

Evernote ignores security flaw for months

Usually this blog is about Wikimedia statistics. Today I need to digress. My favorite cross-platform archival system is the hugely popular Evernote. I use Evernote for all kinds of data and images, and love the product. So much so that I stored all kinds of (mildly) personal data in it. No longer.

As a paying subscriber I get a few goodies, like setting a pin code. In fact, apart from the higher upload limit there are just a few of these goodies, so this pin code option is prominently featured on their sign-up page, especially on iPad.

Evernote subscription benefits

In July I stumbled over a security flaw, and reported it first to the help desk (after all as a paid subscriber I get ‘top priority support’). They confirmed the bug quickly and said they reported it to the engineers. A lively debate on their support site followed, with an Evernote employee participating. 3.5 months, several small updates and one major new release later the bug still stands.

So what is it about? On iOS devices one can circumvent the pin code simply and quickly. All that is needed is to remove the app and download it again, which takes less than a minute. Since iOS 6 no Apple password is needed for updates. ‘Remember Everything’ Evernote helpfully suggests reusing the existing account, but forgets there was a pin code set. Oops!

Evernote dialog box

An Evernote employee responded as follows (paraphrasing; see the exact response here): it’s Apple’s fault, as they changed their system; the iOS device has its own pin code, which one needs to bypass first; and Evernote supports encryption, which makes this less of an issue.

Evernote, if you rely on the general iOS login code, why did you offer an extra pin code in the first place, and brag about it? Maybe some users prefer a short device login code to keep their daily news and amusement within easy reach, but treat Evernote as their trusted vault and use a more solid pin code there. Also, encryption support is minimal: only for plain text, not for scans, PDFs etc.

As much as I love your product, shouldn’t you care for your clients’ security before adding new goodies? How difficult can it be to either disable import of stored account data, or remember the pin code as well?!

Posted in uncategorized | 2 Comments

Growth in article count at largest 20 Wikipedias

There is a lot of variation in article growth rate among mature Wikipedias. Growth slows down at some, and is steady or even accelerates at others. Many have tried to model these trends. I have little to offer in explanation, but can offer an at-a-glance overview of article growth trends for the top wikis. You can switch between small and large charts, and either look at growth trends alone or match those visually with overall editor activity per wiki.



These charts have existed for a while. Via the wikistats portal you can find similar charts for all other wikis: navigate to the sitemap pages, e.g. for Wikipedia, and click the Summary link for any wiki. There are more links at the bottom of each sitemap to grouped summary pages per project.

New: trends are now broken down by type of editor: registered editors, anonymous editors, bots.

Some observations

Manual article growth has been slowing down on the English Wikipedia since 2007, but seems to have stabilized in the last two years.

Growth in articles on the German, French, Italian, and Polish Wikipedias has been pretty stable for many years.

Both observations seem relevant and somewhat at odds with the low-hanging fruit hypothesis, as all of these wikis can be considered fully mature Wikipedias. More about this hypothesis here and here.

Unrecognized bots?

Usually spikes in editor activity are caused by bots. A few charts show spikes in article creation rate for registered users. My hunch is these are anomalies, caused by bots not being recognized by name (roughly meaning they do not contain ‘bot’ in name) and not being registered as bot either (which I believe on many wikis is mandatory).
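The heuristic described above can be sketched as follows, in Python. This is an illustration, not the wikistats code: the function name, the set of registered bots and the user names are all hypothetical.

```python
def is_bot(user_name, registered_bots):
    """Treat a user as a bot if registered as one, or if the name contains 'bot'."""
    return user_name in registered_bots or "bot" in user_name.lower()


# Hypothetical data: one registered bot, one bot recognizable by name,
# one unregistered script whose name gives nothing away, and one human.
registered_bots = {"InterwikiHelper"}
creators = ["InterwikiHelper", "CleanupBot", "MassImporter", "Alice"]

counted_as_registered_user = [
    name for name in creators if not is_bot(name, registered_bots)
]
print(counted_as_registered_user)  # ['MassImporter', 'Alice']
```

A script like the hypothetical MassImporter slips through both checks and ends up in the registered users series, which is the kind of spike suspected here. Note that a plain substring match also errs the other way: a human whose name happens to contain ‘bot’ would be counted as a bot.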

Any feedback on bots that fall in this category is very welcome. If some of these bots are registered after all, next month’s charts will reflect this, for all history. Likely candidates for mis-attribution are the Spanish spike in 2011, Chinese in 2012, Vietnamese in 2011/2012, Norwegian in 2008, and Czech in 2010.

Portuguese Wikipedia

One particular issue prompted this overview, so let me ask: growth in new articles on the Portuguese Wikipedia dropped significantly early in 2011 (it seems to have picked up again recently). The number of active editors did not change much in recent years. Any thoughts on this in general? Also, it seems the counting methodology changed (not on wikistats), or at least was questioned, in March, according to this discussion (Google translates anexos as attachments (?)).

Thanks in advance for any insights into these trends.

Update: for further analysis you can download the data files (CSV).

Posted in Wikimedia Edit(or)s | 11 Comments

Wikistats editor counts are broken (upd recovery complete)

Wikistats editor counts are too low for some languages, for all reported months. The issue has made it into the German Wikipedia’s “Kurier” newsletter (18.1), and is discussed at this page.

First, there has not been a definition change or re-evaluation of editor counts. Current counts are wrong. Let me explain why all months display lower counts: on every monthly run of wikistats nearly all data are regenerated from the dumps. This is on purpose. This way new functionality and (rare) bug fixes apply to all months since the creation of the wiki. The flip side is that when stats scripts or dumps are broken, reports will show wrong data for all months, which is what happened now (a hybrid situation with data retention and update runs would complicate processing further and easily be a source of errors itself). I put up a notice on the stats pages.

I am investigating. So far test runs have been inconclusive. My first priority is to find out whether this is caused by a change in the dumps, or the scripts, or a config change on the server. The latter is most likely. In the past month Wikistats data and scripts were moved to a new server, with a number of modifications to overall configuration and shell scripts.

I will keep you posted on any new findings. My apologies for the inconvenience and confusion caused by this mishap.

Update June 17

The problem has been analyzed and fixed. A few weeks ago, during a substantial overhaul of the stats scripts (to add new metrics), an error crept in which was not recognized during tests. As a result far too many articles were flagged as redirects, resulting in far too low counts for articles and editors: redirect pages are not counted as articles and, for consistency, are not taken into account for edits and editors (*).

It will still take about 7-10 days to reprocess all dumps. This is slightly delayed further by scheduled maintenance on the stats server, which is still ongoing.
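A minimal Python sketch of why over-flagging redirects lowers both counts. The page records and the two redirect tests are invented for illustration (the actual error in the scripts is not described here); the only rule taken from the text is that redirects are not counted as articles and their edits are not counted towards editors.

```python
def count_articles_and_editors(pages, is_redirect):
    """Count non-redirect pages and the distinct editors who edited them."""
    articles = 0
    editors = set()
    for page in pages:
        if is_redirect(page):
            continue  # redirects count neither as articles nor towards editors
        articles += 1
        editors.update(page["editors"])
    return articles, len(editors)


# Hypothetical pages.
pages = [
    {"text": "#REDIRECT [[Amsterdam]]", "editors": ["Ann"]},
    {"text": "Amsterdam is the capital of the Netherlands.", "editors": ["Ann", "Bob"]},
    {"text": "Redirects are pages that forward readers elsewhere.", "editors": ["Cy"]},
]


def correct_test(page):
    return page["text"].lstrip().upper().startswith("#REDIRECT")


def too_eager_test(page):
    # Deliberately wrong: flags any page that merely mentions redirects.
    return "redirect" in page["text"].lower()


print(count_articles_and_editors(pages, correct_test))    # (2, 3)
print(count_articles_and_editors(pages, too_eager_test))  # (1, 2)
```

With the too-eager test a genuine article is dropped, and the editor who only worked on that article disappears from the editor count as well.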

Thanks so much for your patience.

*: This is in itself a point that could be debated, but this is how it works, or is supposed to work. For one, it prevents skewing of the edits per article metric.

Update June 26

All 800+ wikis have been updated now.

Posted in uncategorized | 1 Comment

Wikimedia Usage Share By Browser

The following breakdowns of non-mobile and mobile traffic are based on our squid logs.

Note how share of mobile usage peaks every year around Christmas.
Note how mobile here is to be taken as ‘traffic from mobile devices’ not as ‘traffic to our mobile site’.

See also the Wikipedia article on usage share by browser.

Posted in uncategorized | 1 Comment