Wiki Loves Monuments 2013

This gallery contains 5 photos.

Here are some charts on the breakdown by country of contributions and contributors to Wiki Loves Monuments 2013. Better late than never. I meant to publish this together with retention stats, but those are still in the pipeline, and may …

More Galleries | 2 Comments

Reassessment of active editors

Yesterday I discovered a bug in wikistats which affects our editor counts for the last 2 years.

Wikistats flags users as ‘anonymous’ based on pattern recognition, rather than relying on the <ip> tag. Reason: in early years many anons with a pattern other than just 4 numeric triplets ended up in the <username> tag. To my dismay I realized yesterday that this recognition code was never adapted for IPv6 addresses. Hence anonymous IPv6 addresses were counted as normal registered users in many reports. Especially for the last 12 months this visibly affected our totals for active editors (5+ edits a month), but hardly those for very active editors (100+ edits a month).
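To illustrate the kind of check involved, here is a minimal sketch in Python. It is not the Wikistats code: it swaps the pattern recognition for the standard library's `ipaddress` parser, and the function name `is_anonymous` is made up for this example. It only recognizes well-formed IPv4 and IPv6 addresses, not the irregular early-year patterns mentioned above.

```python
import ipaddress


def is_anonymous(contributor_name: str) -> bool:
    """Return True if the name parses as an IPv4 or IPv6 address.

    Hypothetical helper: a name that is a valid IP address is treated
    as an anonymous edit, whichever dump tag it was found in.
    """
    try:
        ipaddress.ip_address(contributor_name.strip())
    except ValueError:
        # Not a parseable address, so treat it as a registered user name.
        return False
    return True


if __name__ == "__main__":
    for name in ["192.0.2.17", "2001:DB8:0:0:0:0:0:1", "ExampleUser"]:
        print(name, is_anonymous(name))
```

A check that only knows the IPv4 form would let the second name through as a registered user, which is the kind of miscount described above.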

Today I fixed this for our report on total unique (aka deduplicated) registered users for all Wikimedia wikis combined. The chart below shows by how much the counts were lowered because of this. Other reports will be fixed after June’s dump processing cycle.

My sincere apologies for any confusion or inconvenience caused by this.










Update: Here is a second chart which shows the effect on our active editors in absolute terms. For very active editors the difference is negligible and cannot be shown in such a plot.


(for comparison here is the old version of the report)

Posted in uncategorized | 1 Comment

Portal can now be searched

The Wikimedia stats portal now features more tools and reports than ever (57 and growing). An often heard complaint was that the portal was a bit overwhelming and hard to navigate.

Two changes will hopefully help you find what you need more easily. First, all entries are now in one huge list, with no artificial breakdown between internal and external tools. By itself this list may be even more daunting in size, but the second change, a new search feature, aims to address just that.

You can now filter entries by keywords. Descriptions and search tags will be scanned. The search then returns a table of contents, followed by the full entries that qualify.
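The filtering described above could be sketched as follows. This is an illustration in Python, not the portal's actual implementation; the `Entry` class, the `search` function and the sample entries are invented for the example, and requiring every keyword to match (rather than any) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical portal entry: a title, a description and search tags."""
    title: str
    description: str
    tags: tuple = ()


def search(entries, query):
    """Return (table_of_contents, matching_entries) for a keyword query.

    Only descriptions and tags are scanned. Matching is case-insensitive
    and, by assumption, every keyword must occur somewhere in that text.
    """
    keywords = query.lower().split()

    def matches(entry):
        haystack = " ".join([entry.description, *entry.tags]).lower()
        return all(keyword in haystack for keyword in keywords)

    hits = [entry for entry in entries if matches(entry)]
    table_of_contents = [entry.title for entry in hits]
    return table_of_contents, hits


if __name__ == "__main__":
    portal = [
        Entry("Editor trends", "Monthly counts of active editors", ("editors", "charts")),
        Entry("Page views", "Hourly page view tables", ("traffic", "tables")),
    ]
    toc, full_entries = search(portal, "editors charts")
    print(toc)
```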

As before, each entry briefly describes a few highlights of the tool and features a rather small screenshot. This screenshot is not meant to explain the tool or report in detail (it may even be hard to read). Its function is twofold: primarily it can help you find a report again which you used earlier, and which you may still recognize from its visual appearance. It also lets you see at a glance what type of output to expect, e.g. tables vs charts.



* The primary objective was to make the current portal easier to use with limited coding effort and a short payback time. A more substantial overhaul is not ruled out, but is currently not on the agenda of the Wikimedia Analytics Team.
* Any feedback is of course welcome: suggestions for functional improvement, for entries to add, for keywords to add, for fixing minor layout quirks.
* Current focus is on publicly accessible tools and reports. None of the entries leads to a page which requires a login.
* You’ll find an entry for Wikipedia visualizations, but those can’t be searched individually (yet).
* Even some defunct reports are listed (but clearly marked as such), partly because some of these are dearly missed and can serve as inspiration for future replacements.

Posted in Wikistats Reports | Leave a comment

Full archive dumps are being processed again, for the first time since 2010

There is no Wikistats issue about which I have received more mail than this one. Since 2010 some metrics on article content were no longer updated: word count, articles above 200 chars, mean size in bytes, percentage above 0.5 or 2 Kb, database size, images and links (internal, interwiki, external). Word count in particular was often mentioned.








Example: Polish Wikipedia

All these metrics need to be collected from the ‘full archive dumps’, the dumps which contain the full raw content of every revision of every page. The sheer amount of data that needs to be processed made it no longer feasible to process those full dumps on a monthly basis. It didn’t help that I do rather ambitious cleaning up of the raw page content before counts are generated (e.g. for word count to approach ‘readable body text’).
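To give an idea of what such cleaning involves, here is a deliberately simplified sketch in Python. It is not the Wikistats cleanup, which is more ambitious; the function name `rough_word_count` and the particular rules (drop templates, comments and tags, keep link labels) are assumptions chosen for the example.

```python
import re


def rough_word_count(wikitext: str) -> int:
    """Approximate the number of readable words in a piece of wiki markup."""
    text = wikitext

    # Remove templates such as {{citation needed}}; repeat so that
    # nested templates are peeled off from the inside out.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", " ", text)

    # Remove HTML comments, then any remaining HTML-style tags.
    text = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", " ", text)

    # [[target|label]] becomes label, [[target]] becomes target.
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)

    # Drop bold/italic quote runs and heading markers.
    text = re.sub(r"'{2,}|={2,}", " ", text)

    return len(re.findall(r"\w+", text))


if __name__ == "__main__":
    sample = "'''Warsaw''' is the capital of [[Poland]].{{citation needed}}"
    print(rough_word_count(sample))
```

Even this short version has to make several passes over the text of each revision, which hints at why doing it for every revision of every page is expensive.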

So in 2010 for most Wikipedias I switched to processing stub dumps, which contain all metadata for every revision, but not the raw page content. For sister projects with much smaller dumps I continued processing full archive dumps.

Now finally I can announce I applied a fix which makes it possible to update those missing metrics roughly on a quarterly cycle. Full archive dumps are now processed on a different server, running as a continuous low-priority job, and the reporting process combines metrics from both servers.

In the last two weeks some 260 wikis were processed. Only 10 large wikis remain to be done: Arabic, English, French, German, Hebrew, Italian, Japanese, Spanish, Swedish, Russian. I expect that in a month’s time all but English will be ready. English may arrive (fingers crossed) a month later.
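One way such a combination step could work is sketched below in Python. This is an assumption about the approach, not a description of the actual reporting code: the function `combine`, the metric names and the `content_as_of` field are hypothetical. Each monthly row from the stub dumps is extended with the most recent content metrics available at or before that month.

```python
from bisect import bisect_right


def combine(monthly, quarterly):
    """Merge monthly stub-dump metrics with less frequent full-dump metrics.

    Both arguments map a 'YYYY-MM' month to a dict of metric values.
    A month with no earlier content measurement keeps only its stub metrics.
    """
    content_months = sorted(quarterly)
    combined = {}
    for month in sorted(monthly):
        row = dict(monthly[month])
        # Position of the latest content measurement at or before this month.
        position = bisect_right(content_months, month)
        if position:
            source_month = content_months[position - 1]
            row.update(quarterly[source_month])
            row["content_as_of"] = source_month
        combined[month] = row
    return combined


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    stub = {"2014-01": {"active_editors": 10}, "2014-03": {"active_editors": 12}}
    full = {"2014-02": {"words": 1000}}
    print(combine(stub, full))
```

Recording which month the content metrics come from keeps it visible in the report when a value lags behind the monthly figures.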




Posted in uncategorized | 2 Comments

Wikimedia editor trends broken down by project

For a few years now we have presented monthly deduplicated totals for active and very active editors. Deduplicated means that every editor counts only once, regardless of the number of wikis edited. We never collected similar trends on a per project basis. So to make up for this, last week I ran some special iterations of Wikistats to collect active editor trends per project.
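A minimal sketch of deduplicated counting, in Python. It is an illustration and not the Wikistats code: the function name and the record layout are made up, it assumes the same user name on different wikis is the same person, and it assumes edits are summed across wikis before the threshold is applied (the text above does not say which way this is done).

```python
from collections import defaultdict


def count_active_editors(edits, threshold=5):
    """Count editors per month who reach the edit threshold, each counted once.

    `edits` is an iterable of (month, wiki, editor, edit_count) tuples.
    Use threshold=5 for active and threshold=100 for very active editors.
    """
    # Sum each editor's edits over all wikis within a month.
    totals = defaultdict(int)
    for month, _wiki, editor, edit_count in edits:
        totals[(month, editor)] += edit_count

    # Count every qualifying editor exactly once per month.
    counts = defaultdict(int)
    for (month, _editor), total in totals.items():
        if total >= threshold:
            counts[month] += 1
    return dict(counts)


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        ("2013-09", "enwiki", "Alice", 3),
        ("2013-09", "commonswiki", "Alice", 4),
        ("2013-09", "enwiki", "Bob", 2),
        ("2013-09", "commonswiki", "Carol", 120),
    ]
    print(count_active_editors(sample))
    print(count_active_editors(sample, threshold=100))
```

For a per project breakdown the same counting can be run on the records of one project at a time.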

I want to share with you four charts, as they were presented at today’s Metrics Meeting. There will be a follow-up study, but here are a few quick observations:

1) First chart is the big picture,

  • where the English editor community is still shrinking somewhat (but most of that happened earlier)
  • where all non-English Wikipedias combined are fairly stable
  • where non-Wikipedias combined show significant growth especially in 2013

2) Second chart focuses on the two largest non-Wikipedia projects: Commons and the new project Wikidata (together these make up most of the orange line in the first chart).

Note how the large peaks in Commons editorship in September are the result of the hugely successful Wiki Loves Monuments contests.

3) Third chart shows smaller Wikimedia projects which are stable or growing

4) Fourth chart shows smaller projects which are slightly or significantly shrinking

Thanks to Dario Taraborelli for inquiring about these metrics. He and I will look into this further, possibly checking correlation with page view trends.


Charts: UniqueActiveEditorsOnLargestNonWikipedias, UniqueActiveEditorsOnSmallProjects-Growth, UniqueActiveEditorsOnSmallProjects-Decline




Posted in uncategorized | 3 Comments