Isotype diagrams are now easier to build on Wikipedia

Did you ever study a table with many large numbers, only to realize the moment you put it away that nothing of what you had just seen had stuck? I guess most of us share this handicap: large numbers are difficult to absorb or evaluate.

In the 1930s Gerd Arntz created a coherent set of 4000 pictograms and, together with Otto Neurath, built from these ‘words’ a ‘language’ called Isotype.

Wouldn’t it be nice if we could use a similar method to convert numbers into symbols on Wikipedia? Now we can:

I created a small Lua script (my first) called ‘Repeat_symbols’ to translate numbers into icons. Within 20 minutes the entire script had been replaced by a better version, no trace of my code left. Thanks again, Jackmcbarn. :-)
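The idea of turning a number into a row of icons can be sketched in a few lines. This is only an illustration in Python, not the Lua code of ‘Repeat_symbols’; the function name, its parameters and the rounding rule are assumptions made for this example.

```python
def repeat_symbols(value: float, per_symbol: float, symbol: str = "*") -> str:
    """Return `symbol` repeated once for every `per_symbol` units of `value`.

    Hypothetical helper: the real Wikipedia module may round or
    handle partial symbols differently.
    """
    if per_symbol <= 0:
        raise ValueError("per_symbol must be positive")
    # Round to the nearest whole symbol; never return a negative count.
    count = max(round(value / per_symbol), 0)
    return symbol * count


if __name__ == "__main__":
    # One symbol per 10 units: 30 units become three symbols.
    print(repeat_symbols(30, 10))
```

On a wiki the symbol would be an image link rather than a text character, but the counting logic stays the same.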



The following image is a tiny section from a huge table that compares all European countries (EU members or not) and some other large countries in terms of population, land area and GDP. You can see at a glance that Russia is much larger in land area than my country, the Netherlands. Who would have thought :-)

What did surprise me was that Russia’s GDP is not even 3 times as much as that of my home country. I told several people and they could hardly believe it.

Also, instead of presenting the raw numbers I show all metrics as percentages of the EU total, which makes it much easier to evaluate and even remember some of them, when you see the full table with 65 countries.
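Expressing each metric as a percentage of the EU total is simple arithmetic. A minimal sketch follows, assuming the raw values and the EU total are already available; the function name and the sample figures are placeholders, not data from the table.

```python
def percent_of_total(values: dict[str, float], total: float) -> dict[str, float]:
    """Express each raw value as a percentage of `total`, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: round(100.0 * value / total, 1) for name, value in values.items()}


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    raw = {"country_a": 50.0, "country_b": 5.0}
    print(percent_of_total(raw, total=200.0))
```

A country larger than the whole EU on some metric simply ends up above 100%.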


The larger task really was to collect all data from English Wikipedia and devise a spreadsheet that automated much of the data conversion and code generation: a nice pastime for several weekends.

For this particular chart, using fast and small traffic to symbolize land area was a compromise, as it takes some time to get used to it. I tried many symbols but most didn’t work at this small size. Suggestions welcome.

As I write this the new table has already been proposed for deletion (who has time for improvement when deletion is also an option). Hmmm. But the core principle and the new macro remain.




Posted in uncategorized | Leave a comment

Traffic to Wikipedia’s mobile site is growing fast

WMF has counted monthly page views for the non-mobile site since 2008, and for the mobile site since June 2010.

From the respective monthly totals we can calculate which share of the traffic goes to the mobile site. Evidently this share has grown dramatically over recent years.
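The share is the mobile total divided by the sum of both totals. A minimal sketch, with a hypothetical function name and made-up sample numbers:

```python
def mobile_share(mobile_views: int, non_mobile_views: int) -> float:
    """Percentage of all page views that went to the mobile site."""
    total = mobile_views + non_mobile_views
    if total == 0:
        # No traffic at all that month: report a share of zero.
        return 0.0
    return 100.0 * mobile_views / total


if __name__ == "__main__":
    # Made-up monthly totals, for illustration only.
    print(mobile_share(mobile_views=25, non_mobile_views=75))
```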

The first chart shows the trend for the eight most read Wikipedias.


The second chart shows the same trends, now for the nine ‘most mobile’ Wikipedias
(which are also above a popularity threshold of 1 million views a month).
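Such a selection could be made as sketched below. The sketch assumes the threshold applies to total views (mobile plus non-mobile) in a month; the function name and the input layout are hypothetical.

```python
def most_mobile(stats: dict[str, tuple[int, int]],
                threshold: int = 1_000_000,
                top_n: int = 9) -> list[str]:
    """Return up to `top_n` wiki codes with the highest mobile share.

    `stats` maps a wiki code to (mobile_views, non_mobile_views) for one month.
    Wikis whose total views fall below `threshold` are skipped (assumption:
    the threshold is applied to the combined total).
    """
    shares = {}
    for wiki, (mobile, non_mobile) in stats.items():
        total = mobile + non_mobile
        if total >= threshold:
            shares[wiki] = 100.0 * mobile / total
    # Highest mobile share first.
    return sorted(shares, key=shares.get, reverse=True)[:top_n]
```

Without the threshold, very small wikis with a handful of mobile views would dominate the ranking.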




Please don’t confuse traffic to the mobile site with traffic from mobile devices. One can choose to visit the non-mobile site from a phone or tablet. One can choose to visit the mobile site from a desktop computer.

These numbers have been collected with webstatscollector. There are a number of issues with that tool. My colleagues Christian Aistleitner and Andrew Otto are working on a new version, which will be more robust, more precise in which messages to count, and will draw data from the new Kafka infrastructure instead of direct messages from each server (via udp2log). Later on, with that new infrastructure, we will also be able to do a more complete breakdown by country, and hence by region.

Data files

The following data files are available for offline analysis:

Pageview reports

The Wikipedia pageview reports now also show the percentage of mobile traffic for the last 24 months. Example: pageviews for Wikipedia, all platforms, normalized.

Breakdown by region (sort of)

Here, for what it’s worth, is a breakdown by region. Languages spoken in several regions are listed separately, so please take these regional results with a grain of salt.

region: Africa

languages:aa:Afar, af:Afrikaans, ak:Akan, am:Amharic, arz:Egyptian Arabic, bm:Bambara, ee:Ewe, ff:Fulfulde, ha:Hausa, hz:Herero, ig:Igbo, kab:Kabyle, kg:Kongo, ki:Kikuyu, kj:Kuanyama, kr:Kanuri, lg:Ganda, ln:Lingala, mg:Malagasy, ng:Ndonga, nso:Northern Sotho, ny:Chichewa, om:Oromo, rn:Kirundi, rw:Kinyarwanda, sg:Sango, sn:Shona, so:Somali, ss:Siswati, st:Sesotho, sw:Swahili, ti:Tigrinya, tn:Setswana, ts:Tsonga, tum:Tumbuka, tw:Twi, ve:Venda, wo:Wolof, xh:Xhosa, yo:Yoruba, zu:Zulu
perc mobile: 22.5%

regions: Africa/Asia
perc mobile: 37.8%

region: Artificial
languages:eo:Esperanto, ia:Interlingua, ie:Interlingue, io:Ido, jbo:Lojban, nov:Novial, vo:Volapük
perc mobile: 13.1%

region: Asia
languages:ab:Abkhazian, ace:Acehnese, arc:Aramaic, as:Assamese, az:Azeri, ba:Bashkir, bcl:Central Bicolano, bh:Bihari, bjn:Banjar, bn:Bengali, bo:Tibetan, bpy:Bishnupriya Manipuri, bug:Buginese, bxr:Buryat, cbk-zam:Chavacano, cdo:Min Dong, ceb:Cebuano, ckb:Sorani, cv:Chuvash, diq:Zazaki, dv:Divehi, dz:Dzongkha, fa:Persian, gan:Gan, glk:Gilaki, gu:Gujarati, hak:Hakka, he:Hebrew, hi:Hindi, hy:Armenian, id:Indonesian, ii:Yi, ilo:Ilokano, ja:Japanese, jv:Javanese, kaa:Karakalpak, kbd:Kabardian, kk:Kazakh, km:Khmer, kn:Kannada, ko:Korean, krc:Karachay-Balkar, ks:Kashmiri, ku:Kurdish, ky:Kirghiz, lad:Ladino, lbe:Lak, lo:Laotian, map-bms:Banyumasan, min:Minangkabau, ml:Malayalam, mn:Mongolian, mr:Marathi, mrj:Western Mari, ms:Malay, my:Burmese, myv:Erzya, mzn:Mazandarani, ne:Nepali, new:Nepal Bhasa, or:Oriya, os:Ossetic, pa:Punjabi, pag:Pangasinan, pam:Kapampangan, pi:Pali, pnb:Western Panjabi, ps:Pashto, sa:Sanskrit, sah:Sakha, sd:Sindhi, si:Sinhala, su:Sundanese, ta:Tamil, te:Telugu, tet:Tetum, tg:Tajik, th:Thai, tk:Turkmen, tl:Tagalog, tpi:Tok Pisin, tt:Tatar, tyv:Tuvan, udm:Udmurt, ug:Uyghur, ur:Urdu, uz:Uzbek, vi:Vietnamese, war:Waray-Waray, wuu:Wu, za:Zhuang, zh:Chinese, zh-classical:Classical Chinese, zh-min-nan:Min Nan, zh-yue:Cantonese
perc mobile: 33.0%

region: Europe
languages:als:Alemannic, an:Aragonese, ang:Anglo-Saxon, ast:Asturian, av:Avar, bar:Bavarian, bat-smg:Samogitian, be:Belarusian, be-x-old:Belarusian (Taraškievica), bg:Bulgarian, br:Breton, bs:Bosnian, ca:Catalan, ce:Chechen, co:Corsican, cs:Czech, csb:Cassubian, cu:Old Church Slavonic, cy:Welsh, da:Danish, de:German, dsb:Lower Sorbian, el:Greek, eml:Emilian-Romagnol, et:Estonian, eu:Basque, ext:Extremaduran, fi:Finnish, fiu-vro:Voro, fo:Faroese, frp:Arpitan, frr:North Frisian, fur:Friulian, fy:Frisian, ga:Irish, gd:Scots Gaelic, gl:Galician, got:Gothic, gv:Manx, hr:Croatian, hsb:Upper Sorbian, hu:Hungarian, is:Icelandic, it:Italian, ka:Georgian, koi:Komi-Permyak, ksh:Ripuarian, kv:Komi, kw:Cornish, lb:Luxembourgish, lez:Lezgian, li:Limburgish, lij:Ligurian, lmo:Lombard, lt:Lithuanian, ltg:Latgalian, lv:Latvian, mdf:Moksha, mhr:Eastern Mari, mk:Macedonian, mo:Moldavian, mt:Maltese, mwl:Mirandese, nap:Neapolitan, nds:Low Saxon, nds-nl:Dutch Low Saxon, nn:Nynorsk, no:Norwegian, nrm:Norman, oc:Occitan, pcd:Picard, pl:Polish, pms:Piedmontese, pnt:Pontic, rm:Romansh, rmy:Romani, ro:Romanian, roa-rup:Aromanian, roa-tara:Tarantino, rue:Rusyn, sc:Sardinian, scn:Sicilian, sco:Scots, se:Northern Sami, sh:Serbo-Croatian, sk:Slovak, sl:Slovene, sq:Albanian, sr:Serbian, stq:Saterland Frisian, sv:Swedish, szl:Silesian, uk:Ukrainian, vec:Venetian, vep:Vepsian, vls:West Flemish, wa:Walloon, xal:Kalmyk, zea:Zealandic
perc mobile: 25.9%

regions: Europe/Asia
languages:crh:Crimean Tatar, ru:Russian, tr:Turkish
perc mobile: 20.6%

regions: Europe/North-America/Oceania/Asia/Africa
languages:en:English, simple:Simple English
perc mobile: 31.5%

regions: Europe/North-America/South-America/Asia/Africa
perc mobile: 31.9%

regions: Europe/North-America/South-America/Asia/Africa/Oceania
perc mobile: 28.0%

regions: Europe/South-America
perc mobile: 27.4%

regions: Europe/South-America/Africa/Asia
perc mobile: 25.0%

region: North-America
languages:cho:Choctaw, chr:Cherokee, chy:Cheyenne, cr:Cree, ht:Haitian, ik:Inupiak, iu:Inuktitut, kl:Greenlandic, mus:Muskogee, nah:Nahuatl, nv:Navajo, pdc:Pennsylvania German
perc mobile: 14.6%

region: Oceania
languages:bi:Bislama, ch:Chamorro, fj:Fijian, haw:Hawai’ian, hif:Fiji Hindi, ho:Hiri Motu, mh:Marshallese, mi:Maori, na:Nauruan, pih:Norfolk, sm:Samoan, to:Tongan, ty:Tahitian
perc mobile: 16.6%

region: South-America
languages:ay:Aymara, gn:Guarani, pap:Papiamentu, qu:Quechua, srn:Sranan
perc mobile: 13.8%

region: World
languages:la:Latin, yi:Yiddish
perc mobile: 11.5%


Wiki Loves Monuments 2013

Here are some charts on the breakdown by country of contributions and contributors to Wiki Loves Monuments 2013. Better late than never. I meant to publish this together with retention stats, but those are still in the pipeline, and may …


Reassessment of active editors

Yesterday I discovered a bug in wikistats which affects our editor counts for the last 2 years.

Wikistats flags users as ‘anonymous’ based on pattern recognition, rather than relying on the <ip> tag. Reason: in early years many anonymous users whose address did not match the plain pattern of four numeric triplets ended up in the <username> tag. To my dismay I realized yesterday that this recognition code was never adapted for IPv6 addresses. Hence anonymous users with an IPv6 address were counted as normal registered users in many reports. Especially for the last 12 months this visibly affected our totals for active editors (5+ edits a month), but hardly those for very active editors (100+ edits a month).
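The gap can be illustrated with a small sketch. This is not the wikistats code; the regular expression and both function names are assumptions for illustration. It contrasts an IPv4-only pattern with a check that also accepts IPv6, using Python’s standard `ipaddress` module.

```python
import ipaddress
import re

# Assumed shape of an IPv4-only check: four dot-separated groups of 1-3 digits.
IPV4_LIKE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_anonymous_ipv4_only(name: str) -> bool:
    """Flag a user name as anonymous only if it looks like an IPv4 address."""
    return bool(IPV4_LIKE.match(name))


def is_anonymous(name: str) -> bool:
    """Flag a user name as anonymous if it looks like an IPv4 or IPv6 address."""
    if IPV4_LIKE.match(name):
        return True
    try:
        # Accepts any valid IPv4 or IPv6 literal, raises ValueError otherwise.
        ipaddress.ip_address(name.strip())
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for name in ("192.0.2.17", "2001:DB8:0:0:0:0:0:1", "Jackmcbarn"):
        print(name, is_anonymous_ipv4_only(name), is_anonymous(name))
```

With the IPv4-only check the IPv6 name passes as a registered user, which is the kind of miscount described above.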

Today I fixed this for our report on total unique (aka deduplicated) registered users for all Wikimedia wikis combined. The chart below shows how much the counts were lowered because of this. Other reports will be fixed after June’s dump processing cycle.

My sincere apologies for any confusion or inconvenience caused by this.










Update: Here is a second chart which shows the effect on our active editors in absolute terms. For very active editors the difference is negligible and cannot be shown in such a plot.


(for comparison here is the old version of the report)

Posted in uncategorized | 1 Comment

Portal can now be searched

The Wikimedia stats portal now features more tools and reports than ever (57 and growing). An often heard complaint was that the portal was a bit overwhelming and hard to navigate.

Two changes will hopefully help you find what you need more easily. First, all entries are now in one huge list, with no artificial breakdown between internal and external tools. By itself this list may be even more daunting in size, but the second change, a new search feature, aims to address just that.

You can now filter entries by keywords. Descriptions and search tags will be scanned. The search then returns a table of contents, followed by the qualifying full entries.
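A filter of this kind might look like the sketch below. It is not the portal’s actual code: the `Entry` type, the `search` function and the rule that every keyword must match are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical portal entry: a title, a description and search tags."""
    title: str
    description: str
    tags: list[str] = field(default_factory=list)


def search(entries: list[Entry], keywords: list[str]) -> tuple[list[str], list[Entry]]:
    """Return (table of contents, matching entries).

    An entry matches when every keyword occurs, case-insensitively, in its
    description or tags. With no keywords, all entries match.
    """
    wanted = [k.strip().lower() for k in keywords if k.strip()]
    hits = []
    for entry in entries:
        # Only descriptions and tags are scanned, not titles.
        haystack = " ".join([entry.description, *entry.tags]).lower()
        if all(k in haystack for k in wanted):
            hits.append(entry)
    toc = [entry.title for entry in hits]
    return toc, hits
```

Returning the titles separately mirrors the table of contents that precedes the full entries.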

As before, each entry briefly describes a few highlights of the tool and features a rather small screenshot. This screenshot is not meant to explain the tool or report in detail (it may even be hard to read). Its function is twofold: primarily it can help you find a report again that you used earlier and may still recognize from its visual appearance. It also lets you scan at a glance for the type of output, e.g. tables vs charts.



* The primary objective was to make the current portal easier to use with limited coding effort and a short payback time. A more substantial overhaul is not ruled out, but is currently not on the agenda of the Wikimedia Analytics Team.
* Any feedback is of course welcome: suggestions for functional improvement, for entries to add, for keywords to add, for fixing minor layout quirks.
* The current focus is on publicly accessible tools and reports. None of the entries leads to a page which requires logging in.
* You’ll find an entry for Wikipedia visualizations, but those can’t be searched individually (yet).
* Even some defunct reports are listed (but clearly marked as such), partly because some of these are dearly missed and can serve as inspiration for future replacements.

Posted in Wikistats Reports | Leave a comment