Article count, a well-established but corroding metric
For many years the key metric that wikimedians and outside observers (e.g. press) have used to describe the success of our projects has been article count. We use this metric for trend analysis, and comparison between language editions. Internally we report milestones reached. Round numbers with a fair number of zeroes lead to press statements. Wikistats has played its part in popularizing this metric.
Number of articles is indeed a simple metric, easy to define, easy to establish, but less and less easy to interpret. In the early years article count was a reasonable (albeit one-sided) indicator of our collective efforts. I am not so sure this is still the case on our larger wikis, where bots are gaining more and more prominence. One of the many things those bots can do is create new articles on a massive scale. It would be interesting to find out how much they do this already. I believe moderation is still applied in most cases, yet more than once I have read how wikimedians bragged about boosting the ranking of ‘their’ language project with bot-created articles. This is a sad reason for using otherwise very useful bots. How far this will go in the future is a matter of taste and under debate: in 2008 FritzpollBot was approved to create stub articles on the English Wikipedia for most or all of the documented villages and towns in the world, which presumably could have doubled the number of articles within months. This approval has since been revoked. And there are much larger data repositories from which we could create stubs (as an extreme example: how about a page per known celestial object (SIMBAD database: 4.7 million)?).
[Figures: “Bot activity is rising fast.” and “Star Tau Scorpii (human effort)”]
In 2006 at Wikimania Boston Jimmy Wales already invited us to shift our attention to quality (at least for the largest projects), and ease of use. More recently discussions have centered on yet another aspect of our performance.
Participation level focuses on our community and its potential
Jimmy Wales gave an excellent keynote speech at Wikimania 2009, called State of the Wiki (pdf) (video), in which he stated that we have much work ahead if we want to make our mission statement more than a somewhat pretentious sound bite.
In my presentation at Wikimania 2009, “Wikimedia in figures” (pdf) (video), I proposed a new metric for scoring and comparing the success of each of our projects, namely the participation level. By that I mean the size of the editor base as a proportion of its potential size. In other words: how well have we fared in reaching out to and including all major language communities in the world? This was partially inspired by the recent discussions about the size of our total community: can we sustain the level of activity we see today? It is also partially linked to ideas for outreach, as expressed on the strategy wiki 1.
Compared with older metrics like article count, word count, average article size, or even quality 2, the new metric does not so much focus on what we have achieved, but rather on what we can expect to achieve in the future. Amidst all uncertainties one thing is certain: our prime asset has always been and will always be our editor base. And language projects that lag behind will not thrive until we can convince enough speakers to contribute.
1 Shameless plug: see my proposal for helping underdeveloped wikis for large languages.
2 Assessing and measuring quality is a fascinating but challenging task in itself: Martin Alec Walker gave an impressive presentation on what has been accomplished on the English Wikipedia for article quality assessment.
The following charts give a notion of how well we are doing in reaching out to the world.
[Charts: “Languages with 20+ million 1st/2nd speakers”, “Editors (5+ edits/month) per million speakers”, “Languages of European and Asian origin”, “Less than 2 editors per million speakers”]
Site map as mini ‘state of the wiki’
The slides above only show figures for the largest languages. In order to present the full picture and also allow us to track progress I integrated this metric into wikistats, namely into the site maps. The old site maps focused on ‘achievements’: for many years they listed languages by order of article count. When traffic statistics became available in 2008 the sort order was changed to page views per hour.
The new site map gives initial focus to participation (but as a bonus you can now re-sort the table).
Some columns are not shown here; see the full picture.
Participation is defined as the number of editors (5+ edits in the last month) relative to the total number of speakers of that language (native and non-native speakers combined).
- The new site map layout is first published for the Wikipedia wikis. Other projects will follow.
- Data for number of speakers are taken from the 250+ language articles on the English Wikipedia (August 2009). Some articles claim more precision than others. Some do not mention non-native speakers. Where the article gives a range, wikistats uses an average value (and uses other pages as well). The article on the English language in particular gives a broad range: secondary language 0.2-1.4 billion. Of course there is also some arbitrariness in how well a person must speak a language before he or she qualifies as a non-native speaker.
- The speaker counts do not add up to 6.8 billion (the current world population), but that is not the point.
- Age, literacy, standard of living (internet access), and political climate all influence the real pool of possible editors in a language, but these factors are left out of consideration. For many of them good numbers are not available (worldwide demographics per language), or they are hard to quantify (effect of political climate), or they are changing rapidly (internet access in the 3rd world).
- One could argue that the absolute number of participants in a language project is a better metric for comparison. We do already have those figures. This new metric tells us how far we have progressed in terms of what can reasonably be expected (without regard for all complicating factors named above).
- The regions listed per language should be taken lightly, as a broad statement: many languages have recently dispersed over the globe due to economic migrations and/or diaspora and are spoken in many countries by a small minority.
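To make the definition concrete, here is a minimal sketch of how the participation figure and the default sort order of the new site map could be computed. It is not the actual wikistats code: the language codes, editor counts, and speaker ranges below are invented for illustration, and the names `Language`, `speakers_estimate`, and `participation` are hypothetical. It assumes that a speaker range is reduced to a plain average of its bounds, as described in the notes above.

```python
from dataclasses import dataclass


@dataclass
class Language:
    code: str             # hypothetical language code, not a real wiki
    editors: int          # editors with 5+ edits in the last month
    speakers_low: float   # lower bound, native + non-native speakers
    speakers_high: float  # upper bound (equal to lower bound if no range)


def speakers_estimate(lang: Language) -> float:
    """Reduce a range of speaker counts to a single average value."""
    return (lang.speakers_low + lang.speakers_high) / 2


def participation(lang: Language) -> float:
    """Editors per million speakers."""
    return lang.editors / (speakers_estimate(lang) / 1_000_000)


# Invented sample data, for illustration only.
languages = [
    Language("xa", 1_200, 200_000_000, 400_000_000),
    Language("xb", 500, 4_000_000, 6_000_000),
    Language("xc", 90, 60_000_000, 60_000_000),
]

# Default site map order: highest participation first.
site_map = sorted(languages, key=participation, reverse=True)

for lang in site_map:
    print(f"{lang.code}: {participation(lang):.1f} editors per million speakers")
```

With these made-up numbers, the small language "xb" ranks first even though "xa" has more editors in absolute terms, which is exactly the shift in perspective the new metric is meant to provide.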
Also new is that translations for all of the roughly 280 language names in all 28 wikistats languages are now retrieved automatically, from different sources. Because of this some 150 site maps (so many projects, so many language editions of wikistats) look much better now (though they still need some polishing). And the human effort needed to add another translation has halved.