Wikistats is back

A week ago I published new wikistats files, for the first time in 7 months, only to retract them 2 days later, when it turned out that counts for some wikis were completely wrong. After some serious bug hunting I nailed the creepy creature, which had been hiding in an unexpected corner (most bugs find refuge there, and yet it is still the last place to look).

New files have been generated and uploaded for Wikibooks, Wiktionary, Wikinews, Wikiquote, Wikisource, Wikiversity and Wikispecial. The wikistats job for Wikipedia has another 10-12 days to go, but intermediate results have been published, and I will refresh these every one or two days until new counts for all Wikipedias are online (as always, with English as the exception).

What went wrong

Here is a short explanation of what went wrong: the wikistats job parses all language-specific message files (PHP code) to harvest localized versions of certain keywords, like #Redirect, Image, User. Only by scanning both for the original English keyword #Redirect and for the localized version (e.g. Swedish #Omdirigering) can it determine whether an article is to be counted as a proper article.
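The redirect test described above can be sketched roughly as follows. This is only an illustration in Python, not the actual wikistats code: the table LOCALIZED_REDIRECT and the function names are hypothetical, the table is hard-coded here instead of being harvested from the message files, and the sketch assumes the keyword appears at the start of the page text.

```python
import re

# Hypothetical table of localized redirect keywords. The real job harvests
# these from the language-specific message files; here it is hard-coded.
LOCALIZED_REDIRECT = {"sv": "#Omdirigering"}

def redirect_pattern(lang):
    """Build a pattern matching the English keyword or the localized one."""
    keywords = ["#Redirect"]
    localized = LOCALIZED_REDIRECT.get(lang)
    if localized:
        keywords.append(localized)
    alternation = "|".join(re.escape(k) for k in keywords)
    # Assumption: a redirect page starts with the keyword (case-insensitive).
    return re.compile(r"^\s*(?:" + alternation + r")\b", re.IGNORECASE)

def is_countable_article(wikitext, lang):
    """Return True when the page is not a redirect in either spelling."""
    return redirect_pattern(lang).match(wikitext) is None
```

With this sketch a Swedish page starting with #Omdirigering is only recognised as a redirect when the Swedish keyword is available, which is why a failed keyword lookup distorts the counts.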

These language files had been moved and partially restructured earlier this year. I had updated my code, but for some wikis the new code failed to locate the proper keywords, and instead returned the value 0 (not as a status code, just as data). So for those wikis, articles that contained ‘#Redirect’ or ‘0’ were not counted, easily skipping half of the content. Something similar had occurred for image link counts.
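To make the failure mode concrete, here is a minimal Python sketch of how a lookup result of 0, used as if it were a keyword, ends up matching ordinary article text. It is not the original code; both function names are hypothetical, and the guard shown (accept only strings that start with ‘#’) is just one possible validation, assumed for illustration.

```python
def is_redirect(wikitext, localized_keyword):
    """Buggy check: trusts whatever the keyword lookup returned."""
    # If the lookup failed and returned 0, "0" silently becomes a keyword.
    keywords = ["#Redirect", str(localized_keyword)]
    lowered = wikitext.lower()
    return any(k.lower() in lowered for k in keywords)

def is_redirect_guarded(wikitext, localized_keyword):
    """Same check, but ignores a lookup result that is not a plausible keyword."""
    keywords = ["#Redirect"]
    # Assumed validation: a redirect keyword is a string starting with '#'.
    if isinstance(localized_keyword, str) and localized_keyword.startswith("#"):
        keywords.append(localized_keyword)
    lowered = wikitext.lower()
    return any(k.lower() in lowered for k in keywords)
```

With the buggy version, any article whose text contains the digit 0 (a year, a population figure) is treated as a redirect and dropped from the count.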

I disabled the original blog post until the counts for all Wikipedias are up to date. Only then can I revise the conclusions and screenshots used in the post.

Update

21 Jan 2009: I posted a separate updated version of this crippled post.


9 Responses to Wikistats is back

  1. Siebrand says:

    Excellent report, Erik, as we always see from you. Small typo in the “**” comment. In “in July 2009 wikistats can only report new figures up to September 2008” you mean “September 2009”.

  2. Erik says:

    Thanks for your feedback Siebrand. Actually I do mean 2008. The point is: I can’t produce counts for months that are only partially covered by the dump, and for some articles the English dump does not reach beyond September 2008. Hopefully before next summer we will have a reworked and speedier dump job, fingers crossed.

  3. Darkoneko says:

    Something scary: I searched for a link to the stats in your post, but didn’t find any 🙁
    Had to google it.

  4. Erik says:

    Thx. I added a link to the stats in the blog. And here again: http://stats.wikimedia.org

  5. Sage says:

    Erik, thanks for the great work! One question that I hope you might try to answer sometime is, what does the “lifespan” of a typical contributor look like (broken down, say, by percentiles of total activity level)?

    I think your interpretation of the stats is pretty well dead on: awareness saturation and usability are two of the biggest checks on growth (along with, on the bigger projects, the difficulty of finding somewhere to start that isn’t already well-developed). For English WP, I blogged a bit about it:
    http://ragesossscholar.blogspot.com/2008/12/wikipedia-blogging-outside-wiki-planet.html

    You may have seen the work Robert Rohde has done for English Wikipedia to bring a few key statistics more up to date:
    http://en.wikipedia.org/wiki/User:Ragesoss/Editing_frequency_stats

    It’s interesting that new contributors globally peaked right around the time active contributors peaked on English Wikipedia. I wonder if new contributors had already peaked around mid-2006 for English, and if we could similarly predict a global peak in active users based on when the new contributors peaked.

  6. moralist says:

    Hi, I’m a bit curious about the official article count. According to your statistics, the Swedish Wikipedia has 74k official articles, but the Swedish Wikipedia recently reached 300k articles. And the French Wikipedia has only 129k articles according to the official article count, instead of about 700k.

    And according to it svwp has -9 new articles per day, which also seems a bit strange.

  7. Jan Ainali says:

    There seems to be a problem with the official article count. According to it svwp has 74k articles while sv:Special:Statistics says 300k. Are they measured differently? Even more curious is that the table states that sv has been losing articles every day since March, while sv:Wikipedia:Statistik/Statistik (which is a dump from Special:Statistics) suggests the opposite. That page averages around +100 new articles every day.

  8. Grillo says:

    Why have all stats between Sep 2006 and May 2008 vanished…?

  9. Pingback: Infodisiac » Wikistats is back again
