Growth in article count at largest 20 Wikipedias

Article growth rates vary a lot among mature Wikipedias. Growth slows down at some, and is steady or even accelerates at others. Many have tried to model these trends. I have little to offer by way of explanation, but I can offer an at-a-glance overview of article growth trends for the top wikis. You can switch between small and large charts, and either look at growth trends alone or match them visually with overall editor activity per wiki.



These charts have existed for a while; via the wikistats portal you can find similar charts for all other wikis. Navigate to the sitemap pages, e.g. for Wikipedia, and click the Summary link for any wiki. At the bottom of each sitemap there are more links to grouped summary pages per project.

New: trends are now broken down by type of editor: registered editors, anonymous editors, bots.

Some observations

Manual article growth on the English Wikipedia has been slowing down since 2007, but seems to have stabilized in the last two years.

Article growth on the German, French, Italian, and Polish Wikipedias has been pretty stable for many years.

Both observations seem relevant to, and somewhat at odds with, the low-hanging fruit hypothesis, as all of these wikis can be considered fully mature Wikipedias. More about this hypothesis here and here.

Unrecognized bots?

Spikes in editor activity are usually caused by bots. A few charts show spikes in the article creation rate for registered users. My hunch is that these are anomalies, caused by bots that are not recognized by name (roughly meaning their name does not contain ‘bot’) and are not registered as a bot either (which I believe is mandatory on many wikis).
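
To make the heuristic above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the actual Wikistats code: the function name `classify_editor`, the `registered_bots` set, and the sample user names are all hypothetical.

```python
import re

# Assumption: a name "looks like" a bot when it contains "bot",
# case-insensitively. The real Wikistats rule may differ.
BOT_NAME_PATTERN = re.compile(r"bot", re.IGNORECASE)


def classify_editor(name, registered_bots, is_anonymous=False):
    """Return 'anonymous', 'bot' or 'registered' for one editor.

    name            -- user name (or IP address for anonymous edits)
    registered_bots -- set of user names carrying the wiki's bot flag
    is_anonymous    -- True when the edit was made without an account
    """
    if is_anonymous:
        return "anonymous"
    # A bot is recognized either by registration or by its name.
    if name in registered_bots or BOT_NAME_PATTERN.search(name):
        return "bot"
    # Everything else is counted as a human registered editor. An
    # unregistered bot without 'bot' in its name ends up here, which
    # is the suspected source of the spikes described above.
    return "registered"


# Hypothetical example: a mass-creation account that is neither
# flagged nor named like a bot is counted as a registered user.
print(classify_editor("MassCreator", registered_bots=set()))
```

Note that a pure substring match also produces false positives (a human editor called "Abbott" would be counted as a bot), so a name test like this can only ever be a rough approximation.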

Any feedback on bots that fall in this category is very welcome. If some of these bots are registered after all, next month's charts will reflect this for the entire history. Likely candidates for misattribution are the Spanish spike in 2011, the Chinese in 2012, the Vietnamese in 2011/2012, the Norwegian in 2008, and the Czech in 2010.

Portuguese Wikipedia

One particular issue prompted this overview, so let me ask: growth in new articles on the Portuguese Wikipedia dropped significantly early in 2011 (it seems to have picked up again recently). The number of active editors did not change much in recent years. Any thoughts on this in general? Also, it seems the counting methodology changed (not on wikistats), or at least was questioned, in March, according to this discussion (Google translates anexos as attachments (?)).

Thanks in advance for any insights into these trends.

Update: for further analysis you can download the data files (csv).

 

This entry was posted in Wikimedia Edit(or)s.

11 Responses to Growth in article count at largest 20 Wikipedias

  1. Andre Engels says:

    ‘Anexos’ is what in Dutch we would call ‘appendix’. In Portuguese and Spanish Wikipedia this is a separate namespace containing lists. Because it’s a separate namespace, not the main namespace, those lists are not counted in the number of articles, but they were before the separate namespace came into existence – and still are in other languages.

  2. Nemo says:

    Going by what Andre Engels says, they should be content namespaces. I’m going to file a bug immediately.

  3. Erik says:

    Thx Andre/Nemo. And which namespace is this? (Wikistats does not yet download list of countable namespaces via api.php, needs manual script update).

  4. Andre Engels says:

    The ‘Annex’ namespace is namespace 102 in pt: and hr:, namespace 104 in ar:, es:, fr: and lt:. Another custom namespace that I think should better be counted as content, is namespace 104 in als:, which is the former als: dictionary which has been merged with Wikipedia.

  5. Erik says:

    Comment I received via mail:

    For the ruWP, you may need to take into account the activity of the authors in the Incubator (ns = 102). Each month, about 400-500 new authors create and modify their first articles in the ruWP-Incubator.

    http://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%98%D0%BD%D0%BA%D1%83%D0%B1%D0%B0%D1%82%D0%BE%D1%80

    http://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%98%D0%BD%D0%BA%D1%83%D0%B1%D0%B0%D1%82%D0%BE%D1%80/en

  6. Erik says:

    Comment I received via mail:

    What the community mentioned about the “attachments” is that, since May, [..], it’s been decided to include attachments (I don’t know which better translation would be used for that – definition is here http://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Anexo and there’s no equivalent in English). The change is documented here: http://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Esplanada/propostas/%22Arrumar_a_casa%22_no_dom%C3%ADnio_anexo_%2819set2011%29#Enquete, https://bugzilla.wikimedia.org/show_bug.cgi?id=36359 and here http://www.mediawiki.org/wiki/Manual:UpdateArticleCount.php

  7. Erik says:

    Comment via mail:

    The first time we tested the Incubator, we worked in the “Википедия:” space (ns = 4). That was in 2010-2011.

    Then a separate space was created for stubs, and their creation was moved there.
    http://ru.wikipedia.org/w/index.php?diff=38255895&oldid=37761444
    http://ru.wikipedia.org/w/index.php?diff=40084273&oldid=36637785

    Now we are working only in the new ns (102).

  8. Jessie says:

    Thanks for this post; really interesting! So, just to make sure I’m reading these comments correctly: the “annexo” is a separate namespace that was created, but in fact SHOULD be counted in the “mainspace” calculations?

  9. Nemo says:

    Jessie, yes, but it’s a bit unclear what some of those namespaces actually are. If they’ve been moved, it surely means that some editors disliked those lists and wanted them to be somehow “second class”; this is what happened on it.wiki, where raw lists (and navigational templates) have mostly been deleted in favour of categories (which didn’t exist a long time ago), something which of course doesn’t happen if they’re confined to their own namespace.

    The bug I filed is https://bugzilla.wikimedia.org/39866 and has some more discussion.
    For instance:
    * the French Wikipedia uses it for references/notes which are usually part of the articles themselves (or even templates);
    * users’ sandboxes/incubators (as in ru.wiki) are by definition not (yet) content of the wiki, important as they are; in any case their activity will be retroactively included by WikiStats as soon as they are moved to a “public” namespace;
    * es.wiki has 33 k anexos vs. 909 k articles;
    * pt.wiki 15 k vs. 734 k;
    * hr.wiki 4.5 k vs. 115 k (mostly wikispecies-like lists?);
    * ar.wiki 8.3 k vs. 338 k, but they’re mostly dates (like ) so that both 100 and 104 seem equivalent to the Portal namespace elsewhere (not content).

  10. Waldir says:

    Jessie: Exactly. It’s roughly equivalent to having a “List:” namespace in the English Wikipedia. Just a slight correction: “Mainspace” is probably not the best term, since namespace 0 (the unprefixed article/page namespace) is called the “main namespace” [1]. The correct term is probably something like “content namespaces” [2].

    1. http://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces
    2. http://www.mediawiki.org/wiki/Manual:Using_custom_namespaces#Content_namespaces

  11. Erik says:

    See also http://en.wikipedia.org/wiki/Wikipedia:Main_namespace

    BTW that article lists disambiguation as an exclusion criterion. Wikistats doesn’t do that; it excludes redirects, and on full archive dumps (which contain article content) also articles without an internal link or category, and large lists.
