What Wikipedia readers devour

Recently I wrote a new tool for Wikipedia which makes good use of the consolidated page request files and the page category system, to rank thematic sets of Wikipedia pages by popularity. Tool and request files are both hobby projects.


Visually challenged Satyan master demonstrating reading Malayalam Wikipedia using free software (the E-speak screen reader). Author: Fotokannan. License: CC BY-SA 3.0

For selected languages you can browse the top 2500 or even top 10,000 most requested articles within a certain category or one of its subcategories. You can also browse the category hierarchy used to select these pages. Reports are grouped by language and month.

Sometimes entries in these lists seem oddly out of place. Any Wikipedia article can have tens of categories assigned to it. A popular article will rank high in any list where it’s featured, regardless of the category under review. Thus a well-known singer may be top ranking in a list about politicians, because he/she also played a minor or brief role in politics.
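To make this effect concrete, here is a minimal sketch of the ranking idea in Python. It is not the actual tool: the function names, the dictionaries and the toy numbers are all made up for illustration. It collects every page filed under a category or any of its subcategories, then sorts those pages by request count.

```python
from collections import deque


def pages_in_category_tree(root, subcategories, pages_by_category):
    """Collect every page filed under `root` or any of its subcategories."""
    seen = {root}
    queue = deque([root])
    pages = set()
    while queue:
        category = queue.popleft()
        pages.update(pages_by_category.get(category, ()))
        for child in subcategories.get(category, ()):
            if child not in seen:  # category graphs can contain cycles
                seen.add(child)
                queue.append(child)
    return pages


def rank_by_views(pages, views, top_n=2500):
    """Return up to `top_n` (title, views) pairs, most requested first."""
    ranked = sorted(pages, key=lambda title: (-views.get(title, 0), title))
    return [(title, views.get(title, 0)) for title in ranked[:top_n]]


# Hypothetical toy data: the singer is also filed under a politics subcategory.
subcategories = {"Politicians": ["Mayors"], "Mayors": ["Politicians"]}
pages_by_category = {
    "Politicians": ["Career Politician"],
    "Mayors": ["Famous Singer", "Local Mayor"],
    "Singers": ["Famous Singer"],
}
views = {"Famous Singer": 900000, "Career Politician": 40000, "Local Mayor": 1200}

ranking = rank_by_views(
    pages_in_category_tree("Politicians", subcategories, pages_by_category), views
)
print(ranking)
```

In this toy data the singer tops the politicians list simply because the ranking looks only at total requests, not at how central the category is to the article.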

For a selected set of categories these stats will be refreshed monthly. Some popular languages, like Russian and Japanese, will be added once Unicode support is complete.

Michael Hale published a video the same day I first tweeted about this; it demonstrates a related, more interactive tool. I highly recommend watching that demo as well. The two approaches are quite different, each with its own merits.

We could do much more with article request counts. For instance, we could weight the likelihood of a page popping up at Random article by its popularity. Purists may object, as the selection would no longer be truly random, but we could rename the button, e.g. to ‘Feeling lucky?’.
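A popularity-weighted pick could look roughly like the sketch below. The function name and the request counts are hypothetical, and this is only one way to do it: Python's standard `random.choices` draws an item with probability proportional to its weight, here the monthly request count.

```python
import random
from collections import Counter


def popularity_weighted_pick(views, rng=random):
    """Pick one title with probability proportional to its request count.

    Assumes at least one page has a positive count.
    """
    titles = list(views)
    weights = [views[title] for title in titles]
    return rng.choices(titles, weights=weights, k=1)[0]


# Hypothetical monthly request counts.
views = {"Popular page": 9000, "Average page": 900, "Obscure page": 100}

rng = random.Random(2013)  # seeded so the demo is repeatable
counts = Counter(popularity_weighted_pick(views, rng) for _ in range(10000))
print(counts.most_common())
```

Obscure pages still turn up, just less often. A purely uniform pick, as the current button does, would give every page the same chance.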

Examples

Politicians by country (and politics, and in some languages celebrities with party affiliations, as these are in the same category hierarchy): 
American (en), Brazilian (pt), British (en), Dutch (nl), French (fr), German (de), 
Indonesian (id), Italian (it), Polish (pl), Portuguese (pt), Spanish (es), Swedish (sv)

Museums by country: Australia (en), France (fr), Germany (de), India (en),
Indonesia (id), Mexico (es), Netherlands (nl), Poland (pl), Portugal (pt), Spain (es), 
Sweden (sv), UK (en), US (en)

Outreach: Bookshelf (en), Education (en),  GLAM (en)

Misc: GLAM on wp:en (en), Lists on wp:enMeta (en)

5 Responses to What Wikipedia readers devour

  1. Stefano Costa says:

    These stats are really interesting, thank you.

    I would be interested in the stats for museums in Italy but I don’t see them in your examples. Is the tool you developed available for others to use?

  2. Erik says:

    Hi Stefano, Museums and Lists for Italy have been added. There is a minor issue where category tree and page view report don’t link properly to each other, but they are both on this page. Links for building these reports yourself at bottom of each report. Erik

  3. Stefano Costa says:

    Thank you! I’m not sure I figured out exactly how to build the reports on my own (do I need a data dump to start with?) but I will try. It is really useful.

  4. Erik says:

    @Stefano, you need an archived monthly file with page views (monthly totals only) from http://dumps.wikimedia.org/other/pagecounts-ez/merged/ e.g. pagecounts-2013-03-views-ge-5_totals.bz2

  5. Stefano Costa says:

    Thanks Erik,
    I figured that out and I started hacking right away, notwithstanding the huge files :)

    Some of the scripts were a bit difficult to start with. I am not sure this is really interesting for you but see this commit for some tweaks I did to make the whole “dammit.lt” thing more friendly to start from scratch. I understand that your work is primarily WMF-driven but it would be cool if this tool was available on its own and easier to use.
