Wikimedia page views, some good and bad news

First the good news: there is a new summary report for Wikimedia page views that presents trends for nearly all projects on a single page.

Now the bad news: a few days ago it was established that the server that collects and aggregates log data from all squids could not keep up with the incoming messages, and hence underreported page views. When I suggested that recent page view trends looked very suspicious, Tim Starling and Mark Bergsma quickly analyzed the cause and fixed the server overload. Kudos to them. For April – July 2010 I could still infer the amount of underreporting from available log files, and counts for these months have been corrected. For earlier months, possibly from Nov 2009 till March 2010, counts are still too low. For details on the error correction see this pdf.
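To make the arithmetic of such a correction concrete, here is a minimal sketch. It assumes that the fraction of lost log messages for a month is already known and that the loss is uniform across that month. The function name and the numbers are hypothetical and for illustration only; the method actually used is the one described in the pdf.

```python
def correct_count(observed: int, loss_fraction: float) -> int:
    """Scale an underreported count back up.

    Assumes a uniform loss: if a fraction `loss_fraction` of all
    messages was dropped, the observed count is
    true_count * (1 - loss_fraction).
    """
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    return round(observed / (1.0 - loss_fraction))


# Hypothetical month: 750 page views logged, 25% of messages lost.
corrected = correct_count(750, 0.25)
print(corrected)
```

With no loss (a fraction of 0.0) the count is returned unchanged, which is the behaviour one would want for months that were not affected.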

Reports affected: all wikistats reports that are based on the dammit.lt hourly log files are affected, notably page view reports and server request reports. The same goes for the monthly Report Card. Earlier editions of the monthly server request reports have not yet been corrected the way the page view trend reports have (maybe just a notice will be added). Of course, even though absolute numbers are too low, comparisons are not affected (e.g. market share per browser or OS). Other sites that build on these log data will also be affected, notably stats.grok.se, trendingtopics, amaglamate.
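Why comparisons survive the underreporting can be shown with a small sketch. It assumes that messages are dropped at the same rate for every category, so each count shrinks by the same factor. The category names and counts below are made up for illustration.

```python
import math


def shares(counts):
    """Return each category's share of the total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


# Hypothetical true request counts per browser.
true_counts = {"browser_a": 600, "browser_b": 300, "browser_c": 100}

# Assumption: the overloaded server kept the same fraction of
# messages for every category.
kept_fraction = 0.75
observed_counts = {name: value * kept_fraction
                   for name, value in true_counts.items()}

true_shares = shares(true_counts)
observed_shares = shares(observed_counts)

# Absolute numbers are too low, but the shares match.
for name in true_counts:
    assert math.isclose(true_shares[name], observed_shares[name])
```

The common factor cancels out when each count is divided by the total, which is why market share figures remain valid even though the totals are too low.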


5 Responses to Wikimedia page views, some good and bad news

  1. bawolff says:

    It’d be interesting to have some stats for Wikinews as a whole without sr included. SR has an insane amount of bot edits, which throws off the stats a bit. (Not to mention many people feel that such bot editing defeats the point, but that’s a different issue.)

  2. Erik says:

    Yes, Serbian Wikinews dominates bot stats on http://stats.wikimedia.org/wikinews/EN/PlotEditsZZ.png . It is very clear when a bot started to add hourly weather data to hundreds of pages. Fortunately bot edits on other Wikinews projects are few; as you say, Wikinews is about manually added content.

  3. Is underreporting proportional? That is, will it affect the order of results, or just the quantity of hits?

  4. Erik says:

    Underreporting is proportional indeed. So market share per browser or per operating system, and traffic share to different Wikimedia projects or languages, are not affected.

  5. Pingback: Infodisiac » Page views anomaly in October resolved
