Category Archives: Statistics

Why we model comment counts

Last week, I wrote a very technical post on how I model the distribution of comment counts for an RSS feed in FeedHub.  I originally drafted the post to document its derivation.  It was non-trivial enough that I feel that there is some merit in sharing this information with others, but it started with the [...]

Modeling blog post comment counts

As part of my work for FeedHub, I found the need to model the distribution of comment counts for blog posts in RSS feeds.  In particular, I want to normalize the number of comments an item receives to a score ranging from 0 to 1.  It turns out that the negative binomial distribution is a [...]

Useful evaluations

I have thought many times over the last few years about evaluation of information retrieval systems.  Karen Spärck Jones stated my thoughts quite eloquently.  Unfortunately, I wasn’t able to dig up the exact quote, but it was something along the lines of “statistical significance is not enough; you must also have practical significance.” To measure [...]

Multiple significance testing

Panos Ipeirotis recently posted some interesting thoughts about statistical significance tests for comparing systems when incrementally developing techniques to solve a problem.  While I don’t have the answer to his question, I did notice that he mentioned the Bonferroni method for correcting for multiple testing.  The need for multiple test correction arises when [...]