Last week, I wrote a very technical post on how I model the distribution of comment counts for an RSS feed in FeedHub. I originally drafted the post to document the derivation. It was non-trivial enough that I think it is worth sharing with others, but it started with the [...]
As part of my work for FeedHub, I found the need to model the distribution of comment counts for blog posts in RSS feeds. In particular, I want to normalize the number of comments an item receives to a score ranging from 0 to 1. It turns out that the negative binomial distribution is a [...]
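One way to picture the normalization is to score a comment count by the cumulative probability of seeing that many comments or fewer under a fitted negative binomial. This is only a minimal sketch under that assumption; the excerpt does not say how the score is actually derived. The function names and the parameters `r` and `p` are hypothetical and are taken as given here rather than estimated from feed data.

```python
import math


def neg_binomial_pmf(k, r, p):
    """P(X = k) for a negative binomial with shape r > 0 and 0 < p < 1.

    lgamma is used so that r does not have to be an integer.
    """
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def comment_score(count, r, p):
    """Map a comment count to [0, 1] via the CDF (assumed scoring rule)."""
    total = sum(neg_binomial_pmf(k, r, p) for k in range(count + 1))
    return min(1.0, total)  # guard against floating-point overshoot


# Hypothetical parameters, chosen only for illustration.
for n in (0, 1, 5, 20):
    print(n, round(comment_score(n, r=1.0, p=0.5), 4))
```

Under this reading, a post with an unusually high comment count for its feed scores close to 1, and a post with no comments scores the probability mass at zero.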
I have thought many times over the last few years about evaluation of information retrieval systems. Karen Spärck Jones stated my thoughts quite eloquently. Unfortunately, I wasn’t able to dig up the exact quote, but it was something along the lines of “statistical significance is not enough; you must also have practical significance.” To measure [...]
Panos Ipeirotis recently posted some interesting thoughts about statistical significance tests for comparing systems when incrementally developing techniques to solve a problem. While I don’t have the answer to his question, I did notice that he mentioned the Bonferroni method for correcting for multiple testing. The need for multiple test correction arises when [...]
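For readers who have not seen it, the Bonferroni method compares each p-value against the significance level divided by the number of tests. A minimal sketch follows; the function name and the example p-values are made up for illustration and do not come from either post.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if the null is rejected.

    Each p-value is compared against alpha / m, where m is the
    number of tests, which bounds the family-wise error rate by alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Illustrative p-values only: three comparisons at alpha = 0.05,
# so each one is tested against 0.05 / 3.
print(bonferroni_reject([0.01, 0.04, 0.001]))
```

The correction is simple but conservative: as the number of comparisons grows, the per-test threshold shrinks, so real differences become harder to detect.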