<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Information Retrieval on the Live Web &#187; negative binomial distribution</title>
	<atom:link href="http://livewebir.com/blog/tag/negative-binomial-distribution/feed/" rel="self" type="application/rss+xml" />
	<link>http://livewebir.com/blog</link>
	<description>by Paul Ogilvie</description>
	<lastBuildDate>Mon, 08 Mar 2010 17:20:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Modeling blog post comment counts</title>
		<link>http://livewebir.com/blog/2008/07/modeling-blog-post-comment-counts/</link>
		<comments>http://livewebir.com/blog/2008/07/modeling-blog-post-comment-counts/#comments</comments>
		<pubDate>Tue, 01 Jul 2008 20:51:46 +0000</pubDate>
		<dc:creator>pogil</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[comment counts]]></category>
		<category><![CDATA[expectation maximization]]></category>
		<category><![CDATA[maximum a posterori]]></category>
		<category><![CDATA[negative binomial distribution]]></category>

		<guid isPermaLink="false">http://pogil.wordpress.com/?p=8</guid>
		<description><![CDATA[As part of my work for FeedHub, I found the need to model the distribution of comment counts for blog posts in RSS feeds.  In particular, I want to normalize the number of comments an item receives to a score ranging from 0 to 1.  It turns out that the  negative binomial distribution is a [...]]]></description>
			<content:encoded><![CDATA[<p>As part of my work for <a title="FeedHub" href="http://www.feedhub.com">FeedHub</a>, I found the need to model the distribution of comment counts for blog posts in RSS feeds.  In particular, I want to normalize the number of comments an item receives to a score ranging from 0 to 1.  It turns out that the <a title="negative binomial distribution" href="http://en.wikipedia.org/wiki/Negative_binomial_distribution">negative binomial distribution</a> is a good fit for comment counts. The negative binomial distribution is a discrete distribution with a probability mass function of</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=f%28x%3Bp%2Cr%29%20%3D%20%5Cfrac%7B%5CGamma%28x%20%2B%20r%29%7D%7B%5CGamma%28r%29%20x%21%7D%20p%5Er%20%281-p%29%5Ex&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(x;p,r) = \frac{\Gamma(x + r)}{\Gamma(r) x!} p^r (1-p)^x' title='f(x;p,r) = \frac{\Gamma(x + r)}{\Gamma(r) x!} p^r (1-p)^x' class='latex' /></p>
<p style="text-align:left;">where <img src='http://s.wordpress.com/latex.php?latex=r%20%3E%200&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r &gt; 0' title='r &gt; 0' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=0%20%5Cleq%20p%20%5Cleq%201.&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='0 \leq p \leq 1.' title='0 \leq p \leq 1.' class='latex' /></p>
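<p>As a rough sketch of the formula above (not the code FeedHub runs), the probability can be evaluated in log space with the standard library's <code>math.lgamma</code>. The function names are illustrative, and using the cumulative probability as the 0-to-1 score is an assumption; the post only says that comment counts are normalized to that range. The sketch also assumes p lies strictly between 0 and 1 so that both logarithms are finite.</p>

```python
import math


def nb_log_pmf(x, p, r):
    """Log of f(x; p, r) = Gamma(x + r) / (Gamma(r) x!) * p^r * (1 - p)^x."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(p) + x * math.log(1.0 - p))


def nb_pmf(x, p, r):
    """Probability of observing exactly x comments."""
    return math.exp(nb_log_pmf(x, p, r))


def nb_cdf(x, p, r):
    """Probability of observing at most x comments (term-by-term sum).

    Assumption: this cumulative probability is one plausible 0-to-1 score.
    """
    return sum(nb_pmf(k, p, r) for k in range(x + 1))
```

<p>Working with <code>lgamma</code> rather than the gamma function itself keeps the computation stable for posts with hundreds of comments.</p>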
<p style="text-align:left;">However, in many cases we have small sample sizes and I felt the natural urge to specify prior distributions over the negative binomial&#8217;s parameters. The <a title="beta distribution" href="http://en.wikipedia.org/wiki/Beta_distribution">beta distribution</a> is conjugate for <img src='http://s.wordpress.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p' title='p' class='latex' /> when <img src='http://s.wordpress.com/latex.php?latex=r&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r' title='r' class='latex' /> is known, but for our data <img src='http://s.wordpress.com/latex.php?latex=r&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r' title='r' class='latex' /> is not fixed.</p>
<p style="text-align:left;">This left me a little stuck.  What I really want is a closed-form conjugate prior, efficient to compute, for the case where both parameters of the negative binomial distribution are unknown.  What I found was <a title="Conjugate Bayesian Analysis of the Negative Binomial Distribution" href="http://www.soa.org/library/research/actuarial-research-clearing-house/1990-99/1993/arch-1/arch93v112.pdf">Morgan and Hickman</a>, which requires a large sample to be accurate (which kind of defeats the point).  I also found <a title="Bayesian Inference for the Negative Binomial Distribution via Polynomial Expansions" href="http://www.ingentaconnect.com/content/asa/jcgs/2002/00000011/00000001/art00009">Bradlow, Hardie, and Fader</a>, which has a &#8220;closed-form&#8221; solution that requires a 300-term expansion to accurately estimate the posterior.  While noticeably faster than <a title="Markov chain Monte Carlo" href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov chain Monte Carlo</a> methods, it is still more heavyweight than I want.</p>
<p style="text-align:left;">I felt defeated until I realized that if I accept inference using <a title="maximum a posteriori" href="http://en.wikipedia.org/wiki/Maximum_a_posteriori">maximum a posteriori</a> estimates of <img src='http://s.wordpress.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p' title='p' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=r&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r' title='r' class='latex' /> as a good enough alternative to full-blown Bayesian inference, things become much simpler.  I need only periodically recompute <img src='http://s.wordpress.com/latex.php?latex=%5Chat%7Bp%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\hat{p}' title='\hat{p}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=%5Chat%7Br%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\hat{r}' title='\hat{r}' class='latex' /> and perform inference directly on the negative binomial distribution using those estimates.  I also get an additional benefit from being willing to use maximum a posteriori estimates; I can use <a title="expectation-maximization" href="http://en.wikipedia.org/wiki/Expectation-maximization_algorithm">expectation-maximization</a> to estimate <img src='http://s.wordpress.com/latex.php?latex=%5Chat%7Bp%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\hat{p}' title='\hat{p}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=%5Chat%7Br%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\hat{r}' title='\hat{r}' class='latex' />.</p>
<p style="text-align:left;">This task is quite easy when estimating the parameters of a negative binomial.  Given observations <img src='http://s.wordpress.com/latex.php?latex=x%5En%2C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x^n,' title='x^n,' class='latex' /> estimates <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%5Bt%5D%7D%2C%20r%5E%7B%5Bt%5D%7D%2C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{[t]}, r^{[t]},' title='p^{[t]}, r^{[t]},' class='latex' /> and priors over <img src='http://s.wordpress.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p' title='p' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=r%2C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r,' title='r,' class='latex' /> we compute new maximum a posteriori estimates <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%5Bt%2B1%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{[t+1]}' title='p^{[t+1]}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=r%5E%7B%5Bt%2B1%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r^{[t+1]}' title='r^{[t+1]}' class='latex' />.  Wash, rinse, repeat until convergence.  This iterative procedure means that when estimating <img src='http://s.wordpress.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p' title='p' class='latex' />, we can assume a constant <img src='http://s.wordpress.com/latex.php?latex=r&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r' title='r' class='latex' /> (and vice versa).</p>
<p style="text-align:left;">To estimate <img src='http://s.wordpress.com/latex.php?latex=p%7Cx%5En&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p|x^n' title='p|x^n' class='latex' />, write</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=f%28p%7Cx%5En%29%20%5Cpropto%20f%28p%29L_n%28r%2Cp%29%5C%2C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(p|x^n) \propto f(p)L_n(r,p)\,' title='f(p|x^n) \propto f(p)L_n(r,p)\,' class='latex' /></p>
<p style="text-align:left;">where</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=%5Cbegin%7Barray%7D%7Brl%7D%20L_n%28r%2Cp%29%20%26%20%3D%20%5Cprod_%7Bi%3D1%7D%5En%20f%28x_i%3Bp%2Cr%29%20%5C%5C%20%5C%5C%20%26%20%3D%20p%5E%7Brn%7D%281-p%29%5E%7B%5Csum_%7Bi%3D1%7D%5En%20x_i%7D%20%5Cprod_%7Bi%3D1%7D%5En%20%5Cfrac%7B%5CGamma%28x_i%20%2B%20r%29%7D%7B%5CGamma%28r%29x_i%21%7D%20.%5Cend%7Barray%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\begin{array}{rl} L_n(r,p) &amp; = \prod_{i=1}^n f(x_i;p,r) \\ \\ &amp; = p^{rn}(1-p)^{\sum_{i=1}^n x_i} \prod_{i=1}^n \frac{\Gamma(x_i + r)}{\Gamma(r)x_i!} .\end{array}' title='\begin{array}{rl} L_n(r,p) &amp; = \prod_{i=1}^n f(x_i;p,r) \\ \\ &amp; = p^{rn}(1-p)^{\sum_{i=1}^n x_i} \prod_{i=1}^n \frac{\Gamma(x_i + r)}{\Gamma(r)x_i!} .\end{array}' class='latex' /></p>
<p style="text-align:left;">If <img src='http://s.wordpress.com/latex.php?latex=p%20%5Csim%20Beta%28%5Calpha%2C%20%5Cbeta%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p \sim Beta(\alpha, \beta)' title='p \sim Beta(\alpha, \beta)' class='latex' /> then</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=%5Cbegin%7Barray%7D%7Brl%7D%20f%28p%7Cx%5En%29%20%26%20%5Cpropto%20p%5E%7B%5Calpha-1%7D%20%281-p%29%5E%7B%5Cbeta-1%7D%20L_n%28r%2Cp%29%20%5C%5C%20%5C%5C%20%26%20%5Cpropto%20p%5E%7B%5Calpha%2Brn%20-%201%7D%281-p%29%5E%7B%5Cbeta%20%2B%20%5Csum_%7Bi%3D1%7D%5En%20x_i%20-%201%7D%2C%5Cend%7Barray%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\begin{array}{rl} f(p|x^n) &amp; \propto p^{\alpha-1} (1-p)^{\beta-1} L_n(r,p) \\ \\ &amp; \propto p^{\alpha+rn - 1}(1-p)^{\beta + \sum_{i=1}^n x_i - 1},\end{array}' title='\begin{array}{rl} f(p|x^n) &amp; \propto p^{\alpha-1} (1-p)^{\beta-1} L_n(r,p) \\ \\ &amp; \propto p^{\alpha+rn - 1}(1-p)^{\beta + \sum_{i=1}^n x_i - 1},\end{array}' class='latex' /></p>
<p style="text-align:left;">which indicates <img src='http://s.wordpress.com/latex.php?latex=p%7Cx%5En%20%5Csim%20Beta%28%5Calpha%20%2B%20rn%2C%20%5Cbeta%20%2B%20%5Csum_%7Bi%3D1%7D%5En%20x_i%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p|x^n \sim Beta(\alpha + rn, \beta + \sum_{i=1}^n x_i)' title='p|x^n \sim Beta(\alpha + rn, \beta + \sum_{i=1}^n x_i)' class='latex' />.  To estimate <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%5Bt%2B1%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{[t+1]}' title='p^{[t+1]}' class='latex' />, we use the posterior mode of <img src='http://s.wordpress.com/latex.php?latex=p%7Cx%5En&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p|x^n' title='p|x^n' class='latex' />:</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%5Bt%2B1%5D%7D%20%3D%20%5Cfrac%7B%5Calpha%20%2B%20r%5E%7B%5Bt%5D%7Dn%20-%201%7D%7B%5Calpha%20%2B%20%5Cbeta%20%2B%20r%5E%7B%5Bt%5D%7Dn%20%2B%20%5Csum_%7Bi%3D1%7D%5En%20x_i%20-%202%7D.&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{[t+1]} = \frac{\alpha + r^{[t]}n - 1}{\alpha + \beta + r^{[t]}n + \sum_{i=1}^n x_i - 2}.' title='p^{[t+1]} = \frac{\alpha + r^{[t]}n - 1}{\alpha + \beta + r^{[t]}n + \sum_{i=1}^n x_i - 2}.' class='latex' /></p>
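<p>The update for p is a one-liner once the posterior Beta parameters are written down. Here is a minimal sketch, assuming the observations arrive as a list of non-negative integers; the function name and the explicit guard are illustrative additions. The guard reflects the fact that the mode formula only holds when both posterior Beta parameters exceed 1.</p>

```python
def update_p(xs, r, alpha, beta):
    """Posterior mode of Beta(alpha + r*n, beta + sum(xs)) with r held fixed."""
    n = len(xs)
    a_post = alpha + r * n      # first Beta parameter of the posterior
    b_post = beta + sum(xs)     # second Beta parameter of the posterior
    if a_post <= 1.0 or b_post <= 1.0:
        raise ValueError("the mode formula needs both posterior parameters above 1")
    return (a_post - 1.0) / (a_post + b_post - 2.0)
```

<p>For example, with observations 0, 1, 2, 3, r = 2 and a Beta(2, 2) prior, the posterior is Beta(10, 8) and the update returns 9/16.</p>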
<p style="text-align:left;">That was the easy part.  Now for the harder part.</p>
<p style="text-align:left;">For my purposes, the beta prime distribution is a good fit for <img src='http://s.wordpress.com/latex.php?latex=r&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r' title='r' class='latex' />.  The <a title="beta prime distribution" href="http://en.wikipedia.org/wiki/Beta_prime_distribution">beta prime distribution</a> has a similar shape to the more familiar gamma distribution.  If <img src='http://s.wordpress.com/latex.php?latex=X%20%5Csim%20Beta%28a%2C%20b%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X \sim Beta(a, b)' title='X \sim Beta(a, b)' class='latex' /> then <img src='http://s.wordpress.com/latex.php?latex=Y%20%3D%20%5Cfrac%7BX%7D%7B1%20-%20X%7D%20%5Csim%20Beta%5E%5Cprime%28a%2C%20b%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y = \frac{X}{1 - X} \sim Beta^\prime(a, b)' title='Y = \frac{X}{1 - X} \sim Beta^\prime(a, b)' class='latex' />.  For the beta prime distribution,</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=f%28r%3Ba%2Cb%29%20%3D%20%5Cfrac%7B%5CGamma%28a%20%2B%20b%29%7D%7B%5CGamma%28a%29%5CGamma%28b%29%7D%20r%5E%7Ba-1%7D%20%281%2Br%29%5E%7B-a-b%7D%20.&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(r;a,b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} r^{a-1} (1+r)^{-a-b} .' title='f(r;a,b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} r^{a-1} (1+r)^{-a-b} .' class='latex' /></p>
<p style="text-align:left;">Our posterior <img src='http://s.wordpress.com/latex.php?latex=r%20%7Cx%5En&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r |x^n' title='r |x^n' class='latex' /> is distributed</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=f%28r%7Cx%5En%29%3Df%28r%3Ba%2Cb%29L_n%28r%2Cp%29%5Cleft%2F%5Cint%20f%28r%3Ba%2Cb%29L_n%28r%2Cp%29dr%20%5Cright.%20%2C%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(r|x^n)=f(r;a,b)L_n(r,p)\left/\int f(r;a,b)L_n(r,p)dr \right. , ' title='f(r|x^n)=f(r;a,b)L_n(r,p)\left/\int f(r;a,b)L_n(r,p)dr \right. , ' class='latex' /></p>
<p style="text-align:left;">where</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=f%28r%3Ba%2Cb%29L_n%28r%2Cp%29%5Cpropto%20r%5E%7Ba-1%7D%281%2Br%29%5E%7B-a-b%7Dp%5E%7Brn%7D%5Cprod_%7Bi%3D1%7D%5En%5CGamma%28r%2Bx_i%29%2F%5CGamma%28r%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(r;a,b)L_n(r,p)\propto r^{a-1}(1+r)^{-a-b}p^{rn}\prod_{i=1}^n\Gamma(r+x_i)/\Gamma(r)' title='f(r;a,b)L_n(r,p)\propto r^{a-1}(1+r)^{-a-b}p^{rn}\prod_{i=1}^n\Gamma(r+x_i)/\Gamma(r)' class='latex' /></p>
<p style="text-align:left;">Following Bradlow, Hardie, and Fader, we note that the ratio of the two Gamma functions can be computed exactly:</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=%5Cprod_%7Bi%3D1%7D%5En%20%5CGamma%28r%2Bx_i%29%20%5Cleft%2F%5CGamma%28r%29%5Cright.%3D%5Cprod_%7Bi%3D1%7D%5E%7Bx%5E%2A%7D%28r%2Bi-1%29%5E%7Bs_i%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\prod_{i=1}^n \Gamma(r+x_i) \left/\Gamma(r)\right.=\prod_{i=1}^{x^*}(r+i-1)^{s_i}' title='\prod_{i=1}^n \Gamma(r+x_i) \left/\Gamma(r)\right.=\prod_{i=1}^{x^*}(r+i-1)^{s_i}' class='latex' /></p>
<p style="text-align:left;">where <img src='http://s.wordpress.com/latex.php?latex=x%5E%2A%20%3D%20%5Cmax%28x%5En%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x^* = \max(x^n)' title='x^* = \max(x^n)' class='latex' />, <img src='http://s.wordpress.com/latex.php?latex=s_i%3D%5Csum_%7Bj%3Di%7D%5E%7Bx%5E%2A%7Dn_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='s_i=\sum_{j=i}^{x^*}n_j' title='s_i=\sum_{j=i}^{x^*}n_j' class='latex' />, and <img src='http://s.wordpress.com/latex.php?latex=n_j%3D%7C%5C%7Bx_i%20%5Cin%20x%5En%3Ax_i%3Dj%5C%7D%7C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n_j=|\{x_i \in x^n:x_i=j\}|' title='n_j=|\{x_i \in x^n:x_i=j\}|' class='latex' /> is the number of observations equal to <img src='http://s.wordpress.com/latex.php?latex=j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='j' title='j' class='latex' />.</p>
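<p>The identity is easy to check numerically. The sketch below computes the logarithm of both sides, once directly with <code>math.lgamma</code> and once through the tail counts s_i; the function names are illustrative, and the observations are assumed to be a non-empty list of non-negative integers.</p>

```python
import math
from collections import Counter


def log_gamma_ratio_direct(xs, r):
    """Sum over observations of log Gamma(r + x_i) - log Gamma(r)."""
    return sum(math.lgamma(r + x) - math.lgamma(r) for x in xs)


def tail_counts(xs):
    """Return [s_1, ..., s_{x*}], where s_i is the number of observations of at least i."""
    counts = Counter(xs)
    x_max = max(xs)
    s = [0] * (x_max + 2)
    for i in range(x_max, 0, -1):
        s[i] = s[i + 1] + counts.get(i, 0)   # running sum of n_j for j from x* down to i
    return s[1:x_max + 1]


def log_gamma_ratio_product(xs, r):
    """Sum over i of s_i * log(r + i - 1), the log of the product form."""
    return sum(s_i * math.log(r + i - 1)
               for i, s_i in enumerate(tail_counts(xs), start=1))
```

<p>The product form costs one logarithm per distinct value up to the largest count, rather than two log-gamma evaluations per observation, which helps when the same data are revisited at many candidate values of r.</p>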
<p style="text-align:left;">While <img src='http://s.wordpress.com/latex.php?latex=f%28r%3Ba%2Cb%29L_n%28r%2Cp%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(r;a,b)L_n(r,p)' title='f(r;a,b)L_n(r,p)' class='latex' /> can be computed exactly, it is not easy to integrate analytically.  Although I am unwilling to perform computationally expensive operations regularly, I am very comfortable with using numeric techniques to update the MAP estimates.  So I can rely on black-box estimation functions in <a title="R" href="http://www.r-project.org">R</a> when testing or the <a title="Apache Commons Math" href="http://commons.apache.org/math/">Apache Commons Math</a> library for use in our system.  The resulting recipe for estimating <img src='http://s.wordpress.com/latex.php?latex=r%5E%7B%5Bt%2B1%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r^{[t+1]}' title='r^{[t+1]}' class='latex' /> is:</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=%5Cbegin%7Barray%7D%7Brl%7D%20r%5E%7B%5Bt%2B1%5D%7D%20%26%20%3D%20%5Carg%5Cmax_r%20f%28r%3Ba%2Cb%29L_n%28r%2Cp%5E%7B%5Bt%5D%7D%29%20%5C%5C%20%5C%5C%20%26%20%3D%20%5Carg%5Cmax_r%20%20r%5E%7Ba-1%7D%281%2Br%29%5E%7B-a-b%7Dp%5E%7Brn%7D%20%5Cprod_%7Bi%3D1%7D%5E%7Bx%5E%2A%7D%28r%2Bi-1%29%5E%7Bs_i%7D%20.%20%5Cend%7Barray%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\begin{array}{rl} r^{[t+1]} &amp; = \arg\max_r f(r;a,b)L_n(r,p^{[t]}) \\ \\ &amp; = \arg\max_r  r^{a-1}(1+r)^{-a-b}p^{rn} \prod_{i=1}^{x^*}(r+i-1)^{s_i} . \end{array}' title='\begin{array}{rl} r^{[t+1]} &amp; = \arg\max_r f(r;a,b)L_n(r,p^{[t]}) \\ \\ &amp; = \arg\max_r  r^{a-1}(1+r)^{-a-b}p^{rn} \prod_{i=1}^{x^*}(r+i-1)^{s_i} . \end{array}' class='latex' /></p>
<p>In practice, what I actually maximize is the log of the above quantity, which reduces the chance of overflow or underflow during computation.  While <img src='http://s.wordpress.com/latex.php?latex=f%28r%3Ba%2Cb%29L_n%28r%2Cp%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(r;a,b)L_n(r,p)' title='f(r;a,b)L_n(r,p)' class='latex' /> is not easily integrable, it is differentiable (as is its derivative).  One could define first and second derivatives and find the maximum value using Newton-Raphson, but the derivatives require recurrence relations to state succinctly (and I couldn&#8217;t be bothered).</p>
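<p>To make the update for r concrete, here is a sketch that maximizes the log of the unnormalized posterior with a hand-written golden-section search. The search is only a stand-in for the black-box optimizers mentioned above, not what R or Apache Commons Math do internally. The sketch writes the beta prime prior in its standard form, r^(a-1) (1+r)^(-a-b). The function names and the bracket <code>[lo, hi]</code> are assumptions, as is the premise that the objective has a single maximum inside that bracket.</p>

```python
import math
from collections import Counter


def log_posterior_r(r, xs, p, a, b):
    """Log of r^(a-1) (1+r)^(-a-b) p^(r n) prod_i (r+i-1)^(s_i), up to a constant."""
    n = len(xs)
    counts = Counter(xs)
    value = (a - 1.0) * math.log(r) - (a + b) * math.log1p(r) + r * n * math.log(p)
    s_i = 0
    for i in range(max(xs), 0, -1):
        s_i += counts.get(i, 0)            # observations with a count of at least i
        value += s_i * math.log(r + i - 1.0)
    return value


def update_r(xs, p, a, b, lo=1e-6, hi=1e3, tol=1e-8):
    """Golden-section search for the maximizer of log_posterior_r on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda r: log_posterior_r(r, xs, p, a, b)
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:                        # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                              # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)
```

<p>Golden-section search needs no derivatives, which sidesteps the recurrence relations mentioned above, at the cost of a few dozen function evaluations per update.</p>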
<p>All that remains is the initial choice of <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%5B0%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{[0]}' title='p^{[0]}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=r%5E%7B%5B0%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r^{[0]}' title='r^{[0]}' class='latex' />.  I chose as starting points the method of moments estimators for these parameters (<a title="Efficient Estimation of Parameters in the Negative Binomial Distribution" href="http://dx.doi.org/10.1080/03610920500501346">Savani and Zhigljavsky</a>):</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=%5Cbegin%7Barray%7D%7Brl%7D%20r%5E%7B%5B0%5D%7D%20%26%20%3D%20%5Cbar%7Bx%7D%5E2%2F%28v%20-%20%5Cbar%7Bx%7D%29%20%5C%5C%20%5C%5C%20p%5E%7B%5B0%5D%7D%20%26%20%3D%20r%5E%7B%5B0%5D%7D%20%2F%20%28r%5E%7B%5B0%5D%7D%20%2B%20%5Cbar%7Bx%7D%29%20%20%5Cend%7Barray%7D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\begin{array}{rl} r^{[0]} &amp; = \bar{x}^2/(v - \bar{x}) \\ \\ p^{[0]} &amp; = r^{[0]} / (r^{[0]} + \bar{x})  \end{array} ' title='\begin{array}{rl} r^{[0]} &amp; = \bar{x}^2/(v - \bar{x}) \\ \\ p^{[0]} &amp; = r^{[0]} / (r^{[0]} + \bar{x})  \end{array} ' class='latex' /></p>
<p>where <img src='http://s.wordpress.com/latex.php?latex=%5Cbar%7Bx%7D%20%3D%20%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi%3D1%7D%5En%20x_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i' title='\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=v%20%3D%20%5Cfrac%7B1%7D%7Bn%7D%5Cleft%28%5Csum_%7Bi%3D1%7D%5En%20x_i%5E2%5Cright%29%20-%20%5Cbar%7Bx%7D%5E2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='v = \frac{1}{n}\left(\sum_{i=1}^n x_i^2\right) - \bar{x}^2' title='v = \frac{1}{n}\left(\sum_{i=1}^n x_i^2\right) - \bar{x}^2' class='latex' />.  When the method of moments estimates are not well-defined for the sample data (for example, when the sample variance does not exceed the sample mean), I use the means of the prior distributions as starting points.</p>
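<p>The starting values translate directly into code. In this sketch the function name is illustrative, and returning <code>None</code> when the variance does not exceed the mean is an assumed way of signalling that the caller should fall back to the prior means.</p>

```python
def method_of_moments(xs):
    """Return starting values (p0, r0), or None when they are not well-defined."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum(x * x for x in xs) / n - mean * mean
    if var <= mean:
        return None                      # r0 would be negative or undefined
    r0 = mean * mean / (var - mean)
    p0 = r0 / (r0 + mean)
    return p0, r0
```

<p>Substituting the first expression into the second shows that the starting value for p is simply the sample mean divided by the sample variance, which is a handy sanity check.</p>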
<p>Here&#8217;s a snapshot of a few feeds.  The red line indicates the method of moments estimator and the blue line shows the maximum a posteriori estimates. Yes, I know that the negative binomial is a discrete distribution and plotting it as a line is misleading, but I wanted to look at a large number of feeds at a time, and line plots are easier to read at that scale.</p>
<p style="text-align:center;"><a href="http://pogil.files.wordpress.com/2008/07/comment-counts.png"><img class="size-full wp-image-12 aligncenter" src="http://pogil.files.wordpress.com/2008/07/comment-counts.png" alt="Distribution of comment counts" width="460" height="447" /></a></p>
<h3 style="text-align: left;">Update:</h3>
<p style="text-align: left;"><a href="http://livewebir.com/blog/2008/07/modeling-blog-post-comment-counts/#comment-6">Michelle asked a couple of questions</a> about the use of the beta prime distribution as a prior for the <img src='http://s.wordpress.com/latex.php?latex=r&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r' title='r' class='latex' /> parameter of the negative binomial, so I figured I&#8217;d update this post with a little more detail about this choice.</p>
<p style="text-align: left;">When choosing a distribution to model the prior for <img src='http://s.wordpress.com/latex.php?latex=r&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r' title='r' class='latex' />, I started by looking at the histogram of <img src='http://s.wordpress.com/latex.php?latex=r&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r' title='r' class='latex' /> values estimated using the method of moments estimator described above.  I first considered using the gamma distribution, but it didn&#8217;t turn out to be a great fit.  The red line on the histogram below shows the gamma distribution fitted with the method of moments estimator proposed by <a title="A CLASS OF METHOD OF MOMENTS ESTIMATORS FOR THE TWO-PARAMETER GAMMA FAMILY" href="http://www.stat.ualberta.ca/~wiens/pubs/gamma.pdf">Wiens et al (Pak. J. Statist. 2003 Vol.19(1) pp129-141)</a> with <img src='http://s.wordpress.com/latex.php?latex=k%3D0&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='k=0' title='k=0' class='latex' />.  The blue line shows the much better fit given by the beta prime distribution.</p>
<p style="text-align: center;"><a href="http://livewebir.com/blog/wp-content/uploads/2008/11/negbinom_r_parameter.png"><img class="size-full wp-image-39 aligncenter" title="Distribution of r parameter for negative binomial " src="http://livewebir.com/blog/wp-content/uploads/2008/11/negbinom_r_parameter.png" alt="Distribution of r parameter for negative binomial " width="375" height="375" /></a></p>
<p>To fit the beta prime parameters, I fit a beta distribution to <img src='http://s.wordpress.com/latex.php?latex=r%20%2F%20%28r%20%2B%201%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r / (r + 1)' title='r / (r + 1)' class='latex' />.  Here&#8217;s a plot showing the histogram for <img src='http://s.wordpress.com/latex.php?latex=r%20%2F%20%28r%20%2B%201%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r / (r + 1)' title='r / (r + 1)' class='latex' /> and the method of moments estimator for the beta distribution:</p>
<p style="text-align: center;"><a href="http://livewebir.com/blog/wp-content/uploads/2008/11/beta_transformation.png"><img class="size-full wp-image-40 aligncenter" title="beta_transformation" src="http://livewebir.com/blog/wp-content/uploads/2008/11/beta_transformation.png" alt="The beta distribution fit for r / (1 + r)" width="375" height="375" /></a></p>
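<p>For reference, the fitting step can be sketched as follows: transform each r to r / (r + 1), match the sample mean and variance to those of a beta distribution, and reuse the resulting parameters for the beta prime. The function name is illustrative, and the use of the population variance (dividing by n) is an assumption.</p>

```python
def fit_beta_prime(r_values):
    """Method-of-moments beta fit to y = r / (r + 1); returns beta prime (a, b)."""
    ys = [r / (r + 1.0) for r in r_values]
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    if var <= 0.0 or var >= mean * (1.0 - mean):
        raise ValueError("sample moments are not compatible with a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0   # equals a + b for the fitted beta
    return mean * common, (1.0 - mean) * common
```

<p>The parameters carry over unchanged because a Beta(a, b) variable X maps to a beta prime variable with the same parameters under X / (1 - X), which is exactly the inverse of the transformation applied to r.</p>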
<p style="text-align:left;">I hope this makes my choice of the beta prime distribution and its estimation a little clearer.</p>
]]></content:encoded>
			<wfw:commentRss>http://livewebir.com/blog/2008/07/modeling-blog-post-comment-counts/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
