<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Information Retrieval on the Live Web &#187; multiple testing</title>
	<atom:link href="http://livewebir.com/blog/tag/multiple-testing/feed/" rel="self" type="application/rss+xml" />
	<link>http://livewebir.com/blog</link>
	<description>by Paul Ogilvie</description>
	<lastBuildDate>Mon, 08 Mar 2010 17:20:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Multiple significance testing</title>
		<link>http://livewebir.com/blog/2008/06/multiple-significance-testing/</link>
		<comments>http://livewebir.com/blog/2008/06/multiple-significance-testing/#comments</comments>
		<pubDate>Mon, 16 Jun 2008 18:47:16 +0000</pubDate>
		<dc:creator>pogil</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[false discovery rate]]></category>
		<category><![CDATA[multiple testing]]></category>

		<guid isPermaLink="false">http://pogil.wordpress.com/?p=6</guid>
		<description><![CDATA[Panos Ipeirotis recently posted some interesting thoughts about statistical significance tests for comparing systems when incrementally developing techniques to solve a problem.  While I don&#8217;t have the answer to his question, I did notice that he made note of the Bonferroni method for correcting for multiple testing.  The need for multiple test correction arises when [...]]]></description>
			<content:encoded><![CDATA[<p>Panos Ipeirotis recently posted some interesting thoughts about <a title="Statistical Significance of Sequential Comparisons" href="http://behind-the-enemy-lines.blogspot.com/2008/06/statistical-significance-of-sequential.html">statistical significance tests</a> for comparing systems when incrementally developing techniques to solve a problem.  While I don&#8217;t have the answer to his question, I did notice that he mentioned the Bonferroni method for correcting for multiple testing.  The need for multiple test correction arises whenever you make more than one comparison: the more comparisons you make, the more likely it becomes that at least one uncorrected test rejects a true null hypothesis purely by chance.</p>
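<p>To make that last point concrete: if each of m tests is run at level alpha and every null hypothesis is actually true, then, assuming the tests are independent (an assumption made only for this illustration), the chance of at least one false rejection is 1 - (1 - alpha)^m. A minimal Python sketch, where the function name is hypothetical:</p>

```python
def prob_any_false_rejection(alpha, m):
    """Chance that at least one of m independent tests of true null
    hypotheses is rejected at level alpha when no correction is applied.

    Assumes independence between tests (illustrative assumption)."""
    # Each test avoids a false rejection with probability 1 - alpha,
    # so all m avoid it with probability (1 - alpha) ** m.
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 5, 20, 100):
    print(m, round(prob_any_false_rejection(0.05, m), 3))
```

<p>With alpha = 0.05 and 20 independent tests, the chance of at least one spurious rejection is already about 0.64.</p>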
<p>One limitation of the Bonferroni method is its conservativeness.  If you would normally reject the null hypothesis when <img src='http://s.wordpress.com/latex.php?latex=p%20%3C%20%5Calpha&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p &lt; \alpha' title='p &lt; \alpha' class='latex' />, with the Bonferroni correction you would reject only when <img src='http://s.wordpress.com/latex.php?latex=p%20%3C%20%5Calpha%2Fm&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p &lt; \alpha/m' title='p &lt; \alpha/m' class='latex' />, where <img src='http://s.wordpress.com/latex.php?latex=m&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='m' title='m' class='latex' /> is the number of tests.  This can make it very difficult to reject any null hypothesis when <img src='http://s.wordpress.com/latex.php?latex=m&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='m' title='m' class='latex' /> is large.  The conservativeness arises because the Bonferroni method controls the probability of falsely rejecting <em>any</em> null hypothesis; that is, it controls the family-wise error rate.</p>
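<p>A minimal Python sketch of the Bonferroni rule described above; the function name and the example p-values are made up for illustration. Each p-value is simply compared against alpha/m:</p>

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return True for each null hypothesis rejected under Bonferroni correction."""
    m = len(p_values)
    # Every test is held to the stricter threshold alpha / m instead of alpha.
    return [p < alpha / m for p in p_values]


# Hypothetical p-values from m = 10 comparisons.
p = [0.001, 0.004, 0.011, 0.019, 0.030, 0.20, 0.45, 0.60, 0.75, 0.90]
print(bonferroni_reject(p))
```

<p>With these ten p-values the threshold is 0.005, so only the first two null hypotheses are rejected, even though three more have p-values below 0.05.</p>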
<p>There&#8217;s a newer technique that addresses the conservative nature of the Bonferroni method.  Instead of controlling the probability of falsely rejecting any null hypothesis, the idea is to control the <a title="false discovery rate" href="http://en.wikipedia.org/wiki/False_discovery_rate">false discovery rate</a>.  The false discovery rate is the expected proportion of rejected null hypotheses that were rejected in error.  By controlling the false discovery rate at level <img src='http://s.wordpress.com/latex.php?latex=%5Calpha&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\alpha' title='\alpha' class='latex' />, we accept that some individual rejections may be mistakes, but we require that, in expectation, no more than a fraction <img src='http://s.wordpress.com/latex.php?latex=%5Calpha&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\alpha' title='\alpha' class='latex' /> of all rejections are false.</p>
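<p>In symbols (the notation V and R is introduced here for illustration): let V be the number of true null hypotheses that are rejected and R the total number of rejections. The two error rates are then</p>

```latex
\mathrm{FWER} = \Pr(V \geq 1), \qquad
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R, 1)}\right]
```

<p>The ratio V/max(R, 1) is 0 when V = 0 and at most 1 otherwise, so the FDR can never exceed the family-wise error rate; controlling it is a weaker, less conservative requirement. When every null hypothesis is true, V = R and the two rates coincide.</p>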
<p>The <a title="Controlling the False Discovery Rate" href="http://www.math.tau.ac.il/~ybenja/MyPapers/benjamini_hochberg1995.pdf">Benjamini-Hochberg</a> method is a simple approach for controlling the false discovery rate.  Given the p-values <img src='http://s.wordpress.com/latex.php?latex=P_1%2C%20P_2%2C%20%5Cdots%20P_m&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='P_1, P_2, \dots P_m' title='P_1, P_2, \dots P_m' class='latex' /> resulting from <img src='http://s.wordpress.com/latex.php?latex=m&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='m' title='m' class='latex' /> statistical significance tests:</p>
<ol>
<li>Let <img src='http://s.wordpress.com/latex.php?latex=P_%7B%281%29%7D%20%5Cleq%20P_%7B%282%29%7D%20%5Cleq%20%5Cdots%20%5Cleq%20P_%7B%28m%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='P_{(1)} \leq P_{(2)} \leq \dots \leq P_{(m)}' title='P_{(1)} \leq P_{(2)} \leq \dots \leq P_{(m)}' class='latex' /> be the p-values sorted in increasing order.</li>
<li>Define <img src='http://s.wordpress.com/latex.php?latex=l_i%20%3D%20%5Cfrac%7Bi%20%5Calpha%7D%7BC_m%20m%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='l_i = \frac{i \alpha}{C_m m}' title='l_i = \frac{i \alpha}{C_m m}' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=k%20%3D%20%5Cmax%5C%7Bi%20%3A%20P_%7B%28i%29%7D%20%3C%20l_i%5C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='k = \max\{i : P_{(i)} &lt; l_i\}' title='k = \max\{i : P_{(i)} &lt; l_i\}' class='latex' /> where <img src='http://s.wordpress.com/latex.php?latex=C_m%20%3D%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='C_m = 1' title='C_m = 1' class='latex' /> if the p-values are independent and <img src='http://s.wordpress.com/latex.php?latex=C_m%20%3D%20%5Csum_%7Bi%3D1%7D%5E%7Bm%7D%201%2Fi&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='C_m = \sum_{i=1}^{m} 1/i' title='C_m = \sum_{i=1}^{m} 1/i' class='latex' /> otherwise.</li>
<li>Reject all null hypotheses where <img src='http://s.wordpress.com/latex.php?latex=P_i%20%5Cleq%20P_%7B%28k%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='P_i \leq P_{(k)}' title='P_i \leq P_{(k)}' class='latex' />.  If no index satisfies the condition in the previous step, reject none of them.</li>
</ol>
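<p>The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name and the <code>dependent</code> flag (which switches C_m from 1 to the harmonic sum) are hypothetical, and the strict comparison mirrors the definition of k given in the steps.</p>

```python
def benjamini_hochberg(p_values, alpha=0.05, dependent=False):
    """Return a list of booleans, True where the null hypothesis is rejected.

    dependent=False uses C_m = 1 (independent p-values);
    dependent=True uses C_m = 1 + 1/2 + ... + 1/m."""
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1)) if dependent else 1.0
    sorted_p = sorted(p_values)
    # k is the largest 1-based index i with P_(i) < l_i = i * alpha / (C_m * m);
    # 0 means no index qualifies. Scanning all i (not stopping at the first
    # failure) is what makes this a step-up procedure.
    k = 0
    for i, p in enumerate(sorted_p, start=1):
        if p < i * alpha / (c_m * m):
            k = i
    if k == 0:
        return [False] * m
    cutoff = sorted_p[k - 1]
    # Reject every hypothesis whose p-value is at most P_(k).
    return [p <= cutoff for p in p_values]


# Hypothetical p-values from m = 10 comparisons.
p = [0.001, 0.004, 0.011, 0.019, 0.030, 0.20, 0.45, 0.60, 0.75, 0.90]
print(benjamini_hochberg(p))
print(benjamini_hochberg(p, dependent=True))
```

<p>For these made-up p-values with alpha = 0.05, the thresholds l_i are 0.005, 0.010, 0.015, 0.020, 0.025, and so on, so k = 4 and the four smallest p-values are rejected; Bonferroni at the same alpha (threshold 0.005) would reject only two. With <code>dependent=True</code>, C_10 is about 2.93, which shrinks every threshold so that only the smallest p-value is rejected.</p>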
<p>In my <a title="Investigating the Exhaustivity Dimension in Content-Oriented XML Element Retrieval Evaluation" href="http://www.cs.cmu.edu/~pto/papers/CIKM_2006_INEX_EXH.pdf">CIKM paper</a> from 2006 with <a title="Home Page of Mounia Lalmas" href="http://www.dcs.qmul.ac.uk/~mounia/">Mounia Lalmas</a>, we performed extensive pairwise comparisons of systems to understand some aspects of evaluation measures in XML element retrieval.  We did look into controlling the family-wise error rate with the Bonferroni method, but because we were running pairwise tests on roughly 40 different result lists per task, none of the system differences were identified as statistically significant.  Controlling the false discovery rate instead allowed us to identify differences, despite the large number of comparisons and relatively small sample sizes.</p>
]]></content:encoded>
			<wfw:commentRss>http://livewebir.com/blog/2008/06/multiple-significance-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
