<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Information Retrieval on the Live Web &#187; Uncategorized</title>
	<atom:link href="http://livewebir.com/blog/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://livewebir.com/blog</link>
	<description>by Paul Ogilvie</description>
	<lastBuildDate>Thu, 26 May 2011 18:47:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Freebase topic descriptions in Mechanical Turk</title>
		<link>http://livewebir.com/blog/2009/05/freebase-topic-descriptions-in-mechanical-turk/</link>
		<comments>http://livewebir.com/blog/2009/05/freebase-topic-descriptions-in-mechanical-turk/#comments</comments>
		<pubDate>Fri, 01 May 2009 20:58:48 +0000</pubDate>
		<dc:creator>pogil</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://livewebir.com/blog/?p=82</guid>
		<description><![CDATA[As a component of mSpoke&#8217;s named entity detection algorithm, we disambiguate known named entities to Freebase topics.  To gather better evaluation and tuning data, I recently spent some time improving our Mechanical Turk template for assessing named entity assignment and disambiguation.  I didn&#8217;t want to upload a Freebase topic description with each HIT, so I instead [...]]]></description>
			<content:encoded><![CDATA[<p>As a component of <a title="mSpoke" href="http://www.mspoke.com">mSpoke</a>&#8217;s named entity detection algorithm, we disambiguate known named entities to <a title="Freebase" href="http://www.freebase.com">Freebase</a> topics.  To gather better evaluation and tuning data, I recently spent some time improving our <a title="Amazon Mechanical Turk" href="https://www.mturk.com/mturk/welcome">Mechanical Turk</a> template for assessing named entity assignment and disambiguation.  I didn&#8217;t want to upload a Freebase topic description with each HIT, so I instead worked out how to load the topic description dynamically.  It wasn&#8217;t terribly difficult, but it took me a while to see how everything fits together, so it seems worth sharing.</p>
<p>The basic approach is to include the Freebase id and label in the data we upload to Mechanical Turk, then use <a href="http://en.wikipedia.org/wiki/Ajax_(programming)">AJAX</a> to request the description from Freebase and fill it into the HIT when the template is rendered.</p>
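<p>To make the first half of that concrete, here is a rough simulation of the substitution step. It is only a sketch of the behavior described in this post, not Mechanical Turk&#8217;s actual implementation; the <code>fillTemplate</code> function and the sample row values are hypothetical placeholders.</p>

```javascript
// Rough simulation of Mechanical Turk's template substitution (an assumption
// about its behavior, not its real code): every ${name} in the template is
// replaced by the value of the same-named column in one row of HIT data.
function fillTemplate(template, row) {
  return template.replace(/\$\{(\w+)\}/g, function (match, name) {
    // Leave unknown variables untouched so a missing column is easy to spot.
    return Object.prototype.hasOwnProperty.call(row, name) ? row[name] : match;
  });
}

// Hypothetical row of uploaded HIT data; both values are placeholders.
var row = { freebase_id: "/en/example_topic", label: "Example Topic" };

console.log(fillTemplate('id : "${freebase_id}"', row));
// prints: id : "/en/example_topic"
```

<p>Because the substitution happens before the page reaches the worker&#8217;s browser, any script in the template sees the filled-in values as ordinary string literals.</p>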
<p>The first thing we need is some HTML for the <a href="http://www.freebase.com/signin/licensing">Freebase attribution</a> and places to fill in the <a href="http://en.wikipedia.org/wiki/Wikipedia:Reusing_Wikipedia_content#Example_notice">Wikipedia text attribution</a> (all of our disambiguated named entities have descriptions originating in Wikipedia) and topic description.</p>
<pre style="font-size:small;">&lt;div id="description"&gt;&lt;/div&gt;
&lt;div style="font-size:x-small"&gt;
  &lt;img src="http://www.freebase.com/api/trans/raw/freebase/attribution"
    style="float:left; margin-right: 5px" /&gt;
  &lt;div style="margin-left:30px"&gt; Source:
    &lt;a href="http://www.freebase.com" title="Freebase &amp;ndash; The World's Database"&gt;Freebase&lt;/a&gt;
    &amp;ndash; The World&amp;apos;s Database&lt;br /&gt;
    &amp;quot;&lt;a href="http://www.freebase.com/view/${freebase_id}" title="${label}:
    Freebase &amp;ndash; The World's Database"&gt;${label} &lt;/a&gt;&amp;quot; Freely licensed under
    &lt;a href="http://www.freebase.com/view/common/license/cc_attribution_25"&gt;CC-BY&lt;/a&gt;.
  &lt;/div&gt;
&lt;/div&gt;
&lt;div id="attribution"&gt;&lt;/div&gt;</pre>
<p>The Freebase attribution in the middle is mostly boilerplate, and we can fill in the Freebase id and topic label from our HIT data using <code>${freebase_id}</code> and <code>${label}</code>. The <code>description</code> and <code>attribution</code> divs will be filled in using AJAX:</p>
<pre style="font-size:small;">&lt;script src="http://code.jquery.com/jquery-1.3.min.js"&gt;&lt;/script&gt;
&lt;script src="http://jquery-json.googlecode.com/files/jquery.json-1.3.min.js"&gt;&lt;/script&gt;
&lt;script&gt;
  var envelope = { query :
  {
    id :   "${freebase_id}",
    type : "/common/topic",
    article : [{
      id : null,
      "/common/document/source_uri" : null
    }]
  }};

  jQuery.getJSON(
    "http://api.freebase.com/api/service/mqlread?callback=?",
    { query : jQuery.toJSON( envelope ) },
    processIdRequest
  );

  function processIdRequest( response ) {
    if ( response.code == "/api/status/ok"
         &amp;&amp; response.result
         &amp;&amp; response.result.article ) {
      jQuery.each(response.result.article, function() {
        requestDescription(this.id);
        addAttribution(this["/common/document/source_uri"]);
      });
    }
  }

  function requestDescription( id ) {
    jQuery.getJSON(
      "http://api.freebase.com/api/trans/raw/" + id + "?callback=?",
      processDescriptionRequest
    );
  }

  function processDescriptionRequest( response ) {
    if ( response.code == "/api/status/ok"
         &amp;&amp; response.result
         &amp;&amp; response.result.body ) {
      jQuery("div#description").html(response.result.body);
    }
  }

  function addAttribution( uri ) {
    var id = uri.substr(uri.lastIndexOf('/') + 1);
    jQuery("div#attribution").html(
      "The original description for this topic was automatically generated from the " +
      "&lt;a href=\"http://en.wikipedia.org/w/index.php?curid=" + id + "\"&gt;Wikipedia article \"" +
      "${label}\"&lt;/a&gt; licensed under the &lt;a href=\"http://www.gnu.org/copyleft/fdl.html\"&gt;" +
      "GNU Free Documentation License.&lt;/a&gt;"
    );
  }
&lt;/script&gt;</pre>
<p>Since Mechanical Turk fills in the HIT template variables prior to rendering the web page, we can fill in the Freebase page id and topic label where needed, such as in the query to Freebase&#8217;s API and the Wikipedia attribution text.  The query to Freebase, represented by <code>envelope</code>, requests both the id of the topic&#8217;s description article and the Wikipedia source URI.  The script uses <a href="http://jquery.com/">jQuery</a> to request the data, and <code>processIdRequest</code> passes the article id to <code>requestDescription</code> and the Wikipedia source URI to <code>addAttribution</code>. <code>requestDescription</code> then asks Freebase for the description text given the article id, and <code>processDescriptionRequest</code> writes it into the <code>description</code> div.  Finally, since the Wikipedia source URI isn&#8217;t an actual link and looks like <code>http://wp/en/1194195</code>, <code>addAttribution</code> parses out the article id and generates a link to the actual Wikipedia page in the attribution.</p>
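<p>The URI parsing in <code>addAttribution</code> can be tried on its own. The sketch below pulls that step out into a standalone helper; the name <code>wikipediaUrlFromSourceUri</code> and the numeric check are my additions for illustration and are not part of the template above.</p>

```javascript
// Convert a Freebase source URI such as "http://wp/en/1194195" into a
// Wikipedia link, using the same last-path-segment parsing as addAttribution.
// Hypothetical helper: the name and the digits-only guard are assumptions.
function wikipediaUrlFromSourceUri(uri) {
  var pageId = uri.slice(uri.lastIndexOf("/") + 1);
  // Return null when the last segment is not a plain numeric page id.
  if (!/^\d+$/.test(pageId)) {
    return null;
  }
  return "http://en.wikipedia.org/w/index.php?curid=" + pageId;
}

console.log(wikipediaUrlFromSourceUri("http://wp/en/1194195"));
// prints: http://en.wikipedia.org/w/index.php?curid=1194195
```

<p>Returning <code>null</code> for anything that doesn&#8217;t end in a numeric id is one way to avoid generating a broken attribution link; the original template skips this check.</p>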
]]></content:encoded>
			<wfw:commentRss>http://livewebir.com/blog/2009/05/freebase-topic-descriptions-in-mechanical-turk/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

