As a component of mSpoke’s named entity detection algorithm, we disambiguate known named entities to Freebase topics. To gather better evaluation and tuning data, I recently spent some time improving our Mechanical Turk template for assessing named entity assignment and disambiguation. I didn’t want to upload a Freebase topic description with each HIT, so I worked out how to load the topic description dynamically instead. It wasn’t terribly difficult, but it took me a while to see how everything fits together, so it is probably worth sharing.
The basic approach is to include the Freebase id and label in the data we upload to Mechanical Turk, then use AJAX to request the description from Freebase and fill it into the HIT when the template is rendered.
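As a rough illustration of the upload data, here is a minimal sketch that builds the input CSV, assuming the column headers must match the template variables `${freebase_id}` and `${label}`. The sample rows and the helper names `csvField` and `buildHitCsv` are hypothetical and are not part of our actual pipeline.

```javascript
// Hypothetical sample rows; the ids and labels are illustrative only.
var topics = [
  { freebase_id: "/en/pittsburgh", label: "Pittsburgh" },
  { freebase_id: "/en/washington_dc", label: "Washington, D.C." }
];

// Quote a CSV field and double any embedded double quotes,
// so labels containing commas stay in a single column.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Build the CSV text: a header row naming the template variables,
// followed by one row per HIT.
function buildHitCsv(rows) {
  var lines = ["freebase_id,label"];
  for (var i = 0; i < rows.length; i++) {
    lines.push(csvField(rows[i].freebase_id) + "," + csvField(rows[i].label));
  }
  return lines.join("\n");
}

var hitCsv = buildHitCsv(topics);
```

Each row then carries only two short strings, and the description itself is fetched when the HIT is rendered.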
The first thing we need is some HTML: the Freebase attribution, plus placeholder divs for the topic description and the Wikipedia text attribution (all of our disambiguated named entities have descriptions originating in Wikipedia).
<div id="description"></div>
<div style="font-size:x-small">
<img src="http://www.freebase.com/api/trans/raw/freebase/attribution"
style="float:left; margin-right: 5px" />
<div style="margin-left:30px"> Source:
<a href="http://www.freebase.com" title="Freebase – The World's Database">Freebase</a>
– The World's Database<br />
"<a href="http://www.freebase.com/view/${freebase_id}" title="${label}:
Freebase – The World's Database">${label} </a>" Freely licensed under
<a href="http://www.freebase.com/view/common/license/cc_attribution_25">CC-BY</a>.
</div>
</div>
<div id="attribution"></div>
The Freebase attribution in the middle is mostly boilerplate and we can fill in the Freebase id and topic label from our HIT data using ${freebase_id} and ${label}. The description and attribution divs will be filled in using AJAX:
<script src="http://code.jquery.com/jquery-1.3.min.js"></script>
<script src="http://jquery-json.googlecode.com/files/jquery.json-1.3.min.js"></script>
<script>
// MQL query: ask for the topic's article id(s) and each article's Wikipedia source URI.
var envelope = { query :
  {
    id : "${freebase_id}",
    type : "/common/topic",
    article : [{
      id : null,
      "/common/document/source_uri" : null
    }]
  }};

// JSONP request (callback=?) so the call works cross-domain from the HIT page.
jQuery.getJSON(
  "http://api.freebase.com/api/service/mqlread?callback=?",
  { query : jQuery.toJSON( envelope ) },
  processIdRequest
);

// For each article on the topic, fetch its text and build the attribution.
function processIdRequest( response ) {
  if ( response.code === "/api/status/ok"
      && response.result
      && response.result.article ) {
    jQuery.each(response.result.article, function() {
      requestDescription(this.id);
      addAttribution(this["/common/document/source_uri"]);
    });
  }
}

// Request the raw description text for the given article id.
function requestDescription( id ) {
  jQuery.getJSON(
    "http://api.freebase.com/api/trans/raw/" + id + "?callback=?",
    processDescriptionRequest
  );
}

// Fill the description div with the returned text.
function processDescriptionRequest( response ) {
  if ( response.code === "/api/status/ok"
      && response.result
      && response.result.body ) {
    jQuery("div#description").html(response.result.body);
  }
}

// Turn a source URI such as http://wp/en/1194195 into a real Wikipedia link.
function addAttribution( uri ) {
  if ( !uri ) {
    return; // no source URI on this article, so there is nothing to attribute
  }
  var id = uri.substr(uri.lastIndexOf('/') + 1);
  jQuery("div#attribution").html(
    "The original description for this topic was automatically generated from the " +
    "<a href=\"http://en.wikipedia.org/w/index.php?curid=" + id + "\">Wikipedia article \"" +
    "${label}\"</a> licensed under the <a href=\"http://www.gnu.org/copyleft/fdl.html\">" +
    "GNU Free Documentation License.</a>"
  );
}
</script>
Since Mechanical Turk fills in the HIT template variables prior to rendering the web page, we can use the Freebase id and topic label wherever they are needed, such as in the query to Freebase’s API and in the Wikipedia attribution text. The query to Freebase, represented by envelope, requests both the topic description’s id and the Wikipedia source URI. The script uses jQuery to send the query, and processIdRequest passes the article id to requestDescription and the Wikipedia source URI to addAttribution. requestDescription then asks Freebase for the description text given that id, and processDescriptionRequest fills it into the description div. Finally, since the Wikipedia source URI isn’t an actual link and looks like http://wp/en/1194195, addAttribution parses out the article id and generates a link to the actual Wikipedia page in the attribution.
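The URI parsing in addAttribution can also be pulled out into a small standalone function, which makes it easy to check outside the browser. This is only a sketch: the function name `wikipediaLinkFromSourceUri` is hypothetical, and it assumes every source URI ends in a numeric Wikipedia page id, as in the http://wp/en/1194195 example.

```javascript
// Convert a Freebase source URI (e.g. "http://wp/en/1194195") into a
// Wikipedia link based on the page id. Returns null when the input is
// missing or does not end in a numeric id (an assumption about the format).
function wikipediaLinkFromSourceUri(uri) {
  if (typeof uri !== "string") {
    return null;
  }
  // Everything after the last slash is taken to be the page id.
  var pageId = uri.substr(uri.lastIndexOf("/") + 1);
  if (!/^\d+$/.test(pageId)) {
    return null;
  }
  return "http://en.wikipedia.org/w/index.php?curid=" + pageId;
}

var exampleLink = wikipediaLinkFromSourceUri("http://wp/en/1194195");
```

Returning null for unexpected input lets the caller skip the attribution link rather than render a broken one.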
Comments (2)
thanks for the guide. i’ll probably be using AMT in the next couple months for relevance assessment & it would be nice not to have to upload all the data with the HIT dataset.
This is a nice concise way to get an article.. you might also try /api/trans/blurb rather than /api/trans/raw, if a short, nicely formatted description is all you want. /api/trans/raw can return some ugly stuff that might clutter up your UI, like html tags and the like.
We’d like to better expose things like wikipedia attribution so you don’t have to hand-parse/assemble it.