Twitter search ain’t that bad

My friend Daniel Tunkelang recently argued that Twitter isn’t a search engine and that its search engine isn’t hard to build. While I certainly agree that Twitter is not a search engine, I disagree with several of his comments about its search engine. Given that Daniel is an advocate of HCIR (human-computer information retrieval), I’m surprised he didn’t take the time to think about what makes searching tweets different.

  1. Daniel argues that in order for there to be “search,” there must be an information need. I frequently use Twitter to gauge the sentiment and volume of discussion around products, and that is an information need: Twitter search can greatly aid my decision about whether to use a product (such as an open source tool).
  2. Arguably, recency is a very large component of a tweet’s relevance. Twitter is a way of discussing and interacting with others about what is happening right now, so a reverse chronological ordering of tweets makes sense.
  3. With only 140 characters per tweet, traditional text ranking algorithms may not perform well: a tweet offers very little context for judging how well it matches a short query. Even well-tuned, state-of-the-art retrieval functions such as Okapi BM25 may produce poor orderings of tweets.
  4. At 140 characters, the entire tweet fits comfortably in the result list. There is no need for snippet generation, and query term highlighting makes the results quick to scan. Aspects of relevance other than timeliness may also be easier to judge than in other search tasks.
  5. Given the emphasis on timeliness, the search engine must index tweets in real time and make new tweets searchable very quickly. While this is certainly possible with in-memory inverted indexes, and many people have built such systems, textbooks rarely address the problems of indexing and searching documents in real time. The real-time demands on Twitter search are greater than those on other web search engines.
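Points 2, 4, and 5 can be sketched together as a toy in-memory inverted index that makes each tweet searchable as soon as it is added, returns matches newest first, and highlights query terms in the full tweet text. This is a minimal sketch under stated assumptions, not how Twitter search actually works: the names `RealTimeIndex`, `Tweet`, and `highlight` are hypothetical, tokenization is a naive lowercase word split, and queries use AND semantics.

```python
import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a deliberately naive tokenizer (an assumption)."""
    return TOKEN_RE.findall(text.lower())


@dataclass
class Tweet:
    tweet_id: int
    timestamp: int  # seconds since the epoch
    text: str


class RealTimeIndex:
    """Hypothetical in-memory inverted index: a tweet is searchable as soon as add() returns."""

    def __init__(self) -> None:
        self.tweets: list[Tweet] = []              # doc id = position in this list
        self.postings: dict[str, list[int]] = {}   # term -> doc ids containing it

    def add(self, tweet: Tweet) -> None:
        doc_id = len(self.tweets)
        self.tweets.append(tweet)
        for term in set(tokenize(tweet.text)):
            self.postings.setdefault(term, []).append(doc_id)

    def search(self, query: str, limit: int = 20) -> list[Tweet]:
        """Tweets containing every query term (AND semantics), newest first."""
        terms = set(tokenize(query))
        if not terms:
            return []
        matches = set.intersection(*(set(self.postings.get(t, ())) for t in terms))
        # Rank purely by recency; the doc id breaks ties between equal timestamps.
        ranked = sorted(matches, key=lambda i: (self.tweets[i].timestamp, i), reverse=True)
        return [self.tweets[i] for i in ranked[:limit]]


def highlight(text: str, query: str) -> str:
    """Wrap query terms in *asterisks*; the whole tweet is shown, so no snippet is needed."""
    terms = set(tokenize(query))
    return TOKEN_RE.sub(
        lambda m: f"*{m.group(0)}*" if m.group(0).lower() in terms else m.group(0), text
    )


if __name__ == "__main__":
    index = RealTimeIndex()
    index.add(Tweet(1, 1000, "Trying out Lucene for search"))
    index.add(Tweet(2, 1005, "Lucene search is fast"))
    index.add(Tweet(3, 1010, "Lunch time"))
    for hit in index.search("lucene search"):
        print(highlight(hit.text, "lucene search"))
    # prints "*Lucene* *search* is fast", then "Trying out *Lucene* for *search*"
```

Ordering by timestamp alone sidesteps the short-document scoring problem from point 3. A production system would presumably also have to deal with concurrent writers, deletions, and an index too large for one machine’s memory, none of which this sketch attempts.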

Now that I’m done defending Twitter search, I agree there are things it could do better. In the comments on his post, Daniel alludes to one dimension of relevance that search results currently don’t reflect: the influence of the user who posted the tweet. There are search tasks where who is participating in the discussion is just as important as when a statement was made.

As I mentioned, I use Twitter to measure activity and sentiment, yet Twitter search does little to summarize or aggregate results. External tools such as twist.flaptor.com have done a better job of using tweets to measure and display the volume of discussion. Measuring sentiment from a single 140-character tweet is difficult, but it may be possible to gauge the general reaction in aggregate.
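One simple way to aggregate results instead of just listing them is to bucket matching tweets by hour and count them, which gives a rough volume-of-discussion curve. This is a minimal sketch, not a description of how twist.flaptor.com works: `volume_by_hour` is a hypothetical helper, tweets are assumed to arrive as (epoch seconds, text) pairs, and it does not attempt sentiment.

```python
import re
from collections import Counter
from datetime import datetime, timezone

TOKEN_RE = re.compile(r"\w+")


def volume_by_hour(tweets, term: str) -> dict[datetime, int]:
    """Count tweets mentioning `term` in each UTC hour, oldest hour first.

    `tweets` is assumed to be an iterable of (epoch_seconds, text) pairs.
    Hours with no matching tweets are simply absent from the result.
    """
    term = term.lower()
    counts: Counter = Counter()
    for ts, text in tweets:
        if term in TOKEN_RE.findall(text.lower()):
            counts[ts - ts % 3600] += 1  # truncate the timestamp to the start of its hour
    return {
        datetime.fromtimestamp(hour, tz=timezone.utc): n
        for hour, n in sorted(counts.items())
    }


if __name__ == "__main__":
    sample = [(0, "Hadoop is great"), (1800, "more hadoop"), (3600, "lunch"), (3700, "HADOOP rocks")]
    for hour, n in volume_by_hour(sample, "hadoop").items():
        print(hour.isoformat(), n)
```

Plotting these counts over time shows spikes in discussion; estimating the general reaction would need a separate sentiment classifier or lexicon applied across the whole bucket rather than to single tweets.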

Comments (4)

  1. I agree that Twitter is useful and I apologize if I didn’t make that clear. I thought that was evident from how much effort I’ve invested in using it and even in trying to improve it. I further concede that Twitter can help meet some information needs, and I just added a note to that effect in my post.

    But…does that make Twitter a search engine? Would you agree that it’s not a search engine but rather a corpus to which one might apply a search engine, ideally a more sophisticated one than search.twitter.com? Yes, recency matters (it matters for web search too), but what matters even more for Twitter is the ability to summarize the extreme redundancy of the echo chamber.

    In any case, what galls me is the hype comparing Twitter to Google as if the two were comparable and the former a threat to the latter. I think I’m on safe ground asserting that I’m not a Google fan boy. Still, I know Google, and Twitter is no Google.

    Monday, March 9, 2009 at 7:27 am
  2. pogil wrote:

    Yes, I agree that Twitter isn’t a search engine. I also agree with you that Twitter’s search engine is not directly comparable to Google.

    However, the guys at Twitter search do know search engines. I think the choices they’ve made so far are reasonable, as I argued in the post. I think we’ll see more interesting things from them in the future.

    Monday, March 9, 2009 at 7:47 am
  3. Indeed, I know they have Abdur Chowdhury on board (they acquired him via Summize), so they surely have access to search expertise. But no one gets graded on potential, except perhaps in the blogosphere.

    Monday, March 9, 2009 at 8:18 am
  4. I agree that search.twitter.com is reasonable.

    Ranking by relevance doesn’t make sense because you often want messages ordered by time. But relevance ranking is the massively prevalent IR paradigm.

    In a user interface, if you have a big list of things with attributes, the two things you can do with an attribute are ranking and filtering. If time takes the ranking slot, that leaves relevance for filtering, i.e., as binary relevance information. Faceted search makes a lot of sense.

    OK, so all I did there was re-explain the reasoning behind tweetmotif.com :)

    Wednesday, November 18, 2009 at 2:39 pm