<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Blogs, queries, corpora</title>
	<atom:link href="http://probablyirrelevant.org/2008/09/blogs-queries-corpora/feed/" rel="self" type="application/rss+xml" />
	<link>http://probablyirrelevant.org/2008/09/blogs-queries-corpora/</link>
	<description>Information Retrieval Research and Development</description>
	<lastBuildDate>Thu, 30 Jun 2011 14:19:08 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: diazf</title>
		<link>http://probablyirrelevant.org/2008/09/blogs-queries-corpora/comment-page-1/#comment-17</link>
		<dc:creator>diazf</dc:creator>
		<pubDate>Fri, 12 Sep 2008 16:10:35 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=9#comment-17</guid>
		<description>Hey Craig.  Thank you for clarifying the process of developing the Blog tracks.  We&#039;ll have to discuss getting you those query logs over a pint sometime.</description>
		<content:encoded><![CDATA[<p>Hey Craig.  Thank you for clarifying the process of developing the Blog tracks.  We&#8217;ll have to discuss getting you those query logs over a pint sometime.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Craig Macdonald</title>
		<link>http://probablyirrelevant.org/2008/09/blogs-queries-corpora/comment-page-1/#comment-16</link>
		<dc:creator>Craig Macdonald</dc:creator>
		<pubDate>Fri, 12 Sep 2008 15:17:48 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=9#comment-16</guid>
		<description>While I may be partial to an occasional pint of beer, I would have to say that a great deal of thought goes into the defining of a TREC track and the corresponding tasks! Tasks must be proposed and motivated prior to TREC, in proposals which are rated by the TREC program committee. In essence, we didn&#039;t come up with them on the back of a beer mat.

You suggest a query log analysis to motivate the tasks. Why indeed, what a great idea! Indeed, it&#039;s funny that the opening of each TREC Blog track overview describes the tasks, and motivates them using a study of blog search query logs.  Actually, if you care to read the overview papers or the ICWSM paper, you&#039;ll find that all opinion finding queries for TREC 06 and 07 were sampled from a real query log.

Nevertheless Fernando, if your company is able to provide more up-to-date query logs to allow the tasks to be refined, your help would be much appreciated.

Further reading:
1. &lt;a href=&quot;http://trec.nist.gov/pubs/trec16/papers/BLOG.OVERVIEW08.pdf&quot; rel=&quot;nofollow&quot;&gt; Overview of TREC Blog track 2007&lt;/a&gt;. C. Macdonald, I. Ounis &amp; I. Soboroff. TREC 2007 Proceedings.
2. &lt;a href=&quot;http://trec.nist.gov/pubs/trec15/papers/BLOG06.OVERVIEW.pdf&quot; rel=&quot;nofollow&quot;&gt; Overview of TREC Blog track 2006&lt;/a&gt;. I. Ounis et al. TREC 2006 Proceedings.
3. &lt;a href=&quot;http://www.dcs.gla.ac.uk/~craigm/publications/ounis08trecblog.pdf&quot; rel=&quot;nofollow&quot;&gt;On TREC Blog Track&lt;/a&gt;. I. Ounis et al. Proceedings of ICWSM 2008. &lt;a href=&quot;http://videolectures.net/icwsm08_soboroff_trec/&quot; rel=&quot;nofollow&quot;&gt;[Video]&lt;/a&gt;.
4. &lt;a href=&quot;http://dx.doi.org/10.1007/11735106_26&quot; rel=&quot;nofollow&quot;&gt;A Study of Blog Search&lt;/a&gt;. G. Mishne &amp; M. de Rijke. Proceedings of ECIR 2006.</description>
		<content:encoded><![CDATA[<p>While I may be partial to an occasional pint of beer, I would have to say that a great deal of thought goes into the defining of a TREC track and the corresponding tasks! Tasks must be proposed and motivated prior to TREC, in proposals which are rated by the TREC program committee. In essence, we didn&#8217;t come up with them on the back of a beer mat.</p>
<p>You suggest a query log analysis to motivate the tasks. Why indeed, what a great idea! Indeed, it&#8217;s funny that the opening of each TREC Blog track overview describes the tasks, and motivates them using a study of blog search query logs.  Actually, if you care to read the overview papers or the ICWSM paper, you&#8217;ll find that all opinion finding queries for TREC 06 and 07 were sampled from a real query log.</p>
<p>Nevertheless Fernando, if your company is able to provide more up-to-date query logs to allow the tasks to be refined, your help would be much appreciated.</p>
<p>Further reading:<br />
1. <a href="http://trec.nist.gov/pubs/trec16/papers/BLOG.OVERVIEW08.pdf" rel="nofollow"> Overview of TREC Blog track 2007</a>. C. Macdonald, I. Ounis &amp; I. Soboroff. TREC 2007 Proceedings.<br />
2. <a href="http://trec.nist.gov/pubs/trec15/papers/BLOG06.OVERVIEW.pdf" rel="nofollow"> Overview of TREC Blog track 2006</a>. I. Ounis et al. TREC 2006 Proceedings.<br />
3. <a href="http://www.dcs.gla.ac.uk/~craigm/publications/ounis08trecblog.pdf" rel="nofollow">On TREC Blog Track</a>. I. Ounis et al. Proceedings of ICWSM 2008. <a href="http://videolectures.net/icwsm08_soboroff_trec/" rel="nofollow">[Video]</a>.<br />
4. <a href="http://dx.doi.org/10.1007/11735106_26" rel="nofollow">A Study of Blog Search</a>. G. Mishne &amp; M. de Rijke. Proceedings of ECIR 2006.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Elsas</title>
		<link>http://probablyirrelevant.org/2008/09/blogs-queries-corpora/comment-page-1/#comment-15</link>
		<dc:creator>Jon Elsas</dc:creator>
		<pubDate>Fri, 12 Sep 2008 05:39:52 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=9#comment-15</guid>
		<description>Fernando -- Although blog search is an area that I have had a lot of fun working in, I agree that the tasks need to be better defined and more rooted in reality.  This really became evident in last year&#039;s TREC track.  Like other nascent TREC tracks, the participants developed queries, and then judged them after runs were submitted.  Some of the submitted queries really seemed like oddball blog search queries (&quot;christmas&quot;) and the corresponding relevance judgements were unrealistic for a real web search task (&gt; 100 relevant blogs).  A tighter task definition and queries from real query logs would&#039;ve certainly helped the situation.

But, I don&#039;t think the TREC blog tasks are as separated from reality as you seem to imply.  No doubt the research community can benefit from a better understanding of queries and information needs -- and that goes for pretty much every IR task considered at TREC.  I think, at least to some degree, the blog track tasks were inspired by some commercial examples of blog information access.  The distillation task is really doing what Google is doing at the &lt;a href=&quot;http://blogsearch.google.com/blogsearch?hl=en&amp;q=information+retrieval&amp;btnG=Search+Blogs&quot; rel=&quot;nofollow&quot;&gt;top of their blog search list&lt;/a&gt;.  Bloglines is another good example.  The opinion task is similar to the sentiment mining services &lt;a href=&quot;http://www.blogpulse.com/&quot; rel=&quot;nofollow&quot;&gt;BlogPulse&lt;/a&gt; provided for corporate customers (based on my understanding from previous discussions with Matt Hurst &amp; Natalie Glance).</description>
		<content:encoded><![CDATA[<p>Fernando &#8212; Although blog search is an area that I have had a lot of fun working in, I agree that the tasks need to be better defined and more rooted in reality.  This really became evident in last year&#8217;s TREC track.  Like other nascent TREC tracks, the participants developed queries, and then judged them after runs were submitted.  Some of the submitted queries really seemed like oddball blog search queries (&#8220;christmas&#8221;) and the corresponding relevance judgements were unrealistic for a real web search task (> 100 relevant blogs).  A tighter task definition and queries from real query logs would&#8217;ve certainly helped the situation.</p>
<p>But, I don&#8217;t think the TREC blog tasks are as separated from reality as you seem to imply.  No doubt the research community can benefit from a better understanding of queries and information needs &#8212; and that goes for pretty much every IR task considered at TREC.  I think, at least to some degree, the blog track tasks were inspired by some commercial examples of blog information access.  The distillation task is really doing what Google is doing at the <a href="http://blogsearch.google.com/blogsearch?hl=en&#038;q=information+retrieval&#038;btnG=Search+Blogs" rel="nofollow">top of their blog search list</a>.  Bloglines is another good example.  The opinion task is similar to the sentiment mining services <a href="http://www.blogpulse.com/" rel="nofollow">BlogPulse</a> provided for corporate customers (based on my understanding from previous discussions with Matt Hurst &#038; Natalie Glance).</p>
]]></content:encoded>
	</item>
</channel>
</rss>

