<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Probably Irrelevant</title>
	<atom:link href="http://probablyirrelevant.org/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://probablyirrelevant.org</link>
	<description>Information Retrieval Research and Development</description>
	<lastBuildDate>Tue, 22 Jun 2010 14:21:52 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Query logs and information retrieval research by Lee S Jensen</title>
		<link>http://probablyirrelevant.org/2010/06/query-logs-and-information-retrieval-research/comment-page-1/#comment-9424</link>
		<dc:creator>Lee S Jensen</dc:creator>
		<pubDate>Tue, 22 Jun 2010 14:21:52 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=76#comment-9424</guid>
		<description>AOL got in big trouble when it released its search log. This was because the release violated users&#039; privacy. However, with a vertical search system the logs don&#039;t generally raise that concern. Being able to track the search path of a person when they are only looking at genealogy (www.ancestry.com) or cameras (www.dpreview.com) doesn&#039;t really tell you anything valuable about that particular person. Getting search logs from these types of companies should be our focus.</description>
		<content:encoded><![CDATA[<p>AOL got in big trouble when it released its search log. This was because the release violated users&#8217; privacy. However, with a vertical search system the logs don&#8217;t generally raise that concern. Being able to track the search path of a person when they are only looking at genealogy (www.ancestry.com) or cameras (www.dpreview.com) doesn&#8217;t really tell you anything valuable about that particular person. Getting search logs from these types of companies should be our focus.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Query logs and information retrieval research by neelannair</title>
		<link>http://probablyirrelevant.org/2010/06/query-logs-and-information-retrieval-research/comment-page-1/#comment-9217</link>
		<dc:creator>neelannair</dc:creator>
		<pubDate>Wed, 09 Jun 2010 14:57:38 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=76#comment-9217</guid>
		<description>Valid points.

But I guess query logs are a huge investment for most search engines. They form part of their competitive advantage over other search engines.

I&#039;d like to propose a time-out clause for search engine query logs. They could release query logs more than &#039;n&#039; years old under a license and for a fee, so that universities can give researchers open access to work with them.

As for the Lemur Project, I came to know about it from the news that it had been discontinued. There is such a thing as publicity, I guess. I wonder if they tied up with existing search engines or something to promote the cause.</description>
		<content:encoded><![CDATA[<p>Valid points.</p>
<p>But I guess query logs are a huge investment for most search engines. They form part of their competitive advantage over other search engines.</p>
<p>I&#8217;d like to propose a time-out clause for search engine query logs. They could release query logs more than &#8216;n&#8217; years old under a license and for a fee, so that universities can give researchers open access to work with them.</p>
<p>As for the Lemur Project, I came to know about it from the news that it had been discontinued. There is such a thing as publicity, I guess. I wonder if they tied up with existing search engines or something to promote the cause.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Vintage Cornell/SMART Tech Reports? by chris</title>
		<link>http://probablyirrelevant.org/2010/05/vintage-cornellsmart-tech-reports/comment-page-1/#comment-8702</link>
		<dc:creator>chris</dc:creator>
		<pubDate>Wed, 05 May 2010 21:58:00 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=72#comment-8702</guid>
		<description>This?

http://www.sigir.org/museum/</description>
		<content:encoded><![CDATA[<p>This?</p>
<p><a href="http://www.sigir.org/museum/" rel="nofollow">http://www.sigir.org/museum/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on micro-IR by Gene Golovchinsky</title>
		<link>http://probablyirrelevant.org/2009/09/micro-ir/comment-page-1/#comment-3320</link>
		<dc:creator>Gene Golovchinsky</dc:creator>
		<pubDate>Mon, 14 Sep 2009 18:07:45 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=66#comment-3320</guid>
		<description>I think one of the key issues here is whether the retrieved information is actionable, that is, whether enough context is represented in the system to suggest meaningful actions based on retrieved results. &lt;a href=&quot;http://palblog.fxpal.com/?p=1818&quot; rel=&quot;nofollow&quot;&gt;Here&lt;/a&gt;&#039;s my take.</description>
		<content:encoded><![CDATA[<p>I think one of the key issues here is whether the retrieved information is actionable, that is, whether enough context is represented in the system to suggest meaningful actions based on retrieved results. <a href="http://palblog.fxpal.com/?p=1818" rel="nofollow">Here</a>&#8217;s my take.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on micro-IR by Daniel Tunkelang</title>
		<link>http://probablyirrelevant.org/2009/09/micro-ir/comment-page-1/#comment-3281</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Sat, 12 Sep 2009 18:39:38 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=66#comment-3281</guid>
		<description>Great post! I just posted my own reaction here:

http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/</description>
		<content:encoded><![CDATA[<p>Great post! I just posted my own reaction here:</p>
<p><a href="http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/" rel="nofollow">http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on micro-IR by Jinyoung</title>
		<link>http://probablyirrelevant.org/2009/09/micro-ir/comment-page-1/#comment-3268</link>
		<dc:creator>Jinyoung</dc:creator>
		<pubDate>Sat, 12 Sep 2009 03:52:17 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=66#comment-3268</guid>
		<description>Hi, Miles. (We met at ECIR.)

I agree with Fernando in that we&#039;ve been dealing with very constrained features (i.e. textual similarity) in traditional IR.

What characterizes Micro IR seems to be that the context (searcher goal) is known, with a domain-specific notion of relevance (goodness) and similarity measures. I guess current IR frameworks (including learning to rank) can accommodate most of these problems. After all, it&#039;s still about combining evidence, albeit of a different type.

This also reminds me of vertical search, where we need to infer the user&#039;s information goal given only query words. Here, each information type may have a somewhat different notion of relevance as well, although not to the extent you talk about in Micro IR.</description>
		<content:encoded><![CDATA[<p>Hi, Miles. (We met at ECIR.)</p>
<p>I agree with Fernando in that we&#8217;ve been dealing with very constrained features (i.e. textual similarity) in traditional IR.</p>
<p>What characterizes Micro IR seems to be that the context (searcher goal) is known, with a domain-specific notion of relevance (goodness) and similarity measures. I guess current IR frameworks (including learning to rank) can accommodate most of these problems. After all, it&#8217;s still about combining evidence, albeit of a different type.</p>
<p>This also reminds me of vertical search, where we need to infer the user&#8217;s information goal given only query words. Here, each information type may have a somewhat different notion of relevance as well, although not to the extent you talk about in Micro IR.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on micro-IR by Fernando</title>
		<link>http://probablyirrelevant.org/2009/09/micro-ir/comment-page-1/#comment-3265</link>
		<dc:creator>Fernando</dc:creator>
		<pubDate>Sat, 12 Sep 2009 00:57:22 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=66#comment-3265</guid>
		<description>Any particular information retrieval problem is a function of queries and documents.  I try to approach this in as general a way as possible.  For me, the only difference between micro and macro IR is text.  In fact, I would say, in many ways, pure text IR is _more_ constrained than other IR tasks you mention.</description>
		<content:encoded><![CDATA[<p>Any particular information retrieval problem is a function of queries and documents.  I try to approach this in as general a way as possible.  For me, the only difference between micro and macro IR is text.  In fact, I would say, in many ways, pure text IR is _more_ constrained than other IR tasks you mention.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finding relevance judgements in the wild by mariana_soffer</title>
		<link>http://probablyirrelevant.org/2009/04/finding-relevance-judgements-in-the-wild/comment-page-1/#comment-1846</link>
		<dc:creator>mariana_soffer</dc:creator>
		<pubDate>Mon, 25 May 2009 14:11:34 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=61#comment-1846</guid>
		<description>And you complain? Come on. Imagine if you had to do this stuff in Spanish (not to mention other languages that are even weirder): how do you parse? Where do you get your training sets from? Not even web scraping works here.</description>
		<content:encoded><![CDATA[<p>And you complain? Come on. Imagine if you had to do this stuff in Spanish (not to mention other languages that are even weirder): how do you parse? Where do you get your training sets from? Not even web scraping works here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finding relevance judgements in the wild by Jon</title>
		<link>http://probablyirrelevant.org/2009/04/finding-relevance-judgements-in-the-wild/comment-page-1/#comment-1684</link>
		<dc:creator>Jon</dc:creator>
		<pubDate>Mon, 20 Apr 2009 14:36:56 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=61#comment-1684</guid>
		<description>I think your intuition is correct about the FIRST method.  First-posts also tend to be more verbose than answers -- users often respond with a single sentence, which tends to be generally unhelpful in ranking.  

Due to time constraints &amp; such, we weren&#039;t able to annotate many more threads for relevance.  We identified 17k &quot;candidate&quot; question-post/answer-link pairs in the collection, using a few simple heuristics such as the presence of a link in a response message.  Of those 17k, we annotated 550 as to whether or not they actually contained a question/answer pair and identified the 48 we used in the study.  So, we found that roughly 8% of those candidates contain a real question/answer pair, and extrapolating up to the full 17k, I would estimate there are about 1400 question-answer pairs in the collection -- that&#039;s quite a few still to be found.  I&#039;m sure we didn&#039;t find them all.

Ideally, of course, we&#039;d like to use more queries.  I don&#039;t know at this point whether we&#039;ll push forward with this type of test set creation, or whether we&#039;ll do a more traditional relevance assessment.  I&#039;m tempted to do the latter, particularly because it would be nice to see if we observe the same results with the different types of test collections.</description>
		<content:encoded><![CDATA[<p>I think your intuition is correct about the FIRST method.  First-posts also tend to be more verbose than answers &#8212; users often respond with a single sentence, which tends to be generally unhelpful in ranking.  </p>
<p>Due to time constraints &amp; such, we weren&#8217;t able to annotate many more threads for relevance.  We identified 17k &#8220;candidate&#8221; question-post/answer-link pairs in the collection, using a few simple heuristics such as the presence of a link in a response message.  Of those 17k, we annotated 550 as to whether or not they actually contained a question/answer pair and identified the 48 we used in the study.  So, we found that roughly 8% of those candidates contain a real question/answer pair, and extrapolating up to the full 17k, I would estimate there are about 1400 question-answer pairs in the collection &#8212; that&#8217;s quite a few still to be found.  I&#8217;m sure we didn&#8217;t find them all.</p>
<p>Ideally, of course, we&#8217;d like to use more queries.  I don&#8217;t know at this point whether we&#8217;ll push forward with this type of test set creation, or whether we&#8217;ll do a more traditional relevance assessment.  I&#8217;m tempted to do the latter, particularly because it would be nice to see if we observe the same results with the different types of test collections.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finding relevance judgements in the wild by William Webber</title>
		<link>http://probablyirrelevant.org/2009/04/finding-relevance-judgements-in-the-wild/comment-page-1/#comment-1683</link>
		<dc:creator>William Webber</dc:creator>
		<pubDate>Mon, 20 Apr 2009 10:10:01 +0000</pubDate>
		<guid isPermaLink="false">http://probablyirrelevant.org/?p=61#comment-1683</guid>
		<description>OK, I&#039;ve properly read (rather than skimmed) your poster now.  Very interesting work!

With regard to the relatively good performance of the FIRST method, is this because the opening query in the linked-from thread is often similar to the opening query in the linked-to thread?

It is rather disappointing that you were only able to identify 48 query/answer pairs out of 375,000 threads.  Is this because that was all there was, or did you stop looking once you&#039;d found 48?</description>
		<content:encoded><![CDATA[<p>OK, I&#8217;ve properly read (rather than skimmed) your poster now.  Very interesting work!</p>
<p>With regard to the relatively good performance of the FIRST method, is this because the opening query in the linked-from thread is often similar to the opening query in the linked-to thread?</p>
<p>It is rather disappointing that you were only able to identify 48 query/answer pairs out of 375,000 threads.  Is this because that was all there was, or did you stop looking once you&#8217;d found 48?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
