<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Probably Irrelevant</title>
	<atom:link href="http://probablyirrelevant.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://probablyirrelevant.org</link>
	<description>Information Retrieval Research and Development</description>
	<lastBuildDate>Thu, 15 Jul 2010 20:43:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>&#8220;Economic Impact Assessment of NIST’s Text REtrieval Conference (TREC) Program&#8221;</title>
		<link>http://probablyirrelevant.org/2010/07/economic-impact-assessment-of-nist%e2%80%99s-text-retrieval-conference-trec-program/</link>
		<comments>http://probablyirrelevant.org/2010/07/economic-impact-assessment-of-nist%e2%80%99s-text-retrieval-conference-trec-program/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 20:43:14 +0000</pubDate>
		<dc:creator>Fernando</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Evaluation]]></category>
		<category><![CDATA[Web Search]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=84</guid>
		<description><![CDATA[Thanks to your feedback,
&#8220;&#8230;this study estimates that TREC’s existence was responsible for approximately one-third of an improvement of more than 200% in web search products that was observed between 1999 and 2009.&#8221;
More here.
]]></description>
			<content:encoded><![CDATA[<p>Thanks to <a href="http://probablyirrelevant.org/2010/02/trec-survey/">your feedback</a>,</p>
<blockquote><p>&#8220;&#8230;this study estimates that TREC’s existence was responsible for approximately one-third of an improvement of more than 200% in web search products that was observed between 1999 and 2009.&#8221;</p></blockquote>
<p>More <a href="http://trec.nist.gov/pubs/2010.economic.impact.pdf">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2010/07/economic-impact-assessment-of-nist%e2%80%99s-text-retrieval-conference-trec-program/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010 Best Paper Nominees</title>
		<link>http://probablyirrelevant.org/2010/07/sigir-2010-best-paper-nominees/</link>
		<comments>http://probablyirrelevant.org/2010/07/sigir-2010-best-paper-nominees/#comments</comments>
		<pubDate>Sat, 03 Jul 2010 22:18:57 +0000</pubDate>
		<dc:creator>Fernando</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=82</guid>
		<description><![CDATA[SIGIR has posted best paper nominees.

A comparison of general vs personalized affective models for the prediction of topical relevance, I. Arapakis, K. Athanasakos, J. Jose
Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs, R. White, J. Huang
Caching Search Engine Results over Incremental Indices, F. Junqueira, R. Blanco, E. Bortnikov, R. Lempel, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sigir2010.org/doku.php?id=program:awards">SIGIR has posted best paper nominees.</a></p>
<ul>
<li>A comparison of general vs personalized affective models for the prediction of topical relevance, I. Arapakis, K. Athanasakos, J. Jose</li>
<li>Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs, R. White, J. Huang</li>
<li>Caching Search Engine Results over Incremental Indices, F. Junqueira, R. Blanco, E. Bortnikov, R. Lempel, L. Telloli, H. Zaragoza</li>
<li>Comparing the Sensitivity of Information Retrieval Metrics, F. Radlinski, N. Craswell</li>
<li>Extending Average Precision to Graded Relevance Judgments, S. Robertson, E. Kanoulas, E. Yilmaz</li>
<li>Information Based Model for ad hoc information retrieval, S. Clinchant, E. Gaussier</li>
<li>Multi-style language model for web scale information retrieval, K. Wang, J. Gao, X. Li</li>
<li>Properties of Optimally Weighted Data Fusion in CBMIR, P. Wilkins, A. Smeaton</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2010/07/sigir-2010-best-paper-nominees/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Query logs and information retrieval research</title>
		<link>http://probablyirrelevant.org/2010/06/query-logs-and-information-retrieval-research/</link>
		<comments>http://probablyirrelevant.org/2010/06/query-logs-and-information-retrieval-research/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 01:59:05 +0000</pubDate>
		<dc:creator>Fernando</dc:creator>
				<category><![CDATA[Evaluation]]></category>
		<category><![CDATA[Web Search]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=76</guid>
		<description><![CDATA[About one year ago,  Bruce Croft asked the IR community for help with getting access to query logs for academia,
The goal of this project is to create a database of web search activity that will be provided to the information retrieval research community to use on current and future information retrieval research projects.
To accomplish [...]]]></description>
			<content:encoded><![CDATA[<p>About one year ago,  Bruce Croft asked the IR community for help with getting access to query logs for academia,</p>
<blockquote><p>The goal of this project is to create a database of web search activity that will be provided to the information retrieval research community to use on current and future information retrieval research projects.</p></blockquote>
<p>To accomplish this, the Lemur Project developed a toolbar to be voluntarily installed by users.  After a year of data collection, <a href="http://lemurstudy.cs.umass.edu/">the project has been aborted</a>,</p>
<blockquote><p>Given that we have gathered the equivalent of less than 6 seconds of Google traffic (assuming 500 million queries per day) in one year, we have decided to terminate the project.</p></blockquote>
<p>This is pretty depressing news.  Admittedly, part of this depression originates from my guilt over not having contributed to the project myself.  However, a more substantial part stems from the potential this data set had to be groundbreaking, perhaps similar to the release of the first Tipster collections.  Although this was way before my time, I imagine the sudden release of a large, public corpus resulted in a tremendous amount of activity and excitement.</p>
<p>Information retrieval research has had large collections of documents for a few decades now.  We evaluate on a few hundred queries and publish results.  With some exceptions, the majority of interest in the field has focused on scaling up corpora.  As a result, we have a rich set of tools to analyze and retrieve documents from large corpora.</p>
<p>There are two things missing from this model: a rich stream of queries coming into the system and a rich stream of interactions between users and documents.  Our friends in the CHI and information science communities have been doing a great job with understanding the important factors involved in user behavior at laboratory scale.  However, I&#8217;m going to draw an analogy here between small scale user studies for IR and document-level NLP analysis for IR that may raise a few eyebrows.  I believe that many IR researchers would argue that, given the choice between corpus-driven approaches and NLP approaches to IR, they would opt for more data.  This is despite the rich analysis NLP can provide.  Similarly, I believe that the fine-grained analysis provided by laboratory studies may be less important than very large scale analysis of user behavior.  Of course, both the results about NLP for IR and the claim about laboratory experiments are based on relatively limited experiments (e.g. small sets of queries).  We should, as a community, continue research in all of these directions.</p>
<p>Having said this, let&#8217;s consider some motivations for web query logs and IR research,</p>
<p><strong>Claim 1. Web query logs will help with the contribution to web search research.</strong></p>
<p>There is no doubt that query logs are important for any search engine, web or otherwise.  However, query logs are only one of the many sources of interaction data available in production.  There are many, many other signals which can be effectively exploited for query understanding and document ranking.   In my opinion, outside of starting its own web search engine, academia will always be scurrying to catch up with industry&#8217;s data sources.</p>
<p>I convinced myself a few years ago that the resources required to build and maintain a web search engine may never exist in academia.  This is not to say that academic IR researchers should give up on having impact on web search engines.  IR research several decades old continues to impact modern search engine design.  What needs to be determined is how the current academic IR researchers can more directly address the problems confronted by web search companies.  I personally believe that a tight coupling between academic and industrial research labs needs to exist.  This could be accomplished in a number of ways.</p>
<ol>
<li> add value to an existing search engine&#8217;s interface.  If search engines provide ranker APIs, academics can develop new interfaces which may attract users and, as a result, interaction data.</li>
<li> teach the IR fundamentals during the academic year and interact intensively during the summer through internships or other collaborations.  I am most familiar with Yahoo&#8217;s <a href="http://labs.yahoo.com/ksc">Key Scientific Challenges Fellowships</a> and <a href="http://labs.yahoo.com/Academic_Relations/Faculty">Faculty Engagement Grants</a>.  Similar programs exist at other web search engines.</li>
<li> develop high-quality, public web search engine simulators which provide students/researchers with the ability to test algorithms <em>in silico</em>.  Our <a href="http://ciir.cs.umass.edu/~fdiaz/sigir09-DA.pdf">SIGIR 2009 paper</a> made extensive use of simulation whose parameters were grounded in real world data.  Systems research in computer architecture and computer networking has adopted this approach for a while.  SIGIR 2010 will be hosting a workshop on <a href="http://www.dcs.gla.ac.uk/access/simint/">simulated interaction</a>.</li>
</ol>
<p>No doubt there are many, many other alternatives.</p>
<p><strong>Claim 2. Web query logs will help with the contribution to production search research.</strong></p>
<p>As stated earlier, IR research has looked at the document side for many, many problems.  This research has benefited web search as well as search in other domains such as legal, news, and enterprise search.</p>
<p>User behavior data improved production web search engines; user behavior data will no doubt improve production non-web search engines.   Just as with web search though, this data does not exist in academia.</p>
<p>I believe, though, that the barrier to entry for non-web/vertical search engines is somewhat lower.  The collections are smaller and manageable.  At the same time, document representations can be richer for verticals, interaction is less constrained, and, as a result, the potential for attracting users may be higher than with portal web search engines.</p>
<p>If an academic institution maintained a domain-specific production search engine, academic research could become more relevant to industrial search engines.  For example, academic institutions would easily be able to publish about query logs, interaction, large scale adaptation, and online learning with large scale real world data.  One important, unresolved question is how to reconcile experimental reproducibility with production data, which is often closed for privacy reasons.</p>
<p>Academic IR research will continue to contribute to general IR research.  Students trained in IR fundamentals will continue to be strong candidates for research and development in production search companies.  I believe that there is room for greater impact.  How that happens remains to be seen.</p>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2010/06/query-logs-and-information-retrieval-research/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Vintage Cornell/SMART Tech Reports?</title>
		<link>http://probablyirrelevant.org/2010/05/vintage-cornellsmart-tech-reports/</link>
		<comments>http://probablyirrelevant.org/2010/05/vintage-cornellsmart-tech-reports/#comments</comments>
		<pubDate>Wed, 05 May 2010 19:33:56 +0000</pubDate>
		<dc:creator>Fernando</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=72</guid>
		<description><![CDATA[A few years ago, someone published an online interface to many old Cornell/SMART tech reports from the Salton group.  Unfortunately, I cannot seem to find them anywhere now.  Who can help correct this irony?
Update: Here is the SIGIR Digital Museum of Information Retrieval Research, including those SMART reports.
]]></description>
			<content:encoded><![CDATA[<p>A few years ago, someone published an online interface to many old Cornell/SMART tech reports from the Salton group.  Unfortunately, I cannot seem to find them anywhere now.  Who can help correct this irony?</p>
<p><strong>Update:</strong> Here is the <a href="http://www.sigir.org/museum/">SIGIR Digital Museum of Information Retrieval Research</a>, including those SMART reports.</p>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2010/05/vintage-cornellsmart-tech-reports/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>TREC Survey</title>
		<link>http://probablyirrelevant.org/2010/02/trec-survey/</link>
		<comments>http://probablyirrelevant.org/2010/02/trec-survey/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 17:52:18 +0000</pubDate>
		<dc:creator>Fernando</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=69</guid>
		<description><![CDATA[There is a survey being conducted about the impact of TREC on information retrieval research.  This feedback is important for organizers and I encourage researchers to participate.  If you are outside of the IR community and have ever used TREC collections, your feedback is also valuable.
]]></description>
			<content:encoded><![CDATA[<p>There is <a href="https://trecsurvey.rti.org/Default.aspx">a survey being conducted about the impact of TREC on information retrieval research</a>.  This feedback is important for organizers and I encourage researchers to participate.  If you are outside of the IR community and have ever used TREC collections, your feedback is also valuable.</p>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2010/02/trec-survey/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>micro-IR</title>
		<link>http://probablyirrelevant.org/2009/09/micro-ir/</link>
		<comments>http://probablyirrelevant.org/2009/09/micro-ir/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 19:42:13 +0000</pubDate>
		<dc:creator>miles</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=66</guid>
		<description><![CDATA[I&#8217;ve been watching with interest as Apple&#8217;s iphone/ipod_touch app store has grown and matured over the last couple of years (yes, I know, me and almost everyone else).  Interacting with apps on my own, and more recently, building a few, has started me thinking about what I perceive to be an interesting, and I think, novel [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been watching with interest as Apple&#8217;s iphone/ipod_touch app store has <a href="http://brainstormtech.blogs.fortune.cnn.com/2009/05/07/iphone-app-store-40000-and-counting/">grown and matured</a> over the last couple of years (yes, I know, me and almost everyone else).  Interacting with apps on my own, and more recently, building a few, has started me thinking about what I perceive to be an interesting, and I think, novel mode of information interaction.</p>
<p>For lack of a better term, I think of this phenomenon as &#8220;micro information retrieval&#8221; (micro-IR).</p>
<p>By micro-IR I mean the practice of farming information needs out across multiple applications.  Each of these micro-IR applications is built around a tightly constrained problem space, and I think it&#8217;s this constraint that makes micro-IR interesting.</p>
<p>A couple of examples (apologies for any appearance of commercial endorsement; none intended):</p>
<ul>
<li>the <a href="http://yelp.com">yelp</a> app: find, say, restaurants near me</li>
<li><a href="http://www.loopt.com/">loopt</a>: find friends near me</li>
<li><a href="http://www.barnesandnoble.com/iphone/">Barnes and Noble app</a>: find info on the book in this photo I took</li>
<li><a href="http://www.shazam.com/music/web/home.html">shazam</a>: find the song that is playing into the iphone mic.</li>
</ul>
<p>These examples swing close to simple database lookups.  But if we take a longer view, a more interesting dynamic comes up.  The apps are simple because each one solves a problem that is tightly constrained, answering a question that would involve complicated interaction in its absence.</p>
<p>By way of another example, I am currently developing an app that answers the question: how many gallons of oil would it take to prepare a given recipe?  The app then ranks candidate recipes in increasing order of petroleum consumption.</p>
<p>And it&#8217;s not the case that these sorts of interactions are limited to mobile devices.  Thanks to <a href="http://palblog.fxpal.com/?tag=evaluation">Gene Golovchinsky</a> for pointing me towards <a href="http://labs.adobe.com/technologies/blueprint/">Blueprint</a>, an Eclipse plugin that allows users to search for code snippets from within their IDE, leveraging Flex syntax to finesse the search.</p>
<p>Trying to lasso these examples together in an effort to triangulate on what micro-IR actually is, I&#8217;ll note a few overarching commonalities that I see here:</p>
<ol>
<li>In ad hoc (text) IR a principal intellectual challenge lies in modeling &#8216;aboutness.&#8217;  In micro-IR settings, the creativity comes into play in posing a useful (and tractable) question to answer.  The engineering comes easily after that.</li>
<li>The constrained nature of micro-IR applications leads to a lightweight articulation of information need.  There is a tight coupling here between task, query, and the unit of retrieval, a dynamic that I think is compelling.  Pushing this a bit farther, we might consider the simple act of choosing to use a particular application from those apps on a user&#8217;s palette as part of the information need expression.</li>
<li>The tight coupling of task to data to &#8216;query&#8217; enables a strong contextual element to inform the interaction.  Context constitutes the foreground of the micro-IR interaction.</li>
</ol>
<p>I don&#8217;t want to overstate the distinction between micro- and macro-IR.  Of course applications fall along a spectrum of their similarity to the modalities I&#8217;ve laid out here.  But I do think that being aware of micro-IR system characteristics is worthwhile.  Aside from an inherent innovation in how people interact with information, micro-IR opens the door to small-scale developers gaining a wide audience (i.e. the barrier to entry is low).  And concomitant with this is the new monetization model at work in the app store.</p>
<p>I hope readers will comment on this: is micro-IR something at all?  Is it actually related to IR?  How might we turn our eye to micro-IR with respect to generating bona fide research?  Surely there are better example systems than those I&#8217;ve listed&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2009/09/micro-ir/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Finding relevance judgements in the wild</title>
		<link>http://probablyirrelevant.org/2009/04/finding-relevance-judgements-in-the-wild/</link>
		<comments>http://probablyirrelevant.org/2009/04/finding-relevance-judgements-in-the-wild/#comments</comments>
		<pubDate>Tue, 14 Apr 2009 14:45:41 +0000</pubDate>
		<dc:creator>Jon Elsas</dc:creator>
				<category><![CDATA[Evaluation]]></category>
		<category><![CDATA[Social Media]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=61</guid>
		<description><![CDATA[We recently heard our poster on online forum search was accepted to SIGIR 09, and I&#8217;ve been wanting to post something about the test setup we used in that study.
There&#8217;s no existing IR test collection for such a task, although some similar datasets do exist.   For various reasons we weren&#8217;t able to create [...]]]></description>
			<content:encoded><![CDATA[<p>We recently heard our <a href="http://www.cs.cmu.edu/~jelsas/papers/SIGIR2009-ForumThreadSearch_poster.pdf">poster on online forum search</a> was accepted to <a href="http://www.sigir2009.org">SIGIR 09</a>, and I&#8217;ve been wanting to post something about the test setup we used in that study.</p>
<p>There&#8217;s no existing IR test collection for such a task, although <a href="http://www.ins.cwi.nl/projects/trec-ent/">some similar datasets do exist</a>.   For various reasons we weren&#8217;t able to create a traditional test collection, with user-issued queries and deep pools of relevance judgements.  But, this particular dataset and possibly other online dialog archives can be mined to produce a ready-made IR test collection.</p>
<p>The users of <a href="http://forums.macrumors.com/">the online forum we&#8217;ve been looking at</a> frequently include links in their forum posts &#8212; often to previous messages and threads in the same forum. These links are sometimes in response to a new user&#8217;s question, and refer the user to a previous instance of the same (or similar) question and an answer contributed by another user.  Here are <a href="http://forums.macrumors.com/showthread.php?p=1359222">a</a> <a href="http://forums.macrumors.com/showthread.php?p=4879012">few</a> <a href="http://forums.macrumors.com/showthread.php?p=1054727">examples</a> to illustrate my point.  This interaction among forum users can be used as a form of query/relevance judgement pair.  See <a href="http://www.cs.cmu.edu/~jelsas/papers/SIGIR2009-ForumThreadSearch_poster.pdf">the paper</a> for a few more details on how we characterize the presence of a question-post/answer-link pair.</p>
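<p>To make the idea concrete, here is a minimal Python sketch of mining question/link pairs from thread data.  It is a simplification under stated assumptions, not the procedure from the poster: the <code>mine_pairs</code> helper, the link pattern, and the toy post texts are all hypothetical, and the first post of a thread is simply assumed to be the question.</p>

```python
import re

# Assumed link pattern, modeled on the example URLs above.
LINK_RE = re.compile(r"forums\.macrumors\.com/showthread\.php\?p=(\d+)")


def mine_pairs(threads):
    """Return (question_text, linked_post_id) pairs found in reply posts."""
    pairs = []
    for thread in threads:
        # Assumption: the first post is the question, the rest are replies.
        question, replies = thread[0], thread[1:]
        for reply in replies:
            for post_id in LINK_RE.findall(reply):
                pairs.append((question, post_id))
    return pairs


# Toy data (invented): each thread is a list of post bodies.
threads = [
    [
        "How do I reset the PRAM on my iBook?",
        "See http://forums.macrumors.com/showthread.php?p=1359222",
    ],
    [
        "Which external drive should I buy?",
        "I like mine a lot.",
    ],
]

print(mine_pairs(threads))
```

<p>Only the first toy thread yields a pair, since the second contains no link back into the forum.</p>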
<p>This type of test collection creation does have some distinct advantages over the typical retrieval test collections used at TREC.  First, the queries represent real information needs of real users of the online forum.  Many TREC queries are pulled from search engine logs, but frequently (as in the <a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG">Blog Track</a>&#8217;s Feed Distillation task) the queries are invented by participants or assessors.  The information needs present in the online forum posts are much more verbose than typical keyword queries on a web search engine, providing a retrieval system with more evidence to use in relevance scoring.  The &#8220;relevance judgement&#8221;, provided by another forum user linking to a previous thread, also presents <em>in-situ relevance information</em> &#8212; sensitive not only to the original question, but also to the overall nature of the forum and the time when the question was asked.</p>
<p>There are several drawbacks inherent in this type of corpus creation, most importantly with regard to the exhaustiveness of the relevance assessment.  Typically in TREC-style collection development, ranked results from several retrieval systems are pooled and those pooled documents are assessed for relevance.  When the systems&#8217; output is sufficiently diverse and relevance assessment is sufficiently deep, this produces a reasonably complete relevance assessment for each query &#8212; if a relevant document is in the collection, it would most likely be retrieved by one of the systems and be judged by being admitted into the pool.  The method of collecting relevance judgements we use in our SIGIR poster, on the other hand, will not produce anything close to an exhaustive set of relevant threads.  In the great majority of cases, only a single thread is linked to in a subsequent reply message.  There is no guarantee that this thread is the best or only relevant thread in the collection.   For this reason, we must take care not to assume non-judged threads are necessarily irrelevant.</p>
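<p>As a rough illustration of the pooling procedure described above, here is a minimal Python sketch.  The <code>build_pool</code> function, the pool depth, and the toy document IDs are hypothetical and are not taken from TREC tooling; the point is only that documents outside the pool never reach an assessor.</p>

```python
def build_pool(ranked_runs, depth):
    """Union of the top-`depth` documents from each system's ranked list."""
    pool = set()
    for run in ranked_runs:
        pool.update(run[:depth])
    return pool


# Toy ranked lists from three hypothetical retrieval systems.
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d5", "d1", "d6"],
    ["d7", "d1", "d2", "d8"],
]

pool = build_pool(runs, depth=2)
# Only pooled documents are shown to assessors; the rest stay unjudged.
unjudged = {doc for run in runs for doc in run} - pool
print(sorted(pool))
print(sorted(unjudged))
```

<p>With a deeper pool or more diverse systems the unjudged set shrinks, which is why pooled collections can approach complete judgements while link-mined collections cannot.</p>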
<p>There are plenty of datasets that seem to be ready-made for classification or regression tasks, without any need for annotation &#8212; for example the classic <a href="http://people.csail.mit.edu/jrennie/20Newsgroups/">20 newsgroups</a> for text classification and <a href="http://answers.yahoo.com/">Yahoo! Answers</a> for a number of <a href="http://www.mathcs.emory.edu/~eugene/papers/sigir2008-cqa-satisfaction.pdf">prediction</a> <a href="http://www.mathcs.emory.edu/~eugene/papers/acl08s_cqa-personalization-prelim.pdf">tasks</a>.  For relevance ranking, however, I haven&#8217;t seen any ready-made datasets with real relevance <em>judgements</em>, as opposed to noisy interaction indicators such as click-through statistics.  Conversation archives like the one we use offer one way to mine behavioral data for relevance judgements, offering ground-truth preferable in many ways to post-hoc relevance assessment.</p>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2009/04/finding-relevance-judgements-in-the-wild/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>SIGIR 2009 ACCEPTED PAPERS THREAD</title>
		<link>http://probablyirrelevant.org/2009/04/sigir-2009-accepted-papers-thread/</link>
		<comments>http://probablyirrelevant.org/2009/04/sigir-2009-accepted-papers-thread/#comments</comments>
		<pubDate>Thu, 09 Apr 2009 14:04:09 +0000</pubDate>
		<dc:creator>Fernando</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=55</guid>
		<description><![CDATA[SIGIR Poster decisions have been mailed.  Full paper decisions should follow soon as well.  Authors are encouraged to post preprints/drafts of accepted publications in the comments section.
PAPERS
The program committee reviewed 494 full paper submissions and accepted 78, about a 16% acceptance rate.
In 2008, 497 submitted, 85 accepted, about a 17% acceptance rate.
POSTERS
The program [...]]]></description>
			<content:encoded><![CDATA[<p>SIGIR Poster decisions have been mailed.  Full paper decisions should follow soon as well.  Authors are encouraged to post preprints/drafts of accepted publications in the comments section.</p>
<p><strong>PAPERS</strong></p>
<p>The program committee reviewed 494 full paper submissions and accepted 78, about a 16% acceptance rate.</p>
<p>In 2008, 497 submitted, 85 accepted, about a 17% acceptance rate.</p>
<p><strong>POSTERS</strong></p>
<p>The program committee reviewed 256 poster submissions and accepted 86, about a 34% acceptance rate.</p>
<p>In 2008, 173 submitted, 91 accepted, about a 53% acceptance rate.</p>
<p><strong>UPDATE</strong></p>
<p>Accepted papers <a href="http://www.sigir2009.org/Program/papers">published</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2009/04/sigir-2009-accepted-papers-thread/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>SIGIR 2009 Information</title>
		<link>http://probablyirrelevant.org/2008/12/sigir-2009-information/</link>
		<comments>http://probablyirrelevant.org/2008/12/sigir-2009-information/#comments</comments>
		<pubDate>Wed, 10 Dec 2008 21:37:12 +0000</pubDate>
		<dc:creator>Fernando</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=51</guid>
		<description><![CDATA[Conference URLs:
SIGIR 2009 Homepage
SIGIR 2009 Facebook Event
Important Dates:
Jan 19, 2009: Abstracts for full research papers due
Jan 26, 2009: Full research paper submissions due
Feb 2, 2009: Workshop proposals due
Feb 23, 2009: Poster, demonstration, and tutorial proposals due
Mar 2, 2009: Doctoral consortium proposals due
Mar 9, 2009: Notification of workshop acceptances
Apr 11, 2009: All other acceptance notifications
]]></description>
			<content:encoded><![CDATA[<p>Conference URLs:</p>
<ul>
<li><a href="http://sigir2009.org">SIGIR 2009 Homepage</a></li>
<li><a href="http://www.facebook.com/home.php#/event.php?eid=42368171077">SIGIR 2009 Facebook Event</a></li>
</ul>
<p>Important Dates:</p>
<table border="0">
<tbody>
<tr>
<td>Jan 19, 2009</td>
<td>Abstracts for full research papers due</td>
</tr>
<tr>
<td>Jan 26, 2009</td>
<td>Full research paper submissions due</td>
</tr>
<tr>
<td>Feb 2, 2009</td>
<td>Workshop proposals due</td>
</tr>
<tr>
<td>Feb 23, 2009</td>
<td>Poster, demonstration, and tutorial proposals due</td>
</tr>
<tr>
<td>Mar 2, 2009</td>
<td>Doctoral consortium proposals due</td>
</tr>
<tr>
<td>Mar 9, 2009</td>
<td>Notification of workshop acceptances</td>
</tr>
<tr>
<td>Apr 11, 2009</td>
<td>All other acceptance notifications</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2008/12/sigir-2009-information/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Directions in Search over Social Media</title>
		<link>http://probablyirrelevant.org/2008/11/directions-in-search-over-social-media/</link>
		<comments>http://probablyirrelevant.org/2008/11/directions-in-search-over-social-media/#comments</comments>
		<pubDate>Fri, 07 Nov 2008 19:41:42 +0000</pubDate>
		<dc:creator>Jon Elsas</dc:creator>
				<category><![CDATA[Blog Search]]></category>
		<category><![CDATA[Social Media]]></category>

		<guid isPermaLink="false">http://probablyirrelevant.org/?p=40</guid>
		<description><![CDATA[In his keynote at the Search in Social Media workshop at CIKM, Andrew Tomkins suggested that there is plenty of room for academic IR research progress in social media.  I happen to agree.
Community generated content has been all the rage for a few years:  blogs, Wikipedia, online forums, twitter, Yahoo! Answers, and the list goes on. [...]]]></description>
			<content:encoded><![CDATA[<p><em>In his keynote at the </em><a href="http://ir.mathcs.emory.edu/SSM2008/"><em>Search in Social Media workshop at CIKM</em></a><em>, </em><a href="http://datamining.typepad.com/data_mining/2008/10/search-and-social-media-cikm-2008-rough-notes-from-keynote-by-andrew-tomkins.html"><em>Andrew Tomkins suggested</em></a><em> that there is plenty of room for academic IR research progress in social media.  I happen to agree.</em></p>
<p>Community generated content has been all the rage for a few years:  <a href="http://probablyirrelevant.org">blogs</a>, <a href="http://wikipedia.org">Wikipedia</a>, online forums, <a href="http://twitter.com">twitter</a>, <a href="http://answers.yahoo.com">Yahoo! Answers</a>, and the list goes on.  Many of these generate a large volume of archived data &#8212; some in the form of more or less polished documents, like a blog post or Wikipedia article;  others, like twitter, are snippets of an often one-sided conversation and broadcast messages.</p>
<p>From the IR researcher&#8217;s perspective, is it worth studying these <em>artifacts</em> of &#8220;social media&#8221;?  Is there something that distinguishes these from other document collections?  If so, how can we leverage that distinction in our retrieval models?  This post aims to answer a couple of these questions and hopefully bring up a few more.</p>
<p>First and foremost, we need to determine whether there is value in providing access to artifacts of social media.  Some, like Twitter, seem to be mostly ephemeral, (generally) interesting only in the moment and quickly fading from view.  Even the Twitter search engine advertises: &#8220;See what&#8217;s happening — right now&#8221; and the results (as far as I can tell) are ranked only chronologically.</p>
<p>Many other types of social media &#8212; some existing long before Web 2.0 was born &#8212; can be real treasure-troves of information.  There exists an online forum, public mailing list, newsgroup or message board for virtually every special interest group under the sun &#8212; from <a href="http://forums.gardenweb.com/forums/">gardening</a>, to <a href="http://www.homebrewtalk.com/">home-brewing</a>, to <a href="http://forums.macrumors.com/">Apple computers</a>.  These are often heavily trafficked, populated with real subject matter experts, and host a rich information exchange.  I would argue that the content created through these social media outlets presents enormous value to searchers, and information retrieval research has a lot to contribute in this corner of social media.</p>
<p>What makes these document collections different from what has been previously studied?  Can we just treat them the same as web pages?  Or do they need special consideration?</p>
<p>In many of these collections, the unit of retrieval &#8212; what we consider a document &#8212; is not fixed, but rather dependent on the task.  Consider online forums, often organized into topical sub-forums, which in turn are organized into conversation threads of individual posts.  Some information needs may require only a single post as a result, some require the context of the full conversation thread, and others may need to retrieve a pertinent sub-forum.</p>
<p>These collections often offer another orthogonal axis of retrieval &#8212; the author.  In highly trafficked message boards and mailing lists, tens or hundreds of thousands of users with varying levels of expertise contribute to the conversation.  One may wish to find subject matter experts to address a question to, or favor message threads with contributions from those more likely to know the answer.</p>
<p>These factors, of course, are not entirely unique to social media search, and have to some degree been addressed in previous research.  This question of identifying the granularity of the unit of retrieval has been addressed at the document level (for example in XML element retrieval at <a href="http://inex.is.informatik.uni-duisburg.de/">INEX</a>), but not so much at the collection level.  <a href="http://www.cs.purdue.edu/homes/lsi/f233-Si.pdf">Resource ranking in federated search</a> and <a href="http://www.dcs.gla.ac.uk/Keith/Chapter.3/Ch.3.html">cluster-based retrieval</a> bear some resemblance to the selection of a topical sub-collection, such as a sub-forum ranking.  Author-ranking has also been studied at <a href="http://trec.nist.gov">TREC</a> in the <a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG">Blog</a> and <a href="http://www.ins.cwi.nl/projects/trec-ent/wiki/index.php/Main_Page">Enterprise Tracks</a>.  But each of these has been studied in isolation, without much regard to the interaction between the different aspects of the collection.  To my knowledge, no IR testbeds exist that contain the rich <em>collection</em> structure offered in these types of social media.</p>
<p>This, in my mind, is the real promise of research in search over social media.  These collections provide multiple levels of organizational granularity, different axes of organization, multiple types of searchable objects, and relations among those objects.  I predict that this will be an interesting and fertile direction of information retrieval research &#8212; pushing the systems to support more sophisticated multi-dimensional indexing and extending existing retrieval models to handle rich relationships between documents.</p>
]]></content:encoded>
			<wfw:commentRss>http://probablyirrelevant.org/2008/11/directions-in-search-over-social-media/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
