Is the science of IR improving?

I’m just back from the annual meeting of ASIST (American Society for Information Science and Technology) in Columbus, OH. I gave a talk during one of the five sessions on IR, and after all the speakers were through there was a session of audience questions. Andrew Dillon lobbed a provocative question our way: how do we know if IR as a field is making forward progress? (I’m paraphrasing, of course.) An uncomfortable pause set in, followed by obligatory sidestepping, e.g. “first we need to define progress.” It’s a fair question, though: we see incremental progress reported in the literature, but a high-level sense of the field’s forward motion is harder to come by.

I offered an off-the-cuff answer that I suspect readers might comment on.  Actually it was two answers.

First, surely there is meaning in the increasing competition to publish in the field’s best venues.  This isn’t news, but the following figure showcases the fact that getting a paper into SIGIR is indeed growing more difficult (many more people are trying).

 

Of course SIGIR is not synonymous with the field, but I think the figure speaks to the question Andrew asked. Unless the SIGIR community is spinning its wheels, increasing competition among researchers suggests that expectations and standards for “successful research” are climbing.

My second answer had to do with the diversity of tasks that fit under the umbrella term of IR.  Looking at TREC over the years we see new tasks appear (and disappear), new problems to tackle.  I argued that the field is indeed making progress, and we can see that progress in this creativity.  We are solving problems that we didn’t know existed (e.g. adversarial IR) or that actually didn’t exist (e.g. blog search) only several years ago.  Does this creativity imply improvement?  I argued that it does.

8 Responses to “Is the science of IR improving?”

  1. An interesting question. I’d assume that the increasing selectivity of SIGIR mostly reflects an increasing number of researchers. I would guess that, in contrast, selectivity for complexity theory papers has held relatively constant.

    Does an increasing number of researchers correlate with progress in the field? I’d hope so, though I’d offer the competing (though not contradictory) explanation that the success of commercial search engines has generated buzz and funding for IR research.

    I’m more encouraged by the diversity and creativity of the work, which I feel is getting the IR community out of a rut. I have renewed hope that the IR community will address some of Nick Belkin’s “grand challenges” in our lifetime.

    http://www.sigir.org/forum/2008J/2008j-sigirforum-belkin.pdf

  2. This tells me one thing: I wish I had submitted tons of papers to SIGIR in the nineties. Now I could impress people. I submitted once to SIGIR, in 2002 I think… and I was rejected… I turned it into a fairly well-cited journal paper in a good journal…

    Increased competition, btw, is a bad thing beyond a certain point… see this post of mine:

    http://www.daniel-lemire.com/blog/archives/2008/10/28/when-in-doubts-prefer-unimpressive-negative-results/

  3. I’m interested in definitions of progress before jumping into the mush of acceptance rates and diversity of TREC tracks.

    1. From the experimental perspective: Progress=improving the effectiveness of a system for the user. One measure of this is performance in offline experiments. This is the route taken by the majority of IR papers.

    2. From the theoretical perspective: Progress=understanding the technologies we have developed. This means (for me) generalizing across retrieval models, understanding when to apply things and when not to, etc.

    3. From the “practical” perspective: Progress=adoption of the technology by people who really interact with users. Trickier to measure.

  4. 4. From the financial perspective: Progress=$$$. In the past 10 years, advances in the sciences of IR have been translated into one of the most commercially successful technology business models of all time.

    Of course, monetary potential is an awful way to measure the advance of science. But, the incredible success of current commercial web IR systems is undeniable — both in terms of money and broad accessibility.

    Another question we need to answer before really understanding whether or not any science is making progress: what is an appropriate timescale of measurement? Looking over the past 20 years, certainly IR has made incredible leaps and bounds. Looking at just the past year’s SIGIR publications, it’s really impossible to tell. There’s no way to assess the impact of a body of work until it’s been digested by the community.

    Five or ten years doesn’t seem like an unreasonably long timescale on which to judge progress. (Does it?) If that’s really the case, IR is still in its infancy.

  5. What sorts of leaps has IR made in the past 20 years? While most people may be thinking web search, I claim that this mostly validates progress made 50 years ago. Most of the progress in the past 20 has occurred because of TREC (cf. Croft’s Salton Award speech).

    In this vein, I would like to underline Jon’s comment that we do IR research with the perspective that very smart people have been thinking hard about this since the 1950s (at least). The rationale for this is not just to calibrate how much better we are by using 10, 20, 30 year-old work as a baseline, but also to recognize work which is a minor improvement on, or even completely redundant with, research which is 10, 20, 30 years old.

    Unfortunately, there is a writing culture which encourages students to write papers as though they are inventing the next best thing. Combined with the ease of access of online-but-recent publications, this has resulted in papers being relatively ignorant of previous work older than a decade. This makes judging the contribution of a single publication very difficult unless you are aware of the historic context of the paper. I am very worried that this trend will continue.

    So I encourage folks and especially students to comb through their SIGIR 25 Year CDs; to get into the library to read older years of the following journals: Information Storage and Retrieval, Information Systems, and Journal of Documentation. Not only may you be writing a paper that was done in the 1980s but you may discover some gem of a paper that needed a current research context to be relevant.

  6. My knowledge of the historical IR discoveries & breakthroughs is admittedly very sketchy. I propose a challenge to the community: develop an annotated timeline of significant information retrieval research advances. I’ll start us off (with pointers to original research forthcoming):

    A few of the big ones:
    - The recognition that Boolean retrieval can be improved upon & development of the vector space model (Salton)
    - The theory of inverse document frequency (Robertson & Spärck Jones)
    - The application of probability theory to document ranking (Van Rijsbergen)

    A few (more recent ones?) that I think are significant:
    - The use of multiple document representations, particularly external representations
    - The application of machine learning to document ranking (Burges, Joachims)

  7. I would say that IR is by its nature not a science, but rather an engineering problem, because its goal is to help users access data better.

    There are certainly many sciences involved in solving the problem,
    * significant amount of computer science (parallel systems, data structures, AI),
    * evaluation and measurements (TREC@NIST),
    * branches of statistics (for ranking, for analysis),
    * branches of algebra (for ranking, for modeling structured documents)
    * cognitive sciences (task solving, user attention),
    * sociology (user interaction and intention – query log etc.), and many other sciences.
    But these are just tools sharpened to solve the information access problem, not the goal of IR.

    There is some work focused on the notion of relevance; it is either practical or metaphysical, and thus is either engineering or philosophy. If it is anywhere near science, then IR may become a science in the future.

    IR, as a problem, has certainly promoted significant advances in the sciences listed above. Most of them are relevant to IR, and some have implications outside it. But to think of IR as a science can be misleading, and that’s probably why people sidestepped at first when trying to answer this question.

    So, to ask the question correctly, one should probably say, “what scientific advances are promoted by the IR community, or related to the IR problem?” And the several comments posted before this one have already answered the question well.

  8. My sense is that progress in IR has been toward larger scale, with all the algorithmic hoops that entails. But the field, particularly as practiced at SIGIR, has significant blinders on in terms of what is considered legitimate work. There are many well-written papers that describe algorithms showing a 5% improvement over some other algorithm. The work is well done, the papers well written, the tables formatted in conformance with orthodoxy. But to me, that’s an indication of a field in decline, rather than of intellectual vigor.
    We need to see more emphasis on the user, on why people search, on how effective they are at incorporating their results into other processes. I am not saying we don’t need no steenkin’ algorithms. I am saying that there should be more to IR than that.
