Both Pavel Serdyukov and Claudia Hauff have joint papers accepted for the 32nd Annual ACM SIGIR Conference in Boston, USA.
Placing Flickr Photos on a Map
by Pavel Serdyukov, Vanessa Murdock (Yahoo!), and Roelof van Zwol (Yahoo!)
In this paper we investigate generic methods for placing photos uploaded to Flickr on the world map. As the primary input to our methods, we use the textual annotations provided by users to predict the single most probable location where each image was taken. Central to our approach is a language model built entirely from these user-provided annotations. We define extensions that improve on this language model through tag-based smoothing and cell-based smoothing, and by leveraging spatial ambiguity. Further, we demonstrate how to incorporate GeoNames, a large external database of locations. At varying levels of granularity, we are able to place images on a map with at least twice the precision of the state of the art reported in the literature.
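The abstract leaves the model details to the paper. As a rough illustration only, the following Python sketch shows the general shape of a grid-cell language model for tag-based placement. It assumes a fixed 1-degree latitude/longitude grid, a prior proportional to the number of training photos per cell, and standard Dirichlet smoothing against the collection-wide tag distribution. Dirichlet smoothing is a swapped-in textbook technique, not the paper's tag-based or cell-based smoothing. All names (`cell_of`, `CellLanguageModel`, `mu`, `size_deg`) and the training photos are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cell_of(lat, lon, size_deg=1.0):
    """Map a coordinate to a grid cell id (hypothetical fixed-size degree grid)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


class CellLanguageModel:
    """Toy tag-based language model over grid cells with Dirichlet smoothing."""

    def __init__(self, mu=10.0, size_deg=1.0):
        self.mu = mu                          # Dirichlet smoothing strength (assumed)
        self.size_deg = size_deg              # grid cell size in degrees (assumed)
        self.cell_tags = defaultdict(Counter) # cell -> tag counts
        self.cell_photos = Counter()          # cell -> number of training photos
        self.collection = Counter()           # tag counts over all cells

    def fit(self, photos):
        """photos: iterable of (tags, lat, lon) training examples."""
        for tags, lat, lon in photos:
            cell = cell_of(lat, lon, self.size_deg)
            self.cell_tags[cell].update(tags)
            self.cell_photos[cell] += 1
            self.collection.update(tags)
        self.total_tags = sum(self.collection.values())
        self.total_photos = sum(self.cell_photos.values())
        return self

    def log_score(self, tags, cell):
        """log P(cell) + sum of log P(tag | cell), scored in log space to avoid underflow."""
        counts = self.cell_tags[cell]
        cell_len = sum(counts.values())
        # Prior: cells with more training photos are a priori more likely.
        score = math.log(self.cell_photos[cell] / self.total_photos)
        for t in tags:
            # Background probability, with a small floor for tags unseen in training.
            p_bg = max(self.collection[t], 0.5) / self.total_tags
            score += math.log((counts[t] + self.mu * p_bg) / (cell_len + self.mu))
        return score

    def predict(self, tags):
        """Return the single most probable cell for a photo's tags."""
        return max(self.cell_photos, key=lambda c: self.log_score(tags, c))


# Hypothetical training photos: (tags, latitude, longitude).
train = [
    (["eiffel", "paris", "tower"], 48.86, 2.29),
    (["louvre", "paris"], 48.86, 2.34),
    (["bigben", "london", "thames"], 51.50, -0.12),
    (["london", "eye"], 51.50, -0.12),
]
model = CellLanguageModel().fit(train)
print(model.predict(["paris", "museum"]))  # → (48, 2)
```

Smoothing lets an unseen tag such as "museum" lower every cell's score equally instead of zeroing it out, so the known tag "paris" decides the placement. A real system would need finer or adaptive cells and would avoid scoring every cell for every photo.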
Efficiency trade-offs in two-tier web search systems
by Ricardo Baeza-Yates (Yahoo!), Vanessa Murdock (Yahoo!), and Claudia Hauff
Search engines rely on searching multiple partitioned corpora to return results to users within a reasonable amount of time. In this paper we analyze the standard two-tier architecture for Web search, with the difference that the corpus to be searched for a given query is predicted in advance. We show that any predictor better than random yields time savings, but that this reduction in processing time comes with an increase in infrastructure cost. We analyze this trade-off in two different scenarios on real-world data. We demonstrate that, in general, the reduction in answer time justifies the small increase in infrastructure cost.
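The abstract does not spell out the cost model. The Python sketch below is a deliberately simplified toy model for illustration, not the one used in the paper. It assumes that every query is sent to tier 1, that a fraction `q` of queries also needs tier 2, that queries predicted to need tier 2 are sent to both tiers in parallel, that missed queries fall back to searching tier 2 after tier 1, and that infrastructure cost is proxied by the fraction of queries reaching tier 2. The class name, parameters, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoTierToyModel:
    """Hypothetical two-tier setup: every query hits tier 1; a fraction q also needs tier 2."""
    t1: float  # latency of tier 1 (e.g. in ms)
    t2: float  # latency of tier 2
    q: float   # fraction of queries whose answers need tier 2

    def avg_latency(self, recall: float) -> float:
        """Expected answer time given the predictor's recall on queries needing tier 2."""
        # Needed and predicted: both tiers searched in parallel -> max(t1, t2).
        # Needed but missed: tier 2 searched after tier 1 -> t1 + t2.
        needed = recall * max(self.t1, self.t2) + (1 - recall) * (self.t1 + self.t2)
        # Not needed: tier 1 alone answers the query -> t1.
        return self.q * needed + (1 - self.q) * self.t1

    def tier2_load(self, fpr: float) -> float:
        """Fraction of all queries that reach tier 2 (a proxy for infrastructure cost)."""
        # Every needed query reaches tier 2, in parallel or sequentially;
        # false positives add wasted tier-2 work.
        return self.q + (1 - self.q) * fpr

    def positive_rate(self, recall: float, fpr: float) -> float:
        """Fraction of queries the predictor routes to tier 2 up front."""
        return self.q * recall + (1 - self.q) * fpr


m = TwoTierToyModel(t1=50.0, t2=100.0, q=0.2)
p = m.positive_rate(recall=0.8, fpr=0.1)  # about 0.24

for name, recall, fpr in [
    ("no predictor", 0.0, 0.0),
    ("predictor", 0.8, 0.1),
    ("random, same positive rate", p, p),
]:
    print(f"{name}: latency={m.avg_latency(recall):.1f} tier2_load={m.tier2_load(fpr):.3f}")
```

Comparing against a random predictor that routes the same fraction of queries to tier 2 isolates predictor quality: in this toy setting, the better predictor both lowers average latency more and sends fewer wasted queries to tier 2, while both predictors raise tier-2 load above the no-predictor baseline.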