Efficient XML and Entity Retrieval with PF/Tijah

by Henning Rode, Djoerd Hiemstra, Arjen de Vries, and Pavel Serdyukov

PF/Tijah is a research prototype created by the University of Twente and CWI Amsterdam with the goal to create a flexible environment for setting up search systems. PF/Tijah is first of all a system for structured retrieval on XML data. Compared to other open source retrieval systems it comes with a number or unique features:

  • It can execute any NEXI query without limits to a predefined set of tags. Using the same index, it can easily produce a “focused”, “thorough”, or “article” ranking, depending only on the specified query and retrieval options.
  • The applied retrieval model, score propagation and combination operators are set at query time, which makes PF/Tijah an ideal experimental platform.
  • PF/Tijah embeds NEXI queries as functions in the XQuery language. This way the system supports ad hoc result presentation by means of its query language. The INEX efficiency task submission described in the paper demonstrates this feature. The declared function INEXPath for instance computes a string that matches the desired INEX submission format.
  • PF/Tijah supports text search combined with traditional database querying, including for instance joins on values. The entity ranking experiments described in this article intensively exploit this feature.

With this year's INEX experiments, we try to demonstrate the mentioned features of the system. All experiments were carried out with the least possible pre- and post-processing outside PF/Tijah.

[download draft paper]