Score Region Algebra: A flexible framework for structured information retrieval
by Vojkan Mihajlovic
The scope of the research presented in this thesis is the retrieval of relevant information from structured documents. The thesis describes a framework for information retrieval in documents that carry some form of annotation describing their logical and semantic structure, such as XML and SGML. The development of the structured information retrieval framework follows ideas from both the database and the information retrieval worlds. It uses a three-level database architecture and implements relevance scoring mechanisms inherited from information retrieval models.
To develop the structured retrieval framework, the problem of structured information retrieval is analyzed and elementary requirements for structured retrieval systems are specified. These requirements are: (1) entity selection: the selection of the different entities in structured documents that are part of the user query, such as elements, terms, attributes, and image and video references; (2) entity relevance score computation: the computation of relevance scores for different structured elements with respect to the content they contain; (3) relevance score combination: the combination of relevance scores from (different) elements in a document structure, resulting in a common element relevance score; (4) relevance score propagation: the propagation of scores from different elements to common ancestor or descendant elements, as the query requires. A database logical algebra is developed to support these four requirements, in harmony with the retrieval models used for ranking. In specifying the logical algebra we face the challenge of transparent instantiation of retrieval models, i.e., specifying different retrieval models without affecting the algebra operators.
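The four requirements can be illustrated with a minimal sketch. It is not the algebra defined in the thesis: the `Element` class, the function names, the relative term-frequency score, the product used for combination, and the sum used for propagation are all assumptions chosen for illustration. The scoring function is passed in as a parameter to hint at how a retrieval model might be swapped without changing the operators.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Element:
    """A node of a structured document (hypothetical, not the thesis's data model)."""
    name: str
    text: str = ""
    children: List["Element"] = field(default_factory=list)

    def walk(self) -> Iterator["Element"]:
        """Yield this element and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def terms(self) -> List[str]:
        """All lower-cased terms contained in this element's subtree."""
        return [t for e in self.walk() for t in e.text.lower().split()]


def select(root: Element, name: str) -> List[Element]:
    """(1) Entity selection: all elements in the subtree with the given name."""
    return [e for e in root.walk() if e.name == name]


def tf_score(element: Element, term: str) -> float:
    """(2) Score computation: relative term frequency (an assumed toy model)."""
    terms = element.terms()
    return terms.count(term.lower()) / len(terms) if terms else 0.0


def combine_and(a: float, b: float) -> float:
    """(3) Score combination: product, one possible reading of a conjunction."""
    return a * b


def propagate_up(
    ancestor: Element,
    name: str,
    term: str,
    score_fn: Callable[[Element, str], float] = tf_score,
) -> float:
    """(4) Score propagation: sum the scores of matching descendants."""
    return sum(
        score_fn(e, term) for e in select(ancestor, name) if e is not ancestor
    )


# A tiny example document (made up for this sketch).
doc = Element("article", children=[
    Element("title", "region algebra"),
    Element("sec", "score region algebra for xml retrieval"),
    Element("sec", "database architecture"),
])

sections = select(doc, "sec")
title_score = tf_score(doc.children[0], "algebra")
article_score = propagate_up(doc, "sec", "algebra")
both = combine_and(title_score, article_score)
```

Replacing `tf_score` with another function of the same signature changes the ranking model while `select`, `combine_and`, and `propagate_up` stay untouched, which is the kind of separation the abstract calls transparent instantiation.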
Your slides for the 10-minute presentation on Wednesday, December 6, have to be handed in through TeleTOP one day before (yes, that's TUESDAY!). Select the appropriate roster row and click on the assignment button.
For the practical work, the count now stands at ten groups, but if students have not submitted their proposals to all teachers, we may have missed some. Groups are visible in the Email/Group section of this TeleTOP site. Group J doesn't exist, to avoid confusion with group I, which does exist; Group O doesn't exist because the difference between “Oh” and zero is small.
Today, the assignments were discussed. Groups of two students should choose an assignment and send an e-mail with their plan to the teachers before next week (22nd of November). If you want to work on a custom project, make sure you get permission. The plan should contain:
Names of the group members
Approach and timeline
(Optionally, if you come up with your own assignment: a project description)
For a project guide, see the TeleTOP Roster section.
Tomorrow, Tuesday 24 October, is the deadline for sending in the prepared presentations (including e.g. a PowerPoint slide show) of about half an hour. The next day, on Wednesday 25 October, I will announce the lucky winners who will be asked to give their presentation to all of us on Friday 27 October.
As a university, we are doing too little for our excellent students. Students with difficulties may expect a lot of attention from lecturers and BOZ (the educational bureau). Excellent students get a high grade, and that's it. The academic climate on campus is looking more and more like that of a high school.
The idea for an academic student journal at the University of Twente was born in the autumn of 2005, when a small group of students was discussing original ways to improve the academic climate on campus. They felt that more emphasis should be put on exceptional academic efforts. Students should have the will to excel and be proud of their work. Realizing that many interesting scholarly activities by students lead only to an inches-thick report, they decided that a peer-reviewed journal would be of tremendous added value to the student community and could help existing and potential academic talent to flourish.
Check out the TSR Web Site