[XML & Databases 1]: Changes to the upcoming lectures and papers XML & DB

We decided to change the contents of the upcoming lectures. Some new topics have emerged in research on XML and databases that we would like to tell you about. To make room for them, we have to skip some other topics and shorten others. This also means that we skip 4 papers from the reader and replace them with 5 other papers, which are available electronically in the Archive. I will update the Roster and Archive of TeleTOP today. The final three lectures in the new roster are now organized around one theme: “XML Database Support for Multimedia Applications”. The kinds of database support we will look into are information retrieval for XML, XML updating, distributed XML querying, XML stand-off annotations and streaming XML.

Bram Smulders graduates on real time data distribution and storage

Many research centers that process large amounts of sensor data, for instance wind tunnels, still rely on traditional relational databases. In some cases, no database is used at all. No care is taken to deliver data under real time constraints. This has no negative effect when the data is merely stored for analysis after the measurement process has completed, but it can have disastrous effects when the data is needed to control critical elements of the measurement process itself.

This report discusses a possible solution for the real time delivery and storage of data. The solution is not limited to sensor data. Instead, it should be flexible enough to suit any situation in which data must be distributed under real time constraints. The newly created solution, named “SQLbusRT”, is based on the blackboard architecture pattern, which is explained in this report. The report also compares the architecture of the new solution with the blackboard architecture. The blackboard pattern was chosen mainly for its flexibility in adding components to and removing components from the system. System components work on a shared storage, called the blackboard, which gives the pattern its name. A prototype was developed by combining readily available open source products and creating new interfaces. The open source products used in this project are MySQL and ORTE. MySQL is a database management system that is known for its high performance and is used on a large scale worldwide. ORTE is an implementation of the RTPS protocol, which serves as a data communication channel over Ethernet, using a publish-subscribe mechanism. An explanation of ORTE and the publish-subscribe mechanism is given in this report. The report also discusses tests that were executed to predict the performance, reliability and scalability of SQLbusRT in a simple setup. This set of tests can be extended in future research as SQLbusRT matures.
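To make the blackboard and publish-subscribe ideas concrete, here is a minimal in-process sketch in Python. It is not SQLbusRT and does not use the MySQL or ORTE APIs. The class `Blackboard`, its methods and the topic name `tunnel/pressure` are hypothetical names chosen for illustration, and the sketch gives no real time guarantees. It only shows how components can be added and removed around a shared store without knowing about each other.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical callback type: receives the topic name and the new value.
Callback = Callable[[str, Any], None]


class Blackboard:
    """Shared storage that notifies subscribers when a topic is updated."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Add a component that wants updates for a topic."""
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic: str, callback: Callback) -> None:
        """Remove a component; publishers are not affected."""
        self._subscribers[topic].remove(callback)

    def publish(self, topic: str, value: Any) -> None:
        """Store the latest value and push it to all current subscribers."""
        self._store[topic] = value
        for callback in list(self._subscribers[topic]):
            callback(topic, value)

    def read(self, topic: str) -> Any:
        """Let a component pull the latest stored value."""
        return self._store[topic]


board = Blackboard()
received = []


def controller(topic: str, value: Any) -> None:
    # A stand-in for a control component that reacts to fresh data.
    received.append((topic, value))


board.subscribe("tunnel/pressure", controller)
board.publish("tunnel/pressure", 101.3)    # delivered to the controller
board.unsubscribe("tunnel/pressure", controller)
board.publish("tunnel/pressure", 99.8)     # stored, but no longer delivered
```

In this sketch the publisher never refers to the controller directly, which is the property that makes adding and removing components cheap. A real system would also need a network transport and bounded delivery times, which the sketch leaves out.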

More info on e-Prints, and on Bram's blog

Evaluation of Multimedia Retrieval Systems

by Djoerd Hiemstra and Wessel Kraaij

In this chapter, we provide the tools and methodology for comparing the effectiveness of two or more multimedia retrieval systems in a meaningful way. Several aspects of multimedia retrieval systems can be evaluated without consulting the potential users or customers of the system, such as the query processing time (measured for instance in milliseconds per query) or the query throughput (measured for instance as the number of queries per second). In this chapter, however, we will focus on aspects of the system that influence the effectiveness of the retrieved results. In order to measure the effectiveness of search results, one must at some point consult the potential user of the system. After all, what are the correct results for the query “black jaguar”? Cars, or cats? Ultimately, the user has to decide.
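The abstract does not name specific effectiveness measures. As an assumption, the sketch below uses precision and recall, two standard measures that depend on a user's relevance judgments, to compare two hypothetical systems on the “black jaguar” query. The document identifiers and the result lists are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a result list given the relevant set."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Judgments from a (hypothetical) user who meant the animal, not the car.
relevant = {"cat_1", "cat_2", "cat_3"}

# Results returned by two hypothetical systems for the same query.
system_a = ["cat_1", "car_1", "cat_2", "car_2"]
system_b = ["car_1", "car_2", "car_3", "cat_3"]

p_a, r_a = precision_recall(system_a, relevant)
p_b, r_b = precision_recall(system_b, relevant)
print(f"system A: precision={p_a:.2f} recall={r_a:.2f}")
print(f"system B: precision={p_b:.2f} recall={r_b:.2f}")
```

With judgments from a user who meant the car, the same two result lists would score the other way around, which is why the user has to be consulted at some point.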

Download author version.

[XML & Databases 1]: Reader ‘sold out’?!?

Several students told me that the reader is sold out at the Union Shop. This is strange, because we ordered 50 readers in January last year. I notified BOZ about this. Hopefully, the problem will be solved soon. In the meantime, you can use the PDFs in the Archive. All papers in the reader are electronically available there. We will keep you informed about the developments here on TeleTOP.

Vojkan Mihajlovic defends Ph.D. thesis on structured information retrieval

Score Region Algebra: A flexible framework for structured information retrieval

by Vojkan Mihajlovic

The scope of the research presented in this thesis is the retrieval of relevant information from structured documents. The thesis describes a framework for information retrieval in documents that have some form of annotation describing their logical and semantic structure, such as XML and SGML. The development of the structured information retrieval framework follows ideas from both the database and the information retrieval worlds. It uses a three-level database architecture and implements relevance scoring mechanisms inherited from information retrieval models.

To develop the structured retrieval framework, the problem of structured information retrieval is analyzed and elementary requirements for structured retrieval systems are specified. These requirements are: (1) entity selection: the selection of the different entities in structured documents that are part of the user query, such as elements, terms, attributes, and image and video references; (2) entity relevance score computation: the computation of relevance scores for different structured elements with respect to the content they contain; (3) relevance score combination: the combination of relevance scores from (different) elements in a document structure, resulting in a common element relevance score; (4) relevance score propagation: the propagation of scores from different elements to common ancestor or descendant elements, following the query. These four requirements are supported by developing a database logical algebra in harmony with the retrieval models used for ranking. In the specification of the logical algebra we face the challenge of transparent instantiation of retrieval models, i.e., the specification of different retrieval models without affecting the algebra operators.
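The four requirements can be illustrated with a toy Python sketch. This is not Score Region Algebra and does not reproduce the operators defined in the thesis. The sample document, the function names, the relative term frequency score, the product used for combination and the sum used for propagation are all assumptions made only to show the four steps in order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = (
    "<article>"
    "<sec><p>xml retrieval with xml scores</p><p>video streams</p></sec>"
    "<sec><p>relational databases</p></sec>"
    "</article>"
)


def select(root, tag):
    """(1) Entity selection: pick the elements named in the query."""
    return list(root.iter(tag))


def score(element, term):
    """(2) Score computation: relative frequency of a term in the element."""
    words = "".join(element.itertext()).lower().split()
    return words.count(term) / len(words) if words else 0.0


def combine(score_a, score_b):
    """(3) Score combination: a product, as one choice for 'a AND b'."""
    return score_a * score_b


def rank_sections(xml_text, term_a, term_b):
    """Rank <sec> elements by the scores of the <p> elements they contain."""
    root = ET.fromstring(xml_text)
    ranked = []
    for index, sec in enumerate(select(root, "sec")):
        paragraph_scores = [
            combine(score(p, term_a), score(p, term_b))
            for p in select(sec, "p")
        ]
        # (4) Score propagation: the ancestor takes the sum of its descendants.
        ranked.append((index, sum(paragraph_scores)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranking = rank_sections(DOC, "xml", "retrieval")
print(ranking)  # the first section ranks above the second
```

Swapping the body of `score` or `combine` changes the retrieval model without touching `rank_sections`, which loosely mirrors the goal of instantiating different retrieval models without affecting the algebra operators.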

Download Vojkan’s thesis from EPrints.