Modeling Uncertainty in Video Retrieval: A Retrieval Model for Uncertain Semantic Representations of Videos
by Robin Aly
The need for content-based multimedia retrieval is increasing rapidly because collections are growing ever faster. However, retrieval systems often do not perform well enough for real-life applications. A promising approach is to detect semantic primitives at indexing time. The primitives currently investigated are the utterance of words and the occurrence of so-called semantic concepts, such as “Outdoor” and “Person”. We refer to a concrete instantiation of these primitives as the representation of the video document. Most detectors emit scores reflecting the likelihood of each primitive. However, detection is far from perfect, and considerable uncertainty about the true representation remains. Some retrieval algorithms ignore this uncertainty, which clearly hurts precision and recall. Other methods use the scores as anonymous features and learn their relationship to relevance. This has the disadvantage of requiring vast amounts of training data, and the learning has to be redone whenever a detector changes.
The main contribution of our work is a formal retrieval model for treating this uncertainty. We conceptually consider the retrieval problem as two steps: (1) the determination of the posterior probability distribution over all representations given the scores (using existing methods) and (2) the derivation of a ranking status value (RSV) for each representation. We then take the expected RSV, weighted by each representation’s posterior probability, as the effective RSV of a shot for ranking. We claim that our approach has the following advantages: (a) step (2) is easier to achieve than with the machine learning alternative and (b) it benefits from all detector improvements.
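
The two steps can be illustrated with a minimal sketch in Python. It is not the implementation from this work: it assumes binary concepts whose posteriors are independent given the scores, and the names expected_rsv and toy_rsv, the weights, and the probabilities are all hypothetical choices made for illustration. The posteriors of step (1) are taken as given inputs.

```python
from itertools import product


def expected_rsv(concept_posteriors, rsv):
    """Expected RSV of one shot over all binary concept representations.

    concept_posteriors[i] is the posterior probability that concept i
    occurs in the shot (the output of step 1, assumed given here).
    rsv maps a representation (tuple of 0/1 values) to a score (step 2).
    Assumption: concepts are independent given the detector scores.
    """
    total = 0.0
    for representation in product((0, 1), repeat=len(concept_posteriors)):
        # Posterior probability of this particular representation.
        prob = 1.0
        for present, p in zip(representation, concept_posteriors):
            prob *= p if present else 1.0 - p
        total += prob * rsv(representation)
    return total


def toy_rsv(representation):
    """Hypothetical RSV: a weighted count of the concepts that occur."""
    weights = (2.0, 1.0)  # illustrative weights for two concepts
    return sum(w * c for w, c in zip(weights, representation))


# Two shots with made-up posteriors for two concepts.
shot_a = expected_rsv([0.9, 0.2], toy_rsv)
shot_b = expected_rsv([0.4, 0.5], toy_rsv)
ranking = sorted({"a": shot_a, "b": shot_b}.items(), key=lambda kv: -kv[1])
```

Enumerating every representation is exponential in the number of concepts, so this sketch is only practical for a handful of them. Its purpose is to show that the ranking function is defined on certain representations, while the detector uncertainty enters only through the posterior weights.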