Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3485
Title: A study of untrained models for multimodal information retrieval
Authors: Imhof, Melanie; Braschler, Martin
Published in : Information Retrieval Journal
Publisher / Ed. Institution: Springer, Netherlands
Issue Date: 3-Nov-2017
License (according to publishing contract) : Licence according to publishing contract
Type of review: Editorial review
Language : English
Subject (DDC) : 020: Library and information sciences
Abstract: Operational multimodal information retrieval systems have to deal with increasingly complex document collections and queries that are composed of a large set of textual and non-textual modalities such as ratings, prices, timestamps and geographical coordinates. The resulting combinatorial explosion of modality combinations makes it intractable to treat each modality individually and to obtain suitable training data. Consequently, instead of finding and training new models for each individual modality or combination of modalities, it is crucial to establish unified models and fuse their outputs in a robust way. Since the most popular weighting schemes for textual retrieval have in the past generalized well to many retrieval tasks, we demonstrate how they can be adapted for use with non-textual modalities, which is a first step towards finding such a unified model. We demonstrate that the popular weighting scheme BM25 is suitable for multimodal IR systems and analyze the underlying assumptions of the BM25 formula with respect to merging modalities under the so-called raw-score merging hypothesis, which requires no training. We establish a multimodal baseline for two multimodal test collections, and show how modalities differ in their contribution to relevance and how difficult it is to treat modalities with overlapping information. Our experiments demonstrate that our multimodal baseline with no training achieves significantly higher retrieval effectiveness than using just the textual modality for the Social Book Search 2016 collection and lies in the range of a trained multimodal approach that uses the optimal linear combination of the modality scores.
Department: School of Engineering
Organisational Unit: Institute of Applied Information Technology (InIT)
Publication type: Article in scientific Journal
DOI: 10.21256/zhaw-3485; 10.1007/s10791-017-9322-x
ISSN: 1386-4564
URI: https://digitalcollection.zhaw.ch/handle/11475/2169
Restricted until : 2023-01-01
Appears in Collections: Publikationen School of Engineering

Files in This Item:
File: 10.1007_s10791-017-9322-x.pdf
Description: Until 2023-01-01
Size: 446.22 kB
Format: Adobe PDF

