Multi-modal information retrieval from broadcast video
using OCR and speech recognition
A. Hauptmann, R. Jin, and T. Ng. JCDL'02: Proceedings of the 2nd ACM/IEEE-CS Joint
Conference on Digital Libraries, pages 160--161. (2002)
Abstract
We examine multi-modal information retrieval from
broadcast video where text can be read on the screen
through OCR and speech recognition can be performed on
the audio track. OCR and speech recognition are
compared on the 2001 TREC Video Retrieval evaluation
corpus. Results show that OCR is more important than
speech recognition for video retrieval. OCR retrieval
can be further improved through dictionary-based
post-processing. We demonstrate how to utilize
imperfect multi-modal metadata results to benefit
multi-modal information retrieval.
@inproceedings{HJN02,
abstract = {We examine multi-modal information retrieval from
broadcast video where text can be read on the screen
through OCR and speech recognition can be performed on
the audio track. OCR and speech recognition are
compared on the 2001 TREC Video Retrieval evaluation
corpus. Results show that OCR is more important than
speech recognition for video retrieval. OCR retrieval
can be further improved through dictionary-based
post-processing. We demonstrate how to utilize
imperfect multi-modal metadata results to benefit
multi-modal information retrieval.},
added-at = {2006-07-31T15:48:59.000+0200},
author = {Hauptmann, Alexander G. and Jin, Rong and Ng, Tobun Dorbin},
biburl = {https://www.bibsonomy.org/bibtex/21ff9e1cd7b2ef9e71f85288856d3c05d/lysander07},
booktitle = {JCDL'02: Proceedings of the 2nd ACM/IEEE-CS Joint
Conference on Digital Libraries},
interhash = {b0f2413f72ec04b1326c1e93cc4e5737},
intrahash = {1ff9e1cd7b2ef9e71f85288856d3c05d},
keywords = {annotation multimedia},
mrnumber = {C.DL.02.160},
pages = {160--161},
series = {Video and multimedia digital libraries},
timestamp = {2009-01-27T15:24:50.000+0100},
title = {Multi-modal information retrieval from broadcast video
using {OCR} and speech recognition},
url = {http://doi.acm.org/10.1145/544220.544252},
year = 2002
}