Author of the publication

Coherent Multi-sentence Video Description with Variable Level of Detail.

GCPR, volume 8753 of Lecture Notes in Computer Science, page 184-195. Springer, (2014)

Other publications of authors with the same name

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency. CoRR, (2022)

Video Object Segmentation with Referring Expressions. ECCV Workshops (4), volume 11132 of Lecture Notes in Computer Science, page 7-12. Springer, (2018)

Adversarial Inference for Multi-Sentence Video Description. CVPR Workshops, page 0. Computer Vision Foundation / IEEE, (2019)

Fooling Vision and Language Models Despite Localization and Attention Mechanism. CVPR, page 4951-4961. Computer Vision Foundation / IEEE Computer Society, (2018)

Simple Token-Level Confidence Improves Caption Correctness. WACV, page 5730-5740. IEEE, (2024)

MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding. CVPR, page 13052-13061. IEEE, (2023)

Focus! Relevant and Sufficient Context Selection for News Image Captioning. EMNLP (Findings), page 6078-6088. Association for Computational Linguistics, (2022)

Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation. NAACL-HLT, page 1530-1549. Association for Computational Linguistics, (2022)

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation. ACL (1), page 6551-6557. Association for Computational Linguistics, (2019)

Generation and grounding of natural language descriptions for visual data. Saarland University, Saarbrücken, Germany, (2017)