Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.

ICCV (Workshops), pages 272-283. IEEE, (2023)

Other publications of authors with the same name

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge. CoRR, (2022)
Unsupervised Medical Image Registration Based on Multi-scale Cascade Network. PRCV (2), volume 13535 of Lecture Notes in Computer Science, pages 251-261. Springer, (2022)
JourneyDB: A Benchmark for Generative Image Understanding. NeurIPS, (2023)
JourneyDB: A Benchmark for Generative Image Understanding. CoRR, (2023)
Making LLaMA SEE and Draw with SEED Tokenizer. ICLR, OpenReview.net, (2024)
Bridging Video-text Retrieval with Multiple Choice Questions. CVPR, pages 16146-16155. IEEE, (2022)
MetaCloth: Learning Unseen Tasks of Dense Fashion Landmark Detection From a Few Samples. IEEE Trans. Image Process., (2022)
SEED-Story: Multimodal Long Story Generation with Large Language Model. CoRR, (2024)
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval. ECCV (35), volume 13695 of Lecture Notes in Computer Science, pages 691-708. Springer, (2022)
VIT-LENS: Towards Omni-modal Representations. CVPR, pages 26637-26647. IEEE, (2024)