Author of the publication

Connecting What To Say With Where To Look by Modeling Human Attention Traces.

CVPR, page 12679-12688. Computer Vision Foundation / IEEE, (2021)

Other publications of authors with the same name

Visual to Sound: Generating Natural Sound for Videos in the Wild. CVPR, page 3550-3558. Computer Vision Foundation / IEEE Computer Society, (2018)

ReferItGame: Referring to Objects in Photographs of Natural Scenes. EMNLP, page 787-798. ACL, (2014)

Revealing Single Frame Bias for Video-and-Language Learning. ACL (1), page 487-507. Association for Computational Linguistics, (2023)

Who are you with and where are you going? CVPR, page 1345-1352. IEEE Computer Society, (2011)

Parsing clothing in fashion photographs. CVPR, page 3570-3577. IEEE Computer Society, (2012)

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation. NeurIPS Datasets and Benchmarks, (2021)

Dance Dance Generation: Motion Transfer for Internet Videos. ICCV Workshops, page 1208-1216. IEEE, (2019)

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval. KDD, page 4433-4442. ACM, (2022)

Automatic Attribute Discovery and Characterization from Noisy Web Data. ECCV (1), volume 6311 of Lecture Notes in Computer Science, page 663-676. Springer, (2010)

iWalk: a tool for interacting with geo-located data through movement and gesture. ACM Multimedia, page 1059-1062. ACM, (2010)