Author of the publication

Connecting What To Say With Where To Look by Modeling Human Attention Traces.

CVPR, page 12679-12688. Computer Vision Foundation / IEEE, (2021)


Other publications of authors with the same name

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models. ECCV (6), volume 12351 of Lecture Notes in Computer Science, page 565-580. Springer, (2020)

TVQA+: Spatio-Temporal Grounding for Video Question Answering. CoRR, (2019)

A Unified Framework for Manifold Landmarking. IEEE Trans. Signal Process., 66 (21): 5563-5576 (2018)

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval. ECCV (35), volume 13695 of Lecture Notes in Computer Science, page 709-725. Springer, (2022)

Modeling Context in Referring Expressions. ECCV (2), volume 9906 of Lecture Notes in Computer Science, page 69-85. Springer, (2016)

Assistive supernumerary grasping with the back of the hand. ICRA, page 6154-6160. IEEE, (2021)

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks. CVPR, page 2669-2680. IEEE, (2023)

Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations. CVPR, page 14825-14835. IEEE, (2023)

MCMG simulator: A unified simulation framework for CPU and graphic GPU. J. Comput. Syst. Sci., 81 (1): 57-71 (2015)

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression. CoRR, (2023)