Author of the publication

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations.

, , , , , , , , , , and . AAAI, page 2417-2425. AAAI Press, (2024)


Other publications of authors with the same name

Video Dialog via Multi-Grained Convolutional Self-Attention Context Networks., , , , , and . SIGIR, page 465-474. ACM, (2019)

Learning Max-Margin GeoSocial Multimedia Network Representations for Point-of-Interest Suggestion., , , , , , and . SIGIR, page 833-836. ACM, (2017)

Saliency based proposal refinement in robotic vision., , and . RCAR, page 85-90. IEEE, (2017)

Video Question Answering via Knowledge-based Progressive Spatial-Temporal Attention Network., , , , , and . TOMM, 15 (2s): 52:1-52:22 (2019)

Efficient location-based search of trajectories with location importance., , , and . Knowl. Inf. Syst., 45 (1): 215-245 (2015)

Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks., , , , , and . CoRR, (2023)

User Preference Learning for Online Social Recommendation., , , , and . IEEE Trans. Knowl. Data Eng., 28 (9): 2522-2534 (2016)

TaoHighlight: Commodity-Aware Multi-Modal Video Highlight Detection in E-Commerce., , , , , and . IEEE Trans. Multim., (2022)

Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering., , and . CoRR, (2022)

Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models., , , , and . CoRR, (2023)