Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models.

, , , , , and . ECCV (6), volume 12351 of Lecture Notes in Computer Science, pp. 565-580. Springer, (2020)

Other publications by persons with the same name

A quantitative quality control method of big data in cancer patients using artificial neural network., , , , , , and . CCIS, pp. 499-504. IEEE, (2014)

MCMG simulator: A unified simulation framework for CPU and graphic GPU., , , and . J. Comput. Syst. Sci., 81 (1): 57-71 (2015)

AVID: Any-Length Video Inpainting with Diffusion Model., , , , , , , , and . CoRR, (2023)

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression., , , , , , , , , and 7 other author(s). CoRR, (2023)

Improving branch divergence performance on GPGPU with a new PDOM stack and multi-level warp scheduling., , , and . J. Syst. Archit., 60 (5): 420-430 (2014)

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval., , , , , , and . KDD, pp. 4433-4442. ACM, (2022)

CiT: Curation in Training for Effective Vision-Language Data., , , , , , , and . ICCV, pp. 15134-15143. IEEE, (2023)

Question Answering, Grounding, and Generation for Vision and Language.. University of North Carolina, Chapel Hill, USA, (2019) base-search.net (ftcarolinadr:cdr.lib.unc.edu:7h149v557).

BachGAN: High-Resolution Image Synthesis From Salient Object Layout., , , , , and . CVPR, pp. 8362-8371. Computer Vision Foundation / IEEE, (2020)

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation., , , , , , and . CVPR, pp. 10681-10692. IEEE, (2023)