LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors.

, , , , and . ICLR, OpenReview.net, (2024)

Other publications of authors with the same name

Masked AutoDecoder is Effective Multi-Task Vision Generalist., , , , , and . CoRR, (2024)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks., , , , , , , , , and 5 other author(s). CoRR, (2023)

Weakly Supervised Monocular 3D Detection with a Single-View Image., , , , and . CoRR, (2024)

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision., , , , , , , , , and 2 other author(s). CVPR, page 17830-17839. IEEE, (2023)

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text., , , , , , , , , and 30 other author(s). CoRR, (2024)

Scene as Occupancy., , , , , , , , , and 1 other author(s). ICCV, page 8372-8381. IEEE, (2023)

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information., , , , , , , , , and . CoRR, (2022)

Needle In A Multimodal Haystack., , , , , , , , , and 6 other author(s). CoRR, (2024)

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions., , , , , , , , , and 2 other author(s). CVPR, page 14408-14419. IEEE, (2023)

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures., , , , , , , , , and . CoRR, (2024)