Large Vision-Language Models and Their Creative Use
Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, and Steven Hoi. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. Advances in Neural Information Processing Systems (NeurIPS). 2023.
Luo Jiayun, Siddhesh Khandelwal, Leonid Sigal, and Boyang Li. Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024.
Training on Synthetic Multimodal Data
Aligning Video and Textual Sequences
Yidan Sun, Qin Chao, Yangfeng Ji, and Boyang Li. Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding. ArXiv Preprint 2203.05711. 2022. [New Dataset]
Jianan Wang, Boyang Li, Xiangyu Fan, Jing Lin, and Yanwei Fu. Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions. The IEEE Winter Conference on Applications of Computer Vision (WACV). 2021. [Supplemental Material] [Video] [Code & Data]
Pelin Dogan, Boyang Li, Leonid Sigal, and Markus Gross. A Neural Multi-sequence Alignment TeCHnique (NeuMATCH). The Conference on Computer Vision and Pattern Recognition (CVPR). 2018. [Data]
Image and Video Captioning
Visual Question Answering
Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, and Steven CH Hoi. From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023.
Anthony Meng Huat Tiong, Junnan Li, Boyang Li, Silvio Savarese, and Steven C.H. Hoi. Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training. Findings of the Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP). 2022.
All Publications
Luo Jiayun, Siddhesh Khandelwal, Leonid Sigal, and Boyang Li. Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024.
Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, and Steven Hoi. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. Advances in Neural Information Processing Systems (NeurIPS). 2023.
Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, and Boyang Li. Training Multimedia Event Extraction With Generated Images and Captions. The ACM International Conference on Multimedia (ACM MM). 2023. [Code]
Qin Chao, Eunsoo Kim, and Boyang Li. Movie Box Office Prediction With Self-Supervised and Visually Grounded Pretraining. IEEE International Conference on Multimedia and Expo (ICME). 2023.
Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, and Steven CH Hoi. From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023.
Anthony Meng Huat Tiong, Junnan Li, Boyang Li, Silvio Savarese, and Steven C.H. Hoi. Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training. Findings of the Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP). 2022.
Jun Chen, Han Guo, Kai Yi, Boyang Li, and Mohamed Elhoseiny. VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022.
Jianan Wang, Boyang Li, Xiangyu Fan, Jing Lin, and Yanwei Fu. Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions. The IEEE Winter Conference on Applications of Computer Vision (WACV). 2021. [Supplemental Material]
Guoyun Tu, Yanwei Fu, Boyang Li, Jiarui Gao, Yu-Gang Jiang, and Xiangyang Xue. A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization. IEEE Transactions on Multimedia. 2019.
Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, and Kate Saenko. Joint Event Detection and Description in Continuous Video Streams. The IEEE Winter Conference on Applications of Computer Vision (WACV). 2019.
Pelin Dogan, Boyang Li, Leonid Sigal, and Markus Gross. A Neural Multi-sequence Alignment TeCHnique (NeuMATCH). The Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, and Leonid Sigal. Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization. IEEE Transactions on Affective Computing. 2016.
Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, and Leonid Sigal. Video Emotion Recognition with Transferred Deep Feature Encodings. The 2016 ACM International Conference on Multimedia Retrieval. New York, NY. 2016.