Large Vision-Language Models and Their Creative Use

Training on Synthetic Multimodal Data

Aligning Video and Textual Sequences

Image and Video Captioning

Visual Question Answering
