Large Vision-Language Models and Their Creative Use

Visual Question Answering

Training on Synthetic Multimodal Data

Aligning Video and Textual Sequences

Image and Video Captioning

Other Publications