Topic: vision-language capabilities