Gvenet And Alice [2021] «8K»

Text-only pre-training does not adapt well to visual grounding; the AG-ALICE integration requires careful tuning of the attention temperature.
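
To make the attention temperature concrete, here is a minimal sketch of how a temperature rescales attention scores before the softmax. It is a generic illustration, not the AG-ALICE implementation: the function names, the example scores, and the temperature values are all assumptions chosen for the example.

```python
import math


def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw attention scores into weights, dividing by a temperature first.

    Hypothetical helper for illustration; not taken from AG-ALICE.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights):
    """Shannon entropy of the attention weights (higher means more diffuse)."""
    return -sum(w * math.log(w) for w in weights if w > 0)


# Assumed raw scores of one query against three keys.
scores = [2.0, 1.0, 0.1]

sharp = softmax_with_temperature(scores, temperature=0.5)  # more peaked
base = softmax_with_temperature(scores, temperature=1.0)   # plain softmax
flat = softmax_with_temperature(scores, temperature=4.0)   # more uniform

for name, weights in [("sharp", sharp), ("base", base), ("flat", flat)]:
    print(name, [round(w, 3) for w in weights], round(attention_entropy(weights), 3))
```

A temperature below 1 concentrates the weight on the highest-scoring key, while a temperature above 1 spreads it more evenly across keys. That sensitivity is one plausible reason the setting would need careful tuning when text-derived attention is applied to visual inputs.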