R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. (2016). arXiv:1610.02391. Comment: This version was published in the International Journal of Computer Vision (IJCV) in 2019; a previous version of the paper was published at the International Conference on Computer Vision (ICCV'17).