SANG, Haifeng; HAI, Ge. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. European Journal of Applied Sciences, [S. l.], v. 7, n. 4, p. 17–30, 2019. DOI: 10.14738/aivp.74.6717. Available at: https://scholarpublishing.org/journals/index.php/EJAS/article/view/7862. Accessed: 26 Jul. 2026.