Sang H, Hai G. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. EJAS [Internet]. 2019 Sep. 8 [cited 2026 Jul. 26];7(4):17-30. Available from: https://scholarpublishing.org/journals/index.php/EJAS/article/view/7862