Sang, Haifeng, and Ge Hai. “A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description.” European Journal of Applied Sciences 7, no. 4 (September 8, 2019): 17–30. Accessed May 30, 2026. https://scholarpublishing.org/journals/index.php/EJAS/article/view/7862.