Name: Joined Video Description from Multiple Sources
Start: 2025-02-19T11:45:00+0000
End: 2025-02-19T13:15:00+0000

The 10th International Congress on Information and Communication Technology, held in conjunction with the ICT Excellence Awards (ICICT 2025), takes place in London, United Kingdom, February 18-21, 2025.

Wednesday February 19, 2025 11:45am - 1:15pm GMT

Virtual Room C

Authors - Francisco Seipel-Soubrier, Jonathan Cyriax Brast, Eicke Godehardt, Jorg Schafer
Abstract - We propose a proof-of-concept architecture for automated video summarization and evaluate its performance, addressing the challenges posed by the increasing prevalence of video content. The research focuses on a multi-modal approach that integrates audio and visual analysis techniques to generate comprehensive video descriptions. Evaluation of the system across various video genres revealed that, while video-based large language models improve on image-only models, they still struggle to capture nuanced visual narratives, producing generalized output for videos without a strong speech-based narrative. The multi-modal approach generated useful short summaries for most video types, but for speech-heavy videos in particular it offers minimal advantages over speech-only processing. The generation of textual alternatives and descriptive transcripts showed promise, although results were stable primarily for speech-heavy videos. Further investigation into refinement techniques, together with advances in video-based large language models, may improve performance in the future.

Paper Presenters

Jonathan Cyriax Brast

Germany
