Cross-Modal Categorisation of User-Generated Video Sequences

Abstract

This paper investigates cross-modal classification of multimedia documents on social media platforms. Our framework predicts the user-chosen category of consumer-produced video sequences based on their textual and visual features. The textual resources, which include metadata and automatic speech recognition transcripts, are represented as bags of words, and the video content is represented as a bag of clustered local visual features. We investigate the contribution of the individual modalities and how they should be combined when sequences lack certain resources. To this end, several classification methods are evaluated while varying the available resources. The best-performing approach achieves a mean average precision of 0.3977 using user-contributed metadata in combination with clustered SURF features.
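To make the two representations mentioned above more concrete, the following is a minimal sketch, not the paper's implementation: it builds a bag-of-words vector for the metadata/ASR text and a bag-of-visual-words histogram by clustering local descriptors with k-means (the paper uses SURF descriptors; how they are extracted is assumed here, and all function and variable names are illustrative).

```python
# Sketch of the textual and visual representations described in the abstract.
# Assumption: SURF descriptors have already been extracted per video
# (one array of shape (n_i, 64) per video); texts are raw metadata/ASR strings.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def build_visual_vocabulary(descriptor_sets, vocab_size=1000):
    """Cluster all local descriptors into a visual vocabulary via k-means."""
    all_descriptors = np.vstack(descriptor_sets)
    kmeans = MiniBatchKMeans(n_clusters=vocab_size, random_state=0)
    kmeans.fit(all_descriptors)
    return kmeans

def bag_of_visual_words(descriptors, kmeans):
    """L1-normalised histogram of visual-word assignments for one video."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Text side: metadata and ASR transcripts as plain bags of words.
vectorizer = CountVectorizer(lowercase=True, stop_words='english')

# Illustrative training flow (texts, descriptor_sets, labels are placeholders):
# X_text = vectorizer.fit_transform(texts)
# kmeans = build_visual_vocabulary(descriptor_sets)
# X_vis = np.array([bag_of_visual_words(d, kmeans) for d in descriptor_sets])
# clf_text = LinearSVC().fit(X_text, labels)  # one classifier per modality
```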

Paper

People
Sebastian Schmiedeke, Pascal Kelm, and Thomas Sikora


Citation
Sebastian Schmiedeke, Pascal Kelm, and Thomas Sikora. Cross-Modal Categorisation of User-Generated Video Sequences. In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR), 2012.

Download
via ACM
[Embedded video player: Categorisation of webvideos]

Demonstrator

This demonstrator shows a random video from the Genre Tagging dataset together with the classification confidence of each modality and the fused result; a sketch of such a fusion step follows the example below.


ASR Transcripts: 4.317% (sports)
Metadata: 100% (sports)
Visual features: 7.692% (default_category)
Fusion: 100% (sports)
Ground truth: sports
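The exact fusion rule used by the demonstrator is not given on this page; the snippet below is a hypothetical score-level (late) fusion that combines the per-category confidences of the three modalities by a weighted sum and returns the highest-scoring category. Weights and score values are illustrative only.

```python
# Hypothetical late-fusion sketch: weighted sum of per-class scores from the
# three modalities, then argmax over categories. Not the paper's method.
def fuse_scores(modality_scores, weights):
    """modality_scores: {modality: {category: confidence}}; weights: {modality: float}."""
    fused = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)
        for category, confidence in scores.items():
            fused[category] = fused.get(category, 0.0) + w * confidence
    return max(fused, key=fused.get)

# Illustrative values loosely mirroring the demonstrator output above.
example = {
    'asr':      {'sports': 0.043, 'default_category': 0.10},
    'metadata': {'sports': 1.00},
    'visual':   {'default_category': 0.077, 'sports': 0.02},
}
print(fuse_scores(example, {'asr': 0.2, 'metadata': 0.6, 'visual': 0.2}))
# -> 'sports'
```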

Related Papers

Funding

We would like to acknowledge the 2011 Genre Tagging Task of the MediaEval Multimedia Benchmark for providing the data used in this research. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7) under grant agreement numbers 216444 (NoE PetaMedia) and 261743 (NoE VideoSense).

Comments and questions to schmiedeke[a]nue.tu-berlin.de