MediaCCNY at TRECVID Surveillance Event Detection 2012

Xiaodong Yang1, Chucai Yi 1,2, Liangliang Cao3, and Yingli Tian1,2

1 - Media Lab, Dept. of Electrical Engineering, City College of New York, New York, USA

2 - Dept. of Computer Science, The Graduate Center, City University of New York, New York, USA.

3 - Multimedia Analytics Group, IBM T. J. Watson Research Center, USA.


Automatic event detection in surveillance video has many real-world security applications in public areas such as airports, banks, and supermarkets. To bridge research efforts and real-world applications, TRECVID provides the Surveillance Event Detection (SED) task to evaluate event detection in real-world surveillance settings. We propose a general event detection system, evaluated on all 7 SED events, that achieves top-3 performance in 5 of them.



Crowded scene and cluttered background

Viewpoint and scale change

Illumination, lighting, and low resolution



1) Feature Representations by combining STIP-HOG/HOF and SURF/MHI-HOG

2) Cascade SVM Learning to handle large-scale and imbalanced training data


As shown in the figure above, our system includes 4 main components: (1) low-level feature extraction, (2) video representation, (3) learning event models, and (4) post-processing to localize the temporal extents of events.
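To make the first and last steps of this pipeline concrete, here is a minimal Python sketch of how fixed-length sliding windows might be generated and how window-level scores might be merged into temporal extents. The page does not describe the actual post-processing rule, so everything here is an assumption for illustration: the function names `sliding_windows` and `merge_windows`, the window length and stride, the score threshold, and the choice to merge overlapping or touching positive windows.

```python
def sliding_windows(n_frames, length, stride):
    """Return (start, end) frame ranges of fixed-length windows (hypothetical helper)."""
    return [(s, s + length) for s in range(0, n_frames - length + 1, stride)]


def merge_windows(windows, scores, threshold, max_gap=0):
    """Merge positively scored windows into (start, end, score) events.

    Assumed rule: a window is positive if its score >= threshold; positive
    windows that overlap or lie within max_gap frames of the previous event
    are fused, and the fused event keeps the highest window score.
    Windows are expected to be sorted by start frame.
    """
    events = []
    for (start, end), score in zip(windows, scores):
        if score < threshold:
            continue  # window classified as background
        if events and start - events[-1][1] <= max_gap:
            prev_start, prev_end, prev_score = events[-1]
            events[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            events.append((start, end, score))
    return events


# Illustrative values only: 180 frames, 60-frame windows, 30-frame stride.
windows = sliding_windows(n_frames=180, length=60, stride=30)
scores = [0.9, 0.8, 0.1, 0.2, 0.7]  # made-up classifier outputs, one per window
events = merge_windows(windows, scores, threshold=0.5)
```

With these made-up scores, the first two windows are fused into one event and the last window forms a second one.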

Cascade SVM Learning

The sliding window scheme in our system generates highly imbalanced data: negative samples outnumber positive ones by more than 60 to 1. We propose a cascade SVM algorithm to handle this imbalance. Camera- and event-dependent models are learned to reduce intra-class variance and memory consumption in training.


The figure above presents the pseudocode of Cascade SVM learning.
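Since the pseudocode figure is not reproduced in this text, the following Python sketch shows one common way to build a cascade of linear SVMs for imbalanced data. It is not the authors' algorithm. The assumptions are: each stage is trained on all positives plus a small batch of the remaining negatives, each stage threshold is lowered so that every training positive passes, and negatives scoring below the threshold are discarded before the next stage. The stage trainer is a plain hinge-loss SGD written for self-containment. All function names and parameters are hypothetical.

```python
def score(model, x):
    """Linear decision value w.x + b."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=100, lr=0.01, lam=0.01):
    """Tiny linear SVM: SGD on the L2-regularized hinge loss, labels in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score((w, b), xi)
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularization shrink
            if margin < 1.0:  # sample violates the margin
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_cascade(pos, neg, neg_per_stage, max_stages=5):
    """Train stages until no negatives remain or max_stages is reached."""
    stages = []
    remaining = list(neg)
    for _ in range(max_stages):
        if not remaining:
            break
        batch = remaining[:neg_per_stage]  # small, roughly balanced subset
        model = train_linear_svm(pos + batch, [1] * len(pos) + [-1] * len(batch))
        # Assumed design choice: keep every training positive at each stage.
        threshold = min(score(model, x) for x in pos)
        stages.append((model, threshold))
        survivors = [x for x in remaining if score(model, x) >= threshold]
        if len(survivors) == len(remaining):
            break  # this stage rejected nothing; stop adding stages
        remaining = survivors  # only hard negatives move on
    return stages


def cascade_predict(stages, x):
    """A sample is positive only if it passes every stage."""
    return all(score(model, x) >= threshold for model, threshold in stages)
```

Because each stage sees only a small batch of negatives, the training set per stage stays small and roughly balanced, which is the memory and imbalance benefit a cascade is meant to provide.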



Cascade SVM Learning code is available; it uses LIBLINEAR to train the stage classifiers and the feature mapping in VLFeat to implement the chi-square SVM kernel.
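Pairing a linear solver with a feature map works because an explicit map can turn the additive chi-square kernel into an ordinary dot product. The Python sketch below illustrates the general technique by sampling the kernel's spectrum sech(πω). It is not VLFeat's code, and the function names, the sampling step, and the number of frequencies `n` are assumptions chosen for illustration.

```python
import math


def chi2_kernel(x, y):
    """Exact additive chi-square kernel: sum_i 2 * x_i * y_i / (x_i + y_i)."""
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)


def chi2_feature_map(x, n=10, step=0.3):
    """Map each non-negative value to 2n+1 features whose dot product
    approximates the chi-square kernel (sampled-spectrum approximation)."""
    out = []
    for v in x:
        if v <= 0:
            out.extend([0.0] * (2 * n + 1))  # zero bins contribute nothing
            continue
        log_v = math.log(v)
        out.append(math.sqrt(v * step))  # frequency 0, where sech(0) = 1
        for j in range(1, n + 1):
            omega = j * step
            kappa = 1.0 / math.cosh(math.pi * omega)  # chi-square spectrum
            amp = math.sqrt(2.0 * v * step * kappa)
            out.append(amp * math.cos(omega * log_v))
            out.append(amp * math.sin(omega * log_v))
    return out


# Illustrative histograms (made-up values).
x = [0.2, 0.5, 0.3]
y = [0.4, 0.1, 0.5]
exact = chi2_kernel(x, y)
approx = sum(a * b for a, b in zip(chi2_feature_map(x), chi2_feature_map(y)))
```

After mapping, a linear classifier trained on the expanded features behaves approximately like a chi-square kernel SVM, without storing a kernel matrix.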


- TRECVID Project in Xiaodong Yang's webpage

- MediaCCNY at TRECVID 2012: Surveillance Event Detection