Multiple subsequence combination in human action recognition
Soda P;Iannello G
2014-01-01
Abstract
Human action recognition is an active research area with applications in several domains, such as visual surveillance, video retrieval and human-computer interaction. Current approaches assign action labels to video streams by treating the whole video as a single sequence but, in some cases, the large variability between frames may lead to misclassifications. The authors propose a multiple subsequence combination (MSC) method that divides the video into several consecutive subsequences. It applies part-based and bag-of-visual-words approaches to classify each subsequence, then combines the subsequence labels to assign an action label to the video. The proposed approach was tested on the KTH, UCF Sports, YouTube and Robo-Kitchen datasets, which differ widely in video length, object appearance and pose, object scale, viewpoint and background, as well as in the number, type and complexity of the actions performed. Two main results were achieved. First, the MSC approach performs better than classifying the video as a whole, even when few subsequences are used. Second, the approach is robust and stable since, for each dataset, its performance is comparable to that of the state-of-the-art part-based approach.
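
The three steps named in the abstract (split the video into consecutive subsequences, classify each one, combine the labels) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the combination rule, so majority voting is assumed here, and the function names, the equal-length split and the `classify_subsequence` callback (standing in for the part-based or bag-of-visual-words classifier) are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence


def split_into_subsequences(frames: Sequence, n_subsequences: int) -> List[Sequence]:
    """Split frames into consecutive, non-overlapping chunks of near-equal length.

    Assumption: the abstract only says "consecutive subsequences";
    equal-length chunks are an illustrative choice.
    """
    if n_subsequences < 1:
        raise ValueError("n_subsequences must be at least 1")
    if len(frames) < n_subsequences:
        raise ValueError("fewer frames than subsequences")
    base, extra = divmod(len(frames), n_subsequences)
    chunks, start = [], 0
    for i in range(n_subsequences):
        # The first `extra` chunks receive one additional frame.
        end = start + base + (1 if i < extra else 0)
        chunks.append(frames[start:end])
        start = end
    return chunks


def combine_labels(labels: Sequence[str]) -> str:
    """Majority vote over subsequence labels (assumed rule, not from the source).

    Ties are resolved in favour of the label that appears first.
    """
    return Counter(labels).most_common(1)[0][0]


def classify_video(
    frames: Sequence,
    classify_subsequence: Callable[[Sequence], str],
    n_subsequences: int = 3,
) -> str:
    """Label a video by classifying each subsequence and combining the labels."""
    chunks = split_into_subsequences(frames, n_subsequences)
    return combine_labels([classify_subsequence(chunk) for chunk in chunks])
```

In this sketch a video whose subsequences are labelled "walking", "walking" and "running" would be assigned "walking", which shows how a single atypical segment need not determine the label of the whole video.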