Space-Time Shapelets for Action Recognition


People:

Dhruv Batra, Tsuhan Chen, Rahul Sukthankar


Keywords:

Action recognition, mid-level spatio-temporal features, bag-of-words

Figure 1: Weizmann Dataset: The rows represent different actions, while the columns show different people performing those actions.

Abstract:


Recent work in action recognition treats actions as space-time volumes, converting each action into a 3-D shape and the recognition problem into one of volumetric matching. However, the special nature of the temporal dimension and the lack of intuitive volumetric features make the problem both challenging and interesting. In a data-driven, bottom-up approach, we propose a dictionary of mid-level features called Space-Time Shapelets. This dictionary characterizes the space of local space-time shapes, or equivalently the local motion patterns formed by actions. Representing an action as a bag of these space-time patterns reduces the combinatorial space of the volumes and lends robustness to partial occlusions and to errors in extracting spatial support. The proposed method is computationally efficient and achieves competitive results on a standard dataset.
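The pipeline can be summarized in a short sketch: extract local space-time patches from binary silhouette volumes, cluster them into a dictionary, and describe each clip as a histogram over that dictionary. This is a minimal illustration rather than the paper's implementation: the function names, the stride, the dictionary size k, and the use of scikit-learn's k-means (the paper uses pseudo-medians as cluster representatives) are all assumptions made here for concreteness.

    import numpy as np
    from sklearn.cluster import KMeans

    PATCH = 7  # local space-time volume size (7x7x7, as in Figure 2)

    def extract_local_volumes(silhouette, stride=4):
        """Slide a 7x7x7 window over a binary silhouette volume (T x H x W)
        and return flattened local patches that lie on the moving boundary."""
        T, H, W = silhouette.shape
        patches = []
        for t in range(0, T - PATCH + 1, stride):
            for y in range(0, H - PATCH + 1, stride):
                for x in range(0, W - PATCH + 1, stride):
                    p = silhouette[t:t+PATCH, y:y+PATCH, x:x+PATCH]
                    if 0 < p.sum() < PATCH**3:  # skip all-background / all-object
                        patches.append(p.reshape(-1))
        return np.array(patches, dtype=np.float32)

    def build_dictionary(training_volumes, k=100):
        """Cluster all local patches from the training clips; the k cluster
        centers stand in for the shapelet dictionary (k-means here, in place
        of the pseudo-medians used in the paper)."""
        all_patches = np.vstack([extract_local_volumes(v) for v in training_volumes])
        return KMeans(n_clusters=k, n_init=5, random_state=0).fit(all_patches)

    def bag_of_shapelets(silhouette, kmeans):
        """Represent one action clip as a normalized histogram over shapelets."""
        patches = extract_local_volumes(silhouette)
        hist = np.bincount(kmeans.predict(patches), minlength=kmeans.n_clusters)
        return hist / max(hist.sum(), 1)

The resulting fixed-length histograms could then be fed to any standard classifier, which is what makes the bag-of-words view attractive: matching histograms sidesteps explicit volumetric alignment.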
Figure 2: Space-Time Shapelets: The data points closest to a few cluster centers (pseudo-medians), built from 7×7×7 volumes. The temporal dimension is marked to make the motion easier to visualize.
Figure 3: Unrolling volumes: Each row depicts a shapelet as a volume, followed by the x-y time slices (frames) that make up that volume. In the frames, white represents object pixels, black represents background, and gray pixels are included only to provide contrast for illustration.
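For completeness, a hypothetical snippet showing how a shapelet could be unrolled into x-y time slices as in Figure 3; it assumes shapelets are stored as flattened 7×7×7 arrays, as in the sketch above.

    import numpy as np
    import matplotlib.pyplot as plt

    def unroll(shapelet, patch=7):
        """Plot a flattened patch^3 shapelet as its x-y time slices."""
        frames = shapelet.reshape(patch, patch, patch)  # (t, y, x)
        fig, axes = plt.subplots(1, patch, figsize=(2 * patch, 2))
        for t, ax in enumerate(axes):
            ax.imshow(frames[t], cmap="gray", vmin=0, vmax=1)
            ax.set_title(f"t={t}")
            ax.axis("off")
        plt.show()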

Publication:


(Oral) Dhruv Batra, Tsuhan Chen, and Rahul Sukthankar. Space-Time Shapelets for Action Recognition. IEEE Workshop on Motion and Video Computing (WMVC '08), IEEE Winter Vision Meetings, 2008.
[ pdf ]