Dense Trajectory Notes

Notes of Dense Trajectories

densely sample feature points in each frame
track points in the video based on optical flow.
compute multiple descriptors along the trajectories of feature points to capture shape, appearance and motion information.

Dense Sampling
- Sampling step size $ W=5 $ pixels
- # spatial scales ≤ 8
- Spatial scale increase: $ 1 / \sqrt{2} $
- Removing points in homogeneous areas: $$ T=0.001 \times \max_{i \in l}\min(\lambda_{i}^{1},\lambda_{i}^{2}) $$, where $ (\lambda_{i}^{1},\lambda_{i}^{2}) $ are eigenvalues of point $i$ in image $I$ (the auto-correlation matrix).

where L is the length of trajectory, and the displacement vectors

The features are computed one by one, and each one in a single line, with the following format:

frameNum mean_x mean_y var_x var_y length scale x_pos y_pos t_pos Trajectory HOG HOF MBHx MBHy

The first 10 elements are information about the trajectory:

frameNum: The trajectory ends on which frame
mean_x: The mean value of the x coordinates of the trajectory
mean_y: The mean value of the y coordinates of the trajectory
var_x: The variance of the x coordinates of the trajectory
var_y: The variance of the y coordinates of the trajectory
length: The length of the trajectory
scale: The trajectory is computed on which scale
x_pos: The normalized x position w.r.t. the video (0~0.999), for spatio-temporal pyramid
y_pos: The normalized y position w.r.t. the video (0~0.999), for spatio-temporal pyramid
t_pos: The normalized t position w.r.t. the video (0~0.999), for spatio-temporal pyramid

The following element are five descriptors concatenated one by one:

Explicit camera motion estimation
Assumption: two consecutive frames are related by a homography.
Match feature points between frames using SURF descriptors and dense optical flow
Removing inconsistent matches due to humans: use a human detector to remove matches from human regions (computation expensive)
Estimate a homography with RANSAC with these matches

H Wang, C Schmid, Action recognition with improved trajectories, ICCV 2013
H Wang, A Kläser, C Schmid, CL Liu, Dense trajectories and motion boundary descriptors for action recognition, International Journal of Computer Vision, May 2013, Volume 103, Issue 1, pp 60-79