Key Frame Extraction from Motion Capture Data by Curve
Saliency
Eyuphan Bulut
Tolga Capin
Bilkent University
Department of Computer Engineering
06800 Ankara, Turkey
Bilkent University
Department of Computer Engineering
06800 Ankara, Turkey
eyuphan@cs.bilkent.edu.tr
tcapin@cs.bilkent.edu.tr
ABSTRACT
We propose a new method for extracting key frames from a
motion capture sequence. Our proposed approach consists of two
steps. In the first step, we propose a new metric, curve saliency,
for motion curves that specifies the important frames of the
motion. In the second step, we detect the final key frames by
clustering the computed important frames. As a result of our
experimental results, on the average, by using only 3.7% of all
frames as key frames, we can represent the captured motion
sequence
way that they treat motion sequences. We compare these solutions
in the related work section.
In this paper, we propose a new approach to find key frames in a
motion captured sequence. We treat the input motion sequence as
a curve, and find the most salient parts of this curve which are
crucial in the representation of the motion behavior. We apply the
idea of saliency to motion curves in the first part of our algorithm.
Then in the second part, we apply key frame reduction techniques
in order to obtain the most important key frames of the motion.
I.3.7 [Computer Graphics]: Three-Dimensional Graphics and
Realism: Animation, Visible line/surface algorithms.
The remainder of the paper is organized as follows: Section 2
reviews previous research. In Section 3 we introduce our
proposed approach. In Section 4 we present our experimental
results. Finally, in Section 5 we summarize our approach for key
frame extraction.
General Terms
2. RELATED WORK
Algorithms, Management, Performance, Design, Economics,
Theory
There have been various proposed solutions to keyframe
extraction in literature. The techniques used mainly fall into three
categories: Curve Simplification, Clustering, and Matrix
Factorization. The basic approaches of each of these techniques
are as follows:
Categories and Subject Descriptors
Keywords
Virtual human animation, motion capture, key frame extraction
1. INTRODUCTION
In the last decade, we have witnessed the rising significance of
motion capture for several applications. Movies and games have
started to widely use motion capture systems. At the same time,
the storage and transmission of motion capture content has
become a problem due to their tremendous size. The need for
more compact representation has lead researchers to investigate
the ways of handling these large data.
Another problem with motion capture is editing of motion
sequences. Different from other animation techniques (such as
procedural and keyframe animation), it is difficult to edit a motion
capture content. Various researchers have proposed solutions,
such as stylization, to modify an input motion sequence.
Keyframing is one of the most effective ways of achieving this.
The important frames of a motion are selected to be the key
frames and the others are computed via the interpolation
techniques by using the key frames. To edit a motion capture
sequence, this emerges a new problem, “which frames of the
motion will we choose as the key frames?” Up to now, there have
been a number of approaches proposed for the solution of this
problem. They essentially differ from each other in terms of the
Curve simplification: In this method, the motion sequence is
represented as a trajectory curve in high-dimensional feature
space and the curve simplification algorithms are applied to these
trajectory curves. The extracted key frames are the junctions
between simplified curve segments.
Clustering: Motions are defined with feature sets and the frames
of the motion data are clustered in terms of these features. Key
frames are selected by selecting a frame in each cluster. (i.e. [12])
Matrix Factorization: The frames of motion data are represented
as matrices such as feature frame matrices formed by color
histograms of frames. Then by using techniques such as singular
value decomposition (SVD) [7] and low-order discrete cosine
term (DCT) [10], the summary of the motion is constructed.
These three categories differ from each other in terms of the
representation of motions and there are various works done in
each category. Our algorithm belongs to the first approach,
therefore we will survey the previous approaches for curve
simplification in detail, and analyze the advantage of our
algorithm over these methods.
An initial work of curve simplification belongs to Lim and
Thalmann [3]. Their method uses Lowe’s algorithm [6] for curve
simplification, which represents the values of a single joint over a
motion sequence. Starting with the line which combines the start
and end points of the curve, the algorithm divides the line into
two sub-lines, if the maximum deviation of any point on the curve
from the line is larger than a certain error rate. The algorithm does
the same process recursively for each sub-line, until the error rate
is small enough for each sub-line. Figure 1 illustrates this idea.
A point that has “large angle difference” on both sides. That
is, the point which is at least 50% of the amplitude far away
from the neighbor frames.
Having the fixed frames of the motion that can not be deleted, the
authors apply reduction operations to the other characteristic
frames and find the key frames of the motion. However, this
method is not optimal, e.g. on the average, 55% of all frames are
selected to be key frames of a motion.
•
3. OUR APPROACH FOR KEYFRAME
EXTRACTION
Our approach consists of two main steps. The first step is applying
a curve saliency metric to the motion curves to measure the
importance of each frame. This results in a number of candidate
key frames. The second step is reducing the number of candidate
key frames by applying clustering methods and selecting only
sufficient number of frames from each cluster.
Figure 1. The steps of curve simplification method used in [3].
From (a) to (d) the line segment is separated from the point
which has the longest distance to the line.
Another approach that aims to find the key frames by dealing with
motion curves is presented in the work of Okuda et al. [4] [5].
This approach detects the key frames in motion capture data by
using frame decimation. The frames are decimated one by one,
according to their importance. When a desired number of key
frames are obtained, the algorithm stops.
Lee et al. [8] have introduced the approach of mesh saliency to
computer graphics. Their work aims to find a metric to define the
most important parts of an input mesh that could be used in mesh
simplification and viewpoint selection.
The mesh saliency of a vertex v is calculated as follows:
S(v) = |G(C(v), )-G(C(v),2 )|
(2)
In other words, the saliency of a vertex is defined as the absolute
difference value between the Gaussian weighted averages
computed at fine ( ) and coarse (2 ) scales.
In the above formula, C is a curvature map from each vertex of a
mesh to its mean curvature and C (v) denotes the mean curvature
of vertex v.
3.1 Curve Saliency
Inspired from mesh saliency approach, we compute the curve
saliency of each point (i.e. frame) in the curve in order to find its
importance for the representation of the motion.
Figure 2. Decimation of a frame and further updates done for
neighbors
In order to handle the results of each curve (representing a degree
of freedom), the authors define a weight function, W(j) which is
the weight of jth curve. Then the total error is defined as
En(k)= W (j) D (j, k)
(1)
n
The algorithm deletes the frame with the minimum E (k). And the
frames are decimated until the number of frames reaches to the
desired number of key frames. This approach gives the near-tooptimal result, however it has a higher complexity due to the
calculation of errors at each step.
Another work is Matsuda and Kondo’s approach [9]. First, the
proposed solution finds the fixed frames of the motion which
satisfy one of the following:
•
•
Local minimum or maximum value
One of the end points of a straight line
The curve saliency of each point is computed as follows. First, we
compute the Gaussian weighted average value of a point assuming
a Gaussian distribution with mean 0 and standard deviation and
centered at that point. Then we calculate a similar value with
standard deviation 2 . The curve saliency value for that point then
becomes the absolute difference of these two values. If a point is
significant for the motion, it is due to its location on the curve.
That is, if its value shows a remarkable change according to the
values of neighboring points, it is an important point. Therefore, if
a remarkable change occurs in the value of the point between the
results of the two Gaussians, then its curve saliency has a higher
value. Figure 3 shows the saliency values on a sample motion
curve.
After the calculation of the saliency value for each point on the
curve, we define the points having a saliency value greater than
average saliency for the motion as candidate key frames. Figure 4
shows the result of this process on two sample motion curves.
The set of candidate keyframes include groups of consecutive
frames, which we can treat as clusters. When we consider these
clusters independent from each other, we notice that one frame
can easily reflect the characteristic of the frames in the cluster.
Therefore, we only select the frame with local maximum or local
minimum value among the frames in the cluster.
Fcluster (i) = fki, fki+1, fki+2 … fki+mi
Figure 3. Saliency values of frames are indicated in the motion
curve of x axis angle of left upper leg joint in walking action
fki
fki+1
…
fki+t
fki+t+1 …
fki+mi
(4)
fki
fki+1
…
fki+t
fki+t+1 …
fki+mi
(5)
The situations illustrated in (4) and (5) result the selection of
frame fki+t as keyframe.
Figure 5 shows the local minimum and maximums on a sample
curve.
Figure 5. Decrease the number of candidate key frames by
only selecting local minimums or local maximums from each
cluster.
After we apply the reduction of candidate key frames as described
above, the resulting number of key frames becomes satisfactory.
Figure 6 shows the result of the first reduction operation.
Figure 4. Candidate key frames are indicated in the motion
curves of (above) y axis angle of left upper leg joint in jumping
action (below) x axis angle of left upper leg joint in walking
action.
3.2 Reduction
Selecting the frames that have higher curve saliency values than
the average saliency value results an excessive number of
candidate keyframes. Since not all of them are needed in order to
represent the characteristics of the motion, we select the important
ones among these candidate keyframes.
As it is seen in Figure 3, the candidate key frames form clusters.
Let F be the set of all frames and Fkeyframe be the set of candidate
keyframes. Then they are formulated as follows:
Figure 6. Decrease the number of candidate key frames by
only selecting local minimums or local maximums from each
cluster
Since the joints are represented by three angles, there occur three
different sets of key frames for each joint. Let Fkeyframe-X, Fkeyframe-Y
and Fkeyframe-Z be the keyframe sets of each angle space.
F keyframe-X = {fx1, fx2, fx3, ... fxm1}
F = {f1, f2, f3, … fn-1, fn}
F keyframe-Y = {fy1, fy2, fy3, ... fym2}
F keyframe = {f1,
fk1, fk1+1, fk1+2, ... fk1+m1,
F keyframe-Z = {fz1,fz2,fz3...fzm3}
fk2, fk2+1, fk2+2, … fk2+m2,
Since all angle spaces are equally important for the motion, we
should consider equal contribution of decided keyframes of each
angle space to the final list of keyframes of the whole motion.
Therefore, as the next step we combine all the decided key frames
of each angle by taking union of them. Since the keyframes are
……..
fkn, fkn+1, fkn+2, … fkn+mn, fn}
(3)
(6)
selected for each angle curve independently, combination of them
may result closer keyframes. (i.e. 10 frames) That is why, as the
final step, we delete the least important key frames among the
ones that are close to each other. Figure 7 shows the combination
of all key frames and the final key frames after the deletion of
closer ones on the curve of x axis angle.
Table 2. Final number of key frames decided after reduction of
close key frames
Graphics
Top
In-between
Bottom
Before
51
97
31
After
33
36
25
All
800
800
800
Ratio
4.1%
4.5%
3.1%
We have also computed the mean absolute error rate of our
algorithm using the below formula:
E= [( |Fo(k) – Fr(k)|2 )] / N
(7)
o
Here F (k) and Fr(k) are the values of joints in original and
reconstructed motion respectively. Figure 8 shows the comparison
of this error with two previous works (Frame decimation [3] and
curve simplification [5]) in a sample walking motion.
Figure 7. Combination of all key frames (above) and the final
key frames after deletion of close key frames (below) are
shown on x axis angle curve of left upper leg joint in walking
motion.
4. RESULTS
For experimental results, we have used the motion capture
database from CMU [11]. As the first step, we have tested our
algorithm on three different motions; walking, jumping and
playing with a sword. In general, we have used 800 frames of the
motion captured sequences.
Table 1 shows the number of key frames decided for each axis
angle independent from each other and as a total in these three
actions.
Table 1. Number of key frames for each axis angle and total in
three different actions.
Walking
Jumping
Sword
x
13
16
7
y
31
31
22
z
12
58
9
Total
51
97
31
Furthermore, when we apply the last step of reduction of key
frames on these motions, we obtained the results in Table 2. On
the average, we achieve a compression rate of 27 times for the
motions.
Figure 8. Comparison of Mean Absolute Errors in three
methods
Our algorithm creates an error rate closer to [3] but remarkable
bigger than [5]. This is an expected result because [5] creates the
optimum keyframe selection that achieve minimum error.
However, our algorithm overcomes both methods in terms of
computation. Our algorithm does not only provide a method for
keyframe selection, but also shows an efficient way of achieving
it. Moreover, since our algorithm decides keyframes by looking
the saliency metric of frames in a window of closer frames, it does
not need all frames of the motion to decide whether it is a
keyframe or not. Therefore our algorithm can achieve fast and
efficient keyframe selection in streaming and real-time motions.
5. CONCLUSION
In this paper, we have proposed a new approach for key frame
extraction from motion capture sequences. We find the most
important points of the motion curves via computation of a new
curve saliency metric. Curve saliency is computed simply by
taking the absolute difference between the Gaussian weighted
averages of each point computed at fine ( ) and coarse (2 )
scales. Obtaining the candidate key frames from this approach; we
get rid of redundant key frames with reduction operations. Based
on the experimental results, motion captured sequences can be
represented by only 3.7% of all captured frames.
6. ACKNOWLEDGMENTS
This work was supported by EC FP6 3DTV Project (Grant no:
FP6-511568) and Bilkent University Research Development
Grant.
7. REFERENCES
[1] K. Huang, C. Chang, Y. Hsu, S. Yang. “KeyProbe: A
Technique for Animation Keyframe Extraction” The Visual
Computer, Volume 21, pp. 532-541, 2005
[2] Assa, J., Caspi, Y., Cohen-Or, D. “Action synopsis: Pose
selection and illustration.” ACM Transactions on Graphics
(SIGGRAPH) 24(3), 667–676 (2005)
[3] S. Lim and D. Thalmann, “Key-posture extraction out of
human motion data by curve simplification”, In Proceedings
of 23rd Annual International Conference of the IEEE
Engineering in Medicine and Biology Society (EMBC2001),
vol.2, pp. 1167-1169, 2001
[4] S. Li, M. Okuda, S. Takahashi, “Embedded Key Frame
Extraction for CG Animation by Frame Decimation”, ICME,
2005
[5] H. Togawa, M. Okuda, “Position-Based Keyframe Selection
for Human Motion Animation”, International Workshop on
Network-based Virtual Reality and Tele-existence
[6] D. G. Lowe. “Three-dimensional object recognition from
single two dimensional images Artificial Intelligence”,
31:355-395, 1987.
[7] Gong, Y., Liu, X: “Video summarization using singular
value decomposition”. In Computer Vision and Pattern
Recognition, pp 174-180 (2000)
[8] C. H. Lee, A. Varshney, and D. W. Jacob. Mesh saliency.
ACM Trans. on Graphics, 659-666, 2005.
[9] Matsuda K., Kondo K., “Keyframes Extraction Method for
Motion Capture Data” Journal for Geometry and Graphics,
Volume 8 (2004), No.1 81-90
[10] Cooper, M., Foote, J.: Summarizing video using nonnegative similarity matrix factorization. In: IEEE Workshop
on Multimedia Signal Processing, pp. 25– 28 (2002)
[11] http://mocap.cs.cmu.edu
[12] Liu, F., Zhuang, Y., Wu, F., Pan, Y.: 3d motion retrieval
with motion index tree. Comput. Vis. Image Understand.
92(2), 265–284 (2003)