Second International Conference on Emerging Trends in Engineering and Technology, ICETET-09
Gesture Based Music Generation
Jay Shankar Prasad, G. C. Nandi
Robotics & AI Lab
Indian Institute of Information Technology
Allahabad, India
{jsp, gcnandi}@iiita.ac.in

Amit Kumar
RGIIT, Amethi Campus
Indian Institute of Information Technology
Allahabad, India
akumariiit@gmail.com
Abstract— We designed and developed a framework for the generation of music and the synchronization of dance on a humanoid robot. Gestures are well suited for communication and are used here to produce entertaining rhythmic musical patterns. We applied two different music generation approaches: random motion and human gesture. The linear and angular body movements are extracted and several motion features are computed; these features are mapped to musical knowledge to generate the rhythmic pattern. We synchronized the resulting music with the dance of the HOAP-2 humanoid robot on the Webots simulator. We found very interesting musical patterns, which suggests that gesture can be used for the composition of music.
Keywords-Gesture; Music; Humanoid Robot; Webots
I. INTRODUCTION

We use gestures in our daily lives to communicate our thoughts, messages, and ideas. Gestures are therefore close to universally acceptable, and systems based on them are more broadly usable than many alternatives. Gestures have usually been applied for control purposes, but they have many more applications, and some of these areas need the attention of researchers. Using human gestures for music generation and dance synchronization is one such important area of research. Previous work has mostly relied on hardware for music generation; the software side is yet to be fully explored. We applied human gestures to music generation, and our framework is able to produce synchronized dance and music. We supplied video input captured from a webcam to the software SkillSpector [16]; the body-joint angles, positions, velocities, and accelerations are the salient parameters from which the musical pattern is synthesized, and the intensity and movement pattern of the body joints are the key inputs for producing the musical rhythm. A gesture-based approach to music generation does not require knowledge of music, so our framework can be used by musicians and music laypersons alike. SkillSpector is open software for body-dynamics analysis, mainly used by sports coaches; here we use it to determine the body kinematics, which are applied to the Digital Central Pattern Generator (DCPG). The DCPG requires digital input for rhythmic pattern generation, and the extracted speed parameter serves this purpose.

The rest of the paper is arranged as follows. Section II describes the related previous research. Section III explains the detailed methodology. Section IV discusses the implementation of the framework. Section V presents the results and their analysis. Section VI gives the conclusion and future work. References are listed at the end.

II. RELATED WORK
The generation of music from gesture has been studied in the past [1, 11, 13]. Cyber Composer [1] uses human hand gestures for music composition: it generates music according to hand motion and a few gestures, in the absence of real musical synthesizers. Its authors considered music theory and analyzed the melody flow and musical expressions such as pitch, rhythm, and loudness. Many motion-sensing devices have been used for gesture recording [1, 2, 5]; the sensor information actuates the different musical patterns, and in [2, 12] thumb-sensor values are utilized. A music-generating system requires musical notes, which are produced with MIDI (Musical Instrument Digital Interface). In [1, 7], a music interface module is discussed that is responsible for turning the produced musical pattern into a MIDI sequence. This type of system can be utilized by musicians and music laypersons alike. Background music, which enhances the quality of the output, is used in [1, 8] with the help of a melody generation module and a music creation module. Music generation and the playing of a musical instrument by a robotic setup are explained in [2]. Our problem is similar to this; the difference is that our system works in a simulated environment. In [2], the robot performs with human artists, and perfect synchronization between human and robot is maintained during the performance. Synchronizing the various tasks requires considerable hardware, software, and networking arrangements; because of the complexity of the overall system, and because harmonization is the prime concern, the implementation becomes challenging.
Because musical rhythm is a necessary factor, computationally intelligent logic is used to activate a fixed pattern [1, 2]; for this, the velocity and acceleration of the arms are used, and fast, medium, and slow hand movements become the criteria for music generation. Many musical instruments, such as drums and piano, can be played by calculating an energy feature like the beating force, although the system requires a separate module for handling rhythm. In [5], a robotic setup for playing musical instruments is given; it uses real-time servo and solenoid control, and a motion recorded through sensors is fed into a control module that commands the robot [5]. Mouse movement has also been utilized for creating musical pieces: the movement infers some rules and thereby composes the music. Degrees of freedom, transitions among several musical notes, and time delays are the challenges that affect the work of a robotic musician [2, 5].
Weinberg et al. [6] proposed a system that responds to human input in musical form: the robot listens to MIDI and audio input and generates musical responses. The system uses a genetic algorithm (GA) to let the robot respond to human input. Fit responses are found through mutation and crossover; each evolved phrase is evaluated by a fitness function that measures its similarity to the input, and the least fit phrases are replaced with fitter members of the next generation. Initially, a suitable population of variable length and different types is selected. A dynamic time warping (DTW) approach is used in [6] to find the similarity between the observed and generated melodies. In [7], an interactive technique for music generation through a robot is proposed; the authors discuss experiments related to sitar synthesis, beat control, and robotic control. For playing the music, events are triggered rather than samples [7]: an audio file triggers the event, a database is maintained, and the rhythmic pattern is matched through a query in order to provide automatically generated instrumental music. Another interactive multimodal environment for human-robot communication is described in [8], in which musical instruments are equipped with motion abilities for performing in real time. Their system can communicate through sound, music, expression, and movement. In [8], sound and music are generated using rule-based and stochastic approaches, and some musical features are modified in real time; the musical features are timbre, pitch, volume, tempo, and style of music [1, 8]. In [8], devices such as a CCD camera, a motion interface, and a microphone provide inputs such as RGB and HSV components, torque and gravity, and volume and pitch, respectively. The obtained information is processed in real time to determine the behavior of the robot in terms of movement and rotation, and to detect and generate audio features and musical components; with the processed information, the robot follows the musical behavior and generates a musical pattern. In [9], the authors use a task model and task primitives with skill parameters; their system obtains these primitives and parameters from human motion. The motion of the robot is generated from the obtained result under robotic constraints, which is useful for generating human-like motion for robots. The task primitives are detected and the skill parameters are set, with all values obtained from motion capture data: the velocity, speed of steps, roll angles, and pitch angles are the primitives, and position parameters are extracted from the motion data. These parameters are utilized in [9] for imitating human dance at the lower-body level in a robot. Earlier, some work was done on the reverse problem, namely a beat-counting robot [4]. The beat-counting robot was developed around three issues: (1) recognition of hierarchical beat structures, (2) expression of these structures by counting beats, and (3) suppression of the counting voice (self-generated sound) in sound mixtures. The music-understanding robot [4] is capable of dealing with its self-generated sounds through these approaches: (1) beat-structure prediction based on musical knowledge of chords and drums, (2) speed control of the counting voice according to the music tempo, and (3) semi-blind separation of sound mixtures into music and counting voice via an adaptive filter based on ICA (Independent Component Analysis) that uses the waveform of the counting voice as prior knowledge.
III. METHODOLOGY
Gestures have inherent ambiguity, but the gesture attributes considered in this work remain useful even though they are stochastic in nature. Three problems are associated with gesture-based systems [3]: one task can be accomplished through a large number of gestures; high recognition accuracy is needed for detecting a gesture command; and the user must train the control commands.
An overview of the music generation process and dance simulation is depicted in figure 1. We captured gestures from a webcam at 30 fps and recorded them in AVI format. The video input is then preprocessed with de-interlacing, cropping, resizing, noise reduction, and color correction steps. We extracted the motion features from the gesture data using SkillSpector [16], as shown in figure 2. SkillSpector tracks the body parts in each frame; it uses a calibration image like that in figure 3 to find the direct linear transformation (DLT) parameters, and the DLT values are used in the kinematic analysis. Linear and angular body kinematics are obtained and used to compute several features. Velocity and acceleration are also obtained using SkillSpector, which gives us the speed of motion for a particular body part. Music is generated using the motion features, which are mapped to musical knowledge: the intensity and rhythm of motion become the input of the digital central pattern generator (DCPG) and are used as the features for generating the music. The joint angles are mapped into Webots [17] Comma-Separated Values (CSV) for HOAP-2. We considered these gestures for our experiment: hands up, hands down, straight left hand, straight right hand, hip movement in simple walking, and a simple dance with hand movements.
Figure 1. Music generation from gesture (flowchart: gesture recording through a video camera, preprocessing of the video input, feature extraction, recognition of motion type and speed, music generation, and simulation of music and motion).

Figure 2. Extraction of motion features using SkillSpector; the cross marks show the body joints.
According to [1], seven types of musical expressions (rhythm, pitch, pitch shifting, dynamics, volume, instrument mode, and cadence) can be mapped to respective gestures. We use the following mapping methodology, as in [1]. When the wrist flexes or extends to reach the triggering level, a melody note is generated. The control of pitch is mapped to the height of the right/left hand above the ground relative to that of the last note: if a new melody note is generated at a height greater than that of the last note, its pitch is set higher than the previous one, and vice versa. If the right hand is lifted after a melody note is generated, the pitch shifts up gradually, and vice versa; once pitch shifting has started, the continuous pitch variation of the melody note can be controlled freely. The intensity of the shoulder joint is mapped to the dynamics of the music. We define the cadence and ending with a straight right hand, and the volume is mapped to the opening of the user's hand. A sketch of this mapping is given below.
After music generation, we simulate the dance and music on the Webots simulator for the HOAP-2 humanoid robot.
3.1 Generation of music from gesture data
We used motion segments from the gesture motion capture data to extract musical information. For this purpose we use the 'Effort' and 'Shape' components, as explained in [10]: effort describes the body movement, and shape describes the key poses. Motion key poses are related to musical rhythm [4, 10]. The motion feature (Mof) has two important feature-vector components: the motion rhythm feature (Morf) and the motion intensity feature (Moif). Morf marks the local minima of the weight-effort component, and Moif is the linear sum of the rotational (angular) velocities of the body joints:

Morf(f) = 1 if w(f) is at a local minimum, and 0 otherwise,

where w(f) is the weight effort of the motion at frame f. The intensity I(i) is obtained from the momentum of the motion. The motion intensity feature is calculated using equation (1):

Moif(f) = Σ_j |ω_j(f)|    (1)

where ω_j(f) is the angular velocity of body joint j at frame f, obtained from the current- and previous-frame motion information in the SkillSpector kinematic data shown in figure 5 [10]. The gesture-to-music mapping is explained in [1] and is based on the following five principles: (i) musical expression should be intuitive; (ii) if a musical expression needs fine control, it should be mapped to an agile part of the body; (iii) each important musical expression should be easily triggered; (iv) every triggering motion or gesture should be different; and (v) unnecessary triggering of any musical expression should be avoided. A sketch of the feature computation is given below.
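A minimal Java sketch of the two feature computations, under the assumption that per-frame joint angular velocities and weight-effort values are already available as arrays (the array layout and names are ours, not from the paper):

// Illustrative computation of Moif (equation 1) and Morf from per-frame data.
public class MotionFeatures {
    // Moif(f): linear sum of the absolute angular velocities of all joints at frame f.
    public static double intensity(double[] angularVelocities) {
        double sum = 0.0;
        for (double w : angularVelocities) sum += Math.abs(w);
        return sum;
    }

    // Morf(f) = 1 if the weight effort w(f) is at a local minimum, else 0.
    public static int rhythm(double[] weightEffort, int f) {
        if (f <= 0 || f >= weightEffort.length - 1) return 0;
        boolean localMin = weightEffort[f] < weightEffort[f - 1]
                        && weightEffort[f] < weightEffort[f + 1];
        return localMin ? 1 : 0;
    }
}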
IV. IMPLEMENTATION
These are the four steps we followed to implement the methodology discussed above:
1) Generation of a CSV file representing the motion.
2) Generation of a text file representing the sound.
3) Generation of a sound file from the previously generated text file.
4) Webots [17] simulation of the music on the HOAP-2 robot.
4.1 Generation of CSV file representing motion

We tested our methodology on the HOAP-2 humanoid robot simulated in Webots [17]. HOAP-2 has 25 degrees of freedom (DOF): six in each leg, five in each hand, one in the torso, and two in the head. Each joint needs the correct value for a particular posture.

We created the CSV files using two approaches: first randomly, and then through human motion parameters. The time of simulation, obtained from the gesture video, becomes the input, and the system produces random numbers; these random numbers initialize the body-joint CSV values. In the other experiment, we extracted the CSV values from the human joints obtained using SkillSpector and interpolated and extrapolated them. The X- and Y-coordinate values are utilized for this purpose: the angular value is found using the atan2(y, x) formula, the result in radians is converted to degrees, and the angular value for the HOAP-2 simulation is obtained by multiplying it by 209 [17]. Each joint has a maximum and minimum movement range to prevent the joints from breakage and imbalance; when the threshold is reached, the direction of movement is reversed. A sketch of this conversion is given below.
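The conversion can be sketched in Java as follows; the method name and the joint-limit handling are illustrative assumptions, while the atan2 call, the degree conversion, and the factor 209 follow the description above:

// Sketch: convert a tracked (x, y) joint position into a clamped HOAP-2 CSV value.
public class JointValue {
    static final double SCALE = 209.0; // degrees -> HOAP-2 CSV units [17]

    public static double toCsvValue(double x, double y, double min, double max) {
        double radians = Math.atan2(y, x);          // angular value in radians
        double degrees = Math.toDegrees(radians);   // radians -> degrees
        double value = degrees * SCALE;             // degrees -> HOAP-2 units
        return Math.max(min, Math.min(max, value)); // stay within the joint's safe range
    }
}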
The two CSV generation algorithms are explained next.

1. Algorithm for Random Generation of CSV
begin:
Set a random initial value for each joint.
Set the minimum and maximum value for each joint.
Convert the angle axis to a rotation matrix.
Convert the rotation matrix to Euler angles.
Update the joint values.
Check that each joint value lies within the range set in step 2; if it crosses the threshold, reverse the movement direction.
Write each row into the CSV file used by HOAP-2.
end.

2. Algorithm for Generation of CSV from Human Motion Capture Data
begin:
Set the initial value for each joint, obtained from the motion capture data.
Set the minimum and maximum value for each joint.
Convert the angle axis to a rotation matrix.
Convert the rotation matrix to Euler angles.
Update the joint values.
Check that each joint value lies within the range set in step 2; if it crosses the threshold, reverse the movement direction.
Write each row into the CSV file used by HOAP-2.
end.
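A minimal Java sketch of algorithm 1; the joint limits, step size, and frame count are illustrative assumptions, and the angle-axis/Euler conversion steps are omitted for brevity:

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class RandomCsv {
    public static void main(String[] args) throws IOException {
        int joints = 25, frames = 300;            // HOAP-2 has 25 DOF
        double min = -1000, max = 1000;           // illustrative per-joint range
        double[] value = new double[joints];
        double[] dir = new double[joints];
        Random rnd = new Random();
        for (int j = 0; j < joints; j++) {
            value[j] = min + rnd.nextDouble() * (max - min); // random initial value
            dir[j] = rnd.nextBoolean() ? 10 : -10;           // movement step per frame
        }
        try (FileWriter out = new FileWriter("motion.csv")) {
            for (int f = 0; f < frames; f++) {
                StringBuilder row = new StringBuilder();
                for (int j = 0; j < joints; j++) {
                    value[j] += dir[j];
                    if (value[j] <= min || value[j] >= max) {
                        dir[j] = -dir[j];         // threshold reached: reverse direction
                        value[j] = Math.max(min, Math.min(max, value[j]));
                    }
                    row.append((int) value[j]).append(j < joints - 1 ? "," : "\n");
                }
                out.write(row.toString());
            }
        }
    }
}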
4.2 Generation of text file representing sound

The text file is created by our Java program, which uses the speed obtained from the velocity of a body part to determine the intensity value of a particular sound. We use rules like:
If speed = 30, then the intensity is between 60 and 70.
If speed = 60, then the intensity is between 70 and 90.
If speed = 80, then the intensity is between 90 and 100.
The generated output file is in the MIDI format specified in [14].

Algorithm for creating the text file:
Begin:
Step 1: Initialize the parameters for the text file.
Step 2: Select the channel for playing:
if channel == 1, the playing instrument is piano;
if channel == 10, the playing instrument is drums.
Channels can be set up to 127.
Step 3: Synchronize the motion and sound (for slower sound, select a small intensity value; for fast music, select large intensity values).
Step 4: Save the text file for producing the music.
End.
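The following Java sketch illustrates how the speed-to-intensity rules and the text-file layout could be combined; the note, timing, and division values are illustrative, and the event syntax reflects our reading of the mf2t/t2mf text format [15]:

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class SoundTextFile {
    // Map a speed value to an intensity (MIDI velocity) range, per the rules above.
    static int intensityFor(double speed, Random rnd) {
        if (speed >= 80) return 90 + rnd.nextInt(11);  // 90..100
        if (speed >= 60) return 70 + rnd.nextInt(21);  // 70..90
        return 60 + rnd.nextInt(11);                   // 60..70
    }

    public static void main(String[] args) throws IOException {
        double[] speeds = {30, 60, 80, 60, 30};        // sample per-segment speeds
        Random rnd = new Random();
        try (FileWriter out = new FileWriter("output.txt")) {
            out.write("MFile 0 1 96\nMTrk\n");         // header: format 0, 1 track, 96 ticks
            int time = 0;
            for (double s : speeds) {
                int v = intensityFor(s, rnd);
                out.write(time + " On ch=1 n=60 v=" + v + "\n");  // channel 1: piano
                out.write((time + 96) + " Off ch=1 n=60 v=0\n");
                time += 96;
            }
            out.write("TrkEnd\nMTrkEnd\n");
        }
    }
}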
4.3 Generation of sound file from the previously generated text file

We use the MIDI converter t2mf [14, 15] to convert the text file to a sound file. The output music file is obtained through the command t2mf.exe -r output.txt output.mid, where t2mf is the application file, which can be downloaded from [15].

4.4 Simulation

Webots is simulation software developed by Cyberbotics [17] and used in the area of robotics. It provides the means to model any robot together with its physics, so Webots can be used to develop applications for any robot. The software requires a controller program that governs the operation and behavior of the robot; the controller can be written in a standard programming language. We used the HOAP-2 robot for our simulation and developed the necessary control program in Java, which controls the dancing of the robot. The Webots simulation and the sound file must be executed together to obtain the dance and music performance simultaneously; a Java thread application specially designed for this purpose performs these tasks, as sketched below. After some performance tuning, we achieved synchronized dancing of the HOAP-2 robot to the music generated through the gesture interface.
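A minimal Java sketch of the two-thread arrangement, assuming the generated output.mid file and using the standard javax.sound.midi sequencer; stepMotion() is a placeholder for the actual Webots controller loop, which is not reproduced here:

import java.io.File;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequencer;

public class DanceAndMusic {
    public static void main(String[] args) throws Exception {
        Thread music = new Thread(() -> {
            try {
                Sequencer seq = MidiSystem.getSequencer();
                seq.open();
                seq.setSequence(MidiSystem.getSequence(new File("output.mid")));
                seq.start();                       // play the generated music
            } catch (Exception e) { e.printStackTrace(); }
        });
        Thread dance = new Thread(DanceAndMusic::stepMotion);
        music.start();
        dance.start();                             // music and dance run concurrently
        music.join();
        dance.join();
    }

    static void stepMotion() {
        // Placeholder: the real controller reads the CSV rows and sends joint
        // commands to the HOAP-2 model at each simulation step.
    }
}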
V. RESULTS & ANALYSIS
We captured video images using a webcam and then performed the kinematic analysis using SkillSpector. Figure 3 shows the video frames and the calibration images; four points are selected as calibration points.
Figure 3. Gesture acquisition and calibration images: the left images are frames of the acquired video, and the right images are the calibration images.
Figure 4. Position of the left-hand joints along the X axis during movement: the elbow joint's position changes very quickly and non-linear behavior is observed, while the remaining joints show linear behavior.
Figures 4 to 8 depict the various motion parameters obtained. The position parameter of the elbow joint, shown in figure 4, is used in obtaining the music pattern: the pattern starts at a low pitch, rises to some maximum value, and then changes direction again. The wrist position parameters are used for melody-note generation, and the finger position is used for controlling the pitch. The velocity parameter, shown in figure 5, is related to speed and generates the intensity of the music: the elbow joint velocity generates intensity according to the algorithm explained in the previous section, while the wrist and finger joints do not contribute here because their variation is very slow and no audible effect is obtained. We apply the same concept of music generation to the angular position and angular velocity parameters, and the results obtained using the angular parameters are good. Figure 8 shows the acceleration of the various joints of the hand.

Figure 8. Acceleration of the various joints of the hand.
Figure 5. Velocity of the various joints of the hand: the elbow joint's velocity increases linearly, while the velocities of the wrist and finger joints change slowly.
5.1 Graphical user interface

A GUI, shown in figure 9, was designed for the creation of the CSV file and the generation of music. The user has to enter the time of simulation in seconds. Presently we generate drum and piano instrumental music; one instrument can be selected at a time, and the CSV file is then generated for it. The music generation module generates the music in MIDI format using the algorithm discussed in the previous section. Simulation of the dance on the Webots simulator and playing of the music are accomplished through two separate threads written in Java. After some performance tuning, we generated synchronized dance and music; the simulation result is shown in figure 10.
Figure 6. Angular position of the hand joints: the wrist joint's angular position increases and the shoulder joint's decreases, while the elbow joint's angular position is almost constant, changing only in the last moments of the gesture acquisition.
Figure 9. GUI used for generation of music and simulation of dance on the HOAP-2 robot.
Figure 7. Angular velocity of the various joints of the left hand.
Figure 10. Snapshot of the simulation result of the HOAP-2 dance.
VI. CONCLUSION
Our approach provides a way to create music from gestures. It offers the flexibility to control music generation without playing an actual instrument. No specialized knowledge of music is needed to create the music, which can help promote music in an interactive and entertaining way. We considered only two instruments, drum and piano; more instrumental features can be added in the future, and additional music theory can be incorporated to make the sound and music more professional. This will open a new set of possibilities for robotized music and dance performance. We are working on a fully functional, gesture-based, software-driven robot composer and dance performer, which will generate not only simple tunes but also real songs.
REFERENCES
Ip H.H.S., K.C.K. Law, B. Kwong, "Cyber Composer:
Hand Gesture-Driven Intelligent Music Composition and
Generation," Multimedia Modelling Conference, 2005.
MMM 2005. Proceedings of the 11th International , pp.
46-52, 12-14 Jan. 2005.
[2] A. Kapur, A. Eigenfeldt, C. Bahn, and W. A. Schloss, "Collaborative Composition for Musical Robots," Proceedings of the International Conference on Digital Arts, Porto, Portugal, Nov. 2008.
[3] J. Kela, P. Korpipää, J. Mäntyjärvi, S. Kallio, G. Savino, L. Jozzo, and S. Di Marca, "Accelerometer-based gesture control for a design environment," Personal and Ubiquitous Computing, vol. 10, pp. 285-299, 2006.
[4] T. Mizumoto, R. Takeda, K. Yoshii, K. Komatani, T. Ogata, and H. G. Okuno, "A robot listens to music and counts its beats aloud by separating music from counting voice," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), pp. 1538-1543, Sept. 2008.
[5] T. M. Sobh, B. Wang, and K. W. Coble, "Experimental Robot Musicians," Journal of Intelligent and Robotic Systems, vol. 38, no. 2, pp. 197-212, 2003.
[6] G. Weinberg, M. Godfrey, A. Rae, and J. Rhoads, "A Real-Time Genetic Algorithm in Human-Robot Musical Improvisation," Computer Music Modeling and Retrieval: Sense of Sounds, 4th International Symposium (CMMR 2007), Copenhagen, Denmark, Aug. 27-31, 2007.
[7] A. Kapur, G. Tzanetakis, W. A. Schloss, P. Driessen, and E. Singer, "Towards the One-Man Indian Computer Music Performance System," Proceedings of the International Computer Music Conference (ICMC), New Orleans, USA, Nov. 2006.
[8] K. Suzuki, T. Ohashi, and S. Hashimoto, "Interactive multimodal mobile robot for musical performance," Proceedings of the International Computer Music Conference, pp. 407-410, 1999.
[9] S. Nakaoka, A. Nakazawa, F. Kanehiro, K. Kaneko, M. Morisawa, and K. Ikeuchi, "Task model of lower body motion for a biped humanoid robot to imitate human dances," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), pp. 3157-3162, Aug. 2005.
[10] T. Shiratori, A. Nakazawa, and K. Ikeuchi, "Dancing-to-Music Character Animation," Computer Graphics Forum (EUROGRAPHICS 2006), vol. 25, no. 3, pp. 449-452, 2006.
[11] A. Camurri, C. Canepa, and G. Volpe, "Active listening to a virtual orchestra through an expressive gestural interface: The Orchestra Explorer," Proceedings of the International Conference on New Interfaces for Musical Expression (NIME07), New York, 2007.
[12] F. Bevilacqua, L. Naugle, and I. Valverde, "Virtual dance and music environment using motion capture," Proceedings of the IEEE Multimedia Technology and Applications Conference, Irvine, 2001.
[13] C. Cadoz and M. Wanderley, "Gesture-Music," in M. Wanderley and M. Battier, eds., Trends in Gestural Control of Music, Ircam - Centre Pompidou, 2000.
[14] www.midi.org/aboutmidi/tut_midimusicsynth.php
[15] www.hitsquad.com/smm/programs/t2mf/download
[16] www.video4coach.com/images/SkillSpector2DIntroENG.pdf
[17] www.cyberbotics.com/cdrom/common/doc/webots/reference/reference.pdf