Second International Conference on Emerging Trends in Engineering and Technology, ICETET-09
Gesture Based Music Generation
Jay Shankar Prasad, G. C. Nandi
Robotics & AI Lab
Indian Institute of Information Technology
Allahabad, India
{jsp, gcnandi}@iiita.ac.in

Amit Kumar
RGIIT, Amethi Campus
Indian Institute of Information Technology
Allahabad, India
akumariiit@gmail.com
Abstract— We designed and developed a framework for the generation of music and the synchronization of dance on a humanoid robot. Gestures are well suited for communication and are used here to produce entertaining rhythmic musical patterns. We applied two different music generation approaches: random motion and human gesture. The linear and angular body movements are extracted and several motion features are computed; these features are mapped to musical knowledge to generate the rhythmic pattern. We synchronized the resulting music with the dance of the HOAP-2 humanoid robot on the Webots simulator. We found very interesting musical patterns, which suggests that gesture can be used for the composition of music.
Keywords-Gesture; Music; Humanoid Robot; Webots
I. INTRODUCTION

We use gestures in our daily lives to communicate our thoughts, messages, and ideas. Gestures are therefore close to universally acceptable, and systems based on them are more broadly usable than many alternatives. Gestures have usually been applied for control purposes, but they have many more applications, and some of these areas need the attention of researchers. Using human gestures for music generation and dance synchronization is one such important area of research. Previous work has mostly relied on hardware for music generation; the software side is yet to be fully explored. We applied human gestures to music generation, and our framework is able to produce synchronized dance and music. We supplied video input captured from a webcam to the software SkillSpector [16]; the body-joint angles, positions, velocities, and accelerations are the salient parameters from which the musical pattern is synthesized, and the intensity and movement pattern of the body joints are the key inputs for producing the musical rhythm. A gesture-based approach to music generation does not require knowledge of music, so our framework can be used by musicians and music laypersons alike. SkillSpector is open software for body-dynamics analysis, mainly used by sports coaches; here we use it to determine the body kinematics, which are applied to the Digital Central Pattern Generator (DCPG). The DCPG requires digital input for rhythmic pattern generation, and the extracted speed parameter serves this purpose.

The rest of the paper is arranged as follows. Section II describes the related previous research. Section III explains the detailed methodology. Section IV discusses the implementation of the framework. Section V presents the results and their analysis. Section VI gives the conclusion and future work. References are listed at the end.

II. RELATED WORK
The generation of music from gesture has been studied in the past [1, 11, 13]. Cyber Composer [1] uses human hand gestures for music composition: it generates music according to hand motion and a few gestures, in the absence of real musical synthesizers. Its authors considered music theory and analyzed the melody flow and musical expressions such as pitch, rhythm, and loudness. Many motion-sensing devices have been used for gesture recording [1, 2, 5]; the sensor information actuates the different musical patterns, and in [2, 12] thumb-sensor values are utilized. A music-generating system requires musical notes, which are produced with MIDI (Musical Instrument Digital Interface). In [1, 7], a music interface module is discussed that is responsible for turning the produced musical pattern into a MIDI sequence. This type of system can be utilized by musicians and music laypersons alike. Background music, which enhances the quality of the output, is used in [1, 8] with the help of a melody generation module and a music creation module. Music generation and the playing of a musical instrument by a robotic setup are explained in [2]. Our problem is similar to this; the difference is that our system works in a simulated environment. In [2], the robot performs with human artists, and perfect synchronization between human and robot is maintained during the performance. Synchronizing the various tasks requires considerable hardware, software, and networking arrangements; because of the complexity of the overall system, and because harmonization is the prime concern, the implementation becomes challenging.
Because musical rhythm is a necessary factor, computationally intelligent logic is used to activate a fixed pattern [1, 2]; for this, the velocity and acceleration of the arms are used, and fast, medium, and slow hand movements become the criteria for music generation. Many musical instruments, such as drums and piano, can be played by calculating an energy feature like the beating force, although the system requires a separate module for handling rhythm. In [5], a robotic setup for playing musical instruments is given; it uses real-time servo and solenoid control, and a motion recorded through sensors is fed into a control module that commands the robot [5]. Mouse movement has also been utilized for creating musical pieces: the movement infers some rules and thereby composes the music. Degrees of freedom, transitions among several musical notes, and time delays are the challenges that affect the work of a robotic musician [2, 5].
Weinberg et al. [6] proposed a system that responds to human input in musical form: the robot listens to MIDI and audio input and generates musical responses. The system uses a genetic algorithm (GA) to let the robot respond to human input. Fit responses are found through mutation and crossover; each evolved phrase is evaluated by a fitness function that measures its similarity to the input, and the least fit phrases are replaced with fitter members of the next generation. Initially, a suitable population of variable length and different types is selected. A dynamic time warping (DTW) approach is used in [6] to find the similarity between the observed and generated melodies. In [7], an interactive technique for music generation through a robot is proposed; the authors discuss experiments related to sitar synthesis, beat control, and robotic control. For playing the music, events are triggered rather than samples [7]: an audio file triggers the event, a database is maintained, and the rhythmic pattern is matched through a query in order to provide automatically generated instrumental music. Another interactive multimodal environment for human-robot communication is described in [8], in which musical instruments are equipped with motion abilities for performing in real time. Their system can communicate through sound, music, expression, and movement. In [8], sound and music are generated using rule-based and stochastic approaches, and some musical features are modified in real time; the musical features are timbre, pitch, volume, tempo, and style of music [1, 8]. In [8], devices such as a CCD camera, a motion interface, and a microphone provide inputs such as RGB and HSV components, torque and gravity, and volume and pitch, respectively. The obtained information is processed in real time to determine the behavior of the robot in terms of movement and rotation, and to detect and generate audio features and musical components; with the processed information, the robot follows the musical behavior and generates a musical pattern. In [9], the authors use a task model and task primitives with skill parameters; their system obtains these primitives and parameters from human motion. The motion of the robot is generated from the obtained result under robotic constraints, which is useful for generating human-like motion for robots. The task primitives are detected and the skill parameters are set, with all values obtained from motion capture data: the velocity, speed of steps, roll angles, and pitch angles are the primitives, and position parameters are extracted from the motion data. These parameters are utilized in [9] for imitating human dance at the lower-body level in a robot. Earlier, some work was done on the reverse problem, namely a beat-counting robot [4]. The beat-counting robot was developed around three issues: (1) recognition of hierarchical beat structures, (2) expression of these structures by counting beats, and (3) suppression of the counting voice (self-generated sound) in sound mixtures. The music-understanding robot [4] is capable of dealing with its self-generated sounds through these approaches: (1) beat-structure prediction based on musical knowledge of chords and drums, (2) speed control of the counting voice according to the music tempo, and (3) semi-blind separation of sound mixtures into music and counting voice via an adaptive filter based on ICA (Independent Component Analysis) that uses the waveform of the counting voice as prior knowledge.
III. METHODOLOGY
Gestures have inherent ambiguity, but the gesture attributes considered in this work remain useful even though they are stochastic in nature. Three problems are associated with gesture-based systems [3]: one task can be accomplished through a large number of gestures; high recognition accuracy is needed for detecting a gesture command; and the user must train the control commands.
An overview of the music generation process and dance simulation is depicted in figure 1. We captured gestures from a webcam at 30 fps and recorded them in AVI format. The video input is then preprocessed with de-interlacing, cropping, resizing, noise reduction, and color correction steps. We extracted the motion features from the gesture data using SkillSpector [16], as shown in figure 2. SkillSpector tracks the body parts in each frame; it uses a calibration image like that in figure 3 to find the direct linear transformation (DLT) parameters, and the DLT values are used in the kinematic analysis. Linear and angular body kinematics are obtained and used to compute several features. Velocity and acceleration are also obtained using SkillSpector, which gives us the speed of motion for a particular body part. Music is generated using the motion features, which are mapped to musical knowledge: the intensity and rhythm of motion become the input of the digital central pattern generator (DCPG) and are used as the features for generating the music. The joint angles are mapped into Webots [17] Comma-Separated Values (CSV) for HOAP-2. We considered these gestures for our experiment: hands up, hands down, straight left hand, straight right hand, hip movement in simple walking, and a simple dance with hand movements.
Figure 1. Music generation from gesture (flowchart: gesture recording through a video camera, preprocessing of the video input, feature extraction, recognition of motion type and speed, music generation, and simulation of music and motion).

Figure 2. Extraction of motion features using SkillSpector; the cross marks show the body joints.
According to [1], seven types of musical expressions (rhythm, pitch, pitch shifting, dynamics, volume, instrument mode, and cadence) can be mapped to respective gestures. We use the following mapping methodology, as in [1]. When the wrist flexes or extends to reach the triggering level, a melody note is generated. The control of pitch is mapped to the height of the right/left hand above the ground relative to that of the last note: if a new melody note is generated at a height greater than that of the last note, its pitch is set higher than the previous one, and vice versa. If the right hand is lifted after a melody note is generated, the pitch shifts up gradually, and vice versa; once pitch shifting has started, the continuous pitch variation of the melody note can be controlled freely. The intensity of the shoulder joint is mapped to the dynamics of the music. We define the cadence and ending with a straight right hand, and the volume is mapped to the opening of the user's hand. A sketch of this mapping is given below.
After music generation, we simulate the dance and music on the Webots simulator for the HOAP-2 humanoid robot.
3.1 Generation of music from gesture data
We used motion segments from the gesture motion capture data to extract musical information. For this purpose we use the 'Effort' and 'Shape' components, as explained in [10]: effort describes the body movement, and shape describes the key poses. Motion key poses are related to musical rhythm [4, 10]. The motion feature (Mof) has two important feature-vector components: the motion rhythm feature (Morf) and the motion intensity feature (Moif). Morf marks the local minima of the weight-effort component, and Moif is the linear sum of the rotational (angular) velocities of the body joints:

Morf(f) = 1 if w(f) is at a local minimum, and 0 otherwise,

where w(f) is the weight effort of the motion at frame f. The intensity I(i) is obtained from the momentum of the motion. The motion intensity feature is calculated using equation (1):

Moif(f) = Σ_j |ω_j(f)|    (1)

where ω_j(f) is the angular velocity of body joint j at frame f, obtained from the current- and previous-frame motion information in the SkillSpector kinematic data shown in figure 5 [10]. The gesture-to-music mapping is explained in [1] and is based on the following five principles: (i) musical expression should be intuitive; (ii) if a musical expression needs fine control, it should be mapped to an agile part of the body; (iii) each important musical expression should be easily triggered; (iv) every triggering motion or gesture should be different; and (v) unnecessary triggering of any musical expression should be avoided. A sketch of the feature computation is given below.
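A minimal Java sketch of the two feature computations, under the assumption that per-frame joint angular velocities and weight-effort values are already available as arrays (the array layout and names are ours, not from the paper):

// Illustrative computation of Moif (equation 1) and Morf from per-frame data.
public class MotionFeatures {
    // Moif(f): linear sum of the absolute angular velocities of all joints at frame f.
    public static double intensity(double[] angularVelocities) {
        double sum = 0.0;
        for (double w : angularVelocities) sum += Math.abs(w);
        return sum;
    }

    // Morf(f) = 1 if the weight effort w(f) is at a local minimum, else 0.
    public static int rhythm(double[] weightEffort, int f) {
        if (f <= 0 || f >= weightEffort.length - 1) return 0;
        boolean localMin = weightEffort[f] < weightEffort[f - 1]
                        && weightEffort[f] < weightEffort[f + 1];
        return localMin ? 1 : 0;
    }
}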
IV. IMPLEMENTATION
These are the four steps we followed to implement the methodology discussed above:
1) Generation of a CSV file representing the motion.
2) Generation of a text file representing the sound.
3) Generation of a sound file from the previously generated text file.
4) Webots [17] simulation of the music on the HOAP-2 robot.
4.1 Generation of CSV file representing motion

We tested our methodology on the HOAP-2 humanoid robot simulated in Webots [17]. HOAP-2 has 25 degrees of freedom (DOF): six in each leg, five in each hand, one in the torso, and two in the head. Each joint needs the correct value for a particular posture.

We created the CSV files using two approaches: first randomly, and then through human motion parameters. The time of simulation, obtained from the gesture video, becomes the input, and the system produces random numbers; these random numbers initialize the body-joint CSV values. In the other experiment, we extracted the CSV values from the human joints obtained using SkillSpector and interpolated and extrapolated them. The X- and Y-coordinate values are utilized for this purpose: the angular value is found using the atan2(y, x) formula, the result in radians is converted to degrees, and the angular value for the HOAP-2 simulation is obtained by multiplying it by 209 [17]. Each joint has a maximum and minimum movement range to prevent the joints from breakage and imbalance; when the threshold is reached, the direction of movement is reversed. A sketch of this conversion is given below.
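The conversion can be sketched in Java as follows; the method name and the joint-limit handling are illustrative assumptions, while the atan2 call, the degree conversion, and the factor 209 follow the description above:

// Sketch: convert a tracked (x, y) joint position into a clamped HOAP-2 CSV value.
public class JointValue {
    static final double SCALE = 209.0; // degrees -> HOAP-2 CSV units [17]

    public static double toCsvValue(double x, double y, double min, double max) {
        double radians = Math.atan2(y, x);          // angular value in radians
        double degrees = Math.toDegrees(radians);   // radians -> degrees
        double value = degrees * SCALE;             // degrees -> HOAP-2 units
        return Math.max(min, Math.min(max, value)); // stay within the joint's safe range
    }
}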
The two CSV generation algorithms are explained next.

1. Algorithm for Random Generation of CSV
begin:
Set a random initial value for each joint.
Set the minimum and maximum value for each joint.
Convert the angle axis to a rotation matrix.
Convert the rotation matrix to Euler angles.
Update the joint values.
Check that each joint value lies within the range set in step 2; if it crosses the threshold, reverse the movement direction.
Write each row into the CSV file used by HOAP-2.
end.

2. Algorithm for Generation of CSV from Human Motion Capture Data
begin:
Set the initial value for each joint, obtained from the motion capture data.
Set the minimum and maximum value for each joint.
Convert the angle axis to a rotation matrix.
Convert the rotation matrix to Euler angles.
Update the joint values.
Check that each joint value lies within the range set in step 2; if it crosses the threshold, reverse the movement direction.
Write each row into the CSV file used by HOAP-2.
end.
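A minimal Java sketch of algorithm 1; the joint limits, step size, and frame count are illustrative assumptions, and the angle-axis/Euler conversion steps are omitted for brevity:

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class RandomCsv {
    public static void main(String[] args) throws IOException {
        int joints = 25, frames = 300;            // HOAP-2 has 25 DOF
        double min = -1000, max = 1000;           // illustrative per-joint range
        double[] value = new double[joints];
        double[] dir = new double[joints];
        Random rnd = new Random();
        for (int j = 0; j < joints; j++) {
            value[j] = min + rnd.nextDouble() * (max - min); // random initial value
            dir[j] = rnd.nextBoolean() ? 10 : -10;           // movement step per frame
        }
        try (FileWriter out = new FileWriter("motion.csv")) {
            for (int f = 0; f < frames; f++) {
                StringBuilder row = new StringBuilder();
                for (int j = 0; j < joints; j++) {
                    value[j] += dir[j];
                    if (value[j] <= min || value[j] >= max) {
                        dir[j] = -dir[j];         // threshold reached: reverse direction
                        value[j] = Math.max(min, Math.min(max, value[j]));
                    }
                    row.append((int) value[j]).append(j < joints - 1 ? "," : "\n");
                }
                out.write(row.toString());
            }
        }
    }
}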
4.2 Generation of text file representing sound

The text file is created by our Java program, which uses the speed obtained from the velocity of a body part to determine the intensity value of a particular sound. We use rules like:
If speed = 30, then the intensity is between 60 and 70.
If speed = 60, then the intensity is between 70 and 90.
If speed = 80, then the intensity is between 90 and 100.
The generated output file is in the MIDI format specified in [14].

Algorithm for creating the text file:
Begin:
Step 1: Initialize the parameters for the text file.
Step 2: Select the channel for playing:
if channel == 1, the playing instrument is piano;
if channel == 10, the playing instrument is drums.
Channels can be set up to 127.
Step 3: Synchronize the motion and sound (for slower sound, select a small intensity value; for fast music, select large intensity values).
Step 4: Save the text file for producing the music.
End.
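The following Java sketch illustrates how the speed-to-intensity rules and the text-file layout could be combined; the note, timing, and division values are illustrative, and the event syntax reflects our reading of the mf2t/t2mf text format [15]:

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class SoundTextFile {
    // Map a speed value to an intensity (MIDI velocity) range, per the rules above.
    static int intensityFor(double speed, Random rnd) {
        if (speed >= 80) return 90 + rnd.nextInt(11);  // 90..100
        if (speed >= 60) return 70 + rnd.nextInt(21);  // 70..90
        return 60 + rnd.nextInt(11);                   // 60..70
    }

    public static void main(String[] args) throws IOException {
        double[] speeds = {30, 60, 80, 60, 30};        // sample per-segment speeds
        Random rnd = new Random();
        try (FileWriter out = new FileWriter("output.txt")) {
            out.write("MFile 0 1 96\nMTrk\n");         // header: format 0, 1 track, 96 ticks
            int time = 0;
            for (double s : speeds) {
                int v = intensityFor(s, rnd);
                out.write(time + " On ch=1 n=60 v=" + v + "\n");  // channel 1: piano
                out.write((time + 96) + " Off ch=1 n=60 v=0\n");
                time += 96;
            }
            out.write("TrkEnd\nMTrkEnd\n");
        }
    }
}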
4.3 Generation of sound file from the previously generated text file

We use the MIDI converter t2mf [14, 15] to convert the text file to a sound file. The output music file is obtained through the command t2mf.exe -r output.txt output.mid, where t2mf is the application file, which can be downloaded from [15].

4.4 Simulation

Webots is simulation software developed by Cyberbotics [17] and used in the area of robotics. It provides the means to model any robot together with its physics, so Webots can be used to develop applications for any robot. The software requires a controller program that governs the operation and behavior of the robot; the controller can be written in a standard programming language. We used the HOAP-2 robot for our simulation and developed the necessary control program in Java, which controls the dancing of the robot. The Webots simulation and the sound file must be executed together to obtain the dance and music performance simultaneously; a Java thread application specially designed for this purpose performs these tasks, as sketched below. After some performance tuning, we achieved synchronized dancing of the HOAP-2 robot to the music generated through the gesture interface.
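A minimal Java sketch of the two-thread arrangement, assuming the generated output.mid file and using the standard javax.sound.midi sequencer; stepMotion() is a placeholder for the actual Webots controller loop, which is not reproduced here:

import java.io.File;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequencer;

public class DanceAndMusic {
    public static void main(String[] args) throws Exception {
        Thread music = new Thread(() -> {
            try {
                Sequencer seq = MidiSystem.getSequencer();
                seq.open();
                seq.setSequence(MidiSystem.getSequence(new File("output.mid")));
                seq.start();                       // play the generated music
            } catch (Exception e) { e.printStackTrace(); }
        });
        Thread dance = new Thread(DanceAndMusic::stepMotion);
        music.start();
        dance.start();                             // music and dance run concurrently
        music.join();
        dance.join();
    }

    static void stepMotion() {
        // Placeholder: the real controller reads the CSV rows and sends joint
        // commands to the HOAP-2 model at each simulation step.
    }
}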
V. RESULTS & ANALYSIS
We captured video images using a webcam and then performed the kinematic analysis using SkillSpector. Figure 3 shows the video frames and the calibration images; four points are selected as calibration points.
Figure 3. Gesture acquisition and calibration images: the left images are frames of the acquired video, and the right images are the calibration images.
Figure 4. Position of the left-hand joints along the X axis during movement: the elbow joint's position changes very quickly and non-linear behavior is observed, while the remaining joints show linear behavior.
Figures 4 to 8 depict the various motion parameters obtained. The position parameter of the elbow joint, shown in figure 4, is used in obtaining the music pattern: the pattern starts at a low pitch, rises to some maximum value, and then changes direction again. The wrist position parameters are used for melody-note generation, and the finger position is used for controlling the pitch. The velocity parameter, shown in figure 5, is related to speed and generates the intensity of the music: the elbow joint velocity generates intensity according to the algorithm explained in the previous section, while the wrist and finger joints do not contribute here because their variation is very slow and no audible effect is obtained. We apply the same concept of music generation to the angular position and angular velocity parameters, and the results obtained using the angular parameters are good. Figure 8 shows the acceleration of the various joints of the hand.

Figure 8. Acceleration of the various joints of the hand.
Figure 5. Velocity of the various joints of the hand: the elbow joint's velocity increases linearly, while the velocities of the wrist and finger joints change slowly.
5.1 Graphical user interface

A GUI, shown in figure 9, was designed for the creation of the CSV file and the generation of music. The user has to enter the time of simulation in seconds. Presently we generate drum and piano instrumental music; one instrument can be selected at a time, and the CSV file is then generated for it. The music generation module generates the music in MIDI format using the algorithm discussed in the previous section. Simulation of the dance on the Webots simulator and playing of the music are accomplished through two separate threads written in Java. After some performance tuning, we generated synchronized dance and music; the simulation result is shown in figure 10.
Figure 6. Angular position of the hand joints: the wrist joint's angular position increases and the shoulder joint's decreases, while the elbow joint's angular position is almost constant, changing only in the last moments of the gesture acquisition.
Figure 9. GUI used for generation of music and simulation of dance on the HOAP-2 robot.
Figure 7. Angular velocity of the various joints of the left hand.
Figure 10. Snapshot of the simulation result of the HOAP-2 dance.
VI. CONCLUSION
Our approach provides a way to create music from gestures. It offers the flexibility to control music generation without playing an actual instrument. No specialized knowledge of music is needed to create the music, which can help promote music in an interactive and entertaining way. We considered only two instruments, drum and piano; more instrumental features can be added in the future, and additional music theory can be incorporated to make the sound and music more professional. This will open a new set of possibilities for robotized music and dance performance. We are working on a fully functional, gesture-based, software-driven robot composer and dance performer, which will generate not only simple tunes but also real songs.
REFERENCES
Ip H.H.S., K.C.K. Law, B. Kwong, "Cyber Composer:
Hand Gesture-Driven Intelligent Music Composition and
Generation," Multimedia Modelling Conference, 2005.
MMM 2005. Proceedings of the 11th International , pp.
46-52, 12-14 Jan. 2005.
[2] A. Kapur, A. Eigenfeldt, C. Bahn, and W. A. Schloss, "Collaborative Composition for Musical Robots," Proceedings of the International Conference on Digital Arts, Porto, Portugal, Nov. 2008.
[3] J. Kela, P. Korpipää, J. Mäntyjärvi, S. Kallio, G. Savino, L. Jozzo, and S. Di Marca, "Accelerometer-based gesture control for a design environment," Personal and Ubiquitous Computing, vol. 10, pp. 285-299, 2006.
[4] T. Mizumoto, R. Takeda, K. Yoshii, K. Komatani, T. Ogata, and H. G. Okuno, "A robot listens to music and counts its beats aloud by separating music from counting voice," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), pp. 1538-1543, Sept. 2008.
[5] T. M. Sobh, B. Wang, and K. W. Coble, "Experimental Robot Musicians," Journal of Intelligent and Robotic Systems, vol. 38, no. 2, pp. 197-212, 2003.
[6] G. Weinberg, M. Godfrey, A. Rae, and J. Rhoads, "A Real-Time Genetic Algorithm in Human-Robot Musical Improvisation," Computer Music Modeling and Retrieval: Sense of Sounds, 4th International Symposium (CMMR 2007), Copenhagen, Denmark, Aug. 27-31, 2007.
[7] A. Kapur, G. Tzanetakis, W. A. Schloss, P. Driessen, and E. Singer, "Towards the One-Man Indian Computer Music Performance System," Proceedings of the International Computer Music Conference (ICMC), New Orleans, USA, Nov. 2006.
[8] K. Suzuki, T. Ohashi, and S. Hashimoto, "Interactive multimodal mobile robot for musical performance," Proceedings of the International Computer Music Conference, pp. 407-410, 1999.
[9] S. Nakaoka, A. Nakazawa, F. Kanehiro, K. Kaneko, M. Morisawa, and K. Ikeuchi, "Task model of lower body motion for a biped humanoid robot to imitate human dances," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), pp. 3157-3162, Aug. 2005.
[10] T. Shiratori, A. Nakazawa, and K. Ikeuchi, "Dancing-to-Music Character Animation," Computer Graphics Forum (EUROGRAPHICS 2006), vol. 25, no. 3, pp. 449-452, 2006.
[11] A. Camurri, C. Canepa, and G. Volpe, "Active listening to a virtual orchestra through an expressive gestural interface: The Orchestra Explorer," Proceedings of the International Conference on New Interfaces for Musical Expression (NIME07), New York, 2007.
[12] F. Bevilacqua, L. Naugle, and I. Valverde, "Virtual dance and music environment using motion capture," Proceedings of the IEEE Multimedia Technology and Applications Conference, Irvine, 2001.
[13] C. Cadoz and M. Wanderley, "Gesture-Music," in M. Wanderley and M. Battier, eds., Trends in Gestural Control of Music, Ircam - Centre Pompidou, 2000.
[14] www.midi.org/aboutmidi/tut_midimusicsynth.php
[15] www.hitsquad.com/smm/programs/t2mf/download
[16] www.video4coach.com/images/SkillSpector2DIntroENG.pdf
[17] www.cyberbotics.com/cdrom/common/doc/webots/reference/reference.pdf