CN112669796A

CN112669796A - Method and device for converting music into music score based on artificial intelligence

Info

Publication number: CN112669796A
Application number: CN202011603739.3A
Authority: CN
Inventors: 何限; 程飞
Original assignee: Xian Jiaotong Liverpool University
Current assignee: Xian Jiaotong Liverpool University
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2021-04-16

Abstract

The application relates to a method and a device for converting music into a music score based on artificial intelligence, and belongs to the technical field of computers. The method comprises: inputting a music file into a pre-trained music recognition model to obtain an intermediate file, wherein the music recognition model is obtained by training an artificial intelligence model with a plurality of groups of sample data, each group of sample data comprises a sample music file and a digital music score corresponding to the sample music file, and the intermediate file comprises feature vectors indicating the music information corresponding to the music file; acquiring an expected music score format; and calling a preset file conversion tool to convert the file format of the intermediate file into the expected music score format, thereby obtaining a music score file corresponding to the music file. This addresses the problem that MIDI files obtained by existing music-to-score approaches often lack voice part division and must be divided into parts before they can be transcribed into a music score, and lack information such as key marks, playing methods and pedals, which must be identified before transcription. When the music score file is in a picture format, it can be understood directly by the user without machine translation.

Description

Method and device for converting music into music score based on artificial intelligence

Technical Field

The application relates to a method and a device for converting music into music score based on artificial intelligence, and belongs to the technical field of computers.

Background

The music-to-score technology refers to a technology of converting music into a readable and playable score. Currently, music to music score technology can be implemented by computer devices.

In a typical music-to-score method, music is first converted into a Musical Instrument Digital Interface (MIDI) file. However, MIDI files often lack voice part division and must be divided into parts before they can be transcribed into a music score; they also lack information such as key marks, playing methods and pedals, which must be identified before transcription.

Disclosure of Invention

The application provides a method and a device for converting music into a music score based on artificial intelligence. They address the problem that MIDI files obtained by conventional music-to-score approaches often lack voice part division and must be divided into parts before they can be transcribed into a music score, and lack information such as key marks, playing methods and pedals, which must be identified before transcription. The application provides the following technical solutions:

in a first aspect, a method for converting music into music score based on artificial intelligence is provided, the method comprising:

inputting the music file into a pre-trained music recognition model to obtain an intermediate file; the music identification model is obtained by training an artificial intelligent model by using a plurality of groups of sample data, wherein each group of sample data comprises a sample music file and a digital music score corresponding to the sample music file; the intermediate file comprises a feature vector used for indicating music information corresponding to the music file;

acquiring an expected music score format;

and calling a preset file conversion tool to convert the file format of the intermediate file into the expected music score format to obtain the music score file corresponding to the music file.

Optionally, the music recognition model is obtained as follows: after the sample data is obtained, the digital music score in each group of sample data is converted into a corresponding sample intermediate file by using the file conversion tool; and the artificial intelligence model is trained based on the sample music file in each group of sample data and the sample intermediate file corresponding to each sample music file, to obtain the music recognition model.

Optionally, the music recognition model is obtained as follows: after the sample data is obtained, the digital music score in each group of sample data is converted into a corresponding sample intermediate file by using the file conversion tool; the sample music file in each group of sample data is converted into a frequency spectrum file, and the frequency spectrum file is divided into a plurality of music pieces; and the artificial intelligence model is trained based on the plurality of music pieces corresponding to each sample music file and the sample intermediate file.

Optionally, the music information includes: musical instruments, key marks, beats, tempo, notes, rests, pitches, duration, tempo changes, key mark changes, bar divisions, vocal part divisions, clef allocation, musical notation, inflexion marks, and decorative tones of music.

Optionally, the desired score format comprises a first format type and/or a second format type;

the first format type is a format type of a file for storing human-readable music symbols corresponding to music files;

the second format type is a format type of a file storing music information readable by a computer program corresponding to the music file.

Optionally, the first format type includes at least one of the following: a picture format and the Portable Document Format (PDF);

the second format type includes at least one of: MIDI format and MXL format.

Optionally, the step of calling a preset file conversion tool to convert the file format of the intermediate file into the expected music score format to obtain the music score file corresponding to the music file includes:

determining a file conversion tool corresponding to the expected music score format;

and calling the determined file conversion tool to convert the file format of the intermediate file into the expected music score format to obtain the music score file.

In a second aspect, an apparatus for converting music into music score based on artificial intelligence is provided, the apparatus comprising:

the music recognition module is used for inputting the music file into a pre-trained music recognition model to obtain an intermediate file; the music identification model is obtained by training an artificial intelligent model by using a plurality of groups of sample data, wherein each group of sample data comprises a sample music file and a digital music score corresponding to the sample music file; the intermediate file comprises a feature vector used for indicating music information corresponding to the music file;

the format acquisition module is used for acquiring an expected music score format;

and the format conversion module is used for calling a preset file conversion tool to convert the file format of the intermediate file into the expected music score format to obtain the music score file corresponding to the music file.

The beneficial effects of the present application are as follows. A music file is input into a pre-trained music recognition model to obtain an intermediate file, where the music recognition model is obtained by training an artificial intelligence model with a plurality of groups of sample data, each group of sample data comprises a sample music file and a digital music score corresponding to the sample music file, and the intermediate file comprises feature vectors indicating the music information corresponding to the music file; an expected music score format is acquired; and a preset file conversion tool is called to convert the file format of the intermediate file into the expected music score format, obtaining a music score file corresponding to the music file. This addresses the problem that MIDI files obtained by existing music-to-score approaches often lack voice part division and must be divided into parts before they can be transcribed into a music score, and lack information such as key marks, playing methods and pedals, which must be identified before transcription. Because the intermediate file generated by the music recognition model is input into the file conversion tool, which converts it into the expected music score format according to the user's requirements, flexible conversion between file formats can be achieved. In addition, when the file format is a picture format, the music score can be understood directly by the user without machine translation.

In addition, the music information includes the information required to make a music score corresponding to the music, such as instruments, key marks, beats, velocities, notes, rests, pitches, lengths of pitches, tempo conversion, velocity conversion, key mark conversion, bar division, vocal part division, clef allocation, musical notation allocation, musical performance, inflexion marks, decorative tones and pedals. Moreover, slight discrepancies between the music file and the music score that arise from the performer's understanding and expression of the score, such as small variations in the lengths of notes of the same kind or in the pause times of rests of the same kind, are effectively ignored. As a result, a music score that is accurate, easy to read and convenient to perform can be generated.

The foregoing is only an overview of the technical solutions of the present application. To make the technical solutions clearer and to enable them to be implemented according to the content of the description, the preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.

Drawings

FIG. 1 is a flowchart of a method for converting music into music score based on artificial intelligence according to an embodiment of the present application;

FIG. 2 is a schematic diagram of feature extraction of a sample music file provided by one embodiment of the present application;

FIG. 3 is a schematic diagram of a process for generating a feature vector according to an embodiment of the present application;

FIG. 4 is a schematic illustration of an intermediate file provided by an embodiment of the present application;

FIG. 5 is a block diagram of an apparatus for converting music into music score based on artificial intelligence provided by an embodiment of the present application;

FIG. 6 is a block diagram of an apparatus for converting music into music score based on artificial intelligence according to another embodiment of the present application.

Detailed Description

The following detailed description of embodiments of the present application will be described in conjunction with the accompanying drawings and examples. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.

First, several terms referred to in the present application will be described.

Long Short-Term Memory network (LSTM): a special kind of recurrent neural network, suitable for processing and predicting important events separated by relatively long intervals and delays in a time series.

Connectionist Temporal Classification (CTC) loss function: a loss function for handling the alignment between input and output labels in sequence labeling problems. Conventional sequence labeling algorithms require the input and output symbols to be perfectly aligned at each time step. CTC instead extends the label set with a blank element. After a sequence is labeled with the extended label set, every predicted sequence that the mapping function converts into the true sequence counts as a correct prediction. A model trained with the CTC loss function can therefore produce a prediction sequence without prior data alignment.
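The mapping described above can be illustrated with a small, brute-force sketch. It is not taken from the application: the symbol "-" for the blank element, the two-letter label set and the function names are assumptions chosen for illustration. The sketch enumerates every frame-level sequence of a given length and keeps those that the mapping function (merge repeated symbols, then drop blanks) converts into the target sequence.

```python
from itertools import product

BLANK = "-"  # assumed symbol for the blank element added by CTC


def collapse(path):
    """Mapping function: merge consecutive repeats, then remove blanks."""
    merged = []
    previous = None
    for symbol in path:
        if symbol != previous:
            merged.append(symbol)
        previous = symbol
    return "".join(s for s in merged if s != BLANK)


def valid_alignments(target, labels, num_frames):
    """Enumerate every path of length num_frames that collapses to target."""
    extended = list(labels) + [BLANK]  # extended label set
    return ["".join(p) for p in product(extended, repeat=num_frames)
            if collapse(p) == target]


alignments = valid_alignments("ab", labels="ab", num_frames=3)
print(sorted(alignments))  # five different paths all count as the label "ab"
```

A real CTC loss sums the probabilities of all such paths with dynamic programming rather than enumerating them; the enumeration here only shows why no frame-by-frame alignment of the labels is needed.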

Optionally, the embodiments of the present application are described by taking an electronic device with computing capability as the execution subject. The electronic device may be a desktop computer, a notebook computer, a server, a mobile phone, a tablet computer, a wearable device, or the like; this embodiment does not limit the type of the electronic device.

Fig. 1 is a flowchart of a method for converting music into music score based on artificial intelligence according to an embodiment of the present application. The method at least comprises the following steps:

step 101, inputting a music file into a pre-trained music recognition model to obtain an intermediate file; the music recognition model is obtained by training an artificial intelligent model by using a plurality of groups of sample data, wherein each group of sample data comprises a sample music file and a digital music score corresponding to the sample music file; the intermediate file includes a feature vector for indicating music information to which the music file corresponds.

Music files refer to audio files that can be played directly by an audio player. Optionally, the file format of the music file may be MP3 format and/or WAV format, and the file format of the music file is not limited in this embodiment.

The music recognition model is pre-stored in the electronic device.

Optionally, the method for obtaining the music recognition model by training the artificial intelligence model with multiple groups of sample data includes: after the sample data is obtained, converting the digital music score in each group of sample data into a corresponding sample intermediate file by using a file conversion tool; and training the artificial intelligence model based on the sample music file in each group of sample data and the sample intermediate file corresponding to each sample music file, to obtain the music recognition model. Alternatively, the method includes: after the sample data is obtained, converting the digital music score in each group of sample data into a corresponding sample intermediate file by using a file conversion tool; converting the sample music file in each group of sample data into a frequency spectrum file, and dividing the frequency spectrum file into a plurality of music pieces; and training the artificial intelligence model based on the plurality of music pieces corresponding to each sample music file and the sample intermediate file.

Optionally, the sample data comes from free public-domain music and the corresponding digital music scores. Because public-domain music and digital music scores are large in quantity and varied in kind, they provide abundant data for training the artificial intelligence model.

As data preprocessing, the digital music score in each group of sample data is converted into a sample intermediate file. These sample intermediate files serve as labels for the music files during model training. Optionally, the sample data may be divided into three parts: a training set, a development set and a test set. The training set is used to train a plurality of candidate model architectures for the music recognition model; the trained models are evaluated on the development set, and suitable models are selected from them for tuning; finally, the test set is used to evaluate the final performance of the model, yielding the final music recognition model.
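The three-way division described above might be sketched as follows. The 80/10/10 proportions, the fixed random seed and the file names are assumptions for illustration only; the application does not specify how the sample data is divided.

```python
import random


def split_samples(samples, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle (music file, intermediate file) pairs and split them three ways."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]      # whatever remains
    return train, dev, test


# Hypothetical file names standing in for sample music files and their labels.
pairs = [(f"piece_{i}.wav", f"piece_{i}.intermediate") for i in range(20)]
train, dev, test = split_samples(pairs)
print(len(train), len(dev), len(test))
```

Keeping the test set untouched until the final evaluation is what allows it to give an unbiased estimate of the selected model's performance.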

Optionally, the artificial intelligence network comprises an LSTM, in which case the music recognition model is built on the LSTM; and/or a Gated Recurrent Unit (GRU), in which case the music recognition model is built on the GRU. In other implementations, the artificial intelligence network may be another type of network model; this embodiment does not limit the network type of the artificial intelligence network.

Optionally, the music recognition model is trained using a CTC loss function. In other embodiments, other loss functions may be used in the training process, such as a Conditional Random Field (CRF) loss function. One or more loss functions may be used during training; this embodiment does not limit the type or number of the loss functions.

For example, referring to FIG. 2, the sample music file is converted into a corresponding spectrogram, which records the energy of the audio at each frequency at different moments. The spectrogram is then divided into a plurality of music pieces according to a preset sampling rate and a preset sampling length. For example, if the audio is 2 seconds long, the sampling rate is 100 Hz and the sampling length is 0.05 seconds, the audio is divided into 200 pieces, each 0.05 seconds long; a new piece therefore starts every 0.01 seconds, and adjacent pieces overlap. If the spectrogram is 400 pixels long, it is divided into 200 pieces, each 10 pixels wide. Each music piece is then input into a feature extraction network in the artificial intelligence model to extract the feature vector of the music piece, giving the feature data of each music piece. The feature extraction network may be a neural network including a plurality of convolutional layers. Next, referring to FIG. 3, taking as an example the case where the artificial intelligence network further includes an LSTM and the loss function is CTC, the feature data of the music pieces at different times obtained in FIG. 2 are input into a bidirectional LSTM network, which outputs vectors y₁ to yₙ conforming to the CTC loss function specification. Each column of the output represents the sound information required for a score (for example C3, A, a half rest, and so on). Each column may be designed with reference to an open-source score format (for example MusicXML), so that the resulting intermediate file can be converted into a score file by a simple transformation. The outputs are arranged into a matrix and processed to obtain the intermediate file of the whole sample music file.
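The arithmetic of the segmentation example can be checked with a short sketch. It assumes that the sampling rate gives the number of piece start points per second, so that pieces overlap, and that pieces running past the end of the spectrogram are clipped; the application does not state how the edge is handled, and the function name is hypothetical.

```python
def segment_bounds(audio_seconds, rate_hz, window_seconds, spectrogram_width_px):
    """Return (start_px, end_px) column ranges, one per music piece."""
    px_per_second = spectrogram_width_px / audio_seconds
    hop_px = px_per_second / rate_hz            # distance between piece starts
    window_px = px_per_second * window_seconds  # width of each piece
    num_pieces = round(audio_seconds * rate_hz)
    bounds = []
    for i in range(num_pieces):
        start = round(i * hop_px)
        # Assumption: clip pieces that would run past the spectrogram edge.
        end = min(start + round(window_px), spectrogram_width_px)
        bounds.append((start, end))
    return bounds


# Values from the example: 2 s of audio, 100 Hz, 0.05 s pieces, 400 px wide.
bounds = segment_bounds(2.0, 100, 0.05, 400)
print(len(bounds), bounds[0], bounds[1], bounds[-1])
```

With these values each piece is 10 pixels wide and starts 2 pixels after the previous one, which is how 200 pieces of 10 pixels fit into a 400-pixel spectrogram.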

Schematically, one possible intermediate file is illustrated in FIG. 4. Suppose the output of the artificial intelligence model at each time instant is one of the sound-information vectors y₁ to yₙ in FIG. 3; then y₁ to yₙ are arranged into a matrix, and the matrix is folded to obtain the intermediate file.

Folding is a feature of the CTC algorithm, familiar from speech recognition scenarios, that merges repeated consecutive outputs into a single output. For example, for the pronunciation of the word "music", the letters output every 0.1 seconds may be "mmuussiiiic", and folding this output yields "music".
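Folding a matrix of per-time-step outputs might look like the following sketch of greedy CTC decoding. The token vocabulary, the score values and the name of the blank token are assumptions for illustration; the application does not disclose the actual token set or decoding procedure.

```python
BLANK = "<blank>"  # assumed name for the CTC blank token


def greedy_ctc_decode(score_matrix, vocabulary):
    """Pick the best token per time step, merge repeats, then drop blanks.

    score_matrix[t][k] is the model score of vocabulary[k] at time step t.
    """
    decoded = []
    previous = None
    for column in score_matrix:
        best_index = max(range(len(column)), key=column.__getitem__)
        best = vocabulary[best_index]
        if best != previous and best != BLANK:
            decoded.append(best)
        previous = best
    return decoded


# Toy vocabulary of score tokens; the real token set is not given in the source.
vocab = [BLANK, "C3", "A3", "half-rest"]
scores = [
    [0.10, 0.80, 0.05, 0.05],  # C3
    [0.10, 0.70, 0.10, 0.10],  # C3 again: merged with the previous step
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.20, 0.60, 0.10, 0.10],  # C3 after a blank: kept as a new note
    [0.10, 0.10, 0.10, 0.70],  # half-rest
]
print(greedy_ctc_decode(scores, vocab))
```

The blank token is what lets the decoder distinguish one long note from the same note played twice: repeats separated by a blank survive the folding, while directly adjacent repeats are merged.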

Optionally, in this embodiment, the music information includes, but is not limited to: musical instruments, key marks, beats, tempo, notes, rests, pitches, duration, tempo changes, key mark changes, bar divisions, vocal part divisions, clef allocation, musical notation, inflexion marks, decorative tones, pedals, and the like of music pieces. It should be added that the music information may also include other music information included in the real music score, and the content of the music information is not limited in this embodiment.

Step 102, obtaining a desired score format.

The desired score format is the score format that the user wishes to obtain. Optionally, the desired score format includes a first format type and/or a second format type. The first format type is the format type of a file storing human-readable music symbols corresponding to the music file; for example, the first format type includes at least one of: a picture format and the Portable Document Format (PDF). The first format type may also be another type, which this embodiment does not enumerate. The second format type is the format type of a file storing music information that corresponds to the music file and is readable by a computer program; this music information must be translated by a machine to obtain the corresponding music symbols. For example, the second format type includes at least one of: the MIDI format and the MXL format. The second format type may also be another type, which this embodiment does not enumerate.

Optionally, obtaining the desired score format comprises: displaying a format selection interface that shows a plurality of score formats, and, when a selection operation on at least one of the score formats is received, determining the score format indicated by the selection operation as the desired score format; or receiving a desired score format sent by another device; or reading a default desired score format. This embodiment does not limit the manner of obtaining the desired score format.

Step 103, a preset file conversion tool is called to convert the file format of the intermediate file into an expected music score format, so as to obtain a music score file corresponding to the music file.

The file conversion tool supports converting the file format of the intermediate file into the expected music score format; the file conversion tool also supports converting a score format (such as a digital score format) into the file format of the intermediate file.

In one example, the file conversion tool converts the file format of the intermediate file into a desired music score format, resulting in a music score file corresponding to the music file, comprising: creating a score file in a desired score format; identifying each feature vector in the intermediate file to obtain corresponding music information; and writing the music information into the pre-created music score file in a desired music score format to obtain the music score file with the desired music score format.
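The create-then-write procedure can be sketched for MusicXML as the desired format, using Python's standard XML library. The input here is assumed to be music information already recovered from the feature vectors, represented as hypothetical (step, octave, duration) tuples; how the feature vectors are interpreted is not shown, and the resulting document is a minimal single-part, single-measure score rather than a complete one.

```python
import xml.etree.ElementTree as ET


def notes_to_musicxml(notes, divisions=1):
    """Build a minimal MusicXML tree from decoded music information.

    notes: list of (step, octave, duration); use ("rest", None, duration) for rests.
    """
    # Create the score file skeleton in the desired format.
    root = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(root, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"
    part = ET.SubElement(root, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = str(divisions)

    # Write each item of music information into the score.
    for step, octave, duration in notes:
        note = ET.SubElement(measure, "note")
        if step == "rest":
            ET.SubElement(note, "rest")
        else:
            pitch = ET.SubElement(note, "pitch")
            ET.SubElement(pitch, "step").text = step
            ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(duration)
    return root


root = notes_to_musicxml([("C", 3, 4), ("rest", None, 2)])
print(ET.tostring(root, encoding="unicode"))
```

A fuller converter would also write key, time and clef information into the attributes element and split the notes across measures.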

The file conversion tool is able to interpret the feature vectors. For example, each row of a feature vector corresponds to one piece of the music information, and combining all the pieces yields the music information.

Optionally, different expected music score formats correspond to different file conversion tools, and at this time, the electronic device further needs to determine a file conversion tool corresponding to the expected music score format; and calling the determined file conversion tool to convert the file format of the intermediate file into the expected music score format to obtain the music score file.
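Selecting a conversion tool by desired format can be sketched as a lookup table. The converter functions below are placeholders that do not produce valid score files, and the format keys are assumptions; only the dispatch step is illustrated.

```python
from typing import Callable, Dict


# Placeholder converters: each takes intermediate-file text and returns bytes.
def to_mxl(intermediate: str) -> bytes:
    return ("<!-- MXL placeholder -->" + intermediate).encode("utf-8")


def to_midi(intermediate: str) -> bytes:
    return b"MThd" + intermediate.encode("utf-8")  # not a valid MIDI file


# Registry mapping each desired score format to its file conversion tool.
CONVERTERS: Dict[str, Callable[[str], bytes]] = {
    "mxl": to_mxl,
    "midi": to_midi,
}


def convert(intermediate: str, desired_format: str) -> bytes:
    """Determine the tool for the desired format, then call it."""
    key = desired_format.lower()
    if key not in CONVERTERS:
        raise ValueError(f"no file conversion tool registered for {desired_format!r}")
    return CONVERTERS[key](intermediate)


print(convert("note-data", "MIDI"))
```

Supporting a new score format then only requires registering one more entry in the table.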

For example, if the file format of the intermediate file is designed with reference to MusicXML, a tool for converting the intermediate file into a MusicXML file may be written and used to perform the conversion. The MusicXML file can then be converted to PDF, MIDI and other formats using third-party open-source tools. For example, a MusicXML file may be converted to PDF, MIDI, etc. using the open-source software MuseScore; a LilyPond file may be converted to a picture using LilyPond; and a MIDI file may be converted to MP3 using a synthesizer.
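Calling such third-party tools from a program might be sketched as below. The executable names are assumptions: the MuseScore binary is named differently across versions and platforms, so "mscore" is only a default to be overridden. The sketch builds the command lines and runs them only if the executable is found; the file names are hypothetical.

```python
import shutil
import subprocess


def musescore_command(source, target, binary="mscore"):
    """Command asking MuseScore to export source in the format implied by target."""
    return [binary, "-o", str(target), str(source)]


def lilypond_png_command(source, binary="lilypond"):
    """Command asking LilyPond to render a .ly file as a PNG picture."""
    return [binary, "--png", str(source)]


def run_if_available(command):
    """Run the command only when its executable is on PATH; report whether it ran."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


pdf_command = musescore_command("score.musicxml", "score.pdf")
png_command = lilypond_png_command("score.ly")
print(pdf_command)
print(png_command)
```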

In summary, in the method for converting music into music score based on artificial intelligence provided by this embodiment, a music file is input into a pre-trained music recognition model to obtain an intermediate file, where the music recognition model is obtained by training an artificial intelligence model with a plurality of groups of sample data, each group of sample data comprises a sample music file and a digital music score corresponding to the sample music file, and the intermediate file comprises feature vectors indicating the music information corresponding to the music file; an expected music score format is acquired; and a preset file conversion tool is called to convert the file format of the intermediate file into the expected music score format, obtaining a music score file corresponding to the music file. This addresses the problem that MIDI files obtained by existing music-to-score approaches often lack voice part division and must be divided into parts before they can be transcribed into a music score, and lack information such as key marks, playing methods and pedals, which must be identified before transcription. Because the intermediate file generated by the music recognition model is input into the file conversion tool, which converts it into the expected music score format according to the user's requirements, flexible conversion between file formats can be achieved. In addition, when the file format is a picture format, the music score can be understood directly by the user without machine translation.

In addition, the music information includes the information required to make a music score corresponding to the music, such as instruments, key marks, beats, velocities, notes, rests, pitches, lengths of pitches, tempo conversion, velocity conversion, key mark conversion, bar division, vocal part division, clef allocation, musical notation, inflexion marks and decorative tones. Moreover, slight discrepancies between the music file and the music score that arise from the performer's understanding and expression of the score, such as small variations in the lengths of notes of the same kind or in the pause times of rests of the same kind, are effectively ignored. As a result, a music score that is accurate, easy to read and convenient to play can be generated.

Fig. 5 is a block diagram of an apparatus for converting music into music score based on artificial intelligence according to an embodiment of the present application. The device at least comprises the following modules: a music recognition module 510, a format acquisition module 520, and a format conversion module 530.

A music recognition module 510, configured to input a music file into a pre-trained music recognition model to obtain an intermediate file; the music identification model is obtained by training an artificial intelligent model by using a plurality of groups of sample data, wherein each group of sample data comprises a sample music file and a digital music score corresponding to the sample music file; the intermediate file comprises a feature vector used for indicating music information corresponding to the music file;

a format obtaining module 520, configured to obtain a desired score format;

a format conversion module 530, configured to invoke a preset file conversion tool to convert the file format of the intermediate file into the expected music score format, so as to obtain a music score file corresponding to the music file.
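The cooperation of the three modules can be sketched as a small class. The class name, the callable signatures and the stub implementations are assumptions for illustration; the actual recognition model and conversion tool are replaced by stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoreConverter:
    recognize: Callable[[bytes], str]      # music recognition module (510)
    get_format: Callable[[], str]          # format acquisition module (520)
    convert: Callable[[str, str], bytes]   # format conversion module (530)

    def run(self, music_file: bytes) -> bytes:
        intermediate = self.recognize(music_file)   # music file -> intermediate file
        desired = self.get_format()                 # expected music score format
        return self.convert(intermediate, desired)  # intermediate -> score file


# Stub modules standing in for the trained model and the conversion tool.
converter = ScoreConverter(
    recognize=lambda audio: f"intermediate({len(audio)} bytes)",
    get_format=lambda: "mxl",
    convert=lambda intermediate, fmt: f"{fmt}:{intermediate}".encode("utf-8"),
)
result = converter.run(b"\x00\x01\x02")
print(result)
```

Passing the modules in as callables mirrors the note that the functions may be redistributed among different functional modules as needed.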

For relevant details reference is made to the above-described method embodiments.

It should be noted that: in the above embodiment, when the device for converting music into music score based on artificial intelligence is used for converting music into music score based on artificial intelligence, only the division of the functional modules is used for illustration, and in practical applications, the distribution of the functions may be completed by different functional modules as needed, that is, the internal structure of the device for converting music into music score based on artificial intelligence may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for converting music into music score based on artificial intelligence and the method for converting music into music score based on artificial intelligence provided by the above embodiments belong to the same concept, and the specific implementation process thereof is described in detail in the method embodiments and will not be described herein again.

Fig. 6 is a block diagram of an apparatus for converting music into music score based on artificial intelligence according to an embodiment of the present application. The apparatus may be, for example, a smartphone, a tablet, a laptop, a desktop computer, or a server. The apparatus may also be referred to as a user equipment, a portable terminal, a laptop terminal, a desktop terminal, a control terminal, etc., which is not limited in this embodiment. The apparatus comprises at least a processor 601 and a memory 602.

The processor 601 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement the artificial intelligence based music to music score method provided by the method embodiments herein.

In some embodiments, the apparatus for converting music into music score based on artificial intelligence may further include: a peripheral interface and at least one peripheral. The processor 601, memory 602 and peripheral interface may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.

Of course, the device for converting music into music score based on artificial intelligence may also include fewer or more components, which is not limited by the embodiment.

Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, where the program is loaded and executed by a processor to implement the method for transforming music into music score based on artificial intelligence of the above method embodiments.

Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, in which a program is stored, where the program is loaded and executed by a processor to implement the method for converting music into music score based on artificial intelligence of the above-mentioned method embodiments.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. An artificial intelligence based music-to-music score method, the method comprising:

acquiring an expected music score format;

2. The method of claim 1, wherein the music recognition model is implemented by converting the digital music score in each set of sample data into a corresponding sample intermediate file using the file conversion tool after the sample data is obtained; and training the artificial intelligence model based on the sample music files in each group of sample data and the sample intermediate files corresponding to each sample music file to obtain the artificial intelligence model.

3. The method of claim 1, wherein the music recognition model is implemented by converting the digital music score in each set of sample data into a corresponding sample intermediate file using the file conversion tool after the sample data is obtained; converting the sample music files in each group of sample data into frequency spectrum files, and dividing the frequency spectrum files into a plurality of music fragments; and training the artificial intelligence model based on the plurality of music pieces corresponding to each sample music file and the sample intermediate file.

4. The method of claim 1, wherein the music information comprises: musical instruments, key marks, tempos, speeds, notes, rests, pitches, durations, tempo changes, speed changes, key mark changes, bar divisions, vocal part divisions, clef allocation, musical notation, inflexion marks, decorative tones, and pedals of music pieces.

5. The method of claim 1, wherein the desired score format comprises a first format type and/or a second format type;

6. The method of claim 5,

the first format type includes at least one of: picture format, portable file format;

the second format type includes at least one of: MIDI format and MXL format.

7. The method of claim 1, wherein the invoking a preset file conversion tool to convert the file format of the intermediate file into the expected music score format to obtain a music score file corresponding to the music file comprises:

8. An apparatus for converting music into music score based on artificial intelligence, the apparatus comprising: