WO2021240644A1 - Information output program, device, and method - Google Patents
Information output program, device, and method
- Publication number
- WO2021240644A1 (PCT/JP2020/020734)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- event
- moving image
- output
- commentary
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/235—Processing of additional data, e.g. scrambling of additional data or processing content descriptors
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
Definitions
- The disclosed technology relates to an information output program, an information output device, and an information output method.
- Sports video, such as baseball and soccer games, is now commonly distributed live, and play-by-play and commentary on the state of the game are often added to such video. Techniques for adding them have therefore been proposed.
- For example, a commentary-audio generation device has been proposed that generates commentary audio by speech-synthesizing a commentary manuscript (text data) related to the content of a video. This device detects, in the video's audio track, talking sections (sections containing speech) and pause sections (sections that are silent or contain only background sound). It then adjusts the speech rate of the commentary audio based on the length of each pause section and adds the rate-adjusted commentary audio to the video audio.
- In this conventional technology, the commentary manuscript from which the commentary audio is generated must be prepared in advance. In live distribution of sports video and audio, however, the play-by-play and commentary depend on how the game unfolds, so preparing a manuscript in advance is difficult. Moreover, with the diversification of viewing styles, providing an announcer and a commentator for every piece of distributed content imposes a heavy burden in labor costs, distribution equipment, and other costs. Furthermore, the added commentary must be output at appropriate timing relative to the original moving image and audio.
- As one aspect, the disclosed technology aims to output information presenting play-by-play and commentary at appropriate timing in live-distributed content such as sports games, without requiring a manuscript prepared in advance or a live announcer and commentator.
- As one embodiment, the disclosed technique acquires sports video containing sound information and a moving image, together with event information about the event shown in each section of the moving image, and generates a live commentary sentence for each section based on the acquired event information. It then adjusts the output timing of the generated commentary based on the per-section output timing of at least one of the sound information and the moving image, and outputs the commentary together with at least one of them.
- As one effect, information presenting play-by-play and commentary can be output at appropriate timing in live-distributed content such as sports games, without requiring a prepared manuscript or a live announcer and commentator.
- In the following embodiment, the disclosed technology is applied, as an example, to live-distributed content of a baseball game. Specifically, a case is described in which generated live commentary is added to the sound information collected at the stadium (hereinafter, "stadium audio"), to the moving image shot at the stadium, or to the video including the stadium audio, and the result is distributed as content.
- As shown in FIG. 1, the information output system 100 according to the present embodiment includes an information output device 10, a video distribution system 32, a stats input system 34, and a user terminal 36.
- The video distribution system 32 shoots a baseball game held at a stadium with a camera and outputs the captured video.
- The video includes stadium audio and a moving image composed of a plurality of frames.
- Time information is associated with each sampling point of the stadium audio and with each frame of the moving image, and the stadium audio and the moving image are synchronized based on this time information.
- The time information is, for example, the date and time the video was shot or the elapsed time from the start of the game.
- The stats input system 34 is a system with which an operator enters stats information about the game while watching the video acquired from the video distribution system 32.
- For each event corresponding to a single play, such as a pitch, a hit, base running, or a defensive play, the content of the event is entered.
- A timestamp is attached to each event at input time, for example manually by the operator.
- The user terminal 36 is a terminal used by a user of the service provided by the information output system 100.
- The user terminal 36 has a function for receiving content distributed from the information output device 10 and a function for outputting at least one of audio and moving images.
- The user terminal 36 is, for example, a personal computer, a smartphone, a tablet terminal, a mobile phone, a television, or a radio.
- The information output device 10 generates live commentary about the video and outputs content in which the commentary has been added to the video acquired from the video distribution system 32.
- Functionally, as shown in FIG. 2, the information output device 10 includes a video acquisition unit 11, an analysis unit 12, a stats acquisition unit 13, a synchronization unit 14, a generation unit 15, and a synthesis unit 16. A scene information DB (database) 21, a stats DB 22, an event DB 23, and a template DB 24 are stored in a predetermined storage area of the information output device 10.
- The video acquisition unit 11, analysis unit 12, stats acquisition unit 13, and synchronization unit 14 are an example of the acquisition unit of the disclosed technology, and the synthesis unit 16 is an example of its output unit.
- The video acquisition unit 11 acquires the video output from the video distribution system 32 and separates it into stadium audio and a moving image.
- The video acquisition unit 11 passes the separated moving image to the analysis unit 12 and the acquired video to the synthesis unit 16.
- The analysis unit 12 acquires scene information for each section corresponding to an event by performing image analysis on the moving image received from the video acquisition unit 11. Specifically, the analysis unit 12 detects camera-cut switching points from inter-frame pixel-value differences and treats the span between consecutive switching points as one section. It then recognizes the scene shown by each section's moving image using a recognition model.
- A scene can be, for example, one that captures the defensive alignment, a batter standing in the batter's box, the state of the bench, base running, a pickoff throw, or a slide.
- The recognition model is trained in advance by machine learning on correspondences between per-section moving images and labels indicating the correct scene type shown by each.
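- As a concrete illustration, the following is a minimal sketch of the section detection described above, assuming frames decoded into NumPy arrays and a simple mean absolute inter-frame difference with a hypothetical threshold; the patent does not specify the difference metric or threshold, and scene recognition with the trained model is omitted.

```python
import numpy as np

def detect_sections(frames, cut_threshold=30.0):
    """Split decoded frames (H x W x 3 uint8 arrays) into per-event sections.

    A camera-cut switching point is assumed wherever the mean absolute pixel
    difference between consecutive frames exceeds cut_threshold; the span
    between consecutive switching points becomes one section. Returns a list
    of (start_frame_index, end_frame_index) pairs.
    """
    cuts = [0]
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.int16) - frames[i - 1].astype(np.int16)).mean()
        if diff > cut_threshold:
            cuts.append(i)
    cuts.append(len(frames))
    return [(cuts[k], cuts[k + 1]) for k in range(len(cuts) - 1)]
```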
- The analysis unit 12 also acquires information such as the ball count, strike count, and out count (hereinafter, "BSO"), the score, the inning, and the runner situation from the on-screen graphics (telop) portion of the frame images in each section. This information can be obtained by comparison against a predetermined format, character recognition processing, and the like.
- In the following, BSO is written as "ball count (B)-strike count (S)-out count (O)" (for example, 0-0-0).
- The analysis unit 12 stores the information acquired for each section in the scene information DB 21, in association with the time information of the section's start frame and the time information of its end frame.
- FIG. 3 shows an example of the scene information DB 21.
- In the example of FIG. 3, each row (each record) corresponds to the scene information of one section.
- A "sequence No." is assigned to each scene-information record in order of its time information.
- The "start time" is the time information associated with the section's start frame, and the "end time" is the time information associated with its end frame.
- The "scene" is information indicating the scene type recognized by the recognition model.
- "Inning" and "pre-event BSO" are information acquired from the telop in the section's frame images. The information included in the scene information is not limited to this example.
- The stats acquisition unit 13 acquires the stats information entered for each event in the stats input system 34 and stores it in the stats DB 22.
- FIG. 4 shows an example of the stats DB 22.
- In the example of FIG. 4, each row (each record) corresponds to the stats information of one event, and a "sequence No." is assigned to each record in order of its time information.
- The stats information includes a "start time" and an "end time", which are the times entered by the operator as the start and end of the event indicated by each record.
- The stats information also includes the "inning" at the time of the event, the "batter team" (the name of the team to which the batter belongs), the "batter" (the batter's name), the "pitcher team" (the name of the team to which the pitcher belongs), and the "pitcher" (the pitcher's name).
- The stats information further includes the "pitch count in the at-bat", which indicates the number of pitches thrown to the batter in that at-bat as of the event, the "event content", the "batted-ball direction" when the event is a hit, and the "event result".
- The stats information additionally includes the "pre-event BSO", which is the BSO before the event, and the "post-event BSO", which reflects the event result. The information included in the stats information is not limited to this example.
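- A stats-information record of FIG. 4 could likewise be sketched as follows; again, the field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class StatsInfo:
    sequence_no: int
    start_time: str             # entered manually; may be inaccurate or missing
    end_time: str
    inning: str
    batter_team: str
    batter: str
    pitcher_team: str
    pitcher: str
    pitch_count_in_at_bat: int  # pitches to this batter in the at-bat so far
    event_content: str          # e.g. "pitch", "hit"
    batted_ball_direction: str  # set only when the event is a hit
    event_result: str           # e.g. "foul", "strikeout"
    pre_event_bso: str          # BSO before the event
    post_event_bso: str         # BSO reflecting the event result
```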
- The synchronization unit 14 generates event information in which the scene information and the stats information are synchronized, by associating each stats record with the scene information of a section based on the order of the stats records.
- Here, because scene information is acquired by analyzing the moving image, the scenes that can be acquired are limited. However, since its time information (start time and end time) is the time information attached to the frames of the moving image, it accurately represents when each section occurs and is also synchronized with the time information of the stadium audio.
- Stats information, by contrast, can include detailed information that cannot be obtained from image analysis. However, because its time information is entered manually, it may be inaccurate or missing, its granularity is coarse, and its synchronization with the time information of the stadium audio is not guaranteed.
- By associating the stats information with the scene information, the time information of the stats information is corrected based on that of the scene information and becomes accurate. This makes it possible to generate event information that is more detailed than the scene information and whose time information is more accurate than that of the stats information.
- Specifically, the synchronization unit 14 generates event information by associating each stats record with the scene record in which the items common to both match, while using the sequence numbers of the scene information and the stats information to preserve their order. Common items are, for example, "scene" and "event content", or "pre-event BSO".
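- The association could be sketched as the order-preserving merge below. Matching on "pre-event BSO" alone is an assumption (the description names it only as one example of a common item), and the scene's frame-accurate times deliberately overwrite the manually entered stats times.

```python
def synchronize(scene_records, stats_records):
    """Sketch of synchronization unit 14 over dict records sorted by sequence No.

    Each scene is merged with the next unconsumed stats record when their
    common item agrees; a scene with no corresponding stats record reuses the
    immediately preceding stats record, as described above.
    """
    events, j, carried = [], 0, {}
    for scene in scene_records:
        if j < len(stats_records) and stats_records[j]["pre_event_bso"] == scene["pre_event_bso"]:
            carried = stats_records[j]
            j += 1
        # scene keys come last so its accurate start/end times win
        events.append({**carried, **scene})
    return events
```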
- The synchronization unit 14 stores the generated event information in the event DB 23.
- FIG. 5 shows an example of the event DB 23.
- In the example of FIG. 5, each row (each record) corresponds to one event-information record.
- As shown in FIG. 5, the event information has items that integrate the items of the scene information and the items of the stats information.
- For scene information for which no corresponding stats information exists, the synchronization unit 14 generates event information by, for example, attaching the same information as the immediately preceding stats record in sequence-number order.
- Each of the scene information and the event information is an example of the event information of the disclosed technology, and the stats information is an example of the external information of the disclosed technology.
- Based on the event information stored in the event DB 23, the generation unit 15 generates, for the event indicated by each event-information record, a sentence serving as play-by-play or commentary (hereinafter, a "live commentary sentence"). Specifically, the generation unit 15 selects a template matching each event-information record from the plurality of live commentary templates stored in the template DB 24 and combines the selected template with the event information to generate the live commentary sentence.
- FIG. 6 shows an example of the template DB 24.
- In the example of FIG. 6, each row (each record) is information about one template, and a sequence of templates forms a template group corresponding to one event-information record.
- Templates belonging to the same template group carry the same "template group ID" as identification information and are assigned an "in-group sequence No." in output order.
- The template DB 24 also includes "speaker type" information indicating whether each template is intended to be spoken by the play-by-play announcer or by the commentator.
- the "template” is a format in which parameters are inserted in a part of a sentence that is a commentary or a commentary.
- the part of ⁇ > is the part where the parameter is inserted, and the numbers 1, 2, ... Are assigned in ⁇ > in the order of appearance in each template.
- the item name of the event information is specified as the "parameter type”.
- "before (or after) B (or S)” defined as the parameter type represents only the corresponding count of BSO.
- The template DB 24 also includes the playback time (hereinafter, "voice time") of the audio data produced when the live commentary generated from each template is converted to speech.
- To make the content with live commentary available in a variety of languages, the template DB 24 may also store "template", "voice time", and "parameter type" items for each of a plurality of languages.
- The generation of live commentary is now described more concretely. The generation unit 15 selects the template group corresponding to an event-information record using a selection model for choosing the template group suited to the event information.
- The selection model is trained in advance, by machine learning, on correspondences between event information and the template group best suited to the event it indicates; given target event information, it outputs a goodness of fit for each template group.
- For example, the generation unit 15 takes a predetermined number of template groups in descending order of the goodness of fit output by the selection model and, for each, computes a total "voice time" by summing the voice times of the templates in the group. It then selects, from among the template groups whose total voice time is shorter than the time from the event information's "start time" to its "end time" (hereinafter, the "event time"), the group with the highest goodness of fit. For example, when the goodness-of-fit values shown in FIG. 7 are obtained for event information with an event time of 20 seconds, the generation unit 15 selects the template group whose template group ID is 3.
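- Assuming times held as seconds and the selection model's output as a mapping from group ID to goodness of fit, the selection could be sketched as:

```python
def select_template_group(event, template_groups, fit_scores, top_k=5):
    """Return the ID of the best-fitting template group that fits the event time.

    template_groups maps a group ID to its list of templates (each carrying a
    'voice_time' in seconds); fit_scores maps a group ID to the goodness of
    fit output by the selection model. Among the top_k groups by fit, the
    best-fitting one whose summed voice time is shorter than the event time
    is chosen; None means no group fits.
    """
    event_time = event["end_time_sec"] - event["start_time_sec"]
    for gid in sorted(fit_scores, key=fit_scores.get, reverse=True)[:top_k]:
        if sum(t["voice_time"] for t in template_groups[gid]) < event_time:
            return gid
    return None
```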
- The generation unit 15 inserts the values of the event-information items indicated by each "parameter type" into the < > portions of the templates in the selected template group, generating the live commentary sentences.
- For example, suppose that for the event information with sequence No. 5 in the event DB 23 shown in FIG. 5, the template group with template group ID 1 shown in FIG. 6 is selected, and take its in-group sequence No. 6 template as an example.
- The generation unit 15 inserts the event result "foul" into <1>, the post-event B count "1" into <2>, and the post-event S count "1" into <3>.
- The generation unit 15 thereby generates the commentary sentence "Foul. This is one ball and one strike."
- The generation unit 15 passes the generated live commentary sentences to the synthesis unit 16.
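- The parameter insertion itself amounts to string substitution; the sketch below, with assumed item names, reproduces the FIG. 6 example (with digit counts rather than spelled-out numbers).

```python
def fill_template(template, param_types, event):
    """Replace each <n> placeholder with the event item named by param_types[n]."""
    for n, item_name in param_types.items():
        template = template.replace(f"<{n}>", str(event[item_name]))
    return template

sentence = fill_template(
    "<1>. This is <2> ball and <3> strike.",
    {1: "event_result", 2: "post_event_b", 3: "post_event_s"},
    {"event_result": "Foul", "post_event_b": 1, "post_event_s": 1},
)
# -> "Foul. This is 1 ball and 1 strike."
```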
- The synthesis unit 16 adjusts the output timing of the live commentary received from the generation unit 15 based on the per-event-section output timing of at least one of the stadium audio and the moving image of the video received from the video acquisition unit 11. It then generates and outputs content with live commentary so that the timing-adjusted commentary is output together with at least one of the stadium audio and the moving image.
- For example, the synthesis unit 16 generates, as the content with live commentary, content in which the stadium audio is synthesized with audio data representing the commentary sentences (see A in FIG. 8).
- This content is applicable to radio broadcasting and the like.
- The synthesis unit 16 may also generate, as the content with live commentary, content in which the original video (with stadium audio) or the moving image (without stadium audio) is synthesized with audio data representing the commentary.
- This content is applicable to television broadcasting, Internet video distribution, and the like.
- The synthesis unit 16 may further generate, as the content with live commentary, content in which the original video (with stadium audio) or the moving image (without stadium audio) is synthesized with image data (subtitles) visualizing the commentary text (see B in FIG. 8). This content is also applicable to television broadcasting, Internet video distribution, and the like.
- When synthesizing the live commentary with at least one of the stadium audio and the moving image, the synthesis unit 16 synchronizes the time information of the moving image, or the time information of the stadium audio synchronized with it, with the time information of the event information corresponding to the commentary. Since the time information of the event information matches that of the moving image, the two can easily be synchronized.
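- Since the event information carries frame-accurate times, the timing adjustment can reduce to pairing each commentary with its event's start time, as in this small sketch (times assumed to be seconds from the start of the video):

```python
def schedule_commentary(events, commentaries):
    """Pair each generated commentary with the start time of its event so the
    synthesis unit can mix the audio, or overlay the subtitle, at that offset."""
    return [(event["start_time_sec"], text)
            for event, text in zip(events, commentaries)]
```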
- The information output device 10 can be realized by, for example, the computer 40 shown in the figure. The computer 40 includes a CPU (central processing unit) 41, a memory 42 serving as a temporary storage area, and a non-volatile storage unit 43. The computer 40 also includes an input/output device 44, such as an input unit and a display unit, an R/W (read/write) unit 45 that controls reading of data from and writing of data to a storage medium 49, and a communication I/F (interface) 46 connected to a network such as the Internet.
- The CPU 41, the memory 42, the storage unit 43, the input/output device 44, the R/W unit 45, and the communication I/F 46 are connected to one another via a bus 47.
- The storage unit 43 can be realized by an HDD (hard disk drive), an SSD (solid state drive), a flash memory, or the like.
- The storage unit 43, as a storage medium, stores an information output program 50 for causing the computer 40 to function as the information output device 10.
- The information output program 50 has a video acquisition process 51, an analysis process 52, a stats acquisition process 53, a synchronization process 54, a generation process 55, and a synthesis process 56.
- The storage unit 43 also has an information storage area 60 in which the information constituting each of the scene information DB 21, the stats DB 22, the event DB 23, and the template DB 24 is stored.
- The CPU 41 reads the information output program 50 from the storage unit 43, loads it into the memory 42, and sequentially executes the processes included in the program.
- By executing the video acquisition process 51, the CPU 41 operates as the video acquisition unit 11 shown in FIG. 2. Likewise, by executing the analysis process 52, the stats acquisition process 53, the synchronization process 54, the generation process 55, and the synthesis process 56, the CPU 41 operates as the analysis unit 12, the stats acquisition unit 13, the synchronization unit 14, the generation unit 15, and the synthesis unit 16 shown in FIG. 2, respectively.
- The CPU 41 also reads information from the information storage area 60 and loads each of the scene information DB 21, the stats DB 22, the event DB 23, and the template DB 24 into the memory 42.
- The computer 40 that has executed the information output program 50 thereby functions as the information output device 10.
- The CPU 41 that executes the program is hardware.
- The functions realized by the information output program 50 can also be realized by, for example, a semiconductor integrated circuit, more specifically an ASIC (application-specific integrated circuit) or the like.
- The video distribution system 32 shoots a baseball game held at the stadium with a camera and starts outputting the captured video.
- The stats input system 34 acquires the video output from the video distribution system 32, and the operator enters stats information about the game while watching it. In the information output device 10, the information output process shown in FIG. 10 is then executed.
- This information output process is an example of the information output method of the disclosed technology.
- In step S12, the video acquisition unit 11 acquires the video output from the video distribution system 32 for a predetermined time and separates it into stadium audio and a moving image. The video acquisition unit 11 then passes the moving image to the analysis unit 12 and the acquired video to the synthesis unit 16.
- In step S14, the analysis unit 12 acquires scene information for each section corresponding to an event by performing image analysis on the moving image received from the video acquisition unit 11, and stores the per-section information in the scene information DB 21 in association with the time information of each section's start and end frames.
- In step S16, the stats acquisition unit 13 acquires the stats information entered for each event in the stats input system 34 and stores it in the stats DB 22.
- In step S18, the synchronization unit 14 associates the stats information with the scene information in which the common items match, using the sequence numbers of both to preserve their order. The synchronization unit 14 thereby generates event information and stores it in the event DB 23.
- In step S20, the generation unit 15 selects a template matching each event-information record from the plurality of live commentary templates stored in the template DB 24, combines the selected template with the event information to generate live commentary sentences, and passes them to the synthesis unit 16.
- In step S22, the synthesis unit 16 synthesizes at least one of the stadium audio and the moving image with audio data or image data (subtitles) representing the live commentary.
- At that time, the synthesis unit 16 synchronizes the time information of the moving image, or the time information of the stadium audio synchronized with it, with the time information of the event information corresponding to the commentary.
- The synthesis unit 16 thereby generates and outputs content with live commentary in which the output timing of the commentary is synchronized with at least one of the stadium audio and the moving image, and the information output process ends.
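- Putting steps S12 to S22 together, one pass of the process might be sketched as below. The helper callables stand in for the units described above and are hypothetical; synchronize is the sketch given earlier.

```python
def information_output_step(video_chunk, split_video, analyze_scenes,
                            fetch_stats, generate_commentary, synthesize):
    """One pass of the information output process of FIG. 10 (steps S12-S22)."""
    audio, frames = split_video(video_chunk)                 # S12: acquire and split
    scenes = analyze_scenes(frames)                          # S14: per-section scene info
    stats = fetch_stats()                                    # S16: stats from the input system
    events = synchronize(scenes, stats)                      # S18: event information
    commentaries = [generate_commentary(e) for e in events]  # S20: template-based sentences
    return synthesize(audio, frames, events, commentaries)   # S22: timed content with commentary
```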
- The content with live commentary output from the synthesis unit 16 is distributed to the user terminal 36, where the user can view it.
- As described above, the information output device acquires video of a baseball game, including stadium audio and a moving image, and acquires scene information for each section of the moving image corresponding to an event. The information output device also acquires stats information entered externally based on the video, and associates the stats information with the scene information to generate event information that has time information accurate with respect to the video and detailed information about each event. The information output device then generates a live commentary sentence for each event based on the event information and a template.
- The information output device adjusts the output timing of the generated live commentary based on the per-section output timing of at least one of the stadium audio and the moving image, and outputs the commentary together with at least one of them.
- The information output device can thereby output information presenting play-by-play and commentary at appropriate timing in live-distributed content such as sports games, without requiring a prepared manuscript or a live announcer and commentator.
- Further, since no live announcer or commentator is needed, labor costs, distribution equipment, and other costs can be reduced.
- In the above embodiment, the template may also be selected according to attribute information of the user who views the distributed content with live commentary.
- For example, a template that generates commentary favoring one team, a template that generates commentary favoring the other team, and a template for a neutral position are prepared.
- Information on the user's favorite team is then acquired as the user's attribute information. The information may be acquired in advance, entered by the user before or during distribution, or estimated from the user's past viewing history.
- When selecting a template, one that generates commentary favoring the user's favorite team is chosen, as in the sketch below.
- In this way, the live commentary can be flexibly changed according to the user's preferences, and the distributed content can be diversified.
- The user's attributes are not limited to a favorite team; they may also be gender, age, familiarity with the rules, and the like.
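- A sketch of the attribute-based choice; the attribute key and the fallback behavior are assumptions.

```python
def choose_templates(user_attributes, biased_templates, neutral_templates):
    """Pick the template set biased toward the user's favorite team, if known.

    biased_templates maps a team name to the template set favoring that team;
    neutral_templates is the fallback when no favorite team is available.
    """
    favorite = user_attributes.get("favorite_team")
    return biased_templates.get(favorite, neutral_templates)
```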
- The above has mainly described generating play-by-play text, but the same applies to commentary. Specifically, by preparing commentary templates that use event information and a selection model that associates event information with the commentary templates appropriate for it, commentary sentences matching the event information can be generated. When two speakers, an announcer and a commentator, are assumed, a different synthesized voice may be used for each speaker when converting the sentences into audio data.
- In the above embodiment, the event information is generated by associating the stats information with the scene information; however, if the accuracy of the image analysis is improved, information as detailed as the stats information may be acquired directly as scene information.
- External information other than stats information may also be acquired and included in the event information. For example, information such as past head-to-head results, a pitcher's pitch types, and a batter's average by pitch location can be prepared, and the items relevant to the team, pitcher, and batter of each event can be included in that event's information. This makes it possible to generate commentary sentences such as "So far, batter BB is batting 30% against pitcher AA" or "This batter is strong against pitches high and outside."
- Similarly, information gathered from teams and players, information collected from the Internet, and the like may be prepared, and the items relevant to the team, pitcher, and batter of each event may be included in the event information. Then, for example, a template using such information can be selected for event information corresponding to a scene showing the bench or the stands, and the information may also be used to generate commentary for idle stretches between events. This makes it possible to generate commentary such as "Player AA seems unaffected by yesterday's hit-by-pitch."
- The selection of a template group is not limited to using the selection model; it may also be made on a rule basis. For example, a predetermined rule may select template A when the event result includes an out, and template B when the pre-event BSO is 3-2-2 and there is a runner on base.
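- The rule base could be a simple chain of condition checks; a sketch using the two example rules above (record keys are assumptions):

```python
def select_by_rule(event):
    """Apply predetermined rules; fall back to the model-based selection."""
    if "out" in event["event_result"]:
        return "A"  # template A when the event result includes an out
    if event["pre_event_bso"] == "3-2-2" and event["has_runner"]:
        return "B"  # template B for a full count with two outs and a runner on
    return None     # no rule matched; defer to the selection model
```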
- In the above embodiment, the case of selecting a template group whose voice time is shorter than the event time has been described, but the present invention is not limited to this.
- The template group with the highest goodness of fit may be selected without considering the voice time.
- In that case, if the voice time of the selected template group is longer than the event time, the playback speed of the audio data representing the live commentary may be increased, or some of the templates included in the group may be dropped.
- In the above embodiment, a baseball game has been described as the example of sports video, but the application of the disclosed technology is not limited to this; it can also be applied to, for example, soccer, basketball, and the like.
- In the case of soccer, analyzing the moving image can acquire, as scene information together with the time information of each frame, the players' running speeds, the current score (telop information), pass distances, the positional bias of all players, and the direction of attack.
- As stats information, player names, positioning, play content such as slides and passes, and play results are acquired.
- In the case of basketball, player names, running speeds, positioning, shot results, points scored on successful shots, and play content such as steals, no-look passes, and rebounds are acquired as scene information.
- As stats information, team names, player names, scores, and the like are acquired.
- In either case, event information can be generated by associating the stats information with the scene information, a template matching the event information can be selected, and live commentary can be generated.
- The manner of providing the program is not limited to this; the program according to the disclosed technology can also be provided in a form stored on a storage medium such as a CD-ROM, a DVD-ROM, or a USB memory.
- 10 Information output device
- 11 Video acquisition unit
- 12 Analysis unit
- 13 Stats acquisition unit
- 14 Synchronization unit
- 15 Generation unit
- 16 Synthesis unit
- 21 Scene information DB
- 22 Stats DB
- 23 Event DB
- 24 Template DB
- 32 Video distribution system
- 34 Stats input system
- 36 User terminal
- 40 Computer
- 41 CPU
- 42 Memory
- 43 Storage unit
- 49 Storage medium
- 50 Information output program
- 100 Information output system
Abstract
This information output device: acquires video of a baseball game that includes stadium audio and a moving image; acquires scene information for each section of the moving image corresponding to an event; acquires stats information entered externally on the basis of the video; associates the stats information with the scene information to generate event information comprising time information accurate with respect to the video and detailed information about each event; generates play-by-play commentary sentences for the events on the basis of the event information and templates; and adjusts the output timing of the generated commentary sentences on the basis of the per-section output timing of the stadium audio and/or the moving image, so as to output the commentary together with the stadium audio and/or the moving image.
Description
開示の技術は、情報出力プログラム、情報出力装置、及び情報出力方法に関する。
The disclosed technology relates to an information output program, an information output device, and an information output method.
例えば、野球やサッカーの試合等のスポーツの映像をライブ配信することが行われている。このような映像には、試合等の状況に対する実況及び解説が付加される場合が多い。そこで、映像に実況及び解説を付加する技術が提案されている。
For example, live distribution of sports videos such as baseball and soccer games is being carried out. In many cases, a commentary and a commentary on the situation such as a match are added to such a video. Therefore, a technique for adding a live commentary and a commentary to a video has been proposed.
例えば、映像の内容に関連する解説原稿(テキストデータ)を音声合成して解説音声を生成する解説付加音声生成装置が提案されている。この装置は、映像の音声である映像音声から発声音の音声区間である喋り区間(発声音区間)、及び、無音あるいは背景音のみの音声区間であるポーズ区間を検出する。そして、この装置は、ポーズ区間の区間長に基づいて、解説音声を話速変換し、映像音声に、話速変換された解説音声を付加する。
For example, a commentary additional voice generation device has been proposed that generates a commentary voice by synthesizing a commentary manuscript (text data) related to the content of a video. This device detects a talking section (voiced sound section), which is a voiced sound section, and a pause section, which is a silent or background sound-only voice section, from the video sound which is the sound of the video. Then, this device converts the commentary sound into the speech speed based on the section length of the pause section, and adds the commentary sound converted into the speech speed to the video sound.
従来技術では、映像音声に付加する解説音声の元となる解説原稿が予め用意されている必要がある。しかしながら、スポーツの試合等に関する映像や音声のライブ配信では、試合等の状況に応じて実況及び解説の内容が異なるため、予め解説原稿を用意しておくことは困難である。また、視聴スタイルの多様化等により、配信コンテンツに応じて実況者及び解説者を用意することは、人件費、配信設備等のコスト面で大きな負担となる。さらに、付加する実況解説文が元の動画像や音声に対して適切なタイミングで出力されることが必要である。
In the conventional technology, it is necessary to prepare in advance the commentary manuscript that is the source of the commentary audio to be added to the video / audio. However, in live distribution of video and audio related to sports games and the like, it is difficult to prepare commentary manuscripts in advance because the actual situation and the contents of the commentary differ depending on the situation of the games and the like. In addition, due to the diversification of viewing styles and the like, preparing a commentator and a commentator according to the distributed content will be a heavy burden in terms of labor costs, distribution equipment, and the like. Furthermore, it is necessary that the live commentary to be added is output at an appropriate timing with respect to the original moving image and sound.
一つの側面として、開示の技術は、事前に用意する原稿、並びに実況及び解説者を要することなく、スポーツの試合等のライブ配信コンテンツにおいて、実況及び解説を示す情報を適切なタイミングで出力することを目的とする。
As one aspect, the disclosed technology outputs information showing the actual condition and explanation at an appropriate timing in live distribution contents such as sports games without the need for a manuscript prepared in advance and the actual condition and commentator. With the goal.
一つの態様として、開示の技術は、音情報と動画像とを含むスポーツの映像と、前記動画像の各区間が示す事象に関する事象情報とを取得し、取得した前記事象情報に基づいて、前記区間毎に前記事象に関する実況解説文を生成する。また、開示の技術は、前記音情報及び前記動画像の少なくとも一方の区間毎の出力タイミングに基づいて、生成した前記実況解説文の出力タイミングを調整して、前記音情報及び前記動画像の少なくとも一方と共に出力する。
As one embodiment, the disclosed technique acquires a sports image including sound information and a moving image, and event information related to an event indicated by each section of the moving image, and based on the acquired event information, the disclosed technique is used. A commentary on the event is generated for each section. Further, the disclosed technique adjusts the output timing of the generated live commentary based on the output timing of at least one section of the sound information and the moving image, and adjusts the output timing of the sound information and the moving image at least. Output with one.
一つの側面として、事前に用意する原稿、並びに実況及び解説者を要することなく、スポーツの試合等のライブ配信コンテンツにおいて、実況及び解説を示す情報を適切なタイミングで出力することができる、という効果を有する。
One aspect is that it is possible to output information showing the actual condition and explanation at an appropriate timing in live distribution contents such as sports games without the need for a manuscript prepared in advance and the actual condition and commentator. Has.
以下、図面を参照して、開示の技術に係る実施形態の一例を説明する。なお、以下の実施形態では、一例として、野球の試合のライブ配信コンテンツに、開示の技術を適用した例について説明する。具体的には、球場で収音された音情報(以下、「球場音声」という)、球場で撮影された動画像、又は球場音声を含む映像に、生成した実況解説文を付加したコンテンツを配信する場合について説明する。
Hereinafter, an example of an embodiment relating to the disclosed technology will be described with reference to the drawings. In the following embodiment, as an example, an example in which the disclosed technology is applied to the live distribution content of a baseball game will be described. Specifically, the content is distributed by adding the generated live commentary to the sound information collected at the stadium (hereinafter referred to as "ballpark audio"), the moving image taken at the stadium, or the video including the stadium audio. The case of doing so will be described.
図1に示すように、本実施形態に係る情報出力システム100は、情報出力装置10と、映像配信システム32と、スタッツ入力システム34と、ユーザ端末36とを含む。
As shown in FIG. 1, the information output system 100 according to the present embodiment includes an information output device 10, a video distribution system 32, a stats input system 34, and a user terminal 36.
映像配信システム32は、球場で行われる野球の試合をカメラにより撮影し、撮影された映像を出力する。映像には、球場音声と、複数のフレームで構成された動画像とが含まれる。球場音声の各サンプリング点、及び動画像の各フレームには時刻情報が対応付けられており、この時刻情報に基づいて、球場音声と動画像との同期がとられる。なお、時刻情報は、映像が撮影された日時、試合開始からの経過時間等である。
The video distribution system 32 shoots a baseball game held at a stadium with a camera and outputs the shot video. The video includes a stadium sound and a moving image composed of a plurality of frames. Time information is associated with each sampling point of the stadium sound and each frame of the moving image, and the stadium sound and the moving image are synchronized based on this time information. The time information is the date and time when the video was shot, the elapsed time from the start of the game, and the like.
スタッツ入力システム34は、映像配信システム32から出力された映像を取得し、その映像を見ながら、担当者が試合に関するスタッツ情報を入力するためのシステムである。スタッツとしては、例えば、投球、打撃、走塁、守備等のワンプレーに相当するイベント毎に、そのイベントの内容が入力される。また、イベントの入力と共に、例えば担当者の手動により、イベント毎にタイムスタンプが付与される。
The stats input system 34 is a system for a person in charge to input stats information about a match while acquiring a video output from the video distribution system 32 and watching the video. As the stats, for example, the content of the event is input for each event corresponding to one play such as pitching, hitting, running, and defense. In addition, a time stamp is added to each event together with the input of the event, for example, manually by the person in charge.
ユーザ端末36は、情報出力システム100により提供されるサービスを利用するユーザが使用する端末である。ユーザ端末36は、情報出力装置10から配信されるコンテンツを受信する機能と、音声及び動画像の少なくとも一方を出力する機能とを備える。ユーザ端末36は、例えば、パーソナルコンピュータ、スマートフォン、タブレット端末、携帯電話、テレビ、ラジオ等である。
The user terminal 36 is a terminal used by a user who uses the service provided by the information output system 100. The user terminal 36 has a function of receiving content distributed from the information output device 10 and a function of outputting at least one of audio and moving images. The user terminal 36 is, for example, a personal computer, a smartphone, a tablet terminal, a mobile phone, a television, a radio, or the like.
情報出力装置10は、映像に関する実況解説文を生成し、映像配信システム32から取得した映像に付加した実況解説付きコンテンツを出力する。情報出力装置10は、機能的には、図2に示すように映像取得部11と、解析部12と、スタッツ取得部13と、同期部14と、生成部15と、合成部16とを含む。また、情報出力装置10の所定の記憶領域には、シーン情報DB(Database)21と、スタッツDB22と、イベントDB23と、雛形DB24とが記憶される。なお、映像取得部11、解析部12、スタッツ取得部13、及び同期部14は、開示の技術の取得部の一例である。また、合成部16は、開示の技術の出力部の一例である。
The information output device 10 generates a commentary on the video and outputs the content with the commentary added to the video acquired from the video distribution system 32. Functionally, as shown in FIG. 2, the information output device 10 includes a video acquisition unit 11, an analysis unit 12, a stats acquisition unit 13, a synchronization unit 14, a generation unit 15, and a synthesis unit 16. .. Further, a scene information DB (Database) 21, a stats DB 22, an event DB 23, and a template DB 24 are stored in a predetermined storage area of the information output device 10. The video acquisition unit 11, the analysis unit 12, the stats acquisition unit 13, and the synchronization unit 14 are examples of the disclosure technology acquisition unit. Further, the synthesis unit 16 is an example of an output unit of the disclosed technology.
映像取得部11は、映像配信システム32から出力された映像を取得し、映像を球場音声と動画像とに分割する。映像取得部11は、分割した動画像を解析部12へ受け渡すと共に、取得した映像を合成部16へ受け渡す。
The video acquisition unit 11 acquires the video output from the video distribution system 32, and divides the video into a stadium audio and a moving image. The video acquisition unit 11 passes the divided moving image to the analysis unit 12, and also delivers the acquired video to the synthesis unit 16.
解析部12は、映像取得部11から受け渡された動画像を画像解析することにより、動画像において、各イベントに対応する区間毎にシーン情報を取得する。具体的には、解析部12は、動画像の各フレーム間の画素値の差から、カメラのカットの切り替わり点を検出し、切り替わり点と切り替わり点との間を1つの区間として検出する。また、解析部12は、認識モデルを用いて、各区間の動画像が示すシーンを認識する。シーンは、例えば、守備体形を捉えたシーン、打席に立つ打者を捉えたシーン、ベンチの様子を捉えたシーン、走塁のシーン、牽制球のシーン、スライディングのシーン等とすることができる。認識モデルは、区間毎の動画像と、その動画像が示す正解のシーンの種類を示すラベルとの対応付けを予め機械学習したものである。
The analysis unit 12 acquires scene information for each section corresponding to each event in the moving image by image analysis of the moving image delivered from the video acquisition unit 11. Specifically, the analysis unit 12 detects the switching point of the cut of the camera from the difference in the pixel value between each frame of the moving image, and detects the switching point between the switching points as one section. Further, the analysis unit 12 recognizes the scene shown by the moving image of each section by using the recognition model. The scene can be, for example, a scene that captures a defensive body shape, a scene that captures a batter standing in a turn at bat, a scene that captures the state of a bench, a scene of a running base, a scene of a pick-off ball, a scene of sliding, and the like. The recognition model is machine-learned in advance about the correspondence between the moving image for each section and the label indicating the type of the correct scene shown by the moving image.
また、解析部12は、各区間に含まれるフレーム画像のテロップ部分から、ボールカウント、ストライクカウント、アウトカウント(以下、「BSO」とう)、スコア、イニング、ランナーの状況等の情報を取得する。これらの情報は、予め定めたフォーマットとの比較、文字認識処理等により取得することができる。なお、以下では、BSOについて、「ボールカウント(B)-ストライクカウント(S)-アウトカウント(O)(例えば、0-0-0)」と表記する。
Further, the analysis unit 12 acquires information such as a ball count, a strike count, an out count (hereinafter referred to as “BSO”), a score, an inning, and a runner's situation from the telop part of the frame image included in each section. This information can be obtained by comparison with a predetermined format, character recognition processing, or the like. In the following, BSO will be referred to as "ball count (B) -strike count (S) -out count (O) (for example, 0-0-0)".
解析部12は、区間毎に取得した情報と、その区間の開始フレームに対応付けられた時刻情報、及び終了フレームに対応付けられた時刻情報とを対応付けて、シーン情報DB21に記憶する。
The analysis unit 12 stores the information acquired for each section in the scene information DB 21 in association with the time information associated with the start frame of the section and the time information associated with the end frame.
図3に、シーン情報DB21の一例を示す。図3の例では、各行(各レコード)が1区間のシーン情報に相当する。各シーン情報には、時刻情報の順に「シーケンスNo.」が付与されている。また、図3の例において、「開始時刻」は、その区間の開始フレームに対応付けられた時刻情報であり、「終了時刻」は、その区間の終了フレームに対応付けられた時刻情報である。また、「シーン」は、認識モデルを用いて認識されたシーンの種類を示す情報である。また、「イニング」、「イベント前BSO」は、その区間のフレーム画像内のテロップから取得された情報である。なお、シーン情報に含まれる情報は、上記の例に限定されない。
FIG. 3 shows an example of the scene information DB 21. In the example of FIG. 3, each row (each record) corresponds to the scene information of one section. "Sequence No." is assigned to each scene information in the order of time information. Further, in the example of FIG. 3, the "start time" is the time information associated with the start frame of the section, and the "end time" is the time information associated with the end frame of the section. Further, the "scene" is information indicating the type of the scene recognized by using the recognition model. Further, "inning" and "pre-event BSO" are information acquired from the telop in the frame image of the section. The information included in the scene information is not limited to the above example.
スタッツ取得部13は、スタッツ入力システム34において入力されたイベント毎のスタッツ情報を取得し、スタッツDB22に記憶する。図4に、スタッツDB22の一例を示す。図4の例では、各行(各レコード)が1つのイベントについてのスタッツ情報に相当する。各スタッツ情報には、時刻情報の順に「シーケンスNo.」が付与されている。また、図4の例では、スタッツ情報には、各スタッツ情報が示すイベントの開始時刻及び終了時刻として担当者により入力された時刻である「開始時刻」及び「終了時刻」が含まれる。また、スタッツ情報には、そのイベント時の「イニング」、打者が所属するチーム名である「打者チーム」、打者の氏名である「打者」、投手が所属するチーム名である「投手チーム」、及び投手の氏名である「投手」の情報が含まれる。また、スタッツ情報には、そのイベント時の打者に対するその打席での投球数を示す「打席内投球数」、「イベント内容」、イベントが打撃の場合における「打球方向」、及び「イベント結果」の情報が含まれる。また、スタッツ情報には、そのイベント前のBSOである「イベント前BSO」、及びイベント結果を踏まえた「イベント後BSO」が含まれる。なお、スタッツ情報に含まれる情報は上記の例に限定されない。
The stats acquisition unit 13 acquires stats information for each event input in the stats input system 34 and stores it in the stats DB 22. FIG. 4 shows an example of the stats DB22. In the example of FIG. 4, each row (each record) corresponds to stats information for one event. "Sequence No." is assigned to each stats information in the order of time information. Further, in the example of FIG. 4, the stats information includes a "start time" and a "end time" which are times entered by the person in charge as the start time and end time of the event indicated by each stats information. In addition, the stats information includes "inning" at the time of the event, "batter team" which is the name of the team to which the batter belongs, "batter" which is the name of the batter, and "pitcher team" which is the name of the team to which the pitcher belongs. And the information of "pitcher" which is the name of the pitcher is included. In addition, the stats information includes "number of pitches in the turn at bat", "event content", "direction of hitting" when the event is a hit, and "event result" indicating the number of pitches in the turn at bat for the batter at the time of the event. Contains information. Further, the stats information includes "pre-event BSO" which is the BSO before the event and "post-event BSO" based on the event result. The information included in the stats information is not limited to the above example.
同期部14は、スタッツ情報の順番に基づいて、スタッツ情報の各々を各区間のシーン情報に対応付けることにより、シーン情報とスタッツ情報とを同期させたイベント情報を生成する。
The synchronization unit 14 generates event information in which the scene information and the stats information are synchronized by associating each of the stats information with the scene information of each section based on the order of the stats information.
ここで、シーン情報は、動画像を解析して取得される情報であるため、取得できるシーンに限りがある。ただし、シーン情報の時刻情報(開始時刻及び終了時刻)は、動画像の各フレームに対応付けられた時刻情報であるため、各区間から取得されたシーン情報の正確な時刻情報を表し、かつ、球場音声の時刻情報との同期も取れている。一方、スタッツ情報は、動画像の解析では取得できない詳細な情報も取得可能である。ただし、スタッツ情報の時刻情報は、担当者により入力されるものであるため、不正確な場合や、入力漏れの場合等があり得ると共に、粒度が粗く、球場音声の時刻情報との同期も保証されていない。
Here, since the scene information is the information acquired by analyzing the moving image, the scenes that can be acquired are limited. However, since the time information (start time and end time) of the scene information is the time information associated with each frame of the moving image, it represents the accurate time information of the scene information acquired from each section and also. It is also synchronized with the time information of the stadium voice. On the other hand, the stats information can also acquire detailed information that cannot be acquired by the analysis of moving images. However, since the time information of the stats information is input by the person in charge, there may be cases of inaccuracies, input omissions, etc., and the grain size is coarse, and synchronization with the time information of the stadium voice is guaranteed. It has not been.
そこで、スタッツ情報をシーン情報に対応付けることで、スタッツ情報の時刻情報がシーン情報の時刻情報に基づいて補正され、正確な情報となる。これにより、シーン情報よりも詳細で、かつスタッツ情報よりも時刻情報が正確なイベント情報を生成することができる。
Therefore, by associating the stats information with the scene information, the time information of the stats information is corrected based on the time information of the scene information, and the information becomes accurate. This makes it possible to generate event information that is more detailed than the scene information and more accurate in the time information than the stats information.
具体的には、同期部14は、シーン情報及びスタッツ情報の各々のシーケンスNo.で前後関係を保証しつつ、シーン情報とスタッツ情報とで共通する項目の情報が一致するシーン情報に、スタッツ情報を対応付けることにより、イベント情報を生成する。共通する項目は、例えば、「シーン」と「イベント内容」や、「イベント前BSO」等である。
Specifically, the synchronization unit 14 has a sequence number of each of the scene information and the stats information. Event information is generated by associating the stats information with the scene information in which the information of the items common to the scene information and the stats information match, while guaranteeing the context. Common items are, for example, "scene" and "event content", "pre-event BSO", and the like.
同期部14は、生成したイベント情報をイベントDB23に記憶する。図5に、イベントDB23の一例を示す。図5の例では、各行(各レコード)が1つのイベント情報に相当する。図5に示すように、イベント情報は、シーン情報の項目とスタッツ情報の項目とを統合した項目を有する。また、同期部14は、対応するスタッツ情報が存在しないシーン情報については、例えば、シーケンスNo.順で直前のスタッツ情報と同じ情報を対応付けてイベント情報を生成する。
The synchronization unit 14 stores the generated event information in the event DB 23. FIG. 5 shows an example of the event DB 23. In the example of FIG. 5, each row (each record) corresponds to one event information. As shown in FIG. 5, the event information has an item in which the item of the scene information and the item of the stats information are integrated. Further, the synchronization unit 14 may refer to, for example, sequence No. 14 for scene information in which the corresponding stats information does not exist. Event information is generated by associating the same information as the immediately preceding stats information in order.
なお、シーン情報及びイベント情報の各々は、開示の技術の事象情報の一例であり、スタッツ情報は、開示の技術の外部情報の一例である。
Note that each of the scene information and the event information is an example of the event information of the disclosed technology, and the stats information is an example of the external information of the disclosed technology.
生成部15は、イベントDB23に記憶されたイベント情報に基づいて、各イベント情報が示すイベントに関する実況又は解説となる文(以下、「実況解説文」という)を生成する。具体的には、生成部15は、雛形DB24に記憶された複数の実況解説文の雛形から、各イベント情報に応じた雛形を選択し、選択した雛形と、イベント情報とを組み合わせて実況解説文を生成する。
Based on the event information stored in the event DB 23, the generation unit 15 generates a sentence (hereinafter, referred to as “actual commentary”) that is a commentary or a commentary regarding the event indicated by each event information. Specifically, the generation unit 15 selects a template corresponding to each event information from a plurality of live commentary templates stored in the template DB 24, and combines the selected template with the event information to provide a live commentary. To generate.
図6に、雛形DB24の一例を示す。図6の例では、各行(各レコード)が1つの雛形に関する情報であり、複数の雛形を並べることにより、1つのイベント情報に対応する雛形グループを形成している。同一の雛形グループに含まれる雛形には、識別情報として同一の「雛形グループID」が付与され、出力する順番に「雛形グループ内シーケンスNo.」が付与されている。また、雛形DB24には、各雛形が、話者として実況者を想定したものか、解説者を想定したものかを示す「話者種類」の情報が含まれる。
FIG. 6 shows an example of the template DB 24. In the example of FIG. 6, each row (each record) is information about one template, and by arranging a plurality of templates, a template group corresponding to one event information is formed. The same "template group ID" is assigned as identification information to the templates included in the same template group, and "sequence No. in the template group" is assigned in the order of output. Further, the template DB 24 includes information on a "speaker type" indicating whether each template is intended to be a commentator or a commentator as a speaker.
「雛形」は、実況又は解説となる文の一部にパラメータを挿入する形式である。図6の例では、< >の部分がパラメータを挿入する部分であり、各雛形内での出現順に、< >内に1、2、・・・の番号が付与されている。< >に挿入するパラメータは、「パラメータ種類」として、イベント情報の項目名が規定されている。なお、図6において、パラメータ種類として規定されている「イベント前(又は後)B(又はS)」は、BSOの該当のカウントのみを表す。また、雛形DB24には、各雛形に基づいて生成された実況解説文を音声データにした場合の再生時間(以下、「音声時間」という)が含まれる。
The "template" is a format in which parameters are inserted in a part of a sentence that is a commentary or a commentary. In the example of FIG. 6, the part of <> is the part where the parameter is inserted, and the numbers 1, 2, ... Are assigned in <> in the order of appearance in each template. For the parameter to be inserted in <>, the item name of the event information is specified as the "parameter type". In FIG. 6, "before (or after) B (or S)" defined as the parameter type represents only the corresponding count of BSO. Further, the template DB 24 includes a reproduction time (hereinafter, referred to as “voice time”) when the live commentary generated based on each template is used as voice data.
なお、実況解説文付きコンテンツを、様々な言語に対応可能なコンテンツとするために、雛形DB24に、複数の言語の各々に対応した「雛形」、「音声時間」、及び「パラメータ種類」の項目を記憶しておいてもよい。
In addition, in order to make the content with live commentary text compatible with various languages, the template DB 24 has items of "template", "voice time", and "parameter type" corresponding to each of a plurality of languages. You may remember.
The generation of live commentary sentences is described more concretely below. The generation unit 15 selects the template group corresponding to a piece of event information by using a selection model. The selection model is a model trained in advance, by machine learning, on associations between event information and the template group best suited to the event that the event information indicates; given target event information, it outputs a goodness of fit for each template group.
For example, as shown in FIG. 7, the generation unit 15 computes, for a predetermined number of template groups in descending order of the goodness of fit output by the selection model, a "voice time" that is the sum of the voice times of the templates in each group. The generation unit 15 then selects, among the template groups whose voice time is shorter than the time from the "start time" to the "end time" of the event information (hereinafter referred to as the "event time"), the template group with the highest goodness of fit. For example, if the goodness of fit with each template group shown in FIG. 7 is obtained for event information whose event time is 20 seconds, the generation unit 15 selects the template group whose template group ID is 3.
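This selection step can be pictured with the following minimal sketch. The function name and data shapes are assumptions; only the logic (restrict to the top-scoring groups, discard those whose total voice time exceeds the event time, take the best remaining fit) follows the description above.

```python
# Hypothetical sketch of the template-group selection step: among the
# top-scoring groups, pick the best-fitting one whose total voice time
# still fits within the event time.

def select_template_group(fit_scores, voice_times, event_time, top_k=5):
    """fit_scores / voice_times: dicts keyed by template group ID."""
    candidates = sorted(fit_scores, key=fit_scores.get, reverse=True)[:top_k]
    feasible = [g for g in candidates if voice_times[g] < event_time]
    if not feasible:
        return None  # e.g. fall back to subtitles or a shorter group
    return max(feasible, key=fit_scores.get)

# Example mirroring FIG. 7: with an event time of 20 s, a higher-scoring
# group whose audio runs 25 s is skipped in favor of group ID 3.
print(select_template_group(
    {1: 0.9, 3: 0.8, 7: 0.6}, {1: 25.0, 3: 18.0, 7: 10.0}, 20.0))  # -> 3
```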
The generation unit 15 inserts the values of the event information items indicated by the "parameter type" into the < > portions of each template in the selected template group, thereby generating live commentary sentences. For example, suppose that for the event information of sequence No. 5 in the event DB 23 shown in FIG. 5, the template group whose template group ID is 1 in FIG. 6 is selected, and take the template with sequence No. 6 within the template group as an example. The generation unit 15 inserts the event result "foul" into <1> of the template, the post-event ball count "1" into <2>, and the post-event strike count "1" into <3>. The generation unit 15 thus generates the live commentary sentence "Foul. That makes it one ball, one strike." The generation unit 15 passes the generated live commentary sentences to the synthesis unit 16.
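The parameter insertion can be illustrated as follows. The placeholder syntax mirrors the < > notation of FIG. 6, while the field names and helper function are hypothetical.

```python
import re

# Hypothetical sketch of parameter insertion: each <n> slot in a template
# is replaced by the event-information item named by the template's
# parameter-type list.

def fill_template(template, param_types, event):
    def repl(match):
        slot = int(match.group(1)) - 1        # <1> -> index 0
        return str(event[param_types[slot]])  # look up the event item
    return re.sub(r"<(\d+)>", repl, template)

event = {"result": "Foul", "balls_after": 1, "strikes_after": 1}
print(fill_template(
    "<1>. That makes it <2> ball, <3> strike.",
    ["result", "balls_after", "strikes_after"],
    event,
))  # -> "Foul. That makes it 1 ball, 1 strike."
```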
The synthesis unit 16 adjusts the output timing of the live commentary sentences passed from the generation unit 15 based on the per-section output timing, for each event, of at least one of the stadium audio and the moving image of the video passed from the video acquisition unit 11. The synthesis unit 16 then generates and outputs content with live commentary so that the timing-adjusted live commentary sentences are output together with at least one of the stadium audio and the moving image.
Specifically, as shown in FIG. 8, the synthesis unit 16 generates, as content with live commentary, content in which the stadium audio and voice data representing the live commentary sentences are synthesized (see A in FIG. 8). This content is suitable for radio broadcasting and the like. The synthesis unit 16 also generates, as content with live commentary, content in which the original video (with stadium audio) or the moving image (without stadium audio) is synthesized with voice data representing the live commentary sentences. This content is suitable for television broadcasting, Internet video distribution, and the like. The synthesis unit 16 further generates, as content with live commentary, content in which the original video (with stadium audio) or the moving image (without stadium audio) is synthesized with image data (subtitles) that visualizes the text of the live commentary sentences (see B in FIG. 8). This content is likewise suitable for television broadcasting, Internet video distribution, and the like.
When synthesizing at least one of the stadium audio and the moving image with the live commentary sentences, the synthesis unit 16 synchronizes the time information of the moving image, or the time information of the stadium audio that is synchronized with it, with the time information of the event information corresponding to each live commentary sentence. Since the time information of the event information matches the time information of the moving image, the two can easily be synchronized.
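A minimal sketch of this timing alignment, assuming hypothetical field names, might schedule each commentary clip at the start time of its event on the shared timeline:

```python
# Hypothetical sketch of timing alignment: each commentary clip is
# scheduled at the start time of the event it describes, on the shared
# video/stadium-audio timeline.

def schedule_commentary(events, commentary_clips):
    """events and commentary_clips are parallel lists; times are in
    seconds on the moving image's timeline."""
    timeline = []
    for event, clip in zip(events, commentary_clips):
        timeline.append({
            "at": event["start_time"],   # event time == video time
            "clip": clip,
            "until": event["end_time"],  # clip should fit in the event
        })
    return timeline
```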
The information output device 10 can be realized by, for example, the computer 40 shown in FIG. 9. The computer 40 includes a CPU (Central Processing Unit) 41, a memory 42 as a temporary storage area, and a non-volatile storage unit 43. The computer 40 also includes an input/output device 44 such as an input unit and a display unit, and an R/W (Read/Write) unit 45 that controls reading and writing of data to and from a storage medium 49. The computer 40 further includes a communication I/F (Interface) 46 connected to a network such as the Internet. The CPU 41, the memory 42, the storage unit 43, the input/output device 44, the R/W unit 45, and the communication I/F 46 are connected to one another via a bus 47.
The storage unit 43 can be realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. The storage unit 43, as a storage medium, stores an information output program 50 for causing the computer 40 to function as the information output device 10. The information output program 50 includes a video acquisition process 51, an analysis process 52, a stats acquisition process 53, a synchronization process 54, a generation process 55, and a synthesis process 56. The storage unit 43 also has an information storage area 60 in which the information constituting each of the scene information DB 21, the stats DB 22, the event DB 23, and the template DB 24 is stored.
The CPU 41 reads the information output program 50 from the storage unit 43, loads it into the memory 42, and sequentially executes the processes included in the information output program 50. By executing the video acquisition process 51, the CPU 41 operates as the video acquisition unit 11 shown in FIG. 2. Likewise, by executing the analysis process 52, it operates as the analysis unit 12 shown in FIG. 2; by executing the stats acquisition process 53, as the stats acquisition unit 13; by executing the synchronization process 54, as the synchronization unit 14; by executing the generation process 55, as the generation unit 15; and by executing the synthesis process 56, as the synthesis unit 16. The CPU 41 also reads information from the information storage area 60 and loads each of the scene information DB 21, the stats DB 22, the event DB 23, and the template DB 24 into the memory 42. The computer 40 that executes the information output program 50 thereby functions as the information output device 10. Note that the CPU 41 that executes the program is hardware.
The functions realized by the information output program 50 can also be realized by, for example, a semiconductor integrated circuit, more specifically an ASIC (Application Specific Integrated Circuit) or the like.
Next, the operation of the information output system 100 according to the present embodiment will be described. The video distribution system 32 shoots a baseball game held at a stadium with a camera and starts outputting the captured video. The stats input system 34 then acquires the video output from the video distribution system 32, and a person in charge inputs stats information about the game while watching the video. In the information output device 10, the information output process shown in FIG. 10 is executed. The information output process is an example of the information output method of the disclosed technology.
In step S12, the video acquisition unit 11 acquires a predetermined length of video output from the video distribution system 32, and splits the video into stadium audio and a moving image. The video acquisition unit 11 passes the moving image to the analysis unit 12 and passes the acquired video to the synthesis unit 16.
Next, in step S14, the analysis unit 12 performs image analysis on the moving image passed from the video acquisition unit 11, thereby acquiring scene information for each section of the moving image corresponding to each event. The analysis unit 12 then stores, in the scene information DB 21, the information acquired for each section in association with the time information of the section's start frame and the time information of its end frame.
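Assuming a known frame rate (the 30 fps default below is an assumption, not from the source), the time information attached to a section could be derived from its start and end frames as follows:

```python
# Hypothetical sketch of attaching time information to a detected section:
# frame indices from the image analysis are converted to timestamps on the
# moving image's timeline.

def section_time_info(start_frame, end_frame, fps=30.0):
    return {"start_time": start_frame / fps, "end_time": end_frame / fps}

print(section_time_info(900, 1500))
# -> {'start_time': 30.0, 'end_time': 50.0}
```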
Next, in step S16, the stats acquisition unit 13 acquires the stats information for each event input via the stats input system 34 and stores it in the stats DB 22.
Next, in step S18, the synchronization unit 14 associates each piece of stats information with the scene information whose common items match it, while preserving the order given by the sequence Nos. of the scene information and the stats information. The synchronization unit 14 thereby generates event information and stores it in the event DB 23.
Next, in step S20, the generation unit 15 selects a template corresponding to each piece of event information from the plurality of live commentary templates stored in the template DB 24, and generates live commentary sentences by combining the selected templates with the event information. The generation unit 15 passes the generated live commentary sentences to the synthesis unit 16.
Next, in step S22, the synthesis unit 16 synthesizes at least one of the stadium audio and the moving image with voice data or image data (subtitles) representing the live commentary sentences. In doing so, the synthesis unit 16 synchronizes the time information of the moving image, or the time information of the stadium audio synchronized with it, with the time information of the event information corresponding to each live commentary sentence. The synthesis unit 16 thereby generates and outputs content with live commentary in which the output timing of the live commentary sentences is synchronized with at least one of the stadium audio and the moving image. The information output process then ends.
The content with live commentary output from the synthesis unit 16 is distributed to the user terminal 36, where it is viewed by the user of the user terminal 36.
As described above, according to the information output system of the present embodiment, the information output device acquires a video of a baseball game including stadium audio and a moving image, and acquires scene information for each section of the moving image corresponding to each event. The information output device also acquires stats information input externally based on the video. By associating the stats information with the scene information, the information output device generates event information that carries both accurate time information relative to the video and detailed information about the event. The information output device then generates live commentary sentences about each event based on each piece of event information and a template. Finally, the information output device adjusts the output timing of the generated live commentary sentences based on the per-section output timing of at least one of the stadium audio and the moving image, and outputs them together with at least one of the stadium audio and the moving image. This makes it possible to output play-by-play and commentary information at appropriate timing for live-distributed content such as sports games, without requiring a prepared script or human announcers and commentators. As a result, no announcer or commentator is needed, so labor, distribution equipment, and other costs can be reduced. It also eliminates variation in commentary quality caused by differences in the skill of announcers and commentators.
In the above embodiment, when selecting a template according to the event information, the template may also be selected according to attribute information of the user who views the distributed content with live commentary. Specifically, three kinds of templates are prepared: templates that generate commentary biased toward one team, templates that generate commentary biased toward the other team, and neutral templates. The user's favorite team is acquired as the user's attribute information. This information may be obtained from pre-registered data, from input the user provides before or during distribution, or by estimation from the user's past viewing history. When a template is selected, a template that generates commentary biased toward the user's favorite team is chosen. This allows the live commentary to be flexibly adapted to the user's preferences and diversifies the distributed content. User attributes are not limited to a favorite team; they may include gender, age group, familiarity with the rules, and the like.
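A sketch of this attribute-based narrowing, with illustrative bias labels and attribute fields, might filter the template pool before the fit-based selection described earlier is applied:

```python
# Hypothetical sketch of attribute-aware template selection: the template
# pool is narrowed to the bias matching the viewer's favorite team, then
# the earlier fit-based selection runs over the filtered pool.

def filter_templates_by_user(template_groups, user):
    favorite = user.get("favorite_team")  # may be None
    if favorite is None:
        bias = "neutral"
    elif favorite == "home":
        bias = "home_biased"
    else:
        bias = "away_biased"
    return [g for g in template_groups if g["bias"] == bias]
```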
The above embodiment mainly describes the generation of play-by-play sentences, but the same applies to color commentary sentences. Specifically, commentary sentences suited to the event information can be generated by preparing templates for commentary that uses event information, together with a selection model that associates event information with the templates for appropriate commentary on that information. When two speakers, an announcer and a commentator, are assumed and the live commentary sentences are converted into voice data, a different voice may be used for each speaker.
The above embodiment describes generating event information by associating stats information with scene information, but scene information alone may serve as the event information. The accuracy of the image analysis may also be raised so that information as detailed as the stats information is acquired as scene information. Furthermore, external information other than stats information may be acquired and included in the event information. For example, information such as past head-to-head results, a pitcher's repertoire, and a batter's batting average by pitch location can be prepared, and the information relevant to the team, pitcher, and batter at each event can be included in the event information. This makes it possible to generate live commentary sentences such as "So far, batter BB is hitting .300 against pitcher AA" or "This batter is strong against pitches high and outside."
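One way such external records might be merged into event information, with all field names illustrative, is sketched below:

```python
# Hypothetical sketch of enriching event information with prepared
# external records: historical data keyed by the matchup is merged into
# the event before template filling.

def enrich_event(event, head_to_head):
    key = (event["pitcher"], event["batter"])
    record = head_to_head.get(key, {})
    return {**event, **record}

event = {"pitcher": "AA", "batter": "BB", "result": "single"}
stats = {("AA", "BB"): {"avg_vs_pitcher": ".300"}}
enriched = enrich_event(event, stats)
# enriched["avg_vs_pitcher"] can now fill a template such as
# "So far, batter <1> is hitting <2> against pitcher <3>."
```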
Furthermore, information gathered by interviewing teams and players, information collected from the Internet, and the like may be prepared as external information, and the information relevant to the team, pitcher, and batter at each event may be included in the event information. Then, for example, for event information corresponding to a scene showing the bench or the stands, a template that uses such information can be selected. This information may also be used to generate live commentary sentences for lulls between events. This makes it possible to generate commentary such as "Player AA reportedly shows no ill effects from being hit by a pitch yesterday."
The above embodiment describes selecting the template corresponding to the event information by using a selection model, but the selection is not limited to this and may instead be rule-based. For example, templates may be selected by predetermined rules, such as template A when the event result includes an out, and template B when the pre-event BSO count is 3-2-2 and there are runners on base.
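A minimal sketch of such rule-based selection, using the example rules above (the fallback template is an added assumption, not from the source):

```python
# Hypothetical sketch of rule-based template selection; rule conditions
# and template names follow the example in the text.

def select_by_rule(event):
    if "out" in event.get("result", ""):
        return "template_A"
    if event.get("bso_before") == (3, 2, 2) and event.get("runners"):
        return "template_B"
    return "template_default"  # assumed fallback

print(select_by_rule({"result": "fly out", "bso_before": (1, 2, 1)}))
# -> "template_A"
```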
The above embodiment describes selecting a template group whose voice time is shorter than the event time, but the selection is not limited to this. When generating a moving image or video with subtitles showing the live commentary as the content with live commentary, the voice time need not be considered. Even when outputting voice data representing the live commentary, the template group with the highest goodness of fit may be selected without considering the voice time. In that case, if the voice time of the selected template group is longer than the event time, the voice data representing the live commentary may be processed to play back at a higher speed, or some of the templates in the template group may be dropped.
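The speed-up mentioned here amounts to computing a playback-rate factor; the cap below is an assumption added so that sped-up speech stays intelligible:

```python
# Hypothetical sketch of fitting over-long commentary audio into the
# event window: compute a speed-up factor, capped at an assumed maximum
# rate (not from the source) to keep speech intelligible.

def playback_rate(voice_time, event_time, max_rate=1.5):
    if voice_time <= event_time:
        return 1.0
    return min(voice_time / event_time, max_rate)

print(playback_rate(24.0, 20.0))  # -> 1.2: play 24 s of audio in 20 s
```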
The above embodiment describes a baseball game as an example of a sports video, but the disclosed technology is not limited to this application and can also be applied to, for example, soccer or basketball. In the case of soccer, for example, analyzing the moving image yields, together with the time information for each frame, scene information such as player running speeds, the current score (from on-screen captions), pass distances, the positional bias of all players, and the direction of attack. As stats information, player names, positioning, play content such as slides and passes, and play outcomes are acquired. In the case of basketball, for example, player names, player running speeds, positioning, shot outcomes, points scored on a made shot, and play content such as steals, no-look passes, and rebounds are acquired as scene information, while team names, player names, scores, and the like are acquired as stats information. In either case, as in the above embodiment, event information is generated by associating the stats information with the scene information, a template corresponding to the event information is selected, and live commentary sentences are generated.
The above embodiment describes a mode in which the information output program is stored (installed) in the storage unit in advance, but the program is not limited to this. The program according to the disclosed technology can also be provided in a form stored in a storage medium such as a CD-ROM, a DVD-ROM, or a USB memory.
10 Information output device
11 Video acquisition unit
12 Analysis unit
13 Stats acquisition unit
14 Synchronization unit
15 Generation unit
16 Synthesis unit
21 Scene information DB
22 Stats DB
23 Event DB
24 Template DB
32 Video distribution system
34 Stats input system
36 User terminal
40 Computer
41 CPU
42 Memory
43 Storage unit
49 Storage medium
50 Information output program
100 Information output system
Claims (20)
- An information output program for causing a computer to execute a process comprising: acquiring a sports video including sound information and a moving image, and event information relating to an event indicated by each section of the moving image; generating, for each section, a live commentary sentence regarding the event based on the acquired event information; and adjusting an output timing of the generated live commentary sentence based on an output timing of each section of at least one of the sound information and the moving image, and outputting the live commentary sentence together with at least one of the sound information and the moving image.
- The information output program according to claim 1, wherein the sound information and voice data representing the live commentary sentence are synthesized and output.
- The information output program according to claim 1, wherein the moving image or the video is synthesized and output with voice data representing the live commentary sentence, or with image data visualizing text of the live commentary sentence.
- The information output program according to any one of claims 1 to 3, wherein the event information is acquired by image analysis of the moving image for each section.
- The information output program according to any one of claims 1 to 4, wherein the event information is acquired from external information input from outside.
- The information output program according to claim 5, wherein information in which each piece of the external information is associated with each of the sections based on the order of the external information is acquired as the event information.
- The information output program according to any one of claims 1 to 6, wherein a template corresponding to the acquired event information is selected from a plurality of predetermined live commentary templates, and the live commentary sentence is generated by combining the selected template with the acquired event information.
- The information output program according to claim 7, wherein attribute information of a viewer of the output sound information or moving image is acquired, and the template is selected based on the attribute information of the viewer.
- The information output program according to claim 7 or 8, wherein the template is selected according to the length of the generated live commentary sentence.
- An information output device comprising: an acquisition unit that acquires a sports video including sound information and a moving image, and event information relating to an event indicated by each section of the moving image; a generation unit that generates, for each section, a live commentary sentence regarding the event based on the event information acquired by the acquisition unit; and an output unit that adjusts an output timing of the live commentary sentence generated by the generation unit based on an output timing of each section of at least one of the sound information and the moving image, and outputs the live commentary sentence together with at least one of the sound information and the moving image.
- The information output device according to claim 10, wherein the output unit synthesizes and outputs the sound information and voice data representing the live commentary sentence.
- The information output device according to claim 10, wherein the output unit synthesizes and outputs the moving image or the video with voice data representing the live commentary sentence, or with image data visualizing text of the live commentary sentence.
- The information output device according to any one of claims 10 to 12, wherein the acquisition unit acquires the event information by image analysis of the moving image for each section.
- The information output device according to any one of claims 10 to 13, wherein the acquisition unit acquires the event information from external information input from outside.
- The information output device according to claim 14, wherein the acquisition unit acquires, as the event information, information in which each piece of the external information is associated with each of the sections based on the order of the external information.
- The information output device according to any one of claims 10 to 15, wherein the generation unit selects a template corresponding to the acquired event information from a plurality of predetermined live commentary templates, and generates the live commentary sentence by combining the selected template with the acquired event information.
- The information output device according to claim 16, wherein the acquisition unit acquires attribute information of a viewer of the output sound information or moving image, and the generation unit selects the template based on the attribute information of the viewer.
- The information output device according to claim 16 or 17, wherein the generation unit selects the template according to the length of the generated live commentary sentence.
- An information output method in which a computer executes a process comprising: acquiring a sports video including sound information and a moving image, and event information relating to an event indicated by each section of the moving image; generating, for each section, a live commentary sentence regarding the event based on the acquired event information; and adjusting an output timing of the generated live commentary sentence based on an output timing of each section of at least one of the sound information and the moving image, and outputting the live commentary sentence together with at least one of the sound information and the moving image.
- A storage medium storing an information output program for causing a computer to execute a process comprising: acquiring a sports video including sound information and a moving image, and event information relating to an event indicated by each section of the moving image; generating, for each section, a live commentary sentence regarding the event based on the acquired event information; and adjusting an output timing of the generated live commentary sentence based on an output timing of each section of at least one of the sound information and the moving image, and outputting the live commentary sentence together with at least one of the sound information and the moving image.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/020734 WO2021240644A1 (en) | 2020-05-26 | 2020-05-26 | Information output program, device, and method |
JP2022527479A JPWO2021240837A1 (en) | 2020-05-26 | 2020-10-20 | |
PCT/JP2020/039429 WO2021240837A1 (en) | 2020-05-26 | 2020-10-20 | Information output program, device, and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/020734 WO2021240644A1 (en) | 2020-05-26 | 2020-05-26 | Information output program, device, and method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021240644A1 true WO2021240644A1 (en) | 2021-12-02 |
Family
ID=78723236
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/020734 WO2021240644A1 (en) | 2020-05-26 | 2020-05-26 | Information output program, device, and method |
PCT/JP2020/039429 WO2021240837A1 (en) | 2020-05-26 | 2020-10-20 | Information output program, device, and method |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/039429 WO2021240837A1 (en) | 2020-05-26 | 2020-10-20 | Information output program, device, and method |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2021240837A1 (en) |
WO (2) | WO2021240644A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2024011105A (en) * | 2022-07-14 | 2024-01-25 | 株式会社電通 | Live audio real time generation system |
JP7548984B2 (en) | 2022-12-13 | 2024-09-10 | 楽天グループ株式会社 | Program, server and presentation method |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005025413A (en) * | 2003-06-30 | 2005-01-27 | Nec Corp | Content processing device, content processing method, and program |
JP2005236541A (en) * | 2004-02-18 | 2005-09-02 | Nippon Telegr & Teleph Corp <Ntt> | Method, apparatus, and program for supporting associating with baseball video images |
JP2007184740A (en) * | 2006-01-06 | 2007-07-19 | Nippon Hoso Kyokai <Nhk> | Content transmitter and content output device |
JP2012039280A (en) * | 2010-08-05 | 2012-02-23 | Nippon Hoso Kyokai <Nhk> | Explanatory broadcasting text creation support device and explanatory broadcasting text creation support program |
JP2012129980A (en) * | 2010-11-24 | 2012-07-05 | Jvc Kenwood Corp | Chapter creation device, chapter creation method, and chapter creation program |
JP2017151864A (en) * | 2016-02-26 | 2017-08-31 | 国立大学法人東京工業大学 | Data creation device |
JP2017203827A (en) * | 2016-05-10 | 2017-11-16 | 日本放送協会 | Explanation voice reproduction device and program thereof |
WO2018216729A1 (en) * | 2017-05-24 | 2018-11-29 | 日本放送協会 | Audio guidance generation device, audio guidance generation method, and broadcasting system |
JP6472912B1 (en) * | 2018-02-20 | 2019-02-20 | ヤフー株式会社 | Display processing program, display processing apparatus, and display processing method |
-
2020
- 2020-05-26 WO PCT/JP2020/020734 patent/WO2021240644A1/en active Application Filing
- 2020-10-20 WO PCT/JP2020/039429 patent/WO2021240837A1/en active Application Filing
- 2020-10-20 JP JP2022527479A patent/JPWO2021240837A1/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2021240837A1 (en) | 2021-12-02 |
JPWO2021240837A1 (en) | 2021-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107615766B (en) | System and method for creating and distributing multimedia content | |
US10293263B2 (en) | Custom content feed based on fantasy sports data | |
JP5010292B2 (en) | Video attribute information output device, video summarization device, program, and video attribute information output method | |
US8121462B2 (en) | Video edition device and method | |
US7988560B1 (en) | Providing highlights of players from a fantasy sports team | |
US20120087640A1 (en) | Information processing apparatus, information processing method, information processing program, and information processing system | |
JP4621758B2 (en) | Content information reproducing apparatus, content information reproducing system, and information processing apparatus | |
TW201416888A (en) | Scene clip playback system, method and recording medium thereof | |
WO2021240644A1 (en) | Information output program, device, and method | |
JP3923932B2 (en) | Video summarization apparatus, video summarization method and program | |
JP5407708B2 (en) | Captured video processing apparatus, control method, and program | |
JPWO2004012100A1 (en) | Content summarization apparatus and content summarization program | |
CN112233647A (en) | Information processing apparatus and method, and computer-readable storage medium | |
WO2022249522A1 (en) | Information processing device, information processing method, and information processing system | |
WO2022163023A1 (en) | Content correction device, content delivery server, content correction method, and recording medium | |
US12010371B2 (en) | Information processing apparatus, video distribution system, information processing method, and recording medium | |
JP2022067478A (en) | Information processing program, device, and method | |
EP3905701A1 (en) | Moving image tagging device and moving image tagging method | |
WO2021240678A1 (en) | Video image processing device, video image processing method, and recording medium | |
JP4323937B2 (en) | Video comment generating apparatus and program thereof | |
JP2016004566A (en) | Presentation information control device, method and program | |
WO2022230291A1 (en) | Information processing device, information processing method, and program | |
WO2022074788A1 (en) | Information processing device, information processing method and program | |
JP2021087180A (en) | Moving image editing device, moving image editing method, and computer program | |
JP2014053943A (en) | Captured video processor, control method of the same and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20937573; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20937573; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: JP |