
CA2129925A1 - Audio synchronization of subtitles - Google Patents

Audio synchronization of subtitles

Info

Publication number
CA2129925A1
Authority
CA
Canada
Prior art keywords
display
subtitle
cue
audio signal
stamp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002129925A
Other languages
French (fr)
Inventor
Hendrik Adolf Eldert Zwaneveld
Brian Craig Dickson
Roy Cameron Snell
Morris Jaslowitz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Film Board of Canada
Original Assignee
National Film Board of Canada
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Film Board of Canada filed Critical National Film Board of Canada
Priority to CA002129925A priority Critical patent/CA2129925A1/en
Publication of CA2129925A1 publication Critical patent/CA2129925A1/en
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B31/00Associated working of cameras or projectors with sound-recording or sound-reproducing means
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Studio Circuits (AREA)

Abstract

A method and system (or apparatus) are provided for creating and presenting continuous, dialogue-matched and uninterrupted display of subtitles while displaying or presenting motion picture images, video images, and the like. The text of a sequence of subtitles to be displayed, and the in- and out-points corresponding to the time during which dialogue or narration occurs on the screen are determined by matching or comparing sound modulation samples, previously captured from the sound track, with sound modulation samples taken during the presentation.

Description

Title: AUDIO SYNCHRONIZATION OF SUBTITLES

The present invention relates to the display of subtitles (e.g. voice content) by electronic means that exploit digital audio signal signature sampling and digital signature pattern matching to synchronize or correlate the display of a subtitle with a particular point in time during the presentation or playback of a medium carrying an audio signal. In accordance with the present invention, subtitles (i.e. text) may be displayed with respect to motion pictures, videos, slide presentations and the like, or even with respect to audio sound tracks not associated with images.

Conventional film subtitling has previously required dialogue or narration text to be physically incorporated into the image areas of a film print. The incorporation of subtitles on film may be accomplished for example by using known chemical or laser etching techniques or by known photo-optical means; thus for example, the text of a subtitle may be applied onto a latent film element which may be subsequently used as an overlay during the printing process.

U.S. patent no. 4,673,266 is an example of a subtitle presentation system which marks a film print with coded signals; each recorded subtitle is recalled by means of a corresponding coded signal on the print and displayed.

On the other hand, the present invention allows an unaltered presentation copy to be projected along with subtitles or captions which are generated and synchronized with the images by electronic means, i.e. under control of a micro-computer. The subtitles may for example appear in the image frame itself or alternatively on a display means located below, above or beside the image frame.

The present invention offers an effective way to avoid the high costs associated with conventional subtitling processes. The present invention also advantageously allows a single presentation copy to be projected with subtitles in any number of languages or with same-language captions without having to permanently alter the presentation copy.

U.S. patent nos. 4,839,744 and 5,055,939 as well as related Canadian patent no. 1,307,745 teach a system for providing a lock synchronization between a lower quality original (analogue) sound track associated with a motion picture film or the like and a higher quality digital sound track or source; the higher quality digital sound, external to the print, is heard by the audience instead of the original sound track(s) on the film print. These patents do not deal with the problem of displaying subtitles:

they deal with the replacement of one sound track by another sound track and thus teach away from the present invention which is concerned with providing text as a visual supplement to an actual sound track.

SUMMARY OF THE INVENTION

Thus, in a broad aspect the present invention provides a method for providing subtitle co-ordination display information from a sound track providing an audio signal carrying an audio message, the display information being configured for displaying one or more subtitles during play of the audio message from the sound track, comprising - providing a first storage element containing subtitle cue information comprising one or more predetermined subtitles, each subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - sensing said audio signal and providing a digital audio signal output to a processor element, said processor element being able to associate a digital audio signal signature (i.e. fingerprint, pattern or the like) with a respective predetermined subtitle of said first storage element, and

- storing subtitle display information in a second storage element, said subtitle display information comprising one or more of the said subtitles produced by said processor element, each of said so stored subtitles being associated with a respective said audio signal signature and a respective said display cue stamp.

In a further aspect the present invention provides a method for displaying one or more subtitles during play of an audio message from a sound track providing an audio signal carrying said audio message, comprising - providing a storage element containing subtitle display information comprising one or more predetermined subtitles and one or more digital audio signal signatures derived from said audio signal, each said subtitle being associated with a respective digital audio signal signature, each said subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - sensing the audio signal and producing an output comprising a said digital audio signal signature, - selecting from said storage element, during play of said audio message, a specific subtitle by correlating the audio signal signatures of said output from the sensing element with the audio signal signatures of the storage element, and - displaying the specific subtitle, the specific subtitle being displayed in accordance with the display cue stamp associated therewith.

It is to be understood that the method can continuously sample the audio signal and implement the display of one subtitle after the other in succession in accordance with the cue data associated with the subtitles.
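
The subtitle display information described above thus amounts to one small record per subtitle: its text, a display cue stamp (a cue-in and a cue-out value) and an associated digital audio signal signature. A minimal sketch of such a record in Python follows; the field names are chosen here for illustration only and are not taken from the disclosure.

from dataclasses import dataclass
from typing import List

@dataclass
class SubtitleRecord:
    """One entry of the subtitle display information (hypothetical layout)."""
    text: str             # up to two lines of subtitle text
    cue_in: float         # first cue value: start of display (time or frame count)
    cue_out: float        # second cue value: end of display
    signature: List[int]  # digital audio signal signature associated with this subtitle

# The second storage element is then simply a sequence of such records,
# ordered by cue-in value so subtitles can be displayed one after the other.
display_info: List[SubtitleRecord] = []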

The present invention in another broad aspect provides an audio cue system for providing subtitle co-ordination display information from a sound track providing an audio signal carrying an audio message, said display information being configured for displaying one or more subtitles during play of the audio message from the sound track, comprising - a processor element, - a first storage element containing subtitle cue information comprising one or more predetermined subtitles, each subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - a sensing element for sensing said audio signal and for providing an input to said processor element comprising a digital audio signal,
said processor element being able to associate a digital audio signal signature with a respective predetermined subtitle of said first storage element, and - a second storage element for storing subtitle display information comprising one or more of the said subtitles produced by said processor element, each of said so stored subtitles being associated with a respective said audio signal signature and a respective said display cue stamp.
In accordance with a particular aspect the present invention provides an audio cue system for providing subtitle co-ordination display information for displaying subtitles during projection of viewable images, the images having a sound track associated therewith for providing an audio signal carrying an audio message (including a first spoken language) synchronised with the viewable images, the display information being obtained from the sound track, comprising - a processor element, - a first storage element containing subtitle cue information comprising one or more predetermined subtitles, each subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - a sensing element for sensing said audio signal and for providing an input to said processor element comprising a digital audio signal, said processor element being able to associate a digital audio signal signature with a respective predetermined subtitle of said first storage element, and - a second storage element for storing subtitle display information comprising one or more of the said subtitles produced by said processor element, each of said so stored subtitles being associated with a respective said audio signal signature and a respective said display cue stamp.

The present invention in another broad aspect provides a display system for displaying subtitles during play of an audio message from a sound track providing an audio signal carrying said audio message, comprising - a storage element containing subtitle information comprising one or more predetermined subtitles and one or more digital audio signal signatures (i.e. fingerprint, pattern or the like) derived from the audio signal, each said subtitle being associated with a respective digital audio signal signature, each said subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - a sensing element for sensing said audio signal and for producing an output comprising a said digital audio signal signature, - a processor element for selecting from the storage element, during play of said audio message, a specific subtitle for display by correlating the audio signal signatures of the output from the sensing element with the audio signal signatures of the storage element, the processor element being able to produce an output representative of the specific subtitle, and - a display element for displaying the selected subtitle in response to the output from the processor element, the selected subtitle being displayed in accordance with the display cue stamp associated therewith.

In a more particular aspect the present invention provides a display system for displaying subtitles during projection of viewable images, said images having a sound track associated therewith for providing an audio signal carrying an audio message (e.g. an audio message including a first spoken language) synchronised with said viewable images, comprising - a storage element containing subtitle display information comprising one or more predetermined subtitles and one or more digital audio signal signatures derived from the audio signal, each said subtitle being associated with a respective digital audio signal signature, each said subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - a sensing element for sensing the audio signal and for producing an output comprising a said digital audio signal signature, - a processor element for selecting from said storage element, during display of said viewable images, a specific subtitle for display by correlating the audio signal signature of the output from the sensing element with the audio signal signatures of the storage element, the processor element being able to produce an output representative of the specific subtitle, and - a display element for displaying the specific subtitle in response to the output from the processor element, the selected subtitle being displayed in accordance with the display cue stamp associated therewith and in synchronization with the viewable images.

Cue data or values as mentioned herein may be expressed in any suitable manner whatsoever; they may, for example, be expressed in feet, in frames, in time, in drop- or non-drop frame video mode, in film edge key numbers, and the like.

Thus, in accordance with the present invention each display cue stamp may for example comprise a display time stamp, each display time stamp comprising a first time and a second time respectively marking the beginning and the end of the display of a subtitle;
in this case a selected subtitle is displayed in accordance with the display time stamp associated therewith.

Alternatively, in accordance with the present invention each display cue stamp may if desired for example, comprise a display frame count stamp, each display frame count stamp comprising a first frame count value and a second frame count value respectively marking the beginning and the end of the display of a subtitle; a selected subtitle is displayed in accordance with the display frame count stamp associated therewith.
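
Because a display cue stamp may be expressed either as times or as frame counts, the two forms are interchangeable once the frame rate is known. The short conversion below assumes, purely for illustration, a film rate of 24 frames per second; 25 or 30 (drop or non-drop) frames per second would apply to video.

FPS = 24  # assumed frame rate, for illustration only

def time_to_frame_count(seconds: float, fps: int = FPS) -> int:
    """Convert a display time stamp value into a display frame count value."""
    return round(seconds * fps)

def frame_count_to_time(frames: int, fps: int = FPS) -> float:
    """Convert a display frame count value back into a time value in seconds."""
    return frames / fps

# e.g. a subtitle cued from 8.2 s to 12.7 s corresponds to frames 197 to 305 at 24 fps
cue_in_frames = time_to_frame_count(8.2)
cue_out_frames = time_to_frame_count(12.7)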

The text of a subtitle may be in the same language as the original film, video or the like or else it may be a translation in some other language; in either case the presentation copy of the film, video and the like need not be permanently altered to include the subtitle.

As may be understood, the present invention provides a method and system (or apparatus) for creating and presenting continuous, dialogue-matched and uninterrupted display of subtitles during the presentation or play, on the one hand, of motion picture images, video images, still images and the like associated with a sound track or, on the other hand, even of an audio message alone (i.e. without such images).

Any suitable computer provided with suitable software may be used to provide the (ASCII) text of a sequence of subtitles to be displayed; the software should also be able to associate the in- and out-points corresponding to the times during which dialogue or narration occurs on the screen, which may be recorded using any suitable memory medium, along with a continuous standard SMPTE/EBU time-code or frame count.

During presentation, sound modulation samples which have previously been stored in computer-based memory are compared against sound modulation samples taken during the presentation; matches are found and subtitles are synchronized with the audio message by any suitable pattern matching technique (i.e. using any suitable computer program). Using commercial electronic display technology, the display of a subtitle text simultaneously with the actual dialogue or narration in the original audio on the presentation copy may be activated by processor means (e.g. comprising the computer) which is prompted by the timing cue instructions and sound modulation samples already stored in the computer memory.
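
The patent does not prescribe a particular pattern matching algorithm; one simple way to realize it, offered here only as an illustrative assumption, is to slide a stored reference sample over the incoming samples and take the offset with the highest normalized correlation.

import numpy as np

def best_match_offset(live: np.ndarray, reference: np.ndarray):
    """Return (offset, score) for the best alignment of `reference` within `live`.

    `live` is a buffer of recently captured sound modulation samples;
    `reference` is a previously stored sound modulation sample (signature).
    """
    best_offset, best_score = 0, -1.0
    for offset in range(len(live) - len(reference) + 1):
        window = live[offset:offset + len(reference)]
        denom = float(np.linalg.norm(window) * np.linalg.norm(reference))
        score = float(np.dot(window, reference)) / denom if denom else 0.0
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score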

The basic aspects of the system relate to creating subtitle text and associated display cue stamps, capturing audio synchronization information and associating audio signatures with the text and display cue stamps, and finally presenting the text on screen on the basis of such synchronization information. The display system may, as described herein, for example, include an electronic representation of a cuesheet, which comprises the text of each of a film's subtitles, along with the cuepoints identifying when each subtitle is required to appear and subsequently disappear.

Initial text editing, adaptation and versioning of the cuesheet or text/cue file may be done from a dialogue script, spotting list or ASCII file or by viewing of a timecoded video cassette, and may be executed from data created on commonly used text editing platforms, including for example Macintosh, MS-DOS, Windows and the like. Any suitable capture software (e.g. spotting, translation and text editing software available from the National Film Board of Canada under the product name Cine-Text v 3.1, module 1) may be used to create an above-mentioned initial text/cue file. Thus, for example, subtitle information comprising a subtitle and an associated display cue stamp may be organized in an ASCII file that has any legal DOS filename, with subtitle text such as:

I'm Annie Vidal

-I want to tell you the "Rest-O-Pop"
story

Let me present you the "rest-O-Pop" crew.

In the above example file each subtitle may comprise two lines of text with a maximum of 40 characters per line; when only one line is used the other line must be left blank. In the pair of lines immediately following a subtitle of the cuesheet file, the first line is used to present the subtitle's (first) cue-in value (i.e. start time), e.g. 827 seconds; the second line is used to present the subtitle's (second) cue-out value (i.e. finish time), e.g. 927 seconds. In any case the cue-in and cue-out values are expressed as cumulative time or frame count values (i.e. starting at an initial time or frame count of zero). A pair of subtitle lines may, if desired, be preceded by a line which is either blank or contains a subtitle identification number.
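
A minimal reader for a cuesheet file laid out as just described (an optional blank or identification-number line, up to two text lines, then a cue-in line and a cue-out line) might look as follows. This is a sketch of one possible flattening of that layout, not the Cine-Text file format itself.

from dataclasses import dataclass
from typing import List

@dataclass
class CueEntry:
    line1: str
    line2: str      # left blank when the subtitle uses only one line
    cue_in: float   # cumulative start value (time or frame count)
    cue_out: float  # cumulative finish value

def read_cuesheet(path: str) -> List[CueEntry]:
    """Read an ASCII cuesheet: per subtitle, an optional blank/id line,
    two text lines, a cue-in line and a cue-out line."""
    with open(path, "r", encoding="ascii") as f:
        lines = [ln.rstrip("\n") for ln in f]
    entries: List[CueEntry] = []
    i = 0
    while i < len(lines):
        if lines[i].strip() == "" or lines[i].strip().isdigit():
            i += 1                      # skip separator or identification-number line
            continue
        if i + 3 >= len(lines):
            break                       # incomplete trailing block; ignore it
        entries.append(CueEntry(lines[i], lines[i + 1],
                                float(lines[i + 2]), float(lines[i + 3])))
        i += 4
    return entries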

The above subtitle file format respects industry conventions which call for up to 40 characters per line, with text positioning on the left, right or in the centre; the subtitle window permits the layout of static, non-scrolling text lines.

Subtitles appear and disappear in their totality, as determined by cuepoints. Scene length determines the size of the text window and therewith the duration of a subtitle, which assures adequate picture-content viewing time.

After the text/cue file is obtained it may, for example, then be used to obtain subtitle display information comprised for example in a file including audio signal signatures associated with respective subtitles.

Any suitable capture software may be used to create the above-mentioned subtitle display information, provided that it enables layout of synchronization parameters and capture of sound track modulation reference samples (optical, magnetic and the like) throughout the reel, tape, compact disk and the like, so as to enable real-time, continuous linkage or synchronization of picture content with subtitle/caption representations of dialogue or narration. Preferably, the capture software may be used in conjunction with standard film editing devices and is usable with standard projection equipment or video playback equipment with a common monitor and commonly available personal computers. A suitable program is the Cine-Text, v3.1 module 2, preparation/capturing/testing software which is available from the National Film Board of Canada.

A suitable computer may be equipped with a Digital Signal Processing (DSP) card (e.g. a Spectrum TMS320C25 system board card from Spectrum Signal Processing Inc., Burnaby, B.C., Canada) and an Audio Interface box (e.g. a CineText Audio Interface box available from the National Film Board), which enable the system having suitable capture software to sample a film's soundtracks and thereby create and store digital "sound signatures" that are unique to that film. The computer is plugged into the sound reproducer for sound reference sampling/monitoring and real-time syncing during initial screening.

As may be surmised, the signature sampling process relates the behaviour of an audio signal of a film, video and the like, at any location, to the temporal/positional value of that location.
Thus, during any subsequent playback of the images and/or associated soundtrack, the system or method of the present invention can compare the film's audio signal against the previously stored sound signatures to track the images (or sound track) in real-time. This in turn allows the system computer to automatically generate and correctly synchronize the electronic display of the film's subtitles/captions in accordance with its subtitle cuesheet file.

Sound signature sampling may, of course, for example, be based upon the playback of a film's optical sound track or its final (e.g. magnetic) sound mix. In any case, the sound signatures should be created from a high quality reproduction of the film's final sound mix. The method is especially suited for subtitle preparation when lead-time before the screening is short, such as for film festivals.

A system in accordance with the present invention may be significantly less expensive than alternatives, is compatible with low-cost personal computers and works reliably in any TV
standard.

Any computer file containing text and sync cues as described herein may be stored in a computer RAM memory, on a CD-ROM disk, magnetic recording media (e.g. floppy or hard disk) and the like.
Thus, data representing the sound signatures and subtitle cuesheet files for a full-length feature film, may, for example, be conveniently stored and transported on a single 3 1/2 inch floppy disk or other similar storage media. The memory media may, if desired, also be used for subtitling/captioning by means of optical, chemical or laser methods, to control text cues during the etching or printing, as well as for video data text display in TV and sound dubbing studios.

Although the above discussion has been directed to associating an audio signal signature with a text/cue file comprising subtitles associated with display cue stamps, it is to be understood herein that an audio signal signature could initially be associated with either a subtitle or a display cue stamp, and the display cue stamp or subtitle, as the case may be, could thereafter be associated therewith.

The present invention provides for the subtitles associated with suitable synchronization information to be presented or played back with sound film prints, video source materials, slides, optical or electronic overhead slides with recorded sound, and all formats of these media. A standard video data projector, equipped with a suitable personal computer (i.e. PC) interface, represents one of the most convenient types of subtitle output devices for this purpose. For presentation purposes, however, a Digital Interface card is required for playback, as well as a standard commercial Patch and Motion Detector based film monitoring system (such as, for example, model FM 35 made by Component Engineering of Seattle, WA, USA), if the projector is not already equipped with one.

A suitable computer may, for example, be equipped with a Digital Signal Processing (DSP) card (e.g. the above-mentioned Spectrum card) and an Audio Interface box (e.g. the above-mentioned Audio Interface box from the National Film Board of Canada), which enable the system having any suitable software to sample a film's soundtracks and thereby compare "sound signatures" that are unique to that film. A suitable presentation software program is the Cine-Text, v3.1 module 3, presentation software which is available from the National Film Board of Canada.

The choice of display device for electronic subtitles/captions is based upon consideration of theatre size and layout, since different types of video projectors and electronic display apparatus vary in terms of their output intensities, focal characteristics and viewing angles which can be accommodated.

In order to achieve maximum brightness and contrast, projected subtitles are best displayed using an auxiliary screen rather than the theatre's main screen. The former should be a non-perforated high-gain or white surface screen, sized according to the dimensions of the subtitles to be projected, and capable of being mounted outside (e.g. below) the theatre's normal screen.
If light output is not a concern, and if sufficient white screen area is available on a theatre's normal screen (i.e. outside the area that is required for the motion picture images), the electronic subtitles/captions may be projected there instead.

Standard video coax cabling between computer and video data projector is required. Audio modulation pattern recognition technology is used to reference and track the projected film in real-time and, thereby, to automatically generate the film's subtitles in closed-loop synchronization with it.

This technique obviates the need for installing optical shaft encoders (or any other optical/electro-mechanical film position feedback devices) in a theatre, and it gives the display system the capability of automatically recalibrating itself, if and when necessary, to accommodate missing film footage, e.g. for film prints that have been broken or modified and subsequently repaired. It can be used to track the projection of a film print even when the projector(s) being used suffer(s) from poor speed regulation.

The display system can recognize and correct automatically for projector or playback device speed variations. It is tolerant of splices made after capture of the sound reference samples, as well as of ill-timed switch-overs between projectors, since the system hunts for related sound patterns and, if one cannot be found, it is merely ignored. The system is thus projector-stop and computer-stop resistant; the system is also film-break or modification resistant, even, for example, when up to 10 feet is removed after spotting. The system or apparatus may be installed permanently or temporarily and may be configured to be transportable.

Film copies are re-usable for other subtitle versions; no etching or laser burn-in damage occurs on the film print picture. No inherent encoding of the copy is required, and no labour-intensive attachment (and subsequent removal) of coded-signal (barcoded) stickers to the print is needed to trigger in- and out-cues during projection. It follows that no decoding detector to read those coded signals is required.

Subtitle text which is displayed is discrete, and suitably is readable at a distance. Any known electronic data display technology is acceptable for display, such as a video data display, a Liquid Crystal Display or a Light Emitting Diode (LED) display. Optionally, modular, transportable LED display panels may be used, which may be mounted on adjustable-height tripods as required.

No additional operator is required in the projection booth, since the system may be run semi-automatically and only requires being turned on and a cursor to be placed at the film title displayed on the computer screen, before screening of the film.

No or only minor mechanical modifications are required to the projector. The system may operate with platter or single reel, single or multiple film projector or video projector configurations.

In drawings which illustrate example embodiments of the invention:
Figure 1 is a block diagram showing a text/cue system;
Figure 2 is a block diagram showing an audio cue system;
Figure 3 is a block diagram showing a display system;
Figure 3a shows an example audio signature obtained by the cue system of figure 1 from a moving picture film for association with a subtitle;
Figure 4a shows the same digital audio signature of figure 3a as sensed by the display system of figure 3 during playback of the moving picture film;
Figure 5 illustrates, by means of block diagrams, example presentation set-ups exploiting audio pattern recognition in accordance with the present invention;
Figure 6 shows an example film strip illustrating cue placement; and
Figure 7 shows a cue detector assembly and optical sensor of a film Patch and Motion Detector.

Figure 1 illustrates the preliminary text/cue processing of a medium carrying a sound track, i.e. to obtain a desired preliminary subtitle cue information file stored on an appropriate storage medium. For the illustrated system, the sound track medium is shown as a moving picture film 1 by way of example only; the medium could of course be any magnetic, optical or magneto-optical medium or the like and may if desired comprise only a sound track and no viewable images.

The text cue system as shown in figure 1 is provided with a cue source element 2 which comprises a start cue element (or patch).
The cue source element 2 facilitates the preparation by the processor element 3 of display cue stamps (as defined above) which are derived from the film 1 and which are associated with one or more subtitles. The start cue element may comprise a known patch and motion detector and an associated time-code generator. A patch detector establishes the starting time for recording or capture; a motion and film presence detector senses the movement of the film (in case of a break the film stops and an alarm is triggered); and the time generator establishes the time references once the starting time is established by the patch.
The cue source element 2 may, for example, also comprise a photoelectric cell which converts light variations to electrical impulses, such as is used on projectors, editing tables, or optical sound readers; on the other hand the cue source element may comprise a magnetic playback head or reproduce head, used to sense the magnetic flux from a magnetic sound recording.

The cue source element 2 may, for example, provide a cue value output 4 to the processor element 3 in the form of cumulative play time values indicative of the amount of time the film has been projected (as provided by the photoelectric cell or magnetic playback head). Alternatively, the cue value output 4 may be in the form of cumulative frame numbers indicative of the amount of frames which have been shown (as provided by pulse codes from an optical shaft encoder mounted on the film transport - it is presumed that the relationship of the shaft to the amount of film is known - there are no codes on the film to provide such information).

The text/cue system of figure 1 includes a processor element 3 which may comprise a suitable computer and an associated monitor, the system being suitably configured to allow an operator to verify the scenes associated with any given pair of cue values.
The computer is provided with a suitable program for allowing an operator to command the computer to associate a specific subtitle with a specific display cue stamp comprising a first and second cue value as described above.

The subtitle may be provided by input 5, which may be provided directly from a keyboard entry or from some previously prepared ASCII file available to the computer; a suitable example program is the spotting, translation and text editing software available from the National Film Board of Canada under the product name Cine-Text v 3.1, module 1.

The processor element 3 is suitably configured to take the above mentioned subtitles and respective associated display cue stamps and output them via output 6 to storage element 7 in the form of a text/cue file on a suitable memory medium, the text/cue file being intended to comprise a plurality of such specific subtitles; alternatively, each specific subtitle may be sent to an individual file on the memory medium.

In operation, the operator provides a subtitle from the keyboard or from a previously prepared ASCII file and, using the computer, selects an appropriate pair of cue values obtained from the cue value source to create a display cue stamp for association therewith. The pair of cue values is of course selected on the basis of where during the play of the film the subtitle must appear; this is predetermined by viewing on the monitor the portion of the film associated with the cue values.

Figure 2 illustrates the audio processing of the film medium 1 of figure 1, i.e. to obtain the desired subtitle display information file stored on an appropriate storage medium. As previously mentioned above, for the illustrated system, the sound track medium is shown as a moving picture film 1 by way of example only; the medium could as mentioned above of course be any magnetic, optical or magneto-optical medium or the like.

The audio cue system as shown in figure 2 is provided with an audio sensor element 8 which also comprises a start cue element (or patch). The audio sensor element 8 is configured to provide a processor element 9 with a digitised audio signal derived from the film 1 for the preparation of audio signal signatures to be associated with one or more subtitles in an above mentioned text/cue file. The start cue element as previously mentioned may comprise a known patch and motion detector and an associated time-code generator; a patch detector establishes the starting time for recording or capture and the time generator establishes the time references once the starting time is established by the patch. The audio sensor element 8 may, for example, also comprise a photoelectric cell which converts light variations to electrical impulses, such as is used on projectors, editing tables, or optical sound readers; on the other hand the audio sensor element 8 may comprise a magnetic playback head or reproduce head, used to sense the magnetic flux from a magnetic sound recording.

The duration of the audio pattern sampling (file) depends on the length of the reel but is continuous throughout - starting at a start cue patch in the leader of the reel which enables the system to correlate the audio pattern (file) with elapsed time.
An industry standard Patch and Motion Detector film monitoring system (e.g. model FM 35 from Component Engineering of Seattle, WA, USA) may be used.

The audio sampling rate may be variable but may, for example, be around 10,000 samples per second. The intervals between frames of time, of which there may for example be 50 per second, may range between 20 ms and 50 ms.
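
Using the figures just given as assumptions, a 10,000 samples-per-second stream cut into 20 ms frames yields 200 samples per frame; a small sketch of such framing:

SAMPLE_RATE = 10_000  # samples per second, as suggested above
FRAME_MS = 20         # frame interval; anywhere in the 20 ms to 50 ms range

def frames(samples, sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS):
    """Yield successive, non-overlapping frames of audio samples."""
    frame_len = sample_rate * frame_ms // 1000  # 200 samples at 10 kHz and 20 ms
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[start:start + frame_len]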

A digital audio signature (finger print, etc.) sample may for example be taken at some point well before the appearance of the image for which a subtitle is to appear (i.e. the audio signature is offset with respect to the image presentation); alternatively, the signature may be located just before the place of intended display of the subtitle to provide sufficient interpolation time

The audio sensing element 8 provides an output 10 to the processor element 9; the output 10 comprises a digitised audio signal. If the sound track comprises an analogue sound signal, samples of the light variations sensed by a photoelectric cell mounted on a film projector, editing table, or independent optical sound reader, or the analogue sound values picked up by a magnetic head on an audio or video playback device, are converted into digital word values or audio reference signatures in any known manner. If the sound track already comprises a digitised sound track such conversion will of course not be needed, i.e. a digitised signal may be immediately read by suitable means directly from the film.

Each digital word or reference signature representative of an audio value may be a binary code word of any desired or suitable length; it could be 8 bits long, 16 bits long, 32 bits long, and so on. Each word or signature represents a region of the audio signal range. Extremes of the signal voltage ranges are mapped into digital words wherein the first bit (the sign bit) indicates the polarity of each voltage. The next bit indicates whether the voltage is in the upper or lower half of the voltage range. The following bit again divides this assigned range into two, and so forth. The last or least significant bit gives the final location of the quantization interval, which in size corresponds to the voltage interval (i.e. the space between voltage peaks).
The resulting digital pattern words form the core of the digital audio signal signature samples for the display cue file referred to below.
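
The bit-by-bit mapping described above is essentially successive approximation: a sign bit followed by one bit per halving of the remaining voltage range. A toy illustration of turning one voltage sample into such a code word (the word length and full-scale voltage are arbitrary assumptions):

def quantize(voltage: float, full_scale: float = 1.0, bits: int = 8) -> str:
    """Map a voltage in [-full_scale, +full_scale] to a binary code word.

    The first bit is the sign bit; each following bit records whether the
    magnitude lies in the upper or lower half of the remaining range."""
    word = "1" if voltage >= 0 else "0"  # sign bit: polarity
    low, high = 0.0, full_scale
    magnitude = abs(voltage)
    for _ in range(bits - 1):
        mid = (low + high) / 2
        if magnitude >= mid:             # upper half of the assigned range
            word += "1"
            low = mid
        else:                            # lower half of the assigned range
            word += "0"
            high = mid
    return word

# e.g. quantize(0.3) gives an 8-bit word whose trailing bits narrow 0.3 down within [0, 1]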

Temporal sample elements are also captured as part of the digital audio signal pattern signatures and are interpolated with the "time elapsed since start" cue (as encoded or read by means of the time code generator/reader) during final presentation, which provides the context for the digital audio signature pattern matching window. The time stamp data relates to the time encoded (i.e. timecode generator encoded time) during the capture phase, which begins at the Cue Patch at the head of each reel. In the case of a film break there is no means by which the system can check all the data, due to time constraints dictated by real-time, uninterruptible presentation. Therefore a selectable audio signature pattern matching or search window, which may for example consist of between 5 seconds backwards and 15 seconds forwards from the point of a film break or interruption, is used to re-establish matching.
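
The search window described above (roughly 5 seconds backwards to 15 seconds forwards of the interruption) can be expressed as a bounded search over the stored, time-stamped signatures. The sketch below is an assumption about how such a window might be applied; the match helper is hypothetical and simply returns a similarity score between two signatures.

BACKWARD_S = 5.0   # window behind the point of interruption
FORWARD_S = 15.0   # window ahead of the point of interruption

def resync(stored, live_signature, break_time, match):
    """Search only signatures whose time stamps fall inside the window around
    `break_time` and return the time of the best match, or None.

    `stored` is a list of (time_stamp, signature) pairs captured earlier."""
    candidates = [(t, sig) for t, sig in stored
                  if break_time - BACKWARD_S <= t <= break_time + FORWARD_S]
    if not candidates:
        return None
    best_time, _ = max(((t, match(sig, live_signature)) for t, sig in candidates),
                       key=lambda pair: pair[1])
    return best_time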

Figure 3a shows an example of an audio signature (set at a 20 ms interval).

The processor element 9 may comprise a suitable computer and an associated monitor, the computer having a suitable program for allowing an operator to command the computer to associate a specific subtitle in storage element 7 (see above) with a specific digital audio signature; as mentioned above the subtitle in storage element 7 will be associated with a first and second cue value. A suitable program is the Cine-Text, v3.1 module 2, preparation/capturing/testing software which is available from the National Film Board of Canada.

The processor element 9 is configured to take the above mentioned specific subtitle and associated audio signatures and cue values and output it via output 10 to storage element 11 in the form of a display cue file on a suitable memory medium, the display cue file being intended to comprise a plurality of such specific subtitles; alternatively, each specific subtitle may be sent to an individual file on the memory medium.

Audio signatures, to be recognisable, must include significant modulation variations of intensity or peaks, to distinguish them from even or non-variable signal periods (i.e. periods of monotonous sound or silence) when no information is available to create signatures.

In operation, the capture of appropriate audio signature samples is determined by the software, not by human intervention. It also correlates the particular signatures associated with each subtitle, as well as with the temporal (timecode) data provided.

Figure 3 shows a subtitle display system. The display system has an audio signal sensor element 12 which may be the same as or different from the sensor 8 of the audio cue system in figure 2; the audio sensor element 12 may for example be part of a sound reader of a motion picture film projector or a video playback device which is being used to display the moving viewable images. The presentation copy of the film 1 is exposed to the sensor element 12 so as to output a digital audio signal 13 to the processor element 14, which may be a computer associated, if desired, with a monitor (see above). The processor element 14 is provided with a suitable presentation program for outputting requests 15 to the storage element 11 (see above) to send, via output 16, audio signal signatures and associated subtitles to the processor element 14; the presentation program is also able to compare the audio signal signatures with the audio signal from the sensor element to determine if there is a match (i.e. as in figures 3a and 4a) and, if so, to send a (trigger) output 17, representative of the subtitle associated with the matched signatures, to a suitable display element 18 (such as a video data projector, LCD or LED display); the subtitle spotted by the triggering audio signature or pattern is displayed in accordance with the associated cue values. The subtitle is displayed on an appropriate screen element as described above.
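
Schematically, the presentation side reduces to a loop: sense the audio, derive a signature, correlate it against the stored signatures, and when a match is found display the associated subtitle for the interval given by its cue stamp. The sketch below uses the same hypothetical record layout and matching helper as the earlier fragments; the threshold value is an assumption, not taken from the disclosure.

MATCH_THRESHOLD = 0.8  # assumed similarity threshold for declaring a match

def presentation_loop(sense_signature, stored_records, match, display):
    """Schematic display loop.

    sense_signature() returns the latest signature from the sensing element;
    stored_records is an ordered list of records with .text, .cue_in, .cue_out
    and .signature fields; match(a, b) returns a similarity score; and
    display(text, cue_in, cue_out) shows the subtitle between its two cue values."""
    pending = list(stored_records)
    while pending:
        live = sense_signature()
        record = pending[0]
        if match(record.signature, live) >= MATCH_THRESHOLD:
            display(record.text, record.cue_in, record.cue_out)
            pending.pop(0)
        # otherwise keep sampling; an unmatched pattern is simply ignored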

While the subtitles are being displayed the film itself is of course being displayed in known manner.

A suitable presentation program is the Cine-Text, v3.1 module 3, presentation software which is available from the National Film Board of Canada.

In operation, the operator loads the theatre program from the diskettes or storage medium supplied, selects the film, loads film print reel one on the projector and cues it to start in the usual manner. The projector is then started and the patch and motion detector automatically triggers the subtitle display;
reels are changed as required.

Figure 5 shows possible presentation set-ups in accordance with the present invention; signals going from the Dolby sound unit correspond to the signals used during the capture process, i.e. during the process of associating the audio signal signatures with the subtitles.

Figures 6 and 7 illustrate an example of a known film Patch and Motion Detector. The film Patch and Motion Detector consists of two elements. The first is the patch or cue detector, which is of the Eddy Current Killed Oscillation (ECKO) proximity type. It responds to small foil cue patches on the film (e.g. commonly used to trigger curtain openings and closings, etc.). The second element is the film Presence and Motion Detector, which consists of two infrared light detectors scanning the two edges of the film. Film presence will close the relay. The Motion Detector scans the perforations of the film.

Claims (10)

1. A display system for displaying one or more subtitles during play of an audio message from a sound track providing an audio signal carrying said audio message, comprising - a storage element containing subtitle display information comprising one or more predetermined subtitles and one or more digital audio signal signatures derived from said audio signal, each said subtitle being associated with a respective digital audio signal signature, each said subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - a sensing element for sensing said audio signal and for producing an output comprising a digital audio signal signature, - a processor element for selecting from said storage element, during play of said audio message, a specific subtitle for display by correlating said audio signal signature of said output from the sensing element with said audio signal signatures of said storage element, said processor element being able to produce an output representative of said specific subtitle, and - a display element for displaying said specific subtitle in response to said output from said processor element, said specific subtitle being displayed in accordance with the display cue stamp associated therewith.
2. A system as defined in claim 1 wherein each display cue stamp comprises a display time stamp, each display time stamp comprising a first time and a second time respectively marking the beginning and the end of the display of a subtitle, and wherein said selected subtitle is displayed in accordance with the display time stamp associated therewith.
3. A system as defined in claim 1 wherein each display cue stamp comprises a display frame count stamp, each display frame count stamp comprising a first frame count value and a second frame count value respectively marking the beginning and the end of the display of a subtitle, and wherein said selected subtitle is displayed in accordance with the display frame count stamp associated therewith.
4. A display system for displaying subtitles during projection of viewable images, said images having a sound track associated therewith for providing an audio signal carrying an audio message synchronised with said viewable images, comprising - a storage element containing subtitle display information comprising one or more predetermined subtitles and one or more digital audio signal signatures derived from said audio signal, each said subtitle being associated with a respective audio signal signature, each said subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - a sensing element for sensing said audio signal and for producing an output comprising a said digital audio signal signature, - a processor element for selecting from said storage element a specific subtitle for display by correlating said audio signal signature of said output from the sensing element with said audio signal signatures of said storage element, said processor element being able to produce an output representative of said specific subtitle, and - a display element for displaying said specific subtitle in response to said output from said processor element, said specific subtitle being displayed in accordance with the display cue stamp associated therewith and in synchronization with said viewable images.
5. A system as defined in claim 4 wherein each display cue stamp comprises a display time stamp, each display time stamp comprising a first time and a second time respectively marking the beginning and the end of the display of a subtitle, and wherein said selected subtitle is displayed in accordance with the display time stamp associated therewith.
6. A system as defined in claim 4 wherein each display cue stamp comprises a display frame count stamp, each display frame count stamp comprising a first frame count value and a second frame count value respectively marking the beginning and the end of the display of a subtitle, and wherein said selected subtitle is displayed in accordance with the display frame count stamp associated therewith.
7. A cue system for providing subtitle co-ordination display information from a sound track providing an audio signal carrying an audio message, said display information being configured for displaying one or more subtitles during play of said audio message from said sound track, comprising - a processor element, - a first storage element containing subtitle cue information comprising one or more predetermined subtitles, each subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - a sensing element for sensing said audio signal and for providing an input to said processor element comprising a digital audio signal, said processor element being able to associate a digital audio signal signature with a respective predetermined subtitle of said first storage element, and - a second storage element for storing subtitle display information comprising one or more of the said subtitles produced by said processor element, each of said so stored subtitles being associated with a respective said audio signal signature and a respective said display cue stamp.
8. A cue system for providing subtitle co-ordination display information for displaying subtitles during projection of viewable images, said images having a sound track associated therewith for providing an audio signal carrying an audio message synchronised with said viewable images, said display information being obtained from said sound track, comprising - a processor element, - a first storage element containing subtitle cue information comprising one or more predetermined subtitles, each subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - a sensing element for sensing said audio signal and for providing an input to said processor element comprising a digital audio signal, said processor element being able to associate a digital audio signal signature with a respective predetermined subtitle of said first storage element, and - a second storage element for storing subtitle display information comprising one or more of the said subtitles produced by said processor element, each of said so stored subtitles being associated with a respective said audio signal signature and a respective said display cue stamp.
9. A method for providing subtitle co-ordination display information from a sound track providing an audio signal carrying an audio message, said display information being configured for displaying one or more subtitles during play of said audio message from said sound track, comprising - providing a first storage element containing subtitle cue information comprising one or more predetermined subtitles, each subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a respective subtitle, - sensing said audio signal and providing a digital audio signal output to a processor element, said processor element being able to associate a digital audio signal signature with a respective predetermined subtitle of said first storage element, and - storing subtitle display information in a second storage element, said subtitle display information comprising one or more of the said subtitles produced by said processor element, each of said so stored subtitles being associated with a respective said audio signal signature and a respective said display cue stamp.
10. A method for displaying one or more subtitles during play of an audio message from a sound track providing an audio signal carrying said audio message, comprising - providing a storage element containing subtitle display information comprising one or more predetermined subtitles and one or more digital audio signal signatures derived from said audio signal, each said subtitle being associated with a respective digital audio signal signature, each said subtitle being associated with a respective display cue stamp, each display cue stamp comprising a first cue value and a second cue value respectively marking the beginning and the end of the display of a subtitle, - sensing said audio signal and producing an output comprising a said digital audio signal signature, - selecting from said storage element, during play of said audio message, a specific subtitle for display by correlating said audio signal signature of said output with said audio signal signatures of said storage element, and - displaying said specific subtitle, said specific subtitle being displayed in accordance with the display cue stamp associated therewith.
CA002129925A 1994-08-11 1994-08-11 Audio synchronization of subtitles Abandoned CA2129925A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA002129925A CA2129925A1 (en) 1994-08-11 1994-08-11 Audio synchronization of subtitles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA002129925A CA2129925A1 (en) 1994-08-11 1994-08-11 Audio synchronization of subtitles

Publications (1)

Publication Number Publication Date
CA2129925A1 true CA2129925A1 (en) 1996-02-12

Family

ID=4154154

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002129925A Abandoned CA2129925A1 (en) 1994-08-11 1994-08-11 Audio synchronization of subtitles

Country Status (1)

Country Link
CA (1) CA2129925A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7505823B1 (en) 1999-07-30 2009-03-17 Intrasonics Limited Acoustic communication system
US8185100B2 (en) 2000-11-30 2012-05-22 Intrasonics S.A.R.L. Communication system
US7460991B2 (en) 2000-11-30 2008-12-02 Intrasonics Limited System and method for shaping a data signal for embedding within an audio signal
US7796978B2 (en) 2000-11-30 2010-09-14 Intrasonics S.A.R.L. Communication system for receiving and transmitting data using an acoustic data channel
WO2003061285A3 (en) * 2001-12-24 2004-03-11 Scient Generics Ltd Captioning system
WO2003061285A2 (en) 2001-12-24 2003-07-24 Scientific Generics Limited Captioning system
US8248528B2 (en) * 2001-12-24 2012-08-21 Intrasonics S.A.R.L. Captioning system
US8009966B2 (en) 2002-11-01 2011-08-30 Synchro Arts Limited Methods and apparatus for use in sound replacement with automatic synchronization to images
US8560913B2 (en) 2008-05-29 2013-10-15 Intrasonics S.A.R.L. Data embedding system
US9609397B1 (en) 2015-12-28 2017-03-28 International Business Machines Corporation Automatic synchronization of subtitles based on audio fingerprinting
US10021445B2 (en) 2015-12-28 2018-07-10 International Business Machines Corporation Automatic synchronization of subtitles based on audio fingerprinting
EP3451177A4 (en) * 2016-04-25 2019-12-25 Yamaha Corporation Information processing method and terminal device
US10846150B2 (en) 2016-04-25 2020-11-24 Yamaha Corporation Information processing method and terminal apparatus
CN106792071A (en) * 2016-12-19 2017-05-31 北京小米移动软件有限公司 Method for processing caption and device

Similar Documents

Publication Publication Date Title
JP4406423B2 (en) Method and apparatus for synchronizing an image data stream with a corresponding audio data stream
EP0720109B1 (en) Multimedia program indexing system and method
US4660107A (en) Method and apparatus for cueing and pacing in audio and audio-visual work
US4067049A (en) Sound editing system
US5532773A (en) Method and apparatus for indexing and retrieval of a continuous visual image medium
US20090231492A1 (en) Smart slate
KR900000756A (en) System of pre-purchase means
US4587572A (en) Film to tape transfer system
US5877842A (en) Digital Dailies
EP2041744B1 (en) Audio watermarking technique for motion picture presentations
US6397184B1 (en) System and method for associating pre-recorded audio snippets with still photographic images
CA2129925A1 (en) Audio synchronization of subtitles
US4467371A (en) Method of pre-editing an original video tape in combination of scene numbers with a card and systems for carrying out the method
US5506639A (en) Method and apparatus for editing motion picture film and synchronized sound
JP3934780B2 (en) Broadcast program management apparatus, broadcast program management method, and recording medium recording broadcast program management processing program
FR2765354A1 (en) Film dubbing synchronisation system
Singleton-Turner In the studio: Communication
JP2005202381A (en) Movie sub-information reproducing device
Audemars Film and Video Tape Editing
NL8800202A (en) Subtitle display system used with film projector - displays computer generated subtitles on separate display under screen
Perron A Rhythmo-Band Dialogue Replacement Technique
Becker System Considerations for Off-Line Disc Editing
Strong SMPTE Time and Edit Code and Its Application to Motion-Picture Production
Oudin Direct introduction of time code on film
Uhlig Technical Experience with Datakode™ Magnetic Control Surface

Legal Events

Date Code Title Description
FZDE Discontinued