
WO2007058517A1 - Method and apparatus for synchronizing visual and voice data in dab/dmb service system - Google Patents

Method and apparatus for synchronizing visual and voice data in dab/dmb service system Download PDF

Info

Publication number
WO2007058517A1
WO2007058517A1 (PCT/KR2006/004901)
Authority
WO
WIPO (PCT)
Prior art keywords
synchronization
web document
speech
document
visual
Prior art date
Application number
PCT/KR2006/004901
Other languages
French (fr)
Inventor
Bong-Ho Lee
So-Ra Park
Hee-Jeong Kim
Kyu-Tae Yang
Chung-Hyun Ahn
Soo-In Lee
Original Assignee
Electronics And Telecommunications Research Institute
Priority date
Filing date
Publication date
Application filed by Electronics And Telecommunications Research Institute filed Critical Electronics And Telecommunications Research Institute
Priority to EP06823659A priority Critical patent/EP1952629A4/en
Publication of WO2007058517A1 publication Critical patent/WO2007058517A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/76 Arrangements characterised by transmission systems other than for broadcast, e.g. the Internet
    • H04H60/81 Arrangements characterised by transmission systems other than for broadcast, e.g. the Internet characterised by the transmission system itself
    • H04H60/82 Arrangements characterised by transmission systems other than for broadcast, e.g. the Internet characterised by the transmission system itself the transmission system being the Internet
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234318 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into objects, e.g. MPEG-4 objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4782 Web browsing, e.g. WebTV
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8126 Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8543 Content authoring using a description language, e.g. Multimedia and Hypermedia information coding Expert Group [MHEG], eXtensible Markup Language [XML]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/04 Synchronising
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H2201/00 Aspects of broadcast communication
    • H04H2201/10 Aspects of broadcast communication characterised by the type of broadcast system
    • H04H2201/20 Aspects of broadcast communication characterised by the type of broadcast system digital audio broadcasting [DAB]

Definitions

  • the present invention relates to a digital data broadcasting service; and, more particularly, to a method for providing a Web service that can simultaneously input/output speech data along with visual data by integrating speech Web data with broadcasting Web sites
  • DMB Digital Multimedia Broadcasting
  • DAB Digital Audio Broadcasting
  • BWS Conventional broadcasting Web sites
  • HTML Hyper Text Markup Language
  • DMB Digital Multimedia Broadcasting
  • DAB Digital Audio Broadcasting
  • the method can simply output the Web data defined by the HTML onto the screen. Therefore, the method cannot sufficiently transfer data in a broadcasting system for a mobile environment, such as a DAB-based DMB.
  • an X+V method is under standardization and development to provide a multi-modal Web service.
  • the method, too, operates based on a visual interface with the XHTML as a host language, and it is somewhat inappropriate for a mobile environment.
  • the present invention provides a method for synchronizing visual and speech Web data that can overcome the aforementioned drawbacks and provide users with a speech-directed Web service in a mobile environment or a fixed location environment, instead of a visual-directed Web service, and an apparatus thereof.
  • BWS broadcasting Web sites
  • an embodiment of the present invention defines a speech-directed Web language to provide a speech-directed Web service in consideration of a mobile environment, instead of a screen-directed Web service.
  • another embodiment of the present invention provides a service capable of inputting/outputting speech data by integrating a conventional Web service framework, e.g., a BWS service, with a speech input/output module.
  • a conventional Web service framework e.g., a BWS service
  • yet another embodiment of the present invention provides a technology of synchronizing a content following a visual Web specification, e.g., HTML, and a VoiceXML content capable of providing a speech Web service, that is, a technology of synchronizing visual data with speech data.
  • a visual Web specification e.g., HTML
  • a VoiceXML content capable of providing a speech Web service
  • processing of documents should be synchronized, and a user input device should be synchronized for one document. It is the object of the present invention to provide a method and apparatus for the synchronizations.
  • a method for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service which includes the steps of: a) generating a visual Web document; b) generating a speech Web document including synchronization tags related to the visual Web document; and c) identifying the speech Web document and the visual Web document based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.
  • BWS broadcasting Web sites
  • an apparatus for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service which includes: a) a content data generator for generating a visual Web document and a speech Web document including synchronization tags related to the visual Web document; b) a multimedia object transfer (MOT) server for transforming both the generated visual Web document and the speech Web document into an MOT protocol; and c) a transmitting system for identifying the speech Web document and the visual Web document of the MOT protocol based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.
  • a content data generator for generating a visual Web document and a speech Web document including synchronization tags related to the visual Web document
  • a multimedia object transfer (MOT) server for transforming both the generated visual Web document and the speech Web document into an MOT protocol
  • a transmitting system for identifying the speech Web document and the visual Web document of the MOT protocol based on a sub-channel or a directory and transmitting them independently
  • a method for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service which includes the steps of: a) receiving and loading a visual Web document and a speech Web document including synchronization tags related to the visual Web document, the visual Web document and the speech Web document being identified based on a sub-channel or a directory and transmitted independently; and b) analyzing the synchronization tags when a synchronization event occurs and performing a corresponding synchronization operation.
  • an apparatus for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service which includes: a) a baseband receiver for receiving broadcasting signals through a multimedia broadcasting network and performing channel decoding; b) a multimedia object transfer (MOT) decoder for decoding channel-decoded packets and restoring a visual Web document and a speech Web document including synchronization tags related to the visual Web document; and c) an integrated Web browser for analyzing the synchronization tag when a synchronization event occurs and executing a corresponding synchronization operation.
  • a baseband receiver for receiving broadcasting signals through a multimedia broadcasting network and performing channel decoding
  • a multimedia object transfer (MOT) decoder for decoding channel-decoded packets and restoring a visual Web document and a speech Web document including synchronization tags related to the visual Web document
  • MOT multimedia object transfer
  • an HTML document which is a visual Web document
  • a VoiceXML content which is a speech Web document
  • a multimedia broadcasting service user can conveniently access corresponding information by receiving both screen output and speech output for a Web data service and, if necessary, making a command by speech even in a mobile environment.
  • the present invention has an advantage that it can ensure backward compatibility with lower-ranked services by individually authoring and transmitting data to provide an integrated synchronization service, instead of integrating markup languages and transmitting them as a single data form, which is the generally used approach.
  • the technology of the present invention adds synchronization-related elements to a host markup language to thereby maintain a conventional service framework.
  • users can receive a conventional broadcasting Web site and, at the same time, access the Web by speech, listen to information, and control the Web by speech.
  • Fig. 1 is an exemplary view illustrating how broadcasting Web site documents are authored to be synchronized and capable of speech input/output in accordance with an embodiment of the present invention
  • Fig. 2 is a view describing broadcasting Web site documents capable of speech input/output and a data transmitting method in accordance with an embodiment of the present invention
  • Fig. 3 is an exemplary view showing broadcasting Web site documents capable of speech input/output when a synchronization document is separately provided in accordance with an embodiment of the present invention
  • Fig. 4 is a block view describing a Digital Multimedia Broadcasting (DMB) system which is configured based on a Digital Audio Broadcasting (DAB) and providing a broadcasting Web sites (BWS) service capable of simultaneous speech input/output; and
  • DMB Digital Multimedia Broadcasting
  • DAB Digital Audio Broadcasting
  • BWS broadcasting Web sites
  • Fig. 5 is a block view illustrating an integrated Web browser of Fig. 4.
  • broadcasting Web sites defined to provide a Web service in a multimedia broadcasting service, such as Digital Audio Broadcasting (DAB) and Digital Multimedia Broadcasting (DMB)
  • DAB Digital Audio Broadcasting
  • DMB Digital Multimedia Broadcasting
  • the Web language that becomes the basis for providing the service includes a basic profile which adopts HTML 3.2 as a Web specification in consideration of a terminal with a relatively low specification, and a non-restrictive profile which has no restriction in consideration of a high-specification terminal, such as a personal computer (PC).
  • PC personal computer
  • since the profiles are based on the HTML, which is a Web representation language, a Web browser is required to provide a terminal with a BWS service.
  • the browser may be called a BWS browser and it provides a Web service by receiving and decoding Web contents of txt, html, jpg, and png formats transmitted as objects through a multimedia object transfer (MOT).
  • MOT multimedia object transfer
  • the output is provided in the visual form. That is, texts or still images are displayed on a screen with a hyperlink function and they transit into the other contents transmitted together through the MOT to thereby provide a visual-based local Web service.
  • the specification includes a function of recovering a speech file or other multimedia files, it is possible to provide the output not only on the screen but also by speech.
  • GUI Graphical User Interface
  • VoiceXML is a Web language devised for an interactive speech response service of an Interactive Voice Response (IVR) type. When it is actually mounted on the terminal, it can provide a speech enabled Web service.
  • the technology defines a markup language that can be transited into another application, document, or dialogue based on a dialogue obtained by modeling a conversation between a human being and a machine.
  • the VoiceXML can provide a Web service that can input/output data by speech.
  • Web information can be delivered by speech by applying a Text To Speech (TTS) technology which transforms text data into speech data and the Automatic Speech Recognition (ASR) technology which performs speech recognition to an input/output module, and user input data are received by speech to process a corresponding command or execute a corresponding application.
  • TTS Text To Speech
  • ASR Automatic Speech Recognition
  • the VoiceXML is effective in a mobile environment. It has an advantage that users can listen to a Web service provided without a visual output on the screen and navigate by inputting speech data for desired information.
  • there is a limitation in delivering Web information by speech only and, when speech input/output is made together with visual data on the screen, it is convenient and it is possible to provide diverse additional data services.
  • the present invention provides a transmission and synchronization method for providing a multi-modal Web service by integrating the conventional BWS Web specification, i.e., HTML, with a speech Web language, i.e., VoiceXML.
  • a transmission and synchronization method will be described.
  • the basic principle of the present invention is to generate a speech Web document including synchronization information related to a visual Web document and transmit the visual Web document and the speech Web document through another sub-channel or another directory of the same sub-channel.
  • Fig. 1 is an exemplary view illustrating how broadcasting Web site documents are authored to be synchronized and capable of speech input/output in accordance with an embodiment of the present invention.
  • a visual Web document and a speech Web document are separately created in the embodiment of the present invention.
  • the visual Web document is an HTML or an xHTML content defined in the BWS
  • the speech Web document is a document integrating elements or tags in charge of synchronization between the VoiceXML and the visual Web documents, a speech recognition module, and a component-related module such as a speech combiner and a receiver.
  • Fig. 2 is a view describing broadcasting Web site documents capable of speech input/output and a data transmitting method in accordance with an embodiment of the present invention.
  • the visual Web document and the speech Web document are transmitted and signaled through different sub-channels or, when they share the same sub-channel, through different directories. This is to allow a terminal capable of receiving an existing BWS service to receive the conventional service even when the BWS is combined with a speech Web document.
  • the signaling for the speech BWS is additionally processed in the speech Web document, i.e., a speech module.
  • the synchronization between the visual Web document and the speech Web document is processed by using the synchronization tags <esync>, <isync> and <fsync>.
  • the synchronization tags are described in the speech Web document without exception. Also, the synchronization tags are identified by the following namespace.
  • a VoiceXML forming a speech Web document has the following namespace.
  • the entire namespace including the visual Web document and the speech Web document may be designated as follows:
  • the synchronization tags for processing synchronization between the visual Web document, i.e., HTML, and the speech Web document, i.e., the VoiceXML, should describe synchronization between an application, documents, and forms within the document.
  • the synchronization tags used for the purpose are <esync>, <isync> and <fsync>.
  • the tags <esync> and <isync> are in charge of synchronization between an application and documents, whereas the <fsync> tag is in charge of synchronization between forms.
  • the synchronization between the application and the documents signifies that the documents should be simultaneously loaded, interpreted and rendered in the initial period when the application starts.
  • the synchronization between the forms signifies that user input data are simultaneously inputted to a counterpart form.
  • the <esync> tag is used to describe synchronization information between applications or between documents, when synchronization-related information, i.e., <esync>, and related attributes do not exist in the speech Web document but exist in an independent external document, e.g., a document with an extension name of '.sync'.
  • the <esync> tag supports the synchronization function based on the attributes shown in the following Table 1.
  • the external document synchronization using the <esync> tag requires metadata which provide the speech Web document with information on the external synchronization document.
  • the metadata used is a <metasync> tag having attributes defined as shown in Table 2.
  • the <metasync> tag should be positioned in the speech Web document and it provides metadata to the <esync> tags stored in the external document.
  • the entire operation mechanism is as shown in Fig. 3. That is, the synchronization document and the related <esync> tags are interpreted through the <metasync> tag described in the speech Web document and then the related BWS document is simultaneously loaded and rendered.
  • the <isync> tag indicates a synchronization method of a document. Unlike the <esync> tag, it is not authored in a separate document but is formed by directly describing related synchronization tags within a predetermined form.
  • the form includes a <form> tag and a <menu> tag of a VoiceXML in a speech Web document. This is to support synchronization occurring when a predetermined form of the speech Web document should be synchronized with a BWS Web document and when a predetermined document needs to be transited.
  • when there are a plurality of forms in one speech Web document and each form requires BWS documents having multiple pages and a synchronized operation with the BWS documents, it can be resolved by describing related <isync> tags in each form.
  • the tag <isync> may be described in the tags <link> or <goto> of the VoiceXML to secure a synchronized transition.
  • the attributes of the <isync> tag for realizing synchronization are as shown in the following Table 3.
  • the following shows an example of synchronization of an application document authored by using the tags <esync> and <isync>.
  • when the document is executed, a "service_main_intro" dialogue is outputted by speech and, at the same time, a corresponding html page which affects the entire document, for example, a main page of the entire service, is synchronized based on the <metasync> tag and rendered onto a screen.
  • since barge-in is not permitted, user input data are not processed and the screen is maintained until the next dialogue is executed.
  • when a "hotnews_intro" dialogue is executed, a BWS Web document corresponding to the "hotnews_intro" is automatically synchronized, loaded and rendered.
  • the <fsync> tag is needed for inter-form synchronization between the speech Web document and the BWS Web document and it processes the user input data.
  • the concept of the <fsync> tag is similar to document synchronization. It signifies that, when the user input data are processed through speech recognition of the speech Web document, the processed user input data are transferred to and reflected in an <input> tag of the BWS. Conversely, when data are inputted from a user on the BWS, the content of the user input data is reflected in a <field> tag of the speech Web document.
  • the <fsync> tag is a sort of executable content. It may be positioned within the <form> of the VoiceXML or it may exist independently. In either case, the scope of the <fsync> tag is limited to a document. When the <fsync> has a global scope over all documents, it should be specified in a root document. Then, it may be activated in all documents.
  • when the 'field' attribute is not specified, it means that the <field> of a corresponding form is an object to be synchronized. In this case, the form should have only one <field> tag. If there are a plurality of <field> tags in one form, their attributes should be specified necessarily. Also, each field should have a unique name.
  • the <fsync> tag has the attributes shown in the following Table 4 and it is in charge of synchronization between forms.
  • the 'field' attribute signifies a <field> name of a VoiceXML.
  • Speech data input from the user should be updated in the <field> tag of VoiceXML and the <input> tag of the BWS HTML.
  • Visual data inputted from the user, such as data input through a keyboard or a pen, should be updated in the <input> tag of HTML and the <field> tag of VoiceXML simultaneously.
  • Visual data inputted from the user should satisfy a guard condition of the <field> tag of VoiceXML.
  • the <field> tag of VoiceXML should be matched one-to-one with the <input> tag of HTML at the moment when the inputted data are about to be reflected.
  • the form synchronization should be carried out in parallel to the document synchronization. That is, the <field> or <input> tag to be synchronized may be validly updated only in a document already synchronized.
  • the tags of the two modules, which should receive the inputted data, should mutually exist in the synchronized document. When they are described in the external document, only the 'root' document is allowed for general synchronization.
  • the data should be mutually inputted only in the form of an activated speech Web document.
  • Fig. 4 is a block view describing a Digital Multimedia Broadcasting (DMB) system providing a broadcasting Web sites (BWS) service capable of simultaneous speech input/output.
  • DMB Digital Multimedia Broadcasting
  • the DMB system for providing speech-based BWS service capable of simultaneous speech input/output can be divided into a DMB transmission part and a DMB reception part based on a DAB system.
  • the DMB transmission part includes a content data generator 110, a multimedia object transfer server (MOT) 120, and a DMB transmitting system 130.
  • the content data generator 110 generates speech contents (speech Web documents) and BWS contents (visual Web documents).
  • the MOT server 120 transforms the directory and file objects of the speech contents and BWS contents into MOT protocols before they are transmitted.
  • the DMB transmitting system 130 multiplexes the respective MOT data of the transformed MOT protocol, which include both speech Web documents and visual Web documents, into different directories of the same sub-channel or into different sub-channels and broadcasts them through a DMB broadcasting network.
  • the present invention is not limited to them and a speech Web document and a visual Web document may be generated in an external device and transmitted from the external device.
  • the DMB broadcasting reception part, i.e., the DMB receiving block 200, includes a DMB baseband receiver 210, an MOT decoder 220, and a DMB integrated Web browser 230.
  • the DMB baseband receiver 210 receives DMB broadcasting signals from the DMB broadcasting network based on the DAB system, performs decoding for corresponding subchannels, and outputs data of the respective sub-channels.
  • the MOT decoder 220 decodes packets transmitted from the DMB baseband receiver 210 and restores MOT objects.
  • the DMB integrated Web browser 230 executes the restored MOT objects, which include directories and files, independently or based on a corresponding synchronization method.
  • the restored objects include visual Web documents and speech Web documents related to the visual Web documents.
  • the DMB integrated Web browser 230 analyzes the aforementioned synchronization tags during the generation of synchronization event and executes the synchronization function based on the synchronization tags.
  • Fig. 5 is a block view illustrating an integrated Web browser of Fig. 4.
  • the integrated Web browser 230 includes a speech Web browser 233, a BWS browser 235, and a synchronization management module 231.
  • the speech Web browser 233 drives a speech markup language extended based on the VoiceXML.
  • the BWS browser 235 drives Web pages based on the HTML.
  • the synchronization management module 231 manages synchronization between the speech Web browser 233 and the BWS browser 235.
  • the speech Web browser 233 sequentially drives Web pages authored in the VoiceXML-based speech markup language, outputs speech to the user, and processes user input data through a speech device.
  • the BWS browser 235 drives the Web pages authored in the HTML language defined in the DAB/DMB specifications and displays input/output on a screen, just as commercial browsers.
  • the synchronization management module 231 receives synchronization events generated in the speech Web browser 233 and the BWS browser 235 and synchronizes corresponding pages and forms of each page based on the pre-defined synchronization protocol (synchronization tags).
  • Examples of the DMB receiving block 200 include a personal digital assistant (PDA), a mobile communication terminal, and a set-top box for a vehicle that can receive and restore DAB and DMB services.
  • PDA personal digital assistant
  • a mobile communication terminal
  • a set-top box for a vehicle that can receive and restore DAB and DMB services
  • the method of the present invention can be realized in a computer-readable recording medium, such as CD-ROM, RAM, ROM, floppy disks, hard disks, magneto-optical disks and the like. Since the process can be easily implemented by those skilled in the art to which the present invention pertains, further description will not be provided herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Provided is a digital broadcasting service, more particularly, a method for providing a Web service that can simultaneously input/output speech data along with visual data by integrating speech Web data with broadcasting Web sites (BWS) provided for a multimedia broadcasting in a digital multimedia broadcasting (DMB) system configured based on a digital audio broadcasting (DAB), and an apparatus thereof. The method for synchronizing visual data with speech data includes the steps of: a) generating a visual Web document; b) generating a speech Web document including synchronization tags related to the visual Web document; and c) identifying the speech Web document and the visual Web document based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.

Description

METHOD AND APPARATUS FOR SYNCHRONIZING VISUAL AND VOICE DATA IN DAB/DMB SERVICE SYSTEM
Description

Technical Field

The present invention relates to a digital data broadcasting service; and, more particularly, to a method for providing a Web service that can simultaneously input/output speech data along with visual data by integrating speech Web data with broadcasting Web sites (BWS) provided for a multimedia broadcasting in a Digital Multimedia Broadcasting (DMB) system configured based on a Digital Audio Broadcasting (DAB), and an apparatus thereof.
Background Art
Conventional broadcasting Web sites (BWS) providing methods turn Hyper Text Markup Language (HTML) contents, which follow a Web specification, into data and transmit the data to provide a Web service on a screen by using a multimedia object transfer (MOT) method through a Digital Multimedia Broadcasting (DMB) network configured based on Digital Audio Broadcasting (DAB). The method, however, can simply output the Web data defined by the HTML onto the screen. Therefore, the method cannot sufficiently transfer data in a broadcasting system for a mobile environment, such as a DAB-based DMB. Also, an X+V method is under standardization and development to provide a multi-modal Web service. It is a screen-directed method and it can provide a multi-modal Web service that can input/output speech data by combining the host language XHTML with forms in charge of the speech interface of VoiceXML. However, the method, too, operates based on a visual interface with the XHTML as a host language, and it is somewhat inappropriate for a mobile environment.
The present invention provides a method for synchronizing visual and speech Web data that can overcome the aforementioned drawbacks and provide users with a speech-directed Web service in a mobile environment or a fixed location environment, instead of a visual-directed Web service, and an apparatus thereof.
Disclosure Technical Problem
It is, therefore, an object of the present invention to provide a broadcasting Web sites (BWS) service that can overcome the limitation of inputting/outputting only visual data of Web data in a conventional Digital Audio Broadcasting (DAB) BWS service or a conventional Digital Multimedia Broadcasting (DMB) BWS service and offer a speech-enabled Web data service as well.
In the first place, an embodiment of the present invention defines a speech-directed Web language to provide a speech-directed Web service in consideration of a mobile environment, instead of a screen-directed Web service.
Secondly, another embodiment of the present invention provides a service capable of inputting/outputting speech data by integrating a conventional Web service framework, e.g., a BWS service, with a speech input/output module.
Thirdly, yet another embodiment of the present invention provides a technology of synchronizing a content following a visual Web specification, e.g., HTML, and a VoiceXML content capable of providing a speech Web service, that is, a technology of synchronizing visual data with speech data. For this, processing of documents should be synchronized, and a user input device should be synchronized for one document. It is the object of the present invention to provide a method and apparatus for the synchronizations.
Technical Solution
In accordance with one aspect of the present invention, there is provided a method for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, the method which includes the steps of: a) generating a visual Web document; b) generating a speech Web document including synchronization tags related to the visual Web document; and c) identifying the speech Web document and the visual Web document based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.
In accordance with another aspect of the present invention, there is provided an apparatus for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, the apparatus which includes: a) a content data generator for generating a visual Web document and a speech Web document including synchronization tags related to the visual Web document; b) a multimedia object transfer (MOT) server for transforming both the generated visual Web document and the speech Web document into an MOT protocol; and c) a transmitting system for identifying the speech Web document and the visual Web document of the MOT protocol based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.

In accordance with another aspect of the present invention, there is provided a method for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, the method which includes the steps of: a) receiving and loading a visual Web document and a speech Web document including synchronization tags related to the visual Web document, the visual Web document and the speech Web document being identified based on a sub-channel or a directory and transmitted independently; and b) analyzing the synchronization tags when a synchronization event occurs and performing a corresponding synchronization operation.
In accordance with yet another aspect of the present invention, there is provided an apparatus for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, the apparatus which includes: a) a baseband receiver for receiving broadcasting signals through a multimedia broadcasting network and performing channel decoding; b) a multimedia object transfer (MOT) decoder for decoding channel-decoded packets and restoring a visual Web document and a speech Web document including synchronization tags related to the visual Web document; and c) an integrated Web browser for analyzing the synchronization tag when a synchronization event occurs and executing a corresponding synchronization operation.
Advantageous Effects

When an HTML document, which is a visual Web document, is synchronized with a VoiceXML content, which is a speech Web document, by using the synchronization tags so that synchronized input/output is possible, a multimedia broadcasting service user can conveniently access corresponding information by receiving both screen output and speech output for a Web data service and, if necessary, making a command by speech even in a mobile environment.
In other words, the present invention has an advantage that it can ensure backward compatibility with lower-ranked services by individually authoring and transmitting data to provide an integrated synchronization service, instead of integrating markup languages and transmitting them as a single data form, which is the generally used approach. To synchronize two Web documents, the technology of the present invention adds synchronization-related elements to a host markup language to thereby maintain a conventional service framework. Thus, users can receive a conventional broadcasting Web site and, at the same time, access the Web by speech, listen to information, and control the Web by speech.
Description of Drawings
The above and other objects and features of the present invention will become apparent from the following description of the preferred embodiments given in conjunction with the accompanying drawings, in which:
Fig. 1 is an exemplary view illustrating how broadcasting Web site documents are authored to be synchronized and capable of speech input/output in accordance with an embodiment of the present invention; Fig. 2 is a view describing broadcasting Web site documents capable of speech input/output and a data transmitting method in accordance with an embodiment of the present invention; Fig. 3 is an exemplary view showing broadcasting Web site documents capable of speech input/output when a synchronization document is separately provided in accordance with an embodiment of the present invention;
Fig. 4 is a block view describing a Digital Multimedia Broadcasting (DMB) system which is configured based on a Digital Audio Broadcasting (DAB) and providing a broadcasting Web sites (BWS) service capable of simultaneous speech input/output; and
Fig. 5 is a block view illustrating an integrated Web browser of Fig. 4.
Best Mode for the Invention
Looking first at conventional technologies, broadcasting Web sites (BWS), defined to provide a Web service in a multimedia broadcasting service, such as Digital Audio Broadcasting (DAB) and Digital Multimedia Broadcasting (DMB), is a specification for providing users with a Web site by authoring a Web content based on Hyper Text Markup Language (HTML) and transmitting the Web content through a multimedia broadcasting network. The Web language that becomes the basis for providing the service includes a basic profile, which adopts HTML 3.2 as a Web specification in consideration of a terminal with a relatively low specification, and a non-restrictive profile, which has no restriction in consideration of a high-specification terminal, such as a personal computer (PC). Since the profiles are based on HTML, which is a Web representation language, a Web browser is required to provide a terminal with a BWS service. The browser may be called a BWS browser and it provides a Web service by receiving and decoding Web contents of txt, html, jpg, and png formats transmitted as objects through a multimedia object transfer (MOT). Generally, the output is provided in the visual form. That is, texts or still images are displayed on a screen with a hyperlink function and they transit into other contents transmitted together through the MOT to thereby provide a visual-based local Web service. Of course, when the specification includes a function of recovering a speech file or other multimedia files, it is possible to provide the output not only on the screen but also by speech. However, with the current specification, which is the basic profile, it is possible to provide only a local Web service capable of visual input/output based on the Graphical User Interface (GUI).
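For orientation, a minimal page under the basic profile might look like the following sketch; the file names and content are hypothetical, not taken from the specification.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"
    "http://www.w3.org/TR/HTML32.dtd">
<HTML>
<HEAD><TITLE> Traffic news </TITLE></HEAD>
<BODY>
<H1> Traffic news </H1>
<P> Accident reported on the ring road.
    <A HREF="detail.html"> Details </A></P>
<IMG SRC="map.png" ALT="Traffic map">
</BODY>
</HTML>

Such a page, its linked pages, and the referenced images would all be carried as objects in one MOT carousel, so the terminal can follow the hyperlinks locally without a return channel.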
Particularly, there are increasing demands for services that can additionally provide a data service while securing mobility in a mobile multimedia broadcasting such as the DAB and DMB, and for access to multi-modal information instead of single-modal information. The World Wide Web Consortium (W3C) has completed developing the VoiceXML and SALT specifications capable of providing speech-based Web services, and it is expected to embark on standardization of a multi-modal Web specification additionally.
VoiceXML is a Web language devised for an interactive speech response service of an Interactive Voice Response (IVR) type. When it is actually mounted on a terminal, it can provide a speech-enabled Web service. The technology defines a markup language that can transit into another application, document, or dialogue based on a dialogue obtained by modeling a conversation between a human being and a machine. Unlike the conventional visual-based Web service, the VoiceXML can provide a Web service that can input/output data by speech. Also, Web information can be delivered by speech by applying a Text To Speech (TTS) technology, which transforms text data into speech data, and an Automatic Speech Recognition (ASR) technology, which performs speech recognition, to an input/output module, and user input data are received by speech to process a corresponding command or execute a corresponding application. The VoiceXML is effective in a mobile environment. It has an advantage that users can listen to a Web service provided without a visual output on the screen and navigate by inputting speech data for desired information. However, there is a limitation in delivering Web information by speech only and, when speech input/output is made together with visual data on the screen, it is convenient and it is possible to provide diverse additional data services. For this, the present invention provides a transmission and synchronization method for providing a multi-modal Web service by integrating the conventional BWS Web specification, i.e., HTML, with a speech Web language, i.e., VoiceXML. Hereinafter, the transmission and synchronization method will be described.
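Before that, as a rough illustration, a minimal VoiceXML dialogue of the kind described above might look like the following sketch; the prompts, grammar and names are hypothetical.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <field name="city">
      <prompt> For which city do you want the weather? </prompt>
      <grammar mode="voice" version="1.0" root="city">
        <rule id="city">
          <one-of>
            <item> Seoul </item>
            <item> Busan </item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt> Getting the weather for <value expr="city"/>. </prompt>
      </filled>
    </field>
  </form>
</vxml>

The TTS module renders the prompts, the ASR module matches the user's utterance against the grammar, and the filled block runs once the field variable is set.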
The basic principle of the present invention is to generate a speech Web document including synchronization information related to a visual Web document and transmit the visual Web document and the speech Web document through another sub-channel or another directory of the same sub-channel.
Fig. 1 is an exemplary view illustrating how broadcasting Web site documents are authored to be synchronized and capable of speech input/output in accordance with an embodiment of the present invention. As shown in Fig. 1, a visual Web document and a speech Web document are separately created in the embodiment of the present invention. The visual Web document is an HTML or an xHTML content defined in the BWS, whereas the speech Web document is a document integrating elements or tags in charge of synchronization between the VoiceXML and the visual Web documents, a speech recognition module, and a component-related module such as a speech combiner and a receiver. Fig. 2 is a view describing broadcasting Web site documents capable of speech input/output and a data transmitting method in accordance with an embodiment of the present invention.
Referring to Fig. 2, the visual Web document and the speech Web document are transmitted and signaled through different sub-channels or, when they share the same sub-channel, through different directories. This is to allow a terminal capable of receiving an existing BWS service to receive the conventional service even when the BWS is combined with a speech Web document. The signaling for the speech BWS is additionally processed in the speech Web document, i.e., a speech module.
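Concretely, a transmission layout following this principle might look as follows; the sub-channel labels and directory names are hypothetical and serve only to illustrate the separation.

Sub-channel A: conventional BWS service (decodable by legacy terminals)
    /bws/index.html
    /bws/hotnews.html
Sub-channel B (or a separate directory of the same sub-channel): speech module
    /va/main.vxml      speech Web document carrying the synchronization tags
    /va/main.esync     external synchronization document (optional)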
The synchronization between the visual Web document and the speech Web document is processed by using the synchronization tags <esync>, <isync> and <fsync>. The synchronization tags are described in the speech Web document without exception. Also, the synchronization tags are identified by the following namespace.
<va version = "1.0" xmlns:va="http://www. worlddab.org/schemes/va"
Figure imgf000011_0001
xsi:schemaLocation="http://www. worlddab.org/schemas/va va.xsd"> Also, an HTML forming a BWS document uses the following namespace.
HTMUIDOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"
"http://www.w3.org/TR/HTML32.dtd"> b
A VoiceXML forming a speech Web document has the following namespace.
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.w3.org/TR/voicexml20/vxml.xsd">

To take an example, the entire namespace including the visual Web document and the speech Web document may be designated as follows:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:va="http://www.worlddab.org/schemas/va"
    xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.w3.org/TR/voicexml20/vxml.xsd
        http://www.worlddab.org/schemas/va
        http://www.worlddab.org/schemas/va/va.xsd">
The synchronization tags for processing synchronization between the visual Web document, i.e., HTML, and the speech Web document, i.e., the VoiceXML, should describe synchronization between an application, documents, and forms within the document. The synchronization tags used for the purpose are <esync>, <isync> and <fsync>. The tags <esync> and <isync> are in charge of synchronization between an application and documents, whereas the <fsync> tag is in charge of synchronization between forms. Herein, the synchronization between the application and the documents signifies that they should be simultaneously loaded, interpreted and rendered in the initial period when the application starts. The synchronization between the forms signifies that user input data are simultaneously inputted to a counterpart form.
The <esync> tag is used to describe synchronization information between applications or between documents, when synchronization-related information, i.e., <esync>, and related attributes do not exist in the speech Web document but exist in an independent external document, e.g., a document with an extension name of '.sync'. The <esync> tag supports the synchronization function based on the attributes shown in the following Table 1.
Table 1

Attribute   Function
vadoc       This designates a URI for a speech part document to be synchronized. It also specifies a form of the VoiceXML, e.g., "./ensemble.vxml#sndform".
bwsdoc      This is a URI for a BWS part document to be synchronized.
The external document synchronization using the <esync> tag requires metadata which provide the speech Web document with information on the external synchronization document. For the metadata, a <metasync> tag having attributes defined as shown in Table 2 is used.
Table 2

(The attribute table of the <metasync> tag is not legible in the source text; the examples below use the attributes doc and syncid.)
The <metasync> tag should be positioned in the speech Web document and it provides metadata to the <esync> tags stored in the external document. The entire operation mechanism is as shown in Fig. 3. That is, the synchronization document and the related <esync> tags are interpreted through the <metasync> tag described in the speech Web document and then the related BWS document is simultaneously loaded and rendered.
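As an illustration of this mechanism, the following sketch pairs a <metasync> reference in the speech Web document with an <esync> entry in an external synchronization document; the id attribute on <esync> and all file names here are assumptions made for illustration, while doc, syncid, vadoc and bwsdoc are the attributes described above.

In the speech Web document:

<va:metasync doc="main.esync" syncid="#service_main"/>

In the external synchronization document "main.esync":

<va:esync id="service_main"
    vadoc="./service.vxml#main_form"
    bwsdoc="./service_main.html"/>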
The <isync> tag indicates a synchronization method of a document. Unlike the <esync> tag, it is not authored in a separate document but is formed by directly describing related synchronization tags within a predetermined form. Herein, the form includes a <form> tag and a <menu> tag of a VoiceXML in a speech Web document. This is to support synchronization occurring when a predetermined form of the speech Web document should be synchronized with a BWS Web document and when a predetermined document needs to be transited.
According to an example, when there are a plurality of forms in one speech Web document and each form requires BWS documents having multiple pages and a synchronized operation with the BWS documents, it can be resolved by describing related <isync> tags in each form. Actually, the tag <isync> may be described in the tags <link> or <goto> of the VoiceXML to secure a synchronized transition.
Therefore, when a specific speech Web document is loaded and no specific form is designated, synchronization is processed with reference to the synchronization document providing the initial <esync>. When a related tag <isync> is designated to a specific form, the tag <isync> has priority to be synchronized with the BWS Web document. For the synchronization, another <isync> tag is used, which is shown in the following Table. When the <isync> tag is defined in the <link> or <goto> tag, the BWS Web document should be transited according to the definition of the <isync> tag. When the form has a designated synchronization, there should be only one <isync> related to this form. However, when synchronization is specified in a form, there may be a plurality of transit tags. In other words, when synchronization is described with the tag <goto> or <link>, there may be a plurality of <isync> tags. The <isync> tags should necessarily be described in the <goto> or <link>, and the synchronization information only affects the transition process. For this, an attribute 'type' is supported.
The attributes of the <isync> tag for realizing synchronization are as shown in the following Table 3.
Table 3

(The attribute table of the <isync> tag is not legible in the source text; the examples below use the attributes id, type and next.)
The following example shows the synchronization of an application document authored by using the tags <esync> and <isync>.
<?xml version ="1.0" encoding = "UTF-8">
<vxml version ="2.0" xmlns = "http"//www.w3.org/2001/vxml xmlns = "http://www.w3.org/2001/XMLSchema-instance xmlns = "http://www.worlddab.org/schemas/va" xsi:schemaLocation = "http://www.w3.org/2001/vxml http://www. w 3. org/TR/voicexml20/vxml/sxd http://www.worlddab.org/schemas/va http://www.worlddab.org/schemas/va/va.xsd">
<va:metasync doc="main.esync" syncid ~"#service_main" />
<form id = ' 'service _mainjntro"> <block bargine =f ' alse '>
In this data service, you can get various information dedicated to life in general such as local hot news, local transportation, shopping and so on. Are you ready for surfing this service? </bhck> </form>
<form id = "hotnews_intro"> <va:isync> id = "hotnews jntro sync" type ="form" next= "hotnews_mtro.html" </va:isync>
<fιeld name ="move_to_news_page">
<grammar mode = "voice" version-" 1.0" root="conιmand"> <rule id= "command" scope -"dialog" > <one-of>
<item> news </item> <item> local news </item> <item> hot news </item> <item> headline </item> </one-op' </rule> </grammar>
<prompt> In this service, headline news and breaking news are provided.
Do you want to move? </prompt~> <fllled>
<if cond = "move_to_news_page =— 'news'">
<goto next ="#initial_dialogjbr_news" >
<va:isync type = "transit" next = "../../hotnews. html" /> </goto> </if> </filled>
<catch event — "noinput"
<goto next = "#game_intro" /> </catch> </field>
<form id = "game_intrυ"> <va:isync> id = "game_intro_sync" type ="form" next="game _inlro.html" </va:isync>
< field name ="move to_game_page">
<grammar mode = "voice" version="1.0" root="command">
<rule id=" command" scope ="dialog" > game </rule> </grammar>
<prompt> In this service, voice quiz and multimodal games are provided.
Do you want to play? </prompt> <filled>
<if cond = "move_to_game_page == game'"> <goto next ="# dialog Jbr_games" >
<va:isync type = "transit" next ="../../games.html"/> </goto> </if> </flIled> </βeld> </foπn>
When the document is executed in the above example, a "service_main_intro" dialogue is outputted by speech and, at the same time, a corresponding html page which affects the entire document, for example, a main page of the entire service, is synchronized based on the <metasync> tag and rendered onto a screen. Herein, since barge-in is not permitted, user input data are not processed and the screen is maintained until the next dialogue is executed. When a "hotnews_intro" dialogue is executed, a BWS Web document corresponding to the "hotnews_intro" is automatically synchronized based on the "hotnews_intro_sync" <isync>, and eventually loaded and rendered.

The <fsync> tag is needed for inter-form synchronization between the speech Web document and the BWS Web document and it processes the user input data. The concept of the <fsync> tag is similar to document synchronization. It signifies that, when the user input data are processed through speech recognition of the speech Web document, the processed user input data are transferred to and reflected in an <input> tag of the BWS. Conversely, when data are inputted from a user on the BWS, the content of the user input data is reflected in a <field> tag of the speech Web document. For this, an input function should be provided to both modules, and a synchronization mechanism should be accompanied. The <fsync> tag is a sort of executable content. It may be positioned within the <form> of the VoiceXML or it may exist independently. In either case, the scope of the <fsync> tag is limited to a document. When the <fsync> has a global scope over all documents, it should be specified in a root document. Then, it may be activated in all documents.
When the 'field' attribute is not specified, the <field> of the corresponding form is the object to be synchronized; in this case, the form should have only one <field> tag. If there are a plurality of <field> tags in one form, the 'field' attribute should necessarily be specified, and each field should have a unique name.
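For instance, a minimal sketch of this rule could look as follows; the form id and field names are illustrative assumptions, and only the 'field' attribute of <fsync> is taken from Table 4 below:
<form id="payment_info">
 <field name="card_number"> ... </field>
 <field name="valid_date"> ... </field>
 <!-- two <field> tags exist in this form, so the 'field' attribute must be specified -->
 <va:fsync field="card_number" />
</form>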
The <fsync> tag has the attributes shown in the following Table 4 and is in charge of synchronization between forms.
Table 4

Attribute   Function
field       This signifies a <field> name of a VoiceXML form.
Also, the following conditions should be satisfied to achieve the form synchronization.
Speech data inputted from the user should be updated in the <field> tag of VoiceXML and in the <input> tag of the BWS HTML.
Visual data inputted from the user, such as data input through a keyboard or a pen, should be updated in the <input> tag of HTML and in the <field> tag of VoiceXML simultaneously.
Visual data inputted from the user should satisfy a guard condition of the <field> tag of VoiceXML.
The <field> tag of VoiceXML should be matched one-to-one with the <input> tag of HTML at the moment when the inputted data are about to be reflected.
The form synchronization should be carried out in parallel with the document synchronization. That is, the <field> or <input> tag to be synchronized may be validly updated only in a document that is already synchronized. In short, the tags of the two modules which receive the inputted data should both exist in the synchronized documents. When they are described in an external document, only the 'root' document is allowed for global synchronization. The data should be mutually inputted only into the form of the currently activated speech Web document. In short, when there are a plurality of <input> tags in one BWS document and data are inputted into an <input> tag which is not linked with the <form> currently activated in the speech Web document, the update into that activated <form> is prohibited. For example, when a form corresponding to a speech output prompting the user to input a card number is executed, but a valid-date field is filled into an <input> of the BWS Web document instead, the mixed-initiative form operation is prohibited. An example of the form synchronization is as follows.
<!-- Voice Part -->
<form id="gasstation_fsync">
 <va:isync id="id_gas" type="document" next="lbs_service.html" />
 <field name="your_location">
  <grammar src="yourlocation.grxml" type="application/srgs+xml"/>
  <prompt> If you say your city name, you can get your gas station information. </prompt>
  <catch event="nomatch help">
   <prompt> This service is only available in Seoul, Daejeon, Kwangju, Taegue, Busan, Incheon. So please say above cities. </prompt>
  </catch>
  <filled>
   <va:fsync field="your_location" />
   <submit next="cgi/hotel.pl" method="post" namelist="your_location" />
  </filled>
 </field>
</form>

<!-- BWS Part "dab://lbs_service.html" -->
<head>
 <title> Location based service </title>
</head>
<body>
 <h1> Local gas station information </h1>
 <p> If you type the name of your city, you can get some gas station information </p>
 <form id="local_gas_station" method="post" action="cgi/hotel.pl">
  <input name="your_location" type="text"/>
  <input type="submit" value="my_location" />
 </form>
</body>
When the <fsync> tag is specified as shown in the above example, values inputted through the speech module are simultaneously reflected in the <input> tags of the BWS Web document. Fig. 4 is a block view describing a Digital Multimedia Broadcasting (DMB) system providing a broadcasting Web sites (BWS) service capable of simultaneous speech input/output.
Referring to Fig. 4, the DMB system for providing a speech-based BWS service capable of simultaneous speech input/output can be divided into a DMB transmission part and a DMB reception part based on the DAB system. The DMB transmission part includes a content data generator 110, a multimedia object transfer (MOT) server 120, and a DMB transmitting system 130. The content data generator 110 generates speech contents (speech Web documents) and BWS contents (visual Web documents). The MOT server 120 transforms the directory and file objects of the speech contents and BWS contents into the MOT protocol before they are transmitted.
The DMB transmitting system 130 multiplexes the MOT data of the transformed MOT protocol, which include both speech Web documents and visual Web documents, into different directories of the same sub-channel or into different sub-channels, and broadcasts them through a DMB broadcasting network. The present invention, however, is not limited to this configuration; a speech Web document and a visual Web document may also be generated in an external device and transmitted from the external device.
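For illustration only, one possible layout is sketched below; the directory and file names are hypothetical and merely show how the two content sets could be identified either by different directories within one sub-channel or by separate sub-channels:
(MOT carousel of a sub-channel)
 /bws/     visual Web documents (BWS contents)
   index.html
   hotnews.html
   games.html
 /voice/   speech Web documents (VoiceXML-based speech contents)
   main.vxml
   main.esync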
The DMB broadcasting reception part, i.e., the DMB receiving block 200, includes a DMB baseband receiver 210, an MOT decoder 220, and a DMB integrated Web browser 230. The DMB baseband receiver 210 receives DMB broadcasting signals from the DMB broadcasting network based on the DAB system, performs decoding for the corresponding sub-channels, and outputs the data of the respective sub-channels. The MOT decoder 220 decodes the packets transmitted from the DMB baseband receiver 210 and restores the MOT objects. The DMB integrated Web browser 230 executes the restored MOT objects, which include directories and files, independently or based on a corresponding synchronization method. In the present invention, the restored objects include visual Web documents and speech Web documents related to the visual Web documents, and the DMB integrated Web browser 230 analyzes the aforementioned synchronization tags when a synchronization event is generated and executes the synchronization function based on the synchronization tags. Fig. 5 is a block view illustrating the integrated Web browser of Fig. 4.
Referring to Fig. 5, the integrated Web browser 230 includes a speech Web browser 233, a BWS browser 235, and a synchronization management module 231. The speech Web browser 233 sequentially drives Web pages authored in a speech markup language extended from VoiceXML, outputs speech to the user, and processes user input data through a speech device. The BWS browser 235 drives Web pages authored in the HTML language defined in the DAB/DMB specifications and displays input/output on a screen, just as commercial browsers do. The synchronization management module 231 receives the synchronization events generated in the speech Web browser 233 and the BWS browser 235 and synchronizes the corresponding pages, and the forms of each page, based on the pre-defined synchronization protocol (synchronization tags).
Examples of the DMB receiving block 200 include a personal digital assistant (PDA), a mobile communication terminal, and a set-top box for a vehicle, each of which can receive and restore the DAB and DMB services.
As described above, the method of the present invention can be realized in a computer-readable recording medium, such as CD-ROM, RAM, ROM, floppy disks, hard disks, magneto-optical disks and the like. Since the process can be easily implemented by those skilled in the art to which the present invention pertains, further description will not be provided herein.
While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims

What is claimed is:
1. A method for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, comprising the steps of: a) generating a visual Web document; b) generating a speech Web document including synchronization tags related to the visual Web document; and c) identifying the speech Web document and the visual Web document based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.
2. The method as recited in claim 1, wherein the synchronization tags include an external document synchronization tag <esync> that provides synchronization information with the visual Web document which should be simultaneously loaded and rendered along with the speech Web document when an application is rendered and driven.
3. The method as recited in claim 2, wherein the synchronization tags further include a metadata tag
<metasync> that provides metadata on the external document synchronization tags <esync>.
4. The method as recited in claim 3, wherein the external document synchronization tag <esync> has attribute information defined as the following table:

Attribute   Function
id          This is an identifier indicating an <esync> tag and referred to in a <metasync> tag.
5. The method as recited in claim 3, wherein the metadata tag <metasync> has attribute information defined as the following table:
6. The method as recited in claim 1, wherein the synchronization tags include an internal document synchronization tag <isync> that provides synchronization information with the visual Web document related to a corresponding form among the forms of the speech Web document.
7. The method as recited in claim 6, wherein the internal document synchronization tag <isync> has attribute information defined as the following table:

Attribute   Function
type        When the value is 'form,' a corresponding <isync> signifies that a predetermined form is synchronized with a predetermined document. When it is 'transit,' it signifies synchronization that should be simultaneously transited to the next document. A default value is 'form.'
8. The method as recited in claim 1, wherein the synchronization tags include an inter-form synchronization tag <fsync> that provides synchronization information between a form of the speech Web document and a form of the visual Web document to secure mutual input synchronized between the form of the speech Web document and the form of the visual Web document.
9. The method as recited in claim 8, wherein the inter-form synchronization tag <fsync> has attribute information defined as the following table:
10. An apparatus for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, comprising: a) a content data generator for generating a visual Web document and a speech Web document including synchronization tags related to the visual Web document; b) a multimedia object transfer (MOT) server for transforming both the generated visual Web document and the speech Web document into an MOT protocol; and c) a transmitting system for identifying the speech Web document and the visual Web document of the MOT protocol based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.
11. The apparatus as recited in claim 10, wherein the synchronization tags include an external document synchronization tag <esync> that provides synchronization information with the visual Web document which should be simultaneously loaded and rendered along with the speech Web document when an application is authored and driven.
12. The apparatus as recited in claim 10, wherein the synchronization tags further include a metadata tag <metasync> that provides metadata on the external document synchronization tags <esync>.
13. The apparatus as recited in claim 10, wherein the synchronization tags include an internal document synchronization tag <isync> that provides synchronization information with the visual Web document related to a corresponding form among the forms of the speech Web document.
14. The apparatus as recited in claim 10, wherein the synchronization tags include an inter-form synchronization tag <fsync> that provides synchronization information between a form of the speech Web document and a form of the visual Web document to secure mutual input synchronized between the form of the speech Web document and the form of the visual Web document.
15. A method for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, comprising the steps of: a) receiving and loading a visual Web document and a speech Web document including synchronization tags related to the visual Web document, the visual Web document and the speech Web document being identified based on a sub-channel or a directory and transmitted independently; and b) analyzing the synchronization tags when a synchronization event occurs and performing a corresponding synchronization operation.
16. The method as recited in claim 15, wherein a synchronization tag <metasync> that provides metadata on an external document synchronization tag <esync>, which provides the synchronization information with the visual Web document, is analyzed and the related visual Web document is loaded and rendered in the step b).
17. The method as recited in claim 16, wherein when an internal document synchronization tag <isync> that provides synchronization information with the visual Web document related to a corresponding form among the forms of the speech Web document during execution of the speech input/output is designated, the internal document synchronization tag <isync> is analyzed and the related visual Web document is loaded and rendered.
18. The method as recited in claim 15, wherein when user data are inputted while the application on the speech input/output is driven, synchronization is performed by using an inter-form synchronization tag <fsync> that provides synchronization information between a form of the speech Web document and a form of the visual Web document.
19. An apparatus for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, comprising: a) a baseband receiver for receiving broadcasting signals through a multimedia broadcasting network and performing channel decoding; b) a multimedia object transfer (MOT) decoder for decoding channel-decoded packets and restoring a visual Web document and a speech Web document including synchronization tags related to the visual Web document; and c) an integrated Web browser for analyzing the synchronization tag when a synchronization event occurs and executing a corresponding synchronization operation.
20. The apparatus as recited in claim 19, wherein the integrated Web browser analyzes a metadata tag
<metasync> that provides metadata on an external document synchronization tag <esync> providing synchronization information with the visual Web document, and loads and renders the related visual Web document.
21. The apparatus as recited in claim 20, wherein when an internal document synchronization tag <isync> that provides synchronization information with the visual Web document related to a corresponding form among the forms of the speech Web document during execution of the speech Web document is designated, the integrated Web browser analyzes the internal document synchronization tag <isync>, and loads and renders the related visual Web document.
22. The apparatus as recited in claim 19, wherein when user data are inputted while the application on the speech input/output is driven, the integrated Web browser performs synchronization by using an inter-form synchronization tag <fsync> that provides synchronization information between a form of the speech Web document and a form of the visual Web document.
PCT/KR2006/004901 2005-11-21 2006-11-21 Method and apparatus for synchronizing visual and voice data in dab/dmb service system WO2007058517A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP06823659A EP1952629A4 (en) 2005-11-21 2006-11-21 Method and apparatus for synchronizing visual and voice data in dab/dmb service system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2005-0111238 2005-11-21
KR20050111238 2005-11-21

Publications (1)

Publication Number Publication Date
WO2007058517A1 (en)

Family

ID=38048864

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2006/004901 WO2007058517A1 (en) 2005-11-21 2006-11-21 Method and apparatus for synchronizing visual and voice data in dab/dmb service system

Country Status (3)

Country Link
EP (1) EP1952629A4 (en)
KR (1) KR100862611B1 (en)
WO (1) WO2007058517A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100902732B1 (en) * 2007-11-30 2009-06-15 주식회사 케이티 Proxy, Terminal, Method for processing the Document Object Model Events for modalities


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11215490A (en) * 1998-01-27 1999-08-06 Sony Corp Satellite broadcast receiver and its method
US6357042B2 (en) * 1998-09-16 2002-03-12 Anand Srinivasan Method and apparatus for multiplexing separately-authored metadata for insertion into a video data stream
US7028306B2 (en) * 2000-12-04 2006-04-11 International Business Machines Corporation Systems and methods for implementing modular DOM (Document Object Model)-based multi-modal browsers
EP1483654B1 (en) * 2002-02-07 2008-03-26 Sap Ag Multi-modal synchronization
WO2003071422A1 (en) * 2002-02-18 2003-08-28 Kirusa, Inc. A technique for synchronizing visual and voice browsers to enable multi-modal browsing
US20030187944A1 (en) * 2002-02-27 2003-10-02 Greg Johnson System and method for concurrent multimodal communication using concurrent multimodal tags
US20040128342A1 (en) * 2002-12-31 2004-07-01 International Business Machines Corporation System and method for providing multi-modal interactive streaming media applications
KR20040063373A (en) * 2003-01-07 2004-07-14 예상후 Method of Implementing Web Page Using VoiceXML and Its Voice Web Browser
KR100561228B1 (en) * 2003-12-23 2006-03-15 한국전자통신연구원 Method for VoiceXML to XHTML+Voice Conversion and Multimodal Service System using the same
KR100862611B1 (en) * 2005-11-21 2008-10-09 한국전자통신연구원 Method and Apparatus for synchronizing visual and voice data in DAB/DMB service system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001008173A (en) * 1999-06-21 2001-01-12 Mitsubishi Electric Corp Data transmission equipment
JP2001267945A (en) * 2000-03-21 2001-09-28 Clarion Co Ltd Multiplex broadcast receiver
KR20050103105A (en) * 2004-04-24 2005-10-27 한국전자통신연구원 Apparatus and method for processing multimodal web-based data broadcasting, and system and method for receiving multimadal web-based data broadcasting

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1952629A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100862611B1 (en) * 2005-11-21 2008-10-09 한국전자통신연구원 Method and Apparatus for synchronizing visual and voice data in DAB/DMB service system
CN111125065A (en) * 2019-12-24 2020-05-08 阳光人寿保险股份有限公司 Visual data synchronization method, system, terminal and computer readable storage medium
CN111125065B (en) * 2019-12-24 2023-09-12 阳光人寿保险股份有限公司 Visual data synchronization method, system, terminal and computer readable storage medium

Also Published As

Publication number Publication date
KR100862611B1 (en) 2008-10-09
KR20070053627A (en) 2007-05-25
EP1952629A1 (en) 2008-08-06
EP1952629A4 (en) 2011-11-30

Similar Documents

Publication Publication Date Title
EP1143679B1 (en) A conversational portal for providing conversational browsing and multimedia broadcast on demand
US6715126B1 (en) Efficient streaming of synchronized web content from multiple sources
US9300505B2 (en) System and method of transmitting data over a computer network including for presentations over multiple channels in parallel
JP3880517B2 (en) Document processing method
KR101027548B1 (en) Voice browser dialog enabler for a communication system
US8555151B2 (en) Method and apparatus for coupling a visual browser to a voice browser
US8645134B1 (en) Generation of timed text using speech-to-text technology and applications thereof
KR100833500B1 (en) System and Method to provide Multi-Modal EPG Service on DMB/DAB broadcasting system using Extended EPG XML with voicetag
JP2000347972A (en) Multicast data service and interactive broadcast system using broadcast signal mark-up stream
CN101617536B (en) Method of transmitting at least one content representative of a service, from a server to a terminal, and corresponding device
US20200053412A1 (en) Transmission device, transmission method, reception device, and reception method
JP2003044093A5 (en)
WO2007058517A1 (en) Method and apparatus for synchronizing visual and voice data in dab/dmb service system
KR100513045B1 (en) Apparatus and Method for Providing EPG based XML
EP2447940B1 (en) Method of and apparatus for providing audio data corresponding to a text
KR100576546B1 (en) Data service apparatus for digital broadcasting receiver
Lee et al. Mobile multimedia broadcasting applications: Speech enabled data services
EP1696342A1 (en) Combining multimedia data
Kim et al. An Extended T-DMB BWS for User-friendly Mobile Data Service
EP1696341A1 (en) Splitting multimedia data
Matsumura et al. Restoring semantics to BML content for data broadcasting accessibility
Guo et al. A method of mobile video transmission based on J2ee
JP2002182684A (en) Data delivery system for speech recognition and method and data delivery server for speech recognition
WO2007067022A1 (en) Method for authoring location-based web contents, and appapatus and method for receiving location-based web data service in digital mobile terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2006823659

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE