JP4155185B2 - Content distribution method, content distribution server, and content receiving apparatus

Info

Publication number: JP4155185B2
Application number: JP2003413485A
Authority: JP
Inventors: 毅也藤井
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2003-12-11
Filing date: 2003-12-11
Publication date: 2008-09-24
Anticipated expiration: 2023-12-11
Also published as: JP2005173241A

Description

The present invention relates to a content distribution system comprising a content distribution server that transmits, via a network, content data composed of a plurality of tracks, and a content receiving device that receives and reproduces the content data. More specifically, it relates to a content distribution method that switches tracks dynamically during transmission and allows the switched portion to be reproduced smoothly, and to a content distribution server and a content receiving device that use the content distribution method.

First, the asynchronous model of streaming systems used for real-time distribution of video, audio, and the like will be described.

In data links based on the exchange of small transmission frames, of which Ethernet (registered trademark) (IEEE 802.3) is representative, in most cases no synchronization clock exists between two communication devices connected end to end. Nor does a synchronization clock exist in the network protocols layered above the data link: IP (Internet Protocol, RFC 791, RFC 2460), which enables packet exchange between multiple networks, and TCP (Transmission Control Protocol, RFC 793) and UDP (User Datagram Protocol, RFC 768), which use IP packets to manage sessions (communication paths) on top of IP.

Therefore, there is no guarantee as to when a transmission frame sent from one communication device will arrive at the destination device. The delay between transmission and reception of a frame is not constant, and time fluctuation (jitter) occurs. Transmission frames may also be lost on the transmission path, and even if a plurality of frames are transmitted in time order, their arrival order is not guaranteed.

A streaming system is a general term for systems that receive content via a network and reproduce it while receiving, without first storing all of the content on the receiving side. Because video, audio, and the like are content with real-time properties, reproduction synchronization control, such as reproducing at a predetermined number of frames per second (for example, 29.97 frames per second), is essential, and the presence of a synchronization clock is desirable. In TV broadcasting, for example, the NTSC TV signal carries a synchronization clock called the frame synchronization signal, and the receiving side can reproduce content with very high accuracy by using the synchronization clock of the transmitting side.

However, a streaming system over an IP-based network has no such synchronization clock, so the transmitting side and the receiving side cannot be synchronized. The reproduction mechanism must therefore be implemented with an asynchronous model. An asynchronous model is a configuration and programming technique in which each device operates with reference to its own system clock.

FIG. 8 shows a functional block diagram of the content distribution server 102a and the content receiving device 103a in the content distribution system 101a (asynchronous model). The content distribution server 102a and the content receiving device 103a are connected to each other via the IP network 104.

The content distribution server 102a includes a content acquisition unit 1021, a transmission unit 1022, a system clock oscillation unit 1023, and a content storage unit 1024.

The content acquisition unit 1021 has a function of recording the content data of the content to be reproduced in the content storage unit 1024 and of acquiring content fragments using an arbitrary file pointer (a read position within the content). Here, the file pointer is any one of the content file name, the track number, and the frame number, or a combination of them. A "content fragment" refers to actual content data of arbitrary length.

The content acquisition unit 1021 reads content fragments in accordance with the system clock signal. The content acquisition unit 1021 may hold internal initial values such as a content file name, a track number, and a frame number and start reading automatically at startup, or it may start reading after obtaining a content file name, a track number, and a content playback start time from an external input. An external input that accepts commands may also be provided, so that reproduction starts when the command input equals the playback start constant (for example, "PLAY") and stops when the command input equals the playback stop constant (for example, "STOP"). Alternatively, the unit may automatically distinguish between the automatic read start operation and the external read start operation.

The system clock oscillation unit 1023 is a real-time clock (a high-precision clock) implemented with a crystal oscillator or the like, and has a function of supplying the system clock signal to the content acquisition unit 1021.

The transmission unit 1022 has a function of securing a virtual communication path called a session, packetizing each content fragment by adding an appropriate header, and transmitting the packet to the content receiving device 103a via the IP network 104. For example, the header contains a network timestamp (described later).

When transmitting a packet, the transmission unit 1022 uses a transmission address to deliver the packet to the content receiving device 103a. The transmission address consists of a pair of a transmission address and a transmission port number. The transmission address may be held in advance as a constant inside the transmission unit 1022, or may be accepted as input from another block. The session (communication path) for transmitting packets is assumed to be established dynamically once the transmission address has been determined. A content fragment and a packet do not necessarily have to correspond one to one.

The content receiving device 103a includes a receiving unit 1031, a buffer unit 1032, a system clock oscillation unit 1033, a content reading unit 1034, and a playback unit 1035.

The receiving unit 1031 has a function of receiving packets from the IP network 104, interpreting the header, extracting the content fragment, and supplying it to the buffer unit 1032. The receiving unit 1031 uses a reception address so that, of the packets sent by the content distribution server 102a, it receives only those it needs. The reception address consists of a pair of a reception address and a reception port number. The reception address may be held in advance as a constant inside the receiving unit 1031, or may be accepted as input from another block.

The buffer unit 1032 has a function of temporarily storing content fragments together with the network timestamp (described later) obtained from the header.

The content reading unit 1034 monitors the buffer unit 1032. Once it determines that enough content fragments for reproduction have accumulated, it starts reading content fragments in accordance with the system clock signal from the system clock oscillation unit 1033. It has a function of restoring the set of content fragments into content and outputting it.

The playback unit 1035 has a function of decoding the input content fragments as appropriate and outputting audio and video signals to a predetermined output device such as a speaker. For example, when the content fragments are digital audio data compressed with a high-efficiency coding scheme such as MPEG, the playback unit 1035 comprises a high-efficiency code decoder, a D/A converter, an analog amplifier, and the like.

The first feature of the content distribution system 101a (asynchronous model) shown in FIG. 8 is that the content distribution server 102a and the content receiving device 103a each have their own system clock oscillation unit, 1023 and 1033 respectively. Although the system clock frequencies of both need to be considerably accurate, the IP network 104 serving as the transmission path provides no mechanism for transmitting a synchronization clock.

The second feature is that the content receiving device 103a includes the buffer unit 1032, and the content reading unit 1034 is configured so that the buffer unit 1032 never becomes empty. As a result, even when the playback unit 1035 reproduces real-time content such as video and audio, normal reproduction continues while the packet delay and time fluctuation (jitter) caused by the IP network 104 are absorbed.

The above is an outline of the asynchronous model of streaming systems used for real-time distribution of video, audio, and the like.

Next, playback time management in the streaming system will be described.

First, the method for assigning content playback times will be described.

In the asynchronous model described above, for content to be reproduced as continuous media at the correct playback time, the content playback time must be transmitted in association with each content fragment. For example, in FIG. 8, for the content reading unit 1034 to select the content fragment corresponding to the video at 0 seconds, frame 0, and output it to the playback unit 1035, a content playback time indicating that the fragment is the data for 0 seconds, frame 0, is required. Otherwise, any time delay that occurs on the network shifts the playback time by the same amount.

In streaming systems, RTP (Real-time Transport Protocol, RFC 1889) is often used to attach a content playback time to each content fragment (Audio-Video Transport Working Group, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, Internet Engineering Task Force, Jan 1996). In RTP, the content fragment is the payload, which is preceded by an RTP header of at least 96 bits; 32 bits of that header are allocated to a network timestamp called the RTP timestamp (a transformed content playback time).
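The 96-bit header layout can be illustrated with a short sketch. The field order follows the 12-byte fixed header of RFC 1889 (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC). The function name `pack_rtp_header`, the payload type 96, and the SSRC value are illustrative assumptions and do not come from this document.

```python
import struct


def pack_rtp_header(payload_type, sequence_number, timestamp, ssrc, marker=False):
    """Pack the 12-byte (96-bit) fixed RTP header in network byte order."""
    version = 2
    byte0 = version << 6  # padding = 0, extension = 0, CSRC count = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",
        byte0,
        byte1,
        sequence_number & 0xFFFF,   # 16-bit sequence number
        timestamp & 0xFFFFFFFF,     # 32-bit RTP timestamp
        ssrc & 0xFFFFFFFF,          # 32-bit synchronization source identifier
    )


# Hypothetical values, chosen only for illustration.
header = pack_rtp_header(payload_type=96, sequence_number=1,
                         timestamp=3003, ssrc=0x1234ABCD)
packet = header + b"content fragment"
```

The payload (the content fragment) simply follows the header, so the receiver can recover the timestamp by unpacking the first 12 bytes.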

The RTP timestamp is mapped to real time by the RTP timestamp frequency, which is determined in advance by the type of content. The RTP timestamp frequency is also called the network timestamp frequency. In general, for content digitized with a high-efficiency coding scheme defined by MPEG (Moving Picture Experts Group), 90000 (90 kHz) is often used as the RTP timestamp frequency. This means that the RTP timestamp increases by 90000 when one second of content is read. In other words, the following formula holds among a network timestamp such as the RTP timestamp, the network timestamp frequency, and the content playback time.

<Equation 1>
Content playback time = network timestamp / network timestamp frequency

For example, for MPEG-2 video at 29.97 fps, if the RTP timestamp of the first frame is "0", that of the second frame is "3003", and that of the 31st frame, 30 frame intervals later, is "90090".
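A minimal sketch of Equation 1 for the 29.97 fps case follows. It assumes that 29.97 fps means exactly 30000/1001 frames per second, which is what makes the per-frame increment the integer 3003. Frames are indexed from zero here, so index 1 is the frame stamped 3003 and index 30 is the frame stamped 90090. The function names are illustrative.

```python
from fractions import Fraction

NETWORK_TIMESTAMP_HZ = 90000        # RTP timestamp frequency for MPEG video
FRAME_RATE = Fraction(30000, 1001)  # assumption: 29.97 fps expressed exactly


def frame_timestamp(frame_index):
    """Network timestamp of the frame at a zero-based index (first frame = 0)."""
    return int(frame_index * NETWORK_TIMESTAMP_HZ / FRAME_RATE)


def playback_time(timestamp, frequency):
    """Equation 1: content playback time in seconds, kept as an exact fraction."""
    return Fraction(timestamp, frequency)


print(frame_timestamp(1))                                  # 3003
print(frame_timestamp(30))                                 # 90090
print(float(playback_time(90090, NETWORK_TIMESTAMP_HZ)))   # 1.001
```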

In the case of MPEG-4 Audio AAC, one frame is fixed at 1024 samples, and the PCM sampling frequency of the source is used as the RTP timestamp frequency. For example, with a PCM sampling frequency of 44100 Hz (44.1 kHz PCM, i.e. CD audio), the RTP timestamp frequency is 44100, the RTP timestamp of the first frame is "0", that of the second frame is "1024", and that of the 44th frame, which falls at about one second, is "44032".
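The audio case reduces to a multiplication, sketched below under the same zero-based indexing assumption: the frame at index n carries timestamp n × 1024, so the value 44032 is reached after 43 frame intervals. The names are illustrative.

```python
SAMPLES_PER_FRAME = 1024   # fixed AAC frame length in samples
SAMPLING_HZ = 44100        # PCM sampling frequency, also the RTP timestamp frequency


def aac_frame_timestamp(frame_index):
    """RTP timestamp of the AAC frame at a zero-based index."""
    return frame_index * SAMPLES_PER_FRAME


def aac_playback_seconds(frame_index):
    """Playback time of that frame in seconds (Equation 1)."""
    return aac_frame_timestamp(frame_index) / SAMPLING_HZ


print(aac_frame_timestamp(1))    # 1024
print(aac_frame_timestamp(43))   # 44032
```

Because 44100 is not a multiple of 1024, no frame boundary falls exactly on one second; 44032 / 44100 is slightly less than 1.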

Although RTP has been used as the example here, there are other packet formats that let the receiving side calculate the content playback time from a network timestamp and its corresponding network timestamp frequency, and such formats are very common. For example, MPEG-2 Transport Stream, the transmission standard defined by MPEG, specifies a timestamp called the Presentation Time Stamp that serves exactly the same function. The above is an outline of the method for assigning content playback times.

Next, the data structure of a file in which content managed by the content distribution server 102a and transmitted to the content receiving device is recorded (hereinafter, a content file) will be described.

FIG. 9 shows an example of the data structure of content data managed by the content acquisition unit 1021. As shown in FIG. 9, the content data consists of a content file name, a total number of tracks T, and the track data of each track. The track data of each track in turn consists of a track number, a track timestamp frequency, a total number of frames, and a plurality of data frames, each made up of a frame number, time information, and a content fragment.

The content file name is a name that uniquely identifies the file storing the content data in the content storage unit. The total number of tracks records the number of tracks contained in one set of content data. A track is one of the multiple pieces of content stored in a single file; any track (content) can be selected by its track number and read from any position using time information as a pointer. The track timestamp frequency recorded for each track indicates the increment of the time information per unit time.

Each track also records a total number of frames, which is the total number of its content fragments; this number need not be the same across tracks. Each content fragment is assigned a frame number in chronological order and time information based on the track timestamp frequency.
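The content data layout described above can be sketched with a few data classes. This is only an in-memory illustration under assumed names (`Frame`, `Track`, `ContentData`); it is not the on-disk format of FIG. 9. The totals are derived from list lengths here, whereas the description stores them as explicit fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    frame_number: int   # assigned in chronological order
    time_info: int      # in units of the track timestamp frequency
    fragment: bytes     # the content fragment itself


@dataclass
class Track:
    track_number: int
    timestamp_frequency: int  # increment of time_info per second
    frames: List[Frame] = field(default_factory=list)

    @property
    def total_frames(self):
        return len(self.frames)


@dataclass
class ContentData:
    file_name: str
    tracks: List[Track] = field(default_factory=list)

    @property
    def total_tracks(self):
        return len(self.tracks)

    def track(self, track_number):
        """Select a track by its track number."""
        return next(t for t in self.tracks if t.track_number == track_number)


# Two tracks with different frame counts, which the description allows.
content = ContentData("hoge.mp4", [
    Track(1, 90000, [Frame(n, (n - 1) * 3003, b"v") for n in range(1, 4)]),
    Track(2, 44100, [Frame(n, (n - 1) * 1024, b"a") for n in range(1, 6)]),
])
```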

A data structure of this kind, in which multiple pieces of content are multiplexed into content data, is very common among file formats for storing audio, video, and the like. Depending on the file format there are minor differences: the track timestamp frequency may be a system constant that is not stored in the data, or frames with the same time information may be indexed together to save storage space. What these formats have in common is that the frame number and the time information are kept in a mutually convertible form.

Examples of such file formats include Microsoft AVI, Apple Computer QuickTime Format, and ISO/IEC 14496-1 MP4 File. In Apple Computer QuickTime Format and ISO/IEC 14496-1 MP4 File, a reference atom (Data Reference Atom) can be used to separate all or part of the content of the tracks into independent external content data. Even so, the frame numbers and time information are recorded in the main content data in a mutually convertible form, so such files may be regarded as having a data structure equivalent to the content data described here. The above is an outline of the data structure of content data in the content distribution server 102a.

Next, the playback time management process (transmitting side) of the content acquisition unit 1021 when reading content data will be described with reference to the flowchart of FIG. 10.

First, when the content receiving device 103a requests reproduction of content, the content acquisition unit 1021 of the content distribution server 102a initializes the following six variables (step S101).

1: Content file name CF = default content file name (for example, "hoge.mp4")
2: Track number TN = default track number (for example, "1")
3: Start frame number FN = default frame number (for example, "1")
4: Playback start time STS = current time NTS obtained from the system clock oscillation unit 1023
5: Content playback time CTS = "0"
6: Playback state flag F = boolean value "N"

Next, the content acquisition unit 1021 refers to a predetermined automatic mode constant, which indicates whether reproduction starts automatically, and determines whether to perform automatic reproduction (step S102). If the automatic mode constant is the boolean value "Y", the process proceeds to step S110; if it is "N", the process proceeds to step S103.
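Steps S101 and S102 can be sketched as a small state object plus a branch. The class and function names are hypothetical, the boolean `playing` stands in for the "Y"/"N" playback state flag F, and the current time is passed in as a plain integer rather than read from a clock.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    cf: str = "hoge.mp4"   # content file name CF (default from the text)
    tn: int = 1            # track number TN
    fn: int = 1            # start frame number FN
    sts: int = 0           # playback start time STS, in system clock ticks
    cts: float = 0.0       # content playback time CTS, in seconds
    playing: bool = False  # playback state flag F ("N" at initialization)


def initialize(now_ticks):
    """Step S101: reset all six variables, taking STS from the current time NTS."""
    return SenderState(sts=now_ticks)


def step_after_s102(auto_mode):
    """Step S102: branch on the automatic mode constant."""
    return "S110" if auto_mode else "S103"


state = initialize(now_ticks=1_000_000)
print(step_after_s102(auto_mode=False))  # S103
```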

If the automatic mode constant is "N" in step S102, the content acquisition unit 1021 determines whether there is an external input (a set of four variables: command input CMD, content file name CF1, track number TN1, and content playback start time CTS1) (step S103). If there is an external input, the result is "Y" and the process proceeds to step S104; if there is none, the result is "N" and the process proceeds to step S106.

If the result of step S103 is "Y" (an external input is present), the content acquisition unit 1021 determines whether the command input CMD of the external input equals the playback stop constant (step S104). If it does, the result is "Y" and the entire operation stops and ends; otherwise the result is "N" and the process proceeds to step S105.

If the result of step S104 is "N", the content acquisition unit 1021 first initializes five variables using the three variables of the external input (content file name CF1, track number TN1, and content playback start time CTS1) (step S105).

1: Content file name CF = content file name CF1
2: Track number TN = track number TN1
3: Playback start time STS = current time NTS obtained from the system clock oscillation unit 1023
4: Content playback time CTS = content playback start time CTS1
5: Playback state flag F = boolean value "Y"

The content acquisition unit 1021 then initializes the start frame number FN using the content playback time CTS. As a preliminary step, it calculates the transmission start time information StartTS with the following formula.

<Equation 2>
StartTS = CTS × TRS

Here, the track timestamp frequency TRS is the track timestamp frequency of the track indicated by the track number TN in the content file named CF.

Next, the content acquisition unit 1021 searches the frames of the track indicated by the track number TN in the content file named CF in order from frame number 1, obtains the frame number NFN of the first point at which time information ≥ StartTS holds, and sets the start frame number FN to NFN. The above is an outline of step S105.
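The start-frame search of step S105 can be sketched as a linear scan. A track is represented here as a list of (frame number, time information) pairs, which is an assumed simplification. The passage does not say what happens when no frame satisfies the condition, so returning `None` in that case is an assumption of this sketch.

```python
def find_start_frame(frames, cts_seconds, trs):
    """Return the first frame number whose time information >= StartTS.

    frames: list of (frame_number, time_info) pairs in ascending frame order.
    cts_seconds: content playback time CTS in seconds.
    trs: track timestamp frequency TRS.
    """
    start_ts = cts_seconds * trs  # Equation 2: StartTS = CTS x TRS
    for frame_number, time_info in frames:
        if time_info >= start_ts:
            return frame_number
    return None  # assumption: behavior not specified in the text


# A 29.97 fps video track with a 90000 Hz track timestamp frequency.
video_frames = [(n, (n - 1) * 3003) for n in range(1, 101)]
print(find_start_frame(video_frames, cts_seconds=1.0, trs=90000))  # 31
```

With CTS = 1 second, StartTS is 90000, and the first frame at or after that point is frame 31 (time information 90090).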

If the result of step S103 is "N" (no external input), the content acquisition unit 1021 determines whether the playback state flag F is the boolean value "Y" (step S106). If it is "Y", the process proceeds to step S107; if it is "N", the process returns to step S103.

If the playback state flag is "Y" in step S106, or when continuing from step S105, the content acquisition unit 1021 calculates the elapsed time from the current time NTS obtained from the system clock oscillation unit 1023, the playback start time STS, and the system clock frequency SS. It then searches for the transmittable range by calculating the sendable time information LastTS from the elapsed time, the content playback time CTS, and the track timestamp frequency TRS (step S107). LastTS is calculated with the following formula.

<Equation 3>
LastTS = (NTS − STS) / SS × TRS

Here, the system clock frequency SS is a constant for converting (current time NTS − playback start time STS) into seconds. For example, if the system clock has a resolution of 1/1000000 second, SS is 1000000.

Next, the content acquisition unit 1021 searches the frame numbers in the track indicated by the track number TN in the content file named CF in order, starting from the start frame number FN, and obtains the frame number NFN of the point at which time information ≥ LastTS holds. If no frame number NFN satisfies the condition, the content acquisition unit 1021 sets NFN to the termination constant "−1".
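Step S107 can be sketched in the same style. The function follows Equation 3 exactly as written, in which CTS does not appear, and uses −1 as the termination constant. The (frame number, time information) list representation and the function name are assumptions.

```python
def find_sendable_end(frames, fn, nts, sts, ss, trs):
    """Return NFN, the first frame number >= FN whose time information >= LastTS.

    nts, sts: current time and playback start time in system clock ticks.
    ss: system clock frequency (ticks per second).
    trs: track timestamp frequency.
    """
    last_ts = (nts - sts) / ss * trs  # Equation 3
    for frame_number, time_info in frames:
        if frame_number >= fn and time_info >= last_ts:
            return frame_number
    return -1  # termination constant when no frame satisfies the condition


video_frames = [(n, (n - 1) * 3003) for n in range(1, 101)]

# One second after the start of playback on a microsecond clock.
nfn = find_sendable_end(video_frames, fn=1, nts=2_000_000, sts=1_000_000,
                        ss=1_000_000, trs=90000)
print(nfn)  # 31
```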

Next, the content acquisition unit 1021 determines whether transmission is possible by comparing the start frame number FN with the frame number NFN (step S108). If FN < NFN holds, the result is "Y" and the process proceeds to step S109; otherwise the result is "N" and the process returns to step S103.

If the result of step S108 is "Y", the content acquisition unit 1021 outputs to the transmission unit 1022 each content fragment whose frame number is at least the start frame number FN and less than the frame number NFN in the track indicated by the track number TN in the content file named CF. Together with each fragment, it outputs a network timestamp PTS generated from the corresponding time information with the following formula (step S109).

<Equation 4>
PTS = time information / TRS × network timestamp frequency

After the output, the content acquisition unit 1021 sets start frame number FN = frame number NFN in order to advance FN to the position already transmitted.
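Steps S108 and S109 can be sketched together. Equation 4 is evaluated in integer arithmetic by multiplying before dividing, which assumes network timestamps are integers; the frame tuples and function names are illustrative.

```python
def network_timestamp(time_info, trs, network_hz):
    """Equation 4: rescale track time information to the network timestamp clock."""
    return time_info * network_hz // trs


def frames_to_send(frames, fn, nfn, trs, network_hz):
    """Steps S108 and S109: return ([(PTS, fragment), ...], new FN).

    frames: list of (frame_number, time_info, fragment) tuples.
    """
    if not fn < nfn:        # step S108: nothing can be sent yet
        return [], fn
    output = [
        (network_timestamp(time_info, trs, network_hz), fragment)
        for frame_number, time_info, fragment in frames
        if fn <= frame_number < nfn
    ]
    return output, nfn      # FN advances to NFN after output


# A track whose own clock runs at 30000 Hz (1001 ticks per frame).
track = [(n, (n - 1) * 1001, f"f{n}".encode()) for n in range(1, 6)]
sent, fn = frames_to_send(track, fn=1, nfn=3, trs=30000, network_hz=90000)
print(sent)  # [(0, b'f1'), (3003, b'f2')]
```

When the track timestamp frequency already equals the network timestamp frequency, the conversion leaves the time information unchanged.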

If the automatic mode constant is "Y" in step S102, the content acquisition unit 1021 sets the playback state flag F to the boolean value "Y" and the playback start time STS to the current time NTS obtained from the system clock oscillation unit 1023, thereby preparing for automatic reproduction (step S110). The above is an outline of the playback time management process (transmitting side) of the content acquisition unit 1021.

Next, the playback time management process (receiving side) of the content reading unit 1034, which uses the network timestamp, will be described with reference to the flowchart of FIG. 11.

When content reproduction starts, the content reading unit 1034 monitors the amount of content fragments accumulated in the buffer unit 1032 (step S121). When the amount exceeds a predetermined threshold, the result is "Y" and the process proceeds to step S122. The threshold may simply be half of the total buffer capacity (half full), or, if the bit rate of the content is known in advance, it may be determined by an appropriate calculation so that the buffer unit 1032 does not underflow.

Next, the content reading unit 1034 refers to the system clock oscillation unit 1033, sets the playback start time STS to the current time NTS obtained from it, and thereby prepares to start reading (step S122). The content reading unit 1034 also sets the timeout variable TOUT to the timeout constant (for example, "10").

Next, the content reading unit 1034 refers to the system clock oscillation unit 1033 again to update the current time NTS, and calculates the network timestamp PTS to be read according to the following formula (step S123).

<Equation 5>
PTS = (NTS - STS) / SS × network timestamp frequency

Next, the content reading unit 1034 searches the buffer unit 1032 based on the network timestamp PTS and retrieves the content pieces associated with a network timestamp equal to or smaller than PTS (step S124). If no content piece is found, the determination is “N” and the content reading unit 1034 proceeds to step S125. If a content piece is found, the determination is “Y”, the timeout variable TOUT is reset to the timeout constant (for example, “10”), and processing proceeds to step S126.
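Steps S123 and S124 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that NTS and STS are integer system clock ticks, that SS is the number of system clock ticks per second, and that the buffer is a plain list. The names `Piece`, `target_pts`, and `take_due_pieces` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    timestamp: int  # network timestamp attached to the content piece
    payload: bytes


def target_pts(nts: int, sts: int, ss: int, ts_freq: int) -> int:
    """Equation 5: PTS = (NTS - STS) / SS x network timestamp frequency.

    Assumption: nts and sts are clock ticks and ss is ticks per second.
    """
    return (nts - sts) * ts_freq // ss


def take_due_pieces(buffer: list, pts: int) -> list:
    """Step S124: remove and return every piece whose timestamp is <= pts."""
    due = [p for p in buffer if p.timestamp <= pts]
    buffer[:] = [p for p in buffer if p.timestamp > pts]
    return due


# Example: a 1000-tick-per-second clock, 2 seconds after playback start,
# with a 48000 Hz network timestamp frequency.
buffer = [Piece(0, b"a"), Piece(96000, b"b"), Piece(97024, b"c")]
pts = target_pts(nts=2500, sts=500, ss=1000, ts_freq=48000)
due = take_due_pieces(buffer, pts)
```

An empty `due` list corresponds to the “N” branch that leads to step S125.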

If the determination in step S124 is “N”, the content reading unit 1034 waits for a fixed time (for example, about 0.5 seconds), decrements the timeout variable TOUT by 1, and checks whether a timeout has occurred (step S125). If the decremented TOUT is less than 0, the determination is “Y” and the content reading unit 1034 ends processing; if the determination is “N”, processing returns to step S123. As a result, the termination operation is performed when no packet arrives for 5 seconds (0.5 × 10).
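The countdown in steps S124 and S125 can be simulated without real waiting. The sketch below is illustrative only: the function name `polls_until_timeout` and the list-of-booleans input are assumptions, with each element standing for one pass through step S124 (True when a content piece was found).

```python
def polls_until_timeout(found_per_poll, timeout_const=10):
    """Return the 1-based poll at which reading ends, or None if it never does."""
    tout = timeout_const
    for i, found in enumerate(found_per_poll, start=1):
        if found:
            tout = timeout_const  # step S124 "Y": reset the counter
            continue
        tout -= 1                 # step S125: one wait interval has elapsed
        if tout < 0:              # timeout test as stated in the text
            return i
    return None


print(polls_until_timeout([False] * 20))
```

Taken literally, the "less than 0" test fires on the eleventh consecutive miss, which with a wait of about 0.5 seconds per miss is roughly the 5 seconds stated in the text.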

If the determination in step S124 is “Y”, the content reading unit 1034 outputs the content pieces retrieved in step S124 to the playback unit 1035 (step S126). At this point, the content pieces may be reordered or merged by any suitable means to suit the playback unit 1035. For example, the content pieces may be sorted by the sequence number in the RTP header, and content pieces with the same network timestamp (RTP timestamp) may then be concatenated.
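The reordering and merging mentioned for step S126 might look like the following sketch. It assumes each content piece is a `(sequence_number, rtp_timestamp, payload)` tuple and, for simplicity, ignores wraparound of the 16-bit RTP sequence number; the function name is hypothetical.

```python
from itertools import groupby


def reorder_and_merge(pieces):
    """Sort pieces by sequence number, then join payloads sharing a timestamp."""
    ordered = sorted(pieces, key=lambda p: p[0])
    merged = []
    # After sorting, pieces with the same timestamp are adjacent,
    # so groupby collects each group in sequence order.
    for ts, group in groupby(ordered, key=lambda p: p[1]):
        merged.append((ts, b"".join(p[2] for p in group)))
    return merged


pieces = [(3, 1024, b"c"), (1, 0, b"a"), (2, 0, b"b")]
print(reorder_and_merge(pieces))
```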

For ease of explanation, the network timestamps attached to received packets have been assumed to always start from 0. When the initial network timestamp value OFS at the start of transmission is known in advance through another communication procedure, such as RTSP described later, Equation 5 can be rewritten as Equation 6.

<Equation 6>
PTS = (NTS - STS) / SS × network timestamp frequency + OFS

This concludes the outline of the playback time management process (reception side) of the content reading unit 1034, which uses network timestamps.

Next, the transmission control mechanism in a streaming system is described.

In a streaming system, operations such as selecting, playing, and stopping arbitrary content must be accepted through user input, and sessions such as TCP/UDP sessions must be connected and disconnected dynamically in response to those operations. The mechanism that does this is called a transmission control mechanism. FIG. 12 shows a functional block diagram of the content distribution server 102b and the content receiving device 103b that constitute the content distribution system 101b equipped with a transmission control mechanism. The content distribution server 102b and the content receiving device 103b are connected to each other via the IP network 104.

The content distribution server 102b has a content acquisition unit 1021, a transmission unit 1022, a system clock oscillation unit 1023, a content storage unit 1024, and a transmission control unit 1025. The content receiving device 103b has a reception unit 1031, a buffer unit 1032, a system clock oscillation unit 1033, a content reading unit 1034, a playback unit 1035, a transmission control unit 1036, and an input unit 1037. Components identical to those in FIG. 8 are given the same reference numerals, and their description is omitted.

When the transmission control unit 1025 of the content distribution server 102b receives transmission control information from the content receiving device 103b, including instructions such as content selection, playback, and stop, it notifies the content acquisition unit 1021 and the transmission unit 1022 of that information. The transmission control unit 1025 also refers to the content storage unit 1024 to obtain the content file name, track number, and content playback start time, inputs them to the content acquisition unit 1021, and inputs the transmission address to the transmission unit 1022.

The protocol assumed here is a real-time data transmission control protocol typified by RTSP (Real Time Streaming Protocol), defined in RFC 2326, whose methods include SETUP, PLAY, PAUSE, TEARDOWN, and DESCRIBE (H. Schulzrinne et al., "Real Time Streaming Protocol", RFC 2326, Internet Engineering Task Force, Apr 1998).

The transmission control unit 1036 of the content receiving device 103b transmits the transmission control information from the input unit 1037 to the transmission control unit 1025 of the content distribution server 102b, receives and analyzes the response, and then extracts the reception address and inputs it to the reception unit 1031.

The input unit 1037 of the content receiving device 103b accepts user input of transmission control information such as content selection, playback, and stop. For example, it uses a bitmap display and a keyboard to present GUI components such as a play button, a stop button, and a content selection dialog to the user.

Next, the operation of the transmission control unit 1025 and the transmission control unit 1036 when transmission control information is exchanged is described with reference to the sequence diagram of FIG. 13. FIG. 13 shows the processing performed with RTSP from selecting one content, through establishing a session for transmission and starting playback, to stopping. Here, the host name of the content distribution server 102b is “server.jvc-victor.jp”, the name of the content file storing the content is “hoge.mp4”, “hoge.mp4” contains one MPEG-4 AAC audio track whose track number is 1, and the playback time is 235 seconds.

RTSP uses a content URI (Uniform Resource Identifier) as the resource identifier that uniquely identifies content on the network (T. Berners-Lee, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, Internet Engineering Task Force, Aug 1998). The content “hoge.mp4” is represented by the following content URI.

“rtsp://server.jvc-victor.jp/hoge.mp4”

First, the transmission control unit 1036 requests a session description using the content URI (step S131). The session description is text data indicating what kinds of sessions (communication paths) can be established for the content data associated with the content URI. SDP (Session Description Protocol) is used as the session description format (M. Handley, "SDP: Session Description Protocol", RFC 2327, Internet Engineering Task Force, April 1998). In RTSP, the DESCRIBE method is used to request a session description. Each method and its response message (together with any accompanying information) are carried in the header of the request message and the header of the response message, respectively.

“DESCRIBE rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”

The transmission control unit 1025 transmits a response containing the session description (step S132). The session description includes a playback time and a media description. The playback time is the maximum playback time of the continuous media associated with the specified content URI. Several playback time formats exist; the simplest is NPT (Normal Play Time, ISO8601), which expresses the start time and end time as floating-point numbers of seconds. For example, the playback time of a content URI that points to a content file containing 235 seconds of MPEG-4 AAC audio is expressed in NPT as follows.

“a=range:npt=0.0-235.0”

The media description contains information such as the content type and the network timestamp frequency, as advance information for establishing a session.

“m=audio 0 RTP/AVP/UDP 96”
“a=rtpmap:96 mpeg4-generic/48000/2”
“a=control:rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1”
“a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=1190; SizeLength=13; IndexLength=3; IndexDeltaLength=3; Profile=1;”

The media description above states that the content contains stereo audio with a PCM sampling frequency of 48000 Hz (48 kHz), encoded with the MPEG-4 AAC Hi-bitrate coding scheme. It also indicates that, when a session is established, the data must be transmitted using RTP with UDP as the underlying network protocol.
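A receiver needs the network timestamp frequency from the media description before it can evaluate Equation 5. The sketch below extracts it from the `a=rtpmap` attribute, whose general form in SDP is `a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]`. The function name `parse_rtpmap` and the returned dictionary layout are assumptions for illustration.

```python
def parse_rtpmap(line: str) -> dict:
    """Parse an SDP rtpmap attribute into its fields."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    payload_type, _, encoding = line[len(prefix):].partition(" ")
    parts = encoding.split("/")
    return {
        "payload_type": int(payload_type),
        "encoding": parts[0],
        "clock_rate": int(parts[1]),  # network timestamp frequency in Hz
        "channels": int(parts[2]) if len(parts) > 2 else 1,
    }


info = parse_rtpmap("a=rtpmap:96 mpeg4-generic/48000/2")
print(info["clock_rate"], info["channels"])
```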

The media description also indicates that the following control URI is to be used for the session preparation request. A track number (trackID=1) is appended to the control URI to distinguish among multiple tracks within one piece of content data.

“rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1”

Having analyzed the session description, the transmission control unit 1036 decides to establish a session for transmitting the MPEG-4 AAC stereo audio, determines the reception address of the session, and transmits a session establishment preparation request to the transmission control unit 1025 (step S133). In RTSP, the SETUP method is used for the session establishment preparation request. In the following example, the reception address consists of the reception address (136.198.190.100) and the reception port numbers (6668-6669) held by the transmission control unit 1036.

“SETUP rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1 RTSP/1.0”
“Transport: RTP/AVP/UDP;unicast;destination=136.198.190.100;client_port=6668-6669”

On correctly receiving the session establishment preparation request, the transmission control unit 1025 determines a new transmission address, inputs it to the transmission unit 1022, and transmits session information to the transmission control unit 1036 (step S134). The session information includes the transmission address of the distribution server used for distribution (136.198.190.1 in the example) and the transmission port numbers (19000 to 19001 in the following example).

“Transport: RTP/AVP/UDP;unicast;source=136.198.190.1;server_port=19000-19001”

On accepting the session information, the transmission control unit 1036 inputs the previously determined reception address to the reception unit 1031. A new session is established once both the transmission unit 1022 and the reception unit 1031 have completed their processing.

Next, the transmission control unit 1036 transmits a playback start request (step S135). The playback start request can specify a playback range in NPT. The following example specifies the range from the beginning of the content (0.0 seconds) to the end (235.0 seconds). In RTSP, the PLAY method is used for the playback start request.

“PLAY rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”
“Range: npt=0.0-235.0”

The transmission control unit 1025 transmits a response to the playback start request (step S136). Having accepted the playback start request, the transmission control unit 1025 calculates, for every session that has already been prepared, the content file name, track number, and content playback start time from the control URI of that session and the playback range contained in the playback start request. It inputs these to the content acquisition unit 1021 together with the command input set to the playback start constant (for example, “PLAY”), and transmission of RTP packets begins.
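How the server might derive the file name, track number, and start time from the control URI and the Range header can be sketched as below. The parsing rules are assumptions based only on the example strings in this description (a trailing `trackID=<n>` path segment and an `npt=<start>-<end>` range); the function names are hypothetical.

```python
from urllib.parse import urlparse


def parse_control_uri(uri: str):
    """Split a control URI into (content file name, track number)."""
    path = urlparse(uri).path.lstrip("/")        # "hoge.mp4/trackID=1"
    file_name, _, track_part = path.rpartition("/")
    key, _, value = track_part.partition("=")
    if key != "trackID":
        raise ValueError("control URI has no trackID segment")
    return file_name, int(value)


def parse_npt_range(header: str):
    """Return (start, end) in seconds from a 'Range: npt=a-b' header."""
    value = header.split("npt=", 1)[1]
    start, _, end = value.partition("-")
    return float(start), float(end)


print(parse_control_uri("rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1"))
print(parse_npt_range("Range: npt=0.0-235.0"))
```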

This response also contains the RTP packet information required for the operation of the content reading unit 1034 of the content receiving device 103b. For example, it includes the RTP timestamp (network timestamp) of the first RTP packet to be sent in this session (0000000 in the following example).

“RTP-Info: url=rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1;rtptime=0000000;”

The transmission control unit 1036 transmits a stop request when the user performs a stop input (step S137). In RTSP, the stop request is the TEARDOWN method.

“TEARDOWN rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”

Having accepted the stop request, the transmission control unit 1025 controls the transmission unit 1022 to stop transmitting RTP packets, disconnects the session, and reports that transmission has stopped (step S138). The transmission control unit 1025 also inputs the command input set to the playback stop constant (for example, “STOP”) to the content acquisition unit 1021 to stop reading.
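The request sequence of FIG. 13 (steps S131, S133, S135, S137) can be assembled as plain text. The sketch below is a minimal illustration: the helper `rtsp_request` is hypothetical, and it adds the `CSeq` header that RFC 2326 requires on every request but that the examples in this description omit. The `Session` header that a real exchange would carry after SETUP is left out here, as in the examples above.

```python
def rtsp_request(method, uri, cseq, headers=None):
    """Build one RTSP/1.0 request as a CRLF-delimited string."""
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


URI = "rtsp://server.jvc-victor.jp/hoge.mp4"
requests = [
    rtsp_request("DESCRIBE", URI, 1),
    rtsp_request("SETUP", URI + "/trackID=1", 2, {
        "Transport": "RTP/AVP/UDP;unicast;"
                     "destination=136.198.190.100;client_port=6668-6669",
    }),
    rtsp_request("PLAY", URI, 3, {"Range": "npt=0.0-235.0"}),
    rtsp_request("TEARDOWN", URI, 4),
]
for request in requests:
    print(request)
```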

For simplicity, it has been assumed that one piece of content data contains only one content. The transmission control mechanism described above can, however, easily be extended to synchronized playback of multiple contents. For example, synchronized playback of multiple contents can be achieved simply by listing the media descriptions for the multiple contents in step S132 and repeating the preparation requests of steps S133 to S134 once per media description. This concludes the outline of the transmission control mechanism in a streaming system.

When audio or video is to be switched dynamically and smoothly during content playback, streaming systems that adopt the asynchronous model and streaming systems equipped with a transmission control mechanism involving dynamic session creation cannot switch smoothly between the bitstreams of two or more contents partway through playback.

For example, suppose that there are two contents with a playback time of 235 seconds each (track numbers 1 and 2), that the content of track number 1 is being played back, and that exactly at the 100-second mark the user enters an instruction into the input unit 1037 to switch to the content of track number 2.

Because the content receiving device 103b adopts the asynchronous model, the 100-second mark as seen by the user is the time difference calculated inside the content reading unit 1034 (current time NTS - playback start time STS). At that moment, content pieces for the 100-second mark and later that the content distribution server 102b has already transmitted are either already accumulated in the buffer unit 1032 or partly held in routers and switching hubs on the IP network 104.

In this situation, if the session for track number 1 is disconnected (step S137) and the series of communication procedures of steps S133 to S135 is performed for the content of track number 2, the playback range in step S135 can only be specified as follows.

“PLAY rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”
“Range: npt=100.0-235.0”

FIG. 14 shows the order of the content pieces accumulated in the buffer unit 1032 immediately after step S136. Let T1 be the time information of the content at a playback time of 100 seconds, followed in chronological order by T2, T3, and T4. FIG. 14 shows that the content of track number 1 has been accumulated up to content piece 4, followed by the content of track number 2 starting from content piece 1. If the user views the content pieces in this order, the content goes back from T4 to T1, so playback appears to rewind momentarily. The switch is not smooth and is strongly jarring.
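The rewind of FIG. 14 shows up as a backward step in the playback times of consecutive buffered pieces. The sketch below detects such steps; the one-second spacing assigned to T1 through T4 is an assumed value for illustration, and `find_rewinds` is a hypothetical helper.

```python
def find_rewinds(playback_times):
    """Return each index whose playback time is earlier than its predecessor's."""
    return [
        i for i in range(1, len(playback_times))
        if playback_times[i] < playback_times[i - 1]
    ]


# Track 1 pieces T1..T4, then track 2 pieces restarting at T1.
buffered = [100.0, 101.0, 102.0, 103.0, 100.0, 101.0]
print(find_rewinds(buffered))
```

An empty result means the buffered playback times never go backward, which is the condition a smooth switch has to satisfy.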

In addition, if the time from when the session for track number 1 is disconnected in step S137 until the session establishment preparation request for track number 2 is received is longer than the total playback time of the packets already accumulated in the buffer unit 1032, a buffer underflow occurs, and playback stops once the timeout variable TOUT of the content reading unit 1034 expires.

Similarly, one conceivable way to reduce the discontinuity is to disconnect the session for track number 1 in step S137, measure the time taken by the series of communication procedures of steps S133 to S135 for the content of track number 2, and correct the NPT for track number 2 in step S135 accordingly. For example, if the series of communication procedures took 5.5 seconds, the range is corrected to npt=105.5-235.0.
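The NPT correction amounts to adding the measured signalling time to the switch point before building the Range header. A minimal sketch, with the hypothetical helper `corrected_range`:

```python
def corrected_range(switch_time, signalling_delay, end_time):
    """Build a Range header whose start is shifted by the measured delay."""
    start = switch_time + signalling_delay
    return f"Range: npt={start:.1f}-{end_time:.1f}"


print(corrected_range(100.0, 5.5, 235.0))
```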

Even with this technique, however, the delay between the arrival of the playback start request for track number 2 at the content distribution server 102b and the actual start of packet transmission cannot be taken into account, so it is difficult to make the content playback times of all packets in the buffer perfectly continuous.

As a first improvement that addresses the above problem, JP 2002-118592 A (Patent Document 1) has been disclosed.

In this approach, a secondary distribution server is placed between the distribution server and the client. The secondary distribution server buffers a certain amount of the post-switch bitstream before switching, so that the client's buffer unit never becomes empty and content can be switched without interruption.

In Patent Document 1, however, the bitstreams distributed from the multiple distribution servers are temporally independent of one another, and their frame boundaries are not synchronized. Buffer underflow can therefore be prevented, but nothing is done about the rewinding sensation, and it remains difficult to make the content playback times perfectly continuous.

As a second improvement that addresses the above problem, many streaming systems adopt a configuration that receives multiple tracks simultaneously. As an example, FIG. 15 shows a functional block diagram of the content distribution server 102c and the content receiving device 103c that constitute the content distribution system 101c, which can receive two tracks simultaneously.

The content distribution server 102c has a content acquisition unit 1021A, a transmission unit 1022A, a content acquisition unit 1021B, a transmission unit 1022B, a system clock oscillation unit 1023, and a content storage unit 1024. The content receiving device 103c has a reception unit 1031A, a buffer unit 1032A, a content reading unit 1034A, a reception unit 1031B, a buffer unit 1032B, a content reading unit 1034B, a system clock oscillation unit 1033, a playback unit 1035, a transmission control unit 1036, an input unit 1037, and a switching unit 1038. Components identical to those in FIG. 12 are given the same reference numerals, and their description is omitted.

The switching unit 1038 selects one of the two content pieces according to the track switching input and inputs it to the playback unit 1035. The content piece that is not selected is still read out but is then discarded.

The input unit 1037 accepts a track switching input, entered by the user, for switching between track numbers 1 and 2.

When there are multiple tracks to be synchronized, the content receiving device 103c repeats the series of preparation requests and responses corresponding to steps S133 to S134 once per track, then issues the playback request corresponding to step S135 and receives the streams for all tracks. The switching unit 1038 provided in the content receiving device 103c then plays back one of the multiple tracks on the receiving side.

FIG. 16 is a flowchart showing the operation of the content receiving device 103c, which has the multiple-track simultaneous reception configuration. All steps other than step S141 (steps S142 to S146) are identical to steps S122 to S126 of FIG. 11, so their description is omitted.

For multiple tracks to stay synchronized with one another, playback must start for all of them at once, regardless of small differences in the amounts accumulated in the buffer units (timing jitter in the arriving content pieces). The content reading unit 1034 (A or B) on the side receiving the content of the track selected for playback therefore monitors all buffer units 1032 (A and B) in the content receiving device 103c and waits until a certain amount has accumulated. When any one of the buffer units exceeds the predetermined accumulation threshold, it generates the determination “Y” and proceeds to the next step (step S141). In other words, the intent is that the playback start time STS is exactly the same for all tracks.
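The start condition of step S141 can be sketched as a single check over all buffer units that yields one shared start time. The function name, the use of byte counts for buffer levels, and the integer clock value are assumptions for illustration.

```python
def common_start_time(buffer_levels, threshold, now):
    """Return the shared playback start time STS, or None while still buffering.

    buffer_levels: accumulated amount per buffer unit (e.g. 1032A and 1032B).
    """
    if any(level > threshold for level in buffer_levels):
        return now  # every track uses this same STS
    return None


print(common_start_time([10, 600], threshold=512, now=1234))
print(common_start_time([10, 20], threshold=512, now=1234))
```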

Content has conventionally been distributed in this way, but the following problems remain.

First, the secondary distribution server of Patent Document 1 can switch among bitstreams carrying multiple contents transmitted from multiple distribution servers and provide them to the client. However, the distribution servers have no shared synchronization clock and do not transmit bitstreams composed of packets bearing the same network timestamp at the same time, so the increments of content playback time within the contents cannot be made to match exactly, and smooth switching is not possible.

The content distribution system 101 with the multiple-track simultaneous reception configuration also has the following problems.

The content distribution systems 101b and 101c of FIGS. 12 and 15 require a network throughput exceeding the sum of the bit rates of all tracks. For example, with 20 MPEG-4 AAC audio tracks of 128 kbps each, the required network throughput is 20 times that, or 2.5 Mbps. In particular, when a relatively low-speed communication line such as a wireless network is used, throughput may be insufficient and the service may become unavailable.
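The throughput figure follows from summing the per-track bit rates. The sketch below reproduces the arithmetic; treating 1 Mbps as 1024 kbps is an assumption made here so that the result matches the 2.5 Mbps stated in the text.

```python
def required_throughput_kbps(track_bitrates_kbps):
    """Lower bound on throughput when every track is sent at once."""
    return sum(track_bitrates_kbps)


total_kbps = required_throughput_kbps([128] * 20)
total_mbps = total_kbps / 1024  # assumption: 1 Mbps = 1024 kbps
print(total_kbps, total_mbps)
```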

In the content receiving device 103c of FIG. 15 in particular, one buffer unit is needed per track (1032A and 1032B in FIG. 15). Moreover, because the content receiving device 103c adopts the asynchronous model, the storage capacity of the buffer units must be increased in order to receive and manage different bitstreams simultaneously. This may make the configuration impossible to implement when a device with limited memory, such as a mobile phone or a PDA (Personal Digital Assistant), is used as the content receiving device.

[Patent Document 1] JP 2002-118592 A

本発明は、上記事情に鑑みてなされたものであり、ネットワークを介して、複数トラックから構成されるコンテンツデータを送信するコンテンツ配信サーバと当該コンテンツデータを受信再生するコンテンツ受信装置からなるコンテンツ配信システムにおいて、トラックを動的に切り替えて送信し、切り替えられた箇所を滑らかに再生することを可能にするコンテンツ配信方法、およびコンテンツ配信方法を用いたコンテンツ配信サーバ、およびコンテンツ受信装置に関する。 The present invention has been made in view of the above circumstances. It relates to a content distribution method, a content distribution server using that method, and a content receiving device which, in a content distribution system comprising a content distribution server that transmits content data composed of a plurality of tracks via a network and a content receiving device that receives and plays back the content data, make it possible to switch tracks dynamically during transmission and to play back the switched portion smoothly.

上記目的を達成するために、請求項１に記載のコンテンツ配信方法は、複数のコンテンツ素片データを有するコンテンツデータを記憶しているコンテンツ配信サーバと、このコンテンツ配信サーバに対して、ネットワークを介してコンテンツの再生要求を行い、その再生要求に応じてコンテンツ配信サーバからネットワークを介して送信されるコンテンツ素片データを受信し再生するコンテンツ受信装置とを備えたコンテンツ配信システムにおけるコンテンツ配信方法であって、コンテンツ配信サーバに記憶されているコンテンツデータは、ファイル名と複数のトラックとを有するコンテンツデータであり、前記各トラックのトラックデータは、トラック番号と複数のデータフレームとを有し、前記各データフレームは、フレーム番号、再生のタイミングを示す時刻情報、およびコンテンツ素片データを有するものであり、同一の前記フレーム番号における各トラックの前記データフレームが、同一の前記時刻情報を有するものであり、前記コンテンツ受信装置から前記コンテンツ配信サーバへ再生を要求する前記コンテンツデータの前記ファイル名を送信し、これに応じて前記コンテンツ配信サーバから送信される少なくとも予め設定されている前記トラック番号、および前記ファイル名のコンテンツデータにおける前記予め設定されているトラック番号のトラックの再生時間長を示す再生時間情報を前記コンテンツ受信装置が受信し、前記コンテンツ受信装置が所望の前記トラック番号を指定して前記コンテンツ配信サーバへ送信することにより、前記コンテンツ受信装置−前記コンテンツ配信サーバ間のセッションを確立するセッション確立工程と、前記セッションの確立後、コンテンツの再生開始要求を行う際に、前記コンテンツ受信装置において、当該再生を要求する前記コンテンツデータの前記ファイル名、および再生する時間範囲を示す再生範囲情報を含むコンテンツの再生開始要求メッセージを、前記コンテンツ受信装置から前記コンテンツ配信サーバに送信する工程と、前記コンテンツ配信サーバにおいて、前記再生開始要求メッセージを受信した場合に、当該再生開始要求メッセージ内の前記再生範囲情報に従って、送信すべき前記コンテンツ素片データを含む前記データフレームの前記フレーム番号を特定し、この特定された前記フレーム番号、当該再生開始要求メッセージ内の前記ファイル名、および前記セッション確立時に前記コンテンツ受信装置が指定した前記トラック番号によって指定される、再生要求された前記コンテンツ素片データを前記記憶されているコンテンツデータから取得し、前記コンテンツ受信装置に送信する工程と、前記セッションの確立後、再生対象トラックの切替要求を行う際に、前記コンテンツ受信装置において、再生トラック切替のための切替先トラック番号を含む切替要求メッセージを前記コンテンツ受信装置から前記コンテンツ配信サーバに送信する工程と、前記コンテンツ配信サーバにおいて、前記再生開始要求メッセージ受信後に前記切替要求メッセージを受信した場合に、当該切替要求メッセージを受信した時点での既送の前記コンテンツ素片データのうち再生順序が最後となっている前記コンテンツ素片データを含む前記データフレームにおける前記時刻情報に基づき、この再生順序が最後となっている前記コンテンツ素片データに連続して再生されるべき前記コンテンツ素片データを含む前記データフレームの前記フレーム番号を特定し、この特定された前記フレーム番号、および当該切替要求メッセージ内の前記切替先トラック番号によって指定される前記コンテンツ素片データを、前記記憶されているコンテンツデータから取得し、コンテンツ受信装置に送信する工程とを有することを特徴とする。 In order to achieve the above object, a content distribution method according to claim 1 includes a content distribution server storing content data having a plurality of content segment data, and the content distribution server via a network. 
The method is a content distribution method in a content distribution system that also comprises a content receiving device, which issues a content playback request to the content distribution server via the network and receives and plays back the content fragment data transmitted from the content distribution server via the network in response to that request. The content data stored in the content distribution server has a file name and a plurality of tracks; the track data of each track has a track number and a plurality of data frames; each data frame has a frame number, time information indicating playback timing, and content fragment data; and the data frames of the respective tracks that share the same frame number have the same time information.
The method comprises: a session establishing step in which the content receiving device transmits to the content distribution server the file name of the content data whose playback is requested, receives in response at least a preset track number and playback time information indicating the playback duration of the track with that preset track number in the content data of that file name, and then designates a desired track number and transmits it to the content distribution server, thereby establishing a session between the content receiving device and the content distribution server; a step in which, after the session is established and when playback is to be started, the content receiving device transmits to the content distribution server a playback start request message containing the file name of the content data whose playback is requested and playback range information indicating the time range to be played; a step in which the content distribution server, on receiving the playback start request message, identifies from the playback range information in that message the frame number of the data frame containing the content fragment data to be transmitted, acquires from the stored content data the requested content fragment data designated by the identified frame number, the file name in the message, and the track number designated by the content receiving device when the session was established, and transmits it to the content receiving device; a step in which, after the session is established and when the track being played is to be switched, the content receiving device transmits to the content distribution server a switching request message containing a switching destination track number; and a step in which the content distribution server, on receiving the switching request message after the playback start request message, takes the time information of the data frame containing the content fragment data that is last in playback order among the content fragment data already transmitted at the time the switching request message is received, identifies on that basis the frame number of the data frame containing the content fragment data to be played back immediately after it, acquires from the stored content data the content fragment data designated by the identified frame number and the switching destination track number in the switching request message, and transmits it to the content receiving device.

本発明において“コンテンツ素片データ”とは、コンテンツを任意のデータ長で分割したデータブロックのことを指す。 In the present invention, “content fragment data” refers to a data block obtained by dividing content into pieces of arbitrary data length.

また、請求項２に記載のコンテンツ配信サーバは、コンテンツ受信装置からの再生要求に従って、複数のコンテンツ素片データを有するコンテンツデータから再生要求されたコンテンツ素片データを選択し、選択したコンテンツ素片データをネットワークを介して前記コンテンツ受信装置へ送信するコンテンツ配信サーバであって、ファイル名と複数のトラックとを有するコンテンツデータであり、前記各トラックのトラックデータは、トラック番号と複数のデータフレームとを有し、前記各データフレームは、フレーム番号、再生のタイミングを示す時刻情報、およびコンテンツ素片データを有するものであり、同一の前記フレーム番号における各トラックの前記データフレームが、同一の前記時刻情報を有するコンテンツデータを記憶するコンテンツ記憶手段と、前記コンテンツ受信装置から再生を要求する前記コンテンツデータの前記ファイル名を受信し、これに応じて少なくとも予め設定されている前記トラック番号、および前記ファイル名のコンテンツデータにおける前記予め設定されているトラック番号のトラックの再生時間長を示す再生時間情報を当該コンテンツ受信装置へ送信し、その後、前記コンテンツ受信装置によって指定された所望の前記トラック番号を受信することにより、前記コンテンツ受信装置との間のセッションを確立し、前記セッションの確立後、前記コンテンツ受信装置においてコンテンツの再生開始要求がなされた際に、前記コンテンツ受信装置から、再生を要求する前記コンテンツデータの前記ファイル名、および再生する時間範囲を示す再生範囲情報を含むコンテンツの再生開始要求メッセージを受信し、前記セッションの確立後、前記コンテンツ受信装置において再生対象トラックの切替要求がなされた際に、前記コンテンツ受信装置から、再生トラック切替のための切替先トラック番号を含む切替要求メッセージを受信するための伝送制御手段と、前記伝送制御手段が前記再生開始要求メッセージを受信した場合に、当該再生開始要求メッセージ内の前記再生範囲情報に従って、送信すべき前記コンテンツ素片データを含む前記データフレームの前記フレーム番号を特定し、この特定された前記フレーム番号、当該再生開始要求メッセージ内の前記ファイル名、および前記セッション確立時に前記コンテンツ受信装置が指定した前記トラック番号によって指定される、再生要求された前記コンテンツの前記コンテンツ素片データを前記コンテンツ記憶手段から取得し、前記伝送制御手段が前記再生開始要求メッセージ受信後に前記切替要求メッセージを受信した場合に、当該切替要求メッセージを受信した時点での既送の前記コンテンツ素片データのうち再生順序が最後となっている前記コンテンツ素片データを含む前記データフレームにおける前記時刻情報に基づき、この再生順序が最後となっている前記コンテンツ素片データに連続して再生されるべき前記コンテンツ素片データを含む前記データフレームの前記フレーム番号を特定し、この特定された前記フレーム番号、および当該切替要求メッセージ内の前記切替先トラック番号によって指定される前記コンテンツ素片データを、前記コンテンツ記憶手段から取得するコンテンツ取得手段と、前記コンテンツ取得手段が取得した前記コンテンツ素片データを前記コンテンツ受信装置へ送信する送信手段とを備えることを特徴とする。 The content distribution server according to claim 2 selects content fragment data requested to be reproduced from content data having a plurality of content fragment data in accordance with a reproduction request from the content receiving device, and selects the selected content fragment. A content distribution server for transmitting data to the content receiving device via a network, wherein the content data has a file name and a plurality of tracks. 
The server comprises content storage means, transmission control means, content acquisition means, and transmission means. The content storage means stores content data in which the track data of each track has a track number and a plurality of data frames, each data frame has a frame number, time information indicating playback timing, and content fragment data, and the data frames of the respective tracks that share the same frame number have the same time information. The transmission control means receives from the content receiving device the file name of the content data whose playback is requested, transmits in response at least a preset track number and playback time information indicating the playback duration of the track with that preset track number in the content data of that file name, and then receives the desired track number designated by the content receiving device, thereby establishing a session with the content receiving device. After the session is established, it receives from the content receiving device, when playback is requested there, a playback start request message containing the file name of the content data whose playback is requested and playback range information indicating the time range to be played, and, when a switch of the track being played is requested there, a switching request message containing a switching destination track number.
When the transmission control means receives the playback start request message, the content acquisition means identifies from the playback range information in that message the frame number of the data frame containing the content fragment data to be transmitted, and acquires from the content storage means the requested content fragment data designated by the identified frame number, the file name in the message, and the track number designated by the content receiving device when the session was established. When the transmission control means receives the switching request message after the playback start request message, the content acquisition means takes the time information of the data frame containing the content fragment data that is last in playback order among the content fragment data already transmitted at the time the switching request message is received, identifies on that basis the frame number of the data frame containing the content fragment data to be played back immediately after it, and acquires from the content storage means the content fragment data designated by the identified frame number and the switching destination track number in the switching request message. The transmission means transmits the content fragment data acquired by the content acquisition means to the content receiving device.

また、請求項３に記載のコンテンツ受信装置は、複数のコンテンツ素片データを有するコンテンツデータを記憶しているコンテンツ配信サーバに対して、ネットワークを介してコンテンツの再生要求を行い、その再生要求に応じてコンテンツ配信サーバからネットワークを介して送信されるコンテンツ素片データを受信し再生するコンテンツ受信装置であって、前記コンテンツ配信サーバに記憶されているコンテンツデータは、ファイル名と複数のトラックとを有するコンテンツデータであり、前記各トラックのトラックデータは、トラック番号と複数のデータフレームとを有し、前記各データフレームは、フレーム番号、再生のタイミングを示す時刻情報、およびコンテンツ素片データを有するものであり、同一の前記フレーム番号における各トラックの前記データフレームが、同一の前記時刻情報を有するものであり、再生を要求する前記コンテンツデータの前記ファイル名を送信し、これに応じて前記コンテンツ配信サーバから送信される少なくとも予め設定されている前記トラック番号、および前記ファイル名のコンテンツデータにおける前記予め設定されているトラック番号のトラックの再生時間長を示す再生時間情報を受信し、その後、所望の前記トラック番号を指定して前記コンテンツ配信サーバへ送信することにより、前記コンテンツ配信サーバとの間のセッションを確立し、前記セッションの確立後、コンテンツの再生開始要求を行う際に、再生を要求する前記コンテンツデータの前記ファイル名、および再生する時間範囲を示す再生範囲情報を含むコンテンツの再生開始要求メッセージを前記コンテンツ配信サーバに送信し、前記セッションの確立後、再生対象トラックの切替要求を行う際に、再生トラック切替のための切替先トラック番号を含む切替要求メッセージをコンテンツ配信サーバに送信する伝送制御手段と、前記コンテンツ配信サーバにおいて、前記再生開始要求メッセージ受信後に前記切替要求メッセージを受信した場合に、当該切替要求メッセージを受信した時点での既送の前記コンテンツ素片データのうち再生順序が最後となっている前記コンテンツ素片データを含む前記データフレームにおける前記時刻情報に基づき、この再生順序が最後となっている前記コンテンツ素片データに連続して再生されるべき前記コンテンツ素片データを含む前記データフレームの前記フレーム番号が特定され、この特定された前記フレーム番号、および当該切替要求メッセージ内の前記切替先トラック番号によって指定されて送信される前記コンテンツ素片データを前記コンテンツ配信サーバから受信する受信手段とを備えることを特徴とする。 The content receiving device according to claim 3 issues a content playback request, via a network, to a content distribution server that stores content data having a plurality of content fragment data, and receives and plays back the content fragment data transmitted from the content distribution server via the network in response to that request. The content data stored in the content distribution server has a file name and a plurality of tracks; the track data of each track has a track number and a plurality of data frames; each data frame has a frame number, time information indicating playback timing, and content fragment data; and the data frames of the respective tracks that share the same frame number have the same time information.
The device comprises transmission control means and receiving means. The transmission control means transmits the file name of the content data whose playback is requested, receives in response at least a preset track number and playback time information indicating the playback duration of the track with that preset track number in the content data of that file name, and then designates a desired track number and transmits it to the content distribution server, thereby establishing a session with the content distribution server. After the session is established, it transmits to the content distribution server, when playback is to be started, a playback start request message containing the file name of the content data whose playback is requested and playback range information indicating the time range to be played, and, when the track being played is to be switched, a switching request message containing a switching destination track number. When the content distribution server receives the switching request message after the playback start request message, it takes the time information of the data frame containing the content fragment data that is last in playback order among the content fragment data already transmitted at the time the switching request message is received, and identifies on that basis the frame number of the data frame containing the content fragment data to be played back immediately after it. The receiving means receives from the content distribution server the content fragment data that is designated by the identified frame number and the switching destination track number in the switching request message and transmitted accordingly.

本発明によれば、コンテンツ配信サーバ側で、トラック切替個所におけるデータフレームの再生時間情報の整合を図ってデータフレームを取得送信するので、コンテンツ受信装置では、トラック切替箇所を滑らかに再生することが可能となる。特に、各トラックのコンテンツデータ同士が時間的な推移の関連性の高いコンテンツデータである場合に（例えば同じ楽曲を日本語で歌ったデータと英語で歌ったデータとである場合に）、トラック切替箇所を非常に滑らかに再生することが可能となる。 According to the present invention, the content distribution server acquires and transmits data frames while keeping the playback time information of the data frames consistent at the track switching point, so the content receiving device can play back the switching point smoothly. In particular, when the content data of the tracks are closely related in how they progress over time (for example, the same song sung in Japanese on one track and in English on another), the switching point can be played back very smoothly.

また、冗長なデータフレームをコンテンツ受信装置へ送信する必要が無いので、コンテンツ受信装置側で多量のデータフレームを蓄積するバッファメモリを必要とせず、コンテンツ受信装置の製造コストを低減することが可能となる。また、通信回線もデータフレーム送信用に１回線しか専有しないので、低スループットの通信回線でも良好なサービスを提供することが可能となる。 Furthermore, because no redundant data frames need to be transmitted to the content receiving device, the device does not need a buffer memory that accumulates a large number of data frames, and its manufacturing cost can be reduced. In addition, because only one communication line is occupied for data frame transmission, good service can be provided even over a low-throughput communication line.

本発明の実施形態を、図１〜図７を用いて説明する。 An embodiment of the present invention will be described with reference to FIGS. 1 to 7.

図１に、コンテンツ配信システム１ａ（非同期モデル）におけるコンテンツ配信サーバ２とコンテンツ受信装置３ａの機能ブロック図を示す。コンテンツ配信サーバ２とコンテンツ受信装置３ａは、IPネットワーク４によって相互に接続される。 FIG. 1 shows a functional block diagram of the content distribution server 2 and the content receiving device 3a in the content distribution system 1a (asynchronous model). The content distribution server 2 and the content receiving device 3a are connected to each other by the IP network 4.

コンテンツ配信サーバ２は、コンテンツ取得部２１、伝送制御部２２、送信部２３、システムクロック発振部２４、およびコンテンツ記憶部２５を有する。 The content distribution server 2 includes a content acquisition unit 21, a transmission control unit 22, a transmission unit 23, a system clock oscillation unit 24, and a content storage unit 25.

コンテンツ取得部２１は、再生するコンテンツのコンテンツデータをコンテンツ記憶部２５に記録し、システムクロック信号に従って、伝送制御情報に基づいて任意のポインタ（コンテンツ上の読み出し位置）を用いてコンテンツ素片を取得する機能を有する。このポインタとは、コンテンツファイル名、トラック番号、フレーム番号の３つのうちいずれか、もしくはそれらのうちいくつかの組み合わせである。また、“コンテンツ素片”とは、任意のデータ長を有するデータブロックのことを指す。 The content acquisition unit 21 has a function of recording the content data of the content to be played back in the content storage unit 25 and, in accordance with the system clock signal, acquiring content fragments using an arbitrary pointer (a read position within the content) based on the transmission control information. The pointer is any one of the content file name, the track number, and the frame number, or a combination of them. A “content fragment” is a data block of arbitrary data length.

コンテンツ取得部２１は、内部にコンテンツファイル名、トラック番号、フレーム番号等の初期値を持ち、起動時から自動的に読み出しを開始しても良いし、外部入力からコンテンツファイル名、トラック番号、コンテンツ再生開始時刻を得てから読み出しを開始しても良い。また、コマンド入力を受け付ける外部入力を設け、コマンド入力＝再生開始定数（例えば“PLAY”）である場合は再生するようにしても良いし、コマンド入力＝再生停止定数（例えば“STOP”）である場合は停止するようにしても良い。あるいは自動読み出し開始動作と外部読み出し開始動作を自動的に判別するようにしても良い。 The content acquisition unit 21 may hold internal initial values for the content file name, track number, frame number, and so on, and start reading automatically at startup, or it may start reading after obtaining the content file name, track number, and content playback start time from an external input. It may also provide an external input that accepts commands, starting playback when the command equals a playback-start constant (for example, “PLAY”) and stopping when it equals a playback-stop constant (for example, “STOP”). Alternatively, it may distinguish automatically between the automatic read start and the externally triggered read start.

伝送制御部２２は、コンテンツ受信装置３ｂからのコンテンツの選択、コンテンツの再生・停止、トラック切替要求等の指示を含む伝送制御情報を受信すると、この伝送制御情報を、コンテンツ取得部２１と、送信部２３に通知する機能を有する。また、伝送制御部２２は、コンテンツ記憶部２５を参照してコンテンツファイル名、トラック番号、コンテンツ再生開始時刻を取得し、それらをコンテンツ取得部２１に対して入力し、送信部２３に対して送信アドレスを入力する機能を有する。 Upon receiving transmission control information from the content receiving device 3b, including instructions such as content selection, content playback and stop, and track switching requests, the transmission control unit 22 notifies the content acquisition unit 21 and the transmission unit 23 of that information. The transmission control unit 22 also refers to the content storage unit 25 to obtain the content file name, track number, and content playback start time, inputs them to the content acquisition unit 21, and inputs the transmission address to the transmission unit 23.

適用するプロトコルとしては、RFC2326に規定されているRTSPに代表される実時間データ伝送制御用プロトコルを想定しており、Setup、Play、 Pause、Teardown、Describe等のメソッドを利用できる。 As a protocol to be applied, a real-time data transmission control protocol typified by RTSP defined in RFC2326 is assumed, and methods such as Setup, Play, Pause, Teardown, and Describe can be used.

送信部２３は、セッションと呼ぶ仮想的な通信路を確保し、コンテンツ素片を適切なヘッダを付加してパケット化し、そのパケットを、IPネットワーク４を介してコンテンツ受信装置３ａへ送信する機能を有する。例えば、ヘッダにはネットワークタイムスタンプが含まれる。 The transmission unit 23 has a function of securing a virtual communication path called a session, packetizing each content fragment by adding an appropriate header, and transmitting the packet to the content receiving device 3a via the IP network 4. The header contains, for example, a network time stamp.
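
As a rough illustration of packetizing a content fragment with a header that carries a network time stamp, the sketch below uses a made-up 6-byte header (a 16-bit sequence number and a 32-bit time stamp). This layout and the function names are assumptions for illustration only; the text does not specify the header format.

```python
import struct

# Hypothetical header layout: 16-bit sequence number, 32-bit network time stamp,
# both in network byte order. Not the format used by the patent or by RTP.
HEADER = struct.Struct("!HI")


def packetize(fragment: bytes, seq: int, timestamp: int) -> bytes:
    """Prefix a content fragment with the illustrative header."""
    return HEADER.pack(seq & 0xFFFF, timestamp & 0xFFFFFFFF) + fragment


def depacketize(packet: bytes):
    """Split a packet back into (sequence number, time stamp, fragment)."""
    seq, timestamp = HEADER.unpack_from(packet)
    return seq, timestamp, packet[HEADER.size:]


packet = packetize(b"fragment-bytes", seq=1, timestamp=90000)
print(depacketize(packet))  # (1, 90000, b'fragment-bytes')
```

The receiving side would run the inverse step before handing the fragment and its time stamp to the buffer.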

パケットを送信する際には、送信部２３は、コンテンツ受信装置３ａへパケットを届けるために送信アドレスを用いる。送信アドレスは、送信アドレスと送信ポート番号の組で構成される。送信アドレスは送信部２３内部の定数として予め持っていても良いし、他のブロックから入力を受け付けても良い。なお、パケットを送信するためのセッション（通信路）は送信アドレスが確定した時点で動的に確立するものとする。また、コンテンツ素片とパケットが必ずしも一致する必要はない。 When transmitting a packet, the transmission unit 23 uses a transmission address to deliver the packet to the content receiving device 3a. The transmission address is composed of a combination of a transmission address and a transmission port number. The transmission address may be previously stored as a constant in the transmission unit 23, or an input may be received from another block. Note that a session (communication path) for transmitting a packet is dynamically established when a transmission address is determined. Further, the content segment and the packet do not necessarily match.

システムクロック発振部２４は、水晶発振子等で実現されたリアルタイムクロック（高精度の時計）であり、システムクロック信号をコンテンツ取得部２１へ供給する機能を有する。 The system clock oscillator 24 is a real-time clock (high-precision clock) realized by a crystal oscillator or the like, and has a function of supplying a system clock signal to the content acquisition unit 21.

また、コンテンツ記憶部２５は、コンテンツデータを所定のファイル形式で記憶する機能を有する。 The content storage unit 25 has a function of storing content data in a predetermined file format.

また、コンテンツ受信装置３ａは、受信部３１、伝送制御部３２、バッファ部３３、システムクロック発振部３４、コンテンツ読取部３５、再生部３６、および入力部３７を有する。 In addition, the content receiving device 3 a includes a receiving unit 31, a transmission control unit 32, a buffer unit 33, a system clock oscillation unit 34, a content reading unit 35, a playback unit 36, and an input unit 37.

受信部３１は、IPネットワーク４からパケットを受信して、ヘッダを解釈して、コンテンツ素片を取り出し、バッファ部３３に供給する機能を有する。受信部３１は、コンテンツ配信サーバ２からのパケットのうち自己に必要なパケットのみを受信するために受信アドレスを用いる。受信アドレスは、受信アドレスと受信ポート番号の組で構成される。受信アドレスは受信部３１内部の定数として予め持っていても良いし、他のブロックから入力を受け付けても良い。 The receiving unit 31 has a function of receiving a packet from the IP network 4, interpreting the header, taking out a content fragment, and supplying it to the buffer unit 33. The receiving unit 31 uses the reception address in order to receive only packets necessary for itself among the packets from the content distribution server 2. The reception address is composed of a combination of a reception address and a reception port number. The reception address may be previously stored as a constant inside the reception unit 31, or input may be received from another block.

伝送制御部３２は、入力部３７からの伝送制御情報をコンテンツ配信サーバ２の伝送制御部へ送信、その応答を受信し、応答を解析した後、受信アドレスを取り出し、受信部３１へ入力する機能を有する。 The transmission control unit 32 has a function of transmitting the transmission control information from the input unit 37 to the transmission control unit of the content distribution server 2, receiving and analyzing the response, extracting the reception address from it, and inputting that address to the receiving unit 31.

バッファ部３３は、コンテンツ素片をヘッダから得られたネットワークタイムスタンプ（後述）と共に一次的に蓄積記憶する機能を有する。 The buffer unit 33 has a function of temporarily accumulating and storing the content pieces together with a network time stamp (described later) obtained from the header.

コンテンツ読取部３５は、バッファ部３３を監視し、再生に十分なコンテンツ素片が蓄積されたと判断した時点から、システムクロック発振部３４からのシステムクロック信号に従ってコンテンツ素片の読み出しを開始し、コンテンツ素片の集合をコンテンツに復元して出力する機能を有する。 The content reading unit 35 monitors the buffer unit 33 and, once it judges that enough content fragments for playback have accumulated, starts reading the content fragments in accordance with the system clock signal from the system clock oscillation unit 34. It has a function of restoring the set of content fragments to content and outputting it.
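
The "start reading once enough fragments have accumulated" behaviour can be sketched as below. The class name, the fixed fragment-count threshold, and the polling style are assumptions; the text does not say how "enough" is judged, and pacing by the system clock is left out.

```python
from collections import deque


class FragmentBuffer:
    """Illustrative buffer: reading begins only after a start threshold is reached."""

    def __init__(self, start_threshold: int):
        self.start_threshold = start_threshold  # assumed criterion for "enough"
        self._queue = deque()
        self._started = False

    def push(self, timestamp: int, fragment: bytes) -> None:
        """Store a fragment together with its network time stamp."""
        self._queue.append((timestamp, fragment))

    def pop_ready(self):
        """Return the next (timestamp, fragment) once reading has started, else None."""
        if not self._started and len(self._queue) >= self.start_threshold:
            self._started = True
        if self._started and self._queue:
            return self._queue.popleft()
        return None


buf = FragmentBuffer(start_threshold=2)
buf.push(0, b"f1")
print(buf.pop_ready())  # None
buf.push(1024, b"f2")
print(buf.pop_ready())  # (0, b'f1')
```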

再生部３６は、入力されたコンテンツ素片に応じた復号化を行い、スピーカ等の所定の出力装置に対して音声信号、映像信号を出力する機能を有する。例えば、コンテンツ素片がMPEGに類する高能率符号化されたデジタル音声データである場合、再生部３６は、高能率符号復号器（デコーダ）、D/A変換器、アナログアンプ等を有して構成される。 The playback unit 36 has a function of decoding the input content fragments as appropriate and outputting audio and video signals to a predetermined output device such as a speaker. For example, when the content fragments are digital audio data compressed with a high-efficiency coding scheme such as MPEG, the playback unit 36 comprises a decoder for that coding scheme, a D/A converter, an analog amplifier, and the like.

入力部３７は、利用者から、コンテンツの選択・コンテンツの再生・停止、トラック切替等の伝送制御情報の入力処理を受け付ける機能を有する。例えば、ビットマップディスプレイとキーボードを用いて、再生ボタン、停止ボタン、コンテンツ選択ダイヤログなどのＧＵＩ部品を利用者に提供し、また、テンキーや数段階のスライドスイッチなどを用いて、トラック番号の入力を受け付ける操作スイッチを提供する。 The input unit 37 has a function of accepting transmission control information input processing such as content selection, content playback / stop, and track switching from the user. For example, using a bitmap display and a keyboard, GUI parts such as a play button, a stop button, and a content selection dialog are provided to the user, and a track number is input using a numeric keypad and several stages of slide switches. Provide an operation switch that accepts.

次に、本実施形態におけるコンテンツデータのデータ構造について説明する。 Next, the data structure of content data in this embodiment will be described.

本実施形態におけるビットストリーム切り替え方法の特徴は、トラック切替要求が入力され、トラックを切り替える際に、切り替え前のトラックを走査していた開始フレーム番号FNをそのまま用いることで、切り替え前と切り替え後のコンテンツ再生時刻が完全に連続した形になる点にある（詳細は後述）。 The bitstream switching method of this embodiment is characterized in that, when a track switching request is input and the track is switched, the start frame number FN with which the pre-switch track was being scanned is used as is, so that the content playback time is completely continuous across the switch (details are given later).

基本的には、図９に示す従来のコンテンツデータと同様に、本実施形態におけるコンテンツデータは、コンテンツファイル名、総トラック数Ｔ、および各トラックのトラックデータから構成され、さらに各トラックのトラックデータは、トラック番号、トラックタイムスタンプ周波数、総フレーム数、および複数のデータフレームから構成され、さらに個々のデータフレームは、フレーム番号、時刻情報、コンテンツ素片から構成される。 Basically, like the conventional content data shown in FIG. 9, the content data in this embodiment is composed of a content file name, the total number of tracks T, and track data of each track, and further, track data of each track. Is composed of a track number, a track time stamp frequency, the total number of frames, and a plurality of data frames, and each data frame is composed of a frame number, time information, and a content fragment.
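
The nesting described above can be sketched with plain data classes. The field names are English stand-ins chosen for this illustration; the total number of tracks and the total number of frames are derived from the lists here rather than stored, which is a simplification and not something the text prescribes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    frame_number: int
    time_info: int   # in ticks of the track time stamp frequency
    fragment: bytes  # content fragment


@dataclass
class Track:
    track_number: int
    timestamp_frequency: int  # ticks per second
    frames: List[Frame] = field(default_factory=list)

    @property
    def total_frames(self) -> int:
        return len(self.frames)


@dataclass
class ContentData:
    file_name: str
    tracks: List[Track] = field(default_factory=list)

    @property
    def total_tracks(self) -> int:
        return len(self.tracks)

    def get_fragment(self, track_number: int, frame_number: int) -> bytes:
        """Look up a fragment by (track number, frame number)."""
        track = next(t for t in self.tracks if t.track_number == track_number)
        frame = next(f for f in track.frames if f.frame_number == frame_number)
        return frame.fragment


content = ContentData("hoge.mp4", [
    Track(1, 44100, [Frame(1, 0, b"a1"), Frame(2, 1024, b"a2")]),
    Track(2, 44100, [Frame(1, 0, b"b1"), Frame(2, 1024, b"b2")]),
])
print(content.total_tracks, content.get_fragment(2, 2))  # 2 b'b2'
```

The file name, track number, and frame number together play the role of the read pointer used by the content acquisition unit.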

コンテンツ受信装置３ａの受信部３１、バッファ部３３、コンテンツ読取部３５、再生部３６の各部が、従来のものと何ら変更がない状態で、コンテンツ配信サーバ２から複数トラック分のコンテンツ素片がコンテンツ受信装置３ａへ到達した際、滑らかに再生を継続させるためには、少なくともコンテンツファイル中にトラック切替可能な複数のトラックの相互に、コンテンツ素片が割り当てられている各データフレームにおいて、コンテンツ再生時刻上での時間的な境界が一致している（同じフレーム番号のコンテンツ素片は同じ再生時刻で再生される）必要がある。 Suppose the receiving unit 31, buffer unit 33, content reading unit 35, and playback unit 36 of the content receiving device 3a are unchanged from conventional ones. For playback to continue smoothly when content fragments from multiple tracks arrive at the content receiving device 3a from the content distribution server 2, the data frames to which content fragments are assigned must, at least among the tracks in a content file that can be switched between, have matching temporal boundaries in content playback time (content fragments with the same frame number are played back at the same playback time).

言い換えれば、あるトラックＡとあるトラックＢに、同一のフレーム番号FNAおよびFNBが存在する時、FNAに付随する時刻情報／トラックＡのトラックタイムスタンプ周波数と、FNBに付随する時刻情報／トラックＢのトラックタイムスタンプ周波数とが、一致している必要がある。 In other words, when a track A and a track B contain the same frame number, FNA and FNB respectively, the time information attached to FNA divided by the track time stamp frequency of track A must equal the time information attached to FNB divided by the track time stamp frequency of track B.
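
This condition can be checked mechanically. In the sketch below, each track is represented as a pair of its time stamp frequency and a mapping from frame number to time information; that representation and the function names are assumptions for illustration. Exact rational arithmetic avoids floating-point rounding when the frequencies differ.

```python
from fractions import Fraction


def playback_time(time_info: int, timestamp_frequency: int) -> Fraction:
    """Content playback time in seconds: time information / time stamp frequency."""
    return Fraction(time_info, timestamp_frequency)


def tracks_aligned(track_a, track_b) -> bool:
    """True if every frame number present in both tracks has the same playback time.

    Each track is (timestamp_frequency, {frame_number: time_info}).
    """
    freq_a, frames_a = track_a
    freq_b, frames_b = track_b
    shared = frames_a.keys() & frames_b.keys()
    return all(
        playback_time(frames_a[n], freq_a) == playback_time(frames_b[n], freq_b)
        for n in shared
    )


track_a = (44100, {1: 0, 2: 44100})
track_b = (48000, {1: 0, 2: 48000, 3: 96000})  # different frequency, extra frame
print(tracks_aligned(track_a, track_b))  # True
```

Tracks with different total frame counts still pass, since only frame numbers present in both tracks are compared.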

図２に、複数トラックを有するコンテンツデータのデータ構造（時間軸に沿って並べたもの）を示す。横軸をコンテンツ再生時刻（各々の時刻情報／トラックタイムスタンプ周波数）とする。 FIG. 2 shows a data structure of content data having a plurality of tracks (arranged along the time axis). The horizontal axis represents the content playback time (each time information / track time stamp frequency).

図２中、コンテンツ素片、FN1，FN2…FNn，FNm等は、コンテンツ素片に割り当てられたフレーム番号である。１トラックのフレーム番号の総数は一致している必要はなく、図２では、トラック１は総フレーム数がｎ、トラック２は総フレーム数がｍとなっている。また、コンテンツのデータが、MPEG-4 AAC hi-bitrate符号化などの高能率符号化を施されている場合は、１つのコンテンツ素片に、複数の高能率符号化の圧縮パケットを含んでいてもかまわない。 In FIG. 2, FN1, FN2, ..., FNn, FNm, and so on are the frame numbers assigned to the content fragments. The total number of frames need not be the same for every track; in FIG. 2, track 1 has n frames in total and track 2 has m. When the content data has been compressed with a high-efficiency coding scheme such as MPEG-4 AAC hi-bitrate coding, a single content fragment may contain a plurality of compressed packets.

しかし、１つのコンテンツ素片に含まれる複数の圧縮パケットを再生した際の時間長の合計は、２つ以上のトラックから同一フレーム番号のコンテンツ素片を取り出した時、一致していなければならない。そのため、同一フレーム番号FNxにおける各トラックのコンテンツ素片xの時刻情報xは、各トラックのトラックタイムスタンプ周波数が同じ値であれば、必ず同じ値を有している。 However, the total time length when reproducing a plurality of compressed packets included in one content segment must match when content segments having the same frame number are extracted from two or more tracks. Therefore, the time information x of the content segment x of each track in the same frame number FNx always has the same value if the track timestamp frequency of each track is the same value.

従って、図２中の矢印（Ａ）で示すように、フレーム番号FN2でトラック１からトラック２へと再生途中で切り替えても、滑らかに再生を継続することができる。 Therefore, as indicated by an arrow (A) in FIG. 2, even if the frame number FN2 is switched from the track 1 to the track 2 during the reproduction, the reproduction can be continued smoothly.
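
A minimal server-side sketch of such a switch follows. It assumes the aligned frame numbering described above, so the frame to send after a switch is simply the next frame number taken from the destination track. The function name and the way switch requests are modelled (keyed by the last frame number sent before the request arrived) are hypothetical.

```python
def stream_with_switch(tracks, initial_track, switches):
    """Yield (frame_number, track_number, fragment) in transmission order.

    tracks:   {track_number: [fragment of frame 1, fragment of frame 2, ...]}
    switches: {last_frame_number_sent: destination_track_number}, a switch
              request received after that frame was sent (illustrative model).
    """
    current = initial_track
    frame_number = 1
    while frame_number <= len(tracks[current]):
        yield frame_number, current, tracks[current][frame_number - 1]
        # A switch changes only the track; the frame counter keeps running,
        # so playback time stays continuous across the switch.
        current = switches.get(frame_number, current)
        frame_number += 1


tracks = {1: ["a1", "a2", "a3", "a4"], 2: ["b1", "b2", "b3"]}
print(list(stream_with_switch(tracks, 1, {2: 2})))
# [(1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b3')]
```

Only one fragment per frame number is ever sent, which is why a single buffer on the receiving side suffices in this scheme.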

次に、伝送制御情報の送受信時におけるコンテンツ配信サーバ２の伝送制御部２２およびコンテンツ受信装置３ａの伝送制御部３２の動作について、図３のシーケンス図を用いて説明する。図３は、RTSPを用いて１つのコンテンツを選択し、伝送のためのセッションを確立し、再生を開始し、停止するまでに行われる処理であり、以下に示す送受信されるメッセージは、伝送制御情報に含まれる。 Next, the operations of the transmission control unit 22 of the content distribution server 2 and the transmission control unit 32 of the content receiving device 3a when transmission control information is exchanged will be described with reference to the sequence diagram of FIG. 3. FIG. 3 shows the processing performed, using RTSP, from selecting one content through establishing a session for transmission and starting playback to stopping it. The messages exchanged below are part of the transmission control information.

なお、ここでは、コンテンツ配信サーバ２のホスト名を“server.jvc-victor.jp”とし、コンテンツを蓄積しているコンテンツファイル名を“hoge.mp4”とし、“hoge.mp4”内部には１つのMPEG-4 AAC音声トラックが存在し、トラック番号は１であるとし、再生時間は235秒であるとする。 Here, the host name of the content distribution server 2 is assumed to be “server.jvc-victor.jp” and the name of the content file that stores the content to be “hoge.mp4”. “hoge.mp4” is assumed to contain one MPEG-4 AAC audio track with track number 1 and a playback time of 235 seconds.

RTSPは、コンテンツをネットワーク上で一意に特定するための資源識別子として、コンテンツURIを適用する。コンテンツ“hoge.mp4”は以下のようなコンテンツURIによって表される。 RTSP applies the content URI as a resource identifier for uniquely identifying the content on the network. The content “hoge.mp4” is represented by the following content URI.

“rtsp://server.jvc-victor.jp/hoge.mp4”

まず、伝送制御部３２は、コンテンツURIを用いて、セッション記述を要求する（ステップＳ０１）。このセッション記述とは、コンテンツURIに関連付けられたコンテンツデータに対して、どのようなセッション（通信路）が確立できるかを示したテキストである。セッション記述形式としては、SDPが適用される。RTSPにおいてセッション記述の要求にはDESCRIBEメソッドを使用する。なお、各メソッドおよびその応答メッセージ（その他付随する情報も含めて）は、それぞれリクエストメッセージのヘッダおよびレスポンスメッセージのヘッダに挿入されて送受信される。 First, the transmission control unit 32 requests a session description using the content URI (step S01). The session description is text indicating what kind of session (communication path) can be established for the content data associated with the content URI. SDP is used as the session description format. In RTSP, the DESCRIBE method is used to request a session description. Each method and its response (including any accompanying information) are carried in the header of the request message and the header of the response message, respectively.

“DESCRIBE rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”
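The request line above can be produced by a small serializer. The following is a minimal sketch, not the patent's implementation: the function name `build_rtsp_request` is hypothetical, and the `CSeq` and `Accept` headers are additions taken from general RTSP practice (RFC 2326 requires `CSeq` on every request), whereas the text only shows the request line.

```python
def build_rtsp_request(method, uri, cseq, headers=None):
    """Serialize an RTSP/1.0 request as text with CRLF line endings."""
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Content URI taken from the example in the text.
CONTENT_URI = "rtsp://server.jvc-victor.jp/hoge.mp4"

describe = build_rtsp_request(
    "DESCRIBE", CONTENT_URI, cseq=1, headers={"Accept": "application/sdp"}
)
```

The same helper could serialize the SETUP, PLAY, and TEARDOWN requests that appear later in this sequence.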
The transmission control unit 22 transmits a response containing the session description (step S02). The session description includes a playback time and media descriptions. The playback time is the maximum playback duration of the continuous media associated with the specified content URI. There are many formats for the playback time, but the simplest is NPT, which expresses the start time and end time as floating-point numbers of seconds. For example, the playback time of a content URI pointing to content data that contains 235 seconds of MPEG-4 AAC audio is expressed using NPT as follows.

“rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1”
Having analyzed the session description, the transmission control unit 32 determines the reception address of a session for carrying the MPEG-4 AAC stereo audio and transmits a session establishment preparation request to the transmission control unit 22 (step S03). In RTSP, the SETUP method is used for the session establishment preparation request. In the following example, the reception address consists of the reception address (136.198.190.100) and the reception port numbers (6668-6669) held by the transmission control unit 32.

“SETUP rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1 RTSP/1.0”
“Transport: RTP/AVP/UDP;unicast;destination=136.198.190.100;client_port=6668-6669”
On correctly receiving the session establishment preparation request, the transmission control unit 22 determines a new transmission address, inputs it to the transmission unit 23, and transmits session information to the transmission control unit 32 (step S04). The session information includes the transmission address of the distribution server used for distribution (136.198.190.1 in the example) and the transmission port numbers (19000 to 19001 in the example below).

“Transport: RTP/AVP/UDP;unicast;source=136.198.190.1;server_port=19000-19001”
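Both Transport headers in this exchange share the same semicolon-separated shape, so one parser covers the request and the response. This is a sketch under that assumption; `parse_transport` and `port_pair` are hypothetical helper names, and only the parameters shown in the text are exercised.

```python
def parse_transport(header_value):
    """Split a Transport header value into (protocol spec, parameter dict)."""
    spec, *params = [p.strip() for p in header_value.split(";") if p.strip()]
    parsed = {}
    for param in params:
        key, sep, value = param.partition("=")
        # Parameters without "=" (such as "unicast") are stored as flags.
        parsed[key] = value if sep else True
    return spec, parsed


def port_pair(value):
    """Convert a port range such as '6668-6669' into a tuple of integers."""
    low, high = value.split("-")
    return int(low), int(high)


request_value = "RTP/AVP/UDP;unicast;destination=136.198.190.100;client_port=6668-6669"
response_value = "RTP/AVP/UDP;unicast;source=136.198.190.1;server_port=19000-19001"

request_spec, client = parse_transport(request_value)
response_spec, server = parse_transport(response_value)
```

With these values, the receiving side would listen on the `client_port` pair and expect packets from the `source` address and `server_port` pair.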
When the transmission control unit 32 receives the session information, it inputs the previously determined reception address to the reception unit 31. A new session is established once both the transmission unit 23 and the reception unit 31 have completed their processing.

Next, the transmission control unit 32 transmits to the transmission control unit 22 a track switching request (CHANGE method) that designates the track of the content to be played in accordance with the user's input operation (step S05). The track number designation (TrackID=) is set to the track number to switch to next.

“CHANGE rtsp://server.jvc-victor.jp/hoge.mp4/TrackID=2 RTSP/1.0”
On receiving the track switching request, the transmission control unit 22 refers to the content storage unit 25 and, if a track corresponding to the requested track number exists in the content hoge.mp4, transmits the following normal response (step S06).

“RTSP/1.0 200 OK”
If no track corresponding to the requested track number exists in the content hoge.mp4, the transmission control unit 22 transmits the following abnormal response.

“RTSP/1.0 404 NOT FOUND”
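The server-side decision in steps S05 and S06 can be sketched as a lookup against the stored tracks. This is an illustrative sketch only: `handle_change` and the `storage` table are hypothetical stand-ins for the transmission control unit 22 and the content storage unit 25, the track set {1, 2, 3} is invented for the example, and the 400 reply for malformed input is not described in the text.

```python
import re

CHANGE_PATTERN = re.compile(
    r"CHANGE rtsp://[^/]+/([^/]+)/TrackID=(\d+) RTSP/1\.0"
)


def handle_change(request_line, tracks_by_file):
    """Return the status line for a CHANGE request line."""
    match = CHANGE_PATTERN.fullmatch(request_line)
    if match is None:
        # Not covered by the text; added so malformed input has a defined reply.
        return "RTSP/1.0 400 Bad Request"
    file_name, track = match.group(1), int(match.group(2))
    if track in tracks_by_file.get(file_name, set()):
        return "RTSP/1.0 200 OK"          # normal response
    return "RTSP/1.0 404 NOT FOUND"       # abnormal response


# Hypothetical storage contents: file name -> track numbers it holds.
storage = {"hoge.mp4": {1, 2, 3}}

ok = handle_change(
    "CHANGE rtsp://server.jvc-victor.jp/hoge.mp4/TrackID=2 RTSP/1.0", storage
)
missing = handle_change(
    "CHANGE rtsp://server.jvc-victor.jp/hoge.mp4/TrackID=9 RTSP/1.0", storage
)
```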
The track switching request and track switching response shown here are only examples, and minor differences in the character-string representation do not affect the substance of the method. Although RTSP is extended in this example, the track switching request may instead be written with the GET method of HTTP (Hyper Text Transfer Protocol) as follows (Network Working Group, "Hypertext Transfer Protocol -- HTTP/1.1", RFC2616, The Internet Society, June 1999).

“GET /hoge.mp4?TrackID=2 HTTP/1.1”
Although the GET request line does not contain the host name of the content distribution server 2, the content file name and the track number to switch to next can be embedded in the same way as with RTSP.
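For the HTTP variant, the file name and track number can be recovered from the request target with the standard library. A minimal sketch, assuming the query parameter is spelled `TrackID` exactly as in the example; `parse_http_change` is a hypothetical name. In HTTP/1.1 the host name would normally travel in a separate `Host` header, which this sketch does not handle.

```python
from urllib.parse import parse_qs, urlsplit


def parse_http_change(request_line):
    """Extract (content file name, track number) from a GET request line."""
    method, target, _version = request_line.split(" ")
    if method != "GET":
        raise ValueError("expected a GET request")
    parts = urlsplit(target)
    query = parse_qs(parts.query)
    return parts.path.lstrip("/"), int(query["TrackID"][0])


file_name, track_number = parse_http_change("GET /hoge.mp4?TrackID=2 HTTP/1.1")
```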

Next, the transmission control unit 32 transmits a playback start request (step S07). The playback start request can specify a playback range using NPT. The example below specifies the range from the beginning of the content (0.0 seconds) to the end (235.0 seconds). In RTSP, the PLAY method is used for the playback start request.

“PLAY rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”
“Range: npt=0.0-235.0”
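The Range value in the PLAY request can be read back into a pair of seconds. This sketch handles only the plain-seconds NPT form used in the example; other NPT spellings permitted by RTSP (such as hh:mm:ss or "now") are deliberately left out, and `parse_npt_range` is a hypothetical name.

```python
def parse_npt_range(header_value):
    """Parse 'npt=<start>-<end>' where both times are given in seconds.

    Returns (start, end); end is None when the range is open-ended.
    """
    unit, _, span = header_value.partition("=")
    if unit.strip() != "npt":
        raise ValueError("only the NPT format is handled in this sketch")
    start, _, end = span.partition("-")
    return float(start), (float(end) if end else None)


play_range = parse_npt_range("npt=0.0-235.0")
```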
The transmission control unit 22 transmits a response to the playback start request (step S08). Having accepted the playback start request, the transmission control unit 22 derives, for every session that has already been prepared, the content file name, track number, and content playback start time from the control URI of the session and the playback range contained in the request. It inputs these to the content acquisition unit 21 together with command input = playback start constant (for example, “PLAY”), and transmission of RTP packets begins.

This response also contains the RTP packet information required for the operation of the content reading unit 35 of the content receiving device 3a. For example, it contains the RTP time stamp (network time stamp) of the first RTP packet sent in this session (0000000 in the example below).

“RTP-Info: url=rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1;rtptime=0000000;”
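A receiver needs the `rtptime` value as a number so that later packet time stamps can be related to it. The following is a sketch of one way to read the header value; `parse_rtp_info` is a hypothetical name, and only the two fields shown in the example are considered.

```python
def parse_rtp_info(header_value):
    """Parse an RTP-Info header value into a dict; rtptime becomes an int."""
    info = {}
    for field in header_value.split(";"):
        # Split on the first "=" only, because the URL itself contains "=".
        key, sep, value = field.strip().partition("=")
        if sep:
            info[key] = value.strip()
    if "rtptime" in info:
        info["rtptime"] = int(info["rtptime"])
    return info


rtp_info = parse_rtp_info(
    "url=rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1;rtptime=0000000;"
)
```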
The track switching request may be transmitted at any time after the response to the preparation request in step S04 and before the stop. Accordingly, when a track switch is input while a certain track is being played, the transmission control unit 32 transmits a track switching request of the same form as in step S05 (step S09). On receiving the track switching request, the transmission control unit 22 inputs the requested track number to the content acquisition unit 21. The content acquisition unit 21 checks whether that track number exists and inputs the switching result (normal or abnormal) to the transmission control unit 22. When the switching result is input, the transmission control unit 22 transmits a track switching response (the normal response if the result is normal, the abnormal response if it is abnormal) to the transmission control unit 32 (step S10).

また、コンテンツ取得部２１は、指定されたトラックのコンテンツ素片をコンテンツ記憶部２５から取得し、送信部２３を介してコンテンツ受信装置３ａへ送信する（詳細は後述）。 Further, the content acquisition unit 21 acquires a content fragment of the designated track from the content storage unit 25 and transmits it to the content reception device 3a via the transmission unit 23 (details will be described later).

The transmission control unit 32 transmits a stop request when the user performs a stop input (step S11). In RTSP, the stop request is the TEARDOWN method.

“TEARDOWN rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”
Having accepted the stop request, the transmission control unit 22 controls the transmission unit 23 to stop transmitting RTP packets, disconnects the session, and notifies the other side that transmission has stopped (step S12). The transmission control unit 22 also inputs command input = playback stop constant (for example, “STOP”) to the content acquisition unit 21 to stop reading.

なお、説明の簡単化のために、１つのコンテンツデータに1つのコンテンツのみが含まれると仮定していたが、前記の伝送制御機構は容易に複数のコンテンツの同期再生に拡張可能である。例えば、ステップＳ０２において、複数のコンテンツに関するメディア記述を列挙し、ステップＳ０３〜ステップＳ０４の準備要求をメディア記述分繰り返すだけで、複数のコンテンツの同期再生が容易に行うことが可能である。 For simplification of explanation, it is assumed that only one content is included in one content data. However, the transmission control mechanism can be easily extended to synchronous reproduction of a plurality of contents. For example, it is possible to easily perform synchronized playback of a plurality of contents simply by listing media descriptions regarding a plurality of contents in step S02 and repeating the preparation requests in steps S03 to S04 for the media descriptions.

次に、コンテンツデータを読み出す際のコンテンツ取得部２１の再生時刻管理処理（送信側）について、図４のフローチャートを用いて説明する。 Next, the reproduction time management process (transmission side) of the content acquisition unit 21 when reading content data will be described with reference to the flowchart of FIG.

まず、コンテンツ受信装置３ａからコンテンツの再生要求がされると、コンテンツ配信サーバ２のコンテンツ取得部２１は、以下に示す６つの変数の初期化を行う（ステップＳ２１）。 First, when a content reproduction request is made from the content receiving device 3a, the content acquisition unit 21 of the content distribution server 2 initializes the following six variables (step S21).

1: Content file name CF = default content file name (for example, “hoge.mp4”)
2: Track number TN = default track number (for example, “1”)
3: Start frame number FN = default frame number (for example, “1”)
4: Playback start time STS = current time NTS obtained from the system clock oscillation unit 24
5: Content playback time CTS = “0”
6: Playback state flag F = Boolean value “N”
Next, the content acquisition unit 21 refers to a predetermined automatic mode constant, which indicates whether playback starts automatically, and determines whether to perform automatic playback (step S22). If the automatic mode constant is the Boolean value “Y”, the process proceeds to step S34; if it is the Boolean value “N”, the process proceeds to step S23.
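The six variables initialized in step S21 and the automatic-mode branch of step S22 can be sketched as a small state record. The class and function names are hypothetical; treating CTS as seconds and STS as raw system clock ticks is an assumption inferred from the equations that follow, and the Boolean values “Y”/“N” are modeled as `True`/`False`.

```python
from dataclasses import dataclass


@dataclass
class PlaybackState:
    cf: str = "hoge.mp4"    # content file name CF
    tn: int = 1             # track number TN
    fn: int = 1             # start frame number FN
    sts: int = 0            # playback start time STS (system clock ticks)
    cts: float = 0.0        # content playback time CTS (seconds, assumed)
    playing: bool = False   # playback state flag F ("N")


def init_state(now_ticks):
    """Step S21: defaults, with STS taken from the current clock value NTS."""
    return PlaybackState(sts=now_ticks)


def apply_auto_mode(state, auto_mode, now_ticks):
    """Steps S22 and S34: in automatic mode, mark the state as playing."""
    if auto_mode:
        state.playing = True
        state.sts = now_ticks
    return state


state = init_state(now_ticks=1000)
```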

If the constant is the Boolean value “N” in step S22, the content acquisition unit 21 determines whether there is an external input, that is, a set of four variables consisting of command input CMD, content file name CF1, track number TN1, and content playback start time CTS1 (step S23). If there is an external input, the result is “Y” and the process proceeds to step S24; if there is none, the result is “N” and the process proceeds to step S30.

If the result of step S23 is “Y”, the content acquisition unit 21 determines whether the command input CMD in the external input equals the track switching constant (for example, “CHANGE”) (step S24). If it does, the result is “Y” and the process proceeds to step S25; if the result is “N”, the process proceeds to step S28.

If the result of step S24 is “Y”, the content acquisition unit 21 uses the content file name CF1 and track number TN1 from the external input to determine whether track switching is possible, that is, whether CF equals CF1 and the track indicated by track number TN1 exists in the content file CF (step S25). If switching is possible, the result is “Y” and the process proceeds to step S26, where the unit assigns track number TN = track number TN1, performs the track switching process, and notifies the transmission control unit 22 of switching result = normal (step S26). If the track does not exist and switching is impossible, the result is “N”, and the content acquisition unit 21 proceeds to step S27 and notifies the transmission control unit 22 of switching result = abnormal (step S27).

If the result of step S24 is “N”, the content acquisition unit 21 determines whether the command input CMD in the external input equals the playback stop constant (step S28). If it does, the result is “Y” and the entire operation stops and ends; otherwise the result is “N” and the process proceeds to step S29.

If the result of step S28 is “N”, the content acquisition unit 21 first initializes five variables using the three variables of the external input (content file name CF1, track number TN1, content playback start time CTS1) (step S29).

1: Content file name CF = content file name CF1
2: Track number TN = track number TN1
3: Playback start time STS = current time NTS obtained from the system clock oscillation unit 24
4: Content playback time CTS = content playback start time CTS1
5: Playback state flag F = Boolean value “Y”
The content acquisition unit 21 then initializes the start frame number FN using the content playback time CTS. As a preliminary step, it calculates the transmission start time information StartTS by the following formula.

<Equation 7>
StartTS = CTS × TRS
Here, the track time stamp frequency TRS is the track time stamp frequency of the track indicated by track number TN in the content file CF.

次に、コンテンツ取得部２１は、コンテンツファイル名CFの中のトラック番号TNが指し示すトラックのフレーム番号を１から順に検索し、時刻情報≧StartTSが成り立つ点のフレーム番号NFNを得て、フレーム番号NFNを開始フレーム番号FNに設定する。以上がステップＳ２９の概要である。 Next, the content acquisition unit 21 searches the frame number of the track indicated by the track number TN in the content file name CF in order from 1 to obtain the frame number NFN of the point where time information ≧ StartTS is satisfied, and the frame number NFN. Is set to the start frame number FN. The above is the outline of step S29.
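The search in step S29 can be sketched as a scan over per-frame time information. The frame table below is hypothetical (1024 samples per frame at a track time stamp frequency of 44100), `find_start_frame` is an invented name, and returning `None` when no frame qualifies is an assumption because the text does not specify that case.

```python
def find_start_frame(frame_times, cts_seconds, trs):
    """Return the first frame number whose time information >= StartTS.

    frame_times[i] holds the time information of frame number i + 1,
    expressed in units of the track time stamp frequency TRS.
    """
    start_ts = cts_seconds * trs  # Equation 7: StartTS = CTS x TRS
    for frame_number, time_info in enumerate(frame_times, start=1):
        if time_info >= start_ts:
            return frame_number
    return None  # unspecified in the text; chosen here for illustration


# Hypothetical track: 100 frames of 1024 samples each, TRS = 44100.
frame_times = [i * 1024 for i in range(100)]
start_frame = find_start_frame(frame_times, cts_seconds=1.0, trs=44100)
```

Here StartTS is 44100, and the first frame whose time information reaches it is frame 45 (time information 45056).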

If there is no external input in step S23 (Boolean value “N”), the content acquisition unit 21 determines whether the playback state flag is the Boolean value “Y” (step S30). If it is “Y”, the process proceeds to step S31; if it is “N”, the process returns to step S23.

ステップＳ３０の処理において再生状態フラグの値が真偽値“Ｙ”であった場合、またはステップＳ２６、Ｓ２７、Ｓ２９から継続する処理の場合には、コンテンツ取得部２１は、システムクロック発振部２４から得た現在時刻NTSと再生開始時刻STSとシステムクロック周波数SSから経過時間を算出し、経過時間とコンテンツ再生時刻CTSとトラックタイムスタンプ周波数TRSから、送出可能時刻情報LastTSを算出することによって、送信可能範囲を検索する（ステップＳ３１）。送出可能時刻情報LastTSの算出式を以下に示す。 In the case where the value of the playback state flag is the true / false value “Y” in the process of step S30, or in the case of the process that continues from steps S26, S27, and S29, the content acquisition unit 21 starts from the system clock oscillator 24. Transmission is possible by calculating the elapsed time from the obtained current time NTS, playback start time STS, and system clock frequency SS, and calculating sendable time information LastTS from the elapsed time, content playback time CTS, and track time stamp frequency TRS A range is searched (step S31). The formula for calculating the sendable time information LastTS is shown below.

<Equation 8>
LastTS = (NTS − STS) / SS × TRS
Here, the system clock frequency SS is a constant for converting (current time NTS − playback start time STS) into seconds. For example, if the system clock has a resolution of 1/1000000 second, SS is 1000000.

次に、コンテンツ取得部２１は、コンテンツファイル名CFの中のトラック番号TNが指し示すトラック内の、開始フレーム番号FNからフレーム番号を順にサーチし、時刻情報≧LastTSが成り立つ点のフレーム番号NFNを取得する。条件を満たすフレーム番号NFNが求まらない場合は、コンテンツ取得部２１は、フレーム番号NFNに終端定数“−1”を設定する。 Next, the content acquisition unit 21 sequentially searches the frame number from the start frame number FN in the track indicated by the track number TN in the content file name CF, and acquires the frame number NFN at which time information ≧ LastTS is satisfied. To do. When the frame number NFN that satisfies the condition is not found, the content acquisition unit 21 sets the termination constant “−1” to the frame number NFN.
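Step S31 can be sketched by combining Equation 8 with the forward search for NFN. The code follows Equation 8 exactly as printed; the surrounding text also names CTS as an input, and how it would enter the formula is not shown, so it is left out here. The function name and the frame table are hypothetical.

```python
TERMINATION = -1  # termination constant from the text


def find_sendable_end(frame_times, fn, nts, sts, ss, trs):
    """Return NFN, the first frame number >= FN with time information >= LastTS."""
    last_ts = (nts - sts) / ss * trs  # Equation 8 as printed
    for frame_number in range(fn, len(frame_times) + 1):
        if frame_times[frame_number - 1] >= last_ts:
            return frame_number
    return TERMINATION


# Hypothetical track: 100 frames of 1024 samples each, TRS = 44100,
# system clock in microseconds (SS = 1000000), 0.5 s elapsed.
frame_times = [i * 1024 for i in range(100)]
nfn = find_sendable_end(frame_times, fn=1, nts=500_000, sts=0,
                        ss=1_000_000, trs=44100)
```

With 0.5 seconds elapsed, LastTS is 22050 and the search stops at frame 23 (time information 22528), so frames 1 through 22 are the sendable range.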

次に、コンテンツ取得部２１は、開始フレーム番号FNとフレーム番号NFNを比較することによって送信の可否を判定し（ステップＳ３２）、FN＜NFNが成立する場合には真偽値“Ｙ”となりステップＳ３３の処理へ移行し、成立しない場合は真偽値“Ｎ”となりステップＳ２３の処理へ移行する。 Next, the content acquisition unit 21 determines whether or not transmission is possible by comparing the start frame number FN and the frame number NFN (step S32). If FN <NFN is satisfied, the true / false value “Y” is obtained. The process proceeds to S33. If not established, the truth value is “N”, and the process proceeds to Step S23.

ステップＳ３２の処理において真偽値“Ｙ”であった場合には、コンテンツ取得部２１は、コンテンツファイル名CFの中のトラック番号TNが指し示すトラック内の、開始フレーム番号FNからフレーム番号NFN未満の各フレーム番号に対応するコンテンツ素片それぞれと、それぞれのコンテンツ素片について、対応する時刻情報から下記の算出式に従って生成されたネットワークタイムスタンプPTSとを、送信部２３へ出力する（ステップＳ３３）。 If the value in the process of step S32 is a true value “Y”, the content acquisition unit 21 is less than the frame number NFN from the start frame number FN in the track indicated by the track number TN in the content file name CF. Each content unit corresponding to each frame number and the network time stamp PTS generated according to the following calculation formula from the corresponding time information for each content unit are output to the transmission unit 23 (step S33).

<Equation 9>
PTS = time information / TRS × network time stamp frequency
After the output, the content acquisition unit 21 sets start frame number FN = frame number NFN in order to advance the start frame number FN to the position already transmitted.
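Step S33 and Equation 9 can be sketched as a rescaling from track ticks to network ticks followed by advancing FN. Rounding to an integer is an assumption made here because RTP time stamps are integers; the network time stamp frequency of 90000 and both function names are illustrative choices, not values from the text.

```python
def to_network_timestamp(time_info, trs, network_freq):
    """Equation 9: PTS = time information / TRS x network time stamp frequency."""
    return round(time_info / trs * network_freq)


def collect_sendable(frame_times, fn, nfn, trs, network_freq):
    """Pair each frame number in [FN, NFN) with its PTS and return the new FN."""
    out = [
        (n, to_network_timestamp(frame_times[n - 1], trs, network_freq))
        for n in range(fn, nfn)
    ]
    return out, nfn  # FN is advanced to NFN after the output


frames, new_fn = collect_sendable([0, 1024, 2048], fn=1, nfn=3,
                                  trs=44100, network_freq=90000)
```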

また、ステップＳ２２の処理において、自動モード定数が真偽値“Ｙ”であった場合には、コンテンツ取得部２１は、再生状態フラグF＝真偽値“Ｙ”と設定し、再生開始時刻STS＝システムクロック発振部２４から得た現在時刻NTSを設定し、自動再生開始のための設定を行う（ステップＳ３４）。 If the automatic mode constant is the true / false value “Y” in the process of step S22, the content acquisition unit 21 sets the reproduction state flag F = the true / false value “Y” and the reproduction start time STS. = The current time NTS obtained from the system clock oscillator 24 is set, and settings for starting automatic reproduction are made (step S34).

In this way, track switching messages are transmitted and received over the control line established by the transmission control units 22 and 32, and in accordance with those messages the content fragments of the destination track are acquired and transmitted on the basis of the time information. The content receiving device 3a can therefore play back smoothly even when the track is switched.

Communication between the transmission unit 23 and the reception unit 31 may follow the conventions of RTP (Real-time Transport Protocol, RFC1889).

≪Application Example 1≫
As Application Example 1 of the above embodiment, FIG. 5 shows a functional block diagram of a content distribution system 1b consisting of the content distribution server 2 and a content receiving device 3b (karaoke device). The content distribution system 1b shown in FIG. 5 is a so-called online karaoke system.

The content receiving device 3b includes a reception unit 31, a transmission control unit 32, a buffer unit 33, a system clock oscillation unit 34, a content reading unit 35, a playback unit 36b, an input unit 37, and a microphone input unit 38. Components identical to those of the above embodiment are given the same reference numbers, and their detailed description is omitted.

The playback unit 36b decodes each input content fragment according to its type, combines the result with the audio signal from the microphone input unit, and outputs audio and video signals to a predetermined output device such as a speaker. For example, when the content fragment is high-efficiency-coded digital audio data of the MPEG family, the playback unit 36b comprises a high-efficiency-code decoder, a D/A converter, an analog amplifier, and the like.

ここでいう合成の最低要件は、コンテンツ読取部３５とマイク入力部３８とからの２つの入力をほぼ同時に再生することである。仮にコンテンツ素片の種別が音声である場合は、マイク入力部３８からの音声と不自然でないように適切なミキシング合成や音量調整を行っても良い。 Here, the minimum requirement for composition is to reproduce two inputs from the content reading unit 35 and the microphone input unit 38 almost simultaneously. If the content segment type is audio, appropriate mixing synthesis and volume adjustment may be performed so as not to be unnatural with the audio from the microphone input unit 38.

マイク入力部３８は、マイク等の音声取込装置によって取り込まれた音声信号の入力を受け付け、その音声信号を再生部３６ｂが扱うデータ形式に変換する機能を有する。 The microphone input unit 38 has a function of receiving an input of an audio signal captured by an audio capturing device such as a microphone and converting the audio signal into a data format handled by the playback unit 36b.

図５の通信カラオケシステムにおいて、あるコンテンツをコンテンツ配信サーバ２からコンテンツ受信装置３ｂへ送信するとする。 In the communication karaoke system of FIG. 5, it is assumed that a certain content is transmitted from the content distribution server 2 to the content receiving device 3b.

そのコンテンツは、例えば、男性ボーカルと女性ボーカルとのデュエット曲であり、トラック１に曲の伴奏のみのデータを、トラック２に伴奏と男性ボーカルのデータを、トラック３に伴奏と女性ボーカルのデータをそれぞれ記録してあるとする。 The content is, for example, a duet song of male vocals and female vocals. Track 1 contains only accompaniment data, track 2 contains accompaniment and male vocal data, and track 3 contains accompaniment and female vocal data. Assume that each is recorded.

そうすると、利用者の要求に合わせて適切なトラックを提供することで、男性ボーカルまたは女性ボーカルと合わせて一人でデュエット曲を歌唱することができ、多彩なサービスを提供することができる。 Then, by providing an appropriate track according to the user's request, it is possible to sing a duet song alone with a male vocal or a female vocal, and various services can be provided.

≪Application Example 2≫
As Application Example 2 of the above embodiment, FIG. 6 shows a functional block diagram of a content distribution system 1c (online karaoke system) consisting of the content distribution server 2, the content receiving device 3b (karaoke device), and an authoring device 5. The authoring device 5 may be housed in the same enclosure as the content distribution server 2 or in a separate enclosure. For convenience, a user of the authoring device 5 is referred to as an editor.

また、オーサリング装置５は、オーサリング部５１、音程変換部５２、コンテンツ送信部５３、入力部５４、マイク入力部５５、およびコンテンツ一次記憶部５６を有する。 Further, the authoring device 5 includes an authoring unit 51, a pitch conversion unit 52, a content transmission unit 53, an input unit 54, a microphone input unit 55, and a content primary storage unit 56.

オーサリング部５１は、入力部５４からの指示入力とマイク入力部５５からの音声データを用いて、コンテンツデータの新規作成・トラック追加・コンテンツデータの削除等、コンテンツデータの作成編集を行う機能、作成編集したコンテンツデータをコンテンツ一次記憶部５６に記録する機能を有する。 The authoring unit 51 uses the instruction input from the input unit 54 and the audio data from the microphone input unit 55 to create and edit content data such as new content data creation, track addition, content data deletion, and the like. It has a function of recording edited content data in the content primary storage unit 56.

音程変換部５２は、コンテンツを構成する原音の音声データ（以降、ソース音声データと称す）に対して、少なくとも半音もしくは全音ごとに音程変換を行う機能（例えば、特開平９−１８５３９２号公報の音程変換装置のような、音声の周波数領域変換を利用した周波数シフトアルゴリズムを用いる）を有する。 The pitch conversion unit 52 performs a pitch conversion for at least a semitone or a full tone on the original sound data (hereinafter referred to as source sound data) constituting the content (for example, a pitch described in JP-A-9-185392). A frequency shift algorithm using frequency domain conversion of speech, such as a conversion device).

この応用例２では、基準となるソース音声データの音階を“0”とし、上に半音シフトすると“+1”、下に半音シフトすると“-1”となるような整数変数をキー変数kと呼ぶ。一般的なカラオケ装置における音程変換を実現する場合、キー変数kは、少なくとも-6〜0〜+6までの範囲を採り得る。 In this application example 2, the key variable k is an integer variable such that the scale of the reference source audio data is “0”, “+1” when shifted up by a semitone, and “−1” when shifted down by a semitone. Call. When the pitch conversion in a general karaoke apparatus is realized, the key variable k can take a range of at least −6 to 0 to +6.

For example, k=+1 corresponds to transposition by a minor second: if the original song is in C major, the transposed key is D-flat major. Likewise, k=-6 corresponds to a diminished fifth: if the original song is in C major, the transposed key is F-sharp major.

Values of k below -6 (or above +6) are not used because, for example, lowering by 6 semitones is equivalent to raising by 6 semitones (an augmented fourth) and then transposing down one octave. In karaoke, as long as the key matches the one the user intends, the user can sing comfortably even if the accompaniment is in a different octave.
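The arithmetic behind the key variable k can be sketched as follows. The frequency ratio 2^(k/12) assumes twelve-tone equal temperament, which the text does not state, and the functions and key-name table are hypothetical illustrations rather than part of the pitch conversion unit 52.

```python
KEY_NAMES = ["C", "D-flat", "D", "E-flat", "E", "F",
             "F-sharp", "G", "A-flat", "A", "B-flat", "B"]


def fold_key(k):
    """Fold any semitone shift into -6..+6 by discarding whole octaves."""
    folded = (k + 6) % 12 - 6
    # -6 and +6 name the same key; keep the sign of the requested shift.
    if folded == -6 and k > 0:
        folded = 6
    return folded


def transposed_key(original_index, k):
    """Name of the major key reached by shifting k semitones (0 = C major)."""
    return KEY_NAMES[(original_index + k) % 12]


def frequency_ratio(k):
    """Equal-temperament frequency ratio for a shift of k semitones (assumed)."""
    return 2.0 ** (k / 12)
```

For instance, `transposed_key(0, 1)` gives D-flat and `transposed_key(0, -6)` gives F-sharp, matching the examples in the text, while `fold_key(7)` gives -5 because raising by 7 semitones reaches the same key as lowering by 5.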

コンテンツ送信部５３は、入力部５４からコンテンツファイル移動指示と移動先配信サーバ名が入力されると、コンテンツ一次記憶部５６に保存されているコンテンツデータを、移動先配信サーバ名の指し示す配信サーバ内のコンテンツ記憶部２５へネットワークを介して送信する機能を有する。 When the content file movement instruction and the destination distribution server name are input from the input unit 54, the content transmission unit 53 stores the content data stored in the content primary storage unit 56 in the distribution server indicated by the destination distribution server name. The content storage unit 25 has a function of transmitting via the network.

また、コンテンツ送信部５３は、CD-RやDVD-R等の記録媒体にコンテンツデータを記録する機能を有する。ネットワークを介してコンテンツデータの送信を行わない場合は、記録媒体にてコンテンツデータを提供する。 The content transmission unit 53 has a function of recording content data on a recording medium such as a CD-R or a DVD-R. When content data is not transmitted via a network, the content data is provided on a recording medium.

入力部５４は、編集者が、コンテンツデータ新規作成指示・トラック追加指示・コンテンツデータ移動指示・コンテンツデータ削除指示の各指示入力を受け付ける機能を有する。例えば、ビットマップディスプレイとキーボードを用いて、再生ボタン、停止ボタン、コンテンツ選択ダイヤログなどのＧＵＩ部品を利用者に提供し、また、テンキーや数段階のスライドスイッチなどを用いて、トラック番号の入力を受け付ける操作スイッチを提供する。 The input unit 54 has a function for the editor to accept each instruction input of a content data new creation instruction, a track addition instruction, a content data movement instruction, and a content data deletion instruction. For example, using a bitmap display and a keyboard, GUI parts such as a play button, a stop button, and a content selection dialog are provided to the user, and a track number is input using a numeric keypad and several stages of slide switches. Provide an operation switch that accepts.

マイク入力部５５は、マイク等の音声取込装置によって取り込まれた音声信号の入力を受け付け、その音声信号を再生部３６ｂが扱うデータ形式に変換してソース音声データを生成する機能を有する。また、マイク入力部５５は、CD-ROMドライブやネットワーク等の外部からソース音声データを受け付けるためのインタフェースを備えても良い。さらに、マイク入力部５５はマイクとメモリを備え、一定時間分の音声信号を録音してソース音声データに変換するようにしても良い。編集者がトラック追加指示を入力すると、同時にソース音声データがオーサリング装置５にマイク入力部５５を通して取り込まれる。 The microphone input unit 55 has a function of receiving input of an audio signal captured by an audio capturing device such as a microphone and converting the audio signal into a data format handled by the playback unit 36b to generate source audio data. The microphone input unit 55 may include an interface for receiving source audio data from the outside such as a CD-ROM drive or a network. Further, the microphone input unit 55 may include a microphone and a memory, and may record an audio signal for a predetermined time and convert it into source audio data. When the editor inputs a track addition instruction, source audio data is simultaneously taken into the authoring device 5 through the microphone input unit 55.

The content primary storage unit 56 temporarily stores the content data created and edited by the authoring unit 51.

Next, the operation of the authoring device 5 will be described.

(1) Creating new content data
When a new content data creation instruction and a content file name CF1 are input from the input unit 54, the authoring unit 51 creates, in the content primary storage unit 56, content data that contains no tracks at all under the content file name CF1.

(2) Deleting content data
When a content data deletion instruction and a content file name CF1 are input from the input unit 54, the authoring unit 51 deletes the content data with the content file name CF1 recorded in the content primary storage unit 56.

(3) Adding a track
When a track addition instruction, a content file name CF1, a track number TN1, and a key variable k are input from the input unit 54 and source audio data is input from the microphone input unit 55, the authoring unit 51 first determines whether the key variable k is "0". If k is not "0", the authoring unit 51 applies pitch conversion to the source audio data using the pitch conversion unit 52.

Next, the authoring unit 51 opens the content data indicated by the content file name CF1, increments the total number of tracks by 1, and adds the track data indicated by the track number TN1. It divides the source audio data into content fragments of an appropriate number of samples each, applies high-efficiency coding if necessary, and multiplexes the fragments with a frame number and time information attached to each. The track timestamp frequency is written as the measured sampling frequency of the audio data.
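The three authoring operations above can be sketched as follows. This is only an illustrative sketch, not the implementation described in this specification: the class names, the `pitch_shift` callable, the fixed `frame_size`, and the use of the starting sample index as time information are all assumptions, and high-efficiency coding is omitted. The sketch also assumes that pitch conversion preserves the number of samples, so that frames with the same frame number carry the same time information in every track.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Frame:
    frame_number: int
    time_info: int          # assumed unit: ticks of the track timestamp frequency
    fragment: List[float]   # content fragment, left uncompressed in this sketch


@dataclass
class Track:
    track_number: int
    timestamp_hz: int       # track timestamp frequency = audio sampling frequency
    frames: List[Frame] = field(default_factory=list)


@dataclass
class ContentData:
    file_name: str
    tracks: Dict[int, Track] = field(default_factory=dict)

    @property
    def total_tracks(self) -> int:
        return len(self.tracks)


class PrimaryStore:
    """Hypothetical stand-in for the content primary storage unit 56."""

    def __init__(self) -> None:
        self.contents: Dict[str, ContentData] = {}

    def create(self, file_name: str) -> ContentData:
        # (1) New content data that contains no tracks.
        self.contents[file_name] = ContentData(file_name)
        return self.contents[file_name]

    def delete(self, file_name: str) -> None:
        # (2) Remove the content data stored under this file name.
        del self.contents[file_name]

    def add_track(
        self,
        file_name: str,
        track_number: int,
        k: int,
        samples: Sequence[float],
        sample_rate: int,
        pitch_shift: Callable[[Sequence[float], int], Sequence[float]],
        frame_size: int = 1024,
    ) -> Track:
        # (3) Pitch-convert only when the key variable k is non-zero.
        if k != 0:
            samples = pitch_shift(samples, k)
        track = Track(track_number, sample_rate)
        # Split into fragments and attach a frame number and time information.
        for n, start in enumerate(range(0, len(samples), frame_size)):
            chunk = list(samples[start:start + frame_size])
            track.frames.append(Frame(n, start, chunk))
        self.contents[file_name].tracks[track_number] = track
        return track
```

Calling `add_track` once per key value with the same source audio yields tracks whose frames line up by frame number, which is the property the track switching on the server side depends on.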

By creating multiple tracks with different pitches in this way and having the authoring device 5 supply the resulting content data to the content distribution server 2, the content receiving device 3b no longer needs the pitch conversion unit 52 that was conventionally required on the receiving side. In addition, high-quality pitch conversion can be performed when the content is authored, which improves service quality.

Furthermore, even without the pitch conversion unit 52 on the content receiving device 3b side, a service equivalent to that of a device equipped with the pitch conversion unit 52 can be provided, which dramatically reduces service cost.

Since the pitch conversion unit 52 exists independently of the content receiving device 3b, the same improvement in service quality is obtained on every content receiving device 3b, regardless of when its hardware was manufactured or shipped.

In addition, when authoring content, some of the pitches in the pitch range can be produced by mechanical pitch conversion, while others can use audio data actually performed or sung by performers or singers from a transposed score. This makes it possible to realize a high-quality service that was not possible with conventional methods.

For content whose author does not consent to musical processing such as pitch conversion, pitch-converted playback of that specific content can easily be prohibited simply by not adding pitch-converted tracks at authoring time, without sending any special information to all the content receiving devices 3b.

In the content distribution system 1c of FIG. 6, there is only one each of the content distribution server 2, the content receiving device 3b, and the authoring device 5. However, as shown in FIG. 7, a plurality of content distribution servers 2 may be connected to the authoring device 5, with a content receiving device 3b connected to each content distribution server 2.

This configuration is well suited to a communication karaoke system that serves a large number of users. The authoring device 5 performs the pitch conversion and prepares content data in which audio of different pitches is recorded on multiple tracks. Simply copying the same content data to the multiple content distribution servers 2 then lets each content receiving device 3b change pitch dynamically during playback by changing tracks, without being equipped with any means for pitch conversion. Moreover, even when the number of users increases and the number of pairs of content distribution server 2 and content receiving device 3b grows, the time and computation required for pitch conversion remain constant, which enables a dramatic cost reduction.
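The dynamic pitch change by track switching can be sketched on the server side as follows. This is a minimal sketch under stated assumptions, not the specified implementation: the `Session` class, its method names, and the helper `make_tracks` are hypothetical. It relies on the property that data frames with the same frame number carry the same time information in every track, so the frame that follows the last one sent can be looked up by time in the newly selected track.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Frame:
    frame_number: int
    time_info: int
    fragment: str


class Session:
    """Hypothetical per-receiver state kept by the distribution server."""

    def __init__(self, tracks: Dict[int, List[Frame]], track_number: int) -> None:
        self.tracks = tracks
        self.track_number = track_number
        self.last_time: Optional[int] = None  # time info of the last frame sent

    def switch_track(self, new_track: int) -> None:
        # Handle a switching request: only the track changes, not the position.
        if new_track not in self.tracks:
            raise KeyError(new_track)
        self.track_number = new_track

    def next_frame(self) -> Optional[Frame]:
        frames = self.tracks[self.track_number]
        if self.last_time is None:
            index = 0
        else:
            # First frame whose time information is later than the last one sent.
            times = [f.time_info for f in frames]
            index = bisect_right(times, self.last_time)
        if index >= len(frames):
            return None
        frame = frames[index]
        self.last_time = frame.time_info
        return frame


def make_tracks(track_numbers: Iterable[int], n_frames: int,
                ticks_per_frame: int = 1024) -> Dict[int, List[Frame]]:
    # Test data: every track has identical frame numbers and time information.
    return {
        t: [Frame(n, n * ticks_per_frame, f"track{t}/frame{n}") for n in range(n_frames)]
        for t in track_numbers
    }
```

Because the lookup uses only the time information of the last frame already sent, the receiver gets a continuous sequence of frames across the switch and needs no pitch conversion logic of its own.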

Embodiments of the present invention have been described above, but the present invention is not limited to them. All or some of the components of the content distribution server 2, the content receiving device 3, and the authoring device 5 may also be implemented as computer-executable programs and provided recorded in advance on a computer-readable recording medium or the like.

A single physical computer may run multiple programs corresponding to the content distribution server 2. Likewise, a single physical computer may run multiple programs corresponding to the content receiving device 3.

A single physical computer may also run both a program corresponding to the content distribution server 2 and a program corresponding to the content receiving device 3. Similarly, a single physical computer may run both a program corresponding to the content distribution server 2 and a program corresponding to the authoring device 5.

Functional block diagram of the content distribution server 2 and the content receiving device 3a in the content distribution system 1a (asynchronous model).
Diagram showing the data structure of content data having multiple tracks.
Sequence diagram showing the operation of the transmission control unit 22 of the content distribution server 2 and the transmission control unit 32 of the content receiving device 3a when transmission control information is sent and received.
Flowchart showing the playback time management process (transmitting side) of the content acquisition unit 21 when reading content data.
Functional block diagram of the content distribution system 1b (communication karaoke system) consisting of the content distribution server 2 and the content receiving device 3b (karaoke device).
Functional block diagram of the content distribution system 1c (communication karaoke system) consisting of the content distribution server 2, the content receiving device 3b (karaoke device), and the authoring device 5.
Diagram showing a configuration example in which multiple content distribution servers 2 are connected to the authoring device 5.
Functional block diagram of the content distribution server 102a and the content receiving device 103a in the asynchronous model.
Diagram showing an example of the data structure of content data.
Flowchart showing the playback time management process (transmitting side) of the content acquisition unit 1021 when reading a content file.
Flowchart showing the playback time management process (receiving side) of the content reading unit 1034 using network timestamps.
Functional block diagram of the content distribution server 102b and the content receiving device 103b that make up the content distribution system 101b provided with a transmission control mechanism.
Sequence diagram showing the operation of the transmission control unit 1025 and the transmission control unit 1036 when transmission control information is sent and received.
Diagram showing the order of the content fragments accumulated in the buffer unit 1032.
Functional block diagram of the content distribution server 102c and the content receiving device 103c that make up the content distribution system 101c capable of receiving two tracks simultaneously.
Flowchart showing the operation of the content receiving device 103c having a configuration for simultaneous reception of multiple tracks.

Explanation of symbols

1a, 1b Content distribution system
2 Content distribution server
3a, 3b Content receiving device
4 IP network
5 Authoring device
21 Content acquisition unit
22 Transmission control unit
23 Transmission unit
24 System clock oscillation unit
25 Content storage unit
31 Reception unit
32 Transmission control unit
33 Buffer unit
34 System clock oscillation unit
35 Content reading unit
36 Playback unit
37 Input unit
38 Microphone input unit
51 Authoring unit
52 Pitch conversion unit
53 Content transmission unit
54 Input unit
55 Microphone input unit
56 Content primary storage unit
101a, 101b, 101c Content distribution system
102a, 102b, 102c Content distribution server
103a, 103b, 103c Content receiving device
104 Network
1021 Content acquisition unit (A, B)
1022 Transmission unit (A, B)
1023 System clock oscillation unit
1024 Content storage unit
1025 Transmission control unit
1031 Reception unit (A, B)
1032 Buffer unit (A, B)
1033 System clock oscillation unit
1034 Content reading unit (A, B)
1035 Playback unit
1036 Transmission control unit
1037 Input unit
1038 Switching unit

Claims

A content distribution method for a content distribution system comprising a content distribution server that stores content data having a plurality of content fragment data, and a content receiving device that requests playback of content from the content distribution server via a network and receives and plays back the content fragment data that the content distribution server transmits via the network in response to the playback request, wherein
the content data stored in the content distribution server has a file name and a plurality of tracks, the track data of each track has a track number and a plurality of data frames, each data frame has a frame number, time information indicating playback timing, and content fragment data, and the data frames of the respective tracks that have the same frame number have the same time information,
the method comprising:
a session establishing step of establishing a session between the content receiving device and the content distribution server, in which the content receiving device transmits to the content distribution server the file name of the content data whose playback is requested, receives at least a preset track number and playback time information transmitted from the content distribution server in response, the playback time information indicating the playback time length of the track having the preset track number in the content data of that file name, and then designates a desired track number and transmits it to the content distribution server;
when a content playback start request is made after the session is established,
a step in which the content receiving device transmits to the content distribution server a content playback start request message including the file name of the content data whose playback is requested and playback range information indicating the time range to be played back; and
a step in which the content distribution server, on receiving the playback start request message, identifies the frame number of the data frame containing the content fragment data to be transmitted in accordance with the playback range information in the playback start request message, acquires from the stored content data the content fragment data specified by the identified frame number, the file name in the playback start request message, and the track number designated by the content receiving device when the session was established, and transmits the acquired content fragment data to the content receiving device; and
when a request to switch the playback target track is made after the session is established,
a step in which the content receiving device transmits to the content distribution server a switching request message including a switching destination track number to which the playback track is to be switched; and
a step in which the content distribution server, on receiving the switching request message after receiving the playback start request message, identifies, based on the time information in the data frame containing the content fragment data that is last in playback order among the content fragment data already transmitted at the time the switching request message is received, the frame number of the data frame containing the content fragment data to be played back immediately after that last content fragment data, acquires from the stored content data the content fragment data specified by the identified frame number and the switching destination track number in the switching request message, and transmits the acquired content fragment data to the content receiving device.

A content distribution server that, in response to a playback request from a content receiving device, selects the content fragment data requested for playback from content data having a plurality of content fragment data and transmits the selected content fragment data to the content receiving device via a network, the content distribution server comprising:
content storage means for storing content data that has a file name and a plurality of tracks, in which the track data of each track has a track number and a plurality of data frames, each data frame has a frame number, time information indicating playback timing, and content fragment data, and the data frames of the respective tracks that have the same frame number have the same time information;
transmission control means that establishes a session with the content receiving device by receiving from the content receiving device the file name of the content data whose playback is requested, transmitting to the content receiving device in response at least a preset track number and playback time information indicating the playback time length of the track having the preset track number in the content data of that file name, and then receiving a desired track number designated by the content receiving device,
that receives from the content receiving device, when a content playback start request is made at the content receiving device after the session is established, a content playback start request message including the file name of the content data whose playback is requested and playback range information indicating the time range to be played back,
and that receives from the content receiving device, when a request to switch the playback target track is made at the content receiving device after the session is established, a switching request message including a switching destination track number to which the playback track is to be switched;
content acquisition means that, when the transmission control means receives the playback start request message, identifies the frame number of the data frame containing the content fragment data to be transmitted in accordance with the playback range information in the playback start request message, and acquires from the content storage means the content fragment data specified by the identified frame number, the file name in the playback start request message, and the track number designated by the content receiving device when the session was established,
and that, when the transmission control means receives the switching request message after receiving the playback start request message, identifies, based on the time information in the data frame containing the content fragment data that is last in playback order among the content fragment data already transmitted at the time the switching request message is received, the frame number of the data frame containing the content fragment data to be played back immediately after that last content fragment data, and acquires from the content storage means the content fragment data specified by the identified frame number and the switching destination track number in the switching request message; and
transmitting means for transmitting the content fragment data acquired by the content acquisition means to the content receiving device.

A content receiving device that requests playback of content via a network from a content distribution server storing content data having a plurality of content fragment data, and that receives and plays back the content fragment data transmitted from the content distribution server via the network in response to the playback request, wherein
the content data stored in the content distribution server has a file name and a plurality of tracks, the track data of each track has a track number and a plurality of data frames, each data frame has a frame number, time information indicating playback timing, and content fragment data, and the data frames of the respective tracks that have the same frame number have the same time information,
the content receiving device comprising:
transmission control means that establishes a session with the content distribution server by transmitting the file name of the content data whose playback is requested, receiving a preset track number and playback time information transmitted from the content distribution server in response, the playback time information indicating the playback time length of the track having the preset track number in the content data of that file name, and then designating a desired track number and transmitting it to the content distribution server,
that transmits to the content distribution server, when a content playback start request is made after the session is established, a content playback start request message including the file name of the content data whose playback is requested and playback range information indicating the time range to be played back,
and that transmits to the content distribution server, when a request to switch the playback target track is made after the session is established, a switching request message including a switching destination track number to which the playback track is to be switched; and
receiving means for receiving from the content distribution server the content fragment data that the content distribution server, on receiving the switching request message after receiving the playback start request message, has transmitted after identifying, based on the time information in the data frame containing the content fragment data that is last in playback order among the content fragment data already transmitted at the time the switching request message is received, the frame number of the data frame containing the content fragment data to be played back immediately after that last content fragment data, the transmitted content fragment data being specified by the identified frame number and the switching destination track number in the switching request message.