JP2017182706A

JP2017182706A - Server device, information processing method, and program

Info

Publication number: JP2017182706A
Application number: JP2016073060A
Authority: JP
Inventors: 亮佐橋; Akira Sahashi
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2016-03-31
Filing date: 2016-03-31
Publication date: 2017-10-05
Anticipated expiration: 2036-03-31
Also published as: JP6623905B2

Abstract

PROBLEM TO BE SOLVED: To provide a server device that can generate and transmit a representative image adequately representing the contents of a transmitted moving image without requiring additional data such as metadata. SOLUTION: A distribution server 1, which transmits content made up of a plurality of image frames to a plurality of clients 2, obtains from each client 2 camera work data indicating the display range at that client 2, and calculates, for each image frame, the number of times each predetermined region in the image frame is displayed at the clients 2. The distribution server 1 extracts a region of the image frame whose calculated display count meets a predetermined criterion, and generates a digest image representing the content on the basis of that region. SELECTED DRAWING: Figure 1

Description

The present invention belongs to the technical field of server devices, information processing methods, and programs. More specifically, it belongs to the technical field of a server device and an information processing method for transmitting a moving image to terminal devices, and of a program for the server device.

In recent years, research and development has been carried out on so-called virtual client systems, in which the functions of each client terminal connected to a server device via a network such as the Internet are provided from the server device. Prior art documents on such virtual client systems include, for example, Patent Document 1 below. The technique disclosed in Patent Document 1 generates a so-called digest image, which represents the content of a moving image, by cutting out part of the whole moving image transmitted to each client terminal according to the preferences of the user of that client terminal.

Patent Document 1: Japanese Patent Laying-Open No. 2004-126811 (FIG. 23, etc.)

However, in the virtual client system described in Patent Document 1, metadata corresponding to the content of the original moving image must be created in order to generate a digest image. The metadata required in this case includes information such as the appearance times of objects in the original moving image and their attributes, and creating it takes a great deal of labor. Creating such metadata also requires processing such as object identification to detect the positions of objects in the original moving image, and this processing imposes a heavy load.

The present invention has been made in view of the above problems, and provides a server device, an information processing method, and a program that can generate and transmit a representative image accurately representing the content of a transmitted moving image without requiring additional data such as metadata.

In order to solve the above problem, the invention according to claim 1 is a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, the server device comprising: acquisition means for acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; calculation means for calculating, for each frame and based on the range data acquired by the acquisition means, the number of times each preset region in the frame is displayed on the terminal devices; extraction means for extracting from the moving image, based on the display counts calculated by the calculation means, the region of the frame whose display count satisfies a preset criterion; and generation means for generating, based on the region extracted by the extraction means, a representative image representing the content of the moving image.
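The calculation step of claim 1 can be illustrated with a minimal Python sketch. It assumes that each display range is an axis-aligned rectangle given as `(left, top, width, height)` in pixels and that the "preset region" is a single pixel (the variant of claim 3); the rectangle layout, the function names, and the sample coordinates are all hypothetical and do not come from the patent text.

```python
from typing import List, Tuple

# (left, top, width, height) in pixels. This layout is an assumption made for
# the sketch; the text only says that range data indicates the display range.
Rect = Tuple[int, int, int, int]


def display_counts(frame_w: int, frame_h: int, ranges: List[Rect]) -> List[List[int]]:
    """Return counts[y][x]: how many reported display ranges cover pixel (x, y)."""
    counts = [[0] * frame_w for _ in range(frame_h)]
    for left, top, width, height in ranges:
        # Clip each reported range to the frame before counting.
        x0, y0 = max(left, 0), max(top, 0)
        x1, y1 = min(left + width, frame_w), min(top + height, frame_h)
        for y in range(y0, y1):
            for x in range(x0, x1):
                counts[y][x] += 1
    return counts


def max_display_count(counts: List[List[int]]) -> int:
    """Largest display count of any pixel in the frame."""
    return max(max(row) for row in counts)


# Three overlapping ranges in a small 8x6 frame (made-up coordinates).
ranges = [(0, 0, 5, 4), (2, 1, 5, 4), (3, 2, 4, 4)]
counts = display_counts(8, 6, ranges)
print(max_display_count(counts))  # prints 3
```

Because each client reports one display range per frame, the count of a pixel equals the number of clients that displayed it. A production implementation would likely use coarser regions or an array library rather than nested loops over pixels.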

The invention according to claim 2 is the server device according to claim 1, wherein the acquisition means acquires the range data from the terminal devices belonging to a terminal device group obtained by classifying the plurality of terminal devices by a preset classification method.

The invention according to claim 3 is the server device according to claim 1 or 2, wherein the calculation means calculates the display count for each pixel as the region, the extraction means extracts from the moving image the pixels whose display count satisfies the criterion, and the generation means generates the representative image based on the pixels extracted by the extraction means.

The invention according to claim 4 is the server device according to any one of claims 1 to 3, wherein the extraction means extracts from the moving image the region reproduced during a preset playback time before and after, and including, the timing at which the display count calculated by the calculation means peaks along the playback time axis of the moving image.
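The peak-based extraction of claim 4 can be sketched as follows. The sketch assumes that one display-count value per frame is already available, that a peak is a strict local maximum of that series, and that the preset playback time is expressed as a number of frames before and after the peak; these choices and the names `peak_windows`, `before`, and `after` are hypothetical.

```python
from typing import List, Tuple


def peak_windows(max_counts: List[int], before: int, after: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame-index windows around each peak.

    A peak is a frame whose count is strictly greater than both neighbours.
    A missing neighbour at either end of the series is treated as lower,
    so the first or last frame can also be a peak (an assumption).
    """
    n = len(max_counts)
    windows = []
    for i, value in enumerate(max_counts):
        prev_value = max_counts[i - 1] if i > 0 else -1
        next_value = max_counts[i + 1] if i < n - 1 else -1
        if value > prev_value and value > next_value:
            # Clamp the window to the valid frame indices.
            windows.append((max(0, i - before), min(n - 1, i + after)))
    return windows


series = [1, 2, 5, 3, 3, 4, 2]
print(peak_windows(series, before=1, after=1))  # prints [(1, 3), (4, 6)]
```

Note that the second window is selected even though its peak value (4) is below the global maximum (5), which matches the stated effect that a local peak is captured even when its count is comparatively small. Plateaus of equal values are not treated as peaks in this sketch.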

The invention according to claim 5 is the server device according to any one of claims 1 to 4, wherein the extraction means extracts from the moving image the region whose display count calculated by the calculation means is equal to or greater than a preset threshold.
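As a rough illustration of the threshold criterion of claim 5, the following sketch selects, per frame, the cells of a display-count grid whose count is at or above a threshold. The dictionary keyed by frame index and the name `regions_at_or_above` are hypothetical conveniences for the example.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # counts[y][x] for one frame (assumed layout)


def regions_at_or_above(counts_by_frame: Dict[int, Grid],
                        threshold: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each frame index to the (x, y) cells whose count >= threshold.

    Frames with no qualifying cell are omitted from the result.
    """
    selected: Dict[int, List[Tuple[int, int]]] = {}
    for frame_index, counts in counts_by_frame.items():
        cells = [(x, y)
                 for y, row in enumerate(counts)
                 for x, value in enumerate(row)
                 if value >= threshold]
        if cells:
            selected[frame_index] = cells
    return selected


frames = {0: [[0, 1], [1, 2]], 1: [[0, 0], [0, 1]]}
print(regions_at_or_above(frames, threshold=2))  # prints {0: [(1, 1)]}
```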

The invention according to claim 6 is the server device according to any one of claims 1 to 5, wherein the extraction means extracts from the moving image an image of the minimum resolution preset for transmission of the moving image, centered on the center of the region having the maximum display count in the frame.
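One possible reading of claim 6 is sketched below: find the bounding box of the cells with the maximum display count, then cut out a rectangle of the minimum resolution centered on that box. Using a bounding box, integer centers, and shifting the crop so that it stays inside the frame are assumptions of this sketch; the function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height), assumed layout


def max_count_bbox(counts: List[List[int]]) -> Rect:
    """Bounding box of all cells that hold the frame's maximum count."""
    peak = max(max(row) for row in counts)
    xs = [x for row in counts for x, value in enumerate(row) if value == peak]
    ys = [y for y, row in enumerate(counts) if peak in row]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def min_resolution_crop(region: Rect, min_w: int, min_h: int,
                        frame_w: int, frame_h: int) -> Rect:
    """Crop of size min_w x min_h centered on the region's center.

    The crop is shifted to stay inside the frame; this assumes the minimum
    resolution is not larger than the frame itself.
    """
    left, top, width, height = region
    cx, cy = left + width // 2, top + height // 2
    crop_left = max(0, min(cx - min_w // 2, frame_w - min_w))
    crop_top = max(0, min(cy - min_h // 2, frame_h - min_h))
    return (crop_left, crop_top, min_w, min_h)


# 8x6 grid whose maximum count (3) occupies columns 3-4 of rows 2-3.
counts = [[0] * 8 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4):
        counts[y][x] = 3
bbox = max_count_bbox(counts)
print(bbox)                                    # prints (3, 2, 2, 2)
print(min_resolution_crop(bbox, 4, 2, 8, 6))   # prints (2, 2, 4, 2)
```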

The invention according to claim 7 is the server device according to any one of claims 1 to 5, wherein the extraction means extracts from the moving image an image that includes the region having the maximum display count in the frame and that has the aspect ratio preset for transmission of the moving image.
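The aspect-ratio variant of claim 7 could be approximated as follows: grow the maximum-count region to the smallest rectangle with the preset aspect ratio that still contains it. The integer scaling step, the centering rule, and the name `expand_to_aspect` are assumptions of this sketch, which also assumes that the region lies inside the frame and that the expanded rectangle fits in the frame.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height), assumed layout


def expand_to_aspect(region: Rect, aspect_w: int, aspect_h: int,
                     frame_w: int, frame_h: int) -> Rect:
    """Smallest aspect_w:aspect_h rectangle (integer multiple) containing region."""
    left, top, width, height = region
    # Smallest integer k with k*aspect_w >= width and k*aspect_h >= height
    # (ceiling division written with floor division on negated values).
    k = max(-(-width // aspect_w), -(-height // aspect_h))
    out_w, out_h = k * aspect_w, k * aspect_h
    # Center the output on the region, then shift it back inside the frame.
    new_left = max(0, min(left - (out_w - width) // 2, frame_w - out_w))
    new_top = max(0, min(top - (out_h - height) // 2, frame_h - out_h))
    return (new_left, new_top, out_w, out_h)


print(expand_to_aspect((100, 100, 300, 100), 16, 9, 1920, 1080))
# prints (98, 65, 304, 171)
```

In the example, the 300x100 region is widened only slightly (to 304) but made considerably taller (to 171) so that the result is exactly 16:9.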

The invention according to claim 8 is the server device according to any one of claims 1 to 5, wherein, when the size of the region whose display count exceeds a preset threshold is equal to or smaller than the minimum resolution preset for transmission of the moving image, the extraction means extracts from the moving image an image of the minimum resolution centered on the center of that region, and when the size of the region whose display count exceeds the threshold is larger than the minimum resolution, the extraction means extracts from the moving image an image that includes that region and has the aspect ratio preset for transmission of the moving image.

The invention according to claim 9 is an information processing method executed in a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, the method comprising: an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; a calculation step of calculating, for each frame and based on the range data acquired in the acquisition step, the number of times each preset region in the frame is displayed on the terminal devices; an extraction step of extracting from the moving image, based on the display counts calculated in the calculation step, the region of the frame whose display count satisfies a preset criterion; and a generation step of generating, based on the region extracted in the extraction step, a representative image representing the content of the moving image.

The invention according to claim 10 is a program that causes a computer included in a server device, which transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, to execute: an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; a calculation step of calculating, for each frame and based on the range data acquired in the acquisition step, the number of times each preset region in the frame is displayed on the terminal devices; an extraction step of extracting from the moving image, based on the display counts calculated in the calculation step, the region of the frame whose display count satisfies a preset criterion; and a generation step of generating, based on the region extracted in the extraction step, a representative image representing the content of the moving image.

According to the invention of any one of claims 1, 9, and 10, a representative image that accurately represents the content of a moving image can be generated without requiring additional data such as metadata.

According to the invention of claim 2, a representative image optimized for the users of the terminal devices belonging to the terminal device group can be generated.

According to the invention of claim 3, a representative image that represents the content of the moving image more accurately can be generated.

According to the invention of claim 4, a representative image can be generated that includes a region whose display count peaks relative to the playback times before and after it, even if the count itself is small.

According to the invention of claim 5, a representative image that includes regions with higher display counts can be generated, so that the content of the moving image is accurately represented.

According to the invention of claim 6, a representative image consisting of images of the minimum resolution can be generated.

According to the invention of claim 7, a representative image that accurately represents the content of the moving image can be generated while maintaining the predetermined aspect ratio.

According to the invention of claim 8, a representative image that accurately represents the content of the moving image can be generated according to the relationship between the size of the region whose display count exceeds the threshold and the predetermined minimum resolution.

FIG. 1 is a diagram showing a schematic configuration example of the communication system of this embodiment.
FIG. 2: (a) to (c) are diagrams each illustrating the calculation of the maximum display count in this embodiment.
FIG. 3: (a) is a diagram illustrating the peak value of the maximum display count in this embodiment. (b) to (e) are diagrams each illustrating the cutting out of the maximum-display-count region in this embodiment.
FIG. 4: (a) is a flowchart showing the user clustering process of this embodiment. (b) is a flowchart showing the image frame extraction process of this embodiment. (c) is a flowchart showing the digest image generation process of this embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(1) Configuration and Operation Overview of Communication System
FIG. 1 is a diagram showing a schematic configuration example of the communication system of this embodiment. As shown in FIG. 1, the communication system S includes a distribution server 1 and a plurality of clients 2. The distribution server 1 of this embodiment is an example of the server device of the present invention. The client 2 is an example of the terminal device of the present invention. The distribution server 1 and the clients 2 can communicate with each other via a network NW. The network NW is made up of, for example, the Internet, a mobile communication network, gateways, and the like.

The distribution server 1 transmits content including moving image data to a client 2, for example in response to a request from that client 2. Moving image data is data representing a moving image composed of a plurality of image frames. The content may include audio data. The content is transmitted, for example, via the network NW using HTTP (Hyper Text Transfer Protocol) live streaming. For example, the distribution server 1 may perform live distribution, in which content sent from a given video camera is distributed in real time while the camera is shooting. Alternatively, the distribution server 1 may store content in advance and distribute the stored content on demand. Taking a moving image shot by a given video camera as an example, a moving image covering the entire shooting range of the video camera is referred to as an entire moving image, and a moving image covering part of the shooting range is referred to as a partial moving image. The distribution server 1 transmits to the client 2 content including moving image data that represents the moving image (which may be the entire moving image or a partial moving image) in the region within the display range designated through the user operation on the client 2 described later.

The client 2 receives the content distributed from the distribution server 1. By playing back the received content, the client 2 displays the moving image represented by the moving image data included in the content on the screen of a display unit 24a described later. During playback of the content, the client 2 accepts simulated camera work operations performed by the user with an operation unit 25a described later. A camera work operation is an operation in which a photographer moves a camera to determine, for example, the position of the camera relative to the subject, the angle (orientation) of the camera relative to the subject, and the size of the subject. In this embodiment, the user performs camera work (moving a virtual camera) on the display target in the plurality of image frames constituting the moving image data of the entire moving image in a simulated manner, by operating the operation unit 25a as if moving an actual camera. Such an operation is referred to as a "display range changing operation".

Through the display range changing operation, the user can designate a desired display range within the entire moving image and can change the display range by that designation. The display range is the range, within the whole image frame constituting the entire moving image, that is displayed on the screen of the display unit 24a. Through the display range changing operation, the user can designate the position coordinates of the display range relative to the entire moving image so that they differ from image frame to image frame. Examples of such changing operations are pan and tilt operations, which change the direction of the viewpoint, referenced to the virtual camera, relative to the image frame. The user can also enlarge or reduce the size of the display range relative to the entire moving image through the display range changing operation. That is, the user can enlarge or reduce the size of the display range without changing the content of the displayed image, so that the image within the display range is enlarged or reduced. An example of such a changing operation is a zoom operation, which changes the angle of view from the viewpoint relative to the image frame. Zoom operations include zoom-in and zoom-out operations. Zooming in narrows the angle of view, and zooming out widens it.

The client 2 receives from the distribution server 1 content including moving image data that represents the moving image in the region within the display range specified by the user's operation. Based on the received moving image data, the client 2 displays on the display unit 24a the moving image in the region of the entire moving image that lies within the display range. The client 2 may be, for example, a personal computer, a smartphone, a mobile phone, a television, or a video game machine.

Meanwhile, the distribution server 1 accepts uploads of camera work data from clients 2 that play back content. Camera work data is information on simulated camera work that indicates the display range of the moving image on the client 2 as the playback time of the content elapses. Camera work data includes a set consisting of the position coordinates and size of the display range relative to the image frame and the playback position of that image frame, at least for each playback position at which a display range changing operation was performed. The position coordinates and size of the display range may be expressed as pan, tilt, and zoom. Pan refers to swinging the video camera (virtual camera) left and right. Tilt refers to swinging the video camera (virtual camera) up and down. Zoom refers to the display magnification. The playback position is the temporal position measured from the start of playback of the moving image data. Because it is the time elapsed since the start of playback, the playback position is also called the playback time. Camera work data is an example of the range data of the present invention.
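The shape of camera work data described above can be sketched as a list of entries, each pairing a playback position with the position and size of the display range. The sketch below assumes that entries are sorted by playback position and that a display range remains in effect until the next recorded change; the field names, millisecond units, and the `range_at` helper are hypothetical and are not defined in the patent text.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CameraWorkEntry:
    """One recorded display range (hypothetical field names and units)."""
    position_ms: int  # playback position, measured from the start of playback
    x: int            # left edge of the display range in the image frame
    y: int            # top edge of the display range in the image frame
    width: int
    height: int


def range_at(entries: List[CameraWorkEntry], position_ms: int) -> Optional[CameraWorkEntry]:
    """Return the entry in effect at position_ms, or None before the first entry.

    Assumes entries are sorted by position_ms and that a display range stays
    in effect until the next change operation is recorded.
    """
    keys = [entry.position_ms for entry in entries]
    index = bisect_right(keys, position_ms)
    return entries[index - 1] if index else None


entries = [
    CameraWorkEntry(0, 0, 0, 1920, 1080),       # start with the entire frame
    CameraWorkEntry(2000, 400, 200, 960, 540),  # zoom in at 2 s
    CameraWorkEntry(5000, 800, 300, 960, 540),  # pan right at 5 s
]
print(range_at(entries, 3000))
```

Recording entries only at change operations keeps the uploaded data small, while a lookup such as `range_at` lets the server recover the display range for any frame when it tallies display counts.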

The communication system S of this embodiment operates as a virtual client system. That is, based on the camera work data transmitted from each client 2, the communication system S transmits from the distribution server 1 to that client 2 content including moving image data that represents the moving image in the region within the display range at that client 2. At this time, the distribution server 1 reads the content including the moving image data to be transmitted to each client 2 from a storage unit 12 and transmits it to each client 2. Each client 2 can thus receive content including moving image data that represents the moving image in the display range corresponding to the camera work operation freely performed by its user, and display it on the display unit 24a.

The distribution server 1 also generates the digest image of this embodiment. The digest image of this embodiment represents the content transmitted to each client 2. It consists of part of the moving image data included in the content, or of one or more pieces of still image data constituting that moving image data. The still image data making up the digest image is, for example, a so-called thumbnail image that represents the content. Moving image data or still image data serving as a digest image is hereinafter referred to simply as "moving image data or the like as a digest image". The moving image data or the like as a digest image is distributed to a client 2, for example in response to a request from that client 2. By playing back the received moving image data or the like, the user of the client 2 can grasp the content represented by the digest image, or an outline of it. The digest image of this embodiment is generated based on the camera work data transmitted from each client 2. The generated moving image data or the like as a digest image is stored, for example, in the distribution server 1 and transmitted in response to a request from a client 2. The generated moving image data or the like as a digest image can be transmitted to the client 2 by, for example, the same method as is used to transmit the content represented by the digest image. The generation of the digest image of this embodiment is described in detail later.

(2) Configuration of Each Device
Next, the configuration of each device included in the communication system S of this embodiment will be described with reference to FIG. 1. As shown in FIG. 1, the distribution server 1 includes a control unit 11, a storage unit 12, an interface unit 13, and the like. These components are connected to a bus 14. The interface unit 13 is connected to the network NW. The interface unit 13 is an example of the acquisition means of the present invention. The control unit 11 includes a CPU (Central Processing Unit) serving as a computer, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The control unit 11 is an example of each of the calculation means, the extraction means, and the generation means of the present invention. The storage unit 12 is, for example, a hard disk drive. The storage unit 12 stores an OS (Operating System), a server program, and the like. The server program causes the CPU to execute processing such as transmission of content or digest images. The storage unit 12 also stores the moving image data and audio data constituting the content. The moving image data and audio data are stored in the storage unit 12 in association with a content ID and with the playback position or playback time of that data within the content. The content ID is identification information for identifying the content.

As shown in FIG. 1, the client 2 includes a control unit 21, a storage unit 22, a video RAM 23, a video control unit 24, an operation processing unit 25, an audio control unit 26, an interface unit 27, and the like. These components are connected to a bus 28. A display unit 24a including a display is connected to the video control unit 24. The control unit 21 includes a CPU serving as a computer, a ROM, a RAM, and the like. An operation unit 25a is connected to the operation processing unit 25. The operation unit 25a includes, for example, a mouse, a keyboard, and a remote controller. A touch panel serving as both the display unit 24a and the operation unit 25a may be used. The control unit 21 receives the user's operation instructions from the operation unit 25a via the operation processing unit 25. A speaker 26a is connected to the audio control unit 26. The interface unit 27 is connected to the network NW. The storage unit 22 is, for example, a hard disk drive or a flash memory. The storage unit 22 stores an OS, player software, and the like. The player software is a program that causes the CPU to execute processing such as reception and playback of content or digest images.

The storage unit 12 of the distribution server 1 also stores a server program that causes the CPU of the control unit 11 to execute processing such as reception of the camera work data. The storage unit 12 further stores the camera work data received from any of the clients 2. Each piece of camera work data is stored in the storage unit 12 in association with a content ID and a user ID. The content ID indicates the content that was being played back by the client 2 in the display range indicated by that camera work data. The user ID is identification information for identifying the user who performed the simulated camera work operation indicated by that camera work data. The user ID also serves as information identifying the client 2 used by that user. The IP address of the client 2 or the like may be used as the terminal device identification information.

(3) Generation and Transmission of the Digest Image According to the Present Embodiment
Next, generation of the digest image of the present embodiment and its transmission to the client 2 will be described with reference to FIGS. 2 and 3. FIG. 2 is a diagram illustrating calculation of the maximum display count of the present embodiment. FIG. 3 is a diagram illustrating the peak values of the maximum display count and the clipping of the maximum-display-count region of the present embodiment. In the communication system S of the present embodiment, a digest image representing the content distributed to the clients 2 is generated in the distribution server 1 and transmitted to a client 2 in response to a request from that client 2. The distribution server 1 generates the digest image for a distributed content based on the camera work data transmitted from each client 2 for that content. Specifically, the distribution server 1 receives camera work data indicating the display range designated at each client 2. The distribution server 1 then generates the digest image for the content using image frames that include an image region whose maximum display count across the clients 2, described later, takes a peak value or the maximum value on the playback time axis of the moving image data included in the distributed content. The image region here is a region consisting of one or more pixels in one image frame.

Suppose that an image frame constituting the moving image data included in the content distributed to one client 2 is the image frame G illustrated in FIG. 2A, and that the display range designated at that client 2 within the image frame G is the display range CW1 shown in FIG. 2. The display range CW1 is a region within the image frame G and consists of one or more pixels. In this case, that client 2 transmits camera work data indicating the display range CW1. In the case of FIG. 2A, the display count at the clients 2 of the image in the region inside the display range CW1 is one, because that image was displayed on the one client 2 that transmitted the camera work data indicating the display range CW1. In contrast, the image in the region of the image frame G outside the display range CW1 is not displayed on any client 2, so its display count is zero. In FIGS. 2A to 2C, the display count at the clients 2 for each region of the image frame G (that is, each region consisting of one or more pixels) is shown in parentheses. The display count is, in other words, the number of clients 2 on which the image of that region of the image frame G was displayed.

Next, suppose that content including moving image data composed of image frames G has been transmitted to, for example, three clients 2. Each client 2 then transmits camera work data indicating one of the display ranges CW1 to CW3 illustrated in FIG. 2B. Suppose that the positions in the image frame G of the display ranges CW1 to CW3 designated at the respective clients 2 are those illustrated in FIG. 2B. In this case, the display ranges CW1 to CW3 overlap within the image frame G as illustrated in FIG. 2B, and the display count of the image for each region of the image frame G is the count shown in parentheses in FIG. 2B. For example, as indicated by 45-degree hatching in FIG. 2B, the display count of the image in the region inside the display range CW1 and outside the display ranges CW2 and CW3 is one. The display count of the image in the region inside the display ranges CW1 and CW2 and outside the display range CW3 is two. Further, as indicated by cross-hatching in FIG. 2B, the display count of the image in the region inside all of the display ranges CW1 to CW3 is three. No region in the image frame G illustrated in FIG. 2B has a display count greater than that of the region inside all of the display ranges CW1 to CW3. Therefore, the maximum value of the display count at the clients 2 among the regions in the image frame G illustrated in FIG. 2B is "3". In the following description, the maximum value of the display count at the clients 2 among the regions in an image frame G is simply referred to as the "maximum display count". That is, the maximum display count corresponding to the image frame G illustrated in FIG. 2B is "3".

Next, suppose that content including moving image data composed of image frames G has been transmitted to, for example, seven clients 2. Each client 2 then transmits camera work data indicating one of the display ranges CW1 to CW7 illustrated in FIG. 2C. Suppose that the positions in the image frame G of the display ranges CW1 to CW7 designated at the respective clients 2 are those illustrated in FIG. 2C. In this case, the display ranges CW1 to CW7 overlap within the image frame G as illustrated in FIG. 2C, and the display count of the image for each region of the image frame G is the count shown in parentheses in FIG. 2C. For example, as indicated by 45-degree hatching in FIG. 2C, the display count of the image in the region inside the display range CW4 and outside the display ranges CW1 to CW3 and CW5 to CW7 is one. As indicated by 135-degree hatching in FIG. 2C, the display count of the image in the region inside the display ranges CW4 and CW2 and outside the display range CW5 is two. Further, as indicated by cross-hatching in FIG. 2C, the display count of the image in the region inside all of the display ranges CW1 to CW5 is five. No region in the image frame G has a display count greater than that of the region inside all of the display ranges CW1 to CW5. Therefore, the maximum display count corresponding to the image frame G illustrated in FIG. 2C is "5".

The distribution server 1 of the present embodiment calculates the maximum display count for each image frame G, as illustrated in FIG. 2, based on the camera work data from the clients 2 belonging to the communication system S. The distribution server 1 temporarily stores the calculated maximum display count for each image frame G in, for example, the storage unit 12. The distribution server 1 then detects, for each image frame G, the maximum display count as a function of playback time over all or part of the content to be represented by the digest image. Each image frame G is displayed on the client 2 at a playback timing that follows the playback of the content; that is, the image frame G is displayed at the playback time of the content to which it corresponds. Accordingly, with playback time on the horizontal axis and the maximum display count on the vertical axis, the maximum display count varies with the playback time of the content as illustrated, for example, in FIG. 3A. The horizontal axis of FIG. 3A may be the number of image frames from a preset playback timing instead of the playback time. In the case illustrated in FIG. 3A, suppose that the maximum display count takes a peak value relative to the preceding and following playback times at playback times of 40 seconds, 190 seconds, and 280 seconds, and that the corresponding peaks are peaks P1 to P3, respectively. That the maximum display count peaks at P1 to P3 means that the image frames corresponding to peaks P1 to P3 were displayed on more clients 2 than the image frames before and after them. In this case, the image frames corresponding to peaks P1 to P3 can be said to represent the content whose variation in maximum display count is shown in FIG. 3A. Therefore, in the case illustrated in FIG. 3A, the distribution server 1 generates a digest image representing that content based on the image frames corresponding to peaks P1 to P3. For content in which the maximum display count simply increases or decreases along the playback time, the digest image may instead be generated based on the image frame corresponding to the maximum value of the maximum display count rather than on peaks such as P1. In that case, the image frame corresponding to that maximum value takes the place of the image frame corresponding to a peak such as P1. Further, instead of using the maximum display count for each image frame, the distribution server 1 may generate the digest image based on regions in the image frames whose display count at the clients 2 is equal to or greater than a preset threshold. The distribution server 1 then transmits the generated digest image, such as moving image data, to a client 2 in response to a request from that client 2.
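
As a non-limiting illustration, the per-frame calculation described for FIG. 2 can be sketched in Python as follows. The sketch assumes that each display range is an axis-aligned rectangle given as (left, top, width, height) and that a region is a single pixel; the function names `display_counts` and `max_display_count` are hypothetical and do not appear in the embodiment.

```python
from typing import List, Tuple

# Assumed layout of a display range: (left, top, width, height) in pixels.
Rect = Tuple[int, int, int, int]


def display_counts(frame_w: int, frame_h: int, ranges: List[Rect]) -> List[List[int]]:
    """Return a per-pixel map of how many clients displayed each pixel of one frame."""
    counts = [[0] * frame_w for _ in range(frame_h)]
    for left, top, w, h in ranges:
        # Clamp each display range to the frame before counting.
        for y in range(max(top, 0), min(top + h, frame_h)):
            row = counts[y]
            for x in range(max(left, 0), min(left + w, frame_w)):
                row[x] += 1
    return counts


def max_display_count(counts: List[List[int]]) -> int:
    """Return the largest display count found in the frame (0 if nothing was displayed)."""
    return max((c for row in counts for c in row), default=0)
```

For three illustrative ranges (0, 0, 6, 6), (3, 3, 6, 6) and (4, 0, 4, 5) in a 10 x 10 frame, the pixels covered by all three ranges receive a count of 3, which is then the maximum display count of that frame. A per-pixel map is the simplest choice here; coarser regions would reduce memory at the cost of precision.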

Next, the method of generating a digest image representing the content whose variation in maximum display count is shown in FIG. 3A will be described concretely with reference to FIGS. 3B and 3E. The following description takes as an example the generation of a digest image based on the image frame corresponding to the peak P2 shown in FIG. 3A. When generating the digest image of the present embodiment based on the image frame for the peak P2, the distribution server 1 first extracts, based on that image frame, the image frames to be used for generating the digest image from the moving image data representing the entire moving image. Two methods are conceivable for this extraction of image frames by the distribution server 1. The first extraction method, illustrated in FIG. 3B, extracts the image frames played back during a preset playback period DA1 that includes the timing of the peak P2 and extends before and after it. That is, as the first method, the distribution server 1 extracts from the moving image data the image frames played back during a preset playback period before and after, and including, the timing at which the maximum display count reaches the peak P2 on the playback time axis of the moving image data. In this case, the distribution server 1 extracts, for example, the image frames played back during the 5 seconds before and the 5 seconds after the timing of the peak P2, for a total of 10 seconds of image frames. The second extraction method, illustrated in FIG. 3C, extracts the image frames played back during a playback period DA2 in which the maximum display count is equal to or greater than the value obtained by multiplying the maximum display count at the peak P2 by a preset coefficient less than 1. That is, as the second method, the distribution server 1 extracts from the moving image data the image frames whose maximum display count is equal to or greater than a preset value (ratio). In this case, the distribution server 1 extracts the image frames played back during the playback period DA2 corresponding to maximum display counts equal to or greater than the maximum display count L obtained by multiplying the maximum display count at the peak P2 by a preset coefficient of, for example, 0.9.
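
The two extraction methods can be sketched, purely as an illustration, in Python. The sketch assumes that the maximum display count is available as one value per frame index, that the frame rate is known, and that the period DA2 is the contiguous run of frames around the peak that stays at or above the level L; the last point is an interpretation of FIG. 3C, and the function names are hypothetical.

```python
from typing import List


def extract_window(num_frames: int, peak: int, fps: float,
                   seconds_each_side: float = 5.0) -> range:
    """First method: frames within a fixed period (DA1) before and after the peak frame."""
    half = int(round(seconds_each_side * fps))
    start = max(peak - half, 0)
    stop = min(peak + half + 1, num_frames)
    return range(start, stop)


def extract_by_ratio(series: List[float], peak: int, coeff: float = 0.9) -> range:
    """Second method: frames around the peak whose value is at least coeff * peak value."""
    level = series[peak] * coeff  # corresponds to the level L in FIG. 3C
    start = peak
    while start > 0 and series[start - 1] >= level:
        start -= 1
    stop = peak + 1
    while stop < len(series) and series[stop] >= level:
        stop += 1
    return range(start, stop)
```

With a fixed window the digest length is predictable, whereas the ratio-based method adapts the length to how long viewer attention stays near the peak.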

Next, the distribution server 1 clips, from each image frame extracted by the method illustrated in FIG. 3B or FIG. 3C, a region of that image frame to be used for generating the digest image. The distribution server 1 clips this region from the image frame based on, for example, the minimum display resolution and the aspect ratio of the clients 2 that are preset for content distribution by the communication system S. The region of the image frame used for the digest image is a region consisting of one or more pixels. Two methods are conceivable for this clipping by the distribution server 1. The first clipping method, illustrated in FIG. 3D, applies when, in the image frame G used for generating the digest image, the size of the region M having the maximum display count in that image frame G is equal to or less than the minimum resolution. In this case, the distribution server 1 clips from the image frame G, for digest image generation, a region AR that has the size of the minimum resolution and is centered on the center of the region M illustrated in FIG. 3D. The region M illustrated in FIG. 3D is then smaller than the region AR of the minimum-resolution size. The second clipping method, illustrated in FIG. 3E, applies when the size of the region M having the maximum display count in the image frame G used for generating the digest image is larger than the minimum resolution. In this case, the distribution server 1 clips from the image frame G, for digest image generation, a region AR that includes the region M illustrated in FIG. 3E and has the above aspect ratio.
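
As a non-limiting sketch, the choice between the two clipping methods might look as follows in Python. It assumes that the region M is given by its bounding rectangle, that "equal to or less than the minimum resolution" means both width and height fit within it, that the aspect ratio is that of the minimum resolution, and that the frame is at least as large as the clipped region. The function name `clip_region` is hypothetical.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height)


def clip_region(m: Rect, frame: Tuple[int, int], min_res: Tuple[int, int]) -> Rect:
    """Return the region AR to clip for a maximum-display-count region M."""
    left, top, w, h = m
    frame_w, frame_h = frame
    min_w, min_h = min_res
    if w <= min_w and h <= min_h:
        # First method: M fits, so use the minimum resolution itself.
        out_w, out_h = min_w, min_h
    elif w * min_h >= h * min_w:
        # Second method, M relatively wide: keep its width, round height up.
        out_w, out_h = w, -(-w * min_h // min_w)
    else:
        # Second method, M relatively tall: keep its height, round width up.
        out_w, out_h = -(-h * min_w // min_h), h
    # Centre AR on M, then shift it back inside the frame if necessary.
    out_left = left + (w - out_w) // 2
    out_top = top + (h - out_h) // 2
    out_left = min(max(out_left, 0), max(frame_w - out_w, 0))
    out_top = min(max(out_top, 0), max(frame_h - out_h, 0))
    return (out_left, out_top, out_w, out_h)
```

Because of integer rounding, the result in the second case matches the aspect ratio only to within one pixel. Shifting AR back inside the frame near the borders is a design choice of this sketch and is not stated in the embodiment.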

The distribution server 1 then generates a digest image representing the content whose variation in maximum display count is shown in FIG. 3A, using the region AR clipped, by the clipping method illustrated in either FIG. 3D or FIG. 3E, from the image frames G extracted by the extraction method illustrated in either FIG. 3B or FIG. 3C. The distribution server 1 repeats the generation of the digest image for each of the peaks P1 to P3 illustrated in FIG. 3A. The distribution server 1 then temporarily stores the generated digest image in the storage unit 12 and transmits it to a client 2 in response to a request from that client 2.

The distribution server 1 may classify the plurality of clients 2 constituting the communication system S in advance based on, for example, preset attributes of their respective users, and generate the digest image of the present embodiment for each of the resulting client groups. Classifying the clients 2 into a plurality of client groups amounts to classifying the users of those clients 2. Such classification is hereinafter referred to as "clustering", and each of the resulting client groups is hereinafter referred to as a "cluster". In this case, the distribution server 1 stores information indicating each cluster in the storage unit 12 together with the user IDs identifying the clients 2 belonging to it. The distribution server 1 receives the camera work data from the clients 2 belonging to each cluster on a per-cluster basis, calculates the maximum display count for each cluster, and detects, for each cluster, the variation of the maximum display count over playback time as illustrated in FIG. 3A. Based on the detected variation, the distribution server 1 generates, for each cluster, the digest image of the present embodiment using the image frames corresponding to the peaks. The distribution server 1 transmits each generated digest image to the cluster to which the clients 2 that transmitted the camera work data used for its generation belong.
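
The per-cluster handling can be illustrated, without limitation, by first grouping the received display ranges by cluster and frame, after which the maximum display count is computed separately for each group. In the Python sketch below, the record layout (user ID, frame index, display range) and the name `group_by_cluster` are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Rect = Tuple[int, int, int, int]   # (left, top, width, height)
Record = Tuple[str, int, Rect]     # (user_id, frame_index, display range)


def group_by_cluster(records: Iterable[Record],
                     cluster_of: Dict[str, int]) -> Dict[int, Dict[int, List[Rect]]]:
    """Group display ranges as cluster -> frame index -> list of ranges."""
    grouped: Dict[int, Dict[int, List[Rect]]] = defaultdict(lambda: defaultdict(list))
    for user_id, frame_index, rect in records:
        cluster = cluster_of.get(user_id)
        if cluster is None:
            continue  # users without a cluster are skipped in this sketch
        grouped[cluster][frame_index].append(rect)
    # Convert to plain dicts so that lookups of missing keys raise KeyError.
    return {c: dict(frames) for c, frames in grouped.items()}
```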

When one image frame corresponding to any of the peaks P1 to P3 contains a plurality of regions having the maximum display count, the digest image may be generated using those plural regions. The generated digest image may be used, for example, on a list screen that lists the distributable content or on a description screen for each content; it may also be used for scene selection and the like within content being played back. The digest image of the present embodiment can be used not only for the on-demand distribution described above but also for live distribution. In live distribution, the digest image may be generated in real time and displayed, for example, during an intermission of the live event. It may further be used to make it easier for people who join a live event partway through (viewers who start watching midway) to catch up, by presenting the digest image to them.

(4) Operation of the Communication System S
Next, the operation of the communication system S will be described with reference to FIG. 4. FIG. 4A is a flowchart showing the user clustering process of the present embodiment. FIG. 4B is a flowchart showing the image frame extraction process of the present embodiment. FIG. 4C is a flowchart showing the digest image generation process of the present embodiment.

(I) User Clustering Process in the Distribution Server 1
The user clustering process of the present embodiment will be described with reference to FIG. 4A. The control unit 11 of the distribution server 1 in the communication system S of the present embodiment starts the user clustering process shown in FIG. 4A, for example, at every preset period. First, in step S1, the control unit 11 calculates a user feature amount indicating the characteristics of the user of each client 2. Here, when a user requests to join the communication system S in order to receive content distributed by it, preset user information describing that user is stored in the storage unit 12 in association with the user ID identifying that user. Examples of the user information include age information indicating the user's age, gender information indicating the user's gender, and hobby information indicating the user's hobbies and preferences. The control unit 11 also stores viewing history information on past content at each client 2 included in the communication system S in the storage unit 12 in association with the user ID and the content ID. In step S1, the control unit 11 calculates, for each user and by a preset method, the user feature amount indicating that user's characteristics based on the user information and the viewing history information stored for that user. Based on the user feature amounts calculated in step S1, the control unit 11 then clusters the users of the clients 2 included in the communication system S at that time, using a conventional clustering technique such as hierarchical clustering (step S2). Step S2 produces clusters of users according to their user feature amounts. The control unit 11 then stores information indicating the clusters generated in step S2 in the storage unit 12 together with the user IDs identifying the clients 2 belonging to them (step S3). Thereafter, the control unit 11 ends the user clustering process.
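
Step S2 names hierarchical clustering only as one example of a conventional technique. The following Python sketch shows one such technique, single-linkage agglomerative clustering, under the assumption that each user feature amount is a numeric vector and that the desired number of clusters is given; neither the feature layout nor the function name `cluster_users` comes from the embodiment.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence


def cluster_users(features: Dict[str, Sequence[float]], num_clusters: int) -> Dict[str, int]:
    """Merge the two closest clusters repeatedly until num_clusters remain."""
    # Start with one cluster per user (sorted for a deterministic result).
    clusters: List[List[str]] = [[uid] for uid in sorted(features)]

    def linkage(a: List[str], b: List[str]) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(dist(features[u], features[v]) for u in a for v in b)

    while len(clusters) > max(num_clusters, 1):
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because i < j
    # Map each user ID to the index of its cluster (the information stored in step S3).
    return {uid: idx for idx, members in enumerate(clusters) for uid in members}
```

This direct formulation recomputes distances on every merge and is only practical for small user counts; a production system would more likely rely on an existing clustering library.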

(II) Image Frame Extraction Process in the Distribution Server 1
The image frame extraction process of the present embodiment will be described with reference to FIG. 4B. The process described below is the image frame extraction process for the case where the digest image of the present embodiment is generated as a moving image consisting of a plurality of image frames. It extracts the scenes of the content from which the digest image of the present embodiment is to be generated. When the control unit 11 of the distribution server 1 generates, for example on an instruction from its administrator, the digest image of the present embodiment as a moving image for a content that has already been distributed, it first performs the image frame extraction process shown in FIG. 4B. First, in step S10, for each cluster generated by the user clustering process shown in FIG. 4A, the control unit 11 extracts from the storage unit 12 and aggregates the camera work data transmitted from the clients 2 belonging to that cluster (see FIG. 2B or FIG. 2C). That is, the control unit 11 acquires the camera work data indicating the display range of the content at each client 2 from the clients 2 on a per-cluster basis, stores it in the storage unit 12, and then extracts and aggregates it. Next, based on the camera work data aggregated in step S10, the control unit 11 calculates the maximum display count in each image frame by the method described with reference to FIG. 2 or FIG. 3 (step S11). That is, based on the acquired camera work data, the control unit 11 calculates, for each image frame, the maximum display count over the predetermined regions in that image frame. Next, the control unit 11 applies preset smoothing, for example a moving average, to the maximum display counts calculated for the image frames (step S12). Next, the control unit 11 uses the result of step S12 to calculate the peak values of the maximum display count in the content (step S13). The peak values calculated in step S13 are, for example, the maximum display counts at the peaks P1 to P3 illustrated in FIG. 3A. Next, the control unit 11 rearranges the peak values calculated in step S13 from playback-time order (see FIG. 3A) into descending order of peak value. The control unit 11 then selects, from the rearranged peak values and starting with the largest, as many peak values as the number of image frames to be used for generating the digest image, and extracts from the storage unit 12 the plurality of image frames corresponding to the selected peak values (step S14). That is, based on the calculated maximum display counts, the control unit 11 extracts a plurality of image frames corresponding to those maximum display counts. More specifically, as a first example of step S14, the control unit 11 extracts the plurality of image frames played back during a preset playback period (see reference sign "DA1" in FIG. 3B) that includes, and extends before and after, the timing in playback time of each peak having a selected peak value (one example being the peak P2 illustrated in FIG. 3B). This first example of step S14 corresponds to the first extraction method described with reference to FIG. 3B. Alternatively, as a second example of step S14, the control unit 11 extracts the plurality of image frames played back during a playback period (see reference sign "DA2" in FIG. 3C) in which the value is equal to or greater than the value (see reference sign "L" in FIG. 3C) obtained by multiplying each peak having a selected peak value (one example being the peak P2 illustrated in FIG. 3C) by a preset coefficient less than 1. This second example of step S14 corresponds to the second extraction method described with reference to FIG. 3C. Thereafter, the control unit 11 ends the image frame extraction process of the present embodiment.
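
Steps S12 to S14 can be illustrated, in a non-limiting way, by the Python sketch below. It assumes a centered moving average for the smoothing, defines a peak as a strict local maximum of the smoothed series, and returns the largest peaks first; the window size and all function names are assumptions of this sketch.

```python
from typing import List, Tuple


def moving_average(series: List[float], window: int = 3) -> List[float]:
    """Step S12: centered moving average, with shorter windows at both ends."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(i - half, 0), min(i + half + 1, len(series))
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def find_peaks(series: List[float]) -> List[Tuple[int, float]]:
    """Step S13: (frame index, value) of every strict local maximum."""
    return [(i, series[i]) for i in range(1, len(series) - 1)
            if series[i - 1] < series[i] > series[i + 1]]


def top_peaks(series: List[float], count: int, window: int = 3) -> List[Tuple[int, float]]:
    """Step S14 (selection part): the `count` largest peaks, largest first."""
    peaks = find_peaks(moving_average(series, window))
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:count]
```

Flat-topped peaks are not reported by the strict comparison used here, so a real implementation would need an explicit rule for plateaus.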

(III) Digest image generation process in the distribution server 1
The digest image generation process of the present embodiment will be described with reference to FIG. 4C.
The digest image generation process described below is the process used when the digest image of the present embodiment is generated as a moving image composed of a plurality of image frames. The control unit 11 of the distribution server 1 starts the digest image generation process shown in FIG. 4C, for example, when the image frame extraction process shown in FIG. 4B ends. The process shown in FIG. 4C is executed, for example, for the peak of each peak value extracted in step S14 of FIG. 4B. First, in step S20, the control unit 11 selects, from the image frames extracted for each peak by the image frame extraction process shown in FIG. 4B, an image frame to be subjected to the digest image generation process. Next, in step S21, the control unit 11 acquires information indicating the size of the region of the maximum display count in the image frame selected in step S20 (see symbol M in FIG. 3D or FIG. 3E). Next, the control unit 11 determines whether the size of the region acquired in step S21 is larger than the minimum resolution preset in the communication system S (step S22). If the size of the maximum-display-count region in the image frame is equal to or less than the minimum resolution (step S22: NO), the control unit 11 cuts out from the image frame selected in step S20, for digest image generation, a region that is centered on the center of the maximum-display-count region M, has the size of the minimum resolution, and has the above aspect ratio (step S24; see FIG. 3D). That is, based on the calculated maximum display count, the control unit 11 extracts the region of the image frame having the maximum display count.
On the other hand, if the size of the maximum-display-count region in the image frame is larger than the minimum resolution (step S22: YES), the control unit 11 cuts out from the image frame selected in step S20, for digest image generation, a region that includes the maximum-display-count region and has the above aspect ratio (step S23; see FIG. 3E). The control unit 11 then generates a digest image representing the distributed content, using the region cut out in step S23 or step S24 from the image frame selected in step S20, and stores it in the storage unit 12 (step S25). That is, the control unit 11 generates a digest image representing the content based on the cut-out region. More specifically, the control unit 11 uses the region cut out in step S23 or step S24 from the image frame selected in step S20 as the corresponding image frame of the digest image representing the content. Thereafter, the control unit 11 determines whether the digest image generation process has been completed for all the image frames extracted for each peak by the image frame extraction process shown in FIG. 4B (step S26). If it has not been completed for all of those image frames (step S26: NO), the control unit 11 returns to step S20 and selects the next image frame to be subjected to the digest image generation process. If it has been completed for all of those image frames (step S26: YES), the control unit 11 ends the digest image generation process.
The moving image data or other data generated as the digest image by the digest image generation process shown in FIG. 4C is transmitted, for example, for each cluster of step S10 in FIG. 4B, to a client 2 belonging to that cluster in response to a request from that client 2.
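The branch in steps S22 to S24 can be sketched as a crop-rectangle calculation. Several points here are assumptions rather than statements of the source: the function name `crop_rect` is hypothetical, "not larger than the minimum resolution" is read as fitting inside the minimum-resolution rectangle in both width and height, the step S23 rectangle is taken to be the smallest one of the given aspect ratio that contains the region, and the result is shifted to stay inside the frame.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height) in pixels


def crop_rect(region: Rect, min_res: Tuple[float, float], aspect: float,
              frame: Tuple[float, float]) -> Rect:
    """Return the rectangle to cut out of one image frame.

    region  : maximum-display-count region M
    min_res : (width, height) of the minimum resolution
    aspect  : preset aspect ratio as width / height
    frame   : (width, height) of the image frame
    """
    left, top, w, h = region
    min_w, min_h = min_res
    frame_w, frame_h = frame
    cx, cy = left + w / 2, top + h / 2  # center of region M

    if w <= min_w and h <= min_h:
        # Step S22: NO -> step S24: minimum-resolution rectangle around the center.
        out_w, out_h = min_w, min_h
    else:
        # Step S22: YES -> step S23: smallest rectangle of the preset aspect
        # ratio that contains the whole region (assumption of this sketch).
        out_w = max(w, h * aspect)
        out_h = out_w / aspect

    # Shift the rectangle so that it stays inside the frame. A rectangle
    # larger than the frame is not handled and is simply pinned to the origin.
    x = min(max(cx - out_w / 2, 0), max(frame_w - out_w, 0))
    y = min(max(cy - out_h / 2, 0), max(frame_h - out_h, 0))
    return (x, y, out_w, out_h)


# Hypothetical values: a 1920x1080 frame and a 640x360 minimum resolution.
small = crop_rect((900, 500, 200, 100), (640, 360), 16 / 9, (1920, 1080))
large = crop_rect((400, 300, 960, 400), (640, 360), 16 / 9, (1920, 1080))
```

Padding a small region up to the minimum resolution keeps the digest image from being upscaled from only a few pixels, which appears to be the purpose of the step S24 branch.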

As described above, according to the present embodiment, the number of times each region in an image frame is displayed at the clients 2 is calculated for each image frame, based on the camera work data acquired from each client 2. A region of the image frame whose calculated display count satisfies a preset criterion (for example, the maximum display count, or a display count equal to or greater than a predetermined threshold) is then extracted, and the digest image is generated based on the extracted region. A digest image that accurately represents the content can therefore be generated without requiring additional data such as metadata.
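The counting and extraction summarized here can be sketched for a single image frame as follows. The grid-cell model of a "region", the rectangle format of the camera work data, and the names `display_counts` and `region_meeting_criterion` are assumptions made for illustration; the source does not fix a data format.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height) in grid cells


def display_counts(ranges: List[Rect], cols: int, rows: int) -> List[List[int]]:
    """Count, for one image frame, how many clients displayed each grid cell.

    Each entry of `ranges` is the display range reported by one client.
    Parts of a range that fall outside the grid are ignored.
    """
    counts = [[0] * cols for _ in range(rows)]
    for left, top, w, h in ranges:
        for y in range(max(top, 0), min(top + h, rows)):
            for x in range(max(left, 0), min(left + w, cols)):
                counts[y][x] += 1
    return counts


def region_meeting_criterion(counts: List[List[int]],
                             threshold: Optional[int] = None) -> Tuple[int, Rect]:
    """Return the criterion level and the bounding box of the cells meeting it.

    With threshold=None the criterion is the maximum display count in the
    frame; otherwise it is a display count of at least `threshold`.
    """
    level = max(max(row) for row in counts) if threshold is None else threshold
    cells = [(x, y) for y, row in enumerate(counts)
             for x, c in enumerate(row) if c >= level]
    if not cells:
        return level, (0, 0, 0, 0)  # no cell satisfies the criterion
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return level, (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Hypothetical display ranges from three clients on a 6x4 grid.
ranges = [(0, 0, 4, 3), (2, 1, 4, 3), (3, 2, 2, 2)]
counts = display_counts(ranges, cols=6, rows=4)
max_level, max_region = region_meeting_criterion(counts)
thr_level, thr_region = region_meeting_criterion(counts, threshold=2)
```

Because the counts come only from what viewers actually displayed, no scene metadata is needed, which matches the effect stated above.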

When generating the digest image as a still image, the control unit 11 extracts only the image frames having the respective peak values extracted in step S14 of FIG. 4B (that is, only the image frames whose maximum display count is a peak value). The control unit 11 then uses the regions cut out from the extracted image frames by the method shown in FIG. 4C to generate a digest image representing the content that includes the moving image data composed of those image frames. Alternatively, the control unit 11 may generate the digest image using image frames reproduced, for example, at predetermined reproduction-time intervals before and after each image frame having a peak value extracted in step S14 of FIG. 4B.

DESCRIPTION OF SYMBOLS
1 Distribution server
2 Client
11, 21 Control unit
12, 22 Storage unit
13, 27 Interface unit
24a Display unit
25a Operation unit
S Communication system
NW Network
G Image frame
P1, P2, P3 Peak
DA1, DA2 Reproduction time
M, AR Region

Claims

A server device that transmits a moving image including a plurality of frames to each of a plurality of terminal devices that display the moving image, the server device comprising:
acquisition means for acquiring, from each terminal device, range data indicating the display range of the moving image in that terminal device;
calculation means for calculating, for each frame, the display count in the terminal devices for each preset region in the frame, based on the range data acquired by the acquisition means;
extraction means for extracting from the moving image, based on the display count calculated by the calculation means, the region of the frame that satisfies a preset criterion for the display count; and
generation means for generating a representative image representing the content of the moving image, based on the region extracted by the extraction means.

The server device according to claim 1,
wherein the acquisition means acquires the range data from the terminal devices belonging to a terminal device group obtained by classifying the plurality of terminal devices by a preset classification method.

The server device according to claim 1 or 2, wherein
the calculation means calculates the display count for each pixel as the region,
the extraction means extracts from the moving image the pixels satisfying the criterion for the display count, and
the generation means generates the representative image based on the pixels extracted by the extraction means.

The server device according to any one of claims 1 to 3,
wherein the extraction means extracts from the moving image the region reproduced within a preset reproduction time that precedes, follows, and includes a timing at which the display count calculated by the calculation means reaches a peak along the reproduction time axis of the moving image.

The server device according to any one of claims 1 to 4,
wherein the extraction means extracts from the moving image the region in which the display count calculated by the calculation means is greater than or equal to a preset threshold.

The server device according to any one of claims 1 to 5,
wherein the extraction means extracts from the moving image an image of a minimum resolution preset for transmission of the moving image, centered on the center of the region where the display count is maximum in the frame.

The server device according to any one of claims 1 to 5,
wherein the extraction means extracts from the moving image an image that includes the region where the display count is maximum in the frame and has an aspect ratio preset for transmission of the moving image.

The server device according to any one of claims 1 to 5, wherein
when the size of the region in which the display count is greater than a preset threshold is less than or equal to a minimum resolution preset for transmission of the moving image, the extraction means extracts from the moving image an image of the minimum resolution centered on the center of that region, and
when the size of the region in which the display count is greater than the threshold is greater than the minimum resolution, the extraction means extracts from the moving image an image that includes that region and has an aspect ratio preset for transmission of the moving image.

An information processing method executed in a server device that transmits a moving image including a plurality of frames to each of a plurality of terminal devices that display the moving image, the method comprising:
an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image in that terminal device;
a calculation step of calculating, for each frame, the display count in the terminal devices for each preset region in the frame, based on the range data acquired in the acquisition step;
an extraction step of extracting from the moving image, based on the display count calculated in the calculation step, the region of the frame that satisfies a preset criterion for the display count; and
a generation step of generating a representative image representing the content of the moving image, based on the region extracted in the extraction step.

A program causing a computer included in a server device that transmits a moving image including a plurality of frames to each of a plurality of terminal devices that display the moving image to execute:
an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image in that terminal device;
a calculation step of calculating, for each frame, the display count in the terminal devices for each preset region in the frame, based on the range data acquired in the acquisition step;
an extraction step of extracting from the moving image, based on the display count calculated in the calculation step, the region of the frame that satisfies a preset criterion for the display count; and
a generation step of generating a representative image representing the content of the moving image, based on the region extracted in the extraction step.