
CN108513088B - Method and device for group video session - Google Patents

Method and device for group video session

Info

Publication number
CN108513088B
Authority
CN
China
Prior art keywords
user
virtual
dimensional
video data
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710104439.2A
Other languages
Chinese (zh)
Other versions
CN108513088A (en)
Inventor
李凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201710104439.2A priority Critical patent/CN108513088B/en
Priority to PCT/CN2018/075749 priority patent/WO2018153267A1/en
Priority to TW107106428A priority patent/TWI650675B/en
Publication of CN108513088A publication Critical patent/CN108513088A/en
Priority to US16/435,733 priority patent/US10609334B2/en
Application granted granted Critical
Publication of CN108513088B publication Critical patent/CN108513088B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working
    • H04N 7/141: Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/147: Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a method and a device for group video sessions, and belongs to the field of Virtual Reality (VR). The method comprises the following steps: creating a group video session; for each user in the group video session, determining the user type of the user according to the device information of the user, wherein the user type comprises a common user and a virtual user, the common user indicating that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user indicating that the user adopts a virtual reality display mode when participating in the group video session; processing the video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, wherein the video display mode of the target video data matches the video display mode indicated by the user type of the user; and transmitting the target video data to the user equipment of the user during the group video session. The invention provides strong flexibility for group video sessions.

Description

Method and device for group video session
Technical Field
The present invention relates to the field of VR (Virtual Reality) technologies, and in particular, to a method and an apparatus for group video session.
Background
VR technology is a technology for creating and experiencing a virtual world: it can simulate a realistic environment and intelligently sense the behavior of a user, so that the user feels immersed in the scene. Therefore, the application of VR technology to social interaction is receiving much attention, and methods for conducting group video sessions based on VR technology have emerged accordingly.
At present, during a group video session, a server may create a virtual environment for a plurality of virtual users using VR devices, superimpose a virtual character selected by the virtual users with the virtual environment to express images of the virtual users in the virtual environment, and then the server may send videos of the virtual users superimposed with the images to the virtual users, so as to bring visual and auditory experiences to the virtual users, and make the virtual users seem to talk with other virtual users in a virtual world.
In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
a virtual user can only conduct a group video session with other virtual users. At present, while VR devices are not yet widespread, the large number of common users who do not use VR devices face great communication obstacles with virtual users, so the group video session is strongly limited and poorly flexible.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for group video session. The technical scheme is as follows:
in a first aspect, a method for group video session is provided, the method comprising:
creating a group video session;
for each user in the group video session, determining a user type of the user according to the device information of the user, wherein the user type comprises a common user and a virtual user, the common user is used for indicating the user to adopt a two-dimensional display mode when participating in the group video session, and the virtual user is used for indicating the user to adopt a virtual reality display mode when participating in the group video session;
processing the video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, wherein the video display mode of the target video data is matched with the video display mode indicated by the user type of the user;
and in the process of the group video session, sending target video data to user equipment of the user to enable the user to carry out the group video session.
In a second aspect, a method for group video session is provided, the method comprising:
receiving target video data of a group video session sent by a server, wherein a video display mode of the target video data is matched with a video display mode indicated by a user type of an end user, the user type of the end user is a common user, and the common user is used for indicating the end user to adopt a two-dimensional display mode when participating in the group video session;
and displaying the target video data, so that common users in the group video session are displayed in a two-dimensional character form, and virtual users in the group video session are displayed in a two-dimensional virtual character form.
In a third aspect, a method for group video session is provided, the method comprising:
receiving target video data of a group video session sent by a server, wherein a video display mode of the target video data is matched with a video display mode indicated by a user type of a VR device user, the user type of the VR device user is a virtual user, and the virtual user is used for indicating that the VR device user adopts a virtual reality display mode when participating in the group video session;
and displaying the target video data, so that common users in the group video session are displayed in the virtual environment in the form of two-dimensional or three-dimensional characters, and virtual users in the group video session are displayed in the virtual environment in the form of three-dimensional virtual characters.
In a fourth aspect, an apparatus for group video sessions is provided, the apparatus comprising:
a creation module to create a group video session;
a determining module, configured to determine, for each user in the group video session, a user type of the user according to device information of the user, where the user type includes a general user and a virtual user, the general user is used to instruct the user to use a two-dimensional display mode when participating in the group video session, and the virtual user is used to instruct the user to use a virtual reality display mode when participating in the group video session;
the processing module is used for processing the video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, and the video display mode of the target video data is matched with the video display mode indicated by the user type of the user;
and the sending module is used for sending the target video data to the user equipment of the user in the process of the group video session so as to enable the user to carry out the group video session.
In a fifth aspect, an apparatus for group video session is provided, the apparatus comprising:
the receiving module is used for receiving target video data of a group video session sent by a server, wherein the video display mode of the target video data is matched with a video display mode indicated by a user type of an end user, the user type of the end user is a common user, and the common user is used for indicating the end user to adopt a two-dimensional display mode when participating in the group video session;
and the display module is used for displaying the target video data, so that common users in the group video session are displayed in a two-dimensional character form, and virtual users in the group video session are displayed in a two-dimensional virtual character form.
In a sixth aspect, an apparatus for group video sessions is provided, the apparatus comprising:
a receiving module, configured to receive target video data of a group video session sent by a server, where a video display mode of the target video data matches a video display mode indicated by a user type of a VR device user, the user type of the VR device user is a virtual user, and the virtual user is used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
and the display module is used for displaying the target video data, so that common users in the group video session are displayed in the virtual environment in the form of two-dimensional or three-dimensional characters, and virtual users in the group video session are displayed in the virtual environment in the form of three-dimensional virtual characters.
According to the embodiment of the invention, the user type of each user in the group video session is determined, and the video data of the group video session is processed according to the user type, so that when the user type is a virtual user, the target video data matched with the virtual reality display mode indicated by the virtual user can be obtained, and when the user type is a common user, the target video data matched with the two-dimensional display mode indicated by the common user can be obtained, so that the video data can be displayed in reasonable display modes for different types of users, the group video session can be performed among the users of different types without limitation, and the flexibility of the group video session is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an implementation environment of a group video session according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for group video session according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a user display location provided by an embodiment of the invention;
fig. 4 is a schematic diagram of a group video session scene according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a display scenario provided by an embodiment of the present invention;
fig. 6 is a flowchart of a virtual user conducting a group video session according to an embodiment of the present invention;
fig. 7 is a block diagram of an apparatus for group video session according to an embodiment of the present invention;
fig. 8 is a block diagram of an apparatus for group video session according to an embodiment of the present invention;
fig. 9 is a block diagram of an apparatus for group video session according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 11 is a block diagram of an apparatus 1100 for group video session according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a group video session according to an embodiment of the present invention. Referring to fig. 1, the implementation environment includes:
at least one terminal 101 (e.g., a mobile terminal and a tablet), at least one VR device 102, and at least one server 103. The interaction process of the terminal 101, the VR device 102 and the server 103 may correspond to the process of the group video session in the following embodiments; the server 103 is configured to create a group video session for different types of users, receive and process video data transmitted by the terminal 101 and the VR device 102, and transmit the processed video data to the terminal 101 or the VR device 102, so that the group video session can be performed between the different types of users. The terminal 101 is configured to send video data captured by the camera to the server 103 in real time, and receive and display the video data processed by the server 103. The VR device 102 is configured to send behavior feature data of the user, collected by the sensing device, to the server 103, and receive and display video data processed by the server 103.
Fig. 2 is a flowchart of a method for group video session according to an embodiment of the present invention. Referring to fig. 2, the method is applied to an interaction process between a server and a terminal and a VR device.
201. The server creates a group video session.
A group video session is a video session conducted by multiple (two or more) users through a server. The multiple users may be users on the social platform corresponding to the server, and may have a group relationship or a friend relationship with one another.
In this step, when the server receives a group video session request from any of the user devices, a group video session may be created. The embodiment of the invention does not limit the initiating mode of the group video session request. For example, a user initiates a group video session request to all users in an established group, in this example, the group video session request may carry a group identifier of the group, so that the server may obtain the user identifier of each user in the group according to the group identifier. For another example, the user may also initiate a group video session request after selecting some users from the established group or the user relationship chain, in this example, the group video session request may carry the user identifiers of the user and the selected users. After the server acquires the user identifier, the user corresponding to the user identifier can be added to the group video session, so that the group video session is created.
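The two request forms described above (group-wide versus hand-picked participants) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the names `create_group_video_session`, `resolve_group_members`, and the request-dictionary keys are all assumptions.

```python
import uuid

def resolve_group_members(group_id, group_directory):
    """Look up every user id belonging to a group (illustrative helper)."""
    return list(group_directory.get(group_id, []))

def create_group_video_session(request, group_directory):
    # A group-wide request carries a group identifier; a selective request
    # carries the initiator's id plus the explicitly selected user ids.
    if "group_id" in request:
        user_ids = resolve_group_members(request["group_id"], group_directory)
    else:
        user_ids = request["user_ids"]
    # Adding the users corresponding to the identifiers creates the session.
    return {"session_id": uuid.uuid4().hex, "members": set(user_ids)}
```

Either path ends with the same session structure, so the rest of the method (steps 202 onward) need not care which request form was used.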
202. For each user in the group video session, the server determines the user type of the user based on the device information of the user.
The device information may be the device model of the user equipment that the user uses to log in to the server, where the device model may be expressed as the handset brand plus the handset model. This enables the server to determine the device type of the user equipment according to the correspondence between device models and device types; the device type may be a personal computer (PC) terminal, a mobile terminal, or a virtual reality (VR) device.
In this step, the server may obtain the device information in multiple ways, for example, when the user device sends a login request to the server, the login request may carry the user identifier and the device information, so that the server may extract the user identifier and the device information when receiving the login request and store the user identifier and the device information correspondingly, or the server sends a device information obtaining request to the user device, so that the user device sends the device information to the server.
Since the users in the group video session may log in to the server using different user devices, and different user devices support different video display modes (a VR device supports the virtual reality display mode, while a terminal supports the two-dimensional display mode), the server needs to process the video data differently for users on different devices so as to obtain video data matching the display mode each device supports. To decide how to process the video data for a given user, the server needs to determine the user type of that user. The user types comprise the common user and the virtual user: the common user indicates that the user adopts the two-dimensional display mode when participating in the group video session, meaning the user logs in to the server with a non-VR device such as a mobile terminal or a tablet computer; the virtual user indicates that the user adopts the virtual reality display mode when participating in the group video session, meaning the user logs in to the server with a VR device.
In this step, the server may query the user type corresponding to the device information of the user according to the pre-configured correspondence among device information, device type, and user type. See Table 1 for an example of this correspondence:
TABLE 1
Device information | Device type     | User type
XX thinkpad        | PC terminal     | Common user
WW N7              | Mobile terminal | Common user
UU VR              | VR device       | Virtual user
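The lookup in step 202 can be expressed as a two-stage mapping mirroring Table 1: device model to device type, device type to user type. The model strings are the patent's own examples; the dictionary names and the fallback for unknown models are assumptions of this sketch.

```python
# Device-model -> device-type correspondence, pre-configured on the server
# (contents taken from Table 1).
DEVICE_TYPE_BY_MODEL = {
    "XX thinkpad": "pc",
    "WW N7": "mobile",
    "UU VR": "vr",
}

def user_type_for(device_model):
    """Return 'virtual' for VR devices, 'common' otherwise.

    A missing/unknown model falls back to 'common' here; how the patent
    handles unknown models is not specified, so this is an assumption."""
    device_type = DEVICE_TYPE_BY_MODEL.get(device_model)
    return "virtual" if device_type == "vr" else "common"
```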
In fact, the user may also set the device information manually. For example, a device information setting page is provided on the VR device; the VR device user may set the current device information to "WW N7", or keep the default "UU VR", so that the server obtains the device information set by the VR device user and thereby determines the user type the VR device user prefers to experience.
203. And the server processes the video data of the group video session according to the video display mode indicated by the user type of the user to obtain the target video data of the user.
Wherein the video display mode of the target video data matches the video display mode indicated by the user type of the user. In this step, if the user type of the user is a common user, the server determines that the user adopts the two-dimensional display mode when participating in the group video session and applies the video data processing mode corresponding to the two-dimensional display mode; if the user type of the user is a virtual user, the server determines that the user adopts the virtual reality display mode and applies the video data processing mode corresponding to the virtual reality display mode. The embodiment of the invention does not limit the specific processing process. The video data processing method corresponding to each user type is introduced below:
the processing procedure when the user type is a normal user is as follows steps 203A-203C:
203A, if the user type of the user is a common user, the server converts the three-dimensional virtual character corresponding to each virtual user in the group video session into a two-dimensional virtual character.
The three-dimensional virtual character is used to express the image of the virtual user as three-dimensional image data, so that the user can be displayed as a three-dimensional virtual character during the group video session. In this step, the server may acquire the three-dimensional virtual character in various ways. For example, before the virtual user confirms entry into the group video session, a plurality of three-dimensional virtual characters are provided for the virtual user, and the one selected by the virtual user is used as the three-dimensional virtual character corresponding to that user. For another example, the server obtains the user attributes of the virtual user and takes a three-dimensional virtual character matching those attributes as the corresponding character, where the user attributes include information such as age, gender, and occupation; taking a 30-year-old female teacher as an example, the server may select a three-dimensional virtual character in the form of a young female teacher as the three-dimensional virtual character corresponding to the virtual user.
Further, based on the obtained three-dimensional virtual character, the server may convert the three-dimensional virtual character into a two-dimensional virtual character. It should be noted that the two-dimensional virtual character may be static or dynamic, which is not limited in this embodiment of the present invention. For example, to save the computation resources of the server, two-dimensional image data of a certain viewing angle may be extracted directly from the three-dimensional image data corresponding to the three-dimensional virtual character and used as the two-dimensional virtual character; to express the virtual user as comprehensively as possible, this viewing angle may be the frontal viewing angle. For another example, to display the behavior of the virtual user visually, the server may obtain the three-dimensional virtual character together with behavior feature data of the virtual user collected by the VR device, where the behavior feature data includes expression feature data or limb feature data of the virtual user. The server may then determine the behavior features of the three-dimensional virtual character according to the behavior feature data, generate a three-dimensional virtual character conforming to those features so that its behavior is synchronized with the behavior of the virtual user, and convert that three-dimensional virtual character into a two-dimensional virtual character.
203B, the server synthesizes the two-dimensional virtual character, the two-dimensional background selected by the virtual user and the audio data corresponding to the virtual user to obtain first two-dimensional video data.
Based on the two-dimensional virtual character obtained in step 203A, in order to provide a richer visual effect to the user, the server may further add a two-dimensional background to the two-dimensional virtual character. The two-dimensional background is the background against which a two-dimensional virtual character is displayed, such as a two-dimensional meeting background or a two-dimensional beach background. The server may provide multiple two-dimensional backgrounds before the virtual user enters the group video session and obtain the two-dimensional background selected by the virtual user. In fact, the server may also obtain the two-dimensional background in other manners, for example, by randomly assigning a two-dimensional background to the virtual user. For another example, to give users in the group video session as similar an experience as possible, the server may use two-dimensional image data mapped from the virtual environment corresponding to the group video session as the two-dimensional background, or the server may obtain the label of the virtual environment and use two-dimensional image data with the same label as the two-dimensional background; e.g., if the label of the virtual environment is "forest", the server may use two-dimensional image data labeled "forest", which may be static or dynamic, as the two-dimensional background.
In this step, the server may determine a display position and a synthesis size of the two-dimensional virtual character on the two-dimensional background, adjust an original display size of the two-dimensional virtual character to obtain a two-dimensional virtual character conforming to the synthesis size, synthesize the two-dimensional virtual character to a corresponding display position on the two-dimensional background, and obtain image data currently corresponding to the virtual user with a layer of the two-dimensional virtual character on a layer of the two-dimensional background. In fact, the server may also determine a display area corresponding to the display position and the composite size on the two-dimensional background, remove the pixel points in the display area, and embed the image data corresponding to the two-dimensional virtual character into the display area, so as to use the embedded two-dimensional image data as the image data currently corresponding to the virtual user.
In the group video session process, when any user speaks, the user equipment can send the recorded audio data to the server in real time, so that when the server receives the audio data corresponding to the virtual user, the current image data and the audio data can be synthesized to obtain first two-dimensional video data so as to express the current language of the virtual user. Of course, if the server does not currently receive the audio data corresponding to the virtual user, the current image data may be directly used as the first two-dimensional video data.
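Steps 203A and 203B amount to layer compositing plus optional audio attachment. The sketch below models images as 2-D lists of pixels purely for illustration; `paste`, `first_2d_video_data`, and the dictionary layout of the result are all hypothetical names, not the patent's API.

```python
def paste(background, avatar, top, left):
    """Overlay the avatar layer on a copy of the background layer at the
    determined display position (avatar layer on top)."""
    frame = [row[:] for row in background]   # copy so the background is reused
    for r, row in enumerate(avatar):
        for c, px in enumerate(row):
            frame[top + r][left + c] = px
    return frame

def first_2d_video_data(background, avatar_2d, position, audio=None):
    """Synthesize the first two-dimensional video data of a virtual user.

    If no audio has been received for the user, the composited image data
    alone serves as the video data (audio stays None)."""
    frame = paste(background, avatar_2d, *position)
    return {"frame": frame, "audio": audio}
```

Resizing the avatar to the synthesis size before pasting, as the text describes, would happen upstream of `paste`.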
203C, the server synthesizes the at least one first two-dimensional video data and the at least one second two-dimensional video data to obtain the target video data of the user.
The second two-dimensional video data refers to the two-dimensional video data of the common users in the group video session. In this step, the server determines the display position and synthesis size of the current two-dimensional video data of each user in the group video session, synthesizes the current video data of each user with the virtual environment according to the determined display positions and synthesis sizes to form one piece of two-dimensional video data, with the layer of each user's video data above the layer of the virtual environment, and takes the synthesized two-dimensional video data as the target video data of the user.
It should be noted that the two-step synthesis of steps 203B and 203C may also be performed as a single synthesis process, in which the server omits separately generating the first two-dimensional video data and directly synthesizes the two-dimensional virtual character, the two-dimensional background, the audio data corresponding to the virtual user, and the second two-dimensional video data to obtain the target video data.
The processing procedure when the user type is a virtual user is as follows steps 203D-203H:
203D, if the user type of the user is a virtual user, the server determines the virtual environment corresponding to the group video session.
The virtual environment refers to a three-dimensional background of a virtual user in a group video session, such as a three-dimensional image of a round table conference virtual environment, a beach virtual environment, a table game virtual environment, and the like. The embodiment of the present invention does not limit the specific manner of determining the virtual environment. For example, the server may determine the following three ways:
in the first determination mode, the server determines the virtual environment corresponding to the virtual environment option triggered by the user as the virtual environment corresponding to the user in the group video session.
In order to make the process of providing virtual environments more humanized, the server can provide diversified virtual environments, and the user can freely select the virtual environment when the group video session. In this determination, the server may provide at least one virtual environment option and a corresponding virtual environment thumbnail on the VR device (or a terminal bound to the VR device), where each virtual environment option corresponds to one virtual environment. When the VR device detects that a virtual user triggers a certain virtual environment option, a virtual environment identifier corresponding to the virtual environment option may be sent to the server, and when the server obtains the virtual environment identifier, the virtual environment corresponding to the virtual environment identifier may be determined as the virtual environment of the user in the group video session.
In the second determination mode, the server determines the capacity of the virtual environment corresponding to the group video session according to the number of users in the group video session, and determines a virtual environment conforming to that capacity as the virtual environment corresponding to the group video session.
In order to present a reasonable virtual environment to the user and avoid the virtual environment appearing crowded or spacious, the server may obtain the number of users in the group video session to determine the capacity the virtual environment should have, i.e., the number of users the virtual environment can accommodate; e.g., the capacity of the round-table conference virtual environment corresponds to the number of seats in it. Further, the server may select, from the stored virtual environments, the one whose capacity best matches the determined capacity. For example, if there are 12 users and the server stores three round-table conference virtual environments with 5, 10, and 15 seats respectively, the server may determine the 15-seat round-table conference virtual environment, the smallest that can accommodate all 12 users, as the virtual environment corresponding to the users in the group video session.
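The capacity-matching rule of this determination mode can be sketched as below: prefer the smallest stored environment that still seats everyone, falling back to the largest available one if none is big enough. The fallback behavior and all names here are assumptions of the sketch, not stated in the patent.

```python
def pick_environment(num_users, environments):
    """Choose a virtual environment by capacity.

    environments: mapping of environment name -> seat count (capacity).
    Returns the smallest environment that accommodates num_users; if none
    can, returns the largest one available (assumed fallback)."""
    big_enough = {name: seats for name, seats in environments.items()
                  if seats >= num_users}
    if big_enough:
        return min(big_enough, key=big_enough.get)   # least spacious fit
    return max(environments, key=environments.get)   # best effort
```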
In the third determination mode, the virtual environment selected by each user in the group video session is analyzed to obtain the number of times each virtual environment has been selected, and the virtual environment selected the most times is determined as the virtual environment corresponding to the group video session.
In this determination mode, the server learns which virtual environment most users prefer by jointly analyzing the virtual environment selected by each user. For example, if there are 5 users in the group video session and each user's selection is as shown in Table 2, the server can determine from Table 2 that virtual environment 1 was selected the most times (4 times), and determine virtual environment 1 as the virtual environment corresponding to the users in the group video session.
TABLE 2
[Table 2 is reproduced as an image in the original publication; it lists, for each of the 5 users, the virtual environment selected by that user.]
It should be noted that, in any of the above three determination manners, to save the computing resources of the server, after the server determines a virtual environment for one user, it may directly determine that virtual environment as the virtual environment corresponding to every virtual user in the group video session.
In fact, any two or all three of the above determination manners may be combined; the manner of combination is not limited in the embodiment of the present invention. For example, when the first and third determination manners are combined, if the server receives a virtual environment identifier triggered by a user, it determines the virtual environment corresponding to that identifier; otherwise, it adopts the third determination manner.
203E, taking the virtual environment as a three-dimensional background, the server determines the display position of each user in the group video session in the virtual environment.
In this step, to enable each user in the group video session to blend naturally into the virtual environment, the server needs to determine the display position of each user in the virtual environment, where the display position refers to the position at which the video data of a common user, or the three-dimensional virtual character of a virtual user, is composited. The embodiment of the present invention does not limit the manner of determining the display position. For example, the viewing angle of the user may default to a frontal viewing angle, so that the orientation of the three-dimensional virtual character corresponding to the user coincides with that frontal viewing angle. Thus, the user himself may or may not be shown in the group video session; if shown, referring to FIG. 3, the user may correspond to the display position indicated by the arrow in FIG. 3. For the other users, the server may determine display positions in the following five determination manners (determination manner 1 to determination manner 5).
In determination manner 1, the degree of intimacy between the user and each other user in the group video session is analyzed according to the social data between them, and the display positions of the other users are arranged, starting from either side of the user, in descending order of intimacy.
To create a more realistic conversation scene, this determination manner considers the social tendencies each user shows in actual conversation and determines display positions according to the degree of intimacy. The social data includes, but is not limited to, the number of chats, the length of time the users have been friends, and the number of comment likes. The embodiment of the present invention does not limit the method of analyzing intimacy. For example, let intimacy be denoted C; let the number of chats, denoted chat, have weight 0.4; the length of the friendship in days, denoted time, have weight 0.3; and the number of comment likes, denoted comment, have weight 0.3. Intimacy can then be expressed as:

C = 0.4 * chat + 0.3 * time + 0.3 * comment
Thus, suppose the other users are user 1, user 2, user 3, and user 4, and their social data with the user is as shown in Table 3; the computed intimacy values are then C1 = 37, C2 = 4, C3 = 82, and C4 = 76. The server can therefore determine the position closest to the user as the display position of user 3, and arrange the display positions of user 4, user 1, and user 2 in descending order of intimacy.
TABLE 3
User      chat (times)   time (days)   comment (times)
User 1    10             100           10
User 2    1              10            2
User 3    40             200           20
User 4    100            100           20
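The intimacy formula and the seating order it induces can be sketched as follows. This is a hypothetical Python illustration; the weights follow the example above, and the function names are assumptions:

```python
def intimacy(chat, days, comments, weights=(0.4, 0.3, 0.3)):
    """Weighted intimacy score C = 0.4*chat + 0.3*time + 0.3*comment."""
    return weights[0] * chat + weights[1] * days + weights[2] * comments

def seat_order(social_data):
    """Order the other users by descending intimacy with the session user.

    social_data maps user name -> (chat count, friend days, comment likes).
    The first user in the returned list sits closest to the session user.
    """
    scores = {user: intimacy(*data) for user, data in social_data.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Applied to Table 3, the scores are 37, 4, 82, and 76, giving the seating order user 3, user 4, user 1, user 2.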
In determination manner 2, the user identities of the other users are acquired, the position opposite the user is determined as the display position of the other user with the highest user identity, and the display positions of the remaining users are determined randomly.
To highlight the dominant position of certain users in the group video session, the server may determine display positions based on user identity. The user identity indicates the importance of a user in the group video session. The embodiment of the present invention does not limit the standard for measuring user identity. For example, if user A among the other users is the initiating user of the group video session, user A is likely to lead the session and is therefore determined as the user with the highest identity. For another example, if user B among the other users is an administrator of the group corresponding to the group video session, user B may also be determined as the user with the highest identity.
In determination manner 3, the display positions of the other users are arranged, starting from either side of the user, in the chronological order in which the other users joined the group video session.
To keep the process of determining display positions simple and to save the computing resources of the server, the display positions may be determined directly from the times at which users join the group video session. Generally, joining the group video session is confirmed by each user; therefore, when a user's device detects that user's confirmation of joining the group video session, it may send a join-confirmation message to the server. When the server receives the first join-confirmation message of the group video session, it may arrange the corresponding user at the display position closest to the user, and then arrange the users corresponding to subsequently received join-confirmation messages at display positions in order.
In determination manner 4, the position selected by the user in the virtual environment is determined as the display position of that user in the virtual environment.
To give users more freedom in the process of determining display positions, the server also supports users selecting display positions themselves. In this determination manner, the server may provide a virtual environment template to each user before the group video session starts, and each user selects a display position on the template. To avoid conflicts between users selecting display positions, the server should display the currently selected positions in real time; for example, once a display position has been selected, the server may add a non-selectable mark to it, so that each user chooses among the display positions that remain selectable.
In determination manner 5, the position opposite the user is determined as the display position of a common user, and the display positions of the remaining users are determined randomly.
Considering that a common user is generally displayed as a two-dimensional character, in a three-dimensional virtual environment the server may determine the position opposite the user as the display position of the common user, to avoid distortion of the two-dimensional video data corresponding to the common user and to show the common user's full appearance as far as possible, and randomly determine the display positions of the remaining users.
It should be noted that each user should correspond to one display area; therefore, when a user A selects a display position, the server determines the display area corresponding to user A. Moreover, to space the displayed users evenly in the virtual environment, the server may divide the virtual environment into display areas in advance; for example, in a round-table conference virtual environment, one display area is provided at each seat.
Of course, any two or more of the above five determination manners may be combined. For example, when determination manners 4 and 5 are combined, the server determines the position opposite the user as the display position of the common user and provides a virtual environment template to each virtual user; on that template, the display position already determined for the common user carries a non-selectable mark, so that each virtual user chooses a display position among those that remain selectable.
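The non-selectable marking described above can be sketched as a small seat-chart structure. This is a hypothetical illustration; the class name `SeatChart` and the pre-reserved-seat parameter are assumptions:

```python
class SeatChart:
    """Tracks which display positions in a virtual-environment template
    are still selectable; a taken seat is marked non-selectable."""

    def __init__(self, num_seats, reserved=()):
        # Seats pre-assigned by the server (e.g. the position opposite
        # the user, reserved for a common user) start out taken.
        self.num_seats = num_seats
        self.taken = set(reserved)

    def selectable(self):
        """List the seats a virtual user may still choose."""
        return [s for s in range(self.num_seats) if s not in self.taken]

    def select(self, seat):
        """Return True if the seat was free and is now taken."""
        if seat in self.taken or not (0 <= seat < self.num_seats):
            return False
        self.taken.add(seat)
        return True
```

A second `select` of the same seat fails, which models the real-time conflict avoidance described above.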
203F, for the common user in the group video session, the server synthesizes the specified video data of the common user to the display position corresponding to the common user.
In this step, the common users include a first common user and a second common user: the first common user is a common user using a binocular camera, and the second common user is a common user using a monocular camera. Because the video data of these two kinds of common user differ, the manner in which the server obtains the designated video data also differs; the embodiment of the present invention describes this in cases 1 and 2:
In case 1, if the common users include a first common user, the two channels of two-dimensional video data of the first common user are converted into first three-dimensional video data, and the first three-dimensional video data is used as the designated video data; alternatively, the two channels of two-dimensional video data of the first common user are used directly as the designated video data.
In this case, in order to display the first general user in the form of a three-dimensional character in the virtual environment, the server may obtain the specified video data in two ways:
the first way is to convert two paths of two-dimensional video data into first three-dimensional video data. The two paths of two-dimensional video data respectively correspond to actual scenes of common users captured from two visual angles, one pixel point of one path of two-dimensional video data is used as a reference, a pixel point corresponding to the pixel point in the other path of two-dimensional video data is determined, the two pixel points correspond to the same position in the actual scene, accordingly, the parallax of the two pixel points is determined, the parallax map can be obtained after the processing is carried out on each pixel point in the two paths of two-dimensional video data, and the three-dimensional image data of the actual scene is constructed according to the parallax map.
The second way is to use the two channels of two-dimensional video data directly as the designated video data and, when sending the designated video data to the VR device, also send a designated display instruction that instructs the VR device to render the two channels in the left-eye and right-eye screens respectively. Rendering the two channels, captured from different viewing angles, in the left-eye and right-eye screens creates parallax at display time and thus a three-dimensional display effect.
Case 2, if the general users include a second general user, the two-dimensional video data of the second general user is taken as the designated video data.
It should be noted that the embodiment of the present invention does not limit the manner of determining the user type of a common user. For example, if the server receives two channels of two-dimensional video data of a common user at the same time, it may determine that the user type of the common user is the first common user; otherwise, it may determine that the common user is the second common user.
Based on the display position determined in step 203E and the designated video data obtained in step 203F, the server may composite the designated video data to the display position corresponding to the common user. Of course, to make the display effect more realistic, before compositing the server may adjust the display size of the designated video data to a preset composition size, where the composition size may be determined by the ratio of the virtual environment to a real person, and each virtual environment may correspond to one composition size.
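The size adjustment before compositing can be sketched as an aspect-preserving fit to the preset composition size. This is an illustrative assumption about how the embodiment's "composition size" might be applied, not the definitive method:

```python
def fit_to_composition(width, height, comp_width, comp_height):
    """Scale a video layer to fit within the preset composition size
    while preserving its aspect ratio; returns the new (width, height)."""
    scale = min(comp_width / width, comp_height / height)
    return round(width * scale), round(height * scale)
```

For example, a 1920x1080 video layer fitted to a 640x480 composition size is scaled to 640x360, so it fills its "virtual screen" without distortion.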
It should be noted that, since the designated video data is video data of only one viewing angle (for the second common user) or two viewing angles (for the first common user), it occupies only a two-dimensional spatial position in the virtual environment when composited. Moreover, each common user has a different display position, and to provide a better display effect, the server may add a frame to the layer edge of the designated video data during compositing, so that the designated video data is rendered as a "virtual screen" in the virtual environment. Of course, if the display positions of two or more pieces of designated video data are adjacent, the server may add a single frame around their layer edges during compositing, so that the two or more common users are displayed in one "virtual screen". Referring to fig. 4, an embodiment of the present invention provides a schematic diagram of a group video session scene: (a) in fig. 4 shows one common user displayed in a "virtual screen", and (b) in fig. 4 shows two common users displayed in one "virtual screen".
203G, for the virtual users in the group video conversation, the server synthesizes the three-dimensional virtual characters of the virtual users and the audio data to the display positions corresponding to the virtual users.
In this step, the server may obtain a three-dimensional virtual character of the virtual user (the obtaining process is the same as that in step 203A), adjust the three-dimensional virtual character to a synthesized size, synthesize the adjusted three-dimensional virtual character to a display position corresponding to the virtual user, and synthesize the synthesized three-dimensional image data with the obtained audio data of the virtual user to obtain audio and video data of the virtual user.
203H, the server takes the synthesized video data as the target video data of the user.
Through the synthesizing process of steps 203F and 203G, the server may finally obtain target video data, where the target video data includes the virtual character corresponding to each virtual user in the group video session and the video data of each general user.
204. In the process of group video session, the server sends target video data to the user equipment of the user, so that the user can perform the group video session.
For each user in the group video session, if the user type of the user is a normal user, the server may send the target video data obtained in steps 203A-203C to the terminal of the user, and if the user type of the user is a virtual user, the server may send the target video data obtained in steps 203D-203H to the VR device of the user, so that each user can perform the group video session. Referring to fig. 5, an embodiment of the present invention provides a display scene diagram. The user who logs in the server with the terminal is the terminal user, and the user who logs in the server with the VR equipment is the VR equipment user.
It should be noted that some users in the group video session may also have designated management authority, where the designated management authority refers to an authority to invite or remove users in the group video session, and the embodiment of the present invention does not limit which users have designated management authority. For example, the server may open the specified administrative rights to the initiating user of the group video session. As shown in fig. 6, an embodiment of the present invention provides a flowchart of a virtual user conducting a group video session. The virtual user may invite other users except the group video session to enter the group video session, may remove a certain user from the group video session, may send a private chat request to other users, or may accept the private chat request of other users.
205. When the terminal receives target video data of a group video session sent by the server, the target video data are displayed, so that common users in the group video session are displayed in a two-dimensional character form, and virtual users in the group video session are displayed in a two-dimensional virtual character form.
The user type of the end user is a common user, and therefore the end user adopts a two-dimensional display mode when participating in the group video session.
Because the two-dimensional video data of each user is synthesized on the server side according to the display position and the display size, when the terminal receives the target video data, the target video data can be rendered on the screen, and therefore two-dimensional characters of common users or two-dimensional virtual characters corresponding to virtual users are displayed in each area on the screen.
206. When the VR device receives target video data of a group video session sent by the server, the target video data is displayed, so that an ordinary user in the group video session is displayed in the form of a two-dimensional character or a three-dimensional character in the virtual environment, and a virtual user in the group video session is displayed in the form of a three-dimensional virtual character in the virtual environment.
The user type of the VR device user is a virtual user, and thus, the VR device user employs a virtual reality display mode while participating in the group video session.
Because the two-dimensional or three-dimensional video data of the common users and the three-dimensional virtual characters corresponding to the virtual users have been composited on the server side according to display position, when the VR device receives the target video data it can render the target video data in its left-eye and right-eye screens, thereby displaying the two-dimensional or three-dimensional character of each common user at that common user's display position, and the three-dimensional virtual character of each virtual user at that virtual user's display position.
In addition, to clearly indicate to the VR device user which user is speaking, if the VR device detects, based on the target video data, that any user in the group video session is speaking, it displays a speech prompt at the display position corresponding to that user. The form of the speech prompt is not limited; it may be a text prompt such as "speaking", an arrow icon, a blinking icon, or the like. The embodiment of the present invention does not limit the manner of detecting whether a user is speaking. For example, when the VR device detects the user's audio data in the current target video data, it determines that the user is speaking, determines the display position corresponding to that user, and displays the speech prompt at that display position.
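One plausible way to detect that a user is speaking, as hinted above, is to threshold the short-term energy of that user's current audio frame. This is an illustrative sketch; the RMS threshold and the function names are assumptions, not the embodiment's stated method:

```python
def is_speaking(samples, threshold=0.05):
    """Flag a user as speaking when the RMS energy of their current
    audio frame (samples normalized to [-1, 1]) exceeds a threshold."""
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms > threshold

def speaking_users(frames, threshold=0.05):
    """frames maps user -> audio samples for the current chunk; returns
    the users whose display positions should show a speech prompt."""
    return [u for u, f in frames.items() if is_speaking(f, threshold)]
```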
According to the embodiment of the invention, the user type of each user in the group video session is determined, and the video data of the group video session is processed according to the user type, so that when the user type is a virtual user, the target video data matched with the virtual reality display mode indicated by the virtual user can be obtained, and when the user type is a common user, the target video data matched with the two-dimensional display mode indicated by the common user can be obtained, so that the video data can be displayed in reasonable display modes for different types of users, the group video session can be performed among the users of different types without limitation, and the flexibility of the group video session is improved.
In addition, when the user type of the user is a common user, the three-dimensional virtual character corresponding to the virtual user in the group video session is converted into a two-dimensional virtual character, the two-dimensional virtual character is synthesized with the two-dimensional background and the audio data to obtain the two-dimensional video data of the virtual user, so that the two-dimensional video data of the virtual user is matched with the two-dimensional display mode corresponding to the user, and a specific mode for processing the video data of the virtual user in the group video session is provided for the user.
In addition, when the user type of the user is a virtual user, the display position of each user in the group video session in the virtual environment can be determined, and the two-dimensional video data of the common user and the three-dimensional virtual character of the virtual user are respectively synthesized to the corresponding display positions, so that the synthesized video data is matched with the virtual reality display mode corresponding to the user, and a specific mode for processing the video data of the virtual user in the group video session is provided for the user.
In addition, for the first and second general users, different ways of acquiring the specified video data are provided: processing two paths of two-dimensional video data of a first common user into first three-dimensional video data, or directly acquiring the two paths of two-dimensional video data into specified video data and informing a VR device of a display mode; two-dimensional video data of a second general user is taken as the designated video data. Through two different acquisition modes, a display effect corresponding to the user type of the common user can be intelligently provided.
In addition, at least three specific methods for determining virtual environments corresponding to the group video session are provided, which can support users to select the virtual environments by themselves, can select the virtual environments with the capacity matched with the number of the users according to the number of the users in the group video session, can analyze the virtual environments selected by each user, and can select the virtual environment with the largest number of selection times, so that the modes for determining the virtual environments are more diversified.
Additionally, at least five determination modes are provided to determine the display position of each user in the virtual environment: according to the intimacy between users, the user identity or the time when the users join the group video session, the server intelligently selects a seat for each user, or more humanized, the users select a display position by themselves, or the display position of the common user is opposite to the front visual angle of the user in order to show the overall appearance of the common user as much as possible.
Fig. 7 is a block diagram of an apparatus for group video session according to an embodiment of the present invention. Referring to fig. 7, the apparatus specifically includes:
a creating module 701 for creating a group video session;
a determining module 702, configured to determine, for each user in the group video session, a user type of the user according to the device information of the user, where the user type includes a normal user and a virtual user, the normal user is used to instruct the user to use a two-dimensional display mode when participating in the group video session, and the virtual user is used to instruct the user to use a virtual reality display mode when participating in the group video session;
the processing module 703 is configured to process the video data of the group video session according to the video display mode indicated by the user type of the user, so as to obtain target video data of the user, where the video display mode of the target video data matches the video display mode indicated by the user type of the user;
a sending module 704, configured to send target video data to user equipment of a user in a process of a group video session, so that the user performs the group video session.
According to the embodiment of the invention, the user type of each user in the group video session is determined, and the video data of the group video session is processed according to the user type, so that when the user type is a virtual user, the target video data matched with the virtual reality display mode indicated by the virtual user can be obtained, and when the user type is a common user, the target video data matched with the two-dimensional display mode indicated by the common user can be obtained, so that the video data can be displayed in reasonable display modes for different types of users, the group video session can be performed among the users of different types without limitation, and the flexibility of the group video session is improved.
In one possible implementation, the processing module 703 is configured to: if the user type of the user is a common user, converting a three-dimensional virtual character corresponding to the virtual user in the group video conversation into a two-dimensional virtual character; synthesizing a two-dimensional virtual character, a two-dimensional background selected by a virtual user and audio data corresponding to the virtual user to obtain first two-dimensional video data; and synthesizing at least one first two-dimensional video data and at least one second two-dimensional video data to obtain target video data of the user, wherein the second two-dimensional video data refers to two-dimensional video data of common users in the group video session.
In one possible implementation, the processing module 703 is configured to: if the user type of the user is a virtual user, determining a virtual environment corresponding to the group video session; determining the display position of each user in the group video conversation in the virtual environment by taking the virtual environment as a three-dimensional background; for the common users in the group video session, synthesizing the specified video data of the common users to the display positions corresponding to the common users; for the virtual users in the group video conversation, synthesizing the three-dimensional virtual characters and the audio data of the virtual users to the display positions corresponding to the virtual users; and taking the synthesized video data as target video data of a user.
In one possible implementation, the processing module 703 is further configured to: if the common user comprises a first common user, converting the two paths of two-dimensional video data of the first common user into first three-dimensional video data, and taking the first three-dimensional video data as specified video data, wherein the first common user is a common user using a binocular camera, or if the common user comprises a first common user, taking the two paths of two-dimensional video data of the first common user as specified video data; and if the common users comprise second common users, the two-dimensional video data of the second common users are used as the designated video data, and the second common users refer to the common users using the monocular camera.
In one possible implementation, the processing module 703 is configured to: determine a virtual environment corresponding to a virtual environment option triggered by the user as the virtual environment corresponding to the user in the group video session; alternatively,

the processing module 703 is configured to: determine the capacity of the virtual environment corresponding to the group video session according to the number of users in the group video session, and determine a virtual environment conforming to that capacity as the virtual environment corresponding to the group video session; alternatively,

the processing module 703 is configured to: analyze the virtual environment selected by each user in the group video session to obtain the number of times each virtual environment has been selected, and determine the virtual environment selected the most times as the virtual environment corresponding to the group video session.
In one possible implementation, the processing module 703 is configured to: analyze the degree of intimacy between the user and each other user according to the social data between the user and the other users in the group video session, and arrange the display positions of the other users, starting from either side of the user, in descending order of intimacy; alternatively,

the processing module 703 is configured to: acquire the user identities of the other users, determine the position opposite the user as the display position of the other user with the highest user identity, and randomly determine the display positions of the remaining users; alternatively,

the processing module 703 is configured to: arrange the display positions of the other users, starting from either side of the user, in the chronological order in which the other users joined the group video session; alternatively,

the processing module 703 is configured to: determine the position selected by the user in the virtual environment as the display position of the user in the virtual environment; alternatively,

the processing module 703 is configured to: determine the position opposite the user as the display position of the common user, and randomly determine the display positions of the remaining users.
All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.
Fig. 8 is a block diagram of an apparatus for group video session according to an embodiment of the present invention. Referring to fig. 8, the apparatus specifically includes:
a receiving module 801, configured to receive target video data of a group video session sent by a server, where a video display mode of the target video data is matched with a video display mode indicated by a user type of an end user, the user type of the end user is a general user, and the general user is used to indicate that the end user adopts a two-dimensional display mode when participating in the group video session;
a display module 802, configured to display the target video data, so that the common user in the group video session is displayed in a two-dimensional character form, and the virtual user in the group video session is displayed in a two-dimensional virtual character form.
According to this embodiment of the present invention, the target video data is obtained by the server according to the user type, so that the target video data matches the two-dimensional display mode indicated for the common user. Video data is therefore displayed to the terminal user in a reasonable display mode, group video sessions can be conducted among different types of users without restriction, and the flexibility of group video sessions is improved.
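The per-type composition that the server performs for a common (2D) user can be sketched as follows. This is a hypothetical illustration only (the `Participant` type and string "frames" are invented stand-ins, not the patent's actual data structures):

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    is_virtual: bool  # True for a VR-device (virtual) user with a 3D avatar

def render_for_common_user(participants):
    """Build the 2D target composition described for common users:
    each virtual user's 3D avatar is flattened to a 2D virtual character,
    each common user contributes a 2D camera stream, and the layers are
    merged into one two-dimensional frame (strings stand in for frames)."""
    layers = []
    for p in participants:
        if p.is_virtual:
            layers.append(f"2d-avatar({p.name})")  # converted from the 3D avatar
        else:
            layers.append(f"2d-video({p.name})")   # ordinary 2D camera stream
    return " + ".join(layers)
```

The receiving module 801 would obtain the already-composited result of such a step, and the display module 802 simply renders it.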
Fig. 9 is a block diagram of an apparatus for group video session according to an embodiment of the present invention. Referring to fig. 9, the apparatus specifically includes:
a receiving module 901, configured to receive target video data of a group video session sent by a server, where a video display mode of the target video data is matched with a video display mode indicated by a user type of a VR device user, the user type of the VR device user is a virtual user, and the virtual user is used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
a display module 902, configured to display the target video data, so that the common user in the group video session is displayed in the virtual environment in the form of a two-dimensional character or a three-dimensional character, and the virtual user in the group video session is displayed in the virtual environment in the form of a three-dimensional virtual character.
According to this embodiment of the present invention, the target video data is obtained by the server according to the user type, so that the target video data matches the virtual reality display mode indicated for the virtual user. Video data is therefore displayed to the VR device user in a reasonable display mode, group video sessions can be conducted among different types of users without restriction, and the flexibility of group video sessions is improved.
In one possible implementation, the display module 902 is configured to: display a two-dimensional character or a three-dimensional character of the common user at the display position corresponding to the common user; and display the three-dimensional virtual character of the virtual user at the display position corresponding to the virtual user.
In one possible implementation, the display module 902 is further configured to: based on the target video data, if it is detected that any user in the group video session is speaking, display a speaking prompt at the display position corresponding to that user.
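The speaking-prompt detection can be sketched with a simple per-user audio-level check. This is an illustrative assumption, not the disclosed implementation (a real client might use proper voice-activity detection rather than a fixed threshold):

```python
def speaking_users(audio_levels, threshold=0.2):
    """Given per-user audio amplitudes extracted from the target video
    data, return the users for whom a speaking prompt should be shown
    at their display positions (simple energy-threshold heuristic)."""
    return [user for user, level in sorted(audio_levels.items())
            if level > threshold]
```

The display module 902 would then draw a prompt (e.g., a highlighted frame) at each returned user's seat.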
All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.
It should be noted that, when the apparatus for a group video session provided in the foregoing embodiment conducts a group video session, the division into the functional modules above is merely used as an example for description. In practical applications, the foregoing functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus is divided into different functional modules to complete all or some of the functions described above. In addition, the apparatus for a group video session provided in the foregoing embodiment and the method embodiments of the group video session belong to the same concept; for the specific implementation process, refer to the method embodiments, and details are not described herein again.
Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present invention. Referring to fig. 10, the terminal 1000 includes:
terminal 1000 can include RF (Radio Frequency) circuitry 110, memory 120 including one or more computer-readable storage media, input unit 130, display unit 140, sensor 150, audio circuitry 160, WiFi (Wireless Fidelity) module 170, processor 180 including one or more processing cores, and power supply 190. Those skilled in the art will appreciate that the terminal structure shown in fig. 10 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
The RF circuit 110 may be used for receiving and transmitting signals during information transmission and reception or during a call; in particular, it receives downlink information from a base station and then delivers it to the one or more processors 180 for processing, and transmits uplink data to the base station. In general, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 110 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), and the like.
The memory 120 may be used to store software programs and modules, and the processor 180 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data (such as audio data or a phonebook) created according to the use of terminal 1000, and the like. Further, the memory 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 120 may further include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The input unit 130 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. In particular, the input unit 130 may include a touch-sensitive surface 131 as well as other input devices 132. The touch-sensitive surface 131, also referred to as a touch display screen or a touch pad, may collect touch operations by a user on or near it (e.g., operations performed by the user on or near the touch-sensitive surface 131 using a finger, a stylus, or any other suitable object or attachment), and drive a corresponding connection device according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch orientation of the user, detects a signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the coordinates to the processor 180, and can also receive and execute commands sent by the processor 180. In addition, the touch-sensitive surface 131 may be implemented as a resistive, capacitive, infrared, or surface acoustic wave type. Besides the touch-sensitive surface 131, the input unit 130 may also include other input devices 132. In particular, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
Display unit 140 can be used to display information entered by or provided to a user as well as various graphical user interfaces of terminal 1000, which can be made up of graphics, text, icons, video, and any combination thereof. The Display unit 140 may include a Display panel 141, and optionally, the Display panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 131 may cover the display panel 141, and when a touch operation is detected on or near the touch-sensitive surface 131, the touch operation is transmitted to the processor 180 to determine the type of the touch event, and then the processor 180 provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although in FIG. 10, touch-sensitive surface 131 and display panel 141 are shown as two separate components to implement input and output functions, in some embodiments, touch-sensitive surface 131 may be integrated with display panel 141 to implement input and output functions.
Terminal 1000 can also include at least one sensor 150, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 141 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 141 and/or a backlight when the terminal 1000 moves to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, and can be used for applications of recognizing the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor that can be configured for terminal 1000 are not described herein.
Audio circuit 160, speaker 161, and microphone 162 can provide an audio interface between the user and terminal 1000. The audio circuit 160 may transmit an electrical signal, converted from received audio data, to the speaker 161, and the speaker 161 converts the electrical signal into a sound signal for output; on the other hand, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data; the audio data is then output to the processor 180 for processing and sent through the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to allow peripheral earphones to communicate with terminal 1000.
WiFi is a short-distance wireless transmission technology. Terminal 1000 can help the user send and receive e-mails, browse web pages, access streaming media, and the like through the WiFi module 170, which provides the user with wireless broadband Internet access. Although fig. 10 shows the WiFi module 170, it can be understood that the module is not an essential component of terminal 1000 and may be omitted as required without changing the essence of the invention.
Processor 180 is the control center of terminal 1000. It connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of terminal 1000 and processes data by running or executing the software programs and/or modules stored in memory 120 and calling the data stored in memory 120, thereby monitoring the terminal as a whole. Optionally, processor 180 may include one or more processing cores; preferably, processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into processor 180.
Terminal 1000 can also include a power supply 190 (e.g., a battery) for supplying power to the various components. Preferably, the power supply may be logically connected to processor 180 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 190 may also include one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, or any other component.
Although not shown, terminal 1000 can also include a camera, a bluetooth module, etc., which are not described in detail herein. In this embodiment, the display unit of the terminal is a touch screen display, and the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors. The one or more programs include instructions for:
receiving target video data of a group video session sent by a server, where the video display mode of the target video data matches the video display mode indicated by the user type of an end user, the user type of the end user is a common user, and the common user is used to indicate that the end user adopts a two-dimensional display mode when participating in the group video session; and displaying the target video data, so that common users in the group video session are displayed in the form of two-dimensional characters, and virtual users in the group video session are displayed in the form of two-dimensional virtual characters.
Fig. 11 is a block diagram of an apparatus 1100 for a group video session according to an embodiment of the present invention. For example, the apparatus 1100 may be provided as a server. Referring to fig. 11, the apparatus 1100 includes a processing component 1122, which further includes one or more processors, and memory resources, represented by memory 1132, for storing instructions executable by the processing component 1122, such as application programs. The application programs stored in memory 1132 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 1122 is configured to execute the instructions to perform the server-side method in the embodiments described above.
The apparatus 1100 may also include a power component 1126 configured to perform power management of the apparatus 1100, a wired or wireless network interface 1150 configured to connect the apparatus 1100 to a network, and an input/output (I/O) interface 1158. The apparatus 1100 may operate based on an operating system stored in the memory 1132, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (20)

1. A method for group video session, applied to a server, the method comprising:
creating a group video session;
for each user in the group video session, determining a user type of the user according to the device information of the user, wherein the user type comprises a common user and a virtual user, the common user is used for indicating the user to adopt a two-dimensional display mode when participating in the group video session, and the virtual user is used for indicating the user to adopt a virtual reality display mode when participating in the group video session;
processing the video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, wherein the video display mode of the target video data is matched with the video display mode indicated by the user type of the user;
in the process of the group video session, sending target video data to user equipment of the user to enable the user to carry out the group video session;
the two-dimensional display mode is to display, in a two-dimensional background, a two-dimensional character corresponding to the common user and a two-dimensional virtual character corresponding to the virtual user; and
the virtual reality display mode is to display, at a display position corresponding to each user in a three-dimensional background of the virtual user, specified video data corresponding to the common user and a three-dimensional virtual character corresponding to the virtual user respectively, wherein the specified video data is video data that is obtained based on the received video data of the common user and accords with the virtual reality display mode.
2. The method of claim 1, wherein the processing the video data of the group video session according to the video display mode indicated by the user type of the user to obtain the target video data of the user comprises:
if the user type of the user is a common user, converting the three-dimensional virtual character corresponding to the virtual user in the group video session into a two-dimensional virtual character;
synthesizing the two-dimensional virtual character, the two-dimensional background selected by the virtual user and the audio data corresponding to the virtual user to obtain first two-dimensional video data;
and synthesizing at least one first two-dimensional video data and at least one second two-dimensional video data to obtain the target video data of the user, wherein the second two-dimensional video data refers to the two-dimensional video data of the common user in the group video session.
3. The method of claim 1, wherein the processing the video data of the group video session according to the video display mode indicated by the user type of the user to obtain the target video data of the user comprises:
if the user type of the user is a virtual user, determining a virtual environment corresponding to the group video session;
determining the display position of each user in the group video session in the virtual environment by taking the virtual environment as a three-dimensional background;
for a common user in the group video session, synthesizing specified video data of the common user to a display position corresponding to the common user;
for a virtual user in the group video session, synthesizing a three-dimensional virtual character and audio data of the virtual user to a display position corresponding to the virtual user;
and taking the synthesized video data as the target video data of the user.
4. The method according to claim 3, wherein before the synthesizing, for a common user in the group video session, specified video data of the common user to the display position corresponding to the common user, the method further comprises:
if the common user comprises a first common user, converting the two paths of two-dimensional video data of the first common user into first three-dimensional video data, and using the first three-dimensional video data as the specified video data, wherein the first common user is a common user using a binocular camera; or, if the common user comprises the first common user, using the two paths of two-dimensional video data of the first common user as the specified video data; and
if the common users comprise second common users, using the two-dimensional video data of the second common users as the specified video data, wherein the second common users are common users using a monocular camera.
5. The method of claim 3, wherein the determining the virtual environment corresponding to the group video session comprises:
determining the virtual environment corresponding to a virtual environment option triggered by the user as the virtual environment corresponding to the user in the group video session; or, alternatively,
determining the capacity of the virtual environment corresponding to the group video session according to the number of users in the group video session, and determining a virtual environment conforming to the capacity as the virtual environment corresponding to the group video session; or, alternatively,
analyzing the virtual environment selected by each user in the group video session to obtain the number of times each virtual environment is selected, and determining the virtual environment selected the most times as the virtual environment corresponding to the group video session.
6. The method of claim 3, wherein the determining a display location of each user in the group video session in the virtual environment comprises:
analyzing the intimacy degree between the user and other users in the group video session according to social data between the user and the other users in the group video session, and arranging the display positions of the other users from either side of the user according to the intimacy degree; or, alternatively,
acquiring the user identities of the other users, determining the opposite position of the user as the display position of the user with the highest user identity among the other users, and randomly determining the display positions of the remaining users among the other users; or, alternatively,
arranging the display positions of the other users from either side of the user in the chronological order in which the other users joined the group video session; or, alternatively,
determining, according to a position selected by the user in the virtual environment, the selected position as the display position of the user in the virtual environment; or, alternatively,
determining the opposite position of the user as the display position of the common user, and randomly determining the display positions of the remaining users among the other users.
7. A method for group video session, applied to a terminal, the method comprising:
receiving target video data of a group video session sent by a server, wherein the video display mode of the target video data matches the video display mode indicated by the user type of an end user, the user type of the end user is a common user, the common user is used for indicating that the end user adopts a two-dimensional display mode when participating in the group video session, the two-dimensional display mode is to display, in a two-dimensional background, a two-dimensional character corresponding to the common user and a two-dimensional virtual character corresponding to a virtual user, the virtual user is used for indicating that a user adopts a virtual reality display mode when participating in the group video session, the virtual reality display mode is to display, at a display position corresponding to each user in a three-dimensional background of the virtual user, specified video data corresponding to the common user and a three-dimensional virtual character corresponding to the virtual user respectively, and the specified video data is video data that is obtained based on the received video data of the common user and accords with the virtual reality display mode;
and displaying the target video data, so that common users in the group video session are displayed in the form of two-dimensional characters, and virtual users in the group video session are displayed in the form of two-dimensional virtual characters.
8. A method for group video session, applied to Virtual Reality (VR) devices, the method comprising:
receiving target video data of a group video session sent by a server, wherein the video display mode of the target video data matches the video display mode indicated by the user type of a VR device user, the user type of the VR device user is a virtual user, the virtual user is used for indicating that the VR device user adopts a virtual reality display mode when participating in the group video session, the virtual reality display mode is to display, at a display position corresponding to each user in a three-dimensional background of the virtual user, specified video data corresponding to a common user and a three-dimensional virtual character corresponding to the virtual user respectively, the common user is used for indicating that a terminal user adopts a two-dimensional display mode when participating in the group video session, the two-dimensional display mode is to display, in a two-dimensional background, a two-dimensional character corresponding to the common user and a two-dimensional virtual character corresponding to the virtual user, and the specified video data is video data that is obtained based on the received video data of the common user and accords with the virtual reality display mode;
and displaying the target video data, so that common users in the group video session are displayed in a virtual environment in the form of two-dimensional characters or three-dimensional characters, and virtual users in the group video session are displayed in the virtual environment in the form of three-dimensional virtual characters.
9. The method of claim 8, wherein the displaying the target video data comprises:
displaying a two-dimensional character or a three-dimensional character of the ordinary user at a display position corresponding to the ordinary user;
and displaying the three-dimensional virtual character of the virtual user at the display position corresponding to the virtual user.
10. The method of claim 8, further comprising:
and based on the target video data, if any user in the group video session is detected to speak, displaying a speaking prompt on a display position corresponding to the user.
11. An apparatus for group video sessions, the apparatus comprising:
a creation module to create a group video session;
a determining module, configured to determine, for each user in the group video session, a user type of the user according to device information of the user, where the user type includes a normal user and a virtual user, the normal user is configured to instruct the user to adopt a two-dimensional display mode when participating in the group video session, where the two-dimensional display mode is to display a two-dimensional character corresponding to the normal user and a two-dimensional virtual character corresponding to the virtual user in a two-dimensional background, and the virtual user is configured to instruct the user to adopt a virtual reality display mode when participating in the group video session, where the virtual reality display mode is to display specified video data corresponding to the normal user and a three-dimensional virtual character corresponding to the virtual user at a display position corresponding to each user in a three-dimensional background of the virtual user respectively, the designated video data is video data which is obtained based on the received video data of the common user and accords with the virtual reality display mode;
the processing module is used for processing the video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, and the video display mode of the target video data is matched with the video display mode indicated by the user type of the user;
and the sending module is used for sending the target video data to the user equipment of the user in the process of the group video session so as to enable the user to carry out the group video session.
12. The apparatus of claim 11, wherein the processing module is configured to:
if the user type of the user is a common user, converting the three-dimensional virtual character corresponding to the virtual user in the group video session into a two-dimensional virtual character;
synthesizing the two-dimensional virtual character, the two-dimensional background selected by the virtual user and the audio data corresponding to the virtual user to obtain first two-dimensional video data;
and synthesizing at least one first two-dimensional video data and at least one second two-dimensional video data to obtain the target video data of the user, wherein the second two-dimensional video data refers to the two-dimensional video data of the common user in the group video session.
13. The apparatus of claim 11, wherein the processing module is configured to:
if the user type of the user is a virtual user, determining a virtual environment corresponding to the group video session;
determining the display position of each user in the group video session in the virtual environment by taking the virtual environment as a three-dimensional background;
for a common user in the group video session, synthesizing specified video data of the common user to a display position corresponding to the common user;
for a virtual user in the group video session, synthesizing a three-dimensional virtual character and audio data of the virtual user to a display position corresponding to the virtual user;
and taking the synthesized video data as the target video data of the user.
14. The apparatus of claim 13, wherein the processing module is further configured to:
if the common user comprises a first common user, converting the two paths of two-dimensional video data of the first common user into first three-dimensional video data, and using the first three-dimensional video data as the specified video data, wherein the first common user is a common user using a binocular camera; or, if the common user comprises the first common user, using the two paths of two-dimensional video data of the first common user as the specified video data; and
if the common users comprise second common users, using the two-dimensional video data of the second common users as the specified video data, wherein the second common users are common users using a monocular camera.
15. The apparatus of claim 13,
the processing module is configured to: determine the virtual environment corresponding to a virtual environment option triggered by the user as the virtual environment corresponding to the user in the group video session; or, alternatively,
the processing module is configured to: determine the capacity of the virtual environment corresponding to the group video session according to the number of users in the group video session, and determine a virtual environment conforming to the capacity as the virtual environment corresponding to the group video session; or, alternatively,
the processing module is configured to: analyze the virtual environment selected by each user in the group video session to obtain the number of times each virtual environment is selected, and determine the virtual environment selected the most times as the virtual environment corresponding to the group video session.
16. The apparatus of claim 13,
the processing module is configured to: analyze the intimacy between the user and other users in the group video session according to social data between the user and the other users, and arrange the display positions of the other users, starting from either side of the user, in descending order of intimacy; or,
the processing module is configured to: acquire the user identities of the other users, determine the position opposite the user as the display position of the user with the highest user identity among the other users, and randomly determine the display positions of the remaining users among the other users; or,
the processing module is configured to: arrange the display positions of the other users, starting from either side of the user, in the chronological order in which the other users joined the group video session; or,
the processing module is configured to: determine, according to the position selected by the user in the virtual environment, the selected position as the display position of the user in the virtual environment; or,
the processing module is configured to: determine the position opposite the user as the display position of the common user, and randomly determine the display positions of the remaining users among the other users.
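The "arrange outward from either side of the user" pattern in claim 16 (ranked by intimacy, or by join time) can be sketched as alternating seat offsets. This is an assumed illustration, not the patented layout algorithm; `arrange_positions` and the signed-offset convention are hypothetical.

```python
def arrange_positions(user, others, score):
    """Arrange the other users' display positions outward from either side of
    `user`, with the highest-ranked (e.g. most intimate, or earliest-joined)
    user seated closest.

    Returns a dict mapping each other user to a signed seat offset:
    +1/-1 are adjacent to `user`, +2/-2 one seat further out, and so on.
    """
    ranked = sorted(others, key=score, reverse=True)
    positions = {}
    for i, other in enumerate(ranked):
        side = 1 if i % 2 == 0 else -1          # alternate right/left sides
        positions[other] = side * (i // 2 + 1)  # move outward as rank drops
    return positions
```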
17. An apparatus for group video sessions, the apparatus comprising:
a receiving module, configured to receive target video data of a group video session sent by a server, wherein the video display mode of the target video data matches the video display mode indicated by the user type of the terminal user, the user type of the terminal user being a common user, the common user being used to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session, the two-dimensional display mode displaying, in a two-dimensional background, a two-dimensional character corresponding to each common user and a two-dimensional virtual character corresponding to each virtual user, the virtual user being used to indicate that a terminal user adopts a virtual reality display mode when participating in the group video session, the virtual reality display mode displaying, in the virtual user's three-dimensional background, designated video data corresponding to each common user and a three-dimensional virtual character corresponding to each virtual user at the display position corresponding to each user, and the designated video data being video data that is obtained based on the received video data of the common user and conforms to the virtual reality display mode; and
a display module, configured to display the target video data, so that common users in the group video session are displayed in the form of two-dimensional characters, and virtual users in the group video session are displayed in the form of two-dimensional virtual characters.
18. An apparatus for group video sessions, the apparatus comprising:
a receiving module, configured to receive target video data of a group video session sent by a server, wherein the video display mode of the target video data matches the video display mode indicated by the user type of the VR device user, the user type of the VR device user being a virtual user, the virtual user being used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session, the virtual reality display mode displaying, in the virtual user's three-dimensional background, designated video data corresponding to each common user and a three-dimensional virtual character corresponding to each virtual user at the display position corresponding to each user, the common user being used to indicate that a terminal user adopts a two-dimensional display mode when participating in the group video session, the two-dimensional display mode displaying, in a two-dimensional background, a two-dimensional character corresponding to each common user and a two-dimensional virtual character corresponding to each virtual user, and the designated video data being video data that is obtained based on the received video data of the common user and conforms to the virtual reality display mode; and
a display module, configured to display the target video data, so that common users in the group video session are displayed in the virtual environment in the form of two-dimensional characters or three-dimensional characters, and virtual users in the group video session are displayed in the virtual environment in the form of three-dimensional virtual characters.
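Claims 17 and 18 together describe server-side routing: each participant receives target video data matching the display mode of their own user type. A minimal sketch of that dispatch, under the assumption of just two user types; `target_video_for` and its parameters are hypothetical names, not from the patent.

```python
def target_video_for(user_type, two_d_frame, vr_scene):
    """Route the display-mode-appropriate target video data to a participant.

    Common users receive the 2D composition (2D background, 2D characters and
    2D virtual characters); virtual users receive the VR scene (3D background,
    designated video data and 3D virtual characters at display positions).
    """
    if user_type == "virtual":
        return vr_scene
    if user_type == "common":
        return two_d_frame
    raise ValueError(f"unknown user type: {user_type}")
```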
19. The apparatus of claim 18, wherein the display module is configured to:
display the two-dimensional character or three-dimensional character of the common user at the display position corresponding to the common user; and
display the three-dimensional virtual character of the virtual user at the display position corresponding to the virtual user.
20. The apparatus of claim 18, wherein the display module is further configured to:
and, based on the target video data, upon detecting that any user in the group video session is speaking, display a speaking prompt at the display position corresponding to that user.
CN201710104439.2A 2017-02-24 2017-02-24 Method and device for group video session Active CN108513088B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201710104439.2A CN108513088B (en) 2017-02-24 2017-02-24 Method and device for group video session
PCT/CN2018/075749 WO2018153267A1 (en) 2017-02-24 2018-02-08 Group video session method and network device
TW107106428A TWI650675B (en) 2017-02-24 2018-02-26 Method and system for group video session, terminal, virtual reality device and network device
US16/435,733 US10609334B2 (en) 2017-02-24 2019-06-10 Group video communication method and network device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710104439.2A CN108513088B (en) 2017-02-24 2017-02-24 Method and device for group video session

Publications (2)

Publication Number Publication Date
CN108513088A (en) 2018-09-07
CN108513088B (en) 2020-12-01

Family

ID=63372785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710104439.2A Active CN108513088B (en) 2017-02-24 2017-02-24 Method and device for group video session

Country Status (1)

Country Link
CN (1) CN108513088B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110035250A (en) * 2019-03-29 2019-07-19 维沃移动通信有限公司 Audio-frequency processing method, processing equipment, terminal and computer readable storage medium
CN114079803A (en) * 2020-08-21 2022-02-22 上海昊骇信息科技有限公司 Music live broadcast method and system based on virtual reality
CN112312062B (en) * 2020-10-30 2024-07-05 上海境腾信息科技有限公司 3D display method, storage medium and terminal equipment for multi-person conference record playback
CN112565057B (en) * 2020-11-13 2022-09-23 广州市百果园网络科技有限公司 Voice chat room service method and device capable of expanding business
CN113099159A (en) * 2021-03-26 2021-07-09 上海电气集团股份有限公司 Control method and device for teleconference
CN114882972B (en) * 2022-04-13 2023-03-24 江苏医药职业学院 Old people rehabilitation exercise system and method based on virtual reality

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102164265A (en) * 2011-05-23 2011-08-24 宇龙计算机通信科技(深圳)有限公司 Method and system of three-dimensional video call
CN103238317A (en) * 2010-05-12 2013-08-07 布鲁珍视网络有限公司 Systems and methods for scalable distributed global infrastructure for real-time multimedia communication
CN105721821A (en) * 2016-04-01 2016-06-29 宇龙计算机通信科技(深圳)有限公司 Video calling method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9374233B2 (en) * 2012-09-27 2016-06-21 Avaya Inc. Integrated conference floor control



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant