CN111554314B

CN111554314B - Noise detection method, device, terminal and storage medium

Info

Publication number: CN111554314B
Application number: CN202010415327.0A
Authority: CN
Inventors: 鲍枫; 李岳鹏
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-05-15
Filing date: 2020-05-15
Publication date: 2024-08-16
Anticipated expiration: 2040-05-15
Also published as: CN111554314A

Abstract

The disclosure provides a noise detection method, device, terminal and storage medium, and belongs to the technical field of artificial intelligence and cloud computing. The method comprises: displaying a first session interface of a multimedia session application; collecting an audio signal; determining a parameter value of a noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal; and, in response to the parameter value of the noise indication parameter being greater than a parameter threshold, displaying noise prompt information on the first session interface. Because the signal state of the audio signal is not simply taken as the noise detection result, and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal are also taken into account, chance fluctuations in the detection result are suppressed and the accuracy of the detection result is improved.

Description

Noise detection method, device, terminal and storage medium

Technical Field

The disclosure relates to the technical field of artificial intelligence and cloud computing, in particular to a noise detection method, a device, a terminal and a storage medium.

Background

With the development of artificial intelligence and cloud computing technology, the means and forms of conducting sessions keep changing, and multimedia sessions, favored by more and more users for their diverse functions, have become the mainstream session form. During a multimedia session, users may be in different environments, and those environments may change continuously. If noise is present in a user's background environment while the user speaks, other users may be unable to hear the session content clearly, which greatly reduces the session quality. Noise detection can therefore be performed to improve session quality and keep the multimedia session running smoothly.

In the related art, noise detection can be performed as follows: collecting the audio signal of a user in real time; detecting the collected audio signal using VAD (Voice Activity Detection) to obtain the signal state of the audio signal, the signal state being either a speech state or a non-speech state; and, if the signal state of the audio signal is the non-speech state, determining that noise is present in the audio signal.

Because the signal state of an audio signal is subject to chance, and the related art directly uses the signal state as the noise detection result, the detection result of the related art is not accurate enough.

Disclosure of Invention

In order to improve the accuracy of environmental noise detection in a multimedia session, embodiments of the disclosure provide a noise detection method, a device, a terminal and a storage medium. The technical solution is as follows:

In one aspect, a noise detection method is provided, the method including:

displaying a first session interface of a multimedia session application based on a first user identification currently logged in to the multimedia session application;

collecting an audio signal;

determining, in the frame-by-frame detection process of the audio signal, a change value of a noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal;

determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal;

and displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than a parameter threshold.

In another embodiment of the present disclosure, the determining a change value of a noise indication parameter of the current frame audio signal according to a signal state of the current frame audio signal and the signal energy includes:

determining that the change value of the noise indication parameter of the current frame audio signal is a first value in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being greater than a first energy value;

determining that the change value of the noise indication parameter of the current frame audio signal is a second value in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being less than the first energy value;

wherein the first value and the second value are positive values, and the first value is greater than the second value;

determining that the change value of the noise indication parameter of the current frame audio signal is a third value in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a second energy value;

determining that the change value of the noise indication parameter of the current frame audio signal is a fourth value in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a third energy value and less than the second energy value;

determining that the change value of the noise indication parameter of the current frame audio signal is a fifth value in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a fourth energy value and less than the third energy value;

determining that the change value of the noise indication parameter of the current frame audio signal is a sixth value in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being less than the fourth energy value;

wherein the third value, the fourth value, the fifth value and the sixth value are negative values, the third value is greater than the fourth value, the fourth value is greater than the fifth value, and the fifth value is greater than the sixth value.
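The mapping from signal state and signal energy to a change value can be sketched as a small lookup. This is only an illustrative sketch: the numeric energy thresholds and change values below are hypothetical placeholders chosen to satisfy the stated orderings (first value > second value > 0, and 0 > third value > fourth value > fifth value > sixth value). The text does not say what happens when the energy exactly equals a threshold, so the sketch assumes such a frame falls into the lower band.

```python
from dataclasses import dataclass

SPEECH = 1      # signal state: speech
NON_SPEECH = 0  # signal state: non-speech


@dataclass(frozen=True)
class ChangeTable:
    """Hypothetical thresholds and change values; not taken from the patent."""
    first_energy: float = 60.0   # speech-state energy threshold
    second_energy: float = 50.0  # non-speech thresholds, second > third > fourth
    third_energy: float = 40.0
    fourth_energy: float = 30.0
    first_value: int = 4         # positive, first > second
    second_value: int = 2
    third_value: int = -1        # negative, third > fourth > fifth > sixth
    fourth_value: int = -2
    fifth_value: int = -3
    sixth_value: int = -4


def change_value(state: int, energy: float, t: ChangeTable = ChangeTable()) -> int:
    """Return the change value of the noise indication parameter for one frame."""
    if state == SPEECH:
        # Assumption: energy equal to the threshold is treated as "less than".
        return t.first_value if energy > t.first_energy else t.second_value
    if energy > t.second_energy:
        return t.third_value
    if energy > t.third_energy:
        return t.fourth_value
    if energy > t.fourth_energy:
        return t.fifth_value
    return t.sixth_value
```

With these placeholder numbers, a loud speech-state frame raises the parameter the most, and a very quiet non-speech frame lowers it the most.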

In another embodiment of the present disclosure, the determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal, the parameter value of the noise indication parameter of the previous frame audio signal, and the noise detection result corresponding to the previous frame audio signal includes:

in response to the noise detection result corresponding to the previous frame audio signal being that a noise prompt was triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being greater than a seventh value, determining the sum of the seventh value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that a noise prompt was triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being less than the seventh value, determining the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that no noise prompt was triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being greater than an eighth value, determining the sum of the eighth value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

and in response to the noise detection result corresponding to the previous frame audio signal being that no noise prompt was triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being less than the eighth value, determining the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.
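The four cases above amount to capping the previous parameter value before adding the change value, with the cap chosen by the previous frame's detection result. A minimal sketch follows; the default values for the seventh and eighth values are hypothetical, since no concrete numbers are given here.

```python
def update_parameter(prev_value: float, prev_triggered: bool, change: float,
                     seventh_value: float = 100.0,
                     eighth_value: float = 60.0) -> float:
    """Compute the current frame's noise indication parameter value.

    seventh_value caps the previous value when the previous frame triggered
    a noise prompt; eighth_value caps it when it did not. Both defaults are
    illustrative assumptions.
    """
    cap = seventh_value if prev_triggered else eighth_value
    # min() covers both the "greater than cap" and "less than cap" branches;
    # when prev_value equals the cap the two branches coincide.
    return min(prev_value, cap) + change
```

For example, under these assumed caps, `update_parameter(150, True, -3)` starts from the cap of 100 rather than 150, so a long noisy stretch cannot push the parameter arbitrarily far above the point from which it has to decay.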

In another aspect, a noise detection method is provided, the method including:

displaying a second session interface of the multimedia session application based on a second user identification of the currently logged-in multimedia session application;

receiving a prompt message sent by a server, wherein the prompt message is sent by a terminal logged in with a first user identification when that terminal detects that the parameter value of a noise indication parameter of a current frame audio signal is greater than a parameter threshold, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal;

and, in response to the prompt message, displaying noise prompt information corresponding to the first user identification on the second session interface.
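On the receiving side, the behavior can be sketched as follows. The message shape (`PromptMessage.first_user_id`), the class name, and the `[noisy]` text marker are all hypothetical; the text only states that noise prompt information corresponding to the first user identification is displayed on the second session interface. Ignoring messages about users who are not in the session is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PromptMessage:
    first_user_id: str  # hypothetical field: the user whose environment is noisy


@dataclass
class SecondSessionInterface:
    members: list                      # user identifications in the session
    noisy: set = field(default_factory=set)

    def on_prompt_message(self, msg: PromptMessage) -> None:
        # Assumption: messages about unknown users are ignored.
        if msg.first_user_id in self.members:
            self.noisy.add(msg.first_user_id)

    def render(self) -> list:
        # Text stand-in for the session member list shown on screen.
        return [f"{m} [noisy]" if m in self.noisy else m for m in self.members]
```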

In another aspect, there is provided a noise detection apparatus, the apparatus comprising:

the display module is used for displaying a first session interface of the multimedia session application based on a first user identification currently logged in to the multimedia session application;

the acquisition module is used for acquiring an audio signal;

the determining module is used for determining, in the frame-by-frame detection process of the audio signal, the change value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal;

the determining module is used for determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal;

and the display module is used for displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold.

the display module is used for displaying a second session interface of the multimedia session application based on a second user identification currently logged in to the multimedia session application;

the receiving module is used for receiving a prompt message sent by the server, wherein the prompt message is sent by a terminal logged in with a first user identification when that terminal detects that the parameter value of the noise indication parameter of the current frame audio signal is greater than a parameter threshold, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal;

and the display module is used for responding to the prompt message and displaying noise prompt information corresponding to the first user identification on the second session interface.

In another aspect, a terminal is provided, the terminal including a processor and a memory, the memory storing at least one program code, the at least one program code loaded and executed by the processor to implement the noise detection method described above.

In another aspect, a computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the noise detection method described above is provided.

The technical scheme provided by the embodiment of the disclosure has the beneficial effects that:

An audio signal is acquired, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal are also taken into account, chance fluctuations in the detection result are suppressed and the accuracy of the detection result is improved.

In addition, the detection process uses only the signal state and the signal energy of the audio signal and adds no other complex detection logic, which saves detection resources.

In addition, when a noisy environment is determined from the noise detection result, displaying the noise prompt information on the first session interface lets the user learn about their session environment in time and take measures to provide a good session environment for other users.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.

FIG. 1 is a schematic diagram of an implementation environment involved in a noise detection method provided by an embodiment of the present disclosure;

FIG. 2 is a flow chart of a noise detection method provided by an embodiment of the present disclosure;

FIG. 3 is a flow chart of a noise detection method provided by an embodiment of the present disclosure;

FIG. 4 is a flow chart of a noise detection method provided by an embodiment of the present disclosure;

FIG. 5 is a schematic illustration of a session interface for a multimedia session application in a quiet environment provided by an embodiment of the present disclosure;

FIG. 6 is a schematic illustration of a session interface for a multimedia session application in a noisy environment provided by an embodiment of the present disclosure;

FIG. 7 is a schematic illustration of a session interface of a multimedia session application in another noisy environment provided by an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of a noise detection device according to an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of a noise detection device according to an embodiment of the present disclosure;

FIG. 10 shows a block diagram of a terminal provided by an exemplary embodiment of the present disclosure.

Detailed Description

To make the objectives, technical solutions and advantages of the present disclosure clearer, the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.

It will be understood that, as used in this disclosure, "plurality" means two or more, "each" refers to every one of the corresponding plurality, and "any" refers to any one of the corresponding plurality. For example, if a plurality of words includes 10 words, "each word" refers to every one of the 10 words, and "any word" refers to any one of the 10 words.

Before executing the embodiments of the present disclosure, the terms related to the embodiments of the present disclosure will be explained first.

Noisy environments are environments that are noisy and tend to cause audible discomfort to the user.

Detection refers to discriminating an event and an environment.

Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Computer vision (CV) is the science of studying how to make a machine "see". More specifically, it uses cameras and computers in place of human eyes to recognize and measure targets, and further performs graphics processing so that the computer produces images better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies the related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.

The key technologies of speech technology are automatic speech recognition, speech synthesis, and voiceprint recognition. Enabling computers to listen, see, speak and feel is the future direction of human-computer interaction, and speech is expected to become one of the most favored modes of human-computer interaction.

Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing technologies typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.

Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

With the research and progress of artificial intelligence technology, it is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, and smart customer service. It is believed that, as technology develops, artificial intelligence will be applied in more fields and will become increasingly valuable.

Cloud conferencing is an efficient, convenient, and low-cost form of conferencing based on cloud computing technology. Through simple operations on an internet interface, a user can quickly and efficiently share voice, data files and video with teams and clients around the world, while the cloud conference service provider handles complex technologies such as data transmission and processing in the conference on the user's behalf.

At present, domestic cloud conferencing mainly focuses on service content delivered in the SaaS (Software as a Service) model, including service forms such as telephone, network and video; a video conference based on cloud computing is called a cloud conference.

In the cloud conference era, the transmission, processing and storage of data are all handled by the computing resources of video conference vendors, so users can hold efficient remote conferences without purchasing expensive hardware or installing cumbersome software.

The cloud conference system supports dynamic cluster deployment across multiple servers and provides multiple high-performance servers, which greatly improves conference stability, security and availability. In recent years, video conferencing has been welcomed by many users because it greatly improves communication efficiency, continuously reduces communication costs, and raises the level of internal management, and it has been widely used in fields such as government, transportation, finance, telecom operators, education, and enterprises. Undoubtedly, with the application of cloud computing, video conferencing becomes even more attractive in terms of convenience, speed and ease of use, which is bound to stimulate a new wave of video conferencing applications.

Based on artificial intelligence technology and cloud conference technology, embodiments of the disclosure provide a noise detection method. The method acquires an audio signal of a first user and uses a conventional voice activity detection algorithm to obtain the signal state and the signal energy of the current frame audio signal. It then determines the parameter value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal, the parameter value of the noise indication parameter of the previous frame audio signal, and the noise detection result corresponding to the previous frame audio signal, and displays noise prompt information on a session interface when the parameter value of the noise indication parameter of the current frame audio signal is greater than a parameter threshold. With this method, when the noisy environment of the first user who is speaking prevents normal and effective communication with the second user, the first user can be informed in time that they are in a noisy environment and that the second user cannot clearly hear what they say. In addition, when the parameter value of the noise indication parameter of the current frame audio signal of the first user is greater than the parameter threshold, a prompt message is reported automatically, so that the second user in the multimedia session knows that the first user's speech currently cannot be heard clearly, and that this is caused by the environment on the first user's side rather than by network, software, hardware or device factors.

Referring to FIG. 1, which shows an implementation environment involved in a noise detection method provided by an embodiment of the disclosure, the implementation environment includes: a first terminal 101, a server 102 and a second terminal 103.

Wherein the first terminal 101 and the second terminal 103 are terminals used by users currently participating in a multimedia session. The first terminal 101 is the terminal used by the first user who is currently speaking and the second terminal 103 is the terminal used by the second user in the multimedia session. The first terminal 101 and the second terminal 103 each have a multimedia session application installed therein, and based on the installed multimedia session application, the first user and the second user can conduct a multimedia session in the form of one of a video conference, a voice conference, a video call, a voice call, and the like. The first terminal 101 and the second terminal 103 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc.

The server 102 is a background server of the multimedia session application. The server 102 may be an independent physical server, a server cluster or distributed system composed of a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.

The first terminal 101 and the server 102, and the server 102 and the second terminal 103 may be directly or indirectly connected through wired or wireless communication, and the embodiments of the present disclosure are not limited herein.

Based on the implementation environment shown in FIG. 1, an embodiment of the present disclosure provides a noise detection method, described here taking execution by the first terminal as an example. Referring to FIG. 2, the method flow provided by the embodiment of the present disclosure includes:

201. Based on the first user identification of the currently logged-in multimedia session application, a first session interface of the multimedia session application is displayed.

202. An audio signal is acquired.

203. In the frame-by-frame detection process of the audio signal, the change value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal.

The signal state is either a speech state, which may be represented by 1, or a non-speech state, which may be represented by 0.

204. The parameter value of the noise indication parameter of the current frame audio signal is determined according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal.

205. In response to the parameter value of the noise indication parameter being greater than the parameter threshold, noise prompt information is displayed on the first session interface.

According to the method provided by the embodiment of the present disclosure, the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal are also taken into account, chance fluctuations in the detection result are suppressed and the accuracy of the detection result is improved.
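Steps 203 to 205 can be sketched as a frame-by-frame loop. This sketch takes precomputed (signal state, signal energy) pairs as input rather than raw audio, because the VAD algorithm and the energy measure are not specified here. Every constant is a hypothetical placeholder, as are the initial parameter value of 0 and the absence of a lower bound on the parameter value.

```python
SPEECH, NON_SPEECH = 1, 0

# Hypothetical constants, chosen only to satisfy the stated orderings.
FIRST_ENERGY = 60.0
NON_SPEECH_BANDS = [(50.0, -1), (40.0, -2), (30.0, -3)]  # (energy bound, change)
FIRST_VALUE, SECOND_VALUE, SIXTH_VALUE = 4, 2, -4
SEVENTH_VALUE, EIGHTH_VALUE = 100, 60  # caps on the previous parameter value
PARAMETER_THRESHOLD = 10


def change_value(state, energy):
    """Step 203: change value from the current frame's state and energy."""
    if state == SPEECH:
        return FIRST_VALUE if energy > FIRST_ENERGY else SECOND_VALUE
    for bound, value in NON_SPEECH_BANDS:
        if energy > bound:
            return value
    return SIXTH_VALUE


def run_detection(frames):
    """Return (parameter value, prompt triggered) for each frame in order."""
    value, triggered = 0, False  # assumed initial detection information
    history = []
    for state, energy in frames:
        # Step 204: cap the previous value, then add the change value.
        cap = SEVENTH_VALUE if triggered else EIGHTH_VALUE
        value = min(value, cap) + change_value(state, energy)
        # Step 205: the prompt is shown only above the threshold.
        triggered = value > PARAMETER_THRESHOLD
        history.append((value, triggered))
    return history
```

With these placeholder numbers a single frame cannot trigger the prompt by itself: three consecutive loud speech-state frames are needed to cross the threshold, which illustrates how accumulating over frames suppresses one-off results.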

In another embodiment of the present disclosure, in response to the parameter value of the noise indication parameter being greater than the parameter threshold, displaying noise prompt information on the first session interface includes:

in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, displaying noise prompt text in a session member list of the first session interface, at a position whose distance from the first user identification is less than a preset distance, the session member list including a plurality of user identifications participating in the multimedia session.

In another embodiment of the present disclosure, displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold includes:

in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, changing the display color of the microphone identifier corresponding to the first user identification on the first session interface.

In another embodiment of the present disclosure, before displaying the noise prompt information on the first session interface in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, further comprising:

And in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, and the noise prompt function being turned on, performing the step of displaying noise prompt information on the first session interface.

In another embodiment of the present disclosure, the method further comprises:

In response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, and the automatic mute function being turned on, turning off the session sound corresponding to the first user identification.

In another embodiment of the present disclosure, after displaying the noise prompt information on the first session interface in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, further comprising:

And hiding the noise prompt information in response to the parameter value of the noise indication parameter of the audio signal of the next frame being smaller than the parameter threshold.

In another embodiment of the present disclosure, after displaying the noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold, further comprising:

Sending a prompt message to the server, where the server forwards the prompt message to the terminals logged in with a plurality of second user identifiers, the prompt message is used to trigger the terminals logged in with the second user identifiers to display noise prompt information, and the second user identifiers are the user identifiers in the multimedia session other than the first user identifier.

In another embodiment of the present disclosure, determining a change value of a noise indication parameter of a current frame audio signal according to a signal state and signal energy of the current frame audio signal includes:

in response to the signal state of the current frame audio signal being a speech state, and the signal energy value of the current frame audio signal being greater than the first energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a first value;

in response to the signal state of the current frame audio signal being a speech state, and the signal energy value of the current frame audio signal being less than the first energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a second value;

where the first value and the second value are positive values, and the first value is larger than the second value;

in response to the signal state of the current frame audio signal being a non-speech state, and the signal energy value of the current frame audio signal being greater than the second energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a third value;

in response to the signal state of the current frame audio signal being a non-speech state, and the signal energy value of the current frame audio signal being greater than the third energy value and less than the second energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a fourth value;

in response to the signal state of the current frame audio signal being a non-speech state, and the signal energy value of the current frame audio signal being greater than the fourth energy value and less than the third energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a fifth value;

in response to the signal state of the current frame audio signal being a non-speech state, and the signal energy value of the current frame audio signal being less than the fourth energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a sixth value.

In another embodiment of the present disclosure, determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal, the parameter value of the noise indication parameter of the previous frame audio signal, and the noise detection result corresponding to the previous frame audio signal includes:

in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being greater than the seventh value, determining the sum of the seventh value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is not triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being greater than the eighth value, determining the sum of the eighth value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is not triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being smaller than the eighth value, determining the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.

Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.

Based on the implementation environment shown in fig. 1, an embodiment of the present disclosure provides a noise detection method. Taking execution by the second terminal as an example, and referring to fig. 3, the method flow provided by the embodiment of the present disclosure includes:

301. Display a second session interface of the multimedia session application based on the second user identification currently logged in to the multimedia session application.

302. Receive a prompt message sent by the server.

The prompt message is sent by the terminal logged in with the first user identification when it detects that the parameter value of the noise indication parameter of the current frame audio signal is greater than a parameter threshold. The parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal.

303. In response to the prompt message, display noise prompt information corresponding to the first user identification on the second session interface.

According to the method provided by the embodiment of the disclosure, the prompt message is sent by the terminal when the parameter value of the noise indication parameter of the current frame audio signal is detected to be larger than the parameter threshold value, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state, the signal energy and the detection information of the previous frame audio signal of the current frame audio signal.

In another embodiment of the present disclosure, in response to the prompt message, displaying noise prompt information corresponding to the first user identification on the second session interface includes:

In response to the prompt message, displaying noise prompt text in the session member list of the second session interface, at a position whose distance from the first user identification is less than a preset distance, the session member list including a plurality of user identifications participating in the multimedia session.

In another embodiment of the present disclosure, in response to the prompt message, displaying noise prompt information corresponding to the first user identification on the second session interface includes:

In response to the prompt message, changing the display color of the microphone identifier corresponding to the first user identifier on the second session interface.
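The relay of the prompt message and its handling on the second terminal can be sketched as follows. This is only an illustration under stated assumptions: the message layout (`noisy_user_id`), the dictionary used for the member list, and the function names are hypothetical and are not defined by the disclosure, and the red microphone color simply borrows the green-to-red example given for the first terminal.

```python
def relay_recipients(session_members, first_user_id):
    """Server side: every session member except the sender receives the prompt message."""
    return [uid for uid in session_members if uid != first_user_id]


def apply_prompt_message(member_list, message):
    """Second terminal: mark the noisy member in the session member list.

    member_list maps a user identification to its display state
    (a hypothetical structure chosen for this sketch).
    """
    noisy_id = message["noisy_user_id"]  # hypothetical field name
    if noisy_id in member_list:
        # Show prompt text next to the member and recolor the microphone icon.
        member_list[noisy_id] = {"prompt_text": "Noisy", "mic_color": "red"}
    return member_list


members = ["AAAA", "BBBB", "CCCC"]
view = {uid: {"prompt_text": "", "mic_color": "green"} for uid in members}
for recipient in relay_recipients(members, "AAAA"):
    # Each recipient would update its own view; one shared view is used here for brevity.
    apply_prompt_message(view, {"noisy_user_id": "AAAA"})
```

Only the entry for the first user identification changes; the other members keep their original display state.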

Based on the implementation environment shown in fig. 1, an embodiment of the present disclosure provides a noise detection method. Taking execution by the first terminal, the server and the second terminal shown in fig. 1 as an example, and referring to fig. 4, the method flow provided by the embodiment of the present disclosure includes:

401. Based on the first user identification currently logged in to the multimedia session application, the first terminal displays a first session interface of the multimedia session application.

The multimedia session application is an application capable of carrying out a multimedia session, and may be a multimedia conference application, a social application, or the like. The session interface of the multimedia session application is the interface that carries the multimedia session and is used to manage it. It may include the identifications and avatars of the users taking part in the multimedia session, a plurality of session function options (for example, a mute option, a video option, a screen sharing option, an invite option, a member management option, a chat option, an emoticon option, a document option, and a settings option), and session operation options such as an end session option.

To make the multimedia session application convenient to use, each user registers a user identification and a password for logging in to the multimedia session application. When a first user logs in to the multimedia session application with a first user identification and a password entered on the login interface, and conducts a multimedia session with a plurality of second users, the first terminal can display the first session interface of the multimedia session application based on the first user identification currently logged in to the multimedia session application.

402. Based on the second user identification of the currently logged-in multimedia session application, the second terminal displays a second session interface of the multimedia session application.

When the second user logs in to the multimedia session application with the second user identifier and a password entered on the login interface, and conducts the multimedia session with the first user, the second terminal may display the second session interface of the multimedia session application based on the second user identifier currently logged in to the multimedia session application. The second user identification is any user identification in the multimedia session other than the first user identification.

403. The first terminal collects audio signals.

During the multimedia session with the second user, when the first user is speaking, the first terminal uses a microphone or a similar device to collect the audio signal of the first user in real time.

404. During frame-by-frame detection of the audio signal, the first terminal determines the change value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal.

The first terminal divides the collected audio signal into a plurality of frames, and noise detection based on the audio signal is performed frame by frame. The frames are arranged into a frame sequence in order of acquisition time. The current frame audio signal is any frame in the frame sequence, namely the frame being processed in the current round of noise detection. The previous frame audio signal is the frame that precedes the current frame in the frame sequence, namely the frame processed in the previous round of noise detection. The current frame is therefore not fixed: as time passes, it serves as the previous frame when the next frame audio signal is detected.

For the current frame audio signal, the first terminal may acquire signal energy of the current frame audio signal by using an energy algorithm. When the energy algorithm is adopted to obtain the signal energy of the current frame audio signal, the first terminal can input the current frame audio signal into the oscilloscope to obtain the waveform corresponding to the current frame audio signal, and further obtain the signal energy of the current frame audio signal according to the amplitude of the waveform displayed on the oscilloscope.

For the signal state of the current frame audio signal, the first terminal can process the current frame audio signal with a voice activity detection (VAD) algorithm. VAD is used to locate the beginning and ending positions of speech in a noisy audio signal, i.e. to separate silence from actual speech. When VAD is used to process the current frame audio signal, a threshold can be preset: if the signal energy of the current frame audio signal is greater than the threshold, the signal state of the current frame audio signal is determined to be a speech state; if the signal energy of the current frame audio signal is less than the threshold, the signal state of the current frame audio signal is determined to be a non-speech state.
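As a rough illustration of framing, energy measurement and the energy-threshold VAD described above, the following sketch may help. It does not reproduce the oscilloscope-based reading in the text; instead it substitutes a mean-square energy in dB computed from samples normalized to [-1, 1]. The function names, the -100 dB floor and the -45 dB VAD threshold are assumptions made for this example, not values from the disclosure.

```python
import math


def split_frames(samples, frame_len):
    """Split a sample sequence into consecutive full frames, in acquisition order."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy_db(frame, floor_db=-100.0):
    """Mean-square energy of one frame in dB (samples assumed normalized to [-1, 1])."""
    if not frame:
        return floor_db
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square <= 0.0:
        return floor_db  # digital silence: avoid log10(0)
    return max(floor_db, 10.0 * math.log10(mean_square))


def signal_state(energy_db, vad_threshold_db=-45.0):
    """Energy-threshold VAD: above the preset threshold is speech, otherwise non-speech.

    The text leaves the equal-to-threshold case open; this sketch treats it as non-speech.
    """
    return "speech" if energy_db > vad_threshold_db else "non_speech"


frames = split_frames([0.1] * 8, 4)
states = [signal_state(frame_energy_db(f)) for f in frames]
```

A production VAD would normally use more than a single energy threshold; the sketch follows only the simple rule stated in the text.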

The noise indication parameter is a parameter used to determine the multimedia session environment. In the embodiment of the present disclosure, different signal states of the current frame audio signal lead to different change values of the noise indication parameter of the current frame audio signal. When the first terminal determines the change value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal, the cases include but are not limited to the following:

In the first case, the signal state of the current frame audio signal is a speech state.

In one possible implementation, in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being greater than the first energy value, the first terminal determines that the change value of the noise indication parameter of the current frame audio signal is a first value.

In another possible implementation, the first terminal determines that the change value of the noise indication parameter of the current frame audio signal is the second value in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being less than the first energy value.

The first value and the second value may be determined according to statistical data. They are usually positive values, and the first value is greater than the second value; for example, the first value may be 50, 60, etc., and the second value may be 20, 30, etc. The first energy value may also be determined according to statistical data, and may be -48 dB, -50 dB, or the like.

For example, the first energy value is -48 dB, the first value is 50, and the second value is 20. In response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being -20 dB, which is greater than the first energy value of -48 dB, the change value of the noise indication parameter of the current frame audio signal is determined to be 50. In response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being -50 dB, which is less than the first energy value of -48 dB, the change value of the noise indication parameter of the current frame audio signal is determined to be 20.

In the second case, the signal state of the current frame audio signal is a non-speech state.

In one possible implementation, the first terminal determines that the change value of the noise indication parameter of the current frame audio signal is a third value in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than the second energy value.

In another possible implementation, the first terminal determines that the change value of the noise indication parameter of the current frame audio signal is a fourth value in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than the third energy value and less than the second energy value.

In another possible implementation, the first terminal determines that the change value of the noise indication parameter of the current frame audio signal is a fifth value in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than the fourth energy value and less than the third energy value.

In another possible implementation, the first terminal determines that the change value of the noise indication parameter of the current frame audio signal is a sixth value in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being less than the fourth energy value.

The second energy value, the third energy value and the fourth energy value can also be determined according to statistical data, and they are ordered as follows: the second energy value is greater than the third energy value, which is greater than the fourth energy value. The second energy value may be -38 dB, -39 dB, etc., the third energy value may be -42 dB, -43 dB, etc., and the fourth energy value may be -48 dB, -49 dB, etc. The third value, the fourth value, the fifth value and the sixth value can also be determined according to statistical data. They are typically negative values, ordered as follows: the third value is greater than the fourth value, the fourth value is greater than the fifth value, and the fifth value is greater than the sixth value. The third value may be -320, the fourth value may be -400, the fifth value may be -440, the sixth value may be -640, etc.

For example, the second energy value is -38 dB, the third energy value is -42 dB, the fourth energy value is -48 dB, the third value is -320, the fourth value is -400, the fifth value is -440, and the sixth value is -640. In response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being -20 dB, which is greater than the second energy value of -38 dB, the change value of the noise indication parameter of the current frame audio signal is determined to be -320. In response to the signal state being a non-speech state and the signal energy value being -40 dB, which is greater than the third energy value of -42 dB and less than the second energy value of -38 dB, the change value is determined to be -400. In response to the signal state being a non-speech state and the signal energy value being -45 dB, which is greater than the fourth energy value of -48 dB and less than the third energy value of -42 dB, the change value is determined to be -440. In response to the signal state being a non-speech state and the signal energy value being -50 dB, which is less than the fourth energy value of -48 dB, the change value is determined to be -640.
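The two cases of step 404 can be condensed into one lookup. The sketch below uses the example figures from the text (-48 dB, -38 dB, -42 dB, -48 dB and 50, 20, -320, -400, -440, -640) as defaults. The function and parameter names are hypothetical, and since the text only says "greater than" and "less than", placing an energy exactly equal to a boundary in the lower band is an assumption of this example.

```python
def change_value(state, energy_db,
                 first_energy=-48.0, second_energy=-38.0,
                 third_energy=-42.0, fourth_energy=-48.0,
                 values=(50, 20, -320, -400, -440, -640)):
    """Return the change value of the noise indication parameter for one frame.

    state is "speech" or "non_speech"; energy_db is the frame energy in dB.
    values holds the first to sixth values, in that order.
    """
    first, second, third, fourth, fifth, sixth = values
    if state == "speech":
        # Case 1: speech state, one energy boundary, positive change values.
        return first if energy_db > first_energy else second
    # Case 2: non-speech state, three energy boundaries, negative change values.
    if energy_db > second_energy:
        return third
    if energy_db > third_energy:
        return fourth
    if energy_db > fourth_energy:
        return fifth
    return sixth


# Worked examples taken from the text.
speech_examples = [change_value("speech", e) for e in (-20.0, -50.0)]
non_speech_examples = [change_value("non_speech", e) for e in (-20.0, -40.0, -45.0, -50.0)]
```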

405. The first terminal determines the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal.

The detection information of the previous frame of audio signal comprises parameter values of noise indication parameters of the previous frame of audio signal, noise detection results corresponding to the previous frame of audio signal and the like. The noise detection result corresponding to the audio signal of the previous frame comprises a triggered noise prompt or an un-triggered noise prompt. The noise detection result is used for representing the multimedia session environment of the first user identifier, the noise detection result corresponding to the audio signal of the previous frame is used for representing the multimedia session environment of the first user identifier when the audio signal of the previous frame is collected, and the noise detection result corresponding to the audio signal of the current frame is used for representing the multimedia session environment of the first user identifier when the audio signal of the current frame is collected.

In the embodiment of the disclosure, the noise detection results corresponding to the previous frame of audio signal are different, and the parameter values of the noise indication parameters of the current frame of audio signal are also different. For different noise detection results corresponding to the previous frame of audio signal, the first terminal determines the parameter value of the noise indication parameter of the current frame of audio signal according to the change value of the noise indication parameter of the current frame of audio signal, the parameter value of the noise indication parameter of the previous frame of audio signal and the noise detection result corresponding to the previous frame of audio signal, which comprises the following cases:

In the first case, the noise detection result corresponding to the previous frame audio signal is that the noise prompt is triggered.

In one possible implementation, in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being greater than the seventh value, the first terminal determines the sum of the seventh value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.

In another possible implementation, in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being smaller than the seventh value, the first terminal determines the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.

The seventh value may be determined according to statistical data, and may be 1400×20, 1500×20, and so on. For example, suppose the noise detection result corresponding to the previous frame audio signal is that the noise prompt is triggered, and the seventh value is 1400×20. If the parameter value of the noise indication parameter of the previous frame audio signal is 1500×20, which is greater than the seventh value of 1400×20, the sum of 1400×20 and the change value of the noise indication parameter of the current frame audio signal is determined as the parameter value of the noise indication parameter of the current frame audio signal. If the parameter value of the noise indication parameter of the previous frame audio signal is 1300×20, which is smaller than the seventh value of 1400×20, the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal is determined as the parameter value of the noise indication parameter of the current frame audio signal.

In the second case, the noise detection result corresponding to the previous frame audio signal is that the noise prompt is not triggered.

In one possible implementation, in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is not triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being greater than the eighth value, the first terminal determines the sum of the eighth value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.

In another possible implementation, in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is not triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being smaller than the eighth value, the first terminal determines the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.

The eighth value may be determined according to statistical data, and may be 1200×20, 1300×20, and so on. For example, suppose the noise detection result corresponding to the previous frame audio signal is that the noise prompt is not triggered, and the eighth value is 1300×20. If the parameter value of the noise indication parameter of the previous frame audio signal is 1400×20, which is greater than the eighth value of 1300×20, the sum of 1300×20 and the change value of the noise indication parameter of the current frame audio signal is determined as the parameter value of the noise indication parameter of the current frame audio signal. If the parameter value of the noise indication parameter of the previous frame audio signal is 1200×20, which is smaller than the eighth value of 1300×20, the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal is determined as the parameter value of the noise indication parameter of the current frame audio signal.
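Read together, the two cases of step 405 cap the previous parameter value at the seventh or eighth value, depending on the previous detection result, and then add the change value. A minimal sketch under that reading follows; the function name and the use of `min` are choices made for this example, and the constants are the example figures from the text.

```python
SEVENTH_VALUE = 1400 * 20  # cap when the previous frame triggered the noise prompt
EIGHTH_VALUE = 1300 * 20   # cap when the previous frame did not trigger it


def update_parameter(prev_value, prev_triggered, change):
    """Parameter value of the current frame from the previous frame's detection info.

    prev_value: parameter value of the previous frame
    prev_triggered: True if the previous frame's result was "noise prompt triggered"
    change: change value of the current frame
    """
    cap = SEVENTH_VALUE if prev_triggered else EIGHTH_VALUE
    # Above the cap, the cap is used; otherwise the previous value itself is used.
    return min(prev_value, cap) + change


# The four example situations from the text, with illustrative change values.
examples = [
    update_parameter(1500 * 20, True, -320),   # capped at 1400*20
    update_parameter(1300 * 20, True, 50),     # previous value kept
    update_parameter(1400 * 20, False, 20),    # capped at 1300*20
    update_parameter(1200 * 20, False, 20),    # previous value kept
]
```

Because a previous value equal to the cap gives the same sum in either branch, the unspecified "equal to" case does not need separate handling here.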

406. In response to the parameter value of the noise indication parameter being greater than the parameter threshold, the first terminal displays noise prompt information on the first session interface.

In the embodiment of the disclosure, the parameter threshold used to determine the multimedia session environment differs with the signal energy of the current frame audio signal. For example, the parameter threshold may be 5400 when the signal energy of the current frame audio signal is greater than -31 dB, 23600 when the signal energy is greater than -38 dB and less than -31 dB, 16400 when the signal energy is greater than -44 dB and less than -38 dB, and 9200 when the signal energy is less than -44 dB.

Based on the determined parameter threshold, the first terminal compares the parameter value of the noise indication parameter of the current frame audio signal with the parameter threshold. In response to the parameter value of the noise indication parameter of the current frame audio signal being greater than or equal to the parameter threshold, the first terminal determines that the noise detection result corresponding to the current frame audio signal is that the noise prompt is triggered, namely that the multimedia session environment is a noisy environment. In response to the parameter value of the noise indication parameter of the current frame audio signal being smaller than the parameter threshold, the first terminal determines that the noise detection result corresponding to the current frame audio signal is that the noise prompt is not triggered, namely that the multimedia session environment is a quiet environment.
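The energy-dependent threshold and the comparison can be sketched as follows. The threshold figures are copied from the example in the text as given; the function names are hypothetical, and assigning an energy exactly equal to a band boundary to the lower band is an assumption of this example.

```python
def parameter_threshold(energy_db):
    """Pick the parameter threshold from the current frame's signal energy (dB)."""
    if energy_db > -31.0:
        return 5400
    if energy_db > -38.0:
        return 23600
    if energy_db > -44.0:
        return 16400
    return 9200


def noise_prompt_triggered(param_value, energy_db):
    """True means a noisy environment: parameter value >= threshold, as in the text."""
    return param_value >= parameter_threshold(energy_db)


# A quiet frame (-50 dB) with a parameter value just at and just below its threshold.
at_threshold = noise_prompt_triggered(9200, -50.0)
below_threshold = noise_prompt_triggered(9199, -50.0)
```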

In the embodiments of the present disclosure, the first terminal may provide an interface for communication between its background and its foreground, through which the noise detection result of the multimedia session environment is output to the foreground. The first terminal comprises an algorithm layer, a middle layer and a UI layer: the algorithm layer belongs to the background, the UI layer belongs to the foreground, and the middle layer connects the two. Based on this interface and the noise detection result corresponding to the current frame audio signal, the algorithm layer obtains a noise detection result identifier, which is either a noise identifier or a quiet identifier. The middle layer obtains the noise detection result identifier and determines from it whether to display noise prompt information on the first session interface. If the identifier is the noise identifier, the UI layer displays the noise prompt information on the first session interface; otherwise, the UI layer does not display it.
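One way to picture the three layers is as three small functions chained together. The names `NoiseResultId`, `algorithm_layer`, `middle_layer` and `ui_layer`, and the prompt string, are hypothetical; the disclosure does not specify the interface between the layers.

```python
from enum import Enum


class NoiseResultId(Enum):
    """Noise detection result identifier passed from background to foreground."""
    NOISY = "noisy"
    QUIET = "quiet"


def algorithm_layer(triggered):
    """Background: map the frame's noise detection result to an identifier."""
    return NoiseResultId.NOISY if triggered else NoiseResultId.QUIET


def middle_layer(result_id):
    """Bridge: decide whether noise prompt information should be displayed."""
    return result_id is NoiseResultId.NOISY


def ui_layer(show_prompt):
    """Foreground: return the text to render on the session interface."""
    return "Noisy" if show_prompt else ""


rendered = ui_layer(middle_layer(algorithm_layer(True)))
```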

The first terminal displays the noise prompt information on the first session interface in the following ways:

In a first manner, when a click operation on a session member option on a first session interface is detected, the first terminal displays a session member list comprising a plurality of user identities referring to the multimedia session. And responding to the parameter value of the noise indication parameter being larger than the parameter threshold, wherein the multimedia conversation environment is a noisy environment, and the first terminal can display noise prompt characters at a position, in the conversation member list of the first conversation interface, of which the distance from the first user identifier is smaller than a preset distance. The preset distance may be determined according to the length of the rows and columns of the session member list, and may be 1 cm, 2cm, and so on. To be able to alert the first user, the first terminal may highlight, e.g., bold, highlight, etc., the noise prompt text. Fig. 5 is a session interface of a multimedia session application in a quiet environment, fig. 6 is a session interface of a multimedia session application in a noisy environment, and comparing fig. 5 and fig. 6, it can be known that when the multimedia session environment corresponding to the user identifier AAAA is a noisy environment, two words of noisy prompt words are displayed near the user identifier AAAA on the session member list of the session interface shown in fig. 6.

In the second manner, in response to the parameter value of the noise indication parameter being greater than the parameter threshold, the multimedia session environment is a noisy environment, and the first terminal may change the display color of the microphone identifier corresponding to the first user identifier on the first session interface, for example, change the display color of the microphone identifier from green to red, and so on.
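As a rough illustration of the two display manners, the following sketch computes the display attributes of one row of the session member list. The `MemberRow` type, the `render_row` function and the dictionary keys are assumptions made for this example; the green-to-red color change follows the example given in the text, and both manners are shown together only for brevity.

```python
from dataclasses import dataclass


@dataclass
class MemberRow:
    """One entry of the session member list (hypothetical structure)."""
    user_id: str
    noisy: bool = False


def render_row(row: MemberRow) -> dict:
    """Return display attributes for a row of the session member list."""
    return {
        "user_id": row.user_id,
        # First manner: prompt text shown next to the user identifier.
        "prompt_text": "Noisy" if row.noisy else "",
        # Emphasis (e.g. bold) is applied only when the prompt is shown.
        "prompt_bold": row.noisy,
        # Second manner: microphone identifier changes from green to red.
        "mic_color": "red" if row.noisy else "green",
    }
```

An implementation may of course use only one of the two manners.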

Referring to fig. 7, to give the first user more choices, embodiments of the present disclosure also add a noise prompt sub-option and an automatic mute sub-option to the existing audio options of the session interface. When a click operation on the noise prompt sub-option is detected, the noise prompt function is turned on, and in response to the parameter value of the noise indication parameter being greater than the parameter threshold, noise prompt information can be displayed on the first session interface. When a click operation on the automatic mute sub-option is detected, the automatic mute function is turned on, and in response to the parameter value of the noise indication parameter being greater than the parameter threshold, the first terminal can turn off the session sound corresponding to the first user identifier and stop sending the collected audio signal to the server, so that a second user participating in the multimedia session does not receive the audio signal of the first user; the first user is thereby muted while in a noisy environment. Of course, if the multimedia session environment corresponding to the first user identifier changes from a noisy environment to a quiet environment, the first terminal may turn the session sound corresponding to the first user identifier back on and send the collected audio signal to the server, which forwards it to each second user participating in the multimedia session.
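The interaction between the two sub-options and the threshold comparison can be sketched as follows. `AudioOptions`, `FrameActions` and `decide_actions` are hypothetical names introduced for this example only; the sketch assumes the decision is re-evaluated for every frame.

```python
from dataclasses import dataclass


@dataclass
class AudioOptions:
    """State of the two sub-options under the audio options (hypothetical)."""
    noise_prompt_enabled: bool = False
    auto_mute_enabled: bool = False


@dataclass
class FrameActions:
    show_prompt: bool  # display noise prompt information on the interface
    send_audio: bool   # forward the collected audio signal to the server


def decide_actions(parameter_value: float,
                   parameter_threshold: float,
                   options: AudioOptions) -> FrameActions:
    """Decide what the first terminal does for the current frame."""
    noisy = parameter_value > parameter_threshold
    return FrameActions(
        # Prompt only when noisy and the noise prompt function is on.
        show_prompt=noisy and options.noise_prompt_enabled,
        # Stop sending audio only when noisy and automatic mute is on;
        # sending resumes as soon as the environment is quiet again.
        send_audio=not (noisy and options.auto_mute_enabled),
    )
```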

In another embodiment of the present disclosure, after the noise prompt information has been displayed on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold, the first terminal hides the noise prompt information in response to the parameter value of the noise indication parameter of the next frame audio signal being less than the parameter threshold, so that the displayed noise prompt information does not disturb the first user.

407. The first terminal sends a prompt message to the server.

In order to enable other users participating in the multimedia session to know the current multimedia session environment of the first user, the first terminal may further send a prompt message to the server after detecting that the parameter value of the noise indication parameter of the current frame audio signal is greater than the parameter threshold.

408. The server sends the prompt message to the second terminal.

When receiving the prompt message sent by the first terminal, the server can send the prompt message to the second terminal in a wired or wireless communication mode.

409. When receiving the prompt message sent by the server, the second terminal responds to the prompt message and displays noise prompt information corresponding to the first user identification on the second session interface.

After receiving the prompt message sent by the server, the second terminal can display noise prompt information corresponding to the first user identification on the second session interface in response to the prompt message so as to prompt the second user.

The second terminal displays the noise prompt information corresponding to the first user identifier on the second session interface in the following modes:

In the first mode, the second terminal can display noise prompt text at a position in the session member list of the second session interface whose distance from the first user identifier is smaller than a preset distance.

In the second mode, the second terminal can change the display color of the microphone identifier corresponding to the first user identifier on the second session interface.

In another embodiment of the present disclosure, when the multimedia session environment corresponding to the first user identifier changes from a noisy environment to a quiet environment, the second terminal also hides the noise prompt information in response to the noise detection result changing from the noisy environment to the quiet environment, thereby avoiding interference with the second user.
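Steps 407 to 409 amount to relaying a prompt message from the first terminal through the server to the second terminals. The sketch below models this in memory. The `PromptMessage` fields, the `Server.relay` method and the `noisy` flag used to signal a return to a quiet environment are all assumptions for illustration; the disclosure does not specify the message format or the transport.

```python
from dataclasses import dataclass, field


@dataclass
class PromptMessage:
    """Hypothetical prompt message sent by the first terminal."""
    first_user_id: str
    noisy: bool  # assumed flag: True = noisy, False = back to quiet


@dataclass
class SecondTerminal:
    user_id: str
    noisy_members: set = field(default_factory=set)

    def on_prompt(self, msg: PromptMessage) -> None:
        """Show or hide the noise prompt for the first user identifier."""
        if msg.noisy:
            self.noisy_members.add(msg.first_user_id)
        else:
            self.noisy_members.discard(msg.first_user_id)


@dataclass
class Server:
    terminals: dict = field(default_factory=dict)

    def register(self, terminal: SecondTerminal) -> None:
        self.terminals[terminal.user_id] = terminal

    def relay(self, msg: PromptMessage) -> None:
        """Forward the message to every terminal except the sender's."""
        for user_id, terminal in self.terminals.items():
            if user_id != msg.first_user_id:
                terminal.on_prompt(msg)
```

In this sketch a second terminal marks the first user identifier as noisy when it receives a message with `noisy=True` and clears the mark when it later receives `noisy=False`.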

According to the method provided by the embodiment of the disclosure, the current frame audio signal is collected, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are considered together, chance errors in the detection result are eliminated and the accuracy of the detection result is improved.

Referring to fig. 8, an embodiment of the present disclosure provides a noise detection apparatus including:

a display module 801, configured to display a first session interface of the multimedia session application based on a first user identifier of a currently logged-in multimedia session application;

a collection module 802, configured to collect audio signals;

a determining module 803, configured to determine a change value of a noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal;

the determining module 803 being further configured to determine, during frame-by-frame detection of the audio signal, a parameter value of the noise indication parameter of the current frame audio signal based on the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal;

the display module 801 being further configured to display noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold.

In another embodiment of the present disclosure, the display module 801 is configured to display, in response to a parameter value of the noise indication parameter being greater than a parameter threshold, a noise prompt text at a location in a session member list of the first session interface that is less than a preset distance from the first user identifier, the session member list including a plurality of user identifiers participating in the multimedia session.

In another embodiment of the disclosure, the display module 801 is configured to change a display color of a microphone identifier corresponding to the first user identifier on the first session interface in response to a parameter value of the noise indication parameter being greater than a parameter threshold.

In another embodiment of the present disclosure,

The display module 801 is further configured to display noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold and the noise prompt function being turned on.

In another embodiment of the present disclosure,

The display module 801 is further configured to turn off the session sound corresponding to the first user identifier in response to the parameter value of the noise indication parameter being greater than the parameter threshold and the automatic mute function being turned on.

In another embodiment of the present disclosure, the apparatus further comprises:

a hiding module, configured to hide the noise prompt information in response to the parameter value of the noise indication parameter of the next frame audio signal being less than the parameter threshold.

In another embodiment of the present disclosure, the apparatus further includes:

a sending module, configured to send a prompt message to the server, where the server sends the prompt message to the terminals logged in with a plurality of second user identifiers, the prompt message is used to trigger those terminals to display noise prompt information, and the second user identifiers are the user identifiers in the multimedia session other than the first user identifier.

In another embodiment of the present disclosure, the determining module 803 is configured to: in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being greater than a first energy value, determine that the change value of the noise indication parameter of the current frame audio signal is a first value; and, in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being less than the first energy value, determine that the change value of the noise indication parameter of the current frame audio signal is a second value.

In another embodiment of the present disclosure, the determining module 803 is configured to: in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a second energy value, determine that the change value of the noise indication parameter of the current frame audio signal is a third value; in response to the signal state being a non-speech state and the signal energy value being greater than a third energy value and less than the second energy value, determine that the change value is a fourth value; in response to the signal state being a non-speech state and the signal energy value being greater than a fourth energy value and less than the third energy value, determine that the change value is a fifth value; and, in response to the signal state being a non-speech state and the signal energy value being less than the fourth energy value, determine that the change value is a sixth value.
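The branch structure used by the determining module 803 to select the change value can be sketched as below. All numbers are placeholders: the energy thresholds are arbitrary apart from respecting the ordering second > third > fourth implied by the text, and the six change values are simply set to their ordinals so that the selected branch is visible. The text does not say what happens when the energy equals a threshold; this sketch assumes such a frame falls into the lower branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeValueConfig:
    # Energy thresholds (placeholder numbers, not from the disclosure).
    first_energy: float = 60.0
    second_energy: float = 50.0
    third_energy: float = 40.0
    fourth_energy: float = 30.0
    # "First" to "sixth" change values; placeholders equal to their ordinal.
    values: tuple = (1, 2, 3, 4, 5, 6)


def change_value(is_speech: bool, energy: float,
                 cfg: ChangeValueConfig = ChangeValueConfig()) -> int:
    """Select the change value of the noise indication parameter."""
    first, second, third, fourth, fifth, sixth = cfg.values
    if is_speech:
        # Speech state: one threshold, two outcomes.
        return first if energy > cfg.first_energy else second
    # Non-speech state: three thresholds, four outcomes.
    if energy > cfg.second_energy:
        return third
    if energy > cfg.third_energy:
        return fourth
    if energy > cfg.fourth_energy:
        return fifth
    return sixth
```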

In another embodiment of the present disclosure, the determining module 803 is configured to: in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being greater than a seventh value, determine the sum of the seventh value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal; and, in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being less than the seventh value, determine the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.

In another embodiment of the present disclosure, the determining module 803 is configured to: in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is not triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being greater than an eighth value, determine the sum of the eighth value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal; and, in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is not triggered, and the parameter value of the noise indication parameter of the previous frame audio signal being less than the eighth value, determine the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.
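The two embodiments above describe the same update rule with a different cap: the previous parameter value is limited to the seventh value when the previous frame triggered the noise prompt, and to the eighth value when it did not, before the change value is added. A minimal sketch follows; the function name and argument names are hypothetical, and the numbers in the usage note are placeholders rather than values from the disclosure.

```python
def update_parameter_value(prev_value: float,
                           prev_triggered: bool,
                           change: float,
                           seventh_value: float,
                           eighth_value: float) -> float:
    """Compute the noise indication parameter value of the current frame."""
    # The cap depends on the previous frame's noise detection result.
    cap = seventh_value if prev_triggered else eighth_value
    # If the previous value exceeds the cap, restart from the cap;
    # otherwise continue from the previous value. When the previous value
    # equals the cap, both branches of the text give the same result.
    base = cap if prev_value > cap else prev_value
    return base + change
```

For example, with a seventh value of 100 and an eighth value of 50, a previous value of 120 on a triggered frame and a change value of 2 gives 102, while a previous value of 80 on a non-triggered frame and the same change value gives 52.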

In summary, the apparatus provided in the embodiments of the present disclosure collects an audio signal and determines the parameter value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are considered together, chance errors in the detection result are eliminated and the accuracy of the detection result is improved.

Referring to fig. 9, an embodiment of the present disclosure provides a noise detection apparatus including:

a display module 901, configured to display a second session interface of the multimedia session application based on a second user identifier of the currently logged-in multimedia session application;

a receiving module 902, configured to receive a prompt message sent by the server, where the prompt message is sent by the terminal logged in with the first user identifier when detecting that a parameter value of a noise indication parameter of a current frame audio signal is greater than a parameter threshold, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal;

the display module 901 being further configured to display, in response to the prompt message, noise prompt information corresponding to the first user identifier on the second session interface.

In another embodiment of the present disclosure, the display module 901 is configured to display, in response to the prompt message, noise prompt text at a position in the session member list of the second session interface whose distance from the first user identifier is less than a preset distance, where the session member list includes a plurality of user identifiers participating in the multimedia session.

In another embodiment of the present disclosure, the display module 901 is configured to change a display color of a microphone identifier corresponding to the first user identifier on the second session interface in response to the prompt message.

In summary, the apparatus provided in the embodiments of the present disclosure receives a prompt message, where the prompt message is sent by a terminal when detecting that the parameter value of the noise indication parameter of the current frame audio signal is greater than the parameter threshold, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are considered together, chance errors in the detection result are eliminated and the accuracy of the detection result is improved.

Fig. 10 shows a block diagram of a terminal 1000 provided by an exemplary embodiment of the present disclosure. The terminal 1000 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 1000 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.

In general, terminal 1000 can include: a processor 1001 and a memory 1002.

The processor 1001 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1001 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1001 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1001 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. Memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one instruction for execution by processor 1001 to implement the noise detection method provided by the method embodiments of the present application.

In some embodiments, terminal 1000 can optionally further include: a peripheral interface 1003 and at least one peripheral. The processor 1001, the memory 1002, and the peripheral interface 1003 may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral includes at least one of radio frequency circuitry 1004, a display 1005, a camera assembly 1006, audio circuitry 1007, and a power supply 1009.

Peripheral interface 1003 may be used to connect at least one I/O (Input/Output)-related peripheral to processor 1001 and memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, memory 1002, and peripheral interface 1003 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

Radio frequency circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. Radio frequency circuit 1004 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may further include NFC (Near Field Communication) related circuits, which is not limited by the present application.

The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 1005 is a touch screen, the display 1005 also has the ability to capture touch signals at or above the surface of the display 1005. The touch signal may be input to the processor 1001 as a control signal for processing. In this case, the display 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 1005, disposed on the front panel of terminal 1000; in other embodiments, there may be at least two displays 1005, disposed on different surfaces of terminal 1000 or in a folded design; in still other embodiments, display 1005 may be a flexible display disposed on a curved surface or a folded surface of terminal 1000. Furthermore, the display 1005 may be arranged in a non-rectangular irregular shape, that is, a shaped screen. The display 1005 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 1006 is used to capture images or video. Optionally, camera assembly 1006 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions. In some embodiments, camera assembly 1006 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.

The audio circuit 1007 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing, or inputting the electric signals to the radio frequency circuit 1004 for voice communication. For purposes of stereo acquisition or noise reduction, the microphone may be multiple, each located at a different portion of terminal 1000. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 1007 may also include a headphone jack.

Power supply 1009 is used to power the various components in terminal 1000. The power source 1009 may be alternating current, direct current, disposable battery or rechargeable battery. When the power source 1009 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, terminal 1000 can further include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, optical sensor 1015, and proximity sensor 1016.

The acceleration sensor 1011 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 1000. For example, the acceleration sensor 1011 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1001 may control the display screen 1005 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1011. The acceleration sensor 1011 may also be used for the acquisition of motion data of a game or a user.

The gyro sensor 1012 may detect the body direction and the rotation angle of the terminal 1000, and the gyro sensor 1012 may collect the 3D motion of the user to the terminal 1000 in cooperation with the acceleration sensor 1011. The processor 1001 may implement the following functions according to the data collected by the gyro sensor 1012: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.

Pressure sensor 1013 may be disposed on a side frame of terminal 1000 and/or on an underlying layer of display 1005. When pressure sensor 1013 is disposed on a side frame of terminal 1000, a grip signal of the user on terminal 1000 can be detected, and the processor 1001 performs left-right hand recognition or a shortcut operation based on the grip signal acquired by pressure sensor 1013. When the pressure sensor 1013 is disposed on the lower layer of the display screen 1005, the processor 1001 controls the operable controls on the UI according to the pressure operation of the user on the display screen 1005. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The optical sensor 1015 is used to collect ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 based on the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is turned up; when the ambient light intensity is low, the display brightness of the display screen 1005 is turned down. In another embodiment, the processor 1001 may dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.

Proximity sensor 1016, also referred to as a distance sensor, is typically located on the front panel of terminal 1000. Proximity sensor 1016 is used to collect the distance between the user and the front of terminal 1000. In one embodiment, when proximity sensor 1016 detects that the distance between the user and the front of terminal 1000 gradually decreases, processor 1001 controls display 1005 to switch from the screen-on state to the screen-off state; when proximity sensor 1016 detects that the distance between the user and the front of terminal 1000 gradually increases, processor 1001 controls display 1005 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the structure shown in fig. 10 is not limiting and that terminal 1000 can include more or fewer components than shown, or certain components can be combined, or a different arrangement of components can be employed.

According to the terminal provided by the embodiment of the disclosure, the audio signal is collected, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are considered together, chance errors in the detection result are eliminated and the accuracy of the detection result is improved.

Embodiments of the present disclosure provide a computer readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement the noise detection method shown in fig. 2, fig. 3 or fig. 4. The computer readable storage medium may be non-transitory. For example, the computer readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, etc.

With the computer readable storage medium provided by the embodiment of the disclosure, the audio signal is collected, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the noise detection result, and the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are considered together, chance errors in the detection result are eliminated and the accuracy of the detection result is improved.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing description of the preferred embodiments of the present disclosure is provided for the purpose of illustration only and is not intended to limit the disclosure to the particular embodiments disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the disclosure.

Claims

1. A method of noise detection, the method comprising:

Collecting an audio signal;

determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal, the parameter value of the noise indication parameter of the previous frame audio signal and the noise detection result corresponding to the previous frame audio signal;

and in response to the parameter value of the noise indication parameter being greater than a parameter threshold, displaying noise prompt information on the first session interface.

2. The method of claim 1, wherein the displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold comprises:

in response to the parameter value of the noise indication parameter being greater than the parameter threshold, displaying noise prompt text at a position in a session member list of the first session interface whose distance from the first user identifier is smaller than a preset distance, wherein the session member list comprises a plurality of user identifiers participating in a multimedia session.

3. The method of claim 1, wherein the displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold comprises:

in response to the parameter value of the noise indication parameter being greater than the parameter threshold, changing the display color of the microphone identifier corresponding to the first user identifier on the first session interface.

4. The method of claim 1, wherein, before the displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold, the method further comprises:

in response to the parameter value of the noise indication parameter being greater than the parameter threshold and a noise prompt function being turned on, performing the step of displaying noise prompt information on the first session interface.

5. The method according to claim 1, wherein the method further comprises:

in response to the parameter value of the noise indication parameter being greater than the parameter threshold and an automatic mute function being turned on, turning off the conversation sound corresponding to the first user identifier.

6. The method of claim 1, wherein the displaying of the noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold further comprises:

7. The method of claim 1, wherein the displaying of the noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold further comprises:

sending a prompt message to a server, wherein the server sends the prompt message to terminals logged in with a plurality of second user identifiers, the prompt message is used to trigger the terminals logged in with the second user identifiers to display the noise prompt information, and the second user identifiers are the user identifiers in the multimedia session other than the first user identifier.

8. A method of noise detection, the method comprising:

receiving a prompt message sent by a server, wherein the prompt message is sent by a terminal logged in with a first user identifier upon detecting that the parameter value of a noise indication parameter of a current frame audio signal is greater than a parameter threshold, the parameter value of the noise indication parameter of the current frame audio signal is determined according to the change value of the noise indication parameter of the current frame audio signal, the parameter value of the noise indication parameter of a previous frame audio signal and a noise detection result corresponding to the previous frame audio signal, and the change value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal;

9. The method of claim 8, wherein the displaying noise prompt corresponding to the first user identification on the second session interface in response to the prompt message comprises:

in response to the prompt message, displaying noise prompt text at a position in a session member list of the second session interface whose distance from the first user identifier is smaller than a preset distance, wherein the session member list comprises a plurality of user identifiers participating in a multimedia session.

10. The method of claim 8, wherein the displaying noise prompt corresponding to the first user identification on the second session interface in response to the prompt message comprises:

11. A noise detection apparatus, the apparatus comprising:

the acquisition module is used for acquiring the audio signals;

the determining module is configured to determine a parameter value of a noise indication parameter of the current frame audio signal according to a change value of the noise indication parameter of the current frame audio signal, a parameter value of a noise indication parameter of a previous frame audio signal, and a noise detection result corresponding to the previous frame audio signal;

12. The apparatus of claim 11, wherein the display module is configured to display noise prompt text at a location in the session member list of the first session interface where a distance from the first user identification is less than a preset distance in response to the parameter value of the noise indication parameter being greater than a parameter threshold, the session member list including a plurality of user identifications participating in a multimedia session.

13. The apparatus of claim 11, wherein the display module is configured to change a display color of a microphone identifier corresponding to the first user identifier on the first session interface in response to a parameter value of the noise indication parameter being greater than a parameter threshold.

14. The apparatus of claim 11, wherein the display module is further configured to perform the step of displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold and a noise prompt function being turned on.

15. The apparatus of claim 11, wherein the display module is further configured to turn off the conversation sound corresponding to the first user identifier in response to the parameter value of the noise indication parameter being greater than a parameter threshold and an automatic mute function being turned on.

16. The apparatus of claim 11, wherein the apparatus further comprises:

17. The apparatus of claim 11, wherein the apparatus further comprises:

the sending module is configured to send a prompt message to a server, wherein the server sends the prompt message to terminals logged in with a plurality of second user identifiers, the prompt message is used to trigger the terminals logged in with the second user identifiers to display the noise prompt information, and the second user identifiers are the user identifiers in the multimedia session other than the first user identifier.

18. A noise detection apparatus, the apparatus comprising:

the receiving module is configured to receive a prompt message sent by a server, wherein the prompt message is sent by a terminal logged in with a first user identifier upon detecting that the parameter value of the noise indication parameter of the current frame audio signal is greater than a parameter threshold, the parameter value of the noise indication parameter of the current frame audio signal is determined according to the change value of the noise indication parameter of the current frame audio signal, the parameter value of the noise indication parameter of the previous frame audio signal and the noise detection result corresponding to the previous frame audio signal, and the change value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal;

19. The apparatus of claim 18, wherein the display module is configured to display noise prompt text in response to the prompt message at a location in the session member list of the second session interface that is less than a predetermined distance from the first user identification, the session member list including a plurality of user identifications participating in a multimedia session.

20. The apparatus of claim 18, wherein the display module is configured to change a display color of a microphone identifier corresponding to the first user identifier on the second session interface in response to the prompt message.

21. A terminal comprising a processor and a memory, wherein the memory has stored therein at least one program code that is loaded and executed by the processor to implement the noise detection method of any of claims 1 to 7 or to implement the noise detection method of any of claims 8 to 10.

22. A computer readable storage medium, characterized in that at least one program code is stored in the storage medium, the at least one program code being loaded and executed by a processor to implement the noise detection method of any one of claims 1 to 7 or the noise detection method of any one of claims 8 to 10.