CN117975968B - Remote patrol system control method and system based on sound and language model

Info

Publication number: CN117975968B
Application number: CN202410389170.7A
Authority: CN
Inventors: 景志斌; 陈果累; 何佳; 叶俊; 李孟福
Original assignee: Sichuan Kingscheme Information Technology Co ltd
Current assignee: Sichuan Kingscheme Information Technology Co ltd
Priority date: 2024-04-02
Filing date: 2024-04-02
Publication date: 2024-09-10
Anticipated expiration: 2044-04-02
Also published as: CN117975968A

Abstract

The invention discloses a remote patrol system control method and system based on sound and language models, relates to the field of intelligent patrol of transformer substations, and addresses the problems that existing remote patrol systems are complicated to operate, error-prone in queries, and slow to notify staff of results. The method comprises the following steps: acquiring voice data, and converting the voice data into text data with a fine-tuned voice recognition model; parsing intention data from the text data with a remote patrol expert model, the intention data comprising the API interface of the service to be called and its interface parameters; parsing the interface parameters in the intention data and enhancing them to obtain enhanced intention data; generating a service call request from the enhanced intention data, and calling the remote patrol system through the service call request to execute the service and obtain a service result; inputting the service result into the remote patrol expert model to generate reply text data; and converting the reply text data into reply voice data and outputting it. The remote patrol system can thereby be controlled by voice.

Description

Remote patrol system control method and system based on sound and language model

Technical Field

The invention relates to the field of intelligent patrol of transformer substations, in particular to a remote patrol system control method and system based on sound and language models.

Background

The substation remote patrol system is a system planned by the State Grid that uses robots, unmanned aerial vehicles, voiceprint equipment, cameras and the like as sensing layer equipment, and generates patrol results after an algorithm host intelligently analyzes the collected data. The system executes patrol tasks on substation equipment mainly through manually started patrol tasks, periodically scheduled patrol tasks, and tasks triggered by alarm signals of the main and auxiliary systems. Patrol tasks are organized around patrol points. A patrol point is a service-level check point, and one piece of equipment generally has several patrol points. Each patrol point, the smallest unit of service monitoring, is typically covered by one or more presets of cameras, unmanned aerial vehicles or robots, which are used to check whether the component at that point is defective. The system includes a retrieval module based on point attributes; after querying patrol points, staff can monitor and view equipment, start patrol tasks, configure patrol equipment, and so on.

However, patrol points have numerous service attributes, such as substation area, substation bay, equipment name, component name, phase name, point code, patrol type, importance level, recognition algorithm, and sensing layer equipment. Manual retrieval is therefore cumbersome and its query accuracy is low, so the relevant condition of the equipment cannot be checked immediately when an emergency occurs, particularly in substations where the main and auxiliary systems have not been built, or for patrol points that the main and auxiliary systems do not cover. In addition, because many patrol points are involved, the patrol process takes a long time, and on-site staff may not see the patrol result as soon as it is generated.

Disclosure of Invention

The application aims to provide a remote patrol system control method and system based on sound and language models, which address the problems that existing remote patrol systems are complicated to operate, error-prone in queries, and slow to notify staff of results. The substation remote patrol system is extended so that it is controlled through sound and large language models: a staff member communicates with the remote patrol system by voice input, guides it to quickly complete patrol-related work, and receives voice feedback on patrol task results. This simplifies the operation flow of the remote patrol system, lowers the operating difficulty, and lets staff learn task results in time.

The application first provides a remote patrol system control method based on sound and language models, comprising the following steps: acquiring voice data, and converting the voice data into text data according to a fine-tuned voice recognition model; parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface of the service to be called and its interface parameters, the remote patrol expert model being obtained by training an SOTA large language model; parsing the interface parameters in the intention data, and enhancing the interface parameters to obtain enhanced intention data; generating a service call request according to the enhanced intention data, and calling the remote patrol system through the service call request to execute the service and obtain a service result; inputting the service result into the remote patrol expert model to generate reply text data; and converting the reply text data into reply voice data for output.

With this technical scheme, the service interfaces of the remote patrol system can be called from the staff's voice data, so that the remote patrol system is controlled by voice to execute services and returns the service results by voice, which simplifies the staff's operation flow. In addition, enhancing the interface parameters of the intention makes calls accurate and lowers the demands on staff.

In one possible implementation, the fine-tuned voice recognition model is obtained by: acquiring an SOTA voice recognition model, and fine-tuning it on voice and text data specific to the electric power field to obtain the fine-tuned voice recognition model.

In one possible implementation, the remote patrol expert model is obtained by: acquiring the API interfaces and comments of the remote patrol system to form an intention realization seed library; supplementing each API interface in the intention realization seed library with context about the remote patrol system and with placeholders to generate a placeholder intention expression; inputting the placeholder intention expression into a general large language model for expansion to form a placeholder intention expression data set; manually reviewing the placeholder intention expression data set and screening out erroneous placeholder intention expressions; replacing the placeholders in the placeholder intention expression data set with power equipment information, point information and sensing layer equipment information from the remote patrol system to obtain an intention expression data set; and training the large language model on the intention expression data set to obtain the remote patrol expert model.

In one possible implementation, the enhanced intent data is obtained by: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching the interface parameters in the document database and the vector database respectively, and performing intersection operation on the search result to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface needing to call the service into enhanced intention data.

In one possible embodiment, the method further comprises: acquiring an alarm text of the remote patrol system, and converting the alarm text into voice output.

The application also provides a remote patrol system control system based on sound and language models, comprising: a voice recognition module, used for acquiring voice data and converting the voice data into text data according to the fine-tuned voice recognition model; a large language model module, used for parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface of the service to be called and its interface parameters, the remote patrol expert model being obtained by training an SOTA large language model; a retrieval enhancement module, used for parsing the interface parameters in the intention data and enhancing the interface parameters to obtain enhanced intention data; and an intention calling module, used for generating a service call request according to the enhanced intention data, and calling the remote patrol system through the service call request to execute the service and obtain a service result. The large language model module is also used for inputting the service result into the remote patrol expert model to generate reply text data. The system further comprises a voice generation module, used for converting the reply text data into reply voice data and outputting the reply voice data.

In a possible implementation manner, the voice recognition module is further configured to obtain a voice recognition SOTA model, and fine tune the voice recognition SOTA model according to the voice text data specific to the electric power field, so as to obtain a fine-tuned voice recognition model.

In one possible implementation, the large language model module is further configured to: acquiring the API interfaces and comments of the remote patrol system to form an intention realization seed library; supplementing each API interface in the intention realization seed library with context about the remote patrol system and with placeholders to generate a placeholder intention expression; inputting the placeholder intention expression into a general large language model for expansion to form a placeholder intention expression data set; manually reviewing the placeholder intention expression data set and screening out erroneous placeholder intention expressions; replacing the placeholders in the placeholder intention expression data set with power equipment information, point information and sensing layer equipment information from the remote patrol system to obtain an intention expression data set; and training the large language model on the intention expression data set to obtain the remote patrol expert model.

In a possible implementation manner, the retrieval enhancement module is further configured to: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching the interface parameters in the document database and the vector database respectively, and performing intersection operation on the search result to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface needing to call the service into enhanced intention data.

In a possible implementation manner, the voice generation module is further configured to obtain an alarm text of the remote patrol system, and convert the alarm text into voice output.

Compared with the prior art, the application has the following beneficial effects: the application collects the voice data of staff, converts the voice data into text data through a voice recognition model, analyzes intention data through a remote patrol expert model, further invokes corresponding service of the remote patrol system through the intention data, converts a structured service result into a text with smooth semantics through the remote patrol expert model, and finally outputs the text through voice; the staff can control the remote patrol system to execute the task through voice, finally returns the task result in the voice form, simplifies the operation flow of the remote patrol system, and is convenient for timely learning the task result;

According to the application, the interface parameters in the intention data are enhanced: the document database and the vector database are searched together for the entity closest to each interface parameter, which ensures accurate control, avoids control errors caused by non-standard expressions from staff, improves fault tolerance, and lowers the operating demands on staff;

According to the application, an intention realization seed library is formed from API interfaces and comments; placeholder intention expressions are generated by supplementing context about the remote patrol system and placeholders; the training corpus is expanded by a general large language model; an intention expression data set is obtained by replacing the placeholders with entities; and the remote patrol expert model is trained on the intention expression data set, so that, given a piece of text data, it parses out the API interface and interface parameters of the service to be called, and the remote patrol system is thereby controlled to execute tasks.

Drawings

The accompanying drawings, which are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings:

FIG. 1 is a flow chart of a method for controlling a remote patrol system based on sound and language models;

FIG. 2 is a schematic flow chart of training a remote patrol expert model;

FIG. 3 is a flow diagram of interface parameter retrieval enhancement;

FIG. 4 is a schematic structural diagram of a remote patrol system control system based on sound and language models.

Detailed Description

For the purpose of making apparent the objects, technical solutions and advantages of the present application, the present application will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present application and the descriptions thereof are for illustrating the present application only and are not to be construed as limiting the present application.

Referring to FIG. 1, FIG. 1 is a flow chart of a remote patrol system control method based on sound and language models. The method includes: S1, acquiring voice data, and converting the voice data into text data according to a fine-tuned voice recognition model; S2, parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface of the service to be called and its interface parameters, the remote patrol expert model being obtained by training an SOTA large language model; S3, parsing the interface parameters in the intention data, and enhancing the interface parameters to obtain enhanced intention data; S4, generating a service call request according to the enhanced intention data, and calling the remote patrol system through the service call request to execute the service and obtain a service result; S5, inputting the service result into the remote patrol expert model to generate reply text data; S6, converting the reply text data into reply voice data and outputting the reply voice data.

Specifically, an existing remote patrol system is extended as follows. The voice data of a staff member is captured by an audio acquisition device and converted into text data by the voice recognition model. The remote patrol expert model parses intention data from the text, and the interface parameters in the intention data are enhanced through accurate retrieval. The remote patrol system is then called with the enhanced intention data to execute the corresponding service task, and the returned task result is converted by the remote patrol expert model into fluent text and finally output as voice.
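The six steps can be pictured as a simple chain of functions. The following Python sketch is illustrative only: the names `Intent` and `run_voice_command` are hypothetical, and each stage is passed in as a callable so that stand-in stubs can be used where a real system would use the models and services described in this application.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Intent:
    """Intention data: the API interface to call and its interface parameters."""
    api: str
    params: Dict[str, str]


def run_voice_command(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],    # S1: fine-tuned voice recognition
    parse_intent: Callable[[str], Intent],     # S2: remote patrol expert model
    enhance: Callable[[Intent], Intent],       # S3: retrieval enhancement
    call_service: Callable[[Intent], dict],    # S4: service call request
    compose_reply: Callable[[dict], str],      # S5: expert model writes the reply
    text_to_speech: Callable[[str], bytes],    # S6: voice generation
) -> bytes:
    """Run one voice command through steps S1 to S6 and return reply audio."""
    text = speech_to_text(audio)
    intent = parse_intent(text)
    enhanced = enhance(intent)
    result = call_service(enhanced)
    reply = compose_reply(result)
    return text_to_speech(reply)


# Stand-in stubs (assumptions, not real models) to show the data flow.
reply_audio = run_voice_command(
    b"raw-audio",
    speech_to_text=lambda a: "turn the camera to the main transformer",
    parse_intent=lambda t: Intent("cameraFoucsDevice", {"DEVICENAME": "main transformer"}),
    enhance=lambda i: Intent(i.api, {"DEVICENAME": "No. 1 main transformer"}),
    call_service=lambda i: {"ok": True, **i.params},
    compose_reply=lambda r: f"Camera turned to {r['DEVICENAME']}.",
    text_to_speech=lambda s: s.encode("utf-8"),
)
```

Passing the stages in as callables keeps the sketch independent of any particular speech, language or retrieval library.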

The improvement of this scheme is that the service interfaces of the remote patrol system can be called from the staff's voice data: the remote patrol system is controlled by voice to execute services, and the results are reported promptly by voice, which simplifies the staff's operation flow. In addition, retrieving the interface parameters makes calls accurate and lowers the demands on staff.

Step S1: voice data is acquired and converted into text data according to a fine-tuned voice recognition model. To improve the model's recognition of vocabulary specific to the electric power field, the voice recognition model is fine-tuned on pre-recorded voice and text data from that field. For example, Whisper, open-sourced by OpenAI, is used as the voice recognition model and is fine-tuned on pre-recorded power-field speech and the corresponding text, which improves its recognition accuracy on the specialized vocabulary of the power industry. Further, to reduce the computational cost of fine-tuning, LoRA or QLoRA may be used.
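To give a feel for why LoRA lowers the fine-tuning cost: LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors of shapes d_out x r and r x d_in instead, so the trainable parameter count per matrix drops from d_out * d_in to r * (d_out + d_in). The short Python calculation below uses illustrative dimensions chosen for the example; they are assumptions and are not taken from this application or from any specific Whisper configuration.

```python
def full_trainable_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters updated with LoRA: factors of shape (d_out, r) and (r, d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed for this example).
d_out, d_in, rank = 1024, 1024, 8
full = full_trainable_params(d_out, d_in)        # 1048576
lora = lora_trainable_params(d_out, d_in, rank)  # 16384
ratio = lora / full                              # 0.015625
```

QLoRA applies the same low-rank update on top of a quantized frozen base model, which further reduces memory use.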

Step S2: intention data is parsed from the text data according to the remote patrol expert model; the intention data comprises the API interface of the service to be called and its interface parameters. The remote patrol expert model serves to recognize the intention in the text data produced by the voice recognition model and to simulate human natural-language dialogue. An open-source large language model, for example the ChatGLM-6B model, can be selected and trained for intention recognition to obtain the remote patrol expert model. Traditionally, training a large language model for intention recognition requires a large amount of manually written corpus data. To reduce labor cost, in this scheme the training corpus is written by a general large language model, and the remote patrol expert model is obtained by intention recognition training on that corpus.

Referring to fig. 2, fig. 2 is a schematic flow chart of training a remote patrol expert model. The remote patrol expert model is obtained by the following steps:

S21, acquiring the API interfaces and comments of the remote patrol system to form an intention realization seed library. Specifically, the source code of the API interfaces in the existing remote patrol system that need voice interaction is collected, comments that are missing or unclear are manually supplemented and adjusted, and the implementation bodies of the API interfaces are removed from the source code. Each API interface together with its comment constitutes an API interface pair, and multiple API interface pairs constitute the intention realization seed library.

S22, supplementing each API interface in the intention realization seed library with context about the remote patrol system and with placeholders to generate a placeholder intention expression; S23, inputting the placeholder intention expression into a general large language model for expansion to form a placeholder intention expression data set. Specifically, a placeholder intention expression is a piece of text indicating that the API interface should be called to perform an operation on the entities marked by placeholders. The placeholder intention expression is input into a general large language model (LLM), which performs an intention expression generation task and yields a placeholder intention expression data set consisting of multiple placeholder intention expressions. The general large language model may be a commercial or open-source large language model, such as ChatGPT, ChatGLM, discourse, Llama 2, etc.
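As a rough sketch of step S22, the prompt for one seed-library entry could be assembled from a fixed template plus the annotated interface signature. The class `ApiSeed`, the function `build_prompt` and the template wording below are hypothetical and only approximate the kind of prompt this application describes.

```python
from dataclasses import dataclass


@dataclass
class ApiSeed:
    """One seed-library entry: an API interface with its comment, body removed."""
    name: str
    annotated_signature: str


# Hypothetical template: system context, the interface, and the output format.
PROMPT_TEMPLATE = (
    "The substation remote patrol system, developed with Spring Boot, "
    "has the following interface (implementation hidden):\n"
    "###\n{signature}\n###\n"
    "List sentences whose intent can be fulfilled by calling this interface. "
    "Use parameter names as entity placeholders, e.g. {{{example_param}}}.\n"
    "Return format of each sentence:\n"
    "Intent: <sentence> Action: <interface name> Params: <parameter list>"
)


def build_prompt(seed: ApiSeed, example_param: str) -> str:
    """Fill the template for one API interface pair."""
    return PROMPT_TEMPLATE.format(
        signature=seed.annotated_signature, example_param=example_param
    )


seed = ApiSeed(
    name="cameraFoucsDevice",
    annotated_signature='@PostMapping("/cameraFoucsDevice") ...',
)
prompt = build_prompt(seed, "DEVICENAME")
```

The signature is passed in as a value rather than embedded in the template, so braces inside the interface source do not interfere with the template's own formatting fields.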

For example, the intention expression generation task is performed with the ChatGPT 3.5-Turbo model. One possible input and output are listed below: the input is a placeholder intention expression, and the output is a set of expanded placeholder intention expressions that go into the placeholder intention expression data set. For example:

Input to the general large language model (LLM):

"The substation remote patrol system, developed with Spring Boot, has the following interface (specific implementation hidden):

###

@ApiOperation(value = "Rotate the camera associated with the device to the corresponding preset point, which generally faces a certain component of the device")
@PostMapping("/cameraFoucsDevice")
public WrappedResult<Boolean> cameraFoucsDevice(
    @ApiParam(value = "name of the substation equipment", name = "DEVICENAME", required = true) @RequestParam("DEVICENAME") String DEVICENAME,
    @ApiParam(value = "component name of the device", name = "partName", required = false) @RequestParam("partName") String partName
){}

###

You are a professional Chinese linguist. Help me enumerate the sentences you know whose intent can be fulfilled by calling the corresponding interface.

The sentences need to meet the following requirements:

1. Following the definition of the function, work the function's parameters into the sentence, and strictly use the parameter name as the entity placeholder in the sentences you generate, e.g. {DEVICENAME}.

2. If a parameter is optional, the sentence may omit it; in that case the sentence must not contain any other entity that refers to that parameter, and the sentence must remain grammatically correct.

3. Sentences should be phrased from the perspective of a human addressing the system and should match how people actually express themselves, e.g.: I need to see the monitoring video of the transformer.

4. Cover as many different phrasings of the same intent as possible, cover all parameters of the interface, and do not repeat sentences.

Return format of each sentence:

Intent: sentence Action: name of the called interface Params: parameter list".

Accordingly, the general large language model (LLM) returns:

"Intent: I want to turn the camera corresponding to the {DEVICENAME} device to the preset point Action: cameraFoucsDevice Params: DEVICENAME

Intent: Can you help me aim the camera at the {DEVICENAME} device? Action: cameraFoucsDevice Params: DEVICENAME

Intent: I want to focus the camera on the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME

Intent: Aim the camera at the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME

Intent: Aim the camera at the component of the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME, partName

Intent: Can you help me turn the camera to aim at the {partName} component of the {DEVICENAME} equipment? Action: cameraFoucsDevice Params: DEVICENAME, partName

Intent: Can you adjust the camera to the {partName} part? It belongs to the {DEVICENAME} equipment. Action: cameraFoucsDevice Params: DEVICENAME, partName

Intent: I want to adjust the focus of the camera and aim at the {partName} component of the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME, partName".
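Output in this "Intent / Action / Params" format has to be split into structured records before it can enter the data set. The sketch below shows one way to do that with a regular expression; the names `PlaceholderExpression` and `parse_llm_output` are hypothetical, and the pattern assumes each record sits on one line with the three labels followed by colons.

```python
import re
from typing import List, NamedTuple


class PlaceholderExpression(NamedTuple):
    sentence: str
    action: str
    params: List[str]


# One record per line: "Intent: <sentence> Action: <name> Params: <a, b>"
LINE_RE = re.compile(
    r"^Intent:\s*(?P<sentence>.+?)\s*Action:\s*(?P<action>\w+)\s*Params:\s*(?P<params>.*)$"
)


def parse_llm_output(raw: str) -> List[PlaceholderExpression]:
    """Parse well-formed lines into records; silently skip anything else."""
    records = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        params = [p.strip() for p in match.group("params").split(",") if p.strip()]
        records.append(
            PlaceholderExpression(match.group("sentence"), match.group("action"), params)
        )
    return records


sample = (
    "Intent: I want to focus the camera on the {DEVICENAME} device "
    "Action: cameraFoucsDevice Params: DEVICENAME\n"
    "\n"
    "Intent: Can you help me turn the camera to the {partName} component of the "
    "{DEVICENAME} equipment? Action: cameraFoucsDevice Params: DEVICENAME, partName\n"
    "this line is not a record\n"
)
records = parse_llm_output(sample)
```

Lines that do not match are dropped rather than repaired, which leaves any judgment about malformed output to the manual review in step S24.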

S24, manually reviewing the placeholder intention expression data set and screening out erroneous placeholder intention expressions. Specifically, the placeholder intention expression data set generated by the general large language model may contain inappropriate expressions that staff would not actually use, so erroneous placeholder intention expressions need to be screened out manually.
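The application describes this screening as manual. As an optional aid that is not part of the described method, a simple automatic pre-check could flag structurally inconsistent records before a person reviews them, for instance a sentence whose placeholders do not match its declared parameter list. The function `placeholder_errors` and the `schema` mapping below are hypothetical.

```python
import re
from typing import Dict, List

PLACEHOLDER_RE = re.compile(r"\{\s*(\w+)\s*\}")


def placeholder_errors(
    sentence: str, params: List[str], schema: Dict[str, bool]
) -> List[str]:
    """Return structural problems; schema maps parameter name -> required flag."""
    used = set(PLACEHOLDER_RE.findall(sentence))
    errors = []
    for name in sorted(used - set(schema)):
        errors.append(f"unknown placeholder: {name}")
    for name, required in schema.items():
        if required and name not in used:
            errors.append(f"missing required placeholder: {name}")
    if used != set(params):
        errors.append("Params list does not match placeholders in sentence")
    return errors


schema = {"DEVICENAME": True, "partName": False}
ok = placeholder_errors(
    "Aim the camera at the {DEVICENAME} device", ["DEVICENAME"], schema
)
bad = placeholder_errors(
    "Aim the camera at the {partName} component", ["DEVICENAME", "partName"], schema
)
```

Such a check only catches structural mismatches; whether a sentence sounds like something staff would actually say still requires human judgment.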

S25, replacing the placeholders in the placeholder intention expression data set with power equipment information, point information and sensing layer equipment information from the remote patrol system to obtain the intention expression data set. For example, substituting the device "Main Transformer" and its component "conservator" for the placeholders in the placeholder intention expression "Enlarge the {partName} portion of the {DEVICENAME}, I need to see clearly" generates "Enlarge the conservator portion of the Main Transformer, I need to see clearly", which is added to the intention expression data set.
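The substitution in step S25 can be sketched as a template fill over entity records exported from the remote patrol system. In the sketch below, `fill_placeholders` and the row layout are hypothetical; each row is assumed to hold a valid device and component combination, and rows lacking a field the template needs are skipped. The second row is an invented example entity.

```python
import re
from typing import Dict, List

PLACEHOLDER_RE = re.compile(r"\{\s*(\w+)\s*\}")


def fill_placeholders(template: str, entity_rows: List[Dict[str, str]]) -> List[str]:
    """Produce one concrete sentence per entity row that covers every placeholder."""
    names = PLACEHOLDER_RE.findall(template)
    filled = []
    for row in entity_rows:
        if not all(name in row for name in names):
            continue  # this row cannot fill the template
        filled.append(PLACEHOLDER_RE.sub(lambda m: row[m.group(1)], template))
    return filled


template = "Enlarge the {partName} portion of the {DEVICENAME}, I need to see clearly"
rows = [
    {"DEVICENAME": "Main Transformer", "partName": "conservator"},
    {"DEVICENAME": "Circuit Breaker"},  # hypothetical row with no component
]
sentences = fill_placeholders(template, rows)
```

Filling from device and component pairs, rather than from independent lists of names, avoids generating sentences about components that a device does not have.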

Specifically, the API schema of the API interfaces from step S21 is obtained from the Swagger documentation, and a Python script merges the placeholder intention expression data set generated in step S23 with the API schema to obtain an intention expression data set that can be used for intention training and verification of a large language model (for example, the ChatGLM3-6B model). One representation of the intention expression data set is as follows:

{
  "tools": [
    "cameraFoucsDevice: rotate the camera associated with the device to the corresponding preset point, which generally faces a certain component of the device\nParameters: {DEVICENAME: required, string, name of the substation equipment; partName: optional, string, component name of the device}\nOutput: boolean [True, False]\n"
  ],
  "conversations": [
    {
      "role": "user",
      "content": "I want to turn the camera corresponding to the main transformer device to the preset point"
    },
    {
      "role": "assistant",
      "content": "I need to use cameraFoucsDevice to turn the camera to the main transformer area"
    },
    {
      "role": "tool",
      "name": "cameraFoucsDevice",
      "parameters": {
        "DEVICENAME": "Main Transformer"
      },
      "observation": "True"
    },
    {
      "role": "assistant",
      "content": "The operation returned True; the camera was successfully turned to the main transformer"
    }
  ]
}
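A record of this shape can be produced by a small helper that combines a tool description with one filled-in conversation. The sketch below is a minimal illustration; `build_record` and its argument names are hypothetical, and the field layout simply mirrors the example record rather than any officially documented training format.

```python
import json
from typing import Dict


def build_record(
    tool_description: str,
    user_text: str,
    assistant_plan: str,
    tool_name: str,
    parameters: Dict[str, str],
    observation: str,
    final_reply: str,
) -> dict:
    """Assemble one training record: tool list plus a four-turn conversation."""
    return {
        "tools": [tool_description],
        "conversations": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_plan},
            {
                "role": "tool",
                "name": tool_name,
                "parameters": parameters,
                "observation": observation,
            },
            {"role": "assistant", "content": final_reply},
        ],
    }


record = build_record(
    tool_description="cameraFoucsDevice: rotate the camera to the preset point",
    user_text="I want to turn the camera to the main transformer",
    assistant_plan="I need to use cameraFoucsDevice",
    tool_name="cameraFoucsDevice",
    parameters={"DEVICENAME": "Main Transformer"},
    observation="True",
    final_reply="The camera was successfully turned to the main transformer",
)
# ensure_ascii=False keeps Chinese entity names readable in the output file.
line = json.dumps(record, ensure_ascii=False)
```

Writing one serialized record per line gives a JSON Lines file, a common container for fine-tuning data.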

S26, training the large language model on the intention expression data set to obtain the remote patrol expert model. Specifically, the remote patrol expert model identifies, from the text data, the remote patrol system API interface and interface parameters that should be called, and passes them on as the source code of a Python function call, which is used to generate the service call request. For example:

Input to the remote patrol expert model: "I want to turn the camera corresponding to the main transformer device to the preset point".

Output of the remote patrol expert model: "dispath_request("cameraFoucsDevice", "main transformer")".
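Because the model's output is Python source text, one cautious way to consume it is to parse the call with the standard `ast` module and extract the interface name and arguments, rather than evaluating the string directly. This is a design choice made for the sketch, not something stated in the application; `parse_dispatch_call` is a hypothetical helper, and it assumes the output is a single `dispath_request(...)` call with string literal arguments.

```python
import ast
from typing import List, Tuple


def parse_dispatch_call(source: str) -> Tuple[str, List[str]]:
    """Extract (api_name, arguments) from a dispath_request(...) call string."""
    tree = ast.parse(source.strip(), mode="eval")
    call = tree.body
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Name)
        and call.func.id == "dispath_request"
    ):
        raise ValueError("expected a dispath_request(...) call")
    if call.keywords:
        raise ValueError("keyword arguments are not supported in this sketch")
    # literal_eval accepts AST nodes and rejects anything that is not a literal.
    args = [ast.literal_eval(arg) for arg in call.args]
    if not args or not all(isinstance(arg, str) for arg in args):
        raise ValueError("expected string literal arguments")
    return args[0], args[1:]


api, params = parse_dispatch_call('dispath_request("cameraFoucsDevice", "main transformer")')

try:
    parse_dispatch_call('__import__("os").system("echo unsafe")')
    rejected = False
except ValueError:
    rejected = True
```

Parsing instead of executing means an unexpected or malformed model output raises an error instead of running arbitrary code.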

Step S3: the interface parameters in the intention data are parsed and enhanced to obtain enhanced intention data. The purpose of enhancing the parsed interface parameters is to make sure the parameters of the API call are accurate and to improve the robustness of the call. For example, suppose a staff member's data permission covers one substation. After parsing by the remote patrol expert model, the interface parameter in the intention data may read "No. 1 main transformer" or just "main transformer"; since the substation has only one transformer, named "No. 1 main transformer", the enhancement returns "No. 1 main transformer".

Wherein, the enhancement intention data is obtained by the following way: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching the interface parameters in the document database and the vector database respectively, and performing intersection operation on the search result to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface needing to call the service into enhanced intention data.

Specifically, the relational data and entity data in the remote patrol system (patrol points, power equipment, personnel information and the like) are synchronized to a document database, and during synchronization the entity data is also vectorized by a word embedding model and stored in a vector database. Milvus can be used as the vector database, Elasticsearch as the document database, and bge-large-zh as the word embedding model. For a given interface parameter, two searches are run. First, a keyword search is performed in the document database (Elasticsearch): a fuzzy query through the _search interface provided by Elasticsearch returns a list of entities textually similar to the interface parameter. Second, the interface parameter is vectorized by the embedding model and used to query the vector database (Milvus) for entities semantically similar to it; specifically, the nearest entities are retrieved using Euclidean distance as the metric. The two entity lists then go through the intersection operation. To handle Chinese homophones, the two lists are merged and the Chinese text in them is converted to pinyin (the pypinyin library can be used for this); the input interface parameter is likewise converted to pinyin, its edit-distance similarity to each candidate is computed, and the candidate with the highest similarity is output as the final result. FIG. 3 is a schematic flow chart of this interface parameter retrieval enhancement.
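The final ranking stage (merge the two candidate lists, convert to pinyin, pick the candidate with the smallest edit distance) can be sketched in plain Python. The two hit lists are taken as inputs here, standing in for the Elasticsearch and Milvus queries. The function names are hypothetical, and `toy_to_pinyin` with its tiny character table is a stand-in assumption for a real converter such as pypinyin's `lazy_pinyin`.

```python
from typing import Callable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def enhance_parameter(
    param: str,
    keyword_hits: List[str],
    vector_hits: List[str],
    to_pinyin: Callable[[str], str],
) -> str:
    """Merge both hit lists and return the candidate closest to param in pinyin."""
    candidates = list(dict.fromkeys([*keyword_hits, *vector_hits]))  # dedupe, keep order
    if not candidates:
        return param
    query = to_pinyin(param)
    return max(candidates, key=lambda c: similarity(query, to_pinyin(c)))


# Toy converter: a hand-written table covering only the characters used below.
TOY_PINYIN = {"主": "zhu", "住": "zhu", "变": "bian", "号": "hao"}


def toy_to_pinyin(text: str) -> str:
    return "".join(TOY_PINYIN.get(ch, ch) for ch in text)


# The recognized text contains a homophone error (住 instead of 主).
best = enhance_parameter("1号住变", ["2号主变"], ["1号主变", "2号主变"], toy_to_pinyin)
```

Comparing in pinyin makes the homophone error invisible to the distance measure, so the misrecognized parameter maps back to the correct entity name.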

Step S4: a service call request is generated from the enhanced intention data, and the remote patrol system is called through the service call request to execute the service and obtain a service result. Specifically, the enhanced intention data, in the form of the Python function source code passed on by the remote patrol expert model, is executed; this initiates an HTTP request to the API interface of the remote patrol system, which calls the service corresponding to the API interface with the interface parameters and returns the service result.
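The HTTP request of step S4 could be built with Python's standard library as sketched below. The base URL, the function name `build_service_request` and the choice of a form-encoded POST body are assumptions made for illustration; the sketch only constructs the request object and does not send it.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical address of the remote patrol system's API.
BASE_URL = "http://patrol.example.invalid/api"


def build_service_request(api: str, params: Dict[str, str]) -> Request:
    """Build a form-encoded POST request for one API interface call."""
    body = urlencode(params).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{api}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_service_request(
    "cameraFoucsDevice", {"DEVICENAME": "No. 1 main transformer"}
)
# Sending would be done with urllib.request.urlopen(request) against a real host.
```

Keeping request construction separate from sending makes it easy to log or inspect the call before it reaches the remote patrol system.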

Step S5: the service result is input into the remote patrol expert model to generate reply text data. Specifically, the remote patrol expert model turns the returned service result into reply text expressed in natural language.

Step S6: the reply text data is converted into reply voice data and output. Specifically, the reply text data is passed to a speech generation model (PaddleSpeech, OpenTTS, eSpeak, etc.), converted into reply voice, and output through the audio playback device of the remote patrol system.

In one possible implementation, step S6 further includes: acquiring an alarm text of the remote patrol system, converting the alarm text into voice, and outputting the voice through the audio playback device of the remote patrol system.

It can be understood that, with the remote patrol system control method based on a sound and language model provided by this scheme, on the one hand, the API interface services of the remote patrol system are called through voice dialogue (for example, dispatching a camera, an unmanned aerial vehicle or a robot to carry out a patrol task) to obtain a service result, which simplifies the operation flow; on the other hand, patrol alarms, patrol results and the like are converted into voice broadcasts during and after the patrol, so that on-site staff are notified in time.

Referring to fig. 4, fig. 4 is a schematic structural diagram of a remote patrol system control system based on a sound and language model. The system is used to implement the remote patrol system control method based on a sound and language model described above, and includes: a voice recognition module, used for acquiring voice data and converting the voice data into text data according to the fine-tuned voice recognition model; a large language model module, used for parsing intention data from the text data according to the remote patrol expert model, the intention data comprising the API interface and interface parameters of the service to be called, and the remote patrol expert model being obtained by training an SOTA large language model; a retrieval enhancement module, used for analyzing the interface parameters in the intention data and enhancing the interface parameters to obtain enhanced intention data; an intention calling module, used for generating a service call request according to the enhanced intention data and calling the remote patrol system through the service call request to execute a service and obtain a service result; the large language model module, further used for inputting the service result into the remote patrol expert model to generate reply text data; and a voice generation module, used for converting the reply text data into reply voice data and outputting the reply voice data.
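The module structure above can be pictured as a chain of six stages. The following sketch wires them together as plain callables; the class name, field names and the `Intent` dictionary shape are illustrative assumptions, and each stage stands in for a model or service (speech recognition, expert model, retrieval enhancement, API call, speech generation) that is not implemented here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Intent = Dict[str, Any]  # assumed shape: {"api": ..., "params": {...}}


@dataclass
class PatrolVoicePipeline:
    recognize: Callable[[bytes], str]      # voice recognition module
    parse_intent: Callable[[str], Intent]  # large language model module (parsing)
    enhance: Callable[[Intent], Intent]    # retrieval enhancement module
    call_service: Callable[[Intent], Any]  # intention calling module
    compose_reply: Callable[[Any], str]    # large language model module (reply)
    synthesize: Callable[[str], bytes]     # voice generation module

    def handle(self, audio: bytes) -> bytes:
        """Run one voice command through all six stages in order."""
        text = self.recognize(audio)
        intent = self.parse_intent(text)
        enhanced = self.enhance(intent)
        result = self.call_service(enhanced)
        reply = self.compose_reply(result)
        return self.synthesize(reply)
```

Keeping each stage behind a callable makes it possible to swap a stub for a real model during testing, which is the only design point this sketch is meant to show.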

Compared with the prior art: firstly, the application collects the voice data of staff, converts the voice data into text data through a voice recognition model, parses intention data through the remote patrol expert model, calls the corresponding service of the remote patrol system through the intention data, converts the structured service result into semantically fluent text through the remote patrol expert model, and finally outputs the text as voice. Staff can thus control the remote patrol system to execute tasks by voice and receive the task result in voice form, which simplifies the operation flow of the remote patrol system and makes it easy to learn the task result in time. Secondly, the application enhances the interface parameters in the intention data by combining the document database and the vector database to find the expression closest to each interface parameter, which ensures accurate control, avoids control errors caused by non-standard expressions used by staff, improves fault tolerance, and lowers the operating requirements placed on staff. Thirdly, the application builds an intention realization seed library from the API interfaces and their comments, generates placeholder intention expressions by supplementing each with context and placeholders for remote patrol system information, expands the training corpus through a general large language model, obtains an intention expression data set by replacing the placeholders with entities, and trains the remote patrol expert model on the intention expression data set; given a piece of text data, the trained model parses out the API interface and interface parameters of the service to be called, and thereby controls the remote patrol system to execute tasks.
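The placeholder replacement step mentioned in the third point can be illustrated with a short sketch. The `{name}` placeholder syntax, the function name and the example entities are assumptions made for illustration; the source does not specify how placeholders are written.

```python
from itertools import product
from string import Formatter
from typing import Dict, List


def expand_template(template: str, entities: Dict[str, List[str]]) -> List[str]:
    """Replace every {placeholder} with each combination of the given entity values."""
    fields = [name for _, name, _, _ in Formatter().parse(template) if name]
    unique = list(dict.fromkeys(fields))  # keep first-seen order, drop repeats
    combos = product(*(entities[name] for name in unique))
    return [template.format(**dict(zip(unique, combo))) for combo in combos]
```

For example, a placeholder expression such as "Check {device} at {point}" with two device names and one point name yields two concrete intention expressions, one per device.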

The foregoing description of the embodiments illustrates the general principles of the invention and is not intended to limit the scope of the invention or to restrict the invention to the particular embodiments described; any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the invention are intended to be included within the scope of protection of the invention.

Claims

1. A method for controlling a remote patrol system based on a sound and language model, comprising:

acquiring voice data, and converting the voice data into text data according to a fine-tuned voice recognition model;

parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface and interface parameters of the service to be called, and the remote patrol expert model being obtained by training an SOTA large language model;

analyzing the interface parameters in the intention data, and enhancing the interface parameters to obtain enhanced intention data;

generating a service call request according to the enhanced intention data, and calling a remote patrol system to execute a service through the service call request to obtain a service result;

inputting the service result into a remote patrol expert model to generate reply text data;

converting the reply text data into reply voice data to be output;

wherein the remote patrol expert model is obtained by the following steps: acquiring the API interfaces and comments of the remote patrol system to form an intention realization seed library; supplementing, for each API interface in the intention realization seed library, the context and placeholders of the remote patrol system information to generate a placeholder intention expression; inputting the placeholder intention expressions into a general large language model for expansion to form a placeholder intention expression data set; manually inspecting the placeholder intention expression data set and screening out wrong placeholder intention expressions; replacing the placeholders in the placeholder intention expression data set with power equipment information, point location information and perception layer equipment information in the remote patrol system to obtain an intention expression data set; and training the SOTA large language model based on the intention expression data set to obtain the remote patrol expert model;

the enhanced intention data is obtained by the following steps: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface of the service to be called into the enhanced intention data;

wherein searching for the interface parameters in the document database and the vector database respectively and performing an intersection operation on the search results to obtain enhanced interface parameters comprises: on the one hand, performing a keyword retrieval of the interface parameters in the document database to obtain a list of entities similar to the interface parameters; on the other hand, after the interface parameters are vectorized through an embedding model, querying the vector database for a list of entities semantically similar to the interface parameters; and, to handle Chinese homophones, merging the two lists, converting the Chinese text in the merged list into pinyin, converting the input interface parameters into pinyin, computing an edit-distance similarity between the pinyin of the input interface parameters and the pinyin in the list, and obtaining from the list the search result with the greatest similarity as the final output.

2. The method for controlling a remote patrol system based on a sound and language model according to claim 1, wherein the fine-tuned voice recognition model is obtained by: acquiring a voice recognition SOTA model, and fine-tuning the voice recognition SOTA model according to voice and text data specific to the electric power field to obtain the fine-tuned voice recognition model.

3. The method for controlling a remote patrol system based on a sound and language model according to claim 1, further comprising: acquiring an alarm text of the remote patrol system, and converting the alarm text into voice output.

4. A remote patrol system control system based on a sound and language model, for performing the method for controlling a remote patrol system based on a sound and language model according to any one of claims 1-3, comprising:

a voice recognition module, used for acquiring voice data and converting the voice data into text data according to the fine-tuned voice recognition model;

a large language model module, used for parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface and interface parameters of the service to be called, and the remote patrol expert model being obtained by training an SOTA large language model;

a retrieval enhancement module, used for analyzing the interface parameters in the intention data and enhancing the interface parameters to obtain enhanced intention data;

an intention calling module, used for generating a service call request according to the enhanced intention data, and calling the remote patrol system through the service call request to execute a service and obtain a service result;

the large language model module being further used for inputting the service result into the remote patrol expert model to generate reply text data;

a voice generation module, used for converting the reply text data into reply voice data and outputting the reply voice data;

wherein the large language model module is further used for: acquiring the API interfaces and comments of the remote patrol system to form an intention realization seed library; supplementing, for each API interface in the intention realization seed library, the context and placeholders of the remote patrol system information to generate a placeholder intention expression; inputting the placeholder intention expressions into a general large language model for expansion to form a placeholder intention expression data set; manually inspecting the placeholder intention expression data set and screening out wrong placeholder intention expressions; replacing the placeholders in the placeholder intention expression data set with power equipment information, point location information and perception layer equipment information in the remote patrol system to obtain an intention expression data set; and training a large language model based on the intention expression data set to obtain the remote patrol expert model;

the retrieval enhancement module is further used for: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface of the service to be called into the enhanced intention data.

5. The remote patrol system control system based on a sound and language model according to claim 4, wherein the voice recognition module is further used for acquiring a voice recognition SOTA model and fine-tuning the voice recognition SOTA model according to voice and text data specific to the electric power field to obtain the fine-tuned voice recognition model.

6. The remote patrol system control system based on a sound and language model according to claim 4, wherein the voice generation module is further used for acquiring an alarm text of the remote patrol system and converting the alarm text into voice output.