
CN117975968B - Remote patrol system control method and system based on sound and language model - Google Patents

Info

Publication number
CN117975968B
CN117975968B CN202410389170.7A CN202410389170A CN117975968B CN 117975968 B CN117975968 B CN 117975968B CN 202410389170 A CN202410389170 A CN 202410389170A CN 117975968 B CN117975968 B CN 117975968B
Authority
CN
China
Prior art keywords
data
intention
remote
voice
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410389170.7A
Other languages
Chinese (zh)
Other versions
CN117975968A (en
Inventor
景志斌
陈果累
何佳
叶俊
李孟福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Kingscheme Information Technology Co ltd
Original Assignee
Sichuan Kingscheme Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Kingscheme Information Technology Co ltd
Priority to CN202410389170.7A
Publication of CN117975968A
Application granted
Publication of CN117975968B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuitbreaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network
    • H02J13/00002 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuitbreaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network characterised by monitoring
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Power Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a remote patrol system control method and system based on sound and language models, relating to the field of intelligent patrol of transformer substations. It addresses the problems that existing remote patrol systems are complicated to operate, error-prone in queries, and slow to report results. The method comprises: acquiring voice data and converting it into text data with a fine-tuned voice recognition model; parsing intention data from the text data with a remote patrol expert model, the intention data comprising the API interface and interface parameters of the service to be called; parsing the interface parameters in the intention data and enhancing them to obtain enhanced intention data; generating a service call request from the enhanced intention data and calling the remote patrol system through that request to execute the service and obtain a service result; inputting the service result into the remote patrol expert model to generate reply text data; and converting the reply text data into reply voice data for output. The remote patrol system can thereby be controlled by voice.

Description

Remote patrol system control method and system based on sound and language model
Technical Field
The invention relates to the field of intelligent patrol of transformer substations, in particular to a remote patrol system control method and system based on sound and language models.
Background
The substation remote patrol system is a system planned by the national power grid. It uses robots, unmanned aerial vehicles, voiceprint equipment, cameras and the like as perception layer equipment, and an algorithm host intelligently analyzes the collected data to generate patrol results. The system executes patrol tasks on substation equipment mainly when a task is started manually, when a periodic patrol task is due, or when an alarm signal from the main and auxiliary systems triggers one. Patrol tasks are organized around patrol points. A patrol point is a check point in the business sense, and one piece of equipment generally has several patrol points. Each patrol point, the smallest unit of business monitoring, is typically covered by one or more preset positions of cameras, drones, or robots, which are used to check whether the component at that point is defective. The system includes a retrieval module based on point attributes; after a patrol point has been found, operations such as viewing equipment monitoring, starting a patrol task, and configuring patrol equipment can be carried out.
However, patrol points have many business attributes, such as substation area, substation bay, equipment name, component name, phase, point code, patrol type, importance level, recognition algorithm, and perception layer equipment. Manual retrieval therefore has low query accuracy and is cumbersome, so the state of the equipment cannot be checked immediately when an emergency occurs, particularly in substations where the main and auxiliary systems have not been built or at patrol points those systems do not cover. In addition, because a patrol involves many patrol points and takes a long time, on-site staff may not see the patrol result as soon as it is generated.
Disclosure of Invention
The application aims to provide a remote patrol system control method and system based on sound and language models that solve the problems of existing remote patrol systems: complicated operation, error-prone queries, and untimely notification of results. The substation remote patrol system is extended so that it is controlled through sound and large language models. Staff communicate with the remote patrol system by voice input, guide it to complete patrol-related work quickly, and receive voice feedback on patrol task results. This simplifies the operation flow of the remote patrol system, lowers the difficulty of operation, and lets staff learn task results in time.
The application first provides a remote patrol system control method based on sound and language models, comprising the following steps: acquiring voice data, and converting the voice data into text data according to a fine-tuned voice recognition model; parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface and interface parameters of the service to be called, wherein the remote patrol expert model is obtained by training a SOTA large language model; parsing the interface parameters in the intention data, and enhancing the interface parameters to obtain enhanced intention data; generating a service call request according to the enhanced intention data, and calling the remote patrol system to execute a service through the service call request to obtain a service result; inputting the service result into the remote patrol expert model to generate reply text data; and converting the reply text data into reply voice data for output.
By adopting the technical scheme, the service interface of the remote patrol system can be called by acquiring the voice data of the staff, the remote patrol system is controlled to execute the service, the result is displayed in a voice mode, the remote patrol system can be controlled to execute the service through the voice, the service result is returned in a voice mode, and the operation flow of the staff is simplified. In addition, accurate calling is achieved through intention enhancement of interface parameters, and requirements on workers are reduced.
In one possible implementation, the fine-tuned speech recognition model is obtained by: and acquiring a voice recognition SOTA model, and performing fine adjustment on the voice recognition SOTA model according to the special voice text data in the electric power field to obtain a fine-adjusted voice recognition model.
In one possible implementation, the remote patrol expert model is obtained by: acquiring the API interfaces and comments of the remote patrol system to form an intent implementation seed library; supplementing each API interface in the intent implementation seed library with the context of the remote patrol system information and with placeholders to generate a placeholder intent expression; inputting the placeholder intent expression into a general large language model for expansion to form a placeholder intent expression dataset; manually reviewing the placeholder intent expression dataset and screening out wrong placeholder intent expressions; replacing the placeholders in the placeholder intent expression dataset with power equipment information, point information, and perception layer equipment information from the remote patrol system to obtain an intent expression dataset; and training the large language model on the intent expression dataset to obtain the remote patrol expert model.
In one possible implementation, the enhanced intent data is obtained by: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching the interface parameters in the document database and the vector database respectively, and performing intersection operation on the search result to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface needing to call the service into enhanced intention data.
In one possible embodiment, the method further comprises: and acquiring an alarm text of the remote patrol system, and converting the alarm text into voice output.
The application also provides a remote patrol system control system based on sound and language models, comprising: a voice recognition module for acquiring voice data and converting the voice data into text data according to a fine-tuned voice recognition model; a large language model module for parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface and interface parameters of the service to be called, wherein the remote patrol expert model is obtained by training a SOTA large language model; a retrieval enhancement module for parsing the interface parameters in the intention data and enhancing the interface parameters to obtain enhanced intention data; an intention calling module for generating a service call request according to the enhanced intention data and calling the remote patrol system to execute a service through the service call request to obtain a service result; the large language model module being further used for inputting the service result into the remote patrol expert model to generate reply text data; and a voice generation module for converting the reply text data into reply voice data and outputting the reply voice data.
In a possible implementation manner, the voice recognition module is further configured to obtain a voice recognition SOTA model, and fine tune the voice recognition SOTA model according to the voice text data specific to the electric power field, so as to obtain a fine-tuned voice recognition model.
In one possible implementation, the large language model module is further configured to: acquire the API interfaces and comments of the remote patrol system to form an intent implementation seed library; supplement each API interface in the intent implementation seed library with the context of the remote patrol system information and with placeholders to generate a placeholder intent expression; input the placeholder intent expression into a general large language model for expansion to form a placeholder intent expression dataset; have the placeholder intent expression dataset manually reviewed and wrong placeholder intent expressions screened out; replace the placeholders in the placeholder intent expression dataset with power equipment information, point information, and perception layer equipment information from the remote patrol system to obtain an intent expression dataset; and train the large language model on the intent expression dataset to obtain the remote patrol expert model.
In a possible implementation manner, the retrieval enhancement module is further configured to: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching the interface parameters in the document database and the vector database respectively, and performing intersection operation on the search result to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface needing to call the service into enhanced intention data.
In a possible implementation manner, the voice generation module is further configured to obtain an alarm text of the remote patrol system, and convert the alarm text into voice output.
Compared with the prior art, the application has the following beneficial effects. The application collects the voice data of staff, converts it into text data with a voice recognition model, parses intention data with the remote patrol expert model, calls the corresponding service of the remote patrol system according to the intention data, converts the structured service result into fluent text with the remote patrol expert model, and finally outputs that text as voice. Staff can control the remote patrol system to execute tasks by voice and receive the task result as voice, which simplifies the operation flow of the remote patrol system and makes it easy to learn task results in time.
According to the application, the interface parameters in the intention data are enhanced: the document database and the vector database are searched together for the expression closest to the interface parameter. This ensures accurate control, avoids control errors caused by non-standard wording from staff, improves fault tolerance, and lowers the operating requirements on staff.
According to the application, an intent implementation seed library is formed from API interfaces and their comments; placeholder intent expressions are generated by adding the context of the remote patrol system information and placeholders; the training corpus is expanded by a general large language model; an intent expression dataset is obtained by replacing the placeholders with entities; and the remote patrol expert model is trained on the intent expression dataset. Given a piece of text data, the model parses out the API interface and interface parameters of the service to be called, so that the remote patrol system can be controlled to execute tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings:
FIG. 1 is a flow chart of a method for controlling a remote patrol system based on sound and language models;
FIG. 2 is a schematic flow chart of training a remote patrol expert model;
FIG. 3 is a flow diagram of interface parameter retrieval enhancement;
Fig. 4 is a schematic structural diagram of a remote patrol system control system based on sound and language models.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present application, the present application will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present application and the descriptions thereof are for illustrating the present application only and are not to be construed as limiting the present application.
Referring to FIG. 1, FIG. 1 is a flow chart of a remote patrol system control method based on sound and language models. The method includes: S1, acquiring voice data, and converting the voice data into text data according to a fine-tuned voice recognition model; S2, parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface and interface parameters of the service to be called, wherein the remote patrol expert model is obtained by training a SOTA large language model; S3, parsing the interface parameters in the intention data, and enhancing the interface parameters to obtain enhanced intention data; S4, generating a service call request according to the enhanced intention data, and calling the remote patrol system to execute a service through the service call request to obtain a service result; S5, inputting the service result into the remote patrol expert model to generate reply text data; S6, converting the reply text data into reply voice data and outputting the reply voice data.
Specifically, an existing remote patrol system is improved, voice data of staff is obtained through an audio acquisition device, the voice data are converted into text data through a voice recognition model, intention data are analyzed through a remote patrol expert model, the intention data are accurately retrieved and enhanced through interface parameters, then the corresponding service task is executed by calling the remote patrol system through the intention data, and a returned task result is converted into a text with smooth semantics through the remote patrol expert model and finally output through voice.
The improvement of the scheme is that the service interface of the remote patrol system can be called by acquiring the voice data of the staff, the remote patrol system is controlled to execute the service, the result is displayed in a voice mode, the voice control of the remote patrol system to execute the service is realized, the result is timely notified in a voice mode, and the operation flow of the staff is simplified. In addition, accurate calling is achieved through retrieving interface parameters, and requirements on workers are reduced.
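As a rough illustration of how steps S1 to S6 chain together, the following Python sketch wires six injected stages into a single handler. The class and field names (`VoicePatrolPipeline`, `Intent`, and so on) and the stub lambdas are assumptions made for this example only; the description does not prescribe any particular code structure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Intent:
    api: str                # name of the API interface to call
    params: Dict[str, str]  # interface parameters parsed from the text


@dataclass
class VoicePatrolPipeline:
    # Each stage is an injected callable; all names are hypothetical.
    speech_to_text: Callable[[bytes], str]   # S1
    parse_intent: Callable[[str], Intent]    # S2
    enhance: Callable[[Intent], Intent]      # S3
    call_service: Callable[[Intent], Any]    # S4
    compose_reply: Callable[[Any], str]      # S5
    text_to_speech: Callable[[str], bytes]   # S6

    def handle(self, audio: bytes) -> bytes:
        text = self.speech_to_text(audio)
        intent = self.enhance(self.parse_intent(text))
        result = self.call_service(intent)
        return self.text_to_speech(self.compose_reply(result))


# Stub stages standing in for the real models and the remote patrol system.
pipeline = VoicePatrolPipeline(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    parse_intent=lambda text: Intent("cameraFoucsDevice", {"DEVICENAME": "main transformer"}),
    enhance=lambda i: Intent(i.api, {"DEVICENAME": "No. 1 main transformer"}),
    call_service=lambda i: {"api": i.api, "ok": True, **i.params},
    compose_reply=lambda r: f"Camera turned to {r['DEVICENAME']}." if r["ok"] else "The call failed.",
    text_to_speech=lambda text: text.encode("utf-8"),
)
reply_audio = pipeline.handle(b"turn the camera to the main transformer")
```

Keeping each stage behind a plain callable means a stub can later be swapped for a real model or service client without touching the control flow.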
Step S1: acquire voice data and convert it into text data according to a fine-tuned voice recognition model. To improve the model's recognition of vocabulary specific to the electric power field, the voice recognition model is fine-tuned with pre-recorded speech and text data specific to that field. For example, Whisper, open-sourced by OpenAI, is used as the voice recognition model and is fine-tuned on pre-recorded power-field speech and the corresponding text data, which improves its recognition accuracy on vocabulary specific to the power industry. Further, to reduce the computational cost of fine-tuning, LoRA or QLoRA may be used.
Step S2: parse intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface and interface parameters of the service to be called. The purpose of the remote patrol expert model is to analyze the intention of the text data produced by the voice recognition model and to simulate human natural-language dialogue. An open-source large language model, for example the ChatGLM-6B model, can be selected and trained for intention recognition to obtain the remote patrol expert model. Traditionally, intention recognition training of a large language model requires a large amount of manually written corpus data. To reduce labor cost, in this scheme the training corpus is written by a general large language model and then used for intention recognition training to obtain the remote patrol expert model.
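To make the cost argument for LoRA concrete, the short sketch below compares the number of trainable parameters for one weight matrix under full fine-tuning and under a rank-r low-rank update, where a d_out × d_in weight is adapted through two factors of shapes d_out × r and r × d_in. The layer size and rank are illustrative values chosen for the arithmetic, not figures taken from Whisper or from this description.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-r update: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


d = 1024  # illustrative hidden size (assumption)
full = full_params(d, d)
lora = lora_params(d, d, rank=8)
ratio = lora / full
print(f"full: {full}, LoRA: {lora}, ratio: {ratio:.4%}")
```

With these example values the low-rank update trains under 2% of the parameters of the full matrix, which is the source of the reduced fine-tuning cost.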
Referring to fig. 2, fig. 2 is a schematic flow chart of training a remote patrol expert model. The remote patrol expert model is obtained by the following steps:
S21, acquiring the API interfaces and comments of the remote patrol system to form an intent implementation seed library. Specifically, the source code of the API interfaces in the existing remote patrol system that need voice interaction is collected, comments that are missing or unclear are manually supplemented and adjusted, and the implementation bodies of the API interfaces are removed. Each API interface together with its comment forms an API interface pair, and multiple API interface pairs form the intent implementation seed library.
S22, supplementing each API interface in the intent implementation seed library with the context of the remote patrol system information and with placeholders to generate a placeholder intent expression; S23, inputting the placeholder intent expression into a general large language model for expansion to form a placeholder intent expression dataset. Specifically, a placeholder intent expression is a piece of text indicating that the API interface needs to be called to perform an operation on the placeholder. The placeholder intent expression is input into a general large language model (LLM), which performs an intent expression generation task and returns a placeholder intent expression dataset made up of multiple placeholder intent expressions. The general large language model here may be a commercial or open-source large language model such as ChatGPT, ChatGLM, discourse, Llama 2, etc.
For example, the intent expression generation task is performed using the ChatGPT 3.5-Turbo model. One possible input and output are listed below; the input is a placeholder intent expression, and the output is a set of expanded placeholder intent expressions used to build the placeholder intent expression dataset. For example:
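One way to picture steps S21 and S22 is a small data structure for a seed entry plus a function that turns it into the text sent to the general large language model. The sketch below is only an illustration: the `ApiSeed` and `ApiParam` classes, the `build_expansion_prompt` function, and the prompt wording are all hypothetical and are not taken from the remote patrol system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ApiParam:
    name: str
    description: str
    required: bool


@dataclass
class ApiSeed:
    """One API interface pair: the interface signature plus its comment."""
    name: str
    description: str
    params: List[ApiParam]


def build_expansion_prompt(seed: ApiSeed) -> str:
    """Assemble a placeholder intent expression request for one seed entry."""
    lines = [f"Interface: {seed.name}", f"Comment: {seed.description}", "Parameters:"]
    for p in seed.params:
        kind = "required" if p.required else "optional"
        lines.append(f"- {p.name} ({kind}): {p.description}")
    placeholders = ", ".join("{" + p.name + "}" for p in seed.params)
    lines += [
        "List sentences whose intent can be achieved by calling this interface.",
        f"Use the parameter names as entity placeholders: {placeholders}.",
        "Return each sentence as: Intent: <sentence> Action: <interface name> Params: <parameter list>",
    ]
    return "\n".join(lines)


seed = ApiSeed(
    name="cameraFoucsDevice",
    description="Rotate the camera associated with the device to the corresponding preset point",
    params=[
        ApiParam("DEVICENAME", "name of substation equipment", True),
        ApiParam("partName", "part name of device", False),
    ],
)
prompt = build_expansion_prompt(seed)
print(prompt)
```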
Input to the general large language model (LLM):
"A substation remote patrol system developed with Spring Boot has the following interface (implementation omitted):
###
@ApiOperation(value = "Rotate the camera associated with the device to the corresponding preset point, which generally faces a certain component of the device")
@PostMapping("/cameraFoucsDevice")
public WrappedResult<Boolean> cameraFoucsDevice(
    @ApiParam(value = "name of substation equipment", name = "DEVICENAME", required = true) @RequestParam("DEVICENAME") String DEVICENAME,
    @ApiParam(value = "part name of device", name = "partName", required = false) @RequestParam("partName") String partName
) {}
###
You are a professional Chinese linguist. Help me enumerate the sentences you know whose intent can be achieved by calling the interface above.
The sentences must meet the following requirements:
1. Following the function definition, incorporate the function's parameters into the sentence, and strictly use the parameter name as the placeholder for the entity in the sentences you generate, e.g. {DEVICENAME}.
2. If a parameter is optional, the sentence may omit it; if it is omitted, the sentence must not contain any other entity that refers to that parameter, and the sentence must remain grammatically correct.
3. Sentences should be phrased as a human user speaking to the system and must match how people actually talk, e.g.: I need to see the monitoring video of the transformer.
4. Enumerate as many different phrasings of the same intent as possible, cover all parameters of the interface, and do not repeat sentences.
Return format of each sentence:
Intent: sentence Action: called interface name Params: parameter list."
Accordingly, the general large language model (LLM) returns:
"Intent: I want to turn the camera corresponding to the {DEVICENAME} device to the preset point Action: cameraFoucsDevice Params: DEVICENAME
Intent: Can you help me aim the camera at the {DEVICENAME} device? Action: cameraFoucsDevice Params: DEVICENAME
Intent: I want to focus the camera on the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME
Intent: Align the camera to the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME
Intent: Align the camera to the {partName} component of the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME, partName
Intent: Can you help me turn the camera to the {partName} component of the {DEVICENAME} equipment? Action: cameraFoucsDevice Params: DEVICENAME, partName
Intent: Can you adjust the camera to the {partName} component? It belongs to the {DEVICENAME} equipment. Action: cameraFoucsDevice Params: DEVICENAME, partName
Intent: I want to adjust the focus of the camera and align it with the {partName} component of the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME, partName".
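Before the returned sentences can be stored in the placeholder intent expression dataset, each line has to be split into its sentence, interface name, and parameter list. The following sketch parses lines in the "Intent: ... Action: ... Params: ..." return format and flags entries whose placeholders do not match the declared parameters. The regular expressions, the `PlaceholderIntent` type, and the consistency check are assumptions added for illustration; the description itself relies on manual review in step S24.

```python
import re
from typing import List, NamedTuple, Optional

LINE_RE = re.compile(
    r"^Intent:\s*(?P<sentence>.+?)\s*Action:\s*(?P<action>\w+)\s*Params:\s*(?P<params>.+)$"
)
PLACEHOLDER_RE = re.compile(r"\{\s*(\w+)\s*\}")


class PlaceholderIntent(NamedTuple):
    sentence: str
    action: str
    params: List[str]


def parse_line(line: str) -> Optional[PlaceholderIntent]:
    """Return the parsed record, or None if the line does not follow the format."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    params = [p.strip() for p in m.group("params").split(",") if p.strip()]
    return PlaceholderIntent(m.group("sentence"), m.group("action"), params)


def placeholders_match(item: PlaceholderIntent) -> bool:
    """True when the placeholders in the sentence equal the declared parameters."""
    return set(PLACEHOLDER_RE.findall(item.sentence)) == set(item.params)


raw = """Intent: I want to focus the camera on the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME
Intent: Align the camera to the {DEVICENAME} device Action: cameraFoucsDevice Params: DEVICENAME, partName
this line does not follow the format"""

parsed = [parse_line(line) for line in raw.splitlines()]
records = [p for p in parsed if p is not None]
flagged = [r for r in records if not placeholders_match(r)]  # candidates for manual review
```

A check like this does not replace human screening; it only surfaces obviously inconsistent entries first.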
S24, manually reviewing the placeholder intent expression dataset and screening out wrong placeholder intent expressions. Specifically, the placeholder intent expression dataset generated by the general large language model may contain inappropriate expressions that staff would not actually use. Wrong placeholder intent expressions therefore need to be screened out manually.
S25, replacing the placeholders in the placeholder intent expression dataset with power equipment information, point information, and perception layer equipment information from the remote patrol system to obtain the intent expression dataset. For example, substituting the device "Main Transformer" and its component "conservator" for the placeholders in the placeholder intent expression "Enlarge the {partName} portion of the {DEVICENAME}, I need to see clearly" generates "Enlarge the conservator portion of the Main Transformer, I need to see clearly", which is added to the intent expression dataset.
Specifically, the API schema of the API interfaces from step S21 is obtained from the Swagger document, and a Python script combines the placeholder intent expression dataset generated in step S23 with the API schema to obtain an intent expression dataset that can be used for intent training and validation of a large language model (for example, the ChatGLM3-6B model). One representation of the intent expression dataset is as follows:
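The replacement in step S25 can be sketched as a template fill over entity records. In the example below each record is one valid device and component combination, so that a component is never paired with a device it does not belong to. The function names and the second record ("bushing") are illustrative assumptions rather than data from the remote patrol system.

```python
import re
from typing import Dict, List

PLACEHOLDER_RE = re.compile(r"\{\s*(\w+)\s*\}")


def fill_placeholders(template: str, record: Dict[str, str]) -> str:
    """Replace every {name} placeholder with the matching value from the record."""
    return PLACEHOLDER_RE.sub(lambda m: record[m.group(1)], template)


def expand(templates: List[str], records: List[Dict[str, str]]) -> List[str]:
    """Fill each template with each record that supplies all of its placeholders."""
    sentences = []
    for template in templates:
        needed = set(PLACEHOLDER_RE.findall(template))
        for record in records:
            if needed.issubset(record):
                sentences.append(fill_placeholders(template, record))
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(sentences))


templates = [
    "Enlarge the {partName} portion of the {DEVICENAME}, I need to see clearly",
    "Align the camera to the {DEVICENAME} device",
]
records = [
    {"DEVICENAME": "Main Transformer", "partName": "conservator"},
    {"DEVICENAME": "Main Transformer", "partName": "bushing"},  # illustrative entity
]
dataset = expand(templates, records)
```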
{
  "tools": [
    "cameraFoucsDevice: Rotate the camera associated with the device to the corresponding preset point, which generally faces a certain component of the device\nParameters: {\"DEVICENAME\": \"Required. string. Name of the substation device\", \"partName\": \"Optional. string. Component name of the device\"}\nOutput: boolean [True, False]\n"
  ],
  "conversations": [
    {
      "role": "user",
      "content": "I want to turn the camera corresponding to the main transformer device to the preset point"
    },
    {
      "role": "assistant",
      "content": "I need to use cameraFoucsDevice to turn the camera toward the main transformer"
    },
    {
      "role": "tool",
      "name": "cameraFoucsDevice",
      "parameters": {
        "DEVICENAME": "Main Transformer"
      },
      "observation": "True"
    },
    {
      "role": "assistant",
      "content": "The operation returned True, and the camera was successfully turned to the main transformer"
    }
  ]
}
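A record of this shape could be produced by a short script along the following lines. This is a minimal sketch under stated assumptions: the helper names `tool_description` and `build_record`, the wording of the parameter descriptions, and the sample texts are hypothetical, and the Python script referred to in the description is not reproduced here.

```python
import json
from typing import Any, Dict, List


def tool_description(name: str, comment: str, params: List[Dict[str, Any]], output: str) -> str:
    """Render one API schema entry as a single tool description string."""
    param_doc = {
        p["name"]: f"{'Required' if p['required'] else 'Optional'}. {p['type']}. {p['description']}"
        for p in params
    }
    return f"{name}: {comment}\nParameters: {json.dumps(param_doc, ensure_ascii=False)}\nOutput: {output}\n"


def build_record(tool: str, user_text: str, thought: str, action: str,
                 arguments: Dict[str, str], observation: str, final_reply: str) -> Dict[str, Any]:
    """Combine one filled-in intent expression with its tool call into a training record."""
    return {
        "tools": [tool],
        "conversations": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": thought},
            {"role": "tool", "name": action, "parameters": arguments, "observation": observation},
            {"role": "assistant", "content": final_reply},
        ],
    }


tool = tool_description(
    "cameraFoucsDevice",
    "Rotate the camera associated with the device to the corresponding preset point",
    [
        {"name": "DEVICENAME", "type": "string", "required": True, "description": "Name of the substation device"},
        {"name": "partName", "type": "string", "required": False, "description": "Component name of the device"},
    ],
    "boolean [True, False]",
)
record = build_record(
    tool,
    user_text="I want to turn the camera corresponding to the main transformer device to the preset point",
    thought="I need to use cameraFoucsDevice to turn the camera toward the main transformer",
    action="cameraFoucsDevice",
    arguments={"DEVICENAME": "Main Transformer"},
    observation="True",
    final_reply="The operation returned True, and the camera was successfully turned to the main transformer",
)
serialized = json.dumps(record, ensure_ascii=False, indent=2)
```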
S26, training the large language model based on the intention expression data set to obtain a remote patrol expert model. Specifically, the remote patrol expert model can identify the remote patrol system API interface and interface parameters which should be called according to the text data, and transfer the remote patrol system API interface and interface parameters in the form of source codes executed by a Python function for generating a service call request. For example:
Remote patrol expert model input: "I want to turn the camera corresponding to the main transformer device to the preset point location".
Remote patrol expert model output: "dispath_request("cameraFoucsDevice", "main transformer")".
S3, parsing the interface parameters in the intention data and enhancing the interface parameters to obtain enhanced intention data. The purpose of enhancing the parsed interface parameters is to ensure that the parameters passed to the API interface call are accurate, which improves the robustness of the interface call. For example, suppose the data permission of a remote patrol system staff member covers one substation. After parsing by the remote patrol expert model, the interface parameter may appear in the intention data as a spoken variant of "No. 1 main transformer" or simply as "main transformer", but the substation contains only one transformer, named "No. 1 main transformer"; after the interface parameter is enhanced, "No. 1 main transformer" is returned.
Wherein, the enhanced intention data is obtained as follows: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters; and combining the enhanced interface parameters and the API interface of the service to be called into enhanced intention data.
Specifically, the relational data and entity data (patrol points, power equipment, personnel information and the like) in the remote patrol system are synchronized to a document database; during synchronization, the entity data is also vectorized by a word embedding model (Embedding Model) and stored in a vector database. Milvus can be used as the vector database, Elasticsearch as the document database, and bge-large-zh as the word embedding model. For a given interface parameter, on the one hand, a keyword search is run against the document database (Elasticsearch): specifically, a fuzzy query through the _search interface provided by Elasticsearch returns a list of entities whose names are similar to the interface parameter. On the other hand, the interface parameter is vectorized by the embedding model, and the vector database (Milvus) is queried for a list of entities that are semantically similar to it: specifically, the nearest entities are retrieved using Euclidean distance as the metric. The two entity lists then undergo the intersection operation. Specifically, to handle Chinese homophones, the two lists are merged and the Chinese text in the merged list is converted into pinyin (the pypinyin library can be used for this); the input interface parameter is likewise converted into pinyin, an edit-distance similarity is computed between the pinyin of the input interface parameter and the pinyin of each list entry, and the entry with the greatest similarity is taken as the final output. Fig. 3 is a schematic flow chart of the interface parameter retrieval enhancement.
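As a non-limiting illustration, the merging and pinyin edit-distance ranking described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions: the two candidate lists stand in for the results of the document database and vector database queries, the function name enhance_parameter and the similarity normalization are hypothetical, and a tiny hand-written pinyin table replaces a real converter so that the example needs only the standard library. In practice a converter such as pypinyin's lazy_pinyin could be passed in instead.

```python
from typing import Callable, List

# Toy pinyin table; a stand-in (assumption) for a real converter such as pypinyin.
TOY_PINYIN = {"主": "zhu", "住": "zhu", "变": "bian", "压": "ya",
              "器": "qi", "断": "duan", "路": "lu"}


def toy_pinyin(text: str) -> str:
    """Convert known characters to pinyin; leave unknown characters unchanged."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in text)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def enhance_parameter(param: str, keyword_hits: List[str], vector_hits: List[str],
                      to_pinyin: Callable[[str], str] = toy_pinyin) -> str:
    """Merge both candidate lists and return the entry closest to param in pinyin."""
    merged = list(dict.fromkeys([*keyword_hits, *vector_hits]))  # dedupe, keep order
    if not merged:
        return param  # nothing retrieved: fall back to the raw parameter
    key = to_pinyin(param)
    return max(merged, key=lambda name: similarity(key, to_pinyin(name)))


# "住变压器" is a homophone misrecognition of "主变压器" (both read "zhu bian ya qi").
best = enhance_parameter("住变压器", ["主变压器", "断路器"], ["主变压器"])
print(best)
```

Comparing in pinyin rather than on the raw characters is what lets a homophone error from speech recognition still resolve to the correct entity name.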
S4, generating a service call request according to the enhanced intention data, and calling the remote patrol system to execute the service through the service call request to obtain a service result. Specifically, the enhanced intention data, in the form of the Python function source code output by the remote patrol expert model, is executed to initiate an HTTP request to the API interface of the remote patrol system; the remote patrol system calls the service corresponding to the API interface with the interface parameters and returns a service result.
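As a non-limiting illustration, turning the function-call text output by the model into an HTTP request might be sketched in Python as follows. Note that this sketch parses the call with the standard ast module instead of executing the source code directly, which is a substituted technique chosen here so that only the expected call shape is accepted. The base URL, the JSON body layout and the helper names parse_call and build_request are assumptions, and the request is only constructed, not sent.

```python
import ast
import json
from urllib import request

BASE_URL = "http://patrol.example.invalid/api"  # hypothetical endpoint (assumption)


def parse_call(source: str):
    """Parse text like func("a", "b") into (function name, argument list) without running it."""
    node = ast.parse(source.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a single function call")
    # literal_eval accepts only literals, so arbitrary expressions are rejected
    args = [ast.literal_eval(arg) for arg in node.args]
    return node.func.id, args


def build_request(source: str) -> request.Request:
    """Build (but do not send) a POST request for the API named in the call."""
    func, args = parse_call(source)
    if func != "dispath_request" or not args:
        raise ValueError("unsupported call: " + func)
    api_name, *params = args
    body = json.dumps({"params": params}).encode("utf-8")  # body layout is an assumption
    return request.Request(
        BASE_URL + "/" + api_name,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request('dispath_request("cameraFoucsDevice", "No. 1 main transformer")')
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Sending the request would then be a matter of passing req to urllib.request.urlopen and decoding the returned service result.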
S5, inputting the service result into the remote patrol expert model to generate reply text data. Specifically, the remote patrol expert model turns the returned service result into reply text data expressed in natural language.
S6, converting the reply text data into reply voice data for output. Specifically, the reply text data is passed to a speech generation model (PaddleSpeech, OpenTTS, eSpeak, etc.), converted into reply speech, and output through an audio playback device of the remote patrol system.
In one possible implementation, step S6 further includes: acquiring an alarm text of the remote patrol system, converting the alarm text into reply speech, and outputting the reply speech through the audio playback device of the remote patrol system.
It can be understood that the remote patrol system control method based on a sound and language model provided by this scheme has two effects. On the one hand, the API interface services of the remote patrol system are called through voice dialogue, for example dispatching a camera, an unmanned aerial vehicle or a robot to carry out patrol tasks and obtain a service result, which simplifies the operation flow. On the other hand, patrol alarms, patrol results and the like are converted into voice broadcasts during and after the patrol, so that on-site staff are notified in time.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a remote patrol system control system based on a sound and language model. The system is used to implement the remote patrol system control method based on a sound and language model described above, and includes: a voice recognition module, used for acquiring voice data and converting the voice data into text data according to the fine-tuned voice recognition model; a large language model module, used for parsing intention data from the text data according to a remote patrol expert model, where the intention data comprises the API interface of the service to be called and the interface parameters, and the remote patrol expert model is obtained by training a SOTA large language model; a retrieval enhancement module, used for parsing the interface parameters in the intention data and enhancing the interface parameters to obtain enhanced intention data; an intention calling module, used for generating a service call request according to the enhanced intention data and calling the remote patrol system to execute the service through the service call request to obtain a service result; the large language model module, further used for inputting the service result into the remote patrol expert model to generate reply text data; and a voice generation module, used for converting the reply text data into reply voice data and outputting the reply voice data.
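As a non-limiting illustration, the data flow between the modules listed above might be sketched in Python as follows. This is a minimal sketch only: the class names Intent and PatrolPipeline, the field names and the stub functions are assumptions, and each stub stands in for a real component (speech recognition model, remote patrol expert model, retrieval enhancement, HTTP call, speech generation model).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Intent:
    api: str           # API interface of the service to be called
    params: List[str]  # interface parameters


@dataclass
class PatrolPipeline:
    recognize: Callable[[bytes], str]      # voice recognition module
    parse_intent: Callable[[str], Intent]  # large language model module (intent parsing)
    enhance: Callable[[Intent], Intent]    # retrieval enhancement module
    call_service: Callable[[Intent], Dict]  # intention calling module
    compose_reply: Callable[[Dict], str]   # large language model module (reply text)
    synthesize: Callable[[str], bytes]     # voice generation module

    def handle(self, audio: bytes) -> bytes:
        """Run one voice command through the modules in order."""
        text = self.recognize(audio)
        intent = self.enhance(self.parse_intent(text))
        result = self.call_service(intent)
        return self.synthesize(self.compose_reply(result))


# Stub components so the flow can be exercised without any real model or service.
pipeline = PatrolPipeline(
    recognize=lambda audio: audio.decode("utf-8"),
    parse_intent=lambda text: Intent("cameraFoucsDevice", ["main transformer"]),
    enhance=lambda i: Intent(i.api, ["No. 1 main transformer"]),
    call_service=lambda i: {"api": i.api, "params": i.params, "ok": True},
    compose_reply=lambda r: f"{r['api']} succeeded for {r['params'][0]}",
    synthesize=lambda text: text.encode("utf-8"),
)

reply = pipeline.handle(b"turn the camera to the main transformer")
print(reply.decode("utf-8"))
```

Keeping each module behind a plain callable makes it possible to replace one stage, such as the speech generation model, without touching the others.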
In a possible implementation, the voice recognition module is further configured to obtain a SOTA speech recognition model and fine-tune it on voice-text data specific to the electric power field, so as to obtain the fine-tuned voice recognition model.
In one possible implementation, the large language model module is further configured to: acquire the API interfaces and comments of the remote patrol system to form an intention realization seed library; supplement each API interface in the intention realization seed library with context and placeholders for remote patrol system information to generate placeholder intention expressions; input the placeholder intention expressions into a general large language model for expansion, so as to form a placeholder intention expression data set; manually inspect the placeholder intention expression data set and screen out wrong placeholder intention expressions; replace the placeholders in the placeholder intention expression data set with power equipment information, point location information and perception layer equipment information in the remote patrol system to obtain an intention expression data set; and train the large language model based on the intention expression data set to obtain the remote patrol expert model.
In a possible implementation, the retrieval enhancement module is further configured to: synchronize the relational data and the entity data in the remote patrol system to generate a document database, and vectorize the entity data in the remote patrol system through a word embedding model to generate a vector database; search for the interface parameters in the document database and the vector database respectively, and perform an intersection operation on the search results to obtain enhanced interface parameters; and combine the enhanced interface parameters and the API interface of the service to be called into enhanced intention data.
In a possible implementation, the voice generation module is further configured to obtain an alarm text of the remote patrol system and convert the alarm text into voice output.
Compared with the prior art, the application has the following advantages. First, the application collects the voice data of the staff, converts the voice data into text data through a voice recognition model, parses intention data through the remote patrol expert model, calls the corresponding service of the remote patrol system through the intention data, converts the structured service result into fluent text through the remote patrol expert model, and finally outputs the text as voice. The staff can therefore control the remote patrol system to execute tasks by voice and receive the task result in voice form, which simplifies the operation flow of the remote patrol system and makes it easy to learn the task result in time. Second, the application enhances the interface parameters in the intention data by searching the document database and the vector database together for the expression closest to the interface parameters, which ensures accurate control, avoids control errors caused by non-standard expressions from staff, improves fault tolerance, and lowers the operating requirements on staff. Third, the application forms an intention realization seed library from the API interfaces and comments, generates placeholder intention expressions by supplementing context and placeholders for remote patrol system information, expands the training corpus through a general large language model, obtains an intention expression data set by replacing the placeholders with entities, and trains the remote patrol expert model on the intention expression data set, so that, given a piece of text data, the model can parse the API interface and interface parameters of the service to be called and thereby control the remote patrol system to execute tasks.
The foregoing embodiments are provided to illustrate the general principles of the invention and are not intended to limit its scope or to restrict the invention to the particular embodiments described; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (6)

1. A method for controlling a remote patrol system based on a sound and language model, comprising:
Acquiring voice data, and converting the voice data into text data according to a fine-tuned voice recognition model;
Parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface of the service to be called and the interface parameters, wherein the remote patrol expert model is obtained by training a SOTA large language model;
Analyzing interface parameters in the intention data, and enhancing the interface parameters to obtain enhanced intention data;
Generating a service call request according to the enhanced intention data, and calling a remote patrol system to execute a service through the service call request to obtain a service result;
inputting the service result into a remote patrol expert model to generate reply text data;
converting the reply text data into reply voice data to be output;
The remote patrol expert model is obtained by the following steps: acquiring the API interfaces and comments of a remote patrol system to form an intention realization seed library; supplementing each API interface in the intention realization seed library with context and placeholders for remote patrol system information to generate placeholder intention expressions; inputting the placeholder intention expressions into a general large language model for expansion, so as to form a placeholder intention expression data set; manually inspecting the placeholder intention expression data set, and screening out wrong placeholder intention expressions; replacing the placeholders in the placeholder intention expression data set with power equipment information, point location information and perception layer equipment information in the remote patrol system to obtain an intention expression data set; training the SOTA large language model based on the intention expression data set to obtain the remote patrol expert model;
The enhanced intention data is obtained by the following steps: synchronizing the relational data and the entity data in the remote patrol system to generate a document database, and vectorizing the entity data in the remote patrol system through a word embedding model to generate a vector database; searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters; combining the enhanced interface parameters and the API interface of the service to be called into enhanced intention data;
Wherein searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters, comprises the following steps: on the one hand, performing a keyword search of the document database for the interface parameters to obtain an entity list similar to the interface parameters; on the other hand, after the interface parameters are vectorized through an embedding model, querying the vector database for an entity list semantically similar to the interface parameters; to handle Chinese homophones, merging the two lists, converting the Chinese in the merged list into pinyin, converting the input interface parameters into pinyin, computing an edit-distance similarity between the pinyin of the input interface parameters and the pinyin in the list, and taking the search result with the greatest similarity in the list as the final output.
2. The method for controlling a remote patrol system based on a sound and language model according to claim 1, wherein the fine-tuned voice recognition model is obtained by: acquiring a SOTA speech recognition model, and fine-tuning the SOTA speech recognition model on voice-text data specific to the electric power field to obtain the fine-tuned voice recognition model.
3. The method for controlling a remote patrol system based on a sound and language model according to claim 1, further comprising: acquiring an alarm text of the remote patrol system, and converting the alarm text into voice output.
4. A voice and language model based remote patrol system control system for performing a voice and language model based remote patrol system control method as claimed in any one of claims 1-3, comprising:
The voice recognition module is used for acquiring voice data and converting the voice data into text data according to the fine-tuned voice recognition model;
The large language model module is used for parsing intention data from the text data according to a remote patrol expert model, the intention data comprising the API interface of the service to be called and the interface parameters, wherein the remote patrol expert model is obtained by training a SOTA large language model;
the retrieval enhancement module is used for analyzing the interface parameters in the intention data and enhancing the interface parameters to obtain enhanced intention data;
the intention calling module is used for generating a service call request according to the enhanced intention data, and calling the remote patrol system to execute the service through the service call request to obtain a service result;
The large language model module is also used for inputting the service result into a remote patrol expert model to generate reply text data;
the voice generation module is used for converting the reply text data into reply voice data and outputting the reply voice data;
Wherein the large language model module is further used for: acquiring the API interfaces and comments of a remote patrol system to form an intention realization seed library; supplementing each API interface in the intention realization seed library with context and placeholders for remote patrol system information to generate placeholder intention expressions; inputting the placeholder intention expressions into a general large language model for expansion, so as to form a placeholder intention expression data set; manually inspecting the placeholder intention expression data set, and screening out wrong placeholder intention expressions; replacing the placeholders in the placeholder intention expression data set with power equipment information, point location information and perception layer equipment information in the remote patrol system to obtain an intention expression data set; training a large language model based on the intention expression data set to obtain the remote patrol expert model;
The retrieval enhancement module is further configured to: synchronize the relational data and the entity data in the remote patrol system to generate a document database, and vectorize the entity data in the remote patrol system through a word embedding model to generate a vector database; search for the interface parameters in the document database and the vector database respectively, and perform an intersection operation on the search results to obtain enhanced interface parameters; combine the enhanced interface parameters and the API interface of the service to be called into enhanced intention data;
Wherein searching for the interface parameters in the document database and the vector database respectively, and performing an intersection operation on the search results to obtain enhanced interface parameters, comprises the following steps: on the one hand, performing a keyword search of the document database for the interface parameters to obtain an entity list similar to the interface parameters; on the other hand, after the interface parameters are vectorized through an embedding model, querying the vector database for an entity list semantically similar to the interface parameters; to handle Chinese homophones, merging the two lists, converting the Chinese in the merged list into pinyin, converting the input interface parameters into pinyin, computing an edit-distance similarity between the pinyin of the input interface parameters and the pinyin in the list, and taking the search result with the greatest similarity in the list as the final output.
5. The voice and language model based remote patrol system control system according to claim 4, wherein the voice recognition module is further configured to obtain a SOTA speech recognition model, and fine-tune the SOTA speech recognition model on voice-text data specific to the electric power field, so as to obtain the fine-tuned voice recognition model.
6. The voice and language model based remote patrol system control system according to claim 4, wherein the voice generation module is further configured to obtain an alarm text of the remote patrol system, and convert the alarm text into voice output.
CN202410389170.7A 2024-04-02 2024-04-02 Remote patrol system control method and system based on sound and language model Active CN117975968B (en)

Publications (2)

Publication Number Publication Date
CN117975968A (en) 2024-05-03
CN117975968B (en) 2024-09-10

Family

ID=90864985

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113067411A (en) * 2021-03-29 2021-07-02 北京智盟信通科技有限公司 Remote intelligent inspection system of transformer substation
CN115827750A (en) * 2022-11-02 2023-03-21 国网上海市电力公司 Inspection robot remote control system and method based on natural semantic recognition
CN117370493A (en) * 2023-09-22 2024-01-09 中国司法大数据研究院有限公司 Intelligent interaction method and device for system based on large language model

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657680A (en) * 2017-08-30 2018-02-02 国网上海市电力公司 A kind of transformer substation remote monitoring system based on indoor substation crusing robot
CN109147768A (en) * 2018-09-13 2019-01-04 云南电网有限责任公司 A kind of audio recognition method and system based on deep learning
CA3164413A1 (en) * 2020-01-22 2021-07-29 Amit Choudhary Providing an intent suggestion to a user in a text-based conversation
CN111951805B (en) * 2020-07-10 2024-09-20 华为技术有限公司 Text data processing method and device
US11664010B2 (en) * 2020-11-03 2023-05-30 Florida Power & Light Company Natural language domain corpus data set creation based on enhanced root utterances
CN113270103A (en) * 2021-05-27 2021-08-17 平安普惠企业管理有限公司 Intelligent voice dialogue method, device, equipment and medium based on semantic enhancement
CN115858723A (en) * 2022-08-26 2023-03-28 国网江苏省电力有限公司无锡供电分公司 Query graph generation method and system for complex knowledge base question answering
CN116028608A (en) * 2023-01-10 2023-04-28 虎博网络技术(上海)有限公司 Question-answer interaction method, question-answer interaction device, computer equipment and readable storage medium
CN117290411B (en) * 2023-11-22 2024-02-13 深圳九有数据库有限公司 Multimode database query method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant