CN117390198A - Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field - Google Patents

Info

Publication number
CN117390198A
Authority
CN
China
Prior art keywords
entity
electric power
model
power field
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311381173.8A
Other languages
Chinese (zh)
Inventor
王一竹
张栋栋
刘玉玺
杨强
于海亮
陈宜亮
刘沿娟
蒋顾杰
赵克生
张宏烨
张泽宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing China Power Information Technology Co Ltd
Original Assignee
Beijing China Power Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing China Power Information Technology Co Ltd
Priority to CN202311381173.8A
Publication of CN117390198A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device, equipment and a medium for constructing a science and technology knowledge graph in the electric power field. A science and technology corpus in the electric power field is obtained according to electric power basic data. Knowledge extraction is carried out according to the corpus, a knowledge model and an entity relation extraction model to obtain science and technology term entity data in the electric power field, which comprises entity objects and entity relations. Knowledge representation is carried out according to the term entity data and a translation model to obtain entity vectors. Knowledge fusion is carried out on the entity objects according to the entity vectors to obtain fused entity objects. Finally, the science and technology knowledge graph in the electric power field is constructed according to the term entity data and the fused entity objects. Because historical science and technology project data in the electric power field serves as the corpus, the resulting knowledge graph has richer semantics and improved universality.

Description

Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method, a device, equipment and a medium for constructing a scientific and technological knowledge graph in the electric power field.
Background
With the rapid development of science and technology, enterprises are investing ever more in electric power science and technology projects. To make efficient use of these projects, advanced digital means must be applied to manage scientific research project data scientifically and at fine granularity, thereby raising the level of intelligent operation and achievement knowledge services in scientific research management.
At present, scientific research project data is generally managed by constructing a knowledge graph of the electric power field. However, existing knowledge graphs of the electric power field can only be applied to specific application services, such as electric power operation and inspection or electric power marketing. Their universality is therefore low, and they cannot support every application service in the whole electric power system.
Disclosure of Invention
The application provides a method, a device, equipment and a medium for constructing a science and technology knowledge graph in the electric power field. Entity vectors are obtained by using a science and technology corpus in the electric power field, and knowledge from different knowledge bases or data sources is fused according to the entity vectors to form a science and technology knowledge graph with richer semantics, so that every application service in the whole electric power system can be supported.
In a first aspect, the present application provides a method for constructing a scientific knowledge graph in an electric power domain, the method comprising:
acquiring science and technology corpus in the electric power field according to electric power basic data, wherein the electric power basic data is derived from historical science and technology project data in the electric power field and the Internet;
carrying out knowledge extraction according to the science and technology corpus in the electric power field, a knowledge model and an entity relation extraction model to obtain science and technology term entity data in the electric power field, wherein the science and technology term entity data in the electric power field comprises entity objects and entity relations, the knowledge model is selected by comparison after fine-tuning a plurality of pre-training models, the pre-training models are trained permuted language models, and the entity relation extraction model is a trained contrastive-learning distant-supervision relation extraction model;
carrying out knowledge representation according to the technical term entity data in the electric power field and a translation model to obtain entity vectors, wherein the translation model is a trained TransE model;
carrying out knowledge fusion on the entity objects according to the entity vectors to obtain fused entity objects, wherein the knowledge fusion comprises entity alignment and entity disambiguation;
and constructing a technical knowledge graph of the electric power field according to the technical term entity data of the electric power field and the fused entity object.
Optionally, obtaining the technical corpus in the electric power domain according to the electric power basic data includes:
preprocessing electric power basic data to obtain a term vocabulary data set;
and screening and marking the term vocabulary data set according to the N-gram language model to obtain the science and technology corpus in the electric power field.
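The source does not say how the N-gram language model screens the term vocabulary. As a minimal sketch, assuming that screening means keeping word n-grams whose corpus frequency reaches a threshold, the step could look like the following. The function names, the threshold `min_count`, and the toy sentences are all hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in one token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def screen_terms(sentences, n=2, min_count=2):
    """Keep n-grams that occur at least min_count times across all sentences.

    The frequency threshold is an assumption; the source only states that an
    N-gram language model is used for screening and marking.
    """
    counts = Counter()
    for tokens in sentences:
        counts.update(ngram_counts(tokens, n))
    return {gram for gram, c in counts.items() if c >= min_count}


# Toy pre-tokenized sentences standing in for the term vocabulary data set.
sentences = [
    ["power", "transformer", "fault", "diagnosis"],
    ["power", "transformer", "insulation", "aging"],
    ["fault", "diagnosis", "of", "relay", "protection"],
]
candidates = screen_terms(sentences, n=2, min_count=2)
```

Here only the bigrams that appear in two sentences survive as candidate terms; a production pipeline would likely score candidates with smoothed n-gram probabilities rather than raw counts.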
Optionally, knowledge extraction is performed according to the technical corpus, the knowledge model and the entity relation extraction model in the electric power domain to obtain technical term entity data in the electric power domain, including:
entity extraction is carried out according to the science and technology corpus and the knowledge model in the electric power field, and entity objects in science and technology term entity data in the electric power field are obtained;
and extracting the relationship according to the entity object and the entity relationship extraction model to obtain the entity relationship in the science and technology term entity data in the electric power field.
Optionally, the process of obtaining the plurality of pre-trained models includes:
dividing the science and technology corpus in the historical electric power field into a training data set and a verification data set;
training the initial model by using the training data set, and verifying the trained initial model by using the verification data set to obtain a plurality of pre-training models, wherein the plurality of pre-training models are different types of permuted language models.
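The split into a training data set and a verification data set can be sketched as below. The split ratio, the random seed, and the helper name `split_corpus` are assumptions; the source does not specify how the historical corpus is divided.

```python
import random


def split_corpus(docs, val_ratio=0.2, seed=0):
    """Shuffle documents reproducibly and split them into (train, validation)."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    n_val = max(1, int(len(docs) * val_ratio))
    return docs[n_val:], docs[:n_val]


# Hypothetical document identifiers standing in for the historical corpus.
corpus = [f"doc_{i}" for i in range(10)]
train, val = split_corpus(corpus)
```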
Optionally, the obtaining of the knowledge model includes:
fine-tuning the plurality of pre-training models to obtain fine-tuning results corresponding to the plurality of pre-training models, wherein the fine-tuning performs classification detection on the plurality of pre-training models according to the science and technology corpus in the historical electric power field;
and sorting the fine-tuning results corresponding to the plurality of pre-training models in descending order, and taking the pre-training model corresponding to the highest-ranked fine-tuning result as the knowledge model.
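The selection step amounts to ranking the fine-tuning results and keeping the top model. A minimal sketch follows, assuming each fine-tuning result is a single scalar score where higher is better; the model names and scores are hypothetical placeholders.

```python
def select_knowledge_model(fine_tune_scores):
    """Rank models by fine-tuning score, descending, and return (best_name, ranking)."""
    ranked = sorted(fine_tune_scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0], ranked


# Hypothetical classification scores for three fine-tuned pre-training models.
scores = {"plm_a": 0.81, "plm_b": 0.87, "plm_c": 0.84}
best, ranked = select_knowledge_model(scores)
```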
Optionally, the obtaining process of the entity relation extraction model includes:
constructing positive and negative examples according to historical entity objects, wherein the historical entity objects are obtained by applying the knowledge model to the science and technology corpus in the historical electric power field;
training the contrastive-learning distant-supervision relation extraction model according to the positive and negative examples to obtain the entity relation extraction model.
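The source does not state which contrastive objective is used. As an illustration only, the sketch below uses an InfoNCE-style loss, a common choice in contrastive learning, which is low when an anchor representation is closer to its positive example than to the negatives. The vectors, the temperature, and the function name are assumptions.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax of the positive pair's similarity."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy 2-d representations: the positive matches the anchor, the negative does not.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# Swapping the roles makes the "positive" dissimilar, so the loss grows.
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```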
Optionally, the obtaining of the translation model includes:
inputting the science and technology corpus in the historical electric power field into a word vector model for training to obtain a word vector matrix;
constructing a positive sample set and a negative sample set according to historical electric power field science and technology term entity data, wherein the historical electric power field science and technology term entity data is obtained from the science and technology corpus in the historical electric power field by utilizing the knowledge model and the entity relation extraction model;
obtaining word vectors corresponding to the positive sample set and the negative sample set according to the positive sample set, the negative sample set and the word vector matrix;
inputting the positive sample set and the negative sample set into a TransE model to obtain entity vectors corresponding to the positive sample set and the negative sample set;
fusing the entity vector and the word vector to obtain a high-dimensional feature vector;
respectively calculating distance scores corresponding to the positive sample set and the negative sample set according to the high-dimensional feature vector, and iteratively calculating a loss function of the TransE model by using the distance scores corresponding to the positive sample set and the negative sample set;
and taking the loss function as an optimization target, carrying out iterative training on the TransE model, and taking the trained TransE model as a translation model.
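The distance score and loss above can be sketched with the standard TransE formulation, in which a triple (h, r, t) is scored by the norm of h + r - t and training uses a margin-based ranking loss over positive and negative triples. The source names neither the fusion operator nor the exact loss, so concatenation for fusion, the L2 norm, and the margin value are assumptions here, as are all vectors.

```python
import math


def fuse(entity_vec, word_vec):
    """Fuse by concatenation (an assumption; the source does not name the operator)."""
    return list(entity_vec) + list(word_vec)


def distance(h, r, t):
    """TransE distance score: L2 norm of h + r - t. Lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos_triples, neg_triples, margin=1.0):
    """Sum of max(0, margin + d(positive) - d(negative)) over paired triples."""
    total = 0.0
    for (h, r, t), (h2, r2, t2) in zip(pos_triples, neg_triples):
        total += max(0.0, margin + distance(h, r, t) - distance(h2, r2, t2))
    return total


# Hypothetical 2-d entity vectors fused with 1-d word vectors.
h = fuse([1.0, 0.0], [0.5])
t = fuse([1.0, 1.0], [0.5])
t_bad = fuse([4.0, 5.0], [0.5])   # corrupted tail used for the negative sample
r = [0.0, 1.0, 0.0]               # relation vector living in the fused space

loss = margin_loss([(h, r, t)], [(h, r, t_bad)], margin=1.0)
```

In this toy case the positive triple is already closer than the negative one by more than the margin, so the loss is zero and a gradient step would leave the vectors unchanged.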
Optionally, knowledge fusion is performed on the entity object according to the entity vector, so as to obtain a fused entity object, which includes:
entity alignment is carried out on the entity objects by utilizing the entity vectors and the MuGNN model so as to link the entity objects with the same name in different knowledge bases;
and carrying out entity disambiguation on the entity objects according to the context semantic information characteristics so as to unify the semantics of the entity objects with the same names in different knowledge bases.
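The source performs entity alignment with a MuGNN model, which is not reproduced here. As a much simpler stand-in that only illustrates how entity vectors can drive alignment, the sketch below links each entity in one knowledge base to its nearest neighbor in another by cosine similarity, subject to a threshold. The threshold, the entity names, and the vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def align(kb_a, kb_b, threshold=0.9):
    """Link each kb_a entity to its most similar kb_b entity if similarity >= threshold."""
    links = {}
    for name_a, vec_a in kb_a.items():
        best_name, best_sim = None, -1.0
        for name_b, vec_b in kb_b.items():
            sim = cosine(vec_a, vec_b)
            if sim > best_sim:
                best_name, best_sim = name_b, sim
        if best_name is not None and best_sim >= threshold:
            links[name_a] = best_name
    return links


# Toy entity vectors from two hypothetical knowledge bases.
kb_a = {"transformer": [1.0, 0.0], "relay": [0.0, 1.0]}
kb_b = {"power transformer": [0.9, 0.1], "circuit breaker": [-1.0, 0.0]}
links = align(kb_a, kb_b)
```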
In a second aspect, the present application provides a device for constructing a scientific knowledge graph in an electric power domain, the device comprising:
the acquisition unit is used for acquiring the science and technology corpus in the electric power field according to the electric power basic data, wherein the electric power basic data is derived from historical science and technology project data in the electric power field and the Internet;
the knowledge extraction unit is used for carrying out knowledge extraction according to the science and technology corpus in the electric power field, a knowledge model and an entity relation extraction model to obtain science and technology term entity data in the electric power field, wherein the science and technology term entity data in the electric power field comprises entity objects and entity relations, the knowledge model is selected by comparison after fine-tuning a plurality of pre-training models, the pre-training models are trained permuted language models, and the entity relation extraction model is a trained contrastive-learning distant-supervision relation extraction model;
the knowledge representation unit is used for carrying out knowledge representation according to the technical term entity data in the electric power field and a translation model to obtain entity vectors, wherein the translation model is a trained TransE model;
the knowledge fusion unit is used for carrying out knowledge fusion on the entity objects according to the entity vectors to obtain fused entity objects, wherein the knowledge fusion comprises entity alignment and entity disambiguation;
and the construction unit is used for constructing the technical knowledge graph of the electric power field according to the technical term entity data of the electric power field and the fused entity object.
Optionally, the acquisition unit is specifically configured to:
preprocessing electric power basic data to obtain a term vocabulary data set;
and screening and marking the term vocabulary data set according to the N-gram language model to obtain the science and technology corpus in the electric power field.
Optionally, the knowledge extraction unit is specifically configured to:
entity extraction is carried out according to the science and technology corpus and the knowledge model in the electric power field, and entity objects in science and technology term entity data in the electric power field are obtained;
and extracting the relationship according to the entity object and the entity relationship extraction model to obtain the entity relationship in the science and technology term entity data in the electric power field.
Optionally, the apparatus further comprises:
the pre-training model obtaining unit is used for dividing the science and technology corpus in the historical electric power field into a training data set and a verification data set, and for training the initial model by using the training data set and verifying the trained initial model by using the verification data set to obtain a plurality of pre-training models, wherein the plurality of pre-training models are different types of permuted language models.
Optionally, the apparatus further comprises:
the knowledge model obtaining unit is used for fine-tuning the plurality of pre-training models to obtain fine-tuning results corresponding to the plurality of pre-training models, wherein the fine-tuning performs classification detection on the plurality of pre-training models according to the science and technology corpus in the historical electric power field; and for sorting the fine-tuning results corresponding to the plurality of pre-training models in descending order, and taking the pre-training model corresponding to the highest-ranked fine-tuning result as the knowledge model.
Optionally, the apparatus further comprises:
the entity relation extraction model obtaining unit is used for constructing positive and negative examples according to historical entity objects, wherein the historical entity objects are obtained by applying the knowledge model to the science and technology corpus in the historical electric power field; and for training the contrastive-learning distant-supervision relation extraction model according to the positive and negative examples to obtain the entity relation extraction model.
Optionally, the apparatus further comprises:
the translation model obtaining unit is used for: inputting the science and technology corpus in the historical electric power field into a word vector model for training to obtain a word vector matrix; constructing a positive sample set and a negative sample set according to historical electric power field science and technology term entity data, wherein the historical electric power field science and technology term entity data is obtained from the science and technology corpus in the historical electric power field by utilizing the knowledge model and the entity relation extraction model; obtaining word vectors corresponding to the positive sample set and the negative sample set according to the positive sample set, the negative sample set and the word vector matrix; inputting the positive sample set and the negative sample set into a TransE model to obtain entity vectors corresponding to the positive sample set and the negative sample set; fusing the entity vectors and the word vectors to obtain high-dimensional feature vectors; respectively calculating distance scores corresponding to the positive sample set and the negative sample set according to the high-dimensional feature vectors, and iteratively calculating a loss function of the TransE model by using the distance scores; and iteratively training the TransE model with the loss function as the optimization target, and taking the trained TransE model as the translation model.
Optionally, the knowledge fusion unit is specifically configured to:
entity alignment is carried out on the entity objects by utilizing the entity vectors and the MuGNN model so as to link the entity objects with the same name in different knowledge bases;
and carrying out entity disambiguation on the entity objects according to the context semantic information characteristics so as to unify the semantics of the entity objects with the same names in different knowledge bases.
In a third aspect, the present application provides an electronic device, the device comprising a memory and a processor:
the memory is used for storing a computer program;
the processor is configured to perform the method provided in the first aspect above according to a computer program.
In a fourth aspect, the present application provides a computer readable storage medium for storing a computer program for performing the method provided in the first aspect above.
It can be seen that the application has the following beneficial effects:
the application provides a method for constructing a science and technology knowledge graph in the electric power field. A science and technology corpus in the electric power field is obtained according to electric power basic data. Knowledge extraction is carried out according to the corpus, the knowledge model and the entity relation extraction model to obtain science and technology term entity data in the electric power field, which comprises entity objects and entity relations. Knowledge representation is carried out according to the term entity data and the translation model to obtain entity vectors. Knowledge fusion, comprising entity alignment and entity disambiguation, is carried out on the entity objects according to the entity vectors to obtain fused entity objects. Finally, the science and technology knowledge graph in the electric power field is constructed according to the term entity data and the fused entity objects. In this process, historical science and technology project data in the electric power field and data on the Internet are used as the corpus, and the corresponding entity vectors are obtained from it. Knowledge fusion according to these entity vectors merges entity objects from different knowledge bases, forming a science and technology knowledge graph with rich semantics and improved universality, so that every application service in the whole electric power system can be supported.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
FIG. 1 is a flow chart of a method for constructing a science and technology knowledge graph in the electric power field according to an embodiment of the present application;
FIG. 2 is a flow chart of an embodiment of a method for constructing a science and technology knowledge graph in the electric power field according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a device for constructing a science and technology knowledge graph in the electric power field according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
First, the following explanation is made on technical terms involved in the embodiments of the present application:
(1) Knowledge graph: a knowledge graph is semantic, structured data extracted from text. Its three elements are entities, relationships, and attributes. The entity is the most basic element in the knowledge graph and can be a specific person, thing, abstract concept or relationship. Relationships represent some kind of association between different entities. Entities and relationships may also have their own attributes. Viewed as a graph, nodes in the knowledge graph represent entities, edges represent semantic relations among the entities, and the basic unit is a triple of the form "entity-relation-entity".
(2) Natural language processing: natural language processing studies theories and methods for achieving effective communication between humans and computers in natural language. In this field, pre-training captures rich language knowledge and semantic representations by learning a language model on large-scale text data. Fine-tuning starts from the pre-trained model and performs supervised training on it with a small amount of labeled data.
(3) Permuted language model (Permuted Language Modeling, PLM): a context-based model for pre-training on self-supervised/unsupervised tasks that, for a given sequence, generates samples of all of its possible permutations as the training targets.
(4) Contrastive-learning distant-supervision relation extraction model (CL-DSRE): the model learns sample features by comparing positive and negative examples in the feature space, and uses existing structured data or knowledge to label the corpus automatically. This generates data for training and avoids the high cost of manually labeling a supervised data set when data is scarce.
(5) Multi-channel graph neural network model (Multi-Channel Graph Neural Network, MuGNN): the model learns knowledge graph embeddings oriented to entity alignment. It robustly encodes the knowledge graphs using different relation weighting schemes, and introduces knowledge graph rule inference and rule transfer to explicitly complete the knowledge graphs.
(6) Graph neural networks (GNNs): neural networks used to learn from graph-structured data in order to extract and discover features and patterns in that data.
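The "entity-relation-entity" triple form described in term (1) can be sketched as a small adjacency structure, where each node is an entity and each outgoing edge carries a relation label. The triples and relation names below are hypothetical examples, not data from the application.

```python
from collections import defaultdict

# Hypothetical (head entity, relation, tail entity) triples.
triples = [
    ("transformer", "part_of", "substation"),
    ("transformer", "has_fault", "winding short circuit"),
    ("substation", "part_of", "power grid"),
]


def build_graph(triples):
    """Map each head entity to its list of (relation, tail entity) edges."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


graph = build_graph(triples)
```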
At present, most existing knowledge graphs in electric power enterprises are built to solve problems in actual electric power production and operation, and their text corpora mostly come from individual professional production management systems. They are therefore highly targeted, but the constructed knowledge graphs are not complete enough, and much implicit knowledge is difficult to discover. For example, in electric power operation and inspection, experts and scholars take power equipment as the core and have carried out relatively deep research on domain knowledge graph construction at finely divided service points: a technical framework based on knowledge graphs has been proposed to support intelligent management of transformers, and the association between bolts and nuts has been used to form bolt-nut pairs, with a bolt-nut pair knowledge graph established by combining deep learning and prior knowledge to guide the classification of bolt-nut pair defects. In electric power marketing, experts and scholars have used knowledge graph technology to improve intelligent retrieval, intelligent question answering and proactive calling in electric power customer service. In power dispatching, fault information analysis and discrimination, intelligent auxiliary decision-making and multidimensional human-computer interaction have been realized based on knowledge graphs, and the feasibility of practical deployment in business has been studied by validating power topology data analysis driven jointly by data and knowledge. Each of these knowledge graphs targets one particular electric power application service, so their universality is low and they cannot support every application service in the whole electric power system.
With the development of electric power science and technology research, a large number of researchers take part in every specialty of the electric power industry, and a mass of science and technology texts is formed in the process. For almost any device, concept or process in the electric power system, a corresponding science and technology project can be found. The data carried by science and technology projects in the electric power field can therefore be used to construct a new science and technology knowledge graph of the electric power field with improved universality.
In the embodiment of the application, the electric power field science and technology corpus is obtained according to electric power basic data, wherein the electric power basic data is derived from historical electric power field science and technology project data and the Internet; carrying out knowledge extraction according to a technical corpus in the electric power field, a knowledge model and an entity relation extraction model to obtain technical term entity data in the electric power field, wherein the technical term entity data in the electric power field comprises entity objects and entity relations, the knowledge model is obtained by comparison and preference after fine adjustment based on a plurality of pre-training models, the pre-training models are displacement language models which are completed by training, and the entity relation extraction model is a remote supervision relation extraction model which is completed by training and comparison learning; carrying out knowledge representation according to the technical term entity data in the electric power field and a translation model to obtain entity vectors, wherein the translation model is a trained TransE model; carrying out knowledge fusion on the entity objects according to the entity vectors to obtain fused entity objects, wherein the knowledge fusion comprises entity alignment and entity disambiguation; and constructing a technical knowledge graph of the electric power field according to the technical term entity data of the electric power field and the fused entity object.
As can be seen, in the method provided by the implementation of the present application, the power domain science and technology corpus of the power domain science and technology knowledge graph is constructed by using the power domain science and technology project data, knowledge extraction is performed according to the power domain science and technology corpus, the power domain science and technology term entity data comprises entity objects and entity relations, knowledge representation is performed by using the power domain science and technology term entity data, entity vectors are obtained, finally knowledge fusion is performed on the entity objects according to the entity vectors, so that knowledge fusion across a knowledge base is realized, and finally the power domain science and technology knowledge graph is constructed according to the power domain science and technology term entity data and the fused entity objects. In the process, the data information carried by the science and technology project in the electric power field is fully utilized, concepts, entities, events and relations among the concepts, the entities and the events in the electric power system are described in a structural mode, and a more effective cross-media big data organization, management and cognition capability is provided for the industrial chain of the electric power industry.
In order to facilitate understanding of the specific implementation of the method for constructing the technical knowledge graph in the electric power field provided in the embodiment of the present application, the following description will be given with reference to the accompanying drawings.
It should be noted that the entity executing the method for constructing the electric power field science and technology knowledge graph may be the apparatus for constructing the electric power field science and technology knowledge graph provided in the embodiment of the present application, and this apparatus may be carried in an electronic device or in a functional module of an electronic device. The electronic device in the embodiment of the present application may be any device capable of implementing the method for constructing the electric power field science and technology knowledge graph in the embodiment of the present application, for example an Internet of Things (IoT) device.
Fig. 1 is a flow chart of a method for constructing a scientific knowledge graph in the electric power field according to an embodiment of the present application. The method may be applied to a device for constructing a technical knowledge graph of an electric power domain, for example, the device 300 for constructing a technical knowledge graph of an electric power domain as shown in fig. 3, or the device for constructing a technical knowledge graph of an electric power domain may be a functional module integrated in the electronic device 400 as shown in fig. 4.
As shown in fig. 1, the method includes the following S101 to S105:
S101: Obtain an electric power field science and technology corpus according to electric power basic data, wherein the electric power basic data is derived from historical electric power field science and technology project data and the Internet.
To construct the electric power field science and technology knowledge graph, the electric power field science and technology corpus is first obtained according to the electric power basic data; knowledge extraction is then carried out according to the corpus, a knowledge model and an entity relation extraction model to obtain electric power field science and technology term entity data; knowledge representation is carried out according to the term entity data and a translation model to obtain entity vectors; knowledge fusion is carried out on the entity objects according to the entity vectors to obtain fused entity objects; and finally the knowledge graph is constructed according to the term entity data and the fused entity objects. Therefore, in the embodiment of the application, S101 obtains the electric power field science and technology corpus from the electric power basic data, which is the precondition for obtaining the electric power field science and technology term entity data.
As one example, S101 may include: preprocessing the electric power basic data to obtain a term vocabulary data set, wherein the preprocessing comprises text cleaning, Chinese word segmentation and the like; and screening and labeling the term vocabulary data set according to an N-gram language model to obtain the electric power field science and technology corpus.
The electric power basic data includes structured data, semi-structured data and unstructured data. The process of obtaining the structured data may include: mapping the structured data of historical science and technology projects stored in relational databases and object-oriented databases into an ontology schema by using conversion rules, and standardizing the data by manual checking. The process of obtaining the semi-structured data may include: using the Scrapy tool to crawl semi-structured data describing the relations between entity objects from websites such as encyclopedia websites and industry vertical websites. The process of obtaining the unstructured data may include: collecting unstructured data such as unstructured documents in historical science and technology projects, internal enterprise documents, industry literature and unstructured text in Internet web pages.
The term vocabulary data set is screened and labeled as follows: term search words corresponding to the term vocabulary data set are obtained according to the N-gram language model, the related explanations and categories of the term search words are retrieved from an encyclopedia website, and the term vocabulary data set is labeled semi-automatically by manual judgment based on the retrieved explanations and categories, so as to obtain the electric power field science and technology corpus.
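As a non-limiting illustration of the screening step, the following sketch counts n-grams over already-segmented sentences and keeps the frequent ones as candidate term search words. It is plain frequency counting rather than a full N-gram language model; the function name `ngram_candidates`, the thresholds and the toy sentences are assumptions introduced for illustration and do not come from the application.

```python
from collections import Counter

def ngram_candidates(token_lists, n_max=3, min_count=2):
    """Count all n-grams (n = 1..n_max) over segmented sentences and keep frequent ones."""
    counts = Counter()
    for tokens in token_lists:
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    # Keep n-grams seen at least min_count times, most frequent first.
    kept = [(gram, c) for gram, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Toy, already-segmented sentences (hypothetical data).
sentences = [
    ["power", "transformer", "fault", "diagnosis"],
    ["power", "transformer", "oil", "temperature"],
    ["transformer", "fault", "analysis"],
]
candidates = ngram_candidates(sentences, n_max=2, min_count=2)
for gram, count in candidates:
    print(" ".join(gram), count)
```

The surviving candidates would then be looked up on an encyclopedia website and confirmed by manual judgment, as described above.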
S102: Perform knowledge extraction according to the electric power field science and technology corpus, the knowledge model and the entity relation extraction model to obtain electric power field science and technology term entity data, wherein the electric power field science and technology term entity data comprises entity objects and entity relations, the knowledge model is obtained by fine-tuning a plurality of pre-training models, the pre-training models are trained permutation language models, and the entity relation extraction model is a trained distantly supervised relation extraction model based on contrastive learning.
As one example, S102 may include: S1021, performing entity extraction according to the electric power field science and technology corpus and the knowledge model to obtain the entity objects in the electric power field science and technology term entity data; and S1022, performing relation extraction according to the entity objects and the entity relation extraction model to obtain the entity relations in the electric power field science and technology term entity data.
The knowledge model in S1021 is obtained by fine-tuning a plurality of pre-training models, so the pre-training models must be obtained before the knowledge model. The pre-training models are obtained as follows: first, a historical electric power field science and technology corpus is divided into a training data set and a verification data set; an initial model is then trained with the training data set, and the trained initial model is verified with the verification data set to obtain a plurality of pre-training models, wherein the plurality of pre-training models are different types of permutation language models. The historical electric power field science and technology corpus may further comprise a test data set; testing the plurality of pre-training models with the test data set yields model test results, which are used to show that a language model trained on the electric power science and technology corpus performs better than an untrained language model.
After the plurality of pre-training models are obtained, they are fine-tuned to obtain a fine-tuning result for each pre-training model, wherein the fine-tuning performs classification detection on the plurality of pre-training models using the historical electric power field science and technology corpus. The fine-tuning results of the plurality of pre-training models are sorted in descending order, and the pre-training model whose fine-tuning result ranks highest is taken as the knowledge model.
The knowledge model thus obtained can be used to represent the features of electric power science and technology projects and to mine electric power science and technology knowledge subsequently. As a specific example of obtaining the entity objects, the knowledge model may be applied to sources such as websites of electric power science and technology terms or online encyclopedia entries to obtain the entity objects in the electric power field science and technology term entity data.
The entity relation extraction model in S1022 is obtained as follows: positive and negative examples are constructed according to historical entity objects, wherein the historical entity objects are obtained from the historical electric power field science and technology corpus by using the knowledge model; and the distantly supervised relation extraction model based on contrastive learning is trained on the positive and negative examples to obtain the entity relation extraction model.
S103: Perform knowledge representation according to the electric power field science and technology term entity data and a translation model to obtain entity vectors, wherein the translation model is a trained TransE model.
As one example, S103 may include: inputting the electric power field science and technology term entity data into the translation model to obtain the corresponding entity vectors.
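The selection rule described above (sort the fine-tuning results in descending order and take the top-ranked pre-training model) can be sketched as follows. This is only an illustration under stated assumptions: each fine-tuning result is assumed to be a single scalar score where higher is better, and the model names and score values are placeholders, not measured results.

```python
def select_knowledge_model(fine_tune_scores):
    """Rank candidate pre-training models by fine-tuning score, best first."""
    if not fine_tune_scores:
        raise ValueError("at least one fine-tuned model is required")
    ranked = sorted(fine_tune_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name = ranked[0][0]
    return best_name, ranked

# Placeholder scores for hypothetical candidates (not real evaluation results).
scores = {"candidate_a": 0.81, "candidate_b": 0.87, "candidate_c": 0.84}
best, ranking = select_knowledge_model(scores)
print(best, ranking)
```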
The translation model is obtained as follows: the historical electric power field science and technology corpus is input into a word vector model for training to obtain a word vector matrix, where the word vector model may be a word2vec model; a positive sample set and a negative sample set are constructed according to historical electric power field science and technology term entity data, where the historical electric power field science and technology term entity data is obtained from the historical electric power field science and technology corpus by using the knowledge model and the entity relation extraction model; the word vectors corresponding to the positive sample set and the negative sample set are obtained according to the positive sample set, the negative sample set and the word vector matrix, for example by splitting the entity objects and entity relations in the positive sample set and the negative sample set into characters and then looking up the word vector of each character in the word vector matrix; the positive sample set and the negative sample set are input into the TransE model to obtain the corresponding entity vectors, for example by inputting them into the Embedding layer of the TransE model; the entity vectors and the word vectors are fused to obtain high-dimensional feature vectors; the distance scores corresponding to the positive sample set and the negative sample set are calculated from the high-dimensional feature vectors, and the loss function of the TransE model is calculated iteratively from these distance scores; and the TransE model is trained iteratively with the loss function as the optimization target, and the trained TransE model is taken as the translation model.
S104: Perform knowledge fusion on the entity objects according to the entity vectors to obtain fused entity objects, wherein the knowledge fusion comprises entity alignment and entity disambiguation.
As one example, entity vectors can be used to link different representations of the same entity: by mapping the entities in different knowledge bases into a shared entity vector space, identical entities can be identified and linked, thereby achieving knowledge fusion across knowledge bases. Thus, S104 may include: S1041, performing entity alignment on the entity objects by using the entity vectors and a MuGNN model, so as to link the entity objects with the same name in different knowledge bases; and S1042, performing entity disambiguation on the entity objects according to context semantic information features, so as to unify the semantics of the entity objects with the same name in different knowledge bases.
S1041 may include the following. First, an attention mechanism is used to capture the associations between different entity objects in the knowledge graph formed by the electric power field science and technology term entity data, which strengthens the understanding of the semantic relations between the entity objects, and these semantic relations are compared and fine-tuned. Then, GCN (graph convolutional network) technology is used to mine the structural features of the knowledge graph. To better handle the contribution of each entity object to the alignment, each entity object is given a weight coefficient according to its importance in the alignment, so that the position and importance of the entity object in the knowledge graph are better understood and the alignment is performed more accurately. Finally, average pooling is used to combine the embeddings of each entity object into a joint representation, completing the alignment and linking of the entity objects.
S1042 may include: calculating the cosine similarity between the entity object to be disambiguated and each candidate entity object based on the context semantic information features, so as to complete the entity disambiguation.
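For illustration, the distance score and loss used in TransE-style training can be sketched as below. TransE treats a relation as a translation, so a plausible triple (h, r, t) should have a small distance ||h + r - t||, and a margin-based ranking loss pushes positive triples closer than negative ones. The use of concatenation for "fusing" the entity vector with the word vector, the L2 norm, the margin values and all vector values are assumptions made for this sketch; the application does not specify them.

```python
import math

def fuse(entity_vec, word_vec):
    """Fuse two vectors by concatenation (an assumed fusion scheme)."""
    return list(entity_vec) + list(word_vec)

def transe_distance(h, r, t):
    """L2 distance ||h + r - t||; smaller means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_loss(pos_triples, neg_triples, margin=1.0):
    """Sum of max(0, margin + d(positive) - d(negative)) over paired samples."""
    total = 0.0
    for (h, r, t), (h2, r2, t2) in zip(pos_triples, neg_triples):
        total += max(0.0, margin + transe_distance(h, r, t) - transe_distance(h2, r2, t2))
    return total

# Toy fused vectors (hypothetical values): the positive triple satisfies h + r = t.
h = fuse([1.0, 0.0], [0.5])
r = fuse([0.0, 1.0], [0.0])
t = fuse([1.0, 1.0], [0.5])
t_neg = fuse([4.0, 5.0], [0.5])  # corrupted tail entity for the negative sample

loss = margin_loss([(h, r, t)], [(h, r, t_neg)], margin=1.0)
print(transe_distance(h, r, t), transe_distance(h, r, t_neg), loss)
```

In an actual training loop the vectors would be learnable parameters updated by gradient descent on this loss; the sketch only evaluates the score and the loss for fixed vectors.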
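As a rough illustration of the structural step, the sketch below performs one round of mean neighborhood aggregation over a small graph and then pools several embeddings with importance weights. It is a deliberately simplified stand-in for a GCN layer (no learned weight matrix, no nonlinearity, no attention) and is not the MuGNN model itself; the function names, the graph and the weights are assumptions introduced for illustration.

```python
def propagate(features, neighbors):
    """One mean-aggregation step: each node averages its own vector with its neighbors'."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated

def weighted_average_pool(vectors, weights):
    """Pool several embeddings of one entity into a single vector using importance weights."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]

# Toy graph with three entity objects (hypothetical values).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
structural = propagate(features, neighbors)

# Pool two embeddings of the same entity, giving the first one three times the weight.
pooled = weighted_average_pool([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
print(structural, pooled)
```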
Specifically, the context features are first obtained by capturing the text surrounding the entity object mention: a context window of fixed size is defined, and the words or phrases around the entity object mention are taken as the context features. Next, the context information is represented with word vectors: the words around the entity object mention are embedded into a vector space, and these vectors are average-pooled to generate a semantic vector of the context. Each candidate entity object (typically an entity object in a database of historical electric power field science and technology projects) is then likewise represented as a semantic vector. The semantic vector of the entity object mention and the semantic vector of each candidate entity object are compared using cosine similarity, which characterizes the degree of semantic association between the entity object mention and that candidate entity object. Further, each candidate entity object is assigned a confidence score based on the semantic similarity calculation; the confidence score indicates the likelihood that the candidate entity object is the correct choice for the entity object mention. The higher the confidence score, the closer the semantic association between the corresponding candidate entity object and the entity object mention. Finally, the candidate entity object with the highest confidence score is selected as the final disambiguation result.
S105: Construct the electric power field science and technology knowledge graph according to the electric power field science and technology term entity data and the fused entity objects.
As can be seen, in the embodiment of the present application, historical electric power field science and technology project data and data obtained from the Internet are used as the electric power basic data, and the electric power field science and technology corpus is obtained according to the electric power basic data; knowledge extraction is carried out according to the corpus, the knowledge model and the entity relation extraction model to obtain electric power field science and technology term entity data, which comprises entity objects and entity relations; knowledge representation is carried out according to the term entity data and the translation model to obtain entity vectors; knowledge fusion, which comprises entity alignment and entity disambiguation, is carried out on the entity objects according to the entity vectors to obtain fused entity objects; and finally the electric power field science and technology knowledge graph is constructed according to the term entity data and the fused entity objects. In this process, the corpus is built from historical electric power field science and technology project data and Internet data, and the entity vectors derived from it are used for knowledge fusion, so that entity objects from different knowledge bases are merged into a semantically rich knowledge graph. This improves the universality of the electric power field science and technology knowledge graph, so that it can serve the application services of the whole power system.
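The disambiguation steps above (fixed-size context window, average pooling of word vectors, cosine similarity as the confidence score, highest score wins) can be sketched as follows. The two-dimensional embeddings, the candidate names and the window size are toy assumptions for illustration only; in practice the vectors would come from a trained word vector model.

```python
import math

def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("no context vectors to pool")
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)

def context_vector(tokens, mention_index, embeddings, window=2):
    """Average-pool the embeddings of words within `window` positions of the mention."""
    lo = max(0, mention_index - window)
    hi = min(len(tokens), mention_index + window + 1)
    ctx = [embeddings[tok] for i, tok in enumerate(tokens[lo:hi], start=lo)
           if i != mention_index and tok in embeddings]
    return mean_pool(ctx)

def disambiguate(mention_vec, candidates):
    """Score every candidate by cosine similarity and return the best one."""
    scored = {name: cosine(mention_vec, vec) for name, vec in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored

# Toy embeddings and candidates (hypothetical values).
embeddings = {"voltage": [1.0, 0.0], "grid": [0.9, 0.1], "oil": [0.0, 1.0]}
tokens = ["grid", "voltage", "transformer", "voltage", "grid"]
candidates = {"candidate_electrical": [1.0, 0.0], "candidate_other": [0.0, 1.0]}

mention_vec = context_vector(tokens, mention_index=2, embeddings=embeddings, window=2)
best, confidence = disambiguate(mention_vec, candidates)
print(best, confidence)
```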
In order to make the method provided by the embodiments of the present application clearer and easier to understand, a specific example of the method is described below with reference to fig. 2.
As shown in fig. 2, this embodiment may include:
S201: Acquire electric power basic data according to historical electric power field science and technology project data and data on the Internet.
As one example, S201 may include: mapping the structured data of historical science and technology projects stored in relational databases and object-oriented databases into an ontology schema by using conversion rules, and standardizing the data by manual checking, to obtain the structured data in the electric power basic data; using the Scrapy tool to crawl semi-structured data describing the relations between entity objects from websites such as encyclopedia websites and industry vertical websites; and collecting unstructured data such as unstructured documents in historical science and technology projects, internal enterprise documents, industry literature and unstructured text in Internet web pages as the unstructured data in the electric power basic data.
S202: Preprocess, screen and label the electric power basic data to obtain the electric power field science and technology corpus.
S203: Train and verify an initial model according to the historical electric power field science and technology corpus to obtain a plurality of pre-training models, wherein the plurality of pre-training models are different types of permutation language models.
As one example, S203 may include: dividing the historical electric power field science and technology corpus into a training data set and a verification data set; and sequentially inputting the training data set and the verification data set into the initial model to obtain 5 permutation language models, which may include BERT-wwm, BERT-base, RoBERTa-wwm, RoBERTa-base, ELECTRA, etc.
S204: Fine-tune the plurality of pre-training models to obtain the fine-tuning results corresponding to the plurality of pre-training models, and obtain the knowledge model according to the fine-tuning results.
As one example, S204 may include: performing classification detection on the plurality of pre-training models using the historical electric power field science and technology corpus to obtain the fine-tuning results; and then sorting the pre-training models in descending order of their fine-tuning results, and taking the pre-training model whose fine-tuning result ranks highest as the knowledge model.
S205: Perform entity extraction according to the electric power field science and technology corpus and the knowledge model to obtain the entity objects in the electric power field science and technology term entity data.
S206: Train the distantly supervised relation extraction model based on contrastive learning according to historical entity objects to obtain the entity relation extraction model, wherein the historical entity objects are obtained from the historical electric power field science and technology corpus by using the knowledge model.
As one example, S206 may include: using the obtained knowledge model as the model encoder, implementing word segmentation, embedding and sentence feature representation of an input sentence under the PyTorch framework, and using a single-layer CNN to encode entity description text so as to train the vector representation of the entity; performing bag-level encoding under an MIL (multi-instance learning) framework and constructing positive and negative examples, where the historical electric power field science and technology corpus is taken as the training set, each sentence in the training set is split with the historical entity objects as split points, and the resulting text fragments are augmented by replacing words with lower TF-IDF values, thereby forming new positive and negative examples; and optimizing the loss function of the model by learning from the positive and negative examples, so as to obtain the entity relation extraction model.
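The augmentation idea in S206 (replace low-TF-IDF words while leaving the entity words intact) can be sketched as follows. This is a minimal sketch under assumptions: the TF-IDF variant used here is term frequency times log(N / document frequency), the replacement is a fixed placeholder token, and the tokenized documents are toy data. The application does not specify what the low-TF-IDF words are replaced with.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Return one {token: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({tok: (cnt / len(doc)) * math.log(n_docs / df[tok])
                       for tok, cnt in tf.items()})
    return scores

def replace_low_tfidf(doc, scores, protected, replacement="[UNK]", k=1):
    """Replace the k lowest-TF-IDF tokens, never touching protected (entity) tokens."""
    positions = sorted((i for i, tok in enumerate(doc) if tok not in protected),
                       key=lambda i: (scores[doc[i]], i))
    chosen = set(positions[:k])
    return [replacement if i in chosen else tok for i, tok in enumerate(doc)]

# Toy tokenized sentences (hypothetical data); "transformer" is the entity to keep.
docs = [
    ["the", "transformer", "cooling", "fault"],
    ["the", "breaker", "trip", "fault"],
    ["the", "relay", "setting"],
]
all_scores = tfidf_scores(docs)
augmented = replace_low_tfidf(docs[0], all_scores[0], protected={"transformer"}, k=1)
print(augmented)
```

Because the replaced words carry little information, the augmented fragment can be treated as a positive view of the original sentence in contrastive training, while fragments from other bags serve as negatives.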
S207: Perform relation extraction according to the entity objects and the entity relation extraction model to obtain the entity relations in the electric power field science and technology term entity data.
S208: Train the TransE model according to the historical electric power field science and technology corpus and the historical electric power field science and technology term entity data to obtain the translation model.
S209: Perform knowledge representation according to the electric power field science and technology corpus, the electric power field science and technology term entity data and the translation model to obtain entity vectors.
S210: Perform knowledge fusion on the entity objects according to the entity vectors to obtain fused entity objects, wherein the knowledge fusion comprises entity alignment and entity disambiguation.
S211: Construct the electric power field science and technology knowledge graph according to the electric power field science and technology term entity data and the fused entity objects.
This embodiment provides a method for constructing an electric power field science and technology knowledge graph. Electric power basic data is acquired from historical electric power field science and technology project data and data on the Internet, and is preprocessed, screened and labeled to obtain the electric power field science and technology corpus. The concepts, entities and events in the power system, together with the relations among them, are described in a structured form to build an electric power field science and technology knowledge graph with richer semantics, which provides the power industry chain with a more effective capability for organizing, managing and understanding cross-media big data.
Referring to fig. 3, an embodiment of the present application provides an apparatus 300 for constructing a scientific knowledge graph in an electric power domain, the apparatus includes:
an obtaining unit 301, configured to obtain an electric power field science and technology corpus according to electric power basic data, where the electric power basic data is derived from historical electric power field science and technology project data and the Internet;
a knowledge extraction unit 302, configured to perform knowledge extraction according to the electric power field science and technology corpus, a knowledge model and an entity relation extraction model to obtain electric power field science and technology term entity data, where the electric power field science and technology term entity data includes entity objects and entity relations, the knowledge model is obtained by fine-tuning a plurality of pre-training models and selecting the best one by comparison, the pre-training models are trained permutation language models, and the entity relation extraction model is a trained distantly supervised relation extraction model based on contrastive learning;
a knowledge representation unit 303, configured to perform knowledge representation according to the electric power field science and technology term entity data and a translation model to obtain entity vectors, where the translation model is a trained TransE model;
a knowledge fusion unit 304, configured to perform knowledge fusion on the entity objects by using the entity vectors to obtain fused entity objects, where the knowledge fusion includes entity alignment and entity disambiguation;
a construction unit 305, configured to construct an electric power field science and technology knowledge graph according to the electric power field science and technology term entity data and the fused entity objects.
Optionally, the obtaining unit 301 is specifically configured to:
preprocess the electric power basic data to obtain a term vocabulary data set;
screen and label the term vocabulary data set according to the N-gram language model to obtain the electric power field science and technology corpus.
Optionally, the knowledge extraction unit 302 is specifically configured to:
entity extraction is carried out according to the science and technology corpus and the knowledge model in the electric power field, and entity objects in science and technology term entity data in the electric power field are obtained;
and extracting the relationship according to the entity object and the entity relationship extraction model to obtain the entity relationship in the science and technology term entity data in the electric power field.
Optionally, the apparatus 300 further comprises:
a pre-training model obtaining unit, configured to divide the historical electric power field science and technology corpus into a training data set and a verification data set; and configured to train an initial model with the training data set and verify the trained initial model with the verification data set to obtain a plurality of pre-training models, where the plurality of pre-training models are different types of permutation language models.
Optionally, the apparatus 300 further comprises:
a knowledge model obtaining unit, configured to fine-tune the plurality of pre-training models to obtain the fine-tuning results corresponding to the plurality of pre-training models, where the fine-tuning performs classification detection on the plurality of pre-training models using the historical electric power field science and technology corpus; and configured to sort the fine-tuning results corresponding to the plurality of pre-training models in descending order and take the pre-training model whose fine-tuning result ranks highest as the knowledge model.
Optionally, the apparatus 300 further comprises:
an entity relation extraction model obtaining unit, configured to construct positive and negative examples according to historical entity objects, where the historical entity objects are obtained from the historical electric power field science and technology corpus by using the knowledge model; and configured to train the distantly supervised relation extraction model based on contrastive learning on the positive and negative examples to obtain the entity relation extraction model.
Optionally, the apparatus 300 further comprises:
a translation model obtaining unit, configured to input the historical electric power field science and technology corpus into a word vector model for training to obtain a word vector matrix; configured to construct a positive sample set and a negative sample set according to historical electric power field science and technology term entity data, where the historical electric power field science and technology term entity data is obtained from the historical electric power field science and technology corpus by using the knowledge model and the entity relation extraction model; configured to obtain the word vectors corresponding to the positive sample set and the negative sample set according to the positive sample set, the negative sample set and the word vector matrix; configured to input the positive sample set and the negative sample set into the TransE model to obtain the entity vectors corresponding to the positive sample set and the negative sample set; configured to fuse the entity vectors and the word vectors to obtain high-dimensional feature vectors; configured to calculate the distance scores corresponding to the positive sample set and the negative sample set according to the high-dimensional feature vectors, and to iteratively calculate the loss function of the TransE model from these distance scores; and configured to iteratively train the TransE model with the loss function as the optimization target and take the trained TransE model as the translation model.
Optionally, the knowledge fusion unit 304 is specifically configured to:
entity alignment is carried out on the entity objects by utilizing the entity vectors and the MuGNN model so as to link the entity objects with the same name in different knowledge bases;
and carrying out entity disambiguation on the entity objects according to the context semantic information characteristics so as to unify the semantics of the entity objects with the same names in different knowledge bases.
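The linking step can be pictured with the simplified Python sketch below. It does not implement MuGNN; it swaps in a plain nearest-neighbour search by cosine similarity over entity vectors, purely to show what linking entities across two knowledge bases looks like. The threshold, the function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_entities(kb_a, kb_b, threshold=0.9):
    """Link each entity in kb_a to its most similar entity in kb_b.

    kb_a, kb_b: dict mapping entity name -> entity vector.
    Returns a list of (name_a, name_b, similarity); entities whose best
    match falls below the threshold are left unlinked.
    """
    links = []
    for name_a, vec_a in kb_a.items():
        best_name, best_sim = None, threshold
        for name_b, vec_b in kb_b.items():
            sim = cosine(vec_a, vec_b)
            if sim >= best_sim:
                best_name, best_sim = name_b, sim
        if best_name is not None:
            links.append((name_a, best_name, best_sim))
    return links


kb_a = {"transformer": [1.0, 0.0]}
kb_b = {"transformer": [0.9, 0.1], "relay": [0.0, 1.0]}
links = align_entities(kb_a, kb_b)
```

A graph-based aligner such as MuGNN would additionally use the relational structure around each entity, which this vector-only sketch ignores.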
It should be noted that the specific implementation and the effects of the apparatus 300 for constructing the scientific and technological knowledge graph in the electric power field are described in connection with the method provided in fig. 1 or fig. 2 and are not repeated here.
The embodiment of the present application further provides an electronic device 400, as shown in fig. 4, where the device 400 includes a memory 401 and a processor 402:
the memory 401 is for storing a computer program;
the processor 402 is configured to perform the methods provided in fig. 1 or fig. 2 described above in accordance with a computer program.
Furthermore, the present application provides a computer readable storage medium for storing a computer program for executing the method provided in fig. 1 or fig. 2.
From the above description of the embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above example methods may be implemented by software together with a general-purpose hardware platform. Based on this understanding, the technical solutions of the present application may be embodied as a software product, which may be stored in a storage medium such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a router) to perform the methods described in the embodiments, or in some parts of the embodiments, of the present application.
In this specification, the embodiments are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiments are substantially similar to the method embodiments, they are described briefly, and the relevant points can be found in the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the objective of the embodiment. Those of ordinary skill in the art can understand and implement the present application without creative effort.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application.

Claims (11)

1. A method of constructing a scientific and technological knowledge graph in the electric power field, the method comprising:
acquiring an electric power field science and technology corpus according to electric power basic data, wherein the electric power basic data is derived from historical electric power field science and technology project data and the Internet;
performing knowledge extraction according to the electric power field science and technology corpus, a knowledge model and an entity relation extraction model to obtain electric power field science and technology term entity data, wherein the electric power field science and technology term entity data comprises entity objects and entity relations, the knowledge model is selected by comparison from a plurality of pre-trained models after fine-tuning, the pre-trained models are trained displacement language models, and the entity relation extraction model is a trained distant supervision relation extraction model based on contrastive learning;
performing knowledge representation according to the electric power field science and technology term entity data and a translation model to obtain entity vectors, wherein the translation model is a trained TransE model;
carrying out knowledge fusion on the entity objects according to the entity vectors to obtain fused entity objects, wherein the knowledge fusion comprises entity alignment and entity disambiguation;
and constructing the scientific and technological knowledge graph of the electric power field according to the electric power field science and technology term entity data and the fused entity objects.
2. The method according to claim 1, wherein the acquiring of the electric power field science and technology corpus according to the electric power basic data comprises:
preprocessing the electric power basic data to obtain a term vocabulary data set;
and screening and marking the term vocabulary data set according to an N-gram language model to obtain the electric power field science and technology corpus.
3. The method according to claim 1, wherein the performing of knowledge extraction according to the electric power field science and technology corpus, the knowledge model and the entity relation extraction model to obtain the electric power field science and technology term entity data comprises:
performing entity extraction according to the electric power field science and technology corpus and the knowledge model to obtain entity objects in the electric power field science and technology term entity data;
and performing relation extraction according to the entity objects and the entity relation extraction model to obtain the entity relations in the electric power field science and technology term entity data.
4. The method of claim 1, wherein the obtaining of the plurality of pre-trained models comprises:
dividing the science and technology corpus in the historical electric power field into a training data set and a verification data set;
training an initial model by using the training data set, and verifying the trained initial model by using the verification data set to obtain the plurality of pre-trained models, wherein the plurality of pre-trained models are different types of displacement language models.
5. The method of claim 4, wherein obtaining the knowledge model comprises:
fine-tuning the plurality of pre-trained models to obtain fine-tuning results corresponding to the plurality of pre-trained models, wherein the fine-tuning performs classification detection on the plurality of pre-trained models according to the historical electric power field science and technology corpus;
and sorting the fine-tuning results corresponding to the plurality of pre-trained models in descending order, and taking the pre-trained model with the highest-ranked fine-tuning result as the knowledge model.
6. The method of claim 1, wherein obtaining the entity relation extraction model comprises:
constructing positive and negative examples from historical entity objects, wherein the historical entity objects are obtained by applying the knowledge model to the historical electric power field science and technology corpus;
and training the distant supervision relation extraction model with contrastive learning on the positive and negative examples to obtain the entity relation extraction model.
7. The method of claim 1, wherein the obtaining of the translation model comprises:
inputting the historical electric power field science and technology corpus into a word vector model for training to obtain a word vector matrix;
constructing a positive sample set and a negative sample set according to historical electric power field science and technology term entity data, wherein the historical electric power field science and technology term entity data is obtained by utilizing the knowledge model and the entity relation extraction model from the historical electric power field science and technology corpus;
according to the positive sample set, the negative sample set and the word vector matrix, obtaining word vectors corresponding to the positive sample set and the negative sample set;
inputting the positive sample set and the negative sample set into the TransE model to obtain entity vectors corresponding to the positive sample set and the negative sample set;
fusing the entity vector and the word vector to obtain a high-dimensional feature vector;
respectively calculating the distance scores corresponding to the positive sample set and the negative sample set according to the high-dimensional feature vector, and iteratively calculating the loss function of the TransE model by using the distance scores corresponding to the positive sample set and the negative sample set;
and taking the loss function as an optimization target, carrying out iterative training on the TransE model, and taking the trained TransE model as the translation model.
8. The method according to claim 1, wherein the performing knowledge fusion on the entity object according to the entity vector to obtain a fused entity object includes:
performing entity alignment on the entity objects by using the entity vector and the MuGNN model so as to link the entity objects with the same name in different knowledge bases;
and carrying out entity disambiguation on the entity objects according to the context semantic information characteristics so as to unify the semantics of the entity objects with the same names in the different knowledge bases.
9. A device for constructing a scientific and technological knowledge graph in the electric power field, the device comprising:
the acquisition unit is used for acquiring the science and technology corpus in the electric power field according to the electric power basic data, wherein the electric power basic data is derived from the science and technology project data in the historical electric power field and the Internet;
the knowledge extraction unit is used for performing knowledge extraction according to the electric power field science and technology corpus, a knowledge model and an entity relation extraction model to obtain electric power field science and technology term entity data, wherein the electric power field science and technology term entity data comprises entity objects and entity relations, the knowledge model is selected by comparison from a plurality of pre-trained models after fine-tuning, the pre-trained models are trained displacement language models, and the entity relation extraction model is a trained distant supervision relation extraction model based on contrastive learning;
the knowledge representation unit is used for performing knowledge representation according to the electric power field science and technology term entity data and a translation model to obtain entity vectors, wherein the translation model is a trained TransE model;
the knowledge fusion unit is used for carrying out knowledge fusion on the entity objects according to the entity vectors to obtain fused entity objects, wherein the knowledge fusion comprises entity alignment and entity disambiguation;
and the construction unit is used for constructing the scientific and technological knowledge graph of the electric power field according to the electric power field science and technology term entity data and the fused entity objects.
10. An electronic device, characterized in that the device comprises a memory and a processor, the processor being configured to execute a program stored in the memory so as to perform the method according to any one of claims 1-8.
11. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a computer program for executing the method of any one of claims 1-8.
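
The screening step recited in claim 2 can be illustrated with a minimal Python sketch, assuming a bigram language model with add-one smoothing and a fixed score threshold. The claims do not state the order of the N-gram model, the smoothing method or the screening criterion, so all of these, along with the function names and the toy corpus, are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def term_score(term, unigrams, bigrams):
    """Average log-probability of a candidate term under the bigram model."""
    total = sum(unigrams.values())
    vocab = len(unigrams)
    # First token: smoothed unigram probability.
    score = math.log((unigrams[term[0]] + 1) / (total + vocab))
    # Remaining tokens: smoothed probability given the previous token.
    for prev, cur in zip(term, term[1:]):
        score += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return score / len(term)


def screen_terms(candidates, unigrams, bigrams, threshold):
    """Keep candidate terms whose score reaches the threshold."""
    return [t for t in candidates if term_score(t, unigrams, bigrams) >= threshold]


corpus = [["power", "transformer", "fault"],
          ["power", "transformer", "oil"],
          ["relay", "protection"]]
unigrams, bigrams = train_bigram(corpus)
kept = screen_terms([("power", "transformer"), ("transformer", "relay")],
                    unigrams, bigrams, threshold=-1.5)
```

Word sequences that co-occur often in the corpus score higher than sequences that never appear together, so a threshold on the score separates likely domain terms from accidental word pairs.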
CN202311381173.8A 2023-10-24 2023-10-24 Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field Pending CN117390198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311381173.8A CN117390198A (en) 2023-10-24 2023-10-24 Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311381173.8A CN117390198A (en) 2023-10-24 2023-10-24 Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field

Publications (1)

Publication Number Publication Date
CN117390198A (en) 2024-01-12

Family

ID=89438684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311381173.8A Pending CN117390198A (en) 2023-10-24 2023-10-24 Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field

Country Status (1)

Country Link
CN (1) CN117390198A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117633518A (en) * 2024-01-25 2024-03-01 北京大学 Industrial chain construction method and system
CN117633518B (en) * 2024-01-25 2024-04-26 北京大学 Industrial chain construction method and system

Similar Documents

Publication Publication Date Title
CN108304911B (en) Knowledge extraction method, system and equipment based on memory neural network
CN116628172B (en) Dialogue method for multi-strategy fusion in government service field based on knowledge graph
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN111639171A (en) Knowledge graph question-answering method and device
CN113806563B (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
CN109271506A (en) A kind of construction method of the field of power communication knowledge mapping question answering system based on deep learning
CN111967761B (en) Knowledge graph-based monitoring and early warning method and device and electronic equipment
CN110968699A (en) Logic map construction and early warning method and device based on event recommendation
CN112149400B (en) Data processing method, device, equipment and storage medium
CN116992005B (en) Intelligent dialogue method, system and equipment based on large model and local knowledge base
CN111783394A (en) Training method of event extraction model, event extraction method, system and equipment
CN113191148B (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN113239143B (en) Power transmission and transformation equipment fault processing method and system fusing power grid fault case base
CN117390198A (en) Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field
CN115757695A (en) Log language model training method and system
CN117474010A (en) Power grid language model-oriented power transmission and transformation equipment defect corpus construction method
CN113486174B (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN111368093A (en) Information acquisition method and device, electronic equipment and computer readable storage medium
CN115114419A (en) Question and answer processing method and device, electronic equipment and computer readable medium
CN117828024A (en) Plug-in retrieval method, device, storage medium and equipment
CN111104492B (en) Civil aviation field automatic question and answer method based on layering Attention mechanism
CN117216221A (en) Intelligent question-answering system based on knowledge graph and construction method
CN114443846B (en) Classification method and device based on multi-level text different composition and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination