CN113963357B - Knowledge graph-based sensitive text detection method and system - Google Patents

Knowledge graph-based sensitive text detection method and system

Info

Publication number
CN113963357B
Authority
CN
China
Prior art keywords
knowledge
text
sensitive
network
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111535596.1A
Other languages
Chinese (zh)
Other versions
CN113963357A (en)
Inventor
张静磊
叶蔚
张世琨
谢睿
温国昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202111535596.1A
Publication of CN113963357A
Application granted
Publication of CN113963357B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph-based sensitive text detection method and system. The method comprises the following steps: crawling existing knowledge from the network and preprocessing it to obtain a knowledge graph network; acquiring sensitive texts from the network and preprocessing them to obtain a training corpus; obtaining coding information of a text detection model according to the training corpus and the knowledge graph network, and converting the coding information into vector representations to obtain a final text detection model; and preprocessing the text to be tested and obtaining a detection result according to the text detection model. The invention introduces external knowledge through the knowledge graph to build the text detection model, and further fuses the external knowledge through a multi-view reasoning network so that it can be fully utilized.

Description

Knowledge graph-based sensitive text detection method and system
Technical Field
The invention relates to the field of sensitive text detection, in particular to a knowledge graph-based sensitive text detection method and system.
Background
With the development of the internet, the volume of online information is growing explosively, and unhealthy and illegal information is increasing with it, so reasonable screening of information is particularly important. NLP technology plays an increasingly important role in everyday language processing tasks such as text classification, language translation, part-of-speech tagging and named entity recognition, and has achieved remarkable results. Sensitive text analysis is an increasingly important NLP application in the internet field. However, existing techniques have weaknesses that can be exploited by methods such as pinyin replacement, character order disturbance and reference replacement, which make sensitive text detection more difficult. Knowledge graphs offer a reasonable way to address this problem.
Disclosure of Invention
The invention provides a method and a system for detecting sensitive texts based on a knowledge graph, which introduce external knowledge through the knowledge graph, provide necessary basis for detection of a model, and further fuse the external knowledge through a multi-view reasoning network, so that the external knowledge can be fully utilized.
In order to achieve the above object, the present invention provides a method for detecting sensitive text based on knowledge-graph, comprising:
crawling the existing knowledge in the network, and preprocessing the existing knowledge to obtain a knowledge graph network;
acquiring a sensitive text in a network, and preprocessing the sensitive text to obtain a training corpus;
obtaining coding information of a text detection model according to the training corpus and the knowledge graph network, and converting the coding information into vector representation to obtain a final text detection model;
and preprocessing the text to be tested, and obtaining a detection result according to the text detection model.
According to one aspect of the invention, the method for obtaining the knowledge graph network comprises the following steps:
the existing knowledge in the open source community and the information disclosure website is obtained through a web crawler technology, a data set is obtained through collection, the data set is processed through an entity recognition and relation extraction technology, structured data of the data set are obtained, and the knowledge graph network is formed.
According to one aspect of the present invention, the method for obtaining the corpus comprises:
and acquiring the sensitive texts in the open source community and the information public website by the web crawler technology, deleting stop words and special symbols in the sensitive texts, and segmenting the length of the sensitive texts to obtain the training corpus.
According to one aspect of the invention, the corpus comprises entities and instances corresponding to the entities, custom identifiers are inserted into front and rear positions of the instances, different entities correspond to different custom identifiers, different instances of the same entity correspond to the same custom identifier, anchors are set for the entities, and position information of the corpus is obtained through language model coding.
According to one aspect of the invention, the related concepts of each entity and the confidence corresponding to each related concept are extracted according to the knowledge graph network, and if an entity has fewer than 10 related concepts, the confidence of the vacant slots is set to 0.
According to one aspect of the invention, the entities and related concepts are preprocessed, supplemented by crawling wikipedia text, and if the knowledge-graph network does not have the entities, the entities are replaced with wiki information, which is encoded by the language model and max pooling.
According to one aspect of the invention, a weight value of the related concept is obtained through softmax operation according to the confidence degree, a vector set is obtained according to the weight value and the vector representation, a vector representation of the entity is obtained according to the vector set, and data information interaction between the training corpus and the knowledge graph network is achieved.
To achieve the above object, the present invention provides a knowledge-graph-based sensitive text detection system, comprising:
a knowledge graph network establishment module: crawling the existing knowledge in the network, and preprocessing the existing knowledge to obtain a knowledge graph network;
the training corpus building module: acquiring a sensitive text in a network, and preprocessing the sensitive text to obtain a training corpus;
the text detection model construction module: obtaining coding information of a text detection model according to the training corpus and the knowledge graph network, and converting the coding information into vector representation to obtain a final text detection model;
a prediction result module: and preprocessing the text to be tested, and obtaining a detection result according to the text detection model.
To achieve the above object, the present invention provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and running on the processor, wherein the computer program, when executed by the processor, implements the above method for detecting sensitive text based on a knowledge graph.
To achieve the above object, the present invention provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the above-mentioned method for detecting sensitive texts based on knowledge-graph.
Based on this, the beneficial effects of the invention are:
(1) sensitive texts are detected with the help of a knowledge graph network, which overcomes the weakness of traditional techniques against evasion methods such as pinyin replacement, sequence disturbance and reference replacement;
(2) the training corpus and the knowledge graph network are converted into vector representation, so that the interactivity between the training corpus and the knowledge graph network is enhanced, and the accuracy of the text detection model is improved.
Drawings
FIG. 1 schematically represents a flow diagram of a knowledge-graph based sensitive text detection method according to the present invention;
FIG. 2 schematically represents a diagram of a sensitive text three-tier inference mechanism in accordance with the present invention;
FIG. 3 schematically represents an architecture diagram of a sensitive text detection model according to the present invention;
FIG. 4 schematically represents a flow diagram of a knowledge-graph based sensitive text detection system according to the present invention.
Detailed Description
The present invention will now be discussed with reference to exemplary embodiments, it being understood that the embodiments discussed are only for the purpose of enabling a person of ordinary skill in the art to better understand and thus implement the contents of the present invention, and do not imply any limitation on the scope of the present invention.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on", and the terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment".
Fig. 1 schematically shows a flow chart of a method for detecting sensitive text based on a knowledge-graph according to the present invention, as shown in fig. 1, the method for detecting sensitive text based on a knowledge-graph according to the present invention comprises the following steps:
101: crawling the existing knowledge in the network, and preprocessing the existing knowledge to obtain a knowledge graph network;
102: acquiring a sensitive text in a network, and preprocessing the sensitive text to obtain a training corpus;
103: obtaining coding information of the text detection model according to the training corpus and the knowledge graph network, and converting the coding information into vector representation to obtain a final text detection model;
104: and preprocessing the text to be tested, and obtaining a detection result according to the text detection model.
According to one embodiment of the invention, the method for obtaining the knowledge graph network comprises the following steps:
the method comprises the steps of obtaining existing knowledge in open source communities and information disclosure websites through a web crawler technology, summarizing to obtain data sets, processing the data sets through an entity recognition and relation extraction technology to obtain structured data of the data sets, and forming a knowledge graph network.
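The structured data produced by this step can be pictured as a set of triples indexed by entity. The following is a minimal sketch only: the (head, relation, tail, confidence) tuple layout, the sample entity names, and the helper functions `build_graph` and `related_concepts` are all assumptions made for illustration, since the source does not specify a storage format or an extraction tool.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail, confidence) tuples, standing in for the
# output of the entity recognition and relation extraction step.
triples = [
    ("entity_a", "is_a", "concept_x", 0.9),
    ("entity_a", "related_to", "concept_y", 0.4),
    ("entity_b", "is_a", "concept_x", 0.7),
]


def build_graph(triples):
    """Index triples by head entity as an adjacency list."""
    graph = defaultdict(list)
    for head, relation, tail, confidence in triples:
        graph[head].append((relation, tail, confidence))
    return dict(graph)


def related_concepts(graph, entity):
    """Return (concept, confidence) pairs for an entity, highest confidence first."""
    edges = graph.get(entity, [])
    pairs = [(tail, confidence) for _, tail, confidence in edges]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


graph = build_graph(triples)
```

An entity that is absent from the graph simply yields an empty list here, which leaves room for a fallback such as the Wikipedia supplement described later.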
According to one embodiment of the present invention, the method for obtaining the corpus comprises:
Fig. 2 schematically shows a diagram of the three-layer inference mechanism for sensitive text according to the present invention. As shown in fig. 2, sensitive texts in open source communities and information disclosure websites are obtained through web crawler technology, stop words and special symbols in the sensitive texts are deleted, and the sensitive texts are segmented by length to obtain the training corpus.
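The three preprocessing operations named above (deleting special symbols, deleting stop words, and cutting by length) can be sketched as follows. This is an illustration under stated assumptions: the stop-word list, the maximum chunk length, and whitespace tokenization are all hypothetical choices, and Chinese text would need a word segmenter in place of `split()`.

```python
import re

# Hypothetical stop-word list and maximum length; the source names neither.
STOP_WORDS = {"the", "a", "an", "of"}
MAX_TOKENS = 8


def preprocess(text, stop_words=STOP_WORDS, max_tokens=MAX_TOKENS):
    """Strip special symbols, drop stop words, and cut into fixed-length chunks."""
    # Keep word characters (including CJK) and whitespace; replace the rest with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    tokens = [t for t in cleaned.split() if t.lower() not in stop_words]
    # Split the token list into chunks of at most max_tokens tokens.
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]
```

For example, `preprocess("The quick, brown fox!!! jumps over a lazy dog.", max_tokens=3)` yields three chunks, the last holding the single leftover token.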
According to an embodiment of the present invention, fig. 3 schematically shows an architecture diagram of a sensitive text detection model according to the present invention, and according to fig. 3, a corpus includes entities and instances corresponding to the entities, custom identifiers are inserted into front and rear positions of the instances, different entities correspond to different custom identifiers, different instances of the same entity correspond to the same custom identifier, anchors are set for the entities, and position information of the corpus is obtained through language model coding.
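The identifier scheme described above can be illustrated with a small sketch. The marker strings `[E1]` and `[/E1]`, the span format, and the function name `mark_instances` are assumptions for illustration; the source only requires that instances of the same entity share one identifier and that different entities use different ones. The anchor setting and the language model encoding are not covered here.

```python
def mark_instances(tokens, spans):
    """Wrap each instance span with the identifier of its entity.

    tokens: list of token strings.
    spans:  list of (start, end, entity_id) with end exclusive, non-overlapping.
    """
    # Assign one marker index per distinct entity, in order of first appearance.
    marker_of = {}
    for _, _, entity_id in sorted(spans):
        marker_of.setdefault(entity_id, len(marker_of) + 1)

    starts = {start: marker_of[entity_id] for start, _, entity_id in spans}
    ends = {end: marker_of[entity_id] for _, end, entity_id in spans}

    out = []
    for i, token in enumerate(tokens):
        if i in ends:  # close an instance that ended just before this token
            out.append(f"[/E{ends[i]}]")
        if i in starts:  # open an instance that begins at this token
            out.append(f"[E{starts[i]}]")
        out.append(token)
    if len(tokens) in ends:  # close an instance that runs to the end of the text
        out.append(f"[/E{ends[len(tokens)]}]")
    return out
```

With tokens `["alice", "met", "bob", "and", "ally"]` and spans marking "alice" and "ally" as the same hypothetical entity, both instances receive `[E1]` markers while "bob" receives `[E2]`.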
According to one embodiment of the invention, the related concepts of each entity and the confidence corresponding to each related concept are extracted according to the knowledge graph network, and if an entity has fewer than 10 related concepts, the confidence of the vacant slots is set to 0.
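The padding rule can be sketched as below. The slot count of 10 comes from the text; the placeholder name `[PAD]` is hypothetical, and keeping only the 10 highest-confidence concepts when an entity has more than 10 is an assumption, since the source does not describe that case.

```python
MAX_CONCEPTS = 10      # from the text: fewer than 10 concepts triggers padding
PAD_CONCEPT = "[PAD]"  # hypothetical placeholder name


def pad_concepts(pairs, size=MAX_CONCEPTS):
    """Return exactly `size` (concept, confidence) pairs.

    Keeps the highest-confidence concepts and fills vacant slots with
    a placeholder whose confidence is 0.
    """
    kept = sorted(pairs, key=lambda pair: pair[1], reverse=True)[:size]
    return kept + [(PAD_CONCEPT, 0.0)] * (size - len(kept))
```

A fixed number of slots per entity keeps the downstream weighting step uniform across entities.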
According to one embodiment of the invention, entities and related concepts are preprocessed, entities and related concepts are supplemented by crawling wikipedia text, and if the knowledge-graph network has no entities, the entities are replaced with wikipedia information, which is encoded by a language model and max pooling.
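Max pooling over the language model output reduces a variable number of token vectors to one fixed-size vector by taking the element-wise maximum. The sketch below shows only the pooling step; the toy vectors stand in for language model outputs, and the function name `max_pool` is an assumption.

```python
def max_pool(token_vectors):
    """Element-wise maximum over a non-empty list of equal-length vectors."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    # zip(*...) regroups the values by dimension, so each column holds one
    # dimension across all tokens.
    return [max(column) for column in zip(*token_vectors)]
```

For instance, pooling `[[0.1, 0.9, -0.3], [0.4, 0.2, -0.1]]` keeps the larger value in each of the three dimensions.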
According to one embodiment of the invention, the weight value of the related concept is obtained through softmax operation according to the confidence degree, the vector set is obtained according to the weight value and the vector representation, the vector representation of the entity is obtained according to the vector set, and the interaction of data information between the training corpus and the knowledge graph network is realized.
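One way to read this step is as a softmax over the concept confidences followed by a weighted combination of the concept vectors. The sketch below assumes a weighted sum as the combination, which the source does not state explicitly; the function names are hypothetical. Note that under a plain softmax a slot with confidence 0 still receives a non-zero weight, and whether the original method masks such slots is not stated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def entity_vector(confidences, concept_vectors):
    """Confidence-weighted sum of concept vectors (assumed combination rule)."""
    weights = softmax(confidences)
    dim = len(concept_vectors[0])
    return [
        sum(weight * vector[d] for weight, vector in zip(weights, concept_vectors))
        for d in range(dim)
    ]
```

Two concepts with equal confidence receive equal weights, so the entity vector is the midpoint of their vectors.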
According to an embodiment of the invention, to test the effect of the invention, 150,000 sensitive texts were collected, with 95% of the items used as the training set and 5% as the test set. The model was trained on the training set according to the scheme of the invention and, after training, evaluated on the test set. To verify the detection effect, precision, recall and the F1 value were selected as evaluation indexes. Precision = (number of texts correctly classified as sensitive / total number of texts classified as sensitive) × 100%. Recall = (number of texts correctly classified as sensitive / total number of sensitive texts in the test data) × 100%. To compare different algorithms, the F1 value combines precision and recall into a single overall measure: F1 = 2 × precision × recall / (precision + recall). The existing models CNN, GRU, LSTM and BERT were selected as reference models. The CNN model had a precision of 70.1%, a recall of 61.2% and an F1 value of 65.3%; the GRU model had a precision of 69.7%, a recall of 59.5% and an F1 value of 64.2%; the LSTM model had a precision of 66.5%, a recall of 71.8% and an F1 value of 68.9%; the BERT model had a precision of 70.1%, a recall of 74.5% and an F1 value of 72.0%. The text detection model of the invention had a precision of 84.7%, a recall of 86.9% and an F1 value of 85.7%. These data show that the text detection model provided by the invention identifies sensitive texts better than the reference models.
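The evaluation indexes can be computed as in the following sketch, which uses the standard definitions of precision and recall over binary labels (1 for sensitive, 0 otherwise). The function names and the toy label lists are assumptions for illustration and are not the evaluation data of the embodiment.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, actual):
    """Compute precision, recall and F1 for binary labels (1 = sensitive)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)
```

Applying `f1_score` to the CNN figures reported above (70.1 and 61.2) gives approximately 65.3, in line with the stated F1 value.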
Furthermore, to achieve the above object, the present invention provides a knowledge graph-based sensitive text detection system. Fig. 4 schematically shows a flow chart of the system according to the present invention. As shown in fig. 4, the system comprises:
a knowledge graph network establishment module: crawling the existing knowledge in the network, and preprocessing the existing knowledge to obtain a knowledge graph network;
the training corpus building module: acquiring a sensitive text in a network, and preprocessing the sensitive text to obtain a training corpus;
the text detection model construction module: obtaining coding information of the text detection model according to the training corpus and the knowledge graph network, and converting the coding information into vector representation to obtain a final text detection model;
a prediction result module: and preprocessing the text to be tested, and obtaining a detection result according to the text detection model.
To achieve the above object, the present invention also provides an electronic device, including: the system comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein the computer program realizes the above-mentioned sensitive text detection method based on the knowledge graph when being executed by the processor.
To achieve the above object, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the above method for detecting sensitive texts based on knowledge-graph.
Those of ordinary skill in the art will appreciate that the modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and devices may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, each functional module in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the knowledge graph-based sensitive text detection method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
It should be understood that the order of execution of the steps in the summary of the invention and the embodiments of the present invention does not absolutely imply any order of execution, and the order of execution of the steps should be determined by their functions and inherent logic, and should not be construed as limiting the process of the embodiments of the present invention.

Claims (7)

1. The method for detecting the sensitive text based on the knowledge graph is characterized by comprising the following steps:
crawling the existing knowledge in the network, and preprocessing the existing knowledge to obtain a knowledge graph network;
acquiring a sensitive text in a network, and preprocessing the sensitive text to obtain a training corpus;
obtaining coding information of a text detection model according to the training corpus and the knowledge graph network, and converting the coding information into vector representation to obtain a final text detection model;
the training corpus comprises entities and instances corresponding to the entities, custom identifiers are inserted into the front and rear positions of the instances, different entities correspond to different custom identifiers, different instances of the same entity correspond to the same custom identifier, anchors are arranged for the entities, and position information of the training corpus is obtained through language model coding;
extracting related concepts of each entity and a confidence corresponding to each related concept according to the knowledge graph network, and if the entity has fewer than 10 related concepts, setting the confidence of the vacant parts to 0;
obtaining a weight value of the related concept through softmax operation according to the confidence degree, obtaining a vector set according to the weight value and the vector representation, obtaining a vector representation of the entity according to the vector set, and enabling the training corpus and the knowledge graph network to realize data information interaction;
and preprocessing the text to be tested, and obtaining a detection result according to the text detection model.
2. The method for detecting sensitive texts based on knowledge-graph according to claim 1, wherein the method for obtaining knowledge-graph network is as follows:
the existing knowledge in the open source community and the information disclosure website is obtained through a web crawler technology, a data set is obtained through collection, the data set is processed through an entity recognition and relation extraction technology, structured data of the data set are obtained, and the knowledge graph network is formed.
3. The knowledge-graph-based sensitive text detection method according to claim 2, wherein the method for obtaining the training corpus comprises:
and acquiring the sensitive texts in the open source community and the information public website by the web crawler technology, deleting stop words and special symbols in the sensitive texts, and segmenting the length of the sensitive texts to obtain the training corpus.
4. The knowledge graph-based sensitive text detection method according to claim 3, characterized in that the entities and related concepts are preprocessed and supplemented by crawling Wikipedia text, and if the knowledge graph network does not have the entities, the entities are replaced by wiki information, which is encoded by the language model and max pooling.
5. A sensitive text detection system based on knowledge-graph, comprising:
a knowledge graph network establishment module: crawling the existing knowledge in the network, and preprocessing the existing knowledge to obtain a knowledge graph network;
the training corpus building module: acquiring a sensitive text in a network, preprocessing the sensitive text to obtain a training corpus, wherein the training corpus comprises entities and instances corresponding to the entities, inserting custom identifiers at the front and rear positions of the instances, wherein different entities correspond to different custom identifiers, different instances of the same entity correspond to the same custom identifier, setting anchors for the entities, coding a language model to obtain position information of the training corpus, extracting related concepts of each entity and confidence degrees corresponding to the related concepts according to the knowledge graph network, and setting the confidence degrees of vacant parts to be 0 if the related concepts of the entities are less than 10;
a text detection model construction module: obtaining coding information of a text detection model according to the training corpus and the knowledge graph network, and converting the coding information into vector representations to obtain a final text detection model; obtaining weight values of the related concepts by applying a softmax operation to the confidences; obtaining a vector set according to the weight values and the vector representations; and obtaining the vector representation of each entity according to the vector set, so that the training corpus and the knowledge graph network exchange data information;
a prediction result module: preprocessing the text to be tested, and obtaining a detection result according to the text detection model.
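The confidence padding and softmax weighting named in the modules above can be sketched as follows. The claim states only that vacant confidences are set to 0 and that weights come from a softmax over the confidences; combining the weighted concept vectors by summation and filling vacant slots with zero vectors are assumptions made here for illustration, as are all function names.

```python
import math

MAX_CONCEPTS = 10  # number of related-concept slots stated in the claim


def pad_confidences(confidences, size=MAX_CONCEPTS):
    """Pad the confidence list with 0 for vacant concept slots."""
    padded = list(confidences[:size])
    return padded + [0.0] * (size - len(padded))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entity_representation(concept_vectors, confidences, size=MAX_CONCEPTS):
    """Weighted sum of concept vectors (assumed aggregation).
    Vacant slots are filled with zero vectors (assumption)."""
    dim = len(concept_vectors[0])
    vectors = [list(v) for v in concept_vectors[:size]]
    vectors += [[0.0] * dim for _ in range(size - len(vectors))]
    weights = softmax(pad_confidences(confidences, size))
    return [sum(w * vec[d] for w, vec in zip(weights, vectors))
            for d in range(dim)]


rep = entity_representation([[1.0, 0.0], [0.0, 1.0]], [0.9, 0.4])
```

Note that a confidence of 0 still receives a nonzero softmax weight, so under these assumptions the vacant slots dilute the real concepts rather than being ignored; masking them out before the softmax would be an alternative design that the claim does not describe.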
6. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the knowledge-graph-based sensitive text detection method as claimed in any one of claims 1 to 4.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the method for knowledge-graph based sensitive text detection according to any one of claims 1 to 4.
CN202111535596.1A 2021-12-16 2021-12-16 Knowledge graph-based sensitive text detection method and system Active CN113963357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111535596.1A CN113963357B (en) 2021-12-16 2021-12-16 Knowledge graph-based sensitive text detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111535596.1A CN113963357B (en) 2021-12-16 2021-12-16 Knowledge graph-based sensitive text detection method and system

Publications (2)

Publication Number Publication Date
CN113963357A (en) 2022-01-21
CN113963357B (en) 2022-03-11

Family

ID=79473244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111535596.1A Active CN113963357B (en) 2021-12-16 2021-12-16 Knowledge graph-based sensitive text detection method and system

Country Status (1)

Country Link
CN (1) CN113963357B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595708A (en) * 2018-05-10 2018-09-28 北京航空航天大学 A kind of exception information file classification method of knowledge based collection of illustrative plates
CN110019839A (en) * 2018-01-03 2019-07-16 中国科学院计算技术研究所 Medical knowledge map construction method and system based on neural network and remote supervisory
CN110516073A (en) * 2019-08-30 2019-11-29 北京百度网讯科技有限公司 A kind of file classification method, device, equipment and medium
CN111061843A (en) * 2019-12-26 2020-04-24 武汉大学 Knowledge graph guided false news detection method
CN111428054A (en) * 2020-04-14 2020-07-17 中国电子科技网络信息安全有限公司 Construction and storage method of knowledge graph in network space security field
CN112131401A (en) * 2020-09-14 2020-12-25 腾讯科技(深圳)有限公司 Method and device for constructing concept knowledge graph
CN112163099A (en) * 2020-09-24 2021-01-01 平安直通咨询有限公司上海分公司 Text recognition method and device based on knowledge graph, storage medium and server
CN112417314A (en) * 2020-11-26 2021-02-26 清华大学 Social network suicidal ideation detection method and system
CN112417456A (en) * 2020-11-16 2021-02-26 中国电子科技集团公司第三十研究所 Structured sensitive data reduction detection method based on big data
CN112507039A (en) * 2020-12-15 2021-03-16 苏州元启创人工智能科技有限公司 Text understanding method based on external knowledge embedding
CN113254649A (en) * 2021-06-22 2021-08-13 中国平安人寿保险股份有限公司 Sensitive content recognition model training method, text recognition method and related device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11823798B2 (en) * 2016-09-28 2023-11-21 Merative Us L.P. Container-based knowledge graphs for determining entity relations in non-narrative text
US11809986B2 (en) * 2020-05-15 2023-11-07 International Business Machines Corporation Computing graph similarity via graph matching

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019839A (en) * 2018-01-03 2019-07-16 中国科学院计算技术研究所 Medical knowledge map construction method and system based on neural network and remote supervisory
CN108595708A (en) * 2018-05-10 2018-09-28 北京航空航天大学 A kind of exception information file classification method of knowledge based collection of illustrative plates
CN110516073A (en) * 2019-08-30 2019-11-29 北京百度网讯科技有限公司 A kind of file classification method, device, equipment and medium
CN111061843A (en) * 2019-12-26 2020-04-24 武汉大学 Knowledge graph guided false news detection method
CN111428054A (en) * 2020-04-14 2020-07-17 中国电子科技网络信息安全有限公司 Construction and storage method of knowledge graph in network space security field
CN112131401A (en) * 2020-09-14 2020-12-25 腾讯科技(深圳)有限公司 Method and device for constructing concept knowledge graph
CN112163099A (en) * 2020-09-24 2021-01-01 平安直通咨询有限公司上海分公司 Text recognition method and device based on knowledge graph, storage medium and server
CN112417456A (en) * 2020-11-16 2021-02-26 中国电子科技集团公司第三十研究所 Structured sensitive data reduction detection method based on big data
CN112417314A (en) * 2020-11-26 2021-02-26 清华大学 Social network suicidal ideation detection method and system
CN112507039A (en) * 2020-12-15 2021-03-16 苏州元启创人工智能科技有限公司 Text understanding method based on external knowledge embedding
CN113254649A (en) * 2021-06-22 2021-08-13 中国平安人寿保险股份有限公司 Sensitive content recognition model training method, text recognition method and related device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Exploiting code knowledge graph for bug localization via bi-direction attention; Jinglei Zhang et al.; Proceedings of the 28th International Conference on Program Comprehension; 20200912; pp. 219-229 *
Research on fake news detection based on knowledge graph and image description; 陈开阳 et al.; 《江西师范大学学报》 (Journal of Jiangxi Normal University); 20210725; Vol. 45, No. 04; pp. 398-402 *

Also Published As

Publication number Publication date
CN113963357A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
CN107798136B (en) Entity relation extraction method and device based on deep learning and server
CN110490242A (en) Training method, eye fundus image classification method and the relevant device of image classification network
CN105005616B (en) Method and system are illustrated based on the text that textual image feature interaction expands
CN110929520B (en) Unnamed entity object extraction method and device, electronic equipment and storage medium
CN109670050A (en) A kind of entity relationship prediction technique and device
CN107943514A (en) The method for digging and system of core code element in a kind of software document
CN107122492A (en) Lyric generation method and device based on picture content
CN116956929B (en) Multi-feature fusion named entity recognition method and device for bridge management text data
CN110851176A (en) Clone code detection method capable of automatically constructing and utilizing pseudo clone corpus
CN117891940B (en) Multi-modal irony detection method, apparatus, computer device, and storage medium
CN114330966A (en) Risk prediction method, device, equipment and readable storage medium
CN111639185B (en) Relation information extraction method, device, electronic equipment and readable storage medium
CN112926332A (en) Entity relationship joint extraction method and device
CN113886524A (en) Network security threat event extraction method based on short text
CN116383517A (en) Dynamic propagation feature enhanced multi-modal rumor detection method and system
CN111898528B (en) Data processing method, device, computer readable medium and electronic equipment
CN116029280A (en) Method, device, computing equipment and storage medium for extracting key information of document
Pillai Leveraging Natural Language Processing for Detecting Fake News: A Comparative Analysis
CN117634483A (en) Chinese-oriented multi-granularity image-text cross-modal correlation method
CN112380861A (en) Model training method and device and intention identification method and device
CN113963357B (en) Knowledge graph-based sensitive text detection method and system
CN115048929B (en) Sensitive text monitoring method and device
CN116881408A (en) Visual question-answering fraud prevention method and system based on OCR and NLP
CN115759085A (en) Information prediction method and device based on prompt model, electronic equipment and medium
CN114416923A (en) News entity linking method and system based on rich text characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant