
CN113297387B - News detection method for image-text mismatching based on NKD-GNN - Google Patents


Info

Publication number
CN113297387B
Authority
CN
China
Prior art keywords
news
matching
description
entity
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110424490.8A
Other languages
Chinese (zh)
Other versions
CN113297387A (en
Inventor
云静
高硕�
赵禹萌
许志伟
刘利民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN202110424490.8A
Publication of CN113297387A
Application granted
Publication of CN113297387B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An NKD-GNN-based method for detecting news with mismatched images and text. The method generates a description with placeholders for the news image; builds the named entities into a news knowledge graph according to connection rules; uses a graph neural network driven by the news knowledge graph to select the named entities related to the news image and inserts them into the description, yielding a news image description with named entities; and computes how well the news text matches that description in order to judge whether the image and text of the news item match. Because the importance of each named entity is computed from a comprehensive analysis of all associations among the named entities in the news knowledge graph, and the core named entities of the related news are identified, the method judges image-text matching for news more effectively.

Description

News detection method for image-text mismatching based on NKD-GNN
Technical Field
The invention belongs to the technical field of artificial intelligence, relates to false information detection, and in particular relates to an NKD-GNN-based method for detecting news with mismatched images and text.
Background
With the rapid development of internet technology, browsing online news has become a main channel through which people follow current affairs. To attract readers and drive up click counts, some unscrupulous media outlets pair news stories with eye-catching pictures that are unrelated to the news content. If such image-text mismatched news is not handled promptly, the public is easily misled about the facts, the online news ecosystem is damaged, and the media lose credibility. A news text generally contains place-type named entities for where an event occurred, together with the person-type and organization-type named entities involved in it, while the news image shows the key named entities of the event visually. Whether the named entities in the news text agree with those in the news image therefore strongly affects the result of image-text matching detection. However, a news text contains a large number of named entities, and image feature extraction algorithms cannot extract named entities directly from a news image, so a large semantic gap exists between the news text and the news image. Existing image-text matching detection methods therefore cannot be applied directly to judge whether a news text matches its news image; a description of the news image that contains named entities must be generated first.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide an NKD-GNN-based method for detecting news with mismatched images and text.
To achieve this aim, the invention adopts the following technical scheme:
An NKD-GNN-based method for detecting news with mismatched images and text comprises the following steps:
step 1, generating a description with placeholders for the news image;
step 2, building the named entities into a news knowledge graph according to connection rules;
step 3, selecting the named entities related to the news image using a graph neural network driven by the news knowledge graph, and inserting them into the news image description, thereby generating a news image description with named entities;
step 4, computing how well the news text matches the news image description with named entities, and judging whether the image and text of the news item match.
Compared with the prior art, the invention computes the importance of each named entity on the basis of a comprehensive analysis of all associations among the named entities in the news knowledge graph and identifies the core named entities of the related news, so that image-text matching for news is judged more effectively.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 shows two news images, each with three related articles, together with the detection process and conclusion for each according to an embodiment of the invention; (a) is a news item whose image and text match, and (b) is a news item whose image and text do not match.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
As shown in FIG. 1, the NKD-GNN-based method of the invention for detecting news with mismatched images and text comprises the following steps:
Step 1: generate a description with placeholders for the news image.
A news text contains many named entities, but existing image description generation methods cannot directly generate image descriptions that contain named entities, so a semantic gap exists between the news text and the news image, and computing the match between the two directly is difficult. The method therefore generates a description with placeholders for the news image, which brings the news text and the news image into the same modality; in a later step, the named entities related to the news image are selected and inserted into this description.
The news image description with placeholders is generated as follows:
step 1.1: generate a description of the news image with an open-source pre-trained image description generation model; the model follows the encoder-decoder design, using a CNN to extract image features in the encoding stage and an RNN to generate the news image description in the decoding stage;
step 1.2: in the generated description, use the WordNet tool to replace words in the same semantic tree as 'Person' with a <Person> placeholder and words in the same semantic tree as 'Place' with a <Place> placeholder; replace 'a group of sight' with an <Organization> placeholder; and replace building-related words with a <Building> placeholder. This yields a news image description with four categories of placeholders: <Person>, <Place>, <Organization> and <Building>.
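The placeholder substitution of step 1.2 can be illustrated with a minimal sketch. It makes no claim to be the patented implementation: the `CATEGORY_OF` table is a hypothetical stand-in for a WordNet semantic-tree lookup, the group phrase `a group of people` is an assumed example, and the function name `add_placeholders` is invented for illustration.

```python
import re

# Hypothetical stand-in for a WordNet lookup: maps a caption word to the
# placeholder category of the semantic tree it belongs to.
CATEGORY_OF = {
    "man": "Person", "woman": "Person", "player": "Person",
    "street": "Place", "field": "Place", "city": "Place",
    "house": "Building", "tower": "Building", "church": "Building",
}

def add_placeholders(caption: str) -> str:
    """Replace category words in a generated caption with typed placeholders."""
    # Assumed group phrase; it is mapped to <Organization> before single words.
    caption = re.sub(r"\ba group of people\b", "<Organization>",
                     caption, flags=re.IGNORECASE)

    def swap(match: re.Match) -> str:
        word = match.group(0)
        category = CATEGORY_OF.get(word.lower())
        return f"<{category}>" if category else word

    # Visit every alphabetic word and swap it if it has a known category.
    return re.sub(r"[A-Za-z]+", swap, caption)
```

For instance, `add_placeholders("a man is playing soccer in a field")` gives `a <Person> is playing soccer in a <Place>`. A real system would query WordNet hypernym paths instead of a fixed table.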
Step 2: construct the news knowledge graph.
When named entities are selected for insertion into the news image description with placeholders, the statistical associations between the named entities in the news articles must be analyzed so that they accurately reflect how the entities relate in the news scene. The invention therefore builds the named entities into a news knowledge graph according to connection rules, which lays the foundation for the subsequent analysis of associations between entities.
The news knowledge graph is constructed as follows:
step 2.1: use spaCy's named entity recognizer to extract the named entities from the articles related to the news item, keeping four types of named entity: Person, Organization, Location and Building;
step 2.2: the retained named entities form the entity set $V = \{v_1, v_2, \ldots, v_m\}$; named entities that appear in the same sentence are connected by an edge, and all edges form the edge set $E = \{e_1, e_2, \ldots, e_m\}$. The weight of an edge $e$ is computed by formula (1):
[Formula (1): equation image not reproduced in the source text]
where $e \in E$; $H_e$ is the weight of edge $e$, i.e. the degree of co-occurrence of the two named entities; $v_h$ and $v_t$ are the two named entities connected by edge $e$; and the formula is expressed in terms of the number of times $v_h$ and $v_t$ co-occur and the number of times $v_h$ and $v_t$ each occur individually. The graph $G = (V, E)$ formed by all named entities and all edges is the news knowledge graph.
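As a rough illustration of step 2.2, the sketch below builds the entity set and weighted edge set from per-sentence entity lists. The exact form of formula (1) is not recoverable from this text, so the normalization used here (co-occurrence count divided by the sum of the two individual counts) is only an assumption, as is counting each entity at most once per sentence. The function name `build_news_graph` is hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_news_graph(sentences):
    """sentences: list of lists, the named entities found in each sentence.

    Returns (entity_set, edge_weights), where edge_weights maps an
    alphabetically ordered entity pair to its weight.
    """
    single = Counter()  # number of sentences each entity appears in
    pair = Counter()    # number of sentences two entities share
    for entities in sentences:
        unique = sorted(set(entities))
        single.update(unique)
        pair.update(combinations(unique, 2))

    # ASSUMPTION: this normalization is illustrative, not formula (1) itself.
    edge_weights = {
        (head, tail): count / (single[head] + single[tail])
        for (head, tail), count in pair.items()
    }
    return set(single), edge_weights
```

Entities that never share a sentence get no edge, which matches the connection rule stated in step 2.2.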
Step 3: generate the news image description with named entities.
The news knowledge graph constructed in step 2 contains all the named entities in the news articles, some of which are unrelated to the news image. To analyze the associations between the entities in the news knowledge graph fully while removing noise from it, the named entities related to the news image must be selected. The invention provides a news-knowledge-driven graph neural network (NKD-GNN), which analyzes all associations between the named entities in the news knowledge graph and on that basis selects the named entities related to the news image and inserts them into the news image description, thereby generating the news image description with named entities.
The news image description with named entities is generated as follows:
Step 3.1: use the graph neural network to aggregate the information of all edges and all nodes in the news knowledge graph, obtaining a vector $v$ for every node.
Specifically, node $v_i$ of the news knowledge graph is fed into the graph neural network, and the network updates the entity vector $v_i$ as follows:
$$a_i^{t} = A_{i:}\,[v_1^{t-1}, \ldots, v_n^{t-1}]^{\top} H \quad (2)$$
$$z_i^{t} = \sigma\left(W_z a_i^{t} + U_z v_i^{t-1}\right) \quad (3)$$
$$r_i^{t} = \sigma\left(W_r a_i^{t} + U_r v_i^{t-1}\right) \quad (4)$$
$$\tilde{v}_i^{t} = \tanh\left(W_o a_i^{t} + U_o \left(r_i^{t} \odot v_i^{t-1}\right)\right) \quad (5)$$
$$v_i^{t} = \left(1 - z_i^{t}\right) \odot v_i^{t-1} + z_i^{t} \odot \tilde{v}_i^{t} \quad (6)$$
where $a_i^{t}$ is the input corresponding to the $i$-th node of the news knowledge graph at time $t$ and $H$ is a weight matrix; $[v_1^{t-1}, \ldots, v_n^{t-1}]$ is the set of node vectors at time $t-1$; $A$ is the adjacency matrix of the news knowledge graph and $A_{i:}$ is the block of the adjacency matrix corresponding to the $i$-th entity; $r_i^{t}$ is the reset gate and $z_i^{t}$ is the update gate; $\sigma(\cdot)$ is the sigmoid function; $\odot$ is the point-wise (element-wise) product; $v_i^{t-1}$ is the vector of entity $v_i$ at time $t-1$; $W_z$, $W_r$ and $W_o$ are the weight matrices applied at time $t$ in the update gate, the reset gate and the activation-function input, respectively; $U_z$, $U_r$ and $U_o$ are the corresponding weight matrices applied to the time $t-1$ terms; $\tilde{v}_i^{t}$ is the candidate vector of entity $v_i$; $r_{s,i}^{t}$ is the $s$-th reset gate; and $n$ is the number of node vectors.
Formula (2) describes how node $v_i$ aggregates the information of its neighboring nodes to obtain $a_i^{t}$. Formulas (3) and (4) determine, respectively, which of the neighbor information is retained and which is discarded. Formula (5) uses the vector $v_i^{t-1}$ of node $v_i$ at time $t-1$ and its input $a_i^{t}$ at time $t$ to obtain the candidate vector $\tilde{v}_i^{t}$. Formula (6) uses the entity vector $v_i^{t-1}$ and the candidate vector $\tilde{v}_i^{t}$ to compute the vector representation $v_i^{t}$ of the node at time $t$. Once the information of all nodes has been learned, the final vector representation $v_i$ of each node is obtained.
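The gated update of step 3.1 can be sketched in plain Python as one propagation step over all nodes. This is a simplified illustration under stated assumptions, not the patented network: the neighbor aggregation is a plain weighted sum over the adjacency matrix (the extra weight matrix of the aggregation step is omitted), the candidate activation is assumed to be tanh, and the dictionary keys `Wz`, `Uz`, `Wr`, `Ur`, `Wo`, `Uo` as well as the function name `ggnn_step` are chosen here for readability.

```python
import math

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-a)) for a in x]

def ggnn_step(V, A, P):
    """One gated propagation step.

    V: node vectors at time t-1, A: weighted adjacency matrix,
    P: dict of weight matrices (hypothetical key names).
    """
    dim = len(V[0])
    new_V = []
    for i, v_prev in enumerate(V):
        # Aggregate neighbor information: a_i = sum_j A[i][j] * v_j
        a = [sum(A[i][j] * V[j][k] for j in range(len(V))) for k in range(dim)]
        z = sigmoid(add(matvec(P["Wz"], a), matvec(P["Uz"], v_prev)))  # update gate
        r = sigmoid(add(matvec(P["Wr"], a), matvec(P["Ur"], v_prev)))  # reset gate
        gated = [rk * vk for rk, vk in zip(r, v_prev)]
        # Candidate vector (tanh is an assumed activation)
        cand = [math.tanh(c) for c in add(matvec(P["Wo"], a), matvec(P["Uo"], gated))]
        # Blend the previous vector and the candidate with the update gate
        new_V.append([(1 - zk) * vk + zk * ck for zk, vk, ck in zip(z, v_prev, cand)])
    return new_V
```

Calling `ggnn_step` repeatedly lets information travel further along the graph; in a trained model the matrices in `P` would be learned rather than supplied by hand.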
Step 3.2: the nodes with the most connections in the news knowledge graph are the core entities of the related texts and reflect the key information of the news associated with the news image. The invention sets the named entity with the most edges in the news knowledge graph as the important node $v_b$; when several named entities tie for the most edges, the one with the highest frequency is taken as the important node $v_b$. The global vector $N_g$ of the news knowledge graph and the important node vector $N_b$ are combined to obtain the representation vector $N_r$ of the news knowledge graph.
Specifically, because each node influences the global vector of the news knowledge graph differently and therefore has a different priority, each node vector is first weighted by an attention mechanism and the weighted vectors are summed to obtain the global knowledge vector $N_g$ of the news knowledge graph; next, the vector of the important node $v_b$ is defined as $N_b$; finally, $N_g$ and the important entity vector $N_b$ are linearly concatenated to obtain the news knowledge graph representation vector $N_r$. The process is as follows:
$$\alpha_i = q^{\top} \sigma\left(W_1 v_b + W_2 v_i\right) \quad (7)$$
$$N_g = \sum_{i=1}^{n} \alpha_i v_i \quad (8)$$
$$N_r = W_3\,[N_g; N_b] \quad (9)$$
where $\alpha_i$ is the coefficient of node $v_i$ in the news knowledge graph; the parameter $q$ is transposed so that the product of the two factors is a scalar; the matrices $W_1$ and $W_2$ are weight matrices applied to the node vectors of the news knowledge graph; the matrix $W_3$ maps the concatenation of $N_g$ and $N_b$ back into the vector space of the node vectors; $v_b$ is the important entity vector; and $n$ is the number of nodes.
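A minimal sketch of step 3.2 follows, assuming small dense matrices stored as nested lists. It first picks the important node by edge count with a frequency tie-break, then computes the attention coefficients, the weighted sum and the projected concatenation. The function names `pick_important_node` and `graph_representation`, and the concatenation order (global vector first), are assumptions made for illustration.

```python
import math
from collections import Counter

def pick_important_node(edges, frequency):
    """Return the entity with the most edges; ties go to the most frequent one."""
    degree = Counter()
    for head, tail in edges:
        degree[head] += 1
        degree[tail] += 1
    return max(degree, key=lambda name: (degree[name], frequency.get(name, 0)))

def matvec(M, x):
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]

def graph_representation(V, b, q, W1, W2, W3):
    """V: final node vectors, b: index of the important node.

    Returns (alphas, N_g, N_r).
    """
    proj_b = matvec(W1, V[b])
    alphas = []
    for v in V:
        hidden = [1.0 / (1.0 + math.exp(-(p + w)))
                  for p, w in zip(proj_b, matvec(W2, v))]
        # Attention coefficient: q^T sigmoid(W1 v_b + W2 v_i)
        alphas.append(sum(qk * hk for qk, hk in zip(q, hidden)))
    # Global vector: attention-weighted sum of all node vectors
    N_g = [sum(a * v[k] for a, v in zip(alphas, V)) for k in range(len(V[0]))]
    # Concatenate [N_g; N_b] and project with W3
    N_r = matvec(W3, N_g + V[b])
    return alphas, N_g, N_r
```

Note that `W3` must have twice as many columns as the node vectors have dimensions, because it acts on the concatenated vector.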
Step 3.3: the vector $v_i$ of each node $v_i$ is multiplied with the representation vector $N_r$ of the news knowledge graph to obtain the score $\hat{z}_i$ of each node, and the scores are passed through the softmax function to obtain the probability of each node, as shown below:
$$\hat{z}_i = N_r^{\top} v_i \quad (10)$$
$$\hat{y} = \mathrm{softmax}(\hat{z}) \quad (11)$$
where $\hat{z}$ is the vector of node scores, $\hat{z}_i$ is the score of node $v_i$, and $\hat{y}$ is the probability that each named entity is inserted into a placeholder of the news image description.
The NKD-GNN model is then trained with a cross-entropy loss function and the backpropagation-through-time algorithm to predict the core entities. The cross-entropy loss function is given by formula (12):
[Formula (12): equation image not reproduced in the source text]
where $y_i$ is the one-hot encoding of the core entity labeled in the news knowledge graph and $\hat{y}_i$ is the probability that the $i$-th entity is inserted into a placeholder of the news image description.
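The scoring and training signal of step 3.3 can be illustrated as follows. This is a sketch rather than the source's exact formulation: the score is taken to be the inner product of each node vector with the graph representation vector, and the loss is assumed to be the categorical cross-entropy over one-hot labels, since the exact form of the loss formula is not recoverable from this text. Both function names are hypothetical.

```python
import math

def entity_probabilities(V, N_r):
    """Score every node against N_r and normalize the scores with softmax."""
    scores = [sum(a * b for a, b in zip(N_r, v)) for v in V]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, y_hat):
    """ASSUMED categorical form: -sum_i y_i * log(y_hat_i) for one-hot y."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, y_hat) if yi > 0)
```

With two nodes that receive equal scores, each gets probability 0.5 and the loss for either label is log 2; training would adjust the network weights so that the labeled core entity receives the larger share.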
Step 3.4: insert the core entities predicted by NKD-GNN into the news image description with placeholders.
Specifically, for each type of named entity the entity with the highest probability is inserted into the placeholder of the corresponding type, yielding the news image description with named entities. When a placeholder in the description has no corresponding named entity to insert, it is replaced with the type name in the placeholder; for example, the word "PERSON" replaces the slot <PERSON>.
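The insertion rule of step 3.4 is simple enough to sketch directly. The function name `fill_placeholders` and the `(name, type)` tuple layout are illustrative assumptions; the fallback to the upper-cased type name follows the example given in the text.

```python
import re

def fill_placeholders(caption, entities, probs):
    """entities: list of (name, type) pairs; probs: insertion probability of each."""
    best = {}  # type -> (name, probability) of the highest-scoring entity
    for (name, etype), p in zip(entities, probs):
        if etype not in best or p > best[etype][1]:
            best[etype] = (name, p)

    def swap(match):
        etype = match.group(1)
        # Fall back to the type name when no entity of this type exists.
        return best[etype][0] if etype in best else etype.upper()

    return re.sub(r"<(\w+)>", swap, caption)
```

For example, with entities `Timo Werner` (Person, 0.6), `Cologne` (Place, 0.3) and `Leipzig` (Place, 0.1), the caption `<Person> is playing soccer in <Place> near <Building>` becomes `Timo Werner is playing soccer in Cologne near BUILDING`.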
Step 4: compute how well the news text matches the news image description with named entities.
By computing the match between the news text and the news image description with named entities, the invention bridges the semantic gap between the news text and the news image and can then judge whether the image and text of the news item match. Differences in sentence pattern and structure still remain between the news image description with named entities and the news text, so computing the match between the two requires analyzing both the similarity of their sentence structure and the similarity of their keywords. The invention uses the Hybrid Co-Attention Network (HCAN) method for this computation. The news text is first split into individual sentences; if any one sentence matches the news image description with named entities, the news item is considered to have matching image and text.
The match is computed as follows:
step 4.1: use the Word2vec tool to generate the word vectors of the two sentences to be compared; each sentence consists of several word vectors, and the two sentence representations to be compared are $U_q$ and $U_c$;
step 4.2: multiply the word vectors of the two sentences pairwise to obtain the similarity matrix $S = U_q U_c^{\top}$, $S \in \mathbb{R}^{n \times m}$, where $m$ is a number between 0 and $n$;
step 4.3: normalize the matrix $S \in \mathbb{R}^{n \times m}$ word by word, i.e. apply mean-pooling and max-pooling to the scores of each word, to obtain the pooled similarity of $U_q$ and $U_c$, as follows:
$$\mathrm{Max}(S) = [\max(S_{1,:}), \ldots, \max(S_{n,:})] \quad (13)$$
$$\mathrm{Mean}(S) = [\mathrm{mean}(S_{1,:}), \ldots, \mathrm{mean}(S_{n,:})] \quad (14)$$
where $\mathrm{Max}(S)$ is the set of all max-pooling results and $\max(S_{i,:})$ is the max-pooling operation for each word; $\mathrm{Mean}(S)$ is the set of all mean-pooling results and $\mathrm{mean}(S_{i,:})$ is the mean-pooling operation for each word;
step 4.4: compute the TF-IDF weight $\mathrm{wgt}(q)$ of each word in the two sentences $U_q$ and $U_c$ so that word importance is taken into account in the normalization, obtaining the relevance matching output $O_{RM}$ of $U_q$ and $U_c$, and use softmax to classify whether $U_q$ and $U_c$ match. If the two sentences match, the image and text of the news item are judged to match; if the news image description with named entities matches none of the sentences of the news text, the image and text are judged not to match. $O_{RM}$ and the softmax classification are computed as follows:
$$O_{RM} = \{\mathrm{wgt}(q) \odot \mathrm{Max}(S),\ \mathrm{wgt}(q) \odot \mathrm{Mean}(S)\} \quad (15)$$
$$o = \mathrm{softmax}(O_{RM}) \quad (16)$$
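The relevance features of steps 4.2 to 4.4 and the sentence-level decision rule can be sketched as below. This is an illustrative simplification: word vectors and TF-IDF weights are passed in as plain lists (in practice they would come from Word2vec and a TF-IDF model), the learned softmax classifier is not modeled, and the function names `relevance_features` and `news_is_matched` are hypothetical.

```python
def relevance_features(U_q, U_c, wgt):
    """U_q, U_c: lists of word vectors; wgt: one TF-IDF weight per word of U_q.

    Returns the weighted max-pooled and mean-pooled similarity scores.
    """
    # Similarity matrix: S[i][j] = dot(U_q[i], U_c[j])
    S = [[sum(a * b for a, b in zip(q, c)) for c in U_c] for q in U_q]
    max_pool = [max(row) for row in S]               # best match per word
    mean_pool = [sum(row) / len(row) for row in S]   # average match per word
    weighted_max = [w * m for w, m in zip(wgt, max_pool)]
    weighted_mean = [w * m for w, m in zip(wgt, mean_pool)]
    return weighted_max, weighted_mean

def news_is_matched(sentence_matches):
    """The news item matches if ANY sentence matches the image description."""
    return any(sentence_matches)
```

In the full method, the two weighted feature lists would be fed to a classifier whose output decides whether a sentence matches; `news_is_matched` then applies the rule that a single matching sentence is enough.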
In summary, the inputs of the detection method are the news image, the news text and the articles related to the news image, and the output is whether the news text and the news image match. The overall implementation of the method is illustrated by a specific example.
The embodiment is built on a cloud computing platform consisting of 15 servers, with VMware ESXi 5, a 20 TB disk array and a 1000M network switch, on which a Hadoop cluster is deployed. Two news images are used, as shown in FIG. 2, each with three related articles; (a) is a news item whose image and text match, and (b) is a news item whose image and text do not match. News item (a) reports on a European Champions League sporting event, and its image shows Timo Werner playing football; the related articles contain 8 named entities of the types person, place and organization. These entities are built into a news knowledge graph, in which the core named entity is Timo Werner, and are scored by NKD-GNN. The highest-scoring Person entity is Timo Werner and the highest-scoring Place entity is Cologne, so the two entities are inserted into the news image description with placeholders, giving the description <Timo Werner is playing the soccer in Cologne>. News item (b) discusses the change in consumer behavior during an economic downturn; its related articles contain 10 named entities of the types person, place and building. As in case (a), the named entities are built into a news knowledge graph and each entity is scored. The highest-scoring Organization entity is Federal agent and the highest-scoring Place entity is Washington, giving the description <Federal agent binding in the Washington> for image (b).
For (a), the entities described in the news text are consistent with the entities in the news image description with named entities, so the detection result is "matched", which is correct. For (b), the news text describes Norton Western University professor Pittr Dworkzak discussing changes in consumer behavior, while the news image shows federal police maintaining public order. The news image description with named entities generated from the articles related to image (b), "local agent marking in the Washington after the fire", is completely unrelated to the news text, so the computation judges the image and text of the news item to be mismatched, which is correct.
The example shows that the news-knowledge-driven method for detecting news with mismatched images and text reduces the semantic gap between the news text and the news image by generating a news image description with named entities, so that the match between the news text and the news image is computed accurately.

Claims (7)

1. An NKD-GNN-based method for detecting news with mismatched images and text, characterized by comprising the following steps:
step 1, generating a description with placeholders for the news image;
step 2, building the named entities into a news knowledge graph according to connection rules;
step 3, selecting the named entities related to the news image using a graph neural network driven by the news knowledge graph, and inserting them into the news image description, thereby generating a news image description with named entities, comprising:
step 3.1: using the graph neural network to aggregate the information of all edges and all nodes in the news knowledge graph, obtaining a vector $v$ for every node;
step 3.2: setting the named entity with the most edges in the news knowledge graph as the important node $v_b$, and, when several named entities tie for the most edges, taking the one with the highest frequency as the important node $v_b$; combining the global vector $N_g$ of the news knowledge graph and the important node vector $N_b$ to obtain the representation vector $N_r$ of the news knowledge graph;
wherein each node vector is first weighted by an attention mechanism and the weighted vectors are summed to obtain the global knowledge vector $N_g$ of the news knowledge graph; next, the vector of the important node $v_b$ is defined as $N_b$; finally, the global knowledge vector $N_g$ and the important entity vector $N_b$ are linearly concatenated to obtain the news knowledge graph representation vector $N_r$; the process is as follows:
$$\alpha_i = q^{\top} \sigma\left(W_1 v_b + W_2 v_i\right)$$
$$N_g = \sum_{i=1}^{n} \alpha_i v_i$$
$$N_r = W_3\,[N_g; N_b]$$
wherein $\alpha_i$ is the coefficient of node $v_i$ in the news knowledge graph; the parameter $q$ is transposed so that the product of the two factors is a scalar; the matrices $W_1$ and $W_2$ are weight matrices applied to the node vectors of the news knowledge graph; the matrix $W_3$ maps the concatenation of $N_g$ and $N_b$ back into the vector space of the node vectors; $v_b$ is the important entity vector; and $n$ is the number of nodes;
step 3.3: multiplying the vector $v_i$ of each node $v_i$ with the representation vector $N_r$ of the news knowledge graph to obtain the score $\hat{z}_i$ of each node, and passing the scores through the softmax function to obtain the probability of each node; then training the NKD-GNN model with a cross-entropy loss function and the backpropagation-through-time algorithm to predict the core entities;
step 3.4: inserting the core entities predicted by NKD-GNN into the news image description with placeholders;
step 4, computing the match between the news text and the news image description with named entities using the Hybrid Co-Attention Network (HCAN) method, wherein both the similarity of the sentence structure and the similarity of the keywords of the two are analyzed; the news text is first split into individual sentences, and if any one sentence matches the news image description with named entities, the news item is considered to have matching image and text.
2. The NKD-GNN-based method for detecting news with mismatched images and text according to claim 1, wherein the news image description with placeholders is generated in step 1 as follows:
step 1.1: generating a description of the news image with an open-source pre-trained image description generation model, wherein the model follows the encoder-decoder design, using a CNN to extract image features in the encoding stage and an RNN to generate the news image description in the decoding stage;
step 1.2: in the generated description, using the WordNet tool to replace words in the same semantic tree as 'Person' with a <Person> placeholder and words in the same semantic tree as 'Place' with a <Place> placeholder; replacing 'a group of sight' with an <Organization> placeholder; and replacing building-related words with a <Building> placeholder; thereby generating a news image description with four categories of placeholders: <Person>, <Place>, <Organization> and <Building>.
3. The NKD-GNN-based method for detecting news with mismatched images and text according to claim 2, wherein the news knowledge graph is constructed in step 2 as follows:
step 2.1: using spaCy's named entity recognizer to extract the named entities from the articles related to the news item, keeping four types of named entity: Person, Organization, Location and Building;
step 2.2: the retained named entities form the entity set $V = \{v_1, v_2, \ldots, v_m\}$; named entities that appear in the same sentence are connected by an edge, and all edges form the edge set $E = \{e_1, e_2, \ldots, e_m\}$; the weight of an edge $e$ is computed as follows:
[Equation image not reproduced in the source text]
wherein $e \in E$; $H_e$ is the weight of edge $e$, i.e. the degree of co-occurrence of the two entities; $v_h$ and $v_t$ are the two named entities connected by edge $e$; and the formula is expressed in terms of the number of times $v_h$ and $v_t$ co-occur and the number of times $v_h$ and $v_t$ each occur individually; the graph $G = (V, E)$ formed by all named entities and all edges is the news knowledge graph.
4. The NKD-GNN-based image-text mismatch detection method according to claim 1, wherein in step 3.1 the process of inputting a node v_i of the news knowledge graph into the graph neural network, and of the graph neural network updating the entity vector v_i, is as follows:
[Figure FDA0003558589170000035]
[Figure FDA0003558589170000036]
[Figure FDA0003558589170000037]
[Figure FDA0003558589170000038]
[Figure FDA0003558589170000039]
where:
[Figure FDA00035585891700000310] is the input and weight matrix corresponding to the i-th node of the news knowledge graph at time t;
[Figure FDA00035585891700000311] [Figure FDA00035585891700000312] is the set of node vectors at time t-1;
[Figure FDA00035585891700000313] is the adjacency matrix of the news knowledge graph;
[Figure FDA00035585891700000314] is the block of the adjacency matrix of the news knowledge graph that corresponds to the i-th entity;
[Figure FDA00035585891700000315] is the reset gate;
[Figure FDA00035585891700000316] is the update gate, and σ(·) is the sigmoid function;
[Figure FDA00035585891700000317] is the element-wise (point-by-point) operator;
[Figure FDA00035585891700000318] is the vector of entity v_i at time t-1;
W_z is the weight matrix of [Figure FDA00035585891700000319] at time t, W_r is the weight matrix of r_i^t at time t, and W_o is the weight matrix of the activation-function input at time t;
U_z is the weight matrix of [Figure FDA0003558589170000041] at time t-1, U_r is the weight matrix of r_i^t at time t-1, and U_o is the weight matrix of the activation-function input at time t-1;
[Figure FDA0003558589170000042] is the candidate vector of entity v_i;
[Figure FDA0003558589170000043] is the s-th reset gate; n is a serial number.
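The five equation images of this claim cannot be recovered from the text. The surrounding definitions (a per-node input built from the adjacency matrix, an update gate, a reset gate, a candidate vector and W/U weight pairs) are consistent with the standard gated graph neural network update, which would read as follows. This is a reconstruction under that assumption rather than the patent's verbatim formulas: the symbols a, A, H, b, z and the tilde notation, as well as the choice of tanh as the activation function, are assumptions.

```latex
\begin{aligned}
a_i^{t} &= A_{i:}\,\bigl[v_1^{t-1},\dots,v_n^{t-1}\bigr]^{\top} H + b \\
z_i^{t} &= \sigma\bigl(W_z a_i^{t} + U_z v_i^{t-1}\bigr) \\
r_i^{t} &= \sigma\bigl(W_r a_i^{t} + U_r v_i^{t-1}\bigr) \\
\tilde{v}_i^{t} &= \tanh\bigl(W_o a_i^{t} + U_o\,(r_i^{t} \odot v_i^{t-1})\bigr) \\
v_i^{t} &= (1 - z_i^{t}) \odot v_i^{t-1} + z_i^{t} \odot \tilde{v}_i^{t}
\end{aligned}
```

Read this way, the update gate decides how much of the previous entity vector is kept, and the reset gate decides how much of it feeds into the candidate vector.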
5. The NKD-GNN-based image-text mismatch detection method according to claim 1, wherein in step 3.3 the calculation process is as follows:
[Figure FDA0003558589170000044]
[Figure FDA0003558589170000045]
where [Figure FDA0003558589170000046] is the score of the nodes, [Figure FDA0003558589170000047] is the score of node v_i, and [Figure FDA0003558589170000048] is the probability that the named entity is inserted into the news image description, i.e. the probability that the entity is inserted into a placeholder in the news image description;
the cross-entropy loss function is as follows:
[Figure FDA0003558589170000049]
where y_i is the one-hot encoding of the core entity labeled in the news knowledge graph, and [Figure FDA00035585891700000410] is the probability that the i-th entity is inserted into a placeholder of the news image description.
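The scoring and loss in step 3.3 can be illustrated with a small sketch. It assumes that the node scores are turned into insertion probabilities by a softmax and that the loss is the usual cross-entropy against the one-hot labels y_i; the claim's own formulas are images, so both the normalisation and the function names are assumptions, and the way the raw scores are computed is not reproduced.

```python
import math


def softmax(scores):
    """Turn raw node scores into insertion probabilities that sum to 1."""
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between one-hot labels y_i and predicted probabilities."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))


node_scores = [2.0, 1.0, 0.1]          # illustrative scores for three entities
probs = softmax(node_scores)
loss = cross_entropy([1, 0, 0], probs)  # the first entity is the labeled one
print(probs, loss)
```

The loss shrinks toward zero as the probability assigned to the labeled core entity approaches 1.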
6. The NKD-GNN-based image-text mismatch detection method according to claim 1, wherein in step 3.4, for each type of named entity, the entity with the highest probability is selected and inserted into the corresponding placeholder according to its entity type, yielding a news image description with named entities; when a placeholder in the news image description has no corresponding named entity to insert, the placeholder is replaced with the type name it contains.
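The insertion rule of step 3.4 can be sketched as follows. The function name `fill_placeholders` and the `(name, type, probability)` tuple layout are hypothetical choices made for this illustration.

```python
import re


def fill_placeholders(description, candidates):
    """candidates: list of (entity_name, entity_type, probability)."""
    best = {}
    for name, etype, prob in candidates:
        # Keep only the most probable entity of each type
        if etype not in best or prob > best[etype][1]:
            best[etype] = (name, prob)

    def swap(match):
        etype = match.group(1)
        # Fall back to the bare type name when no entity of that type exists
        return best[etype][0] if etype in best else etype

    return re.sub(r"<(\w+)>", swap, description)


print(fill_placeholders(
    "<Person> speaks outside <Building> in <Place>",
    [("Alice Smith", "Person", 0.7), ("Bob Lee", "Person", 0.2), ("Paris", "Place", 0.9)],
))
```

In this example no Building entity is available, so that placeholder degrades to the plain word "Building", as the claim describes.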
7. The NKD-GNN-based image-text mismatch detection method according to claim 1, wherein the matching is calculated as follows:
step 4.1: the Word2vec tool is used to generate word vectors for the two sentences to be compared, each sentence consisting of several word vectors; the two sentence representations to be compared are U_q and U_c, respectively;
step 4.2: the word vectors of the two sentences are multiplied to obtain a similarity matrix
[Figure FDA00035585891700000411]
S ∈ R^{n×m}, where m is a sequence number between 0 and n;
step 4.3: the matrix S ∈ R^{n×m} is normalized over each word vector, i.e. mean-pooling and max-pooling are applied to the scores of each word, and U_q and U_c are output, where the normalization is as follows:
[Figure FDA0003558589170000051]
[Figure FDA0003558589170000052]
Max(S) is the set of all max-pooling results; [Figure FDA0003558589170000053] is the max-pooling operation for each word; Mean(S) is the set of average-pooling results; [Figure FDA0003558589170000054] is the average-pooling operation for each word;
step 4.4: the TF-IDF weight of each word in the two sentences U_q and U_c, i.e. wgt(q), is computed and fully taken into account in the normalization, thereby obtaining the relevance matching output O_RM of U_q and U_c, and Softmax is used to classify whether U_q and U_c match; if the two sentences match, the news image and text are judged to match; if the news image description with named entities matches none of the individual sentences of the news text, the news image and text are considered mismatched; O_RM and the Softmax classification are computed as follows:
[Figure FDA0003558589170000055]
o = softmax(O_RM).
CN202110424490.8A 2021-04-20 2021-04-20 News detection method for image-text mismatching based on NKD-GNN Active CN113297387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110424490.8A CN113297387B (en) 2021-04-20 2021-04-20 News detection method for image-text mismatching based on NKD-GNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110424490.8A CN113297387B (en) 2021-04-20 2021-04-20 News detection method for image-text mismatching based on NKD-GNN

Publications (2)

Publication Number Publication Date
CN113297387A CN113297387A (en) 2021-08-24
CN113297387B true CN113297387B (en) 2022-04-29

Family

ID=77319956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110424490.8A Active CN113297387B (en) 2021-04-20 2021-04-20 News detection method for image-text mismatching based on NKD-GNN

Country Status (1)

Country Link
CN (1) CN113297387B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626564B (en) * 2021-10-09 2021-12-17 腾讯科技(深圳)有限公司 Concept label generation method and device, electronic equipment and storage medium
CN114218962B (en) * 2021-12-16 2022-08-19 哈尔滨工业大学 Artificial intelligent emergency semantic recognition system and recognition method for solid waste management information

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256065A (en) * 2018-01-16 2018-07-06 智言科技(深圳)有限公司 Knowledge mapping inference method based on relationship detection and intensified learning
CN109933802A (en) * 2019-03-25 2019-06-25 腾讯科技(深圳)有限公司 Picture and text matching process, device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12106217B2 (en) * 2018-05-18 2024-10-01 Benevolentai Technology Limited Graph neutral networks with attention
CN108984745B (en) * 2018-07-16 2021-11-02 福州大学 Neural network text classification method fusing multiple knowledge maps
CN109885796B (en) * 2019-01-25 2020-01-03 内蒙古工业大学 Network news matching detection method based on deep learning
CN110008879A (en) * 2019-03-27 2019-07-12 深圳市尼欧科技有限公司 Vehicle-mounted personalization audio-video frequency content method for pushing and device
CN111046664A (en) * 2019-11-26 2020-04-21 哈尔滨工业大学(深圳) False news detection method and system based on multi-granularity graph convolution neural network
CN112241481B (en) * 2020-10-09 2024-01-19 中国人民解放军国防科技大学 Cross-modal news event classification method and system based on graph neural network

Also Published As

Publication number Publication date
CN113297387A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
US12056458B2 (en) Translation method and apparatus based on multimodal machine learning, device, and storage medium
CN110737801B (en) Content classification method, apparatus, computer device, and storage medium
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN109544524B (en) Attention mechanism-based multi-attribute image aesthetic evaluation system
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN112749608B (en) Video auditing method, device, computer equipment and storage medium
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN108830287A (en) The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN110390363A (en) A kind of Image Description Methods
KR101837262B1 (en) Deep learning type classification method with feature-based weighting
US11687716B2 (en) Machine-learning techniques for augmenting electronic documents with data-verification indicators
CN112232087B (en) Specific aspect emotion analysis method of multi-granularity attention model based on Transformer
CN113297387B (en) News detection method for image-text mismatching based on NKD-GNN
CN110287341B (en) Data processing method, device and readable storage medium
CN112148831B (en) Image-text mixed retrieval method and device, storage medium and computer equipment
US20220300708A1 (en) Method and device for presenting prompt information and storage medium
CN113627151B (en) Cross-modal data matching method, device, equipment and medium
CN113627550A (en) Image-text emotion analysis method based on multi-mode fusion
CN111177402A (en) Evaluation method and device based on word segmentation processing, computer equipment and storage medium
CN117094291B (en) Automatic news generation system based on intelligent writing
CN110309515B (en) Entity identification method and device
CN112131345A (en) Text quality identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant