CN113298365B - Cultural additional value assessment method based on LSTM - Google Patents
Cultural additional value assessment method based on LSTM
- Publication number
- CN113298365B CN113298365B CN202110515653.3A CN202110515653A CN113298365B CN 113298365 B CN113298365 B CN 113298365B CN 202110515653 A CN202110515653 A CN 202110515653A CN 113298365 B CN113298365 B CN 113298365B
- Authority
- CN
- China
- Prior art keywords
- feature
- cultural
- word
- emotion
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 23
- 230000008451 emotion Effects 0.000 claims abstract description 68
- 238000012549 training Methods 0.000 claims abstract description 17
- 238000012360 testing method Methods 0.000 claims abstract description 16
- 238000011156 evaluation Methods 0.000 claims abstract description 13
- 238000004364 calculation method Methods 0.000 claims abstract description 9
- 239000003607 modifier Substances 0.000 claims description 26
- 230000006870 function Effects 0.000 claims description 19
- 230000011218 segmentation Effects 0.000 claims description 19
- 238000004458 analytical method Methods 0.000 claims description 9
- 230000014759 maintenance of location Effects 0.000 claims description 8
- 210000002569 neuron Anatomy 0.000 claims description 6
- 238000012216 screening Methods 0.000 claims description 6
- 238000007493 shaping process Methods 0.000 claims description 6
- 230000003340 mental effect Effects 0.000 claims description 5
- 230000008569 process Effects 0.000 claims description 5
- 238000012545 processing Methods 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 3
- 238000002372 labelling Methods 0.000 claims description 3
- 230000007935 neutral effect Effects 0.000 claims description 3
- 230000005540 biological transmission Effects 0.000 claims 2
- 238000013210 evaluation model Methods 0.000 abstract description 5
- 238000011160 research Methods 0.000 abstract description 5
- 238000011002 quantification Methods 0.000 abstract description 3
- 230000007547 defect Effects 0.000 abstract description 2
- 238000013528 artificial neural network Methods 0.000 description 6
- 238000003058 natural language processing Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 4
- 238000013135 deep learning Methods 0.000 description 3
- 238000011161 development Methods 0.000 description 3
- 230000018109 developmental process Effects 0.000 description 3
- 230000004927 fusion Effects 0.000 description 3
- 230000006872 improvement Effects 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 238000012795 verification Methods 0.000 description 2
- 206010063385 Intellectualisation Diseases 0.000 description 1
- 239000002390 adhesive tape Substances 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 239000000969 carrier Substances 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000007787 long-term memory Effects 0.000 description 1
- 230000015654 memory Effects 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 239000000843 powder Substances 0.000 description 1
- 238000004451 qualitative analysis Methods 0.000 description 1
- 230000031068 symbiosis, encompassing mutualism through parasitism Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Economics (AREA)
- Educational Administration (AREA)
- Development Economics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Game Theory and Decision Science (AREA)
- Probability & Statistics with Applications (AREA)
- Tourism & Hospitality (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application belongs to the technical field of cultural additional value assessment and relates to a cultural additional value assessment method based on LSTM, which comprises the following steps: step 1: constructing a three-dimensional index system based on person-enterprise-society; step 2: establishing a feature word list representing the comment corpus of the cultural products to be evaluated; step 3: extracting feature sentences to obtain feature sentence data; step 4: training an LSTM network model; step 5: performing accuracy testing and prediction with the LSTM network model to obtain emotion values; step 6: weighting the indexes of the three-dimensional index system of step 1; step 7: establishing a cultural additional value calculation equation model to obtain the cultural additional evaluation value. The method remedies shortcomings of traditional evaluation models, such as overly subjective and hard-to-quantify evaluation indexes, and is well suited to the large-scale comment data found on network platforms.
Description
Technical Field
The application belongs to the technical field of cultural additional value assessment, relates to a cultural additional value assessment method based on LSTM (long short-term memory artificial neural network), and particularly relates to a cultural additional value assessment method based on an LSTM neural network.
Background
The rapid development of internet technology has driven the trend toward a digital economy. In this new era, the cultural industry is gradually moving toward digitization and intelligence, bringing people new cultural experiences. The organic fusion of digital technology, culture, and platforms has given rise to a series of innovative forms and new business models, so that cultural and creative products are no longer simple reproductions of traditional culture; instead, digital technology fuses products with different cultures, bringing hollow and rigid cultural symbols "to life" and giving products greater cultural added value. For example, museum-based cultural and creative brands have created countless "internet-celebrity products" that are deeply loved by the public: the Palace Museum's cultural and creative products have attracted countless fans, and its decorative tapes are snapped up online in batches. High cultural added value lets a cultural product satisfy consumers' spiritual and cultural needs, becomes an important means for merchants to win consumer favor and build a distinctive cultural brand image, and brings the excellent culture embedded in the product into ordinary people's lives, making the product a cultural carrier and ambassador.
Therefore, improving cultural added value is a main trend in the development of the cultural industry, and it has triggered a new round of thinking among cultural enterprises and academia: how much does cultural added value improve the original product, how can the fusion of different cultural elements with products raise cultural added value, and how can the rules behind cultural added value guide the design of cultural products and the shaping of cultural brands? Resolving these key questions first requires answering "what constitutes cultural added value" and "how to measure cultural added value". However, research on these two basic problems is still mainly qualitative, and quantitative methods for cultural added value remain largely unexplored. In view of this, the application analyzes the connotation and structure of cultural added value from the perspective of emotion; supported by product comment data on network platforms, it proposes a cultural additional value assessment method based on LSTM fine-grained emotion analysis and provides a reference for subsequent research.
Disclosure of Invention
The application aims at: a cultural added value evaluation method based on an LSTM neural network is provided, and an index system of the cultural added value and an LSTM emotion analysis evaluation model are constructed to solve the problems in the background technology.
The application is realized by the following technical scheme:
a cultural additional value assessment method based on LSTM comprises the following steps:
step 1: constructing a three-dimensional index system based on person-enterprise-society from the hierarchical function perspective of cultural additional value;
step 2: preparing a comment corpus of cultural products to be evaluated, performing word segmentation on the comment corpus, and then establishing a feature word list representing the comment corpus of cultural products to be evaluated based on a TF-IDF algorithm;
step 3: extracting a characteristic sentence to obtain characteristic sentence data;
step 4: training an LSTM network model by utilizing the feature sentence data extracted in the step 3, selecting cross entropy as a loss function parameter, waiting for the convergence of the loss function, and obtaining a learning process curve;
step 5: performing accuracy test and prediction on the LSTM network model to obtain an emotion value;
step 6: weighting the indexes of the three-dimensional index system in the step 1;
step 7: and establishing a cultural additional value calculation equation model to obtain a cultural additional evaluation value.
Based on the technical scheme, the step 1 specifically comprises the following steps: referring to the related documents of the existing cultural additional value evaluation and the hierarchical function view angle, constructing a three-dimensional index system based on a person-enterprise-society;
the three-dimensional index system based on the person-enterprise-society comprises the following steps: 3 primary indexes;
the 3 primary indexes include: cultural mental enjoyment, cultural brand shaping and cultural essence inheritance;
the cultural mental enjoyment includes the following secondary indicators: ornamental value of cultural products and artistic value of cultural products;
the cultural brand shaping comprises the following secondary indexes: the awareness of the cultural brands and the loyalty of the cultural brands;
the cultural essence inheritance comprises the following secondary indexes: inheritance of culture and transmissibility of culture.
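For illustration only, the index hierarchy described above might be represented in code as a simple mapping from primary to secondary indexes; this sketch is an assumption about data layout, not part of the claimed method, and the dictionary name is hypothetical.

```python
# Hedged sketch: the person-enterprise-society index system as a Python dict.
# The hierarchy follows the description above; the name is illustrative only.
CULTURAL_VALUE_INDEX_SYSTEM = {
    "cultural mental enjoyment": [
        "ornamental value of cultural products",
        "artistic value of cultural products",
    ],
    "cultural brand shaping": [
        "awareness of cultural brands",
        "loyalty of cultural brands",
    ],
    "cultural essence inheritance": [
        "inheritance of culture",
        "transmissibility of culture",
    ],
}
```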
On the basis of the technical scheme, the basic unit of the comment corpus is a single comment;
the specific steps of the step 2 are as follows:
step 2.1: the comments of the comment database are segmented by calling a segmentation module of the jieba tool, and a corpus segmentation result is obtained;
step 2.2: and setting necessary parameters such as word frequency retention threshold values and the like by adopting a TF-IDF algorithm of a jieba tool to obtain a characteristic word list required for representing the whole comment corpus.
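As an illustration of steps 2.1 and 2.2, a minimal sketch using the jieba toolkit is given below; the variable names, the use of extract_tags, and the threshold handling are assumptions, since the patent does not fix a particular implementation.

```python
import jieba
import jieba.analyse

# comments: list of raw comment strings forming the comment corpus (assumed input).

# Step 2.1: word segmentation of every comment with jieba's cut function.
segmented_corpus = [list(jieba.cut(comment)) for comment in comments]

# Step 2.2: TF-IDF keyword extraction over the whole corpus with jieba;
# withWeight=True returns (term, tf-idf weight) pairs so a retention threshold can be applied.
all_text = " ".join(comments)
weighted_terms = jieba.analyse.extract_tags(all_text, topK=None, withWeight=True)

# The description mentions a word-frequency retention threshold (e.g. 20 on its own TF-IDF
# values); jieba's weights are on a different scale, so the cutoff below is purely illustrative.
RETENTION_THRESHOLD = 0.05
candidate_keywords = [term for term, weight in weighted_terms if weight > RETENTION_THRESHOLD]
```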
Based on the technical scheme, the specific steps of the step 2.2 are as follows:
step 2.2.1: extracting keywords by using the TF-IDF (term frequency-inverse document frequency) algorithm, specifically: the calculation is performed by using formulas (1), (2) and (3),
wherein TF_ω is the word frequency of the term ω;
wherein IDF is the inverse document frequency; the fewer valid comments contain a term, the larger its IDF, and the better the term distinguishes categories;
TFIDF = TF_ω × IDF (3)
wherein TFIDF is the term frequency-inverse document frequency;
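Formulas (1)-(3) themselves are not reproduced in this text. The standard TF-IDF definitions below are consistent with the surrounding description and are given only as a presumed reconstruction, with n_ω the number of occurrences of term ω in a comment and |D| the number of valid comments:

```latex
% Presumed reconstruction of formulas (1)-(3); not copied from the patent drawings.
\begin{align}
TF_{\omega} &= \frac{n_{\omega}}{\sum_{k} n_{k}} \tag{1} \\
IDF &= \log\!\left(\frac{|D|}{|\{d \in D : \omega \in d\}| + 1}\right) \tag{2} \\
TFIDF &= TF_{\omega} \times IDF \tag{3}
\end{align}
```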
step 2.2.2: determining a word frequency retention threshold, and screening entries with a value of TFIDF higher than the word frequency retention threshold as keywords (for example, determining that the word frequency retention threshold is 20); such screening tends to filter out common words, preserving relatively important words;
counting word frequency of the keywords by using a Counter library to obtain candidate feature words;
the Counter library is one of python, belongs to the subclass of dictionary, the element is stored as the keyword of dictionary, and the number of times the keyword appears is stored as corresponding value;
Finally, the candidate feature words are manually screened and classified against the person-enterprise-society three-dimensional index system, yielding the feature word list required to represent the whole comment corpus.
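A minimal sketch of the Counter-based frequency count in step 2.2.2 follows; it assumes the segmented_corpus and candidate_keywords variables from the sketch above, and leaves the manual classification into the person-enterprise-society indexes as a comment.

```python
from collections import Counter

keyword_set = set(candidate_keywords)  # entries that passed the TF-IDF retention threshold

# Count how often each retained keyword occurs across the segmented corpus.
word_freq = Counter(
    word
    for sentence in segmented_corpus
    for word in sentence
    if word in keyword_set
)

# Candidate feature words, most frequent first; the final feature word list is produced by
# manually screening these candidates and assigning each to a secondary index of the
# person-enterprise-society index system.
candidate_feature_words = [word for word, _ in word_freq.most_common()]
```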
On the basis of the technical scheme, the feature sentences comprise: explicit feature sentences and implicit feature sentences;
the specific steps of the step 3 are as follows:
firstly, extracting explicit characteristic sentences;
traversing the word segmentation results of all corpus entries word by word and comparing them with the feature word list of step 2, the matched feature words being taken as the feature attributes of the comments in which the entries occur;
extracting the comments having feature attributes and marking them as explicit feature sentences;
performing dependency analysis on the extracted explicit feature sentence by using a Stanford NLP platform, and extracting the modifier of the explicit feature sentence;
the specific steps of extracting modifier words of the explicit feature sentences are as follows: traversing the entry of the explicit feature sentence word by word, comparing the entry with the modifier of the HowNet emotion dictionary, and taking the matched modifier as the modifier of the explicit feature sentence where the entry is located;
the HowNet emotion dictionary comprises: adjectives, nouns, verbs, adverbs, and combinations thereof;
For the explicit feature sentences matched to modifiers, the following processing is carried out:
the feature word of an explicit feature sentence is used as the dominant word and its modifier as the emotion word, and an "attribute feature-emotion word" pair is constructed, from which an attribute emotion word pair weight is obtained for each pair;
the attribute feature is the dominant word;
the attribute emotion word pair weight is denoted SQ and is calculated according to formula (4),
Second step: extracting implicit feature sentences;
aiming at the feature sentences which are not matched with the feature words, traversing the vocabulary entries word by word, and comparing the vocabulary entries with modifier words of the HowNet emotion dictionary;
when the feature sentence which is not matched with the feature word is not matched with the modifier word, deleting the feature sentence;
when the feature sentence which is not matched with the feature word is matched with the modifier, the matched modifier is used as the modifier of the feature sentence where the entry is located, and the modifier is used as the emotion word;
then, for each emotion word appearing in a feature sentence that matched no feature word, the attribute feature with the largest attribute emotion word pair weight (according to the weights obtained above) is selected as the feature word of that sentence;
the feature sentences that matched no feature word but obtain a feature word in this way are taken as implicit feature sentences;
the Stanford NLP platform is a natural language processing toolkit that integrates many practical functions, including word segmentation, part-of-speech tagging and syntactic analysis; the Stanford NLP platform is not a deep learning framework but a collection of trained models, and can be thought of as a piece of software; it is written in Java and has a Python interface;
that is: for the remaining comments that match no feature word, the feature attributes are not explicit enough, so the corpus word segmentation results are imported into the Stanford NLP platform for sentence-level dependency relation mining, through which the unclear feature attributes are uncovered.
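The matching logic of step 3 could be sketched roughly as follows. The function name, the feature_word_list and hownet_modifiers inputs, and the pair_weights structure are illustrative assumptions; the dependency analysis performed with the Stanford NLP platform is only indicated by a comment, since its exact use is not specified beyond extracting modifiers.

```python
def extract_feature_sentences(segmented_comments, feature_word_list, hownet_modifiers,
                              pair_weights):
    """Split comments into explicit and implicit feature sentences (illustrative sketch).

    pair_weights maps (attribute feature, emotion word) pairs to their SQ weight
    obtained from the explicit feature sentences (formula (4), not reproduced here).
    """
    explicit, unmatched, implicit = [], [], []

    for words in segmented_comments:
        matched_features = [w for w in words if w in feature_word_list]
        if matched_features:
            # Explicit feature sentence; dependency analysis with the Stanford NLP
            # platform would be run here to pair each feature word with its modifier.
            explicit.append((words, matched_features))
        else:
            unmatched.append(words)

    for words in unmatched:
        emotion_words = [w for w in words if w in hownet_modifiers]
        if not emotion_words:
            continue  # no modifier matched: the sentence is deleted
        # Pick the attribute feature with the largest pair weight for the matched emotion words.
        candidates = [pair for pair in pair_weights if pair[1] in emotion_words]
        if candidates:
            best_feature = max(candidates, key=pair_weights.get)[0]
            implicit.append((words, best_feature))

    return explicit, implicit
```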
Based on the technical scheme, the specific steps of the step 4 are as follows:
step 4.1: manually labeling each feature sentence aiming at the feature sentence extracted in the previous step;
the label expressing positive emotion is marked as +1, the label expressing negative emotion is marked as-1, and the label expressing neutral emotion is marked as 0;
step 4.2: converting the characteristic sentence into a word vector by using word2 vec;
classifying the feature sentences according to the secondary index and the primary index of the feature words matched with the feature sentences;
and taking the word vector, the feature words corresponding to the feature sentences, the classification results of the feature sentences and the labels corresponding to the feature sentences as: feature sentence data;
step 4.3: dividing the feature sentence data into training set data and test set data;
step 4.4: the quantitative ratio of the training set data to the test set data is set to 4:1.
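A minimal sketch of steps 4.1-4.4 using gensim's word2vec and scikit-learn is given below; the 100-dimensional vectors and the 4:1 split come from the description, while the library choice and remaining hyperparameters are assumptions.

```python
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split

# feature_sentences: list of token lists extracted in step 3 (assumed input);
# labels: manually assigned sentiment tags, +1 / 0 / -1, one per feature sentence.

# Step 4.2: train word2vec and turn each feature sentence into a sequence of 100-d vectors.
w2v = Word2Vec(sentences=feature_sentences, vector_size=100, window=5, min_count=1)

def to_word_vectors(tokens):
    return [w2v.wv[t] for t in tokens if t in w2v.wv]

vector_sequences = [to_word_vectors(s) for s in feature_sentences]

# Steps 4.3/4.4: split the feature sentence data into training and test sets at a 4:1 ratio.
train_X, test_X, train_y, test_y = train_test_split(
    vector_sequences, labels, test_size=0.2, random_state=42)
```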
Based on the technical scheme, the specific steps of the step 4 are as follows: training an LSTM network model by using training set data; the LSTM network model is tested using the test set data.
Based on the technical scheme, the activation function of the LSTM network is the tanh function, the word vector dimension is set to 100, and the data batch size is 32, i.e. 32 samples are selected as input each time.
In addition, to prevent overfitting during deep learning network training, neurons are temporarily dropped from the network with a certain probability, which weakens the joint adaptation between neuron nodes and strengthens generalization; cross-validation shows that setting the neuron dropout rate (i.e. the dropout value) to 0.5 produces the largest number of randomly generated network structures. Cross entropy is selected as the main parameter for plotting the learning curve of the LSTM network model; once the curve converges, the graph is drawn.
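A minimal Keras sketch matching the stated settings (tanh activation, 100-dimensional word vectors, batch size 32, dropout 0.5, cross-entropy loss) is shown below; the padded sequence length, LSTM width, optimizer and epoch count are assumptions, and the pad helper builds on the word-vector sequences from the previous sketch.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

MAX_LEN = 50  # assumed maximum feature-sentence length (not specified in the description)

def pad(sequences):
    # Pad/truncate each sequence of 100-d word vectors to a fixed length.
    return pad_sequences(sequences, maxlen=MAX_LEN, dtype="float32",
                         padding="post", truncating="post")

model = Sequential([
    LSTM(64, activation="tanh", input_shape=(MAX_LEN, 100)),  # tanh activation, 100-d vectors
    Dropout(0.5),                                             # dropout value of 0.5
    Dense(3, activation="softmax"),                           # negative / neutral / positive
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Labels -1 / 0 / +1 are shifted to classes 0 / 1 / 2 for the categorical cross-entropy loss.
y_train = to_categorical(np.array(train_y) + 1, num_classes=3)
history = model.fit(pad(train_X), y_train, batch_size=32, epochs=20, validation_split=0.1)
# history.history["loss"] gives the cross-entropy learning curve to plot until convergence.
```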
Based on the technical scheme, the specific steps of the step 5 are as follows: checking the accuracy rate, recall rate and F1 value of the LSTM network model trained in the step 4; and obtaining emotion values of all the secondary indexes by using the test set.
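The accuracy, recall and F1 checks of step 5 might then be done with scikit-learn, for example; the macro averaging and the way emotion values are aggregated are assumptions, since the description does not specify them.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = model.predict(pad(test_X)).argmax(axis=1)  # predicted classes 0 / 1 / 2
y_true = np.array(test_y) + 1                       # gold labels shifted to 0 / 1 / 2

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1       :", f1_score(y_true, y_pred, average="macro"))

# One possible reading of the secondary-index emotion value: the mean predicted polarity
# (-1 / 0 / +1) over the test-set feature sentences classified under that index.
```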
On the basis of the technical scheme, the weights of the indexes of the three-dimensional index system comprise: primary index weights (also known as primary index frequencies) and secondary index weights (also known as secondary index frequencies);
extracting a characteristic sentence with positive emotion;
the primary index weight is calculated according to formula (5): primary index weight = YJ1 / ZS (5),
wherein YJ1 is the frequency of occurrence (i.e. the number of times) of the matched primary index feature words in the feature sentences with positive emotion, and ZS is the frequency of occurrence of all matched feature words in the feature sentences with positive emotion;
the secondary index weight is calculated according to formula (6): secondary index weight = EJ2 / ZS2 (6),
wherein EJ2 is the frequency of occurrence of the matched secondary index feature words in the feature sentences with positive emotion, and ZS2 is the frequency of occurrence of all matched feature words belonging to the same primary index in the feature sentences with positive emotion.
Based on the technical scheme, the cultural additional value calculation equation model in step 7 is shown as formula (7):
cultural additional evaluation value = cultural mental enjoyment primary index weight × ("ornamental value of cultural products" secondary index weight × "ornamental value of cultural products" index emotion value + "artistic value of cultural products" secondary index weight × "artistic value of cultural products" index emotion value) + cultural brand shaping primary index weight × ("awareness of cultural brands" secondary index weight × "awareness of cultural brands" index emotion value + "loyalty of cultural brands" secondary index weight × "loyalty of cultural brands" index emotion value) + cultural essence inheritance primary index weight × ("inheritance of culture" secondary index weight × "inheritance of culture" index emotion value + "transmissibility of culture" secondary index weight × "transmissibility of culture" index emotion value) (7).
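Read as code, formula (7) is a weighted sum over the index hierarchy. The sketch below computes it from the primary and secondary index weights of step 6 and the secondary-index emotion values of step 5; it reuses the CULTURAL_VALUE_INDEX_SYSTEM mapping from the earlier sketch, and the example weights are the ones given later in the description (the emotion values remain placeholders).

```python
def cultural_additional_value(primary_weights, secondary_weights, emotion_values,
                              index_system):
    """Weighted sum of secondary-index emotion values, per formula (7)."""
    total = 0.0
    for primary, secondaries in index_system.items():
        inner = sum(secondary_weights[s] * emotion_values[s] for s in secondaries)
        total += primary_weights[primary] * inner
    return total

# Example index weights taken from the worked example in the description.
primary_weights = {"cultural mental enjoyment": 0.399,
                   "cultural brand shaping": 0.296,
                   "cultural essence inheritance": 0.305}
secondary_weights = {"ornamental value of cultural products": 0.638,
                     "artistic value of cultural products": 0.362,
                     "awareness of cultural brands": 0.569,
                     "loyalty of cultural brands": 0.431,
                     "inheritance of culture": 0.382,
                     "transmissibility of culture": 0.618}
```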
The beneficial technical effects of the application are as follows:
1. The application constructs a person-enterprise-society three-dimensional index system from the hierarchical-function perspective of cultural added value, comprising 3 primary indexes and 6 secondary indexes. The index system is systematic and hierarchical, and reflects the significance of perceived-value research for the development of the cultural industry;
2. For cultural added value, a perceived-value evaluation model based on LSTM fine-grained emotion analysis is adopted. The method remedies shortcomings of traditional evaluation models, such as overly subjective and hard-to-quantify evaluation indexes, and is well suited to the large-scale comment data found on network platforms.
Drawings
The application has the following drawings:
FIG. 1 is a schematic diagram of a three-dimensional index architecture based on person-enterprise-society according to the present application.
Fig. 2 is a schematic flow chart of the cultural added value assessment method based on LSTM.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1-2, the present application aims at: a cultural added value evaluation method based on an LSTM neural network is provided, and an index system of the cultural added value and an LSTM emotion analysis evaluation model are constructed to solve the problems in the background technology.
The application is realized by the following technical scheme:
a cultural added value assessment method based on LSTM neural network comprises the following steps:
step 1: constructing a three-dimensional index system based on person-enterprise-society from the perspective of the hierarchical functions of cultural added value;
step 2: preparing a comment corpus of cultural products to be evaluated, performing word segmentation on the comment corpus, and then establishing a characteristic word list from the comment corpus of cultural products to be evaluated based on a TF-IDF algorithm;
step 3: extracting a characteristic sentence to obtain characteristic sentence data;
step 4: performing LSTM network model training by using the feature sentence data extracted in the step 3, selecting cross entropy as a loss function parameter, waiting for the convergence of the loss function, and obtaining a learning process curve;
step 5: performing accuracy testing and test set prediction with the LSTM network model to obtain emotion values;
step 6: and (3) weighting the indexes of the three-dimensional index system in the step (1).
step 7: establishing a cultural additional value calculation equation model to obtain the cultural additional evaluation value.
Further, step 1 specifically includes: referring to existing literature on cultural added value evaluation and the hierarchical-function perspective, a person-enterprise-society three-dimensional index system is constructed; cultural added value is taken to be represented by the sum of, and interrelations among, three primary-index elements: cultural mental enjoyment, cultural brand shaping, and cultural essence inheritance. On the basis of comprehensively and evenly covering the three traditional characteristic dimensions of cultural products (individual, enterprise and society), and combining the essential connotation of cultural elements, 6 secondary indexes are derived, namely the ornamental value of cultural products, the artistic value of cultural products, the awareness of cultural brands, the loyalty of cultural brands, the inheritance of culture and the transmissibility of culture, finally forming a cultural added value index system consisting of 3 primary indexes and 6 secondary indexes.
Further, the step 2 specifically includes: preparing a cultural product comment corpus to be evaluated, wherein the basic unit of the corpus is a single comment, word segmentation is carried out on the corpus by calling a jieba module to obtain a word segmentation result of the corpus, and then parameters such as necessary word frequency retention threshold and the like are set by adopting a TF-IDF algorithm of the jieba to obtain a characteristic word list required for representing the whole comment corpus.
Further, the step 3 is specifically two steps of extracting an explicit feature sentence and an implicit feature sentence. Traversing word segmentation results of the corpus, comparing the word segmentation results with the feature word list in the step 2, and taking the matched feature words as feature attributes of comments where the vocabulary entries are located;
extracting comments with characteristic attributes and marking the comments as explicit characteristic sentences;
for implicit feature sentences with insufficient clear feature attributes, the corpus word segmentation result is imported to a Standford NLP platform to excavate sentence-based dependency relationship, and the undefined feature attributes are excavated through the step.
In step 4, the feature sentences described by feature attributes under the same index are collected from the word segmentation results of the comment corpus for centralized analysis and classification. The word segmentation results of each category of comment corpus are labeled: each feature sentence is manually annotated, with the label +1 for positive emotion, -1 for negative emotion, and 0 for neutral emotion;
converting the characteristic sentence into a word vector by using word2 vec;
classifying the feature sentences according to the secondary index and the primary index of the feature words matched with the feature sentences;
and taking the word vector, the feature words corresponding to the feature sentences, the classification results of the feature sentences and the labels corresponding to the feature sentences as: feature sentence data;
dividing the feature sentence data into training set data and test set data;
the quantitative ratio of the training set data to the test set data is set to 4:1.
Step 4 is specifically as follows: based on the labeled word segmentation results of the comment corpus, an LSTM network model is trained; the tanh function is selected as the model's activation function, the word vector dimension is set to 100, and the data batch size is 32, i.e. 32 samples are selected as input each time. In addition, to prevent overfitting during deep learning network training, neurons are temporarily dropped from the network with a certain probability, which weakens the joint adaptation between neuron nodes and strengthens generalization; cross-validation shows that setting the dropout value to 0.5 produces the largest number of randomly generated network structures. Cross entropy is selected as the main parameter for plotting the model's learning curve; once the curve converges, the graph is drawn;
the step 5 specifically comprises the following steps: and (3) invoking the LSTM model trained in the step (4) to carry out emotion analysis on the corpus, checking the accuracy rate, recall rate and F1 value of the corpus, judging the performance of the model, and after the performance is confirmed, calculating the emotion values of all the secondary indexes.
The step 6 is specifically as follows: and (3) index weighting, screening the feature sentences with positive emotion polarity based on the classification result in the step (4), determining the corresponding frequency number of the secondary or primary index by comparing the feature word list, respectively calculating the primary index frequency and the secondary index frequency of the feature sentences, and setting the primary index frequency and the secondary index frequency as weights corresponding to the index values.
The step 7 is specifically as follows: and (3) establishing a cultural additional value calculation equation model, and referring to the weights of the indexes of each level formed by the step (6).
For example: cultural additional evaluation value (weighted total score) = 0.399 × (0.638 × "ornamental value of cultural products" index emotion value + 0.362 × "artistic value of cultural products" index emotion value) + 0.296 × (0.569 × "awareness of cultural brands" index emotion value + 0.431 × "loyalty of cultural brands" index emotion value) + 0.305 × (0.382 × "inheritance of culture" index emotion value + 0.618 × "transmissibility of culture" index emotion value),
wherein the decimals are the corresponding weights.
The foregoing description of the preferred embodiments of the application is not intended to limit the application to the form or principles of the application, but rather to cover all modifications, equivalents, alternatives, and improvements within the scope of the application.
What is not described in detail in this specification is prior art known to those skilled in the art.
Claims (6)
1. The LSTM-based cultural additional value assessment method is characterized by comprising the following steps of:
step 1: constructing a three-dimensional index system based on person-enterprise-society from the hierarchical function perspective of cultural additional value,
step 2: preparing a comment corpus of cultural products to be evaluated, performing word segmentation on the comment corpus, then establishing a characteristic word list representing the comment corpus of cultural products to be evaluated based on a TF-IDF algorithm,
step 3: extracting the characteristic sentence to obtain characteristic sentence data,
step 4: training LSTM network model by using the feature sentence data extracted in the step 3, selecting cross entropy as loss function parameter, waiting for the convergence of loss function to obtain learning process curve,
step 5: performing accuracy test and prediction on the LSTM network model to obtain emotion values,
step 6: weighting the indexes of the three-dimensional index system in the step 1,
step 7: establishing a cultural additional value calculation equation model to obtain a cultural additional evaluation value;
the three-dimensional index system based on the person-enterprise-society comprises the following steps: 3 primary indexes; the 3 primary indexes include: enjoyment of cultural spirit, modeling of cultural brands and inheritance of cultural essence,
the cultural mental enjoyment includes the following secondary indicators: ornamental value of cultural products and artistic quality of cultural products,
the cultural brand shaping comprises the following secondary indexes: the awareness of cultural brands and the loyalty of cultural brands,
the cultural essence inheritance comprises the following secondary indexes: inheritance of culture and transmissibility of culture; the basic unit of the comment library is a single comment;
the specific steps of the step 2 are as follows:
step 2.1: the comments of the comment database are segmented by calling a segmentation module of the jieba tool to obtain a corpus segmentation result,
step 2.2: setting word frequency retention threshold parameters by adopting a TF-IDF algorithm of a jieba tool to obtain a characteristic word list required for representing the whole comment corpus;
the specific steps of the step 2.2 are as follows:
step 2.2.1: extracting keywords by using the TF-IDF algorithm, specifically: the calculation is performed by using formulas (1), (2) and (3),
wherein TF_ω is the word frequency of the term ω,
wherein IDF is the inverse document frequency,
TFIDF = TF_ω × IDF (3)
wherein TFIDF is the term frequency-inverse document frequency;
step 2.2.2: determining a word frequency retention threshold, and screening entries with the numerical value of TFIDF higher than the word frequency retention threshold as keywords;
counting word frequency of the keywords by using a Counter library to obtain candidate feature words;
finally, according to a three-dimensional index system of a person, an enterprise and a society, classifying candidate feature words in a grading manner through manual screening and distinguishing, and obtaining a feature word list required for representing the whole comment corpus;
the feature sentences comprise: explicit feature sentences and implicit feature sentences;
the specific steps of the step 3 are as follows:
firstly, extracting explicit characteristic sentences;
traversing word by word for word segmentation results of all the corpus, comparing the word by word with the feature word list in the step 2, and taking the matched feature words as feature attributes of comments where the vocabulary entries are located;
extracting comments with characteristic attributes and marking the comments as explicit characteristic sentences;
performing dependency analysis on the extracted explicit feature sentence by using a Stanford NLP platform, and extracting a modifier of the explicit feature sentence;
the specific steps of extracting modifier words of the explicit feature sentences are as follows: traversing the entry of the explicit feature sentence word by word, comparing the entry with the modifier of the HowNet emotion dictionary, and taking the matched modifier as the modifier of the explicit feature sentence where the entry is located;
for the explicit feature sentences matched to modifiers, the following processing is carried out:
the feature word of an explicit feature sentence is used as the dominant word and its modifier as the emotion word, and an "attribute feature-emotion word" pair is constructed, from which an attribute emotion word pair weight is obtained for each pair;
the attribute feature is the dominant word;
the attribute emotion word pair weight is denoted SQ and is calculated according to formula (4),
second step: extracting implicit feature sentences;
aiming at the feature sentences which are not matched with the feature words, traversing the vocabulary entries word by word, and comparing the vocabulary entries with modifier words of the HowNet emotion dictionary;
when the feature sentence which is not matched with the feature word is not matched with the modifier word, deleting the feature sentence;
when the feature sentence which is not matched with the feature word is matched with the modifier, the matched modifier is used as the modifier of the feature sentence where the entry is located, and the modifier is used as the emotion word;
then, for each emotion word appearing in a feature sentence that matched no feature word, the attribute feature with the largest attribute emotion word pair weight, according to the obtained attribute feature-emotion word pair weights, is selected as the feature word of that sentence;
and taking the feature sentences which are not matched with the feature words as implicit feature sentences.
2. The LSTM based cultural additional value assessment method according to claim 1, wherein: the specific steps of the step 4 are as follows:
step 4.1: manually labeling each feature sentence aiming at the feature sentence extracted in the previous step;
the label expressing positive emotion is marked as +1, the label expressing negative emotion is marked as-1, and the label expressing neutral emotion is marked as 0;
step 4.2: converting the characteristic sentence into a word vector by using word2 vec;
classifying the feature sentences according to the secondary index and the primary index of the feature words matched with the feature sentences;
and taking the word vector, the feature words corresponding to the feature sentences, the classification results of the feature sentences and the labels corresponding to the feature sentences as: feature sentence data;
step 4.3: dividing the feature sentence data into training set data and test set data;
step 4.4: the quantitative ratio of the training set data to the test set data is set to 4:1.
3. The LSTM based cultural additional value assessment method according to claim 2, wherein: the specific steps of the step 4 are as follows: training an LSTM network model by using training set data; testing the LSTM network model by using the test set data;
the activation function of the LSTM network is the tanh function, the word vector dimension is set to 100, the data batch size is 32, and the neuron dropout rate is set to 0.5; cross entropy is selected as the parameter for plotting the learning curve of the LSTM network model.
4. The LSTM based cultural additional value assessment method according to claim 3, wherein: the specific steps of the step 5 are as follows: checking the accuracy rate, recall rate and F1 value of the LSTM network model trained in the step 4; and obtaining emotion values of all the secondary indexes by using the test set.
5. The LSTM based cultural additional value assessment method according to claim 4, wherein: the weights of the indexes of the three-dimensional index system comprise: a first level index weight and a second level index weight;
extracting a characteristic sentence with positive emotion;
the primary index weight is calculated according to formula (5): primary index weight = YJ1 / ZS (5),
wherein YJ1 is the frequency of occurrence of the matched primary index feature words in the feature sentences with positive emotion, and ZS is the frequency of occurrence of all matched feature words in the feature sentences with positive emotion;
the secondary index weight is calculated according to formula (6): secondary index weight = EJ2 / ZS2 (6),
wherein EJ2 is the frequency of occurrence of the matched secondary index feature words in the feature sentences with positive emotion, and ZS2 is the frequency of occurrence of all matched feature words belonging to the same primary index in the feature sentences with positive emotion.
6. The LSTM based cultural additional value assessment method according to claim 5, wherein: the cultural additional value calculation equation model in step 7 is shown as formula (7),
cultural additional evaluation value = cultural mental enjoyment primary index weight × ("ornamental value of cultural products" secondary index weight × "ornamental value of cultural products" index emotion value + "artistic value of cultural products" secondary index weight × "artistic value of cultural products" index emotion value) + cultural brand shaping primary index weight × ("awareness of cultural brands" secondary index weight × "awareness of cultural brands" index emotion value + "loyalty of cultural brands" secondary index weight × "loyalty of cultural brands" index emotion value) + cultural essence inheritance primary index weight × ("inheritance of culture" secondary index weight × "inheritance of culture" index emotion value + "transmissibility of culture" secondary index weight × "transmissibility of culture" index emotion value) (7).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110515653.3A CN113298365B (en) | 2021-05-12 | 2021-05-12 | Cultural additional value assessment method based on LSTM |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110515653.3A CN113298365B (en) | 2021-05-12 | 2021-05-12 | Cultural additional value assessment method based on LSTM |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113298365A CN113298365A (en) | 2021-08-24 |
CN113298365B true CN113298365B (en) | 2023-12-01 |
Family
ID=77321530
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110515653.3A Active CN113298365B (en) | 2021-05-12 | 2021-05-12 | Cultural additional value assessment method based on LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113298365B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010132062A1 (en) * | 2009-05-15 | 2010-11-18 | The Board Of Trustees Of The University Of Illinois | System and methods for sentiment analysis |
CN104699766A (en) * | 2015-02-15 | 2015-06-10 | 浙江理工大学 | Implicit attribute mining method integrating word correlation and context deduction |
KR20150083954A (en) * | 2014-01-10 | 2015-07-21 | 어니컴 주식회사 | System and method for providing platform of cultural content based on social network |
CN106651132A (en) * | 2016-11-17 | 2017-05-10 | 安徽华博胜讯信息科技股份有限公司 | DEA-based public cultural service performance evaluation method |
CN108108433A (en) * | 2017-12-19 | 2018-06-01 | 杭州电子科技大学 | A kind of rule-based and the data network integration sentiment analysis method |
US10431210B1 (en) * | 2018-04-16 | 2019-10-01 | International Business Machines Corporation | Implementing a whole sentence recurrent neural network language model for natural language processing |
CN110502744A (en) * | 2019-07-15 | 2019-11-26 | 同济大学 | A kind of text emotion recognition methods and device for history park evaluation |
KR20210044017A (en) * | 2019-10-14 | 2021-04-22 | 한양대학교 산학협력단 | Product review multidimensional analysis method and apparatus |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120254060A1 (en) * | 2011-04-04 | 2012-10-04 | Northwestern University | System, Method, And Computer Readable Medium for Ranking Products And Services Based On User Reviews |
CN111767741B (en) * | 2020-06-30 | 2023-04-07 | 福建农林大学 | Text emotion analysis method based on deep learning and TFIDF algorithm |
-
2021
- 2021-05-12 CN CN202110515653.3A patent/CN113298365B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010132062A1 (en) * | 2009-05-15 | 2010-11-18 | The Board Of Trustees Of The University Of Illinois | System and methods for sentiment analysis |
KR20150083954A (en) * | 2014-01-10 | 2015-07-21 | 어니컴 주식회사 | System and method for providing platform of cultural content based on social network |
CN104699766A (en) * | 2015-02-15 | 2015-06-10 | 浙江理工大学 | Implicit attribute mining method integrating word correlation and context deduction |
CN106651132A (en) * | 2016-11-17 | 2017-05-10 | 安徽华博胜讯信息科技股份有限公司 | DEA-based public cultural service performance evaluation method |
CN108108433A (en) * | 2017-12-19 | 2018-06-01 | 杭州电子科技大学 | A kind of rule-based and the data network integration sentiment analysis method |
US10431210B1 (en) * | 2018-04-16 | 2019-10-01 | International Business Machines Corporation | Implementing a whole sentence recurrent neural network language model for natural language processing |
CN110502744A (en) * | 2019-07-15 | 2019-11-26 | 同济大学 | A kind of text emotion recognition methods and device for history park evaluation |
KR20210044017A (en) * | 2019-10-14 | 2021-04-22 | 한양대학교 산학협력단 | Product review multidimensional analysis method and apparatus |
Non-Patent Citations (7)
Title |
---|
Framework for Sentiment-Driven Evaluation of Customer Satisfaction With Cosmetics Brand;Jaehun Park;《IEEE Access》;第8卷;98526-98538 * |
Hui Song 等.Semantic Analysis and Implicit Target Extraction of Comments from E-Commerce Websites.《2013 Fourth World Congress on Software Engineering》.2014,331-335. * |
Research and Practice of Cultural Heritage Promotion: The Case Study of Value Add Application for Folklore Artifacts;Kuo-An Wang等;《2012 International Symposium on Computer, Consumer and Control》;610-613 * |
吕家欣 等.文旅品牌顾客契合价值测量——基于细粒度情感分析模型.《投资与创业》.2023,第34卷(第01期),162-164. * |
基于情境系统的湖湘文创产品设计评价体系研究;祁飞鹤;《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》(第07期);C028-42 * |
大众媒介综合价值评估体系研究;周笑;《东岳论丛》;第30卷(第06期);42-48 * |
孟鹏 等.出版文化品牌价值影响因素及评价指标体系研究.《中国商论》.2019,(第23期),213-216. * |
Also Published As
Publication number | Publication date |
---|---|
CN113298365A (en) | 2021-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zheng et al. | Characterization inference based on joint-optimization of multi-layer semantics and deep fusion matching network | |
CN110008311B (en) | Product information safety risk monitoring method based on semantic analysis | |
CN109492157B (en) | News recommendation method and theme characterization method based on RNN and attention mechanism | |
CN111767741B (en) | Text emotion analysis method based on deep learning and TFIDF algorithm | |
CN112001187B (en) | Emotion classification system based on Chinese syntax and graph convolution neural network | |
CN109933664B (en) | Fine-grained emotion analysis improvement method based on emotion word embedding | |
US9183274B1 (en) | System, methods, and data structure for representing object and properties associations | |
CN107180045B (en) | Method for extracting geographic entity relation contained in internet text | |
CN104636425B (en) | A kind of network individual or colony's Emotion recognition ability prediction and method for visualizing | |
CN111914096A (en) | Public transport passenger satisfaction evaluation method and system based on public opinion knowledge graph | |
CN105843897A (en) | Vertical domain-oriented intelligent question and answer system | |
CN112001186A (en) | Emotion classification method using graph convolution neural network and Chinese syntax | |
CN108038725A (en) | A kind of electric business Customer Satisfaction for Product analysis method based on machine learning | |
CN110442728A (en) | Sentiment dictionary construction method based on word2vec automobile product field | |
CN110750648A (en) | Text emotion classification method based on deep learning and feature fusion | |
Miao et al. | A dynamic financial knowledge graph based on reinforcement learning and transfer learning | |
CN114817454B (en) | NLP knowledge graph construction method combining information quantity and BERT-BiLSTM-CRF | |
CN115757819A (en) | Method and device for acquiring information of quoting legal articles in referee document | |
Li | Research on extraction of useful tourism online reviews based on multimodal feature fusion | |
CN110826315B (en) | Method for identifying timeliness of short text by using neural network system | |
Sajeevan et al. | An enhanced approach for movie review analysis using deep learning techniques | |
CN111951079A (en) | Credit rating method and device based on knowledge graph and electronic equipment | |
CN113704459A (en) | Online text emotion analysis method based on neural network | |
CN107908749B (en) | Character retrieval system and method based on search engine | |
CN112905744A (en) | Qiaoqing question and answer method, device, equipment and storage device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |