CN118132818B - Tourist area resource assessment method based on image difference - Google Patents

Info

Publication number
CN118132818B
Authority
CN
China
Prior art keywords
comment
virtual
degree
scenic spot
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410355288.8A
Other languages
Chinese (zh)
Other versions
CN118132818A (en)
Inventor
张书颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Geographic Sciences and Natural Resources of CAS
Original Assignee
Institute of Geographic Sciences and Natural Resources of CAS
Application filed by Institute of Geographic Sciences and Natural Resources of CAS
Priority to CN202410355288.8A
Publication of CN118132818A
Application granted
Publication of CN118132818B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a tourist area resource assessment method based on image difference and relates to the field of tourism resource analysis. Online tourist comment data are deeply mined, emotion analysis and attention trend calculation are carried out, and public praise stability is assessed, finally yielding a virtual relevance quotient. The method comprehensively considers the attention a scenic spot receives in the virtual space, the emotional tendency of users, public praise stability and other factors, and comprehensively evaluates the association degree and importance of the scenic spot's virtual resources in the digital environment. By calculating the ranking, the volatility index and the toughness index, the method evaluates the overall influence of the scenic spot and reveals the stability of its operation and its resistance to fluctuation. Managers can thus understand the running state of the scenic spot in the virtual space more comprehensively and deeply, discover potential risks and room for improvement in time, and obtain strong decision support for the digital operation of the scenic spot.

Description

Tourist area resource assessment method based on image difference
Technical Field
The invention relates to the field of travel resource analysis, in particular to a travel area resource assessment method based on image differences.
Background
A large amount of online data is generated in the digitized environment, including social media comments, user interactions, virtual resources, etc., that reflect the user's attitudes, expectations, and ratings for travel resources. Through deep analysis of the digitized data, the image, association degree and influence of the travel resource in the virtual space can be comprehensively known. The evaluation of the running risk is not only helpful for finding potential problems and challenges, but also can early warn possible negative trends in advance, and provides scientific basis for the travel resource manager to formulate effective coping strategies. In addition, the assessment method in the digital environment can improve the objectivity and accuracy of assessment by means of advanced technical means such as time sequence analysis, emotion analysis, virtual association quotient and the like, so that the assessment method is more in line with the actual situation. Therefore, the travel resource operation risk assessment is carried out in the digital environment, comprehensive and deep information support can be provided for managers, and the managers are helped to better understand market dynamics and user demands, so that the competitiveness and sustainable development capability of the travel resource are improved more effectively.
Conventional image-difference-based assessment of the operational risk of scenic spots in a digital environment has a number of problems. Traditional methods may ignore the time-series characteristics of tourist comment data and fail to capture how users' attention to a scenic spot changes over time; the assessment then cannot reflect users' actual attitudes and dynamic needs in time, which impairs accurate prediction of operational risk. In addition, traditional methods may lack deep mining and comprehensive analysis of tourist comment data and cannot jointly consider the relevance of comments in terms of semantic space and context embedding, so the evaluation result is one-sided. Traditional screening and analysis methods may therefore suffer from information loss, insufficient timeliness and insufficient comprehensiveness, which limits accurate evaluation of comments related to the scenic spot. When facing complex situations in a digital environment, such methods are hampered by insufficient information, strong subjectivity and insufficient analysis depth, and cannot provide comprehensive and accurate operational risk assessment.
In order to solve the above problems, a technical solution is now provided.
Disclosure of Invention
In order to overcome the defects in the prior art, the embodiment of the invention provides a travel area resource assessment method based on image difference. Online tourist comment data are deeply mined, emotion analysis and attention trend calculation are carried out, and public praise stability is assessed, finally yielding a virtual relevance quotient. The method comprehensively considers the attention a scenic spot receives in the virtual space, the emotional tendency of users, public praise stability and other factors, and comprehensively evaluates the association degree and importance of the scenic spot's virtual resources in the digital environment. By calculating the ranking, the volatility index and the toughness index, the method evaluates the overall influence of the scenic spot and reveals the stability of its operation and its resistance to fluctuation. Managers can thus understand the running state of the scenic spot in the virtual space more comprehensively and deeply, discover potential risks and room for improvement in time, and obtain strong decision support for the digital operation of the scenic spot, thereby solving the problems described in the background.
In order to achieve the above purpose, the present invention provides the following technical solutions:
Step S1, obtaining tourist comment data related to scenic spots from a target website by using crawler technology;
Step S2, screening related words based on a trained Word2Vec model by combining the semantic space dissimilarity degree and the context embedding resonance degree;
Step S3, calculating a positive attention trend score and a public praise stability score through time sequence analysis and emotion analysis, and combining the two scores to obtain a virtual relevance quotient;
Step S4, calculating the virtual association degree quotient of different scenic spots in the same area, calculating the ratio of the overall ranking fluctuation index to the overall virtual association degree quotient, and evaluating the running risk of the scenic spots in the digital environment to obtain corresponding early warning signals.
In a preferred embodiment, step S1 comprises the following:
Tourist comment data are crawled from a plurality of target websites in turn by using the constructed asynchronous crawler framework. Crawling runs as a scheduled task: the timestamp of the last crawl is recorded, and the next crawl acquires only newly added tourist comment data. The crawled raw data are then preprocessed, the preprocessing comprising cleaning and de-duplication.
In a preferred embodiment, step S2 comprises the following:
collecting a large-scale text corpus for training with the Word2Vec tool library;
training a word vector model by using the Word2Vec tool library;
after training is completed, segmenting the comment text collected in step S1 into words and converting each word into its corresponding word vector by using the word vector model;
calculating association degree information between each comment word and the scenic spot keyword, wherein the association degree information comprises the semantic space dissimilarity degree and the context embedding resonance degree.
In a preferred embodiment, the process of obtaining semantic spatial dissimilarity is:
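Step one, obtaining word vector representations of the comment words and the scenic spot keywords by using the trained word vector model;
Step two, calculating the semantic space dissimilarity between a comment word and a scenic spot keyword by using cosine similarity:
$$\cos(\mathbf{v}_c, \mathbf{v}_k) = \frac{\mathbf{v}_c \cdot \mathbf{v}_k}{\lVert \mathbf{v}_c \rVert \, \lVert \mathbf{v}_k \rVert}$$
wherein $\mathbf{v}_c$ and $\mathbf{v}_k$ are the word vectors of the comment word and the scenic spot keyword, respectively;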
The context embedding resonance degree acquisition process comprises the following steps:
Step one, constructing a vocabulary distribution matrix for the whole corpus by using the trained word vector model, wherein each row represents a word, each column represents a position in the context, and each element represents the probability of the word appearing at the corresponding position;
Step two, calculating the resonance degree between the comment word and the scenic spot keyword from the vocabulary distribution matrix, wherein the two inputs of the calculation are the vocabulary distribution of the comment word and the vocabulary distribution of the scenic spot keyword.
In a preferred embodiment, the semantic space dissimilarity degree and the context embedding resonance degree of each comment word and the scenic spot keyword are comprehensively processed to obtain a relevance score;
Comparing the association degree score value with a correlation threshold value, and generating a related signal if the association degree score value is greater than or equal to the correlation threshold value; otherwise, if the association score value is smaller than the association threshold value, an irrelevant signal is generated.
In a preferred embodiment, step S3 comprises the following:
According to the collected tourist comment data, the tourist comment data are ordered according to time, and a time sequence of the concerned trend is constructed;
Carrying out emotion analysis on each comment to obtain emotion scores;
and extracting image information of the data according to the data subjected to emotion classification, wherein the image information comprises a positive attention trend score and a public praise stability score.
In a preferred embodiment, the process of obtaining the positive attention trend score is:
Grouping the comments by time window, wherein adjacent time windows overlap by a set degree, and comparing the proportions of positive emotion in the different time windows to obtain a time sequence of attention;
calculating, by a time sequence analysis method, the average of the proportions of positive-emotion comments over the time windows to obtain the positive attention trend;
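The windowed calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the function name positive_attention_trend, the 7-day window, the 3-day step and the boolean sentiment label are all hypothetical choices made here.

```python
from datetime import datetime, timedelta


def positive_attention_trend(comments, window_days=7, step_days=3):
    """Return (series, trend) for a list of (timestamp, is_positive) pairs.

    series: proportion of positive comments in each overlapping window.
    trend:  mean of those proportions (the positive attention trend).
    The window and step sizes are hypothetical defaults.
    """
    if not comments:
        return [], 0.0
    comments = sorted(comments, key=lambda c: c[0])
    start, end = comments[0][0], comments[-1][0]
    window = timedelta(days=window_days)
    step = timedelta(days=step_days)  # step < window, so adjacent windows overlap

    series = []
    t = start
    while t <= end:
        labels = [positive for ts, positive in comments if t <= ts < t + window]
        if labels:  # skip windows that contain no comments
            series.append(sum(labels) / len(labels))
        t += step
    trend = sum(series) / len(series)
    return series, trend
```

With a step shorter than the window, each comment can contribute to more than one window, which is one simple way to realise the set overlap between adjacent windows.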
The acquisition process of the public praise stability is as follows:
fitting the time distribution of the positive comments by using a normal distribution model to obtain the parameters of the normal distribution, wherein the normal distribution is set as $N(\mu, \sigma^2)$, $\mu$ is the mean and $\sigma$ is the standard deviation; for a comment at time $t$, the stability score is calculated as:
$$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)$$
wherein $f(t)$ represents the probability density of the positive comments at each time point, i.e. the public praise stability;
and comprehensively processing the positive attention trend score and the public praise stability score to obtain the virtual relevance quotient.
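The normal-distribution fit used for the public praise stability can be sketched as follows. This is a hedged illustration: representing comment times as numeric offsets (for example, days since the first comment), the use of the population standard deviation, and the function names are assumptions made here. How the two scores are combined into the virtual relevance quotient is not specified in this text and is therefore not shown.

```python
import math
from statistics import mean, pstdev


def fit_normal(times):
    """Estimate the mean and standard deviation of positive-comment times."""
    return mean(times), pstdev(times)


def normal_pdf(t, mu, sigma):
    """Probability density of N(mu, sigma^2) at time t."""
    return math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def public_praise_stability(times):
    """Return the density at each positive-comment time (one score per comment)."""
    mu, sigma = fit_normal(times)
    if sigma == 0:
        raise ValueError("all comments share one timestamp; the fit is degenerate")
    return [normal_pdf(t, mu, sigma) for t in times]
```

For example, public_praise_stability([0, 1, 2, 3, 4]) gives its highest density at t = 2, the fitted mean.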
In a preferred embodiment, step S4 comprises the following:
And counting virtual association degree quotient of each scenic spot in the target area, ranking the scenic spots according to the virtual association degree quotient from large to small to obtain a ranking table, obtaining a plurality of ranking tables in a unit operation period, and calculating the ratio of the overall ranking fluctuation index to the overall virtual association degree quotient average value to obtain a virtual toughness index.
Comparing the virtual toughness index with a risk threshold;
If the virtual toughness index is greater than or equal to the risk threshold, generating a scenic spot operation stability signal;
and otherwise, if the virtual toughness index is smaller than the risk threshold, generating an operation risk signal.
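The ranking and threshold comparison of step S4 can be sketched as follows. The text above does not define the overall ranking fluctuation index, so this sketch assumes, purely for illustration, that it is the mean over scenic spots of the standard deviation of each spot's rank across the ranking tables. The function names and signal strings are likewise hypothetical; the comparison direction follows the text.

```python
from statistics import mean, pstdev


def rank_table(quotients):
    """Rank scenic spots by virtual association degree quotient (rank 1 = largest)."""
    ordered = sorted(quotients, key=quotients.get, reverse=True)
    return {spot: position + 1 for position, spot in enumerate(ordered)}


def virtual_toughness_index(period_quotients):
    """period_quotients: one {spot: quotient} dict per ranking table in the period."""
    tables = [rank_table(q) for q in period_quotients]
    spots = list(tables[0])
    # ASSUMPTION: fluctuation index = mean per-spot standard deviation of rank.
    fluctuation = mean(pstdev([table[spot] for table in tables]) for spot in spots)
    average_quotient = mean(v for q in period_quotients for v in q.values())
    return fluctuation / average_quotient


def operation_signal(index, risk_threshold):
    """Compare the index with the risk threshold as described in the text."""
    return "operation stability" if index >= risk_threshold else "operation risk"
```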
The method for estimating the resources of the travel area based on the image difference has the technical effects and advantages that:
1. Through the training of the Word2Vec model and the relevancy calculation for comment words, deep mining and semantic analysis of tourist comment data are achieved. Firstly, by collecting a large-scale corpus and training on it, the Word2Vec model generates a word vector carrying semantic information for each word and better captures the semantic relations among words. Secondly, by calculating the semantic space dissimilarity degree and the context embedding resonance degree, the association of comment words with scenic spot keywords is considered both in the word vector space and in context. This addresses the shortcomings of traditional screening methods and improves the accuracy and comprehensiveness of the analysis of tourist comment data. Finally, by setting the association threshold, comment words are extracted in a targeted way, so that the related information obtained is more accurate, which benefits the subsequent emotion analysis and attention trend calculation. Step S2 therefore improves the deep understanding and relevance evaluation of tourist comment data and provides a more reliable basis for the comprehensive analysis in the subsequent steps.
2. By deeply mining tourist comment data, emotion analysis and attention trend calculation are carried out and public praise stability is evaluated, finally yielding the virtual association quotient. The method comprehensively considers the attention a scenic spot receives in the virtual space, the emotional tendency of users, public praise stability and other factors, and comprehensively evaluates the association degree and importance of the scenic spot's virtual resources in the digital environment. By calculating the ranking, the volatility index and the toughness index, the method evaluates the overall influence of the scenic spot and reveals the stability of its operation and its resistance to fluctuation. Managers can thus understand the running state of the scenic spot in the virtual space more comprehensively and deeply, discover potential risks and room for improvement in time, and obtain strong decision support for the digital operation of the scenic spot.
Drawings
FIG. 1 is a flow chart of the image difference-based travel area resource assessment method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
FIG. 1 shows an image difference-based travel area resource assessment method, which comprises the following steps:
Step S1, obtaining tourist comment data related to scenic spots from a target website by using crawler technology;
Step S2, screening related words based on a trained Word2Vec model by combining the semantic space dissimilarity degree and the context embedding resonance degree;
Step S3, calculating a positive attention trend score and a public praise stability score through time sequence analysis and emotion analysis, and combining the two scores to obtain a virtual relevance quotient;
Step S4, calculating the virtual association degree quotient of different scenic spots in the same area, calculating the ratio of the overall ranking fluctuation index to the overall virtual association degree quotient, and evaluating the running risk of the scenic spots in the digital environment to obtain corresponding early warning signals.
Step S1 includes the following:
Non-blocking asynchronous crawling is implemented using an asynchronous crawler framework, such as asyncio or aiohttp. The asynchronous crawler allows processing a plurality of network requests at the same time without blocking the execution of programs, improves the efficiency of the crawler, and is particularly suitable for large-scale data grabbing tasks.
Asynchronous programming is a programming paradigm that allows programs to perform certain operations while waiting for other operations to complete, without blocking the execution of the program, through an asynchronous mechanism. Asynchronous programming is typically implemented using event loops, callback functions, or coroutines. In the crawler, the asynchronous crawler can process a plurality of requests simultaneously in a non-blocking mode, so that the efficiency is improved, and the method is particularly suitable for I/O intensive tasks.
Examples:
Install the asynchronous HTTP library with the following command:
pip install aiohttp
Write the asynchronous crawler code as follows:
import asyncio
import aiohttp

async def fetch_url(session, url):
    # Request one page and return its body as text
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = ["https://example.com/page1", "https://example.com/page2", "https://example.com/page3"]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        responses = await asyncio.gather(*tasks)
    for response in responses:
        # Process the crawled data
        print(response)

if __name__ == "__main__":
    asyncio.run(main())
In this example, the aiohttp library is used to initiate HTTP requests asynchronously, and asyncio is used to manage the asynchronous tasks. With the async with syntax and asyncio.gather, requests for multiple URLs are handled concurrently without blocking the execution of the program.
By adopting the asynchronous crawler, a plurality of network requests can be processed simultaneously without blocking the execution of the program, thereby making full use of network resources and improving the concurrency and efficiency of the crawler. For large-scale tasks such as crawling tourist comment data for scenic spots, the asynchronous crawler framework can significantly accelerate data acquisition, so that the crawler obtains a large amount of comment information faster and closer to real time. This matters for scenic spot comment analysis, which has high real-time requirements and large data volumes, and it also helps in understanding and responding to user feedback in time.
Tourist comment data are crawled from a plurality of target websites in turn by using the constructed asynchronous crawler framework. Crawling runs as a scheduled task: the timestamp of the last crawl is recorded, and the next crawl acquires only newly added tourist comment data, thereby realizing incremental crawling at the set time interval and ensuring timeliness.
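The timestamp-based incremental crawling described above can be sketched as follows. This is a minimal illustration under assumptions: the posted_at field, the state dictionary used to persist the last crawl time, and the function names are hypothetical, and the network fetch is abstracted into a callable so the filtering logic can be shown on its own.

```python
from datetime import datetime


def select_new_comments(comments, last_crawl_time):
    """Keep only comments posted after the previous crawl."""
    if last_crawl_time is None:  # first crawl: everything is new
        return list(comments)
    return [c for c in comments if c["posted_at"] > last_crawl_time]


def incremental_crawl(fetch_comments, state, now):
    """Run one scheduled crawl and record its timestamp in `state`.

    fetch_comments: callable returning the comments currently on the site.
    state:          dict persisted between runs (hypothetical storage).
    now:            time at which this crawl starts.
    """
    comments = fetch_comments()
    new_comments = select_new_comments(comments, state.get("last_crawl_time"))
    state["last_crawl_time"] = now
    return new_comments
```

In a real deployment the scheduler (for example a cron job) would call incremental_crawl at the set interval and store state in a file or database; those details are outside this sketch.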
The crawled raw data are preprocessed as follows:
Cleaning:
special characters, HTML tags and the like are removed from the comment text by a regular expression;
a Chinese word segmentation tool, such as jieba, is used to segment the comment text to facilitate subsequent processing;
a stop word list is established, stop words are removed from the comments, and meaningful words are retained.
De-duplication:
the hash value of each comment text is calculated and the hash values of crawled comments are stored, so that repetition is avoided;
a unique index or primary key of the database is used to ensure that the stored tourist comment data are unique.
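The hash-based de-duplication described above can be sketched as follows. The choice of SHA-256, the whitespace normalization before hashing and the function names are assumptions made for illustration; the text only states that a hash of the comment text is stored.

```python
import hashlib


def comment_hash(text):
    """Hash a comment after collapsing whitespace (normalization is an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(comments, seen_hashes=None):
    """Return comments whose hash has not been seen; update seen_hashes in place."""
    seen = set() if seen_hashes is None else seen_hashes
    unique = []
    for text in comments:
        digest = comment_hash(text)
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique
```

Passing the same seen_hashes set to successive crawls keeps comments from earlier runs out of later batches; in a database-backed system the unique index mentioned above would play the same role.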
An efficient, real-time crawler design is realized by combining multithreading, asynchronous crawling, incremental crawling, and real-time data cleaning and de-duplication. This ensures that tourist comment data are acquired from multiple sources while reducing repeated crawling, so that the crawler system is more stable and reliable.
Assume the original text of a scenic spot comment is as follows:
"<p>This scenic spot is truly too beautiful! It feels like a heaven on earth. I like the scenery here very much; it is absolutely worth a trip!"
Special characters and HTML tags are now removed by a regular expression, the Chinese word segmentation tool jieba is used to segment the text, and a stop word list is built to remove meaningless words.
The corresponding code is as follows:
import re
import jieba

def preprocess_comment(comment):
    # Remove HTML tags and HTML entities using a regular expression
    cleaned_comment = re.sub(r'<.*?>|&.*?;', '', comment)
    # Chinese word segmentation
    seg_list = jieba.cut(cleaned_comment, cut_all=False)
    segmented_comment = " ".join(seg_list)
    # Build the stop word list
    stop_words = set(["this", "true", "too", "feel", "like", "i", "very", "like", "here", "absolute", "worth", "one", "trip"])
    meaningful_words = [word for word in segmented_comment.split() if word not in stop_words]
    # Return the processed comment text
    return meaningful_words

# Test
original_comment = "<p>This scenic spot is truly too beautiful! It feels like a heaven on earth. I like the scenery here very much; it is absolutely worth a trip!"
processed_comment = preprocess_comment(original_comment)
print(processed_comment)
The processed comment text becomes a vocabulary list such as:
['scenic spot', 'beauty', 'heaven', 'landscape', 'worth', 'swim']
In this example, the processing removes HTML tags and special characters, performs Chinese word segmentation, and removes meaningless words using the stop word list, leaving the more meaningful comment words. Such preprocessing provides cleaner and more useful data for subsequent analysis.
By adopting the word-vector-based association degree analysis method, comment words related to the scenic spot can be extracted efficiently. The method uses a pre-trained word vector model: the comment text is segmented and each word is converted into its corresponding word vector. The association degree between each comment word and the scenic spot keywords is then calculated, for example by cosine similarity, and words whose association degree is higher than a set threshold are selected as comment words related to the scenic spot. This approach captures semantic relatedness more accurately, adapts to specific domains, and can be updated dynamically.
Step S2 includes the following:
A large-scale text corpus is collected, ensuring that the corpus covers the field related to the scenic spot. An open-source corpus that has already been preprocessed may be used.
The Word2Vec tool library is used for training.
The following is the training process of Word2Vec; preprocessing, word segmentation and similar operations on the data must be completed beforehand. The corresponding code is as follows:
# Import the Word2Vec model
from gensim.models import Word2Vec
# sentences is the segmented text data; each element is the word segmentation result of one sentence
sentences = [['word', 'embeddings', 'example'], ...]
# Configure and train the Word2Vec model
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
# Save the trained model
model.save("word2vec_model.model")
# Load the trained model
loaded_model = Word2Vec.load("word2vec_model.model")
The parameters in the above code are described as follows:
vector_size: the dimension of the word vectors.
window: the size of the context window, i.e. how many neighbouring words the model considers as the context of a word.
min_count: words whose frequency is lower than this value are ignored, which avoids the influence of overly rare words on the model.
workers: the number of worker threads used for training, which accelerates training on multi-core machines.
In practical applications, these parameters may be adjusted according to the size and characteristics of the data set and the available computing resources. After the trained Word2Vec model is saved, it is used to obtain word vectors in the subsequent steps.
In addition, vector_size and the training corpus need to be adjusted and prepared according to the actual situation. When using the model, the word vector of a word can be obtained with model.wv[word].
A word vector model is trained by using the Word2Vec tool library, with the following code:
from gensim.models import Word2Vec
# Each element is the word segmentation result of one sentence
sentences = [['word', 'embeddings', 'example'], ...]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec_model.model")
The parameters in the above code are described as follows:
vector_size: the dimension of the word vectors.
window: the size of the context window.
min_count: words whose frequency is lower than this value are ignored.
workers: the number of worker threads used for training.
After training is completed, the generated word vector model is stored.
The comment text collected in step S1 is segmented into words, and each word is converted into its corresponding word vector by using the word vector model.
Association degree information between each comment word and the scenic spot keyword is then calculated, wherein the association degree information comprises the semantic space dissimilarity degree and the context embedding resonance degree.
The acquisition process of the semantic space dissimilarity degree comprises the following steps:
Step one, obtaining word vector representations of the comment words and the scenic spot keywords by using the trained word vector model;
Step two, calculating the semantic space dissimilarity between a comment word and a scenic spot keyword by using cosine similarity:
$$\cos(\mathbf{v}_c, \mathbf{v}_k) = \frac{\mathbf{v}_c \cdot \mathbf{v}_k}{\lVert \mathbf{v}_c \rVert \, \lVert \mathbf{v}_k \rVert}$$
wherein $\mathbf{v}_c$ and $\mathbf{v}_k$ are the word vectors of the comment word and the scenic spot keyword, respectively;
The semantic space dissimilarity reflects how similar the comment word and the scenic spot keyword are in the word vector semantic space. Specifically, it measures the angle between the two vectors: the closer the value is to 1, the more similar the vectors and the smaller the angle; the closer the value is to -1, the more opposite the vectors and the larger the angle. Because it is computed as a cosine similarity, a larger value indicates that the comment word and the scenic spot keyword are more similar in semantic space and closer in meaning; conversely, a smaller value means that they differ significantly in semantic space and have different semantic features. The magnitude of the semantic space dissimilarity therefore reflects the degree of semantic association between the comment word and the scenic spot keyword.
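The cosine similarity calculation described above can be sketched in plain Python as follows. The function name and the use of plain lists instead of the vectors returned by a word vector model are choices made for this illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Identical directions give 1, orthogonal vectors give 0 and opposite directions give -1, which matches the interpretation given above.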
The context embedding resonance degree acquisition process comprises the following steps:
Step one, constructing a vocabulary distribution matrix for the whole corpus by using the trained word vector model, wherein each row represents a word, each column represents a position in the context, and each element represents the probability of the word appearing at the corresponding position;
Step two, calculating the resonance degree between the comment word and the scenic spot keyword from the vocabulary distribution matrix, wherein the two inputs of the calculation are the vocabulary distribution of the comment word and the vocabulary distribution of the scenic spot keyword.
The context embedding resonance degree reflects the degree to which the comment word and the scenic spot keyword resonate in context, i.e. their degree of semantic entanglement across different contexts. Its magnitude reflects how similar the distributions of the two words are in context: a larger value means that the distributions of the comment word and the scenic spot keyword in context are more similar, with stronger semantic entanglement and resonance; conversely, a smaller value means that their distributions differ more, with weaker semantic entanglement and resonance. The resonance degree can therefore be used to measure the similarity of context information between comment words and scenic spot keywords, helping to capture their degree of semantic association in different contexts.
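The resonance formula itself is not reproduced in this text. As a stand-in only, the sketch below measures the similarity of two positional distributions with the histogram intersection (the sum of element-wise minima), which is 1 for identical distributions and 0 for disjoint ones. This measure, the function name and the tolerance are assumptions made for illustration and should not be read as the patent's formula.

```python
def resonance_degree(p, q, tolerance=1e-9):
    """Overlap of two probability distributions over context positions.

    p, q: rows of the vocabulary distribution matrix for the two words.
    Returns a value in [0, 1]; larger means more similar distributions.
    NOTE: histogram intersection is a hypothetical stand-in measure.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same positions")
    for dist in (p, q):
        if any(x < 0 for x in dist) or abs(sum(dist) - 1.0) > tolerance:
            raise ValueError("each row must be a probability distribution")
    return sum(min(a, b) for a, b in zip(p, q))
```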
The semantic space dissimilarity degree and the context embedding resonance degree of each comment word and the scenic spot keyword are comprehensively processed to obtain a relevancy score, for example, the relevancy score can be obtained through the following calculation formula:
Wherein:
the result of the formula is the relevancy score, which represents the degree of association between the comment word and the scenic spot keyword;
the two coefficients in the formula are adjustable hyperparameters used to balance the influence of the semantic space dissimilarity and the context embedding resonance degree.
The formula for calculating the relevancy score takes the form of a logistic function. Its purpose is to express the association between comment words and scenic spot keywords through a nonlinear function, so that complex associations can be captured more flexibly, while the adjustable hyperparameters allow the weighting of the two inputs to be tuned. The normalization and the exponential mapping in the formula keep the score within a bounded range and make it vary smoothly.
The relevancy score represents the degree of association between the comment word and the scenic spot keyword. Specifically, a larger relevancy score indicates that the comment word and the scenic spot keyword are more similar in terms of both semantic space dissimilarity and context embedding resonance degree, and are more strongly associated; conversely, a smaller relevancy score indicates that they differ more in both respects and are less associated. The magnitude of the relevancy score therefore reflects the degree of semantic association between the comment word and the scenic spot keyword: the larger the score, the stronger the association.
The relevancy score is compared with a relevance threshold. If the relevancy score is greater than or equal to the relevance threshold, the degree of association between the comment word and the scenic spot keyword reaches or exceeds the preset standard. This means that the comment word is semantically similar to the scenic spot keyword or resonates strongly with it in context; the comment word meets the set association requirement, a related signal is generated, and the corresponding comment word is extracted;
Otherwise, if the relevancy score is smaller than the relevance threshold, the degree of association between the comment word and the scenic spot keyword does not reach the preset standard. This means that the comment word is semantically different from the scenic spot keyword or resonates only weakly with it in context; the comment word does not meet the set association requirement, an irrelevant signal is generated, and the comment word is not extracted.
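The scoring and screening steps above can be sketched as follows. This is a minimal illustration, assuming that the logistic function is applied to a weighted sum of the two inputs; the weighted-sum form, the parameter names `alpha` and `beta`, the threshold value and the function names are all assumptions for this example rather than the formula of the source.

```python
import math


def relevance_score(similarity, resonance, alpha=1.0, beta=1.0):
    """Logistic mapping of a weighted sum of the two inputs into (0, 1).

    alpha and beta are hypothetical names for the two adjustable
    hyperparameters that balance the inputs.
    """
    z = alpha * similarity + beta * resonance
    return 1.0 / (1.0 + math.exp(-z))


def relevance_signal(score, threshold):
    """Return 'related' when the score reaches the threshold, else 'irrelevant'."""
    return "related" if score >= threshold else "irrelevant"


# Made-up inputs: a cosine value and a resonance value for one comment word.
score = relevance_score(similarity=0.8, resonance=0.6, alpha=2.0, beta=1.0)
print(score, relevance_signal(score, threshold=0.7))
```

Only comment words whose signal is "related" would be kept for the later emotion analysis.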
According to the invention, deep mining and semantic analysis of tourist comment data are realized through training of the Word2Vec model and relevancy calculation for comment words. Firstly, through collection of and training on a large-scale corpus, the Word2Vec model can generate a word vector carrying semantic information for each word, so that semantic relations among words are better captured. Secondly, by calculating the semantic space dissimilarity and the context embedding resonance degree, the association between comment words and scenic spot keywords is considered both in the word vector space and in context. This addresses the shortcomings of traditional screening methods and improves the accuracy and comprehensiveness of the screened tourist comment data. Finally, by setting the relevance threshold, comment words are extracted in a targeted manner, so that the related information obtained is more accurate, which benefits the subsequent emotion analysis and attention trend calculation. Step S2 therefore has the beneficial effect of improving the understanding and relevance evaluation of tourist comment data and providing a more reliable basis for the comprehensive analysis in the subsequent steps.
Step S3 includes the following:
The collected tourist comment data are sorted by time to construct a time series of the attention trend;
Emotion analysis is carried out on each comment to obtain an emotion score, for example by using an emotion dictionary or a machine learning model;
The emotion score is compared with the corresponding threshold, and each comment is classified into one of three categories:
Positive: comments with a score above the threshold are identified as positive emotion.
Negative: comments with a score below the threshold are identified as negative emotion.
Neutral: comments with a score within the threshold range are identified as neutral emotion.
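The three-way classification above can be illustrated with a short sketch. It assumes that the "threshold range" is delimited by a lower and an upper bound on the emotion score; the bound values and the function name `classify_emotion` are hypothetical.

```python
def classify_emotion(score, lower=-0.1, upper=0.1):
    """Map an emotion score to 'positive', 'negative' or 'neutral'.

    lower and upper are assumed bounds of the neutral band; scores on
    either bound are treated as neutral.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if score > upper:
        return "positive"
    if score < lower:
        return "negative"
    return "neutral"


# Made-up emotion scores for four comments.
for s in (0.8, -0.4, 0.05, 0.1):
    print(s, classify_emotion(s))
```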
Extracting image information of the data according to the data subjected to emotion classification, wherein the image information comprises a positive attention trend score and a public praise stability score;
the acquisition process of the positive attention trend score comprises the following steps:
Comments are grouped by time window, with a preset degree of overlap between adjacent time windows, and the positive emotion proportions of the different time windows are compared to obtain a time series of attention;
for example, the positive emotion proportion within each time window may be calculated using the following formula:
wherein the two quantities are the number of positive emotion comments in the time window and the total number of comments in that time window, respectively.
After the attention score of each time window is obtained, the scores are combined into a time sequence, so that the change of the attention trend of tourists can be better understood. The method combines time factors and emotion analysis, and provides more comprehensive and accurate information for evaluating the attention.
Calculating the average value of the proportion of the positive emotion comments in each time window by using a time sequence analysis method to obtain a positive attention trend score, wherein the calculation formula is as follows:
wherein the quantities are the positive attention trend score, the positive emotion proportion of each time window, and the total number of comments, respectively.
The positive attention trend score represents the average proportion of positive emotion comments over the time windows. A higher score indicates a higher degree of attention, and the trend can be analysed by comparing the scores of different time windows.
The larger the attention score, the stronger the attention trend, that is, the higher the proportion of positive emotion comments. This means that, within a specific time window, tourists pay close attention to the scenic spot or topic and express positive interest in and expectations of the related information. A larger attention score may be interpreted as users on social media or comment platforms being more inclined to follow and discuss topics related to the scenic spot during that period, indicating potential tourist interest. Conversely, a smaller attention score indicates a weaker attention trend, that is, a lower proportion of positive emotion comments. This indicates that, within the specific time window, attention to the scenic spot or related topics is relatively low and tourist interest is relatively weak. A smaller attention score may reflect that the scenic spot did not attract enough attention or discussion during the period, and that further analysis and improvement may be needed to raise its attention.
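The windowing and averaging described above can be sketched as follows. This is an illustration under stated assumptions: comments are `(timestamp, label)` pairs, overlap is obtained by advancing the window by a step shorter than its width, and the trend score is taken as the mean of the per-window proportions. The function names and the sample data are made up for this example.

```python
from datetime import datetime, timedelta


def positive_ratio_series(comments, window, step):
    """Positive-emotion proportion per sliding time window.

    comments: list of (timestamp, label) pairs, label in
    {'positive', 'negative', 'neutral'}.
    window, step: timedelta values; step < window gives overlapping windows.
    Empty windows are skipped.
    """
    if not comments:
        return []
    ordered = sorted(comments, key=lambda c: c[0])
    start, end = ordered[0][0], ordered[-1][0]
    series = []
    t = start
    while t <= end:
        labels = [label for ts, label in ordered if t <= ts < t + window]
        if labels:
            positives = sum(1 for label in labels if label == "positive")
            series.append((t, positives / len(labels)))
        t += step
    return series


def positive_attention_trend(series):
    """Mean of the per-window positive proportions (0.0 for no windows)."""
    if not series:
        return 0.0
    return sum(ratio for _, ratio in series) / len(series)


day0 = datetime(2024, 1, 1)
sample = [
    (day0, "positive"),
    (day0 + timedelta(days=1), "negative"),
    (day0 + timedelta(days=2), "positive"),
    (day0 + timedelta(days=3), "positive"),
]
series = positive_ratio_series(sample, timedelta(days=2), timedelta(days=1))
print([ratio for _, ratio in series], positive_attention_trend(series))
```

With a two-day window advanced one day at a time, each comment falls into up to two windows, which is one simple way to realise the overlap between windows.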
The acquisition process of the public praise stability is as follows:
The time distribution of the positive comments is fitted with a normal distribution model to obtain the parameters of the normal distribution. The normal distribution is set as N(μ, σ²), wherein μ is the mean and σ is the standard deviation. For the time distance of a comment, the stability score is calculated as:
wherein S_stability represents the probability density of the positive comment at each time point, that is, the public praise stability.
The public praise stability score reflects how positive public praise is distributed over time and is a comprehensive measure of the trend of public praise. In particular, a larger public praise stability score indicates that positive public praise is distributed more evenly and stably over time, that is, users' liking is relatively continuous and free of large fluctuations. This may indicate that the scenic spot or product enjoys relatively lasting popularity and that user satisfaction is relatively stable. Conversely, a smaller public praise stability score indicates that public praise is less stable over time, so that public praise may fluctuate strongly and user evaluations may change markedly. The magnitude of the public praise stability score thus directly reflects the trend of public praise: a larger score means more stable, a smaller score less stable. This gives the business or scenic spot manager important information about users' public praise experience and continued appeal, and helps to adjust the business strategy and improve service so as to keep public praise stable and sustainable.
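As a hedged sketch of the fitting step, the standard-library `statistics.NormalDist` class can estimate μ and σ from the positive comment times and evaluate the normal probability density at a given time. Representing times as days since the first comment, and the function names below, are assumptions for this example.

```python
from statistics import NormalDist


def fit_positive_times(times):
    """Fit N(mu, sigma^2) to the times of positive comments.

    times: numeric values, e.g. days since the first comment (assumed unit).
    At least two distinct values are needed for a usable standard deviation.
    """
    if len(times) < 2:
        raise ValueError("need at least two positive comments to fit")
    return NormalDist.from_samples(times)


def stability_density(dist, t):
    """Normal probability density at time t under the fitted distribution."""
    return dist.pdf(t)


# Made-up positive comment times, in days.
positive_times = [1.0, 2.0, 3.0, 4.0, 5.0]
dist = fit_positive_times(positive_times)
print(dist.mean, dist.stdev, stability_density(dist, 3.0))
```

Note that `NormalDist.from_samples` uses the sample standard deviation; if all times are identical the fitted σ is zero and the density cannot be evaluated.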
The positive attention trend score and the public praise stability score are comprehensively processed to obtain a virtual association quotient, for example, the virtual association quotient can be calculated by the following formula:
wherein the quantities are the virtual association quotient, the positive attention trend score and the public praise stability score, and the preset proportionality coefficients of the positive attention trend score and the public praise stability score, respectively; both coefficients are greater than zero.
The virtual association quotient reflects the comprehensive evaluation of the association degree of the virtual resources, and emphasizes the relevance and importance of scenic spots in the virtual space.
Herein, virtual resources refer to the various forms of online presence and influence of a scenic spot in virtual space, including but not limited to attention on social media, user comments and public praise evaluations. This digitized information and online activity form the image and brand of the scenic spot on the internet and reflect users' attention to, evaluation of and interaction with the scenic spot in virtual space. Virtual resources therefore cover the various scenic spot related data and information generated in the digital environment, and are of great significance for evaluating the association degree and influence of scenic spots in virtual space. The purpose of the virtual association quotient is to comprehensively measure the association degree and importance of scenic spots in virtual resources by combining the positive attention trend and the public praise stability.
The virtual association quotient is used for reflecting the association degree and importance of a scenic spot in virtual space. The index combines the positive attention trend score and the public praise stability score, and aims to provide a comprehensive evaluation of the relevance of the scenic spot's virtual resources. Specifically, a larger virtual association quotient indicates that the association degree and importance of the scenic spot in virtual space are higher. This may mean that the scenic spot continues to receive positive attention in social media and online comments, while users' public praise evaluations are more consistent and stable. A large virtual association quotient thus reflects that the scenic spot has strong brand influence and popularity in the virtual environment.
Conversely, a smaller virtual relevance quotient indicates that the relevance and importance of the attraction in the virtual space is lower. This may indicate that the attraction is of relatively low interest in social media and online reviews, while the user has some inconsistency or volatility in his public praise. Thus, a smaller virtual relevance quotient may suggest a need to further focus on and improve the image and user interaction of the attraction in the virtual environment to promote its relevance in the virtual space. In combination, the magnitude of the virtual association quotient reflects the overall influence and popularity of the scenic spot in the virtual space, and provides an important reference basis for virtual resource management of the scenic spot.
Step S4 includes the following:
The virtual association quotient of each scenic spot in the target area is counted, and the scenic spots are ranked by virtual association quotient from large to small to obtain a ranking table. A plurality of ranking tables are obtained within a unit operation period, and the ratio of the overall ranking fluctuation index to the overall mean virtual association quotient is calculated to obtain the virtual toughness index. For example, the calculation formula is as follows:
wherein the quantities in the formula are:
the virtual toughness index;
the standard deviation of the virtual association quotient ranking, which measures the fluctuation of the ranking;
the mean absolute fluctuation of the virtual association quotient ranking, which takes the absolute change of the ranking into account;
the mean of the virtual association quotients of all scenic spots.
Comparing the virtual toughness index with a risk threshold;
If the virtual toughness index is greater than or equal to the risk threshold, the virtual resource has higher toughness in virtual space. This means that the overall ranking of the scenic spot fluctuates relatively little and is relatively stable compared with the overall virtual association quotient. In the digitized environment, the scenic spot copes with change relatively well and maintains a relatively stable degree of virtual association. In this case, the operation of the scenic spot in virtual space is considered relatively reliable and strongly resistant to fluctuation, and a scenic spot operation stability signal is generated.
Otherwise, if the virtual toughness index is smaller than the risk threshold, the virtual resource has lower toughness in virtual space. This may mean that the overall ranking of the scenic spot fluctuates relatively strongly and is less stable compared with the overall virtual association quotient. In the digitized environment, the scenic spot may be more susceptible to external factors, resulting in greater uncertainty in its degree of virtual association. In this case, attention and measures are required to improve the stability and fluctuation resistance of the scenic spot in virtual space; an operation risk signal is generated and an early warning prompt is issued.
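The index and the threshold comparison above can be sketched as follows. This is an illustration only: it assumes that the overall ranking fluctuation index is the sum of the mean per-spot rank standard deviation and the mean absolute rank change between consecutive ranking tables, which is one possible reading of the listed quantities and not the formula of the source. The function names and sample data are hypothetical; the direction of the comparison follows the text as written.

```python
from statistics import mean, pstdev


def virtual_toughness_index(rank_tables, quotients):
    """Ratio of an assumed ranking fluctuation index to the mean quotient.

    rank_tables: list of dicts mapping scenic spot -> rank, one dict per
    ranking table within the unit operation period.
    quotients: dict mapping scenic spot -> virtual association quotient.
    """
    if not rank_tables or not quotients:
        raise ValueError("need at least one ranking table and one quotient")
    spots = list(rank_tables[0])
    # Standard deviation of each spot's rank across tables, averaged over spots.
    rank_std = mean(pstdev([table[s] for table in rank_tables]) for s in spots)
    # Absolute rank change between consecutive tables, averaged over all spots.
    changes = [
        abs(later[s] - earlier[s])
        for earlier, later in zip(rank_tables, rank_tables[1:])
        for s in spots
    ]
    mean_abs_change = mean(changes) if changes else 0.0
    fluctuation = rank_std + mean_abs_change  # assumed combination
    return fluctuation / mean(quotients.values())


def operation_signal(index, risk_threshold):
    """Signal per the text: 'stable' if index >= threshold, else 'risk'."""
    return "stable" if index >= risk_threshold else "risk"


tables = [{"A": 1, "B": 2}, {"A": 2, "B": 1}]
quotients = {"A": 2.0, "B": 2.0}
index = virtual_toughness_index(tables, quotients)
print(index, operation_signal(index, risk_threshold=0.5))
```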
According to the invention, tourist comment data are mined in depth, emotion analysis, attention trend calculation and public praise stability evaluation are carried out, and the virtual association quotient is finally obtained. The attention received by the scenic spot in virtual space, the emotional tendency of users, the public praise stability and other factors are considered together, so that the association degree and importance of the scenic spot's virtual resources in the digital environment are evaluated comprehensively. By calculating the ranking, the fluctuation index and the toughness index, the overall influence of the scenic spot is evaluated and the stability and fluctuation resistance of its operation can be identified. Managers can thus understand the running state of the scenic spot in virtual space more fully, discover potential risks and room for improvement in time, and obtain decision support for the digital operation of the scenic spot. The overall framework is therefore beneficial for assessing the operational risk and management effect of scenic spot virtual resources.
All of the above formulas are dimensionless and operate on numerical values. They were obtained by software simulation over a large amount of collected data so as to approximate real conditions as closely as possible, and the preset parameters and thresholds in the formulas are set by those skilled in the art according to the actual situation.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (e.g., infrared, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Finally, the foregoing describes only preferred embodiments of the invention and is not intended to limit the invention to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (3)

1. The travel area resource assessment method based on the image difference is characterized by comprising the following steps of:
step S1, obtaining tourist comment data related to scenic spots in a target website by utilizing a crawler technology;
step S2, based on Word2Vec model training, combining semantic space dissimilarity degree and context embedding resonance degree, and screening related words;
Step S3, comprehensively calculating a positive attention trend score and a public praise stability score by combining a positive attention trend and public praise stability through time sequence analysis and emotion analysis to obtain a virtual relevance quotient;
Step S4, calculating the virtual association degree quotient of different scenic spots in the same area, calculating the ratio of the overall ranking fluctuation index to the overall virtual association degree quotient, and evaluating the running risk of the scenic spots in the digitalized environment to obtain corresponding early warning signals;
the acquisition process of the semantic space dissimilarity degree comprises the following steps:
Step one, obtaining word vector representations of comment words and scenic spot keywords by using a trained word vector model;
calculating semantic space dissimilarity between comment words and scenic spot keywords by using cosine similarity:
wherein v comment and v keyword represent word vectors of comment words and scenic spot keywords, respectively;
The context embedding resonance degree acquisition process comprises the following steps:
Firstly, constructing a vocabulary distribution matrix for the whole corpus by using a trained word vector model;
step two, calculating the resonance degree between the comment word and the scenic spot keyword by using the vocabulary distribution matrix:
Wherein P represents the vocabulary distribution of comment words, and Q represents the vocabulary distribution of scenic spot keywords;
the semantic space dissimilarity degree and the context embedding resonance degree of each comment word and the scenic spot keyword are comprehensively processed to obtain a relevance score;
comparing the association degree score value with a correlation threshold value, and generating a related signal if the association degree score value is greater than or equal to the correlation threshold value; otherwise, if the association degree score value is smaller than the association threshold value, generating an irrelevant signal;
Step S3 includes the following:
According to the collected tourist comment data, the tourist comment data are ordered according to time, and a time sequence of the concerned trend is constructed;
Carrying out emotion analysis on each comment to obtain emotion scores;
extracting image information of the data according to the data subjected to emotion classification, wherein the image information comprises a positive attention trend score and a public praise stability score;
the acquisition process of the positive attention trend score comprises the following steps:
Grouping comments according to time windows, wherein rated overlapping degree exists between each time window, and comparing positive emotion proportions of different time windows to obtain a time sequence of attention;
calculating the average value of the proportion of the positive emotion comments in each time window by using a time sequence analysis method, and obtaining a positive attention trend;
The acquisition process of the public praise stability is as follows:
Fitting the time distribution of the positive comments by using a normal distribution model to obtain parameters of the normal distribution, and setting the normal distribution as N (mu, sigma 2), wherein mu is the mean value, sigma is the standard deviation, and the calculation formula of the stability score is as follows:
Wherein S stability represents the probability density of the positive comment at each time point, i.e. the public praise stability;
comprehensively processing the front attention trend score and the public praise stability score to obtain a virtual association quotient;
step S4 includes the following:
Counting virtual association degree quotient of each scenic spot in a target area, ranking the scenic spots according to the virtual association degree quotient from large to small to obtain a ranking table, obtaining a plurality of ranking tables in a unit operation period, and calculating the ratio of the overall ranking fluctuation index to the overall virtual association degree quotient average value to obtain a virtual toughness index;
comparing the virtual toughness index with a risk threshold;
If the virtual toughness index is greater than or equal to the risk threshold, generating a scenic spot operation stability signal;
and otherwise, if the virtual toughness index is smaller than the risk threshold, generating an operation risk signal.
2. The image difference-based travel zone resource assessment method according to claim 1, wherein:
Step S1 includes the following:
tourist comment data of a plurality of target websites are crawled in sequence by using the constructed asynchronous crawler framework; crawling is carried out according to a set timed task, the timestamp of the last crawl is recorded, and only newly added tourist comment data are acquired at the next crawl; and the crawled raw data are preprocessed, wherein the preprocessing comprises cleaning and de-duplication.
3. The image difference-based travel zone resource assessment method according to claim 2, wherein:
step S2 includes the following:
collecting a large-scale text corpus, and training a Word2Vec tool library;
Training a Word vector model by using a trained Word2Vec tool library;
After training is completed, word segmentation is carried out on the comment text collected in the step S1 by using a word vector model, and each word is converted into a corresponding word vector;
And calculating association degree information between each comment word and the scenic spot keyword, wherein the association degree information comprises semantic space dissimilarity degree and context embedding resonance degree.
CN202410355288.8A 2024-03-27 2024-03-27 Tourist area resource assessment method based on image difference Active CN118132818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410355288.8A CN118132818B (en) 2024-03-27 2024-03-27 Tourist area resource assessment method based on image difference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410355288.8A CN118132818B (en) 2024-03-27 2024-03-27 Tourist area resource assessment method based on image difference

Publications (2)

Publication Number Publication Date
CN118132818A CN118132818A (en) 2024-06-04
CN118132818B true CN118132818B (en) 2024-08-27

Family

ID=91235372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410355288.8A Active CN118132818B (en) 2024-03-27 2024-03-27 Tourist area resource assessment method based on image difference

Country Status (1)

Country Link
CN (1) CN118132818B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032639A (en) * 2018-12-27 2019-07-19 中国银联股份有限公司 By the method, apparatus and storage medium of semantic text data and tag match
CN111340385A (en) * 2020-03-10 2020-06-26 深圳华侨城创新研究院有限公司 Scientific measuring method for measuring joy index of tourist attraction

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10789429B2 (en) * 2018-11-21 2020-09-29 Intuit, Inc. Visualizing comment sentiment
CN111414753A (en) * 2020-03-09 2020-07-14 中国美术学院 Method and system for extracting perceptual image vocabulary of product
CN113792118A (en) * 2021-09-08 2021-12-14 浙江力石科技股份有限公司 Satisfaction improving system and method based on scenic spot evaluation

Also Published As

Publication number Publication date
CN118132818A (en) 2024-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant