CN117493906A

CN117493906A - City event allocation method, system and storage medium

Info

Publication number: CN117493906A
Application number: CN202311533241.8A
Authority: CN
Inventors: 余雁; 苏如春; 岑道岸
Original assignee: Guangzhou Hantele Communication Co ltd
Current assignee: Guangzhou Hantele Communication Co ltd
Priority date: 2023-11-16
Filing date: 2023-11-16
Publication date: 2024-02-02

Abstract

The invention belongs to the technical field of data processing, and particularly relates to a city event allocation method, system and storage medium. The method comprises the following steps: aggregating city historical event data and extracting the structured data in it; constructing a preset event list library; acquiring reported event data and extracting event keywords; identifying the event from the extracted keywords and determining the event type; matching the determined event type against the preset event list library to determine the event service type and the responsible department; and allocating the event according to the matching result. The invention reduces the number of keywords, selects suitable keywords accurately, and improves matching efficiency and precision, thereby improving allocation efficiency and accuracy.

Description

City event allocation method, system and storage medium

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a city event distributing method, a city event distributing system and a storage medium.

Background

Driven by a new technological revolution, society is accelerating toward a digital society, whose development is inseparable from the construction of smart cities. Smart cities use information and communication technology to integrate the various city management systems, so that information resources are shared and business is coordinated among them. However, as smart cities are built out and put into full operation, the types and volume of urban event data keep growing, and as urban informatization improves, the sources of event data become more complex, which makes urban event allocation inefficient.

Urban event allocation currently has two problems. First, events are dispatched by allocation staff based on subjective judgment, so the business process is inefficient and allocation accuracy is low. Second, event data arrives through many channels, and a single event may be reported several times through several channels; because event data is generally unstructured, the full set of event data cannot be deduplicated, so the same event is allocated more than once.

Prior art CN114446287A discloses a city event allocation method and system based on NLP and GIS. The city is divided into a plurality of grid areas in advance; based on GIS spatial analysis, the region division data of the business departments and of the supervision departments are combined to determine the business department and supervision department corresponding to each grid area. The event allocation method comprises: acquiring urban event data, wherein the event data comprises a comprehensive description of the event and position information; determining the service type of the event and the grid area to which it belongs according to the comprehensive description and the position information; and determining the corresponding business department and supervision department according to the service type of the event and the grid area to which it belongs. In this prior art, the event type is determined by the number of matched keywords. For event descriptions that carry a lot of information, however, word segmentation yields many words, and using all of them for matching increases the amount of calculation and reduces matching efficiency.

Disclosure of Invention

In order to solve the technical problems in the prior art, the invention provides a method, a system and a storage medium for distributing urban events, which structure the event data and allocate events intelligently, improving the efficiency and accuracy of urban event distribution.

In order to achieve the above purpose, the technical scheme of the invention is as follows:

A method of urban event distribution, comprising:

S1, aggregating city historical event data, and extracting the structured data in the event data;

S2, constructing a preset event list library by using the structured data of the city historical event data from step S1;

S3, acquiring reported event data, segmenting the reported event data into words according to its summary and position information, and extracting event keywords;

S4, identifying the event according to the event keywords extracted in step S3, and determining the event type;

S5, matching the event type determined in step S4 against the preset event list library, and determining the event service type and the responsible department to which it belongs;

S6, carrying out event allocation according to the matching result of step S5.

Further, the city historical event data in step S1 includes structured data and unstructured data; the unstructured data is cleaned, the structured data in it is identified, and the identified structured data is marked.

Still further, the structured data includes time, place, event type, treatment department, category level, and the like.

Further, in step S3, HanLP is used for word segmentation.

Further, the keywords in step S3 are selected as follows: the word frequency inverse document frequency (TF-IDF) value of the jth word in the ith text data is calculated, the TF-IDF values of all words are arranged in descending order, and a number of words are taken from the top of that order as keywords.

Further, the word frequency inverse document frequency (TF-IDF) value is calculated by the following method:

$$\mathrm{TFIDF}_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_{k}, \quad \text{with } \mathrm{WORD}_{ij} = \mathrm{gWORD}_{k}$$

where $\mathrm{TFIDF}_{ij}$ represents the TF-IDF value of the jth word in the ith text data, $\mathrm{TF}_{ij}$ represents the frequency of occurrence of the jth word in the ith text, $\mathrm{IDF}_{k}$ represents the inverse document frequency of the kth global word, $\mathrm{WORD}_{ij}$ represents the actual character string of the jth word in the ith text, and $\mathrm{gWORD}_{k}$ represents the actual character string of the kth global word.

Further, the frequency of occurrence $\mathrm{TF}_{ij}$ of the jth word in the ith text is calculated as follows:

where $\mathrm{NUM}_{ij}$ represents the number of occurrences of the jth word in the ith text, and r represents the number of different words in the ith text.

Further, the inverse document frequency $\mathrm{IDF}_{k}$ of the kth global word is calculated as follows:

where $\mathrm{TOTAL}_{k}$ represents the total number of text data entries containing the kth global word, $\mathrm{TOTAL}_{r}$ represents the total number of text data entries containing the rth global word, and r represents the number of different words in the text.
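The formula images for the term frequency and the inverse document frequency are not reproduced in this text. A conventional form that is consistent with the symbol definitions above might look as follows. This is only a sketch: the logarithm and the total entry count N are assumptions, and the source's reference to TOTAL_r suggests that its own IDF expression sums over the global words in a way that cannot be recovered here.

```latex
% Assumed conventional forms, not taken from the source.
\mathrm{TF}_{ij} = \frac{\mathrm{NUM}_{ij}}{\sum_{m=1}^{r} \mathrm{NUM}_{im}},
\qquad
\mathrm{IDF}_{k} = \log \frac{N}{\mathrm{TOTAL}_{k}}
```

Under these assumed forms, a word that appears in every text data entry has an inverse document frequency of zero and is never selected as a keyword.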

Further, the number of keywords is limited to 10 or less.

Further, the condition for selecting a word as a keyword includes: $\mathrm{TFIDF}_{ij} \geq 0.025$.

The invention also provides an urban event distribution system which comprises an acquisition module, a preprocessing module and an extraction and identification module, wherein the acquisition module is used for acquiring urban historical event data, the preprocessing module is used for carrying out structuring processing on the urban historical event data, and a preset event list library is constructed according to the event data after structuring processing; the extraction and identification module is used for extracting keywords from the reported event data, identifying the reported event according to the extracted keywords, and displaying the service type and the responsibility department of the reported event.

The present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the urban event distribution method described above.

Compared with the prior art, the invention has the following beneficial effects:

According to the urban event distribution method provided by the invention, historical events are first structured and a preset event list library is constructed from them; reported events are then segmented into words and keywords are extracted. The keywords are ranked by their word frequency inverse document frequency (TF-IDF) value and screened against preset conditions, so that the number of keywords is reduced and suitable keywords are selected accurately. This improves matching efficiency and precision, and in turn distribution efficiency and accuracy.

Drawings

Fig. 1 is a flowchart of a city event allocation method provided by the present invention.

FIG. 2 is a flow chart of a method for structuring in the present invention.

Fig. 3 is a framework diagram of the urban event distribution system provided by the invention.

Detailed Description

The technical solutions of the present invention are described clearly below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of protection of the present invention.

It is noted that the relative arrangement of the components and steps, numerical expressions, set forth in these embodiments should not be construed as limiting the scope of the present invention unless it is specifically stated otherwise.

The following description of the exemplary embodiment(s) is merely illustrative, and is in no way intended to limit the invention, its application, or uses. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail herein, but where applicable, should be considered part of the present specification.

The invention provides a city event distributing method, as shown in figure 1, comprising the following steps:

City historical event data is aggregated. For the different kinds of business events handled in past city governance, such as city management, traffic and municipal administration, the unstructured data of the historical event cases is structured to obtain structured data. Unstructured data related to urban events is collected from various channels, such as news reports, social media and public databases. As shown in Fig. 2, the text data is cleaned and preprocessed, which includes removing useless punctuation marks and special characters, normalizing letter case, removing stop words and correcting spelling; key entities in the text, such as places, persons and organizations, are then identified and marked by named entity recognition.

The events are then classified: the text data is classified according to the content and characteristics of each event. A classifier may be trained with a supervised learning method, such as naive Bayes or a support vector machine, or classification may use rule matching or keyword matching. The time at which the event occurred is extracted from the text; dates, times and similar information can be captured with regular expressions or natural language processing techniques. The geographic location of the event can be extracted from the text by place name recognition or geocoding, and the location can be resolved and encoded with an existing map service or geographic information system.
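As an illustration of the regular-expression option mentioned above, the following minimal sketch extracts the first timestamp from an event description. The two date patterns and the function name `extract_event_time` are assumptions chosen for the example; the source does not specify which formats the reported data uses.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical formats: ISO-like "2023-11-16 08:30" and Chinese "2023年11月16日8时30分".
# The time-of-day part is optional in both patterns.
_PATTERNS = [
    re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})(?:\s+(\d{1,2}):(\d{2}))?"),
    re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日(?:(\d{1,2})时(\d{1,2})分)?"),
]


def extract_event_time(text: str) -> Optional[datetime]:
    """Return the first date (and time, if present) found in the text, else None."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            year, month, day, hour, minute = match.groups()
            # A missing time of day defaults to midnight.
            return datetime(int(year), int(month), int(day),
                            int(hour or 0), int(minute or 0))
    return None
```

A production system would also need to validate out-of-range values and handle relative expressions such as "last night", which this sketch does not attempt.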

The processed structured data is converted to a format suitable for storage and analysis, as shown in Table 1:

Table 1. Example of the structured data storage format

The event list comprises, for each kind of business event, its type classification, a number of keywords, the department that handles it, and so on. For a street lamp inclination event, for example, the business type is construction management, the primary type is public facilities, the secondary type is street lamp components, the problem types include inclination, flickering and extinction, and the keywords include street lamp inclination, street lamp flickering and street lamp extinction. The preset event list library also comprises keywords of the solution and the disposal department.
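One possible in-memory shape for an entry of the preset event list library is sketched below, using the street lamp example from the text. The class name, field names and the keyword index are illustrative assumptions, and the department value is a placeholder because the text does not name one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EventListEntry:
    """One record of the preset event list library (field names are illustrative)."""
    business_type: str
    primary_type: str
    secondary_type: str
    problem_types: List[str]
    keywords: List[str]
    department: str


def build_keyword_index(entries: List[EventListEntry]) -> Dict[str, EventListEntry]:
    """Map every keyword to the entry that declares it, for direct lookup."""
    index: Dict[str, EventListEntry] = {}
    for entry in entries:
        for keyword in entry.keywords:
            # Keep the first entry if two entries declare the same keyword.
            index.setdefault(keyword, entry)
    return index


STREET_LAMP = EventListEntry(
    business_type="construction management",
    primary_type="public facilities",
    secondary_type="street lamp components",
    problem_types=["inclination", "flickering", "extinction"],
    keywords=["street lamp inclination", "street lamp flickering",
              "street lamp extinction"],
    department="street lighting department",  # hypothetical; not named in the text
)

index = build_keyword_index([STREET_LAMP])
```

Looking up `index["street lamp flickering"]` then returns the whole entry, including the responsible department.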

Further, in step S3, HanLP is used for word segmentation. The HanLP segmentation result consists of words and part-of-speech tags; each word is separated from its tag by "/", and adjacent words are separated by a space. When the segmentation results of all texts are traversed, all punctuation is ignored based on the part-of-speech information, and the number of occurrences of each word in the text is recorded in an ordered map, so that the words are sorted in dictionary order and each is associated with its count.
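The counting step can be sketched as follows for a segmentation result that is already available as a "word/tag" string. The sketch does not call HanLP itself; it assumes that punctuation carries a part-of-speech tag beginning with "w", which should be checked against the tag set of the HanLP version in use. The function name `count_words` is hypothetical.

```python
from collections import Counter
from typing import Dict

PUNCT_TAG_PREFIX = "w"  # assumption: punctuation tags start with "w"


def count_words(segmented: str) -> Dict[str, int]:
    """Count words in a 'word/tag word/tag ...' string, skipping punctuation."""
    counts: Counter = Counter()
    for token in segmented.split():
        # Split on the last "/" so that a word containing "/" stays intact.
        word, sep, tag = token.rpartition("/")
        if not sep or tag.startswith(PUNCT_TAG_PREFIX):
            continue
        counts[word] += 1
    # A dict built from sorted items plays the role of the ordered map.
    return dict(sorted(counts.items()))
```

Sorting by Python's default string order gives code point order, which stands in here for the dictionary order described in the text.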

$\mathrm{NUM}_{ij}$ represents the number of occurrences of the jth word in the ith text, and $\mathrm{WORD}_{ij}$ represents the actual character string of the jth word in the ith text. In addition, the unique words that have appeared in all text data are summarized, i.e., all words are recorded globally, together with the number of text data entries in which each word has appeared. $\mathrm{TOTAL}_{k}$ represents the total number of text data entries containing the kth global word, and $\mathrm{gWORD}_{k}$ represents the actual character string of the kth global word. $\mathrm{TF}_{ij}$ is defined as the frequency of occurrence of the jth word in the ith text, and $\mathrm{IDF}_{k}$ as the inverse document frequency of the kth global word. $\mathrm{TF}_{ij}$ and $\mathrm{IDF}_{k}$ can be obtained by the following calculations.

where $\mathrm{NUM}_{ij}$ represents the number of occurrences of the jth word in the ith text, r represents the number of different words in the text, $\mathrm{TOTAL}_{k}$ represents the total number of text data entries containing the kth global word, and $\mathrm{TOTAL}_{r}$ represents the total number of text data entries containing the rth global word.

Based on the above formulas, $\mathrm{TF}_{ij}$ and $\mathrm{IDF}_{k}$ are calculated for each word in the city event data, where $\mathrm{WORD}_{ij}$ and $\mathrm{gWORD}_{k}$ must be the same word. The result is recorded as $\mathrm{TFIDF}_{ij}$, the word frequency inverse document frequency (TF-IDF) value of the jth word in the ith text data. Finally, the main keywords of the city event data are obtained by sorting these values in descending order and taking the top-ranked words. The calculation formula is as follows:

$$\mathrm{TFIDF}_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_{k}, \quad \text{with } \mathrm{WORD}_{ij} = \mathrm{gWORD}_{k}$$

In the method, two thresholds are set to limit the number of keywords and to screen them. The first threshold, KEYNUM, is the number of keywords for each piece of city event data; the method sets it to the integer 10. The second threshold, MINTFIDF, is the minimum TF-IDF value a word must reach to be identified as a keyword; the method sets it to the floating-point number 0.025.
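The ranking and the two thresholds can be sketched as follows. KEYNUM and MINTFIDF take the values stated in the text; the TF and IDF expressions are assumed conventional forms (count divided by token count, and the natural logarithm of entry count over TOTAL_k), because the source formulas are not reproduced here. The function name `select_keywords` and the alphabetical tie-break are also assumptions.

```python
import math
from collections import Counter
from typing import List

KEYNUM = 10        # maximum number of keywords per event (from the text)
MINTFIDF = 0.025   # minimum TF-IDF value for a keyword (from the text)


def select_keywords(docs: List[List[str]], i: int) -> List[str]:
    """Return up to KEYNUM keywords for docs[i], ranked by TF-IDF."""
    n_docs = len(docs)
    # TOTAL_k: number of entries that contain each global word.
    total = Counter(word for doc in docs for word in set(doc))
    counts = Counter(docs[i])            # NUM_ij
    n_tokens = sum(counts.values())
    scores = {
        # Assumed forms: TF = NUM / token count, IDF = ln(N / TOTAL).
        word: (num / n_tokens) * math.log(n_docs / total[word])
        for word, num in counts.items()
    }
    # Descending score; ties broken alphabetically so the result is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, score in ranked if score >= MINTFIDF][:KEYNUM]
```

Each document is a list of already segmented words with punctuation removed. The threshold is applied before the cut to KEYNUM, so a text with few distinctive words may yield fewer than ten keywords.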

The keywords of each piece of city event data are thus obtained. When the keywords of two pieces of city event data are identical and the absolute difference between the timestamp values in their time fields is smaller than a certain critical value, the two pieces of text data are considered to belong to the same event.
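The duplicate test described above can be sketched as follows. The record type, the function names and the one-hour gap are hypothetical; the text only says that the timestamp difference must be below "a certain critical value".

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class EventRecord:
    event_id: str
    keywords: FrozenSet[str]
    timestamp: float  # seconds since some epoch


MAX_GAP_SECONDS = 3600.0  # hypothetical critical value; not given in the text


def is_same_event(a: EventRecord, b: EventRecord,
                  max_gap: float = MAX_GAP_SECONDS) -> bool:
    """Identical keyword sets and a timestamp difference below the critical value."""
    return a.keywords == b.keywords and abs(a.timestamp - b.timestamp) < max_gap


def deduplicate(records: List[EventRecord]) -> List[EventRecord]:
    """Keep the first report of each event and drop later duplicates."""
    kept: List[EventRecord] = []
    for record in records:
        if not any(is_same_event(record, earlier) for earlier in kept):
            kept.append(record)
    return kept
```

This pairwise comparison is quadratic in the number of records; grouping records by keyword set first would be a natural optimization for large volumes.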

For example, take the event description "The street lamps on the middle section of Shenhai Road have remained off, so that pedestrians and vehicles cannot see road surface obstacles at night". Using the method in step S3, word segmentation is performed first, giving "Shenhai Road/middle section/street lamp/always/in/off/state/, /causing/pedestrians/and/vehicles/at night/unable to see/road surface/obstacles"; after word segmentation, the keywords of the event information are extracted according to step S3.

S4, identifying the event according to the event keywords extracted in step S3, and determining information such as the event type, area, event and degree of loss;

The invention adopts an event identification method based on rule matching, matching against the information in Table 2.

Table 2. Matching data format

Define STR as the character string of the name to be matched. By string comparison, the number of Chinese characters that STR shares with the event type string of each record in the database is calculated; $\mathrm{nSTR}_{i}$ denotes the number of Chinese characters shared by the type to be matched and the ith type name in the database. Define the similarity $\mathrm{SIM}_{i}$ as the similarity between the type to be matched and the ith type, and let len(STR) denote the length of the name string to be matched. The similarity $\mathrm{SIM}_{i}$ can then be expressed as:

Based on this formula, the similarity between the event to be matched and every event type can be obtained, and sorting the similarities yields the event type name with the highest similarity.
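The matching step can be sketched as follows, assuming the similarity is the ratio of shared characters to the length of the string to be matched, which is the natural reading of the definitions of nSTR_i and len(STR) above. Counting shared characters as a multiset intersection is a further assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def similarity(query: str, type_name: str) -> float:
    """Assumed form: shared character count divided by len(query)."""
    if not query:
        return 0.0
    # Multiset intersection: a character counts as often as it occurs in both.
    shared = sum((Counter(query) & Counter(type_name)).values())
    return shared / len(query)


def best_match(query: str, type_names: List[str]) -> str:
    """Return the type name with the highest similarity to the query."""
    return max(type_names, key=lambda name: similarity(query, name))
```

For instance, `similarity("路灯熄灭", "路灯倾斜")` shares two of four characters and evaluates to 0.5. When several type names tie, `max` returns the first one in the list.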

The invention also provides a city event distributing system, as shown in figure 3, comprising an acquisition module, a preprocessing module and an extraction and identification module, wherein the acquisition module is used for acquiring city historical event data, the preprocessing module is used for carrying out structuring processing on the city historical event data, and a preset event list library is constructed according to the event data after structuring processing; the extraction and identification module is used for extracting keywords from the reported event data, identifying the reported event according to the extracted keywords, and displaying the service type and the responsibility department of the reported event.

The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to examples, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the scope of the technical solution of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. A method for distributing urban events, comprising:

2. The method according to claim 1, wherein the city history event data in step S1 comprises structured data and unstructured data, the unstructured data is cleaned, the structured data is identified, and the identified structured data is marked.

3. The method of claim 1, wherein the segmentation is performed in step S3 using HanLP; the specific method for selecting the keywords in the step S3 is as follows: calculating the word frequency inverse document frequency value of the jth word in the ith text data, and arranging the word frequency inverse document frequency values of all the words in a descending order, and intercepting a plurality of words from large to small as keywords.

4. A method according to claim 3, wherein the number of keywords is limited to 10 or less.

5. A method according to claim 3, wherein the term frequency inverse document frequency value is calculated by:

$$\mathrm{TFIDF}_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_{k}, \quad \text{with } \mathrm{WORD}_{ij} = \mathrm{gWORD}_{k}$$

wherein $\mathrm{TFIDF}_{ij}$ represents the word frequency inverse document frequency value of the jth word in the ith text data, $\mathrm{TF}_{ij}$ represents the frequency of occurrence of the jth word in the ith text, $\mathrm{IDF}_{k}$ represents the inverse document frequency of the kth global word, $\mathrm{WORD}_{ij}$ represents the actual character string of the jth word in the ith text, and $\mathrm{gWORD}_{k}$ represents the actual character string of the kth global word.

6. The method of claim 5, wherein the frequency of occurrence $\mathrm{TF}_{ij}$ of the jth word in the ith text is calculated as follows:

7. The method of claim 5, wherein the inverse document frequency $\mathrm{IDF}_{k}$ of the kth global word is calculated as follows:

8. The method of claim 5, wherein the condition for selecting a word as a keyword comprises: $\mathrm{TFIDF}_{ij} \geq 0.025$.

9. The urban event distribution system is characterized by comprising an acquisition module, a preprocessing module and an extraction and identification module, wherein the acquisition module is used for acquiring urban historical event data, the preprocessing module is used for carrying out structuring processing on the urban historical event data, and a preset event list library is constructed according to the event data after structuring processing; the extraction and identification module is used for extracting keywords from the reported event data, identifying the reported event according to the extracted keywords, and displaying the service type and the responsibility department of the reported event.

10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the urban event distribution method according to any of claims 1-8.