CN117591578B - Data mining system and mining method based on big data
- Publication number
- CN117591578B CN202410073443.7A
- Authority
- CN
- China
- Prior art keywords
- data
- image
- module
- video
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data mining system and a data mining method based on big data, belonging to the technical field of big data. To address the limited types of data captured and the low data utilization rate, a data acquisition unit collects and captures basic data, image data and video data files. A text reading module, an image processing module and a video processing module process data files in various formats, so that multi-format data can be classified and mined more comprehensively and the data mining effect is improved. A data mining unit extracts key bytes of the captured data set to check for and reject abnormal data. Checking the key bytes allows the data in the captured data set to be verified and screened, which reduces the cost and difficulty of data mining. Data can thus be mined, stored and utilized comprehensively and systematically within a unified system, which improves the data utilization rate.
Description
Technical Field
The invention relates to the technical field of big data, in particular to a data mining system and a mining method based on big data.
Background
Data mining refers to the process of analyzing a large amount of collected data with suitable statistical analysis methods, extracting useful information and forming conclusions so that the data can be studied and summarized in detail. This process also supports the quality management system.
Related patents, such as publication number CN106339451A, disclose a data mining system based on big data that includes an information system, a data mining application server and an industry client. The information system collects and processes industry data according to conditions preset by the user and feeds the industry data into the system through a bus. The data mining application server extracts, transforms and loads the industry data preset by the user and imports the data mining result into the industry client. The industry client provides the final analyzed and processed data for the user to retrieve. Users can preset different industry data, such as bank data, gene sequences and financial control data, according to their own requirements, and the data mining application server performs targeted analysis according to the user's preset conditions; the system is described as simple in structure, clear in purpose and high in efficiency.
The above patent has the following problems in actual operation:
1. The data captured before mining is in a single format, so few types of data are captured and coverage is narrow, which impairs the data mining effect;
2. Existing data analysis systems can only perform simple statistical processing on existing data and cannot perform in-depth data mining analysis of an enterprise's operating state based on that data, so the data cannot be fully utilized.
Disclosure of Invention
The invention aims to provide a data mining system and a mining method based on big data, so as to solve the problems described in the Background.
In order to achieve the above purpose, the present invention provides the following technical solutions: a big data based data mining system, comprising:
the data acquisition unit is used for:
collecting basic data, image data and video data files, generating a basic data file, reading text information of the basic data to obtain a keyword set, extracting text information and picture characteristics of the image data file, capturing the keyword set in the basic data and the text information and picture characteristics in the image data file, and generating a captured data set;
a data storage unit for:
interacting with a cloud platform, carrying out data set distributed storage and encryption on a keyword set in basic data, text information and picture characteristics in an image data file and a key region in a video data file, and sharing data through a network on the basis of the cloud platform;
a data mining unit for:
extracting key bytes of the captured data set, checking them and rejecting abnormal data to generate de-abnormal data, and cleaning the basic data file based on the de-abnormal data to generate a determined data set;
a data feedback unit for:
carrying out data retrieval and retrieval result display and reminding on the cloud platform;
cloud platform for:
after the classified information is stored and edited through the cloud, the classified information is transmitted to a data storage unit, and feedback information is returned;
a user terminal for:
storing, running and implementing the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit; the user terminal comprises at least one login end and at least one control terminal; when the control terminal works, it runs the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit, thereby implementing the data mining system based on big data; and the login end is connected to the control terminal, the cloud platform and the server through the Internet.
Further, the data acquisition unit includes:
the file acquisition module is used for:
collecting basic data, image data and video data files, and generating a basic data file based on the basic data, the image data and the video data files;
a text reading module for:
reading the text information of the basic data, dividing the text information to obtain a plurality of extracted words in the text information, and cleaning the extracted words according to the part-of-speech statistical characteristics to obtain a keyword set;
an image processing module for:
extracting text information and picture characteristics of the image data files, creating an association stamp for each image data file, each association stamp being globally unique, and associating the extracted text information and picture characteristics of each image data file with its association stamp;
the video processing module is used for:
cutting out a video key segment of a video data file, framing the video key segment to obtain a plurality of frames of video images, and determining a key area in each video image, wherein each video image comprises a key area;
the information grabbing module is used for:
capturing the keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file, and generating a captured data set based on the captured data.
Further, the data storage unit includes:
a data storage module for:
interacting with the cloud platform, storing the keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file in a distributed manner, and locating data storage information through the cloud platform;
a data encryption module for:
interacting with the data storage module, and encrypting the distributed storage of the data set;
a data sharing module, configured to:
interacting with the cloud platform, processing data in the cloud platform and sharing the data through a network.
Further, the data mining unit includes:
the abnormality rejection module is used for:
extracting key bytes of the captured data set for verification, performing association analysis on the key bytes, determining the distinguishing key bytes that are abnormal, extracting the abnormal data corresponding to those distinguishing key bytes from each item of data and rejecting it, and generating de-abnormal data from the captured data set after the abnormal data has been rejected;
the data cleaning module is used for:
cleaning the basic data file based on the de-abnormal data, wherein during data cleaning the basic data file is cleaned and screened based on the association stamps corresponding to the de-abnormal data, to generate a determined data set.
Furthermore, during verification the abnormality rejection module performs association analysis on the captured data set: it builds a data association analysis model, inputs the captured data set into the model for data analysis, and outputs an analysis report based on the data analysis result.
Further, the data feedback unit includes:
the quick search module is used for:
interacting with the cloud platform and providing query service based on an index system;
the data feedback module is used for:
displaying the retrieval result of the quick retrieval module and issuing reminders through the display equipment.
Further, the cloud platform includes:
cloud database for:
classifying and storing a keyword set in the received basic data, text information and picture characteristics in an image data file and a key area in a video data file according to a data stream label;
a data processing module for:
grouping the classified stored data according to stream attribute information and data content, and classifying and marking, wherein the grouping comprises a basic data set, an image data set and a video data set;
the data matching module is used for:
matching the data sets grouped by the data processing module against the attribute information of the data storage unit to generate the data requirements of the corresponding data streams, and linking the data requirements of the data streams with the data sets.
Further, the data encryption module includes:
an encryption management sub-module for:
dividing the data set into a plurality of parts according to the distributed storage condition of the data set, registering an encryption method in each part, then carrying out use record aiming at the encryption method, carrying out key management aiming at a key in the use process of the encryption method, and forming a key index by combining the key management in the use record;
an encryption processing sub-module for:
interacting with the data storage module to obtain the distributed storage information of the distributed data set storage, encrypting the distributed storage information with the encryption method in the corresponding part, and feeding the key used in the encryption processing back to the encryption management sub-module.
Further, when determining the key region in each video image, the video processing module makes the determination in combination with the plurality of frames of video images obtained, including:
image recognition is carried out on the video image, imaging conditions in the video image are recognized, and an image recognition result is obtained;
dividing the video image into a plurality of areas according to the image recognition result;
the video image is analyzed in combination with the adjacent frame video image by the following formula:
[The two formulas are images in the original publication and are missing from this text; the symbols in the definitions below were lost in the same way.]
In the above formulas, the quantities defined are, in order: the analysis data value of a block region; the sign function; the image information of a feature point of that block region in each of three frames of video images (the preceding step refers to the video image and its adjacent frames); the number of feature points of that block region in each of the same three frames; the total number of frames of video images; and the analysis result of the block region. At one value of the analysis result the block region is used to constitute the key region, and at the other value it is not;
the regions for constituting the key regions are combined together to form the key regions of the video image with reference to the analysis result.
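The formulas for this step are not legible in this text, so the following Python sketch does not reproduce them. It only illustrates the general shape the surrounding definitions suggest: a per-region score computed from feature points in a frame and its adjacent frames, passed through a sign function, with the selected regions combined into the key region. The scoring rule (mean absolute difference), the `threshold` parameter and the data layout are all assumptions made for illustration.

```python
def sign(x):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (x > 0) - (x < 0)

def region_score(prev, cur, nxt):
    """Hypothetical score: mean absolute difference between the feature
    values of a region in the current frame and in its adjacent frames."""
    pairs = list(zip(prev, cur)) + list(zip(nxt, cur))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def key_regions(frames, threshold):
    """For every frame that has both neighbours, return the set of region
    ids whose score exceeds the (assumed) threshold.

    frames: list of dicts mapping region id -> list of feature values.
    """
    result = []
    for k in range(1, len(frames) - 1):
        selected = set()
        for region, cur in frames[k].items():
            prev = frames[k - 1].get(region, [])
            nxt = frames[k + 1].get(region, [])
            if sign(region_score(prev, cur, nxt) - threshold) > 0:
                selected.add(region)
        result.append(selected)  # regions combined into the key region
    return result

# Region "a" is static across frames, region "b" changes strongly.
frames = [
    {"a": [1.0, 1.0], "b": [5.0]},
    {"a": [1.0, 1.0], "b": [9.0]},
    {"a": [1.0, 1.0], "b": [5.0]},
]
selected = key_regions(frames, threshold=1.0)
```

In this toy input only region "b" changes between adjacent frames, so it is the only region selected for the middle frame.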
The invention provides a mining method of a data mining system based on big data, which comprises the following steps:
step one: the data acquisition unit collects basic data, image data and video data files, generates a basic data file, performs feature capture based on the basic data file and generates a captured data set;
step two: the data storage unit stores the captured data set in a distributed manner and encrypts it;
step three: the data mining unit rejects abnormal data, cleans the data and generates a determined data set;
step four: the data storage unit stores the determined data set based on the cloud platform;
step five: the data storage unit shares data through a network, and the data feedback unit retrieves and displays the data based on the cloud platform.
Compared with the prior art, the invention has the beneficial effects that:
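1. In the prior art, the data captured before mining is in a single format, so few types of data are captured and coverage is narrow, which impairs the data mining effect. In the invention, the text reading module, the image processing module and the video processing module process data files in various formats, so that multi-format data can be classified and mined more comprehensively.
2. In the prior art, data analysis systems can only perform simple statistical processing on existing data and cannot perform in-depth data mining analysis of an enterprise's operating state based on that data, so the data is difficult to utilize fully. In the invention, the key bytes of the captured data set are checked so that abnormal data is screened out and rejected, which reduces the cost and difficulty of data mining and improves the data utilization rate.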
Drawings
FIG. 1 is a schematic diagram of a system module according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a data mining system based on big data includes:
the data acquisition unit is used for:
collecting basic data, image data and video data files, generating a basic data file, reading text information of the basic data to obtain a keyword set, extracting text information and picture characteristics of the image data file, capturing the keyword set in the basic data and the text information and picture characteristics in the image data file, and generating a captured data set;
a data storage unit for:
interacting with a cloud platform, carrying out data set distributed storage and encryption on a keyword set in basic data, text information and picture characteristics in an image data file and a key region in a video data file, and sharing data through a network on the basis of the cloud platform;
a data mining unit for:
extracting key bytes of the captured data set, checking them and rejecting abnormal data to generate de-abnormal data, and cleaning the basic data file based on the de-abnormal data to generate a determined data set;
a data feedback unit for:
carrying out data retrieval and retrieval result display and reminding on the cloud platform;
cloud platform for:
after the classified information is stored and edited through the cloud, the classified information is transmitted to a data storage unit, and feedback information is returned;
a user terminal for:
storing, running and implementing the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit; the user terminal comprises at least one login end and at least one control terminal; when the control terminal works, it runs the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit, thereby implementing the data mining system based on big data; and the login end is connected to the control terminal, the cloud platform and the server through the Internet.
Specifically, when the system works, the data acquisition unit first collects basic data, image data and video data files and generates a basic data file, then performs feature capture based on the basic data file and generates a captured data set. The data storage unit next stores the captured data set in a distributed manner and encrypts it. The data mining unit then rejects abnormal data, cleans the data and generates a determined data set, which the data storage unit stores on the cloud platform. When the data needs to be applied, the data storage unit shares the data through a network, and the data feedback unit retrieves and displays the data based on the cloud platform.
In order to solve the technical problem that the data captured before mining is in a single format, so that few types of data are captured and coverage is narrow, which impairs the data mining effect, the invention provides the following technical solution:
the data acquisition unit includes:
the file acquisition module is used for:
collecting basic data, image data and video data files, and generating a basic data file based on the basic data, the image data and the video data files;
a text reading module for:
reading the text information of the basic data, dividing the text information to obtain a plurality of extracted words in the text information, and cleaning the extracted words according to the part-of-speech statistical characteristics to obtain a keyword set;
an image processing module for:
extracting text information and picture characteristics of the image data files, creating an association stamp for each image data file, each association stamp being globally unique, and associating the extracted text information and picture characteristics of each image data file with its association stamp;
the video processing module is used for:
cutting out a video key segment of a video data file, framing the video key segment to obtain a plurality of frames of video images, and determining a key area in each video image, wherein each video image comprises a key area;
the information grabbing module is used for:
capturing the keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file, and generating a captured data set based on the captured data.
Specifically, the text reading module, the image processing module and the video processing module process data files in various formats, so that multi-format data can be classified and mined more comprehensively. This broadens the range of data that can be processed and improves the data mining effect; widening the range of data types also improves the accuracy and breadth of mining.
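As an illustration of the text reading module, the following Python sketch segments text into words, cleans the extracted words and keeps the frequent ones as a keyword set. The source cleans words by part-of-speech statistical characteristics; here a small hand-written function-word list stands in for a part-of-speech tagger, and the `min_count` frequency rule is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical function-word list standing in for part-of-speech filtering.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "and", "or", "to", "in",
    "is", "are", "for", "on", "with",
}

def extract_keywords(text, min_count=2):
    """Segment text into words, drop function words, keep frequent words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    counts = Counter(content_words)
    return {word for word, count in counts.items() if count >= min_count}

sample = "The data mining unit cleans the data and the mining unit stores data."
keywords = extract_keywords(sample)
```

A production system would replace the word list with a real tagger and a segmentation method suited to the language of the source documents.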
The data storage unit includes:
a data storage module for:
interacting with the cloud platform, storing the keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file in a distributed manner, and locating data storage information through the cloud platform;
a data encryption module for:
interacting with the data storage module, and encrypting the distributed storage of the data set;
a data sharing module, configured to:
interacting with the cloud platform, processing data in the cloud platform and sharing the data through a network.
Specifically, the data storage unit stores the data set in a distributed manner through the cloud platform, which reduces the storage space required on the server. Encryption improves data security, and the interaction between the data sharing module and the cloud platform makes it convenient to retrieve and apply the various kinds of data.
In order to solve the technical problem that existing data analysis systems can only perform simple statistical processing on existing data and cannot perform in-depth data mining analysis of an enterprise's operating state based on that data, so that the data is difficult to utilize fully, the invention provides the following technical solution:
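The source does not say how data is placed across storage locations. As one possible reading, the Python sketch below assigns each item to a node by hashing its identifier and keeps a location table, which corresponds loosely to locating data storage information through the cloud platform. The node names, item identifiers and the hash-modulo placement rule are all hypothetical.

```python
import hashlib

def assign_node(item_id, nodes):
    """Pick a storage node for an item from a stable hash of its identifier."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

def store_distributed(items, nodes):
    """Place each item on a node; return (node contents, location table)."""
    contents = {node: {} for node in nodes}
    locations = {}
    for item_id, payload in items.items():
        node = assign_node(item_id, nodes)
        contents[node][item_id] = payload
        locations[item_id] = node  # lets a later lookup find the item
    return contents, locations

nodes = ["node-a", "node-b", "node-c"]
items = {
    "keywords/doc1": ["data", "mining"],
    "image/stamp42": "text and picture characteristics",
    "video/clip7": "key region",
}
contents, locations = store_distributed(items, nodes)
```

Hash-modulo placement is simple but moves most items when the node list changes; a scheme such as consistent hashing would reduce that, at the cost of more code.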
the data mining unit includes:
the abnormality rejection module is used for:
extracting key bytes of the captured data set for verification and performing association analysis on the key bytes, wherein during verification the abnormality rejection module performs association analysis on the captured data set by building a data association analysis model, inputting the captured data set into the model for data analysis, and outputting an analysis report based on the data analysis result; determining the distinguishing key bytes that are abnormal, extracting the abnormal data corresponding to those distinguishing key bytes from each item of data and rejecting it, and generating de-abnormal data from the captured data set after the abnormal data has been rejected;
the data cleaning module is used for:
cleaning the basic data file based on the de-abnormal data, wherein during data cleaning the basic data file is cleaned and screened based on the association stamps corresponding to the de-abnormal data, to generate a determined data set.
Specifically, checking the key bytes allows the data in the captured data set to be verified and screened, so that duplicate and abnormal data is cleaned out and rejected before the data is mined. This reduces the cost and difficulty of data mining. Data can be mined, stored and utilized comprehensively and systematically, forming a unified system in which data is used in a targeted way, which improves the data utilization rate.
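The source does not define what the key bytes are or how they are checked. The Python sketch below assumes, purely for illustration, that each captured record carries a short checksum prefix as its key bytes: records whose stored key bytes do not match the recomputed value are treated as abnormal, exact duplicates are dropped, and the basic data files are then filtered by the association stamps of the surviving records. The record layout and all function names are hypothetical, and the association analysis model mentioned in the text is not modelled here.

```python
import hashlib

def key_bytes(payload, n=4):
    """Assumed definition: the first n bytes of the payload's SHA-256 digest."""
    return hashlib.sha256(payload).digest()[:n]

def reject_abnormal(records):
    """Keep records whose key bytes verify and whose payload is not a repeat."""
    seen_payloads = set()
    kept = []
    for rec in records:
        if rec["key_bytes"] != key_bytes(rec["payload"]):
            continue  # key bytes fail the check: treat as abnormal data
        if rec["payload"] in seen_payloads:
            continue  # duplicate of a record already kept
        seen_payloads.add(rec["payload"])
        kept.append(rec)
    return kept

def clean_base_files(base_files, de_abnormal):
    """Keep only base files whose association stamp survived the check."""
    valid_stamps = {rec["stamp"] for rec in de_abnormal}
    return {s: f for s, f in base_files.items() if s in valid_stamps}

good = key_bytes(b"beta")
corrupted = bytes([good[0] ^ 0xFF]) + good[1:]  # guaranteed mismatch
records = [
    {"stamp": "s1", "payload": b"alpha", "key_bytes": key_bytes(b"alpha")},
    {"stamp": "s2", "payload": b"beta", "key_bytes": corrupted},
    {"stamp": "s3", "payload": b"alpha", "key_bytes": key_bytes(b"alpha")},
]
de_abnormal = reject_abnormal(records)
determined = clean_base_files(
    {"s1": "a.txt", "s2": "b.txt", "s3": "c.txt"}, de_abnormal
)
```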
The data feedback unit includes:
the quick search module is used for:
interacting with the cloud platform and providing query service based on an index system;
the data feedback module is used for:
displaying the retrieval result of the quick retrieval module and issuing reminders through the display equipment.
The cloud platform includes:
cloud database for:
classifying and storing a keyword set in the received basic data, text information and picture characteristics in an image data file and a key area in a video data file according to a data stream label;
a data processing module for:
grouping the classified stored data according to stream attribute information and data content, and classifying and marking, wherein the grouping comprises a basic data set, an image data set and a video data set;
the data matching module is used for:
matching the data sets grouped by the data processing module against the attribute information of the data storage unit to generate the data requirements of the corresponding data streams, and linking the data requirements of the data streams with the data sets.
Specifically, the cloud database stores the data information set and the determined data set, both of which can be searched through the quick search module. When the data information set and the determined data set need to be applied, they can be searched and retrieved by creating a call task, and the results are displayed through the data feedback module.
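The index system behind the quick search module is not described further. One common way to provide such a query service is an inverted index from keyword to item identifiers, sketched below in Python; the document identifiers and the all-terms-must-match query semantics are assumptions.

```python
from collections import defaultdict

def build_index(documents):
    """Map each keyword to the set of item ids whose keyword set contains it."""
    index = defaultdict(set)
    for doc_id, keywords in documents.items():
        for keyword in keywords:
            index[keyword].add(doc_id)
    return index

def search(index, *terms):
    """Return the ids of items that contain every query term."""
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches) if matches else set()

documents = {
    "d1": {"data", "mining"},
    "d2": {"data", "video"},
}
index = build_index(documents)
both = search(index, "data")
video_only = search(index, "data", "video")
```

Using `index.get` rather than `index[term]` keeps a lookup for an unknown term from adding an empty entry to the index.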
The data encryption module includes:
an encryption management sub-module for:
dividing the data set into a plurality of parts according to the distributed storage condition of the data set, registering an encryption method in each part, then carrying out use record aiming at the encryption method, carrying out key management aiming at a key in the use process of the encryption method, and forming a key index by combining the key management in the use record;
an encryption processing sub-module for:
and the distributed storage information of the distributed storage of the data set is obtained through interaction with the data storage module, encryption processing is carried out on an encryption method in a corresponding part according to the distributed storage information, and meanwhile, a secret key in the encryption processing process is fed back to the encryption management sub-module.
The data encryption module in the technical scheme comprises an encryption management sub-module and an encryption processing sub-module, wherein the encryption management sub-module is divided into a plurality of parts according to the distributed storage condition of the data set in a distributed mode, each part is used for carrying out key management by registering an encryption method and a true-lost encryption method, and each part is used for registering the encryption method and corresponds to the distributed storage condition of the data set in a distributed mode; the encryption processing sub-module interacts with the data storage module to encrypt the distributed storage of the data set. When the data encryption module encrypts the distributed storage of the data set, the encryption processing sub-module interacts with the data storage unit to obtain distributed storage information of the distributed storage of the data set in the data storage unit, then partial matching is carried out according to the distributed storage information, the encryption method in the corresponding part is called according to the matching condition to carry out encryption processing on the distributed storage information, the secret key in the encryption process is fed back to the encryption management sub-module, meanwhile, the encryption management sub-module carries out use records on the calling condition of the encryption method in each part, when the secret key feedback is obtained, the fed-back secret key is managed, and meanwhile, a secret key index is formed by combining the management of the secret key in the use records, so that the original information of the distributed storage information can be obtained by decryption according to the secret key index when the encrypted distributed storage information is decrypted.
Through the encryption management sub-module and the encryption processing sub-module, the data encryption module encrypts the distributed storage information of the data set held in the data storage module, which improves the security of that information. Because the encryption management sub-module is divided into a plurality of parts according to the distribution condition of the distributed storage of the data set, the encryption method in each part encrypts the distributed storage information of the corresponding distribution condition. The keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file can therefore each be encrypted with a better-suited method, which improves the fit between the distributed storage information and the encryption method and raises the security of the distributed storage information. In addition, the encryption processing sub-module feeds the key used in the encryption process back to the encryption management sub-module, so that the encryption management sub-module can manage the keys; the key adopted during encryption is thus available for decryption when the distributed storage information is retrieved, ensuring that the original data is preserved for later use.
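The interplay between the two sub-modules can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class names `EncryptionManager` and `EncryptionProcessor`, the part names, and the record identifiers are hypothetical, and the cipher is a toy XOR keystream built from SHA-256 that merely stands in for whatever encryption method each part would register. It is not secure and is not taken from the source.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only, NOT secure).

    XORs the data with a keystream derived from SHA-256 in counter mode,
    so applying it twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


@dataclass
class EncryptionManager:
    """Hypothetical stand-in for the encryption management sub-module."""
    methods: dict = field(default_factory=dict)    # part name -> cipher function
    usage_log: list = field(default_factory=list)  # (part, record_id) use records
    key_index: dict = field(default_factory=dict)  # record_id -> (part, key)

    def register(self, part, method):
        """Register the encryption method for one part."""
        self.methods[part] = method

    def record_key(self, part, record_id, key):
        """Log the call and index the fed-back key for later decryption."""
        self.usage_log.append((part, record_id))
        self.key_index[record_id] = (part, key)


class EncryptionProcessor:
    """Hypothetical stand-in for the encryption processing sub-module."""

    def __init__(self, manager):
        self.manager = manager

    def encrypt(self, part, record_id, plaintext):
        method = self.manager.methods[part]            # match the part
        key = secrets.token_bytes(32)                  # fresh key per record
        ciphertext = method(key, plaintext)
        self.manager.record_key(part, record_id, key)  # feed the key back
        return ciphertext

    def decrypt(self, record_id, ciphertext):
        part, key = self.manager.key_index[record_id]  # look up the key index
        return self.manager.methods[part](key, ciphertext)


manager = EncryptionManager()
for part_name in ("keywords", "image_features", "video_key_regions"):
    manager.register(part_name, keystream_xor)

processor = EncryptionProcessor(manager)
sealed = processor.encrypt("keywords", "rec-1", b"big data, mining")
restored = processor.decrypt("rec-1", sealed)
```

In this sketch the key index is a plain dictionary keyed by record identifier; a real system would presumably keep keys in protected storage rather than alongside the use record.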
When determining the key region in each video image, the video processing module makes the determination in combination with the plurality of frames of video images obtained, and the determination comprises the following steps:
image recognition is carried out on the video image, imaging conditions in the video image are recognized, and an image recognition result is obtained;
dividing the video image into a plurality of areas according to the image recognition result;
the video image is analyzed in combination with the adjacent frame video image by the following formula:
[Two formulas appear at this point in the original publication; they were rendered as images and are not preserved in this text.]
in the above formulas, the symbols denote, in order: the analysis data value of a given block region; the sign function; the image information of a given feature point in that block region in each of three frames of video image (the frame under analysis and its adjacent frames); the number of feature points in that block region in each of those three frames; the total number of frames of the video image; and the analysis result of the block region. For one value of the analysis result the block region is used to constitute the key region, and for the other value it is not used to constitute the key region;
the regions for constituting the key regions are combined together to form the key regions of the video image with reference to the analysis result.
Specifically, the data processing module divides the video image according to the imaging of the people or objects that appear in it, which avoids splitting one imaged person or object across different regions and preserves the integrity of the key region. It then determines, in combination with adjacent frame video images, whether each region changes in the video image, so that the changed part serves as the data set of the video image; this reflects the characteristics of the video image while reducing redundant information. During analysis, each video image is compared with both its left and right adjacent frame video images, so that small changes between video images are not overlooked, which improves the accuracy of the region analysis and hence of the key region. In addition, the regions used to constitute the key region are combined according to the analysis result, so that the single key region included in each video image covers the changed part of the video image more comprehensively and represents the video data file more accurately.
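The idea of keeping only the regions that change relative to adjacent frames can be sketched in Python as follows. The formulas of the original publication are not reproduced in this text, so the sketch substitutes a hypothetical change metric: the absolute difference between a block's mean feature value in the current frame and in each neighbouring frame, passed through a sign function. The function names, the `tol` tolerance parameter, and the representation of a frame as a dictionary of feature-value lists are all assumptions made for illustration.

```python
from statistics import fmean


def sign(x: float) -> int:
    """Sign function: 1 for positive, 0 for zero, -1 for negative."""
    return (x > 0) - (x < 0)


def block_changed(prev, cur, nxt, tol=0.0) -> int:
    """Return 1 if the block differs from either adjacent frame, else 0.

    prev, cur, nxt are lists of feature values for the same block in the
    previous, current and next frame; an empty list means "no such frame".
    The mean-difference metric is an assumed stand-in, not the patent's formula.
    """
    if not cur:
        return 0
    cur_mean = fmean(cur)
    diffs = [abs(cur_mean - fmean(n)) for n in (prev, nxt) if n]
    change = max(diffs, default=0.0) - tol
    return 1 if sign(change) > 0 else 0


def key_region(frames, i, tol=0.0):
    """Combine the changed blocks of frame i into one key region.

    frames is a list of dicts mapping a block id to its feature values.
    Returns the sorted ids of the blocks that make up the key region.
    """
    cur = frames[i]
    prev = frames[i - 1] if i > 0 else {}
    nxt = frames[i + 1] if i + 1 < len(frames) else {}
    return sorted(
        block for block, feats in cur.items()
        if block_changed(prev.get(block, []), feats, nxt.get(block, []), tol)
    )


frames = [
    {"a": [1.0, 1.0], "b": [5.0, 5.0]},
    {"a": [1.0, 1.0], "b": [7.0, 9.0]},
    {"a": [1.0, 1.0], "b": [8.0, 8.0]},
]
middle_key_region = key_region(frames, 1)
```

Here block "a" is static across all three frames and is left out, while block "b" changes between the first and second frame and is kept. Comparing against both neighbours is what lets a change on only one side still mark the block.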
To better illustrate the big data based data mining system, this embodiment also provides a mining method of the system, which comprises the following steps:
step one: the data acquisition unit acquires basic data, image data and video data files and generates a basic data file, and performs characteristic grabbing based on the basic data file and generates a grabbing data set;
step two: the data storage unit performs data set distributed storage and encryption on the grabbing data set;
step three: the data mining unit cleans and rejects the abnormal data and generates a determined data set;
step four: the data storage unit stores the determined data set based on the cloud platform;
step five: the data storage unit shares data through a network, and the data feedback unit retrieves and displays the data based on the cloud platform.
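The five steps above can be read as a simple pipeline. The following Python sketch shows one way the data might flow through them; it is a hedged illustration only. Every function name is hypothetical, grouping records by kind stands in for distributed and cloud storage, encryption is omitted, and the rule that a record with a missing or empty feature counts as abnormal is an assumption that the source does not state.

```python
from collections import defaultdict


def capture(files):
    """Step one: turn collected files into a captured data set of records."""
    return [
        {"kind": f["kind"], "stamp": f["id"], "feature": f.get("feature")}
        for f in files
    ]


def store_distributed(records):
    """Steps two and four: group records by kind (stand-in for distributed storage)."""
    shards = defaultdict(list)
    for record in records:
        shards[record["kind"]].append(record)
    return dict(shards)


def reject_abnormal(records):
    """Step three: drop abnormal records (assumed rule: missing or empty feature)."""
    return [r for r in records if r["feature"]]


def retrieve(shards, kind):
    """Step five: retrieve the stored records of one kind for display."""
    return shards.get(kind, [])


files = [
    {"id": "f1", "kind": "basic", "feature": "keyword set"},
    {"id": "f2", "kind": "image", "feature": "text and picture features"},
    {"id": "f3", "kind": "image", "feature": ""},
    {"id": "f4", "kind": "video", "feature": "key region"},
]

captured = capture(files)                      # step one
raw_shards = store_distributed(captured)       # step two
determined = reject_abnormal(captured)         # step three
cloud_shards = store_distributed(determined)   # step four
image_hits = retrieve(cloud_shards, "image")   # step five
```

The association stamp carried in each record (`stamp`) is what would let the cleaning step trace a rejected feature back to its source file.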
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.
Claims (8)
1. A big data based data mining system, comprising:
the data acquisition unit is used for:
collecting basic data, image data and video data files, generating a basic data file, reading text information of the basic data to obtain a keyword set, extracting text information and picture characteristics of the image data file, capturing the keyword set in the basic data and the text information and picture characteristics in the image data file, and generating a captured data set;
a data storage unit for:
interacting with a cloud platform, carrying out data set distributed storage and encryption on a keyword set in basic data, text information and picture characteristics in an image data file and a key region in a video data file, and sharing data through a network on the basis of the cloud platform;
a data mining unit for:
extracting key bytes of the grabbing data set, checking and eliminating abnormal data to generate de-abnormal data, and cleaning the data of the basic data file based on the de-abnormal data to generate a determined data set;
a data feedback unit for:
carrying out data retrieval and retrieval result display and reminding on the cloud platform;
cloud platform for:
after the classified information is stored and edited through the cloud, the classified information is transmitted to a data storage unit, and feedback information is returned;
a user terminal for:
storing, running and implementing the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit, wherein there is at least one login end and at least one control terminal; when the control terminal works, the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit are run to realize the big data based data mining system; and the login end is connected to the control terminal, the cloud platform and the server through the Internet;
the data acquisition unit includes:
the file acquisition module is used for:
collecting basic data, image data and video data files, and generating a basic data file based on the basic data, the image data and the video data files;
a text reading module for:
reading the text information of the basic data, dividing the text information to obtain a plurality of extracted words in the text information, and cleaning the extracted words according to the part-of-speech statistical characteristics to obtain a keyword set;
an image processing module for:
extracting text information and picture characteristics of the image data files, creating associated stamps for the image data files, obtaining associated stamps of each image data file, wherein the associated stamp of each image data file is a global unique associated stamp, and carrying out associated stamp association on the extracted text information and picture characteristics of each image data file;
the video processing module is used for:
cutting out a video key segment of a video data file, framing the video key segment to obtain a plurality of frames of video images, and determining a key area in each video image, wherein each video image comprises a key area;
the information grabbing module is used for:
capturing a keyword set in the basic data, text information and picture characteristics in the image data file and a key region in the video data file, and generating a captured data set based on the captured data;
when determining the key region in each video image, the video processing module makes the determination in combination with the plurality of frames of video images obtained, and the determination comprises the following steps:
image recognition is carried out on the video image, imaging conditions in the video image are recognized, and an image recognition result is obtained;
dividing the video image into a plurality of areas according to the image recognition result;
the video image is analyzed in combination with the adjacent frame video image by the following formula:
[Two formulas appear at this point in the original publication; they were rendered as images and are not preserved in this text.]
in the above formulas, the symbols denote, in order: the analysis data value of a given block region; the sign function; the image information of a given feature point in that block region in each of three frames of video image (the frame under analysis and its adjacent frames); the number of feature points in that block region in each of those three frames; the total number of frames of the video image; and the analysis result of the block region. For one value of the analysis result the block region is used to constitute the key region, and for the other value it is not used to constitute the key region;
the regions for constituting the key regions are combined together to form the key regions of the video image with reference to the analysis result.
2. A big data based data mining system according to claim 1, wherein: the data storage unit includes:
a data storage module for:
interacting with the cloud platform, carrying out data set distributed storage on the keyword set in the basic data, the text information and picture characteristics in the image data file and the key region in the video data file, and locating data storage information through the cloud platform;
a data encryption module for:
interacting with a data storage module, and encrypting the distributed storage of the data set;
a data sharing module, configured to:
interacting with the cloud platform, processing data in the cloud platform and sharing the data through a network.
3. A big data based data mining system according to claim 1, wherein: the data mining unit includes:
the abnormality rejection module is used for:
extracting key bytes of the grabbing data set for verification, carrying out association analysis on the key bytes, determining abnormal distinguishing key bytes, extracting abnormal data corresponding to the distinguishing key bytes from each item of data for eliminating, and generating de-abnormal data based on the grabbing data set after eliminating the abnormal data;
the data cleaning module is used for:
cleaning the data of the basic data file based on the de-abnormal data, wherein during data cleaning the basic data file is cleaned and screened based on the association stamp corresponding to the de-abnormal data, to generate a determined data set.
4. A big data based data mining system according to claim 3, wherein: during verification, the abnormality rejection module performs association analysis on the grabbing data set, builds a data association analysis model, inputs the grabbing data set into the data association analysis model for data analysis, and outputs an analysis report based on the data analysis result.
5. A big data based data mining system according to claim 1, wherein: the data feedback unit includes:
the quick search module is used for:
interacting with the cloud platform and providing query service based on an index system;
the data feedback module is used for:
displaying and reminding through the display equipment according to the retrieval result of the quick retrieval module.
6. A big data based data mining system according to claim 1, wherein: the cloud platform includes:
cloud database for:
classifying and storing a keyword set in the received basic data, text information and picture characteristics in an image data file and a key area in a video data file according to a data stream label;
a data processing module for:
grouping the classified stored data according to stream attribute information and data content, and classifying and marking, wherein the grouping comprises a basic data set, an image data set and a video data set;
the data matching module is used for:
matching each of the data sets processed and grouped by the data processing module according to the attribute information of the data storage unit, generating the data requirements of the corresponding data streams, and interfacing the data requirements of the data streams with the data sets.
7. A big data based data mining system according to claim 2, wherein: the data encryption module includes:
an encryption management sub-module for:
dividing the data set into a plurality of parts according to the distributed storage condition of the data set, registering an encryption method in each part, then carrying out use record aiming at the encryption method, carrying out key management aiming at a key in the use process of the encryption method, and forming a key index by combining the key management in the use record;
an encryption processing sub-module for:
interacting with the data storage module to obtain the distributed storage information of the distributed storage of the data set, performing encryption processing with the encryption method in the corresponding part according to the distributed storage information, and feeding the key used in the encryption processing back to the encryption management sub-module.
8. A mining method of a big data based data mining system according to any of claims 1-7, characterized by: the method comprises the following steps:
step one: the data acquisition unit acquires basic data, image data and video data files and generates a basic data file, and performs characteristic grabbing based on the basic data file and generates a grabbing data set;
step two: the data storage unit performs data set distributed storage and encryption on the grabbing data set;
step three: the data mining unit cleans and rejects the abnormal data and generates a determined data set;
step four: the data storage unit stores the determined data set based on the cloud platform;
step five: the data storage unit shares data through a network, and the data feedback unit retrieves and displays the data based on the cloud platform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410073443.7A CN117591578B (en) | 2024-01-18 | 2024-01-18 | Data mining system and mining method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410073443.7A CN117591578B (en) | 2024-01-18 | 2024-01-18 | Data mining system and mining method based on big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117591578A CN117591578A (en) | 2024-02-23 |
CN117591578B true CN117591578B (en) | 2024-04-09 |
Family
ID=89922331
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410073443.7A Active CN117591578B (en) | 2024-01-18 | 2024-01-18 | Data mining system and mining method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117591578B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294849A (en) * | 2016-08-23 | 2017-01-04 | 成都卡莱博尔信息技术股份有限公司 | Mass data inquiry system based on data mining technology |
CN106815253A (en) * | 2015-12-01 | 2017-06-09 | 慧科讯业有限公司 | A kind of method for digging based on mixed data type data |
CN107577771A (en) * | 2017-09-07 | 2018-01-12 | 北京海融兴通信息安全技术有限公司 | A kind of big data digging system |
CN110472075A (en) * | 2018-05-09 | 2019-11-19 | 中国互联网络信息中心 | A kind of isomeric data classification storage method and system based on machine learning |
KR20210074734A (en) * | 2019-12-12 | 2021-06-22 | 동의대학교 산학협력단 | System and Method for Extracting Keyword and Ranking in Video Subtitle |
CN114117174A (en) * | 2021-09-02 | 2022-03-01 | 杨子晴 | Multi-format data screening management system based on big data |
CN114764464A (en) * | 2020-12-30 | 2022-07-19 | 北京华录新媒信息技术有限公司 | Video content pushing technology based on data mining |
CN115994173A (en) * | 2022-12-07 | 2023-04-21 | 呼和浩特市大旗网络有限公司 | Data mining system based on big data |
CN116501779A (en) * | 2023-06-26 | 2023-07-28 | 图林科技(深圳)有限公司 | Big data mining analysis system for real-time feedback |
CN116501725A (en) * | 2023-05-23 | 2023-07-28 | 厦门快快网络科技有限公司 | Big data processing method based on cloud computing |
CN116916049A (en) * | 2023-09-12 | 2023-10-20 | 北京青水环境科技有限公司 | Video data online acquisition and storage system based on cloud computing technology |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11620577B2 (en) * | 2020-07-01 | 2023-04-04 | International Business Machines Corporation | Multi-modal data explainer pipeline |
Non-Patent Citations (3)
Title |
---|
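Research of big data information mining and analysis; Zhanchi Dong et al.; IEEE; 20221231; full text * |
Hadoop-based cloud platform for intelligent analysis of unstructured power grid data; 张福铮; 黄文琦; 赵继光; 董召杰; 刘宸哲; 严叶舟; 曹立楠; 信息技术与信息化; 20200528 (No. 05); full text * |
Research on the framework of an intelligent intelligence acquisition system; 赵大海; 郭晶; 军民两用技术与产品; 20200815 (No. 08); full text * |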
Also Published As
Publication number | Publication date |
---|---|
CN117591578A (en) | 2024-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||