CN110297847A - An intelligent information retrieval method based on big data principles - Google Patents
- Publication number: CN110297847A
- Application number: CN201910595602.9A
- Authority: CN (China)
- Prior art keywords: data, processing, information retrieval, analysis, method based
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of information extraction, and in particular to an intelligent information retrieval method based on big data principles, comprising the following steps: collecting big data from smart devices, enterprise online systems, enterprise offline systems, social networks, internet platforms, and the like; extracting and integrating the collected large-scale heterogeneous data sources with a data extraction and integration tool; storing the associated and aggregated data with a storage tool, in a uniformly defined structure; analyzing the stored data with data analysis techniques; and presenting the analysis results to the end user through visualization or human-computer interaction technology. The invention provides an intelligent information retrieval method with high retrieval accuracy.
Description
Technical field
The present invention relates to the field of information extraction, and in particular to an intelligent information retrieval method based on big data principles.
Background art
Big data refers to a collection of data that cannot be captured, managed, and processed with conventional software tools within a given time frame. It is a massive, fast-growing, and diverse information asset that requires new processing modes in order to deliver stronger decision-making power, insight discovery, and process optimization capability. The strategic significance of big data technology lies not in possessing huge volumes of data, but in the specialized processing of the meaningful data they contain. In other words, if big data is compared to an industry, the key to that industry's profitability is to improve the "processing capability" applied to data and to realize the "added value" of data through "processing".
Technically, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed on a single computer; it requires a distributed architecture. Its distinguishing feature is distributed data mining over massive data, which must rely on the distributed processing, distributed databases, cloud storage, and virtualization technology of cloud computing. Big data requires special techniques to process large volumes of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the internet, and scalable storage systems.
With the arrival of the big data era, data has become an important resource in business activity, and data-driven scientific decision-making and fine-grained management are becoming an inevitable trend in modern business management. In e-commerce, the massive volume of product review data holds great social and commercial value. Analyzing and mining the product feature data in these reviews can give potential consumers a basis for purchase decisions at the level of individual product attributes, give enterprises a basis for product design and competitive intelligence about other enterprises, and help enterprises respond effectively to user demand and to directions for product improvement, thereby improving their competitiveness. However, the prior art achieves low retrieval accuracy on review data.
In view of the above problems, the present invention provides an intelligent information retrieval method based on big data principles.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides an intelligent information retrieval method based on big data principles that addresses the above problems as far as possible and achieves high retrieval accuracy.
The present invention is achieved through the following technical solution:
An intelligent information retrieval method based on big data principles, comprising the following steps:
S1. Collect big data from smart devices, enterprise online systems, enterprise offline systems, social networks, internet platforms, and the like;
S2. Extract and integrate the collected large-scale heterogeneous data sources using a data extraction and integration tool;
S3. Store the associated and aggregated data using a storage tool, in a uniformly defined structure;
S4. Analyze the stored data using data analysis techniques;
S5. Present the analysis results to the end user through visualization or human-computer interaction technology.
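Steps S1 to S5 can be pictured as a chain of small functions. The following is only a minimal sketch under stated assumptions: the patent names no tools, data formats, or analysis algorithm, so the source names (`web_shop`, `mobile_app`), the field names, and the mean-rating analysis are all hypothetical stand-ins.

```python
from collections import defaultdict
from statistics import mean

def collect(sources):
    """S1: gather raw records from every source (here: in-memory lists)."""
    return [(name, rec) for name, recs in sources.items() for rec in recs]

def extract_and_integrate(raw, field_map):
    """S2: map source-specific field names onto one shared set of names."""
    unified = []
    for name, rec in raw:
        mapping = field_map[name]
        unified.append({target: rec[src] for src, target in mapping.items()})
    return unified

def store(records):
    """S3: keep records in one uniformly defined structure, keyed by product."""
    table = defaultdict(list)
    for rec in records:
        table[rec["product"]].append(float(rec["rating"]))
    return dict(table)

def analyze(table):
    """S4: a stand-in analysis, the mean rating per product."""
    return {product: mean(ratings) for product, ratings in table.items()}

def present(result):
    """S5: render the result as plain-text lines for the end user."""
    return [f"{product}: {score:.2f}" for product, score in sorted(result.items())]

# Hypothetical sources with different field names and value types.
sources = {
    "web_shop": [{"item": "kettle", "stars": 4}, {"item": "lamp", "stars": 5}],
    "mobile_app": [{"sku_name": "kettle", "score": "3"}],
}
field_map = {
    "web_shop": {"item": "product", "stars": "rating"},
    "mobile_app": {"sku_name": "product", "score": "rating"},
}
report = present(analyze(store(extract_and_integrate(collect(sources), field_map))))
print("\n".join(report))
```

Each stage takes only the output of the previous one, which mirrors the layer-by-layer ordering of the steps; a real system would replace each function with the corresponding collection, integration, storage, analysis, or presentation tool.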
Preferably, the big data in S1 comprises massive structured, semi-structured, and unstructured data of various types, such as RFID data, sensor data, user behavior data, social network interaction data, and mobile internet data; and the big data in S1 is collected by database collection, system log collection, network data collection, or sensing device data collection.
Preferably, when performing data extraction and integration, the data extraction and integration tool described in S1 performs data cleansing and data reduction on the data, and the data integration tool combines data from multiple data sources into one unified data set, so as to provide a complete data basis for the subsequent data processing work.
Preferably, the data cleansing comprises missing data handling, noisy data handling, and inconsistent data handling.
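The three cleansing tasks can be illustrated on a handful of review records. This is a hedged sketch, not the patent's procedure: the median imputation, the valid-range noise test, and the alias table are assumptions chosen for brevity, and the field names `rating` and `category` are hypothetical.

```python
from statistics import median

def clean(records, field, valid_range, aliases):
    """Return cleaned copies of `records`; the originals are left untouched."""
    low, high = valid_range
    # Treat out-of-range readings as noise so they do not skew the fill value.
    trusted = [r[field] for r in records
               if r[field] is not None and low <= r[field] <= high]
    fill = median(trusted)
    cleaned = []
    for r in records:
        row = dict(r)
        value = row[field]
        if value is None:
            row[field] = fill          # missing data: impute with the median
        elif not low <= value <= high:
            row[field] = fill          # noisy data: replace the outlier
        # Inconsistent data: map spelling variants onto one canonical label.
        label = row["category"].strip().lower()
        row["category"] = aliases.get(label, label)
        cleaned.append(row)
    return cleaned

reviews = [
    {"rating": 4, "category": "Kitchen"},
    {"rating": None, "category": "kitchen "},
    {"rating": 50, "category": "KITCHENWARE"},
    {"rating": 2, "category": "lighting"},
]
cleaned = clean(reviews, "rating", (1, 5), {"kitchenware": "kitchen"})
```

Other choices are equally plausible, for example dropping incomplete records instead of imputing them, or smoothing noise by binning; which one suits a given data set is a judgment the patent leaves open.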
Preferably, the data reduction uses dimensionality reduction to eliminate redundant and irrelevant attributes, effectively reducing the scale of the data set and deriving a reduced data set from the original large data set, such that the reduced data set preserves the integrity of the original data set.
Preferably, the dimensionality reduction uses attribute subset selection to find the smallest attribute subset while ensuring that the probability distribution of the new data subset is as close as possible to that of the original data set. Data mining is then performed on the selected attribute set; because fewer attributes are used, the mining results are easier for the user to understand.
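One common way to realize attribute subset selection is greedy stepwise forward selection. The sketch below is an illustration under assumptions, not the patent's algorithm: the patent does not define how closeness of distributions is measured, so a simple consistency score (how well the chosen attributes reproduce the class labels) stands in for it, and the attribute names are hypothetical.

```python
from collections import Counter, defaultdict

def consistency(rows, labels, attrs):
    """Fraction of rows whose label matches the majority label of their group,
    where rows are grouped by their values on `attrs`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)][label] += 1
    agreeing = sum(max(counts.values()) for counts in groups.values())
    return agreeing / len(rows)

def forward_select(rows, labels, attributes):
    """Greedy stepwise forward selection: repeatedly add the attribute that
    raises the score most, and stop once the full-set score is reached."""
    target = consistency(rows, labels, attributes)
    selected = []
    remaining = list(attributes)
    while remaining and consistency(rows, labels, selected) < target:
        best = max(remaining,
                   key=lambda a: consistency(rows, labels, selected + [a]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical review records: only "delivery" fully explains the sentiment.
rows = [
    {"price_band": "low", "delivery": "fast", "battery": "good"},
    {"price_band": "low", "delivery": "slow", "battery": "bad"},
    {"price_band": "high", "delivery": "fast", "battery": "bad"},
    {"price_band": "high", "delivery": "slow", "battery": "bad"},
]
labels = ["positive", "negative", "positive", "negative"]
chosen = forward_select(rows, labels, ["price_band", "delivery", "battery"])
```

Because each step keeps only the locally best attribute, the search examines far fewer candidates than the exhaustive set of all subsets, at the cost of not guaranteeing the globally smallest subset.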
Preferably, the data extraction and integration in S2 comprises associating and aggregating the data.
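Association and aggregation can be read as a join followed by a grouped summary. The following sketch assumes that reading; the key `product_id`, the `brand` attribute, and the count-and-mean summary are hypothetical choices made for illustration.

```python
from collections import defaultdict

def associate(reviews, products):
    """Association: attach product attributes to each review via product_id."""
    by_id = {p["product_id"]: p for p in products}
    return [{**r, **by_id[r["product_id"]]}
            for r in reviews if r["product_id"] in by_id]

def aggregate(rows, key, value):
    """Aggregation: count and average `value` for every distinct `key`."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        totals[row[key]][0] += 1
        totals[row[key]][1] += row[value]
    return {k: {"count": n, "mean": s / n} for k, (n, s) in totals.items()}

reviews = [
    {"product_id": 1, "rating": 4},
    {"product_id": 1, "rating": 2},
    {"product_id": 2, "rating": 5},
    {"product_id": 9, "rating": 1},   # no matching product, dropped by the join
]
products = [
    {"product_id": 1, "brand": "Acme"},
    {"product_id": 2, "brand": "Bolt"},
]
summary = aggregate(associate(reviews, products), "brand", "rating")
```

Dropping unmatched reviews is an inner-join choice; keeping them with empty product attributes would be the outer-join alternative.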
Preferably, during the data analysis in S4, the data is first transformed by a data transformation tool; the data transformation comprises data smoothing, data aggregation, data generalization, normalization, and attribute construction.
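The five transformations named above each have simple textbook forms. The sketch below shows one possible form of each; the moving-average window, the min-max formula, the age bands, and the `area` attribute are assumptions for illustration and are not specified in the patent.

```python
def smooth(values, window=3):
    """Smoothing: centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def min_max(values, new_min=0.0, new_max=1.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

def generalize(age):
    """Generalization: replace a low-level value with a higher-level concept."""
    return "young" if age < 30 else "middle-aged" if age < 60 else "senior"

def construct_area(row):
    """Attribute construction: derive a new attribute from existing ones."""
    return {**row, "area": row["width"] * row["height"]}

daily_sales = [10, 12, 50, 11, 9]
smoothed = smooth(daily_sales)        # the spike at 50 is spread over neighbors
scaled = min_max(daily_sales)         # every value now lies in [0, 1]
total_sales = sum(daily_sales)        # aggregation: daily values summed
```

Min-max scaling is sensitive to outliers such as the value 50 here; z-score normalization is a frequent alternative when the range is not known in advance.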
Preferably, the data analysis comprises data correlation analysis, data trend analysis, and data feature analysis, and is carried out by data analysis algorithms.
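Correlation and trend analysis can be illustrated with two standard statistics. The patent does not name specific algorithms, so the Pearson coefficient and the least-squares slope used below are assumptions, as are the sample series.

```python
from math import sqrt

def pearson(xs, ys):
    """Correlation analysis: Pearson coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def trend_slope(ys):
    """Trend analysis: least-squares slope of `ys` against positions 0..n-1."""
    n = len(ys)
    mx = (n - 1) / 2
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

price = [10, 20, 30, 40]
rating = [5, 4, 3, 2]
monthly_reviews = [100, 120, 140, 160]

r = pearson(price, rating)            # close to -1: rating falls as price rises
slope = trend_slope(monthly_reviews)  # average increase per month
```

A coefficient near zero would suggest the two attributes carry independent information, while a positive slope indicates a rising trend; feature analysis is not covered by this sketch.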
Preferably, after the data analysis in S4, the data is interpreted using data interpretation techniques, and the interpreted feature data is presented to the end user through visualization and human-computer interaction technology.
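As a very small stand-in for the visualization step, interpreted feature scores can be rendered as a text bar chart. This is only a sketch: the patent does not describe a rendering format, and the feature names and scores are hypothetical.

```python
def bar_chart(scores, width=20):
    """Render one text bar per label, scaled to the largest value."""
    top = max(scores.values())
    pad = max(len(label) for label in scores)
    lines = []
    for label, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / top)
        lines.append(f"{label.ljust(pad)} | {bar} {value:.1f}")
    return lines

chart = bar_chart({"battery": 4.5, "screen": 3.0, "price": 1.5}, width=9)
print("\n".join(chart))
```

Sorting by score puts the strongest feature first, which is usually what an end user scanning the result wants to see.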
The beneficial effects of the invention are as follows: the data is associated and aggregated, and data cleansing is performed during data extraction and integration, which ensures the quality and credibility of the data; data reduction uses attribute subset selection to remove redundant and irrelevant attributes, effectively reducing the scale of the data set so that data analysis is fast and accurate; and data correlation analysis, data trend analysis, and data feature analysis, carried out by data analysis algorithms, improve the accuracy of data retrieval.
Description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of the present invention;
Fig. 2 is a flow chart of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Figs. 1 and 2, an intelligent information retrieval method based on big data principles comprises the following steps:
S1. Collect big data from smart devices, enterprise online systems, enterprise offline systems, social networks, internet platforms, and the like;
S2. Extract and integrate the collected large-scale heterogeneous data sources using a data extraction and integration tool;
S3. Store the associated and aggregated data using a storage tool, in a uniformly defined structure;
S4. Analyze the stored data using data analysis techniques;
S5. Present the analysis results to the end user through visualization or human-computer interaction technology.
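Step S3 can be illustrated with a single table whose columns every source must fit. The sketch below uses Python's built-in `sqlite3` module purely as an example storage tool; the patent does not name one, and the `review` table and its columns are hypothetical.

```python
import sqlite3

# One uniformly defined structure shared by all sources (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE review (
        source   TEXT NOT NULL,
        product  TEXT NOT NULL,
        rating   REAL NOT NULL,
        comment  TEXT
    )
""")

def store(conn, records):
    """Insert already-integrated records; named placeholders match dict keys."""
    conn.executemany(
        "INSERT INTO review (source, product, rating, comment) "
        "VALUES (:source, :product, :rating, :comment)",
        records,
    )
    conn.commit()

store(conn, [
    {"source": "web_shop", "product": "kettle", "rating": 4, "comment": "boils fast"},
    {"source": "mobile_app", "product": "kettle", "rating": 3, "comment": None},
    {"source": "web_shop", "product": "lamp", "rating": 5, "comment": "bright"},
])

# Later analysis (S4) can query the shared structure without knowing the sources.
rows = conn.execute(
    "SELECT product, COUNT(*), AVG(rating) FROM review "
    "GROUP BY product ORDER BY product"
).fetchall()
```

At the data volumes the patent has in mind, a distributed database or file system would take the place of SQLite; the point of the sketch is only that a fixed schema lets downstream steps ignore where each record came from.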
Specifically, the big data in S1 comprises massive structured, semi-structured, and unstructured data of various types, such as RFID data, sensor data, user behavior data, social network interaction data, and mobile internet data. The big data in S1 is collected by database collection, system log collection, network data collection, or sensing device data collection.
When performing data extraction and integration, the data extraction and integration tool in S1 performs data cleansing and data reduction on the data, and the data integration tool combines data from multiple data sources into one unified data set, so as to provide a complete data basis for the subsequent data processing work. Data cleansing comprises missing data handling, noisy data handling, and inconsistent data handling. Data reduction uses dimensionality reduction to eliminate redundant and irrelevant attributes, effectively reducing the scale of the data set and deriving a reduced data set from the original large data set, such that the reduced data set preserves the integrity of the original data set. Dimensionality reduction uses attribute subset selection to find the smallest attribute subset while ensuring that the probability distribution of the new data subset is as close as possible to that of the original data set. Data mining is then performed on the selected attribute set; because fewer attributes are used, the mining results are easier for the user to understand.
The data extraction and integration in S2 comprises associating and aggregating the data. During the data analysis in S4, the data is first transformed by a data transformation tool; the data transformation comprises data smoothing, data aggregation, data generalization, normalization, and attribute construction. Data analysis comprises data correlation analysis, data trend analysis, and data feature analysis, and is carried out by data analysis algorithms. After the data analysis in S4, the data is interpreted using data interpretation techniques, and the interpreted feature data is presented to the end user through visualization and human-computer interaction technology.
In the present invention, big data information extraction proceeds as follows. Using database collection, system log collection, network data collection, or sensing device data collection, massive structured, semi-structured, and unstructured data of various types, such as RFID data, sensor data, user behavior data, social network interaction data, and mobile internet data, is obtained from smart devices, enterprise online systems, enterprise offline systems, social networks, internet platforms, and the like.
The data extraction and integration tool then extracts and integrates the collected large-scale heterogeneous data sources. During integration and extraction, data cleansing is performed to ensure the quality and credibility of the data, and data reduction is performed at the same time: dimensionality reduction eliminates redundant and irrelevant attributes and effectively reduces the scale of the data set. Dimensionality reduction uses attribute subset selection, which applies heuristic knowledge to shrink the search space effectively; heuristic search typically makes locally optimal choices, in the expectation that they lead to a globally optimal solution, to arrive at the corresponding attribute subset.
The preprocessed data, after association and aggregation, is stored by the storage tool in a uniformly defined structure.
Before data analysis, the data is first transformed. Data transformation comprises: smoothing, which helps remove noise from the data; aggregation, which summarizes or totals the data; generalization, which replaces low-level data objects or data layers with more abstract concepts; normalization, which scales the relevant attribute data proportionally into a specific small range; and attribute construction, which builds new attributes from the existing attribute set to aid data processing.
The data is then analyzed using big data analysis techniques, interpreted with the help of visualization and human-computer interaction technology, and the analysis results are shown to the end user. Because the data is associated and aggregated, and data cleansing is performed during data extraction and integration, the quality and credibility of the data are ensured. Data reduction uses attribute subset selection to remove redundant and irrelevant attributes, effectively reducing the scale of the data set so that data analysis is fast and accurate. Data correlation analysis, data trend analysis, and data feature analysis, carried out by data analysis algorithms, improve the accuracy of data retrieval. The entire information extraction process screens the data layer by layer, so retrieval accuracy is high.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An intelligent information retrieval method based on big data principles, characterized by comprising the following steps:
S1. Collect big data from smart devices, enterprise online systems, enterprise offline systems, social networks, internet platforms, and the like;
S2. Extract and integrate the collected large-scale heterogeneous data sources using a data extraction and integration tool;
S3. Store the associated and aggregated data using a storage tool, in a uniformly defined structure;
S4. Analyze the stored data using data analysis techniques;
S5. Present the analysis results to the end user through visualization or human-computer interaction technology.
2. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: the big data in S1 comprises massive structured, semi-structured, and unstructured data of various types, such as RFID data, sensor data, user behavior data, social network interaction data, and mobile internet data; and the big data in S1 is collected by database collection, system log collection, network data collection, or sensing device data collection.
3. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: when performing data extraction and integration, the data extraction and integration tool described in S1 performs data cleansing and data reduction on the data, and the data integration tool combines data from multiple data sources into one unified data set, so as to provide a complete data basis for the subsequent data processing work.
4. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: the data cleansing comprises missing data handling, noisy data handling, and inconsistent data handling.
5. The intelligent information retrieval method based on big data principles according to claim 3, characterized in that: the data reduction uses dimensionality reduction to eliminate redundant and irrelevant attributes, effectively reducing the scale of the data set and deriving a reduced data set from the original large data set, such that the reduced data set preserves the integrity of the original data set.
6. The intelligent information retrieval method based on big data principles according to claim 5, characterized in that: the dimensionality reduction uses attribute subset selection to find the smallest attribute subset while ensuring that the probability distribution of the new data subset is as close as possible to that of the original data set; data mining is performed on the selected attribute set, and because fewer attributes are used, the mining results are easier for the user to understand.
7. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: the data extraction and integration in S2 comprises associating and aggregating the data.
8. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: during the data analysis in S4, the data is first transformed by a data transformation tool, the data transformation comprising data smoothing, data aggregation, data generalization, normalization, and attribute construction.
9. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: the data analysis comprises data correlation analysis, data trend analysis, and data feature analysis, and is carried out by data analysis algorithms.
10. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: after the data analysis in S4, the data is interpreted using data interpretation techniques, and the interpreted feature data is presented to the end user through visualization and human-computer interaction technology.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910595602.9A CN110297847A (en) | 2019-07-03 | 2019-07-03 | A kind of intelligent information retrieval method based on big data principle |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110297847A true CN110297847A (en) | 2019-10-01 |
Family
ID=68030150
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910595602.9A Pending CN110297847A (en) | 2019-07-03 | 2019-07-03 | A kind of intelligent information retrieval method based on big data principle |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110297847A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851444A (en) * | 2019-11-06 | 2020-02-28 | 北京许继电气有限公司 | Data acquisition and analysis method for industrial workshop |
CN111242519A (en) * | 2020-04-24 | 2020-06-05 | 北京淇瑀信息科技有限公司 | User characteristic data generation method and device and electronic equipment |
CN111737328A (en) * | 2020-06-11 | 2020-10-02 | 南京朗禾智能控制研究院有限公司 | Big data analysis method applied to Internet of vehicles |
CN111949884A (en) * | 2020-08-26 | 2020-11-17 | 桂林电子科技大学 | Multi-mode feature interaction-based depth fusion recommendation method |
CN113254517A (en) * | 2021-05-22 | 2021-08-13 | 北京德风新征程科技有限公司 | Service providing method based on internet big data |
CN113282636A (en) * | 2021-04-13 | 2021-08-20 | 武汉天梯科技股份有限公司 | Intelligent hardware system for intelligently acquiring offline big data |
CN113342790A (en) * | 2021-05-31 | 2021-09-03 | 重庆大数据人工智能创新中心有限公司 | Big data processing method for realizing mixed data analysis |
CN116821104A (en) * | 2022-08-18 | 2023-09-29 | 南通泽烁信息科技有限公司 | Industrial Internet data processing method and system based on big data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104715047A (en) * | 2015-03-26 | 2015-06-17 | 浪潮集团有限公司 | Social network data acquisition and analysis system |
CN105069025A (en) * | 2015-07-17 | 2015-11-18 | 浪潮通信信息系统有限公司 | Intelligent aggregation visualization and management and control system for big data |
CN106649718A (en) * | 2016-12-22 | 2017-05-10 | 盐城工学院 | Large data acquisition and processing method for PDM system |
CN108446364A (en) * | 2018-03-14 | 2018-08-24 | 湖南商学院 | A kind of visual analysis method towards campus big data |
Non-Patent Citations (1)
Title |
---|
武志学: "《大数据导论》", 1 April 2019, 人民邮电出版社 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851444A (en) * | 2019-11-06 | 2020-02-28 | 北京许继电气有限公司 | Data acquisition and analysis method for industrial workshop |
CN111242519A (en) * | 2020-04-24 | 2020-06-05 | 北京淇瑀信息科技有限公司 | User characteristic data generation method and device and electronic equipment |
CN111242519B (en) * | 2020-04-24 | 2020-07-17 | 北京淇瑀信息科技有限公司 | User characteristic data generation method and device and electronic equipment |
CN111737328A (en) * | 2020-06-11 | 2020-10-02 | 南京朗禾智能控制研究院有限公司 | Big data analysis method applied to Internet of vehicles |
CN111949884A (en) * | 2020-08-26 | 2020-11-17 | 桂林电子科技大学 | Multi-mode feature interaction-based depth fusion recommendation method |
CN111949884B (en) * | 2020-08-26 | 2022-06-21 | 桂林电子科技大学 | Multi-mode feature interaction-based depth fusion recommendation method |
CN113282636A (en) * | 2021-04-13 | 2021-08-20 | 武汉天梯科技股份有限公司 | Intelligent hardware system for intelligently acquiring offline big data |
CN113254517A (en) * | 2021-05-22 | 2021-08-13 | 北京德风新征程科技有限公司 | Service providing method based on internet big data |
CN113342790A (en) * | 2021-05-31 | 2021-09-03 | 重庆大数据人工智能创新中心有限公司 | Big data processing method for realizing mixed data analysis |
CN116821104A (en) * | 2022-08-18 | 2023-09-29 | 南通泽烁信息科技有限公司 | Industrial Internet data processing method and system based on big data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191001 |