
CN110297847A - Intelligent information retrieval method based on big data principles - Google Patents


Info

Publication number
CN110297847A
Authority
CN
China
Prior art keywords
data
processing
information retrieval
analysis
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910595602.9A
Other languages
Chinese (zh)
Inventor
宋妍
吕娜
高巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mudanjiang Normal University
Original Assignee
Mudanjiang Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mudanjiang Normal University
Priority to CN201910595602.9A
Publication of CN110297847A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of information extraction, and in particular to an intelligent information retrieval method based on big data principles. The method comprises the following steps: acquiring big data through smart devices, enterprise online systems, enterprise offline systems, social networks, Internet platforms and the like; extracting and integrating the collected large-scale heterogeneous data sources using data extraction and integration tools; storing the associated and aggregated data with a storage tool according to a uniformly defined structure; analyzing the stored data using data analysis techniques; and presenting the analysis results to end users through visualization or human-computer interaction techniques. The present invention provides an intelligent information retrieval method with high retrieval accuracy.

Description

Intelligent information retrieval method based on big data principles
Technical field
The present invention relates to the field of information extraction, and in particular to an intelligent information retrieval method based on big data principles.
Background art
Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within an acceptable time frame. It is a massive, fast-growing and diversified information asset that requires new processing models to deliver stronger decision-making power, insight discovery and process optimization. The strategic significance of big data technology lies not in possessing huge amounts of data, but in processing the meaningful part of that data professionally. In other words, if big data is compared to an industry, the key to that industry's profitability is improving its capacity to process data and realizing the added value of data through processing. Technically, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed on a single computer and requires a distributed architecture; its characteristic is distributed data mining of massive data, which relies on the distributed processing, distributed databases, cloud storage and virtualization technologies of cloud computing. Big data requires special techniques to process large volumes of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems.
With the arrival of the big data era, data has become an important resource for business activities, and data-driven scientific decision-making and fine-grained management are becoming an inevitable trend in modern business management. In the field of e-commerce, the massive volume of product reviews carries great social and commercial value. Analyzing and mining product-feature data in these reviews can give potential consumers a basis for purchase decisions at the level of individual product attributes, give enterprises a basis for product design and competitive intelligence about other enterprises, and allow enterprises to respond effectively to user needs and to directions for product improvement, thereby improving their competitiveness. However, the prior art achieves low retrieval accuracy on review data.
In view of the above problems, the present invention provides an intelligent information retrieval method based on big data principles.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides an intelligent information retrieval method based on big data principles that solves the above problems as far as possible, thereby providing an intelligent information retrieval method with high retrieval accuracy.
The present invention is achieved through the following technical solutions:
An intelligent information retrieval method based on big data principles, comprising the following steps:
S1, acquiring big data through smart devices, enterprise online systems, enterprise offline systems, social networks, Internet platforms and the like;
S2, extracting and integrating the collected large-scale heterogeneous data sources using data extraction and integration tools;
S3, storing the associated and aggregated data with a storage tool according to a uniformly defined structure;
S4, analyzing the stored data using data analysis techniques;
S5, presenting the analysis results to end users through visualization or human-computer interaction techniques.
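The five steps can be read as a linear pipeline. The Python sketch below only shows how S1 to S5 hand data to one another; the patent names no concrete tools, so every stage is a caller-supplied function, and `run_pipeline` and its parameter names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable

Record = dict  # one row of data, keyed by field name


def run_pipeline(
    sources: Iterable[Callable[[], list[Record]]],
    integrate: Callable[[list[list[Record]]], list[Record]],
    store: Callable[[list[Record]], None],
    load: Callable[[], list[Record]],
    analyze: Callable[[list[Record]], dict],
    present: Callable[[dict], str],
) -> str:
    raw = [fetch() for fetch in sources]  # S1: acquire a batch from each source
    unified = integrate(raw)              # S2: extract and integrate the batches
    store(unified)                        # S3: persist under one unified structure
    result = analyze(load())              # S4: analyze what was stored
    return present(result)                # S5: render the result for the end user


# Usage with in-memory stand-ins for every stage (all hypothetical).
storage: list[Record] = []
report = run_pipeline(
    sources=[lambda: [{"id": 1, "score": 4}], lambda: [{"id": 2, "score": 2}]],
    integrate=lambda batches: [row for batch in batches for row in batch],
    store=storage.extend,
    load=lambda: list(storage),
    analyze=lambda rows: {"n": len(rows), "mean": sum(r["score"] for r in rows) / len(rows)},
    present=lambda res: f"{res['n']} records, mean score {res['mean']:.1f}",
)
print(report)  # 2 records, mean score 3.0
```

Keeping each stage as a plain function makes it easy to swap, for example, an in-memory list for a real database in S3 without touching the other steps.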
Preferably, the big data in S1 includes massive structured, semi-structured and unstructured data of various types, such as RFID data, sensor data, user behavior data, social network interaction data and mobile Internet data, and the big data in S1 is acquired by database acquisition, system log acquisition, network data acquisition or sensing-device data acquisition.
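Of the acquisition methods listed, system log acquisition is the simplest to illustrate: semi-structured log lines are parsed into structured records. The sketch below assumes a `<date> <time> <LEVEL> key=value ...` log format; that format and the function names are hypothetical, since the patent specifies neither.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical log format: "<date> <time> <LEVEL> key=value key=value ..."
LOG_LINE = re.compile(r"^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<rest>.*)$")


def parse_log_line(line: str) -> Optional[dict]:
    """Turn one semi-structured log line into a flat, structured record."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # skip lines that do not fit the assumed format
    record = {"timestamp": f"{match['date']} {match['time']}", "level": match["level"]}
    for token in match["rest"].split():
        key, sep, value = token.partition("=")
        if sep:  # ignore free-text tokens that contain no "="
            record[key] = value
    return record


def acquire_from_log(lines: list[str]) -> list[dict]:
    parsed = (parse_log_line(line) for line in lines)
    return [record for record in parsed if record is not None]


logs = [
    "2019-07-03 10:15:02 INFO user=42 action=search query=laptop",
    "corrupted line",
]
print(acquire_from_log(logs))
```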
Preferably, when the data extraction and integration tools described in S1 perform data extraction and integration, data cleansing and data reduction are performed on the data, and the data integration tool combines data from multiple data sources into one uniform data set, providing a complete data foundation for the data processing work.
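Integration combines records from several sources into one uniform data set. A minimal sketch, assuming that integration means renaming each source's fields onto a shared schema and concatenating the rows; the source names, field names and `SCHEMA_MAPS` table are invented for illustration.

```python
from __future__ import annotations

# Hypothetical mapping from each source's field names to one unified schema.
SCHEMA_MAPS = {
    "online_shop": {"uid": "user_id", "rating": "score"},
    "offline_store": {"customer": "user_id", "stars": "score"},
}


def integrate(batches: dict[str, list[dict]]) -> list[dict]:
    """Rename each source's fields onto the unified schema and concatenate the rows.
    Fields not listed in a source's mapping are dropped."""
    unified = []
    for source, rows in batches.items():
        mapping = SCHEMA_MAPS[source]
        for row in rows:
            record = {target: row[field] for field, target in mapping.items() if field in row}
            record["source"] = source  # keep lineage so conflicting values can be traced
            unified.append(record)
    return unified


merged = integrate({
    "online_shop": [{"uid": 1, "rating": 5}],
    "offline_store": [{"customer": 2, "stars": 3}],
})
print(merged)
```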
Preferably, the data cleansing includes missing-data handling, noisy-data handling and inconsistent-data handling.
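The three cleansing tasks can be sketched on one numeric field and one categorical field. The rules below (an assumed valid range for detecting noise, mean imputation for missing values, and a lookup table of canonical labels for inconsistencies) are common choices, not rules stated in the patent; `clean` is a hypothetical name.

```python
from __future__ import annotations

from statistics import mean


def clean(rows: list[dict], field: str, valid_range: tuple[float, float],
          label_field: str, canonical: dict[str, str]) -> list[dict]:
    lo, hi = valid_range
    out = [dict(r) for r in rows]  # work on copies and leave the input untouched

    # Noisy data (assumed rule): values outside the valid range become missing.
    for r in out:
        v = r.get(field)
        if v is not None and not lo <= v <= hi:
            r[field] = None

    # Missing data: impute with the mean of the values that remain.
    observed = [r[field] for r in out if r.get(field) is not None]
    fill = mean(observed) if observed else None
    for r in out:
        if r.get(field) is None:
            r[field] = fill

    # Inconsistent data: map spelling variants onto one canonical label.
    for r in out:
        label = str(r.get(label_field, "")).strip().lower()
        r[label_field] = canonical.get(label, label)
    return out


rows = [
    {"score": 4, "city": "Beijing "},
    {"score": None, "city": "beijing"},
    {"score": 99, "city": "BJ"},
    {"score": 2, "city": "Harbin"},
]
cleaned = clean(rows, "score", (1, 5), "city", {"bj": "beijing"})
print(cleaned)
```

Treating noise as missing before imputing keeps outliers such as the 99 above from pulling the imputed mean away from the plausible range.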
Preferably, the data reduction uses dimensionality reduction to eliminate redundant and irrelevant attributes, effectively reducing the size of the data set and obtaining a reduced data set from the original large data set, such that the reduced data set preserves the integrity of the original data set.
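The patent does not say how redundant and irrelevant attributes are detected. Purely as an illustration, the sketch below treats an attribute as irrelevant when it has a single value across all records, and as redundant when its column exactly duplicates an attribute already kept. This filter-style rule and the name `reduce_attributes` are assumptions, not the patent's method; practical systems usually rely on correlation or information measures instead.

```python
from __future__ import annotations


def reduce_attributes(rows: list[dict], attributes: list[str]) -> list[str]:
    """Drop constant attributes (irrelevant) and exact copies of kept ones (redundant)."""
    kept: list[str] = []
    kept_columns: list[tuple] = []
    for attr in attributes:
        column = tuple(r.get(attr) for r in rows)
        if len(set(column)) <= 1:
            continue  # same value everywhere: cannot distinguish any records
        if column in kept_columns:
            continue  # duplicates an attribute that was already kept
        kept.append(attr)
        kept_columns.append(column)
    return kept


rows = [
    {"region": "N", "price": 10, "price_copy": 10, "score": 4},
    {"region": "N", "price": 20, "price_copy": 20, "score": 5},
]
kept = reduce_attributes(rows, ["region", "price", "price_copy", "score"])
reduced = [{a: r[a] for a in kept} for r in rows]
print(kept)  # ['price', 'score']
```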
Preferably, the dimensionality reduction uses attribute subset selection to find a minimal attribute set such that the probability distribution of the new data subset is as close as possible to that of the original data set, and data mining is then performed on the selected attributes. Because fewer attributes are used, the results are easier for users to understand.
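Attribute subset selection is commonly implemented with a heuristic search such as stepwise forward selection. A minimal sketch, assuming categorical attributes and using the conditional entropy H(class | attributes) to measure how well a subset preserves the class distribution obtained with all attributes; the patent names neither the search strategy nor the measure, so both are assumptions, as are the function names.

```python
from __future__ import annotations

from collections import Counter
from math import log2


def conditional_entropy(rows: list[dict], attrs: list[str], target: str) -> float:
    """H(target | attrs): uncertainty left in the class once attrs are known."""
    groups: dict[tuple, list] = {}
    for r in rows:
        groups.setdefault(tuple(r[a] for a in attrs), []).append(r[target])
    h = 0.0
    for labels in groups.values():
        size = len(labels)
        h_group = -sum((c / size) * log2(c / size) for c in Counter(labels).values())
        h += (size / len(rows)) * h_group
    return h


def forward_select(rows: list[dict], candidates: list[str], target: str,
                   tol: float = 1e-9) -> list[str]:
    """Greedily add the attribute that lowers H(target | S) the most, stopping once
    S predicts the class as well as the full candidate set does."""
    goal = conditional_entropy(rows, candidates, target)
    selected: list[str] = []
    remaining = list(candidates)
    while remaining and conditional_entropy(rows, selected, target) > goal + tol:
        best = min(remaining, key=lambda a: conditional_entropy(rows, selected + [a], target))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the purchase decision depends on income only.
rows = [
    {"income": "high", "age": "young", "noise": 1, "buy": "yes"},
    {"income": "high", "age": "old", "noise": 2, "buy": "yes"},
    {"income": "low", "age": "young", "noise": 1, "buy": "no"},
    {"income": "low", "age": "old", "noise": 2, "buy": "no"},
]
print(forward_select(rows, ["age", "noise", "income"], "buy"))  # ['income']
```

Each greedy step is a local choice; like any heuristic search it can miss the globally smallest subset, which is the trade-off for avoiding an exhaustive search over all subsets.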
Preferably, the data extraction and integration described in S2 include association and aggregation of the data.
Preferably, during the data analysis processing described in S4, the data are transformed by a data transformation tool; data transformation includes smoothing, aggregation, generalization, normalization and attribute construction.
Preferably, the data analysis includes data correlation analysis, data trend analysis and data feature analysis, and is performed by data analysis algorithms.
Preferably, after the data analysis described in S4, the data are interpreted by data interpretation techniques, and the interpreted feature data are presented to end users through visualization and human-computer interaction techniques.
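The patent does not define association and aggregation further. One plausible reading, assumed here, is a key-based join that attaches related records (association) followed by a grouped summary (aggregation). The function names, field names and data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def associate(reviews: list[dict], users: list[dict], key: str = "user_id") -> list[dict]:
    """Inner join: attach each review's user attributes; reviews with unknown users are dropped."""
    by_key = {u[key]: u for u in users}
    return [{**r, **by_key[r[key]]} for r in reviews if r[key] in by_key]


def aggregate(rows: list[dict], group_by: str, value: str) -> dict:
    """Mean of `value` for each distinct `group_by` value."""
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for r in rows:
        sums[r[group_by]] += r[value]
        counts[r[group_by]] += 1
    return {g: sums[g] / counts[g] for g in sums}


reviews = [
    {"user_id": 1, "product": "A", "rating": 5},
    {"user_id": 2, "product": "A", "rating": 3},
    {"user_id": 3, "product": "B", "rating": 4},
    {"user_id": 9, "product": "B", "rating": 1},
]
users = [
    {"user_id": 1, "city": "Harbin"},
    {"user_id": 2, "city": "Mudanjiang"},
    {"user_id": 3, "city": "Harbin"},
]
joined = associate(reviews, users)
print(aggregate(joined, "city", "rating"))  # {'Harbin': 4.5, 'Mudanjiang': 3.0}
```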
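Two of the listed transformations, smoothing and normalization, have standard textbook forms. The sketch assumes smoothing by equal-frequency bin means and min-max normalization to [0, 1]; the patent does not say which variants it uses, and the function names are hypothetical.

```python
from __future__ import annotations


def smooth_by_bin_means(values: list[float], bin_size: int) -> list[float]:
    """Equal-frequency binning: replace each value by the mean of its bin,
    keeping every value in its original position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        bin_idx = order[start:start + bin_size]
        bin_mean = sum(values[i] for i in bin_idx) / len(bin_idx)
        for i in bin_idx:
            smoothed[i] = bin_mean
    return smoothed


def min_max_normalize(values: list[float], new_min: float = 0.0,
                      new_max: float = 1.0) -> list[float]:
    """v' = new_min + (v - min) * (new_max - new_min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min] * len(values)  # constant column: nothing to rescale
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
print(min_max_normalize([200, 400, 1000]))  # [0.0, 0.25, 1.0]
```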
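Correlation analysis and trend analysis can be illustrated with the Pearson coefficient and a least-squares slope over time. These are assumed, common choices; the patent does not name its data analysis algorithms, and the function names and review counts below are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series (undefined if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def linear_trend(ys: list[float]) -> float:
    """Least-squares slope of ys against time steps 0, 1, ..., n-1."""
    n = len(ys)
    mt, my = (n - 1) / 2, sum(ys) / n
    num = sum((t - mt) * (y - my) for t, y in enumerate(ys))
    den = sum((t - mt) ** 2 for t in range(n))
    return num / den


monthly_reviews = [120, 135, 150, 170]
print(linear_trend(monthly_reviews))  # 16.5 more reviews per month on average
print(pearson([10, 20, 30, 40], [3.5, 4.0, 4.2, 4.8]))
```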
The beneficial effects of the present invention are as follows: the data are associated and aggregated, and data cleansing is performed during extraction and integration, which ensures data quality and credibility; data reduction uses attribute subset selection to remove redundant and irrelevant attributes, effectively reducing the size of the data set so that data analysis is fast and accurate; and correlation analysis, trend analysis and feature analysis performed by data analysis algorithms improve the accuracy of data retrieval.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of the present invention;
Fig. 2 is a flowchart of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
Referring to Figs. 1 and 2, an intelligent information retrieval method based on big data principles comprises the following steps:
S1, acquiring big data through smart devices, enterprise online systems, enterprise offline systems, social networks, Internet platforms and the like;
S2, extracting and integrating the collected large-scale heterogeneous data sources using data extraction and integration tools;
S3, storing the associated and aggregated data with a storage tool according to a uniformly defined structure;
S4, analyzing the stored data using data analysis techniques;
S5, presenting the analysis results to end users through visualization or human-computer interaction techniques.
Specifically, the big data in S1 includes massive structured, semi-structured and unstructured data of various types, such as RFID data, sensor data, user behavior data, social network interaction data and mobile Internet data, and is acquired by database acquisition, system log acquisition, network data acquisition or sensing-device data acquisition. When the data extraction and integration tools in S1 perform extraction and integration, data cleansing and data reduction are performed on the data, and the data integration tool combines data from multiple sources into one uniform data set, providing a complete data foundation for the data processing work. Data cleansing includes missing-data handling, noisy-data handling and inconsistent-data handling. Data reduction uses dimensionality reduction to eliminate redundant and irrelevant attributes, effectively reducing the size of the data set and obtaining from the original large data set a reduced data set that preserves the integrity of the original. Dimensionality reduction uses attribute subset selection to find a minimal attribute set such that the probability distribution of the new data subset is as close as possible to that of the original data set; data mining is then performed on the selected attributes, and because fewer attributes are used, users can understand the results more easily. The data extraction and integration in S2 include association and aggregation of the data. During the data analysis processing in S4, the data are transformed by a data transformation tool, where data transformation includes smoothing, aggregation, generalization, normalization and attribute construction. Data analysis includes correlation analysis, trend analysis and feature analysis, and is performed by data analysis algorithms. After the data analysis in S4, the data are interpreted by data interpretation techniques, and the interpreted feature data are presented to end users through visualization and human-computer interaction techniques.
In the present invention, when big data information is extracted, database acquisition, system log acquisition, network data acquisition or sensing-device data acquisition is used to obtain, from smart devices, enterprise online systems, enterprise offline systems, social networks, Internet platforms and the like, massive structured, semi-structured and unstructured data of various types, such as RFID data, sensor data, user behavior data, social network interaction data and mobile Internet data. The collected large-scale heterogeneous data sources are extracted and integrated with data extraction and integration tools. During extraction and integration, the data are cleansed to ensure data quality and credibility, and data reduction is also performed: dimensionality reduction eliminates redundant and irrelevant attributes and effectively reduces the size of the data set. Dimensionality reduction uses attribute subset selection, which applies heuristic knowledge to reduce the search space effectively; heuristic search typically makes locally optimal choices in the expectation that they lead to a globally optimal solution, which guides the search toward a suitable attribute subset. The preprocessed data, after association and aggregation, are stored with a storage tool according to a uniformly defined structure. Before data analysis, the data are first transformed. Data transformation includes: smoothing, which helps remove noise from the data; aggregation, which summarizes or totals the data; generalization, which replaces low-level data objects or data layers with higher-level, more abstract concepts; normalization, which scales attribute data proportionally into a small specified range; and attribute construction, which builds new attributes from the existing attribute set to assist data processing. The data are then analyzed with big data analysis techniques and interpreted by means of visualization and human-computer interaction techniques, and the analysis results are shown to end users. Associating and aggregating the data and cleansing them during extraction and integration ensure data quality and credibility; data reduction uses attribute subset selection to remove redundant and irrelevant attributes, effectively reducing the size of the data set so that analysis is fast and accurate; and correlation analysis, trend analysis and feature analysis performed by data analysis algorithms improve the accuracy of data retrieval. The whole information extraction process filters the data layer by layer, which yields high retrieval accuracy.
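Generalization and attribute construction, as described above, can be illustrated with a concept hierarchy and a derived column. The city-to-province hierarchy, the `area` attribute and the function names below are invented examples; the patent specifies none of them.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical concept hierarchy: city (low level) -> province (higher level).
CITY_TO_PROVINCE = {"Harbin": "Heilongjiang", "Mudanjiang": "Heilongjiang", "Changchun": "Jilin"}


def generalize(rows: list[dict], field: str, hierarchy: dict) -> list[dict]:
    """Replace low-level values with their parent concept; unknown values are kept as-is."""
    return [{**r, field: hierarchy.get(r[field], r[field])} for r in rows]


def construct_attribute(rows: list[dict], name: str,
                        derive: Callable[[dict], object]) -> list[dict]:
    """Add a new attribute computed from existing ones."""
    return [{**r, name: derive(r)} for r in rows]


rows = [
    {"city": "Harbin", "height": 2.0, "width": 3.0},
    {"city": "Changchun", "height": 1.5, "width": 2.0},
]
rows = generalize(rows, "city", CITY_TO_PROVINCE)
rows = construct_attribute(rows, "area", lambda r: r["height"] * r["width"])
print(rows)
```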
The above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An intelligent information retrieval method based on big data principles, characterized by comprising the following steps:
S1, acquiring big data through smart devices, enterprise online systems, enterprise offline systems, social networks, Internet platforms and the like;
S2, extracting and integrating the collected large-scale heterogeneous data sources using data extraction and integration tools;
S3, storing the associated and aggregated data with a storage tool according to a uniformly defined structure;
S4, analyzing the stored data using data analysis techniques;
S5, presenting the analysis results to end users through visualization or human-computer interaction techniques.
2. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: the big data in S1 includes massive structured, semi-structured and unstructured data of various types, such as RFID data, sensor data, user behavior data, social network interaction data and mobile Internet data; and the big data in S1 is acquired by database acquisition, system log acquisition, network data acquisition or sensing-device data acquisition.
3. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: when the data extraction and integration tools in S1 perform data extraction and integration, data cleansing and data reduction are performed on the data, and the data integration tool combines data from multiple data sources into one uniform data set, so as to provide a complete data foundation for the data processing work.
4. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: the data cleansing includes missing-data handling, noisy-data handling and inconsistent-data handling.
5. The intelligent information retrieval method based on big data principles according to claim 3, characterized in that: the data reduction uses dimensionality reduction to eliminate redundant and irrelevant attributes, effectively reducing the size of the data set and obtaining a reduced data set from the original large data set, such that the reduced data set preserves the integrity of the original data set.
6. The intelligent information retrieval method based on big data principles according to claim 5, characterized in that: the dimensionality reduction uses attribute subset selection to find a minimal attribute set such that the probability distribution of the new data subset is as close as possible to that of the original data set, and data mining is performed on the selected attributes; because fewer attributes are used, the results are easier for users to understand.
7. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: the data extraction and integration in S2 include association and aggregation of the data.
8. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: during the data analysis processing in S4, the data are transformed by a data transformation tool, and the data transformation includes smoothing, aggregation, generalization, normalization and attribute construction.
9. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: the data analysis includes data correlation analysis, data trend analysis and data feature analysis, and is performed by data analysis algorithms.
10. The intelligent information retrieval method based on big data principles according to claim 1, characterized in that: after the data analysis in S4, the data are interpreted by data interpretation techniques, and the interpreted feature data are presented to end users through visualization and human-computer interaction techniques.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910595602.9A CN110297847A (en) 2019-07-03 2019-07-03 A kind of intelligent information retrieval method based on big data principle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910595602.9A CN110297847A (en) 2019-07-03 2019-07-03 A kind of intelligent information retrieval method based on big data principle

Publications (1)

Publication Number Publication Date
CN110297847A 2019-10-01

Family

ID=68030150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910595602.9A Pending CN110297847A (en) 2019-07-03 2019-07-03 A kind of intelligent information retrieval method based on big data principle

Country Status (1)

Country Link
CN (1) CN110297847A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715047A (en) * 2015-03-26 2015-06-17 浪潮集团有限公司 Social network data acquisition and analysis system
CN105069025A (en) * 2015-07-17 2015-11-18 浪潮通信信息系统有限公司 Intelligent aggregation visualization and management and control system for big data
CN106649718A (en) * 2016-12-22 2017-05-10 盐城工学院 Large data acquisition and processing method for PDM system
CN108446364A (en) * 2018-03-14 2018-08-24 湖南商学院 A kind of visual analysis method towards campus big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Zhixue (武志学): "Introduction to Big Data" (《大数据导论》), Posts & Telecom Press (人民邮电出版社), 1 April 2019 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851444A (en) * 2019-11-06 2020-02-28 北京许继电气有限公司 Data acquisition and analysis method for industrial workshop
CN111242519A (en) * 2020-04-24 2020-06-05 北京淇瑀信息科技有限公司 User characteristic data generation method and device and electronic equipment
CN111242519B (en) * 2020-04-24 2020-07-17 北京淇瑀信息科技有限公司 User characteristic data generation method and device and electronic equipment
CN111737328A (en) * 2020-06-11 2020-10-02 南京朗禾智能控制研究院有限公司 Big data analysis method applied to Internet of vehicles
CN111949884A (en) * 2020-08-26 2020-11-17 桂林电子科技大学 Multi-mode feature interaction-based depth fusion recommendation method
CN111949884B (en) * 2020-08-26 2022-06-21 桂林电子科技大学 Multi-mode feature interaction-based depth fusion recommendation method
CN113282636A (en) * 2021-04-13 2021-08-20 武汉天梯科技股份有限公司 Intelligent hardware system for intelligently acquiring offline big data
CN113254517A (en) * 2021-05-22 2021-08-13 北京德风新征程科技有限公司 Service providing method based on internet big data
CN113342790A (en) * 2021-05-31 2021-09-03 重庆大数据人工智能创新中心有限公司 Big data processing method for realizing mixed data analysis
CN116821104A (en) * 2022-08-18 2023-09-29 南通泽烁信息科技有限公司 Industrial Internet data processing method and system based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191001