CN105279280A - Method and tool for quickly migrating oracle data to MPP database - Google Patents
- Publication number
- CN105279280A (application number CN201510786466.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- metadata
- oracle
- subtask
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/214—Database migration support
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a tool that rapidly extract data from an Oracle database, convert it into data that an MPP database can recognize, and rapidly load the converted data into the MPP database, thereby supporting data exchange between an enterprise's online transaction system and its big data platform.
Description
Technical field
The present invention relates to user data migration for MPP database clusters in the OLTP and OLAP fields, and in particular to a method for rapidly migrating data from an Oracle database to an MPP database cluster.
Background
With the rapid development of information technology, an enterprise's user count and data volume are growing explosively. As business volume rises, database traffic and data volume grow quickly, and the required processing power and computational load grow with them. This sharp expansion of data means that a single database can no longer support both the enterprise's transaction system and its statistical analysis system. The mainstream solution is for the enterprise to build a separate big data platform alongside its existing online transaction system, and the market has settled into two main camps of big data platforms, one based on MPP and one based on Hadoop. Against this background, there is a need to migrate data from the database of an enterprise's online transaction system into an MPP or Hadoop system quickly and accurately. In response to this market demand, this patent describes a method for rapidly migrating Oracle data into an MPP database cluster.
Summary of the invention
The technical problem to be solved by the present invention is to provide, on the basis of the prior art, a method and a tool that rapidly extract data from an Oracle database, convert it into data that an MPP database can recognize, and rapidly load it into the MPP database, thereby supporting data exchange between an enterprise's online transaction system and its big data platform.
The technical solution adopted by the present invention is a method for rapidly migrating Oracle data to an MPP database, comprising the following steps:
(1) obtain the relevant metadata of the Oracle database, and split the task into multiple parallel subtasks according to the metadata and the strategy pattern;
(2) execute the subtasks split out in step (1) concurrently, performing data extraction and conversion;
(3) load the data obtained from the extraction and conversion in step (2) into the MPP database.
Further, in step (1), the metadata comprises the table name, the partition information, and the maximum and minimum ROWID values of each block of the table.
Further, in step (1), the subtasks are split as follows:
(11) from the metadata obtained, compute summary information covering all the data of a table;
(12) cut the data of the whole table according to the metadata summary, producing multiple subtasks as output;
(13) each subtask processes one slice, and the union of all subtasks covers the complete data of the table.
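Steps (11) to (13) can be illustrated with a minimal Python sketch. It assumes the metadata has already been reduced to a list of contiguous block ranges; the `BlockRange` record and the `split_into_subtasks` function are hypothetical names introduced here for illustration, not part of the patent, and the even-split rule is one possible cutting policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BlockRange:
    """Hypothetical metadata record: one contiguous run of blocks of a table."""
    file_no: int
    first_block: int
    block_count: int


def split_into_subtasks(ranges: List[BlockRange], n_subtasks: int) -> List[List[BlockRange]]:
    """Cut a table's block ranges into at most n_subtasks slices of similar size."""
    if n_subtasks < 1:
        raise ValueError("n_subtasks must be at least 1")
    # Step (11): summary information, here simply the total number of blocks.
    total_blocks = sum(r.block_count for r in ranges)
    if total_blocks == 0:
        return []
    target = -(-total_blocks // n_subtasks)  # ceiling division: blocks per slice
    slices: List[List[BlockRange]] = []
    current: List[BlockRange] = []
    filled = 0
    # Step (12): walk the ranges in order and cut them at slice boundaries.
    for r in ranges:
        start, remaining = r.first_block, r.block_count
        while remaining > 0:
            take = min(remaining, target - filled)
            current.append(BlockRange(r.file_no, start, take))
            start += take
            remaining -= take
            filled += take
            if filled == target:
                slices.append(current)
                current, filled = [], 0
    if current:
        slices.append(current)
    # Step (13): every block appears in exactly one slice.
    return slices
```

Because each block lands in exactly one slice, the slices can be handed to independent workers without coordination, and their union reproduces the whole table.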
The migration tool adopted by the present invention is a tool for rapidly migrating Oracle data to an MPP database, comprising a metadata acquisition and computation module, a parallel extraction and conversion module, and a data loading module. The metadata acquisition and computation module obtains the relevant metadata of the Oracle database and splits the task into multiple parallel subtasks according to the metadata and the strategy pattern. The parallel extraction and conversion module executes the split subtasks concurrently, performing data extraction and conversion. The data loading module loads the data produced by the parallel extraction and conversion module into the MPP database.
Further, the metadata acquisition and computation module comprises a metadata acquisition unit for obtaining the table name, the partition information, and the maximum and minimum ROWID values of each block of the table; a computing unit for computing, from the metadata obtained, summary information covering all the data of a table; and a cutting unit for cutting the data of the whole table according to the metadata summary, producing multiple subtasks as output.
Further, the metadata acquisition and computation module, the parallel extraction and conversion module, and the data loading module act as producers and consumers of one another, forming a high-performance pipeline.
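The producer-consumer relationship between the three modules can be sketched with bounded queues. This is a minimal illustration using the Python standard library; `run_pipeline` and its callback parameters are hypothetical names, each stage runs on a single thread for brevity, and error handling is omitted.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def run_pipeline(subtasks, extract, convert, load, queue_size=4):
    """Run plan -> extract/convert -> load as three stages joined by bounded queues."""
    task_q = queue.Queue(maxsize=queue_size)
    batch_q = queue.Queue(maxsize=queue_size)

    def plan():
        # Producer: the metadata stage emits subtasks.
        for task in subtasks:
            task_q.put(task)
        task_q.put(_DONE)

    def extract_and_convert():
        # Consumer of subtasks, producer of converted batches.
        while True:
            task = task_q.get()
            if task is _DONE:
                batch_q.put(_DONE)
                return
            batch_q.put(convert(extract(task)))

    def load_stage():
        # Consumer: the loading stage drains converted batches.
        while True:
            batch = batch_q.get()
            if batch is _DONE:
                return
            load(batch)

    threads = [threading.Thread(target=fn) for fn in (plan, extract_and_convert, load_stage)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

The bounded queues apply back-pressure: a fast extractor blocks when the loader falls behind, so memory use stays limited while all stages keep working.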
Further, the data loading module is provided with a strategy pattern unit for setting the data loading strategy, which gives the tool the extensibility to adapt to the multiple data loading modes of MPP databases or to diversified data output modes.
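A strategy pattern for the loading step might look like the following sketch. The class names are hypothetical, and the two strategies shown (delimited text output and an in-memory stand-in for a streaming load API) are illustrative assumptions rather than the loading modes of any particular MPP database.

```python
import io
from abc import ABC, abstractmethod
from typing import Iterable, List, Sequence, Tuple


class LoadStrategy(ABC):
    """Common interface for interchangeable loading or output modes."""

    @abstractmethod
    def load(self, rows: Iterable[Sequence[object]]) -> int:
        """Consume rows and return how many were handled."""


class DelimitedTextStrategy(LoadStrategy):
    """Writes rows as delimited text, for loaders that read formatted files."""

    def __init__(self, sink: io.TextIOBase, delimiter: str = "|"):
        self.sink = sink
        self.delimiter = delimiter

    def load(self, rows):
        count = 0
        for row in rows:
            fields = ("" if value is None else str(value) for value in row)
            self.sink.write(self.delimiter.join(fields) + "\n")
            count += 1
        return count


class InMemoryBatchStrategy(LoadStrategy):
    """Stand-in for a streaming load API: collects each batch in a list."""

    def __init__(self):
        self.batches: List[List[Tuple[object, ...]]] = []

    def load(self, rows):
        batch = [tuple(row) for row in rows]
        self.batches.append(batch)
        return len(batch)


class DataLoader:
    """The loading module delegates to whichever strategy it was configured with."""

    def __init__(self, strategy: LoadStrategy):
        self.strategy = strategy

    def submit(self, rows):
        return self.strategy.load(rows)
```

Supporting a new target then means adding one `LoadStrategy` subclass; the extraction and conversion stages stay unchanged.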
The advantages and beneficial effects of this patent are:
1. Performance: compared with Oracle's traditional export methods, single-machine, single-table export throughput is raised to more than 400 GB/h.
2. Flexible deployment: the tool can run either with or without the Oracle client.
3. High scalability: the tool supports distributed deployment, and performance scales linearly.
Brief description of the drawings
Fig. 1 is the overall deployment architecture diagram;
Fig. 2 is the architecture diagram of the distributed tool for rapidly migrating Oracle database data to an MPP database;
Fig. 3 is a schematic diagram of the principle of rapid data export.
Detailed description
The overall deployment architecture of the invention is shown in Fig. 1. The tool, which rapidly extracts data from an Oracle database, converts it, and loads it into an MPP database cluster, can be deployed in distributed mode or in single-machine mode.
The migration tool consists of three main modules: the metadata acquisition and computation module, the parallel extraction and conversion module, and the data loading module (the software architecture is shown in Fig. 2).
The principle by which the metadata acquisition and computation module enables rapid data extraction is shown in Fig. 3.
The migration flow of the tool is as follows:
1. The migration tool starts and reads its configuration; the task is to export table data from the Oracle database and load it into the MPP cluster.
2. The migration tool connects to the Oracle database; the metadata acquisition and computation module obtains the relevant metadata and splits the task into multiple parallel subtasks according to the metadata and the strategy pattern.
3. The migration tool performs data extraction: the subtasks split out in step 2 are assigned to the parallel extraction and conversion module, which executes data extraction and conversion concurrently.
4. The migration tool submits the data produced by the parallel extraction and conversion module to the data loading module, which completes the load into the MPP database.
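Step 3 of the flow, concurrent execution of the subtasks, can be sketched with a thread pool. The function name `run_subtasks` and the `extract` and `convert` callbacks are hypothetical; in a real tool `extract` would read a slice from Oracle, while here it is any callable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtasks(subtasks, extract, convert, max_workers=4):
    """Extract and convert each subtask on a worker thread.

    Results are returned in the same order as the subtasks.
    """
    def work(task):
        return convert(extract(task))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs tasks concurrently but yields results in input order.
        return list(pool.map(work, subtasks))
```

Threads suit this step because each worker spends most of its time waiting on database I/O rather than computing.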
In step 2, the metadata comprises the table name, the partition information, and the maximum and minimum ROWID values of each block of the table. From this metadata, summary information covering all the data of a table can be computed, and the data of the whole table is cut according to that summary, producing multiple subtasks as output. Each subtask processes one slice, and the union of all subtasks covers the complete data of the table.
In the flow above, the metadata acquisition and computation module, the parallel extraction and conversion module, and the data loading module act as producers and consumers of one another, forming a high-performance pipeline. The data loading module uses the strategy pattern for extensibility, so that it can adapt to the multiple data loading modes of MPP databases or to diversified data output modes.
An example scenario follows. On the server with IP address 192.168.103.65, in the Oracle database with SID orcl, the user test owns a table named test that contains about 330 million (331,941,213) rows.
1. On receiving an export request, the metadata acquisition and computation module creates a database connection using the specified IP address and port number, the Oracle SID, and the export protocol to be used (OCI or thin).
2. If the table is not partitioned, then for the specified table name and user name, the module reads system views such as dba_extents and uses the ROWID_CREATE function to compute the start and end ROWIDs of all blocks of the table, producing a ROWID metadata array.
3. If the table is partitioned and the full table is being exported, then for the specified table name, the module reads system views such as dba_tab_partitions and dba_extents, and from the ROWID type, block, and block size uses the ROWID_CREATE function to compute the start and end ROWIDs of all blocks of all partitions of the table, producing a ROWID metadata array.
4. If the table is partitioned but only the current partition is being exported, then for the specified table name, the module reads system views such as dba_extents, and from the ROWID type, block, and block size uses the ROWID_CREATE function to compute the start and end ROWIDs of all blocks of the partition being exported, producing a ROWID metadata array.
5. Once the table's metadata array has been produced, it is handed to the parallel extraction and conversion module. In single-machine mode, this module starts multiple threads; each thread extracts the data of one block at a time, reads it in binary form, and passes it to the conversion step, which converts it into a load format that the MPP database recognizes.
6. If the parallel extraction and conversion module is deployed in distributed mode, every server runs multiple threads at the same time, and each thread on each server exports the data of one block. ZooKeeper is used to manage the tool's own metadata across the nodes.
7. Once conversion is complete, the data is held in an in-memory cache queue on the local machine, and the data loading module calls the MPP database's fast loading interface to load it. If the MPP database's loading tool has no API that accepts a binary stream directly, the data can instead be written to disk as formatted text and then loaded into the MPP database.
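The ROWID range computation in steps 2 to 4 could be expressed with a query along the following lines. This is a hedged sketch, not the patent's own SQL: it uses the Oracle dictionary views `dba_extents` and `dba_objects` and the `DBMS_ROWID.ROWID_CREATE` function, computes one range per extent rather than per block as a simplifying assumption, and uses row number 32767 as a conventional upper bound. The helper names and bind variable names are hypothetical.

```python
def build_rowid_range_query(single_partition: bool = False) -> str:
    """Return SQL that yields one (start_rowid, end_rowid) pair per extent.

    Bind variables: :owner, :table_name, and :partition_name when
    single_partition is True.
    """
    partition_filter = "AND e.partition_name = :partition_name" if single_partition else ""
    return f"""
SELECT DBMS_ROWID.ROWID_CREATE(1, o.data_object_id, e.relative_fno,
                               e.block_id, 0) AS start_rowid,
       DBMS_ROWID.ROWID_CREATE(1, o.data_object_id, e.relative_fno,
                               e.block_id + e.blocks - 1, 32767) AS end_rowid
  FROM dba_extents e
  JOIN dba_objects o
    ON o.owner = e.owner
   AND o.object_name = e.segment_name
   AND (o.subobject_name = e.partition_name
        OR (o.subobject_name IS NULL AND e.partition_name IS NULL))
 WHERE e.owner = :owner
   AND e.segment_name = :table_name
   AND o.data_object_id IS NOT NULL
   {partition_filter}
 ORDER BY e.relative_fno, e.block_id
""".strip()


def build_slice_query(qualified_table: str) -> str:
    """Return SQL that reads one slice of a table by ROWID range.

    qualified_table is interpolated directly, so it must be a trusted identifier.
    """
    return f"SELECT * FROM {qualified_table} WHERE ROWID BETWEEN :start_rowid AND :end_rowid"
```

Each worker thread would then run the slice query with one pair of ROWIDs, so that slices can be read independently of one another.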
Claims (7)
1. A method for rapidly migrating Oracle data to an MPP database, characterized by comprising the following steps:
(1) obtain the relevant metadata of the Oracle database, and split the task into multiple parallel subtasks according to the metadata and the strategy pattern;
(2) execute the subtasks split out in step (1) concurrently, performing data extraction and conversion;
(3) load the data obtained from the extraction and conversion in step (2) into the MPP database.
2. The method for rapidly migrating Oracle data to an MPP database according to claim 1, characterized in that in step (1) the metadata comprises the table name, the partition information, and the maximum and minimum ROWID values of each block of the table.
3. The method for rapidly migrating Oracle data to an MPP database according to claim 1 or 2, characterized in that in step (1) the subtasks are split as follows:
(11) from the metadata obtained, compute summary information covering all the data of a table;
(12) cut the data of the whole table according to the metadata summary, producing multiple subtasks as output;
(13) each subtask processes one slice, and the union of all subtasks covers the complete data of the table.
4. A tool for rapidly migrating Oracle data to an MPP database, characterized by comprising a metadata acquisition and computation module, a parallel extraction and conversion module, and a data loading module; the metadata acquisition and computation module is used to obtain the relevant metadata of the Oracle database and to split the task into multiple parallel subtasks according to the metadata and the strategy pattern; the parallel extraction and conversion module is used to execute the split subtasks concurrently, performing data extraction and conversion; the data loading module is used to load the data produced by the parallel extraction and conversion module into the MPP database.
5. The tool for rapidly migrating Oracle data to an MPP database according to claim 4, characterized in that the metadata acquisition and computation module comprises a metadata acquisition unit for obtaining the table name, the partition information, and the maximum and minimum ROWID values of each block of the table; a computing unit for computing, from the metadata obtained, summary information covering all the data of a table; and a cutting unit for cutting the data of the whole table according to the metadata summary, producing multiple subtasks as output.
6. The tool for rapidly migrating Oracle data to an MPP database according to claim 4 or 5, characterized in that the metadata acquisition and computation module, the parallel extraction and conversion module, and the data loading module act as producers and consumers of one another, forming a high-performance pipeline.
7. The tool for rapidly migrating Oracle data to an MPP database according to claim 4 or 5, characterized in that the data loading module is provided with a strategy pattern unit for setting the data loading strategy, which gives the tool the extensibility to adapt to the multiple data loading modes of MPP databases or to diversified data output modes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510786466.3A CN105279280A (en) | 2015-11-16 | 2015-11-16 | Method and tool for quickly migrating oracle data to MPP database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105279280A true CN105279280A (en) | 2016-01-27 |
Family
ID=55148294
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510786466.3A Pending CN105279280A (en) | 2015-11-16 | 2015-11-16 | Method and tool for quickly migrating oracle data to MPP database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105279280A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080288498A1 (en) * | 2007-05-14 | 2008-11-20 | Hinshaw Foster D | Network-attached storage devices |
CN102999537A (en) * | 2011-09-19 | 2013-03-27 | 阿里巴巴集团控股有限公司 | System and method for data migration |
US20140156666A1 (en) * | 2012-11-30 | 2014-06-05 | Futurewei Technologies, Inc. | Method for Automated Scaling of a Massive Parallel Processing (MPP) Database |
CN105009110A (en) * | 2012-11-30 | 2015-10-28 | 华为技术有限公司 | Method for automated scaling of massive parallel processing (mpp) database |
CN103902593A (en) * | 2012-12-27 | 2014-07-02 | 中国移动通信集团河南有限公司 | Data transfer method and device |
CN104123392A (en) * | 2014-08-11 | 2014-10-29 | 吉林禹硕动漫游戏科技股份有限公司 | Tool and method for transferring relational database to HBase |
CN104899333A (en) * | 2015-06-24 | 2015-09-09 | 浪潮(北京)电子信息产业有限公司 | Cross-platform migrating method and system for Oracle database |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105760212A (en) * | 2016-02-02 | 2016-07-13 | 贵州大学 | Data redistribution method and device based on vessels |
CN105760212B (en) * | 2016-02-02 | 2019-04-12 | 贵州大学 | A kind of fast resampling method and device based on container |
CN107291764A (en) * | 2016-04-05 | 2017-10-24 | 中兴通讯股份有限公司 | A kind of big data exchange method and device, system |
CN106572172A (en) * | 2016-11-07 | 2017-04-19 | 湖北省农村信用社联合社网络信息中心 | Multi-process data migration method based on Hash algorithm |
CN108446145A (en) * | 2018-03-21 | 2018-08-24 | 苏州提点信息科技有限公司 | A kind of distributed document loads MPP data base methods automatically |
CN109213751A (en) * | 2018-08-06 | 2019-01-15 | 北京所问数据科技有限公司 | Oracle database parallel migration technology based on Spark platform |
CN109213751B (en) * | 2018-08-06 | 2021-11-23 | 北京所问数据科技有限公司 | Spark platform based Oracle database parallel migration method |
CN111581179A (en) * | 2019-02-19 | 2020-08-25 | 上海云桓信息科技有限公司 | Data migration method and tool from Oracle to MySQL |
CN113656474A (en) * | 2021-08-05 | 2021-11-16 | 京东科技控股股份有限公司 | Service data access method and device, electronic equipment and storage medium |
CN116756150A (en) * | 2023-08-16 | 2023-09-15 | 浩鲸云计算科技股份有限公司 | Mpp database large table association acceleration method |
CN116756150B (en) * | 2023-08-16 | 2023-10-31 | 浩鲸云计算科技股份有限公司 | Mpp database large table association acceleration method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20160127 |