OGG Flume Integration
OGG Flume Integration
OGG Flume Integration
Introduction
This how-to paper is written to illustrate the architecture, configuration and the implementation for
developing a custom adapter for streaming relational database transactions to Flume using the
Oracle GoldenGate Adapter. Please note, the code and the configuration of the Flume adapter
illustrated in this how-to is only meant to highlight the capability of the Oracle GoldenGate product.
As such, no support is provided by Oracle for the code and configuration illustrated in paper.
The code shown in this document is also contained in the accompanying archive
SampleHandlerFlume.zip. For simplicity the contents of the zip can be unpacked into the dirprm
subdirectory of the Oracle GoldenGate installation.
Hadoop Configuration
It is assumed that the reader has working knowledge of Hadoop technologies, if not,
hadoop.apache.com is a good source for learning about Hadoop. Also, it assumed that Hadoop and
Flume is installed and configured.
Oracle GoldenGate
Identify the type and version of OS platform (Linux, Windows, etc) and the type and version of
source relational data store (Oracle, MySQL etc.). Using the information about the OS and
database, download the appropriate Oracle GoldenGate binary. Refer to Oracle GoldenGate guides
for installing and configuring Oracle GoldenGate. If this is the first time using GoldenGate then it is
recommended to do a simple end to end replication between two relational databases to better grasp
the concepts
Download and install the Oracle GoldenGate Adapter. Please refer to Oracle GoldenGate Adapters
Administrators Guide for Java and get familiarized with the terminology and configuration of the
Java adapter.
configuration, the adapter will write the transactions in a desired format, with the appropriate
content into the file as defined by the Hive DDL for the table. Please refer to the Oracle
GoldenGate Developing Custom Java Adapter Handbook for details about the architecture and
developing a custom adapter.
Flume Agent
Capture
Trail
Files
Pump
Flume
Adapter
Source Channel
Sink
HDFS/
Hive
Source
Database
Pump
Parameter
file
Adapter
Properties
file
Flume
Config file
Create a table in the MySQL database and prepare the table for replication.
Configure Flume for delivery to HDFS.
Configure the Oracle GoldenGate Capture to extract transactions from the MySQL
database and create the trail file.
Configure the property file for the Flume handler.
Code, Compile and package the custom Flume handler.
Configure the Oracle GoldenGate Pump to read the trail and invoke the custom
adapter.
(Optional) Create a table in Hive that corresponds to the table in Oracle database.
Execute a test.
oggmovie.sinks.hdfs-sink.hdfs.fileType=DataStream
oggmovie.sinks.hdfs-sink.hdfs.rollInterval=60
oggmovie.sinks.hdfs-sink.hdfs.fileSuffix=.txt
oggmovie.sinks.hdfs-sink.channel=memoryChannel
Please start the Flume agent using the following command:
> flume-ng agent --conf conf --conf-file flume.conf --name oggmovie -Dflume.root.logger=INFO,console
Please make sure the Flume agent is running when starting the GoldenGate Pump process later.
Configure Oracle GoldenGate Capture process
Sample Capture parameter file dirprm/emov.prm:
EXTRACT emov
DBOPTIONS HOST localhost, CONNECTIONPORT 3306
SOURCEDB odidemo, USERID root, PASSWORD welcome1
EXTTRAIL ./dirdat/tm
GETUPDATEBEFORES
TRANLOGOPTIONS ALTLOGDEST /var/lib/mysql/log/bigdatalite-bin.index
TABLE odidemo.*;
Add and start the Extract using the following command from the 'ggsci' command shell and ensure
it starts successfully.
ggsci>add extract emov, tranlog, begin now
ggsci>add exttrail ./dirdat/tm, extract emov, megabytes 50
ggsci>start extract emov
ggsci>info emov
this.flumeHost = flumeHost;
}
Command
JAVA_HOME
PATH
export PATH=$JAVA_HOME/bin:$PATH
LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:
$LD_LIBRARY_PATH
CLASSPATH
Compile the code using the following command in the java source directory.
>javac com/oracle/demo/SampleHandlerFlume.java
Successfully executing the above command will result in creating SampleHiveHandler.class under
the subdirectory com/oracle/demo.
Create the jar using the following command in the java source directory
>jar cvf myflumehandler.jar
com
Alternatively, instead of start pmov in GGSCI you can execute the process on the OS command
line. This is useful for debugging and has the advantage of seeing all output and error messages
directly in standard output.
> extract paramfile ./dirprm/pmov.prm
Execute a test
Perform the following transaction on the database.
insert into MOVIE (MOVIE_ID,TITLE,YEAR,BUDGET,GROSS,PLOT_SUMMARY) values (1,