Infobright Best Practices
Infobright Best Practices
Infobright Best Practices
com
Agenda
Infobright Architecture Review Installation Tips Leveraging the Architecture Toolset Integration Getting Started Down the Right Path Q&A
1. 2. 3. 4.
Column orientation Data packs and Compression Knowledge Grid Granular Engine
1. Column Orientation
IncomingData
EMP_ID 1 2 3 FNAME Moe Curly Larry LNAME Howard Joe Fine SALARY 10000 12000 9000
ColumnOrientedLayout
(1,2,3;Moe,Curly,Larry;Howard,Joe,Fine;10000,12000,9000;)
typeanddistribution
Compression
Resultsvarydependingonthe
KnowledgeNodes
builtforeachDataPack
ColumnB
Dynamicknowledge
KnowledgeNodesanswerthequerydirectly,or IdentifyonlyrequiredDataPacks,minimizingdecompression,and Predictrequireddatainadvancebasedonworkload
4. Granular Engine
Infobright Database RoughSet GranularEngine Report
1%
CompressedData
Leverage DomainExpert
DomainExpert: Breakthrough Analytics Enables Infobright and users to add intelligence into Knowledge Grid directly with no schema changes Optimized for web data analysis
IP addresses Email addresses URL/URI
DomainExpert
Intelligence to automatically optimize the database
Can cut query time in half when using this data Improves compression
Leverage DomainExpert
Pattern recognition in data enables faster query performance
Patterns defined and stored Complex fields decomposed into more homogeneous parts Database uses this information when processing query
http://www.infobright.com/News-&-Events/Events/
Installation Tips
Installation Tips
Install Directory: Dont install to Linux /home or Windows /program files
IEE Evaluation Key (Linux or Windows): Put .lic file in the /path/to/infobright directory. (Eval only) IEE Evaluation Key Windows: Put the .license file in the installation directory. (Eval only) IEE Evaluation: After starting Infobright for the first time, backup the data/iblicense.dat file.
Note: The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, lower the memory settings for Infobright.
DO Create tables with lots of columns Only use the columns you need to complete a specific query Avoid Select * from wide tables Using views with many columns
Block level replication Active/Passive Moves compressed data image DRBD Use DLP, Infobright loader, or MySQL loader Use IEE or ICE
Adding as many WHERE conditions as you can to your SQL increases the chance that knowledge grid statistics can be used to speed up your queries.
Becomes
select sum(dlr_trans_amt), msa_id from fact_sales a where trans_date in (select trans_date from dim_dates b where b.trans_year=2006 and b.trans_month='MARCH') and msa_id in (select msa_id from dim_msa where msa_name in ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA') group by msa_id; 3 rows in set (21.28 sec)
Character best practice Sub-selects with surrogate keys Column option lookup http://www.infobright.org/wiki/How_ and_When_to_use_Lookups/ Chksum columns on large strings Binary collations
Becomes
Create Table Customer( Customer_Key integer, Customer_Name varchar(50), Customer_Address varchar(300), Category varchar(10) comment lookup, Customer_Name_MD5 bigint, Customer_Address_MD5 bigint);
Original Query SELECT ... FROM table WHERE str=value. Becomes SELECT ... FROM table WHERE str=value AND cksum=cksum(str)
Toolset Integration
Bear in Mind
The unique attributes of Infobright are transparent to developers. The benefits are obvious and immediate to users. Infobright is a relational database Infobright observes and obeys SQL standards Infobright observes and obeys standards-based connectivity
Design tools Development tools Administrative tools Query and reporting tools
Infobright Development
When developing applications, you can use: Industry standard interfaces including those listed below; Comprehensive Management Services and Utilities; Robust connectivity with BI Tools.
Connector/ODBC Connector/NET Connector/J Connector/MXJ Connector/C++ Connector/C C API PHP API Perl API C++ API Python API Ruby APIs
Note: API calls are restricted to the functional support of the Brighthouse engine. (e.g. mysql_stmt_insert_id )
Points to Remember
Default port =5029 No Explain Plan
BI tools
MicroStrategy Jaspersoft Pentaho BIRT
ETL Tools
Talend (aka Jasper ETL) Pentaho Data Integration
Before starting your evaluation, define a concise set of target objectives and requirements Remember that Infobright shows value with medium to large data sizes (>100GB and growing on up)
Dont undersize the evaluation
A planning document is available that can help with the evaluation exercise
The community has contributed code called ICE Breakers that can make data migration easier Industry tools can be used but will require manual intervention
Additional Resources
Both infobright.com and infobright.org have additional documentation and white papers Ask a question on the Infobright forum If you are new to MySQL you can also visit www.mysql.com for additional help You can also visit the Infobright YouTube channel
Infobright Community Edition with Pentaho: http://www.infobright.org/Downloads/Pentaho_ICE_VM/ Infobright Community Edition with Talend: http://www.infobright.org/Downloads/Talend_VM/ Infobright Community Edition with Jaspersoft: http://www.infobright.org/Downloads/Jaspersoft_ICE_VM/ Infobright Community Edition with Jaspersoft AND Talend: http://www.infobright.org/Downloads/Talend_VM/
Questions?
For the open community
ICE Quick Start (http://www.infobright.org/wiki) ICE FAQ (http://www.infobright.org/Resources/FAQ/) ICE Data Loading Guide (http://http://www.infobright.org/wiki/Data_Loading/ MySQL Online Tutorial (http://dev.mysql.com/doc/refman/5.1/en/tutorial.html)
080