
Teradata Performance Tuning - Basic Tips


Performance Tuning Tips



Performance tuning thumb rules. Here are the very basic steps used to performance-tune any given query in a given environment. As a prerequisite, make sure:

- the user has the proper select rights and profile settings
- enough space is available to run and test the queries

1. Run the explain plan (press F6 in SQL Assistant, or prefix the query with EXPLAIN)


Then look in the plan for warning signs such as:

- no or low confidence
- product join conditions
- "by way of an all-rows scan" (full table scan, FTS)
- translate steps

Also check for:

- DISTINCT or GROUP BY keywords in the SQL query
- IN / NOT IN keywords, and the list of values generated for them
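A minimal sketch of requesting a plan, assuming hypothetical employee and department tables:

-- F6 in SQL Assistant does the same as prefixing the query with EXPLAIN
EXPLAIN
SELECT e.emp_no, d.dept_name
FROM   employee e
INNER JOIN department d
       ON e.dept_no = d.dept_no;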

APPROACHES

A. In case of product join scenarios, check for:

- proper usage of aliases
- joining on matching columns
- usage of join keywords, i.e. specifying the type of join (ex. inner or outer)
- use of UNION in case of "OR" scenarios
- statistics collected on the join columns; this is especially important if the columns you are joining on are not unique

B. Collect stats

- Run the command "DIAGNOSTIC HELPSTATS ON FOR SESSION" (sketched after this list)
- Gather information on the columns on which stats have to be collected
- Collect stats on the suggested columns
- Also check for stats missing on the PI, SI, or columns used in joins ("HELP STATISTICS <databasename>.<tablename>")
- Make sure stats are re-collected when at least 10% of the data changes
- Remove unwanted stats, or stats which hardly improve performance of the queries
- Collect stats on columns instead of indexes, since dropping an index drops its stats as well!!
- Collect stats on indexes having multiple columns; this might be helpful when those columns are used in join conditions
- Check that stats are re-created for tables whose structures have changed

C. Full table scan scenarios

- Try to avoid FTS scenarios, as it might take a very long time to access all the data on every AMP in the system
- Make sure an SI is defined on the columns which are used as part of joins or as an alternate access path
- Collect stats on SI columns, else there are chances the optimizer might go for an FTS even when an SI is defined on that particular column
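A sketch of the commands from section B, assuming a hypothetical mydb.employee table:

-- Ask the optimizer to append stats suggestions to every explain plan in this session
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- List the statistics currently collected on a table
HELP STATISTICS mydb.employee;

-- Collect column-level stats (these survive if an index is dropped)
COLLECT STATISTICS ON mydb.employee COLUMN (dept_no);

-- Multi-column stats can help when the columns appear together in joins
COLLECT STATISTICS ON mydb.employee COLUMN (dept_no, job_code);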

2. If intermediate tables are used to store results, make sure that

- they have the same PI as the source and destination tables (see the sketch below)
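A sketch, assuming a hypothetical employee source table with PRIMARY INDEX (emp_no):

-- The intermediate table keeps the same PI as the source and target tables,
-- so the later join/insert stays AMP-local
CREATE VOLATILE TABLE work_emp AS
( SELECT emp_no, dept_no, salary
  FROM   employee )
WITH DATA
PRIMARY INDEX (emp_no)
ON COMMIT PRESERVE ROWS;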

3. Tune so that the optimizer joins on the primary index of the largest table,
when possible, to ensure that the large table is not redistributed across the AMPs

4. For large lists of values, avoid using IN / NOT IN in SQL. Write the large list of values
to a temporary table and use that table in the query, as sketched below
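A sketch with a hypothetical customer table:

-- Instead of: WHERE customer_number IN (1000, 1001, ... thousands of values)
CREATE VOLATILE TABLE cust_list
( customer_number INTEGER NOT NULL )
PRIMARY INDEX (customer_number)
ON COMMIT PRESERVE ROWS;

INSERT INTO cust_list VALUES (1000);
INSERT INTO cust_list VALUES (1001);
-- ... load the rest of the list (a bulk utility is more practical for very large sets)

SELECT c.customer_number, c.customer_name
FROM   customer c
INNER JOIN cust_list l
       ON c.customer_number = l.customer_number;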

5. Know when to use EXISTS / NOT EXISTS conditions, since they ignore
unknown comparisons (ex. a NULL value in the column makes the comparison unknown), whereas a single NULL in a NOT IN list makes the whole predicate unknown; choosing the wrong one leads to inconsistent results (see the sketch below)
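A minimal illustration of the NULL pitfall, assuming hypothetical employee and closed_depts tables:

-- NOT IN returns no rows at all if the subquery produces even one NULL,
-- because every comparison against NULL evaluates to unknown
SELECT e.emp_no
FROM   employee e
WHERE  e.dept_no NOT IN (SELECT c.dept_no FROM closed_depts c);

-- NOT EXISTS ignores the unknown comparisons row by row and behaves as expected
SELECT e.emp_no
FROM   employee e
WHERE  NOT EXISTS (SELECT 1
                   FROM   closed_depts c
                   WHERE  c.dept_no = e.dept_no);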

6. Inner Vs Outer Joins


Check which join works efficiently in the given scenario. Some examples:

- Outer joins can be used when a large table joins with small tables (like a fact table joining with a dimension table on a reference column)

- Inner joins can be used when we get only the actual data and no extra data is loaded into spool for processing

Please note, for outer join conditions (illustrated in the sketch below):

1. Filter conditions for the inner table should be placed in the ON clause
2. Filter conditions for the outer table should be placed in the WHERE clause
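A sketch of the ON-versus-WHERE placement, assuming hypothetical fact_orders and dim_dept tables:

SELECT f.order_id, d.dept_name
FROM   fact_orders f
LEFT OUTER JOIN dim_dept d
       ON  f.dept_no = d.dept_no
       AND d.region  = 'WEST'                 -- filter on the inner table: keep it in ON
WHERE  f.order_date >= DATE '2012-01-01';     -- filter on the outer table: keep it in WHERE

Moving the d.region filter into the WHERE clause would silently turn the outer join back into an inner join, because the NULL-extended rows fail the comparison.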

LIKE Clause
While tuning queries in Teradata, we take care of the major performance issues but ignore small cases which might still cause the query to perform badly. I wanted to mention one such case: the LIKE clause, which many people good at performance tuning miss, assuming LIKE patterns do not harm performance. In reality this is not so!

If LIKE is used in a WHERE clause, it is better to use one or more leading characters in the pattern, if at all possible. For example, LIKE '%STRING%' will be processed differently from LIKE 'STRING%'. If leading characters are used, as in 'STRING%', the optimizer can make use of an index on the column, thereby increasing performance. But if the pattern starts with a wildcard, as in '%STRING%', the optimizer will not be able to use an index, and a full table scan (FTS) must be run, which reduces performance and takes more time. Hence it is suggested to go for '%STRING%' only if STRING can occur inside the value, say as part of 'SUBSTRING'.
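For example, on a hypothetical customer table:

-- Leading characters allow the optimizer to use an index on the column
SELECT customer_name
FROM   customer
WHERE  customer_name LIKE 'STRING%';

-- A leading wildcard forces a full table scan: every row must be examined
SELECT customer_name
FROM   customer
WHERE  customer_name LIKE '%STRING%';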

IN-list vs BETWEEN
There are a lot of questions going around among people who do tuning. They sometimes suggest using temporary tables instead of large IN-list values, since the optimizer expands the list into "value = XXX OR value = XXY OR value = XXZ" when generating the explain plan. But what if the column compared against the large IN-list is part of an index, say a PI, SI, or JI? Sometimes, even after using a temp table with the list of values, you still get the same performance issue. Why? Did you ever consider using a BETWEEN clause to check if the query performed better? I would say give it a try and see whether it is a better option than the standard temp table against the IN-list.

Say, for example:

SELECT customer_number, customer_name
FROM   customer
WHERE  customer_number IN (1000, 1001, 1002, 1003, 1004);

is much less efficient than:

SELECT customer_number, customer_name
FROM   customer
WHERE  customer_number BETWEEN 1000 AND 1004;

Assuming there is an index on customer_number, the query optimizer can locate a range of numbers (using BETWEEN) much faster than it can find a series of numbers using the IN clause. Check the explain plans to compare the difference; if they are still the same, refresh/collect stats on the index column. If you still find some kind of issue, try to find the skewness of the column using the following query and rectify it:

SELECT HASHAMP(HASHBUCKET(HASHROW(customer_number))), COUNT(*)
FROM   customer
GROUP BY 1;

Overall, I would say trying the query with BETWEEN can be of great help. But query tuning is such a complex thing that you will never get what you want unless you understand the data ;-)

Join Considerations
If you write queries, work on performance, or help improve it, you should take some time to go through this topic. It is all about joins, which are the most important concern in Teradata. If some attention is given to the following suggestions, most join-related issues can be taken care of.

Tip 1: Joining on PI/NUPI/non-PI columns


We should make sure the join happens on columns composed of a UPI/NUPI. But why? Whenever we join two tables on common columns, the optimizer will try to bring the data from both tables into common spool space and join it there to get the results. Getting data from both tables into common spool has overhead.

What if I join a very large table with a small table?

Should the small table be redistributed, or the large table? Should the small table be duplicated across all the AMPs? Should both tables be redistributed across all the AMPs?

Here are some basic thumb rules on joining indexed columns, so that the join happens faster.

Case 1 - PI = PI joins


There is no redistribution of data over the AMPs. The joins are AMP-local, since the matching rows are already present on the same AMP and need not be redistributed. These joins on a unique primary index are very fast; see the sketch below.
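A sketch of a PI = PI join, assuming both hypothetical tables were created with the same primary index column:

-- orders and customer both have PRIMARY INDEX (customer_number), so the
-- matching rows already hash to the same AMP and no redistribution is needed
SELECT o.order_id, c.customer_name
FROM   orders o
INNER JOIN customer c
       ON o.customer_number = c.customer_number;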

Case 2 - PI = non-PI column joins


- Data from the second table will be redistributed across all AMPs, since the join is on a PI vs. a non-PI column. The ideal scenario is when the small table is redistributed to be joined with the large table's records on the same AMP
- Alternatively, the small table's data is duplicated to every AMP, where it is joined locally with the large table

Case 3 - no index = non-PI column joins


Data from both tables is redistributed across all AMPs. This is one of the longest-processing query types; care should be taken to ensure that stats are collected on these columns.

Tip 2:

The columns that are part of the join must be of the same data type (CHAR, INTEGER, ...). But why?!? When joining columns from two tables, the optimizer checks whether the data types match; if not, it translates the column in the driving table to match that of the derived table. Say, for example:

TABLE employee: deptno (CHAR)
TABLE dept: deptno (INTEGER)

If I join the employee table with dept on employee.deptno (CHAR) = dept.deptno (INTEGER), the optimizer will convert the character column to INTEGER, resulting in a translation. What would happen if the employee table had 100 million records and deptno had to undergo translation every time? We have to make sure to avoid such scenarios, since translation is a cost factor and consumes time and system resources. Make sure you join columns that have the same data types, to avoid translation, as sketched below!
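One way to pay the conversion cost only once, a sketch assuming the hypothetical employee table above, is to rebuild (or stage) the table with the matching type:

-- Convert deptno once, at load time, instead of per row on every join
CREATE TABLE employee_fixed AS
( SELECT emp_no,
         CAST(deptno AS INTEGER) AS deptno,
         emp_name
  FROM   employee )
WITH DATA
PRIMARY INDEX (emp_no);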

Tip 3:

Do not use functions like SUBSTR, COALESCE, CASE, ... on the indices used as part of a join. Why?!? It is not recommended to use functions such as SUBSTR, COALESCE, CASE and others on join columns, since they add to the cost factor, resulting in performance issues. The optimizer cannot use the stats on a column wrapped in a function, because it is the function's result that gets compared. This might result in a product join or spool-space issues, and the optimizer will not be able to make good decisions since no stats/demographics are available for the derived value. It might assume the column has 100 values instead of 1 million values and might redistribute on that wrong assumption, directly impacting performance.
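If the derived value is unavoidable, one possible workaround (a sketch, with hypothetical names) is to materialize it once, collect stats on it, and join on the clean column:

-- Avoid: SUBSTR on the join column hides the stats from the optimizer
--   ON SUBSTR(e.dept_code, 1, 4) = d.dept_no

-- Prefer: materialize the derived key once, then join on it directly
CREATE VOLATILE TABLE emp_keys AS
( SELECT emp_no,
         SUBSTR(dept_code, 1, 4) AS dept_key
  FROM   employee )
WITH DATA
PRIMARY INDEX (dept_key)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS ON emp_keys COLUMN (dept_key);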

Tip 4: use NOT NULL wherever possible!


What?! Did someone say NOT NULL? Yes, we have to make sure to use NOT NULL for columns which are declared as NULLABLE in the table definition. The reason is that all the NULL values may get sorted to one poor AMP, resulting in the infamous "NO SPOOL SPACE" error when that AMP cannot accommodate any more NULL values. So remember to use NOT NULL in joins, so that table skew can be avoided. Since V2R5, Teradata automatically adds the condition IS NOT NULL to the query. Still, it is better to ensure that nullable columns are filtered explicitly when they are part of the join, as sketched below.
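A sketch of the idea on hypothetical tables, filtering the nullable join column explicitly:

-- Excluding NULLs up front keeps them from all hashing to the same AMP
SELECT e.emp_no, d.dept_name
FROM   employee e
INNER JOIN dept d
       ON e.dept_no = d.dept_no
WHERE  e.dept_no IS NOT NULL;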

Locking table for access?


We come across this statement in many queries run in sensitive environments like PROD and UAT. It can be used in views, or sometimes just for querying purposes. I wanted to discuss how important this statement is in real-time/active data warehouses, where lots of users fire queries against the same database at the same time.

CREATE VIEW Employee.view_employ_withLock AS
LOCKING TABLE Employee.Dept_emp FOR ACCESS
SELECT *
FROM   Employee.Dept_emp;

By using LOCKING TABLE ... FOR ACCESS, we make sure that a simple "access" lock is applied to the table from which results are fetched. By doing so:

- There is no waiting for other locks to release, since an access lock can be applied to a table which already has a read/write lock on it
- The query will execute even when some other lock is applied, but the data accessed under this lock might not be consistent: it can result in a dirty read due to concurrent writes on the same table

It is always suggested to use "locking table for access" in such views, since it will not block other users from applying read/write locks on the table.
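Usage is then transparent to the consumer; any query through the view reads under an ACCESS lock (the column name here is hypothetical):

-- Not blocked by concurrent writes, at the price of a possible dirty read
SELECT *
FROM   Employee.view_employ_withLock
WHERE  dept_no = 100;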
