Teradata Performance Tuning - Basic Tips
APPROACHES

A. Product join scenarios - check for:
- Proper usage of table aliases
- Joining on matching columns
- Usage of join keywords, specifying the type of join (e.g. INNER or OUTER)
- Use of UNION for "OR" scenarios
- Statistics collected on the join columns; this is especially important if the columns you are joining on are not unique
B. Collect stats
- Run the command DIAGNOSTIC HELPSTATS ON FOR SESSION, so that EXPLAIN output includes the optimizer's stats recommendations
- Gather information on the columns on which stats have to be collected
- Collect stats on the suggested columns
- Also check for stats missing on the PI, SI, or columns used in joins: HELP STATISTICS <databasename>.<tablename>;
- Make sure stats are re-collected when at least 10% of the data changes
- Remove unwanted stats, or stats that hardly improve the performance of the queries
- Collect stats on columns instead of indexes, since dropping an index drops its stats as well
- Collect stats on indexes having multiple columns; this can be helpful when those columns are used in join conditions
- Check that stats are re-created for tables whose structures have changed

C. Full table scan scenarios
- Try to avoid FTS scenarios, since they can take a very long time to access all the data on every AMP in the system
- Make sure an SI is defined on the columns that are used as part of joins or as an alternate access path
- Collect stats on SI columns; otherwise the optimizer may still choose an FTS even when an SI is defined on that column
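A minimal sketch of the commands above, assuming a hypothetical Sales_DB.Orders table with columns order_id, customer_id, and order_date:

```sql
-- Include the optimizer's stats recommendations in EXPLAIN output
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- Review the statistics currently defined on the table
HELP STATISTICS Sales_DB.Orders;

-- Column-level stats survive an index drop, unlike index-level stats
COLLECT STATISTICS COLUMN (order_id) ON Sales_DB.Orders;

-- Multi-column stats for columns used together in join conditions
COLLECT STATISTICS COLUMN (customer_id, order_date) ON Sales_DB.Orders;
```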
3. Tune to get the optimizer to join on the Primary Index of the largest table, when possible, to ensure that the large table is not redistributed across the AMPs.
4. For a large list of values, avoid using IN/NOT IN in SQL. Write the large list of values to a temporary table and use that table in the query.
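A sketch of the temporary-table approach, using a hypothetical volatile table id_list joined against a customer table:

```sql
-- Hypothetical volatile table holding the large value list
CREATE VOLATILE TABLE id_list (
    customer_number INTEGER
) PRIMARY INDEX (customer_number)
ON COMMIT PRESERVE ROWS;

-- Values would normally be bulk-loaded; shown row by row for brevity
INSERT INTO id_list VALUES (1000);
INSERT INTO id_list VALUES (1001);

-- Join against the list instead of a long IN (...) predicate
SELECT c.customer_number, c.customer_name
FROM customer c
INNER JOIN id_list l
    ON c.customer_number = l.customer_number;
```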
5. Be careful when using EXISTS/NOT EXISTS conditions, since they ignore unknown comparisons (e.g. a NULL value in the column results in UNKNOWN). Because of this, they can return different results than IN/NOT IN when NULLs are present, which looks like inconsistent behavior.
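A sketch of the NULL pitfall, assuming hypothetical employee and dept tables:

```sql
-- NOT IN returns zero rows if the subquery yields even one NULL deptno
SELECT e.emp_no
FROM employee e
WHERE e.deptno NOT IN (SELECT d.deptno FROM dept d);

-- NOT EXISTS treats the NULL comparison as "no match", so unmatched
-- employees are still returned
SELECT e.emp_no
FROM employee e
WHERE NOT EXISTS (
    SELECT 1
    FROM dept d
    WHERE d.deptno = e.deptno
);
```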
- Inner joins can be used when we need only the actual matching data, so no extra data is loaded into spool for processing.

Please note, for outer join conditions:
1. Filter conditions for the inner (non-preserved) table should be placed in the ON clause
2. Filter conditions for the outer (preserved) table should be placed in the WHERE clause
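A sketch of these two placement rules on a LEFT OUTER JOIN, assuming hypothetical employee and dept tables (the region and status columns are made up for illustration):

```sql
SELECT e.emp_no, e.emp_name, d.dept_name
FROM employee e
LEFT OUTER JOIN dept d
    ON e.deptno = d.deptno
   AND d.region = 'WEST'      -- inner-table filter: stays in ON,
                              -- otherwise the join silently turns inner
WHERE e.status = 'ACTIVE';    -- outer-table filter: safe in WHERE
```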
LIKE Clause
While tuning queries in Teradata, we take care of the major performance issues but ignore small cases that can still cause a query to perform badly. I wanted to mention one such case: the LIKE clause, which many people who are good at performance tuning miss, assuming that LIKE patterns do not harm performance. In reality this is not so! If LIKE is used in a WHERE clause, it is better to use one or more leading characters in the pattern if at all possible. For example, LIKE '%STRING%' is processed differently from LIKE 'STRING%'. If the pattern begins with literal characters, as in 'STRING%', the Optimizer can use an index to process the query, thereby improving performance. But if the leading character, as in '%STRING%', is a wildcard ('%'), the Optimizer will not be able to use an index, and a full table scan (FTS) must be run, which reduces performance and takes more time. Hence it is suggested to use '%STRING%' only when STRING may appear in the middle of the value, say as part of 'SUBSTRING'.
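A sketch of the two patterns, assuming a hypothetical customer table with an index usable on customer_name:

```sql
-- Leading literal characters: the optimizer can use an index path
SELECT customer_number, customer_name
FROM customer
WHERE customer_name LIKE 'STRING%';

-- Leading wildcard: the index cannot be used, forcing a full table scan
SELECT customer_number, customer_name
FROM customer
WHERE customer_name LIKE '%STRING%';
```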
Inlist vs Between
There are a lot of questions running around among people who do tuning. They sometimes suggest using temporary tables instead of large IN lists, because the optimizer would otherwise expand the predicate into "value = XXX OR value = XXY OR value = XXZ" when generating the explain plan. But what if the column compared against the large IN list is part of an index, say a PI, SI, or JI? Sometimes, even after using a temp table with the list of values, you still get the same performance issue. Why? Did you ever consider using a BETWEEN clause to check whether the query performs better? Did you? I would say give it a try and see whether it is a much better option than the standard "temp table" approach for the IN list.
Say for example:

SELECT customer_number, customer_name
FROM customer
WHERE customer_number IN (1000, 1001, 1002, 1003, 1004);

is much less efficient than:

SELECT customer_number, customer_name
FROM customer
WHERE customer_number BETWEEN 1000 AND 1004;
Assuming there is an index on customer_number, the Query Optimizer can locate a range of numbers (using BETWEEN) much faster than it can find a series of individual numbers using the IN clause. Check the EXPLAIN output to compare the difference; if it is still the same, refresh/collect stats on the index column, which should help. If you still find some kind of issue, check the skewness of the column using the following query and try to rectify it:

SELECT HASHAMP(HASHBUCKET(HASHROW(customer_number))), COUNT(*)
FROM customer
GROUP BY 1;

Overall, I would say trying the query with BETWEEN can be of great help. But query tuning is such a complex thing that you will never get what you want unless you understand the data ;-)
Join Considerations
If you write queries, work on performance, or help improve it, you should take some time to go through this topic. It is all about joins, which are one of the most important concerns in Teradata. If some attention is given to the following suggestions, most join-related issues can be taken care of.
- Should the small table be redistributed, or the large table?
- Should the small table be duplicated across all the AMPs?
- Should both tables be redistributed across all the AMPs?
Here are some basic rules of thumb on joining columns on an index, so that joins happen faster.
Tip 2:
The columns that are part of a join must be of the same data type (CHAR, INTEGER, and so on). But why?! When joining columns from two tables, the optimizer checks whether the data types match; if not, it will translate the column in one table to match the type in the other. Say for example:

TABLE employee: deptno (CHAR)
TABLE dept: deptno (INTEGER)

If I join the employee table with dept on employee.deptno (CHAR) = dept.deptno (INTEGER), the optimizer will convert the character column to INTEGER, resulting in a translation. What would happen if the employee table had 100 million records, and every deptno value had to undergo this translation? We have to make sure to avoid such scenarios, since translation is a cost factor that takes time and system resources. Make sure you are joining columns that have the same data types, to avoid translation!
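A sketch of the scenario above; the table and column names follow the employee/dept example, and the suggested fix (aligning the types at design time) is one possible approach:

```sql
-- Mismatched types: employee.deptno is CHAR, dept.deptno is INTEGER,
-- so the optimizer must translate the CHAR side on every row
SELECT e.emp_no, d.dept_name
FROM employee e
INNER JOIN dept d
    ON e.deptno = d.deptno;

-- Preferred: define both columns with the same type up front, e.g.
-- both INTEGER, so the join compares values directly and the column
-- stats remain usable
```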
Tip 3:
Do not use functions like SUBSTR, COALESCE, CASE, and so on on the indexed columns used as part of a join. Why?! It is not recommended to use such functions, since they add to the cost, resulting in performance issues. The optimizer will not be able to use the stats on a column that is wrapped in a function, because the stored demographics no longer describe the derived values. This can result in product joins and spool-out issues, since the optimizer cannot make good decisions when no usable stats/demographics are available on the column. It might assume the column has 100 values instead of 1 million, and redistribute on that wrong assumption, directly impacting performance.
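A sketch, reusing the hypothetical employee/dept tables; TRIM stands in for any function wrapped around the join column:

```sql
-- Function on the join column: stats on e.deptno become unusable
SELECT e.emp_no, d.dept_name
FROM employee e
INNER JOIN dept d
    ON TRIM(e.deptno) = d.deptno;

-- Preferred: cleanse the data beforehand (e.g. store trimmed values)
-- and join on the bare column
SELECT e.emp_no, d.dept_name
FROM employee e
INNER JOIN dept d
    ON e.deptno = d.deptno;
```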
Locking Table for Access

LOCKING TABLE Employee.Dept_emp FOR ACCESS
SELECT *
FROM Employee.Dept_emp;

By using LOCKING TABLE ... FOR ACCESS, we make sure that only the lenient "access" lock, which is all that is required to fetch results, is applied to the table. By doing so:
- There is no waiting for other locks to be released, since an access lock can be applied even to a table that already has a read/write lock on it
- The query executes even when some other lock is applied, but data accessed under this lock might not be consistent: it can result in a dirty read due to a concurrent write on the same table

It is always suggested to use LOCKING TABLE ... FOR ACCESS for such reads, since it will not block other users from applying read/write locks on the table.