Tuesday, 3 May 2011

Performanic Tuning In Teradata :::

f you are working on writing queries, working on performance or helping in betterment of performance. You will have to take sometime in going through this topic.   It is all to do about Joins which is most important concern in Teradata.

If some light is given to following suggestions, any join related issues can be taken care off...

Tip 1: Joining on PI/NUPI/ Non PI columns

We should make sure join is happening on columns composed of UPI/NUPI.  But why??

Whenever we join two tables on common columns, the smart optimizer will try to take data from both the data into a common spool space and join them to get results. But getting data from both the tables into common spool has overhead.

What if I joined a very large table with small table?
Should small table be redistributed or large table?
Should small table be duplicated across all the AMPs?
Should both the tables be redistributed across all the AMPs??

Here is some basic thumb rules on joining columns on Index, so joining happens faster.

Case 1 - P.I = P.I joins
There is no redistribution of data over amp's. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.

Case 2 - P.I = Non PI column joins
-Data from second table will be re-distributed on all amps since joins are happening on PI vs. NUPI column.  Ideal scenario is when small table is redistributed to be joined with large table records on same amp
-Data in small table is duplicated to Every AMP where it is joined locally with large table

Case 3 - No Index = Non PI column joins
Data from both the tables are redistributed on all AMPs.  This is one of the longest processing   queries , Care should be taken to see that stats are  collected on these columns


Tip 2: The columns part of join must be of the same data type (CHAR, INTEGER,). But why?!?

When trying to join columns from two tables, optimizer makes sure that datatype is same or else it will translate the column in driving table to match that of derived table.

Say for example
TABLE employee deptno (char)
TABLE dept deptno  (integer)

If I am joining employee table with Dept on  employee.deptno(char) = dept.deptno(Integer), optimizer will convert character column to Integer resulting in translation  . What would happen if employee table had 100 million records and every time deptno would have to undergo Translation. So we have to make sure to avoid  such scenarios since translation is a  cost factor  and might need time and system resources.

Make sure you are joining columns that have same data types to avoid translation!!!!


Tip 3 : Do not use functions like SUBSTR, COALESCE , CASE ...) on the indices used as part of Join.  Why?!?

It is not recommended not to use functions such as SUBSTR, COALESCE, CASE and others since they add up to cost factor resulting in performance issue.
Optimizer will not be able to read stats on those columns which have functions as it is busy converting functions. This might result in Product join, spool out issues and optimizer will not be able to take decisions since no stats/demographics are available on column. It might assume column to have 100 values instead of 1 million values and might redistribute on wrong assumption directly impacting performance.


Tip 4 : use NOT NULL where ever possible!

What?!!  Did someone say Not Null?? .. Yes, we have to make sure to use NOT null for columns which are declared as NULLABLE in TABLE definition.
Reason being that all the Null values might get sorted to one poor AMP resulting in infamous " NO SPOOL SPACE "  Error  as that  AMP cannot accommodate any more Null values.
SO remember to use NOT NULL in joining  so that table SKEW can be avoid  .

Since V2R5 , teradata automatically adds the condition « IS NOT NULL » to the query. Still it is better to ensure NOT NULL columns are not included as part of the join..

Peformance tuning Tips : LIKE Clause


While tuning queries in Teradata , We take care of major performance issues but ignore small cases which might still cause the query to perform badly.

I wanted to mention about one such case of  LIKE clause , which many people good at performance tuning miss it  assuming like patterns does not harm the performance . But in reality this is not so!!

If LIKE is used in a WHERE clause, it is better to try to use one or more leading character in the clause, if at all possible.

eg;   LIKE '%STRING%'  will be processed differently compared to LIKE 'STRING%'

If a leading character   'STRING%'   is used in the begining of like clause , the the Qptimizer makes use of an index to perform on query thereby increasig the performance

But if the leading character' in '%STRING%' is a wildcard(say '%') , then the Optimizer will not be able to use an index, and a full table scan  (FTS ) must be run, which reduces performance and takes more time.

Hence it is suggested to go for '%STRING%'   only if STRING is a part of entire pattern say 'SUBSTRING'

Partitioned Primary Index (PPI)

Partitioned Primary Index (PPI)

Partitioned Primary Index is one of the unique features of Teradata, which allows access of portion of data of large table. This reduces the overhead of scanning the complete table thus improving performance. 

PPI works by hashing rows to different virtual AMPs, as is done with a normal PI.  PPI does not alter data distribution, it only creates partitions on data already distributed based on PI.

Usually PPI's are defined on a table in order to increase query efficiency by avoiding full table scans without the overhead and maintenance costs of secondary indexes.

Partitions are usually defined based on Range or Case as follows.
Two functions, RANGE_N and CASE_N, can be used to simplify the specification of a partitioning expression.

1. Partition by CASE
CREATE      TABLE ORDER_Table
(
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY case1 (
            order_total < 10000 ,
            order_total < 20000 ,
            order_total < 30000,
NO           CASE     OR        UNKNOWN ) ;

2. Partition by Range - example using date range
CREATE TABLE ORDER_Table
(
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY range1 (
            Order_date BETWEEN date '2010-01-01'       AND      date '2010-12-01'
            EACH interval '1' month,
NO RANGE
        OR  UNKNOWN);

P.S: If we use NO RANGE or NO CASE - then all values not in this range will be in a single partition.
If we specify UNKNOWN, then all null values will be placed in this partition


PPI improves performance as follows:
  • Automatic optimization occurs for queries that specify a restrictive condition on the partitioning column.
  • Uses partition elimination to improve the efficiency of range searches when, for example, the searches are range partitioned.
  • Only the rows of the qualified partitions in a query need to be accessed avoid full table scans.
  • Provides an access path to the rows in the base table while still providing efficient join Strategies
  • If the same partition is consistently targeted, the part of the table updated may be able to  fit largely in cache, significantly boosting performance
  • PPI based tables have advantage during Purging stages. Since purging based on partition is very fast and deletion happens quickly.

PPI also comes with some disadvantages like
  • The PI access disadvantage occurs only when the partitioning column is not part of the PI. In this situation, a query specifying a PI value, but no value for the partitioning column, must look in each partition for that value, instead of positioning directly to the first row for the PI value.
  • Another disadvantage is that when another table (without PPI) is joined with PPI table on PI=PI condition. If one of the tables is partitioned, the rows won't be ordered the same, and the task, in effect, becomes a set of sub-joins, one for each partition of the PPI table. This type of join is sliding window join

Limitations of Partitioned Primary Index (PPI) :
  •  Primary index of PPI table has to be 
    • Non unique PI, if PPI column is not part of Index, since enforcing a Unique PI would require checking for a duplicate key value in each partition, which would be very expensive.
  •  Primary Index of PPI table can be Unique, if PPI is part of UPI. This will result in checking for unique constraint in same partition.
  •  PPI cannot be defined on Global temporary tables and Volatile tables.
  • PPI Table rows occupy two extra bytes compared to NPPI table row, as these extra bytes store the partition number for each row .
  • PPI table rows are four bytes wider if value compression is specified for the table. 

Since PPI , results in lot of partitions , there is a little overhead to user/dba.  He has to regularly run collect stats on the PPI column.

It is beneficial to collect stats on Partition column .Collecting stats on the system derived column Partition is faster because rather than reading all the base table rows for collecting information, it usually just scans the cylinder index for that PPI table.By doing so, it avoids all unnecessary partitions , thus speeding up the access.
Help stats tablename column PARTITION; -- used to list partitions in table and their details
Collect stats on tablename column PARTITION; -- refresh partition details

Maximum limits in v2r5/ v2r6 databases


Maximum limits in v2r5/ v2r6 databases

The database specifications in the following table apply to a single Teradata database.



Max number of journal tables per database
1
Max number of data tables per database
 4.2 x 10 9
Max number of columns per table or view
± 2048
Max number of columns created over the life of a table
2560
Max number of rows per table
 Limited by disk capacity
Max row size
 Approx 64,000 bytes
Max field size
 Approx 64,000 bytes
Max database object name size
 30 bytes
Max number of columns per primary or secondary index
64
Max SQL request size
1 MB
Max SQL response size
 Approx 64,000 bytes
Max expanded text size for macros and views
 2 MB
Max SQL title size
 60 characters
Max number of characters in a string constant
± 32,000
Max number of columns per DML statement ORDER BY clause
16
Max number of primary indexes per table
1
Max number of partitions for a partitioned primary index
 ± 65,535
Max numbers of secondary, hash, and join indexes, in any combination, per table
32

Found this very useful  information elsewhere on net, felt like sharing it here.!!

Teradata Interview questions - Part 7


How do you set the session mode parameters in BTEQ?
Answer
.set session transaction ANSI /* this is to set ANSI mode */
.set session transaction BTET /* this is to set Teradata transaction mode */

These commands have to be entered before logging to the session
===================================================================================

What is the command in BTEQ to check for session settings ?
Answer
The BTEQ .SHOW CONTROL command displays BTEQ settings.
===================================================================================

How do you submit bteq script (batch mode)?
Answer
1. Start the BTEQ, by typing BTEQ
2. Enter the following command
    .run file = BTEQScript.btq
OR
1. Bteq < BTEQScript.btq


BTEQScript.btq contains following

.logon 127.0.0.1/dbc, dbc;
sel top 10 * from dbc.tables;
.quit
===================================================================================

Is Like comparision case-sensitive in Teradata?
Answer

LIKE operator is not case sensitive in Teradata.
Consider the following example  
Select F_name from employee where F_name like '%JO%' ;
The followig query will pick values matchinf with 'JO' and 'jo' as well, since teradata is not case-sensitive

To overcome this problem, a new function called "CASESPECIFIC" is used in TERADATA as follows
Select F_name from employee where F_name (CASESPECIFIC) like '%JO%' ;
===================================================================================
What does DROP table command do?

It deletes all data in emp_DATA
Removes the emp_DATA definiation from the data dictionary
Removes all explicit access rights on the table
===================================================================================
How do you set default date setting in BTEQ?
Answer

There are two default date setting in BTEQ.  They have to be set after logging on to the session
They are

Set session dateform = ANSIDATE; /*format is yyyy-mm-dd */      
Set session dateform = integerdate; /* format is yy/mm/dd   -teradata date format */
===================================================================================
Difference between Inner join and outer join?

An inner join gets data from both tables where the specified data exists in both
tables.
An outer join gets data from the source table at all times, and returns data from
the outer joined table ONLY if it matches the criteria.
===================================================================================
What is multi Insert?

Inserting data records into the table using multiple insert statements.
 Putting a Semi colon in front of the key word INSERT in the next statement rather than
Terminating the first statement with a semi colon achieves it.

Insert into Sales “select * from customer”
; Insert into Loan “select * from customer”;

====================================================================
How to find duplicates in a table?
Group by those fields and select id, count(*) from table group by id having count
(*) > 1

====================================================================
How do you see a DDL for an existing table?
By using show table command.

====================================================================
Which is more efficient GROUP BY or DISTINCT to find duplicates?
With more duplicates GROUP BY is more efficient, if only a few duplicates exist
DISTINCT is more efficient.

====================================================================
Syntax for CASE WHEN statement?
CASE value_expression_1 WHEN value_expression_n THEN scalar_expression_n
END;


Locking table for access?



Locking table for access?


We would have come across this statement in many queries which are run in sensitive environments like PROD, UAT.   They can be used with views or sometimes just for querying purpose.

I wanted to discuss how important this statement would be in real-time /active data warehouses where lot of users will be striking queries on same database at the time.

create   view Employee.view_employ_withLock
as
locking table Employee.Dept_emp  for access
select   *
from     Employee.Dept_emp ;

By using locking table for access, we make sure that normal "access" lock is applied on table which is required to fetch results. By doing so ,

- There is no waiting for other locks to release since access lock can be applied on table which has read/write lock applied to it
- This will cause the query to execute even when some lock is applied , but  accessing data using this lock might not be consistent as it might result in dirty read due to concurrent write on the same table.

It is always suggested to use “locking table for access" which since they will not block the other users from applying read/write lock on the table.

Fastload locks

Fastload locks are one of the major issues in fastload . I have seen this kind of scenarios in production environment and there are various ways by which we can handle this issue.

When do fastload locks happen?
- When the fastload script results in failure (error code 8 or 12)
- When fastload script is aborted by dba or some other session
- Fastload fails (Bad record, file)
- When you have forgot to add end loading statement in ur fload script


What can be done to release fload locks?

1. Finish the fastload, which means leave your hands off of the error tables and restart  fastload. When it finishes the table lock will be released.
 If the following error persists then do step 2.
Error is: RDBMS error 2652: Operation not allowed: _db_._table_ is being Loaded.


2.  Create a dummy fload job  as following without any content for loading  as  follows.

     logon tdpid/username,password;
     begin loading databasename.tablename;
/*  emtpy loading   to release fastload lock */
     end loading;
     logoff;

3.  If   second step also fails then, drop the empty locked table (because if you can't restart Fastload you most likely dropped the error tables)
   Please note that before using the DROP solution, try to close your already opened fastload session.

4. If still this  error persists and not able to drop the table  :( , then please kill the fload session from pmon/manager , drop the table and rerun actual    fload script .


I hope this solution works, if not then please contact ur DBA or contact m

Teradata Interview questions - Part 6

Teradata  Interview questions  - Part 6
What is a Sparse Index?
Answer:

Sparse Join Indexes are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index. All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

Difference between Stored Procedure and Macro?
Answer:

SP:
It does not return rows to the user.
It has to use cursors to fetch multiple rows
It used Inout/Out to send values to user
It Contains comprehensive SPL
It is stored in DATABASE or USER PERM
A stored procedure also provides output/Input capabilities

Macros:
It returns set of rows to the user.
It is stored in DBC PERM space
A macro that allows only input values

How do you execute the given SQL statement repeatedly in BTEQ?
Answer:

Select top 1* from database.table1;
=n
Here "=n” is to run the previous sql statement, "n" number of times.

What are types of PARTITION PRIMARY INDEX (PPI) in Teradata?
Answer:

1. Partition by CASE
CREATE      TABLE ORDER_Table
(
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY case1 (
            order_total < 10000 ,
            order_total < 20000 ,
            order_total < 30000,
NO           CASE     OR        UNKNOWN ) ;

2. Partition by Range  - example using date range
CREATE TABLE ORDER_Table
(
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY range1 (
            Order_date BETWEEN date '2010-01-01'       AND      date '2010-12-01'
            EACH interval '1' month ,
NO RANGE
        OR  UNKNOWN);

P.S:If  we use NO RANGE or NO CASE  - then all values not in this range will be in a single partition.
If we specify UNKNOWN, then all null values will be placed in this partition

Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?
Answer:

PPI cannot be defined on PI column in Table.  Since PI is used to locate an AMP and store the data based on HASH value (ROW ID ) it cannot be used as PPI column.
In case of PPI , The data stored in AMP's are Partitioned based on PPI column after they are row hashed  (ROW KEY = ROW ID +PPI value )

What is Cross Join?
Answer:

It is a Teradata specified Join, which is used as equivalent to product join.
There is no “On” clause in case of CROSS join
SELECT  EMP.ename , DPT.Dname
FROM     employee EMP
CROSS JOIN
Department DPT
WHERE
EMp.deptno = DPT.depto ;


What does SLEEP function does in Fast load?
Answer

The SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions.
Sleep command can be used with all load utilities not only fastload.
This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default value is 6 minutes. If tenacity was set to 2 hours and sleep 10 minutes, Sleep command will try to logon for every 10 minutes up to 2 hours duration.


What is TENACITY? What is its default value?
Answer

TENACITY specifies the amount of time in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for Fast Load is “no tenacity”, meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.


How to skip 1st record while using Bteq IMPORT?
Answer:

By using SKIP=1 ;   , we can skip first record.
 .import infile=<filename>, skip=1;

How many types of Index are present in teradata?
Answer:

There are 5 different indices present in Teradata
1. Primary Index
      a.Unique primary index
      b. non Unique primary index
2. Secondary Index
      a. Unique Secondary index
      b. non Unique Secondary index
3. Partitioned Primary Index
      a. Case partition (ex. age, salary...)
      b. range partition ( ex. date)
4. Join index
      a. Single table join index
      b. multiple table join index
      c. Sparse Join index ( constraint applied on join index in where clause)
5. Hash index

Sparse Index ( SPARSE JOIN INDEX)

It is type of join index which contains a WHERE clause   , that  reduces number of rows participating  from the base tables.
All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

By default , a given join index will have a NUPI defined on the first column if it is not specified in create JOIN index statement.   We can explicitly mention a column to be NUPI for the join Index.

Any combination of AND, OR, IN  (LOGICAL) conditions  can be used along with  WHERE clause in Sparse Index.

Check the following example for Sparse Join Index.
CREATE      JOIN INDEX DEV.testjoin AS
sel  A.DEPTno  ,  b.sal
FROM 
dept  A
INNER JOIN
 employee B
ON  A.deptno=b.dept
WHERE a.deptno=20   --- sparse index
PRIMARY INDEX (sal);
 
In the following DDL for Join INDEX,
- Sal is explicitly mentioned as NUPI
- a filter clause is applied on Dept.DeptNO = 20, which reduces the number of rows  ( an Ideal case of Sparse Index )

P.s:  Join Index never allows a Unique Index to be created.


Rank vs Row_Number in Teradata

Anyone working on Teradata would be aware of the two most common OLAP functions in Teradata which are used for common use i.e.
  1. Rank()
  2. Row_Number()

  In one of my posts I have mentioned about how to generate sequence number (ex. Surrogate Key) using Row_Number.   Today let us find out the difference in usage of rank and row number functions by taking following scenario on tables (sales_daily)


SELECT    
item_id,
sales_date,
Actual_sales,
RANK() OVER (ORDER  BY Actual_sales DESC)  ,                                  -- RANK function
ROW_NUMBER() OVER (ORDER     BY Actual_sales DESC)                   -- ROW_NUMBER function
FROM   Sales_daily
WHERE    
sales_date BETWEEN DATE  '2010-01-01'     AND DATE '2010-03-01';


The result set is as follows:
Item_id
Sales_date
Actual_sales
Rank
Row_Number()
120
4/1/2010
550
1
1
120
4/2/2017
550
1
2
120
4/2/2020
450
3
3
120
4/2/2006
350
4
4
120
4/2/2027
350
4
5
120
4/1/2005
350
4
6
120
4/1/2003
250
7
7
120
4/2/2003
250
7
8
120
4/1/2025
200
9
9
120
4/1/2002
200
9
10
120
4/1/2021
150
11
11
120
4/2/2001
150
11
12
120
4/1/2001
150
11
13
120
4/1/1931
100
14
14


In the result set, we can observe the following,  
1)       Rows 1 and 2 have same rank assigned  since that have same actual_sales value
2)       Rows 1 and 2 have different row numbers assigned even though same actual_sales value
      3)   Row 3 has rank “3” assigned as ranking positions

Hence , 
rows are always assigned an incremented sequence number with ROW_NUMBER.
Rows are always assigned the same number with RANK function.

Types of Teradata Joins

Teradata joins

When we join two or more tables on a column or set of columns, Joining takes place. This will result in data resulting from matching records in both the tables. This Universal concept remains the same for all the databases.
In Teradata, we have Optimizer (a very smart Interpreter), which determines type of join strategy to be used based on user input taking performance factor in mind.

In Teradata, some of common join types are used like
- Inner join (can also be "self join" in some cases)
- Outer Join (Left, Right, Full)
- Cross join (Cartesian product join)

When User provides join query, optimizer will come up with join plans to perform joins. These Join strategies include
- Merge Join
- Nested Join
- Hash Join
- Product join
- Exclusion Join


 Merge Join
--------------------

Merge join is a concept in which rows to be joined must be present in same AMP. If the rows to be joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make that happen based on row hash of the columns involved in the joins WHERE Clause.
            If two tables to be joined have same primary Index, then the records will be present in Same AMP and Re-Distribution of records is not required.

There are four scenarios in which redistribution can happen for Merge Join
Case 1: If joining columns are on UPI = UPI, the records to be joined are present in Same AMP and redistribution is not required. This is most efficient and fastest join strategy
Case 2: If joining columns are on UPI = Non Index column, the records in 2nd table has to be redistributed on AMP's based on data corresponding to first table.
Case 3: If joining columns are on Non Index column = Non Index column , the both the tables are to be redistributed so that matching data lies on same amp , so the join can happen on redistributed data.  This strategy is time consuming since complete redistribution of both the tables takes across all the amps
Case 4: For join happening on Primary Index, If the Referenced table (second table in the join) is very small, then this table is duplicated /copied on to every AMP.

 Nested Join
-------------------
Nested Join is one of the most precise join plans   suggested by Optimizer .Nested Join works on UPI/USI used in Join statement and is used to retrieve the single row from first table . It then checks for one more matching rows in second table based on being used in the join using an index (primary or secondary) and returns the matching results.

Example:
Select EMP.Ename , DEP.Deptno, EMP.salary
from
EMPLOYEE EMP ,
DEPARTMENT DEP
Where EMP.Enum = DEP.Enum
and EMp.Enum= 2345;    -- this results in nested join

 Hash join
----------------
Hash join is one of the plans suggested by Optimizer based on joining conditions. We can say Hash Join to be close relative of Merge based on its functionality. In case of merge join, joining would happen in same amp.   In Hash Join, one or both tables which are on same amp are fit completely inside the AMP's Memory   . Amp chooses to hold small tables in its memory for joins happening on ROW hash.

Advantages of Hash joins are
1. They are faster than Merge joins since the large table doesn’t need to be sorted.
2. Since the join happening b/w table in AMP memory and table in unsorted spool, it happens so quickly.

 Exclusion Join
-------------------------

These type of joins are suggested by optimizer when following are used in the queries
- NOT IN
- EXCEPT
- MINUS
- SET subtraction operations

Select EMP.Ename , DEP.Deptno, EMP.salary
from
EMPLOYEE EMP
WHERE EMP.Enum NOT IN
( Select Enum from
DEPARTMENT DEP
where Enum is NOT NULL );

Please make sure to add an additional WHERE filter “with <column> IS NOT NULL” since usage of NULL in a NOT IN <column> list will return no results.

Exclusion join for following NOT In query has 3 scenarios
Case 1: matched data in "NOT IN" sub Query will disqualify that row
Case 2: Non-matched data in "NOT IN" sub Query will qualify that row
Case 3: Any Unknown result in "NOT IN" will disqualify that row - ('NULL' is a typical example of this scenario). 


How to Export without junk characters in FastExport?

How to Export without junk characters in FastExport?

Solution1: Create a small OUTMOD to strip off the two-byte VARCHAR length field.

Solution2: Another alternative if you are in a UNIX environment is to use AWK or SED or CUT the file after it has been output to strip the leading to characters from each record.

STEP1: Write the query in FASTEXPORT file as follows

SELECT '|'||Col1||';'||Col2 from TAB1; -- Here '|' pipe operator is used as delimiter.
So result generated will be as follows
^A|Col1;Col2

"^A" is a junk value generated during EXPORT.

STEP2: Write UNIX script and pass filename as first parameter

#####UNIX SOLUTION #############
##create a file trim.ksh ############
## Run the file trim.ksh <user_filename>###

FileName=$1
cat $FileName | cut -f2- -d '|' > $FileName.tmp
mv $FileName.tmp $Join_Export.txt1

The Result after cutting all junk characters till occurrence of PIPE operation will be as follows
Col1;Col2

Solution 3: by casting Entire result to CHAR

When we are exporting a table in record mode , Please make sure that data exported does not contain extra 2 characters by casting entire result to CHAR().

For example: if table1 has 2 columns, Col1 is of type char (6), Col2 is of type char (2)
Write the query as follows:

Select cast (c1 || c2) as char (8) from table1;

Note: If we write query like “select c1 || c2 from table1",
The records exported will contain 2 byte length at the beginning, because concatenation returns VARCHAR () type not CHAR () type.

Which is faster – MultiLoad delete or Delete command?
Answer

MultiLoad delete is faster then normal Delete command, since the deletion happens in data blocks of 64Kbytes, where as delete command deletes data row by row.  Transient journal maintains entries only for Delete command since Teradata utilities doesn’t support Transient journal loading.


How to Skip or Get first and Last Record from Flat File through MultiLoad?
Answer

In .IMPORT command in Mload we have a option to give record no. from which processing should begin. i.e. ‘FROM m’ ‘m’ is a logical record number, as an integer, of the record in the identified data source where processing is to begin. You can mention ’m’ as 2 and processing will  start from second record.

THRU k and FOR n are two options in the same Mload command, functions same towards the end of the processing.

Adding to the above, if from n"start record" and for n "stop record" are not mentioned, mload considers records from   start till the end of the file.


What do High confidence, Low confidence and No confidence mean in EXPLAIN plan?
Answer

Explain gives the execution strategy means what are the different steps that the query will go through.

HIGH CONFIDENCE:  Statistics are collected.
LOW CONFIDENCE:   Statistics are not collected.  But the where condition is having the condition on indexed column. Then estimations can be based on sampling.
NO CONFIDENCE: Statistics are not collected and the condition is on non indexed column.


What is the default join strategy in Teradata???
Answer
# 1
The Teradata Cost based optimizer will decide the join strategy based on the optimum path. The common strategies followed are from Merge, Hash & Nested Join

Answer 
# 2 
Three strategies followed by optimizer are:
1. Duplication of rows of one table to every amp
--> This one is opted by optimizer when the non-PI column is on a small table.
2. Redistribution of the non PI rows of the table to the amp containing the matching PI row amp.
--> This one is opted when the tables are large. The non PI column is in one table is redistributed to the amp containing the matching PI.
3. Redistribute both the rows of the table by hash values.
--> This is chosen when the join is on a column that is not the PI in either table. This will require the most spool space.


Can we take collect stats on Derived Tables and Volatile tables and Temporary tables?
What is Global Temporary table?
What is the use of this Global Temporary table?

Answer
1. No for volatile and derived tables and yes for global tables.
2. Global tables are temp tables like volatile tables but unlike volatile tables, their definitions are retained in dd.
3. It is used whenever there is a need for a temporary table with same table definition for all users.


How to handle nulls in Teradata???
How many columns can be there in a table???
How to find average sal with out using avg function????
Answer
1. Use zeroifnull, nullifzero in select and NULL in insert directly.
2. 256 columns max per table.
3.  Without using "avg" we can find the avg salary by using sum (sal)/count (sal);
  sel sum(sal)/count(sal) as avgsal from tablename


Why Fload doesn’t support multiset table?
Answer

Fload does not support Multiset table because of following reason.

Say, the fastload job fails. Till the fastload failed, some number of rows was sent to the AMP's.
Now if you restart FLOAD, it would start loading record from the  last checkpoint and some of the consecutive rows are sent for the second time. These will be caught as duplicate rows are found after sorting of data.
   This restart logic is the reason that Fastload will not load duplicate rows into a MULTISET table. It assumes they are duplicates because of this logic. Fastload support Multiset table but does not support the duplicate rows. Multiset tables are tables that allow duplicate rows. When Fastload finds the duplicate rows it discards it. Fast Load can load data into multiset table but will not load the duplicate rows.


How many types of Skew exist?
Answer

If you utilized unequally TD resources (CPU,AMP,IO,Disk and etc) this is called skew exists. Major are 3 types of skews (CPU skew, AMP/Data skew, IO Skew).

-Data skew?
When data is not distributed equally on all the AMPs.
-Cpu skew?
Who is taking/consuming more CPU called cpu skew.
-IO skew?
Who perform more IO Operation? Resulting in IO Skew


Why does varchar occupy 2 extra bytes?
Answer

The two bytes are for the number of bytes for the binary length of the field.
It stores the exact no of characters stored in varchar


What is the difference between start schema and Fsldm? 
Answer

Star Schema
--> Demoralized for query performance
--> A star schema has one Fact tables and many dimensional tables to store the related.

FSLDM (Financial Services Logical Data Model)
--> 3rd normal form
-->Proposed and designed by NCR to cater the needs of financial sectors, especially for the Banking domain.
--> It has many Fact tables and dimension tables.
--> The FSLDM can be customized to cater to bank's specific needs.
Ex. The major fact tables include Party (Customers of the bank), Product (Services offered by the bank), Event (An event occurred ex: An ATM transaction), Agreement (A deal between
the party and the bank for a product) , etc..
Can you load multiple data files for same target table using Fastload?
Answer:

Yes, we can Load a table using multiple datafiles in Fastload.
Before giving "end loading" statement user can define file path and use insert sql for multiple source files and give "end loading" statement at the end


Explain types of re-distribution of data happening for joining of columns from two tables
Case 1 - P.I = P.I joins
Case 2 - P.I = N.U.P.I joins
Case 3 - N.U.P.I = N.U.P.I joins
Answer:

Case1 - there is no redistribution of data over amp's. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.
Case2 - data from second table will be re-distributed on all amps since joins are happening on PI vs. NUPI column.  Ideal scenario is when small table is redistributed to be joined with large table records on same amp
case3 - data from both the tables are redistributed on all AMPs.  This is one of the longest processing queries , Care should be taken to see that stats are collected on these columns
How to skip the header row in the fastload script

RECORD 2;   /* this skips first record in the source file */
DEFINE ...
How to find out list of indexes in Teradata?

IndexType
Description
P
Nonpartitioned Primary
Q
Partitioned Primary
S
Secondary
J
join index
N
hash index
K
primary key
U
unique constraint
V
value ordered secondary
H
hash ordered ALL covering secondary
O
valued ordered ALL covering secondary
I
ordering column of a composite secondary index
M
Multi column statistics
D
Derived column partition statistics
1
field1 column of a join or hash index
2
field2 column of a join or hash index

SELECT
databasename, tablename, columnname, indextype, indexnumber, indexname
FROM  
dbc.indices
ORDER BY
databasename,  tablename, indexnumber;


How to identify PPI columns?
Answer:

Select databasename , tablename , columnposition ,columnname from dbc.indices
where indextype ='Q'
order by 1 ,2,3 ;


What are the scenarios in which Full Table Scans occurs?

1. The where clause in SELECT statement does not use either primary index or secondary index
2. SQL Statement which uses a partial value (like or not like), in the WHERE statement.
3. SQL Statement which does not contain where clause.
4. SQL statement using range in where clause. Ex. (col1 > 40 or col1 < =10000)

 
What are Differences between Teradata and ANSI Session modes in Teradata?
Answer:

TERADATA MODE
-------------
1. Comparison is not Case sensitive
2. Create table are default to SET tables
3. Each transaction is committed implicitly
4. Supports all Teradata commands 
5. It follows BTET (Begin and End Transaction) Mode


ANSI MODE
-------------
1. Comparison is CASE sensitive
2. Create table are default to MULTISET tables
3. Each transaction has to be committed explicitly
4. Does not support all Teradata commands 
5. It does not follow BTET Mode


How do you create materialized view in Teradata?
Answer:

There is no such thing as a "materialized view" in Teradata. The equivalent in Teradata would be a join index (or aggregate index) with a corresponding view put on top of it. The command to create one is "CREATE JOIN INDEX...(lots of options)".

Join indices are maintained when the data is maintained. They cannot be accessed directly, but are invoked when the optimizer determines it is beneficial. Even though you cannot invoke a join index directly, you create a view that looks like the join index and in that way, it would be similar to having direct access. However, this approach does not guarantee that the join index will be used.


What is difference B/w User and database in Teradata?
Answer:

- User is a database with password but database cannot have password
- Both can contain Tables , views and macros
- Both users and databases may or may not hold privileges
- Only users can login, establish a session with Teradata database and they can submit requests


Teradata Performance Tuning - Basic Tips

Performance tuning thumb rules.

Here are very basic steps which are used to PT any given query in given environment . As a pre-requiste , make sure 
- user has proper select rights and actual profile settings
- Enough space available to run and test the queries

1. Run explain plan (pressing F6 or “EXPLAIN sel * …”,)
Then see for potential information like
- No or low confidence
- Product joins conditions
- By way of an all row scan - FTS
- Translate

Also check for
- Distinct or group by keywords in SQL query
- In/ not in keywords and check for the list of values generated for the same

APPROACHES

A. In case of product join scenarios,check for
- Proper usage of alias
- joining on matching columns
- Usage of join keywords - like specifying type of joins (ex. inner or outer )
- use union in case of "OR” scenarios
- Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.

B. collects stats
- Run command "diagnostic help stats on for the session"
- Gather information on columns on which stats has to be collected
- Collect stats on suggestions columns
- Also check for stats missing on PI, SI or columns used in joins - "help stats <databasename>.<tablename>
- Make sure stats are re-collected when at-least 10% of data changes
- remove unwanted stats or stat which hardly improves performance of the queries
- Collect stats on columns instead of indexes since index dropped will drop stats as well!!
- collect stats on index having multiple columns, this might be helpful when these columns are used in join conditions
- Check if stats are re-created for tables whose structures have some changes 

c. Full table scan scenarios
- Try to avoid FTS scenarios as, it might take very long time to access all the data in every amp in the system
- Make sure SI is defined on the columns which are used as part of joins or Alternate access path.
- Collect stats on SI columns else there are chances where optimizer might go for FTS even when SI is defined on that particular column

2. If intermediate tables are used to store results, make sure that
- It has same PI of source and destination table

3. Tune to get the optimizer to join on the Primary Index of the largest table, when possible, to ensure that the large table is not redistributed on AMPS

4. For large list of values, avoid using IN /NOT IN in SQLs. Write large list values to a temporary table and use this table in the query

5. Make sure when to use exists/not exists condition since they ignore unknown comparisons (ex. - NULL value in the column results in unknown) . Hence this leads to inconsistent results

6. Inner Vs Outer Joins
Check which join works efficiently in given scenarios.Some examples are 
- Outer joins can be used in case of large table joining with small tables (like fact table joining with Dimension table based on reference column)
- Inner joins can be used when we get actual data and no extra data is loaded into spool for processing
Please note for outer join conditions:
1. Filter condition for inner table should be present in "ON" condition
2. Filter condition for outer table should be present in "WHERE" condition

DIAGNOSTIC HELPSTATS

One of my favorite commands and most useful among the lot , is Diagnostic help stats .
This command is very useful in helping user understand which all columns should have collect
stats be collected on, so optimizer can select the best plan.

  To Include the Stats collection recommendations in the explain plan.

DIAGNOSTIC HELPSTATS ON FOR SESSION;

At the end of the explain text is the recommended statistics for collection will be as follows

/*BEGIN RECOMMENDED STATS ->
 16) "COLLECT STATISTICS ADW.PRODUCT COLUMN P_SIZE".  (HighConf)
 17) "COLLECT STATISTICS ADW.PRODUCT COLUMN P_CODE".  (HighConf)
 18) "COLLECT STATISTICS ADW.PRODUCT COLUMN P_DESC".  (HighConf) */
If you want explain to stop showing recommendations for collection of stats, then use the following

DIAGNOSTIC HELPSTATS NOT ON FOR SESSION;

Diagnostic help stats has some drawbacks like

  • It does not give any sort of indication of stale stats
  • Stats should be chosen carefully as recommended by diagnostic help stats
  • Care should be taken to see that too many stats on a given table can impact batch running of scripts and increases the overload of stats maintenance.
  • If recommended stats don’t show any improvements in performance, DROP them!

Difference between count(*) and count(1) ?


Difference between count(*) and count(1) ?

Count is one of very important functions used in  any database . But what many dont know that  is the result count we get from COUNT function might be different  based on how it is used. Let us consider the example of count(*) and Count(1) and understand how there is some difference the usage.

Say for a given table with data
X       y
------------
21      31
Null    32
Null    33
41      34
Null    Null

And this query:
     select count(*), count(x), count(1) from table;

Results: 4, 2, 4

* Count(*) counts all occurrences including nulls. This is a row count for a given table.
* Count(x) counts all non null occurrences of values in column x. If x has null values, count(x) will be less than count(*).
* Count(1) is similar to  count(x) .It counts all null values in a "pseudo column" created by the constant. Since the constant 1 will never be null, this would,  be like count(*) i.e row count .

List of useful Data dictionary views

List of useful Data dictionary views which might come in handy in situations!

1. DBC.users 
This view gives current user information

2. dbc.sessioninfo
This view gives information about
- details of  users currently logged in

3.DBC.Databases
This view list all the databases present in the given teradata database system. ALso contains useful information like
-Creatorname
-OWnername
-PERMspace
-SPOOLspace
-TEMPspace

4.DBC.Indices
It gives information on the index created for given table

5.DBC.Tables
It gives information about all the Tables(T), views(V), macros(M), triggers(G), and stored procedures .

6.DBC.IndexConstraints
It Provides information about partitioned primary index constraints.
'Q' indicates a table with a PPI

7. DBC.DiskSpace
It provides information about disk space usage (including spool) for any database or account.
SELECT      DatabaseName
,CAST (SUM (MaxPerm) AS FORMAT 'zzz,zzz,zz9')
        ,CAST (SUM (CurrentPerm) AS FORMAT 'zzz,zzz,zz9')
        ,CAST (((SUM (CurrentPerm))/
                NULLIFZERO (SUM(MaxPerm)) * 100)
                AS FORMAT 'zz9.99%') AS "% Used"
FROM   DBC.DiskSpace
GROUP BY   1
ORDER BY   4 DESC ;

8. DBC.TableSize
It provides information about disk space usage (excluding spool) for any database, table or account
SELECT      Vproc
        ,CAST (TableName
                 AS FORMAT 'X(20)')
        ,CurrentPerm
        ,PeakPerm
FROM   DBC.TableSize
WHERE DatabaseName = USER
ORDER BY           TableName, Vproc ;

9. DBC.AllSpace
It provides information about disk space usage (including spool) for any database, table, or account.
SELECT      Vproc
        ,CAST (TableName AS
                FORMAT 'X(20)')
        ,MaxPerm
        ,CurrentPerm
FROM   DBC.AllSpace
WHERE DatabaseName = USER
ORDER BY   TableName, Vproc ;

10.  DBC.columnstats , DBC.indexstats  and DBC.Multicolumnstats
 These are used to find stats info on given tables

Teradata interview questions PART 4
Teradata interview questions PART 4 consolidated over period of time from various sources.

Explain Different types of temporary tables in Teradata
Answer:

Derived temporary tables
  # Derived tables are local to an SQL query.
  # Not included in the DBC data dictionary database, the definition is kept in cache.
  # They are specified on a query level with an AS keyword in an sql statement

Volatile temporary tables
  # Local to a session ( deleted automatically when the session terminates)
  # Not included in the DBC data dictionary database and table definition is stored in cache. However, the volatile tables need to have unique names across the session.
  # Created by the CREATE VOLATILE TABLE sql statement

Global temporary tables
  # Global tables are local to a session and deleted automatically when the session terminates
  # A permanent table definition is stored in the DBC data dictionary database (DBC.Temptables)
  # defined with the CREATE GLOBAL TEMPORARY TABLE sql


How do you find out number of AMP's in the Given system
Answer:
 
Select HASHAMP () +1;


List types of HASH functions used in teradata?
Answer:

There are HASHROW, HASHBUCKET, HASHAMP and HASHBAKAMP.
The SQL hash functions are:
  • HASHROW (column(s))
  • HASHBUCKET (hashrow)
  • HASHAMP (hashbucket)
  • HASHBAKAMP (hashbucket)

Example:
SELECT
            HASHROW ('Teradata')   AS "Hash Value"
            , HASHBUCKET (HASHROW ('Teradata')) AS "Bucket Num"
            , HASHAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Num"
            , HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata')))  AS "AMP Fallback Num" ;


What are the basic criteria to select Primary Index column for a given table?
Answer:
A thumb rule of ADV demographics is followed.
Access Demographics
Identify index candidates that maximize one-AMP operations.
Columns most frequently used for access (Value and Join).

Distribution Demographics
Identify index candidates that optimize parallel processing.
Columns that provide good distribution.

Volatility Demographics
Identify index candidates with low maintenance I/O.

What are the basic rules that define how PI is defined in Teradata?
Answer:

The following rules govern how Primary Indexes in a Teradata Database must be defined as well as how they function:

One Primary Index per table.
A Primary Index value can be unique or non-unique.
The Primary Index value can be NULL.
The Primary Index value can be modified.
The Primary Index of a populated table cannot be modified.
A Primary Index has a limit of 64 columns.  


How to make sure Optimizer chooses NUSI over Full Table Scan (FTS)?
Answer:

A optimizer would prefer FTS over NUSI, when there are no Statistics defined on NUSI columns.
It is always suggested to collect statistics whenever NUSI columns are defined on the table.
Verify whether index is being used by checking in Explain plan.


What are the advantages and dis-advantages of secondary Indexes?
Answer:

Advantages:
1. A secondary index might be created and dropped dynamically
2.A table may have up to 32 secondary indexes.
3. Secondary index can be created on any column. .Either Unique or Non-Unique
4. It is used as alternate path or Least frequently used cases.  ex. defining SI on non indexed column can improve the performance, if it is used in  join or filter condition of a given query.
5. Collecting Statistics on SI columns make sure Optimizer 

Disadvantages
1. Since Sub tables are to be created, there is always an overhead for additional spaces.
2. They require additional I/Os to maintain their sub tables.
3. The Optimizer may, or may not, use a NUSI, depending on its selectivity.
4. If the base table is Fallback, the secondary index sub table is Fallback as well.
5. If statistics are not collected accordingly, then the optimizer would go for Full Table Scan.


When should the statistics be collected?
Answer:
Here are some excellent guidelines on when to collect statistics:
·       All Non-Unique indices
·       Non-index join columns
·       The Primary Index of small tables
·       Primary Index of a Join Index
·       Secondary Indices defined on any join index
·       Join index columns that frequently appear on any additional join index columns that frequently appear in WHERE search conditions
·       Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins.
Where does TD store transient journal?
Answer:

In perm space -> dbc.transientjournal
But that special table can grow over dbc's perm limit until the whole system runs out of perm space.
How can you find the Teradata Release and Version from Data Dictionary Table?
Answer:
SELECT * FROM DBC.DBCINFO;   
    

How can you determine I/O and CPU usage at a user level in Teradata?
Answer:
SELECT ACCOUNTNAME, USERNAME, SUM (CPUTIME) AS CPU, SUM (DISKIO) AS DISKIO
FROM DBC.AMPUSAGE
GROUP           BY 1,2
ORDER            BY 3 DESC;


How can you track Login Parameters of users in Teradata?
Answer:
SELECT LOGDATE, LOGTIME, USERNAME, EVENT FROM DBC.LOGONOFF;

Difference between Create table (copy) and Create table (select)


Difference between Create table (copy) and Create table (select)

When ever we need to create a copy of existing table we tend to use create table(copy ) from existing table or Create table ( select) from existing table.

Many may ignore the difference in running of create table in two different ways assuming the structure created to  be same.  But in actual case, it is  not so!!
let us try out two type of create table types using examples to understand the differences.

Create a table Check123 which include not null ,default ,UPI and USI definations in it

SHOW   TABLE check123;
/*
CREATE SET TABLE check123 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      int1 INTEGER DEFAULT 0 ,
     int12 INTEGER NOT NULL DEFAULT 0 ,
      int2 INTEGER NOT NULL,
      int3 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX prim1 ( int3 )
UNIQUE INDEX uniq1 ( int2 );
*/

Step1:   Create table Check_COPY  from Check123 using CREATE TABLE (COPY ) method

CREATE   TABLE check_COPY AS check123 WITH no data ;
Run show table command  to check for table structure
SHOW   TABLE check_COPY;
/*
CREATE SET TABLE  check_COPY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      int1 INTEGER DEFAULT 0 ,
      int12 INTEGER NOT NULL DEFAULT 0 ,
      int2 INTEGER NOT NULL,
      int3 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX prim1 ( int3 )
UNIQUE INDEX uniq1 ( int2 );
*/

From the following observation we can understand that the table created using COPY method will retain all datatypes and index definations like UPI and NUPI

Step2:   Create table Check_SELECT  from Check123 using CREATE TABLE (COPY ) method

CREATE      TABLE Check_SELECT AS
( sel * FROM   check123 ) WITH no data ;

Run show table command  to check for table structure
SHOW   TABLE Check_SELECT;
/*
CREATE SET TABLE Check_SELECT ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      int1 INTEGER,                          --DEFAULT MISSING
      int12 INTEGER,                -- DEFAULT and NOTNULL  MISSING
      int2 INTEGER,                          -- NOTNULL  MISSING
      int3 INTEGER)                          -- NOTNULL  MISSING
PRIMARY INDEX ( int1 );
 */

Hence  when table is created using  CREATE TABLE using SELECT from table method, the table created will not retain following from original table
·                     DEFAULT
·                     NOT NULL
·                     UNIQUE PRIMARY INDEX
·                     UNIQUE INDEX

distinct vs group by in Teradata .

There is always a debate going on when finding out unique values in a given table.  The problem comes into picture when we use Group by or distinct to find it.
Both return same number of rows , but with some execute  time difference between them. Execution time is always a very important factor considering performance as one of the major factors is teradata warehouse.

So which is more efficient ?? DISTINCT or GROUP BY???

Since DISTINCT redistributes the rows immediately, more data may move between the AMPs, where as  GROUP BY that only sends unique values between the AMPs.
So, we can say that  GROUP BY sounds more efficient. 
    But when you assume that data is nearly unique in a table, GROUP BY will  spend more time attempting to eliminate duplicates that do not exist at all.Therefore, it is wasting its  time to check for duplicates the first time. Then, it must redistribute the same amount of data .

Let us see how these steps are used in each case for elimination of Duplicates
(can be found out using explain plan)

DISTINCT
1. It reads each row on AMP
2. Hashes the column value identified in the distinct clause of select statement.
3. Then redistributes the rows according to row value into appropriate AMP
4. Once  redistribution is completed , it
    a. Sorts data to group duplicates on each AMP
    b. Will remove all the duplicates on each amp and sends the original/unique value

P.s: There are cases when "Error : 2646 No more Spool Space " . In such cases try using GROUP BY.

GROUP BY
1. It reads all the rows part of GROUP BY
2. It will remove all duplicates in each AMP for given set of values using "BUCKETS" concept
3. Hashes the unique values on each AMP
4. Then it will re-distribute them to particular /appropriate AMP's
5. Once  redistribution is completed , it
    a. Sorts data to group duplicates on each AMP
    b. Will remove all the duplicates on each amp and sends the original/unique value


Hence it is better to  go for
  • GROUP BY  -  when Many duplicates
  • DISTINCT        -  when few or no duplicates
  • GROUP BY -  SPOOL space is exceeded

Teradata HELP commands

Teradata provides one of the user friendly featured like HELP commands.

Let us see the various help commands and their uses.

HELP SESSION;   
This command is used to display the user name, account name, logon date and time, current database name, collation code set and character set being used and also , transaction semantics, time zone and character set data.

HELP DATABASE <database-name> ;   
This command is used to display the names of all the tables (T), views (V), macros (M), and triggers (G) stored in a database and table comments

HELP USER <user-name> ;   
This command is used to display the names of all the tables (T), views (V), macros (M), and triggers (G) stored in a user area and table comments

HELP TABLE <table-name> ;   
This command is used to display the column names, type identifier, and any user written comments on the columns within a table.

HELP VOLATILE TABLE ;   
This command is used to display the names of all Volatile temporary tables active for the current  user session.

HELP VIEW <view-name> ;   
This command is used to display the column names, type identifier, and comments on the columns within a view.

HELP MACRO <macro-name> ;   
This command is used to display the characteristics of parameters passed to it at execution time.

HELP TRIGGER <trigger-name> ;   
This command is used to display details created for a trigger, like action time and sequence.

HELP PROCEDURE <procedure-name> ;   
This command is used to display the characteristics of parameters passed to it at execution time.

HELP COLUMN <table-name>.*  OR  HELP COLUMN <view-name>.*  OR HELP COLUMN <table-name>.<column-name>, .…;   
This command is used to display detail data describing the column level characteristics.

HELP INDEX <table-name> ;   
This command is used to display the indexes and their characteristics like unique or non-unique and the column or columns involved in the index. This information  is used by the Optimizer to create a plan for SQL.

HELP STATISTICS <table-name> ;   
This command is used to display values associated with the data demographics collected on the table. This information is used by the Optimizer to create a plan for SQL.

HELP CONSTRAINT <table-name>.<constraint-name> ;   
This command is used to display the checks to be made on the data when it is inserted or updated and the columns are involved.

HELP 'SQL';   
This command is used to display a list of all  available SQL commands and functions.

HELP 'SQL  <command>';   
This command is used to display the basic syntax and options for the SQL command used in place of the <command>.

HELP 'SPL';   
This command is used to display a list of available SPL commands.

HELP 'SPL <command>';   
This command is used to display the basic syntax and options for the SPL command used in place of the <command>.

Working around with Transposition of Table data

I was working around with some transposition  and want to share one of the samples .

Consider the customer table ,having customer and month details
customer...... month
Ron................ 1
Kev................. 2
joh................. 1
Nel................. 2
Ave................. 11
Cin................. 10
tra................. 3

Case statement play very important role in transposition of rows to columns and viceversa. In the following scenarios , we can find the extensive usage of case statement 

Scenario 1:
Display total number of customers for each month

jan....feb....mar....apr....may....jun....jul....aug....sep....oct....nov....dec
2......2......1......0......0......0......0......0......0......1......1......0....

The sql query is as follows:

sel
count(case when month = '1' then customer else null end) "jan",
count(case when month = '2' then customer else null end) "feb",
count(case when month = '3' then customer else null end) "mar",
count(case when month = '4' then customer else null end) "apr",
count(case when month = '5' then customer else null end) "may",
count(case when month = '6' then customer else null end) "jun",
count(case when month = '7' then customer else null end) "jul",
count(case when month = '8' then customer else null end) "aug",
count(case when month = '9' then customer else null end) "sep",
count(case when month = '10' then customer else null end) "oct",
count(case when month = '11' then customer else null end) "nov",
count(case when month = '12' then customer else null end) "dec"
from CUST_TABLE ;


Scenario 2:
Display customer and month details with every customer mapped to corresponding month

customer....jan....feb....mar....apr....may....jun....jul....aug....sep....oct....nov....dec
Ron...........1......0......0......0......0......0......0......0......0......0......0......0....
Kev...........0......1......0......0......0......0......0......0......0......0......0......0....
joh...........1......0......0......0......0......0......0......0......0......0......0......0....
Nel...........0......1......0......0......0......0......0......0......0......0......0......0....
Ave...........0......0......0......0......0......0......0......0......0......0......1......0....
Cin...........0......0......0......0......0......0......0......0......0......1......0......0....
Tra...........0......0......1......0......0......0......0......0......0......0......0......0....

The sql query is as follows:

sel
customer,
count(case when month = '1' then customer else null end) "jan",
count(case when month = '2' then customer else null end) "feb",
count(case when month = '3' then customer else null end) "mar",
count(case when month = '4' then customer else null end) "apr",
count(case when month = '5' then customer else null end) "may",
count(case when month = '6' then customer else null end) "jun",
count(case when month = '7' then customer else null end) "jul",
count(case when month = '8' then customer else null end) "aug",
count(case when month = '9' then customer else null end) "sep",
count(case when month = '10' then customer else null end) "oct",
count(case when month = '11' then customer else null end) "nov",
count(case when month = '12' then customer else null end) "dec"
from CUST_TABLE;

Find column type using built in function!!!

Did you know that datatype of column could be found  using a built in function called "TYPE" .

Till I came across this function, I used to follow the ancient method of digging into Data  Dictionaries(dbc.columns) to find the type of given column.

There are some cases in which people use CASE function to find the match for type of the column and based on which they do  execute steps.
E.g:  Check for column as timestamp , if yes extract date or so...

To avoid usage of dbc.columns table , we can use Type(Column Name)  function

SELECT TYPE(TAB1.COL1);

Type(COL1)
-----------
INTEGER

Calculate average on Time field

Strange, but yes we can calculate average on time fields. Consider a scenario in which a table has following structure

City            Mode        Start-time             end-time
char(10)    char(10)     char(5)(HH:MI)    char(5)(HH:MI)
------------------------------------------------------------------
Paris          Taxi                04:55                09:33
Rome        Taxi                02:14                 08:44
Madrid      Shuttle            01:21                 03:19

The requirement is to calculate average end-time from the table.

Since end-time is char (5), this has to be converted into Time format HH:MI:SS , before calculating the average

When A cast is applied as follows
cast(end-time||':00' as time) and then try to get an average , We will get following error

Error - "invalid operation on an ANSI date-time or interval"

To fix, this issue the concept of interval is applied.

Select CITY, avg (cast(trim(end-time)||':00' as interval hour(4) to second (0) ))
from TRAVEL group by CITY ;

Hence, using interval, we can override the ANSI time error and calculate average on time field

Teradata Timestamp tips & tricks - 3


Tip1: Extracting month from date as 'MMM'

select current_date (format 'MMM');

Date
----
Oct

 Please use " sel cast ((cast(current_date as format 'mmm')) as char(3));   "  while querying on Sql Assistant(queryman)


Tip2: Converting number of seconds to Minutes and Seconds

If i want to convert the input of seconds into minutes+seconds,    following query is used
ex: 300 seconds --> 0005:00  or 5:00 mins

create table table1
( second_s integer );
insert into table1 (300);

Select second_s * INTERVAL '0000:01' MINUTE TO SECOND from Table1;

Please note that ,if seconds interval is more then 9999(approx 7 days) , it will probably result in a "7453: Interval field overflow".


Tip3: Converting number of seconds to Minutes and Seconds
Similarly , We can convert seconds to Day to second format as well!!

Select second_s* INTERVAL '0000 00:00:01' DAY TO SECOND from table1;

This will work for up to 27 years  after which it results in "7453: Interval field overflow" error.


Tip4: How to add/subtract minutes to timestamp!!

Problem: To the given timestamp value, how can we add or subtract given minutes.
ex:
 Timestamp1                          Minutes1         Result 
2010-01-04 02:10:08   (+)    330               = 2010-01-04 07:40:08
2010-01-02 18:04:32   (-)     245               = 2010-01-02 13:59:32

create table Table_1
(Timestamp1 timestamp(0) ,Minutes1 integer );
insert into table_1 values('2010-01-04 02:10:08',330);
insert into table_1 values('2010-01-02 18:04:32',245);

select Timestamp1 + (Minutes1 * interval '1' minute) from table_1;
select Timestamp1 - (Minutes1 * interval '1' minute) from table_1;


How to insert a string having quotes into a column?

Problem:
Say there are some values coming for given column say comments  like
'It's a wonderful day'
'I don't know '

As we know that teradata  considers anything between two quotes to be 1 statement . But what if we need to insert the statement having the quotes as apostrophe in it?(above examples)

CREATE SET TABLE SAMPLE1
(
col1 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( col1 );

To insert these values , I need to use insert statements as below.
insert into SAMPLE1 values('It"s a wonderful day');
insert into SAMPLE1 values('I don"t know ');

In case of teradata, two consequent apostrophe    are used instead of one’

Can i Generate random numbers in Teradata?

There is a scenario in which I need to generate RANDOM numbers for a given column (integer ). In teradata we can use a function called as RANDOM(min_value, max_value) .

This is implemented in sql as follows.

Select RANDOM(1,500) from table;

Please note:    There might be cases where Random function can produce duplicate values. SO while designing tables , make sure that the column making use of RANDOM is not part of UNIQUE PRIMARY INDEX


How to find tables that have a PPI

In teradata we use lot of data dictionary tables (dbc)  to find definition/details about anything related to data.

How to find the tables which have PPI(partitioned primary index) ?
We make use of DBC.indices view to find any details regrading index.

Query :

SEL * FROM dbc.indices
WHERE indextype = 'Q'          ---- 'Q' means Partitioned Primary Index
AND databasename = 'DATABASENAME' ;

Infact , DBC.Indices can be used to find out various types of index associated with tables in a given database.

which is faster ? select * from table or select 'all Columns' from table ??

Many of us would have come across a scenario where  listing column names in select SQL  was found to be faster then using select * from table  command . This indeed is interesting .


The reason being ,

In case of using "select * from table" , A extra stage is added  where * is replaced by column names by teradata  and then it would fetch the data .

But using "select <all Columns > from table "  eliminates this extra stage of verifying and fetching on columns from the table.

Hence it is always recommended to use "select <all Columns > from table "

How to Convert column data to rows in teradata

Problem :
There is a requirement to convert column data into rows.
Consider the country table with data as follows

Table has 4 columns (country , value1,value2,value3)
NZ  50  60  70
AUS 110 120 130

The output should be displayed in following manner
country    value
NZ  50
NZ  60
NZ  70
AUS 110
AUS 120
AUS 130

Resulting query:
select country, value1 as value from T_country
union all
select country, value2 as v from T_country
union all
select country, value3 as v from T_country;

There might be various  approaches to this problem.Union was found out to be efficient and served the purpose.

Materialised view in teradata? Is it possible?

Most of the databases uses concept of materialised views  (which stores data ). Sadly, in case of teradata  there is no such concept of materialised views.

The closest option of having materialsed view in case of teradata is by using JOIN index (or aggregate index) .  A view can be created on join INDEX  . The command used to create index is as follows

 "CREATE JOIN INDEX...(Select table options)".

When Join Index is executed, it stores data as a table in spool space, hence making the join much faster

 Please Note:
There are drawbacks on  Join Indices.
1. Optimizer would determine whether The index is beneficial. If yes then it invokes it!
2.As the join index cannot be accessed directly and a view  is created that looks like the join index . However, this approach does not guarantee that the join index will be used when view is called.

How to pass parameter to BTEQ script (Windows) ?

How to pass parameter to BTEQ script (Windows) ?

I was wondering whether we could pass parameters to Bteq script. After doing some research , found out a closest way to passing parameter to Bteq script.  Actually this involves creating of BATCH file".bat"  and passing parameter to it.


Let us try this for following select query
select * from user.tab_test1  where test1='2010';

STEP1:
The batch file is created as follows (script2.bat)
del script1.txt
echo .logon localhost/test,test >>script1.txt
echo select * from user.tab_%1%  where %1% ='2010'; >>script1.txt
echo .logoff >> script1.txt
echo .quit >> script1.txt
bteq < script1.txt


STEP2:
Run the batch files as follows
C:\>script2.bat test1

STEP3:
we can see following execution steps

C:\>del script.txt
C:\>echo .logon localhost/test,test 1>>script1.txt
C:\>echo select * from user.tab_test1 where test1='2010' ; 1>>script1.txt
C:\>echo .logoff 1>>script1.txt
C:\>echo .quit 1>>script1.txt
C:\>bteq 0

BTEQ 08.02.00.00 Thu Jul 18 09:33:12 2010

+---------+---------+---------+---------+---------+---------+---------+----
.logon localhost/test,

*** Logon successfully completed.
*** Transaction Semantics are BTET.
*** Character Set Name is 'ASCII'.

*** Total elapsed time was 5 seconds.

+---------+---------+---------+---------+---------+---------+---------+----
select * from user.tab_test1  where test1='2010';

*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.

test1
-----------
2010

+---------+---------+---------+---------+---------+---------+---------+----
.logoff
*** You are now logged off from the DBC.
+---------+---------+---------+---------+---------+---------+---------+----
.quit
*** Exiting BTEQ...
*** RC (return code) = 0

Purpose of CASESPECIFIC in teradata

Casespecific function in teradata is used to get excat value  say  for  example .product code 'Rac123Sa'  .

SELECT  product_code
FROM product_tab
WHERE product_code (CASESPECIFIC) LIKE '%Ra%';

Note: Usually the comparison  in teradata is not case-specific. To enforce CASESPECIFIC , we use CASESPECIFIC function

How to split source column into multiple target columns ( full name to first and Last)


Problem: To split fullname into firstname and lastname to be inserted into Target table.


Approach:

CREATE SET TABLE test
fullname varchar(30)
);


INSERT INTO test12 ('nitin raj');
INSERT INTO test12 ('nitin agarwal');
INSERT INTO test12 ('abhishek gupta');


sel * FROM    test;
fullname
nitin agarwal
nitin raj
abhishek gupta


Use index to find the position of space "SPACE" in full name and then use the position to get
--> firstname  =fullname from 1st till (SPACE-1)
-->lastname = fullname from (SPACE+1)

SELECT      INDEX(fullname ,' ') AS "a", SUBSTR(fullname,1, a-1 ) ,

SUBSTR(fullname,a+1 )  FROM        test;

a
Substr(fullname,1,(a-1))
Substr(fullname,a)
6
nitin
agarwal
6
nitin
raj
9
abhishek
gupta


Difference between read lock and access lock?

For access lock to be placed we have to specify it explicitly as LOCKING FOR ACCESS.
Access lock is one wherein the table is not locked means you can do insert/update/delete on the table while access lock will access the table ,in this the dirty reads may happen , which  means you will not get the latest changes whatever happened on table to be reflected in your answer set.

 READ lock will lock the table wherein you can not do insert/update and structural changes in table.
It is placed by simple SELECT statement for by explicitly specifying as LOCKING FOR READ.

Syntax:
read lock :
Locking table for Access;

access lock :
Locking table for Read Access

The main difference between read lock and access lock is data Integrity.On placing a read lock the user expects data integrity, while as for access lock the user cannot expect data integrity.

Consider following scenarios
1. User A places READ lock and User B WRITE places lock
User B will have to wait for User A to complete its read in order to start insert/updates/deletes on the data.

2. User A places ACCESS lock and User B WRITE places lock.
User A & B access the data simultaneously, hence User A cannot expect to get consistent results.

How to create and use Macro ?

A macro is a Teradata extension to ANSI SQL that contains pre written SQL  statements. Macros are used to run a repeatable set of tasks.The details of macro can be found in data dictionary (DD)  . Macros are database objects and thus they belong to a specified user or database. A macro can be executed by Queryman. ,  BTEQ, by another macro.


How to create a Macro
Create a macro to generate a DOB list for department 321:

CREATE MACRO DOB_Details AS
(SELECT  first_name ,last_name  ,DOB
FROM TERADATA.employees
WHERE dept_numbr =321
ORDER BY DOB asc;);

EXECUTE a Macro
To execute a macro, call it along with  the exec command.
EXEC DOB_Details;

last_name    first_name    DOB
Ram            Kumar         75/02/22
Laxman       Sinha           79/04/06

DROP a Macro
To drop a macro, use following command .
DROP MACRO DOB_Details;

REPLACE a Macro
If we need to modify an existing macro , instead of  dropping and re-creating it
We can use replace macro command as follows

REPLACE MACRO DOB_Details AS      
(SELECT first_name,last_name ,DOB
FROM TERADATA.employees
WHERE dept_numbr = 321
ORDER BY DOB, first_name;);

Parameterized Macros

Parametrized macros allow usage of variables .  we can pass values to these variables. Advantage of using parametrized macros is  , Values can be passed to these variables at run-time.

Example
CREATE MACRO dept_list (dept INTEGER) AS
(
SELECT last_name
FROM TERADATA.employees
WHERE dept_numbr = :dept; );

To Execute the macro
EXEC dept_list (321);

Macros may have more than one parameter. Each name and its associated type are separated by a comma from the next name and its associated type. The order is important. The first value in the EXEC of the macro will be associated with the first value in the parameter list. The second value in the EXEC is associated with the second value in the parameter list, and so on.

Example
CREATE MACRO emp_verify (dept_numbr INTEGER ,salary DEC(18,0))
AS (
SELECT emp_numbr
from TERADATA.employees
WHERE dept_numbr = :dept
AND salary< :sal;) ;

To Execute this  macro
EXEC emp_check (301, 50000);

Key points to note about Macros:
  • Macros are a Teradata extension to SQL.
  • Macros can only be executed with the EXEC privilege.
  • Macros can provide column level security.
NOTE:  A user must have  EXEC privileges to execute the macros.  It doesn't matter if
he has privileges for the underlying tables or views that the macro uses.

How can you view Data Distribution in Teradata ?



Teradata uses HASH values to store data in AMPs. To view data distribution we use  Hash Functions.
Hash functions are usually used over primary index columns to find data distribution . We can identify skewness  by using this concept .
Following query can be used to find hash values of PI columns

SELECT HASHAMP(HASHBUCKET(HASHROW(<PRIMARY INDEX>))) AS
"AMP#",COUNT(*)
FROM <TABLENAME>
GROUP BY 1
ORDER BY 2 DESC;

By looking at result ,  you  query you can easily find out the Data Distribution across all AMPs in your system and further you can easily identify un-even data distribution.


definitions :
HASHROW - returns the row hash value for a given value
HASHBUCKET - the grouping of a specific hash value
HASHAMP - the AMP that is associated with the hash bucket

How to check for table size in Teradata



The folllowing queries are used to check for tablesize in Teradata 

How can you find the Table Space Size of your table across all AMPs ?
 
SELECT DATABASENAME, TABLENAME, SUM(CURRENTPERM)
FROM DBC.TABLESIZE
WHERE DATABASENAME = '<DATABASE_NAME>'
AND TABLENAME = '<TABLE_NAME>'
GROUP BY DATABASENAME , TABLENAME;


Following query will give the space consumed on each AMP by the Table

SELECT DATABASENAME, TABLENAME, CURRENTPERM
FROM DBC.TABLESIZE
WHERE DATABASENAME = ‘<DATABASE_NAME>'
AND TABLENAME = '<TABLE_NAME> ';

Error code :2616 : Numeric overflow occured during computation

Below is the scenario when this error occurs

2616 : Numeric overflow occured during computation
sel count(*) from DPRODUIT;
Numeric overflow occured during computation


Reason:  result value is more then integer range.   The output was 2,333,452,124  --> which is more than integer limit. Hence it resulted in numeric overflow.

How to fix the issue?
Solution:
Using cast to convert result from integer to decimal will work  .
COunt(*)  always provides result in integer format. Hence using  cast we can get values larger than integer limit of * characters

sel  cast (count(*) as decimal(18,0) ) from DPRODUIT;
result : 2,333,452,124

Teradata TIMESTAMP tips & tricks - Part 2

This is continuation of following topic http://www.teradatahelp.com/2010/08/teradata-timestamp-tips.html.

Some more tips which might be useful while working on date and timestamp. (Queries run and verified on V2R6)


Tip 1: Adding and Subtracting Dates in Teradata

select my_date + interval '10' day from my_table

And equally to subtract values from a date:
select my_date - interval '10' day from my_table


Tip 2: Calculating the last day in a given month

select add_months(current_date,1) - extract(day from add_months(current_date,1)) ;


Tip 3:  Usage of SYS_CALENDAR.CALENDAR

Teradata provides a handy calendar table for date lookup called  sys_calendar.calendar.
Using this table you can calculate useful dates such as weekending dates, quarter ending dates, etc.

example1 : Find the previous Friday from today:
 selectfrom   sys_calendar.calendar where  day_of_week=6 and calendar_date between date -6 and date;

example2 :Find the first Monday one week after the last Friday of previous month:
select a.calendar_date from sys_calendar.calendar a,
(
sel max(calendar_date) + 10 as calendar_date
from sys_calendar.calendar 
where extract(year from add_months( date, -1)) = year_of_calendar
and
extract(month from add_months(date,-1)) = month_of_year and day_of_week = 6
) b
where
a.calendar_date = b.calendar_date;


Tip 4: Adding or Subtracting Months
  
select add_month(date_col,1) from table/* adding of month */
select add_month(date_col,-10) from table; /* subtraction of month */

How to select Nth row from a table ?

how to select a particular row from the table?
Ans:

By using  ROW_NUMBER() Function , we can mark all the selected rows with numbers .
Then use QUALIFY clause to get excat row number.


select
columnA,columnB
from
source_table
qualify row_number() over (order by columnA ) = Nth record ;
Here 'N' being particular row number.

P.S:  The same query can be used to select Top N records;
select
columnA,columnB
from
source_table
qualify row_number() over (order by columnA ) <= N;

How to generate row numbers in teradata?

We have come across rownum function in oracle . This function gives number for every row/record in Oracle. In teradata there is no direct approach like in oracle. There are various approaches suggested around.

Approach 1:
Here Sum function is used  over rows preceding in the SOURCE_TABLE

select
sum(1) over( rows unbounded preceding ),
columnA,
columnB
from
SOURCE_TABLE;

Approach 2:
Here ROW_NUMBER function is used to generate row_number on columnA
select
ROW_NUMBER() over( ORDER BY columnA ),
columnA,
columnB
from
SOURCE_TABLE;


If you have  to use the  row number concept  in target table  as well, then  following  approach using "identity column" (from V2R6 onwards )  be used :

CREATE MULTISET TABLE TARGET_TABLE
  (
   columnA INTEGER GENERATED BY DEFAULT AS IDENTITY
       (START WITH 1
        INCREMENT BY 20) ,
   columnB VARCHAR(20) NOT NULL
  )
UNIQUE PRIMARY INDEX pidx (ColA);

P.S:  Identity columns does differ from sequence concept in oracle. The numbers assigned in these columns are not guaranteed to be sequenctial. The Identity column in Teradata is used to guaranteed row-uniqueness.

This  works without use of Identity approach.
create TABLE TARGET_TABLE as
(
   select
   ROW_NUMBER() over( ORDER BY columnA ) NUMBER_SEQ ,
   columnA,
   columnB,
   columnC
   from a join b on a.id=b.id
) with data ;

FAQ's on Collect statistics

Collect stats just derives the data demographics of the table. These demographics are useful for optimizer to decide the execution of given query which in turn improves performance.

It collects the information like:
  • total row counts of the table,
  • how many distinct values are there in the column,
  • how many rows per value, is the column indexed,
  • if so unique or non unique etc.


What if collect stats is not done on the table?
Ans:
Teradata uses a cost based optimizer and cost estimates are done based on statistics.
So if you donot have statistics collected then optimizer will use a Dynamic AMP Sampling method to get the stats. If your table is big and data was unevenly distributed then dynamic sampling may not get right information and your performance will suffer.


How can i know the tables for which the collect stats has been done?
Ans:
Run the Help Stats command on that table.
e.g HELP STATISTICS TABLE_NAME ;
this will give you Date and time when stats were last collected. You will also see stats for the columns ( for which stats were defined) for the table


Whenever collect stats is done on the particular  table(say on index/column) where can I find information regarding these entries ?
Ans:
Collected statistics are stored in DBC.TVFields or DBC.Indexes tables. However, these two tables cannot be queried .


When to collect stats on  tables which have stats ?

Ans:
1.    Typical guideline is  roughly 10% of the data has changed. (By measuring delta in perm space since last collected.) 
2.    Recollect based on stats that have aged 60-90 days. (say last time stats collected was 2 months ago) .

Please note :
  Collect stats could be pretty resource consuming for large tables. So it is always advisable to schedule the job at off peak period .

Permanent Journal

I did some homework on the topic and would like to share the same . comments and suggestions are welcome :)


Permanent Journal

The purpose of a permanent journal is to maintain a sequential history of all changes made to the rows of one or more tables.
Permanent journals help protect user data when users commit, uncommit or abort transactions.
A permanent journal can capture a snapshot of rows before a change, after a change, or both.
 Permanent journaling is usually used to  protect data.
likein case of  the automatic journal, the contents of a permanent journal remain until you drop them.

When you create a new journal table, you can use several options to control the type of information to be captured.


 We create Permanent Journal  when the User or Database is created.
Consider following example for creation of database
CREATE DATABASE testdat
FROM xxspace
AS
PERM = 20000000
SPOOL = 2000000
NO FALLBACK
ACCOUNT = '$xxxxx'
NO BEFORE JOURNAL
AFTER JOURNAL
DEFAULT JOURNAL TABLE = testdat.journal;


Here Admin has opted   for only AFTER JOURNAL and he has name the journal table as "testdat.journal".
When user creates a table in the database "testdat" , by default AFTER JOURNAL is available for him to protect his data when the hardware failure occurs.

He can opt for NO AFTER JOURNAL by overriding the default. Follwoing is the example.

Scenario1 : Here  by default the table has AFTER JOURNAL option.

"CREATE TABLE testdat.table
( field1 INTEGER,
field2 INTEGER)
PRIMARY INDEX field1;

Scenario2: in this case, user has specifically stated he wanted no AFTER JOURNAL for his data. This is how user can override the defult.

CREATE TABLE testdat.table2
FALLBACK,
NO AFTER JOURNAL
( field1 INTEGER,
field2 INTEGER)
PRIMARY INDEX field1;

In this case whenever the user inserts/updates and the transaction is committed , then the affected rows will be taken backup in the journal table "testdat.journal".


Please note :
You must allocate sufficient permanent space to a database or user that will contain permanent journals. If a database or user that contains a permanent journal runs out of space, all table updates that write to that journal abort.

How to change data-type of primary index column for a given Table

Situation: There is a requirement to change the datatype of primary index column .

Solution:
  1. You could run a Show table command to get the exact DDL, then change datatype ( say for example char(8)) to datatype varchar(10) .
  2. Run the ddl script to create the table.
  3. Then you can run an insert select command to insert data into the new table.
  4. Since the PI of both the tables are same, the operation would be pretty fast.
  5. Then DROP the original table and rename the new one to the old one.

Inserting data into an empty table is very quick because there is not reference to transient journal.

please note , if you use
  • CREATE new_table AS existing_table" preserves all column attributes and indexes (just Triggers and Foreign Keys are removed)
  • But using CREATE new_table AS (SELECT * FROM existing_table) will remove NOT NULL and TITLE properties.





Introduction to teradata utilities : BTEQ

BTEQ is a Teradata native query tool for DBA and programmers. BTEQ (Basic
TEradata Query) is a command-driven utility used to 1) access and manipulate
data, and 2) format reports for both print and screen output.




DEFINITION

BTEQ, short for Basic TEradata Query,
is a general-purpose command-driven utility used to access and manipulate data
on the Teradata Database, and format reports for both print and screen output. [1]



OVERVIEW

As part of the Teradata Tools and Utilities (TTU), BTEQ is a
Teradata native query tool for DBA and programmers — a real Teradata workhorse,
just like SQLPlus for the Oracle Database. It enables users on a workstation to
easily access one or more Teradata Database systems for ad hoc queries, report
generation, data movement (suitable for small volumes) and database
administration.

All database requests in BTEQ are expressed in Teradata
Structured Query Language (Teradata SQL). You can use Teradata SQL statements
in BTEQ to:

    * Define
      data — create and modify data structures;
    * Select
      data — query a database;
    * Manipulate
      data — insert, delete, and update data;
    * Control
      data — define databases and users, establish access rights, and secure
      data;
    * Create
      Teradata SQL macros — store and execute sequences of Teradata SQL
      statements as a single operation.


BTEQ supports Teradata-specific SQL functions for doing
complex analytical querying and data mining, such as:

    * RANK -
      (Rankings);
    * QUANTILE
      - (Quantiles);
    * CSUM -
      (Cumulation);
    * MAVG -
      (Moving Averages);
    * MSUM -
      (Moving Sums);
    * MDIFF
      - (Moving Differences);
    * MLINREG
      - (Moving Linear Regression);
    * ROLLUP
      - (One Dimension of Group);
    * CUBE -
      (All Dimensions of Group);
    * GROUPING
      SETS - (Restrict Group);
    * GROUPING
      - (Distinguish NULL rows).


Noticeably, BTEQ supports the conditional logic (i.e.,
"IF..THEN..."). It is useful for batch mode export / import
processing.



OPERATING FEATURES

This section is based on Teradata documentation for the
current release.[1]



BTEQ Sessions

In a BTEQ session, you can access a Teradata Database easily
and do the following:

    * enter
      Teradata SQL statements to view, add, modify, and delete data;
    * enter
      BTEQ commands;
    * enter
      operating system commands;
    * create
      and use Teradata stored procedures.






Operating Modes

BTEQ operates in two modes: interactive mode and batch mode.
In interactive mode, you start a BTEQ session by entering BTEQ at
the system prompt on your terminal or workstation, and submit commands to the
database as needed. In batch mode, you prepare BTEQ scripts or macros, and then
submit them to BTEQ from a scheduler for processing. A BTEQ script is a set of
SQL statements and BTEQ commands saved in a file with the extension
".bteq"; however, it does not matter what file extension is used. The
BTEQ script can be run using the following command (in UNIX or Windows):

bteq < infle > outfile

Here infile is the BTEQ script, and outfile
is the output or log file.



BTEQ COMMANDS

This section is based on Teradata documentation[1],
and for the detailed usage, please refer to Reference 1.



BTEQ Command Summary

BTEQ commands can be categorized into four functional
groups, as described below:

    * Session
      control — Session control commands begin and end BTEQ sessions, and control
      session characteristics;
    * File
      control — specify input and output formats and identify information
      sources and destinations;
    * Sequence
      control — control the sequence in which other BTEQ commands and Teradata
      SQL statements will be executed within scripts and macros;
    * Format
      control — control the format of screen and printer output.



1. Commands for Session Control


COMMAND NAME

FUNCTION

ABORT

abort any active requests and transactions without exiting
BTEQ.

COMPILE

create or replace a Teradata stored procedure.

DECIMALDIGITS

override the precision specified by a CLI System Parameter
Block (SPB) max_decimal_returned entry,

or if that entry does not exist,
to indicate what the precision should be for decimal
values associated with subsequently issued SQL requests for non-fieldmode
responses.

DEFAULTS

Resets BTEQ command options to the values that were set
when BTEQ was first invoked.

EXIT

end the current sessions and exit BTEQ.

HALT EXECUTION

abort any active requests and transactions and exit BTEQ;
also called "HX".

LOGOFF

end the current sessions without exiting BTEQ.

LOGON

start a BTEQ session.

LOGONPROMPT

bypass the warnings related to conventional LOGON command
use.

QUIT

end the current sessions and exit BTEQ.

SESSION CHARSET

specify the name of a character set for the current
session.

SESSION RESPBUFLEN

override the buffer length specified in resp_buf_len.

SESSION SQLFLAG

specify the disposition of warnings issued in response to
violations of ANSI-compliant syntax.

SESSION TRANSACTION

specify whether transaction boundaries are determined by
Teradata SQL semantics or ANSI semantics.

SESSION TWORESPBUFS

specify whether CLI double-buffering is used.

SESSIONS

specify the number of sessions to use with the next LOGON
command.

SHOW CONTROLS

display the current configuration of the BTEQ control
command options.

SHOW VERSIONS

display the BTEQ version number, module revision numbers,
and linking date.

TDP

specify the Teradata server for subsequent logons during
the current session.


2. Commands for File Control


COMMAND NAME

FUNCTION

=

Repeats the previous Teradata SQL request a specified
number of times.

AUTOKEYRETRIEVE

enables users to specify whether the values of any fields
associated with Identity Data are returned in response to a SQL Insert
operation.

CMS

executes a VM CMS command from within the BTEQ
environment.

ERROROUT

Routes the standard error stream and the standard output
stream to two files or devices for channel-attached systems, or to one file
or device for network-attached client systems.

EXPORT

Specifies the name and format of an export file that BTEQ
will use to store database information returned by a subsequent SQL SELECT
statement.

EXPORTEJECT

Enables suppression of the additional Page Advance ASA
Carriage Control Character at the top of the EXPORT file in REPORT mode for
MVS/VM BTEQ.

HALT EXECUTION

aborts any active requests and transactions and exit BTEQ;
also called "HX".

FORMAT

Enables all of the page-oriented formatting commands, or
disables them and centers the response from SQL SELECT statements, using the
value of the WIDTH command option to determine the space available.

IMPORT

Opens a channel- or network-attached system file, of the
specified format, to provide data for USING modifiers of subsequent SQL
statements.

INDICDATA

Specifies the mode of information returned from the
Teradata Database in response to SQL SELECT statements.

INDICDATA and / or LARGEDATAMODE

specify the response mode, either Field mode, Indicator
mode, Record mode, or Multipart Indicator Mode, for data selected from the
Teradata Database.

LARGEDATAMODE

Enables use of Teradata Database’s Multipart Indicator
response mode for inline mode retrieval of Large Object (LOB) data. BTEQ
limits the record size for exported files to approximately 64K (65473 for
workstation builds and 64260 for mainframe builds).
If more than 64K is required, SET LARGEDATAMODE allows
hex-dump style output (similar to RecordMode directed to standard output).

OS

executes an MS-DOS, PC-DOS, or UNIX command from within
the BTEQ environment.

QUIET

Limits BTEQ output to errors and request processing
statistics. BTEQ displays the results in a format that is suitable for
performance testing.

RECORDMODE

Returns data from SQL SELECT statements in client-oriented
data representations rather than character format.

REPEAT

submits the next request a specified number of times.

RUN

executes Teradata SQL requests and BTEQ commands from a
specified run file.

TSO

executes an MVS TSO command from within the BTEQ environment.


3. Commands for Sequence Control

Use the following commands to control the sequence in which
BTEQ executes commands:

  • ABORT
  • ERRORLEVEL
  • EXIT
  • GOTO
  • HANG
  • IF...
    THEN...
  • LABEL
  • MAXERROR
  • QUIT
  • REMARK
  • REPEAT
  • =

For the commands not listed below, refer to the tables
above.


COMMAND NAME

FUNCTION

ERRORLEVEL

Assigns severity levels to errors.

GOTO

Skips over all intervening BTEQ commands and SQL
statements until a specified label is encountered, then resumes processing in
sequence.

HANG

Pauses BTEQ processing for a specified period of time.

IF... THEN...

Tests the validity of the condition stated in the IF
clause.

LABEL

Identifies the point at which BTEQ resumes processing, as
specified in a previous GOTO command.

MAXERROR

Designates a maximum error severity level beyond which
BTEQ terminates job processing.


4. Format Control Commands

Use the following BTEQ commands to specify the way BTEQ
presents information for screenoriented and printer/printer-file oriented
output:

  • DEFAULTS
  • ECHOREQ
  • EXPORT
  • FOLDLINE
  • FOOTING
  • FORMAT
  • HEADING
  • IMPORT
  • INDICDATA
  • NULL
  • OMIT
  • PAGEBREAK
  • PAGELENGTH
  • QUIET
  • RECORDMODE
  • RETCANCEL
  • RETLIMIT
  • RETRY
  • RTITLE
  • SEPARATOR
  • SHOW
    CONTROLS
  • SIDETITLES
  • SKIPDOUBLE
  • SKIPLINE
  • SUPPRESS
  • TITLEDASHES
  • UNDERLINE
  • WIDTH

For the commands not listed below, refer to the tables
above.


COMMAND NAME

FUNCTION

ECHOREQ

Enables the echo required function that returns a copy of
each Teradata SQL request and BTEQ command to the standard output stream.

FOLDLINE

Splits (fold) each line of a report into two or more
lines.

FOOTING

Specifies a footer to appear at the bottom of every page
of a report.

HEADING

Specifies a header to appear at the top of every page of a
report.

NULL

Specifies a character or character string to represent
null field values returned from the Teradata Database.

OMIT

Excludes specified columns returned from SQL SELECT
statements.

PAGEBREAK

Ejects a page whenever the value for one or more specified
columns changes.

PAGELENGTH

specify the page length of printed reports, in lines per
page.

RETCANCEL

cancel a request when the value specified by the RETLIMIT
command ROWS option is exceeded.

RETLIMIT

Specifies the maximum number of rows and/or columns
displayed or written in response to a Teradata SQL request.

RETRY

resubmit requests that fail under certain error
conditions.

RTITLE

Specifies a header to appear at the top of every page of a
report.

SEPARATOR

Specifies a character string or width (in blank
characters) to separate columns of a report.

SIDETITLES

Position summary titles to the left of the summary lines
in a report.

SKIPDOUBLE

insert two blank lines in a report whenever the value of a
specified column changes.

SKIPLINE

Inserts a blank line in a report whenever the value of a
specified column changes.

SUPPRESS

Replaces all consecutively repeated values with all-blank
character strings.

TITLEDASHES

Display a row of dash characters before each report line
summarized by a WITH clause.

UNDERLINE

Displays a row of dash characters whenever the value of a
specified column changes.

WIDTH

Specifies the width of screen displays and printed
reports, in characters per line.

Teradata interview questions PART 3

Teradata interview questions PART 3 consolidated over period of time from various sources. Thanks! to the original posters

=================================================================

Can you recover the password of a user in Teradata?

Answers:

No, you can’t recover the password of a user in Teradata. Passwords are stored in this data dictionary table (DBC.DBASE) using a one-way encryption method. You can view the encrypted passwords using the following query

SELECT * FROM DBC.DBASE;

=================================================================

Explain Ferret Utility in Teradata?

Ferret (File Reconfiguration tool) is an utility which is used to display and set Disk Space Utilization parameters within Teradata RDBMS. When you select the Ferret Utility parameters, it dynamically reconfigures the data on disks. We can run this utility through Teradata Manager; to start the Ferret Utility type (START FERRET) in the database window.

Answers:
Following commands can be used within Ferret Utility:

1. SHOWSPACE – Well this command reports you the amount of Disk Cylinder Space is in use and the amount of Disk Cylinder Space is available in the system. This will give you an information about Permanent Space cylinders, Spool Space cylinders, Temporary Space cylinders, Journaling cylinders, Bad cylinders and Free cylinders. For each of these 5 things it will present you 3 parameters i.e. Average Utilization per cylinder, % of total avaliable cylinders and number of cylinders.
2. SHOWBLOCKS – This command will help you in identifying the Data Block size and the number of Rows per data block. This command displays the Disk Space information for a defined range of Data Blocks and Cylinders.

=================================================================

Explain TPUMP (Teradata Parallel Data Pump) Utility in Teradata?

Answers: 

* TPUMP allows near real time updates from Transactional Systems into the Data Warehouse.
* It can perform Insert, Update and Delete operations or a combination from the same source.
* It can be used as an alternative to MLOAD for low volume batch maintenance of large databases.
* TPUMP allows target tables to have Secondary Indexes, Join Indexes, Hash Indexes, Referential Integrity, Populated or Empty Table, Multiset or Set Table or Triggers defined on the Tables.
* TPUMP can have many sessions as it doesn’t have session limit.
* TPUMP uses row hash locks thus allowing concurrent updates on the same table.

=================================================================

How can you determine I/O and CPU usage at a user level in Teradata?

Answers: 

You can find out I/O and CPU Usage from this Data Dictionary Table DBC.AMPUSAGE;

SELECT ACCOUNTNAME, USERNAME, SUM(CPUTIME) AS CPU, SUM(DISKIO) AS DISKIO FROM DBC.AMPUSAGE GROUP BY 1,2 ORDER BY 3 DESC;

=================================================================

How can you find the Table Space Size of your table across all AMPs?

Answers:

You can find the Table Space Size of your table from this Data Dictionary Table DBC.TABLESIZE

SELECT DATABASENAME, TABLENAME, SUM(CURRENTPERM) FROM DBC.TABLESIZE WHERE DATABASENAME = ‘’ AND TABLENAME = ‘’ GROUP BY DATABASENAME , TABLENAME;

=================================================================
How can you find the Teradata Release and Version information from Data Dictionary Table?
Answers:

To find Release and Version information you can query this Data Dictionary table DBC.DBCINFO

SELECT * FROM DBC.DBCINFO;

=================================================================
How can you track Login Parameters of users in Teradata?
Answers: 

You can view all these parameters in this Data Dictionary Table DBC.LOGONOFF
SELECT LOGDATE, LOGTIME, USERNAME, EVENT FROM DBC.LOGONOFF;

=================================================================
How can you use HASH FUNCTIONS to view Data Distribution across all AMPs in Teradata?
Answers: 

Hash Functions can be used to view the data distribution of rows for a chosen primary index.

SELECT HASHAMP(HASHBUCKET(HASHROW())) AS “AMP#”, COUNT(*) FROM GROUP BY 1 ORDER BY 2 DESC;

HASHROW – returns the row hash value for a given value
HASHBUCKET – the grouping of a specific hash value
HASHAMP – the AMP that is associated with the hash bucket

This is really good, by looking into the result set of above written query you can easily find out the Data Distribution across all AMPs in your system and further you can easily identify un-even data distribution.

=================================================================
How do you transfer large amount of data in Teradata?
Answers: 

Transferring of large amount of data can be done using various Application Teradata Utilities which resides on the host computer ( Mainframe or Workstation) i.e. BTEQ, FastLaod, MultiLoad, Tpump and FastExport.

* BTEQ (Basic Teradata Query) supports all 4 DMLs: SELECT, INSERT, UPDATE and DELETE. BTEQ also support IMPORT/EXPORT protocols.
* Fastload, MultiLoad and Tpump transfer the data from Host to Teradata.
* FastExport is used to export data from Teradata to the Host.

=================================================================
How does Hashing happens in Teradata?

Answers: 

* Hashing is the mechanism through which data is distributed and retrieved to/from AMPs.
* Primary Index (PI) value of a row is the input to the Hashing Algorithm.
* Row Hash (32-bit number) value is the output from this Algorithm.
* Table Id + Row Hash is used to locate Cylinder and Data block.
* Same Primary Index value and data type will always produce same hash value.
* Rows with the same hash value will go to the same AMP.

So data distribution depends directly on the Row Hash uniqueness; be careful while Choosing Indexes in Teradata.

=================================================================
How to eliminate Product Joins in a Teradata SQL query?
Answers: 
 
1. Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.
2. Make sure you are referencing the correct alias.
3. Also, if you have an alias, you must always reference it instead of a fully qualified tablename.
4. Sometimes product joins happen for a good reason. Joining a small table (100 rows) to a large table (1 million rows) a product join does make sense.


=================================================================
How to select first N Records in Teradata?
Answers: 

To select N records in Teradata you can use RANK function. Query syntax would be as follows
SELECT BOOK_NAME, BOOK_COUNT, RANK(BOOK_COUNT) A FROM LIBRARY QUALIFY A <= 10;

=================================================================
How to view every column and the columns contained in indexes in Teradata?
Answers:

Following query describes each column in the Teradata RDBMS
SELECT * FROM DBC.TVFields;

Following query describes columns contained in indexes in the Teradata RDBMS
SELECT * FROM DBC.Indexes;

=================================================================
What are the 5 phases in a MultiLoad Utility?
Answers:

* Preliminary Phase – Basic Setup
* DML Phase – Get DML steps down on AMPs
* Acquisition Phase – Send the input data to the AMPs and sort it
* Application Phase – Apply the input data to the appropriate Target Tables
* End Phase – Basic Cleanup

=================================================================
What are the functions of a Teradata DBA?
Answers:

Following are the different functions which a DBA can perform:
1. User Management – Creation and managing Users, Databases, Roles, Profiles and Accounts.
2. Space Allocation – Assigning Permanent Space, Spool Space and Temporary Space.
3. Access of Database Objects – Granting and Revoking Access Rights on different database objects.
4. Security Control – Handling logon and logoff rules for Users.
5. System Maintenance – Specification of system defaults, restart etc.
6. System Performance – Use of Performance Monitor(PMON), Priority Scheduler and Job Scheduling.
7. Resource Monitoring – Database Query Log(DBQL) and Access Logging.
8. Data Archives, Restores and Recovery – ARC Utility and Permanent Journals.

=================================================================
What are the MultiLoad Utility limitations?
Answers:

MultiLoad is a very powerful utility; it has following limitations:

* MultiLoad Utility doesn’t support SELECT statement.
* Concatenation of multiple input data files is not allowed.
* MultiLoad doesn’t support Arithmatic Functions i.e. ABS, LOG etc. in Mload Script.
* MultiLoad doesn’t support Exponentiation and Aggregator Operators i.e. AVG, SUM etc. in Mload Script.
* MultiLoad doesn’t support USIs (Unique Secondary Indexes), Refrential Integrity, Join Indexes, Hash Indexes and Triggers.
* Import task require use of PI (Primary Index).

=================================================================
What are TPUMP Utility Limitations?
Answers:

Following are the limitations of Teradata TPUMP Utility:
* Use of SELECT statement is not allowed.
* Concatenation of Data Files is not supported.
* Exponential & Aggregate Operators are not allowed.
* Arithmatic functions are not supported.

=================================================================
What is FILLER command in Teradata?
Answers:

While running Fastload or Multiload if you don’t want to load a particular field from the datafile to the target table then use the FILLER command to achieve this. Syntax for FILLER command would be as following:

.LAYOUT FILE_PRODUCT; /* It is input file layout name */
.FIELD Prod_No * char(11); /* To load data into Prod_No */
.FIELD Prod_Name * char(11); /* To load data into Prod_Name */
.FIELD Location * char(11); /* To load data into Location */
.FILLER Prod_Chars * char(20); /* To skip the value for the next 5 locations */

=================================================================
What is the difference between Access Logging and Query Logging in Teradata?
Answers:

1. Access Logging is concerned with security (i.e. who’s is doing what). In access logging you ask the database to log who’s doing what on a given object. The information stored is based on the object not the SQL fired or the user who fired it.
2. Query Logging (DBQL) is used for debugging (i.e. what’s happening around ?). Incase of DBQL database keep on tracking various parameters i.e. the SQLs, Resource, Spool Usage, Steps and other things which help you understand what’s going on, the information is fruitful to debug a problem. Further DBQL is enabled on a User id rather than an object like say Table or so.

=================================================================
What is the difference between Sub-Query & Co-Related Sub-Query?
Answers: 

When queries are written in a nested manner then it is termed as a sub-query. A Sub-Query get executed once for the parent statement whereas Co-Related Sub-Query get executed once for each row of the parent query.

Select Empname, Deptno, Salary from Employee Emp where Salary = (Select Max(Salary) from Employee where Deptno = Emp.Deptno) order by Deptno

=================================================================
What is Reconfiguration Utility in Teradata and What it is used for?
Answers:

* When we feed Primary Index value to Hashing Algorithm then it gives us Row Hash(32 bit number) value which is used to make entries into Hash Maps.
* Hash Maps are the mechansim for determining which AMP will be getting that row.
* Each Hash Map is an array of 65,536 entries and its size is close to 128KB.

When Teradata is installed on a system then there are some scrpits which we need to execute i.e. DIP Scripts. So it creates a Hash Maps of 65,536 entries for the current configuration. But what if you want to add some more AMPs into your system?

Reconfiguration (Reconfig) is a technique for changing the configuration (i.e. changing the number of AMPs in a system) and is controlled by the Reconfiguration Hash Maps. System builds Reconfiguration Hash Maps by reassigning hash map entries to reflect new configuration of system.

Lets understand this concept with the help of an example; suppose you have a 4 AMPs system which holds 65,536 entries. Each AMP is responsible for holding (65,536/4=16,384) 16,384 entries.

Now you have added 2 more AMPs in your current configuration so you need to reconfigure your system. Now each AMP would be responsible for holding (65,536/6=10922) 10,922 entries.

=================================================================

Introduction to teradata utilities : fastload

Teradata FastLoad, also called "FastLoad"   or "FL," is a multi-sessioned parallel load utility for initial table load in bulk mode on a Teradata Database.

OVERVIEW

Teradata FastLoad processes a series of FastLoad commands and Teradata SQL statements written in a FastLoad job script or interactively entered. The FastLoad commands provide the session control and data handling specifications for the data load operations, and the Teradata SQL statements perform the actual data load functions on the Teradata RDBMS tables and views. However, this command-driven utility feeds only one table per job; thus, to load more than one table, multiple FastLoad jobs must be submitted — one for each table. (In comparison, refer to Teradata MultiLoad)


It loads data from:
  • Disk or tape files on a channel-attached client system;
  • Input files on a network-attached workstation;
  • Special input module (INMOD) routines that select, validate, and preprocess input data;
  • Any other device providing properly formatted source data.
Operating Modes


Teradata FastLoad runs in the following operating modes:
  • Interactive mode;
  • Batch mode.
In interactive mode, Teradata FastLoaduses terminal screen and keyboard as the standard output and input streams.In batch mode, FastLoad uses > and < to redirect the standard output / input streams.


To invoke Teradata FastLoad in interactive mode, enter the following command (for Microsoft Windows):
c:\ncr\bin\fastload


To invoke FastLoad in batch mode, use the command below (for Microsoft Windows):
c:\ncr\bin\fastload [options] < infile > outfile


Here, the infile is a Teradata FastLoad job script file which includes all the required FastLoad commands and Teradata SQL statements, whereas the outfile is the FastLoad output stream file.


Character Sets


Teradata FastLoad supports Latin, Chinese, Japanese, and Korean character sets for networkattached and channel-attached configurations.Teradata FastLoad also supports UTF16 (workstation only) and UTF8 character sets.



Data Transfer Capabilities


Teradata FastLoad utilizes TCP/IP Protocol for all data transfer operations on network-attached
workstations.


FastLoad transfers data on channel-attached systems as either:
  • a multi-volume data set or file;
  • a number of single-volume data sets or files in separate Teradata FastLoad jobs.


Serial FastLoad operations can be restarted by loading the next tape in a series, instead of beginning with the first tape in a set. In either case, FastLoad:
  • transfer data in multiple-session mode (one session per AMP);
  • transfers multiple rows of data within a single message.


Also, in either case, until the Teradata FastLoad job is completed and the data has been loaded into
the Teradata FastLoad table:
  • there is no journaling or fallback data;
  • the secondary indexes cannot be defined.

 Data Conversion Capabilities


FastLoad data conversion types are:
  • Numeric-to-numeric (for example integer-to-decimal);
  • Character-to-numeric;
  • Character-to-date;
  • Date-to-character.


Note: Redundant conversions, like integer-to-integer, are legal and necessary to support the zoned decimal format.



Input Data Formats


Supported

For network-attached configurations, Teradata FastLoad supports the following data formats:
  • Formatted data;
  • Unformatted data;
  • Binary data;
  • Text data;
  • Variable-length text.


For channel-attached configurations, Teradata FastLoad supports:

Data sets and tape files with the following record format (RECFM) attributes
  • F (fixed);
  • FB (fixed block);
  • V (variable);
  • VB (variable block);
  • VBS (variable block, spanned).




 FASTLOAD COMMANDS


The FastLoad commands perform two types of activities:
  • Session control — Session control commands begin and end Teradata FastLoad sessions and provide online information about a particular Teradata FastLoad operation;
  • Data handling — Data handling commands establish and define a Teradata FastLoad operation.

1. Teradata FastLoad Commands for Session Control:

COMMAND NAME
FUNCTION
HELP
Lists Teradata FastLoad commands and options
HELP TABLE
Creates a list of field names that you can use with the
INSERT statement
LOGOFF

QUIT
Ends Teradata FastLoad sessions and terminates Teradata
FastLoad
LOGON
Begins one or more Teradata FastLoad sessions
NOTIFY
Specifies a user exit or action to be performed when
certain significant events occur
OS
Enters client operating system commands
SESSIONS
Specifies the number of Teradata FastLoad sessions logged on with a LOGON command and, optionally, the minimum number of sessions
required to run the job

SHOW

Shows the current field/file definitions established by
DEFINE commands

SHOW SESSIONS
Shows the current level of all Teradata FastLoad software modules

SLEEP

Specifies the number of minutes that Teradata FastLoad
pauses before retrying a logon operation

TENACITY

Specifies the number of hours that Teradata FastLoad
continues trying to log on when the maximum number of load jobs is already running on the Teradata Database

2. Teradata FastLoad Commands for Data Handling:


COMMAND NAME

FUNCTION

AXSMOD

Specifies the name and initialization string for a shared
object file that loads data from a file on network-attached client systems

BEGIN LOADING

Identifies the tables used in the Teradata FastLoad
operation and, optionally,
specifies when checkpoints are taken or if the user
supplies indicator data

CLEAR

QUIT

Cancels the current DEFINE command specifications

DATEFORM

Specifies the form of the DATE data type specifications
for the Teradata FastLoad job

DEFINE

Describes each field of an input data source record and
specifies the name of the input data source or INMOD routine

END LOADING

Informs the Teradata Database that all input data has been sent

ERRLIMIT

Limits the number of errors detected during the loading
phase of a Teradata FastLoad job Processing stops when the limit is reached.

RECORD

Specifies the number of a record in an input data source
at which Teradata FastLoad begins to read data and/or the number of the last record to be read

RUN

Invokes the specified external source as the current
source of commands and statements

SET RECORD

Specifies that the input data records are either:
  • Formatted
  • Unformatted
  • Binary
  • Text
  • Variable-length
    text
Note: The SET RECORD command applies only to
network-attached systems.

SET SESSION CHARSET

Specifies which character set is in effect during a
specific Teradata FastLoad invocation
Note: The SET SESSION CHARSET command applies only to
networkattached systems.



Teradata SQL Statements in FastLoad

A subset of Teradata SQL statements can be used in Teradata
FastLoad to define and manipulate the data stored in the Teradata Database. The
Teradata SQL statements supported by Teradata FastLoad are listed below:


TERADATA SQL STATEMENT

FUNCTION

CREATE TABLE

Defines the columns, index, and other qualities of a table

DATABASE

Changes the default database

DELETE

Deletes rows from a table

DROP TABLE

Removes a table and all of its rows from a database

INSERT

Inserts rows into a table

SET QUERY_BANDING...FOR SESSION

Sets query banding at the session level

Introduction to teradata utilities : Multiload

"Teradata MultiLoad"


DEFINITION
Teradata MultiLoad, also called "MultiLoad," "MLoad" or "ML," is a command-driven parallel load utility for high-volume batch maintenance on multiple tables and views of the Teradata Database. It is specially designed for high-speed batch creation and maintenance of large databases.

OVERVIEW
Teradata MultiLoad executes a series of MultiLoad commands and Teradata SQL statements written in a batch mode job script or interactively entered. The MultiLoad commands provide the session control and data handling specifications for the data transfer operations, and the Teradata SQL statements perform the actual maintenance functions on the Teradata RDBMS tables and views.

A single MultiLoad job performs a number of different import and delete tasks on database tables and views:

• Each MultiLoad import task can do multiple data insert, update, and delete functions on up to five different tables or views;
• Each MultiLoad import task can have up to 100 DML steps;
• Each MultiLoad delete task can remove large numbers of rows from a single table.

Teradata MultiLoad is characterized by its serialization feature. In certain scenarios, multiple changes are possible to occur to a single row in the same MultiLoad job. By using its serialization feature, Teradata MultiLoad automatically guarantees that the ordering of such operations is maintained correctly.

If a MultiLoad job has only one import task, then Teradata FastLoad has to be chosen for it, since FastLoad will finish it much faster. However, the FastLoad utility works only on empty tables.

A FastLoad job can not be used to:
  • • Insert additional data rows into an existing table;
  • • Update individual rows of existing tables;
  • • Delete individual rows from existing tables;
  • • Load data into multiple tables.


DATA SOURCE

Teradata MultiLoad imports data from:
• Disk or tape files
• Input files on a network-attached workstation;
• Special input module (INMOD) routines that select, validate, and preprocess input data;
• Access modules;
• Any device providing properly formatted source data;
The table or view in the database receiving the data can be any existing table or view for which the user has access privileges for the required maintenance tasks.
MultiLoad command IMPORT has an optional parameter FORMAT, which is used to specify the format of the external data source.
The format may be:
• FASTLOAD;
• BINARY;
• TEXT;
• UNFORMAT;
• VARTEXT.


OPERATING MODES 

Teradata MultiLoad runs in the following operating modes:
• Interactive mode;
• Batch mode.

In interactive mode, Teradata MultiLoad uses terminal screen and keyboard as the standard output and input streams. In batch mode, MultiLoad uses > and < to redirect the standard output / input streams. To invoke Teradata MultiLoad in interactive mode, enter the following command (for Microsoft Windows): c:\ncr\bin\MultiLoad

To invoke MultiLoad in batch mode, use the command below (for Microsoft Windows): c:\ncr\bin\MultiLoad [options] < infile > outfile
Here, the infile is a Teradata MultiLoad job script file which includes all the required MultiLoad commands and Teradata SQL statements, whereas the outfile is the outfile is the output stream file.

Character Sets
Teradata MultiLoad supports Latin, Chinese, Japanese and Korean character sets, along with ASCII, EBCDIC, UTF-8 and UTF-16. It also supports user-defined session character sets.

Task Status Reporting
Teradata MultiLoad has three reporting methods to monitor the status of jobs that are still in progress and those that have just completed:
• MultiLoad utility messages provide job status reports, including:
o Options messages that list the settings of MultiLoad task parameters;
o Logoff/disconnect messages that list key run statistics.
• QrySessn (i.e., Query Session Utility) provides real-time, phase-oriented progress reports at selected intervals during the MultiLoad job;
• Statistics messages present information at the end of the MultiLoad job.

Data Conversion Capabilities
Teradata MultiLoad can redefine the data type specification of input data to match it with that of the destination column in the MultiLoad table on the target database. MultiLoad data conversion types are:
• Numeric-to-numeric (for example integer-to-decimal);
• Character-to-numeric;
• Character-to-date;
• Date-to-character.
Use the DATADESC specification of the MultiLoad FIELD command to convert input data to a different data type before inserting it into the MultiLoad table on the Teradata Database.
Note: Redundant conversions, like integer-to-integer, are legal and necessary to support the zoned decimal format.

MULTILOAD COMMANDS


MultiLoad Command Summary
The MultiLoad commands perform two types of activities:
• Support activities — Support commands perform support activities to establish the MultiLoad sessions with the Teradata Database and define the operational support environment for Teradata MultiLoad. Established support environment options remain in effect until another support command changes them. Support commands do not specify a MultiLoad task;
• Task activities — Task commands perform task activities to specify the actual processing of each MultiLoad task.

1. Teradata MultiLoad Commands for Support Activities:
COMMAND NAME
FUNCTION
ACCEPT
Allows
the value of one or more utility variables to be accepted from either a file
or an environment variable
DATEFORM
Specifies
the form of the DATE data type specifications for the MultiLoad job
DISPLAY
Writes messages to the specified destination

ELSE (see
IF, ELSE, and ENDIF)

Introduces
commands and statements that execute when a preceding IF condition is false

ENDIF
(see IF, ELSE, and ENDIF)

Delimits the group of MultiLoad commands that were subject to previous IF or ELSE conditions

IF (see
IF, ELSE, and ENDIF)

Introduces a conditional expression whose value initiates execution of subsequent commands

LOGOFF

Disconnects all active sessions and terminates MultiLoad on the client system.

LOGON

Specifies the LOGON command string to be used in connecting all sessions established by MultiLoad.
LOGTABLE
Identifies the table to be used to journal checkpoint information required for safe,
automatic restart of MultiLoad when the client or Teradata Database system fails.

ROUTE
MESSAGES

Identifies the destination of output produced by MultiLoad support environment.

RUN FILE

Invokes the specified external file as the current source of utility commands and Teradata SQL statements

SET

Assigns a data type and a value to a utility variable

SYSTEM

Suspends operation of MultiLoad and executes any valid local operating system command

2. Teradata MultiLoad Commands for Task Activities:

COMMAND NAME

FUNCTION

BEGIN
MLOAD and BEGIN DELETE MLOAD

Specifies:

  • The kind of MultiLoad task to
    be executed;
  • The target tables in the
    Teradata Database;
  • The parameters for executing
    the task.

DML LABEL

Defines a label and error treatment options for a following group of DML statements.

END MLOAD
 Indicates completion of MultiLoad command entries and initiates execution of the task.

FIELD

Used with the LAYOUT command to define a field of the data source record that is sent to the Teradata Database.

FILLER

Used with the LAYOUT command to define a field in the data source record that is not sent to the Teradata Database.

IMPORT

Identifies the data source, the layout used to describe the data record, and optional conditions for performing DML operations.

LAYOUT

Introduces the record format of the data source to be used in the MultiLoad task. This command is followed by a sequence or combination of FIELD and TABLE commands.

LOGDATA
Supplies parameters to the LOGMECH command beyond those needed by the logon mechanism, such as userid and password, to successfully authenticate the user.

LOGMECH

Identifies the appropriate logon mechanism by name.
 PAUSE
ACQUISITION

Temporarily suspends the acquisition phase of a MultiLoad job.

TABLE
Used with the LAYOUT command to identify a table whose column names and data descriptions are used as the field names and data descriptions of the data source records.


Teradata SQL Statements in MultiLoad
MultiLoad supports a subset of Teradata SQL statements in order to define and manipulate the data stored in the Teradata Database. Consequently, other utilities do not have to be invoked to perform routine database maintenance functions before executing MultiLoad utility tasks. The Teradata SQL statements supported by Teradata MultiLoad are listed below:

COMMAND NAME

FUNCTION

ALTER TABLE

Changes the column configuration or options of an existing table

CHECKPOINT

Adds a checkpoint entry to a journal table

COLLECT STATISTICS
 Collects statistical data for one or more columns of a table

COMMENT

Stores or retrieves comment string associated with a database object

CREATE DATABASE

CREATE MACRO

CREATE TABLE

CREATE VIEW

Creates a new database, macro, table, or view

DATABASE
Specifies a new default database for the current session

DELETE

Removes rows from a table

DELETE DATABASE
Removes all tables, views, and macros from a database
DROP DATABASE

Drops the definition for an empty database from the Data Dictionary

DROP TABLE

Removes a table from the database
GIVE
Transfers ownership of a database to another user
GRANT
Grants access privileges to a database object
INSERT
Insert  new rows to a table

MODIFY DATABASE

Changes  the options of an existing database

RELEASE MLOAD

Removes the access locks from the target tables in the Teradata Database

This inhibits any attempts to restart the MultiLoad job when a MultiLoad task has been suspended or aborted.

Note: A RELEASE MLOAD statement must be entered from BTEQ. It cannot be entered from MultiLoad.

RENAME

Changes the name of an existing table, view, or macro

REPLACE MACRO
REPLACE VIEW

Redefines an existing macro or view

REVOKE
Rescinds access privileges to a database object

SET QUERY_BAND

Identifies type and source of query to determine prioritization of queries

SET SESSION COLLATION

Overrides the collation specification for the current session

SET SESSION OVERRIDE REPLICATION ON/OFF
 Turns replication services on or off

UPDATE

Changes the column values of an existing row in a table

Export /Import using Teradata SQL assistant

please Use this option  for few reocrds only (say in 1000's) . For large data volumes use fastload (empty table loading) or Mload utilities



1. Login to the Source Database using Teradata SQL Assistant. ( QA / PROD)

2. Paste the Source SQL Query in the "Query" window.

3. Select the following Menu Option "File=>Export Results"


















4. Go to "Tools=>Options" and select the Date Format as 'YYYY-MM-DD'.
P.S:  This option is used when a date column exists in the table. 
















5. Run the Query (Press - F5) and Provide the Target Location where you want the output file to be saved:

















6. Open the Saved File (ex. Test_Export.txt) and remove the First Line from the File Which contains all the column Names.

7. Login to the Target Database using Teradata SQL Assistant (Dev).

8. Select the following Menu Option "File=>Import Data"




















9. Paste the SQL Insert Query in the "Query" window.
ex. Insert into TBENW_RVLTN_REPL_LOC_SKU_FW values (?,?,?,?,?);

NOTE: There should be a '?' as a Place-Holder for all the columns in your Source File.

10. Run the Query (Press - F5).

11. Turn off the Import Data mode and resume querying.




Importing Excel Data into Teradata Table

1. Make sure you have the columns in Excel in the same order as the table you created. Remove the header rows from the Excel file and Save the file as comma delimited file from Excel (CSV File).
2. In SQL Assistant set the Delimiter as ","(comma) .
( Delimiter is available in Menu Tools , Options, Import/Export)
















3. Go to "Tools=>Options" and select the Date Format as 'YYYY-MM-DD'.
P.S:  This option is used when a date column exists in the table. 
















4. From Teradata SQL Assistant, click 'File', then 'Import Data'.

5. Run the command: insert into TABLENAME values (?,?,?....) 
You must supply a ? (question mark) for every column. So, for a table with 5 columns the values command would look like: values (?,?,?,?,?).

6. Highlight the insert query and Press F5. This will prompt you for a file location. Select the appropriate file.

7. This will load in the order in which they appear in your text file, which is why it is important to match the columns in Excel with the order of the columns in your table.

8. Turn off the Import Data mode and resume querying.

Teradata interview questions PART 2


Teradata interview questions PART 2
Teradata interview questions PART 2 consolidated over period of time from various sources. Thanks! to the original posters

==================================================================================
Teradata performance tuning and optimization
Answers:

1. collecting statistics
2. Explain Statements
3. Avoid Product Joins when possible
4. select appropriate primary index to avoid skewness in storage
5. Avoid Redistribution when possible
6. Use sub-selects instead of big "IN" lists
7. Use derived tables
8. Use GROUP BY instead of DISTINCT ( GROUP BY sorts the data locally on the VPROC. DISTINCT sorts the data after it is redistributed)
9. Use Compression on large tables

==================================================================================
How many codd's rules are satisfied by teradata database?
Answers: 12 codd’s rules.



==================================================================================
What is the difference between Global temporary tables and Volatile temporary tables?
Answers:

Global Temporary tables (GTT) –
1. When they are created, its definition goes into Data Dictionary.
2. When materialized data goes in temp space.
3. That's why, data is active upto the session ends, and definition will remain there up-to its not dropped using Drop table statement.If dropped from some other session then its should be Drop table all;
4. you can collect stats on GTT.

Volatile Temporary tables (VTT) -
1. Table Definition is stored in System cache
2. Data is stored in spool space.
3. thats why, data and table definition both are active only upto session ends.
4. No collect stats for VTT.If you are using volatile table, you can not put the default values on column level ( while creating table )

==================================================================================
What is the difference between start schema and Fsldm?
Answers:
FSLDM --> Financial Services Logical Data Model (Teradata FS-LDM 7.0) its developed by Teradata for the financial sectors (Specifically for Bank). it can be customised based on the user requirement.

StarSchema --> Its a relational database schema for representing multidimensional data. The data is stored in a central fact table, with one or more tables holding information on each dimension.




==================================================================================
Join Strategies There are 2 tables, table A with 10 million records, table B has 100 million records, now we are joining both tables, when we seen Explain Plan the plan showing TD will took the table A and it will redistributes itNow the Question is: By that plan is the optimizer is correct job or not ? Justify Ans2. From the same above example now the optimizer is taking Table B (100 million records) and it is distributing it, Now is the optimizer is doing best? and How you avoid this situation
Answers:

Teradata is smart enough to decide when to redistribute and when to copy.
It compares the tables. Are they comparable? or one is big as compared to the other?Based on simple logic it decides whether to distribute the smaller table on all the AMPs or to copy.
what I mean is the small table is copied into all the AMPs in the SPOOL space.Remember all always the Join's has to take place on the AMPs SPOOL Space.By redistributing it is making sure that the 100 million rows table gets the feeling that
it is making AMP local JOIN. Remember the basic thing what ever Teradata does.
It does keeping in consideration for Space and Performance and not to forget the Efficiency.

My simple formula:
If the table is small redistribute them to all the AMPs to have the AMP local Join.
Always JOINs are made AMP local if it cannot then you have the high chance of running out of SPOOL space.

==================================================================================
What are set tables and multiset tables in Teradata?Explain with an appropriate example?
Answers:
1) Set tables cannot accept duplicate at row level not Index or key level.
Example of rows for set table:
R1 c2 c3 c4 ..cn
1 2 3 4 ... 9

Accepted
1 2 3 4 ... 9
Duplicate is Rejected
2 1 2 4 ... 9
3 2 4 4 ... 9
4 3 4 4 ... 9

2) Multi set Tables can accept
duplicate at row level not Index or key level.Exmaple of rows for multi set table:
R1 c2 c3 c4 .. cn
1 2 3 4 ... 9
1 2 3 4 ... 9
Duplicate is Accepted
2 1 2 4 ...9
3 2 4 4 ...9
3 2 4 4 ...9
Duplicate is Accepted

=================================================================================
What is FILLER command in Teradata?
Answers:

While using the mload of fastload if you don't want to load a particular filed in the datafile to the
target then use this filler command to achieve this

==================================================================================
How a Referential integrity is handled in Teradata?
Answers:
 By use of  TPUMP utility,  referential integrity is handled in teradata



==================================================================================
Explain about Skew Factor?
Answers:

The data distribution of table among AMPs is called Skew Factor . Generally For Non-Unique PI we get duplicate values so the more duplicate vales we get more the data have same row hash so all the same data will come to same amp, it makes data distribution inequality,One amp will store more data and other amp stores less amount of data, when we are accessing full table,
The amp which is having more data will take longer time and makes other amps waiting which leads processing wastage In this situation (unequal distribution of data)we get Skew Factor HighFor this type of tables we should avoid full table scans.
ex:AMP0 AMP110000(10%) 9000000(90%)in this situation skew factor is very high 90%

==================================================================================
Why AMP & PE are called Vprocs?
Answers:

Vprocs:Virtual process From PE to AMP (This is the network root via MSP(message passing layer),The processing data will store in Disks(These are Physical disks),Each Amp have too many P.disks,to migrate these P.disks The Bynet Network maintains Virtual disks.These V.disks will responsible for data migration.hence they are called as Virtual Process(VPROCS).

==================================================================================
Why Fload doesn’t support multiset table?
Answers:



Fast Load will load the Data Fastly,Mean Concurrently,Bulk Insert Internally TeraData server will Create Bulk insert DML statement (Like: insert into Table1 as sel * from Table2 with data) For Fast Performance it will not support the duplicate values,Referential Integrity.

==================================================================================
What is use of compress in teradata?Explain?
Answers:

Compression is used to Minimize the table size, for example when the table size is increasing anonymously We can use Compression to reduce the size of the table
Conditions:1.Compression can be declared at the time of table creation2.We can compress up to 256 column values(not columns) 3.We can't compress variable length fields (vartext,varchar..)

Explanation
for 2 condition:create table tab1(::Order_type char(25) compress ('air','sea','road'):)in the above example order type have 3 fields, one should be selected by the user, so one of the field will repeat for every order, like these column values we can use compress statement because these are repeating for
entire table, like these column values TD supports 256 col generally NCR people will provides ready made scripts for these type of compressions However, we can store only one value per column and not 3(air, sea, road). The compressed value is
stored in column header and is to be used as default of that column unless a value is present.
e.g.
Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the tax payers would have Washington as their state code on their address. Instead of storing “Washington” in millions of records the compress will store the value “Washington” in the table header. Now, a value can be assumed in a row as a default unless another value exists inside the column

==================================================================================
What is the process to restart the multiload if it fails?
Answers:

MULTILOAD will creates 2 error tables, 1 work table When MULTILOAD fails We have to unlock the Main Table, here error tables and work tables are not locked like FLOAD.

To Unlock the Main Table in  case of acquisation Phase :
Mload RELEASE MLOAD ;
To release lock in application phase failure :
MLOADRELEASE MLOAD .IN APPLY;
==================================================================================
Can we load a Multi set table using MLOAD?
Answers:

We can Load SET, MULTISET tables using Mload, But here when loading into MULTISET table using MLOAD duplicate rows will not be rejected, we have to take care of them before loading.But in case of Fload when we are loading into MULTISET duplicate rows are automatically rejected, FLOAD will not load duplicate rows weather table is SET or MULTISET

==================================================================================
Can I use “drop” statement in the utility “fload”?
Answers:

YES,But you have to declare it out of the FLOAD Block it means it should not come between .begin loading,.end loading FLOAD also supports DELETE,CREATE,DROP statements which we have to declare out of FLOAD blocking the FLOAD Block we can give only INSERT




==================================================================================
Is it possible that there r two primary key will be in a same table?
Answers:

Primary key
==========
1. A table should have only one primary key
2. More than one column can consist of a primary key – upto 64 columns
3. Can not be NULL values (missing values)
4. Should be unique values (no duplicate value)

Foreign key
==========
1. A table may have zero or more than that up-to 32 keys
2. More than one column can consist a primary key – up to 64 columns
3. Can have NULL values
4. Can have duplicate values
5. There should be a corresponding primary key in the parent table to enforce referential integrity for a foreign key

==================================================================================
In a table can we use primary key in one column and in another column both unique and not null constrains.if yes how?
Answers:

Yes, you can have a column for Primary key and have another column which will have no duplicates or null.e.g.A Salary Table will have employee ID as primary key.
The table also contains TAX-ID which can not be null or duplicate

==================================================================================
What are the types of tables in Teradata ?
Answers:

1. set table
2. multiset table
3.  derived table
4. global temporary table(temporary table)
5. volatile table



Teradata interview questions PART 1



How many error tables are there in fload and Mload and what is their significance/use?
Can we see the data of error tables?
When mload job fails, can we access mload tables? If yes then how?
Answers:

Fload uses 2 error tables
ET TABLE 1: where format of data is not correct.
ET TABLE 2: violations of UPI
it maintains only error field name, errorcode and data-parcel only.


Mload also uses 2 error tables (ET and UV), 1 work table and 1 log table
1. ET TABLE - Data error
MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.

2. UV TABLE - UPI violations
MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task


Apart from error tables, it also has work and log tables
3. WORK TABLE - WT
Mload loads the selected records in the work table

4. LOG TABLE
A log table maintains record of all checkpoints related to the load job, it is essential l/ mandatory to specify a log table in mload job. This table will be useful in case you have a job abort or restart due to any reason.


How Teradata makes sure that there are no duplicate rows being inserted when its a SET table?
Answers:

Teradata will redirect the new inserted row as per its PI to the target AMP (on the basis of its row hash value), and if it find same row hash value in that AMP (hash synonyms) then it start comparing the whole row, and find out if duplicate.
If it’s a duplicate it silently skips it without throwing any error.


What is basic Teradata query language?
Answers:

1. It allows us to write SQL statements along with BTEQ commands. We can use BTEQ for importing, exporting and reporting purposes.
2. The commands start with a (.) dot and can be terminated by using (;), it is not mandatory to use (;).
3. BTEQ will assume any thing written with out a dot as a SQL statement and requires a (;) to terminate it.


What are cliques? What is Vdisk and how it will communicate with physical data storage at the time of data retrieval through AMP?
Answers:

A clique is a set of Teradata nodes that share a common set of disk arrays. Cabling a subset of nodes to the same disk arrays creates a clique.

Each AMP vproc must have access to an array controller, which in turn accesses the physical disks. AMP vprocs are associated with one or more ranks (or mirrored pairs) of data. The total disk space associated with an AMP is called a vdisk. A vdisk may have up to three ranks. Hence Vdisk will communicate with physical storage through array controllers.


What is the difference between MultiLoad & Fast load interns of Performance?
Answers:

If you want to load, empty table then you use the fastload, so it will very useful than the MultiLoad ,because fastload performs the loading of the data in 2phase and its no need a work table for loading the data .
So it is faster as well as it follows the below steps to load the data in the table
Phase1 - It moves all the records to the entire AMP first without any hashing
Phase2 - After giving end loading command, Amp will hashes the record and send it to the appropriate AMPS.

MultiLoad:
It does the loading in the 5 phases
Phase1 - It will get the import file and checks the script
Phase2 - It reads the record from the base table and store in the work table
Phase3 - In this Application phase it locks the table header
Phase4 - In the DML operation will done in the tables
Phase5 - In this table locks will be released and work tables will be dropped.

Which two statements are true about a foreign key?
Answers:

Each Foreign Key must exist as a Primary Key.
Foreign Keys can change values over time.
Answer-1:
First: True
Second: False
1. Foreign Keys can change values over time.
2. Each Foreign Key must exist as a Primary Key.

How does indexing improve query performance?
Answers:

Indexing is a way to physically reorganize the records to enable some frequently used queries to run faster.

The index can be used as a pointer to the large table. It helps to locate the required row quickly and then return it back to the user.
or
The frequently used queries need not hit a large table for data. they can get what they want from the index itself. - cover queries.

Index comes with the overhead of maintenance. Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.

Indexes cannot be accessed directly by users. Only the optimizer has access to the index
.

What is a common data source for the central enterprise data warehouse?
Answers:

ODS=>Operational Data Source


What are the enhanced features in Teradata V2R5 and V2R6?
Answers:

V2R6 included the feature of replica in it. in which copy of data base are available on another system which means V2R6 provide the additional data protection as comparison to V2R5 while if data from one system has been vanishes.


Where is the GRANT table option usually done? When tables are newly created, what is the default role and what the default privileges which get assigned?
Answers:

The GRANT option for any particular table depends on the privileges of the user. If it is an admin user you can grant privileges at any point of time.
The default roles associated with the newly created tables depend on he schema in which they are created.


What is error table? What is the use of error table?
Answers:

The Error Table contains information concerning:
- Data conversion errors Constraint violations and other error conditions:
* Contains rows which failed to be manipulated due to constraint violations or Translation error
* Captures rows that contain duplicate Values for UPIs.
* It logs errors & exceptions that occurs during the apply phase.
* It logs errors that are occurs during the acquisition phase.

What is optimization and performance tuning and How does it work in practical projects. Explain with an example.
Answers:

Optimization is the technique of selecting the least expensive plan (fastest plan) for the query to fetch results.
Optimization is directly proportional to the availability of --
1. CPU resources
2. Systems resources - amps PEs etc.
Teradata performance tuning is a technique of improving the process in order for query to perform faster with the minimal use of CPU resources.

Does SDLC changes when you use Teradata instead of Oracle?
Answers:

If the teradata is going to be only a data base means It won?t change the System development life cycle (SDLC)
If you are going to use the Teradata utilities then it will change the Architecture or SDLC
If your schema is going to be in 3NF then there won’t be huge in change


What is an optimization and performance tuning and how does it really work in practical projects?
Answers:

Performance tuning and optimization of a query involves collecting statistics on join columns, avoiding cross product join, selection of appropriate primary index (to avoid skewness in storage) and using secondary index.
Avoiding NUSI is advisable.


What are two examples of an OLTP environment?
Answers:

The two examples of OLTP are:
1- ATM.
2- POS