Tuesday 21 June 2016

Teradata History


What is a Data Warehouse?

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. 

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

What is an RDBMS?

A relational database management system (RDBMS) is a type of database management system (DBMS) that stores data in the form of related tables. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways.
An important feature of relational systems is that a single database can be spread across several tables.
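
To make the "viewed in many different ways" point concrete, here is a minimal sketch using Python's built-in sqlite3 module (not Teradata). The table names, column names, and sample rows are invented for illustration. The same two tables answer two unrelated questions without any change to how the data is stored.

```python
import sqlite3

# In-memory database; all table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# Question 1: total spend per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Question 2: the same tables, queried for orders at or above a threshold.
large_orders = conn.execute("""
    SELECT o.id, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id
    WHERE o.amount >= 40 ORDER BY o.id
""").fetchall()

print(totals)
print(large_orders)
```

Neither query was anticipated when the tables were created, which is the sense in which a relational design makes few assumptions about how data will be extracted.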

What is Teradata?

Teradata is a Relational Database Management System (RDBMS) for the world’s largest commercial databases. Teradata can store data up to terabytes in size, which has made it a market leader in data warehousing applications.

Parallelism is what allows Teradata to manage terabytes of data. The product is designed to run the world’s largest commercial databases:
Ø Preferred solution for enterprise data warehousing
Ø Executes on UNIX MP-RAS, Windows NT or Windows 2000 operating systems
Ø Compliant with ANSI industry standards
Ø Runs on single or multiple nodes
Ø Acts as a “database server” to client applications throughout the enterprise
Ø Uses parallelism to manage “terabytes” of data
Ø Fault tolerance to automatically detect and recover from hardware failures

Teradata - A Brief History

1979  -  Teradata Corp. founded in Los Angeles, California
      -  Development begins on a massively parallel computer
1982  -  YNET technology is patented
1984  -  Teradata markets the first database computer, the DBC/1012
      -  First system purchased by Wells Fargo Bank of California
      -  Total revenue for the year: $3 million
1987  -  First public offering of stock
1989  -  Teradata and NCR partner on the next generation of the DBC
1991  -  NCR Corporation is acquired by AT&T
      -  Teradata revenues at $280 million
1992  -  Teradata is merged into NCR
1996  -  AT&T spins off NCR Corp. with the Teradata product
1997  -  Teradata database becomes the industry leader in data warehousing
2000  -  100+ terabyte system in production

The Teradata Charter

Ø Relational database
Ø Enormous capacity
   -  Billions of rows
   -  Terabytes of data
Ø High-performance parallel processing
Ø Single database server for multiple clients
Ø Network and mainframe connectivity
Ø Industry-standard access language: Structured Query Language (SQL)
Ø Manageable growth via modularity
Ø Fault tolerance at all levels of hardware and software
Ø Data integrity and reliability

Teradata’s Competitive Advantages

Ø Unlimited, proven scalability - in both amount of data and number of users; allows for an enterprise-wide model of the data.
Ø Unlimited parallelism - parallel access, sorts, and aggregations.
Ø Mature optimizer - handles complex queries (up to 64 joins per query) and ad hoc processing.
Ø Models the business - 3NF, robust view processing, and star schema capabilities.
Ø Provides a “single version of the truth”.
Ø Low TCO (Total Cost of Ownership) - ease of setup, maintenance, and administration; no re-orgs, lowest disk-to-data ratio, and a robust expansion utility (reconfig).
Ø High availability - no single point of failure.
Ø Parallel load utilities - robust, parallel, and scalable load utilities such as FastLoad, MultiLoad, and TPump.

Teradata Differentiator

•Parallelism
•Scalability
•Optimizer

Parallelism

•Traditional OLTP databases have conditional parallelism, whereas Teradata has unconditional parallelism. OLTP systems parallelize queries, but not join, aggregate, and sort processing. In Teradata, all of these functions are performed in parallel.
•Degree of parallelism = number of AMPs (Access Module Processors)
•The number of parallel units is independent of tables or queries
•Each parallel unit (AMP) owns and manages its own data.


Query Parallelism:

      Teradata Database query parallelism is enabled by the hash distribution of rows across all AMPs in the system. All relational operations are performed in parallel, simultaneously and unconditionally, across the AMPs, and each operation on an AMP is performed independently of the data on other AMPs in the system.
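
The hash distribution idea can be sketched in a few lines of Python. This is a toy model and not Teradata's actual hashing algorithm: the AMP count, the use of MD5 as a stand-in hash, and the names `amp_for`, `distribute`, and `parallel_sum` are all assumptions made for illustration. Each row lands on exactly one AMP, each AMP aggregates only the rows it owns, and the partial results are combined at the end.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # assumed AMP count, chosen only for this illustration


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a key value to an AMP number using a stand-in hash (MD5)."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(rows, key):
    """Assign each row to exactly one AMP based on the hash of its key column."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[key])].append(row)
    return amps


def parallel_sum(amps, column):
    """Each AMP sums only its own rows; the partial sums are then combined."""
    partials = {amp: sum(r[column] for r in rows) for amp, rows in amps.items()}
    return partials, sum(partials.values())


rows = [{"order_id": i, "amount": 10 * i} for i in range(1, 101)]
amps = distribute(rows, "order_id")
partials, total = parallel_sum(amps, "amount")

for amp in sorted(amps):
    print(f"AMP {amp}: {len(amps[amp])} rows, partial sum {partials[amp]}")
print("combined total:", total)
```

The per-AMP sums here run one after another in a single process. The point of the sketch is only that no AMP needs to look at another AMP's rows to do its share of the work.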

•Within-a-step Parallelism:

      The optimizer splits the SQL query into smaller base operations called steps and dispatches these steps to the AMPs. If a step contains complicated sub-operations, they are processed in parallel through pipelining.

•Multi-step Parallelism:

      Multiple steps of the same query can be executed at the same time if they do not depend on the results of a previous step.

•Multi-statement request Parallelism:

      When several different SQL statements are sent to the optimizer as a bundle, they can be executed in parallel. For example, if five queries are sent as a bundle and they share the same subquery, the subquery is executed only once and its result is used by all five queries. Eliminating common expressions in this way adds efficiency.
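
The five-queries example can be sketched as follows. This is a simplified illustration of common-expression elimination only, not of how Teradata implements it, and it does not run anything in parallel. The function `run_bundle`, the fake executor, and the SQL text are hypothetical.

```python
def run_bundle(statements, execute_subquery):
    """Run a bundle of statements, evaluating each distinct subquery once.

    `statements` is a list of (subquery_text, post_process) pairs, a
    hypothetical stand-in for parsed SQL statements.
    """
    cache = {}
    results = []
    for subquery, post_process in statements:
        if subquery not in cache:
            # First time this subquery is seen in the bundle: execute it.
            cache[subquery] = execute_subquery(subquery)
        results.append(post_process(cache[subquery]))
    return results


calls = []  # records how often the "database" is actually hit


def fake_execute(subquery):
    """Pretend executor that returns fixed rows and logs each call."""
    calls.append(subquery)
    return [3, 1, 4, 1, 5]


shared = "SELECT amount FROM orders WHERE region = 'EU'"
bundle = [(shared, sum), (shared, max), (shared, min), (shared, len), (shared, sorted)]
results = run_bundle(bundle, fake_execute)

print(results)
print("subquery executions:", len(calls))
```

Five statements are answered, but the shared subquery is evaluated a single time.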




Scalability

•Teradata is linearly scalable.
•Shared Nothing architecture enables hardware scalability.
•Teradata’s Parallel Everything enables software scalability by eliminating single points of control at all levels.
•Add more data, more users, and more subjects to your data warehouse with predictable performance.
•E.g., double the number of nodes to:
    process twice the workload in the same amount of time,
    or process the same workload in half the amount of time
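
The doubling example is simple arithmetic under an idealized linear-scaling model. The sketch below assumes every node processes a fixed number of work units per hour; the function name and all the numbers are invented for illustration and are not measurements.

```python
def elapsed_hours(workload_units, nodes, units_per_node_per_hour):
    """Elapsed time under an idealized model where throughput grows linearly with nodes."""
    return workload_units / (nodes * units_per_node_per_hour)


baseline = elapsed_hours(1000, nodes=4, units_per_node_per_hour=25)
same_work_double_nodes = elapsed_hours(1000, nodes=8, units_per_node_per_hour=25)
double_work_double_nodes = elapsed_hours(2000, nodes=8, units_per_node_per_hour=25)

print(baseline)                  # 10.0
print(same_work_double_nodes)    # 5.0
print(double_work_double_nodes)  # 10.0
```

With twice the nodes, the same workload finishes in half the time, and twice the workload finishes in the original time.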

Teradata Optimizer

•Teradata has a parallel-aware, cost-based optimizer with full look-ahead capability to maximize throughput and minimize resource contention.
 The optimizer determines the lowest cost (time) to complete each intermediate step within the query plan in order to choose the fastest overall time for the query.

What does it Optimize?

•Access Path: method of accessing each table
    - Table scan, index use
•Join Method: how pairs of tables are joined
    - Merge join, product join, hash join, nested join, and row ID join
•Join Geography: how rows are relocated prior to the join
    - Redistribute rows, duplicate rows, AMP-local, etc.
•Join Order: sequence of table joins
    - 5-table look-ahead, pick the cheapest
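
To give a feel for what "pick the cheapest" join order means, here is a toy cost-based sketch. It is not Teradata's optimizer. It enumerates every order exhaustively (practical only for a handful of tables) rather than using a look-ahead, and the cost model, the single selectivity factor, and the table names and row counts are all assumptions made for illustration.

```python
from itertools import permutations


def plan_cost(order, row_counts, selectivity):
    """Estimated cost of joining tables left to right in the given order.

    Toy model: each join costs (rows so far) * (rows in the next table),
    and its output size is that product scaled by one selectivity factor.
    """
    current_rows = row_counts[order[0]]
    cost = 0.0
    for table in order[1:]:
        work = current_rows * row_counts[table]
        cost += work
        current_rows = work * selectivity
    return cost


def cheapest_join_order(row_counts, selectivity=0.001):
    """Enumerate every join order and return the one with the lowest estimated cost."""
    return min(
        permutations(row_counts),
        key=lambda order: plan_cost(order, row_counts, selectivity),
    )


row_counts = {"region": 10, "store": 100, "sales": 100_000}  # hypothetical tables
best = cheapest_join_order(row_counts)
print(best, plan_cost(best, row_counts, 0.001))
```

In this model the cheapest plans join the two small tables first and bring in the large table last, because that keeps the intermediate result small.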
