Learn Teradata: Teradata History

Teradata History

What is a Data Warehouse?

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources.

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

What is an RDBMS?

An RDBMS is a type of database management system (DBMS) that stores data in the form of related tables. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways.

An important feature of relational systems is that a single database can be spread across several tables.

What is Teradata?

Teradata is a Relational Database Management System (RDBMS) for the world’s largest commercial databases. Teradata can store terabytes of data, which has made it a market leader in data warehousing applications.

Through the concept of parallelism, Teradata obtains the ability to manage terabytes of data.

Teradata is a Relational Database Management System (RDBMS) designed to run the world’s largest commercial databases. Key characteristics:

• Preferred solution for enterprise data warehousing

• Executes on UNIX MP-RAS, Windows NT, or Windows 2000 operating systems

• Compliant with ANSI industry standards

• Runs on single or multiple nodes

• Acts as a “database server” to client applications throughout the enterprise

• Uses parallelism to manage “terabytes” of data

• Provides fault tolerance that automatically detects and recovers from hardware failures

Teradata - A Brief History

1979  -  Teradata Corp founded in Los Angeles, California

  -  Development begins on a massively parallel computer

1982  -  YNET technology is patented

1984  -  Teradata markets the first database computer, the DBC/1012

  -  First system purchased by Wells Fargo Bank of California

  -  Total revenue for year - $3 million

1987  -  First public offering of stock

1989  - Teradata and NCR partner on next generation of DBC

1991  -  NCR Corporation is acquired by AT&T

  - Teradata revenues at $280 million

1992  -  Teradata is merged into NCR

1996  -  AT&T spins off NCR Corp with Teradata product

1997  -  Teradata database becomes industry leader in data warehousing

2000  -  100+ Terabyte system in production

The Teradata Charter

• Relational database

• Enormous capacity

- Billions of rows

- Terabytes of data

• High-performance parallel processing

• Single database server for multiple clients

• Network and mainframe connectivity

• Uses an industry-standard access language (Structured Query Language, SQL)

• Manageable growth via modularity

• Fault tolerance at all levels of hardware and software

• Data integrity and reliability

Teradata’s Competitive Advantages

• Unlimited, Proven Scalability - in both the amount of data and the number of users; allows for an enterprise-wide model of the data.

• Unlimited Parallelism - parallel access, sorts, and aggregations.

• Mature Optimizer - handles complex queries, up to 64 joins per query, and ad-hoc processing.

• Models the Business - 3NF, robust view processing, and star schema capabilities.

• Provides a “single version of the truth”.

• Low TCO (Total Cost of Ownership) - ease of setup, maintenance, and administration; no re-orgs, the lowest disk-to-data ratio, and a robust expansion utility (reconfig).

• High Availability - no single point of failure.

• Parallel Load Utilities - robust, parallel, and scalable load utilities such as FastLoad, MultiLoad, and TPump.

Teradata Differentiators

•Parallelism

•Scalability

•Optimizer

Parallelism

• Traditional OLTP systems have conditional parallelism, whereas Teradata has unconditional parallelism. OLTP systems parallelize queries but not join, aggregation, and sort processing; in Teradata, all of these functions are done in parallel.

• Degree of parallelism = number of AMPs (Access Module Processors)

• The number of parallel units is independent of the tables or queries involved

• Each parallel unit (AMP) owns and manages its own data.
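
To make "each AMP owns and manages its own data" concrete, here is a minimal Python sketch of a parallel aggregation. It assumes the rows have already been split into a hypothetical list of AMP partitions; each partition computes a partial sum per group without looking at any other partition, and a final step merges the partial results. The function names are illustrative only, and Python threads merely stand in for independent AMPs; this is not how Teradata itself is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def amp_local_aggregate(rows):
    """Sum amounts per group using only the rows this (hypothetical) AMP owns."""
    partial = Counter()
    for group, amount in rows:
        partial[group] += amount
    return partial


def parallel_sum_by_group(amp_partitions):
    """Aggregate every AMP's rows concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(amp_partitions))) as pool:
        partials = list(pool.map(amp_local_aggregate, amp_partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(total)


if __name__ == "__main__":
    # Three hypothetical AMPs, so the degree of parallelism here is 3.
    amps = [
        [("east", 10), ("west", 5)],
        [("east", 7)],
        [("west", 1), ("north", 4)],
    ]
    print(parallel_sum_by_group(amps))  # {'east': 17, 'west': 6, 'north': 4}
```

Adding a partition (an AMP) adds one more independent unit of work; no partition ever waits on another until the final merge.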

Query Parallelism:

Teradata Database query parallelism is enabled by the hash distribution of rows across all AMPs in the system. All relational operations are performed in parallel, simultaneously, and unconditionally across the AMPs, and each operation on an AMP is performed independently of the data on other AMPs in the system.
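
The following Python sketch illustrates hash distribution under simplifying assumptions: it uses `zlib.crc32` as a stand-in for Teradata's own row-hash function and maps the hash straight to an AMP with a modulo, whereas the real system goes through hash buckets and a hash map. What it does show is that the owning AMP depends only on the primary index value, so equal values always land on the same AMP while different values spread across all AMPs. `NUM_AMPS`, `amp_for`, and `distribute` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Pick the owning AMP from a hash of the primary index value (illustrative)."""
    row_hash = zlib.crc32(str(primary_index_value).encode("utf-8"))
    return row_hash % num_amps


def distribute(rows, key, num_amps=NUM_AMPS):
    """Group rows by owning AMP; each AMP then works only on its own list."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[key], num_amps)].append(row)
    return amps


if __name__ == "__main__":
    customers = [{"cust_id": i, "name": f"c{i}"} for i in range(1000)]
    layout = distribute(customers, "cust_id")
    for amp in sorted(layout):
        print(f"AMP {amp}: {len(layout[amp])} rows")
```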

•Within-a-step Parallelism:

The optimizer splits the SQL query into smaller base operations called steps and dispatches these steps to the AMPs. If a step contains complex sub-operations, they are processed in parallel through pipelining.

•Multi-step Parallelism:

Multiple steps of the same query can be executed at the same time if they do not depend on the results of previous steps.

•Multi-statement request Parallelism:

When several different SQL statements are sent to the optimizer as a bundle, they can be executed in parallel. For example, if five queries are sent as a bundle and they share the same subquery, the subquery is executed only once and its result is used for all five queries. Eliminating this common expression adds efficiency.
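
As a rough illustration of the multi-statement case, this Python sketch evaluates a shared "subquery" once and then runs several independent "statements" concurrently against that single result. The table, the subquery, and the statements are hypothetical Python functions over an in-memory list rather than SQL, and the explicit compute-once step only stands in for the optimizer's common-expression elimination.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory table: (region, amount)
SALES = [("east", 120), ("west", 80), ("east", 40), ("north", 200)]


def common_subquery(table):
    """Shared subquery: rows with amount > 50. Evaluated once per request."""
    return [row for row in table if row[1] > 50]


# Three independent "statements" that all consume the same subquery result.
def count_rows(rows):
    return len(rows)


def total_amount(rows):
    return sum(amount for _, amount in rows)


def regions(rows):
    return sorted({region for region, _ in rows})


def run_request(table, statements):
    """Evaluate the common subquery once, then run the statements in parallel."""
    shared = common_subquery(table)  # computed a single time for the whole bundle
    with ThreadPoolExecutor(max_workers=max(1, len(statements))) as pool:
        futures = [pool.submit(stmt, shared) for stmt in statements]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_request(SALES, [count_rows, total_amount, regions]))
```

Because none of the statements depends on another's output, they can run side by side; only the shared subquery has to finish first.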

Scalability

•Teradata is linearly scalable.

•Shared-nothing architecture enables hardware scalability

•Teradata’s Parallel Everything enables software scalability by eliminating single points of control at all levels

•Add more data, more users, and more subjects to your data warehouse with predictable performance

•e.g. after doubling the number of nodes, you can:

    process twice the workload in the same amount of time,

    or process the same workload in half the time
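
The "double the nodes" example is simple arithmetic under an idealized linear-scaling assumption. The Python sketch below models elapsed time as workload divided by (nodes × per-node throughput); the throughput figure and the function names are hypothetical, and real systems only approximate this ideal.

```python
def elapsed_hours(workload_units, nodes, units_per_node_hour):
    """Idealized linear scaling: every node contributes the same throughput."""
    return workload_units / (nodes * units_per_node_hour)


def workload_in_time(hours, nodes, units_per_node_hour):
    """How much work fits into a fixed time window under the same assumption."""
    return hours * nodes * units_per_node_hour


if __name__ == "__main__":
    rate = 100.0  # hypothetical work units per node per hour
    base_time = elapsed_hours(1600, nodes=4, units_per_node_hour=rate)     # 4.0 h
    doubled_time = elapsed_hours(1600, nodes=8, units_per_node_hour=rate)  # 2.0 h: same work, half the time
    base_work = workload_in_time(base_time, nodes=4, units_per_node_hour=rate)     # 1600.0
    doubled_work = workload_in_time(base_time, nodes=8, units_per_node_hour=rate)  # 3200.0: twice the work, same time
    print(base_time, doubled_time, base_work, doubled_work)
```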

Teradata Optimizer

•Teradata has a parallel-aware, cost-based optimizer with full look-ahead capability to maximize throughput and minimize resource contention:

• The optimizer determines the lowest cost (time) to complete each intermediate step within the query plan in order to choose the fastest overall time for a query

What does it Optimize?

• Access Path: Method of accessing each table

- Table scan, index use

• Join Method: How pairs of tables are joined

- Merge join, product join, hash join, nested join, and row ID join

• Join Geography: How rows are relocated prior to the join

- Redistribute rows, duplicate rows, AMP-local, etc.

• Join Order: Sequence of table joins

- 5-table look-ahead; pick the cheapest
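
To show what "pick the cheapest" means for join geography, here is a deliberately simplified Python sketch. Its cost model (the number of rows moved between AMPs) is a hypothetical stand-in; Teradata's optimizer uses far richer, statistics-driven costing. It compares the geographies listed above: redistributing rows, duplicating the smaller table to every AMP, and an AMP-local join when both tables are already distributed on the join column. `TableInfo`, `plan_costs`, and `cheapest_plan` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    name: str
    rows: int
    hashed_on_join_column: bool  # already distributed by the join column?


def plan_costs(left, right, num_amps):
    """Toy cost model: estimated number of rows moved between AMPs per plan."""
    costs = {}
    if left.hashed_on_join_column and right.hashed_on_join_column:
        costs["AMP local"] = 0  # matching rows already sit on the same AMP
    else:
        # Redistribute: rehash only the tables not already hashed on the join column.
        costs["redistribute"] = sum(
            t.rows for t in (left, right) if not t.hashed_on_join_column
        )
    # Duplicate: copy the smaller table to every AMP; the larger table stays put.
    smaller = min(left, right, key=lambda t: t.rows)
    costs["duplicate " + smaller.name] = smaller.rows * num_amps
    return costs


def cheapest_plan(left, right, num_amps):
    """Return the name of the lowest-cost plan together with all estimates."""
    costs = plan_costs(left, right, num_amps)
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    orders = TableInfo("orders", rows=1_000_000, hashed_on_join_column=False)
    region = TableInfo("region", rows=50, hashed_on_join_column=False)
    print(cheapest_plan(orders, region, num_amps=10))
```

With a tiny dimension table, copying it to all ten AMPs moves far fewer rows than rehashing a million-row fact table; with two large tables the balance tips toward redistribution.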


Tuesday, 21 June 2016
