[Redshift Week] Principles

An overview of Amazon Redshift principles: MPP / shared-nothing architecture, clusters, and distributed computing.

Since last week, we have been an official Technology Partner in the Amazon Web Services Partner Network. It is a good occasion to explain how we work with Amazon Redshift from a technical point of view.

Julien Theulier, Data Plumber in Chief at Squid, has done brilliant work explaining Redshift. As it is a long and technical blog post, we are going to publish one piece of the analysis every day.

Let’s start with an overview of Amazon Redshift principles:

 

MPP / Shared Nothing Architecture:

[Figure: MPP / shared-nothing architecture diagram]

Key concept I: Cluster

Leader Node:

The leader node manages communication between clients and the compute nodes, builds the execution plan from the SQL query, distributes data and execution code to the compute nodes, and aggregates the results.

Compute Node:

Each compute node has its own dedicated CPU, memory, and attached disk storage. It executes code on its share of the distributed data and sends the results back to the leader node.
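The division of labour between the leader and the compute nodes can be sketched in a few lines of Python. This is a toy model, not Redshift code: the function names `compute_node_partial` and `leader_average` are made up for illustration, and the data is split across nodes by hand. It shows why an average can be computed without shipping raw rows to the leader: each node returns only a partial (sum, count) pair, and the leader merges them.

```python
# Toy model of leader/compute-node cooperation (illustrative only, not Redshift code).

def compute_node_partial(rows):
    """Runs on one compute node: reduce the local rows to a (sum, count) pair."""
    return sum(rows), len(rows)

def leader_average(node_data):
    """Runs on the leader: hand the work to every node, then merge the partial results."""
    partials = [compute_node_partial(rows) for rows in node_data]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Hypothetical data, already distributed across three compute nodes.
node_data = [[10, 20], [30], [40, 50, 60]]
print(leader_average(node_data))  # 35.0
```

In this sketch the nodes are called one after another; in a real cluster they would run at the same time, each on its own hardware.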

Key concept II: Distributed Computing

Node Slices:

A compute node is partitioned into slices, one slice for each core of the node’s multi-core processor. The leader node manages the distribution of data to the slices, which then work in parallel to complete the operation.
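As a rough illustration of slices working in parallel, here is a small Python sketch. It is an assumption-laden model rather than anything Redshift exposes: the helper names (`total_slices`, `count_matching`, `parallel_count`) are hypothetical, and Python threads merely stand in for slices, so the sketch shows the structure of the work, not its performance.

```python
# Toy model of slices working in parallel (illustrative only, not Redshift code).
from concurrent.futures import ThreadPoolExecutor

def total_slices(nodes, cores_per_node):
    """One slice per core, so the cluster has nodes * cores_per_node slices."""
    return nodes * cores_per_node

def count_matching(slice_rows, predicate):
    """Work done by a single slice: scan only its own rows."""
    return sum(1 for row in slice_rows if predicate(row))

def parallel_count(slices, predicate):
    """Run the same scan on every slice at once, then add up the partial counts."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partial_counts = pool.map(lambda rows: count_matching(rows, predicate), slices)
        return sum(partial_counts)

# Hypothetical cluster: 2 nodes with 2 cores each, so 4 slices.
slices = [[1, 2, 3], [4, 5], [6, 7, 8], [9]]
assert len(slices) == total_slices(nodes=2, cores_per_node=2)
print(parallel_count(slices, lambda x: x % 2 == 0))  # 4
```

Each slice only ever touches its own rows, which is the point of a shared-nothing design: no slice waits on another's data.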

Data Distribution:

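Data is distributed to the slices according to the table's distribution policy: by column (Redshift's KEY style, where rows with the same value in the distribution column land on the same slice), random (the EVEN style, where rows are spread round-robin across the slices), or all (the ALL style, where every node holds a full copy of the table).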
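The three policies (column, random, all) can be mimicked with a short Python sketch. Everything here is an assumption made for illustration: the function names are invented, `zlib.crc32` is only a stand-in for whatever hash Redshift really uses, and the random policy is modelled as a simple round-robin.

```python
# Toy model of the three distribution policies (illustrative only, not Redshift's algorithm).
import zlib

def distribute_by_column(rows, column, n_slices):
    """'column' policy: rows with the same value in `column` land on the same slice."""
    slices = [[] for _ in range(n_slices)]
    for row in rows:
        # crc32 is a stand-in hash; the real hash function is not modelled here.
        key = str(row[column]).encode("utf-8")
        slices[zlib.crc32(key) % n_slices].append(row)
    return slices

def distribute_randomly(rows, n_slices):
    """'random' policy, modelled as round-robin so each slice gets an even share."""
    slices = [[] for _ in range(n_slices)]
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return slices

def distribute_all(rows, n_nodes):
    """'all' policy: every node keeps a full copy of the table."""
    return [list(rows) for _ in range(n_nodes)]

# Hypothetical table with four rows.
rows = [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 20},
    {"user_id": 1, "amount": 30},
    {"user_id": 3, "amount": 40},
]
by_column = distribute_by_column(rows, "user_id", n_slices=2)
even = distribute_randomly(rows, n_slices=2)
copies = distribute_all(rows, n_nodes=2)
print([len(s) for s in even])    # [2, 2]
print([len(c) for c in copies])  # [4, 4]
```

The trade-off the sketch hints at: distributing by column keeps matching rows together, which helps joins on that column but can leave slices unevenly loaded; the random policy balances the load; the all policy spends storage to make a small table available locally everywhere.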

 

Next blog post tomorrow – stay tuned!