Graphcore IPU: Transforming Industries with Next-Gen AI Power

September Edition 2020

CIO Bulletin

The Graphcore IPU is going to be transformative across all industries

The Intelligence Processing Unit (IPU) is completely different from today's CPU and GPU processors. It is a highly flexible, easy to use, parallel processor that has been designed from the ground up to deliver state of the art performance on today's machine intelligence models for both training and inference.

Graphcore is optimistic about a future in which people live healthier, more informed, more creative lives. It sees a world where technology enhances human potential and carries society into a new era of intelligence and progress that everyone can benefit from.

“We let innovators create the next breakthroughs in machine intelligence.”

The firm believes its Intelligence Processing Unit (IPU) technology will become the worldwide standard for machine intelligence compute. The Graphcore IPU is going to be transformative across all industries, whether you are a medical researcher, a roboticist or someone building autonomous cars.

Graphcore has created a completely new processor, the IPU, specifically designed for AI compute. The IPU’s unique architecture lets AI researchers undertake entirely new types of work, not possible using current technologies, to drive the next advances in machine intelligence.

The IPU-Machine: IPU-M2000

The IPU-M2000 is Graphcore’s revolutionary next-generation system solution built with the Colossus MK2 IPU. It packs 1 PetaFlop of AI compute and up to 450GB Exchange-Memory™ in a slim 1U blade for the most demanding machine intelligence workloads.

The IPU-M2000 has a flexible, modular design, so you can start with one and scale to thousands. Connect a single system directly to an existing CPU server, add up to eight connected IPU-M2000s, or grow to supercomputing scale with racks of 16 tightly interconnected IPU-M2000s in IPU-POD64 systems, thanks to the high-bandwidth, near-zero latency IPU-Fabric™ interconnect architecture built into the box.

A core new building block for AI infrastructure, the IPU-M2000 is powered by four Colossus MK2 GC200 processors, Graphcore’s second-generation 7nm IPU. Its 1 PetaFlop of AI compute and up to 450GB of Exchange-Memory™ are paired with 2.8Tbps of IPU-Fabric™ bandwidth for very low latency communication, all in a slim 1U blade. It works as a standalone system, eight can be stacked together, or racks of 16 tightly interconnected IPU-M2000s in IPU-POD64 systems can grow to supercomputing scale.

The Graphcore IPU-POD64

IPU-POD64 is Graphcore's solution for massive, disaggregated scale-out, enabling high-performance machine intelligence compute at supercomputing scale. The IPU-POD64 builds upon the IPU-M2000 and offers seamless scale-out up to 64,000 IPUs, working either as one integral whole or as independent subdivided partitions that handle multiple workloads and different users.

The IPU-POD64 has 16 IPU-M2000s in a standard rack. IPU-PODs communicate with near-zero latency using Graphcore’s unique IPU-Fabric™ interconnect architecture. IPU-Fabric has been specifically designed to eliminate communication bottlenecks and allow thousands of IPUs to operate on machine intelligence workloads as a single, high-performance and ultra-fast cohesive unit.
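The scale-out figures quoted in this article follow from simple multiplication. The sketch below is a back-of-the-envelope illustration, not a Graphcore tool: the constant and function names are made up for this example, the inputs are the numbers stated in the article, and the aggregate compute figure is a theoretical peak obtained by multiplying, not a measured benchmark.

```python
# Figures taken from the article; all names here are illustrative only.
IPUS_PER_M2000 = 4         # 4 x Colossus MK2 GC200 per IPU-M2000
M2000_PER_POD64 = 16       # 16 IPU-M2000s in one IPU-POD64 rack
PETAFLOPS_PER_M2000 = 1    # 1 PetaFlop of AI compute per IPU-M2000
MAX_IPUS = 64_000          # stated IPU-Fabric scale-out limit


def pod64_summary() -> dict:
    """Derive per-rack and maximum scale-out totals by multiplication."""
    ipus_per_pod = IPUS_PER_M2000 * M2000_PER_POD64
    pods_at_max = MAX_IPUS // ipus_per_pod
    return {
        "ipus_per_pod64": ipus_per_pod,
        "peak_petaflops_per_pod64": PETAFLOPS_PER_M2000 * M2000_PER_POD64,
        "pod64_racks_at_max_scale": pods_at_max,
        "m2000s_at_max_scale": pods_at_max * M2000_PER_POD64,
    }


summary = pod64_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these inputs, one IPU-POD64 holds 64 IPUs, and the 64,000-IPU ceiling corresponds to 1,000 such racks.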

Radical new breakthroughs

Compute

MK2 IPU systems deliver performance and flexibility from device to scale-out, with 1 PetaFlop of AI compute per IPU-M2000 and, according to Graphcore, more FP32 compute than any other processor.

Communications

IPU-Fabric™ is the firm’s innovative, ultra-fast and jitter-free communications technology. It offers 2.8Tbps communication in all directions from any IPU to any IPU and can scale up to 64,000 IPUs.

Data

The IPU-M2000 offers up to 450GB of Exchange-Memory™: 3.6GB of In-Processor Memory™ plus up to 448GB of Streaming Memory™ for larger models. This is crucial for modern AI workloads, where how you access memory is as important as how you perform the compute once you've fetched the data.
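To make the two memory tiers concrete, here is a rough sizing sketch. It is an assumption-laden illustration rather than anything from Graphcore: the function names and the example model sizes are hypothetical, only the weights are counted (activations, optimizer state and program code are ignored), and 2 bytes per parameter assumes 16-bit weights.

```python
# Capacities from the article, in decimal megabytes so the arithmetic stays exact.
IN_PROCESSOR_MB = 3_600      # 3.6GB In-Processor Memory per IPU-M2000
STREAMING_MAX_MB = 448_000   # up to 448GB Streaming Memory per IPU-M2000
IPUS_PER_M2000 = 4           # 4 x GC200 per IPU-M2000


def weights_size_mb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Size of the model weights alone, in decimal megabytes."""
    return num_parameters * bytes_per_parameter / 1_000_000


def placement(num_parameters: int, bytes_per_parameter: int = 2) -> str:
    """Say which memory tier the weights alone would need on one IPU-M2000."""
    size = weights_size_mb(num_parameters, bytes_per_parameter)
    if size <= IN_PROCESSOR_MB:
        return "fits in In-Processor Memory"
    if size <= IN_PROCESSOR_MB + STREAMING_MAX_MB:
        return "needs Streaming Memory"
    return "exceeds one IPU-M2000"


total_mb = IN_PROCESSOR_MB + STREAMING_MAX_MB            # 451,600 MB, quoted as 450GB
in_processor_per_ipu_mb = IN_PROCESSOR_MB // IPUS_PER_M2000

# Hypothetical model sizes, chosen only to hit each branch.
for params in (1_000_000_000, 100_000_000_000, 300_000_000_000):
    print(f"{params:>15,} parameters: {placement(params)}")
```

The split matters because only the first tier sits inside the processors themselves: dividing the 3.6GB across the four IPUs gives 900MB each, while the much larger Streaming Memory is what makes room for bigger models.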

Scalability

IPU-POD64 is Graphcore’s solution for massive disaggregated machine intelligence scale-out. IPU-POD64 leverages the ultra-fast IPU-Fabric for outstanding performance at scale, and is designed for seamless deployment and integration into existing data center set-ups.

Sparse compute. Simple to use.

Sparsity

In machine intelligence, the search for better model efficiency runs parallel to the shift towards ever larger model sizes. Model sparsity is integral to this emerging trend. Graphcore’s IPU products are designed with a fine-grained architecture from device to massive scale-out using tens of thousands of IPUs. This fine-grained independent processing is fundamental to Graphcore’s design philosophy and is ideally suited to leveraging model sparsity and model collectives such as all-reduce and all-gather operations.
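For readers unfamiliar with the collectives named above, the following minimal sketch shows what all-reduce and all-gather compute. It is plain Python that simulates the workers as lists in a single process. It is not Poplar code and says nothing about how IPU-Fabric implements these operations; the function names and the sample values are invented for illustration.

```python
from typing import List


def all_reduce_sum(shards: List[List[float]]) -> List[List[float]]:
    """Every worker ends up with the element-wise sum of all workers' vectors."""
    total = [sum(values) for values in zip(*shards)]
    return [list(total) for _ in shards]


def all_gather(shards: List[List[float]]) -> List[List[float]]:
    """Every worker ends up with the concatenation of all workers' vectors."""
    gathered = [x for shard in shards for x in shard]
    return [list(gathered) for _ in shards]


# Three simulated workers, each holding a mostly-zero (sparse) gradient vector.
workers = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
]

reduced = all_reduce_sum(workers)
collected = all_gather(workers)

print("after all-reduce, worker 0 holds:", reduced[0])
print("after all-gather, worker 0 holds:", collected[0])
```

In a real system each worker would be a separate processor and the sums would travel over the interconnect, which is why communication bandwidth and latency feature so heavily in scale-out designs.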

Co-Designed with Poplar® SDK

With IPU-POD64 systems you can run vast workloads across up to 64,000 IPUs. With Poplar, computing on this scale is as simple as using a single machine. Poplar takes care of all the scaling and optimization – allowing you to focus on the model and the results.

Graphcore has also made it possible to share your AI compute dynamically between users: its Virtual-IPU software lets multiple users run different workloads at the same time.

It supports industry-standard ecosystem tools for infrastructure management, including OpenBMC and Redfish, Docker containers, and orchestration with Slurm and Kubernetes. And it's adding support for more platforms all the time.

Shake hands with Nigel Toon

Nigel is Co-Founder & CEO of Graphcore. Before founding Graphcore, he was CEO of two VC-backed silicon companies: Picochip, which was sold to Mindspeed in 2012, and most recently XMOS, in which Graphcore was incubated for two years before being established as a separate entity in 2016.

Before that he was Co-Founder of Icera, a 3G cellular modem chip company, where he led Sales and Marketing and was on the Board of Directors. Icera was sold to NVIDIA in 2011 for $435M.

Prior to Icera, he was Vice President and General Manager at Altera Corporation, where he spent 13 years and was responsible for establishing and building the European business unit that grew to over $400M in annual revenues.