The Amazon EC2 team has been providing our customers with GPU-equipped instances for nearly a decade. The first-generation Cluster GPU instances were launched in late 2010, followed by the G2 (2013), P2 (2016), P3 (2017), G3 (2017), P3dn (2018), and G4 (2019) instances. Each successive generation incorporates increasingly capable GPUs, along with enough CPU power, memory, and network bandwidth to allow the GPUs to be used to their utmost.
New EC2 P4 Instances
Today I would like to tell you about the new GPU-equipped P4 instances. These instances are powered by the latest Intel® Cascade Lake processors and feature eight of the latest NVIDIA A100 Tensor Core GPUs, each connected to all of the others by NVLink and with support for NVIDIA GPUDirect. With 2.5 PetaFLOPS of floating point performance and 320 GB of high-bandwidth GPU memory, the instances can deliver up to 2.5x the deep learning performance of P3 instances, with up to 60% lower cost to train.
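If you'd like to take a quick inventory of those GPUs from inside a running instance, here's a small Python sketch (assuming PyTorch is installed, for example via the Deep Learning Containers that I mention below):

```python
# Quick GPU inventory on a p4d.24xlarge -- a sketch, assuming PyTorch is available.
import torch

count = torch.cuda.device_count()  # expect 8 NVIDIA A100 GPUs
total_gb = 0.0
for i in range(count):
    props = torch.cuda.get_device_properties(i)
    total_gb += props.total_memory / 1024**3
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")

print(f"{count} GPUs, ~{total_gb:.0f} GB of GPU memory in total")  # roughly 320 GB
```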
P4 instances include 1.1 TB of system memory and 8 TB of NVMe-based SSD storage that can deliver up to 16 gigabytes of read throughput per second.
Network-wise, you have access to four 100 Gbps network connections to a dedicated, petabit-scale, non-blocking network fabric (accessible via EFA) that was designed specifically for the P4 instances, along with 19 Gbps of EBS bandwidth that can support up to 80K IOPS.
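You don't have to take my word for these numbers; here's a minimal boto3 sketch that asks EC2's DescribeInstanceTypes API for the published p4d.24xlarge specs (the region is illustrative, and the usual AWS credentials are assumed):

```python
# Look up the published specs for p4d.24xlarge -- a sketch using boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
info = ec2.describe_instance_types(InstanceTypes=["p4d.24xlarge"])["InstanceTypes"][0]

gpus = info["GpuInfo"]["Gpus"][0]
print("GPUs:         ", gpus["Count"], gpus["Manufacturer"], gpus["Name"])
print("GPU memory:   ", info["GpuInfo"]["TotalGpuMemoryInMiB"] // 1024, "GiB")
print("System memory:", info["MemoryInfo"]["SizeInMiB"] // 1024, "GiB")
print("Local storage:", info["InstanceStorageInfo"]["TotalSizeInGB"], "GB NVMe SSD")
print("Network:      ", info["NetworkInfo"]["NetworkPerformance"],
      "| EFA supported:", info["NetworkInfo"]["EfaSupported"])
```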
EC2 UltraClusters
The NVIDIA A100 GPUs, support for NVIDIA GPUDirect, 400 Gbps networking, the petabit-scale network fabric, and access to AWS services such as S3, Amazon FSx for Lustre, and AWS ParallelCluster give you all that you need to create on-demand EC2 UltraClusters with 4,000 or more GPUs.
These clusters can take on your toughest supercomputer-scale machine learning and HPC workloads: natural language processing, object detection & classification, scene understanding, seismic analysis, weather forecasting, financial modeling, and so forth.
Now Available
P4 instances are available in one size (p4d.24xlarge) and you can launch them in the US East (N. Virginia) and US West (Oregon) Regions today. Your AMI will need to have the NVIDIA A100 drivers and the most recent ENA driver (the Deep Learning Containers have already been updated).
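If you prefer to launch from code, here's a minimal boto3 sketch; the AMI ID, key pair, and subnet below are placeholders that you'd replace with your own values, and the AMI must include the drivers that I just mentioned:

```python
# Launch a single p4d.24xlarge in US East (N. Virginia) -- a sketch with placeholder values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # or "us-west-2"

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder: an AMI with the A100 and ENA drivers
    InstanceType="p4d.24xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
)
print("Launched:", response["Instances"][0]["InstanceId"])
```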
If you are using multiple P4s to run distributed training jobs, you can use EFA and an MPI-compatible application to make the best use of the 400 Gbps of networking and the petabit-scale network fabric.
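To make that concrete, here's a toy mpi4py sketch of the allreduce pattern that distributed training jobs perform across instances; in practice you would typically rely on your framework's own MPI or NCCL integration, with EFA carrying the inter-node traffic:

```python
# Toy allreduce, the core communication pattern of data-parallel training.
# A sketch assuming mpi4py and NumPy; launch with mpirun across your P4 instances.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Pretend these are the gradients computed locally on this worker.
local_grads = np.full(4, float(rank), dtype=np.float64)

# Sum the gradients across all workers, then average them.
summed = np.empty_like(local_grads)
comm.Allreduce(local_grads, summed, op=MPI.SUM)
averaged = summed / size

if rank == 0:
    print("averaged gradients:", averaged)
```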
You can purchase P4 instances in On-Demand, Savings Plan, Reserved Instance, and Spot form. Support for use of P4 instances in managed AWS services such as Amazon SageMaker and Amazon Elastic Kubernetes Service is in the works and will be available later this year.
Take it from Dave
My colleague Dave Brown has even more to say about the P4 instances.
Learn More
To learn more about the performance of P4d instances in comparison to the previous generation (P3) instances, read Amazon EC2 P4d Instances in UltraClusters. For pricing and additional technical details, read about P4 Instances.
— Jeff;