Superclusters too big, but single servers too small? Oracle offers AI Goldilocks zone

Publish date: Thu, 01 Aug 2024, 09:37 AM

Oracle has created a trio of for-rent AI infrastructure options aimed at medium-scale AI training and inference workloads - and teased the arrival of Nvidia's GH200 superchip in its cloud

On Wednesday, Big Red's product marketing director Akshai Parthasarathy and principal product manager Sagar Zanwar detailed three new "shapes" - Oracle-speak for cloud instance types - for mid-range AI workloads.

One bears the snappy moniker of BM.GPU.L40S.4. The BM stands for bare metal, and in this shape the boxes come equipped with four Nvidia L40S GPUs - each with 48GB of GDDR6 memory - plus 7.38TB of local NVMe capacity, 112 cores of 4th Generation Intel Xeon CPU, and a terabyte of system memory.

The BM.GPU.L40S.4 shape is "orderable now."
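For the curious, provisioning one of these shapes looks much like any other OCI compute launch. Here's a minimal, illustrative sketch using Oracle's OCI Python SDK - the shape name comes from the announcement, but every OCID, the availability domain, and the display name below are placeholders you would swap for your own:

```python
# Sketch: launching a BM.GPU.L40S.4 instance via the OCI Python SDK.
# All OCIDs and the availability domain are placeholders; shape
# availability varies by region and tenancy limits.
import oci

config = oci.config.from_file()  # reads ~/.oci/config
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    availability_domain="Uocm:PHX-AD-1",           # placeholder AD
    compartment_id="ocid1.compartment.oc1..xxxx",  # placeholder OCID
    display_name="l40s-training-node",
    shape="BM.GPU.L40S.4",                         # the new bare metal shape
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1..xxxx",          # a GPU-enabled image
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..xxxx",
    ),
)

instance = compute.launch_instance(details).data
print(instance.id, instance.lifecycle_state)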

If you prefer virtual machines, Oracle has defined two more shapes, but isn't ready to rent them just yet - instead billing them as coming "soon."

The VM.GPU.A100.1 and VM.GPU.H100.1 shapes pack a single Nvidia A100 or H100 accelerator, respectively. The H100 shape will include up to 80GB of HBM3 memory, 2x 3.84TB of NVMe drive capacity, 13 cores from 4th Gen Intel Xeon processors, and 246GB of system memory.

The A100 offering will pack either 40GB or 80GB of HBM2e memory.

Parthasarathy and Zanwar pitched the offerings as suitable for users who feel Oracle's AI superclusters are too big, but single-node offerings packing one to four GPUs are too small.

FLOP for FLOP, the L40S looks to outperform Nvidia's older, Ampere-based A100, which is also offered in a higher-end Oracle cluster. The L40S boasts 183 TFLOPS in TF32 to the A100's 156 TFLOPS. But the L40S has a major disadvantage: relatively paltry memory bandwidth of 864GB/sec, compared to the A100's 1,555GB/sec for the 40GB variant and 2,039GB/sec for the 80GB version.

Memory bandwidth is crucial for AI inferencing, especially when it comes to token-per-second performance - which is presumably why Oracle considers the A100 more powerful than the L40S.
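To see why bandwidth dominates, here's a back-of-the-envelope estimate - our own simplification, not Oracle's math. In the memory-bound decode phase, each generated token streams every weight from GPU memory once, so throughput is roughly bandwidth divided by model size:

```python
# Rough single-stream decode throughput for a memory-bound LLM:
# tokens/sec ~= memory bandwidth / model size in bytes. This ignores
# batching, KV-cache reads, and compute limits - an estimate only.

bandwidth_gb_s = {          # figures quoted in the article
    "L40S": 864,
    "A100 40GB": 1555,
    "A100 80GB": 2039,
}

params_billion = 13             # a hypothetical 13B-parameter model
model_gb = params_billion * 2   # FP16: two bytes per parameter

for gpu, bw in bandwidth_gb_s.items():
    print(f"{gpu}: ~{bw / model_gb:.0f} tokens/sec, single stream")
```

On those assumptions the L40S manages around 33 tokens/sec against the 80GB A100's 78 - the same roughly 2.4x gap as the raw bandwidth figures.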

Given the 48GB memory buffer per GPU, the L40S shapes will probably be best suited to large language models with up to 14 billion parameters - allowing roughly 2GB per billion parameters at FP16, with 20GB reserved for the context window and batching.

Technically, the combined memory of multiple L40S GPUs would permit larger models, but since the L40S lacks NVLink and instead uses slower PCIe 4.0, performance is likely to be less than optimal. Quantization could, however, increase the number of parameters an L40S shape can handle without running into memory constraints.
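Putting that rule of thumb into numbers - the 20GB overhead figure is the article's, while the INT8/INT4 byte costs are our illustrative assumptions:

```python
# The article's sizing rule as arithmetic: weights cost ~2GB per billion
# parameters at FP16, with ~20GB held back for context window and
# batching. Quantizing shrinks the per-parameter cost accordingly
# (INT8/INT4 figures are illustrative assumptions, not Oracle's).

def max_params_billion(vram_gb, gb_per_billion, overhead_gb=20):
    """Largest model, in billions of parameters, that fits in vram_gb."""
    return (vram_gb - overhead_gb) / gb_per_billion

for vram in (48, 80):   # the L40S, and the 80GB A100/H100 VMs
    for precision, cost in (("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)):
        fit = max_params_billion(vram, cost)
        print(f"{vram}GB GPU, {precision}: ~{fit:.0f}B parameters")
```

The 48GB FP16 row lands on the article's 14-billion-parameter ceiling, and the 80GB FP16 row on the 30-billion figure cited for the new VMs below.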

It's not clear if Oracle is using individual PCIe versions of the A100 and H100 or their SXM variants, which allow multiple GPUs on the same board. We suspect it's using the SXM model, which is intended for sharing among VMs.

Whatever powers these VMs, they are a substantial improvement on the A10-powered VMs Oracle has offered previously and now casts as workstation-grade offerings. With 80GB of memory, these new VMs ought to be able to run LLMs with 30 billion parameters without any quantization, and the high memory bandwidth should allow for relatively high token-per-second rates.

Oracle also teased a BM.GPU.GH200 compute shape, currently in customer testing.

It features the Nvidia Grace Hopper Superchip and NVLink C2C - a high-bandwidth cache-coherent 900GB/sec connection between Nvidia's Grace CPU and Hopper GPU that provides over 600GB of accessible memory, enabling up to 10X higher performance for AI and HPC workloads. Customers interested in the Grace architecture and upcoming Grace Blackwell Superchip can ask Big Red for access. ®


https://www.theregister.com//2024/08/01/oracle_l40s_clusters/
