Google Cloud announced significant updates to its AI Hypercomputer software layer, focusing on enhancing training and inference performance, improving resiliency at scale, and providing a centralized hub for AI Hypercomputer resources.
One of the key updates is support for MaxText on A3 Mega VMs, enabling faster and more efficient training of large language models (LLMs). MaxText is a high-performance, open-source LLM training framework written in JAX, and A3 Mega VMs, powered by NVIDIA H100 Tensor Core GPUs, offer 2X the GPU-to-GPU network bandwidth of A3 VMs.
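Why does GPU-to-GPU bandwidth matter so much for LLM training? In data-parallel training, every step ends with an all-reduce that averages gradients across all GPUs, and that collective runs over the inter-GPU network. The sketch below is not MaxText code; it is a minimal JAX data-parallel step with a hypothetical one-layer "model" standing in for an LLM block, shown only to make the bandwidth-bound step concrete.

```python
import functools
import jax
import jax.numpy as jnp

# Hypothetical toy "model": one linear layer standing in for an LLM block.
def loss_fn(params, batch):
    x, y = batch
    pred = x @ params["w"]
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # All-reduce: average gradients across every GPU. This collective is the
    # communication step whose cost shrinks as GPU-to-GPU bandwidth grows.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)

# Replicate params and shard a synthetic batch across local devices.
n = jax.local_device_count()
params = jax.device_put_replicated({"w": jnp.ones((8, 4))}, jax.local_devices())
x, y = jnp.ones((n, 16, 8)), jnp.ones((n, 16, 4))
params = train_step(params, (x, y))
```

At LLM scale the gradient tensors exchanged per step run into the gigabytes, which is why doubling inter-GPU bandwidth translates directly into higher training throughput.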
Additionally, Google Cloud introduced SparseCore on Cloud TPU v5p, which provides hardware acceleration for the embedding operations at the heart of recommender systems, yielding higher performance for these workloads.
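The operation SparseCore targets is the sparse, memory-bound gather over a very large embedding table, a poor fit for the TPU's dense matrix units. The SparseCore offload itself happens transparently in the framework and XLA stack; the minimal sketch below (with illustrative sizes and a hypothetical `embed` helper) just shows the class of operation being accelerated.

```python
import jax
import jax.numpy as jnp

# Hypothetical embedding table: 1M item IDs, 64-dim vectors. Sizes are
# illustrative, not tied to any Google Cloud benchmark.
VOCAB, DIM = 1_000_000, 64
table = jax.random.normal(jax.random.PRNGKey(0), (VOCAB, DIM))

def embed(table, item_ids):
    # A gather over a large table: the sparse, memory-bound access pattern
    # that embedding accelerators like SparseCore are built for.
    return jnp.take(table, item_ids, axis=0)

# Each example references a few sparse features (e.g. recently viewed
# items); their embeddings are gathered and pooled into a dense vector.
item_ids = jnp.array([[3, 17, 42], [7, 7, 99]])
pooled = embed(table, item_ids).mean(axis=1)  # shape (2, 64)
```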
To enhance LLM inference, Google Cloud also introduced KV cache quantization and ragged attention kernels in JetStream, its throughput- and memory-optimized LLM inference engine, improving inference performance by up to 2X on Cloud TPU v5e.
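LLM decoding is dominated by reading the KV cache from memory, so shrinking the cache directly raises throughput; ragged attention kernels add to this by skipping compute on padding for variable-length sequences. JetStream's actual kernels are more involved, but the minimal sketch below (with hypothetical helper names `quantize_kv`/`dequantize_kv` and a toy cache shape) illustrates the core idea of int8 KV cache quantization: store keys and values as one byte per element plus a scale, and dequantize on the fly during attention.

```python
import jax.numpy as jnp

def quantize_kv(kv):
    # Per-(token, head) absmax scale to int8: roughly halves KV cache
    # memory versus bfloat16. Epsilon guards against all-zero rows.
    scale = jnp.maximum(jnp.max(jnp.abs(kv), axis=-1, keepdims=True), 1e-8) / 127.0
    q = jnp.clip(jnp.round(kv / scale), -127, 127).astype(jnp.int8)
    return q, scale

def dequantize_kv(q, scale):
    # Reconstruct an approximate cache on the fly inside attention.
    return (q.astype(jnp.float32) * scale).astype(jnp.bfloat16)

# Toy cache: [sequence, heads, head_dim] in bfloat16.
kv = jnp.full((128, 8, 64), 0.5, dtype=jnp.bfloat16)
q, scale = quantize_kv(kv.astype(jnp.float32))
approx = dequantize_kv(q, scale)  # ~0.5 everywhere, at half the memory
```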
With these updates, Google Cloud continues to empower organizations to accelerate their AI efforts with performant, cost-effective infrastructure. The focus on optimized hardware and software, along with comprehensive resources, makes AI Hypercomputer an attractive option for businesses looking to leverage the power of AI.