Google Cloud has announced the addition of NVIDIA L4 GPU support to Cloud Run, in preview. This opens up many new use cases for Cloud Run developers, including:
* Performing real-time inference with lightweight open models such as Google’s open Gemma (2B/7B) models or Meta’s Llama 3 (8B) to build custom chatbots or on-the-fly document summarization, while scaling to handle spiky user traffic.
* Serving custom fine-tuned gen AI models, such as image generation tailored to your company's brand, and scaling down to optimize costs when nobody's using them.
* Speeding up your compute-intensive Cloud Run services, such as on-demand image recognition, video transcoding and streaming, and 3D rendering.
As a fully managed platform, Cloud Run lets you run your code directly on top of Google’s scalable infrastructure, combining the flexibility of containers with the simplicity of serverless to help boost your productivity. With Cloud Run, you can run frontend and backend services and batch jobs, deploy websites and applications, and handle queue-processing workloads — all without having to manage the underlying infrastructure.
At the same time, many workloads that perform AI inference, especially applications that demand real-time processing, require GPU acceleration to deliver responsive user experiences. With support for NVIDIA GPUs, you can perform on-demand online AI inference using the LLMs of your choice in seconds.
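To make this concrete, here is a hedged sketch of deploying a GPU-backed service. The flag names and regional availability reflect the preview and may change; `SERVICE_NAME` and `IMAGE_URL` are placeholders for your service and container image (for example, an image bundling an inference server and an open model such as Gemma 2B).

```shell
# Hedged sketch (preview-era flags, subject to change):
# deploy a container to Cloud Run with one NVIDIA L4 GPU attached.
# SERVICE_NAME and IMAGE_URL are placeholders.
gcloud beta run deploy SERVICE_NAME \
  --image=IMAGE_URL \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=8 \
  --memory=32Gi \
  --no-cpu-throttling \
  --max-instances=1
```

GPU-attached services require a generous CPU and memory allocation and CPU that is always allocated (hence `--no-cpu-throttling`); the exact minimums are documented in the Cloud Run GPU preview docs, so treat the values above as illustrative rather than prescriptive.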
Early customers are excited about the combination of Cloud Run and NVIDIA GPUs.
“Cloud Run's GPU support has been a game-changer for our real-time inference applications. The low cold-start latency is impressive, allowing our models to serve predictions almost instantly, which is critical for time-sensitive customer experiences. Additionally, Cloud Run GPUs maintain consistently minimal serving latency under varying loads, ensuring our generative AI applications are always responsive and dependable — all while effortlessly scaling to zero during periods of inactivity. Overall, Cloud Run GPUs have significantly enhanced our ability to provide fast, accurate, and efficient results to our end users.” - Thomas Ménard, Head of AI, Global Beauty Tech, L’Oréal
Overall, the addition of NVIDIA GPU support to Cloud Run is a significant development for developers building real-time AI inference applications. It lets them harness the power of NVIDIA GPUs while keeping the ease of use and scalability of Cloud Run.
To get started with Cloud Run with NVIDIA GPUs, you can sign up for the preview program at g.co/cloudrun/gpu.
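Once you are enrolled, a GPU-backed service is invoked like any other Cloud Run endpoint. The sketch below is a hedged example that assumes the container runs an Ollama server (which exposes an `/api/generate` endpoint) with a Gemma 2B model baked into the image; `SERVICE_URL` is a placeholder for the URL printed by `gcloud run deploy`.

```shell
# Hedged sketch: call an assumed Ollama server running on Cloud Run.
# SERVICE_URL is a placeholder; the identity token authenticates the
# request when the service does not allow unauthenticated access.
curl "$SERVICE_URL/api/generate" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -d '{"model": "gemma:2b", "prompt": "Summarize: Cloud Run now supports NVIDIA L4 GPUs.", "stream": false}'
```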