mGrowTech

Speeding Up AI: Bringing Google Colossus to PyTorch via GCSFS and Rapid Bucket

by Josh
April 29, 2026
in Google Marketing


Today, we are announcing a major performance boost for AI/ML workloads using the PyTorch ecosystem on Google Cloud. By integrating Rapid Storage, powered by Google’s Colossus storage architecture, directly with PyTorch via the industry-standard fsspec interface, we are enabling researchers and developers to keep their GPUs busier than ever before.

The challenge: Keeping GPUs fed

As model sizes grow, data loading and checkpointing often become the primary bottlenecks in training. Preparing data for training means fetching and processing terabytes to petabytes of data from remote storage such as object storage. Standard REST-based storage access can struggle to meet the throughput and latency requirements of modern distributed training, leaving valuable GPUs idle.

Rapid Bucket: Rapid Storage via bi-di gRPC

Our new Rapid Bucket solution provides high-performance object storage in dedicated zonal buckets. By bypassing legacy REST APIs and using persistent gRPC bidirectional streams, we've brought Colossus, the filesystem whose stateful protocols power YouTube and Google Search, directly to the PyTorch ecosystem.

Key performance metrics of Rapid Storage

  • Extreme Throughput: 15+ TiB/s aggregate throughput.
  • Ultra-Low Latency: <1 ms for random reads and append writes.
  • High QPS: 20M+ queries per second.

Fsspec – PyTorch’s Pythonic file interface

fsspec is the pervasive Pythonic interface for file systems in the PyTorch ecosystem. It is already used for:

  • Data preparation: Dask, Pandas, Hugging Face Datasets, Ray Data
  • Checkpoints: PyTorch Lightning, Torch.dist, Weights & Biases
  • Inference: vLLM

fsspec has backend implementations for many different storage systems, all exposed through a single interface, so there is no need to write specific code for each backend. By integrating Rapid Storage with gcsfs (the Google Cloud Storage implementation of fsspec), developers can get the speed gains of Rapid with a simple fsspec.open() call, with no complex code rewrites required.
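As a minimal sketch of what that single layer looks like in practice, the helper below reads an object through fsspec.open() using only a URL. The bucket and object names are hypothetical, and the helper names object_url and read_bytes are introduced here for illustration only. fsspec is imported inside the function so the module loads even where it is not installed; gs:// URLs additionally need gcsfs and valid credentials.

```python
def object_url(bucket: str, key: str, protocol: str = "gs") -> str:
    """Build an fsspec-style URL such as gs://bucket/path/to/object."""
    return f"{protocol}://{bucket}/{key.lstrip('/')}"


def read_bytes(url: str) -> bytes:
    """Read a whole object through fsspec, whichever backend the URL names."""
    # Third-party dependency, imported lazily (assumption: fsspec is installed,
    # plus gcsfs when the URL uses the gs:// protocol).
    import fsspec

    with fsspec.open(url, "rb") as f:
        return f.read()


# Hypothetical usage; only the URL changes between backends:
#   read_bytes(object_url("my-zonal-rapid-bucket", "data/train-00000.parquet"))
#   read_bytes(object_url("scratch", "data/train-00000.parquet", protocol="memory"))
```

Because the backend is selected by the URL protocol, pointing the same call at a Rapid bucket should not require changes to read_bytes itself.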

Under the hood: Leveraging Colossus

To achieve a performance boost with Rapid Buckets, we optimized the entire data path:

  1. Stateful gRPC-based streaming: gRPC bidirectional streaming keeps the connection alive, which minimizes per-operation overhead such as connection setup, authentication, and metadata exchange, and enables efficient, stateful data exchange for multiple reads or appends within a single object.
  2. Direct path: Google Cloud Storage (GCS) Rapid Bucket uses direct connectivity for its gRPC bidirectional streaming APIs (BidiReadObject, BidiWriteObject) to achieve maximum performance by connecting clients directly to the underlying Colossus files. Non-Rapid traffic to GCS typically takes more network hops than the direct path, so read and write latencies over Rapid are significantly lower. For more details, see Rapid storage internal working.
  3. Zonal co-location: By placing storage in the same zone as your compute (e.g., us-central1-a), we eliminate cross-zone latency. Before Rapid buckets, data in a regional bucket and the compute (accelerators) reading it could sit in different zones, so accessing the data incurred cross-zone latency.
  4. No-op user migration: We preserved the existing fsspec API while moving internal traffic entirely from HTTP to bidirectional gRPC for Rapid buckets. With bucket-type auto-detection added to gcsfs, PyTorch and other fsspec clients transparently use Rapid with zero manual configuration.
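The "multiple reads within a single object" pattern from the first item can be sketched from the client side as several range reads on one open file handle. The helper name read_ranges is hypothetical, and an in-memory io.BytesIO stands in for a remote object so the sketch runs without cloud access. Whether gcsfs serves all reads on one handle from a single bidirectional stream is an assumption here, not something this sketch verifies.

```python
import io
from typing import BinaryIO, Iterable


def read_ranges(f: BinaryIO, ranges: Iterable[tuple[int, int]]) -> list[bytes]:
    """Read several (offset, length) ranges from one already-open handle."""
    chunks = []
    for offset, length in ranges:
        f.seek(offset)
        # read() may return fewer bytes than requested near the end of the object.
        chunks.append(f.read(length))
    return chunks


# Stand-in for a 1024-byte remote object.
blob = io.BytesIO(bytes(range(256)) * 4)
parts = read_ranges(blob, [(0, 4), (256, 2), (1020, 8)])
```

With gcsfs, the same helper would be called on the handle returned by fs.open(path, "rb"), since fsspec file objects expose the same seek() and read() methods.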

Results

A dataset of 134M rows totaling around 451GB was loaded onto 16 GKE nodes, each containing eight A4 GPUs. Training ran for 100 steps, with a checkpoint after every 25 steps using PyTorch Lightning. We measured total training time, including data load times, and observed a performance gain of 23% with Rapid Bucket compared with a Standard regional bucket.

Microbenchmarking, that is, measuring the performance of an individual building block such as I/O or resource usage, confirms these gains. Throughput improved by 4.8x for reads (both sequential and random) and 2.8x for writes. These tests used 16MB I/O sizes across 48 processes. You can find more details at GCSFS-performance-benchmarks.
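For readers who want to try a similar measurement on their own data, here is a rough single-process sketch of a sequential-read microbenchmark. It is not the harness behind the numbers above. The function names are hypothetical, the article's 16MB I/O size is assumed to mean 16 MiB, and an in-memory buffer with a smaller chunk size stands in for a remote object so the example runs anywhere.

```python
import io
import time
from typing import BinaryIO

IO_SIZE = 16 * 1024 * 1024  # assumption: "16MB" interpreted as 16 MiB


def measure_sequential_read(f: BinaryIO, io_size: int) -> tuple[int, float]:
    """Read f to EOF in io_size chunks; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = f.read(io_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def mib_per_second(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and a positive duration into MiB/s."""
    return num_bytes / (1024 * 1024) / seconds


# Small in-memory stand-in: 5 MiB read in 1 MiB chunks.
payload = io.BytesIO(b"x" * (5 * 1024 * 1024))
bytes_read, elapsed = measure_sequential_read(payload, 1024 * 1024)
```

Against a real bucket, the handle would come from fs.open(path, "rb") with io_size=IO_SIZE, and a multi-process run would sum the per-process byte counts over the wall-clock time.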

Get started

Getting started with GCSFS on Rapid Bucket is easy. Your existing code and scripts remain the same. You just need to change the bucket to a Rapid Bucket to take advantage of the performance boost.

To install:

Rapid Bucket integration is available in gcsfs from version 2026.3.0.
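One way to confirm an environment meets that minimum is to compare the installed gcsfs version against 2026.3.0. The sketch below uses only the standard library; the helper names are hypothetical, and the hand-rolled parser assumes the calendar-style "YYYY.M.P" version scheme that fsspec and gcsfs use. A dedicated version parser such as the one in the packaging library would handle pre-release suffixes more robustly.

```python
from importlib import metadata

MIN_RAPID_VERSION = (2026, 3, 0)  # minimum version stated in this article


def parse_calver(version: str) -> tuple[int, ...]:
    """Turn '2026.3.0' into (2026, 3, 0), ignoring non-numeric suffixes."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '2026.3' so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_rapid(version: str) -> bool:
    """True if the given gcsfs version string is at least 2026.3.0."""
    return parse_calver(version) >= MIN_RAPID_VERSION


def installed_gcsfs_supports_rapid() -> bool:
    """Check the gcsfs version installed in the current environment, if any."""
    try:
        return supports_rapid(metadata.version("gcsfs"))
    except metadata.PackageNotFoundError:
        return False
```

If the check returns False, upgrading gcsfs with your usual package manager should bring in the Rapid Bucket support described here.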

Code sample to write and append to an object in a GCS Rapid bucket:

import gcsfs

# Initialize the filesystem
fs = gcsfs.GCSFileSystem()

# Writing to a Rapid bucket
with fs.open('my-zonal-rapid-bucket/data/checkpoint.pt', 'wb') as f:
    f.write(b"model data...")

# Appending to an existing object (Native Rapid feature)
with fs.open('my-zonal-rapid-bucket/data/checkpoint.pt', 'ab') as f:
    f.write(b"appended data...")
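To round out the sample with a read, the sketch below writes, appends, and reads back through any object that exposes an fsspec-style open(path, mode) method. The function write_append_read and the LocalStandIn class are hypothetical and exist only so the example runs locally against a temporary directory. Passing gcsfs.GCSFileSystem() instead is the intended use, though the read-after-append behavior on a real Rapid bucket is not verified here.

```python
import os
import tempfile


def write_append_read(fs, path: str, first: bytes, extra: bytes) -> bytes:
    """Write an object, append to it, then read the whole object back."""
    with fs.open(path, "wb") as f:
        f.write(first)
    with fs.open(path, "ab") as f:
        f.write(extra)
    with fs.open(path, "rb") as f:
        return f.read()


class LocalStandIn:
    """Minimal local stand-in for a filesystem object with fs.open(path, mode)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def open(self, path: str, mode: str = "rb"):
        full = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        return open(full, mode)


with tempfile.TemporaryDirectory() as tmp:
    result = write_append_read(
        LocalStandIn(tmp),
        "my-zonal-rapid-bucket/data/checkpoint.pt",
        b"model data...",
        b"appended data...",
    )
```

Keeping the filesystem as a parameter means the same function can be exercised in unit tests locally and then pointed at a Rapid bucket unchanged.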
