mGrowTech

A Hands-On Introduction to cuML for GPU-Accelerated Machine Learning Workflows

by Josh
September 28, 2025
in AI, Analytics and Automation


In this article, you will learn what cuML is, and how it can significantly speed up the training of machine learning models through GPU acceleration.

Topics we will cover include:

  • The aim and distinctive features of cuML.
  • How to prepare datasets and train a machine learning model for classification with cuML in a scikit-learn-like fashion.
  • How to easily compare results with an equivalent conventional scikit-learn model, in terms of classification accuracy and training time.

Let’s not waste any more time.

[Image: A Hands-On Introduction to cuML for GPU-Accelerated Machine Learning Workflows. Image by Editor | ChatGPT]

Introduction

This article offers a hands-on Python introduction to cuML, a library from RAPIDS (NVIDIA's open-source suite of GPU-accelerated data science libraries) that provides GPU-accelerated implementations of widely used machine learning models. Together with its DataFrame-oriented sibling, cuDF, cuML has gained popularity among practitioners who need scalable, production-ready machine learning solutions.

The hands-on tutorial below uses cuML together with cuDF for GPU-accelerated dataset management in a DataFrame format. For an introduction to cuDF, check out this related article.

About cuML: An “Accelerated Scikit-Learn”

RAPIDS cuML (short for CUDA Machine Learning) is an open-source library that accelerates scikit-learn–style machine learning on NVIDIA GPUs. It provides drop-in replacements for many popular algorithms, often reducing training and inference times on large datasets — without major code changes or a steep learning curve for those familiar with scikit-learn.

Three of its most distinctive features are:

  • cuML follows a scikit-learn-like API, easing the transition from CPU to GPU for machine learning with minimal code changes
  • It covers a broad set of techniques — all GPU-accelerated — including regression, classification, ensemble methods, clustering, and dimensionality reduction
  • Through tight integration with the RAPIDS ecosystem, cuML works hand-in-hand with cuDF for data preprocessing, as well as with related libraries to facilitate end-to-end, GPU-native pipelines
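
To make the drop-in idea concrete, here is a minimal sketch of how a script might prefer cuML's estimator and fall back to scikit-learn when no GPU stack is available. The helper name `pick_logreg_backend` is hypothetical and belongs to neither library. Only the module paths `cuml.linear_model` and `sklearn.linear_model` come from the libraries themselves.

```python
import importlib


def pick_logreg_backend():
    """Return (backend_name, estimator_class), preferring cuML over scikit-learn.

    Hypothetical helper for illustration; returns (None, None) if neither
    library can be imported in the current environment.
    """
    for backend in ("cuml", "sklearn"):
        try:
            module = importlib.import_module(f"{backend}.linear_model")
        except Exception:
            # cuML may fail with errors other than ImportError on machines
            # without a usable CUDA driver, so catch broadly and move on.
            continue
        return backend, module.LogisticRegression
    return None, None


backend, LogReg = pick_logreg_backend()
print("Selected backend:", backend)
```

Because both classes expose `fit`, `predict`, and `score`, the rest of a script can usually stay the same whichever backend was selected.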

Hands-On Introductory Example

To illustrate the basics of cuML for building GPU-accelerated machine learning models, we will use the adult income dataset, which is fairly large (48,842 records) yet easily accessible via a public URL in Jason Brownlee’s repository. It is a moderately class-imbalanced dataset (roughly three low-income records for every high-income one) intended for binary classification: predicting whether an adult’s income is high (above $50K) or low ($50K or less) based on a set of demographic and socio-economic features.

IMPORTANT: To run the code below on Google Colab or a similar notebook environment, make sure you change the runtime type to GPU; otherwise, a warning will be raised indicating cuDF cannot find the specific CUDA driver library it utilizes.
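
Before importing cuDF, it can help to confirm that the runtime actually exposes a GPU. The following is a minimal sketch that relies only on the standard library and the `nvidia-smi` command-line tool that ships with NVIDIA drivers. The function name `gpu_runtime_available` is hypothetical, and the check is a heuristic, not a guarantee that RAPIDS will load.

```python
import shutil
import subprocess


def gpu_runtime_available():
    """Return True if `nvidia-smi` exists and reports at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the GPUs visible to the driver
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_runtime_available():
    print("GPU detected: cuDF and cuML should be able to load.")
else:
    print("No GPU detected: switch the runtime type to GPU before importing cuDF.")
```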

We start by importing the necessary libraries for our scenario:

import cudf
import cuml
from cuml.model_selection import train_test_split as gpu_train_test_split
from cuml.linear_model import LogisticRegression as cuLogReg
from IPython.display import display

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import time

Note that, in addition to cuML modules and functions to split the dataset and train a logistic regression classifier, we have also imported their classical scikit-learn counterparts. While not mandatory for using cuML (as it works independently from plain scikit-learn), we are importing equivalent scikit-learn components for the sake of comparison in the rest of the example.

Next, we load the dataset into a cuDF dataframe optimized for GPU usage:

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/adult-all.csv"
# Column names (they are not included in the dataset's CSV file we will read)
cols = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country", "income"
]

df = cudf.read_csv(url, header=None, names=cols)
display(df.head())
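
To see why `header=None` and `names=cols` are both needed, here is a small standard-library sketch that parses a single row laid out like the file: no header line, and values separated by a comma followed by a space. The row is included for illustration only and the spacing is an assumption about the file, so check it against your own copy of the data.

```python
import csv
import io

cols = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country", "income",
]

# One illustrative row in the file's layout: no header line, and a space
# after each comma (assumed here; check your copy of the data).
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
)

rows = list(csv.reader(sample))
record = dict(zip(cols, rows[0]))  # pair each supplied name with a value

print(len(rows[0]) == len(cols))  # True: 15 names for 15 fields
print(repr(record["income"]))     # ' <=50K' (note the leading space)
```

The leading space in the parsed values is the reason string columns are worth trimming before comparing them to a literal.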

Once the data is loaded, we identify the target variable and convert it into binary (1 for high income, 0 for low income):

df["income"] = df["income"].str.strip()
df["income"] = (df["income"] == ">50K").astype("int32")
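
The two steps above can be mirrored in plain Python to show what each one contributes. The label values below are illustrative and assume the raw strings carry a leading space.

```python
raw_labels = [" >50K", " <=50K", " <=50K", " >50K"]  # illustrative values

# Same two steps as above: trim whitespace, then compare against ">50K".
clean = [label.strip() for label in raw_labels]
income = [int(label == ">50K") for label in clean]
print(income)  # [1, 0, 0, 1]

# Without the strip, no value equals ">50K" and every label becomes 0.
unstripped = [int(label == ">50K") for label in raw_labels]
print(unstripped)  # [0, 0, 0, 0]
```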

This dataset combines numeric features with a slight predominance of categorical ones. Most scikit-learn models — including decision trees and logistic regression — do not natively handle string-valued categorical features, so they require encoding. A similar pattern applies to cuML; hence, we will select a small number of features to train our classifier and one-hot encode the categorical ones.

# Feature selection (let's say based on domain expertise!)
features = ["age", "education_num", "hours_per_week", "workclass", "occupation", "sex"]
X = df[features]
y = df["income"]

# One-hot encode categorical features
# Note: depending on the cuDF version, drop_first=True may not be implemented
# and can raise NotImplementedError; if so, omit the argument.
X_enc = cudf.get_dummies(X, drop_first=True)
print("Encoded feature shape:", X_enc.shape)
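
For readers new to one-hot encoding, the following plain-Python sketch shows what `drop_first=True` is generally understood to do in the pandas-style API: one indicator column per category, minus the first category in sorted order. The `one_hot` helper is hypothetical and is not how cuDF implements the operation.

```python
def one_hot(values, prefix, drop_first=True):
    """Return a dict mapping indicator-column names to lists of 0/1 flags."""
    categories = sorted(set(values))
    if drop_first:
        # The dropped category is implied when all remaining flags are 0.
        categories = categories[1:]
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in categories}


sex = ["Male", "Female", "Male"]
encoded = one_hot(sex, prefix="sex")
print(encoded)  # {'sex_Male': [1, 0, 1]}

full = one_hot(sex, prefix="sex", drop_first=False)
print(full)  # {'sex_Female': [0, 1, 0], 'sex_Male': [1, 0, 1]}
```

Dropping one column per feature avoids perfectly redundant indicators, which is a common choice for linear models such as logistic regression.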

So far, we have used cuDF much as we would use pandas, and cuML has not yet come into play.

Now comes the interesting part. We will split the dataset into training and test sets and train a logistic regression classifier twice: once on the GPU with cuML and once on the CPU with scikit-learn. We will then compare both the classification accuracy and the time taken, which in each case covers the train-test split plus model fitting. Here’s the complete code for the model training and comparison:

# MODEL 1: GPU (cuML) train-test split and training
t0 = time.time()
X_train, X_test, y_train, y_test = gpu_train_test_split(X_enc, y, test_size=0.2, random_state=42)

model_gpu = cuLogReg(max_iter=1000)
model_gpu.fit(X_train, y_train)
gpu_time = time.time() - t0

acc_gpu = model_gpu.score(X_test, y_test)
print(f"cuML Logistic Regression accuracy: {acc_gpu:.4f}, time: {gpu_time:.3f} sec")

# MODEL 2: Scikit-learn and Pandas-driven train-test split and model training
df_pd = pd.read_csv(url, header=None, names=cols)
df_pd["income"] = df_pd["income"].str.strip()
df_pd["income"] = (df_pd["income"] == ">50K").astype("int32")

X_pd = df_pd[features]
y_pd = df_pd["income"]
X_pd = pd.get_dummies(X_pd, drop_first=True)

t0 = time.time()
X_train_pd, X_test_pd, y_train_pd, y_test_pd = train_test_split(X_pd, y_pd, test_size=0.2, random_state=42)

model_cpu = LogisticRegression(max_iter=1000)
model_cpu.fit(X_train_pd, y_train_pd)
cpu_time = time.time() - t0

acc_cpu = model_cpu.score(X_test_pd, y_test_pd)
print(f"scikit-learn Logistic Regression accuracy: {acc_cpu:.4f}, time: {cpu_time:.3f} sec")

The results are quite interesting. They should look something like:

cuML Logistic Regression accuracy: 0.8014, time: 0.428 sec
scikit-learn Logistic Regression accuracy: 0.8097, time: 15.184 sec

As we can observe, the model trained with cuML achieved very similar classification accuracy to its scikit-learn counterpart (0.8014 versus 0.8097), but it trained roughly 35 times faster: about 0.4 seconds compared to roughly 15 seconds for the scikit-learn classifier. The small accuracy gap likely reflects differences in how each library shuffles the split and implements its solver. Your exact numbers will vary with hardware, drivers, and library versions.
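
Single-run timings like these are noisy, and the first GPU call often pays one-off setup costs. A minimal sketch of a steadier measurement is shown below: it runs an untimed warm-up and then reports the median of several runs. The `time_call` helper and its parameters are hypothetical, and the workload is a stand-in so the snippet runs without a GPU.

```python
import statistics
import time


def time_call(fn, repeats=5, warmup=1):
    """Return the median wall-clock seconds of fn() over `repeats` runs."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as GPU context setup
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload so the sketch runs anywhere; in the tutorial you would
# pass something like `lambda: cuLogReg(max_iter=1000).fit(X_train, y_train)`.
elapsed = time_call(lambda: sum(i * i for i in range(100_000)))
print(f"median time: {elapsed:.4f} sec")

# Speedup implied by the sample output above.
gpu_time, cpu_time = 0.428, 15.184
print(f"speedup: {cpu_time / gpu_time:.1f}x")  # speedup: 35.5x
```

Using `time.perf_counter` instead of `time.time` is a common choice for benchmarks because it is a monotonic, high-resolution clock.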

Wrapping Up

This article provided a gentle, hands-on introduction to cuML, a library for building GPU-accelerated machine learning models for classification, regression, clustering, and more. Through a simple comparison, we showed how cuML can train a model of comparable accuracy in a fraction of the time.


