mGrowTech

Google AI Proposes Novel Machine Learning Algorithms for Differentially Private Partition Selection

by Josh
August 23, 2025

Differential privacy (DP) is the gold standard for protecting user information in large-scale machine learning and data analytics. A critical task within DP is partition selection: safely extracting the largest possible set of unique items from massive user-contributed datasets (such as queries or document tokens) while maintaining strict privacy guarantees. A team of researchers from MIT and Google AI Research presents novel algorithms for differentially private partition selection that maximize the number of unique items selected from a union of user datasets while strictly preserving user-level differential privacy.

The Partition Selection Problem in Differential Privacy

At its core, partition selection asks: How can we reveal as many distinct items as possible from a dataset without risking any individual’s privacy? Items known only to a single user must remain secret; only those with sufficient “crowdsourced” support can be safely disclosed. This problem underpins critical applications such as:

  • Private vocabulary and n-gram extraction for NLP tasks.
  • Categorical data analysis and histogram computation.
  • Privacy-preserving learning of embeddings over user-provided items.
  • Anonymizing statistical queries (e.g., to search engines or databases).

Standard Approaches and Limits

Traditionally, the go-to solution (deployed in libraries like PyDP and Google’s differential privacy toolkit) involves three steps:

  1. Weighting: Each item receives a “score”, usually its frequency across users, with every user’s contribution strictly capped.
  2. Noise Addition: To hide precise user activity, random noise (usually Gaussian) is added to each item’s weight.
  3. Thresholding: Only items whose noisy score passes a set threshold—calculated from privacy parameters (ε, δ)—are released.

This method is simple and highly parallelizable, allowing it to scale to gigantic datasets using systems like MapReduce, Hadoop, or Spark. However, it suffers from fundamental inefficiency: popular items accumulate excess weight that doesn’t further aid privacy, while less-common but potentially valuable items often miss out because the excess weight isn’t redirected to help them cross the threshold.
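The three-step baseline described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular library: the function name `basic_partition_selection`, the choice to give every user one unit of L2 budget split evenly over their items, and the `sigma` and `threshold` arguments are all assumptions made for the example. In practice both values would be derived from (ε, δ); that calibration is not shown here.

```python
import math
import random
from collections import defaultdict


def basic_partition_selection(user_sets, sigma, threshold, rng=None):
    """Weight, add Gaussian noise, threshold (illustrative sketch only).

    sigma and threshold are assumed to be calibrated elsewhere from the
    privacy parameters; this function does not compute them.
    """
    rng = rng or random.Random(0)

    # Step 1 (weighting): each user spreads a capped budget over their items.
    # Assumption: a uniform split with unit L2 norm per user.
    weights = defaultdict(float)
    for items in user_sets:
        unique_items = sorted(set(items))
        if not unique_items:
            continue
        share = 1.0 / math.sqrt(len(unique_items))
        for item in unique_items:
            weights[item] += share

    # Steps 2 and 3 (noise, thresholding): release an item only if its
    # noisy weight reaches the threshold.
    released = [
        item
        for item, weight in weights.items()
        if weight + rng.gauss(0.0, sigma) >= threshold
    ]
    return sorted(released), dict(weights)
```

Because every item is weighted independently of the others, the loop over users maps naturally onto a parallel aggregation. Note that calling the function with `sigma=0` is useful only for checking the weighting logic and provides no privacy.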

Adaptive Weighting and the MaxAdaptiveDegree (MAD) Algorithm

Google’s research introduces the first adaptive, parallelizable partition selection algorithm—MaxAdaptiveDegree (MAD)—and a multi-round extension MAD2R, designed for truly massive datasets (hundreds of billions of entries).

Key Technical Contributions

  • Adaptive Reweighting: MAD identifies items with weight far above the privacy threshold and reroutes the excess weight to boost less-represented items. This “adaptive weighting” increases the probability that rare-but-shareable items are revealed, improving output utility.
  • Strict Privacy Guarantees: The rerouting mechanism maintains the exact same sensitivity and noise requirements as classic uniform weighting, ensuring user-level (ε, δ)-differential privacy under the central DP model.
  • Scalability: MAD and MAD2R require only linear work in dataset size and a constant number of parallel rounds, making them compatible with massive distributed data processing systems. They do not require all data to fit in memory and support efficient multi-machine execution.
  • Multi-Round Improvement (MAD2R): By splitting privacy budget between rounds and using noisy weights from the first round to bias the second, MAD2R further boosts performance, allowing even more unique items to be safely extracted—especially in long-tailed distributions typical of real-world data.

How MAD Works—Algorithmic Details

  1. Initial Uniform Weighting: Each user shares their items with a uniform initial score, ensuring sensitivity bounds.
  2. Excess Weight Truncation and Rerouting: Items above an “adaptive threshold” have their excess weight trimmed and rerouted proportionally back to contributing users, who then redistribute this to their other items.
  3. Final Weight Adjustment: Additional uniform weight is added to make up for small initial allocation mistakes.
  4. Noise Addition and Output: Gaussian noise is added; items above the noisy threshold are output.
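To make the first two steps above concrete, here is a toy Python sketch of excess-weight rerouting. It is not the MAD algorithm from the paper: the function name `reroute_excess`, the proportional trimming rule, and the even split of refunded weight are assumptions chosen for readability. In particular, it does not reproduce the sensitivity control that lets MAD keep the same noise level as uniform weighting, and it omits the final adjustment step, so it should not be used to release real data.

```python
import math
from collections import defaultdict


def reroute_excess(user_sets, tau):
    """Toy excess-weight rerouting (illustrative only, no privacy proof).

    tau is a hypothetical "adaptive threshold": items whose total weight
    exceeds it are trimmed back to tau.
    """
    users = [sorted(set(items)) for items in user_sets if items]

    # Step 1: uniform initial weights, one unit of L2 budget per user.
    contributions = [
        {item: 1.0 / math.sqrt(len(items)) for item in items} for items in users
    ]
    totals = defaultdict(float)
    for contribution in contributions:
        for item, weight in contribution.items():
            totals[item] += weight

    # Step 2: trim heavy items to tau, refund each contributor in proportion
    # to what they gave, and let them spread the refund over their own
    # items that are not above tau.
    final = defaultdict(float)
    for contribution in contributions:
        refund = 0.0
        for item, weight in contribution.items():
            if totals[item] > tau:
                kept = weight * tau / totals[item]
                refund += weight - kept
                final[item] += kept
            else:
                final[item] += weight
        light_items = [item for item in contribution if totals[item] <= tau]
        for item in light_items:
            final[item] += refund / len(light_items)
        # A user with no light items has nowhere to send the refund,
        # so in this sketch it is simply dropped.

    return dict(totals), dict(final)
```

For example, with four users holding `["heavy", "x"]`, `["heavy", "x"]`, `["heavy", "y"]`, and `["heavy"]` and `tau=2.0`, the heavy item is trimmed to exactly 2.0 while `x` and `y` both end up with more weight than under uniform weighting.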

In MAD2R, the first-round outputs and noisy weights are used to refine which items should be focused on in the second round, with weight biases ensuring no privacy loss and further maximizing output utility.
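The article does not say how MAD2R divides its privacy budget between the two rounds. As a minimal sketch, assume a simple proportional split under basic sequential composition, where running an (ε₁, δ₁)-DP round followed by an (ε₂, δ₂)-DP round is (ε₁ + ε₂, δ₁ + δ₂)-DP overall. The helper name `split_budget` and the `first_round_fraction` parameter are hypothetical.

```python
def split_budget(epsilon, delta, first_round_fraction=0.5):
    """Split (epsilon, delta) across two rounds (assumed proportional split).

    Under basic sequential composition the two per-round budgets add up
    to the overall budget.
    """
    if not 0.0 < first_round_fraction < 1.0:
        raise ValueError("first_round_fraction must be strictly between 0 and 1")
    eps1 = epsilon * first_round_fraction
    delta1 = delta * first_round_fraction
    return (eps1, delta1), (epsilon - eps1, delta - delta1)
```

Each round then runs with less budget, and therefore more noise, than a single round would. The claim in the text is that the information carried over from the first round more than compensates for this.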

Experimental Results: State-of-the-Art Performance

Extensive experiments across nine datasets (from Reddit, IMDb, Wikipedia, Twitter, Amazon, all the way to Common Crawl with nearly a trillion entries) show:

  • MAD2R outperforms all parallel baselines (Basic, DP-SIPS) on seven out of nine datasets in terms of number of items output at fixed privacy parameters.
  • On the Common Crawl dataset, MAD2R extracted 16.6 million out of 1.8 billion unique items (0.9%), but covered 99.9% of users and 97% of all user-item pairs in the data—demonstrating remarkable practical utility while holding the line on privacy.
  • For smaller datasets, MAD approaches the performance of sequential, non-scalable algorithms, and for massive datasets, it clearly wins in both speed and utility.
https://research.google/blog/securing-private-data-at-scale-with-differentially-private-partition-selection/

Concrete Example: Utility Gap

Consider a scenario with a “heavy” item (very commonly shared) and many “light” items (shared by few users). Basic DP selection overweights the heavy item without lifting the light items enough to pass the threshold. MAD strategically reallocates, increasing the output probability of the light items and resulting in up to 10% more unique items discovered compared to the standard approach.
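The effect of moving weight from a heavy item to a light one can be quantified with the Gaussian tail probability. The sketch below computes the chance that an item with a given weight clears the threshold after noise is added; the function name and all numbers in the usage note (threshold 5, noise scale 1, weights 50 and 3) are hypothetical and chosen only for illustration.

```python
import math


def release_probability(weight, threshold, sigma):
    """P[weight + N(0, sigma^2) >= threshold], via the complementary error function."""
    return 0.5 * math.erfc((threshold - weight) / (sigma * math.sqrt(2)))
```

With a threshold of 5 and `sigma=1`, a light item at weight 3 is released with probability about 0.023, while a heavy item at weight 50 is released almost surely. Shifting 1.5 units of weight from the heavy item to the light one raises the light item to 4.5 and its release probability to about 0.31, while the heavy item at 48.5 is still released almost surely. This is the inefficiency that adaptive weighting targets.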

Summary

With adaptive weighting and a parallel design, the research team improves both the scalability and the utility of DP partition selection. These advances let researchers and engineers make fuller use of private data, extracting more signal without compromising individual user privacy.

