mGrowTech

10 Ways to Use Embeddings for Tabular ML Tasks

by Josh
February 2, 2026

Introduction

Embeddings, vector-based numerical representations of typically unstructured data like text, were popularized primarily in natural language processing (NLP). They are also a powerful way to represent or supplement tabular data in other machine learning workflows. They apply not only to text columns but also to categorical features whose values carry rich latent semantics, such as product names or user IDs.

This article presents 10 ways to use embeddings to get more out of tabular data across a variety of machine learning tasks, models, and projects.

Initial Setup: Some of the 10 strategies below are accompanied by brief illustrative code excerpts. The toy dataset used throughout is defined first, along with the imports that most examples need.

import pandas as pd
import numpy as np

# Example toy dataset of customer reviews
df = pd.DataFrame({
    "user_id": [101, 102, 103, 101, 104],
    "product": ["Phone", "Laptop", "Tablet", "Laptop", "Phone"],
    "category": ["Electronics", "Electronics", "Electronics", "Electronics", "Electronics"],
    "review": ["great battery", "fast performance", "light weight", "solid build quality", "amazing camera"],
    "rating": [5, 4, 4, 5, 5]
})

1. Encoding Categorical Features With Embeddings

This is a useful approach in applications like recommender systems. Rather than being handled numerically, high-cardinality categorical features, like user and product IDs, are best turned into vector representations. This approach has been widely applied and shown to effectively capture the semantic aspects and relationships among users and products.

This example defines two embedding layers inside a neural network that takes integer-encoded user and product IDs and converts them into embeddings.

from tensorflow.keras.layers import Input, Embedding, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model

# User branch: an integer user index is mapped to an 8-dimensional vector
user_input = Input(shape=(1,))
user_embed = Embedding(input_dim=500, output_dim=8)(user_input)
user_vec = Flatten()(user_embed)

# Product branch: an integer product index is mapped to an 8-dimensional vector
prod_input = Input(shape=(1,))
prod_embed = Embedding(input_dim=50, output_dim=8)(prod_input)
prod_vec = Flatten()(prod_embed)

# Concatenate both embeddings and predict a single value (e.g., the rating)
concat = Concatenate()([user_vec, prod_vec])
output = Dense(1)(concat)

model = Model([user_input, prod_input], output)
model.compile(optimizer="adam", loss="mse")
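
The embedding layers above expect integer indices smaller than `input_dim`, while the toy dataset stores product names as strings and user IDs as arbitrary numbers. The following is a minimal sketch of one way to prepare such inputs, assuming `pd.factorize` is an acceptable encoding; the names `user_idx`, `prod_idx`, `n_users`, and `n_products` are illustrative and not part of the original example.

```python
import pandas as pd

# Same ID columns as the toy dataset defined in the setup
df = pd.DataFrame({
    "user_id": [101, 102, 103, 101, 104],
    "product": ["Phone", "Laptop", "Tablet", "Laptop", "Phone"],
})

# factorize assigns consecutive integer codes in order of first appearance
user_idx, user_vocab = pd.factorize(df["user_id"])
prod_idx, prod_vocab = pd.factorize(df["product"])

# Vocabulary sizes give the smallest valid input_dim for each Embedding layer
n_users, n_products = len(user_vocab), len(prod_vocab)
```

The encoded arrays could then be passed to the network as its two inputs, with each `input_dim` set to at least the corresponding vocabulary size.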

2. Averaging Word Embeddings for Text Columns

This approach compresses multiple texts of variable length into fixed-size embeddings by aggregating word-wise embeddings within each text sequence. It resembles one of the most common uses of embeddings; the twist here is aggregating word-level embeddings into a sentence- or text-level embedding.

The following example uses Gensim, which implements the popular Word2Vec algorithm to turn linguistic units (typically words) into embeddings, and performs an aggregation of multiple word-level embeddings to create an embedding associated with each user review.

from gensim.models import Word2Vec

# Train embeddings on the review text
sentences = df["review"].str.lower().str.split().tolist()
w2v = Word2Vec(sentences, vector_size=16, min_count=1)

df["review_emb"] = df["review"].apply(
    lambda t: np.mean([w2v.wv[w] for w in t.lower().split()], axis=0)
)

3. Clustering Embeddings Into Meta-Features

Clustering a set of customer review embeddings can reveal natural groupings that may correspond to topics in the reviews. The core preparatory step is to vertically stack the individual embedding vectors into a 2D NumPy array (a matrix) that the clustering algorithm can consume. The resulting cluster labels capture coarse semantic groups and can serve as a new, informative categorical feature.

from sklearn.cluster import KMeans

emb_matrix = np.vstack(df["review_emb"].values)
km = KMeans(n_clusters=3, random_state=42).fit(emb_matrix)
df["review_topic"] = km.labels_
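
Cluster labels are arbitrary integers, so many models benefit from receiving them in a different form. The sketch below shows two possible encodings: one-hot columns for the hard assignment, and distances to each centroid as a softer alternative. It uses random vectors as a stand-in for the review embeddings, and the column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for the stacked review embeddings (5 reviews, 16 dimensions)
rng = np.random.default_rng(0)
emb_matrix = rng.normal(size=(5, 16))

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(emb_matrix)

# Hard assignment: one indicator column per cluster label
topic_onehot = pd.get_dummies(km.labels_, prefix="topic")

# Soft alternative: distance from each review to every cluster centroid
topic_dist = pd.DataFrame(
    km.transform(emb_matrix),
    columns=[f"dist_topic_{i}" for i in range(3)],
)

meta_features = pd.concat([topic_onehot, topic_dist], axis=1)
```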

4. Learning Self-Supervised Tabular Embeddings

As surprising as it may sound, learning numerical vector representations of structured data — particularly for unlabeled datasets — is a clever way to turn an unsupervised problem into a self-supervised learning problem: the data itself generates training signals.

While these approaches are a bit more elaborate than the practical scope of this article, they commonly use one of the following strategies:

  • Masked feature prediction: randomly hide some features’ values — similar to masked language modeling for training large language models (LLMs) — forcing the model to predict them based on the remaining visible features.
  • Perturbation detection: expose the model to a noisy variant of the data, with some feature values swapped or replaced, and set the training goal as identifying which values are “legitimate” and which ones have been altered.
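
To make the two strategies above more concrete, here is a minimal NumPy sketch of how the training inputs and targets could be constructed. It only prepares the data and does not train a model; the feature matrix, the 30% masking rate, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numeric feature matrix: 6 rows, 4 features
X = rng.normal(size=(6, 4))

# Masked feature prediction: hide roughly 30% of the cells; a model would be
# trained to reconstruct X at the masked positions given X_masked
mask = rng.random(X.shape) < 0.3
X_masked = np.where(mask, 0.0, X)

# Perturbation detection: replace the selected cells with values taken from
# the same column in other rows (a per-column shuffle); a model would be
# trained to predict `corrupted`, a binary label per cell
shuffled = np.column_stack(
    [rng.permutation(X[:, j]) for j in range(X.shape[1])]
)
X_corrupt = np.where(mask, shuffled, X)
corrupted = mask & (X_corrupt != X)  # a shuffled value may equal the original
```

In both cases the hidden layer activations of the trained model, rather than its predictions, would typically be kept as the learned row embeddings.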

5. Building Multi-Labeled Categorical Embeddings

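Here, a category is described by several tags, and its embedding is built by aggregating the embeddings of those tags. A guard against out-of-vocabulary tags prevents the runtime errors that occur when a word is missing from the vocabulary of an embedding model like Word2Vec, falling back to a zero vector when no tag is known.

This example represents a single category like “Phone” using multiple tags such as “mobile” or “touch.” Compared to standard categorical encodings like one-hot, the composite embedding captures similarity between categories and brings in knowledge beyond the category label itself. Note that the toy Word2Vec model trained in strategy 2 only knows words from the reviews, so every tag here falls back to the zero vector; a pre-trained model whose vocabulary covers the tags is needed to obtain meaningful vectors.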

tags = {
    "Phone": ["mobile", "touch"],
    "Laptop": ["portable", "cpu"],
    "Tablet": []  # No tags: handled by the zero-vector fallback
}

def safe_mean_embedding(words, model, dim):
    # Average the vectors of in-vocabulary words; return zeros if none are known
    vecs = [model.wv[w] for w in words if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

df["tag_emb"] = df["product"].apply(
    lambda p: safe_mean_embedding(tags[p], w2v, 16)
)

6. Using Contextual Embeddings for Categorical Features

This slightly more sophisticated approach first maps categorical variables into “standard” embeddings, then passes them through self-attention layers to produce context-enriched embeddings. These dynamic representations can change across data instances (e.g., product reviews) and capture dependencies among attributes as well as higher-order feature interactions. In other words, this allows downstream models to interpret a category differently based on context — i.e. the values of other features.
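
The core mechanism can be sketched in a few lines of NumPy. The example below applies single-head scaled dot-product self-attention to the embeddings of one row's categorical features. It is an illustrative simplification: the projection matrices are random rather than learned, and the residual connections and normalization used in full architectures such as TabTransformer are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d = 3, 8  # e.g. user, product, category; embedding size 8

# Hypothetical "standard" embeddings for one row: one vector per feature
E = rng.normal(size=(n_features, d))

# Randomly initialized projections (learned during training in a real model)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = E @ W_q, E @ W_k, E @ W_v
scores = Q @ K.T / np.sqrt(d)                  # feature-to-feature affinities
scores -= scores.max(axis=1, keepdims=True)    # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax over features

# Each feature's output vector is a weighted mix of all features' values
contextual = weights @ V
```

Because `weights` depends on every feature in the row, the same category receives a different output vector when the other feature values change.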

7. Learning Embeddings on Binned Numerical Features

It is common to convert fine-grained numerical features like age into bins (e.g., age groups) as part of data preprocessing. This strategy produces embeddings of binned features, which can capture outliers or nonlinear structure underlying the original numeric feature.

In this example, the numerical rating feature is binned, and a neural embedding layer then maps each rating range to its own 3-dimensional vector, which would be learned during training.

# Reuses Input and Embedding imported in strategy 1
df["rating_bin"] = pd.cut(df["rating"], bins=4, labels=False)
bin_input = Input(shape=(1,))
emb_numeric = Embedding(input_dim=4, output_dim=3)(bin_input)
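
Conceptually, an embedding layer over bins is a lookup table indexed by bin number. The NumPy sketch below illustrates this under stated assumptions: the ratings, the bin edges, and the random table values are all hypothetical, and in a network the table would be updated by gradient descent.

```python
import numpy as np

# Hypothetical ratings covering the full 1-5 scale
ratings = np.array([1, 2, 3, 4, 5])

# Assumed bin edges, giving the ranges [1,2), [2,3), [3,4), [4,5]
edges = np.array([2, 3, 4])
bin_idx = np.digitize(ratings, edges)  # integer bin index in 0..3

# Embedding table: one 3-dimensional vector per bin (random here)
rng = np.random.default_rng(0)
table = rng.normal(size=(4, 3))

# Row i of rating_emb is the vector of the bin that rating i falls into
rating_emb = table[bin_idx]
```

Ratings 4 and 5 land in the same bin and therefore share a vector, which is the trade-off binning makes: fewer parameters in exchange for coarser resolution.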

8. Fusing Embeddings and Raw Features (Interaction Features)

This approach combines semantic embeddings with raw numerical features in a single input vector. A fallback handles labels not found in the Word2Vec vocabulary (e.g., a product name like “Phone”) by substituting a zero vector.

This example first obtains a 16-dimensional embedding for each product name, then appends the raw rating. For downstream modeling, this helps the model capture both what a product is and how it is perceived (e.g., sentiment). With the toy Word2Vec model, which was trained only on the review text, all three product names are out of vocabulary and map to zeros; a pre-trained model would supply meaningful vectors.

df["product_emb"] = df["product"].str.lower().apply(
    lambda p: w2v.wv[p] if p in w2v.wv else np.zeros(16)
)

df["user_product_emb"] = df.apply(
    lambda r: np.concatenate([r["product_emb"], [r["rating"]]]),
    axis=1
)

9. Using Sentence Embeddings for Long Text

Sentence transformers convert full sequences like text reviews into embedding vectors that capture sequence-level semantics. With a small twist, converting the 2D array of encoded reviews into a list with one vector per row, the unstructured text becomes fixed-width attributes that models can use alongside classical tabular columns.

from sentence_transformers import SentenceTransformer

# Named st_model to avoid overwriting the Keras model from strategy 1
st_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
df["sent_emb"] = list(st_model.encode(df["review"].tolist()))
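
Most tabular models cannot consume a column of arrays directly, so the vectors usually need to be spread across one column per dimension. The sketch below shows one way to do that with pandas. It uses a random array as a stand-in for the encoder output (384 dimensions, the output size of all-MiniLM-L6-v2), and the `sent_emb_` column prefix is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for the encoded reviews: 5 reviews, 384 dimensions
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 384))

df = pd.DataFrame({"rating": [5, 4, 4, 5, 5]})

# One named column per embedding dimension, aligned on the original index
emb_cols = pd.DataFrame(
    emb,
    columns=[f"sent_emb_{i}" for i in range(emb.shape[1])],
    index=df.index,
)
features = pd.concat([df, emb_cols], axis=1)
```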

10. Feeding Embeddings Into Tree Models

The final strategy combines representation learning with tabular data learning in a hybrid fusion approach. Similar to the previous item, embeddings found in a single column are expanded into several feature columns. The focus here is not on how embeddings are created, but on how they are used and fed to a downstream model alongside other data.

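import xgboost as xgb

# Expand each 16-dimensional review embedding into named feature columns
emb_cols = pd.DataFrame(
    df["review_emb"].tolist(),
    columns=[f"review_emb_{i}" for i in range(16)]
)

# Combine with a regular tabular column; the target (rating) is kept out
# of X to avoid leaking the label into the features
X = pd.concat([emb_cols, df[["user_id"]]], axis=1)
y = df["rating"]

model = xgb.XGBRegressor()
model.fit(X, y)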

Closing Remarks

Embeddings are not just for NLP. This article showed a variety of ways to use them, often with little extra effort, to strengthen machine learning workflows by unlocking semantic similarity among examples, enabling richer interaction modeling, and producing compact, informative feature representations.


