mGrowTech

Decision Trees Aren’t Just for Tabular Data

by Josh
July 15, 2025
in Al, Analytics and Automation
Decision Trees Aren’t Just for Tabular Data
Image by Editor | ChatGPT

Introduction

Versatile, interpretable, and effective across a variety of use cases, decision trees have been among the most well-established machine learning techniques for decades, applied to both classification and regression tasks. They remain widely used today, whether as standalone models or as components of more powerful ensemble methods like random forests and gradient boosting machines.

And there is one more attractive feature that pushes the boundaries of their versatility even further: they can accommodate data in diverse formats, beyond just fully structured, tabular data. This article examines this facet of decision trees from a balanced theoretical and practical perspective.

Quick Overview of Decision Trees

Decision trees are supervised learning models for predictive tasks, namely classification and regression. They are trained on a set of labeled examples, i.e. data examples with known prediction outputs: for instance, the attributes of a set of collected animal specimens along with the species each specimen belongs to. The tree is built incrementally as the training data is recursively partitioned into subsets, with each split chosen to make the resulting subsets as homogeneous as possible in terms of class (or numerical label). Once trained, the model has learned a hierarchical set of decision rules applied to data attributes, visually represented as a tree (see image below).

Outline of a decision tree

Overview of a decision tree for penguin species classification
Image by Author
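
The partitioning step described above can be sketched in a few lines. This example assumes Gini impurity as the homogeneity measure (a common choice, though the article does not name a specific criterion) and searches a single numeric attribute for the threshold that produces the purest pair of subsets. The function names and the toy flipper-length values are hypothetical and chosen only for illustration.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 0.0 when every label in the subset is identical."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split on one attribute."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    best_t, best_score = None, np.inf
    uniq = np.unique(values)
    # Candidate thresholds: midpoints between consecutive distinct values
    for t in (uniq[:-1] + uniq[1:]) / 2:
        left, right = labels[values <= t], labels[values > t]
        # Impurity of the two subsets, weighted by their sizes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical toy data: flipper length (mm) and species
flipper = [181, 186, 190, 193, 210, 215, 220, 230]
species = ["Adelie"] * 4 + ["Gentoo"] * 4
threshold, impurity = best_split(flipper, species)
print(threshold, impurity)
```

A full tree builder would repeat this search over every attribute, pick the best one, and then recurse into each of the two resulting subsets until a stopping rule (such as a maximum depth) is met.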

Inference on an example with an unknown label consists of checking these rules or conditions from top to bottom until reaching a “leaf node”, which holds the predicted class or value, depending on whether the problem is classification or regression.
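
To make the top-to-bottom rule checking concrete, here is a minimal sketch using scikit-learn. It assumes the Iris dataset bundled with the library rather than the penguin data pictured above, simply because it ships with scikit-learn; the sample measurements are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Print the learned rules, read from the root (top) down to the leaves
rules = export_text(clf, feature_names=iris.feature_names)
print(rules)

# Inference: the model follows the rules for one new example
sample = [[5.0, 3.4, 1.5, 0.2]]  # hypothetical measurements
pred = clf.predict(sample)[0]
print(iris.target_names[pred])

# decision_path reports which nodes were visited on the way to the leaf
path = clf.decision_path(sample)
print(path.indices)
```

The printed rule text mirrors the tree diagram: each indented level is one condition deeper, and each "class" line is a leaf.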

Beyond Tabular Data in Decision Trees

Structured or tabular data, organized into instances (rows) described by numerical and categorical attributes (columns), is the typical data format digested by most classical machine learning models, including decision trees. However, decision trees can also accommodate datasets, or parts of them, that aren’t strictly tabular.

Common examples of non-tabular data include text, images, and time series. With suitable preprocessing, these data formats can be converted into a more structured form. For instance, a text sequence such as a customer review of a product can be turned into structured features through feature extraction or embeddings, and those features can then be fed to a decision tree classifier that predicts whether the review expresses positive or negative sentiment.
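
Since the practical example later in the article covers text, the sketch below illustrates the same idea for time series instead. It assumes a simple approach in which each series is summarized by a handful of statistics (mean, spread, range, linear trend) so that it becomes one fixed-length row. The synthetic generator, the choice of summary features, and the labels are all hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def make_series(trending, length=50):
    """Hypothetical generator: a noisy flat series or a noisy upward-trending one."""
    noise = rng.normal(0.0, 0.5, size=length)
    slope = 0.2 if trending else 0.0
    return slope * np.arange(length) + noise

def summarize(series):
    """Turn one series into a fixed-length row of tabular features."""
    t = np.arange(len(series))
    slope = np.polyfit(t, series, deg=1)[0]  # fitted linear trend
    return [series.mean(), series.std(), series.min(), series.max(), slope]

labels = np.array([0, 1] * 40)  # 0 = flat, 1 = trending
X = np.array([summarize(make_series(bool(y))) for y in labels])

# The tree never sees the raw series, only the extracted features
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X[:60], labels[:60])
accuracy = clf.score(X[60:], labels[60:])
print(X.shape, accuracy)
```

The two synthetic classes are deliberately easy to separate; real series usually call for richer features such as lags, rolling statistics, or spectral components.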

Another strategy to utilize decision trees in predictive tasks that contain partly unstructured data — for example, product data that combines tabular attributes with high-resolution images of that product — is to use hybrid solutions that combine a deep learning model with a decision tree. Take, for instance, a convolutional neural network (CNN) trained to extract features from an image in a structured format (inferring attributes like size, shape, colors, etc.), after which those image-based features are passed to a tree-based model like a random forest for calculating predictions, e.g. estimated product sales.
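
The shape of such a hybrid pipeline can be sketched as follows. Training or loading a real CNN is out of scope for a short snippet, so a small bank of fixed 3x3 filters followed by ReLU and global average pooling stands in for the network here; in practice, the activations of a trained CNN would take its place. The images, tabular attributes, and sales figures are synthetic and made up for illustration only.

```python
import numpy as np
from scipy.signal import convolve2d
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-in for a CNN (assumption): fixed filters instead of learned ones
FILTERS = [
    np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float),  # vertical edges
    np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float),  # horizontal edges
    np.ones((3, 3)) / 9.0,                                        # local average
]

def image_features(img):
    """Convolve, apply ReLU, then global-average-pool each filter response."""
    return [np.maximum(convolve2d(img, f, mode="valid"), 0).mean() for f in FILTERS]

# Synthetic products: a 16x16 grayscale image plus two tabular attributes
n = 200
brightness = rng.random(n)
images = rng.random((n, 16, 16)) * brightness[:, None, None]
price = rng.uniform(5, 50, size=n)
rating = rng.uniform(1, 5, size=n)
# Made-up relationship: brighter images and higher ratings sell more
sales = 100 * brightness + 10 * rating - price + rng.normal(0, 1, size=n)

X_img = np.array([image_features(img) for img in images])
X_tab = np.column_stack([price, rating])
X = np.hstack([X_img, X_tab])  # image-derived features next to tabular ones

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:150], sales[:150])
r2 = model.score(X[150:], sales[150:])
print(X.shape, round(r2, 3))
```

The key design point is the concatenation step: once the image has been reduced to a feature vector, the tree-based model treats those values like any other columns.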

In research circles, there have been efforts to adapt decision tree-based models to digest non-tabular data such as graphs and hierarchical data directly, although mainstream use of these approaches is still rare.

Practical Example

To wrap up with a bit of practical flavor, we will illustrate how to train a decision tree-based model on a dataset that combines purely tabular and text data.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix

# Load the customer support dataset
url = "https://raw.githubusercontent.com/gakudo-ai/open-datasets/refs/heads/main/customer_support_dataset.csv"
df = pd.read_csv(url)

# Drop rows with missing values in any column used below
df = df.dropna(subset=['prior_tickets', 'account_age_days', 'text', 'label'])

# Text attribute: TF-IDF over unigrams and bigrams, capped at 1000 features
text_vec = TfidfVectorizer(max_features=1000, ngram_range=(1, 2), stop_words='english')
X_text = text_vec.fit_transform(df['text'])

# Numeric attributes, stacked column-wise next to the sparse text features
X_num = df[['prior_tickets', 'account_age_days']].values
X = hstack([X_text, X_num])

# Encode the string labels as integers
le = LabelEncoder()
y = le.fit_transform(df['label'])

# Hold out 20% of the rows for testing, then train the tree
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(max_depth=6, random_state=42)
clf.fit(X_train, y_train)

# Predict on the test set and map integer labels back to their names
y_pred = clf.predict(X_test)
y_test_labels = le.inverse_transform(y_test)
y_pred_labels = le.inverse_transform(y_pred)

print(classification_report(y_test_labels, y_pred_labels, zero_division=0))

# Plot the confusion matrix as a heatmap
cm = confusion_matrix(y_test_labels, y_pred_labels, labels=le.classes_)
sns.heatmap(cm, annot=True, fmt='d', xticklabels=le.classes_, yticklabels=le.classes_, cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

In essence, this code does the following:

  • loads a dataset with three predictor attributes describing customer support tickets
  • treats one of them, the ticket text, separately, since text must be preprocessed before being input to a decision tree model
  • applies a TF-IDF vectorizer to obtain a vector representation of each text
  • merges the resulting text features with the numeric features, then trains a decision tree classifier and evaluates it on a test set

If you run this code, you may be disappointed by the model’s performance (roughly as many correct predictions as incorrect ones). This is expected: the dataset is small, with just 100 instances, and learning from text representations typically requires many more.
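
With so little data, a single 80/20 split also gives a noisy estimate, and the listing above fits the TF-IDF vectorizer on the full dataset before splitting. One possible variation, sketched here under the assumption that scikit-learn's Pipeline and ColumnTransformer are acceptable substitutes for the manual stacking, wraps preprocessing and model together and evaluates them with cross-validation. The small DataFrame is a hypothetical stand-in that only reuses the dataset's column names; its rows are repeated to reach a workable size, so the resulting score says nothing about the real data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data with the same column names as the dataset above
df = pd.DataFrame({
    "text": ["cannot log in to my account", "refund for double charge",
             "password reset link broken", "charged twice this month",
             "login page keeps failing", "invoice amount is wrong"] * 5,
    "prior_tickets": [0, 3, 1, 4, 0, 2] * 5,
    "account_age_days": [30, 400, 12, 900, 45, 365] * 5,
    "label": ["access", "billing"] * 15,
})

features = ColumnTransformer([
    # A string (not a list) selects a 1-D column, which TfidfVectorizer expects
    ("text", TfidfVectorizer(ngram_range=(1, 2), stop_words="english"), "text"),
    ("num", "passthrough", ["prior_tickets", "account_age_days"]),
])

pipe = Pipeline([
    ("features", features),
    ("tree", DecisionTreeClassifier(max_depth=6, random_state=42)),
])

# Each fold refits the vectorizer on its own training portion only
scores = cross_val_score(pipe, df.drop(columns="label"), df["label"], cv=5)
print(scores.mean())
```

To try this on the real dataset, replace the stand-in DataFrame with the one loaded from the URL in the listing above.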

Conclusion

This article discussed the ability of decision trees and decision tree-based machine learning models like random forests to accommodate data that is not strictly tabular. From text to images to time series, data can be preprocessed, or models combined, so that tree-based methods can handle inputs that many would at first glance consider impossible for them to digest.


