Thursday, March 12, 2026
mGrowTech

Decision Trees Aren’t Just for Tabular Data

by Josh
July 15, 2025
in AI, Analytics and Automation

Introduction

Versatile, interpretable, and effective across a variety of use cases, decision trees have been among the most well-established machine learning techniques for decades, applied to both classification and regression tasks. They remain widely used today, whether as standalone models or as components of more powerful ensemble methods like random forests and gradient boosting machines.

One more feature extends their versatility even further: they can accommodate data in diverse formats, beyond just fully structured, tabular data. This article examines this facet of decision trees from both a theoretical and a practical angle.

Quick Overview of Decision Trees

Decision trees are supervised learning models for predictive tasks, namely classification and regression. They are trained on a set of labeled examples, i.e. examples with known prediction outputs: for instance, the attributes of a set of collected animal specimens along with the species each specimen belongs to. The tree is built step by step as the training data is recursively partitioned into subsets, with each split chosen to make the resulting subsets as homogeneous as possible in their class (or numerical label). Once trained, the model has learned a hierarchical set of decision rules applied to data attributes, visually represented as a tree (see image below).

Figure: Overview of a decision tree for penguin species classification (image by author)
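To make the idea of "seeking homogeneity" concrete, here is a minimal sketch of how one candidate split might be scored. It assumes Gini impurity as the homogeneity measure (one common choice, not necessarily the one used for the tree pictured), and the flipper lengths and species labels are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Try midpoints between sorted unique values.

    Returns (threshold, weighted impurity of the two resulting subsets).
    """
    best = (None, gini(labels))
    uniq = sorted(set(values))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Made-up flipper lengths (mm) and species, for illustration only
flipper = [181, 186, 190, 195, 210, 215, 220, 230]
species = ["Adelie"] * 4 + ["Gentoo"] * 4

threshold, impurity = best_threshold(flipper, species)
print(threshold, impurity)  # 202.5 0.0
```

A full tree learner would repeat this search over every attribute, pick the best split, and then recurse into each of the two subsets.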

Inference, i.e. predicting the label of a new example, consists of checking these rules or conditions from top to bottom until reaching a “leaf node”, which holds a class prediction for classification problems or a numerical value for regression problems.
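The top-to-bottom rule checking described above can be sketched with a small hand-written tree. The nested-dictionary layout, the attribute names, and the thresholds are all hypothetical, chosen only to mirror the penguin example; a trained model would learn its own rules from data.

```python
# A hand-written tree: internal nodes test one attribute against a threshold,
# leaves carry the predicted class. All thresholds are illustrative only.
tree = {
    "feature": "flipper_length_mm", "threshold": 206.0,
    "left": {
        "feature": "bill_length_mm", "threshold": 43.0,
        "left": {"leaf": "Adelie"},
        "right": {"leaf": "Chinstrap"},
    },
    "right": {"leaf": "Gentoo"},
}

def predict(node, example):
    """Follow the rules from the root until a leaf is reached."""
    while "leaf" not in node:
        branch = "left" if example[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

penguin = {"flipper_length_mm": 195.0, "bill_length_mm": 48.5}
print(predict(tree, penguin))  # Chinstrap
```

For regression, the leaves would simply store a number (for example, the mean target value of the training examples that reached that leaf) instead of a class name.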

Beyond Tabular Data in Decision Trees

Structured or tabular data organized into instances (rows) that are described by numerical and categorical attributes (columns) constitutes the typical data format digested by most classical machine learning models, including decision trees. However, decision trees can also accommodate datasets, or parts of them, that aren’t strictly tabular.

Common examples of non-tabular data include text, images, and time series. Through suitable preprocessing techniques, these data formats can be converted into a more structured form. For instance, a text sequence such as a customer review of a product can be given structure through feature extraction or embeddings, and the resulting features can then be fed to a decision tree classifier that predicts whether the sentiment behind the review is positive or negative.
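As a rough illustration of this kind of preprocessing for time series, the sketch below reduces each series to a handful of summary statistics and trains a tree on the resulting table. The `summarize` helper, the chosen statistics, and the synthetic data are assumptions made for this example, not a prescribed recipe.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def summarize(series):
    """Turn one 1-D series into a fixed-length row of summary features."""
    series = np.asarray(series, dtype=float)
    # Slope of a least-squares line through the series captures its trend
    slope = np.polyfit(np.arange(series.size), series, deg=1)[0]
    return [series.mean(), series.std(), series.min(), series.max(), slope]

rng = np.random.default_rng(0)
t = np.arange(50)
# Synthetic data: class 0 is flat noise, class 1 has an upward trend
flat = [rng.normal(0.0, 1.0, size=50) for _ in range(30)]
rising = [0.2 * t + rng.normal(0.0, 1.0, size=50) for _ in range(30)]

# Each series becomes one row, so the tree sees an ordinary table
X = np.array([summarize(s) for s in flat + rising])
y = np.array([0] * 30 + [1] * 30)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(X.shape)          # (60, 5)
print(clf.score(X, y))  # accuracy on the training data
```

The same pattern applies to text and images: whatever the raw format, the goal is one fixed-length feature row per example.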

Another strategy for predictive tasks that involve partly unstructured data (for example, product data that combines tabular attributes with high-resolution images of the product) is a hybrid solution that combines a deep learning model with a decision tree. Take, for instance, a convolutional neural network (CNN) trained to extract structured features from an image (inferring attributes like size, shape, colors, etc.). Those image-based features are then passed to a tree-based model like a random forest to compute predictions, e.g. estimated product sales.
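The data flow of such a hybrid can be sketched as follows. To keep the example self-contained, `extract_image_features` is a hand-written placeholder standing in for a trained CNN, and the images, tabular columns, and sales target are all synthetic; only the overall shape of the pipeline (image features plus tabular attributes feeding a random forest) reflects the approach described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def extract_image_features(image):
    """Placeholder for a CNN: returns a few hand-computed image descriptors.

    In a real hybrid setup this function would run a trained network and
    return its learned feature vector instead.
    """
    brightness = image.mean()
    contrast = image.std()
    # Mean absolute horizontal gradient as a crude "edge strength"
    edges = np.abs(np.diff(image, axis=1)).mean()
    return np.array([brightness, contrast, edges])

rng = np.random.default_rng(42)
n_products = 200
images = rng.random((n_products, 16, 16))   # synthetic grayscale images
tabular = np.column_stack([
    rng.uniform(5, 100, n_products),        # hypothetical price
    rng.integers(0, 5, n_products),         # hypothetical category id
])
image_features = np.array([extract_image_features(img) for img in images])

# Combine both sources column-wise into a single feature matrix
X = np.hstack([tabular, image_features])
# Synthetic target, made up purely so the example has something to fit
y = 500 - 3 * tabular[:, 0] + 200 * image_features[:, 0] + rng.normal(0, 5, n_products)

model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)
print(X.shape)  # (200, 5)
print(model.predict(X[:3]))
```

Swapping the placeholder for the output of a real network would leave the rest of the code unchanged, since the forest only ever sees a numeric feature matrix.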

In research settings, there have been efforts to adapt decision tree-based models to directly handle non-tabular data such as graphs and hierarchical data, although mainstream use of these approaches is still rare.

Practical Example

To wrap up with a bit of practical flavor, we will illustrate how to train a decision tree-based model on a dataset that combines purely tabular and text data.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.sparse import hstack

# Load the customer support dataset
url = "https://raw.githubusercontent.com/gakudo-ai/open-datasets/refs/heads/main/customer_support_dataset.csv"
df = pd.read_csv(url)

# Drop rows with missing values in any column used below
df = df.dropna(subset=['prior_tickets', 'account_age_days', 'text', 'label'])

# Convert the free-text column into a sparse TF-IDF matrix (unigrams and bigrams)
text_vec = TfidfVectorizer(max_features=1000, ngram_range=(1, 2), stop_words='english')
X_text = text_vec.fit_transform(df['text'])

# Append the two numeric columns to the text features
X_num = df[['prior_tickets', 'account_age_days']].values
X = hstack([X_text, X_num]).tocsr()  # CSR format supports row indexing

# Encode the string labels as integers
le = LabelEncoder()
y = le.fit_transform(df['label'])

# Hold out 20% of the rows for testing and train the tree
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(max_depth=6, random_state=42)
clf.fit(X_train, y_train)

# Predict on the test set and map integer predictions back to label names
y_pred = clf.predict(X_test)
y_test_labels = le.inverse_transform(y_test)
y_pred_labels = le.inverse_transform(y_pred)

print(classification_report(y_test_labels, y_pred_labels, zero_division=0))

# Plot the confusion matrix as a heatmap
cm = confusion_matrix(y_test_labels, y_pred_labels, labels=le.classes_)
sns.heatmap(cm, annot=True, fmt='d', xticklabels=le.classes_, yticklabels=le.classes_, cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

In essence, this code does the following:

  • uses a dataset that contains three predictor attributes describing customer support tickets
  • one of them is text, which needs to be preprocessed before being input to a decision tree model
  • a TF-IDF vectorizer is employed to obtain a vector representation of each text
  • afterwards, this new feature is merged with the other features to train a decision tree classifier and evaluate it on a test set

If you run this code, the model’s performance will likely disappoint: roughly as many correct predictions as incorrect ones. This is expected, as the dataset is small, with just 100 instances, and learning from text representations typically requires more data.
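One detail of the listing above is worth noting: the TF-IDF vectorizer is fitted on all rows before the train/test split, so its vocabulary and term weights are partly learned from the test rows. A possible alternative, sketched below, wraps the preprocessing and the tree in a scikit-learn `Pipeline` with a `ColumnTransformer` so that the vectorizer is fitted on the training rows only. The column names match the original dataset, but the eight rows and the two label values are invented here so the example runs without a download.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up stand-in for the customer support dataset (same column names)
df = pd.DataFrame({
    "text": [
        "cannot log in to my account", "password reset link is broken",
        "charged twice this month", "refund has not arrived yet",
        "login page keeps crashing", "invoice shows the wrong amount",
        "locked out after password change", "billing address will not update",
    ],
    "prior_tickets": [0, 2, 1, 3, 0, 4, 1, 2],
    "account_age_days": [30, 400, 120, 800, 15, 950, 60, 365],
    "label": ["access", "access", "billing", "billing",
              "access", "billing", "access", "billing"],
})

features = ColumnTransformer([
    # A single column name (not a list) hands the vectorizer a 1-D series of strings
    ("text", TfidfVectorizer(ngram_range=(1, 2), stop_words="english"), "text"),
    ("num", "passthrough", ["prior_tickets", "account_age_days"]),
])
model = Pipeline([
    ("features", features),
    ("tree", DecisionTreeClassifier(max_depth=6, random_state=42)),
])

train_df, test_df = train_test_split(
    df, test_size=0.25, random_state=42, stratify=df["label"]
)
# The vectorizer's vocabulary is learned from the training rows only
model.fit(train_df.drop(columns="label"), train_df["label"])
predictions = model.predict(test_df.drop(columns="label"))
print(list(predictions))
```

With so few rows the predictions themselves mean little; the point is the structure, which carries over unchanged to the full dataset by replacing the toy DataFrame with the one loaded from the CSV.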

Conclusion

This article discussed how decision trees and decision tree-based machine learning models like random forests can accommodate data that is not strictly tabular. From text to images to time series, data can be preprocessed, and models can be combined, to handle formats that at first glance might seem impossible for a tree to digest.


