mGrowTech

Decision Trees Aren’t Just for Tabular Data

by Josh
July 15, 2025
in AI, Analytics and Automation
Image by Editor | ChatGPT

Introduction

Versatile, interpretable, and effective for a variety of use cases, decision trees have been among the most well-established machine learning techniques for classification and regression tasks for decades. They remain widely used today, whether as standalone models or as components of more powerful ensemble methods such as random forests and gradient boosting machines.


There is one more attractive feature that pushes their versatility even further: they can accommodate data in diverse formats, beyond fully structured, tabular data. This article examines this facet of decision trees from both a theoretical and a practical perspective.

Quick Overview of Decision Trees

Decision trees are a type of supervised learning model for predictive tasks, namely classification and regression. They are trained on a set of labeled examples, i.e., examples whose target outputs are known: for instance, the measured attributes of a collection of animal specimens, along with the species each specimen belongs to. The tree is built gradually as the training data is recursively partitioned into subsets, with each split chosen to make the resulting subsets as homogeneous as possible in their class (or numerical target value). Once trained, the model has learned a hierarchical set of decision rules applied to data attributes, visually represented as a tree (see image below).


Overview of a decision tree for penguin species classification
Image by Author
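To make "as much homogeneity as possible per subset" concrete, here is a minimal sketch that scores candidate splits on one numerical feature with Gini impurity, which is the default `criterion` of scikit-learn's `DecisionTreeClassifier`. It is a single-feature, single-split illustration rather than scikit-learn's implementation; the helper names are hypothetical, and the flipper lengths and species labels are made-up values, not the data behind the figure.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure subset, larger for mixed subsets."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try a threshold between each pair of consecutive distinct values and
    return (threshold, weighted impurity) for the purest split found."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    n = len(pairs)
    best = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        # Weight each side's impurity by the fraction of examples it receives
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical flipper lengths (mm) and species labels
flipper = [181, 186, 190, 195, 210, 217, 221, 230]
species = ["Adelie"] * 4 + ["Gentoo"] * 4

print(gini(species))                      # impurity before splitting
print(best_threshold(flipper, species))   # threshold that yields pure subsets
```

A full tree builder repeats this search over every feature at every node and then recurses into the two resulting subsets until a stopping rule (such as `max_depth`) is reached.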

To predict the label of a new example, inference checks these rules or conditions from top to bottom, following one branch at each node until it reaches a “leaf node” that holds the predicted class (for classification) or numerical value (for regression).
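The top-to-bottom rule checking can be reproduced by walking a fitted scikit-learn tree by hand through its low-level `tree_` arrays (`children_left`, `children_right`, `feature`, `threshold`, `value`). This sketch uses the Iris dataset bundled with scikit-learn as a stand-in for the penguin example, and `predict_one` is a hypothetical helper for illustration; in practice you would simply call `clf.predict`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)


def predict_one(tree, classes, x):
    """Start at the root and apply one split rule per node until a leaf."""
    x = np.asarray(x, dtype=np.float32)  # scikit-learn stores inputs as float32
    node = 0  # the root node has index 0
    while tree.children_left[node] != -1:  # -1 marks a leaf node
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.children_left[node]
        else:
            node = tree.children_right[node]
    # The leaf's per-class statistics decide the predicted class
    return classes[np.argmax(tree.value[node][0])]


manual = np.array([predict_one(clf.tree_, clf.classes_, row) for row in X])
print((manual == clf.predict(X)).all())
```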

Beyond Tabular Data in Decision Trees

Structured or tabular data, organized into instances (rows) described by numerical and categorical attributes (columns), is the typical data format digested by most classical machine learning models, including decision trees. However, decision trees can also accommodate datasets, or parts of datasets, that aren’t strictly tabular.

Common examples of non-tabular data include text, images, and time series. With suitable preprocessing, these formats can be converted into a structured form. For instance, a text sequence such as a customer review of a product can be turned into numerical features through feature extraction or embeddings, which are then used as inputs to a decision tree classifier that predicts whether the review’s sentiment is positive or negative.
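As one illustration of such preprocessing for time series, each series can be summarized by a handful of statistics (mean, spread, extremes, and the slope of a fitted line) that become ordinary tabular columns for a decision tree. This is a minimal sketch: the data is synthetic, the helper `summarize` is hypothetical, and the choice of summary features is an assumption rather than a recommended feature set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def summarize(series):
    """Turn one raw series into a fixed-length row of tabular features."""
    t = np.arange(len(series))
    slope = np.polyfit(t, series, deg=1)[0]  # slope of a least-squares line
    return [series.mean(), series.std(), series.min(), series.max(), slope]


# Synthetic data: class 1 series drift upward, class 0 series are pure noise
rng = np.random.default_rng(0)
n_series, length = 200, 50
y = rng.integers(0, 2, size=n_series)
t = np.arange(length)
raw = rng.normal(0.0, 1.0, size=(n_series, length)) + np.outer(y, 0.1 * t)

# Each series becomes one row with five numerical columns
X = np.array([summarize(s) for s in raw])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The tree never sees the raw sequence; it only sees the summary columns, which is exactly the "convert to a structured form" step described above.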

Another strategy for using decision trees in predictive tasks that involve partly unstructured data (for example, product data that combines tabular attributes with high-resolution images of the product) is to build a hybrid solution that combines a deep learning model with a decision tree. Take, for instance, a convolutional neural network (CNN) trained to extract structured features from an image (inferring attributes such as size, shape, and color), after which those image-based features are passed, alongside the tabular attributes, to a tree-based model such as a random forest that produces the final prediction, e.g., estimated product sales.
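The hybrid pattern can be sketched as follows, with one loud simplification: running a real CNN requires a deep learning framework and trained weights, so the hypothetical `extract_image_features` function stands in for it by computing per-channel color means and standard deviations. The images, prices, and sales figures are all synthetic; only the downstream `RandomForestRegressor` is a real scikit-learn model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def extract_image_features(images):
    """Hypothetical stand-in for a CNN feature extractor.

    images: array of shape (n, height, width, 3) with values in [0, 1].
    Returns per-channel mean and standard deviation: 6 columns per image.
    """
    means = images.mean(axis=(1, 2))
    stds = images.std(axis=(1, 2))
    return np.hstack([means, stds])


# Synthetic products: random 16x16 RGB images, some with a redder tint
rng = np.random.default_rng(42)
n = 300
images = rng.random((n, 16, 16, 3))
redness = rng.random(n)
images[..., 0] = images[..., 0] * 0.5 + redness[:, None, None] * 0.5
price = rng.uniform(5, 50, size=n)  # a tabular attribute

# Synthetic target: sales fall with price and rise with redness
sales = 100 - 1.5 * price + 40 * redness + rng.normal(0, 5, size=n)

# Image-derived features and tabular attributes side by side
X = np.hstack([extract_image_features(images), price[:, None]])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:240], sales[:240])
r2 = model.score(X[240:], sales[240:])  # R^2 on the held-out products
print(f"Held-out R^2: {r2:.2f}")
```

Replacing the stand-in with real CNN embeddings would only change `extract_image_features`; the tree-based model downstream stays the same.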

In research, there have also been efforts to adapt decision tree-based models so that they digest non-tabular data such as graphs and hierarchical data directly, although such approaches are still rare in mainstream applications.

Practical Example

To wrap up on a practical note, we will illustrate how to train a decision tree-based model on a dataset that combines tabular and text data.


```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Load the customer support tickets dataset
url = "https://raw.githubusercontent.com/gakudo-ai/open-datasets/refs/heads/main/customer_support_dataset.csv"
df = pd.read_csv(url)

# Drop rows missing any predictor or the label
df = df.dropna(subset=["prior_tickets", "account_age_days", "text", "label"])

# Encode the string class labels as integers
le = LabelEncoder()
y = le.fit_transform(df["label"])

# Split first, so the TF-IDF vocabulary is learned from training text only
df_train, df_test, y_train, y_test = train_test_split(df, y, test_size=0.2, random_state=42)

# Turn the free-text field into TF-IDF features (unigrams and bigrams)
text_vec = TfidfVectorizer(max_features=1000, ngram_range=(1, 2), stop_words="english")
X_text_train = text_vec.fit_transform(df_train["text"])
X_text_test = text_vec.transform(df_test["text"])

# Append the numerical columns to the sparse text features
num_cols = ["prior_tickets", "account_age_days"]
X_train = hstack([X_text_train, csr_matrix(df_train[num_cols].values)]).tocsr()
X_test = hstack([X_text_test, csr_matrix(df_test[num_cols].values)]).tocsr()

# Train a shallow decision tree on the combined features
clf = DecisionTreeClassifier(max_depth=6, random_state=42)
clf.fit(X_train, y_train)

# Evaluate on the test set, reporting results with the original label names
y_pred = clf.predict(X_test)
y_test_labels = le.inverse_transform(y_test)
y_pred_labels = le.inverse_transform(y_pred)
print(classification_report(y_test_labels, y_pred_labels, zero_division=0))

# Plot the confusion matrix
cm = confusion_matrix(y_test_labels, y_pred_labels, labels=le.classes_)
sns.heatmap(cm, annot=True, fmt="d", xticklabels=le.classes_, yticklabels=le.classes_, cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
```

In essence, this code does the following:

  • loads a dataset with three predictor attributes describing customer support tickets: prior_tickets, account_age_days, and text
  • preprocesses the text attribute, since raw text cannot be fed directly to a decision tree model
  • uses a TF-IDF vectorizer to obtain a vector representation of each ticket’s text
  • merges these text features with the two numerical features, trains a decision tree classifier, and evaluates it on a held-out test set

If you run this code, you may be disappointed by the model’s performance: roughly as many incorrect predictions as correct ones. This is expected, since the dataset is small (just 100 instances), and learning from text representations typically requires many more examples.
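With only about 100 instances, a single 80/20 split leaves roughly 20 test examples, so the measured accuracy can swing noticeably from one split to another. One alternative, not the author’s method, is to wrap the same preprocessing in a scikit-learn `Pipeline` with a `ColumnTransformer` and estimate accuracy with stratified k-fold cross-validation, which also refits the TF-IDF vectorizer inside each training fold. The function name `cross_validated_accuracy` is hypothetical, and the column names are assumed to match the dataset used above.

```python
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def cross_validated_accuracy(df, n_splits=5):
    """Return per-fold accuracy of a TF-IDF + numerical decision tree pipeline."""
    features = ColumnTransformer([
        # A single column name (not a list) passes 1-D text to the vectorizer
        ("text", TfidfVectorizer(max_features=1000, ngram_range=(1, 2),
                                 stop_words="english"), "text"),
        # Numerical columns are passed through unchanged
        ("num", "passthrough", ["prior_tickets", "account_age_days"]),
    ])
    pipe = Pipeline([
        ("features", features),
        ("tree", DecisionTreeClassifier(max_depth=6, random_state=42)),
    ])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    X = df[["text", "prior_tickets", "account_age_days"]]
    return cross_val_score(pipe, X, df["label"], cv=cv, scoring="accuracy")
```

Calling `cross_validated_accuracy(df)` on the cleaned DataFrame returns one accuracy per fold; their mean and spread give a more reliable picture than a single split. The classifier accepts the string labels directly, so no `LabelEncoder` is needed here.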

Conclusion

This article discussed how decision trees, and tree-based machine learning models such as random forests, can accommodate data that is not strictly tabular. From text to images to time series, data can be preprocessed, or models combined, so that tree-based models can learn from data that many would at first glance consider impossible for them to digest.
