mGrowTech
Decision Trees Aren’t Just for Tabular Data

by Josh
July 15, 2025
Introduction

Versatile, interpretable, and effective across a variety of use cases, decision trees have been among the most well-established machine learning techniques for decades. They remain widely used for classification and regression tasks, whether as standalone models or as components of more powerful ensemble methods like random forests and gradient boosting machines.

One more feature extends their versatility even further: decision trees can accommodate data in diverse formats, beyond fully structured, tabular data. This article examines this facet of decision trees from both a theoretical and a practical angle.

Quick Overview of Decision Trees

Decision trees are a type of supervised learning model for predictive tasks, namely classification and regression. They are trained on a set of labeled examples, i.e. data examples with known prediction outputs: for instance, a set of collected animal specimens' attributes along with the species each observation belongs to. The tree is built gradually as the training data is recursively partitioned into subsets, seeking as much class (or numerical label) homogeneity as possible within each subset. Once trained, the model has learned a hierarchical set of decision rules applied to data attributes, visually represented as a tree (see image below).
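To make the idea of "seeking homogeneity" concrete, here is a minimal sketch of how a single split could be chosen. It assumes Gini impurity as the homogeneity measure (one common choice; the article does not name a specific criterion), and the flipper lengths and species labels are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split 'value <= threshold'."""
    best = (None, gini(labels))  # baseline: impurity with no split at all
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Impurity of the two subsets, weighted by subset size
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical flipper lengths (mm) and species, invented for illustration
flipper_mm = [181, 186, 190, 195, 210, 215, 220, 230]
species = ["Adelie"] * 4 + ["Gentoo"] * 4

threshold, impurity = best_threshold(flipper_mm, species)
print(threshold, impurity)
```

A full tree learner would repeat this search over every attribute, pick the best one, and then recurse into each resulting subset until a stopping condition (such as a maximum depth) is met.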

[Figure: Overview of a decision tree for penguin species classification. Image by Author]

Inference on an example with an unknown label consists of checking these rules or conditions from top to bottom until reaching a “leaf node”, which holds the predicted class or value, depending on whether the problem entails classification or regression.
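The top-to-bottom rule checking can be sketched in a few lines. The tree below is written by hand as nested dictionaries; its feature names and thresholds are hypothetical, loosely following the penguin example, and are not taken from a trained model.

```python
# Internal nodes test "feature <= threshold"; leaves hold the prediction.
# All feature names and thresholds here are hypothetical.
tree = {
    "feature": "flipper_length_mm", "threshold": 206.0,
    "left": {
        "feature": "bill_length_mm", "threshold": 43.0,
        "left": {"leaf": "Adelie"},
        "right": {"leaf": "Chinstrap"},
    },
    "right": {"leaf": "Gentoo"},
}

def predict(node, example):
    """Follow the rules from the root to a leaf; return (prediction, path taken)."""
    path = []
    while "leaf" not in node:
        goes_left = example[node["feature"]] <= node["threshold"]
        path.append(f'{node["feature"]} {"<=" if goes_left else ">"} {node["threshold"]}')
        node = node["left"] if goes_left else node["right"]
    return node["leaf"], path

label, path = predict(tree, {"flipper_length_mm": 195, "bill_length_mm": 48})
print(label)
print(path)
```

The returned path is the list of conditions that led to the prediction, which is what makes a single tree easy to interpret.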

Beyond Tabular Data in Decision Trees

Structured or tabular data, organized into instances (rows) described by numerical and categorical attributes (columns), is the typical data format digested by most classical machine learning models, including decision trees. However, decision trees can also accommodate datasets, or parts of them, that aren’t strictly tabular.

Common examples of non-tabular data include text, images, and time series. Through suitable preprocessing techniques, these data formats can be converted into a more structured form. For instance, a text sequence like a customer review of a product can be made structured through feature extraction or embeddings, and the result can then be used as input to a decision tree classifier that predicts whether the sentiment behind the review is positive or negative.
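Text is covered in the practical example later in this article. For time series, a comparable sketch is shown here: each variable-length sequence is reduced to a fixed set of summary statistics, giving one tabular row per series. The choice of statistics and the sample readings are assumptions made for illustration, not a prescribed recipe.

```python
from statistics import mean, pstdev

def summarize(series):
    """Turn a variable-length numeric sequence into a fixed-length feature row."""
    n = len(series)
    # Least-squares slope against the time index 0..n-1 captures the overall trend
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return {
        "mean": y_mean,
        "std": pstdev(series),
        "min": min(series),
        "max": max(series),
        "slope": num / den if den else 0.0,
    }

# Hypothetical sensor readings of different lengths, invented for illustration
readings = {
    "rising": [1.0, 2.0, 3.0, 4.0],
    "flat": [5.0, 5.0, 5.0, 5.0, 5.0],
}
rows = {name: summarize(values) for name, values in readings.items()}
for name, row in rows.items():
    print(name, row)
```

Every row has the same five columns regardless of the original series length, so the rows can be stacked into an ordinary feature table and passed to a tree-based model.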

Another strategy for predictive tasks that involve partly unstructured data (for example, product data that combines tabular attributes with high-resolution images of the product) is to use a hybrid solution that combines a deep learning model with a tree-based model. For instance, a convolutional neural network (CNN) can be trained to extract structured features from an image (attributes like size, shape, and colors), and those image-based features are then passed to a tree-based model such as a random forest to compute predictions, e.g. estimated product sales.
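The shape of such a hybrid pipeline can be sketched without a deep learning framework. In the sketch below, PCA stands in for the CNN feature extractor (a deliberate simplification, not what a production system would use), scikit-learn's bundled digits images stand in for product photos, the two tabular columns are synthetic noise, and the task is digit classification rather than sales estimation. Only the wiring is the point: extract image features, concatenate them with tabular columns, train a random forest.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()                    # small 8x8 grayscale images, bundled with scikit-learn
images, y = digits.data, digits.target    # each image is flattened to 64 pixel values

rng = np.random.default_rng(0)
tabular = rng.normal(size=(len(y), 2))    # synthetic placeholder for real tabular attributes

idx_train, idx_test = train_test_split(
    np.arange(len(y)), test_size=0.2, random_state=0, stratify=y
)

# Stand-in for a CNN: fit the feature extractor on training images only
extractor = PCA(n_components=16, random_state=0).fit(images[idx_train])

def build_features(idx):
    """Concatenate image-derived features with the tabular columns."""
    return np.hstack([extractor.transform(images[idx]), tabular[idx]])

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(build_features(idx_train), y[idx_train])
accuracy = forest.score(build_features(idx_test), y[idx_test])
print(build_features(idx_test).shape, round(accuracy, 3))
```

With a real CNN, `extractor.transform` would be replaced by a forward pass that returns the network's learned feature vector for each image; the concatenation and the forest would stay the same.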

In research, there have been efforts to adapt decision tree-based models to digest non-tabular data such as graphs and hierarchical data directly, although their mainstream application is still rare.

Practical Example

To wrap up with a bit of practical flavor, we will illustrate how to train a decision tree-based model on a dataset that combines purely tabular and text data.

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from scipy.sparse import hstack
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from sklearn.tree import DecisionTreeClassifier

    # Load the customer support dataset
    url = "https://raw.githubusercontent.com/gakudo-ai/open-datasets/refs/heads/main/customer_support_dataset.csv"
    df = pd.read_csv(url)

    # Drop rows with missing values in any column we use
    df = df.dropna(subset=['prior_tickets', 'account_age_days', 'text', 'label'])

    # Turn the free-text column into a sparse TF-IDF matrix (unigrams and bigrams).
    # Note: for brevity the vectorizer is fit on all rows before the split; in practice,
    # fit it on the training split only so test text does not leak into the vocabulary.
    text_vec = TfidfVectorizer(max_features=1000, ngram_range=(1, 2), stop_words='english')
    X_text = text_vec.fit_transform(df['text'])

    # Append the two numeric columns to the text features
    X_num = df[['prior_tickets', 'account_age_days']].values
    X = hstack([X_text, X_num]).tocsr()

    # Encode the string labels as integers
    le = LabelEncoder()
    y = le.fit_transform(df['label'])

    # Hold out 20% of the rows for testing and train a depth-limited tree
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    clf = DecisionTreeClassifier(max_depth=6, random_state=42)
    clf.fit(X_train, y_train)

    # Predict on the test set and map the integers back to the original label names
    y_pred = clf.predict(X_test)
    y_test_labels = le.inverse_transform(y_test)
    y_pred_labels = le.inverse_transform(y_pred)

    print(classification_report(y_test_labels, y_pred_labels, zero_division=0))

    # Plot the confusion matrix as a heatmap
    cm = confusion_matrix(y_test_labels, y_pred_labels, labels=le.classes_)
    sns.heatmap(cm, annot=True, fmt='d', xticklabels=le.classes_, yticklabels=le.classes_, cmap='Blues')
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title('Confusion Matrix')
    plt.show()

In essence, this code does the following:

  • loads a dataset with three predictor attributes describing customer support tickets
  • preprocesses the one attribute that is text, since raw text cannot be fed directly to a decision tree model
  • applies a TF-IDF vectorizer to obtain a vector representation of each text
  • merges these text features with the other features, then trains a decision tree classifier and evaluates it on a test set

If you run this code, the model's performance will likely disappoint: roughly as many correct predictions as incorrect ones. This is expected, as we are using a small dataset with just 100 instances, and learning from text representations typically requires more instances.

Conclusion

This article discussed how decision trees, and decision tree-based machine learning models like random forests, can accommodate data that is not strictly tabular. From text to images to time series, data can be preprocessed, and models can be combined, to handle inputs that many would at first glance consider impossible for a tree to digest.


