06 - Decision Trees - lec. 7,8
ucla | CS M146 | 2023-04-24T13:58
Table of Contents
Supplemental
Lecture
Decision Tree
- each internal node → tests one feature $\bm x_j^{(i)}$
- each branch/edge → selects one value for $\bm x_j^{(i)}$
- each leaf node → predicts $y^{(i)}$
- if features are continuous → internal nodes can test against a threshold
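The node/branch/leaf roles above can be sketched as a prediction routine; the nested-dict tree representation used here is my own assumption, not the lecture's:

```python
def predict(tree, x):
    """Walk a decision tree stored as {feature: {value: subtree}}.

    Internal nodes test one feature, branches select its value, and
    leaves (any non-dict node) hold the predicted label. A continuous
    feature would instead store a threshold and branch on x[f] <= t.
    """
    while isinstance(tree, dict):
        feature = next(iter(tree))        # the single feature tested at this node
        tree = tree[feature][x[feature]]  # follow the branch matching x's value
    return tree
```

For example, `predict({'outlook': {'sunny': 'no', 'rain': 'yes'}}, {'outlook': 'rain'})` returns `'yes'`.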
I/O
![]()
Decision Boundary
![]()
Example from Introspection
![]()
Optimizing for the Best Decision Tree
- Occam’s razor: the simplest consistent answer is the best
- i.e. the smallest tree that correctly classifies all training examples
- but finding the provably smallest tree is NP-hard → instead construct one that is reasonably small
Iterative Dichotomiser 3 (ID3) Algorithm
![]()
Choosing Best Attribute
- choosing which attribute to split given a set of examples → to choose root and internal nodes
- naive possibilities
- random
- least-values - choose the feature with the smallest number of possible values
- highest accuracy - choose the feature whose split gives the highest training accuracy
ID3 uses Max-Gain → split on the attribute with the largest expected information gain

Entropy
measures the impurity of a group

![]()
entropy of a fair coin: $H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit

- the higher the entropy → the higher the uncertainty
multi-class entropy: $H(Y) = -\sum_{k=1}^{K} p_k \log_2 p_k$
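A minimal sketch of the entropy computation over a list of class labels (the function name and representation are mine, not the lecture's):

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_k p_k log2 p_k over the class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

`entropy([0, 1])` is 1.0 bit (the fair coin); a pure group like `entropy([1, 1, 1])` is 0.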

Conditional Entropy
![]()
Information Gain
![]()
![]()
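Conditional entropy and information gain can be sketched on top of an entropy helper (all names are my own assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_k p_k log2 p_k over the class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y|X) = sum_v P(X=v) * H(Y | X=v), where xs holds the attribute values."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(xs, ys):
    """Gain(Y, X) = H(Y) - H(Y|X): expected drop in label entropy after splitting on X."""
    return entropy(ys) - conditional_entropy(xs, ys)
```

A perfectly informative attribute (`xs` identical to `ys`) has gain equal to $H(Y)$; an irrelevant one has gain 0.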
Choosing Attributes
compute the information gain for each attribute and pick whichever is highest

for the next split, look only at the branches that still have nonzero entropy and select the attribute with the best information gain
Full ID3 Algo
![]()
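The full algorithm can be sketched recursively; representing examples as dicts mapping attribute names to values is my own assumption:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return entropy(ys) - sum(len(g) / n * entropy(g) for g in groups.values())

def id3(examples, labels, attributes):
    """Greedy ID3: split on the max-gain attribute, recurse on each branch."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attributes:                 # no tests left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes,
               key=lambda a: information_gain([ex[a] for ex in examples], labels))
    rest = [a for a in attributes if a != best]
    branches = {}                      # group examples by their value of `best`
    for ex, y in zip(examples, labels):
        branches.setdefault(ex[best], ([], []))
        branches[ex[best]][0].append(ex)
        branches[ex[best]][1].append(y)
    return {best: {v: id3(sub_ex, sub_y, rest)
                   for v, (sub_ex, sub_y) in branches.items()}}
```

On a toy set where attribute `a` alone determines the label, this returns the one-node tree `{'a': {0: 0, 1: 1}}`.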
Pruning Sub-trees
- pre-pruning → while building
    - early stopping → for future classifications, predict the majority class of the samples at that node
- post-pruning → after building
    - reduced error pruning
- tree-depth is a hyperparameter
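A sketch of reduced-error pruning on a nested-dict tree (bottom-up: a subtree is collapsed to the majority training label whenever that does not hurt accuracy on held-out validation data). All names, the representation, and the tie-breaking rule are my own assumptions:

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def predict(tree, x):
    while isinstance(tree, dict):
        f = next(iter(tree))
        tree = tree[f][x[f]]
    return tree

def prune(tree, train_ex, train_y, val_ex, val_y):
    """Reduced-error pruning, bottom-up, on a {feature: {value: subtree}} tree."""
    if not isinstance(tree, dict):
        return tree                        # leaves have nothing to prune
    feature = next(iter(tree))
    for v in list(tree[feature]):          # recurse with the examples reaching each branch
        t = [i for i, x in enumerate(train_ex) if x[feature] == v]
        w = [i for i, x in enumerate(val_ex) if x[feature] == v]
        tree[feature][v] = prune(
            tree[feature][v],
            [train_ex[i] for i in t], [train_y[i] for i in t],
            [val_ex[i] for i in w], [val_y[i] for i in w])
    # Collapse this node to a majority-class leaf if validation accuracy does not drop.
    leaf = majority(train_y)
    kept = sum(predict(tree, x) == y for x, y in zip(val_ex, val_y))
    collapsed = sum(leaf == y for y in val_y)
    return leaf if collapsed >= kept else tree
```

Collapsing on ties (`>=`) prefers the smaller tree, in the spirit of Occam's razor above.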
Discussion
Resources
📌
**SUMMARY**