06 - Decision Trees - lec. 7,8
ucla | CS M146 | 2023-04-24T13:58
Table of Contents
Supplemental
Lecture
Decision Tree
- each internal node → tests one feature $\bm x_j^{(i)}$
- each branch/edge → selects one value for $\bm x_j^{(i)}$
- each leaf node → predicts $y^{(i)}$
- if features are continuous → internal nodes can test against a threshold
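The node/branch/leaf roles above can be sketched as a prediction routine; the nested-dict tree representation used here is my own assumption, not the lecture's:

```python
def predict(tree, x):
    """Walk a decision tree stored as {feature: {value: subtree}}.

    Internal nodes test one feature, branches select its value, and
    leaves (any non-dict node) hold the predicted label. A continuous
    feature would instead store a threshold and branch on x[f] <= t.
    """
    while isinstance(tree, dict):
        feature = next(iter(tree))        # the single feature tested at this node
        tree = tree[feature][x[feature]]  # follow the branch matching x's value
    return tree
```

For example, `predict({'outlook': {'sunny': 'no', 'rain': 'yes'}}, {'outlook': 'rain'})` returns `'yes'`.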
I/O
![]()
Decision Boundary
![]()
Example from Introspection
![]()
Optimizing for the Best Decision Tree
- Occam’s razor: the simplest consistent answer is the best
- i.e. the smallest tree that correctly classifies all training examples
- but finding the provably smallest tree is NP-hard → instead construct one that is reasonably small
Iterative Dichotomiser 3 (ID3) Algorithm
![]()
Choosing Best Attribute
- choosing which attribute to split given a set of examples → to choose root and internal nodes
- naive possibilities
- random
- least-values - choose the feature with the smallest number of possible values
- highest accuracy - choose the feature whose split gives the highest training accuracy
ID3 uses Max-Gain → split on the attribute with the largest expected information gain

Entropy
measures the impurity of a group

![]()
entropy of a fair coin: $H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit

- the higher the entropy → the higher the uncertainty
multi-class entropy: $H(Y) = -\sum_{k=1}^{K} p_k \log_2 p_k$
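A minimal sketch of the entropy computation over a list of class labels (the function name and representation are mine, not the lecture's):

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_k p_k log2 p_k over the class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

`entropy([0, 1])` is 1.0 bit (the fair coin); a pure group like `entropy([1, 1, 1])` is 0.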

Conditional Entropy
![]()
Information Gain
![]()
![]()
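Conditional entropy and information gain can be sketched on top of an entropy helper (all names are my own assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_k p_k log2 p_k over the class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y|X) = sum_v P(X=v) * H(Y | X=v), where xs holds the attribute values."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(xs, ys):
    """Gain(Y, X) = H(Y) - H(Y|X): expected drop in label entropy after splitting on X."""
    return entropy(ys) - conditional_entropy(xs, ys)
```

A perfectly informative attribute (`xs` identical to `ys`) has gain equal to $H(Y)$; an irrelevant one has gain 0.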
Choosing Attributes
compute the information gain for each attribute and pick whichever is highest

for the next split, look only at the branches that still have nonzero entropy and select the attribute with the best information gain
Full ID3 Algo
![]()
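The full algorithm can be sketched recursively; representing examples as dicts mapping attribute names to values is my own assumption:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return entropy(ys) - sum(len(g) / n * entropy(g) for g in groups.values())

def id3(examples, labels, attributes):
    """Greedy ID3: split on the max-gain attribute, recurse on each branch."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attributes:                 # no tests left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes,
               key=lambda a: information_gain([ex[a] for ex in examples], labels))
    rest = [a for a in attributes if a != best]
    branches = {}                      # group examples by their value of `best`
    for ex, y in zip(examples, labels):
        branches.setdefault(ex[best], ([], []))
        branches[ex[best]][0].append(ex)
        branches[ex[best]][1].append(y)
    return {best: {v: id3(sub_ex, sub_y, rest)
                   for v, (sub_ex, sub_y) in branches.items()}}
```

On a toy set where attribute `a` alone determines the label, this returns the one-node tree `{'a': {0: 0, 1: 1}}`.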
Pruning Sub-trees
- pre-pruning → while building
    - early stopping → for future classifications, predict the majority class of the samples at that node
- post-pruning → after building
    - reduced error pruning
- tree-depth is a hyperparameter
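A sketch of reduced-error pruning on a nested-dict tree (bottom-up: a subtree is collapsed to the majority training label whenever that does not hurt accuracy on held-out validation data). All names, the representation, and the tie-breaking rule are my own assumptions:

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def predict(tree, x):
    while isinstance(tree, dict):
        f = next(iter(tree))
        tree = tree[f][x[f]]
    return tree

def prune(tree, train_ex, train_y, val_ex, val_y):
    """Reduced-error pruning, bottom-up, on a {feature: {value: subtree}} tree."""
    if not isinstance(tree, dict):
        return tree                        # leaves have nothing to prune
    feature = next(iter(tree))
    for v in list(tree[feature]):          # recurse with the examples reaching each branch
        t = [i for i, x in enumerate(train_ex) if x[feature] == v]
        w = [i for i, x in enumerate(val_ex) if x[feature] == v]
        tree[feature][v] = prune(
            tree[feature][v],
            [train_ex[i] for i in t], [train_y[i] for i in t],
            [val_ex[i] for i in w], [val_y[i] for i in w])
    # Collapse this node to a majority-class leaf if validation accuracy does not drop.
    leaf = majority(train_y)
    kept = sum(predict(tree, x) == y for x, y in zip(val_ex, val_y))
    collapsed = sum(leaf == y for y in val_y)
    return leaf if collapsed >= kept else tree
```

Collapsing on ties (`>=`) prefers the smaller tree, in the spirit of Occam's razor above.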
Discussion
Resources
📌
**SUMMARY**