06 - Decision Trees - lec. 7,8

ucla | CS M146 | 2023-04-24T13:58


Table of Contents

Supplemental

Lecture

Decision Tree

  • each internal node → test one feature $\bm x_j^{(i)}$
  • each branch/edge → selects one value for $\bm x_j^{(i)}$
  • each leaf node → predicts $y^{(i)}$
  • if features are continuous → internal nodes can test against a threshold
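The node types above can be sketched as nested tests; the thresholds, feature indices, and labels here are made up for illustration, not from the lecture:

```python
# Hypothetical depth-2 tree: feature values and labels are assumptions.
def predict(x):
    # internal node: test continuous feature x[0] against a threshold
    if x[0] <= 1.5:
        # internal node: test categorical feature x[1]; each branch selects one value
        if x[1] == "sunny":
            return 0  # leaf node: predicted label
        return 1      # leaf node
    return 1          # leaf node

print(predict([1.0, "sunny"]))  # takes the left branch twice → 0
```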

I/O

Decision Boundary

Example from Introspection

Optimizing for the Best Decision Tree

  • Occam’s razor: the simplest consistent hypothesis is the best
  • i.e. the smallest tree that correctly classifies all training examples
  • but finding the provably smallest tree is NP-hard → instead construct one that is reasonably small

Iterative Dichotomiser (ID3) Algorithm

Choosing Best Attribute

  • choosing which attribute to split on, given a set of examples → used to choose the root and internal nodes
  • naive possibilities
    • random
    • least-values - choose the feature with the smallest number of possible values
    • highest accuracy - choose the feature with the largest accuracy
  • ID3 uses Max-Gain → the attribute with the largest expected information gain

Entropy

  • measures the impurity of a group of examples

  • entropy of a fair coin flip is 1 bit

  • the higher the entropy → the higher the uncertainty
  • multi-class entropy
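As a concrete check of these bullets, a minimal implementation of the standard multi-class formula $H(S) = -\sum_k p_k \log_2 p_k$:

```python
import math
from collections import Counter

def entropy(labels):
    """Multi-class entropy H(S) = -sum_k p_k * log2(p_k), in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy([0, 1]))           # fair coin → 1.0 bit (maximum uncertainty)
print(entropy([0, 1, 2, 3]))     # uniform over 4 classes → 2.0 bits
print(entropy(["a", "a", "a"]))  # pure group → no uncertainty
```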

Conditional Entropy
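Assuming the standard definition (the slide formula is not in these notes): the conditional entropy of labels $Y$ after splitting on attribute $X$ is the branch-weighted average entropy,

$$H(Y \mid X) = \sum_{v \in \mathrm{vals}(X)} P(X = v)\, H(Y \mid X = v)$$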

Information Gain
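Assuming the standard definition used by ID3: the gain of splitting set $S$ on attribute $A$ is the drop in entropy,

$$\mathrm{Gain}(S, A) = H(S) - H(S \mid A)$$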

Choosing Attributes

  • compute the information gain for each attribute, and pick whichever is higher

  • for the next attribute, look only at the branches with nonzero entropy and select the one with the best information gain
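The pick-the-highest-gain step can be sketched as follows; the toy data and attribute encoding are assumptions, not the lecture's example:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)  (Max-Gain criterion)."""
    groups = defaultdict(list)
    for x, y in zip(examples, labels):
        groups[x[attr]].append(y)  # partition labels by the attribute's value
    cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - cond

# toy data (assumed): attribute 0 perfectly predicts the label, attribute 1 doesn't
X = [["a", "p"], ["a", "q"], ["b", "p"], ["b", "q"]]
y = [0, 0, 1, 1]
print(information_gain(X, y, 0))  # 1.0 → splitting on attr 0 removes all uncertainty
print(information_gain(X, y, 1))  # 0.0 → attr 1 is uninformative
```

ID3 would therefore split on attribute 0 here.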

Full ID3 Algo

Pruning Sub-trees

  • pre-pruning → while building

    • early stopping → for future classifications choose the class of the majority of the samples

  • post-pruning → after building
    • reduced error pruning

  • tree-depth is a hyperparameter
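A minimal sketch of the early-stopping rule above, with tree-depth as the hyperparameter; the function names and recursion outline are assumptions:

```python
from collections import Counter

def majority_class(labels):
    """Predict the class of the majority of the samples at a pruned node."""
    return Counter(labels).most_common(1)[0][0]

def build(labels, depth, max_depth):
    # pre-pruning (early stopping): once the depth hyperparameter is reached,
    # or the node is already pure, emit a majority-vote leaf
    if depth >= max_depth or len(set(labels)) == 1:
        return ("leaf", majority_class(labels))
    # ... otherwise: pick the max-gain attribute and recurse on each branch ...
    raise NotImplementedError("full ID3 recursion omitted in this sketch")

print(build([1, 1, 0], depth=2, max_depth=2))  # → ('leaf', 1)
```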

Discussion

Resources


📌

**SUMMARY**