02 - Generalization - lec. 3,4
ucla | CS M146 | 2023-04-10T14:02
Table of Contents
- Supplemental
- Lecture
- Linear Basis
- Linear Regression: lin. feature transforms
- Extended Linear Regression: non-lin. feature transforms
- Generalization
- generalization - ability of ML model to make good predictions on unseen (test) data
- Option 1: Cross Validation (validation split)
- $k$-fold Cross Validation
- Option 2: Regularization
- Ridge Regression: $\ell_2$-regularized
- Hyperparameters
- Discussion
- Resources
Supplemental
- dimensionality - the fit is a line or hyperplane, depending on the dimensionality of the features
- linear or extended linear - depending on the feature transformations (polynomial, root, etc.)
Lecture
- the term linear is generally reserved for functions linear w.r.t. the weights, so we can feed non-linear transformations of the inputs/features into linear regression
Linear Basis
Linear Basis Function Models
$h_{\bm\theta}(x) = \bm\theta^T\phi(x)$, where $\phi(x)$ is a $k$-dimensional basis w/ params $\bm\theta \in \mathbb{R}^k$; usually $\phi_0(x) = 1$ so the first param is still the bias; $k$ can be different from the feature dimension $d$, e.g. polynomial reg.: $\phi_j(x) = x^j$
Linear Regression: lin. feature transforms
Extended Linear Regression: non-lin. feature transforms
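A minimal numpy sketch of the two cases above: least squares on the raw (linear) features vs. on a polynomial basis of the inputs. The toy data, function names, and degree choices are illustrative, not from the lecture; the point is that both fits are linear in $\bm\theta$.

```python
import numpy as np

def poly_basis(x, k):
    """Map 1-D inputs to the k-dimensional polynomial basis [1, x, x^2, ..., x^(k-1)]."""
    return np.stack([x ** j for j in range(k)], axis=1)

def fit_least_squares(Phi, y):
    """Ordinary least squares on the transformed features (solved via lstsq)."""
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

# toy 1-D data with a non-linear trend (illustrative)
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

theta_lin = fit_least_squares(poly_basis(x, 2), y)    # linear features [1, x]
theta_cubic = fit_least_squares(poly_basis(x, 4), y)  # non-linear basis [1, x, x^2, x^3]
# same linear-regression machinery in both cases; only the feature map phi changed
```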
Generalization
more complex is not always better - overfitting (on training data)
generalization - ability of ML model to make good predictions on unseen (test) data
- the LSE loss we looked at so far is ON TRAINING DATA - empirical risk minimization (ERM)
- does ERM generalize to unseen data?
- theoretically
- depends on hypothesis class, data size, learning algo → learning theory
- empirically
- can assess via validation data
- algorithmically
- can strengthen via regularization
- underfitting - hypothesis is not expressive/complex enough for the data
- overfitting - hypothesis is too complex for the data
- hypothesis complexity - hard to define in general; for regression, the polynomial degree is a proxy
- an $(n-1)$-degree polynomial can easily reach 0 loss on a size-$n$ dataset
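A quick numerical check of the claim above (a sketch with made-up targets): interpolating $n$ points with a degree-$(n-1)$ polynomial drives the training loss to (numerically) zero no matter how noisy the targets are.

```python
import numpy as np

n = 6
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, n)
y = rng.standard_normal(n)                 # arbitrary (noisy) targets

coeffs = np.polyfit(x, y, deg=n - 1)       # degree n-1 polynomial through n points
train_loss = np.sum((np.polyval(coeffs, x) - y) ** 2)
print(train_loss)                          # ~0: perfect fit on training data, likely poor generalization
```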
Option 1: Cross Validation (validation split)
train, validation, test - split
if the test set is similar to the validation set (similar distribution), good validation performance indicates generalization
$k$-fold Cross Validation
- partition the dataset of $n$ instances into $k$ disjoint folds (subsets)
- choose fold $i$ as the validation set
- train on the $k-1$ remaining folds and evaluate accuracy on fold $i$
- compute the average over the $k$ folds, or choose the best model from a certain fold
- “leave-one-out”: $k = n$
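A sketch of the $k$-fold procedure above; `fit` and `score` are placeholder callables for whatever model and metric are being validated (not a specific library API).

```python
import numpy as np

def k_fold_cv(X, y, k, fit, score, seed=0):
    """Average validation score over k disjoint folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)   # k disjoint index subsets
    scores = []
    for i in range(k):
        val_idx = folds[i]                                # fold i = validation set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])           # train on the k-1 other folds
        scores.append(score(model, X[val_idx], y[val_idx]))
    return np.mean(scores)

# "leave-one-out" is the special case k = len(y)
```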
Option 2: Regularization
- eliminating features or getting more data - regularizes via the dataset
- loss regularization - method to prevent overfitting by controlling the complexity of the learned hypothesis
- penalize large (absolute) weights during optimization → add a penalty term to the loss
Ridge Regression: $\ell_2$-regularized
$J(\bm\theta) = \sum_{i=1}^{n}\left(h_{\bm\theta}(x_i) - y_i\right)^2 + \lambda\sum_{j=1}^{d}\theta_j^2$
$\lambda \ge 0$ is the regularization hyperparameter → appends the squared L2 norm of the weights onto the loss → when minimizing, we now try to minimize the regularization term too
$\sum^d_{j=1}\theta_j^2=\|\bm\theta_{1:d}\|_2^2=\|\bm\theta_{1:d}-\vec 0\|_2^2$
- pulls the weights towards the origin (minimizes their distance from $\vec 0$)
- vectorized: $J(\bm\theta) = \|X\bm\theta - \bm y\|_2^2 + \lambda\|\bm\theta_{1:d}\|_2^2$
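A minimal numpy sketch of ridge regression in this vectorized form, using the standard closed-form minimizer $(X^TX + \lambda I)^{-1}X^T\bm y$; for brevity this version regularizes every weight, including the bias column, rather than only $\bm\theta_{1:d}$.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: theta = (X^T X + lam * I)^{-1} X^T y.
    Note: penalizes all weights (bias column included) for simplicity."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_loss(X, y, theta, lam):
    """Vectorized objective ||X theta - y||^2 + lam * ||theta||^2."""
    r = X @ theta - y
    return r @ r + lam * theta @ theta
```

Larger `lam` pulls the learned weights closer to the origin, matching the geometric picture above.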
Hyperparameters
- additional unknowns (other than the weights $\bm\theta$) for improving learning
- model hyperparameters - influence representation
- hypothesis class
- basis function
- algorithmic hyperparameters - influence training
- learning rate
- regularization coefficient
- batch size
- model selection: the best hyperparams are the ones that help generalize → evaluate based on validation loss
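A sketch of this kind of model selection for the regularization coefficient, reusing the hypothetical `ridge_fit` helper from the sketch above and scoring candidates by validation loss.

```python
import numpy as np

def select_lambda(X_train, y_train, X_val, y_val, lambdas):
    """Return the lambda whose ridge fit has the lowest validation loss."""
    best_lam, best_loss = None, np.inf
    for lam in lambdas:
        theta = ridge_fit(X_train, y_train, lam)          # ridge_fit: see the sketch above
        val_loss = np.sum((X_val @ theta - y_val) ** 2)   # validation LSE, not training loss
        if val_loss < best_loss:
            best_lam, best_loss = lam, val_loss
    return best_lam

# e.g. select_lambda(X_tr, y_tr, X_va, y_va, lambdas=[0.0, 0.01, 0.1, 1.0, 10.0])
```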
Discussion
Linear Basis Function Models
examples: polynomial and gaussian
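For the Gaussian case, a common choice is $\phi_j(x) = \exp\!\left(-\frac{(x-\mu_j)^2}{2s^2}\right)$; a small sketch (the centers $\mu_j$ and width $s$ here are arbitrary illustrative choices):

```python
import numpy as np

def gaussian_basis(x, centers, s):
    """Columns: a leading bias of ones, then exp(-(x - mu_j)^2 / (2 s^2)) for each center mu_j."""
    feats = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), feats])

# e.g. Phi = gaussian_basis(x, centers=np.linspace(-1, 1, 9), s=0.25)
# Phi plugs into the same least-squares / ridge machinery as the polynomial basis
```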
Regularized Linear (Ridge) Regression
Closed form: $\bm\theta^* = (X^TX + \lambda I)^{-1}X^T\bm y$
Resources
📌 **SUMMARY**