07 - Kernels - lec. 9,10
ucla | CS M146 | 2023-05-01T13:59
Table of Contents
- Supplemental
  - Linear Model regularization review
- Lecture
Feature Maps (Basis Functions)
any function $\phi: \mathbb{R}^d \to \mathbb{R}^p$ that maps raw features into a (typically higher-dimensional) feature space
- e.g. for features $x = (x_1, x_2)$, we can have a quadratic feature map such as $\phi(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2)$
- with this, we can still use a linear hypothesis class: $h(x) = w^\top \phi(x)$ (sketched in code below)
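a minimal numpy sketch of this idea (the quadratic map above and the function names are ours, for illustration):

```python
import numpy as np

def quadratic_feature_map(x):
    """Map x = (x1, x2) to (1, x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def hypothesis(w, x):
    """Linear hypothesis in feature space: h(x) = w^T phi(x)."""
    return w @ quadratic_feature_map(x)

w = np.zeros(6)  # one weight per feature-map dimension
print(hypothesis(w, np.array([2.0, 3.0])))  # 0.0 before any training
```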
Gradient descent w/ feature maps
- each update requires computing the hypothesis → high-dimensional feature maps require more computation (see the update sketched below)
- very expressive (high-dimensional) feature maps tend to overfit → need regularization
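a minimal sketch of the update, assuming squared loss:

$$ w \leftarrow w - \eta \sum_{i=1}^{n} \left( w^\top \phi(x^{(i)}) - y^{(i)} \right) \phi(x^{(i)}) $$

each step costs $O(np)$ for $n$ training points and $p = \dim \phi(x)$ feature dimensions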
Regularization w/ feature maps
Linear Model regularization review
for linear models with possible feature maps, we can express the regularized loss as

$$ J(w) = \sum_{i=1}^{n} \ell\big( w^\top \phi(x^{(i)}),\ y^{(i)} \big) + \frac{\lambda}{2} \|w\|_2^2 $$
Representer Theorem
- used to efficiently train and evaluate expressive feature maps
- theorem (special case): for any $\lambda > 0$, there exist real-valued coefficients (scalars) $\alpha_1, \dots, \alpha_n$ such that the minimizer of the regularized loss function is

$$ w^* = \sum_{i=1}^{n} \alpha_i\, \phi(x^{(i)}) $$

- i.e. the optimal weights for the regularized loss lie in the span of the training feature vectors
proof
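a short sketch of the standard argument: write $w = w_\parallel + w_\perp$, where $w_\parallel \in \operatorname{span}\{\phi(x^{(1)}), \dots, \phi(x^{(n)})\}$ and $w_\perp$ is orthogonal to that span. Then $w_\perp^\top \phi(x^{(i)}) = 0$ for every training point, so dropping $w_\perp$ leaves all predictions unchanged while the regularizer $\frac{\lambda}{2}\|w\|_2^2$ can only shrink → the minimizer must have $w_\perp = 0$, i.e. lie in the span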
Reparametrized Risk
- representer theorem allows us to reparametrize the regularized loss:
- substituting the minimizer $w = \sum_{j} \alpha_j \phi(x^{(j)})$ into the loss function above gives a reparametrized objective (loss) function w.r.t. $\alpha$:

$$ J(\alpha) = \sum_{i=1}^{n} \ell\Big( \sum_{j=1}^{n} \alpha_j\, k\big(x^{(j)}, x^{(i)}\big),\ y^{(i)} \Big) + \frac{\lambda}{2}\, \alpha^\top K \alpha $$

where $k(x, x') = \phi(x)^\top \phi(x')$ is the kernel function and $K$ is the kernel matrix (defined below)
- the kernel function can be precomputed for all pairs of training instances → during training we only optimize over $\alpha$ instead of computing the hypothesis (feature-map dot product) at each iteration (see the sketch below)
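a minimal numpy sketch of the precompute step (`gram_matrix` is our helper name):

```python
import numpy as np

def gram_matrix(X, kernel):
    """Precompute K[i, j] = kernel(x_i, x_j) over all training pairs."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# with K in hand, the training-time hypothesis on x_i is just K[i] @ alpha,
# so each optimization step over alpha never touches phi(x) again
```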
Kernel Trick
- kernel trick: compute $k(x, x') = \phi(x)^\top \phi(x')$ directly from $x$ and $x'$, without explicitly constructing $\phi(x)$ or $\phi(x')$
- thus, we can (in some situations) compute the kernel dot product faster than the feature map itself
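a standard instance of the trick (quadratic kernel, shown here for illustration): for $x, z \in \mathbb{R}^d$,

$$ (x^\top z)^2 = \sum_{i=1}^{d} \sum_{j=1}^{d} (x_i x_j)(z_i z_j) = \phi(x)^\top \phi(z), \qquad \phi(x) = (x_i x_j)_{i,j=1}^{d} $$

forming $\phi(x)$ explicitly costs $O(d^2)$, while $(x^\top z)^2$ costs only $O(d)$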
Kernelized Linear Model
- apply kernel trick: only compute the kernel function instead of the feature map → kernelized linear models
- thus, given a kernel function on a set of training instances of cardinality $n$, we can define a kernel matrix $K \in \mathbb{R}^{n \times n}$:

$$ K_{ij} = k\big(x^{(i)}, x^{(j)}\big) $$
Kernelized Ridge Regression
- for squared loss, the reparametrized objective has a closed-form minimizer:

$$ \alpha = (K + \lambda I)^{-1} y, \qquad h(x) = \sum_{i=1}^{n} \alpha_i\, k\big(x^{(i)}, x\big) $$
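a compact numpy sketch of the above (our illustration; `fit_kernel_ridge` and `predict` are our names, and it uses the RBF kernel covered later in these notes):

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma**2))

def fit_kernel_ridge(X, y, lam=0.1, sigma=1.0):
    """Solve alpha = (K + lam * I)^{-1} y for the training Gram matrix K."""
    n = X.shape[0]
    K = np.array([[rbf_kernel(X[i], X[j], sigma) for j in range(n)]
                  for i in range(n)])
    return np.linalg.solve(K + lam * np.eye(n), y)

def predict(X_train, alpha, x, sigma=1.0):
    """h(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * rbf_kernel(xi, x, sigma) for a, xi in zip(alpha, X_train))

# tiny usage example: fit y = sin(x) on 20 points
X = np.linspace(0, 3, 20).reshape(-1, 1)
y = np.sin(X).ravel()
alpha = fit_kernel_ridge(X, y, lam=0.01, sigma=0.5)
print(predict(X, alpha, np.array([1.5])))  # close to sin(1.5) ~ 0.997
```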
Valid Kernels
- given a feature map, a kernel function is defined as the dot product of the feature maps of two inputs → defines a notion of similarity between the two vectors (data points)
- e.g. $k(x, x') = \phi(x)^\top \phi(x')$ shows the similarity between $x$ and $x'$
- other forms of similarity include: euclidean distance, cosine similarity, any valid kernel function
- but an arbitrary kernel function does NOT necessarily correspond to a feature map
Is a Kernel Valid?
- a kernel $k$ is valid if there exists some feature map $\phi$ such that $k(x, x') = \phi(x)^\top \phi(x')$ for all inputs $x, x'$
Mercer’s Theorem
- $k$ is a valid kernel ⟺ for every finite set of points $x^{(1)}, \dots, x^{(n)}$, the kernel matrix $K$ is symmetric positive semidefinite, i.e. $z^\top K z \geq 0$ for all $z \in \mathbb{R}^n$
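e.g. (a standard check) the linear kernel $k(x, x') = x^\top x'$ is valid: stacking the points into a data matrix $X$ gives $K = X X^\top$, which is symmetric, and $z^\top K z = \|X^\top z\|_2^2 \geq 0$ for all $z$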
Kernel Composition Rules
- algebraic operations on known valid kernels → result in valid kernels
- standard rules (for valid kernels $k_1, k_2$, constant $c > 0$, and any function $f$): $c\,k_1$, $k_1 + k_2$, $k_1 \cdot k_2$, $f(x)\,k_1(x, x')\,f(x')$, and $\exp(k_1)$ are all valid
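quick sanity check for the sum rule: if $K_1, K_2$ are PSD, then $z^\top (K_1 + K_2) z = z^\top K_1 z + z^\top K_2 z \geq 0$; equivalently, $k_1 + k_2$ is the kernel of the concatenated feature map $(\phi_1(x), \phi_2(x))$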
Example: Gaussian Kernel (RBF)
$$ k(x, x') = \exp\left( -\frac{\|x - x'\|_2^2}{2\sigma^2} \right) $$

- has value 1 when $x = x'$
- value falls off to 0 w/ increasing distance
- note: needs feature scaling before using RBF
- $\sigma$ is the bandwidth hyperparameter → determines how much the L2 distance affects the kernel
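a tiny numpy demo of the bandwidth effect (illustrative values, not from lecture):

```python
import numpy as np

def rbf(x, z, sigma):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma**2))

x, z = np.array([0.0]), np.array([1.0])
for sigma in (0.1, 1.0, 10.0):
    # small sigma: similarity decays fast; large sigma: distant points stay "similar"
    print(sigma, rbf(x, z, sigma))  # ~2e-22, ~0.61, ~0.995
```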
Discussion
Resources
📌 **SUMMARY**