04 - Logistic Regression - lec. 5,6

ucla | CS M146 | 2023-04-17T15:31


Supplemental

  • any event $E$ s.t. $0 \le P(E) \le 1$
  • sum of probabilities: $1 = \sum_{E} P(E)$
  • logistic regression is a classification model
  • $\log$ is always base $e$, i.e. $\log \equiv \ln$
  • the loss function in classification is binary or softmax cross-entropy loss

Lecture

Classification using Probability

  • instead of predicting the class, predict the probability that the instance belongs to that class, i.e. $P(y \mid \bm x)$
  • binary classification: $y \in \{0, 1\}$ as events for an input $\bm x$

Logistic Regression

Logistic (Sigmoid) Regression Model/Func

  • hypothesis function is the probability in $[0,1]$ i.e. $P_{\bm\theta}(y=1 \mid \bm x)$

$h_{\bm\theta}(\bm x) = g(\bm\theta^T \bm x) \quad \text{s.t.} \quad g(z) = \frac{1}{1 + e^{-z}}$

$h_{\bm\theta}(\bm x) = \frac{1}{1 + e^{-\bm\theta^T \bm x}}$
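
A minimal NumPy sketch of the hypothesis function under the definitions above (the function names and example values are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}); output always lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): the modeled probability that y = 1 given x."""
    return sigmoid(theta @ x)

# illustrative values; x[0] = 1 acts as the bias feature
theta = np.array([-1.0, 2.0])
x = np.array([1.0, 0.75])
print(hypothesis(theta, x))  # ~0.62, interpreted as P(y = 1 | x)
```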

Interpreting Hypothesis function

  • the hypothesis function gives the probability that the label is 1 given some input
  • logistic regression assumes the log odds are a linear function of $\bm x$

    $\log\frac{P(y=1 \mid \bm x;\bm\theta)}{P(y=0 \mid \bm x;\bm\theta)}=\bm\theta^T\bm x$
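
A quick check of this claim (an added derivation, not from the original notes): substituting $h_{\bm\theta}(\bm x)$ for $P(y=1 \mid \bm x;\bm\theta)$,

$\frac{P(y=1 \mid \bm x;\bm\theta)}{P(y=0 \mid \bm x;\bm\theta)} = \frac{h_{\bm\theta}(\bm x)}{1 - h_{\bm\theta}(\bm x)} = \frac{1/(1+e^{-\bm\theta^T\bm x})}{e^{-\bm\theta^T\bm x}/(1+e^{-\bm\theta^T\bm x})} = e^{\bm\theta^T\bm x}$

so taking the log of both sides gives exactly the linear form above.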

Non-Linear Decision Boundary

  • we can apply a basis function expansion to the features, just like we did for linear regression (see the sketch after this list)
  • NOTE: loss functions don’t need to be averaged, because dividing by the constant $n$ doesn’t change the minimizer, so minimization via gradient descent works the same (up to rescaling the learning rate)
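
A minimal sketch of a quadratic basis expansion for two features, which lets the linear boundary $\bm\theta^T\phi(\bm x)=0$ look non-linear (e.g. elliptical) in the original feature space; the feature ordering and helper name are my own choices, not from the lecture:

```python
import numpy as np

def quadratic_features(x1, x2):
    """Map (x1, x2) to the quadratic basis phi(x) = [1, x1, x2, x1^2, x2^2, x1*x2]."""
    return np.array([1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

# logistic regression is then run on phi(x) instead of x,
# so the decision boundary theta^T phi(x) = 0 can curve in (x1, x2) space
print(quadratic_features(0.5, -1.2))
```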

Loss Function

  • loss of a single instance

$\ell(y^{(i)},\bm x^{(i)},\bm\theta)=\begin{cases}-\log \big(h_{\bm\theta}(\bm x^{(i)})\big) & y^{(i)}=1\\ -\log \big(1-h_{\bm\theta}(\bm x^{(i)})\big) & y^{(i)}=0\end{cases}$

  • logistic regression loss

$J(\bm\theta)=\sum_{i=1}^{n}\ell(y^{(i)},\bm x^{(i)},\bm\theta)$

$J(\bm\theta)=-\sum_{i=1}^{n}\Big[y^{(i)}\log h_{\bm\theta}(\bm x^{(i)})+(1-y^{(i)})\log\big(1-h_{\bm\theta}(\bm x^{(i)})\big)\Big]$
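
A minimal NumPy sketch of this summed cross-entropy loss (the clipping constant is an added numerical-stability assumption, not part of the lecture formula):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(theta, X, y, eps=1e-12):
    """J(theta) = -sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ].

    X: (n, d+1) design matrix with a leading column of ones; y: (n,) labels in {0, 1}.
    """
    h = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```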

Intuition behind loss

  • the loss is non-linear in the predicted probability: confidently wrong predictions incur much higher loss than mildly wrong ones, e.g. with $y=1$, predicting $h_{\bm\theta}(\bm x)=0.9$ costs $-\log 0.9 \approx 0.11$, while predicting $0.1$ costs $-\log 0.1 \approx 2.3$

Regularized Loss Function

  • Given the loss function $J(\bm\theta)$ above, add an L2 penalty (sketched in code after this list)

$J_{\text{reg}}(\bm\theta)=J(\bm\theta)+\frac{\lambda}{2}\|\bm\theta_{1:d}\|_2^2$

  • note the L2 norm is from index 1 to d
  • we don’t regularize the bias term $\theta_0$
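
A minimal sketch of the regularized loss, reusing `np` and the `logistic_loss` helper from the earlier sketch and assuming `theta[0]` is the bias:

```python
def regularized_loss(theta, X, y, lam):
    """J_reg(theta) = J(theta) + (lambda / 2) * ||theta_{1:d}||_2^2 (bias excluded)."""
    return logistic_loss(theta, X, y) + 0.5 * lam * np.sum(theta[1:] ** 2)
```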

Gradient Descent

  • weight updates (simultaneous), similar to linear regression and the perceptron; a training-loop sketch follows the update rule

$\theta_j \leftarrow \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\bm\theta)$
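
A minimal training-loop sketch using the standard logistic-regression gradient $\nabla_{\bm\theta}J(\bm\theta)=\sum_i\big(h_{\bm\theta}(\bm x^{(i)})-y^{(i)}\big)\bm x^{(i)}$; the learning rate, iteration count, and optional L2 term are assumptions, not values from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.01, lam=0.0, num_iters=1000):
    """Simultaneous updates theta_j <- theta_j - alpha * dJ/dtheta_j.

    X: (n, d+1) design matrix with a leading column of ones; y: (n,) labels in {0, 1}.
    lam > 0 adds the L2 penalty; the bias theta[0] is not regularized.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        h = sigmoid(X @ theta)            # predicted probabilities
        grad = X.T @ (h - y)              # gradient of the unregularized J(theta)
        grad[1:] += lam * theta[1:]       # gradient of (lambda/2) * ||theta_{1:d}||^2
        theta = theta - alpha * grad      # update all coordinates simultaneously
    return theta
```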

Multi-Class Classification

Discussion

Resources


📌 **SUMMARY**