Loss Functions
- Cross Entropy
    - Entropy: \(H(p) = -\sum_{i=1}^{n}p(x_i)\log_b p(x_i)\)
    - Cross Entropy: $$H(p, q) = E_p[-\log(q)] = H(p) + D_{KL}(p\,\|\,q)$$
        - With discrete $p$ and $q$: $H(p, q) = -\sum_x p(x)\log(q(x))$
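The identity $H(p, q) = H(p) + D_{KL}(p\|q)$ can be checked numerically. A minimal NumPy sketch (the distributions here are chosen arbitrarily for illustration):

```python
import numpy as np

def entropy(p, base=np.e):
    # H(p) = -sum_i p(x_i) log_b p(x_i)
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p)) / np.log(base)

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) log q(x)
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    # D_KL(p || q) = sum_x p(x) log(p(x) / q(x))
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])  # arbitrary example distributions
q = np.array([0.5, 0.3, 0.2])
# the identity above: H(p, q) = H(p) + D_KL(p || q)
assert np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
```

Since $D_{KL}(p\|q) \ge 0$, cross entropy is always at least the entropy of $p$, with equality only when $q = p$.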
 
 
Logistic Regression
- Two common ways to model logistic regression:
    - $f_{1}(x) = w_1 \cdot x$
    - $f_{2}(x) = \frac{1}{1+e^{-w_2 \cdot x}}$
 
- With corresponding loss functions:
    - logistic loss: $L_1(y, f_1(\vec{x})) = \ln(1+ e^{-y\, f_{1}(\vec{x})})$
        - here $y \in \{+1, -1\}$
    - cross entropy loss: $L_2(y, f_2(x)) = -y\ln(f_{2}(x)) - (1-y)\ln(1-f_{2}(x))$
        - here $y \in \{0, 1\}$
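The two formulations give identical loss values when labels are mapped $+1 \leftrightarrow 1$ and $-1 \leftrightarrow 0$ and $f_2(x) = \sigma(f_1(x))$ with shared weights. A quick numerical check (the value of $w \cdot x$ is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(y_pm, z):
    # L_1 with y in {+1, -1}, z = f_1(x) = w . x
    return np.log(1.0 + np.exp(-y_pm * z))

def cross_entropy_loss(y01, z):
    # L_2 with y in {0, 1}, f_2(x) = sigmoid(w . x)
    p = sigmoid(z)
    return -y01 * np.log(p) - (1.0 - y01) * np.log(1.0 - p)

z = 1.3  # arbitrary value of w . x
# y = +1 corresponds to y = 1, and y = -1 to y = 0
assert np.isclose(logistic_loss(+1, z), cross_entropy_loss(1, z))
assert np.isclose(logistic_loss(-1, z), cross_entropy_loss(0, z))
```

This follows from $-\ln\sigma(z) = \ln(1+e^{-z})$ and $-\ln(1-\sigma(z)) = \ln(1+e^{z})$.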
 
 
- Gradients for each
    - gradient for logistic loss: \(\frac{\partial L_1}{\partial w} = x^T \cdot \frac{1}{1+e^{-y f_1(x)}} \cdot e^{-y f_1(x)} \cdot (-y) = x^T y\left(\frac{1}{1+e^{-y f_1(x)}} - 1\right)\)
    - gradient for cross entropy loss: \(\frac{\partial L_2}{\partial w} = x^T\left\{-y\frac{f_2(x)(1-f_2(x))}{f_2(x)} + (1-y)\frac{f_2(x)(1-f_2(x))}{1 - f_2(x)}\right\} = x^T(f_2(x)-y)\)
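The closed form $(f_2(x) - y)\,x$ can be sanity-checked against a central finite-difference approximation of the loss. A minimal sketch (the weights, input, and label are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(w, x, y):
    # L_2(y, f_2(x)) with f_2(x) = sigmoid(w . x), y in {0, 1}
    p = sigmoid(w @ x)
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

def ce_grad(w, x, y):
    # closed form from above: dL_2/dw = (f_2(x) - y) x
    return (sigmoid(w @ x) - y) * x

rng = np.random.default_rng(0)
w, x, y = rng.normal(size=3), rng.normal(size=3), 1  # arbitrary example

# central finite differences along each coordinate direction
eps = 1e-6
num = np.array([
    (ce_loss(w + eps * e, x, y) - ce_loss(w - eps * e, x, y)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(num, ce_grad(w, x, y), atol=1e-6)
```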
 
- Hessians for each
    - hessian for logistic loss: \(\frac{\partial^2 L_1}{\partial w^2} = \frac{1}{1+e^{-y f_1(x)}}\left(1 - \frac{1}{1+e^{-y f_1(x)}}\right) x x^T\) (the factors of \(y\) cancel since \(y^2 = 1\))
    - hessian for cross entropy loss: \(\frac{\partial^2 L_2}{\partial w^2} = f_2(x)(1 - f_2(x))\, x x^T\)
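These Hessians are what make Newton's method practical for logistic regression: sum the per-example gradient and Hessian from above over the data and solve for the update. A minimal sketch on a synthetic dataset (all data and weight values are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# synthetic data: rows of X are examples, labels y in {0, 1}
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
w_true = np.array([2.0, -1.0])  # assumed ground-truth weights
y = (rng.uniform(size=100) < sigmoid(X @ w_true)).astype(float)

w = np.zeros(2)
for _ in range(10):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y)                    # sum over examples of (f_2(x) - y) x
    H = X.T @ (X * (p * (1 - p))[:, None])  # sum of f_2(x)(1 - f_2(x)) x x^T
    H += 1e-8 * np.eye(2)                   # tiny ridge for numerical safety
    w = w - np.linalg.solve(H, grad)        # Newton update
```

Because $f_2(x)(1 - f_2(x)) > 0$, each Hessian term is positive semidefinite, so the loss is convex and Newton iterations typically converge in a handful of steps.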