Polynomial Regression

$$Xw = y$$

$$X^TXw = X^Ty$$

$$(X^TX)^{-1}X^TXw = (X^TX)^{-1}X^Ty$$

$$w = (X^TX)^{-1}X^Ty$$
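The derivation above can be checked numerically. A minimal sketch with made-up data (the column of ones supplies the intercept term; `np.linalg.solve` is used rather than forming the inverse explicitly, which is better conditioned):

```python
import numpy as np

# Illustrative data: first column of ones adds the intercept term
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])  # exactly y = 1 + 1*x

# Solve the normal equations (X^T X) w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
```

Here `w` recovers intercept 1 and slope 1, matching the closed-form solution $w = (X^TX)^{-1}X^Ty$.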

Continuous vs. discrete refers to the output: regression predicts a continuous output, while classification predicts a discrete one.

Key terms: slope (the coefficient m in y = mx + b) and intercept (the constant b).

http://scikit-learn.org/stable/modules/linear_model.html

def studentReg(ages_train, net_worths_train):
    ### import the sklearn regression module, create, and train your regression
    ### name your regression reg

    ### your code goes here!
    from sklearn.linear_model import LinearRegression
    reg = LinearRegression()
    reg.fit(ages_train, net_worths_train)
    return reg

import numpy
import matplotlib.pyplot as plt

from ages_net_worths import ageNetWorthData

ages_train, ages_test, net_worths_train, net_worths_test = ageNetWorthData()



from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(ages_train, net_worths_train)

### get Katie's net worth (she's 27)
### sklearn predictions are returned in an array, so you'll want to index into
### the output to get what you want, e.g. net_worth = predict([[27]])[0][0] (not
### exact syntax, the point is the [0] at the end). In addition, make sure the
### argument to your prediction function is in the expected format - if you get
### a warning about needing a 2d array for your data, a list of lists will be
### interpreted by sklearn as such (e.g. [[27]]).
km_net_worth = reg.predict([[27]])[0][0] ### fill in the line of code to get the right value

### get the slope
### again, you'll get a 2-D array, so stick the [0][0] at the end
slope = reg.coef_[0][0] ### fill in the line of code to get the right value

### get the intercept
### here you get a 1-D array, so stick [0] on the end to access
### the info we want
intercept = reg.intercept_[0] ### fill in the line of code to get the right value


### get the score on test data
test_score = reg.score(ages_test, net_worths_test) ### fill in the line of code to get the right value


### get the score on the training data
training_score = reg.score(ages_train, net_worths_train) ### fill in the line of code to get the right value



def submitFit():
    # all of the values in the returned dictionary are expected to be
    # numbers for the purpose of the grader.
    return {"networth":km_net_worth,
            "slope":slope,
            "intercept":intercept,
            "stats on test":test_score,
            "stats on training": training_score}

The goal is to find m and b in the equation y = mx + b that minimize

$$\sum_{\text{all training points}} (\text{actual} - \text{predicted})^2$$

Algorithms for Minimizing Squared Errors

  1. ordinary least squares (OLS): the closed-form approach, used in sklearn
  2. gradient descent
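A minimal gradient-descent sketch for the same m, b fit. The learning rate and iteration count here are illustrative choices, not values from the lesson:

```python
import numpy as np

def gd_fit(x, y, lr=0.01, n_iter=100_000):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        pred = m * x + b
        # Gradients of the mean squared error with respect to m and b
        grad_m = (2.0 / n) * np.sum((pred - y) * x)
        grad_b = (2.0 / n) * np.sum(pred - y)
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b
```

On the sleep/scores data used later in these notes, this converges to essentially the same line as the OLS closed form.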

R Squared Metric For Regression

$r^2$ answers the question:

"how much of my change in the output y is explained by the change in my input x?"
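A quick numpy sketch of what that means, on made-up data. $r^2 = 1 - SS_{res}/SS_{tot}$, which is exactly what sklearn's `reg.score()` computes for a regression:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])  # roughly y = 2x, made-up data

# Least-squares fit y = m*x + b (closed form)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()
pred = m * x + b

# r^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
```

Values near 1 mean the line explains almost all the variation in y; a model that just predicts the mean of y gets 0.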

K nearest neighbors (KNN)

take the k nearest training points and average their y values
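That one-line description can be sketched directly in numpy (1-D inputs for simplicity; `knn_regress` is a hypothetical helper name, not a library function):

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k=3):
    """Predict y at x_query as the average y of the k nearest training points."""
    x_train = np.asarray(x_train, float)
    y_train = np.asarray(y_train, float)
    # Indices of the k training points closest to the query
    nearest = np.argsort(np.abs(x_train - x_query))[:k]
    return y_train[nearest].mean()
```

For example, querying halfway between two students who both scored 75 returns 75 with k=2.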

Kernel regression

weighted KNN: closer training points get more weight in the average
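A minimal sketch of that idea as Nadaraya-Watson kernel regression with a Gaussian kernel (weights every training point by its distance to the query; `kernel_regress` and the bandwidth value are illustrative, not from the lesson):

```python
import numpy as np

def kernel_regress(x_train, y_train, x_query, bandwidth=1.0):
    """Distance-weighted average of all training targets (Gaussian weights)."""
    x_train = np.asarray(x_train, float)
    y_train = np.asarray(y_train, float)
    # Gaussian kernel: nearby points get weight near 1, distant points near 0
    weights = np.exp(-0.5 * ((x_train - x_query) / bandwidth) ** 2)
    return np.sum(weights * y_train) / np.sum(weights)
```

The bandwidth plays the role that k plays in KNN: small values make the prediction local, large values smooth it toward the global mean.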

#
#
# Regression and Classification programming exercises
#
#


#
#    In this exercise we will be taking a small data set and computing a linear function
#    that fits it, by hand.
#    

#    the data set

import numpy as np

sleep = [5,6,7,8,10]
scores = [65,51,75,75,86]


def compute_regression(sleep,scores):

    #    First, compute the average of each list

    avg_sleep = np.mean(sleep)
    avg_scores = np.mean(scores)

    #    Then normalize the lists by subtracting the mean value from each entry

    normalized_sleep = np.array(sleep) - avg_sleep
    normalized_scores = np.array(scores) - avg_scores

    #    Compute the slope of the line by taking the sum over each student
    #    of the product of their normalized sleep times their normalized test score.
    #    Then divide this by the sum of squares of the normalized sleep times.

    slope = sum(normalized_sleep * normalized_scores) / sum(normalized_sleep * normalized_sleep)

    #    Finally, We have a linear function of the form
    #    y - avg_y = slope * ( x - avg_x )
    #    Rewrite this function in the form
    #    y = m * x + b
    #    Then return the values m, b
    m = slope
    b = avg_scores - slope * avg_sleep
    return m,b


if __name__=="__main__":
    m,b = compute_regression(sleep,scores)
    print("Your linear model is y={}*x+{}".format(m, b))
#
#    Polynomial Regression
#
#    In this exercise we will examine more complex models of test grades as a function of
#    sleep, using numpy.polyfit to determine a good relationship and incorporating more data.
#
#
#   at the end, store the coefficients of the polynomial you found in coeffs
#

import numpy as np

sleep = [5,6,7,8,10,12,16]
scores = [65,51,75,75,86,80,0]


coeffs = np.polyfit(sleep, scores, 2)
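Note that `np.polyfit` returns the coefficients highest degree first, and `np.polyval` evaluates the fitted polynomial. A short usage sketch (the query value 9 is an arbitrary example):

```python
import numpy as np

sleep = [5, 6, 7, 8, 10, 12, 16]
scores = [65, 51, 75, 75, 86, 80, 0]

coeffs = np.polyfit(sleep, scores, 2)  # [a, b, c] for a*x^2 + b*x + c
predicted = np.polyval(coeffs, 9)      # evaluate the fitted parabola at sleep = 9
```

With the score of 0 at 16 hours included, the degree-2 fit is a downward-opening parabola, capturing that both too little and too much sleep hurt the score.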
