Polynomial Regression
Continuous vs. discrete refers to the output: regression predicts a continuous output, while classification predicts discrete labels.
slope (m): how much y changes per unit change in x; intercept (b): the value of y when x = 0
http://scikit-learn.org/stable/modules/linear_model.html
def studentReg(ages_train, net_worths_train):
    ### import the sklearn regression module, create, and train your regression
    ### name your regression reg
    from sklearn.linear_model import LinearRegression
    reg = LinearRegression()
    reg.fit(ages_train, net_worths_train)
    return reg
import numpy
import matplotlib.pyplot as plt
from ages_net_worths import ageNetWorthData
ages_train, ages_test, net_worths_train, net_worths_test = ageNetWorthData()
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(ages_train, net_worths_train)
### get Katie's net worth (she's 27)
### sklearn predictions are returned in an array, so you'll want to index into
### the output to get what you want, e.g. net_worth = predict([[27]])[0][0] (not
### exact syntax, the point is the [0] at the end). In addition, make sure the
### argument to your prediction function is in the expected format - if you get
### a warning about needing a 2d array for your data, a list of lists will be
### interpreted by sklearn as such (e.g. [[27]]).
km_net_worth = reg.predict([[27]])[0][0] ### fill in the line of code to get the right value
### get the slope
### again, you'll get a 2-D array, so stick the [0][0] at the end
slope = reg.coef_[0][0] ### fill in the line of code to get the right value
### get the intercept
### here you get a 1-D array, so stick [0] on the end to access
### the info we want
intercept = reg.intercept_[0] ### fill in the line of code to get the right value
### get the score on test data
test_score = reg.score(ages_test, net_worths_test) ### fill in the line of code to get the right value
### get the score on the training data
training_score = reg.score(ages_train, net_worths_train) ### fill in the line of code to get the right value
def submitFit():
    # all of the values in the returned dictionary are expected to be
    # numbers for the purpose of the grader.
    return {"networth": km_net_worth,
            "slope": slope,
            "intercept": intercept,
            "stats on test": test_score,
            "stats on training": training_score}
The goal is to find the m and b in the equation y = mx + b that minimize the sum of squared errors, i.e. the sum over all training points of (actual y - predicted y)^2.
Algorithms for Minimizing Squared Errors
- ordinary least squares (OLS): the method used by sklearn's LinearRegression
- gradient descent
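A rough sketch of the gradient descent idea: repeatedly nudge m and b in the direction that reduces the squared error. The toy data, learning rate, and iteration count below are my own illustrative choices, not from the course.

import numpy as np

# toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

m, b = 0.0, 0.0          # initial guesses for slope and intercept
learning_rate = 0.01

for _ in range(5000):
    errors = (m * x + b) - y
    # partial derivatives of the mean squared error with respect to m and b
    grad_m = 2 * np.mean(errors * x)
    grad_b = 2 * np.mean(errors)
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print("gradient descent estimate: y = {:.3f}*x + {:.3f}".format(m, b))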
R Squared Metric For Regression
answers the question
"how much of my change in the output y is explained by the change in my input x
K nearest neighbors (KNN) regression
take the k nearest points and average their y values
Kernel regression
a weighted version of KNN: closer points get larger weights
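A small sketch of both ideas using sklearn's KNeighborsRegressor; the age/net-worth values and k=3 here are my own illustrative assumptions. weights="uniform" averages the k nearest y values, while weights="distance" captures the weighted-KNN idea by letting closer points count for more.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# illustrative training data: age -> net worth
X = np.array([[20], [25], [30], [35], [40], [45]])
y = np.array([100, 160, 190, 250, 300, 330])

# plain KNN regression: average the y values of the k nearest points
knn = KNeighborsRegressor(n_neighbors=3, weights="uniform").fit(X, y)

# weighted KNN: nearer neighbors get larger (inverse-distance) weights
weighted_knn = KNeighborsRegressor(n_neighbors=3, weights="distance").fit(X, y)

print(knn.predict([[27]]), weighted_knn.predict([[27]]))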
#
#
# Regression and Classification programming exercises
#
#
#
# In this exercise we will be taking a small data set and computing a linear function
# that fits it, by hand.
#
# the data set
import numpy as np
sleep = [5,6,7,8,10]
scores = [65,51,75,75,86]
def compute_regression(sleep, scores):
    # work with numpy arrays so the element-wise arithmetic below is valid
    sleep = np.array(sleep, dtype=float)
    scores = np.array(scores, dtype=float)

    # First, compute the average of each list
    avg_sleep = np.mean(sleep)
    avg_scores = np.mean(scores)

    # Then normalize the lists by subtracting the mean value from each entry
    normalized_sleep = sleep - avg_sleep
    normalized_scores = scores - avg_scores

    # Compute the slope of the line by taking the sum over each student
    # of the product of their normalized sleep times their normalized test score.
    # Then divide this by the sum of squares of the normalized sleep times.
    slope = sum(normalized_sleep * normalized_scores) / sum(normalized_sleep * normalized_sleep)

    # Finally, we have a linear function of the form
    #   y - avg_y = slope * ( x - avg_x )
    # Rewrite this function in the form
    #   y = m * x + b
    # and return the values m, b
    m = slope
    b = avg_scores - slope * avg_sleep
    return m, b
if __name__ == "__main__":
    m, b = compute_regression(sleep, scores)
    print("Your linear model is y={}*x+{}".format(m, b))
#
# Polynomial Regression
#
# In this exercise we will examine more complex models of test grades as a function of
# sleep using numpy.polyfit to determine a good relationship and incorporating more data.
#
#
# at the end, store the coefficients of the polynomial you found in coeffs
#
import numpy as np
sleep = [5,6,7,8,10,12,16]
scores = [65,51,75,75,86,80,0]
coeffs = np.polyfit(sleep, scores, 2)
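To use the fitted quadratic, np.polyval evaluates it at new x values; predicting the score for 9 hours of sleep is my own illustrative example.

# coeffs holds [a, b, c] for a*x**2 + b*x + c (highest degree first)
predicted_score = np.polyval(coeffs, 9)
print(predicted_score)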