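Linear Regression (Boston Housing Price Prediction)
(1) Data preprocessing
    import pandas as pd

    cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
    data = pd.read_csv("data.txt", sep=r"\s+", names=cols)

    # Every second row holds the last three fields (B, LSTAT, MEDV) of the row before it,
    # parsed into the first three columns. Copy them back, then drop those rows.
    for i in range(1, len(data), 2):
        data.loc[i - 1, "B"] = data.loc[i, "CRIM"]
        data.loc[i - 1, "LSTAT"] = data.loc[i, "ZN"]
        data.loc[i - 1, "MEDV"] = data.loc[i, "INDUS"]
    data = data.drop(index=range(1, len(data), 2))
    data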
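The row-merging step above can also be written without an explicit loop. The following is a minimal sketch on a hypothetical four-row miniature of the raw layout (the values and the reduced column set are made up for illustration, not taken from data.txt): even-positioned rows carry the leading fields, odd-positioned rows carry the trailing fields in the first three columns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the raw layout: each record is split across two rows.
raw = pd.DataFrame(
    {
        "CRIM": [0.1, 396.9, 0.2, 392.8],
        "ZN": [18.0, 4.98, 0.0, 9.14],
        "INDUS": [2.31, 24.0, 7.07, 21.6],
        "B": [np.nan] * 4,
        "LSTAT": [np.nan] * 4,
        "MEDV": [np.nan] * 4,
    }
)

first = raw.iloc[0::2].reset_index(drop=True)   # rows 0, 2, ...: leading fields
second = raw.iloc[1::2].reset_index(drop=True)  # rows 1, 3, ...: trailing fields

# Copy the trailing fields into the columns where they belong.
merged = first.copy()
merged[["B", "LSTAT", "MEDV"]] = second[["CRIM", "ZN", "INDUS"]].to_numpy()
print(merged)
```

Slicing with `iloc[0::2]` and `iloc[1::2]` avoids modifying the frame while iterating over it, which is where the loop version is easiest to get wrong.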
 

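(2) Splitting into training and test sets
Since no separate test data is available, we split the dataset into a training set and a validation set.
    from sklearn.model_selection import train_test_split

    data_X = data[['ZN', 'RM', 'PTRATIO', 'LSTAT']]  # features
    data_y = data[['MEDV']]                          # target: median home value
    X_train, X_test, y_train, y_test = train_test_split(data_X, data_y, test_size=0.4)
    X_train.shape, X_test.shape, y_train.shape, y_test.shape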
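Without a fixed seed, the split above changes on every run, and so does the resulting error. A small sketch of how `random_state` makes the split repeatable, using made-up arrays rather than the housing data (the seed value 42 is an arbitrary choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Passing the same random_state yields the same split both times.
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.4, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.4, random_state=42)

print(a_train.shape, a_test.shape)
print(np.array_equal(a_test, b_test))
```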
 

(3) Training the model
Fit the model on the training split.
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    model.fit(X_train, y_train)
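To see what fitting produces, it can help to train on data whose true coefficients are known. The sketch below uses synthetic, noise-free data (the coefficients 3, -2 and intercept 5 are arbitrary choices for illustration) and reads the learned parameters back from `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3*x1 - 2*x2 + 5, with no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

model = LinearRegression().fit(X, y)

# With noise-free data, least squares recovers the generating parameters
# up to floating-point error.
print(model.coef_, model.intercept_)
```

On the housing data the same attributes show how much each of ZN, RM, PTRATIO and LSTAT contributes to the predicted MEDV.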
 
(4) Computing predictions
Use the trained model to predict on the validation split.
    y_pred = model.predict(X_test)
    y_pred
 
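(5) Computing the mean absolute error
    from sklearn.metrics import mean_absolute_error

    # Conventional argument order is (y_true, y_pred); MAE is symmetric, so the value is unchanged
    mae = mean_absolute_error(y_test, y_pred)
    mae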
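The mean absolute error is the average of the absolute differences between true and predicted values, MAE = (1/n) * sum(|y_i - y_hat_i|). A short sketch with made-up numbers, comparing a manual computation with the library function:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up true and predicted values, for illustration only.
y_true = np.array([24.0, 21.6, 34.7, 33.4])
y_hat = np.array([25.0, 20.6, 30.7, 35.4])

manual = np.mean(np.abs(y_true - y_hat))      # average of |error|
library = mean_absolute_error(y_true, y_hat)  # same quantity via scikit-learn

print(manual, library)
```

The absolute errors here are 1, 1, 4 and 2, so both values come out at approximately 2. The result is in the same units as the target.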
 
(6) Analyzing the results
    import matplotlib.pyplot as plt

    fig = plt.figure(figsize=(20, 10))
    plt.rcParams['font.size'] = 15
    # Plot actual and predicted MEDV for each validation sample
    plt.plot(range(y_test.shape[0]), y_test, linewidth=2, linestyle='-')
    plt.plot(range(y_test.shape[0]), y_pred, linewidth=2, linestyle='-.')
    plt.legend(['Test', 'Predict'])
 

Logistic Regression (Iris Classification)
(1) Reading the data
    import pandas as pd

    data = pd.read_csv("G:/大数据/机器学习实验/实验一/iris/iris/iris.data", sep=',',
                       names=['Sepal_Length', 'Sepal_width', 'Petal_length', 'Petal_width', 'Class'])
    data
 

(2) Checking for missing values
The output shows that there are no missing values.
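The code for this check is not shown in the original. One common way to do it, sketched here on a small made-up frame rather than on the iris file, is to count nulls per column with `isnull().sum()`:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value in each column (illustrative data only).
df = pd.DataFrame(
    {
        "Sepal_Length": [5.1, 4.9, np.nan],
        "Class": ["Iris-setosa", None, "Iris-setosa"],
    }
)

# Number of missing entries per column; all zeros would mean nothing is missing.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

Running `data.isnull().sum()` on the loaded iris frame should report zero for every column if, as stated, nothing is missing.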

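(3) Splitting the dataset
    from sklearn.model_selection import train_test_split

    X = data.iloc[:, :-1]  # the four measurement columns
    y = data.iloc[:, -1]   # the Class column
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    print(X)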
 

(4) Standardizing the data
    from sklearn.preprocessing import StandardScaler

    features = ['Sepal_Length', 'Sepal_width', 'Petal_length', 'Petal_width']

    stdsc = StandardScaler()
    # Fit the scaler on the training set only, then reuse its statistics on the test set
    X_train_conti_std = stdsc.fit_transform(X_train[features])
    X_test_conti_std = stdsc.transform(X_test[features])

    # Wrap the arrays back into DataFrames, keeping the original row indices
    X_train_conti_std = pd.DataFrame(data=X_train_conti_std, columns=features, index=X_train.index)
    X_test_conti_std = pd.DataFrame(data=X_test_conti_std, columns=features, index=X_test.index)
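Standardization maps each feature to z = (x - mean) / std. The mean and standard deviation should come from the training set alone, so the test set is scaled with statistics it did not influence. A minimal sketch with made-up one-column arrays:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: a single feature.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [4.0]])

scaler = StandardScaler().fit(X_train)    # learns mean and std from training data
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)     # reuses the training statistics

print(scaler.mean_, scaler.scale_)
print(X_test_std.ravel())
```

Calling `fit_transform` on the test set instead would recompute the mean and standard deviation from the test data, so the two sets would end up on different scales.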
 
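(5) Building the logistic regression model
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix

    classifier = LogisticRegression(random_state=0)
    classifier.fit(X_train, y_train)

    y_pred = classifier.predict(X_test)
    # Store the result under a new name so the confusion_matrix function is not shadowed
    cm = confusion_matrix(y_test, y_pred)
    print(cm)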
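Note that the classifier above is fit on the unscaled features, so the standardized frames from step (4) go unused. One way to combine scaling and classification, offered as a sketch and not as the author's method, is a scikit-learn pipeline. It assumes the copy of the iris data bundled with scikit-learn (`load_iris`) in place of the local iris.data file.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the local file.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The pipeline fits the scaler on the training data only and applies
# the same scaling automatically at prediction time.
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
pipe.fit(X_train, y_train)

cm = confusion_matrix(y_test, pipe.predict(X_test))
accuracy = pipe.score(X_test, y_test)
print(cm)
print(f"{accuracy:.2f}")
```

In the confusion matrix, rows are true classes and columns are predicted classes, so the diagonal counts the correctly classified test samples.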
 

(6) Accuracy
    print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(classifier.score(X_test, y_test)))
 

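*Encoding
Some models require categorical variables to be converted to numbers through encoding. This model does not need it; the following only demonstrates the effect.
    # One-hot encode the Class column: one indicator column per species
    data_dummy = pd.get_dummies(data[['Class']])
    data_conti = pd.DataFrame(data, columns=['Sepal_Length', 'Sepal_width', 'Petal_length', 'Petal_width'], index=data.index)
    # Join the numeric features with the encoded class columns
    data = data_conti.join(data_dummy)
    data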
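For comparison, the sketch below contrasts one-hot encoding with integer label encoding on a small made-up label series. `LabelEncoder` is shown only as an alternative; the original does not use it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative labels using the iris class names.
labels = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
    name="Class",
)

one_hot = pd.get_dummies(labels)              # one indicator column per class
codes = LabelEncoder().fit_transform(labels)  # one integer per class, in sorted label order

print(one_hot.columns.tolist())
print(codes)
```

One-hot encoding avoids implying an order among classes, which matters when the encoded variable is an input feature. For a classification target, scikit-learn classifiers such as `LogisticRegression` accept string labels directly, which is why this step is optional here.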
 
