1 Common algorithm modules

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

2 General workflow

2.1 Split into training and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
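
As a minimal sketch of what the split returns, the example below uses the built-in iris dataset. That dataset is an assumption for illustration; the text does not say what X and y are.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for the X and y used in the text.
X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# test_size=0.3 holds out 30% of the samples; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```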

2.2 Build the model

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()

2.3 Train

knn.fit(X_train, y_train)

2.4 Predict

knn.predict(X_test)
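
Steps 2.1 to 2.4 can be put together in one self-contained sketch. The iris dataset is again an assumption; any feature matrix X and label vector y should work the same way.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris is used only as sample data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

knn = KNeighborsClassifier()   # default n_neighbors=5
knn.fit(X_train, y_train)      # learn from the training set only
y_pred = knn.predict(X_test)   # one predicted label per test sample

print(y_pred.shape)  # (45,)
```

Keeping the return value of predict in a variable such as y_pred makes it available for the accuracy checks later.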

2.5 Accuracy

2.5.1 Method 1

print(knn.score(X_test, y_test))

2.5.2 Method 2

from sklearn.metrics import accuracy_score

# accuracy_score expects the true labels first, then the predictions
score = accuracy_score(y_test, knn.predict(X_test))
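
The two methods should give the same number, because a classifier's score is its mean accuracy on the data passed in. A quick check, assuming the iris dataset as sample data:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris is used only as sample data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

knn = KNeighborsClassifier().fit(X_train, y_train)

score_1 = knn.score(X_test, y_test)                    # method 1
score_2 = accuracy_score(y_test, knn.predict(X_test))  # method 2

print(abs(score_1 - score_2) < 1e-12)  # True
```

Method 2 is the more flexible of the two when the predictions are already available, since it does not need the model object.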

3 Data preprocessing

from sklearn import preprocessing
import numpy as np


# Build the array
a = np.array([[10, 2.7, 3.6],
              [-100, 5, -2],
              [120, 20, 40]], dtype=np.float64)


# Print a after standardizing each column (zero mean, unit variance)
print(preprocessing.scale(a))
# [[ 0.         -0.85170713 -0.55138018]
#  [-1.22474487 -0.55187146 -0.852133  ]
#  [ 1.22474487  1.40357859  1.40351318]]

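3.1 Standardization

Standardize each feature to zero mean and unit variance. This only shifts and rescales the data; it does not change the shape of the distribution, so the result follows a standard normal distribution only if the input was normally distributed to begin with.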

preprocessing.scale(X, axis=0, with_mean=True, with_std=True, copy=True)
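
With the default arguments, scale should be equivalent to subtracting each column's mean and dividing by its population standard deviation (ddof=0). The sketch below checks this on the array from the start of this section; the variable names scaled and manual are illustrative only.

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[10, 2.7, 3.6],
              [-100, 5, -2],
              [120, 20, 40]], dtype=np.float64)

scaled = preprocessing.scale(X)  # axis=0: each column is treated as one feature

# The same computation by hand; np.std uses ddof=0 by default.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))            # True
print(np.allclose(scaled.mean(axis=0), 0.0))  # True
print(np.allclose(scaled.std(axis=0), 1.0))   # True
```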

3.2 Scaling to a range

Rescale each feature to a fixed range; the default range is [0, 1].

preprocessing.minmax_scale(X, feature_range=(0, 1), axis=0, copy=True)
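
For the default range, the transformation should amount to (X - min) / (max - min) computed per column. A small sketch on a made-up array, which also shows a custom feature_range:

```python
import numpy as np
from sklearn import preprocessing

# Made-up sample data for illustration.
X = np.array([[10, 2.7, 3.6],
              [-100, 5, -2],
              [120, 20, 40]], dtype=np.float64)

scaled = preprocessing.minmax_scale(X)  # default feature_range=(0, 1)

# The same computation by hand, column by column.
col_min, col_max = X.min(axis=0), X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)
print(np.allclose(scaled, manual))  # True

# A different target range: every column now spans -1 to 1.
scaled_sym = preprocessing.minmax_scale(X, feature_range=(-1, 1))
print(np.allclose(scaled_sym.min(axis=0), -1.0))  # True
print(np.allclose(scaled_sym.max(axis=0), 1.0))   # True
```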

3.3 Normalization

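Scale each sample (each row, with axis=1) to unit norm. norm can be 'l1', 'l2', or 'max'. The resulting values lie in [-1, 1], or in [0, 1] when the input is non-negative. This also works on sparse input (scipy.sparse).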

preprocessing.normalize(X, norm='l2', axis=1, copy=True)
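
The sketch below illustrates the three norms on a small made-up array and repeats the 'l2' case on a sparse CSR matrix. Note that the negative entry stays negative after normalization.

```python
import numpy as np
from scipy import sparse
from sklearn import preprocessing

# Made-up sample data: two samples (rows), two features.
X = np.array([[3.0, 4.0],
              [1.0, -1.0]])

l2 = preprocessing.normalize(X, norm='l2')   # rows divided by their Euclidean length
l1 = preprocessing.normalize(X, norm='l1')   # rows divided by the sum of absolute values
mx = preprocessing.normalize(X, norm='max')  # rows divided by their largest absolute value

print(np.allclose(np.linalg.norm(l2, axis=1), 1.0))    # True
print(np.allclose(l2[0], [0.6, 0.8]))                  # True
print(np.allclose(l1, [[3 / 7, 4 / 7], [0.5, -0.5]]))  # True
print(np.allclose(mx, [[0.75, 1.0], [1.0, -1.0]]))     # True

# Sparse input is accepted too; the result is sparse as well.
l2_sparse = preprocessing.normalize(sparse.csr_matrix(X), norm='l2')
print(np.allclose(l2_sparse.toarray(), l2))            # True
```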