sklearn Usage Summary

1 Common Algorithm Modules

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

2 General Workflow

2.1 Split into Training and Test Sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
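
As a rough illustration of what the split returns, the sketch below uses a small synthetic X and y (made up here, not taken from these notes) and prints the resulting shapes. With test_size=0.3 and 10 samples, 3 rows end up in the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data, assumed for illustration only: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
print(y_train.shape, y_test.shape)  # (7,) (3,)
```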

2.2 Build the Model

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()

2.3 Train

knn.fit(X_train, y_train)

2.4 Predict

knn.predict(X_test)

2.5 Accuracy

2.5.1 Method 1

print(knn.score(X_test, y_test))

2.5.2 Method 2

from sklearn.metrics import accuracy_score

# accuracy_score takes the true labels first, then the predictions
y_pred = knn.predict(X_test)
score = accuracy_score(y_test, y_pred)
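
Steps 2.1 through 2.5 can be put together in one script. The sketch below is a minimal example that assumes the iris dataset bundled with scikit-learn (these notes do not name a dataset), and it shows that both accuracy methods report the same number.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# The iris dataset is an assumption made for this example.
X, y = load_iris(return_X_y=True)

# 2.1 Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 2.2 Build the model, 2.3 train it, 2.4 predict on the test set.
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# 2.5 Both methods give the fraction of correctly classified test samples.
score_method_1 = knn.score(X_test, y_test)
score_method_2 = accuracy_score(y_test, y_pred)
print(score_method_1, score_method_2)
```

Method 1 is shorter; Method 2 is handy when the predictions are already stored, since it does not run predict again.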

3 Data Preprocessing

from sklearn import preprocessing
import numpy as np


# Build the array
a = np.array([[10, 2.7, 3.6],
              [-100, 5, -2],
              [120, 20, 40]], dtype=np.float64)


# Print a after standardization (each column: zero mean, unit variance)
print(preprocessing.scale(a))
# [[ 0.         -0.85170713 -0.55138018]
#  [-1.22474487 -0.55187146 -0.852133  ]
#  [ 1.22474487  1.40357859  1.40351318]]

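3.1 Standardization

Standardize each feature to zero mean and unit variance. This only shifts and rescales the data; it does not turn the distribution into a Gaussian.

preprocessing.scale(X, axis=0, with_mean=True, with_std=True, copy=True)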
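
To make the effect concrete, the following sketch reuses the array a from the start of this section and checks the column statistics after scaling. The manual NumPy formula is added here for comparison and assumes the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn import preprocessing

a = np.array([[10, 2.7, 3.6],
              [-100, 5, -2],
              [120, 20, 40]], dtype=np.float64)

# With axis=0, each column (feature) is standardized independently.
scaled = preprocessing.scale(a)

print(scaled.mean(axis=0))  # approximately 0 for every column
print(scaled.std(axis=0))   # approximately 1 for every column

# The same result computed by hand: subtract the column mean,
# then divide by the column standard deviation (ddof=0).
manual = (a - a.mean(axis=0)) / a.std(axis=0)
print(np.allclose(scaled, manual))  # True
```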

3.2 Min-Max Scaling

Scale each feature to a fixed range; the default range is [0, 1].

preprocessing.minmax_scale(X, feature_range=(0, 1), axis=0, copy=True)
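
A small sketch of min-max scaling, again on the example array a, may help. The second call uses feature_range=(-1, 1), a range picked here only to show the parameter, and the manual formula is included for comparison.

```python
import numpy as np
from sklearn import preprocessing

a = np.array([[10, 2.7, 3.6],
              [-100, 5, -2],
              [120, 20, 40]], dtype=np.float64)

# Default range: each column is mapped so its min is 0 and its max is 1.
scaled = preprocessing.minmax_scale(a, feature_range=(0, 1))
print(scaled)

# Equivalent manual computation per column.
col_min = a.min(axis=0)
col_max = a.max(axis=0)
manual = (a - col_min) / (col_max - col_min)
print(np.allclose(scaled, manual))  # True

# A different target range, chosen here only for illustration.
scaled_sym = preprocessing.minmax_scale(a, feature_range=(-1, 1))
print(scaled_sym.min(axis=0), scaled_sym.max(axis=0))
```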

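3.3 Normalization

Rescale each sample (each row, when axis=1) to unit norm. The result is not confined to [0, 1]: negative entries stay negative. norm can be 'l1', 'l2', or 'max'. This also works on sparse input (scipy.sparse).

preprocessing.normalize(X, norm='l2', axis=1, copy=True)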
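
The sketch below illustrates row-wise normalization on a tiny matrix made up for this example. It covers only the 'l2' and 'l1' norms and includes a negative entry to show that the sign is preserved.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative data (not from the notes): two samples, two features.
X = np.array([[3.0, -4.0],
              [1.0, 0.0]])

# 'l2': each row is divided by the square root of its sum of squares.
l2 = preprocessing.normalize(X, norm='l2')
# 'l1': each row is divided by the sum of its absolute values.
l1 = preprocessing.normalize(X, norm='l1')

print(l2)  # first row is close to [0.6, -0.8]
print(l1)  # first row is close to [3/7, -4/7]

# Every row now has unit norm under the chosen norm.
print(np.linalg.norm(l2, axis=1))
print(np.abs(l1).sum(axis=1))
```

Because the scaling is per row, this is different from 3.1 and 3.2, which rescale each column.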