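A confusion matrix and precision tell us how well a Bayesian classifier labels its samples. Most of the time, however, we choose a Bayesian classifier not purely for classification performance but because we want to see the predicted probabilities. Probabilistic models therefore have two evaluation metrics of their own, and this article introduces the first one: the Brier score.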

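$$\text{Brier Score} = \frac{1}{N}\sum_{i=1}^{N}(p_i - o_i)^2$$

Here p_i is the predicted probability for sample i, o_i is that sample's true outcome (0 or 1), and N is the number of samples. The Brier score lies in [0, 1]: the closer it is to 0, the better the predictions, and the higher it is, the worse the predictions and the poorer their calibration. The example below uses the breast cancer dataset to compare logistic regression, SVM, and naive Bayes by Brier score.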
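To make the definition concrete, here is a minimal sketch that computes the mean of the squared differences between predicted probabilities and 0/1 outcomes by hand. The function name `brier_score` and the toy numbers are made up for illustration; they are not part of the article's dataset or of scikit-learn.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Toy data, invented for illustration only
probs = [0.9, 0.2, 0.7, 0.4]
outcomes = [1, 0, 1, 1]

toy_score = brier_score(probs, outcomes)       # (0.01 + 0.04 + 0.09 + 0.36) / 4
perfect = brier_score([1.0, 0.0], [1, 0])      # fully confident and correct
uninformed = brier_score([0.5, 0.5], [1, 0])   # always predicting 0.5

print(round(toy_score, 3))  # 0.125
print(perfect)              # 0.0
print(uninformed)           # 0.25
```

The last two values give a rough sense of scale: a perfect predictor scores 0, while always answering 0.5 scores 0.25 regardless of the labels.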

Code example

1. Load and split the dataset

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

ds = load_breast_cancer()
x, y = ds.data, ds.target

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

2. Train the models and compute probabilities

from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Gaussian naive Bayes
gnb = GaussianNB()
gnb.fit(x_train, y_train)
prob_gnb = gnb.predict_proba(x_test)

# Logistic regression
lr = LogisticRegression()
lr.fit(x_train, y_train)
prob_lr = lr.predict_proba(x_test)

# SVM (probability=True is required for predict_proba to be available)
svc = SVC(probability=True)
svc.fit(x_train, y_train)
prob_svc = svc.predict_proba(x_test)
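
Each `predict_proba` call above returns one column per class, ordered as in the fitted estimator's `classes_` attribute. The labels in this dataset are 0 and 1, so column 1 holds the probability of class 1. The sketch below shows that column selection on plain lists; `positive_class_probs` is a hypothetical helper and the rows are invented, shaped like `predict_proba` output.

```python
def positive_class_probs(proba_rows, classes, pos_label=1):
    """Pick, from each row of class probabilities, the entry for pos_label."""
    col = list(classes).index(pos_label)  # column position of the positive class
    return [row[col] for row in proba_rows]

# Made-up rows shaped like predict_proba output for classes [0, 1]
proba_rows = [[0.8, 0.2], [0.1, 0.9], [0.35, 0.65]]
classes = [0, 1]

p_pos = positive_class_probs(proba_rows, classes, pos_label=1)
print(p_pos)  # [0.2, 0.9, 0.65]
```

With a NumPy array, the same selection is the `[:, 1]` slice.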

3. Compute the Brier score

from sklearn.metrics import brier_score_loss

score_gnb = brier_score_loss(y_test, prob_gnb[:, 1], pos_label=1)
print('score gnb:', score_gnb)
# score gnb: 0.075

score_lr = brier_score_loss(y_test, prob_lr[:, 1], pos_label=1)
print('score lr:', score_lr)
# score lr: 0.031

score_svc = brier_score_loss(y_test, prob_svc[:, 1], pos_label=1)
print('score svc:', score_svc)
# score svc: 0.044

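Judging by the scores, where lower is better: logistic regression performs best (0.031), SVM comes second (0.044), and Gaussian naive Bayes performs worst (0.075).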
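
Since a lower Brier score is better, the ranking can be read off by sorting the scores in ascending order. A small sketch, using the three values from the printed output above (the dictionary keys are just labels chosen for this example):

```python
# Brier scores as reported in the printed output above
reported = {"gnb": 0.075, "lr": 0.031, "svc": 0.044}

# Lower is better, so sort model names by ascending score
ranking = sorted(reported, key=reported.get)
print(ranking)  # ['lr', 'svc', 'gnb']
```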

This article is original work by 陈华. Reposting is welcome, but please credit the source: http://www.ichenhua.cn/read/299