Plotting a per-class ROC curve for multiclass classification

2022. 3. 14. 07:49 @webnautes


This is example code that plots an ROC curve for each class in a multiclass classification problem.
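A per-class ROC curve treats each class as its own "this class vs. the rest" binary problem. As a minimal sketch of that one-vs-rest view, the snippet below binarizes a few toy labels; the array `y` is a hypothetical example and is not taken from the iris data.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical toy labels for a 3-class problem (not from the iris split)
y = np.array([2, 1, 0, 1])

# One column per class: 1 where the sample belongs to that class, 0 otherwise
y_bin = label_binarize(y, classes=[0, 1, 2])
print(y_bin)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 1 0]]
```

Each column of `y_bin` can then be scored like an ordinary binary target.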

First written 2022. 3. 14

# https://stackoverflow.com/questions/45332410/roc-for-multiclass-classification
# https://moons08.github.io/datascience/classification_score_roc_auc/

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, auc, roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import pandas as pd

iris = load_iris()

iris_data = iris.data
iris_label = iris.target

# print(iris_data.shape)
# print(iris_label.shape)
# (150, 4)
# (150,)

iris_df = pd.DataFrame(data = iris_data, columns = iris.feature_names)

# print(iris_df.head())
#    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
# 0                5.1               3.5                1.4               0.2
# 1                4.9               3.0                1.4               0.2
# 2                4.7               3.2                1.3               0.2
# 3                4.6               3.1                1.5               0.2
# 4                5.0               3.6                1.4               0.2

# Add a label column
iris_df['label'] = iris.target

# print(iris_df.head())
#    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  label
# 0                5.1               3.5                1.4               0.2      0
# 1                4.9               3.0                1.4               0.2      0
# 2                4.7               3.2                1.3               0.2      0
# 3                4.6               3.1                1.5               0.2      0
# 4                5.0               3.6                1.4               0.2      0

# Split into a train set and a test set at a ratio of 8 : 2
X_train, X_test, y_train, y_test = train_test_split(iris_data,
                                                    iris_label,
                                                    test_size=0.2,
                                                    random_state=7)

# print(X_train.shape, y_train.shape)
# print(X_test.shape, y_test.shape)
# (120, 4) (120,)
# (30, 4) (30,)

# Train a decision tree classifier
decision_tree = DecisionTreeClassifier(random_state=32)
decision_tree.fit(X_train, y_train)

# Inference
y_pred = decision_tree.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy {:.4f}'.format(accuracy))
# Accuracy 0.9000

# print(y_test.shape, y_pred.shape)
# (30,) (30,)

# print('y_test[:10]', y_test[:10])
# y_test[:10] [2 1 0 1 2 0 1 1 0 1]

# Binarize the labels per class with label_binarize.
# Each class gets its own column: 1 where the sample belongs to that class, 0 otherwise.
# Compare the arrays before and after label_binarize to see the change.
labels = [0, 1, 2]
y_test = label_binarize(y_test, classes=labels)
y_pred = label_binarize(y_pred, classes=labels)

# print(y_test.shape, y_pred.shape)
# (30, 3) (30, 3)

# print('y_test[:10, 0]', y_test[:10, 0])
# print('y_test[:10, 1]', y_test[:10, 1])
# print('y_test[:10, 2]', y_test[:10, 2])
# y_test[:10, 0] [0 0 1 0 0 1 0 0 1 0]
# y_test[:10, 1] [0 1 0 1 0 0 1 1 0 1]
# y_test[:10, 2] [1 0 0 0 1 0 0 0 0 0]

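# Compute the ROC curve and AUC for each class.
# y_pred holds hard 0/1 predictions here, so each curve has at most one
# point between (0, 0) and (1, 1).
n_classes = 3
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_pred[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])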

# Plot the ROC curve of each class in its own subplot
plt.figure(figsize=(15, 5))
for idx, i in enumerate(range(n_classes)):
    plt.subplot(131+idx)
    plt.plot(fpr[i], tpr[i], label='ROC curve (area = %0.2f)' % roc_auc[i])
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Class %d' % i)
    plt.legend(loc="lower right")
plt.show()

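# y_test and y_pred are indicator matrices at this point, so roc_auc_score
# returns the macro average of the per-class AUCs (multi_class is not used).
print("roc_auc_score: ", roc_auc_score(y_test, y_pred, multi_class='raise'))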
# roc_auc_score: 0.9302675881623251
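
The curves above are built from hard 0/1 predictions. An ROC curve is normally traced by sweeping a threshold over continuous scores, so a variant that uses `predict_proba` may be worth trying. The following is only a sketch under stated assumptions: `max_depth=2` is a hypothetical setting chosen so that the tree's leaves are impure and the scores are fractional, and the names `y_true_bin`, `y_score`, and `per_class_auc` are introduced here for illustration. It also checks that the macro-averaged `roc_auc_score` matches the mean of the per-class AUCs.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7)

# Assumption: max_depth=2 is used only so that predict_proba returns
# fractional scores; it is not the setting used in the code above.
clf = DecisionTreeClassifier(max_depth=2, random_state=32)
clf.fit(X_train, y_train)

labels = [0, 1, 2]
y_true_bin = label_binarize(y_test, classes=labels)  # shape (30, 3)
y_score = clf.predict_proba(X_test)                  # columns follow clf.classes_

# One-vs-rest ROC curve and AUC per class, computed from scores
per_class_auc = {}
for i in labels:
    fpr, tpr, _ = roc_curve(y_true_bin[:, i], y_score[:, i])
    per_class_auc[i] = auc(fpr, tpr)
    print('class {}: AUC = {:.4f}'.format(i, per_class_auc[i]))

# Macro average over the three one-vs-rest problems
macro_auc = roc_auc_score(y_true_bin, y_score, average='macro')
print('macro AUC = {:.4f}'.format(macro_auc))
```

The `fpr` and `tpr` arrays computed in the loop can be stored per class and drawn with the same subplot code as above.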

License: Attribution, NonCommercial, ShareAlike
