This is example code for finding optimal XGBoost hyperparameters using Optuna.
First written: 2022. 03. 12
Updated: 2024. 5. 29
Updated: 2024. 7. 4
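If the packages used below are not installed yet, something like pip install optuna xgboost scikit-learn pandas covers everything the script imports.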
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
import optuna
from sklearn.datasets import load_iris

RANDOM_SEED = 42

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['label'] = iris.target

X = df.drop(columns=['label'])
y = df['label']

# Split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=RANDOM_SEED)

# Split the train dataset again into train and validation datasets
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.25, random_state=RANDOM_SEED)


def objective(trial):
    # Fixed settings plus the search space Optuna samples from
    params = {
        "objective": "multi:softprob",
        "eval_metric": 'mlogloss',
        "booster": 'gbtree',
        "tree_method": 'hist',
        "max_depth": trial.suggest_int("max_depth", 4, 10),
        "learning_rate": trial.suggest_float('learning_rate', 0.0001, 0.99),
        'n_estimators': trial.suggest_int("n_estimators", 1000, 10000, step=100),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.5, 1.0),
        "colsample_bynode": trial.suggest_float("colsample_bynode", 0.5, 1.0),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-2, 1.0),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-2, 1.0),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'min_child_weight': trial.suggest_int('min_child_weight', 2, 15),
        "gamma": trial.suggest_float("gamma", 0.1, 1.0),
        "random_state": RANDOM_SEED,
        "early_stopping_rounds": 50
    }

    model = XGBClassifier(**params)

    # Early stopping monitors the validation set
    bst = model.fit(X_train, y_train, eval_set=[(X_validation, y_validation)], verbose=False)

    preds = bst.predict(X_validation)
    accuracy = accuracy_score(y_validation, preds)

    # Optuna maximizes this value
    return accuracy


study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(seed=RANDOM_SEED))
study.optimize(objective, n_trials=20)

# Refit a model with the best parameters found by the search
# (note: X_train here is the split left over after carving out the validation set)
final_model = XGBClassifier(random_state=RANDOM_SEED, **study.best_params)
final_model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = final_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy = {accuracy:.2f}')
Here are the results of running the script.
(python) webnautes@webnautes-laptop:~$ /home/webnautes/miniconda3/envs/python/bin/python /home/webnautes/1.py
[I 2024-05-29 22:54:59,217] A new study created in memory with name: no-name-ad32d1f4-8dcd-4abe-aa04-aed6b5ae86b3
[I 2024-05-29 22:54:59,297] Trial 0 finished with value: 0.84 and parameters: {'max_depth': 6, 'learning_rate': 0.941212091915176, 'n_estimators': 7600, 'colsample_bytree': 0.7993292420985183, 'colsample_bylevel': 0.5780093202212182, 'colsample_bynode': 0.5779972601681014, 'reg_lambda': 0.06750277604651747, 'reg_alpha': 0.8675143843171859, 'subsample': 0.8404460046972835, 'min_child_weight': 11, 'gamma': 0.1185260448662222}. Best is trial 0 with value: 0.84.
[I 2024-05-29 22:54:59,357] Trial 1 finished with value: 0.88 and parameters: {'max_depth': 10, 'learning_rate': 0.8241349701283375, 'n_estimators': 2900, 'colsample_bytree': 0.5909124836035503, 'colsample_bylevel': 0.5917022549267169, 'colsample_bynode': 0.6521211214797689, 'reg_lambda': 0.5295088673159155, 'reg_alpha': 0.4376255684556946, 'subsample': 0.7164916560792167, 'min_child_weight': 10, 'gamma': 0.22554447458683766}. Best is trial 1 with value: 0.88.
[I 2024-05-29 22:54:59,394] Trial 2 finished with value: 0.92 and parameters: {'max_depth': 6, 'learning_rate': 0.3627615886764254, 'n_estimators': 5100, 'colsample_bytree': 0.8925879806965068, 'colsample_bylevel': 0.5998368910791798, 'colsample_bynode': 0.7571172192068059, 'reg_lambda': 0.596490423173422, 'reg_alpha': 0.05598590859279775, 'subsample': 0.8430179407605753, 'min_child_weight': 4, 'gamma': 0.1585464336867516}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,447] Trial 3 finished with value: 0.92 and parameters: {'max_depth': 10, 'learning_rate': 0.9559791495405063, 'n_estimators': 8300, 'colsample_bytree': 0.6523068845866853, 'colsample_bylevel': 0.5488360570031919, 'colsample_bynode': 0.8421165132560784, 'reg_lambda': 0.44575096880220527, 'reg_alpha': 0.13081785249633104, 'subsample': 0.798070764044508, 'min_child_weight': 2, 'gamma': 0.9183883618709039}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,512] Trial 4 finished with value: 0.72 and parameters: {'max_depth': 5, 'learning_rate': 0.6559308092820068, 'n_estimators': 3800, 'colsample_bytree': 0.7600340105889054, 'colsample_bylevel': 0.7733551396716398, 'colsample_bynode': 0.5924272277627636, 'reg_lambda': 0.9698887814869129, 'reg_alpha': 0.7773814951275034, 'subsample': 0.9757995766256756, 'min_child_weight': 14, 'gamma': 0.6381099809299766}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,593] Trial 5 finished with value: 0.92 and parameters: {'max_depth': 10, 'learning_rate': 0.08769872778119511, 'n_estimators': 2700, 'colsample_bytree': 0.522613644455269, 'colsample_bylevel': 0.6626651653816322, 'colsample_bynode': 0.6943386448447411, 'reg_lambda': 0.27863554145615693, 'reg_alpha': 0.83045013406041, 'subsample': 0.7427013306774357, 'min_child_weight': 5, 'gamma': 0.5884264748424236}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,657] Trial 6 finished with value: 0.88 and parameters: {'max_depth': 4, 'learning_rate': 0.7941947912484238, 'n_estimators': 1600, 'colsample_bytree': 0.9934434683002586, 'colsample_bylevel': 0.8861223846483287, 'colsample_bynode': 0.5993578407670862, 'reg_lambda': 0.015466895952366375, 'reg_alpha': 0.8173068141702858, 'subsample': 0.8827429375390468, 'min_child_weight': 12, 'gamma': 0.7941433120173511}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,719] Trial 7 finished with value: 0.84 and parameters: {'max_depth': 4, 'learning_rate': 0.35494522468597545, 'n_estimators': 2000, 'colsample_bytree': 0.9315517129377968, 'colsample_bylevel': 0.811649063413779, 'colsample_bynode': 0.6654490124263246, 'reg_lambda': 0.0729227667831634, 'reg_alpha': 0.3178724984985056, 'subsample': 0.7300733288106989, 'min_child_weight': 12, 'gamma': 0.6738017242196918}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,769] Trial 8 finished with value: 0.88 and parameters: {'max_depth': 10, 'learning_rate': 0.4675455544178136, 'n_estimators': 2000, 'colsample_bytree': 0.8566223936114975, 'colsample_bylevel': 0.8803925243084487, 'colsample_bynode': 0.7806385987847482, 'reg_lambda': 0.7732575081550154, 'reg_alpha': 0.4988576404007468, 'subsample': 0.8090931317527976, 'min_child_weight': 7, 'gamma': 0.12287721406968567}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,859] Trial 9 finished with value: 0.92 and parameters: {'max_depth': 4, 'learning_rate': 0.031211750911298235, 'n_estimators': 6700, 'colsample_bytree': 0.6571779905381634, 'colsample_bylevel': 0.7542853455823514, 'colsample_bynode': 0.9537832369630466, 'reg_lambda': 0.25679930685738617, 'reg_alpha': 0.41627909380527345, 'subsample': 0.9022204554172195, 'min_child_weight': 5, 'gamma': 0.1692819188459137}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,918] Trial 10 finished with value: 0.92 and parameters: {'max_depth': 7, 'learning_rate': 0.27547421856771304, 'n_estimators': 9800, 'colsample_bytree': 0.867958808340303, 'colsample_bylevel': 0.6658087572237882, 'colsample_bynode': 0.8939212978268534, 'reg_lambda': 0.6578327575430455, 'reg_alpha': 0.020770911919664833, 'subsample': 0.6387875118993118, 'min_child_weight': 2, 'gamma': 0.35268485881573364}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:54:59,973] Trial 11 finished with value: 0.92 and parameters: {'max_depth': 8, 'learning_rate': 0.5793195136326114, 'n_estimators': 5200, 'colsample_bytree': 0.6712579968688783, 'colsample_bylevel': 0.5073117797812243, 'colsample_bynode': 0.8035921370166128, 'reg_lambda': 0.41909708257558387, 'reg_alpha': 0.014699611507676916, 'subsample': 0.9518680377516312, 'min_child_weight': 2, 'gamma': 0.9892183182715868}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:55:00,049] Trial 12 finished with value: 0.92 and parameters: {'max_depth': 8, 'learning_rate': 0.24085332754822578, 'n_estimators': 9000, 'colsample_bytree': 0.7064320329894238, 'colsample_bylevel': 0.5021467266227144, 'colsample_bynode': 0.8559889011067058, 'reg_lambda': 0.5787557981321025, 'reg_alpha': 0.1982337737677383, 'subsample': 0.7758790572044749, 'min_child_weight': 4, 'gamma': 0.407697585555261}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:55:00,121] Trial 13 finished with value: 0.88 and parameters: {'max_depth': 8, 'learning_rate': 0.9853648399787401, 'n_estimators': 5200, 'colsample_bytree': 0.59564718000957, 'colsample_bylevel': 0.9826689730327276, 'colsample_bynode': 0.9908928664487755, 'reg_lambda': 0.743633011545021, 'reg_alpha': 0.2004362798025116, 'subsample': 0.8604911724169197, 'min_child_weight': 7, 'gamma': 0.991064045747221}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:55:00,197] Trial 14 finished with value: 0.92 and parameters: {'max_depth': 6, 'learning_rate': 0.42139101077375424, 'n_estimators': 7000, 'colsample_bytree': 0.8443014139089451, 'colsample_bylevel': 0.6520673670567512, 'colsample_bynode': 0.7354110341678785, 'reg_lambda': 0.3860708042434934, 'reg_alpha': 0.6421919898294646, 'subsample': 0.6566336966609405, 'min_child_weight': 3, 'gamma': 0.8325318137206282}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:55:00,261] Trial 15 finished with value: 0.88 and parameters: {'max_depth': 9, 'learning_rate': 0.6034347116314008, 'n_estimators': 8200, 'colsample_bytree': 0.920268595864156, 'colsample_bylevel': 0.5858804491069047, 'colsample_bynode': 0.5148225739562332, 'reg_lambda': 0.9438027644281547, 'reg_alpha': 0.16798192231986073, 'subsample': 0.7942732833728919, 'min_child_weight': 6, 'gamma': 0.38469254276694687}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:55:00,362] Trial 16 finished with value: 0.88 and parameters: {'max_depth': 6, 'learning_rate': 0.17980782406763357, 'n_estimators': 5900, 'colsample_bytree': 0.7533881284358335, 'colsample_bylevel': 0.6919182326238186, 'colsample_bynode': 0.853555682355332, 'reg_lambda': 0.432083199433153, 'reg_alpha': 0.11743187165245833, 'subsample': 0.8993629548265796, 'min_child_weight': 9, 'gamma': 0.7865857985482376}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:55:00,419] Trial 17 finished with value: 0.88 and parameters: {'max_depth': 7, 'learning_rate': 0.7319362114136037, 'n_estimators': 4400, 'colsample_bytree': 0.6141831052110288, 'colsample_bylevel': 0.5569942994937775, 'colsample_bynode': 0.9123161127609903, 'reg_lambda': 0.2844198595735006, 'reg_alpha': 0.9913644241821635, 'subsample': 0.9422033988604263, 'min_child_weight': 4, 'gamma': 0.4531602835262813}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:55:00,477] Trial 18 finished with value: 0.92 and parameters: {'max_depth': 9, 'learning_rate': 0.35553545500625583, 'n_estimators': 6100, 'colsample_bytree': 0.535877190739221, 'colsample_bylevel': 0.6215517462475777, 'colsample_bynode': 0.8166075758316286, 'reg_lambda': 0.650977222057745, 'reg_alpha': 0.316548684624763, 'subsample': 0.685539134599549, 'min_child_weight': 2, 'gamma': 0.25345425951040723}. Best is trial 2 with value: 0.92.
[I 2024-05-29 22:55:00,541] Trial 19 finished with value: 0.88 and parameters: {'max_depth': 5, 'learning_rate': 0.5768368291030347, 'n_estimators': 8400, 'colsample_bytree': 0.98317534257415, 'colsample_bylevel': 0.7149872794581776, 'colsample_bynode': 0.7367486032526811, 'reg_lambda': 0.7807860567804339, 'reg_alpha': 0.29052213958843653, 'subsample': 0.8266164089224515, 'min_child_weight': 8, 'gamma': 0.5235678301764107}. Best is trial 2 with value: 0.92.
Accuracy = 0.98
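Once the search finishes, the study object keeps a record of every trial. Below is a minimal sketch of how you might inspect it, reusing the study variable from the script above; best_value, best_params, trials_dataframe() and optuna.importance.get_param_importances() are all standard Optuna APIs.

print('Best validation accuracy:', study.best_value)
print('Best hyperparameters:', study.best_params)

# All trials as a pandas DataFrame, e.g. to sort or chart them yourself
print(study.trials_dataframe()[['number', 'value']].head())

# Estimate how much each hyperparameter influenced the objective
print(optuna.importance.get_param_importances(study))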
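Note also that final_model above is refit only on X_train, the portion left over after the validation split. If you would rather train the final model on every non-test sample, one option (a sketch assuming the pandas objects from the script) is to concatenate the two splits first:

# Recombine the train and validation splits for the final fit
X_full = pd.concat([X_train, X_validation])
y_full = pd.concat([y_train, y_validation])

final_model = XGBClassifier(random_state=RANDOM_SEED, **study.best_params)
final_model.fit(X_full, y_full)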
Here is a related post.
Applying standardization to XGBoost using a pipeline
https://webnautes.tistory.com/2352