ADDI Alzheimers Detection Challenge
What about a constant solution?
The constants are chosen from the estimated class distribution of the test data; this scores 0.806 on the leaderboard.
Define preprocessing code¶
Import common packages¶
Please import packages that are common for training and prediction phases here.
In [ ]:
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss
import warnings
warnings.filterwarnings('ignore')
Training phase¶
Load training data¶
In [ ]:
train_data = pd.read_csv(AICROWD_TRAIN_DATASET_PATH)
train_data.head()
Out[ ]:
Functions¶
The test data follows a different class distribution than the training data. A batch of 3500 normal / 1149 post_alzheimer / 420 pre_alzheimer samples was found experimentally to approximate it.
In [ ]:
def get_batch(data, rs):
    n_normal = 3500
    # Downsample the majority 'normal' class; keep all other diagnoses
    normal = data[data.diagnosis == 'normal'].sample(n_normal, random_state=rs)
    other = data[data.diagnosis != 'normal']
    return pd.concat([normal, other])
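A minimal sketch of the sampling scheme on toy data (hypothetical labels, with the sample size shrunk to 5 for illustration): the 'normal' class is downsampled while every non-normal row is kept.

In [ ]:

```python
import pandas as pd

# Toy frame standing in for the real training data (hypothetical labels)
data = pd.DataFrame({
    "diagnosis": ["normal"] * 10 + ["post_alzheimer"] * 3 + ["pre_alzheimer"] * 2
})

def get_batch(data, rs, n_normal=5):
    # Downsample 'normal'; keep all other diagnoses unchanged
    normal = data[data.diagnosis == "normal"].sample(n_normal, random_state=rs)
    other = data[data.diagnosis != "normal"]
    return pd.concat([normal, other])

batch = get_batch(data, 17)
print(batch["diagnosis"].value_counts())
```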
Find constant¶
In [ ]:
t_df = get_batch(train_data, 17).reset_index(drop=True)

# One-hot target matrix; columns are [normal, post_alzheimer, pre_alzheimer]
t = np.zeros((t_df.shape[0], 3))
t[t_df[t_df.diagnosis == 'normal'].index, 0] = 1
t[t_df[t_df.diagnosis == 'post_alzheimer'].index, 1] = 1
t[t_df[t_df.diagnosis == 'pre_alzheimer'].index, 2] = 1
t
Out[ ]:
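The same one-hot matrix can be built in a single call with `pd.get_dummies`; a sketch on toy labels (hypothetical data):

In [ ]:

```python
import pandas as pd

# Toy frame mimicking t_df (hypothetical labels)
t_df = pd.DataFrame({
    "diagnosis": ["normal", "post_alzheimer", "pre_alzheimer", "normal"]
})

# One-hot encode, fixing the column order to match the target matrix above
cols = ["normal", "post_alzheimer", "pre_alzheimer"]
t = pd.get_dummies(t_df["diagnosis"])[cols].to_numpy(dtype=float)
print(t)
```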
In [ ]:
# Grid search over constant probability vectors in steps of 0.01
min_err = 1
prob = 0.5
while prob < 1:
    prob_2 = 0
    while prob_2 < 1:
        if prob + prob_2 < 1:
            pred = np.zeros((t_df.shape[0], 3))
            pred[:, 0] = prob
            pred[:, 1] = prob_2
            pred[:, 2] = 1 - prob - prob_2
            err = log_loss(t, pred)
            if err < min_err:
                min_err = err
                print(round(prob, 2), round(prob_2, 2), round(1 - prob - prob_2, 2), err)
        prob_2 += 0.01
    prob += 0.01
So the best constant probabilities are 0.69 / 0.23 / 0.08. Let's submit them.
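The grid search can be cross-checked analytically: for a constant prediction q, the mean log loss is the cross-entropy −Σ f_c·log(q_c) over class frequencies f, which is minimized at q = f. A sketch using the 3500/1149/420 batch counts from above reproduces the 0.69/0.23/0.08 constants:

In [ ]:

```python
import numpy as np

# Class counts of the sampled batch: normal, post_alzheimer, pre_alzheimer
counts = np.array([3500.0, 1149.0, 420.0])

# The log-loss-optimal constant prediction equals the class frequencies
freq = counts / counts.sum()
print(np.round(freq, 2))  # roughly [0.69 0.23 0.08]

# Log loss of this constant: the entropy of the class distribution
loss = -(freq * np.log(freq)).sum()
print(loss)
```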
Prediction phase 🔎¶
Please make sure to save the weights from the training section in your assets directory and load them in this section
Load test data¶
In [ ]:
test_data = pd.read_csv(AICROWD_DATASET_PATH)
Generate predictions¶
In [ ]:
preds = np.zeros((test_data.shape[0], 3))
In [ ]:
predictions = {
"row_id": test_data["row_id"].values,
"normal_diagnosis_probability": preds[:, 0],
"post_alzheimer_diagnosis_probability": preds[:, 1],
"pre_alzheimer_diagnosis_probability": preds[:, 2],
}
predictions_df = pd.DataFrame.from_dict(predictions)
In [ ]:
predictions_df['normal_diagnosis_probability'] = 0.69
predictions_df['post_alzheimer_diagnosis_probability'] = 0.23
predictions_df['pre_alzheimer_diagnosis_probability'] = 0.08
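A quick sanity check (a sketch using the constants above): each submission row should form a valid probability distribution, i.e. the three columns sum to 1.

In [ ]:

```python
import numpy as np
import pandas as pd

# Two sample rows with the constant predictions (hypothetical frame)
df = pd.DataFrame({
    "normal_diagnosis_probability": [0.69, 0.69],
    "post_alzheimer_diagnosis_probability": [0.23, 0.23],
    "pre_alzheimer_diagnosis_probability": [0.08, 0.08],
})

# Every row should sum to 1 (up to floating-point tolerance)
row_sums = df.sum(axis=1).to_numpy()
print(row_sums)
```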
Save predictions 📨¶
In [ ]:
predictions_df.to_csv(AICROWD_PREDICTIONS_PATH, index=False)
Submit to AIcrowd 🚀¶
NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)
In [ ]:
!DATASET_PATH=$AICROWD_DATASET_PATH \
aicrowd notebook submit \
--assets-dir $AICROWD_ASSETS_DIR \
--challenge addi-alzheimers-detection-challenge
In [ ]: