
ADDI Alzheimers Detection Challenge

F1: 0.52 Baseline - Imbalance Samplers (20+) and 8 Classifiers

Automated Benchmark of Imbalanced Samplers and Classifiers + Feature Engineering with Shapley Values

nilabha

This notebook achieves an F1 score of 0.521 and a log loss of 0.669.

The notebook builds upon the features shared in this forum post: https://discourse.aicrowd.com/t/target-distribution-in-the-test-set-lb-0-616-with-a-simple-magic-trick/5613

New mean/std-based features were created, and their importance and their impact on the normal-diagnosis probability were checked with Shapley values (https://shap.readthedocs.io/en/latest/index.html). The features `dist from mean` and `dist from std`, created by taking the mean and standard deviation of the per-digit `dist from cen` features, showed high importance based on Shapley values.

More than 20 samplers and 8 classifiers (including the popular XGBoost, LightGBM, CatBoost, and a TensorFlow/Keras neural network classifier) were used for the benchmarking. Random Forest tends to give the best CV scores, but CatBoost does better on the leaderboard.

The best model is selected based on the K-fold cross-validation metric. Alternatively, a stratified K-fold metric can be chosen, and any other strategy, such as a train-valid split, can be added by including it in the `model_sel_strategy` list (see the sketch below). A simple K-fold was chosen because its scores were closest to the leaderboard scores.
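For instance, a minimal sketch of extending `model_sel_strategy` with a hold-out train-valid split (the `ShuffleSplit` entry below is an illustrative assumption, not part of the original configuration):

from sklearn.model_selection import KFold, StratifiedKFold, ShuffleSplit

# K-fold, stratified K-fold, and a single 80/20 hold-out split
# (ShuffleSplit with n_splits=1 behaves like a simple train-valid split).
model_sel_strategy = [
    KFold(n_splits=3),
    StratifiedKFold(n_splits=3),
    ShuffleSplit(n_splits=1, test_size=0.2, random_state=0),
]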

The scikit-learn and imbalanced-learn pipelines are used to automate the benchmarking across all samplers and classifiers.

Default parameters were used for the classifiers and samplers; hyperparameter tuning could further boost performance. The log loss was high because some predicted probabilities were quite spread across the classes. A simple ensemble approach, taking the arithmetic or geometric mean of the probabilities from the models selected in different K-folds, could help gain more confidence in the probabilities (a sketch follows).
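A minimal sketch of such probability averaging (the `ensemble_probs` helper below is hypothetical and assumes a list of `predict_proba` outputs from the per-fold models):

import numpy as np

def ensemble_probs(probs_list, method="arithmetic"):
    """Blend a list of (n_samples, n_classes) probability arrays."""
    stacked = np.stack(probs_list, axis=0)
    if method == "geometric":
        blended = np.exp(np.log(np.clip(stacked, 1e-15, 1.0)).mean(axis=0))
    else:
        blended = stacked.mean(axis=0)
    # Re-normalise so each row sums to 1 again
    return blended / blended.sum(axis=1, keepdims=True)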


What is the notebook about?

The challenge is to use the features extracted from the Clock Drawing Test to build an automated algorithm that predicts which of three phases each participant is in:

1) Pre-Alzheimer’s (Early Warning)
2) Post-Alzheimer’s (Detection)
3) Normal (Not an Alzheimer’s patient)

In machine learning terms: this is a 3-class classification task.

How to use this notebook? 📝


  • Update the config parameters. You can define the common variables here:
      AICROWD_DATASET_PATH: Path to the file containing the test data (the data will be available at /ds_shared_drive/ on the aridhia workspace). This should be an absolute path.
      AICROWD_PREDICTIONS_PATH: Path to write the output to.
      AICROWD_ASSETS_DIR: In case your notebook needs additional files (like model weights, etc.), you can add them to a directory and specify the path to the directory here (please specify a relative path). The contents of this directory will be sent to AIcrowd for evaluation.
      AICROWD_API_KEY: In order to submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me
  • Installing packages. Please use the Install packages 🗃 section to install the packages
  • Training your models. All the code within the Training phase ⚙️ section will be skipped during evaluation. Please make sure to save your model weights in the assets directory and load them in the predictions phase section

Setup AIcrowd Utilities 🛠

We use this to bundle the files for submission and create a submission on AIcrowd. Do not edit this block.

In [3]:
!pip install -q -U aicrowd-cli
In [2]:
%load_ext aicrowd.magic
In [16]:
!pip install sweetviz
!pip install -U jupyter
In [3]:
import sweetviz as sv
In [4]:
import os

# Please use the absolute path for the location of the dataset.
# Or you can use a relative path, e.g. `os.path.join(os.getcwd(), "test_data/validation.csv")`
AICROWD_DATASET_PATH = os.getenv("DATASET_PATH", "/ds_shared_drive/validation.csv")
AICROWD_PREDICTIONS_PATH = os.getenv("PREDICTIONS_PATH", "predictions.csv")
AICROWD_ASSETS_DIR = "assets"
In [85]:
#!pip install ipywidgets
#!jupyter nbextension enable --py widgetsnbextension
#!conda install -y jupyterlab_widgets
#!pip install aquirdturtle_collapsible_headings

Install packages 🗃

Please add all package installations in this section

In [86]:
!pip install numpy pandas
!pip install -U imbalanced-learn
!pip install xgboost
!pip install lightgbm
!pip install catboost
!pip install tensorflow
!pip install shap

Define preprocessing code 💻

The code that is common between the training and the prediction sections should be defined here. During evaluation, we completely skip the training section. Please make sure to add any common logic between the training and prediction sections here.

Import common packages

Please import packages that are common for training and prediction phases here.

In [101]:
from imblearn.datasets import fetch_datasets
import numpy as np
import pandas as pd
import joblib
import matplotlib.pyplot as plt
from collections import Counter

%matplotlib inline

from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold, KFold
from sklearn.metrics import plot_confusion_matrix, log_loss, f1_score
from sklearn.model_selection import cross_val_score

from sklearn.ensemble import BaggingClassifier
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from imblearn.ensemble import EasyEnsembleClassifier, RUSBoostClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
import xgboost
import shap
from catboost import CatBoostClassifier
from sklearn.linear_model import LogisticRegression

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector as selector

from sklearn.ensemble import IsolationForest

from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN,BorderlineSMOTE, KMeansSMOTE, SVMSMOTE, SMOTEN, SMOTENC

from imblearn.under_sampling import (RandomUnderSampler, EditedNearestNeighbours, TomekLinks, NearMiss, 
    CondensedNearestNeighbour,ClusterCentroids,
    OneSidedSelection,
    NeighbourhoodCleaningRule,InstanceHardnessThreshold,                                     
RepeatedEditedNearestNeighbours, AllKNN)

from imblearn import FunctionSampler

from imblearn.combine import SMOTEENN, SMOTETomek

from imblearn.pipeline import make_pipeline as make_pipeline_imblearn
In [63]:
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
from tensorflow.python.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.metrics import CategoricalCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Activation,
    Dense,
    Dropout,
    BatchNormalization,
)

def simple_model():
    # Small feed-forward network; input_dim relies on the global feature
    # matrix X (defined later, before cross-validation is run).
    clf = Sequential()
    clf.add(Dense(32, activation='relu', input_dim=X.shape[1]))
    clf.add(Dense(16, activation='relu'))
    clf.add(Dense(3, activation='softmax'))
    clf.compile(loss='categorical_crossentropy', optimizer='adam',
                metrics=[CategoricalCrossentropy(), "AUC", "Precision", "accuracy"])
    return clf
In [8]:
def create_model_sampler(classifier, sampler):
    pipeline = make_pipeline_imblearn(sampler,classifier)
    return pipeline

samplers = [
    FunctionSampler(), # Do nothing
    RandomOverSampler(random_state=0),
    ADASYN(random_state=0),    
    SMOTE(random_state=0),
    BorderlineSMOTE(random_state=0, kind="borderline-1"),
    BorderlineSMOTE(random_state=0, kind="borderline-2"),
    # KMeansSMOTE(random_state=0, k_neighbors=3), Causes error in some cases with clusters
    SMOTEN(random_state=0),
    # SMOTENC(random_state=0), Requires categorical features
    SVMSMOTE(random_state=0),
    SMOTEENN(random_state=0), 
    SMOTETomek(random_state=0),
    NearMiss(version=1), NearMiss(version=2), NearMiss(version=3),
    RandomUnderSampler(random_state=0),
    ClusterCentroids(random_state=0),
    CondensedNearestNeighbour(random_state=0),
    OneSidedSelection(random_state=0),
    NeighbourhoodCleaningRule(),
    TomekLinks(sampling_strategy="auto"),
    EditedNearestNeighbours(),
    RepeatedEditedNearestNeighbours(),
    AllKNN(allow_minority=True),
    # InstanceHardnessThreshold(estimator=LogisticRegression()) Does not converge with warning
]
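For reference, a single sampler-classifier pipeline can also be built and used on its own (a usage sketch; the resampling step is applied only during `fit`, never at prediction time):

clf = create_model_sampler(LGBMClassifier(), SMOTE(random_state=0))
# clf.fit(X, y)
# probs = clf.predict_proba(X_test)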
In [9]:
target_col = "diagnosis"
key_col = "row_id"
cat_cols = ['intersection_pos_rel_centre']
seed = 2021

target_values = ["normal", "post_alzheimer", "pre_alzheimer"]

Training phase ⚙️

You can define your training code here. This section will be skipped during evaluation.

In [891]:
train = pd.read_csv('/ds_shared_drive/train.csv')
In [677]:
# valid = pd.read_csv('/ds_shared_drive/validation.csv')
# valid_truth = pd.read_csv('/ds_shared_drive/validation_ground_truth.csv')
# valid_all = valid.merge(valid_truth,how='left')
# train = pd.concat([train, valid_all],axis = 0)
In [892]:
train = train[train[target_col].isin(target_values)].copy().reset_index(drop=True)

# Remove Constant Columns
train = train.loc[:, (train != train.iloc[0]).any()]
features = train.columns[1:-1].to_list()

numeric_features = [c for c in features if c not in cat_cols]
In [893]:
for c in numeric_features:
    train[c] = train[c].astype(float)

print(train[target_col].value_counts())
print(train.shape)
normal            31208
post_alzheimer     1149
pre_alzheimer       420
Name: diagnosis, dtype: int64
(32777, 120)
In [894]:
df_pos = train[train[target_col].isin(target_values[1:])]
nb_pos = df_pos.shape[0]
nb_neg = nb_pos*2
df_neg = train[train[target_col] == "normal"].sample(n=nb_neg, random_state=seed)
# df_neg = df_normal 
df_samples = pd.concat([df_pos, df_neg]).sample(frac=1).reset_index(drop=True)
# df_samples = train
df_samples.shape
Out[894]:
(4707, 120)
In [895]:
df_samples.shape
Out[895]:
(4707, 120)
In [896]:
print(cat_cols)
for c in cat_cols:
    df_samples[c].fillna("NA", inplace=True)
    
df_dummies = pd.get_dummies(df_samples[cat_cols], columns=cat_cols, dummy_na=True).add_prefix('CAT_')
dummy_cols = df_dummies.columns.to_list()
print(dummy_cols)

df_samples = pd.concat([df_samples, df_dummies], axis=1)
df_samples['cnt_NaN'] = df_samples[numeric_features].isna().sum(axis=1)
df_samples.fillna(-1, inplace=True)
model_features = df_samples.columns.to_list()
model_features = [c for c in model_features if c not in [key_col, target_col] + cat_cols]
print(len(model_features))
X_train = df_samples[model_features]
y_train = df_samples[target_col].map(dict(zip(target_values, list(range(len(target_values))))))
['intersection_pos_rel_centre']
['CAT_intersection_pos_rel_centre_BL', 'CAT_intersection_pos_rel_centre_BR', 'CAT_intersection_pos_rel_centre_NA', 'CAT_intersection_pos_rel_centre_TL', 'CAT_intersection_pos_rel_centre_TR', 'CAT_intersection_pos_rel_centre_nan']
124
In [897]:
df_samples[target_col].value_counts()
Out[897]:
normal            3138
post_alzheimer    1149
pre_alzheimer      420
Name: diagnosis, dtype: int64
In [868]:
df_analysis = df_samples.copy()
df_analysis[target_col] = df_analysis[target_col].astype('category').cat.codes
In [27]:
feature_config = sv.FeatureConfig(force_num=target_col)
In [29]:
addi_report = sv.analyze(df_analysis,target_feat = target_col,feat_cfg = feature_config)
addi_report.show_html()
Report SWEETVIZ_REPORT.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
In [579]:
df_analysis[target_col].value_counts()
Out[579]:
0    3138
1    1149
2     420
Name: diagnosis, dtype: int64
In [898]:
X_train.fillna(-1,inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [899]:
X_train['more than 12'] = [1 if x > 12 else 0 for x in X_train['number_of_digits'] ]
new_cols = ["missing_digit_", "euc_dist__digit_", "area_digit_", 
           "height_digit_", "width_digit_","dist from "]
for new_col in new_cols:
    digit_columns = X_train.columns[X_train.columns.str.contains(new_col)]
    X_train[new_col + "mean"] = X_train[digit_columns].mean(axis=1)
    X_train[new_col + "std"] = X_train[digit_columns].std(axis=1)
    X_train[new_col + "skew"] = X_train[digit_columns].mean(axis=1)
    X_train[new_col + "kurtosis"] = X_train[digit_columns].std(axis=1)
shap.initjs()
X_train.fillna(-1, inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [900]:
X_train.shape
Out[900]:
(4707, 149)
In [901]:
model = LGBMClassifier().fit(X_train.values, y_train.values)

explainer = shap.TreeExplainer(model)
shap_values = explainer(X_train)
shapely_values = explainer.shap_values(X_train)
In [902]:
shap.summary_plot(shapely_values, X_train,max_display=10)
In [903]:
shap.dependence_plot("angle_between_hands", shapely_values[1], X_train)
In [904]:
shap.force_plot(explainer.expected_value[0], shapely_values[0][0,:], X_train.iloc[0,:])
Out[904]:
Visualization omitted, Javascript library not loaded!
Have you run `initjs()` in this notebook? If this notebook was from another user you must also trust this notebook (File -> Trust notebook). If you are viewing this notebook on github the Javascript has been stripped for security. If you are using JupyterLab this error is because a JupyterLab extension has not yet been written.
In [907]:
shap.force_plot(explainer.expected_value[0], shapely_values[0][:2000,:], X_train.iloc[:2000,:])
Out[907]:
Visualization omitted, Javascript library not loaded!
Have you run `initjs()` in this notebook? If this notebook was from another user you must also trust this notebook (File -> Trust notebook). If you are viewing this notebook on github the Javascript has been stripped for security. If you are using JupyterLab this error is because a JupyterLab extension has not yet been written.
In [908]:
X = X_train.values
y = y_train.values
X.shape, y.shape
Out[908]:
((4707, 149), (4707,))
In [909]:
model_sel_strategy = [KFold(n_splits=3), StratifiedKFold(n_splits=3)]
SCORING = 'neg_log_loss'

base_estimator = AdaBoostClassifier(n_estimators=10)
other_models = [EasyEnsembleClassifier(n_estimators=10, base_estimator=base_estimator), RUSBoostClassifier(n_estimators=10, base_estimator=base_estimator)]

models = [RandomForestClassifier(n_estimators=150, random_state=0), BalancedRandomForestClassifier(n_estimators=150, random_state=0),XGBClassifier(),LGBMClassifier(),CatBoostClassifier(verbose =False)]

deep_models = [KerasClassifier(simple_model, epochs=30, verbose=0)]

all_models = []
all_models.extend(models)
all_models.extend(other_models)
all_models.extend(deep_models)
In [910]:
all_models[3:4]
Out[910]:
[LGBMClassifier()]
In [911]:
all_columns = ['model','sampler', 'metric','model_sel','cv_mean','cv_std']
all_results = pd.DataFrame(columns=all_columns)
In [912]:
# load your data

Train your model

Instructions

Remove the `[3:4]` slice from `all_models[3:4]` to run all models. Likewise, remove `[:3]` from `samplers[:3]` to run all samplers.

In [913]:
all_clfs = []
for model in all_models[3:4]:
    cur_model = str(model)
    for sampler in samplers[:3]:
        cur_sampler = str(sampler)
        for model_sel in model_sel_strategy[1:]:
            clf = create_model_sampler(model, sampler)
            print("Running Pipeline",clf, "\n")
            try:
                results = cross_val_score(clf, X, y, cv=model_sel, scoring=SCORING)
            except Exception as e:
                print("Error","\n")
                print(e)
                results = [np.nan]
            cur_results = pd.DataFrame([str(cur_model),str(cur_sampler), SCORING,
                                                    str(model_sel),round(np.nanmean(results),4),
                                             round(np.nanstd(results),4)]).transpose()
            cur_results.columns = all_columns
            all_results = all_results.append(cur_results)
            all_clfs.append(clf)
Running Pipeline Pipeline(steps=[('functionsampler', FunctionSampler()),
                ('lgbmclassifier', LGBMClassifier())]) 

Running Pipeline Pipeline(steps=[('randomoversampler', RandomOverSampler(random_state=0)),
                ('lgbmclassifier', LGBMClassifier())]) 

Running Pipeline Pipeline(steps=[('adasyn', ADASYN(random_state=0)),
                ('lgbmclassifier', LGBMClassifier())]) 

In [916]:
all_results.sort_values(by=['cv_mean'], ascending = False).head(15)
Out[916]:
model sampler metric model_sel cv_mean cv_std
0 LGBMClassifier() ADASYN(random_state=0) neg_log_loss StratifiedKFold(n_splits=3, random_state=None,... -0.7058 0.0375
0 LGBMClassifier() RandomOverSampler(random_state=0) neg_log_loss StratifiedKFold(n_splits=3, random_state=None,... -0.7104 0.0371
0 LGBMClassifier() FunctionSampler() neg_log_loss StratifiedKFold(n_splits=3, random_state=None,... -0.7357 0.0287
In [917]:
all_results = all_results.fillna(-999)
results_filename = f'{AICROWD_ASSETS_DIR}/all_results_lgb.csv'
all_results.to_csv(results_filename, index = False)

idx = np.argmax(all_results['cv_mean'])
all_results.iloc[idx,:]
Out[917]:
model                                         LGBMClassifier()
sampler                                 ADASYN(random_state=0)
metric                                            neg_log_loss
model_sel    StratifiedKFold(n_splits=3, random_state=None,...
cv_mean                                                -0.7058
cv_std                                                  0.0375
Name: 0, dtype: object
In [918]:
# Get best classifier based on cv mean and fit on entire dataset
best_clf = all_clfs[idx]
best_clf.fit(X,y)
Out[918]:
Pipeline(steps=[('adasyn', ADASYN(random_state=0)),
                ('lgbmclassifier', LGBMClassifier())])

Save your trained model

In [919]:
meta = {
    "numeric_features": numeric_features,
    "cat_cols": cat_cols,
    "dummy_cols": dummy_cols,
    "model_features": model_features
}
meta_filename = f'{AICROWD_ASSETS_DIR}/model_meta.pkl'
joblib.dump(meta, meta_filename)
Out[919]:
['assets/model_meta.pkl']
In [920]:
clf_file = f'{AICROWD_ASSETS_DIR}/best_clf.pkl'
joblib.dump(best_clf,clf_file)
Out[920]:
['assets/best_clf.pkl']

Prediction phase 🔎

Please make sure to save the weights from the training section in your assets directory and load them in this section

In [921]:
clf_file = f'{AICROWD_ASSETS_DIR}/best_clf.pkl'
final_clf = joblib.load(clf_file)
In [922]:
meta_filename = f'{AICROWD_ASSETS_DIR}/model_meta.pkl'
meta = joblib.load(meta_filename)
print(meta.keys())

numeric_features = meta['numeric_features']
cat_cols = meta['cat_cols']
dummy_cols = meta['dummy_cols']
model_features = meta['model_features']
dict_keys(['numeric_features', 'cat_cols', 'dummy_cols', 'model_features'])

Load test data

In [923]:
test_data = pd.read_csv(AICROWD_DATASET_PATH)
test_data.head()
Out[923]:
row_id number_of_digits missing_digit_1 missing_digit_2 missing_digit_3 missing_digit_4 missing_digit_5 missing_digit_6 missing_digit_7 missing_digit_8 ... top_area_perc bottom_area_perc left_area_perc right_area_perc hor_count vert_count eleven_ten_error other_error time_diff centre_dot_detect
0 LA9JQ1JZMJ9D2MBZV 11.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.500272 0.499368 0.553194 0.446447 0 0 0 1 NaN NaN
1 PSSRCWAPTAG72A1NT 6.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 ... 0.572472 0.427196 0.496352 0.503273 0 1 0 1 NaN NaN
2 GCTODIZJB42VCBZRZ 11.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.494076 0.505583 0.503047 0.496615 1 0 0 0 0.0 0.0
3 7YMVQGV1CDB1WZFNE 3.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 ... 0.555033 0.444633 0.580023 0.419575 0 1 0 1 NaN NaN
4 PHEQC6DV3LTFJYIJU 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 ... 0.603666 0.395976 0.494990 0.504604 0 0 0 1 150.0 0.0

5 rows × 121 columns

In [924]:
test_data = test_data.copy()

for c in cat_cols:
    test_data[c].fillna("NA", inplace=True)
    
df_test_dummies = pd.get_dummies(test_data[cat_cols], columns=cat_cols, dummy_na=True).add_prefix('CAT_')
test_data = pd.concat([test_data, df_test_dummies], axis=1)
test_data['cnt_NaN'] = test_data[numeric_features].isna().sum(axis=1)

test_data.fillna(-1, inplace=True)

for c in dummy_cols:
    if c not in test_data.columns:
        test_data[c] = 0

print("Missing columns:", [c for c in model_features if c not in test_data.columns])
test_data.head(3)

X_test = test_data[model_features]
Missing columns: []
In [925]:
X_test['more than 12'] = [1 if x > 12 else 0 for x in X_test['number_of_digits'] ]
new_cols = ["missing_digit_", "euc_dist__digit_", "area_digit_", 
           "height_digit_", "width_digit_","dist from "]
for new_col in new_cols:
    digit_columns = X_test.columns[X_test.columns.str.contains(new_col)]
    X_test[new_col + "mean"] = X_test[digit_columns].mean(axis=1)
    X_test[new_col + "std"] = X_test[digit_columns].std(axis=1)
    X_test[new_col + "skew"] = X_test[digit_columns].mean(axis=1)
    X_test[new_col + "kurtosis"] = X_test[digit_columns].std(axis=1)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [926]:
X_test.fillna(-1,inplace=True)
pred_probs = final_clf.predict_proba(X_test)

Generate predictions

In [927]:
predictions = {
    "row_id": test_data["row_id"].values,
    "normal_diagnosis_probability": pred_probs[:,0],
    "post_alzheimer_diagnosis_probability": pred_probs[:,1],
    "pre_alzheimer_diagnosis_probability": pred_probs[:,2],
}

predictions_df = pd.DataFrame.from_dict(predictions)
In [928]:
predictions_df.head()
Out[928]:
row_id normal_diagnosis_probability post_alzheimer_diagnosis_probability pre_alzheimer_diagnosis_probability
0 LA9JQ1JZMJ9D2MBZV 0.697201 0.171115 0.131684
1 PSSRCWAPTAG72A1NT 0.250597 0.297936 0.451467
2 GCTODIZJB42VCBZRZ 0.991051 0.005484 0.003464
3 7YMVQGV1CDB1WZFNE 0.467004 0.473815 0.059181
4 PHEQC6DV3LTFJYIJU 0.315124 0.601338 0.083538

Save predictions 📨

In [929]:
predictions_df.to_csv(AICROWD_PREDICTIONS_PATH, index=False)
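Optionally, the leaderboard metrics can be approximated locally before submitting. The sketch below is an assumption: it presumes the validation ground truth file at /ds_shared_drive/validation_ground_truth.csv contains `row_id` and `diagnosis` columns, and it uses macro F1, which may not match the leaderboard's exact averaging.

# Hypothetical local scoring against the validation ground truth (not part of the submission)
truth = pd.read_csv("/ds_shared_drive/validation_ground_truth.csv")
merged = predictions_df.merge(truth, on="row_id", how="inner")
prob_cols = ["normal_diagnosis_probability",
             "post_alzheimer_diagnosis_probability",
             "pre_alzheimer_diagnosis_probability"]
y_true = merged["diagnosis"].map(dict(zip(target_values, range(len(target_values)))))
print("log loss:", log_loss(y_true, merged[prob_cols].values, labels=[0, 1, 2]))
print("macro F1:", f1_score(y_true, merged[prob_cols].values.argmax(axis=1), average="macro"))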

Submit to AIcrowd 🚀

NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)

In [791]:
!DATASET_PATH=$AICROWD_DATASET_PATH \
aicrowd notebook submit \
    --assets-dir $AICROWD_ASSETS_DIR \
    --challenge addi-alzheimers-detection-challenge
API Key valid
Saved API Key successfully!
Using notebook: /home/desktop0/AutomatedBenchmarkingOfImbalancedSamplerAndClassificationPipelines.ipynb for submission...
Removing existing files from submission directory...
Scrubbing API keys from the notebook...
Collecting notebook...
Validating the submission...
Executing install.ipynb...
[NbConvertApp] Converting notebook /home/desktop0/submission/install.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
[NbConvertApp] Writing 14517 bytes to /home/desktop0/submission/install.nbconvert.ipynb
Executing predict.ipynb...
[NbConvertApp] Converting notebook /home/desktop0/submission/predict.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
2021-05-14 17:21:37.120023: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-05-14 17:21:37.120101: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[NbConvertApp] Writing 45355 bytes to /home/desktop0/submission/predict.nbconvert.ipynb
submission.zip ━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 25.6/25.6 MB • 2.7 MB/s • 0:00:01
                                                 ╭─────────────────────────╮                                                 
                                                 │ Successfully submitted! │                                                 
                                                 ╰─────────────────────────╯                                                 
                                                       Important links                                                       
┌──────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/addi-alzheimers-detection-challenge/submissions/137339              │
│                  │                                                                                                        │
│  All submissions │ https://www.aicrowd.com/challenges/addi-alzheimers-detection-challenge/submissions?my_submissions=true │
│                  │                                                                                                        │
│      Leaderboard │ https://www.aicrowd.com/challenges/addi-alzheimers-detection-challenge/leaderboards                    │
│                  │                                                                                                        │
│ Discussion forum │ https://discourse.aicrowd.com/c/addi-alzheimers-detection-challenge                                    │
│                  │                                                                                                        │
│   Challenge page │ https://www.aicrowd.com/challenges/addi-alzheimers-detection-challenge                                 │
└──────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────┘
In [ ]:

