In this notebook you can find an implementation of a CatBoostClassifier trained with cross-validation, which gives a more reliable estimate of model performance than a single train/validation split.
Cross-validation also makes the final predictions more stable, because they are combined from several models instead of coming from just one. I will use the K-Fold technique because it is popular and easy to understand, with 5 folds.
Plan:
- Split dataset into 5 Folds.
- Fit the model on 4 folds and validate it on the remaining fold.
- Repeat this 5 times, so that each fold is used for validation exactly once.
- Run inference on the test data with all 5 models and submit the predictions.
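The splitting step of the plan can be sketched without scikit-learn. The helper `kfold_indices` below is hypothetical (the notebook itself uses `sklearn.model_selection.KFold`); it shuffles the sample indices, cuts them into 5 roughly equal folds, and lets each fold serve as the validation set once while the other 4 form the training set.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=0):
    """Hypothetical helper: yield (train_idx, valid_idx) pairs, one per fold."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)       # similar to KFold(shuffle=True)
    folds = np.array_split(shuffled, n_splits)  # roughly equal-sized folds
    for k in range(n_splits):
        valid_idx = folds[k]                    # fold k is held out
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, valid_idx

splits = list(kfold_indices(n_samples=12, n_splits=5))
for k, (tr, va) in enumerate(splits):
    print(f"fold {k}: train={len(tr)} valid={len(va)}")
```

With 12 samples the folds have sizes 3, 3, 2, 2, 2, and every sample appears in exactly one validation fold, which is what makes averaging the 5 validation scores meaningful.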
In [ ]:
!pip install -q aicrowd-cli
!pip install -q catboost
%load_ext aicrowd.magic
%aicrowd login
!rm -rf data
!mkdir data
%aicrowd ds dl -c obstacle-prediction -o data
Importing Libraries
In [ ]:
import pandas as pd
import numpy as np
from sklearn.model_selection import KFold
import os
import matplotlib.pyplot as plt
import seaborn as sns
from catboost import CatBoostClassifier
Reading the dataset and flattening each sample into a 1D array to train the models
In [ ]:
data = np.load("/content/data/data.npz", allow_pickle=True)
train_data = data["train"]
test_data = data['test']
X = np.array([sample.flatten() for sample in train_data[:, 0].tolist()])
y = np.array(train_data[:, 1].tolist())
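To make the flattening step concrete, here is a small sketch on made-up data. The shapes (3 samples of 4x5) are an assumption for illustration only; the real samples in `data.npz` may have a different size. The layout mirrors what the loading code assumes: an object array whose rows are `(sample, label)`.

```python
import numpy as np

# Hypothetical toy data: 3 samples, each a 4x5 array, stored as (sample, label) rows
rng = np.random.default_rng(0)
train_data = np.empty((3, 2), dtype=object)
for i in range(3):
    train_data[i, 0] = rng.random((4, 5))
    train_data[i, 1] = i % 2

# Same transformation as in the notebook: each 4x5 sample becomes a row of 20 features
X = np.array([sample.flatten() for sample in train_data[:, 0].tolist()])
y = np.array(train_data[:, 1].tolist())
print(X.shape, y.shape)  # (3, 20) (3,)

# Equivalent vectorized form: stack into (3, 4, 5), then merge the last two axes
X_alt = np.stack(train_data[:, 0]).reshape(len(train_data), -1)
```

Both forms require every sample to have the same shape; flattening discards the 2D layout, which is acceptable for tree models such as CatBoost that treat each position as an independent feature.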
Training the Models
In [ ]:
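# 5-fold cross-validation; random_state makes the split reproducible
kf = KFold(n_splits=5, shuffle=True, random_state=42)
models = []
for i, (train_index, valid_index) in enumerate(kf.split(X)):
    X_train, y_train = X[train_index], y[train_index]
    X_valid, y_valid = X[valid_index], y[valid_index]
    # iterations=2 and depth=1 give a very small, fast model;
    # larger values will likely be needed for a competitive score
    model = CatBoostClassifier(
        iterations=2,
        depth=1,
        verbose=10,
    )
    # eval_set reports the metric on the held-out fold during training
    model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
    models.append(model)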
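The loop above keeps the fold models but does not combine their validation results into a single score. One common way to get that measure is to collect out-of-fold (OOF) predictions: each training sample is predicted by the model that did not see it. The sketch below assumes synthetic, clearly separable data and replaces CatBoost with a hypothetical nearest-centroid scorer (`nearest_centroid_proba`) so it runs with NumPy alone; in the notebook's loop, the corresponding line would be `oof[valid_index] = model.predict_proba(X_valid)[:, 1]`.

```python
import numpy as np

def nearest_centroid_proba(X_train, y_train, X_eval):
    """Hypothetical stand-in for CatBoostClassifier: a score in [0, 1] for class 1."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_eval - c0, axis=1)
    d1 = np.linalg.norm(X_eval - c1, axis=1)
    return d0 / (d0 + d1)  # closer to the class-1 centroid -> larger score

# Synthetic data (assumption, not the competition data): two shifted Gaussian blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)

n_splits = 5
folds = np.array_split(rng.permutation(len(X)), n_splits)
oof = np.zeros(len(X))  # one out-of-fold score per training sample
fold_acc = []
for k in range(n_splits):
    valid_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
    p = nearest_centroid_proba(X[train_idx], y[train_idx], X[valid_idx])
    oof[valid_idx] = p
    fold_acc.append(np.mean((p > 0.5) == y[valid_idx]))

print("per-fold accuracy:", np.round(fold_acc, 3))
print(f"mean +/- std: {np.mean(fold_acc):.3f} +/- {np.std(fold_acc):.3f}")
print("OOF accuracy:", np.mean((oof > 0.5) == y))
```

The spread across folds shows how sensitive the model is to the particular split, and the OOF array can also be used to tune the decision threshold before predicting on the test set.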
Inference¶
In [ ]:
# Converting each testing sample into a 1D array
X_test = np.array([sample.flatten() for sample in test_data.tolist()])
# Accumulate the class-1 probability from every fold model
predictions = np.zeros(len(X_test))
for model in models:
    predictions += model.predict_proba(X_test)[:, 1]
# Note: this thresholds the SUM of the 5 probabilities at 0.5,
# which is the same as thresholding their mean at 0.1
predictions = [1 if pr > 0.5 else 0 for pr in predictions]
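Because the probabilities are summed rather than averaged, the 0.5 cut-off is much more permissive than a usual ensemble average. The sketch below uses made-up probabilities (an assumption, not real model output) for 5 models and 3 test samples to compare the two rules.

```python
import numpy as np

# Hypothetical class-1 probabilities: rows = 5 fold models, columns = 3 test samples
probs = np.array([
    [0.05, 0.60, 0.20],
    [0.02, 0.01, 0.15],
    [0.03, 0.02, 0.10],
    [0.04, 0.01, 0.05],
    [0.01, 0.03, 0.05],
])

summed = probs.sum(axis=0)     # what the inference loop accumulates
averaged = probs.mean(axis=0)  # conventional ensemble average

pred_sum = (summed > 0.5).astype(int)          # rule used in the notebook
pred_mean = (averaged > 0.5).astype(int)       # usual averaged rule
pred_mean_01 = (averaged > 0.1).astype(int)    # averaged rule with threshold 0.5 / 5

print("sum:", summed, pred_sum)
print("mean:", averaged, pred_mean)
```

The second sample is flagged because a single model is confident, and the third because several models are mildly positive, while the averaged rule at 0.5 flags neither. Summing and thresholding at 0.5 is exactly averaging and thresholding at 0.1 for 5 models, so the choice is really a choice of decision threshold.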
In [ ]:
submission = pd.DataFrame({"label":predictions})
submission.head()
In [ ]:
# Saving the submission DataFrame to assets/submission.csv
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
Submitting our Predictions
Note: Please save the notebook before submitting it (Ctrl + S).
In [ ]:
!aicrowd notebook submit -c obstacle-prediction -a assets --no-verify
Comments
Hi, why didn’t you divide the sum of probabilities by the number of models (5 in this case)?
Yes, you’re right. But I think that in this contest the data is very high-dimensional and a model may rely on only a small part of it, so just one model can find the obstacle and push the prediction to 1.