[Getting Started Notebook] BHPOB Challenge

This is baseline code to get you started with the challenge.

gauransh_k

You can use this code to start understanding the data and to create a baseline model for further improvements.

Starter Code for BHPOB Practice Challenge

Note: Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy.

Author: Gauransh Kumar

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/khqgel3Ibf7arl1iQq5m-Xq24aEXlWCVOhkqMVL40LE
Opening in existing browser session.
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c bhpob -o data

Importing Libraries

In this baseline, we will use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import normalize
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset

Here, we will read the train.csv which contains both training samples & labels, and test.csv which contains testing samples.

In [5]:
# Reading the CSV
train_data_df = pd.read_csv("data/train.csv")
test_data_df = pd.read_csv("data/test.csv")

# Previewing the first rows of each dataframe
display(train_data_df.head())
display(test_data_df.head())
   1         2         3    4         5          6          7         8          9         10  ...     13     14        15        16        17        18        19  20        21  22
0  9  0.276818  0.727428  500  0.001247  72.738338  72.742764  0.272572  138.40875  361.59125  ...  12317  32871  65070720  17736480  0.480102  0.249136  0.000826   1  0.307265   3
1  3  0.268178  0.735815  500  0.000856  73.577056  73.581482  0.264185  134.08875  365.91125  ...  11938  33250  65070720  17190720  0.596010  0.233314  0.001202   1  0.530449   2
2  9  0.349013  0.656338  600  0.000424  65.628230  65.633766  0.343662  209.40750  390.59250  ...  18621  35563  78024960  26814240  0.557887  0.314111  0.000643   1  0.457467   2
3  9  0.468750  0.538535  300  0.001276  53.846154  53.853536  0.461465  140.62500  159.37500  ...  12502  14590  39012480  18002880  0.350048  0.412500  0.001038   2  0.136519   3
4  3  0.376538  0.630084  100  0.000501  62.146331  63.008400  0.369916   37.65375   62.34625  ...  65579   5701  13029120   4819680  0.434758  0.327588  0.000735   1  0.308678   0

5 rows × 22 columns

   1         2         3    4         5          6          7         8          9         10  ...     12     13     14        15        16        17        18        19  20        21
0  3  0.557522  0.451300  400  0.000428  45.113448  45.130050  0.548700  223.00875  176.99125  ...  36140  19830  16310  52041600  28555200  0.374579  0.479469  0.000600   2  0.112374
1  3  0.360394  0.645944  200  0.000455  64.273873  64.594385  0.354056   72.07875  127.92125  ...  18096   6407  11689  26058240   9226080  0.452161  0.306335  0.000647   1  0.284861
2  3  0.684394  0.327255  200  0.000493  32.692308  32.725464  0.672745  136.87875   63.12125  ...  18096  12174   5922  26058240  17530560  0.265076  0.657018  0.001902   0  0.021206
3  3  0.559238  0.450597  100  0.000704  44.993369  45.059682  0.549403   55.92375   44.07625  ...   9048   4971   4077  13029120   7158240  0.351466  0.492129  0.000626   2  0.087866
4  3  0.792514  0.220435  700  0.000419  22.034005  22.043487  0.779565  554.76000  145.24000  ...  63284  49334  13950  91128960  71040960  0.185165  0.776664  0.000944   0  0.037033

5 rows × 21 columns
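
The previews show 22 columns in the training frame and 21 in the test frame, and the next cell treats column "22" as the label. Before training, it can help to confirm that the remaining training columns line up with the test columns. The sketch below uses two tiny made-up frames (`train_demo`, `test_demo`, with column "4" standing in for the label) rather than the real CSV files, so every name and value in it is illustrative.

```python
import pandas as pd

# Tiny stand-ins for train.csv and test.csv (made-up values, 3 features instead of 21).
train_demo = pd.DataFrame({"1": [9, 3], "2": [0.27, 0.26], "3": [0.72, 0.73], "4": [3, 2]})
test_demo = pd.DataFrame({"1": [3, 3], "2": [0.55, 0.36], "3": [0.45, 0.64]})

label_col = "4"  # plays the role of column "22" in the real training file
feature_cols = [c for c in train_demo.columns if c != label_col]

# The test frame should expose exactly the feature columns, in the same order.
columns_match = feature_cols == list(test_demo.columns)
print(columns_match)

# A quick look at how many samples each label has.
print(train_demo[label_col].value_counts().to_dict())
```

With the real data, the same comparison would use `train_data_df`, `test_data_df`, and `label_col = "22"`.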

Data Preprocessing

In [6]:
# Separating data from the dataframe for final training
X = normalize(train_data_df.drop(columns=["22"]).to_numpy())
y = train_data_df["22"].to_numpy()
print(X.shape, y.shape)
(860, 21) (860,)
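
A note on what `normalize` does here: with its defaults (`norm="l2"`, `axis=1`) it rescales each row, not each column, to unit length. When one column is many orders of magnitude larger than the rest, that column ends up close to 1 and the others close to 0. The sketch below shows this on a made-up matrix (`X_toy`), and contrasts it with per-column scaling via `StandardScaler`. The contrast is only an illustration of the difference; whether column-wise scaling improves this challenge's score has not been tested here.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

# Toy feature matrix (made-up values): the last column dwarfs the others.
X_toy = np.array([
    [9.0, 0.27, 500.0, 65070720.0],
    [3.0, 0.26, 500.0, 13029120.0],
    [9.0, 0.35, 600.0, 78024960.0],
])

# Row-wise L2 normalisation: every row is divided by its own Euclidean norm.
X_rows = normalize(X_toy)
row_norms = np.linalg.norm(X_rows, axis=1)
print(row_norms)       # each row now has unit length
print(X_rows[0])       # the largest column dominates the row

# Column-wise standardisation: every column gets zero mean and unit variance.
X_cols = StandardScaler().fit_transform(X_toy)
print(X_cols.mean(axis=0))
```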
In [7]:
# Visualising the final label classes for training
sns.countplot(x=y)
Out[7]:
<AxesSubplot:ylabel='count'>

Splitting the data

In [8]:
# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
(688, 21)
(688,)
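
Because `train_test_split` shuffles randomly, the split (and therefore the validation score) changes on every run unless a seed is fixed. A minimal sketch of a reproducible, class-balanced split follows, assuming synthetic data of the same shape as above. The arrays `X_demo` and `y_demo`, the four made-up classes, and the seed value 42 are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the training data (860 rows, 21 features).
rng = np.random.default_rng(0)
X_demo = rng.random((860, 21))
y_demo = rng.integers(0, 4, size=860)  # four made-up classes

# random_state fixes the shuffle; stratify keeps class proportions similar in both parts.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(X_tr.shape, X_va.shape)
```

Running the cell twice with the same `random_state` yields the same rows in each part, which makes later model comparisons easier to interpret.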
In [9]:
X_train[0], y_train[0]
Out[9]:
(array([9.55451775e-08, 2.82403943e-09, 7.83863252e-09, 7.43129158e-06,
        4.45877495e-12, 7.83812804e-07, 7.83863135e-07, 2.77749831e-09,
        1.97682972e-06, 5.45446186e-06, 1.52872284e-05, 6.71831223e-04,
        1.75771278e-04, 4.96059945e-04, 9.67436962e-01, 2.53110641e-01,
        5.95735859e-09, 2.45691239e-09, 7.85593681e-12, 1.06161308e-08,
        4.58716643e-09]),
 2)

Training the Model

In [10]:
model = KNeighborsClassifier()
model.fit(X_train, y_train)
Out[10]:
KNeighborsClassifier()

Validation

In [11]:
model.score(X_val, y_val)
Out[11]:
0.8313953488372093
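
For scikit-learn classifiers, `score` returns the mean accuracy, so the figure above is the accuracy on one random 20% split. A single small split can be noisy; k-fold cross-validation averages over several splits and is a common way to compare settings such as the number of neighbours. The sketch below runs on synthetic data from `make_classification`, so the dataset, the candidate values of `k`, and the resulting scores are all illustrative and say nothing about the real challenge data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 4-class problem with 21 features (a stand-in, not the challenge data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=21, n_informative=8, n_classes=4, random_state=0
)

# Mean 5-fold cross-validated accuracy for a few candidate neighbour counts.
scores_by_k = {}
for k in (3, 5, 7):
    fold_scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_demo, y_demo, cv=5)
    scores_by_k[k] = fold_scores.mean()

for k, mean_acc in scores_by_k.items():
    print(f"k={k}: mean accuracy {mean_acc:.3f}")
```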

That completes the baseline. Next, we generate predictions on the real test data and see how to submit them to the challenge.

Predictions

In [12]:
# Separating data from the dataframe for final testing
X_test = normalize(test_data_df.to_numpy())
print(X_test.shape)
(215, 21)
In [13]:
# Predicting the labels
predictions = model.predict(X_test)
predictions.shape
Out[13]:
(215,)
In [14]:
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"node":predictions})
submission
Out[14]:
node
0 3
1 3
2 0
3 0
4 0
... ...
210 2
211 1
212 0
213 0
214 0

215 rows × 1 columns

In [15]:
# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
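
Before uploading, it can be worth reading the file back to confirm that it has the single `node` column and one row per test sample. The sketch below does this round trip in a temporary directory with a short made-up prediction array (`demo_predictions`). It also uses `os.makedirs(..., exist_ok=True)` as a portable alternative to the shell commands above. The exact format the challenge expects is assumed from the cell above, not verified against the challenge rules.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up predictions standing in for model.predict(X_test).
demo_predictions = np.array([3, 3, 0, 0, 0])
submission_demo = pd.DataFrame({"node": demo_predictions})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the output folder without relying on shell commands.
    assets_dir = os.path.join(tmp_dir, "assets")
    os.makedirs(assets_dir, exist_ok=True)

    path = os.path.join(assets_dir, "submission.csv")
    submission_demo.to_csv(path, index=False)

    # Read the file back and inspect its header and length.
    loaded = pd.read_csv(path)

print(list(loaded.columns), len(loaded))
```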

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [16]:
!!aicrowd submission create -c bhpob -f assets/submission.csv
Out[16]:
['submission.csv ━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2,080/435 bytes • ? • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/bhpob/submissions/169724              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/bhpob/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/bhpob/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/bhpob                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/bhpob                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 169724, 'created_at': '2021-12-24T18:42:29.092Z'}"]