Starter Code for AZACS Practice Challenge

Author: Gauransh Kumar¶

Note: Create a copy of the notebook and use that copy for your submission. Go to File > Save a Copy in Drive to create one.

Downloading Dataset¶

Installing aicrowd-cli

In [1]:

!pip install aicrowd-cli
%load_ext aicrowd.magic

Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)

In [2]:

%aicrowd login

Please login here: https://api.aicrowd.com/auth/duk-WWqaR_wPuMCpPntKfFK8zQa6MmFgcna4P2IJvoY
Opening in existing browser session.
API Key valid
Saved API Key successfully!

In [3]:

!rm -rf data
!mkdir data
%aicrowd ds dl -c azacs -o data

Importing Libraries¶

In this baseline, we will use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [4]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset¶

Here we read train.csv, which contains both the training samples and their labels, and test.csv, which contains the test samples.

In [5]:

# Reading the CSV
train_data_df = pd.read_csv("data/train.csv")
test_data_df = pd.read_csv("data/test.csv")

# Previewing the first rows of each dataframe
display(train_data_df.head())
display(test_data_df.head())

	RESOURCE	MGR_ID	ROLE_ROLLUP_1	ROLE_ROLLUP_2	ROLE_DEPTNAME	ROLE_TITLE	ROLE_FAMILY_DESC	ROLE_FAMILY	ROLE_CODE	access
0	153	19760	117961	117962	118352	118321	117906	290919	118322	0
1	5891	44917	117902	117903	118450	120952	126309	118453	120954	1
2	972	4953	117961	118343	123476	120006	294485	118424	120008	1
3	6725	6729	117961	117969	6725	120527	275449	6725	120529	1
4	16174	19623	117961	118225	118403	117905	141377	290919	117908	1

	RESOURCE	MGR_ID	ROLE_ROLLUP_1	ROLE_ROLLUP_2	ROLE_DEPTNAME	ROLE_TITLE	ROLE_FAMILY_DESC	ROLE_FAMILY	ROLE_CODE
0	34924	4642	117961	118225	120551	118321	133936	290919	118322
1	23497	2694	118887	118888	118458	120344	217919	118424	120346
2	14354	21593	117926	118266	117941	117879	117886	19721	117880
3	76446	7505	117961	118300	119181	117905	117906	290919	117908
4	19312	19350	118095	118096	117941	117899	117899	19721	117900

Data Preprocessing¶

In [6]:

# Separating the features (X) and the labels (y) from the dataframe for training
X = train_data_df.drop(['access'], axis=1).to_numpy()
y = train_data_df["access"].to_numpy()
print(X.shape, y.shape)

(26215, 9) (26215,)

In [7]:

# Visualising the final label classes for training
# (passing the labels as the keyword argument x avoids seaborn's FutureWarning)
sns.countplot(x=y)

Out[7]:

<AxesSubplot:ylabel='count'>
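
The count plot gives a visual impression of the class balance; the same information can also be read off numerically. A minimal sketch, assuming a small made-up label array (`y_example` is hypothetical and only stands in for the `y` array above):

```python
import numpy as np

# Hypothetical labels standing in for `y`; the real array comes from train.csv
y_example = np.array([1, 1, 0, 1, 1, 1, 0, 1])

# Count how many samples fall into each class
classes, counts = np.unique(y_example, return_counts=True)
class_share = counts / counts.sum()

for cls, count, share in zip(classes, counts, class_share):
    print(f"class {cls}: {count} samples ({share:.1%})")
```

Running the same three lines on the real `y` would show how strongly one class dominates, which is worth knowing before interpreting any accuracy score.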

Splitting the data¶

In [8]:

# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)

(20972, 9)
(20972,)
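
The split above is random, so the validation score can change slightly from run to run. One possible variation, shown here only as a sketch on hypothetical toy data (`X_example` and `y_example` are made up), fixes the shuffle with `random_state` and uses `stratify` so that both parts keep the same class ratio. The value 42 is an arbitrary choice, not something the challenge prescribes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for X and y: 9 feature columns, imbalanced labels
rng = np.random.default_rng(0)
X_example = rng.integers(0, 1000, size=(100, 9))
y_example = np.array([1] * 90 + [0] * 10)

# random_state makes the shuffle repeatable;
# stratify keeps the share of each class equal in both parts
X_tr, X_va, y_tr, y_va = train_test_split(
    X_example, y_example, test_size=0.2, random_state=42, stratify=y_example
)

print(X_tr.shape, X_va.shape)
print("share of class 1:", y_tr.mean(), y_va.mean())
```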

In [9]:

X_train[0], y_train[0]

Out[9]:

(array([ 85387,  32457, 117961, 118327, 118320, 118321, 117906, 290919,
        118322]),
 1)

Training the Model¶

In [10]:

model = KNeighborsClassifier()
model.fit(X_train, y_train)

Out[10]:

KNeighborsClassifier()

Validation¶

In [11]:

model.score(X_val, y_val)

Out[11]:

0.9412550066755674
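
`model.score` reports plain accuracy. If one class dominates the labels, a model that always predicts that class already scores high, so it can help to compare against such a baseline. The following is a sketch on hypothetical toy labels (`y_true_example` and `X_dummy` are made up for illustration); `DummyClassifier`, `accuracy_score` and `f1_score` are standard scikit-learn utilities, not part of the original baseline.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical validation labels: 18 samples of class 1, 2 samples of class 0
y_true_example = np.array([1] * 18 + [0] * 2)
# The dummy model ignores the features, so a placeholder column is enough
X_dummy = np.zeros((len(y_true_example), 1))

# Baseline that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_dummy, y_true_example)
y_pred_example = baseline.predict(X_dummy)

baseline_accuracy = accuracy_score(y_true_example, y_pred_example)
# F1 for the minority class (label 0); zero_division=0 avoids a warning
# when the model never predicts that class
minority_f1 = f1_score(y_true_example, y_pred_example, pos_label=0, zero_division=0)

print("baseline accuracy:", baseline_accuracy)
print("minority-class F1:", minority_f1)
```

A trained model is only useful if it beats this kind of baseline, and the minority-class F1 shows whether it ever identifies the rarer label at all.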

So, we are done with the baseline. Let's run it on the real test data and see how to submit the predictions to the challenge.

Predictions¶

In [12]:

# Converting the test dataframe to a NumPy array for prediction
X_test = test_data_df.to_numpy()
print(X_test.shape)

(6554, 9)

In [13]:

# Predicting the labels
predictions = model.predict(X_test)
predictions.shape

Out[13]:

(6554,)

In [14]:

# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"access":predictions})
submission

Out[14]:

	access
0	1
1	1
2	1
3	1
4	1
...	...
6549	1
6550	1
6551	1
6552	1
6553	1

6554 rows × 1 columns

In [15]:

# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
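
Before uploading, it can be worth reading the file back to confirm it has the expected shape. A minimal sketch, assuming a single `access` column as in the dataframe above; `predictions_example` is a hypothetical stand-in for the real predictions, and the file is written to a temporary directory rather than to `assets`:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical predictions standing in for the model output
predictions_example = np.array([1, 0, 1, 1])
submission_example = pd.DataFrame({"access": predictions_example})

# Write the file to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "submission.csv")
    submission_example.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Checks: one column named "access", one row per prediction, only 0/1 values
has_expected_columns = list(reloaded.columns) == ["access"]
has_expected_rows = len(reloaded) == len(predictions_example)
has_valid_labels = bool(reloaded["access"].isin([0, 1]).all())

print(has_expected_columns, has_expected_rows, has_valid_labels)
```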

Submitting our Predictions¶

Note : Please save the notebook before submitting it (Ctrl + S)

In [16]:

!!aicrowd submission create -c azacs -f assets/submission.csv

Out[16]:

['submission.csv ━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 14.8/13.1 KB • 5.1 MB/s • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/azacs/submissions/169713              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/azacs/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/azacs/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/azacs                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/azacs                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 169713, 'created_at': '2021-12-24T11:51:08.892Z'}"]
