
AZACS

[Getting Started Notebook] AZACS Challenge

This is baseline code to get you started with the challenge.

gauransh_k

You can use this code to start exploring the data and to build a baseline model for further improvements.

Starter Code for AZACS Practice Challenge

Author: Gauransh Kumar

Note: Create a copy of the notebook and use the copy for submission. To create a copy, go to File > Save a Copy in Drive.

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/duk-WWqaR_wPuMCpPntKfFK8zQa6MmFgcna4P2IJvoY
Opening in existing browser session.
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c azacs -o data

Importing Libraries

In this baseline, we use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset

Here we read train.csv, which contains both the training samples and their labels, and test.csv, which contains only the test samples.

In [5]:
# Reading the CSV
train_data_df = pd.read_csv("data/train.csv")
test_data_df = pd.read_csv("data/test.csv")

# train_data_df.shape, test_data_df.shape
display(train_data_df.head())
display(test_data_df.head())
RESOURCE MGR_ID ROLE_ROLLUP_1 ROLE_ROLLUP_2 ROLE_DEPTNAME ROLE_TITLE ROLE_FAMILY_DESC ROLE_FAMILY ROLE_CODE access
0 153 19760 117961 117962 118352 118321 117906 290919 118322 0
1 5891 44917 117902 117903 118450 120952 126309 118453 120954 1
2 972 4953 117961 118343 123476 120006 294485 118424 120008 1
3 6725 6729 117961 117969 6725 120527 275449 6725 120529 1
4 16174 19623 117961 118225 118403 117905 141377 290919 117908 1
RESOURCE MGR_ID ROLE_ROLLUP_1 ROLE_ROLLUP_2 ROLE_DEPTNAME ROLE_TITLE ROLE_FAMILY_DESC ROLE_FAMILY ROLE_CODE
0 34924 4642 117961 118225 120551 118321 133936 290919 118322
1 23497 2694 118887 118888 118458 120344 217919 118424 120346
2 14354 21593 117926 118266 117941 117879 117886 19721 117880
3 76446 7505 117961 118300 119181 117905 117906 290919 117908
4 19312 19350 118095 118096 117941 117899 117899 19721 117900
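Later cells convert both dataframes with `to_numpy()`, which silently relies on test.csv listing the same feature columns, in the same order, as train.csv without `access`. A minimal check, sketched on two rows copied from each `head()` output above; `check_feature_columns` is a hypothetical helper, not part of the starter code:

```python
import io

import pandas as pd

# Two rows copied from the train/test head() output above (stand-ins for the full files)
train_csv = """RESOURCE,MGR_ID,ROLE_ROLLUP_1,ROLE_ROLLUP_2,ROLE_DEPTNAME,ROLE_TITLE,ROLE_FAMILY_DESC,ROLE_FAMILY,ROLE_CODE,access
153,19760,117961,117962,118352,118321,117906,290919,118322,0
5891,44917,117902,117903,118450,120952,126309,118453,120954,1
"""
test_csv = """RESOURCE,MGR_ID,ROLE_ROLLUP_1,ROLE_ROLLUP_2,ROLE_DEPTNAME,ROLE_TITLE,ROLE_FAMILY_DESC,ROLE_FAMILY,ROLE_CODE
34924,4642,117961,118225,120551,118321,133936,290919,118322
23497,2694,118887,118888,118458,120344,217919,118424,120346
"""
train_data_df = pd.read_csv(io.StringIO(train_csv))
test_data_df = pd.read_csv(io.StringIO(test_csv))


def check_feature_columns(train_df, test_df, label="access"):
    """Hypothetical helper: return the feature columns, raising if train/test disagree."""
    feature_cols = [c for c in train_df.columns if c != label]
    if list(test_df.columns) != feature_cols:
        raise ValueError(f"Column mismatch: {list(test_df.columns)} vs {feature_cols}")
    return feature_cols


feature_cols = check_feature_columns(train_data_df, test_data_df)
print(len(feature_cols))  # 9
```

If the check passes, `test_data_df[feature_cols].to_numpy()` gives a test matrix whose columns line up with `X`.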

Data Preprocessing

In [6]:
# Separating the features (X) from the target label (y) for training
X = train_data_df.drop(['access'], axis=1).to_numpy()
y = train_data_df["access"].to_numpy()
print(X.shape, y.shape)
(26215, 9) (26215,)
In [7]:
# Visualising the final label classes for training
sns.countplot(x=y)
Out[7]:
<AxesSubplot:ylabel='count'>
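The countplot shows the class balance visually; printing the counts makes it explicit, which helps when interpreting the validation accuracy later. A sketch using `np.bincount`, with the five labels from the `head()` output standing in for the full `y` array:

```python
import numpy as np

# Stand-in labels (the 'access' values of the five rows shown by head());
# in the notebook, use the full y array instead
y = np.array([0, 1, 1, 1, 1])

counts = np.bincount(y)              # number of samples per label: index 0 -> label 0, index 1 -> label 1
proportions = counts / counts.sum()  # fraction of samples per label
print("counts:", counts)
print("proportions:", proportions)
```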

Splitting the data

In [8]:
# Splitting the training data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
(20972, 9)
(20972,)
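`train_test_split` as called above shuffles differently on every run, so the validation score changes between runs, and it does not guarantee the same label ratio in both splits. Passing `stratify=y` and a fixed `random_state` addresses both. A sketch on synthetic stand-in data (the seed value 42 is arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 9 integer features, imbalanced labels (10 zeros, 90 ones)
rng = np.random.default_rng(0)
X = rng.integers(0, 1000, size=(100, 9))
y = np.array([0] * 10 + [1] * 90)

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,       # keep the 0/1 ratio the same in both splits
    random_state=42,  # make the split reproducible across runs
)
print(X_train.shape, X_val.shape)  # (80, 9) (20, 9)
print("validation label counts:", np.bincount(y_val))
```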
In [9]:
X_train[0], y_train[0]
Out[9]:
(array([ 85387,  32457, 117961, 118327, 118320, 118321, 117906, 290919,
        118322]),
 1)

Training the Model

In [10]:
model = KNeighborsClassifier()
model.fit(X_train, y_train)
Out[10]:
KNeighborsClassifier()
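With default arguments, `KNeighborsClassifier` predicts the majority label among the 5 nearest training samples under Euclidean distance. The NumPy sketch below illustrates that idea on toy points; it is not scikit-learn's implementation, which uses optimized neighbor search and its own tie-breaking, and `knn_predict` is a hypothetical helper:

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Minimal k-NN majority vote with Euclidean distance (illustrative sketch only)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each query point
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Majority vote over the neighbours' labels (ties go to the smaller label)
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])


# Two well-separated toy clusters
X_train = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.5, 0.5], [10.5, 10.5]])
print(knn_predict(X_train, y_train, X_query, k=3))  # [0 1]
```

Since the features here appear to be integer IDs rather than measurements, numeric distances between them carry little meaning, which is one reason to treat this model strictly as a baseline.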

Validation

In [11]:
model.score(X_val, y_val)
Out[11]:
0.9412550066755674
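`model.score` returns plain accuracy. When one class dominates, accuracy is easier to interpret next to a classifier that always predicts the most frequent label. A sketch using scikit-learn's `DummyClassifier` on synthetic stand-in data (roughly 90% label 1, chosen only for illustration, not taken from the challenge data):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic imbalanced stand-in data, NOT the challenge data
rng = np.random.default_rng(0)
X_train = rng.integers(0, 100, size=(200, 9))
y_train = (rng.random(200) < 0.9).astype(int)
X_val = rng.integers(0, 100, size=(50, 9))
y_val = (rng.random(50) < 0.9).astype(int)

model = KNeighborsClassifier().fit(X_train, y_train)
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

knn_acc = model.score(X_val, y_val)                      # mean accuracy for classifiers
same_acc = accuracy_score(y_val, model.predict(X_val))   # the same number, computed explicitly
dummy_acc = dummy.score(X_val, y_val)                    # accuracy of always predicting the majority label
print(f"k-NN accuracy:           {knn_acc:.3f}")
print(f"Majority-class accuracy: {dummy_acc:.3f}")
```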

That completes the baseline. Next, let's generate predictions for the real test data and see how to submit them to the challenge.

Predictions

In [12]:
# Converting the test dataframe to a NumPy array for prediction
X_test = test_data_df.to_numpy()
print(X_test.shape)
(6554, 9)
In [13]:
# Predicting the labels
predictions = model.predict(X_test)
predictions.shape
Out[13]:
(6554,)
In [14]:
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"access":predictions})
submission
Out[14]:
access
0 1
1 1
2 1
3 1
4 1
... ...
6549 1
6550 1
6551 1
6552 1
6553 1

6554 rows × 1 columns
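Every row shown in the preview above is 1. Counting the predicted labels shows whether the model predicts class 0 at all. A sketch with stand-in predictions; in the notebook, pass the real `predictions` array:

```python
import numpy as np

# Stand-in predictions; in the notebook these come from model.predict(X_test)
predictions = np.array([1, 1, 1, 0, 1, 1])

# Distinct predicted labels and how often each occurs
labels, counts = np.unique(predictions, return_counts=True)
for label, count in zip(labels, counts):
    print(f"label {label}: {count} rows ({count / len(predictions):.1%})")
```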

In [15]:
# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
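Before uploading, it can help to reload the CSV and check its shape and values. The sketch below writes to a temporary directory instead of `assets/`, uses `os.makedirs(..., exist_ok=True)` as a portable alternative to the `!rm -rf` / `!mkdir` shell commands, and assumes the expected format is a single `access` column with one row per test sample, as produced above:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in values; in the notebook use model.predict(X_test) and len(test_data_df)
predictions = np.array([1, 1, 0, 1])
n_test_rows = 4

submission = pd.DataFrame({"access": predictions})
with tempfile.TemporaryDirectory() as tmp:
    assets_dir = os.path.join(tmp, "assets")
    os.makedirs(assets_dir, exist_ok=True)  # create the folder without shell commands
    path = os.path.join(assets_dir, "submission.csv")
    submission.to_csv(path, index=False)    # no index column, as in the notebook
    reloaded = pd.read_csv(path)            # read back before the temp dir is deleted

# One 'access' column, one row per test sample, only 0/1 labels
assert list(reloaded.columns) == ["access"]
assert len(reloaded) == n_test_rows
assert set(reloaded["access"].unique()) <= {0, 1}
print("submission.csv looks well-formed")
```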

Submitting our Predictions

Note: Please save the notebook (Ctrl + S) before submitting it.

In [16]:
!!aicrowd submission create -c azacs -f assets/submission.csv
Out[16]:
['submission.csv ━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 14.8/13.1 KB • 5.1 MB/s • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/azacs/submissions/169713              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/azacs/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/azacs/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/azacs                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/azacs                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 169713, 'created_at': '2021-12-24T11:51:08.892Z'}"]