[Getting Started Notebook] BLOGF Challenge

This is baseline code to get you started with the challenge.

gauransh_k

You can use this code to start understanding the data and to create a baseline model for further improvements.

Starter Code for BLOGF Practice Challenge

Note: Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy.

Author: Gauransh Kumar

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/OEagDshs79RBGByG1cKNWtA8goZ7JzGAGGi5AF_i-AM
Opening in existing browser session.
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c blogf -o data

Importing Libraries

In this baseline, we will use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import normalize
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset

Here, we will read the train.csv which contains both training samples & labels, and test.csv which contains testing samples.

In [34]:
# Reading the CSV
train_data_df = pd.read_csv("data/train.csv", header=None)
test_data_df = pd.read_csv("data/test.csv", header=None, on_bad_lines='skip', skipfooter=1, engine='python')

# Previewing the first rows of each dataframe
display(train_data_df.head())
display(test_data_df.head())
0 1 2 3 4 5 6 7 8 9 ... 271 272 273 274 275 276 277 278 279 280
0 3.721774 7.941267 0 43 0.0 1.366936 4.506093 0 41 0.0 ... 0 0 1 0 0 0 0 0 0.0 0
1 40.304670 53.845657 0 401 15.0 15.524160 32.441880 0 377 3.0 ... 0 0 0 1 0 0 0 0 0.0 0
2 16.593575 19.671364 1 144 10.0 6.512450 11.051215 0 111 2.0 ... 1 0 0 0 0 0 0 0 0.0 2
3 2.285714 8.414516 0 102 1.0 0.973810 4.721458 0 93 0.0 ... 1 0 0 0 0 0 0 0 0.0 0
4 56.512093 77.442830 0 438 32.0 19.296530 49.221344 0 432 0.0 ... 1 0 0 0 0 2 0 0 0.0 25

5 rows × 281 columns

0 1 2 3 4 5 6 7 8 9 ... 270 271 272 273 274 275 276 277 278 279
0 10.630660 17.882992 1 259 5.0 4.018276 10.396790 0 235 1.0 ... 1 0 0 0 0 0 0 0 0 0.0
1 0.214286 0.646813 0 3 0.0 0.089286 0.342094 0 2 0.0 ... 0 0 1 0 0 0 0 0 0 0.0
2 10.630660 17.882992 1 259 5.0 4.018276 10.396790 0 235 1.0 ... 0 0 0 0 1 0 0 0 0 0.0
3 1.428572 1.978401 0 14 1.0 0.511278 1.101035 0 8 0.0 ... 0 0 1 0 0 0 0 0 0 0.0
4 27.333334 48.309480 0 134 2.0 13.083333 36.713890 0 134 0.0 ... 1 0 0 0 0 0 0 0 0 0.0

5 rows × 280 columns

Data Preprocessing

In [6]:
# Separating data from the dataframe for final training
X = train_data_df.drop(columns=[280]).to_numpy()
y = train_data_df[280].to_numpy()
print(X.shape, y.shape)
(41916, 280) (41916,)

Splitting the data

In [7]:
# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
(33532, 280)
(33532,)
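
The split above is shuffled randomly, so the validation score will change from run to run. A minimal sketch, assuming repeatable results are wanted: passing random_state fixes the shuffle. The value 42 is arbitrary, and X_demo and y_demo are synthetic stand-ins for X and y, not the challenge data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X and y (assumption: same column count, fewer rows)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 280))
y_demo = rng.integers(0, 3, size=100)

# random_state fixes the shuffle, so two calls return the same split
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_train_demo, X_val_demo, y_train_demo, y_val_demo = split_a
print(X_train_demo.shape, X_val_demo.shape)
print(np.array_equal(split_a[0], split_b[0]))
```

With a fixed seed, changes in the validation score reflect changes to the model rather than to the split.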
In [8]:
X_train[0], y_train[0]
Out[8]:
(array([ 4.6666665e+00,  1.4459277e+01,  0.0000000e+00,  7.6000000e+01,
         0.0000000e+00,  2.4848485e+00,  9.2925120e+00,  0.0000000e+00,
         6.2000000e+01,  0.0000000e+00,  1.8686869e+00,  8.7278110e+00,
         0.0000000e+00,  6.2000000e+01,  0.0000000e+00,  1.6262627e+00,
         4.9412420e+00,  0.0000000e+00,  2.6000000e+01,  0.0000000e+00,
         6.1616164e-01,  1.0848551e+01, -4.8000000e+01,  6.2000000e+01,
         0.0000000e+00,  1.0101010e-01,  4.6045542e-01,  0.0000000e+00,
         3.0000000e+00,  0.0000000e+00,  5.0505050e-02,  2.9725130e-01,
         0.0000000e+00,  2.0000000e+00,  0.0000000e+00,  3.0303031e-02,
         2.2268088e-01,  0.0000000e+00,  2.0000000e+00,  0.0000000e+00,
         6.0606062e-02,  3.4283966e-01,  0.0000000e+00,  2.0000000e+00,
         0.0000000e+00,  2.0202020e-02,  3.1717816e-01, -2.0000000e+00,
         2.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         6.2000000e+01,  2.6630000e+03,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         1.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  1.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  1.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  1.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  1.0000000e+00,  1.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  1.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  1.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  1.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         1.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         1.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  1.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  1.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  1.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  1.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  1.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  1.0000000e+00,  0.0000000e+00,  0.0000000e+00,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00]),
 0)

Training the Model

In [9]:
model = LogisticRegression()
model.fit(X_train, y_train)
/home/gauransh/anaconda3/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Out[9]:
LogisticRegression()
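
The ConvergenceWarning above suggests raising max_iter or scaling the data. A minimal sketch of both, assuming standardization is an acceptable preprocessing step for this challenge: a StandardScaler and a LogisticRegression with a higher iteration limit are chained in one pipeline. The data is synthetic, max_iter=1000 is an arbitrary choice, and whether this removes the warning on the real training set is not verified here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (assumption: illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5)) * np.array([1.0, 10.0, 100.0, 1000.0, 0.1])
y_demo = (X_demo[:, 0] + X_demo[:, 1] / 10.0 > 0).astype(int)

# The scaler is fitted on the training data only and reused at predict time
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)
print(scaled_model.score(X_demo, y_demo))
```

Because the scaler lives inside the pipeline, calling scaled_model.predict on new rows applies the same scaling automatically.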

Validation

In [10]:
model.score(X_val, y_val)
Out[10]:
0.6197519083969466
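
For a classifier, model.score returns accuracy, the fraction of validation rows whose label is predicted exactly. Accuracy is easier to interpret next to a trivial reference, so here is a hedged sketch that scores a model which always predicts the most frequent training label. The arrays are made-up toy values, not the challenge data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy labels (assumption: illustrative values only)
y_train_demo = np.array([0, 0, 0, 1, 2, 0, 5, 0])
y_val_demo = np.array([0, 1, 0, 0])

# DummyClassifier ignores the features, so placeholder columns are enough
X_train_demo = np.zeros((len(y_train_demo), 1))
X_val_demo = np.zeros((len(y_val_demo), 1))

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train_demo, y_train_demo)
baseline_acc = baseline.score(X_val_demo, y_val_demo)
print(baseline_acc)  # 0.75: three of the four validation labels are 0
```

If the real model's validation accuracy is close to this kind of baseline, it may be learning little beyond the dominant label.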

So, we are done with the baseline. Let's run it on the real test data and see how to submit the predictions to the challenge.

Predictions

In [35]:
# Separating data from the dataframe for final testing
X_test = test_data_df.to_numpy()
print(X_test.shape)
(10479, 280)
In [36]:
test_data_df.tail()
Out[36]:
0 1 2 3 4 5 6 7 8 9 ... 270 271 272 273 274 275 276 277 278 279
10474 38.200640 75.516920 0 723 15.0 16.194221 44.316776 0 547 4.0 ... 0 0 1 0 0 0 0 0 0 0.0
10475 27.230215 45.970950 0 371 14.0 10.784173 24.209942 0 228 4.0 ... 0 0 0 1 0 0 0 0 0 0.0
10476 0.000000 0.000000 0 0 0.0 0.000000 0.000000 0 0 0.0 ... 0 1 0 0 0 0 0 0 0 0.0
10477 65.469185 102.356910 0 724 20.0 24.737106 61.875816 0 586 2.0 ... 0 0 0 0 0 0 0 0 0 0.0
10478 24.661972 61.212154 0 263 2.0 9.802817 39.872902 0 256 0.0 ... 0 1 0 0 0 0 0 0 0 0.0

5 rows × 280 columns

In [37]:
# Predicting the labels
predictions = model.predict(X_test)
predictions.shape
Out[37]:
(10479,)
In [38]:
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"comments":predictions})
submission
Out[38]:
comments
0 0
1 0
2 0
3 0
4 0
... ...
10474 0
10475 0
10476 0
10477 0
10478 0

10479 rows × 1 columns

In [39]:
# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
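
Before uploading, it can help to confirm that the saved file has the shape you expect. The sketch below round-trips a small frame through CSV in memory and checks it with check_submission, a hypothetical helper written for this example. The single "comments" column follows the notebook above; the exact format the grader requires is an assumption.

```python
import io

import numpy as np
import pandas as pd

# Toy predictions (assumption: stand-in for model.predict(X_test))
predictions_demo = np.array([0, 0, 2, 0, 1])
submission_demo = pd.DataFrame({"comments": predictions_demo})

# Write to an in-memory buffer instead of assets/submission.csv
buffer = io.StringIO()
submission_demo.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)


def check_submission(df, expected_rows):
    """Hypothetical helper: one 'comments' column, expected row count, no NaN."""
    return (
        list(df.columns) == ["comments"]
        and len(df) == expected_rows
        and not df["comments"].isna().any()
    )


print(check_submission(reloaded, expected_rows=5))
```

For the real file, expected_rows would be the number of rows in X_test.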

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [40]:
!!aicrowd submission create -c blogf -f assets/submission.csv
Out[40]:
['submission.csv ━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 22.9/21.3 KB • 7.1 MB/s • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/blogf/submissions/169732              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/blogf/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/blogf/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/blogf                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/blogf                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 169732, 'created_at': '2021-12-24T19:53:51.972Z'}"]