Environment Classification

Solution for submission 156842

In this challenge, you are given images of a self-driving car moving through a town in different weather conditions. Your goal is to classify the environment into 5 classes using unsupervised methods: 1 means the weather is very good for a self-driving car, while 5 means the weather is very challenging for one.

Unsupervised Image Classification

Image clustering using Transfer learning

ResNet50 + KMeans based image clustering model

https://towardsdatascience.com/image-clustering-using-transfer-learning-df5862779571

In [1]:

!pip install -q aicrowd-cli
%load_ext aicrowd.magic

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.26.0 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.

In [2]:

%aicrowd login

Please login here: https://api.aicrowd.com/auth/qBNvIq0YzlT6GTsihXKlzHlxe7hIACPEFHrs6maKIgQ
API Key valid
Saved API Key successfully!

In [3]:

# Downloading the Dataset
!rm -rf data
!mkdir data
%aicrowd ds dl -c environment-classification -o data

In [4]:

# Unzipping and Organising the datasets
!unzip data/images.zip  -d data/images > /dev/null

In [5]:

import os
import csv 
from pathlib import Path
import random
import time

import pandas as pd
import numpy as np

In [6]:

DATA_DIR = "data/images/"

Model

In [7]:

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Sequential

resnet = ResNet50(include_top=False, pooling='avg', weights='imagenet')
my_new_model = Sequential()
my_new_model.add(resnet)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94773248/94765736 [==============================] - 1s 0us/step
94781440/94765736 [==============================] - 1s 0us/step

In [8]:

# Freeze the first layer (the ResNet model): it is already trained
my_new_model.layers[0].trainable = False

Image Preprocessing

In [9]:

%%time
from tensorflow.keras.applications.resnet50 import preprocess_input
import cv2

resnet_feature_list = []
images = os.listdir(DATA_DIR)
for image in images:
    file = os.path.join(DATA_DIR, image)
    # Note: cv2.imread returns BGR, while preprocess_input expects RGB input,
    # so the channels may be swapped here.
    im = cv2.imread(file)
    #im = cv2.resize(im,(256,256))
    # Add a batch dimension and apply the ResNet50 input preprocessing
    img = preprocess_input(np.expand_dims(im.copy(), axis=0))
    # One 2048-dimensional feature vector per image (global average pooling)
    resnet_feature = my_new_model.predict(img)
    resnet_feature_list.append(np.array(resnet_feature).flatten())

# Feature matrix: one row per image
array = np.array(resnet_feature_list)

CPU times: user 1min 13s, sys: 2.45 s, total: 1min 15s
Wall time: 1min 39s
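
One detail worth checking in the loop above: `cv2.imread` returns pixels in BGR order, whereas the Keras ResNet50 `preprocess_input` expects RGB input (it flips the channels to BGR itself and then subtracts the ImageNet channel means, without scaling). The sketch below mimics that preprocessing in plain NumPy to make the channel handling explicit. `bgr_to_rgb` and `caffe_style_preprocess` are hypothetical helper names, not part of Keras or OpenCV, and the mean values are the commonly cited ImageNet BGR means, used here as an assumption.

```python
import numpy as np

# ImageNet per-channel means in BGR order (assumed values, as commonly
# used by caffe-style preprocessing).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def bgr_to_rgb(image):
    """Reverse the channel axis of an (H, W, 3) image, e.g. one loaded by cv2.imread."""
    return image[..., ::-1]

def caffe_style_preprocess(rgb_image):
    """Flip RGB to BGR and subtract the per-channel mean, without scaling."""
    bgr = rgb_image[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN

# A 1x1 "image" that is pure blue when stored in BGR order.
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb_pixel = bgr_to_rgb(bgr_pixel)

# Add the batch dimension the network expects: (1, H, W, 3).
batch = np.expand_dims(caffe_style_preprocess(rgb_pixel), axis=0)
print(rgb_pixel.tolist(), batch.shape)
```

In the notebook's loop, the equivalent change would be to reverse the channel axis of `im` before calling `preprocess_input`. Doing so would change the extracted features, so the cluster labels shown in this notebook would no longer be reproduced exactly.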

In [10]:

array.shape

Out[10]:

(700, 2048)

Training

https://scikit-learn.org/stable/modules/clustering.html

In [11]:

from sklearn.cluster import KMeans 

# random_state=None: cluster IDs and assignments can differ between runs
kmeans = KMeans(n_clusters=5, random_state=None, n_init=50, max_iter=1000).fit(array)

print(kmeans.labels_)

[3 4 3 4 4 4 3 1 4 4 3 0 0 4 4 4 3 4 2 4 2 4 3 3 1 4 4 2 4 4 1 4 3 0 4 4 0
 3 3 3 1 2 3 3 4 3 1 4 4 3 2 2 3 4 1 2 3 3 4 3 2 3 4 2 3 0 2 1 2 3 2 4 4 4
 4 4 4 2 4 3 4 4 4 0 3 2 3 1 2 4 4 4 1 4 3 1 3 3 1 1 3 4 4 1 1 3 4 2 3 3 3
 3 3 4 4 4 4 3 2 3 3 1 4 2 3 3 3 4 4 4 4 3 3 3 0 3 2 3 3 4 4 3 3 1 3 4 2 1
 3 3 2 2 4 2 2 4 4 4 3 3 2 3 4 4 3 2 2 4 4 3 1 1 3 4 1 2 2 1 3 4 3 3 3 3 3
 2 2 4 3 1 1 3 3 4 4 4 2 2 3 4 4 4 1 1 4 3 4 2 3 4 4 4 2 3 2 1 3 4 2 2 4 2
 1 1 1 3 3 1 3 3 2 4 3 4 2 2 4 1 4 4 4 4 4 4 4 4 2 2 4 4 4 3 3 1 1 4 2 2 1
 3 1 4 4 2 4 4 3 4 3 3 4 3 2 4 4 4 2 2 3 1 4 4 3 4 1 3 3 4 4 1 4 1 4 2 4 2
 2 4 2 1 3 2 4 3 4 2 4 4 4 3 4 1 1 3 3 1 4 1 3 1 4 1 4 1 2 1 2 3 3 4 2 3 1
 2 2 3 1 1 3 4 4 3 1 2 1 2 4 1 2 2 3 4 3 3 2 4 1 3 3 3 1 3 2 3 2 0 1 4 1 2
 3 4 4 1 0 3 1 1 4 3 4 4 3 4 3 1 4 3 4 0 4 4 3 4 4 3 3 4 0 4 2 4 3 3 3 4 0
 2 2 1 4 3 3 3 4 3 3 4 2 3 4 1 1 3 4 1 2 2 3 3 4 4 2 3 4 2 4 4 3 4 4 4 3 2
 4 4 2 2 3 4 3 2 3 1 4 3 3 4 3 3 4 1 3 1 0 4 1 2 3 4 1 4 3 2 1 4 4 4 3 4 1
 3 2 4 3 4 3 2 3 1 4 1 3 4 4 1 1 1 3 3 3 2 3 3 3 1 3 2 4 1 1 1 4 3 4 4 3 1
 4 2 3 1 4 2 3 3 0 3 3 3 1 4 4 3 4 4 3 3 3 4 1 4 4 2 3 3 1 4 2 1 4 3 2 0 4
 3 4 2 4 1 4 0 2 4 1 3 3 0 0 3 3 3 1 2 3 2 4 2 4 1 1 2 3 3 3 4 4 3 1 2 4 3
 3 3 2 1 3 4 3 3 3 2 3 1 2 1 3 4 4 2 4 3 1 3 4 4 3 1 3 4 3 4 3 4 4 2 2 3 1
 3 4 2 0 3 4 2 1 1 3 1 1 3 4 2 4 1 4 1 0 4 4 3 4 2 4 4 4 4 1 3 1 3 3 4 4 4
 4 3 1 4 3 3 3 4 1 3 3 2 3 4 3 3 3 3 3 3 4 4 2 2 2 3 3 3 3 4 3 1 3 2]
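
To make the clustering step less of a black box, here is a minimal NumPy sketch of the Lloyd iteration that k-means is built on: assign each point to its nearest center, then move each center to the mean of its points. It is not scikit-learn's implementation. The function names are hypothetical, and it uses a simple deterministic farthest-point initialization instead of scikit-learn's default k-means++ with `n_init` restarts.

```python
import numpy as np

def farthest_point_init(X, k):
    """Pick X[0], then repeatedly the point farthest from all chosen centers."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers, dtype=float)

def lloyd_kmeans(X, k, n_iter=100):
    """Alternate assignment and mean-update steps until the centers stop moving."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        # Assignment step: (n, k) matrix of squared distances to each center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data: two well-separated 2-D blobs standing in for the 2048-D features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
labels, centers = lloyd_kmeans(X, k=2)
print(labels)
```

As with `KMeans`, the cluster IDs are arbitrary: nothing in the algorithm ties ID 0 to "good weather" or ID 4 to "challenging weather".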

Submission

In [12]:

img_ids_list = [f[:-4] for f in images]

In [15]:

img_ids_list[0]

Out[15]:

'200'
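
The slice `f[:-4]` assumes every file name ends in a dot plus three characters. A slightly more robust sketch uses `pathlib`, which the notebook already imports. The file names below are made up for illustration, and `image_id` is a hypothetical helper.

```python
from pathlib import Path

def image_id(filename):
    """Return the file name without its final extension."""
    return Path(filename).stem

# Hypothetical file names with extensions of different lengths.
ids = [image_id(name) for name in ["200.jpg", "17.jpeg", "3.png"]]
print(ids)  # ['200', '17', '3']
```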

In [23]:

pre_sub = {'ImageID':img_ids_list, "label":kmeans.labels_}
pre_sub = pd.DataFrame(pre_sub)

pre_sub = pre_sub.astype(int)
pre_sub = pre_sub.sort_values(by=['ImageID'])  

pre_sub

Out[23]:

	ImageID	label
160	0	2
127	1	4
410	2	4
484	3	3
324	4	2
...	...	...
528	695	3
94	696	3
175	697	2
249	698	4
97	699	3

700 rows × 2 columns

In [24]:

pre_sub.label.value_counts()

Out[24]:

4    231
3    224
2    114
1    111
0     20
Name: label, dtype: int64
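
The counts above show one cluster that is far smaller than the rest. As a rough sketch, such clusters can be flagged automatically by comparing each cluster's size with a fraction of the dataset. The 5% threshold and the helper name `small_clusters` are assumptions for illustration, not something the challenge prescribes. The toy labels reuse the counts printed above.

```python
import numpy as np

def small_clusters(labels, n_clusters, min_fraction=0.05):
    """Return the IDs of clusters holding fewer than min_fraction of all samples."""
    counts = np.bincount(labels, minlength=n_clusters)
    flagged = np.flatnonzero(counts < min_fraction * len(labels))
    return flagged, counts

# Toy labels with the same cluster sizes as in the output above.
labels = np.repeat([0, 1, 2, 3, 4], [20, 111, 114, 224, 231])
flagged, counts = small_clusters(labels, n_clusters=5)

# Boolean mask of the samples that would be kept for a second clustering pass.
keep_mask = ~np.isin(labels, flagged)
print(flagged, int(keep_mask.sum()))
```

A small cluster is only a hint that its members are outliers. Inspecting a few of the flagged images before dropping them is a cheap sanity check.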

Cluster 0 contains only 20 images, so we treat them as misclassified, remove them, and repeat the training process

In [33]:

to_del = np.array(pre_sub[pre_sub.label == 0].ImageID)
to_del = set(to_del)
images_clean = []
for image in images:
    if int(image[:-4]) not in to_del:
        images_clean.append(image)
len(images_clean)

Out[33]:

680

In [34]:

%%time
from tensorflow.keras.applications.resnet50 import preprocess_input
import cv2

resnet_feature_list = []
# Extract features again, this time only for the images that were kept
for image in images_clean:
    file = os.path.join(DATA_DIR, image)
    im = cv2.imread(file)
    #im = cv2.resize(im,(256,256))
    # Add a batch dimension and apply the ResNet50 input preprocessing
    img = preprocess_input(np.expand_dims(im.copy(), axis=0))
    resnet_feature = my_new_model.predict(img)
    resnet_feature_list.append(np.array(resnet_feature).flatten())

# Feature matrix: one row per remaining image
array = np.array(resnet_feature_list)

CPU times: user 1min 10s, sys: 1.56 s, total: 1min 11s
Wall time: 1min 10s

In [36]:

array.shape

Out[36]:

(680, 2048)

In [37]:

from sklearn.cluster import KMeans 

kmeans = KMeans(n_clusters=5, random_state=None, n_init=50, max_iter=1000).fit(array)

In [38]:

img_ids_list_clean = [f[:-4] for f in images_clean]

In [84]:

pre_sub_2 = {'ImageID':img_ids_list_clean, "label":kmeans.labels_}
pre_sub_2 = pd.DataFrame(pre_sub_2)

pre_sub_2 = pre_sub_2.astype(int)

# Assign a random label (0-4) to each of the removed images
rnd_labels = [random.randint(0, 4) for _ in range(len(to_del))]
# Note: missing_labels is defined but not used below; rnd_labels goes into the submission
missing_labels = [3, 3, 2, 1, 1, 0, 1, 2, 1, 2, 2, 2, 3, 2, 1, 4, 0, 2, 0, 3]

ending = {'ImageID':list(to_del), 'label':rnd_labels}
ending = pd.DataFrame(ending)

# Merge the clustered and the randomly labelled images, ordered by image ID
submission = pd.concat([pre_sub_2, ending], axis=0)
submission = submission.sort_values(by=['ImageID'])

Out[84]:
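
Because the cell above draws the fill-in labels from the unseeded `random` module, every run produces a different submission. A minimal sketch of a repeatable variant is shown below. `fill_and_merge`, its `seed` parameter, and the toy IDs are assumptions for illustration, not part of the original notebook.

```python
import random
import pandas as pd

def fill_and_merge(clustered, removed_ids, n_clusters=5, seed=0):
    """Give removed images a seeded random label and merge them with the clustered ones."""
    rng = random.Random(seed)  # local generator, so the fill is repeatable
    filler = pd.DataFrame({
        "ImageID": sorted(removed_ids),
        "label": [rng.randint(0, n_clusters - 1) for _ in removed_ids],
    })
    merged = pd.concat([clustered, filler], axis=0, ignore_index=True)
    return merged.sort_values(by="ImageID").reset_index(drop=True)

# Toy stand-ins for pre_sub_2 and to_del.
clustered = pd.DataFrame({"ImageID": [0, 2, 3], "label": [1, 4, 4]})
submission_demo = fill_and_merge(clustered, removed_ids={1, 4})
print(submission_demo)
```

Calling `fill_and_merge` twice with the same seed returns the same frame, which makes a submission easier to reproduce later.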

In [65]:

!rm -rf assets
!mkdir assets

submission.to_csv(os.path.join("assets", "submission.csv"), index=False)

Making a Direct Submission through the AIcrowd CLI

In [66]:

/usr/local/lib/python3.7/dist-packages/aicrowd/notebook/helpers.py:361: UserWarning: `%aicrowd` magic command can be used to save the notebook inside jupyter notebook/jupyterLab environment and also to get the notebook directly from the frontend without mounting the drive in colab environment. You can use magic command to skip mounting the drive and submit using the code below:
 %load_ext aicrowd.magic
%aicrowd notebook submit -c environment-classification -a assets --no-verify
  warnings.warn(description + code)
Mounting Google Drive 💾
Your Google Drive will be mounted to access the colab notebook
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.activity.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fexperimentsandconfigs%20https%3a%2f%2fwww.googleapis.com%2fauth%2fphotos.native&response_type=code

Enter your authorization code:
4/1AX4XfWgbhNxSVaym8w-m_l37MSNBND4uLZlPzKW86wtP8nl4v_QZSpgZpkk
Mounted at /content/drive
Using notebook: BlitzXI_ResNet50_Kmeans_Cluster.ipynb for submission...
Scrubbing API keys from the notebook...
Collecting notebook...
submission.zip ━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 25.9/24.2 KB • 1.4 MB/s • 0:00:00
                                                       ╭─────────────────────────╮                                                       
                                                       │ Successfully submitted! │                                                       
                                                       ╰─────────────────────────╯                                                       
                                                             Important links                                                             
┌──────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/ai-blitz-xi/problems/environment-classification/submissions/156841              │
│                  │                                                                                                                    │
│  All submissions │ https://www.aicrowd.com/challenges/ai-blitz-xi/problems/environment-classification/submissions?my_submissions=true │
│                  │                                                                                                                    │
│      Leaderboard │ https://www.aicrowd.com/challenges/ai-blitz-xi/problems/environment-classification/leaderboards                    │
│                  │                                                                                                                    │
│ Discussion forum │ https://discourse.aicrowd.com/c/ai-blitz-xi                                                                        │
│                  │                                                                                                                    │
│   Challenge page │ https://www.aicrowd.com/challenges/ai-blitz-xi/problems/environment-classification                                 │
└──────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
