Data Purchasing Challenge 2022

Sneak Peek into the image samples from Round 2 dataset.

This notebook will help you to visualise images from different classes and combinations of them.

sagar_rathod

Quickly take a look at the image samples of different class labels.

This notebook helps you understand what images from the different classes look like, in particular the two new classes introduced in Round 2 of this challenge: 'stray_particle' and 'discoloration'.

We use the deepml Python library to quickly visualize these images.

In [ ]:
!pip install deepml
In [1]:
import pandas as pd
import deepml
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

#mpl.rcParams['text.color'] = 'white'
In [2]:
train_df = pd.read_csv("data-purchasing-challenge-2022-starter-kit/data/training/labels.csv")
train_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   filename        1000 non-null   object
 1   scratch_small   1000 non-null   int64 
 2   scratch_large   1000 non-null   int64 
 3   dent_small      1000 non-null   int64 
 4   dent_large      1000 non-null   int64 
 5   stray_particle  1000 non-null   int64 
 6   discoloration   1000 non-null   int64 
dtypes: int64(6), object(1)
memory usage: 54.8+ KB
In [3]:
train_df.head()
Out[3]:
filename scratch_small scratch_large dent_small dent_large stray_particle discoloration
0 np7x98vV9L.png 0 0 1 0 0 0
1 eJL9eBxtwi.png 1 0 0 0 0 0
2 Mm0wzMknhT.png 0 0 0 0 0 0
3 UJhpQVf8LP.png 0 0 0 0 0 0
4 5vpsw4NX6n.png 0 0 0 0 0 0

Create an additional class called 'no_defect' for image samples containing no damage.

In [4]:
train_df['no_defect'] = (~train_df.iloc[:, 1:].any(axis=1)).astype(int)
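The cell above selects the label columns by position (`iloc[:, 1:]`), so it gives a different result if it is re-run after derived columns have been added. A minimal sketch of the same idea with the label columns named explicitly, on a made-up toy frame (the filenames and values below are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for labels.csv; filenames and values are made up for illustration.
toy_df = pd.DataFrame({
    "filename": ["a.png", "b.png", "c.png"],
    "scratch_small": [0, 1, 0],
    "dent_small": [1, 0, 0],
})

# Name the label columns explicitly instead of relying on their position,
# so re-running this after adding derived columns does not change the result.
label_cols = ["scratch_small", "dent_small"]

# A row is defect-free when none of its label columns is set.
toy_df["no_defect"] = (~toy_df[label_cols].any(axis=1)).astype(int)

no_defect_flags = toy_df["no_defect"].tolist()
print(no_defect_flags)
```

Only the last toy row has no label set, so it is the only one flagged as 'no_defect'.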
In [5]:
classes = train_df.columns[1:].tolist()
classes
Out[5]:
['scratch_small',
 'scratch_large',
 'dent_small',
 'dent_large',
 'stray_particle',
 'discoloration',
 'no_defect']

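Since this is a multi-label classification challenge (one image can carry several labels at once), let's join each image's labels into a single string and look at the distribution of label combinations.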

In [6]:
train_df['joined_label'] = train_df[classes].apply(
    lambda row: " ".join([c for c in classes if row[c]]), axis=1)
train_df.head()
Out[6]:
filename scratch_small scratch_large dent_small dent_large stray_particle discoloration no_defect joined_label
0 np7x98vV9L.png 0 0 1 0 0 0 0 dent_small
1 eJL9eBxtwi.png 1 0 0 0 0 0 0 scratch_small
2 Mm0wzMknhT.png 0 0 0 0 0 0 1 no_defect
3 UJhpQVf8LP.png 0 0 0 0 0 0 1 no_defect
4 5vpsw4NX6n.png 0 0 0 0 0 0 1 no_defect
In [7]:
train_df['joined_label'].value_counts()
Out[7]:
stray_particle                                                                    524
no_defect                                                                         202
scratch_small dent_small stray_particle discoloration                              34
dent_small                                                                         30
scratch_small dent_large stray_particle discoloration                              27
dent_large stray_particle                                                          23
scratch_small scratch_large dent_small stray_particle                              23
scratch_small dent_small stray_particle                                            21
scratch_small dent_small                                                           19
scratch_small scratch_large                                                        16
scratch_small                                                                      14
scratch_small scratch_large dent_small dent_large stray_particle discoloration     12
scratch_large                                                                       6
dent_large                                                                          6
dent_small discoloration                                                            5
dent_small stray_particle                                                           5
scratch_small scratch_large dent_large stray_particle discoloration                 5
scratch_large dent_small                                                            4
dent_large stray_particle discoloration                                             4
scratch_small scratch_large dent_large                                              3
scratch_small dent_large                                                            3
scratch_large dent_small discoloration                                              2
scratch_small scratch_large dent_small stray_particle discoloration                 2
scratch_small scratch_large dent_small dent_large discoloration                     2
scratch_small stray_particle                                                        2
scratch_small scratch_large dent_small dent_large stray_particle                    2
scratch_small scratch_large dent_small discoloration                                1
dent_small dent_large discoloration                                                 1
discoloration                                                                       1
scratch_small dent_small dent_large stray_particle                                  1
Name: joined_label, dtype: int64
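The joined-label counts show how often each exact combination occurs. It can also help to look at per-class totals and at the number of labels per image. A small sketch on a made-up toy frame (the values are illustrative only); on the real data the same calls would be applied to `train_df[classes]`:

```python
import pandas as pd

# Toy binary label matrix; values are made up for illustration.
toy_df = pd.DataFrame({
    "scratch_small": [1, 0, 1, 0],
    "stray_particle": [1, 1, 0, 0],
    "discoloration": [1, 0, 0, 0],
})

# Column sums: how many images carry each label, regardless of combination.
per_class_counts = toy_df.sum().sort_values(ascending=False)

# Row sums: how many labels each image carries.
labels_per_image = toy_df.sum(axis=1).tolist()

print(per_class_counts)
print(labels_per_image)
```

Per-class totals are what matter when judging how rare a single class is, since a class can be rare on its own yet common inside combinations.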
In [8]:
plt.figure(figsize=(10,15))
sns.countplot(y='joined_label', data=train_df)
Out[8]:
<AxesSubplot:xlabel='count', ylabel='joined_label'>
In [9]:
from deepml.visualize import show_images_from_dataframe

Random samples from training csv file

In [10]:
train_image_dir = "data-purchasing-challenge-2022-starter-kit/data/training/images"
show_images_from_dataframe(train_df, img_dir = train_image_dir, image_file_name_column='filename', 
                           label_column='joined_label', samples=10, cols=2, figsize=(10, 30))
In [12]:
from deepml.visualize import show_images_from_folder

Image samples showing only large scratches (scratch_large)

In [13]:
show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'scratch_large']['filename'].tolist())

Image samples showing only small scratches (scratch_small)

In [14]:
show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'scratch_small']['filename'].tolist(), 
                        figsize=(15,20))

Image samples showing only small dents (dent_small)

In [15]:
show_images_from_folder(train_image_dir, images=train_df[train_df['joined_label'] == 'dent_small']['filename'].tolist()[:12], 
                        figsize=(15, 20))

Please watch out for noisy samples in the dataset. For example, the image file j1NNKMd2ho.png may not actually contain any damage.

Image samples showing only large dents (dent_large)

In [16]:
show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'dent_large']['filename'].tolist(), 
                        figsize=(15, 10))

Image samples showing only discoloration (discoloration)

In [17]:
show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'discoloration']['filename'].tolist(), figsize=(15, 20))

Only one training sample is labeled with discoloration alone.
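Because so few images show discoloration on its own, it may be more useful to select every image that has the discoloration flag set, whatever else it contains. A minimal sketch on a made-up toy frame (filenames and labels are illustrative only) contrasting the two filters:

```python
import pandas as pd

# Toy stand-in for train_df; filenames and labels are made up for illustration.
toy_df = pd.DataFrame({
    "filename": ["img_a.png", "img_b.png", "img_c.png", "img_d.png"],
    "dent_small": [0, 1, 0, 1],
    "discoloration": [1, 1, 0, 0],
    "joined_label": ["discoloration", "dent_small discoloration",
                     "no_defect", "dent_small"],
})

# Exact match on the joined label: discoloration and nothing else.
only_discoloration = toy_df.loc[
    toy_df["joined_label"] == "discoloration", "filename"].tolist()

# Match on the class column: discoloration alone or with other damage.
any_discoloration = toy_df.loc[
    toy_df["discoloration"] == 1, "filename"].tolist()

print(only_discoloration)
print(any_discoloration)
```

With the real data, either list can be passed as `images=` to `show_images_from_folder`, as in the other cells of this notebook.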

Image samples showing only stray particles (stray_particle)

In [18]:
show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'stray_particle']['filename'].tolist()[:12], 
                        figsize=(15, 20))

Image samples showing no damages (no_defect)

In [19]:
show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'no_defect']['filename'].tolist()[:12], 
                        figsize=(15, 20))

Similarly, we can look at image samples containing different combinations of class labels.

Image samples showing all damages (scratch_small, scratch_large, dent_small, dent_large, stray_particle, discoloration)

In [20]:
show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'scratch_small scratch_large dent_small dent_large stray_particle discoloration']['filename'].tolist()[:12], 
                        figsize=(15, 20))
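The exact-match filter on 'joined_label' only works when the class names appear in column order and are separated by single spaces. A small hypothetical helper (`joined_label_for` is not part of deepml or the starter kit) can build that string from class names given in any order, assuming the class order shown in Out[5]:

```python
# Class order as listed in Out[5]; joined labels follow this order.
CLASS_ORDER = [
    "scratch_small", "scratch_large", "dent_small", "dent_large",
    "stray_particle", "discoloration", "no_defect",
]


def joined_label_for(wanted):
    """Build the joined label string for a set of class names.

    Hypothetical helper for illustration: it reorders the requested
    classes into CLASS_ORDER and joins them with single spaces.
    """
    wanted = set(wanted)
    unknown = wanted - set(CLASS_ORDER)
    if unknown:
        raise ValueError(f"Unknown class names: {sorted(unknown)}")
    return " ".join(c for c in CLASS_ORDER if c in wanted)


label = joined_label_for(["stray_particle", "dent_large"])
print(label)
```

The result can then be used in the same filter as above, for example `train_df[train_df['joined_label'] == label]['filename'].tolist()`.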

Comments

santiactis
Almost 3 years ago

Interesting! It’s cool to see that there are actually noisy labels as was shared on Discourse.
