AI Blitz XI
Semantic_Segmentation_101_with_AIBLITZ_11
A hands-on notebook on semantic segmentation with PyTorch
Digging into Semantic Segmentation
Segmentation is a type of computer vision problem in which the goal is to assign every pixel of an image to a class. There are three main kinds of segmentation: panoptic segmentation, instance segmentation, and semantic segmentation.
Our Goal
In this competition, the goal is to segment scenes into 23 classes. The dataset is generated using the CARLA simulator.
In this notebook, I will explain how to tackle this problem with PyTorch and U-Nets, and cover some common jargon used in semantic segmentation tasks.
Let's dive in.
Download the Dataset
Let's download the dataset using the AIcrowd CLI.
!pip install aicrowd-cli
%load_ext aicrowd.magic
%aicrowd login
!rm -rf data
!mkdir data
%aicrowd ds dl -c scene-segmentation -o data
!unzip data/train.zip -d data/train > /dev/null
!unzip data/test.zip -d data/test > /dev/null
!pip install albumentations==0.4.6
!pip install git+https://github.com/qubvel/segmentation_models.pytorch
Imports
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import albumentations as albu
import segmentation_models_pytorch as smp
from natsort import natsorted
import os
from PIL import Image
import numpy as np
from tqdm.notebook import tqdm
Load the Dataset
The dataset directory structure looks like this:
data
--test
---image        # directory containing test images
--train
---image        # directory containing train images
---segmentation # directory containing train segmentation masks
Now we'll wrap the data in a PyTorch Dataset class and create DataLoaders that feed images together with their per-pixel masks.
class SemanticSegmentationDataset(Dataset):
"""
    Creates a segmentation dataset with optional augmentation and preprocessing hooks.
"""
def __init__(self,
img_directory = None,
label_directory = None,
mode = 'train',
augmentation=None,
preprocessing = None
):
self.img_directory = img_directory
self.label_directory = label_directory
if img_directory is not None:
self.img_list = natsorted(os.listdir(img_directory))
self.label_list = natsorted(os.listdir(label_directory))
self.mode = mode
        self.labels = list(range(0, 25))  # label values 0-24 cover the dataset's 23 classes
self.augmentation = augmentation
self.preprocessing = preprocessing
def __len__(self):
return len(self.img_list)
    def __getitem__(self, idx):
        image = Image.open(os.path.join(self.img_directory, self.img_list[idx]))
        image = image.convert('L')  # convert to grayscale
        if self.mode in ('train', 'val'):
            mask = Image.open(os.path.join(self.label_directory, self.label_list[idx]))
            image = np.array(image, dtype=np.float32)
            mask = np.array(mask, dtype=np.float32)

            # one binary channel per label value, shape (H, W, num_labels)
            binary_mask = np.stack([(mask == v) for v in self.labels], axis=-1).astype('float')

            # apply augmentation while still numpy (albumentations expects HWC numpy arrays)
            if self.augmentation:
                sample = self.augmentation(image=image, mask=binary_mask)
                image, binary_mask = sample['image'], sample['mask']

            # apply preprocessing
            if self.preprocessing:
                sample = self.preprocessing(image=image, mask=binary_mask)
                image, binary_mask = sample['image'], sample['mask']

            # to tensors: image (1, H, W) scaled to [0, 1], mask (num_labels, H, W)
            image = torch.from_numpy(image[np.newaxis, :, :]).float() / 255
            mask_preprocessed = torch.from_numpy(binary_mask.transpose(2, 0, 1))
            return image, mask_preprocessed
        else:
            image = np.array(image, dtype=np.float32)
            image = torch.from_numpy(image[np.newaxis, :, :]).float() / 255
            return image
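With the class in place, a quick sanity check on one sample confirms the shapes we expect (a small sketch; the paths match the directory listing above):
ds = SemanticSegmentationDataset(img_directory="data/train/image",
                                 label_directory="data/train/segmentation",
                                 mode='train')
img, msk = ds[0]
print(img.shape, msk.shape)  # expect (1, H, W) and (25, H, W)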
Model Architecture
Many different architectures can be used to train segmentation models.
The most popular are:
- Unet
- Unet++
- FPN
- MAnet
- Linknet
- PSPNet
- PAN
- DeepLabV3
- DeepLabV3+
These architectures are all available in the segmentation_models.pytorch toolkit, and swapping one for another is a one-line change, as the sketch below shows. For this tutorial, we will focus only on U-Net.
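For example, each architecture is exposed as a class with the same constructor arguments (a quick sketch; the encoder and class count here simply mirror the settings we use later):
# swapping architectures in smp is a one-line change
fpn = smp.FPN(encoder_name='resnet18', in_channels=1, classes=25)
deeplab = smp.DeepLabV3(encoder_name='resnet18', in_channels=1, classes=25)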
The architecture takes its name from its U shape: the left half is the encoder and the right half is the decoder.
The encoder is typically a CNN that extracts increasingly abstract feature maps from the input image; at each step it doubles the number of channels while downsampling the image, i.e. halving its height and width.
The decoder upsamples the feature maps back to the input resolution, halving the number of channels at each step, and finally outputs a segmentation map with one channel per class.
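To make the encoder half concrete, here is a minimal sketch of one encoder step in plain PyTorch (EncoderBlock is a hypothetical name for illustration, not the library's code): two 3×3 convolutions extract features, and max-pooling halves the spatial size before the next step doubles the channels.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.conv(x)     # kept for the skip connection to the decoder
        down = self.pool(skip)  # height and width halved
        return skip, down

x = torch.randn(1, 1, 256, 256)
skip, down = EncoderBlock(1, 64)(x)
print(skip.shape, down.shape)  # (1, 64, 256, 256) and (1, 64, 128, 128)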
For this competition, we will use the segmentation_models.pytorch library, which provides a handful of utility functions to create these architectures and load pretrained encoders.
train_dataset = SemanticSegmentationDataset(img_directory = "data/train/image",
label_directory = "data/train/segmentation",
mode='train',
augmentation=None,
preprocessing=None
)
train_loader = DataLoader(train_dataset,
batch_size=8,
shuffle=True,
drop_last=True,
)
ENCODER = 'resnet18'
ENCODER_WEIGHTS = 'imagenet'
ACTIVATION = "softmax2d"
DEVICE = 'cuda'
model = smp.Unet(
encoder_name=ENCODER,
encoder_weights=ENCODER_WEIGHTS,
classes=len(train_dataset.labels),
in_channels=1,
activation=ACTIVATION,
)
preprocessing_fn = smp.encoders.get_preprocessing_fn(ENCODER, ENCODER_WEIGHTS)
print(model)
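A note on preprocessing_fn: get_preprocessing_fn returns the normalization the encoder was pretrained with. It is created here for reference but never used, since our grayscale inputs are already scaled to [0, 1] inside the dataset. If you switched the dataset to 3-channel RGB, a sketch of wiring it into the dataset's preprocessing hook could look like this (the wrapper name is an assumption, not part of the original notebook):
def get_preprocessing(preprocessing_fn):
    # wrap the encoder's normalization as an albumentations transform
    return albu.Compose([albu.Lambda(image=preprocessing_fn)])

# then: SemanticSegmentationDataset(..., preprocessing=get_preprocessing(preprocessing_fn))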
Loss Functions
Several loss functions can be used for semantic segmentation; the ones you see most often are:
- Dice Loss
Dice Loss is derived from the Sørensen–Dice coefficient, a statistic used to measure the similarity between two samples: Dice(A, B) = 2|A ∩ B| / (|A| + |B|). The loss is simply 1 − Dice.
- Focal Loss
Focal loss is also used for image classification. It is mainly useful on highly imbalanced datasets: it down-weights easy, well-classified examples so the model concentrates on hard ones, and it can push the model to risk a false positive rather than miss a positive case, which matters in medical diagnostic problems.
- Jaccard Loss
Jaccard Loss is similar to Dice loss and is also commonly used to train segmentation models. It is based on the Jaccard index, better known as Intersection over Union (IoU): J(A, B) = |A ∩ B| / |A ∪ B|, the overlap between two sets divided by their union.
We will use the standard Dice Loss for this tutorial; a hand-rolled version is sketched below for intuition.
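Before handing things to the library, here is what a Dice loss computes, written out by hand (a minimal sketch, not the library's implementation; eps is a small constant assumed here to avoid division by zero):
def dice_loss(pred, target, eps=1e-7):
    # pred, target: (N, C, H, W) tensors, pred already in [0, 1]
    intersection = (pred * target).sum(dim=(2, 3))
    union = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()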
loss = smp.utils.losses.DiceLoss()
metrics = [
smp.utils.metrics.IoU(threshold=0.5),
smp.utils.metrics.Fscore(threshold=0.5),
smp.utils.metrics.Accuracy(threshold=0.5),
smp.utils.metrics.Recall(threshold=0.5),
smp.utils.metrics.Precision(threshold=0.5),
]
optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.006)
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
train_epoch = smp.utils.train.TrainEpoch(
model,
loss = loss,
metrics = metrics,
optimizer = optimizer,
device = DEVICE,
verbose = True,
)
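The training loop below keeps a commented-out valid_epoch call; if you carve out a validation split, the runner is created the same way (a sketch, assuming a val_loader that you would have to build yourself):
# sketch: a validation runner mirroring train_epoch, used as valid_epoch.run(val_loader)
valid_epoch = smp.utils.train.ValidEpoch(
    model,
    loss=loss,
    metrics=metrics,
    device=DEVICE,
    verbose=True,
)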
For the sake of this tutorial, I will train only for 5 epochs
max_score = 0
for i in range(0, 5):
print('\nEpoch: {}'.format(i + 1))
train_logs = train_epoch.run(train_loader)
#valid_logs = valid_epoch.run(val_loader)
curr_score = train_logs['fscore']
#save the best model based on f1-score
if max_score < curr_score:
max_score = curr_score
torch.save(model, 'result.pth')
print('Model saved!')
torch.save(model, 'last.pth')
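Since torch.save(model, ...) pickles the entire model object, reloading it later is a one-liner (this assumes the same torch and segmentation_models_pytorch versions are installed):
# reload the full pickled model; map_location handles CPU-only machines
model = torch.load('result.pth', map_location=DEVICE)
model.to(DEVICE)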
Submitting predictions with the test dataset
# Creating the testing dataset
test_dataset = SemanticSegmentationDataset(img_directory="data/test/image", mode="test")
test_loader = DataLoader(test_dataset, batch_size=1, num_workers=2, shuffle=False, drop_last=False)
# Generating Model Predictions
!rm -rf segmentation
!mkdir segmentation
for n, batch in enumerate(tqdm(test_loader)):
# Getting the predictions
predictions = model.predict(batch.to(DEVICE)).cpu()
    # Converting the predictions to the right format: (C, H, W) -> (H, W, C)
    prediction_mask = predictions.squeeze().numpy()
    prediction_mask = np.transpose(prediction_mask, (1, 2, 0))

    # Combining the per-class channels into a single grayscale label map:
    # each rounded channel contributes its label id
    prediction_mask_gray = np.zeros((prediction_mask.shape[0], prediction_mask.shape[1]))
    for i in range(prediction_mask.shape[2]):
        prediction_mask_gray = prediction_mask_gray + i * prediction_mask[:, :, i].round()
# Saving the image
prediction_mask_gray = Image.fromarray(prediction_mask_gray.astype(np.uint8))
prediction_mask_gray.save(os.path.join("segmentation", f"{n}.png"))
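A more direct way to collapse the class channels is an argmax over the class axis (a variation on the inner loop above, not the original code; it would replace that loop inside the batch loop):
    prediction_mask_gray = prediction_mask.argmax(axis=-1).astype(np.uint8)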
!aicrowd notebook submit -c scene-segmentation -a segmentation --no-verify
Extra resources to learn segmentation:
[1] Jeremy Jordan's blog post: jeremyjordan.me/semantic-segmentation/
[2] U-Net from scratch by Aladdin Persson: https://www.youtube.com/watch?v=IHq1t7NxS8k
[3] U-Net from scratch by Aman Arora: https://amaarora.github.io/2020/09/13/unet.html