AI Blitz XI
Semantic_Segmentation_101_with_AIBLITZ_11
A hands-on notebook on semantic segmentation with PyTorch
Digging into Semantic Segmentation
Segmentation is a type of computer vision problem in which the goal is to assign every pixel of an image to a class. There are three main kinds of segmentation: panoptic segmentation, instance segmentation, and semantic segmentation.
Our Goal
In this competition, the goal is to segment scenes into 23 classes. The dataset is generated using the CARLA simulator.
In this notebook, I will explain how to tackle this problem with PyTorch and U-Nets, and cover some common jargon used in semantic segmentation tasks.
Let's dive in.
Download the Dataset
Let's download the dataset using the AIcrowd CLI.
!pip install aicrowd-cli
%load_ext aicrowd.magic
%aicrowd login
!rm -rf data
!mkdir data
%aicrowd ds dl -c scene-segmentation -o data
!unzip data/train.zip -d data/train > /dev/null
!unzip data/test.zip -d data/test > /dev/null
!pip install albumentations==0.4.6
!pip install git+https://github.com/qubvel/segmentation_models.pytorch
Imports
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import albumentations as albu
import segmentation_models_pytorch as smp
from natsort import natsorted
import os
from PIL import Image
import numpy as np
from tqdm.notebook import tqdm
Load the Dataset
The dataset directory structure looks like this:
data
--test
---image        # directory containing test images
--train
---image        # directory containing train images
---segmentation # directory containing train segmentation masks
Now we'll wrap the data in a PyTorch Dataset class and create DataLoaders that feed images together with their per-pixel masks.
class SemanticSegmentationDataset(Dataset):
"""
    Creates a segmentation dataset with optional augmentation and preprocessing hooks.
"""
def __init__(self,
img_directory = None,
label_directory = None,
mode = 'train',
augmentation=None,
preprocessing = None
):
self.img_directory = img_directory
self.label_directory = label_directory
if img_directory is not None:
self.img_list = natsorted(os.listdir(img_directory))
self.label_list = natsorted(os.listdir(label_directory))
self.mode = mode
        self.labels = list(range(0, 25))  # label values 0-24 cover the dataset's 23 classes
self.augmentation = augmentation
self.preprocessing = preprocessing
def __len__(self):
return len(self.img_list)
    def __getitem__(self, idx):
        image = Image.open(os.path.join(self.img_directory, self.img_list[idx]))
        image = image.convert('L')  # convert to grayscale
        if self.mode in ('train', 'val'):
            mask = Image.open(os.path.join(self.label_directory, self.label_list[idx]))
            image = np.array(image, dtype=np.float32)
            mask = np.array(mask, dtype=np.float32)

            # one binary channel per label value, shape (H, W, num_labels)
            binary_mask = np.stack([(mask == v) for v in self.labels], axis=-1).astype('float')

            # apply augmentation while still numpy (albumentations expects HWC numpy arrays)
            if self.augmentation:
                sample = self.augmentation(image=image, mask=binary_mask)
                image, binary_mask = sample['image'], sample['mask']

            # apply preprocessing
            if self.preprocessing:
                sample = self.preprocessing(image=image, mask=binary_mask)
                image, binary_mask = sample['image'], sample['mask']

            # to tensors: image (1, H, W) scaled to [0, 1], mask (num_labels, H, W)
            image = torch.from_numpy(image[np.newaxis, :, :]).float() / 255
            mask_preprocessed = torch.from_numpy(binary_mask.transpose(2, 0, 1))
            return image, mask_preprocessed
        else:
            image = np.array(image, dtype=np.float32)
            image = torch.from_numpy(image[np.newaxis, :, :]).float() / 255
            return image
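With the class in place, a quick sanity check on one sample confirms the shapes we expect (a small sketch; the paths match the directory listing above):
ds = SemanticSegmentationDataset(img_directory="data/train/image",
                                 label_directory="data/train/segmentation",
                                 mode='train')
img, msk = ds[0]
print(img.shape, msk.shape)  # expect (1, H, W) and (25, H, W)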
Model Architecture
Many different architectures can be used to train segmentation models.
The most popular are:
- Unet
- Unet++
- FPN
- MAnet
- Linknet
- PSPNet
- PAN
- DeepLabV3
- DeepLabV3+
These architectures are all available in the segmentation_models.pytorch toolkit, and swapping one for another is a one-line change, as the sketch below shows. For this tutorial, we will focus only on U-Net.
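For example, each architecture is exposed as a class with the same constructor arguments (a quick sketch; the encoder and class count here simply mirror the settings we use later):
# swapping architectures in smp is a one-line change
fpn = smp.FPN(encoder_name='resnet18', in_channels=1, classes=25)
deeplab = smp.DeepLabV3(encoder_name='resnet18', in_channels=1, classes=25)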
The architecture takes its name from its U shape: the left half is the encoder and the right half is the decoder.
The encoder is typically a CNN that extracts increasingly abstract feature maps from the input image; at each step it doubles the number of channels while downsampling the image, i.e. halving its height and width.
The decoder upsamples the feature maps back to the input resolution, halving the number of channels at each step, and finally outputs a segmentation map with one channel per class.
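To make the encoder half concrete, here is a minimal sketch of one encoder step in plain PyTorch (EncoderBlock is a hypothetical name for illustration, not the library's code): two 3×3 convolutions extract features, and max-pooling halves the spatial size before the next step doubles the channels.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.conv(x)     # kept for the skip connection to the decoder
        down = self.pool(skip)  # height and width halved
        return skip, down

x = torch.randn(1, 1, 256, 256)
skip, down = EncoderBlock(1, 64)(x)
print(skip.shape, down.shape)  # (1, 64, 256, 256) and (1, 64, 128, 128)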
For this competition, we will use the segmentation_models.pytorch library, which provides a handful of utility functions to create these architectures and load pretrained encoders.
train_dataset = SemanticSegmentationDataset(img_directory = "data/train/image",
label_directory = "data/train/segmentation",
mode='train',
augmentation=None,
preprocessing=None
)
train_loader = DataLoader(train_dataset,
batch_size=8,
shuffle=True,
drop_last=True,
)
ENCODER = 'resnet18'
ENCODER_WEIGHTS = 'imagenet'
ACTIVATION = "softmax2d"
DEVICE = 'cuda'
model = smp.Unet(
encoder_name=ENCODER,
encoder_weights=ENCODER_WEIGHTS,
classes=len(train_dataset.labels),
in_channels=1,
activation=ACTIVATION,
)
preprocessing_fn = smp.encoders.get_preprocessing_fn(ENCODER, ENCODER_WEIGHTS)
print(model)
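A note on preprocessing_fn: get_preprocessing_fn returns the normalization the encoder was pretrained with. It is created here for reference but never used, since our grayscale inputs are already scaled to [0, 1] inside the dataset. If you switched the dataset to 3-channel RGB, a sketch of wiring it into the dataset's preprocessing hook could look like this (the wrapper name is an assumption, not part of the original notebook):
def get_preprocessing(preprocessing_fn):
    # wrap the encoder's normalization as an albumentations transform
    return albu.Compose([albu.Lambda(image=preprocessing_fn)])

# then: SemanticSegmentationDataset(..., preprocessing=get_preprocessing(preprocessing_fn))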
Loss Functions
Several loss functions can be used for semantic segmentation; the ones you see most often are:
- Dice Loss
Dice Loss is derived from the Sørensen–Dice coefficient, a statistic used to measure the similarity between two samples: Dice(A, B) = 2|A ∩ B| / (|A| + |B|). The loss is simply 1 − Dice.
- Focal Loss
Focal loss is also used for image classification. It is mainly useful on highly imbalanced datasets: it down-weights easy, well-classified examples so the model concentrates on hard ones, and it can push the model to risk a false positive rather than miss a positive case, which matters in medical diagnostic problems.
- Jaccard Loss
Jaccard Loss is similar to Dice loss and is also commonly used to train segmentation models. It is based on the Jaccard index, better known as Intersection over Union (IoU): J(A, B) = |A ∩ B| / |A ∪ B|, the overlap between two sets divided by their union.
We will use the standard Dice Loss for this tutorial; a hand-rolled version is sketched below for intuition.
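Before handing things to the library, here is what a Dice loss computes, written out by hand (a minimal sketch, not the library's implementation; eps is a small constant assumed here to avoid division by zero):
def dice_loss(pred, target, eps=1e-7):
    # pred, target: (N, C, H, W) tensors, pred already in [0, 1]
    intersection = (pred * target).sum(dim=(2, 3))
    union = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()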
loss = smp.utils.losses.DiceLoss()
metrics = [
smp.utils.metrics.IoU(threshold=0.5),
smp.utils.metrics.Fscore(threshold=0.5),
smp.utils.metrics.Accuracy(threshold=0.5),
smp.utils.metrics.Recall(threshold=0.5),
smp.utils.metrics.Precision(threshold=0.5),
]
optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.006)
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
train_epoch = smp.utils.train.TrainEpoch(
model,
loss = loss,
metrics = metrics,
optimizer = optimizer,
device = DEVICE,
verbose = True,
)
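The training loop below keeps a commented-out valid_epoch call; if you carve out a validation split, the runner is created the same way (a sketch, assuming a val_loader that you would have to build yourself):
# sketch: a validation runner mirroring train_epoch, used as valid_epoch.run(val_loader)
valid_epoch = smp.utils.train.ValidEpoch(
    model,
    loss=loss,
    metrics=metrics,
    device=DEVICE,
    verbose=True,
)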
For the sake of this tutorial, I will train only for 5 epochs
max_score = 0
for i in range(0, 5):
print('\nEpoch: {}'.format(i + 1))
train_logs = train_epoch.run(train_loader)
#valid_logs = valid_epoch.run(val_loader)
curr_score = train_logs['fscore']
#save the best model based on f1-score
if max_score < curr_score:
max_score = curr_score
torch.save(model, 'result.pth')
print('Model saved!')
torch.save(model, 'last.pth')
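Since torch.save(model, ...) pickles the entire model object, reloading it later is a one-liner (this assumes the same torch and segmentation_models_pytorch versions are installed):
# reload the full pickled model; map_location handles CPU-only machines
model = torch.load('result.pth', map_location=DEVICE)
model.to(DEVICE)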
Submitting predictions with the test dataset
# Creating the testing dataset
test_dataset = SemanticSegmentationDataset(img_directory="data/test/image", mode="test")
test_loader = DataLoader(test_dataset, batch_size=1, num_workers=2, shuffle=False, drop_last=False)
# Generating Model Predictions
!rm -rf segmentation
!mkdir segmentation
for n, batch in enumerate(tqdm(test_loader)):
# Getting the predictions
predictions = model.predict(batch.to(DEVICE)).cpu()
    # Converting the predictions to the right format: (C, H, W) -> (H, W, C)
    prediction_mask = predictions.squeeze().numpy()
    prediction_mask = np.transpose(prediction_mask, (1, 2, 0))

    # Combining the per-class channels into a single grayscale label map:
    # each rounded channel contributes its label id
    prediction_mask_gray = np.zeros((prediction_mask.shape[0], prediction_mask.shape[1]))
    for i in range(prediction_mask.shape[2]):
        prediction_mask_gray = prediction_mask_gray + i * prediction_mask[:, :, i].round()
# Saving the image
prediction_mask_gray = Image.fromarray(prediction_mask_gray.astype(np.uint8))
prediction_mask_gray.save(os.path.join("segmentation", f"{n}.png"))
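A more direct way to collapse the class channels is an argmax over the class axis (a variation on the inner loop above, not the original code; it would replace that loop inside the batch loop):
    prediction_mask_gray = prediction_mask.argmax(axis=-1).astype(np.uint8)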
!aicrowd notebook submit -c scene-segmentation -a segmentation --no-verify
Extra resources to learn segmentation:
[1] Jeremy Jordan's blog post: jeremyjordan.me/semantic-segmentation/
[2] U-Net from scratch by Aladdin Persson: https://www.youtube.com/watch?v=IHq1t7NxS8k
[3] U-Net from scratch by Aman Arora: https://amaarora.github.io/2020/09/13/unet.html