
Representation Learning: A Stepping Stone for an Improved Data Label Purchase aka Active Learning

AO

In this notebook, I will introduce Representation Learning (RL) and show how it can be utilised for data label purchasing, also known as Active Learning (AL).

Representation Learning (or Feature Engineering): A Stepping Stone for Improved Active Learning

Representation Learning (RL) is concerned with discovering hidden patterns in the data and encoding/compressing this information into a feature vector. In the context of Neural Networks (NN), RL can also be viewed as Feature Engineering (FE): it removes the need for manual FE, and thus you get a system that learns end-to-end.

Data Purchasing Challenge

As I already mentioned in my previous notebook (explainable AI), a key point in this challenge is to learn a good representation of the given training images. The literature offers a range of approaches, but the difficulty arises from the small number of training images and the given time constraint. Therefore, the loss used for RL is of paramount importance, especially with little data. I will introduce RL, including possible loss functions. Of course, RL is just half of the battle; the other important part is what to do with the learned feature vectors. Here, I will make a few suggestions.

Load Dependencies

In [1]:
import math
import os
import random

import time

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms
from evaluator.dataset import ZEWDPCBaseDataset
from evaluator.evaluation_metrics import get_zew_dpc_metrics

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
Out[1]:
<torch._C.Generator at 0x7f21705d1690>

Helper Functions

As an exemplary RL approach, I will introduce CosFace, also known as the Large Margin Cosine Loss (LMCL). It is defined as follows (formula from [1]):

$$L_{lmc} = \frac{1}{N}\sum_{i} -\log\frac{e^{s\,(\cos(\theta_{y_i},i)-m)}}{e^{s\,(\cos(\theta_{y_i},i)-m)} + \sum_{j\neq y_i} e^{s\,\cos(\theta_j,i)}}$$

subject to

$$\cos(\theta_j,i) = \frac{W_j^T x_i}{\lVert W_j\rVert\,\lVert x_i\rVert},$$

where $N$ is the number of training samples, $x_i$ is the feature vector of the $i$-th sample with ground-truth label $y_i$, $W_j$ is the weight vector of class $j$, $s$ is the scaling factor, and $m$ is the cosine margin.

The goal is to separate examples of different classes and to bring examples of the same class closer together in the feature space: CosFace clusters examples of the same class and pushes these clusters away from each other for a better separation. The following image depicts this geometrically:

Image from [1]

$\theta$ is the angle between the weight vector $W$ and the feature vector $x$. The following is a possible implementation of LMCL:

In [2]:
class CosFace(nn.Module):
    '''
     Paper: CosFace: Large Margin Cosine Loss for Deep Face Recognition
     arxiv: https://arxiv.org/pdf/1801.09414.pdf
    '''
    def __init__(self, in_features, out_features, margin, scale) -> None:
        super(CosFace, self).__init__()
        self.scale = scale    # scaling factor s
        self.margin = margin  # cosine margin m
        # one weight vector per class; rows are normalised in forward()
        self.weight = nn.Parameter(torch.randn((out_features, in_features)))

    def forward(self, features, label):
        # cos(theta) = features/||features|| * weight/||weight||
        cosine_similarity = F.linear(F.normalize(features), F.normalize(self.weight))
        # subtract the margin only for the ground-truth classes (label is multi-hot)
        output = (label * (cosine_similarity - self.margin)) + ((1.0 - label) * cosine_similarity)
        output *= self.scale

        return output
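
As a quick sanity check, here is a minimal sketch with random tensors, using the same dimensions as the encoder configured further below (512-d features, 6 labels):

head = CosFace(in_features=512, out_features=6, margin=0.5, scale=64)
features = torch.randn(4, 512)                # a batch of 4 feature vectors
labels = torch.randint(0, 2, (4, 6)).float()  # random multi-hot labels
logits = head(features, labels)               # margin applied where label == 1
print(logits.shape)                           # torch.Size([4, 6])
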
In [3]:
def train(training_dataset: ZEWDPCBaseDataset, train_idx, val_idx, model, optimizer, batch_size, epochs, criterion, device):
    start_time = time.time()
    n_train_it, n_val_it = math.ceil(train_idx.shape[0] / batch_size), math.ceil(val_idx.shape[0] / batch_size)
    for epoch in range(epochs):
        model.train()
        train_predictions, val_predictions = [], []
        random_idx = np.random.permutation(train_idx.shape[0])
        train_idx = train_idx[random_idx]
        for batch in range(n_train_it):
            optimizer.zero_grad()
            data, y_true = [], []
            for i in train_idx[batch*batch_size:(batch+1)*batch_size]:
                sample = training_dataset[i]
                data.append(sample['image'])
                y_true.append(sample['label'])
            data = torch.stack(data, dim=0).to(device)
            y_true = torch.tensor(y_true).float().to(device)
            output = model(data, y_true, True)  # margin-scaled logits from the CosFace head
            train_predictions.append(output.cpu().detach())
            loss = criterion(output, y_true)
            loss.backward()
            optimizer.step()
            # print(f"==> [Train] Epoch {epoch+1}/{epochs} | Batch {batch+1}/{n_train_it} | Loss {loss:.6f} | Passed time {(time.time() - start_time)/60:.2f} min.")
        
        with torch.no_grad():
            train_predictions = torch.concat(train_predictions, dim=0)
            y_true = torch.from_numpy(training_dataset._get_all_labels()[train_idx]).float()
            loss = criterion(train_predictions, y_true)
            # threshold the raw (margin-scaled) outputs at 0.5 to obtain multi-hot predictions
            train_predictions[train_predictions <= .5] = 0
            train_predictions[train_predictions > .5] = 1
            scores = get_zew_dpc_metrics(training_dataset._get_all_labels()[train_idx], train_predictions)
            print(f"=> [Train] Epoch {epoch+1}/{epochs} | Loss {loss:.6f} | F1 {scores['F1_score_macro']:.3f} | "
                  f"Acc {scores['accuracy_score']:.3f} | HamL {scores['hamming_loss']:.3f} | Passed time {(time.time() - start_time)/60:.2f} min.")
            model.eval()
            for batch in range(n_val_it):
                data, y_true = [], []
                for i in val_idx[batch*batch_size:(batch+1)*batch_size]:
                    sample = training_dataset[i]
                    data.append(sample['image'])
                    y_true.append(sample['label'])
                data = torch.stack(data, dim=0).to(device)
                y_true = torch.tensor(y_true).float().to(device)
                output = model(data, y_true, True)  # the margin is applied during validation as well
                val_predictions.append(output.cpu())
                loss = criterion(output, y_true)
                # print(f"==> [Val] Epoch {epoch+1}/{epochs} | Batch {batch+1}/{n_train_it} | Loss {loss:.6f} | Passed time {(time.time() - start_time)/60:.2f} min.")
            
            val_predictions = torch.concat(val_predictions, dim=0)
            y_true = torch.from_numpy(training_dataset._get_all_labels()[val_idx]).float()
            loss = criterion(val_predictions, y_true)
            val_predictions[val_predictions <= .5] = 0
            val_predictions[val_predictions > .5] = 1
            scores = get_zew_dpc_metrics(training_dataset._get_all_labels()[val_idx], val_predictions)
            print(f"=> [Val] Epoch {epoch+1}/{epochs} | Loss {loss:.6f} | F1 {scores['F1_score_macro']:.3f} | "
                  f"Acc {scores['accuracy_score']:.3f} | HamL {scores['hamming_loss']:.3f} | Passed time {(time.time() - start_time)/60:.2f} min.")

Set Parameters

In [4]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
device = torch.device(device)

Load Dataset

In [5]:
mean, std = torch.tensor([0.485, 0.456, 0.406]), torch.tensor([0.229, 0.224, 0.225])
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std)
])

training_dataset = ZEWDPCBaseDataset(
        images_dir="./data/public_training/images",
        labels_path="./data/public_training/labels.csv",
        shuffle_seed=seed,
        transform=transform
    )

n_samples = len(training_dataset)
random_idx = np.random.permutation(n_samples)
train_idx, val_idx = random_idx[:math.floor(n_samples*.9)], random_idx[math.floor(n_samples*.9):]

Load Torchvision's Pre-Trained Model: ResNet18

In [6]:
class Encoder(nn.Module):
    def __init__(self, in_features, out_features, margin, scale) -> None:
        super(Encoder, self).__init__()
        self.model = models.resnet18(pretrained=True, progress=False)
        # drop the final classification layer; the backbone now outputs a 512-d feature vector
        self.model = nn.Sequential(*list(self.model.children())[:-1], nn.Flatten())
        self.cos_face = CosFace(in_features, out_features, margin, scale)

    def forward(self, img, label=None, training=None):
        feature_vector = self.model(img)
        if training:
            # training: return the margin-scaled logits from the CosFace head
            output = self.cos_face(feature_vector, label)
            return output

        # inference: return the learned representation itself
        return feature_vector
In [7]:
model = Encoder(512, 6, .5, 64).to(device)  # 512-d features, 6 labels, margin 0.5, scale 64

Train ResNet18

In [8]:
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
batch_size, epochs = 32, 12
criterion = nn.CrossEntropyLoss()
train(training_dataset, train_idx, val_idx, model, optimizer, batch_size, epochs, criterion, device)
=> [Train] Epoch 1/12 | Loss 31.750378 | F1 0.030 | Acc 0.052 | HamL 0.460 | Passed time 0.08 min.
=> [Val] Epoch 1/12 | Loss 24.335424 | F1 0.000 | Acc 0.000 | HamL 0.370 | Passed time 0.09 min.
=> [Train] Epoch 2/12 | Loss 20.650026 | F1 0.000 | Acc 0.000 | HamL 0.406 | Passed time 0.17 min.
=> [Val] Epoch 2/12 | Loss 25.333149 | F1 0.000 | Acc 0.000 | HamL 0.390 | Passed time 0.18 min.
=> [Train] Epoch 3/12 | Loss 18.571512 | F1 0.000 | Acc 0.001 | HamL 0.382 | Passed time 0.27 min.
=> [Val] Epoch 3/12 | Loss 21.059807 | F1 0.000 | Acc 0.000 | HamL 0.355 | Passed time 0.27 min.
=> [Train] Epoch 4/12 | Loss 15.753123 | F1 0.001 | Acc 0.003 | HamL 0.374 | Passed time 0.35 min.
=> [Val] Epoch 4/12 | Loss 21.079130 | F1 0.008 | Acc 0.010 | HamL 0.353 | Passed time 0.36 min.
=> [Train] Epoch 5/12 | Loss 13.411356 | F1 0.040 | Acc 0.099 | HamL 0.355 | Passed time 0.44 min.
=> [Val] Epoch 5/12 | Loss 20.996479 | F1 0.097 | Acc 0.310 | HamL 0.307 | Passed time 0.45 min.
=> [Train] Epoch 6/12 | Loss 10.940973 | F1 0.091 | Acc 0.277 | HamL 0.316 | Passed time 0.53 min.
=> [Val] Epoch 6/12 | Loss 21.982407 | F1 0.042 | Acc 0.090 | HamL 0.353 | Passed time 0.54 min.
=> [Train] Epoch 7/12 | Loss 10.159442 | F1 0.104 | Acc 0.308 | HamL 0.309 | Passed time 0.62 min.
=> [Val] Epoch 7/12 | Loss 21.353516 | F1 0.101 | Acc 0.320 | HamL 0.332 | Passed time 0.63 min.
=> [Train] Epoch 8/12 | Loss 8.314388 | F1 0.117 | Acc 0.366 | HamL 0.293 | Passed time 0.70 min.
=> [Val] Epoch 8/12 | Loss 22.383198 | F1 0.112 | Acc 0.350 | HamL 0.313 | Passed time 0.71 min.
=> [Train] Epoch 9/12 | Loss 7.591969 | F1 0.127 | Acc 0.391 | HamL 0.273 | Passed time 0.79 min.
=> [Val] Epoch 9/12 | Loss 21.794899 | F1 0.130 | Acc 0.400 | HamL 0.272 | Passed time 0.80 min.
=> [Train] Epoch 10/12 | Loss 6.503292 | F1 0.136 | Acc 0.411 | HamL 0.271 | Passed time 0.87 min.
=> [Val] Epoch 10/12 | Loss 22.777351 | F1 0.118 | Acc 0.300 | HamL 0.332 | Passed time 0.88 min.
=> [Train] Epoch 11/12 | Loss 5.515342 | F1 0.147 | Acc 0.462 | HamL 0.254 | Passed time 0.96 min.
=> [Val] Epoch 11/12 | Loss 21.868790 | F1 0.128 | Acc 0.350 | HamL 0.297 | Passed time 0.96 min.
=> [Train] Epoch 12/12 | Loss 4.986962 | F1 0.154 | Acc 0.459 | HamL 0.255 | Passed time 1.04 min.
=> [Val] Epoch 12/12 | Loss 20.727112 | F1 0.125 | Acc 0.400 | HamL 0.287 | Passed time 1.05 min.

So, why are we doing this? The pipeline so far looks like this: an input image is passed through the encoder, which compresses it into a feature vector, the encoded representation.

The encoded representation is a compact description of the input image. These representations can now be used for further processing, e.g. as input to a classifier: image → encoder → feature vector → classifier → prediction.

Of course, the classifier can be a neural network, but it can also be any other type of classifier. You can also do something else entirely with the learned representations. This is completely up to your creativity.
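
To make this concrete, here is a minimal sketch of a linear probe fitted on top of the frozen representations. Everything in it is illustrative: the choice of val_idx as the feature source, the probe architecture, and the training hyperparameters are my own assumptions, not part of the challenge starter kit.

# Minimal illustrative sketch: extract frozen features, then fit a linear probe.
model.eval()
with torch.no_grad():
    feats, labels = [], []
    for i in val_idx:  # illustrative choice of subset
        sample = training_dataset[i]
        img = sample['image'].unsqueeze(0).to(device)
        feats.append(model(img).cpu())                     # (1, 512) feature vector
        labels.append(torch.tensor(sample['label']).float())
    feats = torch.cat(feats, dim=0)
    labels = torch.stack(labels, dim=0)

probe = nn.Linear(512, 6)                                  # simple multi-label probe
probe_opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()                               # one sigmoid/BCE per label
for _ in range(100):
    probe_opt.zero_grad()
    loss = bce(probe(feats), labels)
    loss.backward()
    probe_opt.step()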

Above, I introduced one possible loss function. It is one among many. Other possible loss functions are:

  • Angular Loss
  • SphereFace Loss
  • Contrastive Loss
  • ...

The list goes on and on. You can have a look here; it contains many possible loss functions.
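
As a flavour of what these look like, here is a minimal sketch of the classic pairwise contrastive loss; the margin value of 1.0 is an assumption:

# Minimal sketch of the pairwise contrastive loss.
# y = 1 marks a similar pair, y = 0 a dissimilar pair; margin=1.0 is an assumption.
def contrastive_loss(f1, f2, y, margin=1.0):
    d = F.pairwise_distance(f1, f2)            # Euclidean distance per pair
    pos = y * d.pow(2)                         # pull similar pairs together
    neg = (1 - y) * F.relu(margin - d).pow(2)  # push dissimilar pairs apart, up to the margin
    return (pos + neg).mean()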

Data (Label) Purchase (aka Active Learning (AL))

Now that we have covered RL, let's continue with the data (label) purchase. The question is: how can RL be used in the context of AL? Possible approaches are

  • Random Sampling
  • Least Confidence
  • Margin Sampling
  • Entropy Sampling
  • Ensemble of Active Learners
  • ...

E.g., you could train a classifier on top of the learned representations and use its probabilities for least-confidence or entropy sampling, or even train multiple classifiers and employ them as a committee of active learners (classifiers); a small sketch of such uncertainty scores follows below.

The list of possible combinations of RL and AL approaches is long, and there should be something for everyone.
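
For instance, here is a minimal sketch of least-confidence and entropy scores computed from per-label probabilities. The sigmoid mapping from logits to probabilities and the variable budget are my own assumptions:

# Minimal sketch (assumes per-label probabilities, e.g. torch.sigmoid(logits)).
def least_confidence(probs):
    # distance of each label probability from a confident 0/1 decision,
    # averaged over labels; higher = more uncertain
    return (0.5 - (probs - 0.5).abs()).mean(dim=1)

def entropy_score(probs, eps=1e-8):
    # per-label binary entropy, summed over labels; higher = more uncertain
    h = -(probs * (probs + eps).log() + (1 - probs) * (1 - probs + eps).log())
    return h.sum(dim=1)

# Purchase the labels of the most uncertain unlabelled images
# (budget is a placeholder for the purchase budget):
# scores = entropy_score(torch.sigmoid(logits))
# to_purchase = scores.argsort(descending=True)[:budget]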

Hopefully, this notebook will be helpful to you. Best of luck in the challenge!

MIT LICENSE

Copyright 2022 AO

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
