Data Purchasing Challenge 2022
WANDB - Build better models faster with experiment tracking
Track, compare, and visualize ML experiments with a few lines of code, using the baseline as an illustration.
I am sure everyone is trying different things to improve their score. Some of them work and some of them don't, but how do you keep track of each experiment?
Wandb is an MLOps platform. It helps you build better models faster with experiment tracking, dataset versioning, and model management, and it lets you follow the progress of your experiments in real time in your web browser. Setting up wandb takes just 10-15 minutes, and in this notebook I will show you how to set it up, using this challenge as an illustration.
I am using the baseline as an illustration. If you want to learn how to make a baseline submission (with a video tutorial), please refer to this notebook: https://www.aicrowd.com/showcase/create-your-baseline-with-0-4-on-lb-git-repo-and-video
How to use this notebook 📝
You can use this notebook as a stand-alone Colab solution; however, I have not written code to make a submission.
Content of this notebook
- Download relevant data
- Install requirements
- Initialize datasets
- Set up wandb (shown in the cells below)
- Baseline model in run.py
- Execution pipeline
You can copy the relevant code into your codebase when making a submission.
Note: I have written code for both CPU and CUDA. Search for the key "CHANGE CPU CUDA HERE" and adjust those lines for your machine; you only need to comment/uncomment a few lines.
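As an alternative to commenting lines in and out, the device can be chosen once and reused everywhere. The following is a minimal sketch using only standard PyTorch calls; get_device is a hypothetical helper (not part of the starter kit), and nn.Linear stands in for the ResNet101 model.

```python
import torch
from torch import nn


def get_device(prefer_cuda: bool = True) -> torch.device:
    """Return the CUDA device if it is available and wanted, otherwise the CPU."""
    if prefer_cuda and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = get_device()

# Model: replaces self.model.cuda() / self.model.cpu()
model = nn.Linear(4, 4).to(device)

# Batches: replaces batch["image"].cuda() / batch["image"].cpu()
x = torch.zeros(2, 4).to(device)
out = model(x)

# Predictions: .cpu() is a no-op on CPU tensors, so one line serves both cases
labels = (torch.sigmoid(out) > 0.5).int().cpu().numpy()

# Checkpoints would use: torch.load(checkpoint_path, map_location=device)
print(device, out.shape, labels.shape)
```

With this pattern the same notebook runs unchanged on a CPU-only machine and on a Colab GPU runtime.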
Setting up WANDB
Steps:
- Create a free account on Wandb (https://wandb.ai/)
- Create a project in your profile (https://wandb.ai/USERNAME). Use your account's API key (shown at https://wandb.ai/authorize) to authenticate when you run the experiment.
- Install wandb (pip install wandb)
- Create a wandb configuration to track the hyper-parameters of your model. It is a free-form dictionary that can store anything you want, e.g. the experiment name, learning rate, batch size, model name, optimizer name, transformations used, etc. An example is shown below:
wandb_config = {
    "model": "ResNet101",
    "learning_rate": self.LEARNING_RATE,
    "samples_per_gpu": self.BATCH_SIZE,
    "workers_per_gpu": self.NUM_WORKERS,
    "augmentations": "NO",
    "description": "Pre-Training. No augmentation"
}
- Initialize your project in run.py. An example is shown below:
wandb.init(project="zew-data-purchase", entity="krazy", name=model_name, config=wandb_config)
where
  - project: the project you created in step 2
  - entity: your wandb username
  - name: the name of your experiment
- Track the performance. Here you can log every metric you want to analyse, e.g. the loss at each step/epoch, or the accuracy and Hamming loss every nth epoch. Examples are shown below.
For the loss:
wandb.log({
    "Step": (self.epoch*5000)+self.global_step,
    "Loss": loss.item(),
})
For the performance metrics:
wandb.log({
    "Epoch": self.epoch+1,
    "Accuracy": accuracy_score,
    "Hamming Loss": hamming_loss_score,
    "Match ratio": exact_match_ratio_score
})
- Analyse the performance with the different charts in your wandb project (https://wandb.ai/USERNAME/PROJECT-NAME).
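Since the configuration is just a dictionary, one way to avoid retyping hyper-parameters is to collect them from the run object. The sketch below assumes the convention used in ZEWDPCBaseRun, where hyper-parameters are UPPER_CASE instance attributes; build_wandb_config and DummyRun are hypothetical names used only for illustration.

```python
def build_wandb_config(run, **extra):
    """Collect UPPER_CASE scalar attributes of `run` into a dict for wandb.init(config=...)."""
    config = {
        name.lower(): value
        for name, value in vars(run).items()
        if name.isupper() and isinstance(value, (int, float, str, bool))
    }
    config.update(extra)  # free-form extras: model name, augmentations, description, ...
    return config


class DummyRun:
    """Hypothetical stand-in for ZEWDPCBaseRun."""

    def __init__(self):
        self.BATCH_SIZE = 32
        self.NUM_WORKERS = 2
        self.LEARNING_RATE = 0.001
        self.optimizer = None  # lower-case attributes are skipped


config = build_wandb_config(DummyRun(), model="ResNet101", augmentations="NO")
print(config)
# wandb.init(project="PROJECT-NAME", entity="USERNAME", name="ResNet101", config=config)
```

Any new UPPER_CASE hyper-parameter added to the run class then shows up in the wandb run automatically.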
1) Login to AIcrowd 🤩
#@title Login to AIcrowd
!pip install -U aicrowd-cli > /dev/null
!aicrowd login 2> /dev/null
2) Set up magically: run the cell below 😉
#@title Magic Box ⬛ { vertical-output: true, display-mode: "form" }
try:
    import os
    if first_run and os.path.exists("/content/data-purchasing-challenge-2022-starter-kit/data/training"):
        first_run = False
except NameError:
    # first_run is not defined yet, i.e. this cell has never been run
    first_run = True

if first_run:
    %cd /content/
    !git clone http://gitlab.aicrowd.com/zew/data-purchasing-challenge-2022-starter-kit.git > /dev/null
    %cd data-purchasing-challenge-2022-starter-kit
    !aicrowd dataset list -c data-purchasing-challenge-2022
    !aicrowd dataset download -c data-purchasing-challenge-2022
    !mkdir -p data/
    !mv *.tar.gz data/ && cd data && echo "Extracting dataset" && ls *.tar.gz | xargs -n1 -I{} bash -c "tar -xvf {} > /dev/null"

# NOTE: the helpers below expect stand-alone pre_training_phase / purchase_phase /
# prediction_phase functions, which this notebook does not define (the phases are
# implemented as ZEWDPCBaseRun methods further down), so they are not called here.
def run_pre_training_phase():
    from run import ZEWDPCBaseRun
    run = ZEWDPCBaseRun()
    run.pre_training_phase = pre_training_phase
    run.pre_training_phase(self=run, training_dataset=training_dataset)
    # NOTE: It is critical that checkpointing works in a self-contained way,
    # as the evaluators might choose to run the different phases separately.
    run.save_checkpoint("/tmp/pretraining_phase_checkpoint.pickle")
    del run

def run_purchase_phase():
    from run import ZEWDPCBaseRun
    run = ZEWDPCBaseRun()
    run.pre_training_phase = pre_training_phase
    run.purchase_phase = purchase_phase
    run.load_checkpoint("/tmp/pretraining_phase_checkpoint.pickle")
    # Hacky way to make it work in a notebook: reset previously purchased labels
    unlabelled_dataset.purchases = set()
    run.purchase_phase(self=run, unlabelled_dataset=unlabelled_dataset, training_dataset=training_dataset, budget=3000)
    run.save_checkpoint("/tmp/purchase_phase_checkpoint.pickle")
    del run

def run_prediction_phase():
    from run import ZEWDPCBaseRun
    run = ZEWDPCBaseRun()
    run.pre_training_phase = pre_training_phase
    run.purchase_phase = purchase_phase
    run.prediction_phase = prediction_phase
    run.load_checkpoint("/tmp/purchase_phase_checkpoint.pickle")
    run.prediction_phase(self=run, test_dataset=val_dataset)
    del run
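The shell one-liner in the Magic Box that unpacks the downloaded archives can also be written in Python, which is handy outside Colab. This is a sketch using only the standard library; extract_archives is a hypothetical helper, and the demo builds a toy archive rather than touching the real challenge data.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_archives(data_dir):
    """Extract every *.tar.gz found in data_dir into data_dir; return the archive names."""
    data_dir = Path(data_dir)
    extracted = []
    for archive in sorted(data_dir.glob("*.tar.gz")):
        with tarfile.open(str(archive), "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: refuse absolute paths, "..", device files, etc.
                tar.extractall(path=str(data_dir), filter="data")
            else:
                tar.extractall(path=str(data_dir))
        extracted.append(archive.name)
    return extracted


# Self-contained demo with a toy archive (not the real challenge data)
with tempfile.TemporaryDirectory() as tmp:
    payload = b"toy content\n"
    with tarfile.open(str(Path(tmp) / "toy.tar.gz"), "w:gz") as tar:
        info = tarfile.TarInfo(name="toy/example.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    names = extract_archives(tmp)
    restored = (Path(tmp) / "toy" / "example.txt").read_bytes()

print(names, restored)
```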
3) Writing your code implementation! ✍️
a) Runtime Packages
#@title a) Runtime Packages<br/><small>Important: Add the packages required by your code here. (space separated)</small> { run: "auto", display-mode: "form" }
apt_packages = "build-essential vim" #@param {type:"string"}
pip_packages = "scikit-image pandas timeout-decorator==0.5.0 numpy" #@param {type:"string"}
!apt install -y $apt_packages git-lfs
!pip install $pip_packages
b) Load Dataset
The directory structure at this point looks like this:
A quick preview of the images and labels.csv is as follows:
Let's initialise dataset instances.
from evaluator.dataset import ZEWDPCBaseDataset, ZEWDPCProtectedDataset
DATASET_SHUFFLE_SEED = 1022022
# Instantiate Training Dataset
training_dataset = ZEWDPCBaseDataset(
images_dir="./data/debug/images",
labels_path="./data/debug/labels.csv",
shuffle_seed=DATASET_SHUFFLE_SEED,
)
# Instantiate Unlabelled Dataset
unlabelled_dataset = ZEWDPCProtectedDataset(
images_dir="./data/debug/images",
labels_path="./data/debug/labels.csv",
budget=3000, # Configurable Parameter
shuffle_seed=DATASET_SHUFFLE_SEED,
)
# Instantiate Validation Dataset
val_dataset = ZEWDPCBaseDataset(
images_dir="./data/debug/images",
labels_path="./data/debug/labels.csv",
drop_labels=True,
shuffle_seed=DATASET_SHUFFLE_SEED,
)
val_dataset_gt = ZEWDPCBaseDataset(
images_dir="./data/debug/images",
labels_path="./data/debug/labels.csv",
drop_labels=False,
shuffle_seed=DATASET_SHUFFLE_SEED,
)
c) pre_training_phase
Pre-train your model on the available labelled dataset here.
This is the hook for the Pre-Training Phase of the competition, where you have access to training_dataset, an instance of the ZEWDPCBaseDataset class (see dataset.py for more details). You are allowed to pre-train on this data while you prepare for the purchase phase of the competition.
If you train some models, you can store them as self.model, as long as you implement self-contained checkpointing in the self.save_checkpoint and self.load_checkpoint hooks, because the hooks for the different phases of the competition can be called in separate executions of the BaseRun.
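"Self-contained" means a fresh process, with brand-new model and optimizer objects, must be able to resume from the file alone. A minimal sketch of that round trip, assuming standard torch.save/torch.load of state dicts (the same keys the training class below uses); save_checkpoint/load_checkpoint here are stand-alone helpers and nn.Linear stands in for ResNet101.

```python
import os
import tempfile

import torch
from torch import nn
from torch.optim import Adam


def save_checkpoint(model, optimizer, epoch, path):
    """Write everything a later, separate execution needs into one file."""
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optim_state_dict": optimizer.state_dict(),
    }, path)


def load_checkpoint(model, optimizer, path, map_location="cpu"):
    """Restore model/optimizer state in place and return the stored epoch."""
    checkpoint = torch.load(path, map_location=map_location)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optim_state_dict"])
    return checkpoint["epoch"]


torch.manual_seed(0)
model = nn.Linear(4, 4)  # stand-in for ResNet101
optimizer = Adam(model.parameters(), lr=1e-3)

# One training step so that Adam has per-parameter state worth saving
loss = model(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "pre_training_phase.pt")
save_checkpoint(model, optimizer, epoch=1, path=path)

# Simulate a separate execution: brand-new objects, nothing shared with the ones above
fresh_model = nn.Linear(4, 4)
fresh_optimizer = Adam(fresh_model.parameters(), lr=1e-3)
epoch = load_checkpoint(fresh_model, fresh_optimizer, path)

same_weights = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), fresh_model.state_dict().values())
)
print(epoch, same_weights)
```

Saving the optimizer state as well as the weights matters for Adam, whose running moment estimates would otherwise restart from zero.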
Base code
import torch
from torch import nn
from torchvision import models
from torch.optim import Adam, SGD, lr_scheduler
from torchvision import transforms
from torch.utils.data import DataLoader
import numpy as np
import abc
import datetime
from tqdm import tqdm
# Metrics are imported from evaluator.evaluation_metrics where they are used
from evaluator.dataset import ZEWDPCBaseDataset, ZEWDPCProtectedDataset
class ResNet101(nn.Module):
    def __init__(self, num_labels):
        super(ResNet101, self).__init__()
        # NOTE: pretrained=False, so every layer starts from random weights and the
        # frozen layers below keep their random initialisation during training.
        self.network = models.resnet101(pretrained=False, num_classes=num_labels)
        # Freeze everything, then unfreeze the last residual stage and the classifier head
        for param in self.network.parameters():
            param.requires_grad = False
        for param in self.network.layer4.parameters():
            param.requires_grad = True
        # Setting `fc.requires_grad` on the module itself does not touch its parameters,
        # so unfreeze the head's weight and bias explicitly
        for param in self.network.fc.parameters():
            param.requires_grad = True

    def forward(self, x):
        return self.network(x)


class AverageMeter(object):
    def __init__(self, num_classes):
        super(AverageMeter, self).__init__()
        self.num_classes = num_classes
        self.reset()  # create the counters so update()/compute() can be used right away

    def reset(self):
        self._right_pred_counter = np.zeros(self.num_classes)  # right predicted image per-class counter
        self._pred_counter = np.zeros(self.num_classes)  # predicted image per-class counter
        self._gt_counter = np.zeros(self.num_classes)  # ground-truth image per-class counter

    def update(self, confidence, gt_label):
        self._count(confidence, gt_label)

    def compute(self):
        # Overall metrics: counts pooled over all classes
        self._op = sum(self._right_pred_counter) / sum(self._pred_counter)
        self._or = sum(self._right_pred_counter) / sum(self._gt_counter)
        self._of1 = 2 * self._op * self._or / (self._op + self._or)
        # Per-class metrics: clamp counts to eps to avoid division by zero, then average
        self._right_pred_counter = np.maximum(self._right_pred_counter, np.finfo(np.float64).eps)
        self._pred_counter = np.maximum(self._pred_counter, np.finfo(np.float64).eps)
        self._gt_counter = np.maximum(self._gt_counter, np.finfo(np.float64).eps)
        self._cp = np.mean(self._right_pred_counter / self._pred_counter)
        self._cr = np.mean(self._right_pred_counter / self._gt_counter)
        self._cf1 = 2 * self._cp * self._cr / (self._cp + self._cr)

    @abc.abstractmethod
    def _count(self, confidence, gt_label):
        # Subclasses update the three counters from one batch
        pass

    @property
    def op(self):  # overall precision
        return self._op

    @property
    def or_(self):  # overall recall
        return self._or

    @property
    def of1(self):  # overall F1
        return self._of1

    @property
    def cp(self):  # per-class precision
        return self._cp

    @property
    def cr(self):  # per-class recall
        return self._cr

    @property
    def cf1(self):  # per-class F1
        return self._cf1
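AverageMeter leaves _count abstract. As a sketch of what a concrete count for thresholded (0/1) multi-label predictions could look like, the helpers below compute the same three per-class counters and then apply the formulas from compute(); multilabel_counts and precision_recall_f1 are hypothetical names, and the arrays are toy data.

```python
import numpy as np


def multilabel_counts(y_pred, y_true):
    """Per-class counts for 0/1 arrays of shape (n_samples, n_classes)."""
    y_pred = np.asarray(y_pred, dtype=bool)
    y_true = np.asarray(y_true, dtype=bool)
    right = (y_pred & y_true).sum(axis=0).astype(np.float64)  # correctly predicted positives
    pred = y_pred.sum(axis=0).astype(np.float64)               # predicted positives
    gt = y_true.sum(axis=0).astype(np.float64)                 # ground-truth positives
    return right, pred, gt


def precision_recall_f1(right, pred, gt):
    """Same formulas as AverageMeter.compute()."""
    # Overall metrics: pool the counts over all classes first
    op = right.sum() / pred.sum()
    or_ = right.sum() / gt.sum()
    of1 = 2 * op * or_ / (op + or_)
    # Per-class metrics: clamp counts to eps (as compute() does), then average over classes
    eps = np.finfo(np.float64).eps
    right, pred, gt = (np.maximum(a, eps) for a in (right, pred, gt))
    cp = np.mean(right / pred)
    cr = np.mean(right / gt)
    cf1 = 2 * cp * cr / (cp + cr)
    return {"op": op, "or": or_, "of1": of1, "cp": cp, "cr": cr, "cf1": cf1}


y_true = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 1], [1, 0, 0, 1]]
right, pred, gt = multilabel_counts(y_pred, y_true)
metrics = precision_recall_f1(right, pred, gt)
print(right, pred, gt)  # [2. 1. 1. 1.] [2. 1. 1. 2.] [2. 2. 2. 1.]
print(metrics)
```

The overall metrics weight every label equally (frequent classes dominate), while the per-class metrics weight every class equally, so the two can diverge on imbalanced data.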
Training class
# Install wandb
!pip install wandb
import wandb
class ZEWDPCBaseRun:
    def __init__(self):
        self.evaluation_state = {}
        # Model parameters
        self.BATCH_SIZE = 32
        self.NUM_WORKERS = 2
        self.LEARNING_RATE = 0.001
        self.NUM_CLASSES = 4
        self.TOPK = 3
        self.THRESHOLD = 0.5
        self.NUM_EPOCHS = 50
        self.EVAL_FREQ = 5
        self.model = ResNet101(num_labels=self.NUM_CLASSES)
        ## CHANGE CPU CUDA HERE
        # self.model.cuda()
        self.model.cpu()
        self.trainable_parameters = filter(lambda param: param.requires_grad, self.model.parameters())
        self.optimizer = Adam(self.trainable_parameters, lr=self.LEARNING_RATE)
        self.epoch = 0
        # NOTE: this scheduler is created but never stepped in this notebook
        # (it would need self.lr_scheduler_.step(metric) after each evaluation)
        self.lr_scheduler_ = lr_scheduler.ReduceLROnPlateau(
            self.optimizer, mode='max', patience=2, verbose=True
        )
        self.criterion = nn.BCEWithLogitsLoss()

        # WandB setup
        model_name = "ResNet101"
        wandb_config = {
            "model": model_name,
            "learning_rate": self.LEARNING_RATE,
            "samples_per_gpu": self.BATCH_SIZE,
            "workers_per_gpu": self.NUM_WORKERS,
            "augmentations": "NO",
            "description": "Pre-Training. No augmentation"
        }
        # Replace project and entity with your own wandb project and username
        wandb.init(project="temp", entity="krazy", name=model_name, config=wandb_config)

    def pre_training_phase(
        self, training_dataset: ZEWDPCBaseDataset, register_progress=lambda x: False
    ):
        print("\n================> Pre-Training Phase\n")

        # Creating transformations
        train_transform = transforms.Compose([
            transforms.ToTensor(),
        ])
        training_dataset.set_transform(train_transform)
        train_loader = DataLoader(
            dataset=training_dataset,
            batch_size=self.BATCH_SIZE,
            shuffle=False,
            num_workers=self.NUM_WORKERS,
        )

        def run_epoch():
            # prediction_phase() switches the model to eval mode, so switch back before training
            self.model.train()
            for _, batch in enumerate(train_loader):
                ## CHANGE CPU CUDA HERE
                # x, y = batch["image"].cuda(), batch["label"]
                x, y = batch["image"].cpu(), batch["label"]
                pred_y = self.model(x)

                # Rebuild the true labels as (batch, NUM_CLASSES); use the actual batch size
                # because the last batch can contain fewer images
                y = torch.cat(y, dim=0).reshape(
                    self.NUM_CLASSES, pred_y.shape[0]
                ).T.type(torch.FloatTensor)
                ## CHANGE CPU CUDA HERE. Uncomment for CUDA
                # y = y.cuda()

                loss = self.criterion(pred_y, y)
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()

                # global_step counts images; 416 = BATCH_SIZE*13, i.e. log every 13 batches
                if self.global_step % 416 == 0:
                    wandb.log({
                        # images seen so far across all epochs
                        "Step": self.epoch * len(training_dataset) + self.global_step,
                        "Loss": loss.item(),
                    })
                    print("[{}] Training [epoch {}, step {}], loss: {:.4f}".format(
                        datetime.datetime.now(), self.epoch, self.global_step, loss.item()))
                self.global_step += self.BATCH_SIZE

        epoch_range = tqdm(range(self.epoch, self.NUM_EPOCHS))
        for i in epoch_range:
            epoch_range.set_description(f"Epoch: {i}")
            self.global_step = 0
            run_epoch()
            register_progress(i)  # Epoch as progress
            # Evaluate every EVAL_FREQ epochs on the notebook-level val_dataset
            if (i + 1) % self.EVAL_FREQ == 0:
                predictions = self.prediction_phase(val_dataset)
                self.evaluation(predictions)
            self.epoch += 1

        print("Execution Complete of Training Phase.")

    def purchase_phase(
        self,
        unlabelled_dataset: ZEWDPCProtectedDataset,
        training_dataset: ZEWDPCBaseDataset,
        budget=1000,
        register_progress=lambda x: False,
    ):
        """
        # Purchase Phase
        -------------------------
        In this phase of the competition, you have access to
        the unlabelled_dataset (an instance of `ZEWDPCProtectedDataset`)
        and the training_dataset (an instance of `ZEWDPCBaseDataset`)
        {see dataset.py for more details}, and a purchase budget.

        You can iterate over both the datasets and access the images without restrictions.
        However, you can probe the labels of the unlabelled_dataset only until you
        run out of the label purchasing budget.

        PARTICIPANT_TODO: Add your code here
        """
        print("\n================> Purchase Phase | Budget = {}\n".format(budget))

        register_progress(0.0)  # Register Progress
        for sample in tqdm(unlabelled_dataset):
            idx = sample["idx"]
            # image = unlabelled_dataset.__getitem__(idx)
            # print(image)

            # Budgeting & Purchasing Labels
            # (the baseline buys the first `budget` labels but does not train on them yet)
            if budget > 0:
                label = unlabelled_dataset.purchase_label(idx)
                budget -= 1
        register_progress(1.0)  # Register Progress
        print("Execution Complete of Purchase Phase.")

    def prediction_phase(
        self,
        test_dataset: ZEWDPCBaseDataset,
        register_progress=lambda x: False,
    ):
        """
        # Prediction Phase
        -------------------------
        In this phase of the competition, you have access to the test dataset, and you
        are supposed to make predictions using your trained models.

        Returns:
            np.ndarray of shape (n, 4)
            where n is the number of samples in the test set
            and 4 refers to the 4 labels to be predicted for each sample
            for the multi-label classification problem.

        PARTICIPANT_TODO: Add your code here
        """
        print(
            "\n================> Prediction Phase : - on {} images\n".format(
                len(test_dataset)
            )
        )
        test_transform = transforms.Compose([
            transforms.ToTensor(),
        ])
        test_dataset.set_transform(test_transform)
        test_loader = DataLoader(
            dataset=test_dataset,
            batch_size=self.BATCH_SIZE,
            shuffle=False,
            num_workers=self.NUM_WORKERS,
        )

        def convert_to_label(preds):
            # Sigmoid + threshold turns one sample's logits into 0/1 labels
            return np.array((torch.sigmoid(preds) > self.THRESHOLD), dtype=int).tolist()

        predictions = []
        self.model.eval()
        with torch.no_grad():
            for _, batch in enumerate(test_loader):
                ## CHANGE CPU CUDA HERE
                X = batch['image'].cpu()
                # X = batch['image'].cuda()
                pred_y = self.model(X)
                # Convert to labels
                pred_y_labels = []
                for arr in pred_y:
                    ## CHANGE CPU CUDA HERE
                    # pred_y_labels.append(convert_to_label(arr.cpu()))  # For CUDA
                    pred_y_labels.append(convert_to_label(arr))  # For CPU
                # Save the results
                predictions.extend(pred_y_labels)
        register_progress(1.0)

        predictions = np.array(predictions)  # shape (n, NUM_CLASSES), values 0/1
        print("Execution Complete of Prediction Phase.")
        return predictions

    def evaluation(self, predictions):
        from evaluator.evaluation_metrics import accuracy_score, hamming_loss, exact_match_ratio

        # Ground truth comes from the notebook-level val_dataset_gt
        y_true = val_dataset_gt._get_all_labels()
        y_pred = predictions
        accuracy = accuracy_score(y_true, y_pred)
        hamming_loss_score = hamming_loss(y_true, y_pred)
        exact_match_ratio_score = exact_match_ratio(y_true, y_pred)
        wandb.log({
            "Epoch": self.epoch + 1,
            "Accuracy": accuracy,
            "Hamming Loss": hamming_loss_score,
            "Match ratio": exact_match_ratio_score
        })
        print("Accuracy Score : ", accuracy)
        print("Hamming Loss : ", hamming_loss_score)
        print("Exact Match Ratio : ", exact_match_ratio_score)

    def save_checkpoint(self, checkpoint_path):
        """
        Saves the model and optimizer state, plus the number of completed epochs, to checkpoint_path.
        """
        save_dict = {
            'epoch': self.epoch,
            'model_state_dict': self.model.state_dict(),
            'optim_state_dict': self.optimizer.state_dict(),
        }
        torch.save(save_dict, checkpoint_path)
        print(f"Checkpoint (epoch {self.epoch}) saved at {checkpoint_path}")

    def load_checkpoint(self, checkpoint_path):
        """
        Loads a checkpoint written by save_checkpoint.
        """
        ## CHANGE CPU CUDA HERE
        # checkpoint_model = torch.load(checkpoint_path, map_location="cuda:0")
        checkpoint_model = torch.load(checkpoint_path, map_location="cpu")
        self.epoch = self.latest_epoch = checkpoint_model['epoch']
        self.model.load_state_dict(checkpoint_model['model_state_dict'])
        self.optimizer.load_state_dict(checkpoint_model['optim_state_dict'])
        print('loading checkpoint success (epoch {})'.format(self.latest_epoch))
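Two tensor manipulations in the class above are easy to misread: rebuilding the labels in run_epoch() and thresholding in prediction_phase(). The sketch below illustrates both with NumPy, assuming (as the torch.cat/reshape in run_epoch() implies) that batch["label"] arrives as a list of NUM_CLASSES arrays, one per class, each of length batch size; the label and logit values are toy data. On tensors, torch.stack(y, dim=1) gives the same (batch, NUM_CLASSES) layout, and (torch.sigmoid(pred_y) > self.THRESHOLD).int().cpu().numpy() thresholds a whole batch at once.

```python
import numpy as np

NUM_CLASSES, BATCH = 4, 3

# Assumed collated labels: one array per class holding that class's 0/1 value
# for every image in the batch (toy values)
y = [np.array([1, 0, 1]), np.array([0, 0, 1]), np.array([1, 1, 0]), np.array([0, 1, 0])]

# What run_epoch() does: concatenate, view as (NUM_CLASSES, BATCH), transpose
via_reshape = np.concatenate(y).reshape(NUM_CLASSES, BATCH).T
# Equivalent and shorter: stack along a new axis 1 -> (BATCH, NUM_CLASSES)
via_stack = np.stack(y, axis=1)


def logits_to_labels(logits, threshold=0.5):
    """Vectorised version of convert_to_label(): sigmoid, then threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    return (probs > threshold).astype(int)


labels = logits_to_labels([[2.0, -1.0, 0.1, -3.0]])
print(via_stack.shape, labels.tolist())  # (3, 4) [[1, 0, 1, 0]]
```

Thresholding the whole batch at once avoids the per-sample Python loop and the separate CPU/CUDA lines in prediction_phase().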
import tempfile
checkpoint_path = tempfile.NamedTemporaryFile(delete=False).name
# checkpoint_path = "/content/drive/MyDrive/data-purchasing-challenge-2022-starter-kit/experiments/baseline/debug.pt"
run = ZEWDPCBaseRun()
## Pre - Training process
run.pre_training_phase(training_dataset)
run.save_checkpoint(checkpoint_path)
del run
## Purchasing phase
run = ZEWDPCBaseRun()
run.load_checkpoint(checkpoint_path)
run.purchase_phase(unlabelled_dataset, training_dataset, budget=3000)
run.save_checkpoint(checkpoint_path)
del run
## Prediction phase
run = ZEWDPCBaseRun()
run.load_checkpoint(checkpoint_path)
predictions = run.prediction_phase(val_dataset)
assert isinstance(predictions, np.ndarray)
assert predictions.shape == (len(val_dataset), 4)
## Evaluation Phase
from evaluator.evaluation_metrics import accuracy_score, hamming_loss, exact_match_ratio
y_true = val_dataset_gt._get_all_labels()
y_pred = predictions
# Store results under new names so the imported metric functions are not shadowed
accuracy = accuracy_score(y_true, y_pred)
hamming_loss_score = hamming_loss(y_true, y_pred)
exact_match_ratio_score = exact_match_ratio(y_true, y_pred)
print("Accuracy Score : ", accuracy)
print("Hamming Loss : ", hamming_loss_score)
print("Exact Match Ratio : ", exact_match_ratio_score)
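To sanity-check the numbers printed above, here is a sketch of the standard definitions of two of the metrics, written with NumPy. The implementation in evaluator.evaluation_metrics is not shown in this notebook and may differ in detail; hamming_loss_np and exact_match_ratio_np are hypothetical names, and accuracy is left out because multi-label "accuracy" has several competing definitions.

```python
import numpy as np


def hamming_loss_np(y_true, y_pred):
    """Fraction of individual (sample, label) slots that are predicted wrongly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))


def exact_match_ratio_np(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))


y_true = [[1, 0, 1, 0], [0, 1, 1, 0]]
y_pred = [[1, 0, 1, 0], [0, 1, 0, 1]]
print(hamming_loss_np(y_true, y_pred))       # 0.25 (2 wrong slots out of 8)
print(exact_match_ratio_np(y_true, y_pred))  # 0.5 (1 of 2 rows fully correct)
```

For binary indicator arrays, sklearn.metrics.hamming_loss computes the same quantity as hamming_loss_np, and sklearn.metrics.accuracy_score returns the exact match ratio (subset accuracy).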