MABe 2022: Ant-Beetles - Video Data

Getting Started - Ant-Beetles Video Data

Initial data exploration and a basic embedding using a vision model




How to use this notebook 📝

  1. Copy the notebook. This is a shared template and any edits you make here will not be saved. You should copy it into your own drive folder. For this, click the "File" menu (top-left), then "Save a Copy in Drive". You can edit your copy however you like.
  2. Link it to your AIcrowd account. In order to submit your predictions to AIcrowd, you need to provide your account's API key.

Problem Statement

Join the AIcrowd Discord Server!

Setup AIcrowd Utilities 🛠

In [1]:
!pip install -U aicrowd-cli
%load_ext aicrowd.magic
Collecting aicrowd-cli
  Downloading aicrowd_cli-0.1.15-py3-none-any.whl (51 kB)
     |████████████████████████████████| 51 kB 3.5 MB/s 
Collecting pyzmq==22.1.0
  Downloading pyzmq-22.1.0-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 9.1 MB/s 
Collecting requests<3,>=2.25.1
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 1.6 MB/s 
Collecting toml<1,>=0.10.2
  Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting GitPython==3.1.18
  Downloading GitPython-3.1.18-py3-none-any.whl (170 kB)
     |████████████████████████████████| 170 kB 74.4 MB/s 
Requirement already satisfied: click<8,>=7.1.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (7.1.2)
Collecting requests-toolbelt<1,>=0.9.1
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
     |████████████████████████████████| 54 kB 3.4 MB/s 
Collecting rich<11,>=10.0.0
  Downloading rich-10.16.2-py3-none-any.whl (214 kB)
     |████████████████████████████████| 214 kB 66.8 MB/s 
Requirement already satisfied: tqdm<5,>=4.56.0 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (4.63.0)
Requirement already satisfied: semver<3,>=2.13.0 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (2.13.0)
Collecting python-slugify<6,>=5.0.0
  Downloading python_slugify-5.0.2-py2.py3-none-any.whl (6.7 kB)
Requirement already satisfied: typing-extensions>= in /usr/local/lib/python3.7/dist-packages (from GitPython==3.1.18->aicrowd-cli) (
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.9-py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 2.3 MB/s 
Collecting smmap<6,>=3.0.1
  Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify<6,>=5.0.0->aicrowd-cli) (1.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.24.3)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /usr/local/lib/python3.7/dist-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.6.1)
Collecting colorama<0.5.0,>=0.4.0
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting commonmark<0.10.0,>=0.9.0
  Downloading commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
     |████████████████████████████████| 51 kB 8.7 MB/s 
Installing collected packages: smmap, requests, gitdb, commonmark, colorama, toml, rich, requests-toolbelt, pyzmq, python-slugify, GitPython, aicrowd-cli
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Attempting uninstall: pyzmq
    Found existing installation: pyzmq 22.3.0
    Uninstalling pyzmq-22.3.0:
      Successfully uninstalled pyzmq-22.3.0
  Attempting uninstall: python-slugify
    Found existing installation: python-slugify 6.1.1
    Uninstalling python-slugify-6.1.1:
      Successfully uninstalled python-slugify-6.1.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.27.1 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
Successfully installed GitPython-3.1.18 aicrowd-cli-0.1.15 colorama-0.4.4 commonmark-0.9.1 gitdb-4.0.9 python-slugify-5.0.2 pyzmq-22.1.0 requests-2.27.1 requests-toolbelt-0.9.1 rich-10.16.2 smmap-5.0.0 toml-0.10.2

Login to AIcrowd ㊗

In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/pJPFul2JIDc6NKXbjwXG8D6XZ62Uus8e4qu5_lQVMuw
API Key valid
Gitlab access token valid
Saved details successfully!

Install packages 🗃

Please add all package installations in this section.

In [3]:
!pip install torch torchvision tqdm
Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (1.10.0+cu111)
Requirement already satisfied: torchvision in /usr/local/lib/python3.7/dist-packages (0.11.1+cu111)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (4.63.0)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch) (
Requirement already satisfied: pillow!=8.3.0,>=5.3.0 in /usr/local/lib/python3.7/dist-packages (from torchvision) (7.1.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torchvision) (1.21.5)

Import necessary modules and packages 📚

In [4]:
import os
import cv2
import numpy as np
from tqdm.auto import tqdm

import torch
import torchvision
import torchvision.transforms as T

import copy
import matplotlib.pyplot as plt
from matplotlib import animation
from matplotlib import colors
from matplotlib import rc
from matplotlib import rcParams

Download and prepare the dataset 🔍

In [5]:
aicrowd_challenge_name = "mabe-2022-ant-beetles-video-data"
if not os.path.exists('data'):
    os.mkdir('data')  # create the download directory on the first run

datafolder = 'data/'

## If data is already downloaded and stored on google drive, skip the download and point to the prepared directory
# datafolder = '/content/drive/MyDrive/mabe-2022-ant-beetle/data/'

video_folder = f'{datafolder}video_clips/'
In [6]:
## The download might take a while, recommend to move to Google Drive if you want to run multiple times.
%aicrowd ds dl -c {aicrowd_challenge_name} -o data *.npy* # Download all .npy files
# We'll download the 224x224 videos since they load faster in the dataloader, but you can use the full-sized videos if you want
%aicrowd ds dl -c {aicrowd_challenge_name} -o data *resized_224* # Download the 224x224 resized videos
# %aicrowd ds dl -c {aicrowd_challenge_name} -o data *videos.zip* # Download the 512x512 videos
In [ ]:
!unzip -q data/submission_videos_resized_224.zip  -d {video_folder}
!unzip -q data/userTrain_videos_resized_224.zip -d {video_folder}

## Careful when running the below commands - For copying to Google Drive
# !rm data/submission_videos.zip data/userTrain_videos.zip 
# !cp -r data/ '/content/drive/MyDrive/mabe-2022-ant-beetle/data/'

Data Description 📚

The following files are available in the Resources section on the Challenge Page. A "sequence" is a continuous recording of social interactions between animals: sequences are 30 seconds long (900 frames at 30Hz) in the beetle dataset. The sequence_id is a random hash to anonymize experiment details. nans indicate missing data. These occur because not all videos are labelled for all tasks. Data are padded with nans to be all the same size.

  • user_train.npy - Set of videos where three public tasks are provided, for your local validation, with the following schema:

    {
        "vocabulary" : A list of public task names
        "sequences" : {
            "<sequence_id>" : {
                "annotations" : a ndarray of shape (3, 900) - Per frame labels for each of the public tasks
                "keypoints" : a ndarray of shape (900, 4) - Single point tracking on each of the ant-beetles
            }
        }
    }

  • submission_keypoints.npy - Keypoints for the submission clips, with the following schema:

    {
        "sequences" : {
            "<sequence_id>" : {
                "keypoints" : a ndarray of shape (900, 4) - Single point tracking on each of the ant-beetles
            }
        }
    }

  • frame_number_map.npy - A map of frame numbers for each clip to be used for the submission embeddings array

  • sample_submission.npy - Template for a sample submission for this task, with the following schema:

    {
        "<sequence_id-1>": (start_frame_index, end_frame_index),
        "<sequence_id-2>": (start_frame_index, end_frame_index),
        ...
        "<sequence_id-n>": (start_frame_index, end_frame_index),
    }
    "<sequence_id-1>" : [
        [0.321, 0.234, 0.186, 0.857, 0.482, 0.185, .....],
        [0.184, 0.583, 0.475, 0.485, 0.275, 0.958, .....],
    ]

  • userTrain_videos.zip - Videos for the userTrain sequences, all 512x512 Grayscale 30 fps - 900 frames each
  • submission_videos.zip - Videos for the Submission sequences, all 512x512 Grayscale 30 fps - 900 frames each

In sample_submission, each key in the frame_number_map dictionary is the unique sequence id of a video in the test set. The value for each key is the start and end index used to slice the embeddings numpy array and get the embeddings for that sequence. The embeddings array is a 2D ndarray of floats of size total_frames by X, where X is the dimension of your learned embedding (6 in the above example; the maximum permitted embedding dimension is 128), and each row is the embedded value of one frame. total_frames is the sum of the frame counts of all sequences, so the array should be the concatenation of the embeddings of all the clips.
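
As a rough illustration of the indexing described above, the sketch below slices a toy embeddings array with a toy frame number map. The sequence ids, clip lengths, and embedding dimension are made-up values, and the end index is assumed to be exclusive (consistent with the validation helper later in this notebook, which compares the last end index with the array length).

```python
import numpy as np

# Hypothetical toy data: two clips of 4 and 3 frames, embedding dimension 6.
toy_frame_number_map = {"seq_a": (0, 4), "seq_b": (4, 7)}
toy_embeddings = np.arange(7 * 6, dtype=np.float32).reshape(7, 6)

def embeddings_for(sequence_id, embeddings, frame_map):
    """Return the rows of `embeddings` that belong to one sequence."""
    start, end = frame_map[sequence_id]
    return embeddings[start:end]  # end index assumed to be exclusive

clip_b = embeddings_for("seq_b", toy_embeddings, toy_frame_number_map)
print(clip_b.shape)  # (3, 6)
```

The same lookup should work on the real `frame_number_map` and a full submission array, as long as the clips were concatenated in the order of the map.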

In [8]:
# Load data
userTrain_data = np.load(datafolder + 'user_train.npy', allow_pickle=True).item()
submission_keypoints = np.load(datafolder + 'submission_keypoints.npy', allow_pickle=True).item()
sample_submission = np.load(datafolder + 'sample_submission.npy')
frame_number_map = np.load(datafolder + 'frame_number_map.npy', allow_pickle=True).item()
In [9]:
# Check some basic info
print("UserTrain Vocabulary (Public Tasks)", userTrain_data['vocabulary'])
print("Number of UserTrain Sequences", len(userTrain_data['sequences']))
print("Number of Submission Sequences", len(submission_keypoints['sequences']))
UserTrain Vocabulary (Public Tasks) ['reapplied', 'grooming_self', 'exploring_object']
Number of UserTrain Sequences 1948
Number of Submission Sequences 9491
In [10]:
sk = list(userTrain_data['sequences'].keys())[0]
single_sequence = userTrain_data['sequences'][sk]
print("Sequence name", sk, " - Sequence keys", single_sequence.keys())
print("Annotations shape", single_sequence['annotations'].shape)
print("Keypoints shape", single_sequence['keypoints'].shape)
Sequence name 001a9aa693e2cfe120ed  - Sequence keys dict_keys(['keypoints', 'annotations'])
Annotations shape (3, 900)
Keypoints shape (900, 4)

Visualize the sequences 🤓

In [11]:
class_to_number = {s: i for i, s in enumerate(userTrain_data['vocabulary'])}
number_to_class = {i: s for i, s in enumerate(userTrain_data['vocabulary'])}
In [12]:
rcParams['animation.embed_limit'] = 2**128
rc('animation', html='jshtml')
# Note: Image processing may be slow if too many frames are animated.

# Plotting constants
# Frame size is assumed to match the 224x224 resized videos downloaded above
FRAME_WIDTH_TOP = 224
FRAME_HEIGHT_TOP = 224
POINT_COLORS = ['lawngreen', 'tomato']

def set_figax():
    fig = plt.figure(figsize=(8, 8))
    img = np.zeros((FRAME_HEIGHT_TOP, FRAME_WIDTH_TOP, 3))
    ax = fig.add_subplot(111)
    imh = ax.imshow(img)
    return fig, ax, imh

def plot_points(ax, keypoints, colors):
    # Draw the keypoints (normalized coordinates scaled to the 224-pixel frame)
    keypoints = [int(k * 224) for k in keypoints]
    ax.plot(keypoints[1], keypoints[0], 'o', color=colors[0], markersize=5)
    ax.plot(keypoints[3], keypoints[2], 'o', color=colors[1], markersize=5)

def animate_pose_sequence(video_name, seq,
                          start_frame = 0, stop_frame = 100, skip = 0,
                          load_video = True, video_directory = ''):
    # Returns the animation of the keypoint sequence between start frame
    # and stop frame.
    image_list = []
    cap = None
    if load_video:
        curr_vid = os.path.join(video_directory, video_name + '.avi')
        if not os.path.exists(curr_vid):
            print("I couldn't find a video for sequence " + video_name + ' in ' + video_directory)
        else:
            cap = cv2.VideoCapture(curr_vid)
            cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)
    counter = 0
    if skip:
        anim_range = range(start_frame, stop_frame, skip)
    else:
        anim_range = range(start_frame, stop_frame)
    for j in anim_range:
        if counter%100 == 0:
            print("Processing frame ", j)

        fig, ax, imh = set_figax()
        plot_points(ax, seq[j, :], colors=POINT_COLORS)

        if cap is not None:
            cap.set(cv2.CAP_PROP_POS_FRAMES, j)
            ret,frame = cap.read()
            if ret:
                imh.set_data(frame)  # show the video frame under the keypoints

        ax.set_title(
            video_name + '\n frame {:03d}.png'.format(j))

        # Render the figure and grab it as an RGB image
        fig.canvas.draw()
        image_from_plot = np.frombuffer(fig.canvas.tostring_rgb(),
                                        dtype=np.uint8)
        image_from_plot = image_from_plot.reshape(
            fig.canvas.get_width_height()[::-1] + (3,))
        image_list.append(image_from_plot)

        plt.close(fig)
        counter = counter + 1
    # Plot animation.
    fig = plt.figure(figsize=(8,8))
    im = plt.imshow(image_list[0])
    def animate(k):
        im.set_array(image_list[k])
        return im,
    ani = animation.FuncAnimation(fig, animate, frames=len(image_list), blit=True)
    return ani
In [13]:
sequence_names = list(userTrain_data['sequences'].keys())
sequence_key = sequence_names[-6]
single_sequence = userTrain_data["sequences"][sequence_key]

keypoint_sequence = single_sequence['keypoints']

masked_data = np.ma.masked_where(keypoint_sequence==0, keypoint_sequence)

ani = animate_pose_sequence(sequence_key,
                            masked_data,
                            start_frame = 300,
                            stop_frame = 500,
                            skip = 0,
                            load_video = True,
                            video_directory = video_folder)

# Display the animation on colab
ani

EDA 🕵️

In [14]:
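# Active frames for each task, printed as a percentage of the inactive frames.
# Note: this is l1/l0, not the share of all labelled frames; use l1/(l0+l1) for that.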
for task_idx, task_name in enumerate(userTrain_data['vocabulary']):
    l0, l1 = 0, 0 # We count both because NaNs can exist
    for sk in userTrain_data['sequences'].keys():
        l0 += np.sum(userTrain_data['sequences'][sk]['annotations'][task_idx] == 0)
        l1 += np.sum(userTrain_data['sequences'][sk]['annotations'][task_idx] == 1)
    print(f"Task {task_name} - Percentage Frames Active {l1/l0*100:0.3f}")
Task reapplied - Percentage Frames Active 11.825
Task grooming_self - Percentage Frames Active 72.062
Task exploring_object - Percentage Frames Active 11.750
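
The cell above prints l1/l0, the ratio of active to inactive frames. If the share of labelled frames that are active is wanted instead, a NaN-aware version might look like the sketch below. The helper name `active_fraction` and the toy array are illustrative and not part of the challenge code.

```python
import numpy as np

def active_fraction(annotations):
    """Share of labelled (non-NaN) frames whose label is 1."""
    labelled = ~np.isnan(annotations)
    if labelled.sum() == 0:
        return float("nan")  # nothing labelled for this task
    return float((annotations[labelled] == 1).sum() / labelled.sum())

# Toy annotation row: 3 active, 3 inactive, 2 unlabelled frames.
toy = np.array([0, 1, 1, 0, np.nan, np.nan, 1, 0], dtype=float)
print(active_fraction(toy))  # 0.5
```

On this toy row the ratio l1/l0 would be 1.0 (that is, 100%), while the active share is 0.5, which is why the two quantities should not be read interchangeably.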
In [15]:
# Check the number of bouts of each task occurring
def check_bouts(anno):
    anno_padded = np.pad(anno.copy(), 1)
    anno_padded[np.isnan(anno_padded)] = 0
    if np.sum(anno_padded) == 0:
        return None
    locs = np.where(np.diff(anno_padded))
    return locs[0].reshape(-1,2)

def get_bout_infos(dataset):
    num_tasks = len(dataset['vocabulary'])  # use the dataset passed in, not the global
    bout_infos = [np.empty((0,2)) for _ in range(num_tasks)]
    for sk in dataset['sequences']:    
        anno = dataset['sequences'][sk]['annotations']
        for idx in range(num_tasks):
            bout_limits = check_bouts(anno[idx])
            if bout_limits is not None:
                bout_infos[idx] = np.concatenate([bout_infos[idx], bout_limits], axis=0)
    return bout_infos

bout_infos = get_bout_infos(userTrain_data)
for task_idx, task_name in enumerate(userTrain_data['vocabulary']):
    b_info = bout_infos[task_idx]

    blens = b_info[:, 1] - b_info[:, 0]

    print(f"Number of bouts : {len(b_info)}")
    print(f"Average length : {np.mean(blens)}")
    print(f"Std lengths : {np.std(blens)}")
Number of bouts : 206
Average length : 900.0
Std lengths : 0.0

Number of bouts : 59
Average length : 651.6440677966102
Std lengths : 306.07497490226007

Number of bouts : 41
Average length : 235.41463414634146
Std lengths : 257.34658945265807
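
To make the bout detection easier to follow, here is the same `check_bouts` logic run on a small hand-made annotation row. The toy array is an assumption for illustration. Each returned row is a (start, end) pair where the end index is exclusive, and NaN frames are treated as inactive.

```python
import numpy as np

def check_bouts(anno):
    # Same logic as the notebook cell above
    anno_padded = np.pad(anno.copy(), 1)            # zero on both ends so edge bouts are closed
    anno_padded[np.isnan(anno_padded)] = 0          # NaN counts as inactive
    if np.sum(anno_padded) == 0:
        return None
    locs = np.where(np.diff(anno_padded))           # indices where the label changes
    return locs[0].reshape(-1, 2)                   # pairs of (start, end)

toy_anno = np.array([0, 1, 1, 0, np.nan, 1], dtype=float)
bouts = check_bouts(toy_anno)
print(bouts.tolist())                        # [[1, 3], [5, 6]]
print((bouts[:, 1] - bouts[:, 0]).tolist())  # [2, 1]
```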

Generate an embedding ✨

We'll generate a basic embedding using a pre-trained vision model.

In [16]:
num_frames_per_clip = 900
image_size = (224, 224)
batch_size = 9 # Reduce this if encountering OOM errors
frame_skip = 17  # For every 1 frame, skip 17 frames after that
# NOTE - We skip frames because the output generation on all frames takes a lot of time, 
# primarily because reading videos is slow. Resizing frames also takes lot of time.

class MabeVideoDataset(torch.utils.data.Dataset):
    """
    Reads all frames from video files with frame skip
    """
    def __init__(self,
                 videofolder,
                 frame_number_map,
                 frame_skip):
        """
        Initializing the dataset with images and labels
        """
        self.videofolder = videofolder
        self.frame_number_map = frame_number_map
        self.video_keys = list(frame_number_map.keys())
        self.frame_skip = frame_skip # For every frame read, skip <frame_skip> frames after that
        assert num_frames_per_clip % (frame_skip + 1) == 0, "frame_skip+1 should exactly divide num_frames_per_clip"
        self.num_frames = num_frames_per_clip // (self.frame_skip + 1)
        self.transform = T.Compose([
            T.ToTensor(), # Convert the HxWxC uint8 frame to a CxHxW float tensor in [0, 1]
            # T.Resize(image_size), # Use this if using full sized videos
            T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])

    def __len__(self):
        return len(self.frame_number_map)
    def __getitem__(self, idx):
        video_name = self.video_keys[idx]

        video_path = os.path.join(self.videofolder, video_name + '.avi')
        if not os.path.exists(video_path):
            # raise FileNotFoundError(video_path)
            print("File not found", video_path)
            return torch.zeros((self.num_frames, 3, *image_size), dtype=torch.float32)
        cap = cv2.VideoCapture(video_path)
        frame_array = torch.zeros((self.num_frames, 3, *image_size), dtype=torch.float32)

        for array_idx, frame_idx in enumerate(range(0, num_frames_per_clip, self.frame_skip+1)):
            cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
            success, frame = cap.read()
            if success:
                frame_tensor = self.transform(frame)
                frame_array[array_idx] = frame_tensor
        return frame_array
In [17]:
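dataset = MabeVideoDataset(videofolder=video_folder,
                           frame_number_map=frame_number_map,
                           frame_skip=frame_skip)

# shuffle=False keeps the clips in frame_number_map order
dataloader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, shuffle=False)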
In [18]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
def get_model():
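    resnet_encoder = torchvision.models.resnet18(pretrained=True)  # ImageNet pre-trained weights
    model = torch.nn.Sequential(*list(resnet_encoder.children())[:-1])  # drop the final fc layer
    model = model.to(device)
    model.eval()  # inference mode
    return model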

model = get_model()
In [19]:
embedding_size = 128 # We'll clip the top embeddings from the output of the CNN
submission = np.empty((sample_submission.shape[0], embedding_size), dtype=np.float32)
idx = 0
for data in tqdm(dataloader, total=len(dataloader)):
    with torch.no_grad():
        dshape = data.shape
        images = data.reshape((-1, *dshape[2:])).to(device) # Merge the first 2 dimensions (clips, frames) to make 4D
        output = model(images)
        output = output[:, :embedding_size, 0, 0]
        output = output.reshape((dshape[0], dshape[1], -1)) # Restore the (clips, frames) dimensions, giving a 3D array
        output = output.cpu().numpy()
        output = np.repeat(output, frame_skip+1, axis=1) # Repeat the output for next skipped frames
        # At this point the output should be the embeddings for batch_size number of clips
        # Shape of output - (batch_size, num_frames_per_clip, embedding_size)
        output = np.reshape(output, (-1, embedding_size))
        submission[idx:idx+output.shape[0]] = output
        idx += output.shape[0]
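
The frame-skip trick used in the loop above can be hard to picture, so here is a small sketch with toy numbers (6 frames per clip, skip of 2, embedding size 2; all assumed for illustration). Only every third frame is embedded, and `np.repeat` then copies each embedding onto the skipped frames so that every original frame still gets a row.

```python
import numpy as np

toy_frames_per_clip = 6
toy_frame_skip = 2  # read 1 frame, then skip the next 2
sampled_idx = list(range(0, toy_frames_per_clip, toy_frame_skip + 1))  # frames 0 and 3

# Pretend embeddings for the sampled frames: shape (clips=1, sampled=2, dim=2)
sampled = np.array([[[1.0, 1.5], [2.0, 2.5]]], dtype=np.float32)

# Repeat each sampled embedding so that every original frame gets a row
full = np.repeat(sampled, toy_frame_skip + 1, axis=1)
print(full.shape)              # (1, 6, 2)
print(full[0, :, 0].tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

A smaller skip gives finer temporal resolution at the cost of more video reads, so this is a speed versus detail trade-off.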

Submission 🚀

Validate and submit to AIcrowd

In [20]:
print("Embedding shape:", submission.shape)
Embedding shape: (8541900, 128)

Validate the submission ✅

The submission should follow these constraints:

  • It should be a numpy array
  • Embeddings is a 2D numpy array of dtype float32
  • The embedding size shouldn't exceed 128
  • The frame number map matches the clip lengths
  • You can use the helper function below to check these
In [21]:
def validate_submission(submission, frame_number_map):
    if not isinstance(submission, np.ndarray):
        print("Embeddings should be a numpy array")
        return False
    elif not len(submission.shape) == 2:
        print("Embeddings should be 2D array")
        return False
    elif not submission.shape[1] <= 128:
        print("Embeddings too large, max allowed is 128")
        return False
    elif not isinstance(submission[0, 0], np.float32):
        print("Embeddings are not float32")
        return False

    total_clip_length = frame_number_map[list(frame_number_map.keys())[-1]][1]
    if not len(submission) == total_clip_length:
        print("Embeddings length doesn't match submission clips total length")
        return False

    if not np.isfinite(submission).all():
        print("Embeddings contains NaN or infinity")
        return False

    print("All checks passed")
    return True
In [22]:
validate_submission(submission, frame_number_map)
All checks passed
In [23]:
np.save('submission.npy', submission)
In [ ]:
%aicrowd submission create --description "Ant-Beetle-Getting-Started" -c {aicrowd_challenge_name} -f submission.npy

