Baseline code for making a submission to Letter Recognition (AI Blitz 5)
Getting Started Code for TXTOCR Challenge on AIcrowd
Author: Shubhamai
Installing the AIcrowd CLI and Authenticating
The AIcrowd CLI makes it easy to download the dataset and submit directly from this notebook. Before running this notebook, make sure you have joined the challenge and accepted its rules.
In [ ]:
!pip install git+https://gitlab.aicrowd.com/yoogottamk/aicrowd-cli.git
API_KEY = "" # Enter your API key here; you can find it on your AIcrowd profile page.
!aicrowd login --api-key $API_KEY
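Pasting the key into the cell works, but it is easy to share a notebook with the key still in it. A minimal sketch of one alternative: read the key from an environment variable and fail early if it is missing. The variable name `AICROWD_API_KEY` and the helper `get_api_key` are hypothetical, not something the AIcrowd CLI defines.

```python
import os


def get_api_key(env_var: str = "AICROWD_API_KEY") -> str:
    """Return the API key stored in `env_var` (a hypothetical variable name).

    Raises ValueError when the variable is missing or blank, so the notebook
    stops before running `aicrowd login` with an empty key.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"Set the {env_var} environment variable to your AIcrowd API key")
    return key
```

With this helper, the cell would set `API_KEY = get_api_key()` and then run the same `!aicrowd login --api-key $API_KEY` command, since IPython substitutes `$API_KEY` from the Python namespace.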
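Download Data
The first step is to download the train, val and test data. The usual workflow is to train a model on the training data, make predictions on the test data and submit those predictions; this baseline skips the training step and predicts directly with an off-the-shelf OCR engine.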
In [ ]:
!aicrowd dataset download -c txtocr >/dev/null
In [ ]:
!rm -rf data
!mkdir data
!mv train.csv data/train.csv
!mv val.csv data/val.csv
!unzip train.zip -d data/
!unzip val.zip -d data/
!unzip test.zip -d data/
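If a download or unzip step fails silently, later cells break with confusing errors. A small sanity check, sketched under the assumption that each zip extracts into a `data/<split>/` folder of `.png` files (the notebook itself reads from `data/train/` and `data/test/`; `data/val/` is assumed by analogy), is to count the images per split:

```python
import os


def count_images(root: str = "data", splits=("train", "val", "test"), ext: str = ".png") -> dict:
    """Count files ending in `ext` inside each split folder under `root`.

    A missing folder is reported as 0 instead of raising, so a failed
    unzip shows up immediately in the returned dictionary.
    """
    counts = {}
    for split in splits:
        folder = os.path.join(root, split)
        if not os.path.isdir(folder):
            counts[split] = 0
            continue
        counts[split] = sum(1 for name in os.listdir(folder) if name.lower().endswith(ext))
    return counts
```

Calling `count_images()` after the unzip cell should report a non-zero count for every split.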
Installing and Importing Libraries
In [ ]:
!apt update
!apt install -y tesseract-ocr
!apt install -y libtesseract-dev
!pip install --upgrade fastai
!pip install pytesseract
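pytesseract is only a wrapper around the `tesseract` command-line program installed by apt, so the OCR step fails if that executable is not on the PATH. A minimal check, assuming the default binary name `tesseract` (the helper name `tesseract_path` is hypothetical):

```python
import shutil


def tesseract_path(binary: str = "tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH.

    A None result means the apt install above did not succeed, and
    pytesseract will not be able to run OCR.
    """
    return shutil.which(binary)
```

If `tesseract_path()` returns `None`, rerun the apt cell before moving on.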
In [ ]:
import os

import pandas as pd
import pytesseract
from fastai.vision.all import *
from PIL import Image
from tqdm.notebook import tqdm
In [ ]:
train_df = pd.read_csv("data/train.csv")
val_df = pd.read_csv("data/val.csv")
train_df.head()
In [ ]:
# Adding full image path
train_df['image_id'] = "data/train/"+train_df['image_id'].astype(str)+".png"
train_df
Making Predictions
Instead of training a model on the training set and then making predictions, we make predictions directly with pytesseract, a Python wrapper for the Tesseract optical character recognition (OCR) engine.
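Tesseract's plain-text output usually ends with a newline and a form-feed page separator, which is why the prediction loop strips `"\x0c"` and `"\n"`. The same cleanup as a standalone function is sketched below; trimming surrounding spaces with `strip()` is an extra assumption about what the labels expect, not something the original loop does. If every image holds a single line of text, passing `config="--psm 7"` (Tesseract's "treat the image as a single text line" mode) to `pytesseract.image_to_string` may also be worth trying; that has not been tested here.

```python
def clean_ocr_text(raw: str) -> str:
    """Remove the separators Tesseract adds around recognised text."""
    # Drop form feeds (Tesseract's page separator) and newlines, as the
    # notebook loop does, then trim leading/trailing spaces (an assumption).
    return raw.replace("\x0c", "").replace("\n", "").strip()
```

For example, `clean_ocr_text("HELLO\n\x0c")` returns `"HELLO"`.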
In [ ]:
test_imgs_paths = os.listdir("data/test")
predictions = []
for test_img_path in tqdm(test_imgs_paths):
    # Run Tesseract OCR on each test image
    label = pytesseract.image_to_string(Image.open(os.path.join("data/test", test_img_path)))
    # Remove garbage characters: the form feed Tesseract appends and any newlines
    label = label.replace("\x0c", "")
    label = label.replace("\n", "")
    predictions.append(label)
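Because this baseline skips training, the validation split loaded earlier is a natural place to estimate how well raw Tesseract output matches the labels before submitting. The sketch below computes exact-match accuracy and a character error rate (total Levenshtein edits divided by total label length). The challenge's actual scoring metric is not stated in this notebook, so both measures are assumptions, and `evaluate` is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def evaluate(predictions, labels):
    """Return (exact-match accuracy, character error rate) over paired strings."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    exact = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
    edits = sum(levenshtein(p, t) for p, t in zip(predictions, labels))
    total_chars = sum(len(t) for t in labels)
    return exact, edits / max(total_chars, 1)
```

Assuming `val.csv` has the same `image_id` and `label` columns as `train.csv` and the validation images are PNGs in `data/val`, one could run the same OCR loop over those images to get a hypothetical `val_predictions` list and call `evaluate(val_predictions, val_df['label'].astype(str).tolist())`.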
In [ ]:
# Making our testing dataframe
test_imgs_paths = [int(i.split(".")[0]) for i in test_imgs_paths]
test_df = pd.DataFrame(test_imgs_paths, columns=["image_id"])
test_df['label'] = predictions
test_df
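`os.listdir` returns file names in arbitrary order, so the rows of `test_df` come out in arbitrary order too. The ids are still correct, since they are derived from the same list the predictions were made from. If a deterministic, id-sorted file is preferred, a minimal sketch (whether the grader cares about row order is not stated here, and `build_submission_rows` is a hypothetical helper) is:

```python
import os


def build_submission_rows(filenames, predictions):
    """Pair each prediction with the numeric id in its file name, sorted by id.

    "17.png" -> 17, matching int(i.split(".")[0]) in the notebook.
    """
    if len(filenames) != len(predictions):
        raise ValueError("one prediction per file is required")
    rows = [(int(os.path.splitext(name)[0]), label) for name, label in zip(filenames, predictions)]
    return sorted(rows, key=lambda row: row[0])
```

The result can be passed to `pd.DataFrame(rows, columns=["image_id", "label"])` to rebuild `test_df` in id order.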
In [ ]:
# Saving predictions
test_df.to_csv("submission.csv", index=False)
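Before uploading, it can help to re-read `submission.csv` and check its shape. This sketch assumes the expected format is exactly what the notebook writes: an `image_id,label` header with one row per integer id. The grader's real requirements may differ, and `check_submission` is a hypothetical helper.

```python
import csv


def check_submission(path, expected_columns=("image_id", "label")):
    """Return a list of problems found in a submission CSV (an empty list means none found)."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None or tuple(header) != tuple(expected_columns):
            problems.append(f"unexpected header: {header}")
        seen = set()
        # Data rows are numbered from 1, not counting the header
        for row_no, row in enumerate(reader, start=1):
            if len(row) != len(expected_columns):
                problems.append(f"row {row_no}: expected {len(expected_columns)} fields, got {len(row)}")
                continue
            try:
                image_id = int(row[0])
            except ValueError:
                problems.append(f"row {row_no}: non-integer image_id {row[0]!r}")
                continue
            if image_id in seen:
                problems.append(f"row {row_no}: duplicate image_id {image_id}")
            seen.add(image_id)
    return problems
```

An empty result from `check_submission("submission.csv")` means none of these checks failed.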
To download the generated CSV in Colab, run the cell below.
In [ ]:
try:
    # google.colab is only importable inside Google Colab
    from google.colab import files
    files.download('submission.csv')
except ImportError:
    print("This option is only available in Google Colab")
Well done! 👍 We are all set to make a submission and see our name on the leaderboard. Let's navigate to the challenge page and make one.