Lingua Franca Translation

Modified Getting Started Notebook for Lingua Franca Transala

A getting started notebook for the challenge.

victorkras2008

Create dict1 from the first word of each sentence.

BLEU = 0.080

Getting Started with Lingua Franca Translation

In this puzzle, we have to translate from the crowd-talk language into English. There are several ways to build the translator:

  • Using Dictionary and Mapping
  • Using LSTM
  • Using Transformers

In this starter notebook, we'll go with the dictionary-and-mapping approach: we'll build word lists for both English and the crowd-talk language and map one onto the other.
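To make the dictionary-and-mapping idea concrete, here is a minimal sketch on invented data. The helper names (`build_mapping`, `translate`), the `<unk>` placeholder and the toy sentences are all assumptions made for illustration. It pairs words position by position, which is a simplification and not necessarily how the real data aligns.

```python
def build_mapping(source_sentences, target_sentences):
    """Pair the words of each sentence position by position (hypothetical helper)."""
    mapping = {}
    for src, tgt in zip(source_sentences, target_sentences):
        for s_word, t_word in zip(src.split(), tgt.split()):
            mapping[s_word] = t_word  # a later pair overwrites an earlier one
    return mapping


def translate(sentence, mapping, unknown="<unk>"):
    """Replace each known word by its mapped word; mark the rest as unknown."""
    return " ".join(mapping.get(w, unknown) for w in sentence.split())


# Invented toy data, not taken from the challenge dataset.
toy_source = ["zug ba mek", "ba lor"]
toy_target = ["the cat sleeps", "cat runs"]

toy_mapping = build_mapping(toy_source, toy_target)
toy_translation = translate("zug ba lor qux", toy_mapping)
print(toy_translation)
```

Word-for-word lookup ignores word order and one-to-many correspondences, which is one reason the LSTM and Transformer options are listed as alternatives.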

Download the files 💾

Download AIcrowd CLI

We will first install aicrowd-cli, which lets you download the dataset and later make a submission directly from the notebook.

In [ ]:
%%capture
!pip install aicrowd-cli
%load_ext aicrowd.magic

Login to AIcrowd ㊗

In [ ]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/NPz72ux6cPJoh9ZbLHQWW3v_BO3gSIlOlqpxPVjWbjo
API Key valid
Saved API Key successfully!

Download Dataset

We will create a folder named data and download the files there.

In [ ]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c lingua-franca-translation -o data

Importing Necessary Libraries

In [1]:
import os
import pandas as pd
import gensim
from sklearn.metrics.pairwise import cosine_similarity

Diving into the dataset:

In [2]:
train_df = pd.read_csv("data/train.csv")
In [3]:
train_df
Out[3]:
id crowdtalk english
0 31989 wraov driourth wreury hyuirf schneiald chix lo... upon this ladder one of them mounted
1 29884 treuns schleangly kriaors draotz pfiews schlio... and solicited at the court of Augustus to be p...
2 26126 toirts choolt chiugy knusm squiend sriohl gheold but how am I sunk!
3 44183 schlioncy yoik yahoos dynuewn maery schlioncy ... the Yahoos draw home the sheaves in carriages
4 19108 treuns schleangly tsiens mcgaantz schmeecks tr... and placed his hated hands before my eyes
... ... ... ...
11950 50106 hydriaond cieurry mcdaabs swiings schlioncy yo... about five hundred leagues to the east
11951 14786 treuns schleangly criaody treuns schleangly wr... ) and two and a half in breadth
11952 16903 toirts choolt cycluierg triild schuony hypuids... “But my toils now drew near a close
11953 68451 toantz spluiey gheuck schoutch spluiey gheuck ... going as soon as I was dressed to pay my atten...
11954 30895 shriedy hyoirds splauetch sooc kniousts schlai... for there was no sign of any violence except t...

11955 rows × 3 columns

In [4]:
english = train_df.english.values
crowdtalk = train_df.crowdtalk.values
In [5]:
english
Out[5]:
array(['upon this ladder one of them mounted',
       'and solicited at the court of Augustus to be preferred to a greater ship',
       'but how am I sunk!', ..., '“But my toils now drew near a close',
       'going as soon as I was dressed to pay my attendance upon his honour',
       'for there was no sign of any violence except the black mark of fingers on his neck.'],
      dtype=object)
In [6]:
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in english]
#eng_word_list = [word for words in processedLines for word in words]

eng_word_list = [words[0] for words in processedLines]  # first word of each sentence only (BLEU = 0.080)
In [7]:
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in crowdtalk]
#crowdtalk_word_list = [word for words in processedLines for word in words]

crowdtalk_word_list = [words[0] for words in processedLines]  # first word of each sentence only (BLEU = 0.080)
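One thing to watch with the first-word extraction: with its default settings, gensim's simple_preprocess discards tokens shorter than 2 or longer than 15 characters, so a sentence can in principle come back as an empty list, and indexing `[0]` would then raise an IndexError. The sketch below shows one way to guard against that while keeping the two sides aligned. `first_word_pairs` is a hypothetical helper, and the token lists are invented stand-ins for the preprocessed output.

```python
def first_word_pairs(source_tokens, target_tokens):
    """Yield (source_first, target_first), skipping rows where either side is empty."""
    for src, tgt in zip(source_tokens, target_tokens):
        if src and tgt:
            yield src[0], tgt[0]


# Invented token lists standing in for the output of simple_preprocess.
toy_source_tokens = [["zug", "ba"], [], ["lor", "mek"]]
toy_target_tokens = [["the", "cat"], ["hello"], ["runs", "fast"]]

toy_pairs = list(first_word_pairs(toy_source_tokens, toy_target_tokens))
print(toy_pairs)
```

Filtering the pairs together, rather than filtering each list on its own, keeps every crowd-talk word matched with the English word from the same row.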
In [8]:
dict1 = dict(zip(crowdtalk_word_list, eng_word_list))
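Note that dict(zip(...)) keeps only the last English word seen for each crowd-talk word, so the result depends on row order. A possible alternative, sketched here on invented data, is to keep the most frequent pairing instead. `most_common_mapping` is a hypothetical helper, not part of the original notebook, and whether it improves the score has not been checked.

```python
from collections import Counter, defaultdict


def most_common_mapping(source_words, target_words):
    """Map each source word to the target word it is paired with most often."""
    counts = defaultdict(Counter)
    for s, t in zip(source_words, target_words):
        counts[s][t] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Invented toy pairs: "zug" is paired with "the" twice and with "a" once.
toy_source_words = ["zug", "zug", "zug", "ba"]
toy_target_words = ["the", "the", "a", "cat"]

last_wins = dict(zip(toy_source_words, toy_target_words))
majority = most_common_mapping(toy_source_words, toy_target_words)
print(last_wins["zug"], majority["zug"])
```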

Prediction Phase ✈

In [9]:
test_df = pd.read_csv("data/test.csv")
In [10]:
test_df.crowdtalk[3984]
Out[10]:
'zoetz treiahl typeauty squiend sriohl daonts schloors rhiuny'
In [11]:
crowdtalk = test_df.crowdtalk.values
In [12]:
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in crowdtalk]

Creating sentences by looking up, for each crowd-talk word in a sentence, the corresponding English word in the dictionary mapping created above.

In [13]:
sentence = []

for words in processedLines:
  # Look up each crowd-talk word; a word missing from dict1 becomes a single space
  sentence_part = [dict1.get(w, ' ') for w in words]
  # Join inside the loop body so an empty sentence yields '' rather than reusing the previous result
  sentence.append(' '.join(sentence_part))
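Because every unknown word is replaced by a space and the parts are then joined with spaces, the predictions can contain runs of several blanks. The sketch below contrasts that with simply dropping unknown words. `join_known` is a hypothetical helper and the data is invented. Whether the challenge scorer is sensitive to extra whitespace is not stated, so treat this as a cosmetic variant.

```python
def join_known(words, mapping):
    """Translate known words and drop unknown ones instead of inserting blanks."""
    return " ".join(mapping[w] for w in words if w in mapping)


# Invented toy mapping and sentence; "qux" is deliberately missing from the mapping.
toy_mapping = {"zug": "the", "lor": "runs"}
toy_words = ["zug", "qux", "lor"]

padded = " ".join(toy_mapping.get(w, " ") for w in toy_words)  # the notebook's behaviour
compact = join_known(toy_words, toy_mapping)
print(repr(padded), repr(compact))
```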
In [14]:
test_df['prediction'] = sentence
In [15]:
test_df
Out[15]:
id crowdtalk prediction
0 27226 treuns schleangly throuys praests qeipp cyclui... and my of
1 31034 feosch treuns schleangly gliath spluiey gheuck... scared and as was only
2 35270 scraocs knaedly squiend sriohl clield whaioght... when only found on my
3 23380 sqaups schlioncy yoik gnoirk cziourk schnaunk ... according the to he had given
4 92117 schlioncy yoik psycheiancy mcountz pously mcna... the very that
... ... ... ...
3980 22854 scraocs knaedly daioc mceab spriaonn schmeips ... when it did not rain
3981 24201 toirts choolt blointly spriaonn schmeips krous... but she did not
3982 33494 scraocs knaedly daioc mceab sooc kniousts clie... when it was found could only neither...
3983 28988 czogy stoorty wheians veurg mcmoorth dwiountz ... by which they
3984 25337 zoetz treiahl typeauty squiend sriohl daonts s... till could only reach

3985 rows × 3 columns

Saving the predictions in the assets directory under the name submission.csv.

In [16]:
!rm -rf assets
!mkdir assets
test_df.to_csv(os.path.join("assets", "submission.csv"), index=False)
"rm" is not recognized as an internal or external command,
operable program or batch file.
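The output above appears to be the Windows shell reporting that rm is not an available command, so the shell-based cleanup is not portable. A platform-independent sketch using only the Python standard library could look like this. `reset_dir` is a hypothetical helper, and the demo runs in a temporary directory so it does not touch the real assets folder.

```python
import os
import shutil
import tempfile


def reset_dir(path):
    """Remove path if it exists, then recreate it as an empty directory."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path, exist_ok=True)


# Demonstration inside a throwaway temporary directory.
demo_root = tempfile.mkdtemp()
demo_assets = os.path.join(demo_root, "assets")

reset_dir(demo_assets)
with open(os.path.join(demo_assets, "old.txt"), "w") as fh:
    fh.write("stale file")

reset_dir(demo_assets)  # the stale file is gone after the reset
print(os.listdir(demo_assets))
```

In the notebook itself, calling `reset_dir("assets")` before `to_csv` would play the role of the two shell lines.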

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [ ]:
%aicrowd notebook submit -c lingua-franca-translation -a assets --no-verify
Using notebook: getting-started-notebook-for-lingua-franca-transalation.ipynb for submission...
Scrubbing API keys from the notebook...
Collecting notebook...


                                                       ╭─────────────────────────╮                                                       
                                                       │ Successfully submitted! │                                                       
                                                       ╰─────────────────────────╯                                                       
                                                             Important links                                                             
┌──────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/ai-blitz-xii/problems/lingua-franca-translation/submissions/169598              │
│                  │                                                                                                                    │
│  All submissions │ https://www.aicrowd.com/challenges/ai-blitz-xii/problems/lingua-franca-translation/submissions?my_submissions=true │
│                  │                                                                                                                    │
│      Leaderboard │ https://www.aicrowd.com/challenges/ai-blitz-xii/problems/lingua-franca-translation/leaderboards                    │
│                  │                                                                                                                    │
│ Discussion forum │ https://discourse.aicrowd.com/c/ai-blitz-xii                                                                       │
│                  │                                                                                                                    │
│   Challenge page │ https://www.aicrowd.com/challenges/ai-blitz-xii/problems/lingua-franca-translation                                 │
└──────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
