Loading

Research Paper Classification

Solution for submission 147355

A detailed solution for submission 147355 submitted for challenge Research Paper Classification

sean_benhur

Downloading data and Login with AIicrowd CLI

In [1]:
!pip install transformers aicrowd-cli wandb -q
     |████████████████████████████████| 2.5MB 39.5MB/s 
     |████████████████████████████████| 51kB 8.9MB/s 
     |████████████████████████████████| 1.8MB 19.5MB/s 
     |████████████████████████████████| 3.3MB 38.9MB/s 
     |████████████████████████████████| 901kB 39.6MB/s 
     |████████████████████████████████| 61kB 8.5MB/s 
     |████████████████████████████████| 215kB 54.2MB/s 
     |████████████████████████████████| 174kB 59.2MB/s 
     |████████████████████████████████| 133kB 60.0MB/s 
     |████████████████████████████████| 102kB 14.4MB/s 
     |████████████████████████████████| 51kB 7.9MB/s 
     |████████████████████████████████| 71kB 11.7MB/s 
  Building wheel for pathtools (setup.py) ... done
  Building wheel for subprocess32 (setup.py) ... done
ERROR: aicrowd-cli 0.1.7 has requirement requests<3,>=2.25.1, but you'll have requests 2.23.0 which is incompatible.
ERROR: aicrowd-cli 0.1.7 has requirement tqdm<5,>=4.56.0, but you'll have tqdm 4.41.1 which is incompatible.
In [2]:
!pip install simpletransformers
Collecting simpletransformers
  Downloading https://files.pythonhosted.org/packages/cf/2b/9073313586a8cdd7997b2f4ed43c0a44d9c484013b51f77c1c0f034dd78b/simpletransformers-0.61.6-py3-none-any.whl (220kB)
     |████████████████████████████████| 225kB 28.1MB/s 
Collecting datasets
  Downloading https://files.pythonhosted.org/packages/08/a2/d4e1024c891506e1cee8f9d719d20831bac31cb5b7416983c4d2f65a6287/datasets-1.8.0-py3-none-any.whl (237kB)
     |████████████████████████████████| 245kB 37.6MB/s 
Requirement already satisfied: wandb in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (0.10.32)
Collecting streamlit
  Downloading https://files.pythonhosted.org/packages/d7/0c/469ee9160ad7bc064eb498fa95aefd4e96b593ce0d53fb07ff217badff47/streamlit-0.83.0-py2.py3-none-any.whl (7.7MB)
     |████████████████████████████████| 7.8MB 51.6MB/s 
Requirement already satisfied: tokenizers in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (0.10.3)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (0.22.2.post1)
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (1.1.5)
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/ac/aa/1437691b0c7c83086ebb79ce2da16e00bef024f24fec2a5161c35476f499/sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2MB)
     |████████████████████████████████| 1.2MB 41.6MB/s 
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (1.4.1)
Collecting seqeval
  Downloading https://files.pythonhosted.org/packages/9d/2d/233c79d5b4e5ab1dbf111242299153f3caddddbb691219f363ad55ce783d/seqeval-1.2.2.tar.gz (43kB)
     |████████████████████████████████| 51kB 9.0MB/s 
Collecting tensorboardx
  Downloading https://files.pythonhosted.org/packages/07/84/46421bd3e0e89a92682b1a38b40efc22dafb6d8e3d947e4ceefd4a5fabc7/tensorboardX-2.2-py2.py3-none-any.whl (120kB)
     |████████████████████████████████| 122kB 55.5MB/s 
Collecting tqdm>=4.47.0
  Downloading https://files.pythonhosted.org/packages/b4/20/9f1e974bb4761128fc0d0a32813eaa92827309b1756c4b892d28adfb4415/tqdm-4.61.1-py2.py3-none-any.whl (75kB)
     |████████████████████████████████| 81kB 2.0MB/s 
Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (2019.12.20)
Requirement already satisfied: transformers>=4.2.0 in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (4.7.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (1.19.5)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from simpletransformers) (2.23.0)
Collecting fsspec
  Downloading https://files.pythonhosted.org/packages/8e/d2/d05466997f7751a2c06a7a416b7d1f131d765f7916698d3fdcb3a4d037e5/fsspec-2021.6.0-py3-none-any.whl (114kB)
     |████████████████████████████████| 122kB 50.3MB/s 
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from datasets->simpletransformers) (4.5.0)
Requirement already satisfied: multiprocess in /usr/local/lib/python3.7/dist-packages (from datasets->simpletransformers) (0.70.12.2)
Requirement already satisfied: pyarrow<4.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from datasets->simpletransformers) (3.0.0)
Requirement already satisfied: huggingface-hub<0.1.0 in /usr/local/lib/python3.7/dist-packages (from datasets->simpletransformers) (0.0.8)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from datasets->simpletransformers) (20.9)
Collecting xxhash
  Downloading https://files.pythonhosted.org/packages/7d/4f/0a862cad26aa2ed7a7cd87178cbbfa824fc1383e472d63596a0d018374e7/xxhash-2.0.2-cp37-cp37m-manylinux2010_x86_64.whl (243kB)
     |████████████████████████████████| 245kB 56.3MB/s 
Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-packages (from datasets->simpletransformers) (0.3.4)
Requirement already satisfied: sentry-sdk>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (1.1.0)
Requirement already satisfied: GitPython>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (3.1.18)
Requirement already satisfied: shortuuid>=0.5.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (1.0.1)
Requirement already satisfied: python-dateutil>=2.6.1 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (2.8.1)
Requirement already satisfied: promise<3,>=2.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (2.3)
Requirement already satisfied: protobuf>=3.12.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (3.12.4)
Requirement already satisfied: psutil>=5.0.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (5.4.8)
Requirement already satisfied: six>=1.13.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (1.15.0)
Requirement already satisfied: Click!=8.0.0,>=7.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (7.1.2)
Requirement already satisfied: docker-pycreds>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (0.4.0)
Requirement already satisfied: pathtools in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (0.1.2)
Requirement already satisfied: PyYAML in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (3.13)
Requirement already satisfied: configparser>=3.8.1 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (5.0.2)
Requirement already satisfied: subprocess32>=3.5.3 in /usr/local/lib/python3.7/dist-packages (from wandb->simpletransformers) (3.5.4)
Requirement already satisfied: tornado>=5.0 in /usr/local/lib/python3.7/dist-packages (from streamlit->simpletransformers) (5.1.1)
Requirement already satisfied: astor in /usr/local/lib/python3.7/dist-packages (from streamlit->simpletransformers) (0.8.1)
Collecting validators
  Downloading https://files.pythonhosted.org/packages/db/2f/7fed3ee94ad665ad2c1de87f858f10a7785251ff75b4fd47987888d07ef1/validators-0.18.2-py3-none-any.whl
Requirement already satisfied: altair>=3.2.0 in /usr/local/lib/python3.7/dist-packages (from streamlit->simpletransformers) (4.1.0)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.7/dist-packages (from streamlit->simpletransformers) (7.1.2)
Requirement already satisfied: tzlocal in /usr/local/lib/python3.7/dist-packages (from streamlit->simpletransformers) (1.5.1)
Requirement already satisfied: toml in /usr/local/lib/python3.7/dist-packages (from streamlit->simpletransformers) (0.10.2)
Collecting pydeck>=0.1.dev5
  Downloading https://files.pythonhosted.org/packages/d6/bc/f0e44828e4290367c869591d50d3671a4d0ee94926da6cb734b7b200308c/pydeck-0.6.2-py2.py3-none-any.whl (4.2MB)
     |████████████████████████████████| 4.2MB 53.6MB/s 
Collecting blinker
  Downloading https://files.pythonhosted.org/packages/1b/51/e2a9f3b757eb802f61dc1f2b09c8c99f6eb01cf06416c0671253536517b6/blinker-1.4.tar.gz (111kB)
     |████████████████████████████████| 112kB 59.7MB/s 
Collecting base58
  Downloading https://files.pythonhosted.org/packages/b8/a1/d9f565e9910c09fd325dc638765e8843a19fa696275c16cc08cf3b0a3c25/base58-2.1.0-py3-none-any.whl
Collecting watchdog; platform_system != "Darwin"
  Downloading https://files.pythonhosted.org/packages/f2/5b/36b3b11e557830de6fc1dc06e9aa3ee274119b8cea9cc98175dbbf72cf87/watchdog-2.1.2-py3-none-manylinux2014_x86_64.whl (74kB)
     |████████████████████████████████| 81kB 11.8MB/s 
Requirement already satisfied: cachetools>=4.0 in /usr/local/lib/python3.7/dist-packages (from streamlit->simpletransformers) (4.2.2)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->simpletransformers) (1.0.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->simpletransformers) (2018.9)
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers>=4.2.0->simpletransformers) (3.0.12)
Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (from transformers>=4.2.0->simpletransformers) (0.0.45)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->simpletransformers) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->simpletransformers) (2021.5.30)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->simpletransformers) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->simpletransformers) (1.24.3)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->datasets->simpletransformers) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->datasets->simpletransformers) (3.4.1)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->datasets->simpletransformers) (2.4.7)
Requirement already satisfied: gitdb<5,>=4.0.1 in /usr/local/lib/python3.7/dist-packages (from GitPython>=1.0.0->wandb->simpletransformers) (4.0.7)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from protobuf>=3.12.0->wandb->simpletransformers) (57.0.0)
Requirement already satisfied: decorator>=3.4.0 in /usr/local/lib/python3.7/dist-packages (from validators->streamlit->simpletransformers) (4.4.2)
Requirement already satisfied: jsonschema in /usr/local/lib/python3.7/dist-packages (from altair>=3.2.0->streamlit->simpletransformers) (2.6.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.7/dist-packages (from altair>=3.2.0->streamlit->simpletransformers) (2.11.3)
Requirement already satisfied: entrypoints in /usr/local/lib/python3.7/dist-packages (from altair>=3.2.0->streamlit->simpletransformers) (0.3)
Requirement already satisfied: toolz in /usr/local/lib/python3.7/dist-packages (from altair>=3.2.0->streamlit->simpletransformers) (0.11.1)
Requirement already satisfied: traitlets>=4.3.2 in /usr/local/lib/python3.7/dist-packages (from pydeck>=0.1.dev5->streamlit->simpletransformers) (5.0.5)
Requirement already satisfied: ipywidgets>=7.0.0 in /usr/local/lib/python3.7/dist-packages (from pydeck>=0.1.dev5->streamlit->simpletransformers) (7.6.3)
Collecting ipykernel>=5.1.2; python_version >= "3.4"
  Downloading https://files.pythonhosted.org/packages/90/6d/6c8fe4b658f77947d4244ce81f60230c4c8d1dc1a21ae83e63b269339178/ipykernel-5.5.5-py3-none-any.whl (120kB)
     |████████████████████████████████| 122kB 59.5MB/s 
Requirement already satisfied: smmap<5,>=3.0.1 in /usr/local/lib/python3.7/dist-packages (from gitdb<5,>=4.0.1->GitPython>=1.0.0->wandb->simpletransformers) (4.0.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from jinja2->altair>=3.2.0->streamlit->simpletransformers) (2.0.1)
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.7/dist-packages (from traitlets>=4.3.2->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.2.0)
Requirement already satisfied: nbformat>=4.2.0 in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (5.1.3)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (3.5.1)
Requirement already satisfied: jupyterlab-widgets>=1.0.0; python_version >= "3.6" in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (1.0.0)
Requirement already satisfied: ipython>=4.0.0; python_version >= "3.3" in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (5.5.0)
Requirement already satisfied: jupyter-client in /usr/local/lib/python3.7/dist-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck>=0.1.dev5->streamlit->simpletransformers) (5.3.5)
Requirement already satisfied: jupyter-core in /usr/local/lib/python3.7/dist-packages (from nbformat>=4.2.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (4.7.1)
Requirement already satisfied: notebook>=4.4.1 in /usr/local/lib/python3.7/dist-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (5.3.1)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/local/lib/python3.7/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (4.8.0)
Requirement already satisfied: simplegeneric>0.8 in /usr/local/lib/python3.7/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.8.1)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.7/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.7.5)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/local/lib/python3.7/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (1.0.18)
Requirement already satisfied: pygments in /usr/local/lib/python3.7/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (2.6.1)
Requirement already satisfied: pyzmq>=13 in /usr/local/lib/python3.7/dist-packages (from jupyter-client->ipykernel>=5.1.2; python_version >= "3.4"->pydeck>=0.1.dev5->streamlit->simpletransformers) (22.1.0)
Requirement already satisfied: Send2Trash in /usr/local/lib/python3.7/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (1.5.0)
Requirement already satisfied: terminado>=0.8.1 in /usr/local/lib/python3.7/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.10.1)
Requirement already satisfied: nbconvert in /usr/local/lib/python3.7/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (5.6.1)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.7/dist-packages (from pexpect; sys_platform != "win32"->ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.2.5)
Requirement already satisfied: testpath in /usr/local/lib/python3.7/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.5.0)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.7/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.7.1)
Requirement already satisfied: bleach in /usr/local/lib/python3.7/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (3.3.0)
Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.7/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.8.4)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.7/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (1.4.3)
Requirement already satisfied: webencodings in /usr/local/lib/python3.7/dist-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck>=0.1.dev5->streamlit->simpletransformers) (0.5.1)
Building wheels for collected packages: seqeval, blinker
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-1.2.2-cp37-none-any.whl size=16184 sha256=3d09b50ef231311761e6e5c6a8ad6d8cac32b764eacd626bdb4e51d860136399
  Stored in directory: /root/.cache/pip/wheels/52/df/1b/45d75646c37428f7e626214704a0e35bd3cfc32eda37e59e5f
  Building wheel for blinker (setup.py) ... done
  Created wheel for blinker: filename=blinker-1.4-cp37-none-any.whl size=13476 sha256=aebfea07d20a092bca0d1e17fe87063fde9d531f79e823b2b33abd0d127d6150
  Stored in directory: /root/.cache/pip/wheels/92/a0/00/8690a57883956a301d91cf4ec999cc0b258b01e3f548f86e89
Successfully built seqeval blinker
ERROR: google-colab 1.0.0 has requirement ipykernel~=4.10, but you'll have ipykernel 5.5.5 which is incompatible.
ERROR: aicrowd-cli 0.1.7 has requirement requests<3,>=2.25.1, but you'll have requests 2.23.0 which is incompatible.
ERROR: datasets 1.8.0 has requirement tqdm<4.50.0,>=4.27, but you'll have tqdm 4.61.1 which is incompatible.
Installing collected packages: fsspec, xxhash, tqdm, datasets, validators, ipykernel, pydeck, blinker, base58, watchdog, streamlit, sentencepiece, seqeval, tensorboardx, simpletransformers
  Found existing installation: tqdm 4.41.1
    Uninstalling tqdm-4.41.1:
      Successfully uninstalled tqdm-4.41.1
  Found existing installation: ipykernel 4.10.1
    Uninstalling ipykernel-4.10.1:
      Successfully uninstalled ipykernel-4.10.1
Successfully installed base58-2.1.0 blinker-1.4 datasets-1.8.0 fsspec-2021.6.0 ipykernel-5.5.5 pydeck-0.6.2 sentencepiece-0.1.96 seqeval-1.2.2 simpletransformers-0.61.6 streamlit-0.83.0 tensorboardx-2.2 tqdm-4.61.1 validators-0.18.2 watchdog-2.1.2 xxhash-2.0.2
In [3]:

API Key valid
Saved API Key successfully!
In [4]:
# Downloading the Dataset ( removing data and assets folder if existing already and then creating the folder )
!rm -rf data
!mkdir data
!rm -rf assets
!mkdir assets
test.csv:   0% 0.00/3.01M [00:00<?, ?B/s]
val.csv:   0% 0.00/883k [00:00<?, ?B/s]
val.csv: 100% 883k/883k [00:00<00:00, 2.55MB/s]

test.csv: 100% 3.01M/3.01M [00:00<00:00, 6.45MB/s]

train.csv: 100% 8.77M/8.77M [00:00<00:00, 8.87MB/s]

Importing Libraries

In [5]:
from simpletransformers.classification import ClassificationModel
import numpy as np
import pandas as pd
In [18]:
import os
In [6]:
train_path = "/content/data/train.csv"
val_path = "/content/data/val.csv"
test_path = "/content/data/test.csv"

train_df = pd.read_csv(train_path)
val_df = pd.read_csv(val_path)
test_df = pd.read_csv(test_path)
In [7]:
train_df
Out[7]:
id text label
0 0 we propose deep network models and learning al... 3
1 1 multi-distance information computed by the MDL... 3
2 2 traditional solutions consider dense pedestria... 2
3 3 in this paper, is used the lagrangian classica... 2
4 4 the aim of this work is to determine how vulne... 3
... ... ... ...
31495 31495 the proposed method is easily programmed by ki... 2
31496 31496 research in unpaired video translation has foc... 3
31497 31497 deep learning models exhibit limited generaliz... 3
31498 31498 in this paper, we aim to incorporate global se... 3
31499 31499 to precisely calculate context-based probabili... 3

31500 rows × 3 columns

In [8]:
train_df['label'].value_counts()
Out[8]:
3    19676
0     4352
1     4078
2     3394
Name: label, dtype: int64
In [9]:
combined_data = pd.concat([train_df,val_df],axis=0)
In [10]:
combined_data.drop(['id'],axis=1,inplace=True)
In [11]:
custom_args = {'fp16': True, # not using mixed precision 
               'train_batch_size': 16, # default is 8
               'gradient_accumulation_steps': 2,
               'do_lower_case': True,
               'learning_rate': 1e-05, # using lower learning rate
               'overwrite_output_dir': True, # important for CV
               'num_train_epochs': 2} # default is 1
In [12]:
model = ClassificationModel("roberta", "xlm-roberta-large", args=custom_args,num_labels=4) 
model.train_model(combined_data,)
You are using a model of type xlm-roberta to instantiate a model of type roberta. This is not supported for all configurations of models and can yield errors.
Some weights of the model checkpoint at xlm-roberta-large were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py:602: UserWarning: Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels.
  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."
/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py:927: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
  model.parameters(), args.max_grad_norm
Exception ignored in: <generator object tqdm.__iter__ at 0x7f7367f73ed0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1193, in __iter__
    self.close()
  File "/usr/local/lib/python3.7/dist-packages/tqdm/notebook.py", line 283, in close
    self.disp(bar_style='danger', check_delay=False)
  File "/usr/local/lib/python3.7/dist-packages/tqdm/notebook.py", line 177, in display
    rtext.value = right
  File "/usr/local/lib/python3.7/dist-packages/traitlets/traitlets.py", line 604, in __set__
    self.set(obj, value)
  File "/usr/local/lib/python3.7/dist-packages/traitlets/traitlets.py", line 593, in set
    obj._notify_trait(self.name, old_value, new_value)
  File "/usr/local/lib/python3.7/dist-packages/traitlets/traitlets.py", line 1222, in _notify_trait
    type='change',
  File "/usr/local/lib/python3.7/dist-packages/ipywidgets/widgets/widget.py", line 605, in notify_change
    self.send_state(key=name)
  File "/usr/local/lib/python3.7/dist-packages/ipywidgets/widgets/widget.py", line 489, in send_state
    self._send(msg, buffers=buffers)
  File "/usr/local/lib/python3.7/dist-packages/ipywidgets/widgets/widget.py", line 737, in _send
    self.comm.send(data=msg, buffers=buffers)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/comm/comm.py", line 121, in send
    """Send a message to the frontend-side version of this comm"""
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/comm/comm.py", line 71, in _publish_msg
    buffers=buffers,
  File "/usr/local/lib/python3.7/dist-packages/jupyter_client/session.py", line 748, in send
    stream.send_multipart(to_send, copy=copy)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/iostream.py", line 262, in send_multipart
    return self.io_thread.send_multipart(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/iostream.py", line 212, in send_multipart
    self.schedule(lambda : self._really_send(*args, **kwargs))
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/iostream.py", line 203, in schedule
    self._event_pipe.send(b'')
  File "/usr/local/lib/python3.7/dist-packages/zmq/sugar/socket.py", line 505, in send
    return super(Socket, self).send(data, flags=flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 718, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 765, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 235, in zmq.backend.cython.socket._send_copy
  File "zmq/backend/cython/checkrc.pxd", line 13, in zmq.backend.cython.checkrc._check_rc
KeyboardInterrupt: 
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-12-b82f9f4fc9ed> in <module>()
      1 model = ClassificationModel("roberta", "xlm-roberta-large", args=custom_args,num_labels=4)
----> 2 model.train_model(combined_data,)

/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py in train_model(self, train_df, multi_label, output_dir, show_running_loss, args, eval_df, verbose, **kwargs)
    626             eval_df=eval_df,
    627             verbose=verbose,
--> 628             **kwargs,
    629         )
    630 

/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py in train(self, train_dataloader, output_dir, multi_label, show_running_loss, eval_df, verbose, **kwargs)
   1120 
   1121             if args.save_model_every_epoch:
-> 1122                 self.save_model(output_dir_current, optimizer, scheduler, model=model)
   1123 
   1124             if args.evaluate_during_training and args.evaluate_each_epoch:

/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py in save_model(self, output_dir, optimizer, scheduler, model, results)
   2252             if optimizer and scheduler and self.args.save_optimizer_and_scheduler:
   2253                 torch.save(
-> 2254                     optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt")
   2255                 )
   2256                 torch.save(

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    377         if _use_new_zipfile_serialization:
    378             with _open_zipfile_writer(opened_file) as opened_zipfile:
--> 379                 _save(obj, opened_zipfile, pickle_module, pickle_protocol)
    380                 return
    381         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _save(obj, zip_file, pickle_module, pickle_protocol)
    493         # this means to that to get tensors serialized, you need to implement
    494         # .cpu() on the underlying Storage
--> 495         if storage.device.type != 'cpu':
    496             storage = storage.cpu()
    497         # Now that it is on the CPU we can directly copy it into the zip file

KeyboardInterrupt: 
In [13]:
predictions, raw_outputs = model.predict(list(test_df['text'].values))
In [15]:
# Applying the predictions to the labels column of the sample submission 
test_df['label'] = predictions
test_df
Out[15]:
id text label
0 0 we propose a lightweight framework to detect i... 3
1 1 the proposed method presents an alternate solu... 2
2 2 proposed ear identification method fusing SIFT... 3
3 3 a method to reconstruct the three-dimensional ... 3
4 4 strong local consistencies can improve their p... 0
... ... ... ...
10795 10795 whole-body gradient echo scans of 240 subjects... 3
10796 10796 we present a tracker that accomplishes trackin... 3
10797 10797 the most popular FL algorithm is Federated Ave... 1
10798 10798 in the field of Autonomous Driving, the system... 2
10799 10799 our method takes as an input a foreground imag... 3

10800 rows × 3 columns

In [19]:
!mkdir assets

# Saving the sample submission in assets directory
test_df.to_csv(os.path.join("assets", "submission.csv"), index=False)
mkdir: cannot create directory ‘assets’: File exists
In [ ]:

Mounting Google Drive 💾
Your Google Drive will be mounted to access the colab notebook
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.activity.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fexperimentsandconfigs%20https%3a%2f%2fwww.googleapis.com%2fauth%2fphotos.native&response_type=code

Enter your authorization code:
In [ ]:


Comments

You must login before you can post a comment.

Execute