Research Paper Classification
Solution for submission 147355
A detailed solution for submission 147355 submitted for challenge Research Paper Classification
Downloading data and logging in with the AIcrowd CLI
In [1]:
!pip install transformers aicrowd-cli wandb -q
Building wheel for pathtools (setup.py) ... done
Building wheel for subprocess32 (setup.py) ... done
ERROR: aicrowd-cli 0.1.7 has requirement requests<3,>=2.25.1, but you'll have requests 2.23.0 which is incompatible.
ERROR: aicrowd-cli 0.1.7 has requirement tqdm<5,>=4.56.0, but you'll have tqdm 4.41.1 which is incompatible.
In [2]:
!pip install simpletransformers
Building wheels for collected packages: seqeval, blinker
Successfully built seqeval blinker
ERROR: google-colab 1.0.0 has requirement ipykernel~=4.10, but you'll have ipykernel 5.5.5 which is incompatible.
ERROR: aicrowd-cli 0.1.7 has requirement requests<3,>=2.25.1, but you'll have requests 2.23.0 which is incompatible.
ERROR: datasets 1.8.0 has requirement tqdm<4.50.0,>=4.27, but you'll have tqdm 4.61.1 which is incompatible.
Installing collected packages: fsspec, xxhash, tqdm, datasets, validators, ipykernel, pydeck, blinker, base58, watchdog, streamlit, sentencepiece, seqeval, tensorboardx, simpletransformers
Found existing installation: tqdm 4.41.1
Uninstalling tqdm-4.41.1:
Successfully uninstalled tqdm-4.41.1
Found existing installation: ipykernel 4.10.1
Uninstalling ipykernel-4.10.1:
Successfully uninstalled ipykernel-4.10.1
Successfully installed base58-2.1.0 blinker-1.4 datasets-1.8.0 fsspec-2021.6.0 ipykernel-5.5.5 pydeck-0.6.2 sentencepiece-0.1.96 seqeval-1.2.2 simpletransformers-0.61.6 streamlit-0.83.0 tensorboardx-2.2 tqdm-4.61.1 validators-0.18.2 watchdog-2.1.2 xxhash-2.0.2
In [3]:
API Key valid Saved API Key successfully!
In [4]:
# Downloading the Dataset ( removing data and assets folder if existing already and then creating the folder )
!rm -rf data
!mkdir data
!rm -rf assets
!mkdir assets
test.csv: 0% 0.00/3.01M [00:00<?, ?B/s] val.csv: 0% 0.00/883k [00:00<?, ?B/s] val.csv: 100% 883k/883k [00:00<00:00, 2.55MB/s] test.csv: 100% 3.01M/3.01M [00:00<00:00, 6.45MB/s] train.csv: 100% 8.77M/8.77M [00:00<00:00, 8.87MB/s]
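The `rm -rf` / `mkdir` pairs in the cell above can also be written in Python, which avoids a shell error when a folder already exists. This is a minimal sketch; the helper name `reset_dir` is hypothetical and not part of the original notebook.

```python
import os
import shutil

def reset_dir(path):
    """Delete `path` if it exists, then recreate it as an empty directory."""
    shutil.rmtree(path, ignore_errors=True)  # no error if the folder is missing
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `reset_dir("data")` and `reset_dir("assets")` would have the same effect as the four shell commands.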
Importing Libraries
In [5]:
from simpletransformers.classification import ClassificationModel
import numpy as np
import pandas as pd
In [18]:
import os
In [6]:
train_path = "/content/data/train.csv"
val_path = "/content/data/val.csv"
test_path = "/content/data/test.csv"
train_df = pd.read_csv(train_path)
val_df = pd.read_csv(val_path)
test_df = pd.read_csv(test_path)
In [7]:
train_df
Out[7]:
 | id | text | label |
---|---|---|---|
0 | 0 | we propose deep network models and learning al... | 3 |
1 | 1 | multi-distance information computed by the MDL... | 3 |
2 | 2 | traditional solutions consider dense pedestria... | 2 |
3 | 3 | in this paper, is used the lagrangian classica... | 2 |
4 | 4 | the aim of this work is to determine how vulne... | 3 |
... | ... | ... | ... |
31495 | 31495 | the proposed method is easily programmed by ki... | 2 |
31496 | 31496 | research in unpaired video translation has foc... | 3 |
31497 | 31497 | deep learning models exhibit limited generaliz... | 3 |
31498 | 31498 | in this paper, we aim to incorporate global se... | 3 |
31499 | 31499 | to precisely calculate context-based probabili... | 3 |
31500 rows × 3 columns
In [8]:
train_df['label'].value_counts()
Out[8]:
3    19676
0     4352
1     4078
2     3394
Name: label, dtype: int64
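The counts above show a clear class imbalance. As a rough sketch (using only the numbers printed in Out[8]), the share of each label and the accuracy of an always-predict-the-majority baseline on the training set can be computed as follows. The test set distribution is not known from this notebook and may differ.

```python
# Label counts copied from the value_counts() output above.
label_counts = {3: 19676, 0: 4352, 1: 4078, 2: 3394}
total = sum(label_counts.values())

# Share of each class in the training set.
shares = {label: count / total for label, count in label_counts.items()}

# Training-set accuracy of a baseline that always predicts the most frequent class.
majority_label = max(label_counts, key=label_counts.get)
majority_baseline = shares[majority_label]

for label in sorted(shares):
    print(f"label {label}: {shares[label]:.1%}")
print(f"majority baseline (always predict {majority_label}): {majority_baseline:.1%}")
```

Any trained model should be compared against this baseline rather than against chance.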
In [9]:
combined_data = pd.concat([train_df,val_df],axis=0)
In [10]:
combined_data.drop(['id'],axis=1,inplace=True)
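A possible refinement of the two cells above, shown on toy data: `ignore_index=True` avoids duplicate index values after concatenation, and renaming `label` to `labels` gives the frame the `text` / `labels` headers that simpletransformers looks for, which should avoid the "Dataframe headers not specified" fallback. The toy frames and the name `combined_toy` are illustrative assumptions, not the real data.

```python
import pandas as pd

# Toy stand-ins for train_df / val_df with the same columns as the real files.
train_toy = pd.DataFrame({"id": [0, 1], "text": ["a", "b"], "label": [3, 2]})
val_toy = pd.DataFrame({"id": [0], "text": ["c"], "label": [0]})

combined_toy = (
    pd.concat([train_toy, val_toy], axis=0, ignore_index=True)  # fresh 0..n-1 index
    .drop(columns=["id"])                                       # id is not a feature
    .rename(columns={"label": "labels"})                        # headers: "text", "labels"
)
print(combined_toy)
```

Note that merging the validation split into the training data, as the notebook does, leaves no held-out data for measuring the model before submission.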
In [11]:
custom_args = {'fp16': True, # use mixed-precision (fp16) training
               'train_batch_size': 16, # default is 8
               'gradient_accumulation_steps': 2, # effective batch size: 16 * 2 = 32
               'do_lower_case': True,
               'learning_rate': 1e-05, # using lower learning rate
               'overwrite_output_dir': True, # important for CV
               'num_train_epochs': 2} # default is 1
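To make the effect of `train_batch_size` and `gradient_accumulation_steps` concrete, the sketch below estimates the effective batch size and the number of optimizer updates. The helper `optimizer_steps` is hypothetical, and the formula (batches per epoch, integer-divided by the accumulation steps, times epochs) is an assumption about how the training loop counts updates.

```python
import math

def optimizer_steps(num_examples, train_batch_size, gradient_accumulation_steps, num_train_epochs):
    """Estimate the effective batch size and total optimizer updates for one run."""
    effective_batch = train_batch_size * gradient_accumulation_steps
    batches_per_epoch = math.ceil(num_examples / train_batch_size)
    # One optimizer update happens every `gradient_accumulation_steps` batches.
    updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
    return effective_batch, updates_per_epoch * num_train_epochs

effective_batch, total_updates = optimizer_steps(
    num_examples=31500,  # training rows only; the validation rows would add to this
    train_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
)
print(effective_batch, total_updates)
```

With the settings above, gradients from two batches of 16 are accumulated before each update, so the model sees 32 examples per optimizer step.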
In [12]:
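# Note: "xlm-roberta-large" is an XLM-RoBERTa checkpoint, so loading it with the
# "roberta" model type produces a type-mismatch warning. simpletransformers also
# provides an "xlmroberta" model type that matches this checkpoint.
model = ClassificationModel("roberta", "xlm-roberta-large", args=custom_args,num_labels=4)
model.train_model(combined_data,)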
You are using a model of type xlm-roberta to instantiate a model of type roberta. This is not supported for all configurations of models and can yield errors.
Some weights of the model checkpoint at xlm-roberta-large were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py:602: UserWarning: Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels.
  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."
/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py:927: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
  model.parameters(), args.max_grad_norm
Exception ignored in: <generator object tqdm.__iter__ at 0x7f7367f73ed0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1193, in __iter__
    self.close()
  File "/usr/local/lib/python3.7/dist-packages/tqdm/notebook.py", line 283, in close
    self.disp(bar_style='danger', check_delay=False)
  File "/usr/local/lib/python3.7/dist-packages/tqdm/notebook.py", line 177, in display
    rtext.value = right
  File "/usr/local/lib/python3.7/dist-packages/traitlets/traitlets.py", line 604, in __set__
    self.set(obj, value)
  File "/usr/local/lib/python3.7/dist-packages/traitlets/traitlets.py", line 593, in set
    obj._notify_trait(self.name, old_value, new_value)
  File "/usr/local/lib/python3.7/dist-packages/traitlets/traitlets.py", line 1222, in _notify_trait
    type='change',
  File "/usr/local/lib/python3.7/dist-packages/ipywidgets/widgets/widget.py", line 605, in notify_change
    self.send_state(key=name)
  File "/usr/local/lib/python3.7/dist-packages/ipywidgets/widgets/widget.py", line 489, in send_state
    self._send(msg, buffers=buffers)
  File "/usr/local/lib/python3.7/dist-packages/ipywidgets/widgets/widget.py", line 737, in _send
    self.comm.send(data=msg, buffers=buffers)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/comm/comm.py", line 121, in send
    """Send a message to the frontend-side version of this comm"""
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/comm/comm.py", line 71, in _publish_msg
    buffers=buffers,
  File "/usr/local/lib/python3.7/dist-packages/jupyter_client/session.py", line 748, in send
    stream.send_multipart(to_send, copy=copy)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/iostream.py", line 262, in send_multipart
    return self.io_thread.send_multipart(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/iostream.py", line 212, in send_multipart
    self.schedule(lambda : self._really_send(*args, **kwargs))
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/iostream.py", line 203, in schedule
    self._event_pipe.send(b'')
  File "/usr/local/lib/python3.7/dist-packages/zmq/sugar/socket.py", line 505, in send
    return super(Socket, self).send(data, flags=flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 718, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 765, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 235, in zmq.backend.cython.socket._send_copy
  File "zmq/backend/cython/checkrc.pxd", line 13, in zmq.backend.cython.checkrc._check_rc
KeyboardInterrupt:
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-12-b82f9f4fc9ed> in <module>()
      1 model = ClassificationModel("roberta", "xlm-roberta-large", args=custom_args,num_labels=4)
----> 2 model.train_model(combined_data,)

/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py in train_model(self, train_df, multi_label, output_dir, show_running_loss, args, eval_df, verbose, **kwargs)
    626             eval_df=eval_df,
    627             verbose=verbose,
--> 628             **kwargs,
    629         )
    630

/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py in train(self, train_dataloader, output_dir, multi_label, show_running_loss, eval_df, verbose, **kwargs)
   1120
   1121             if args.save_model_every_epoch:
-> 1122                 self.save_model(output_dir_current, optimizer, scheduler, model=model)
   1123
   1124             if args.evaluate_during_training and args.evaluate_each_epoch:

/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py in save_model(self, output_dir, optimizer, scheduler, model, results)
   2252             if optimizer and scheduler and self.args.save_optimizer_and_scheduler:
   2253                 torch.save(
-> 2254                     optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt")
   2255                 )
   2256                 torch.save(

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    377         if _use_new_zipfile_serialization:
    378             with _open_zipfile_writer(opened_file) as opened_zipfile:
--> 379                 _save(obj, opened_zipfile, pickle_module, pickle_protocol)
    380                 return
    381         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _save(obj, zip_file, pickle_module, pickle_protocol)
    493         # this means to that to get tensors serialized, you need to implement
    494         # .cpu() on the underlying Storage
--> 495         if storage.device.type != 'cpu':
    496             storage = storage.cpu()
    497         # Now that it is on the CPU we can directly copy it into the zip file

KeyboardInterrupt:
In [13]:
predictions, raw_outputs = model.predict(list(test_df['text'].values))
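`model.predict` returns the predicted labels together with `raw_outputs`. Assuming those raw outputs are per-class logits (one row of four values per text), a softmax turns them into probabilities, which can help flag low-confidence predictions. The logits below are made up for illustration and are not real model output.

```python
import math

def softmax(logits):
    """Convert one row of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for two texts over the four labels.
example_raw_outputs = [[0.1, -1.2, 0.3, 2.5], [1.9, 0.2, -0.4, 0.0]]

probabilities = [softmax(row) for row in example_raw_outputs]
predicted = [max(range(len(p)), key=p.__getitem__) for p in probabilities]  # argmax per row
confidence = [max(p) for p in probabilities]
print(predicted, [round(c, 3) for c in confidence])
```

The argmax of the probabilities is the same as the argmax of the logits, so `predicted` should agree with the labels returned by `predict`.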
In [15]:
# Writing the predictions into the label column of the test dataframe
test_df['label'] = predictions
test_df
Out[15]:
 | id | text | label |
---|---|---|---|
0 | 0 | we propose a lightweight framework to detect i... | 3 |
1 | 1 | the proposed method presents an alternate solu... | 2 |
2 | 2 | proposed ear identification method fusing SIFT... | 3 |
3 | 3 | a method to reconstruct the three-dimensional ... | 3 |
4 | 4 | strong local consistencies can improve their p... | 0 |
... | ... | ... | ... |
10795 | 10795 | whole-body gradient echo scans of 240 subjects... | 3 |
10796 | 10796 | we present a tracker that accomplishes trackin... | 3 |
10797 | 10797 | the most popular FL algorithm is Federated Ave... | 1 |
10798 | 10798 | in the field of Autonomous Driving, the system... | 2 |
10799 | 10799 | our method takes as an input a foreground imag... | 3 |
10800 rows × 3 columns
In [19]:
!mkdir -p assets
# Saving the submission file in the assets directory
test_df.to_csv(os.path.join("assets", "submission.csv"), index=False)
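Before uploading, it can be worth sanity-checking the saved file. The sketch below is an assumption-laden example: the notebook does not state the required submission format, so the required columns (`id`, `label`) and the allowed label set (0 to 3) are inferred from the dataframes shown earlier, and `check_submission` is a hypothetical helper.

```python
import csv
import io

def check_submission(csv_text, allowed_labels=(0, 1, 2, 3)):
    """Return a list of problems found in a submission CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"id", "label"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    allowed = {str(label) for label in allowed_labels}
    problems = []
    seen_ids = set()
    for row_number, row in enumerate(reader, start=1):
        if row["id"] in seen_ids:
            problems.append(f"row {row_number}: duplicate id {row['id']}")
        seen_ids.add(row["id"])
        if row["label"] not in allowed:
            problems.append(f"row {row_number}: unexpected label {row['label']!r}")
    return problems

sample = "id,text,label\n0,first abstract,3\n1,second abstract,7\n"
print(check_submission(sample))
```

For the real file, the text could be read with `open(os.path.join("assets", "submission.csv")).read()` and passed to the same function.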