Article source | Gpushare Cloud community (an AI/deep learning cloud GPU training platform, official site: gpushare.com/)

Original post | [2: Data download + model training] Which boy doesn't want a model he trained himself? (crying poor tears)

Author | junyu

I used an RTX 3090 to train an ELECTRA-small model with rotary position embeddings added, on the OpenWebText dataset. Because the raw OpenWebText files are very large, this tutorial provides the dataset I have already processed.
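Rotary position embeddings encode token positions by rotating each pair of query/key dimensions by a position-dependent angle before the attention dot product. The function below is a minimal PyTorch sketch of that idea, not the exact code in the repository; the function name and tensor layout are my own for illustration.

import torch

def apply_rotary_position_embedding(x, base=10000):
    # x: query or key tensor of shape (batch, num_heads, seq_len, head_dim)
    seq_len, dim = x.shape[-2], x.shape[-1]
    # one rotation frequency per pair of dimensions
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.einsum("i,j->ij", positions, inv_freq)   # (seq_len, dim/2)
    sin = torch.repeat_interleave(angles.sin(), 2, dim=-1)  # (seq_len, dim)
    cos = torch.repeat_interleave(angles.cos(), 2, dim=-1)
    # pair-wise rotation: (x0, x1, ...) -> (x0*cos - x1*sin, x1*cos + x0*sin, ...)
    x_rot = torch.stack([-x[..., 1::2], x[..., 0::2]], dim=-1).reshape_as(x)
    return x * cos + x_rot * sin

# example: rotate a random query tensor (batch=2, heads=4, seq_len=128, head_dim=64)
q = torch.randn(2, 4, 128, 64)
q_rot = apply_rotary_position_embedding(q)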

1. Environment selection

  • PyTorch 1.8.1
  • Python 3.8
  • CUDA 11.1
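If you want to confirm the instance matches this environment, here is a quick sanity check (my own snippet, not part of the original tutorial):

import torch

print(torch.__version__)              # expect 1.8.1
print(torch.version.cuda)             # expect 11.1
print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # expect an RTX 3090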

2. Prepare the OpenWebText dataset

# log in to OSS
oss login
# download the openwebtext.tar.gz package from the public dataset path
oss cp oss://junyu-dataset/openwebtext.tar.gz /hy-tmp
# unpack the archive in the current directory
tar -zxvf openwebtext.tar.gz

3. Download the ELECTRA pre-training code and install its dependencies

# switch path
cd /hy-tmp
# the code has been uploaded to GitHub
git clone https://github.com/JunnYu/hy_tutorial.git
# if the download fails, use the mirror address instead
git clone https://hub.fastgit.org/JunnYu/hy_tutorial.git
# switch path
cd hy_tutorial
# unzip the files
unzip electra_small_pretrain_pytorch.zip
# switch path
cd electra_small_pretrain_pytorch
# install the Python dependencies
pip install -r requirements.txt

4. Register a Wandb account

(1) Open wandb.ai/site

(2) Click Settings under your profile picture to find the API key

Scroll down to the API keys section; if there is no key yet, click New key. Copy the API key.

(3) Login

# log in to wandb
wandb login
# paste the API key you copied above when prompted
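Once you are logged in, the training script can stream metrics to WandB. The snippet below is only a hedged sketch of typical wandb usage; the project name, run name, and logged metric are placeholders, not necessarily what pretrain.py actually uses.

import wandb

# placeholder project/run names, for illustration only
run = wandb.init(project="electra_small_pretrain", name="roformer_electra_small")
for step in range(100):
    loss = 1.0 / (step + 1)                    # stand-in for the real training loss
    wandb.log({"train/loss": loss}, step=step)
run.finish()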

5. Run the pre-training program in the background

# switch path; make sure the program is run from this directory
cd /hy-tmp/hy_tutorial/electra_small_pretrain_pytorch
# run the pre-training program in the background
nohup python pretrain.py >> electra_small_pretrain.log 2>&1 &
# check the training log
tail -f electra_small_pretrain.log

6. Terminate the pre-training program

# find the PID of the pretrain.py process
ps -aux
# kill that process (2983 is just an example PID)
kill 2983

7. Wait for the program to finish running (about 55h)

# after training finishes, sync the local wandb run to the server
wandb sync wandb/latest-run

8. Dev dataset

9. Training details

  • Training batch_size: 256
  • Learning rate (lr): 5e-4
  • Maximum sentence length (max_seqlen): 128
  • Total training steps: 500,000 (50W)
  • GPU: RTX 3090
  • Total training time: about 55 h
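For reference, here are the hyperparameters above collected into a config dict, plus the rough number of tokens the run sees; the key names are mine, and pretrain.py may spell them differently.

# values taken from the list above; key names are illustrative
config = {
    "train_batch_size": 256,
    "learning_rate": 5e-4,
    "max_seq_length": 128,
    "total_train_steps": 500_000,   # "50W" = 500,000 steps
}

# tokens processed over the whole run (upper bound, ignoring padding):
# 256 sequences/step * 128 tokens/sequence * 500,000 steps ≈ 16.4 billion tokens
tokens_seen = config["train_batch_size"] * config["max_seq_length"] * config["total_train_steps"]
print(tokens_seen)  # 16384000000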

ROFORMER_ELECTRA's WandB logs

  • Pre-training log
  • GLUE fine-tuning log

huggingface.co/junnyu/elec…
huggingface.co/junnyu/rofo…