From 2df9d8047286b64cbd828c4693e5606690999541 Mon Sep 17 00:00:00 2001
From: Remi Hellequin <hellequir@fusion.ib0.ice.centralesupelec.fr>
Date: Fri, 21 Feb 2020 16:32:02 +0100
Subject: [PATCH] Update README. Change default q in PBS scripts. Change temp files name in gitignore.

---
 .gitignore              | 2 +-
 README.md               | 9 +++++++--
 dual_gpu_training.pbs   | 3 ++-
 single_gpu_training.pbs | 3 ++-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/.gitignore b/.gitignore
index d9aca80..980cbee 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,5 @@
 transformers
-debug_squad
 runs
+debug_squad_*
 cached_*
 squad_train_*
diff --git a/README.md b/README.md
index 094b6bf..6fdd7a4 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,7 @@ fusion-output -f <jobid> # watch the job logs during
 
 
 Fusion supercomputer documentation : https://mesocentre.pages.centralesupelec.fr/user_doc/
+Transformers on github : https://github.com/huggingface/transformers
 The documentation for `run_squad.py` can be found here : https://huggingface.co/transformers/examples.html#squad
 
 ## Configure environment
@@ -39,13 +40,17 @@ Run the network training
 ```bash
 qsub <pbs_script>.pbs
 ```
-> Some temporary data is written in directory `--output_dir` (`./debug_squad/`). You may have to clean the directory manually before relaunching the training `rm -r ./debug_squad/`
 
-Two training examples :
+Two training examples are provided :
 
 - `single_gpu_training.pbs` : train the network on a single GPUs
 - `dual_gpu_training.pbs` : train the network on a two GPUs
 
+Notes :
+
+- Some temporary data is written in directory `--output_dir` (`./debug_squad/`). You may have to clean the directory manually before relaunching the training `rm -r ./debug_squad/`
+- During the TP sessions, you can use the reservation `isiaq` instead of the `gpuq` by commenting/decommenting lines beginning with `#PBS -q`)
+
 ## Misc notes
 
 ### Squad dataset location
diff --git a/dual_gpu_training.pbs b/dual_gpu_training.pbs
index 74dc12a..b968b57 100644
--- a/dual_gpu_training.pbs
+++ b/dual_gpu_training.pbs
@@ -4,7 +4,8 @@
 #PBS -l walltime=02:00:00
 #PBS -l select=1:ncpus=24:ngpus=2:mem=20gb
 #PBS -q gpuq
-#PBS -P test
+##PBS -q isiaq
+#PBS -P isia
 
 # Go to the current directory
 cd $PBS_O_WORKDIR
diff --git a/single_gpu_training.pbs b/single_gpu_training.pbs
index 1c8ee2e..460d289 100644
--- a/single_gpu_training.pbs
+++ b/single_gpu_training.pbs
@@ -4,7 +4,8 @@
 #PBS -l walltime=02:00:00
 #PBS -l select=1:ncpus=12:ngpus=1:mem=20gb
 #PBS -q gpuq
-#PBS -P test
+##PBS -q isiaq
+#PBS -P isia
 
 # Go to the current directory
 cd $PBS_O_WORKDIR
-- 
GitLab