Getting started in Multispeech

The aim of this document is to provide some practical and technical information for new members of the Multispeech team. In addition to the information provided here, you can consult the dedicated Q&A platform to ask your diverse questions, or to help others by answering their questions, if possible.

Also, check different channels of Multispeech on Mattermost. In this platform, you can directly communicate with others, ask/answer questions, share your interesting stuffs, etc.

For more detailed information on the team and Inria/Loria life, we highly recommend you to explore Welcome to Multispeech! documentation on GitLab, especially this section.

Content

IT infrastructure

Professional Email
Access from outside Loria
WiFi
Git
Shared disk space

Computational cluster

Setting up Conda
Reserving nodes on Grid5k
File transfer
Job monitoring
Datasets
Setting up Jupyter notebook
Visual Studio Code (VSC) for remote code editing and notebooks running
Other Grid5000 resources

IT infrastructure

The IT Portal is here. For more details, you can also visit this website.

Professional Email

You can access your Inria email account following this website.

Remember to always use your Inria email instead of a personal one.

Access from outside Loria

By ssh (for data) following this website (cf. also the example SSH config file available here).
By VPN (for websites, downloading papers, etc.) following this website.

WiFi

When you are outside your office but still at Loria, you can follow these instructions to connect to the WiFi.

Git

You should use Inria GitLab to host/share your codes. You can create a new group associated with your project. This repo will host all the codes (Python and LaTeX) that you will develop (but not data or PDFs, which are not meant to be stored in a Git repo).

Shared disk space

On the lab’s network:
/local/multispeech_ge/corpus for original corpora.
/local/multispeech_ge/calcul for intermediate/final computation results.
/local/multispeech_ge/congres for conference proceedings.
These three disk spaces should be mounted on your laptop/desktop.
On Grid’5000 group storage:
/srv/storage/talc@storage4.nancy.grid5000.fr/
/srv/storage/talc2@talc-data.nancy/
/srv/storage/talc3@storage4.nancy.grid5000.fr/

The three disk spaces are structured similarly to ⁣/local/multispeech_ge but there is much more space available.

Create your own folder in one of the multispeech/calcul/users/ directories in order to store the intermediate/final results of your experiments. See the following section for more details.
Remember to always monitor the stored data and remove your useless data.

Computational cluster

We do our experiments on Grid’5000, which is a large-scale and flexible testbed for experiment-driven research, with powerful GPU and CPU machines. Here, we provide a step-by-step guide to get started with Grid’5000.

Please also check this useful doc if you need more detailed information.

Create an account here (enter Multispeech for “Group Granting Access”).
Set-up SSH key pairs. See here on how to do that.
Paste the public key’s content (run cat ~/.ssh/id_rsa.pub) in your account space on Grid5000 (manage account -> SSH keys).
Create a config file for easy connecting (see the example SSH config file available here) and put it in ~/.ssh; you’ll also need to locate your generated public and private keys there.
Simply connect to the Nancy workstation by: ssh nancy.g5k.

Setting up Conda

Once you are done with the previous set-up, you can create a folder with your username at /srv/storage/talc3@storage4.nancy.grid5000.fr/multispeech/calcul/users. Then, follow the below instructions, while being connected to the cluster, to set up the conda environment for running your codes:

Install anaconda by first downloading an appropriate version, e.g.:
wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh.
Then, install it using bash Anaconda3-2020.11-Linux-x86_64.sh.
Activate the conda environment: source ~/.bashrc.
Create a virtual environment using conda: conda create --name venv. See this website for more details.
Activate the virtual environment: conda activate venv.
Install PyTorch. You can follow the official PyTorch’s website for the installation details.

To know the CUDA version of the GPU machine that you are going to use for model training, you can simply run nvidia-smi in the terminal connected to that cluster. You can also see hardware properties of different clusters available on Grid`5000.

Important- It might happen that PyTorch is not installed with CUDA enabled. To resolve this issue, first make sure that Torch is uninstalled. You can uninstall it with pip uninstall torch, and then reinstall it using Wheel, e.g.:
pip install torch==1.12.1+cu102 torchvision==0.13.1+cu102 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu102
Then, if you get the following outputs, it means that PyTorch has been successfully compiled with CUDA enabled:

>>> import torch
>>> torch.__version__
'1.12.1+cu102'
>>> torch.zeros(1).cuda()
tensor([0.], device='cuda:0')
>>> torch.cuda.is_available()
True

Reserving nodes on Grid5k

Interactive mode (for debugging purposes). Examples:
oarsub -l gpu=1,walltime=01:00:00 -I -q production
or
oarsub -t exotic -p "cluster='grouille'" -l gpu=1 host=2,walltime=01:00:00 -I
Passive mode (for running your debugged codes). Examples:
oarsub -p "cluster='graffiti'" -l nodes=1,gpu=1,walltime=03:00:00 "bash run_oar_activate.sh" -q production
or
oarsub -p "cluster='graffiti'" -l nodes=1,gpu=1,walltime=03:00:00 "bash run_oar_activate.sh" -q production -O OUT/oar_job.%jobid%.output -E OUT/oar_job.%jobid%.error

See here for more details on how to book a node. You can choose different clusters and wall time depending on your needs. See the list of available clusters here.

You can also run your Python code from inside a bash script in passive mode:
oarsub -S ./train.sh -q production -p "cluster='graffiti'" -l nodes=1,walltime=30:00:00 -O OUT/oar_job.%jobid%.output -E OUT/oar_job.%jobid%.error
where, train.sh looks like this:

#!/bin/bash

# This is to activate a python environment with conda
source ~/.bashrc
conda activate venv

# Run the training script
python train.py

File transfer

You can easily transfer files between your local machine and Grid’5000 using rsync:

rsync -avzP /path/to/fileOrFolder/ username@nancy.g5k:/srv/storage/talc3@storage4.nancy.grid5000.fr/multispeech/calcul/users/username

You can reverse the two paths for transferring from Grid5k to your computer.

Job monitoring

You can see the real-time status of Grid’5000 following this link.

You can also monitor the status of your launched jobs by typing oarstat | grep username in the command line.

To delete your job, simply run oardel job_id, where job_id is the ID assigned to your job.

To connect to your running job, run oarsub -C job_id.

You can run watch -n 1 'nvidia-smi' to see the status of the GPU machine over time.

Datasets

You can find several datasets (corpora) on Grid5000 in the following directories:

/srv/storage/talc@storage4.nancy.grid5000.fr/multispeech/corpus
/srv/storage/talc3@storage4.nancy.grid5000.fr/multispeech/corpus

Setting up Jupyter notebook

When you launch a job in the interactive mode, e.g., for debugging your codes, it’s more comfortable to have a visual user interface. This can be achieved by Jupyter notebook, for which you will first need reserve a node on Grid5000 (cf. Reserving nodes on Grid5k) and then install Jupyter lab in your virtual environment by pip install jupyterlab.

Once done, follow these steps (you can also connect to your running job in the passive mode (cf. Job monitoring), and launch a Jupyter notebook):

Activate your conda env, and then run this: jupyter lab --ip 0.0.0.0:

>> source ~/.bashrc
>> conda activate venv
>> jupyter lab --ip 0.0.0.0

This will give you a URL link.

On your local machine run ssh g5k -L 8888:grele-10.nancy.grid5000.fr:8888 -v, where grele-10 is the node’s name that you have booked. Also, make sure that the port 8888 is the same as the one given in the output of the previous step.
Paste the URL link given in Step 1 in your browser.

Important- When your computer gets disconnected from the Internet, which could happen, e.g., when your computer goes into sleep, you will lose the interactive session and have to redo all the above steps, which is annoying. To avoid this, you can use the GNU screen, to set up a protected screen session.

In order to do that, once you connect to Grid’5000, e.g., following ssh username@nancy.g5k, run a screen session by typing screen, and then launch your interactive job, e.g., oarsub -l gpu=1,walltime=00:30:00 -I -q production. You can then follow Step 1 above. Once you get the URL link, you need to press CTRL+a, d to detach from your screen session, and then follow Steps 2-3 above.

To attach to the screen session, type screen -d -r. If you have multiple screen sessions running, you might have to give the ID of the session that you’re going to attach to. More information on screen sessions.

Visual Studio Code (VSC) for remote code editing and notebooks running

Prepared by Antoine Bruez.

It can be really helpful to use your favorite IDE for editing remote source codes and run Jupyter notebooks. This is made possible by configuring VSC (Visual Studio Code) for SSH connections and running notebooks with g5h-hosted kernels. Additional help can be found on the official website.

Summary

Prerequisites
Setup SSH VSCode
Run jupyter notebooks in VSC using remote kernels (hosted in g5k)

Prerequisites

Have VSC installed on your device.
Have access to G5K.
Having setup your ssh config file in your local /home/.ssh.
Have jupyterlab installed in a conda venv.

Setup SSH VSC

Open VSC in your local device.
- Install the Remote Explorer extension.
- Ctrl + Shift + X to access extension manager.
- Search for Remote Explorer and install it.
- It should appear on the left bar.
Connect to the remote server.
- Ctrl + Shift + P to open settings search bar.
- Look for Remote-SSH: Connect to host.
- Add a new host. Typically ssh -J g5k_user@access.grid5000.fr g5k_user@nancy.
- Then, Remote-SSH: Connect to host and connect to ⬆️ !
Work
- You can now open a Terminal from VSC, you are then logged into g5k_user@nancy !
- To open a remote folder and work remotely : Ctrl + K then Ctrl + O then browse the remote location you want to work in.
- If git client is installed and configured (remotely), you should be able to use the remote git client from vscode (left bar again).
- Every saved modification is applied on remote files.
Additional info
- A VSC server is launched in the remote device to comunicate with your local client (you can see it with htop on remote device).

Runing jupyter notebooks in VSC using remote kernels (hosted on G5K)

When dealing with a Jupyter notebook, one could be interested in editing it in VSC while running it on a G5K node. Let’s see how to do so.

Book a node on G5k:

g5k_user@fnancy: oarsub -I -q production -p cluster='grue' -l gpu=1,walltime=01:00:00

grk_user@grue$ source ~/.bashrc # To use your shortcuts (path)

grk_user@grue$ conda activate myenv

(myenv) grk_user@grue$ python -m jupyterlab --ip 0.0.0.0 # Launch the server

You should get:

[I 2023-05-02 11:36:00.469 ServerApp] Serving notebooks from local directory: yourpath

1 active kernel

Jupyter Server 2.5.0 is running at:

http://grue-X.nancy.grid5000.fr:8888/lab?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

http://127.0.0.1:8888/lab?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Copy the address & token http://grue-X.nancy.grid5000.fr:8888/lab?xxxxxx[...].

Open a .ipynb file in the remote VSC (the remote connection we have just setup, connected to Nancy).

💡 You may install some extensions like Python and Jupyter.
Select a kernel (right up button) > existing jupyterlab servers.
- Add the remote address & token that you just copied http://grue-2.nancy.grid5000.fr:8888/lab?xxxxxx[...].
- [Optional] Give it a name.
- Select this new remote kernel as the kernel you want to use.
Try running the following cell:
```
import torch
torch.cuda.is_available()
```
This should return True (if GPU is enabled in your node !).