Tensorflow is an open source software library used for numerical computation in machine learning, now at version 2.x.
It is under constant development, which means that its requirements in terms of pre-installed libraries change quite often.
However, compared to the previous version, it is easier to install on the current setup of the DCC cluster, and users are encouraged to do it in their home directory.
Here we provide some instructions, but because the requirements keep changing, please also check the official tensorflow website.
Installation
We recommend installing tensorflow in a virtual environment, and this is the installation procedure covered here.
From version 2.1 to version 2.13, the installation provides both the CPU and the GPU versions of tensorflow at the same time. This is not true for the previous versions. Therefore, the installation instructions are the same in both cases, while the usage instructions differ.
The three-step installation is as follows:
- Choose a compatible python version (3.5-3.10, depending on the tensorflow version), and load the corresponding module
- Create a virtual environment, and activate it
- In the virtual environment, run:
python3 -m pip install tensorflow
That’s all. Both the CPU and GPU versions are installed.
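As a quick sanity check after the installation, the following minimal sketch reports whether the interpreter is running inside a virtual environment and whether the tensorflow package can be found, without importing it. The function names are hypothetical, and the environment test assumes an environment created with venv or virtualenv.

```python
import importlib.util
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def package_is_installed(name: str) -> bool:
    """Return True when the import system can locate `name` (nothing is imported)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("tensorflow installed:", package_is_installed("tensorflow"))
```

If the second line reports False, the most likely cause is that pip was run outside the activated environment.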
For versions 2.14+, the base installation covers both the CPU and GPU versions, but there is an extra option to also download the CUDA and cuDNN libraries. See the Using tensorflow section for specific instructions.
Using tensorflow
CPU
Just open a terminal, activate the virtual environment, and start using tensorflow. All the dependencies are automatically satisfied.
Remember that if you run the program in batch mode, you need to activate the environment in the job-script to make tensorflow available.
GPU
We recommend using the nodes equipped with recent NVIDIA GPUs for better performance.
Tensorflow 2 depends on CUDA, cudnn and tensorRT (if you want to take advantage of this additional acceleration method in the inference phase).
All three of those dependencies are already available as modules.
Tensorflow 2.1, 2.2, 2.3
Tensorflow versions 2.1, 2.2 and 2.3 require CUDA 10.1, cudnn >= 7.6, and tensorRT 6.x.
Therefore, to be able to run tensorflow 2 on GPUs, you need to:
- Access a node with GPUs
- Activate the tensorflow virtual environment
- Load the following modules
module load tensorrt/6.0.1.5-cuda-10.1
module load cudnn/v7.6.5.32-prod-cuda-10.1
NOTE: the CUDA 10.1 module is automatically loaded by the previous two modules
Tensorflow 2.4
Tensorflow version 2.4 requires CUDA 11.0, cudnn >=8.0.4, and tensorRT 7.x.
Therefore, to be able to run tensorflow 2 on GPUs, you need to:
- Access a node with GPUs
- Activate the tensorflow virtual environment
- Load the following modules
module load tensorrt/7.2.1.6-cuda-11.0
module load cudnn/v8.0.5.39-prod-cuda-11.0
NOTE: the CUDA 11.0 module is automatically loaded by the previous two modules
Tensorflow 2.5-2.11
Tensorflow versions 2.5 to 2.11 require at least CUDA 11.1 and cudnn ≥8.1.
Therefore, to be able to run tensorflow 2 on GPUs, you need to:
- Access a node with GPUs
- Activate the tensorflow virtual environment
- Load the following modules
module load cuda/11.6
module load cudnn/v8.3.2.44-prod-cuda-11.X
module load tensorrt/7.2.3.4-cuda-11.X
You can choose a different cuda version ≥11.1.
Tensorflow 2.12/2.13
Tensorflow versions 2.12 and 2.13 recommend at least CUDA 11.8 and cudnn ≥8.6.0. They require at least python 3.10.
Therefore, to be able to run tensorflow 2.12/2.13 on GPUs, you need to:
- Access a node with GPUs
- Activate the tensorflow virtual environment
- Load the following modules
module load cuda/11.8
module load cudnn/v8.6.0.163-prod-cuda-11.X
module load tensorrt/8.6.1.6-cuda-11.X
That’s all. As in the CPU case, if you need to run tensorflow on GPUs as batch jobs, you need to activate the virtual environment and load the corresponding modules before the actual call to tensorflow.
Tensorflow 2.14-2.18
Since version 2.14, tensorflow provides two ways to install the program:
- python3 -m pip install tensorflow
This does not install the CUDA dependencies.
- python3 -m pip install tensorflow[and-cuda]
This installs tensorflow and also all the CUDA dependencies.
The installation with the CUDA libraries does not require any extra system module, but it takes up a lot more space.
If you want to use the system CUDA libraries instead, to run these versions of tensorflow on GPUs you need to:
- Access a node with GPUs
- Activate the tensorflow virtual environment
- Load the following modules:
For tensorflow 2.14:
module load cuda/11.8
module load cudnn/v8.6.0.163-prod-cuda-11.X
module load tensorrt/8.5.3.1-cuda-11.X
For tensorflow 2.15/2.16:
module load cuda/12.3.2
module load cudnn/v8.9.1.23-prod-cuda-12.X
module load tensorrt/8.6.1.6-cuda-12.X
For tensorflow 2.17:
module load cuda/12.3.2
module load cudnn/v8.9.7.29-prod-cuda-12.X
module load tensorrt/8.6.1.6-cuda-12.X
For tensorflow 2.18/2.19:
module load cuda/12.3.2
module load cudnn/v9.3.0.75-prod-cuda-12.X
As in the CPU case, if you need to run tensorflow on GPUs as batch jobs, you need to activate the virtual environment and load the corresponding modules before the actual call to tensorflow.
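Once the environment is activated and the modules are loaded, it can be useful to confirm that tensorflow actually sees the GPUs before starting a long run. The following is a minimal sketch rather than an official procedure: the function name `visible_gpus` is hypothetical, and it relies only on `tf.config.list_physical_devices`, which is part of the tensorflow 2 API.

```python
def visible_gpus():
    """Return the names of the GPUs tensorflow can see, or None if tensorflow is missing."""
    try:
        # Imported lazily so that the script fails gracefully when the
        # virtual environment has not been activated.
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("tensorflow is not installed in this environment")
    elif not gpus:
        print("tensorflow is installed, but no GPU is visible")
    else:
        print("visible GPUs:", gpus)
```

An empty list on a GPU node usually suggests that the CUDA or cudnn modules loaded do not match the installed tensorflow version.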
Known issues
- If during the execution you encounter an error similar to
libdevice not found at ./libdevice.10.bc
please type
export XLA_FLAGS="--xla_gpu_cuda_data_dir=$CUDA_ROOT"
in the terminal before running the python script, or put the line in the jobscript before calling python.
- For tensorflow 2.14/2.15 it can happen that you get an error like
E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin
Please check, but in our experience the libraries are used despite the error message.
- By default tensorflow allocates all the memory of all the reserved GPUs, even if it does not need it. To prevent this from happening, add this line in the jobscript:
export TF_FORCE_GPU_ALLOW_GROWTH=true
This way you can then see how much GPU memory the program actually needs.
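The two environment variables mentioned in this list can also be set from inside the python script instead of the jobscript. The sketch below is one possible way to do it: the helper name `configure_tensorflow_environment` is hypothetical, and `CUDA_ROOT` is assumed to be defined by the loaded cuda module, as in the export line above. Setting the variables before tensorflow is imported is the safe choice.

```python
import os


def configure_tensorflow_environment(environ=os.environ):
    """Apply the two workarounds from the list above to `environ` and return it."""
    # Let tensorflow allocate GPU memory on demand instead of all at once.
    # setdefault keeps any value already chosen in the jobscript.
    environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")

    # Point XLA at the CUDA installation, but only when CUDA_ROOT is defined
    # (assumption: the cuda module sets it) and XLA_FLAGS is not already set.
    cuda_root = environ.get("CUDA_ROOT")
    if cuda_root and "XLA_FLAGS" not in environ:
        environ["XLA_FLAGS"] = f"--xla_gpu_cuda_data_dir={cuda_root}"
    return environ


if __name__ == "__main__":
    configure_tensorflow_environment()
    # import tensorflow as tf  # import only after the variables are set
    print("TF_FORCE_GPU_ALLOW_GROWTH =", os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
    print("XLA_FLAGS =", os.environ.get("XLA_FLAGS", "(not set)"))
```

Passing a plain dictionary instead of `os.environ` makes it possible to check the resulting values without touching the real environment.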