12.7 Commands
The following commands, all optional, are available for the HPC solution scheme.
If an NVIDIA GPU is available, it may be selected using the Hardware command. If GPU hardware is specified but the system cannot find an NVIDIA GPU, ERROR 3005 will result. The memory required for the TUFLOW model is compared against the free memory available on the GPU; if the available memory is insufficient to run the model, ERROR 3017 will result. For models that only just fit within the available device memory by a small margin, a memory allocation may still fail because the required memory cannot be found as a contiguous block, in which case ERROR 3018 will be reported during model initialisation.
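For example, a minimal TCF entry selecting GPU hardware might look like the sketch below (the “!” marker denotes a TCF comment):

Hardware == GPU    ! run the HPC 2D solver on an NVIDIA GPU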
The default setting of “CHECK” causes TUFLOW to report CHECK 2420 if the HPC solution scheme is used with the double precision executable. Prior to the 2025 release, the default setting of this command was “ERROR” and TUFLOW would stop with ERROR 2420 (advising that the single precision binary is recommended). Because of its explicit finite volume formulation and depth-based solution, the HPC 2D scheme does not generally require double precision (DP). Single precision (SP) can also give substantial speed gains on some GPU cards and has a significantly smaller memory footprint. However, 1D-2D linked models that also use the ESTRY 1D engine may still require double precision in situations where the model is at higher elevations. Additionally, applications such as long-term simulations and/or simulations with groundwater may require double precision for an accurate result. The “OFF” option allows users to run the HPC solution scheme in double precision without a CHECK or ERROR message being produced.
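As an illustrative sketch only: the command name below is hypothetical (the heading naming this command is not reproduced here), but the values “CHECK”, “ERROR” and “OFF” behave as described above:

HPC Double Precision Check == OFF    ! hypothetical command name; suppresses the precision CHECK/ERROR message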
For computers with more than one GPU, the NVIDIA GPU driver searches all connected GPUs and compiles an enumerated list of NVIDIA GPUs, numbered 0 to n-1, where n is the number of attached NVIDIA GPUs. This command may be used to:
- Run a model on a particular GPU, e.g. “GPU Device IDs == 1” will run the model on the second NVIDIA GPU in the list of available NVIDIA GPUs.
- Distribute a model over two or more GPUs, e.g. “GPU Device IDs == 0 1” will run the model spread over the first two NVIDIA GPUs in the list. Note that the GPUs do not have to be consecutive and the device IDs can appear in any order.
- If you only have one GPU device, or you wish to use the primary device, this command is not needed.
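For example, the two usages above appear in the TCF as:

GPU Device IDs == 1      ! run on the second NVIDIA GPU in the enumerated list
GPU Device IDs == 0 1    ! or, distribute the model across the first two NVIDIA GPUs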
If desired, the selection of GPU Device IDs can be specified on the command line when running the executable:
- To select a specific GPU, specify “-puN” where N is the device ID.
- To select multiple GPUs for a distributed run, supply a “-puN” argument for each device ID required, for example -pu0 -pu1.
- The “-pu” arguments will automatically override any GPU Device IDs specified in the tcf.
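For example, a command line such as the following sketch runs a model across devices 0 and 1 (the executable name is illustrative; substitute the TUFLOW build being run):

TUFLOW_iSP_w64.exe -pu0 -pu1 model.tcf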
Also note:
- If the list of device IDs is longer than the number of available devices, then the list will be truncated to the number of available devices and WARNING 2784 issued.
- If a device ID is specified that is outside of the range 0…n-1 then ERROR 3005 will result.
- If a device ID is specified that already has a model running on it, the requested GPU will be loaded with the additional model, causing both the existing model and the new model to solve more slowly.
- A GPU module licence is required for each GPU.
- A CPU thread is created for managing the compute stream for each GPU device.
- Available memory checks are performed on all GPUs in the list.
- If the Hardware target is CPU, this command is ignored.
These two commands are identical and control the number of CPU threads used by HPC (including Quadtree) when solving on CPU instead of GPU. For example, “CPU Threads == 6” runs the HPC 2D solver across 6 CPU cores. The number of threads may also be specified as a command line argument -nt[number_of_threads]. Using the command line argument will override any definition in the tcf file.
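For example, to run the HPC CPU solver with six threads, either of the following can be used (the executable name is illustrative):

CPU Threads == 6                     ! in the TCF
TUFLOW_iSP_w64.exe -nt6 model.tcf    ! or on the command line, overriding the TCF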
Notes:
- TUFLOW licences have four times the number of TUFLOW engine licences available as HPC “thread” licences. For example, a local or network licence with 4 engine licences has 16 thread licences available.
- The default number of threads that HPC will use for the CPU solver run is 4.
- The maximum number of threads possible is the lesser of the maximum number of cores available on the machine and the number of available TUFLOW thread licences.
- Pre-processing of SGS elevations can be computationally intensive, as can the compression of TIF output files. If the number of threads has not been specified, the number of threads used for these tasks will default to the maximum number of cores available, without requiring any additional thread licences.
When a model is spread over two or more GPUs (or CPU threads), the model is partitioned into vertical ribbons and each device solves its own ribbon, with boundary data synchronised between the devices at each timestep. By default the ribbons are of equal width; however, the computational burden for each GPU is rarely uniform, as each ribbon has a different active cell or wet cell count. This command varies the ribbon sizes in accordance with the load factors in the list, and can improve overall solution time for unbalanced models (see the example after the notes below). Also, for models that require nearly all of the available GPU memory on systems with GPUs of differing memory, this command can be used to apportion ribbon sizes accordingly. Additional notes:
- Load factors are mapped respectively to the devices in the same order in which they appear in the list of device IDs.
- Load factors are normalised after reading, if required, so their average is one. For example, factors of 3 and 1 are normalised to 1.5 and 0.5.
- Upon completion of the model, TUFLOW will report the approximate computational load split across the devices, and offer a suggested list of load factors that may improve the workload balance.
- When running on CPU, this command adjusts the ribbon size for each thread.
- For a Quadtree model, the decomposition is not performed in ribbons; instead, the list of cells is partitioned.
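As a sketch of usage (the load-factor command name below is hypothetical, as the heading naming this command is not reproduced here; the factors map to devices in the order of the device ID list):

GPU Device IDs == 0 1          ! two devices
HPC Load Factors == 1.2 0.8    ! hypothetical command name; device 0 is given a proportionally wider ribbon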
Some models of NVIDIA GPUs allow a direct communications link between them, either via a specific hardware connector or via the PCI bus controller. This is known as ‘peer-to-peer’ access. TUFLOW HPC will by default enable peer-to-peer access if the driver reports that it is available. This command may be used to specifically disable peer-to-peer communications if desired.
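A sketch of disabling this behaviour (the command name is hypothetical, as the heading naming this command is not reproduced here):

GPU Peer to Peer == OFF    ! hypothetical command name; disables direct GPU-to-GPU communication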