13.7 Reproducibility of Results
A key concern in hydraulic modelling is whether results can be replicated across different hardware. The following sections describe the factors that determine whether repeat runs of a model produce identical results.
13.7.1 TUFLOW Classic (CPU only)
The TUFLOW Classic engine is written in Fortran and compiled for Windows™ with Intel™’s Fortran compiler, which produces well-optimised CPU code. The computational algorithm is implicit in nature and difficult to parallelise for multi-threaded execution. As a result, race conditions (where the outcome depends on the timing of parallel threads) do not arise, and a repeat run of a model with the same inputs, using the same executable on the same type of CPU, should yield bit-wise identical results (i.e. subtracting one set of results from the other should give exactly zero). If repeat model runs (same executable, same CPU) do not yield identical results, please contact support@tuflow.com, as this may indicate a memory access error or an uninitialised variable in the code.
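A bit-wise comparison of two repeat runs can be scripted once the results have been exported as plain lists of floating-point values (for example, water levels at the same cells and output time). The sketch below assumes that export step has already been done; the function names `float_bits` and `compare_runs` are hypothetical and are not part of TUFLOW.

```python
import struct

def float_bits(x):
    """Return the raw IEEE 754 double-precision bit pattern of x as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def compare_runs(run_a, run_b):
    """Compare two result arrays (e.g. water levels from repeat runs).

    Returns (max_abs_difference, bitwise_identical). A maximum difference of
    exactly 0.0 is the 'subtraction should be exactly zero' test; the bit
    comparison is stricter and also distinguishes -0.0 from 0.0.
    """
    if len(run_a) != len(run_b):
        raise ValueError("result arrays differ in length")
    max_diff = max((abs(a - b) for a, b in zip(run_a, run_b)), default=0.0)
    identical = all(float_bits(a) == float_bits(b) for a, b in zip(run_a, run_b))
    return max_diff, identical

run_1 = [10.125, 10.250, 10.375]
run_2 = [10.125, 10.250, 10.375]
run_3 = [10.125, 10.250, 10.375 + 1e-12]  # differs far below 1 mm

print(compare_runs(run_1, run_2))     # (0.0, True)
print(compare_runs(run_1, run_3)[1])  # False
```

Comparing bit patterns rather than applying a tolerance is deliberate here: the claim being tested is bit-wise identity, so any non-zero difference, however small, should be reported.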
A repeat model run with the same executable but on a different type of CPU (e.g. Intel Xeon vs i9, AMD Epyc vs Ryzen, or Intel vs AMD) may produce very slight differences in results, because the CPUs support different instruction set extensions, which may or may not be used. Such differences are typically less than 1 mm in water surface elevation but, in some locations, they can become accentuated by changes in overtopping or by an operational structure crossing its trigger threshold.
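The way a sub-millimetre difference can be accentuated at a threshold is easiest to see with a simple on/off rule. The sketch below uses a hypothetical gate that passes a fixed flow once the upstream level reaches a trigger level; the function name, trigger level and flow are illustrative assumptions, not TUFLOW operational structure syntax.

```python
def gate_flow(upstream_level, trigger_level=12.0, open_flow=5.0):
    """Hypothetical operational gate: passes open_flow (m3/s) once the
    upstream water level (m) reaches trigger_level, otherwise stays shut."""
    return open_flow if upstream_level >= trigger_level else 0.0

# Two runs whose upstream levels differ by only about 2e-9 m (far below 1 mm)
# but which land on opposite sides of the trigger level.
level_run_a = 12.0 - 1e-9
level_run_b = 12.0 + 1e-9

print(gate_flow(level_run_a), gate_flow(level_run_b))  # 0.0 5.0
```

Once the gate has opened in one run but not in the other, the downstream water levels in the two runs would diverge, even though the difference that triggered the divergence was far smaller than 1 mm.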
Although the TUFLOW Classic source code currently sees little development, the Intel Fortran compiler used to build it is updated over time, so different releases of TUFLOW can be expected to show minor differences even on the same CPU.
13.7.2 TUFLOW HPC (incl. Quadtree) on CPU
The TUFLOW HPC engine (incl. Quadtree) is written in C++ with NVIDIA™ CUDA GPU kernels. The kernels have been carefully written so that they can be compiled for both GPU and CPU execution. When compiled for CPU execution, the Microsoft Visual Studio compiler is used and the code is built into the dynamic link libraries (DLLs) in the executable bundle. As with TUFLOW Classic, repeat runs of the same model (same TUFLOW executable, same CPU type) should produce bit-wise identical results. Again, if differences are observed, please contact support@tuflow.com, as this may indicate an issue that needs to be addressed.
As the TUFLOW HPC and Quadtree solvers are fundamentally different from TUFLOW Classic, there will always be minor differences between solutions from the different schemes, even when using the same turbulence model and without sub-grid sampling (SGS). As TUFLOW HPC and Quadtree both default to the Wu turbulence model, while Classic can only use the Smagorinsky model, the differences are larger when the models are run with their default settings.
13.7.3 TUFLOW HPC (incl. Quadtree) on GPU
When the CUDA kernels are compiled for GPU execution, a GPU-agnostic intermediate code file (PTX) is produced; the final compilation of this file is performed just-in-time by the GPU driver on the machine running the executable. The results may therefore depend not only on the version of the TUFLOW executable and the type of CPU (since pre-processing of the model inputs is still performed on the CPU), but also on the type of GPU and the version of the installed GPU driver. GPUs have thousands of cores working in parallel, which can introduce race conditions and make the result of a floating-point sum depend on the order in which threads complete. However, the kernels have been carefully written to avoid race conditions, and the adaptive timestep control does not rely on variables summed using atomic additions of floating-point data. Provided these factors (TUFLOW version, CPU type, GPU type and GPU driver version) are kept constant, a repeat model run will yield bit-wise identical results.
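Why atomic additions of floating-point data threaten reproducibility can be shown without a GPU: floating-point addition is not associative, so the same values summed in a different order can give a different result. The Python sketch below is a generic illustration, not TUFLOW code; each permutation stands in for one possible order in which parallel threads might complete their atomic additions to a shared total.

```python
import itertools
import math

# Contributions that, in a parallel code, might be accumulated by several
# threads into one shared total (e.g. via atomic additions).
contributions = [1e16, 1.0, -1e16, 1.0]

def sequential_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for v in values:
        total += v
    return total

# Atomic additions complete in whatever order the threads happen to run,
# so each permutation below stands in for one possible execution order.
order_dependent = {sequential_sum(p) for p in itertools.permutations(contributions)}
print(sorted(order_dependent))  # [0.0, 1.0, 2.0]

# math.fsum returns the correctly rounded sum regardless of order.
order_independent = {math.fsum(p) for p in itertools.permutations(contributions)}
print(order_independent)  # {2.0}
```

Summing in a fixed order on every run, or using an order-independent method such as `math.fsum` (a Python standard library function shown here only for illustration), removes this dependence on thread timing; which technique TUFLOW's kernels use is not stated in this section.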
Also note that GPU cores use different hardware for floating-point operations, and the GPU compiler may re-optimise the sequence of instructions for complex lines of code differently from the CPU compiler. Consequently, even when using the same version of TUFLOW HPC, very small differences in the solution can be seen between a model run on GPU and the same model run on CPU. These differences are typically less than 1 mm in water surface elevation but, again, in some locations they can become accentuated by changes in overtopping or by an operational structure crossing its trigger threshold.
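One well-known way a compiler can re-optimise an instruction sequence is fused multiply-add (FMA) contraction: `a*b + c` is evaluated as a single instruction with one rounding instead of two. CUDA's nvcc contracts to FMA by default (its `--fmad` option defaults to true); whether, and where, this affects the TUFLOW HPC kernels is an assumption made here purely for illustration. The sketch emulates FMA exactly using Python's `fractions` module, with hypothetical function names.

```python
from fractions import Fraction

def separate_multiply_add(a, b, c):
    """a*b + c with two roundings: the product is rounded to double first."""
    return a * b + c

def fused_multiply_add(a, b, c):
    """Emulated FMA: a*b + c computed exactly with Fraction, then rounded
    to double once, as an FMA instruction does in hardware."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30      # exactly representable as a double
c = -(1.0 + 2.0**-29)   # exactly representable as a double

# Exact a*a + c is 2**-60; rounding a*a first discards that term.
print(separate_multiply_add(a, a, c))  # 0.0
print(fused_multiply_add(a, a, c) == 2.0**-60)  # True
```

Each form is a valid IEEE 754 evaluation of the same source expression, which is why two compilers (or two instruction sets) can legitimately produce results that differ in the last bits.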
13.7.4 TUFLOW HPC (incl. Quadtree) on multiple GPUs / CPU threads
TUFLOW HPC (incl. Quadtree) supports running a model across multiple GPU devices in one computer. In this case the model is decomposed into subdomains, and model data are exchanged and synchronised at the subdomain boundaries as required. Care has been taken to ensure that results from a run on multiple GPUs (all of the same type) are bit-wise identical to those from a run on a single GPU. If differences are observed between results from a run on multiple devices and a run on a single device, please contact support@tuflow.com.
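A toy one-dimensional model illustrates why a decomposed run can match a single-domain run bit for bit: if every cell is updated with the same expression, from the same operands, in the same order, it does not matter which device computes it. This is a sketch under stated assumptions (the explicit diffusion update, the single ghost cell per side and all function names are hypothetical); TUFLOW's actual decomposition and exchange scheme is not described in this section.

```python
def update(cells, lo, hi, alpha):
    """One explicit diffusion step for cells[lo:hi], using only old values."""
    new = list(cells)
    for i in range(lo, hi):
        new[i] = cells[i] + alpha * (cells[i - 1] - 2.0 * cells[i] + cells[i + 1])
    return new

def run_single_domain(u, alpha, steps):
    for _ in range(steps):
        u = update(u, 1, len(u) - 1, alpha)  # end cells are fixed boundaries
    return u

def run_two_subdomains(u, alpha, steps, split):
    """Same model split at index split; each side keeps one ghost (halo) cell."""
    left = u[:split] + [u[split]]       # owns cells 0..split-1, ghost = cell split
    right = [u[split - 1]] + u[split:]  # ghost = cell split-1, owns split..n-1
    for _ in range(steps):
        left = update(left, 1, split, alpha)             # global cells 1..split-1
        right = update(right, 1, len(right) - 1, alpha)  # global cells split..n-2
        # Halo exchange: each side receives its neighbour's freshly updated edge cell.
        left[split] = right[1]
        right[0] = left[split - 1]
    return left[:split] + right[1:]

initial = [0.0] * 20
initial[6] = 1.0
initial[13] = 0.3

single = run_single_domain(initial, 0.25, 100)
split_run = run_two_subdomains(initial, 0.25, 100, split=10)
print(single == split_run)  # True
```

The equality would break if, for example, a quantity were accumulated across subdomains in a device-dependent order, which is the same summation-order issue that affects atomic additions of floating-point data.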
Likewise, when running on multiple CPU threads (the default is four unless otherwise specified), the model results should be bit-wise identical to those from a run on a single thread (CPU Threads