13.6 Optimising Startup and Run Times

Regardless of the method for initiating a TUFLOW simulation, a simulation start stats file (_start_stats.txt) is output to the same location as the .tlf file (see Section 14.4.3). This file contains information on the total time and the time elapsed for each stage of model initialisation. It can be used to identify the stages causing slow simulation start-up. If you have a problematic (slow-starting) model, please email this file and the corresponding .tlf file through to .

From the 2018-03-AA release, a new output file is created named “_run_stats.txt” (see Section 14.4.4). This file records the amount of time that TUFLOW spends in the 1D and 2D computations. At each mass balance output interval, the percentage of the total computational effort that TUFLOW has spent in 1D calculations, 2D calculations and other tasks is output to the run_stats file. The “other” column covers a variety of tasks that are neither 1D nor 2D computations, such as the writing of outputs and the transfer of data to GPU (if running on GPU devices). “Other” also includes time spent within an external 1D solver.
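The percentage split reported in the run_stats file can be reproduced from cumulative timings. A minimal sketch, assuming cumulative seconds spent in 1D and 2D computations and a total elapsed time (the function name and inputs are illustrative, not TUFLOW's API):

```python
def effort_breakdown(t_1d, t_2d, t_total):
    """Split total computational effort into 1D, 2D and 'other' percentages.

    t_1d, t_2d: cumulative seconds spent in 1D and 2D computations.
    t_total: total elapsed seconds; the remainder is attributed to 'other'
    (output writing, GPU data transfer, external 1D solvers, etc.).
    """
    t_other = max(t_total - t_1d - t_2d, 0.0)
    return {
        "1D": 100.0 * t_1d / t_total,
        "2D": 100.0 * t_2d / t_total,
        "other": 100.0 * t_other / t_total,
    }
```

For example, 12 s of 1D work and 60 s of 2D work within an 80 s interval would be reported as 15% 1D, 75% 2D and 10% other.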

13.6.1 Improved pre-processing of 1D Model Inputs

For the 2020-01 release, reading and processing of 1D inputs has been significantly improved, particularly for large urban drainage models (>1,000 1D pipe network elements). For a tested model with 25,000 1D channels, start-up was approximately 40 times faster with the 2020-01 release than with the previous 2018-03 release, reducing the start-up time from nearly two hours to less than three minutes. No changes to model files are required to benefit from this improved pre-processing.

13.6.2 Parallel Processing for SGS initialisation

With SGS Method C (the default when “SGS == ON”), the SGS elevations are retained in memory throughout the pre-processing stage, with the generation of the storage and face hydraulic data curves occurring only once at the end of pre-processing. This is a computationally intensive exercise, particularly for large models with small SGS sample distances. To speed up model initialisation, this step has been parallelised to utilise multiple CPU cores.

By default, all CPU threads are used for the final SGS elevation pre-processing unless the thread count command line argument (-nt<thread count>) has been specified. For example, to run on 8 threads the command line argument “-nt8” would be used. No thread licensing check is applied to this pre-processing. If the number of threads specified in the command line argument exceeds the number of threads available, all available threads are used.
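The thread selection rule above can be paraphrased in a few lines. This is a sketch of the described behaviour only (the function name is illustrative, not part of TUFLOW):

```python
import os

def resolve_sgs_threads(requested=None):
    """Resolve the thread count for SGS pre-processing.

    Mirrors the rule described above: use all available threads unless a
    -nt<thread count> argument was given; if the request exceeds the
    number of available threads, fall back to all of them.
    """
    available = os.cpu_count() or 1
    if requested is None or requested > available:
        return available
    return requested
```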

At the end of the .tgc file, after all elevation datasets have been processed, an XF file is written if the XF Files command is set to on (the default). The XF file is written to an “xf” folder in the same location as the .tgc file. The XF file is then used by any subsequent simulations for optimised pre-processing performance. To avoid re-processing when changes are made to .tgc data other than elevation (e.g. active cells, materials, soils, etc.), the XF file is not written with the same filename as the .tgc. Instead, the .xf filename is prefixed by “hpc” or “qdt” for HPC single grid and Quadtree simulations respectively, and includes the nesting level and cell size. Any text set with the .tcf command XF Files Include in Filename is also included.
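The naming rule can be illustrated as follows. Note that the exact filename layout used by TUFLOW is not reproduced here; the sketch below only shows the ingredients described above (solver prefix, nesting level, cell size and any XF Files Include in Filename text), and the separator and ordering are assumptions:

```python
def sgs_xf_filename(tgc_stem, quadtree, nest_level, cell_size, include_text=""):
    """Illustrative sketch of the SGS XF naming rule described above.

    The real filename layout may differ; this only demonstrates the
    components: an 'hpc' or 'qdt' prefix, the .tgc filename stem, the
    nesting level, the cell size and any 'XF Files Include in Filename'
    text set in the .tcf.
    """
    prefix = "qdt" if quadtree else "hpc"
    parts = [prefix, tgc_stem, f"L{nest_level}", f"{cell_size:g}m"]
    if include_text:
        parts.append(include_text)
    return "_".join(parts) + ".xf"
```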

When reading the pre-processed SGS XF file, a check is performed on the final SGS elevations; if these are consistent, the XF file is used.

13.6.3 Optimising Multi-GPU Performance (HPC Only)

If a model is simulated across multiple GPU devices, one of the devices (usually the one with the most wet cells) will control the speed of the simulation and the other devices will be underutilised. By default, TUFLOW HPC divides a model equally over multiple GPU devices. However, for real-world models the workload is usually distributed unevenly, due to differing numbers of active and wet cells per device, and this balance can change throughout the simulation as the model wets and dries.

From the TUFLOW 2020-01 release it is possible to distribute the workload unequally across the GPU devices. During a simulation, the workload efficiency of each GPU is output to the console and to the .tlf file, with a suggested distribution provided at the end of the simulation. A number of iterations may be required to fully optimise the distribution.

For example, a model simulated across four GPU devices reported at the end of the simulation in the .tlf file:

Relative device loads: 60.3% 100.0% 83.7% 53.2%
HPC Suggested workload balance HPC Device Split == 1.23, 0.74, 0.89, 1.40.

The command HPC Device Split == 1.23, 0.74, 0.89, 1.40 was added to the .tcf file for the next simulation producing the improved device workload efficiencies below and a 20% faster run time.

Relative device loads: 100.0% 96.1% 96.8% 94.6%

Note: The benefit depends on the model, but if you have a significant variation in workload efficiencies between GPU devices this feature should provide a noticeable decrease in run times.
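One plausible way to derive a revised split is to weight each device inversely to its reported relative load, normalised so the weights average to 1. TUFLOW's exact formula is not documented here, so the sketch below is an assumed scheme and will not reproduce the manual's suggested values exactly; in practice the values reported by TUFLOW in the .tlf file should be used:

```python
def suggest_device_split(relative_loads):
    """Hypothetical rebalance: weight each GPU inversely to its relative
    load (percent of the busiest device), normalised to a mean of 1.0.
    This is an assumed scheme, not TUFLOW's documented formula."""
    inv = [100.0 / load for load in relative_loads]
    mean = sum(inv) / len(inv)
    return [round(w / mean, 2) for w in inv]

# e.g. using the relative device loads reported above:
split = suggest_device_split([60.3, 100.0, 83.7, 53.2])
```

As expected, the least-loaded devices (60.3% and 53.2%) receive weights above 1, and the busiest device a weight below 1.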

13.6.4 Auto Terminate (Simulation End) Options

TUFLOW Classic and HPC include an Auto Terminate feature for stopping simulations after the flood peak has passed within the simulation. This can improve project efficiency by avoiding unnecessary simulation time once the peak flood extent has been reached.

The 2D cells that are monitored to trigger the auto-termination are controlled by specifying a value of 0 (exclude) or 1 (include) using the .tcf commands: Set Auto Terminate and Read GIS Auto Terminate (see Table 13.4).

For example, in the commands below, all cells are first set to be excluded from monitoring, followed by the reading of a GIS layer to set cells individually.

Set Auto Terminate == 0
Read GIS Auto Terminate == ..\model\gis\2d_at_001_R.shp

Table 13.4: 2D Auto Terminate (2d_at) Attribute Descriptions
No  Default GIS Attribute Name  Description                                                                                     Type
1   AT                          A value of 0 (exclude) or 1 (include) to be assigned to cells falling on or within the object.  Integer

At each Map Output Interval the monitored cells are compared against two criteria:

  1. The percentage of the wet cells that have become wet since the last map output interval.
  2. The velocity-depth product at the current timestep compared to the tracked maximum.

For the percentage of cells that have become wet since the last interval, the maximum allowable value is controlled with the .tcf command:

Auto Terminate Wet Cell Tolerance == <maximum_allowable_%_of_newly_wet_cells>

If set to 0, the simulation continues if any monitored cell has become wet since the last map output. If set to a value of 5, up to 5% of monitored cells can have become wet since the last map output while still allowing an auto-termination of the simulation.

For the velocity-depth (dV) tolerance, at each map output interval the dV product of each monitored cell is compared to its tracked maximum value. If the current dV product is within the tolerance specified by Auto Terminate dV Value Tolerance of the maximum, the cell is treated as still being near its peak.

The percentage of monitored cells that are allowed to remain within this range is controlled with Auto Terminate dV Cell Tolerance. If set to a value of 1, up to 1% of monitored cells can be within the tolerance while still allowing an auto-termination of the simulation. The larger the Auto Terminate dV Value Tolerance, the further the dV product needs to have dropped from its peak value before termination can occur.
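The two criteria can be combined into a single check per map output interval. The sketch below paraphrases the logic described above; the names and structure are illustrative, not TUFLOW internals:

```python
def should_auto_terminate(newly_wet_pct, cells_near_peak_pct,
                          wet_cell_tol=0.0, dv_cell_tol=0.0):
    """Return True if the simulation may auto-terminate.

    newly_wet_pct: % of monitored cells that became wet since the last
        map output interval (criterion 1).
    cells_near_peak_pct: % of monitored cells whose current dV product
        is still within 'Auto Terminate dV Value Tolerance' of their
        tracked maximum (criterion 2).
    Both percentages must be at or below their tolerances for the
    simulation to terminate. Illustrative paraphrase only.
    """
    return (newly_wet_pct <= wet_cell_tol
            and cells_near_peak_pct <= dv_cell_tol)
```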

The time at which the auto-terminate feature commences can be controlled using the .tcf command Auto Terminate Start Time; otherwise, the simulation Start Time is used.

Note that this option is only assessed at each Map Output Interval.