Section 6 GIS Formats
The 2023-03 Release adds new vector and raster GIS formats for both reading and writing of GIS data. The newly supported formats are GeoPackage (for both vector and raster data) and GeoTIFF (for raster data only). The 2023-03 Release also extends the capabilities of the existing NetCDF raster format. The sections below describe the new and extended formats.
6.1 GeoPackage Format
The 2023-03 Release adds GeoPackage support for reading and writing of both vector and raster data. GeoPackage (https://www.geopackage.org/) is a widely supported, open format built upon an SQLite database, which is stored as a single file with the extension .gpkg. The benefits of this format include:
- More than one layer can be stored in a single file. For example, all model inputs can be stored in a single GeoPackage database.
- It supports spatial indexing, making it much faster to work within GIS packages.
- It is faster to write from TUFLOW than the Shapefile format.
GeoPackage support was initially added to address issues with the existing vector file formats, notably the slow writing and viewing of Shapefiles for large datasets. The MapInfo (MIF/MID) format is much faster to write from TUFLOW than the Shapefile format; however, loading, rendering, and interacting with a MIF file in GIS packages is much slower than with either a Shapefile or a GeoPackage. When a MapInfo or Shapefile layer is loaded into GIS, typically all objects are loaded and processed whenever the map window is panned, even those outside the current view. GeoPackage supports spatial indexing, so when the view is panned or changed only the objects within that view need to be drawn, which makes rendering and interacting with the data much faster.
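Because a GeoPackage is an SQLite database, its layers can be inspected with any SQLite client. The sketch below is illustrative only: it lists the layers registered in the standard gpkg_contents table using Python's built-in sqlite3 module. A small in-memory database, holding only two of the gpkg_contents columns and hypothetical layer names, stands in for a real .gpkg file.

```python
import sqlite3


def list_gpkg_layers(connection):
    """Return (table_name, data_type) for each layer registered in gpkg_contents."""
    rows = connection.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return rows.fetchall()


# Stand-in for sqlite3.connect("model_inputs.gpkg") (hypothetical file name):
# an in-memory database with a cut-down gpkg_contents table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [
        ("2d_zsh_L", "features"),          # vector layer
        ("2d_zsh_P", "features"),          # vector layer
        ("DEM_Z", "2d-gridded-coverage"),  # raster layer (gridded coverage extension)
    ],
)

layers = list_gpkg_layers(conn)
for table_name, data_type in layers:
    print(table_name, data_type)
```

With a real file, replacing the in-memory connection with a connection to the .gpkg path should be the only change needed.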
Table 6.1 shows the approximate times taken to:
- Write the grid check files from TUFLOW. The test model has approximately 1.3 million 2D cells; all data was written to a local SSD and loaded into QGIS (3.22.1).
- Load the file into QGIS and view the grid for the entire model.
- Zoom to a small portion of the model and redraw.
GIS Format | Write grid check (seconds) | Load into QGIS and draw all grid cells (seconds) | Zoom in to small area (seconds) |
---|---|---|---|
MIF | | | |
SHP | | | |
GPKG | | | |
6.1.1 GeoPackage Projection
The projection used by TUFLOW for GeoPackage files can be defined with the following .tcf command:
This needs to reference an existing GeoPackage file which can contain either vector or raster data. To create a new GeoPackage file and set the projection in QGIS the following steps are required:
- From the menu select Layer >> Create Layer >> New GeoPackage Layer.
- In the New GeoPackage Layer dialogue:
- Set a database filename and location (e.g. projection.gpkg).
- Set a table name; typically this is the same as the filename without the extension.
- Choose a geometry type (it shouldn’t matter which type is selected, but no coordinate system can be set without a type).
- Set the CRS for your model.
- Select “OK”, the GeoPackage file is now ready to be used in TUFLOW.
6.1.2 GeoPackage Vector
6.1.2.1 GeoPackage Vector Inputs
The GIS outputs from TUFLOW can be set to GPKG by using the following command:
Since a GeoPackage database can have more than one layer, when reading data into TUFLOW a GeoPackage file and table may need to be specified. The following options are possible:
To specify a table in a .gpkg use “>>”:
Read GIS Z Shape == gis\2d_zsh.gpkg >> 2d_zsh_L
Specify a .gpkg file path only. TUFLOW will assume the table name is the same as the .gpkg database name. For example, the below commands are equivalent:
Read GIS Z Shape == gis\2d_zsh_L.gpkg
Read GIS Z Shape == gis\2d_zsh_L.gpkg >> 2d_zsh_L
Specify more than one table in a database on the same command line by using “&&”. This is similar to “|”, however there is no need to reference the database again:
Read GIS Z Shape == gis\2d_zsh.gpkg >> 2d_zsh_L && 2d_zsh_P
“&&” can be used in conjunction with “|”:
Read GIS Z Shape == gis\2d_zsh_R.gpkg | gis\2d_zsh.gpkg >> 2d_zsh_L && 2d_zsh_P
Use the command “USE ALL” with “>>” to tell TUFLOW to use all tables in the database:
Read GIS Z Shape == gis\2d_zsh.gpkg >> USE ALL
Specify a database to use for subsequent inputs, which can be changed as inputs are read in:
Spatial Database == gis\2d_zsh.gpkg
Read GIS Z Shape == 2d_zsh_L | 2d_zsh_P
Spatial Database == gis\2d_mat.gpkg
Read GIS Mat == 2d_mat_R
This command can be used in the .tcf, .ecf, .tbc, .tgc, .qcf, and .tef. Commands are localised to their relevant control file, with the exception of the .tcf, which acts as a global command; a local spatial database takes precedence. The spatial database can be turned off or reverted to the .tcf database with the same command. Turning it off is a global command even when used in a control file other than the .tcf. Regardless of the default GIS extension, if there is an open database TUFLOW will not append “.mif” to the end of the table name.
Spatial Database == OFF | TCF
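The reference syntax above can be summarised with a small parser. This is a minimal sketch of the rules as described in this section, not TUFLOW's own implementation; the function name parse_gpkg_reference and its spatial_database parameter are hypothetical. It expands the value on the right-hand side of a command into (database, table) pairs.

```python
from pathlib import PureWindowsPath


def parse_gpkg_reference(value, spatial_database=None):
    """Expand a command value into a list of (database, table) pairs.

    "USE ALL" is passed through unchanged as the table name, meaning
    every table in that database.
    """
    pairs = []
    for part in value.split("|"):
        part = part.strip()
        if ">>" in part:
            # database >> table [&& table ...]
            database, tables = (s.strip() for s in part.split(">>", 1))
            for table in tables.split("&&"):
                pairs.append((database, table.strip()))
        elif part.lower().endswith(".gpkg"):
            # File path only: the table name is assumed to match the file name.
            pairs.append((part, PureWindowsPath(part).stem))
        else:
            # Bare table name: resolved against the open Spatial Database.
            pairs.append((spatial_database, part))
    return pairs


example = parse_gpkg_reference(
    r"gis\2d_zsh_R.gpkg | gis\2d_zsh.gpkg >> 2d_zsh_L && 2d_zsh_P"
)
for database, table in example:
    print(database, "->", table)
```

Passing spatial_database mirrors the “Spatial Database” command: bare table names such as 2d_zsh_L are then paired with the currently open database.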
6.1.2.2 GeoPackage Vector Outputs
Outputs can be written into separate databases or grouped:
Grouped databases will group by output folder location. Separate databases will still group geometries together e.g., PLOT_P, PLOT_L, PLOT_R will be written to one database.
6.1.3 GeoPackage Conversion Check
GPKG layers include a primary key attribute (an integer value), which some GIS programs, including QGIS, show in the attribute table. This is completely fine for GPKG layers, as TUFLOW expects this attribute to be present within the format. However, if the layer is converted to a different format (e.g. shapefile), the primary key is sometimes retained as an attribute in the converted file. This can cause issues because TUFLOW will not always throw an error: some TUFLOW layer types are expected to contain an integer value in the first column. For example, for quadtree nesting level (2d_qnl) layers TUFLOW will interpret the primary key as the nesting level value and ignore the actual nesting level value. The primary key is typically named “fid” in GPKG layers, although this is not always the case.
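A check for a retained primary key can be sketched as follows. This is an illustration only: the function name is hypothetical, the field names in the example are made up, and the case-insensitive comparison is an assumption rather than documented TUFLOW behaviour.

```python
def first_field_is_gpkg_key(field_names, key_name="fid"):
    """Return True if the first attribute field looks like a retained GPKG primary key."""
    return bool(field_names) and field_names[0].lower() == key_name.lower()


# Hypothetical attribute fields of a layer converted from GPKG to shapefile.
converted_fields = ["fid", "Nest_Level"]
clean_fields = ["Nest_Level"]

print(first_field_is_gpkg_key(converted_fields))
print(first_field_is_gpkg_key(clean_fields))
```

Since the primary key is not always named “fid”, a check like this can only flag the common case.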
TUFLOW will produce ERROR 0248 if it finds the first field name is ‘fid’ for shapefiles and mif files as this is an indicator that the layer has been incorrectly converted from GPKG. This message can be changed to a warning by using the following command:
6.1.4 GeoPackage Raster
TUFLOW supports the GeoPackage raster format via the ‘Gridded Tiled Coverage’ extension to the standard GeoPackage specification. This extension is supported by QGIS and GDAL; however, to the best of the TUFLOW team's knowledge at the time of the 2023-03 Release, it is not currently supported by either MapInfo or ArcGIS.
The format uses a tiled structure, which makes rendering and loading faster because only the required tiles need to be processed. The tiled structure also makes pyramids (sometimes referred to as ‘overviews’) inherently available for the format. Note that the 2023-03 Release does not yet support pyramid creation; this functionality may be added in a later Release.
6.1.4.1 GeoPackage Raster Inputs
Using a GPKG raster is done in the same manner as a GPKG vector layer (Section 6.1.2) but using the ‘Read GRID’ command, for example either of the below commands:
The GPKG projection command described in Section 6.1.1 will also work for GPKG raster layers.
6.1.4.2 GeoPackage Raster Outputs
The following existing output commands can be extended to use GPKG:
The GPKG raster outputs will also be grouped if the grouping spatial database command is used from Section 6.1.2 (e.g. the DEM_M and DEM_Z will be in the same GPKG database as the 2D check vector layers).
6.1.4.3 GeoPackage Raster Compression
The GPKG raster format supports LZW compression of the data. The data is compressed by default; compression can be turned off with the following command:
By default, TUFLOW will also use a compression predictor to improve the compression ratio. This can be turned off using the following command:
Limitation:
- Currently TUFLOW only supports GPKG raster data containing 32-bit floating point data. This is limited to the ‘Gridded Tiled Coverage’ extension as the native raster support in GPKG currently only supports PNG and JPEG encoding.
6.2 GeoTIFF Format
The GeoTIFF raster format is supported in the 2023-03 Release for both inputs and outputs. The following command will set GeoTIFF as the default grid format:
The GeoTIFF raster format can be added as a gridded map output:
A projection can be set for the output GeoTIFF rasters by using the following command in the TCF:
Only the header information is read in from the file with this command, so it is safe to use large files without any negative impacts on start-up speed. A projection is not required to output to the GeoTIFF format, however it is required to include the spatial reference system in the output GeoTIFF. Currently there is no input projection checking of raster layers (like there is for GIS vector inputs) and the projection is only used for the GeoTIFF outputs.
TUFLOW supports several compression methods for GeoTIFF:
- LZW (Read/Write)
- DEFLATE (Read/Write)
- PACKBITS (Read only)
By default, data written to the GeoTIFF format is compressed using the ‘deflate’ method. The method can be changed, or compression turned off, using the following command:
By default, TUFLOW will also use a compression predictor to improve the compression ratio. This can be turned off using the following command:
Testing has shown that using deflate compression with horizontal differencing will typically achieve better compression than the typical method of using a ZIP file on the same uncompressed data.
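The effect of a horizontal differencing predictor can be illustrated with Python's built-in zlib module, which implements Deflate. This is a minimal sketch and not TUFLOW's implementation: it assumes a single synthetic row of smoothly varying 16-bit integer values rather than 32-bit floating point data, and the function names are hypothetical.

```python
import zlib
from array import array


def horizontal_difference(row):
    """Keep the first value; store every other value as the difference
    from its left-hand neighbour (modulo 2**16)."""
    out = array("H", row)
    for i in range(len(row) - 1, 0, -1):
        out[i] = (row[i] - row[i - 1]) & 0xFFFF
    return out


def undo_horizontal_difference(diffed):
    """Rebuild the original values with a running sum (modulo 2**16)."""
    out = array("H", diffed)
    for i in range(1, len(out)):
        out[i] = (out[i] + out[i - 1]) & 0xFFFF
    return out


# Synthetic, smoothly rising row of values (an assumption for illustration).
row = array("H", (1000 + 7 * i for i in range(5000)))

plain_size = len(zlib.compress(row.tobytes()))
predicted_size = len(zlib.compress(horizontal_difference(row).tobytes()))

print("deflate only:", plain_size, "bytes")
print("deflate + horizontal differencing:", predicted_size, "bytes")
```

Differencing turns a smooth ramp into a run of near-identical small values, which Deflate encodes far more compactly; the gain on real model results depends on how smoothly the data varies.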
TUFLOW will default to using all available CPU cores when reading/writing GeoTIFF files which can speed up processing when using compression. This can be changed by specifying the number of threads using the command line argument “-nt[thread count]”.
Limitations:
- Currently TUFLOW only supports GeoTIFFs containing 32-bit floating point data.
- Currently TUFLOW does not support ‘Cloud Optimised GeoTIFFs’ or the tiled GeoTIFF format.
- GeoTIFFs support multiple raster bands, however TUFLOW will currently assume the input dataset is within the first raster band.
- Currently TUFLOW only supports unrotated GeoTIFF rasters.
6.3 NetCDF Grid
NetCDF grids are now supported as standard raster inputs for all “
The NetCDF raster inputs should follow the NetCDF CF Convention and are treated as a database within TUFLOW (similar to GPKG). The new database input command options are:
and
Limitation:
- NetCDF rasters support multiple raster bands, however TUFLOW will currently assume the input dataset is within the first band.
Note, rainfall and external stress NetCDF inputs remain unchanged and do not use the new syntax.
6.4 Minor Enhancements and Bug Fixes for 2023-03-AB
6.4.1 Spatial Database Command Now Works in Quadtree Control File
Build 2023-03-AB fixes an issue when using the “Spatial Database” command in the Quadtree Control File (.qcf). Previously, when setting either the “Base Cell Size”, “Model Origin and Extent” or “Orientation Angle” to “TGC” the Spatial Database command referenced in the QCF could be overwritten by the spatial database reference in the TGC.
6.4.2 Read GRID Location Now Works for New Raster Formats
Build 2023-03-AB now accepts GeoTIFF, GPKG raster, and NetCDF raster formats for the “Read Grid Location” command in the TGC. This was previously only working for Quadtree in the 2023-03-AA release.
6.4.3 Compression Now Allows for Size Increase
Build 2023-03-AB now allows the compressed size of a block to be bigger than its uncompressed size. This applies to both the DEFLATE and LZW compression methods and to both the GeoTIFF and GPKG raster formats. These formats store data in blocks (tiles or strips), and each block is compressed individually. Previously, if one of these blocks became larger after compression, it would cause ERROR 0635. Note that an increase may affect only a single block of data and does not necessarily reflect what happens to the overall file size under compression.
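A compressed block can grow when its contents have no redundancy to exploit. The sketch below illustrates this with Python's built-in zlib module (Deflate) on one block of pseudo-random bytes; the 4096-byte block size and the random data are assumptions chosen for illustration, not values taken from TUFLOW.

```python
import random
import zlib

# One block of incompressible (pseudo-random) bytes; the seed makes it repeatable.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(4096))

compressed = zlib.compress(block)
grew = len(compressed) > len(block)

print("uncompressed:", len(block), "bytes")
print("compressed:", len(compressed), "bytes")
print("block grew under compression:", grew)
```

The data is still recovered exactly on decompression; only the stored size of that block is larger.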
6.4.4 Increase Primary Key Column Name Length
Build 2023-03-AB increases the allowed length of the GPKG primary key column name from 5 to 50 characters. This only affects reading GPKG layers created by certain applications. For example, when converting from SHP to GPKG, the QGIS Kart plugin creates a primary key column name exceeding 5 characters (it uses ‘auto_pk’).
6.4.5 Retry Loop for Locked GPKG Databases
Build 2023-03-AB will enter a retry loop when trying to open a GPKG for reading if the database is locked. The retry loop pauses for 1 second before trying again and, to prevent an infinite retry loop, TUFLOW will error if it fails to open the database 10 times. Previously, a locked database could occur (very rarely) if more than one TUFLOW model was initialising at the same time and using the same input GPKG database(s).
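The retry behaviour described above follows a common pattern, sketched here with Python's built-in sqlite3 module. This is not TUFLOW's code: the function name open_with_retry and its parameters are hypothetical, and the check on the error message assumes the usual SQLite “database is locked” wording.

```python
import sqlite3
import time


def open_with_retry(open_database, attempts=10, pause_seconds=1.0, sleep=time.sleep):
    """Call open_database() until it succeeds, pausing between attempts
    while the database is locked. The error from the final attempt
    (or any error unrelated to locking) is raised to the caller."""
    for attempt in range(1, attempts + 1):
        try:
            return open_database()
        except sqlite3.OperationalError as exc:
            locked = "locked" in str(exc).lower()
            if not locked or attempt == attempts:
                raise
            sleep(pause_seconds)


def read_table_names(path=":memory:"):
    """Example open routine: connect and run a first query."""
    connection = sqlite3.connect(path)
    return connection.execute("SELECT name FROM sqlite_master").fetchall()


print(open_with_retry(read_table_names))
```

The sleep function is passed in as a parameter so that the pause can be replaced when testing the loop.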
6.4.6 GPKG XF File Naming
Build 2023-03-AB changes the file naming convention for GPKG XF file names. Previously the convention was to use the database name followed by all layer names (then conventional XF naming suffixes), for example:
2d_ztin_EG07_010.gpkg_2d_ztin_EG07_010_L_2d_ztin_EG07_010_P.d1.5m_T00001.xf4
However, because these file names can become very long, build 2023-03-AB has changed the approach and hashes the layer names into an 8 character long hexadecimal number which allows TUFLOW to check for file name consistency and limits the filename length, for example:
2d_ztin_EG07_010.gpkg_FEBB7271.d1.5m_T0001.xf4
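The idea of replacing the layer names with a fixed-length hash can be sketched as below. The hash function TUFLOW uses is not stated here, so this example substitutes CRC-32 from Python's zlib module purely for illustration; the function name and the way the layer names are joined are also assumptions, and the resulting value will not match TUFLOW's.

```python
import zlib


def gpkg_xf_name(database, layers, suffix):
    """Build an XF-style file name: database name, an 8 character
    hexadecimal hash of the layer names, then the usual suffix."""
    digest = zlib.crc32("_".join(layers).encode("utf-8"))
    return f"{database}_{digest:08X}{suffix}"


name = gpkg_xf_name(
    "2d_ztin_EG07_010.gpkg",
    ["2d_ztin_EG07_010_L", "2d_ztin_EG07_010_P"],
    ".d1.5m_T00001.xf4",
)
print(name)
```

The same layer names always give the same hash, so the file name can still be checked for consistency, while its length no longer grows with the number of layers.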
6.4.7 Bug Fix “Reached maximum concurrent SQLite statements”
Build 2023-03-AB fixes a bug that could occur when another application had the same GPKG database open for editing while TUFLOW was trying to read it. This would only occur if the same database was opened and closed many times by TUFLOW. Previously ERROR 0636 and ERROR 0647 were triggered in these situations.
6.4.8 Bug Fix “ERROR 0636 – Issue stepping through SQLite query”
Build 2023-03-AB fixes a bug that could occur reading a GPKG layer that contained mis-matched ‘rowid’ and ‘fid’ attributes (‘rowid’ is an internal attribute within SQLite tables and ‘fid’ is the common name for the primary key column in a GPKG layer). This could occur for any number of reasons and the user has very little control over this. Note, there is no requirement for the ‘rowid’ and ‘fid’ column to be in sync and the GPKG layers are not considered corrupt or invalid if this happened.
If the ‘rowid’ and ‘fid’ columns became mis-matched, TUFLOW could sometimes produce “ERROR 0636 – Issue occurred stepping through SQLite Query”. This was caused by incorrect parameters being passed to the SQLite API routines, which has now been fixed in TUFLOW 2023-03-AB.
6.4.9 GPKG Multi-Part Polygons
In build 2023-03-AB the treatment of multi-part polygons in GPKG is now identical to SHP/MIF. Previously they were treated slightly differently: as a single polygon with multiple rings. This will not affect the inputs in most cases, except where different polygon parts from the same feature overlapped, or for some inputs in TUFLOW that allowed for multi-part features but not for polygons containing multiple rings (e.g. polygons with holes).
6.4.10 ERROR 0305 Triggering When There is No GPKG Projection
Build 2023-03-AB no longer triggers ERROR 0305 or WARNING 0305 when a GPKG Projection is not included in the TCF. Previously this message could trigger if a SHP Projection was included.
6.5 Minor Enhancements and Bug Fixes for 2023-03-AC
6.5.1 GPKG XF File Creation Causing “ERROR 0645”
Build 2023-03-AC fixes a bug when reading multiple GPKG layers on a single line using the Shapefile convention (e.g.
6.5.2 Bug Fix For GPKG When “fid” Was Not the Same as “rowid”
Build 2023-03-AC fixes a bug when reading a GPKG layer that contained entries where the GPKG “fid” column was not equal to the internal SQLite “rowid” column. This would cause an infinite loop of “ERROR 0636 - Issue occurred stepping through SQLite query” and a “Should not be here [stmntID=0]”.
6.5.3 Exit TUFLOW if an ERROR Occurs Reading GPKG
Build 2023-03-AC will correctly error and exit if an unexpected error occurs when reading a GPKG layer. Previously certain errors could cause an infinite loop to occur (e.g. the error reported in Section 6.5.2).
6.5.4 Bug Fix GPKG Reading 1d_pit
Build 2023-03-AC fixes a bug that could cause TUFLOW to exit uncleanly when reading a GPKG 1d_pit layer. This error was caused when TUFLOW checked the 1d_pit column names which was introduced in build 2023-03-AB with the addition of the ability to add a time lag for virtual pits.