Section 7 GIS Formats
The 2023-03 Release increases support for GIS formats with new vector and raster formats supported for both reading and writing of GIS data. The newly supported formats are GeoPackage (for both vector and raster data) and GeoTIFF (for raster data only). The 2023-03 Release also extends capabilities for the existing supported NetCDF raster format. The sections below describe the new and extended formats.
7.1 GeoPackage Format
GeoPackage support has been added in the 2023-03 Release for reading and writing of both vector and raster data. GeoPackage (https://www.geopackage.org/) is a widely supported, open format built upon an SQLite database, stored as a single file with the extension .gpkg. The benefits of this format include:
- More than one layer can be stored in a single file. For example, all model inputs can be stored in a single GeoPackage database.
- It supports spatial indexing, making it much faster to work with in GIS packages.
- It is faster to write from TUFLOW than the Shapefile format.
GeoPackage support was initially added to address issues with the existing vector file formats, notably slow writing and viewing of Shapefiles for large datasets. The MapInfo (MIF/MID) file format is much faster to write from TUFLOW than the Shapefile format; however, loading, rendering, and interacting with a MIF file in GIS packages is much slower than with either Shapefile or GeoPackage. When a MapInfo or Shapefile layer is loaded into GIS, typically all of the data is loaded and processed whenever the map window is panned, including objects outside the current view. GeoPackage supports spatial indexing, so when the view is panned or changed only the limited set of objects within that view needs to be drawn, which makes rendering and interacting with the data much faster.
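The benefit of spatial indexing can be illustrated with a small Python sketch. It is not how GeoPackage or any GIS package implements indexing (GeoPackage relies on an SQLite R-tree); the `GridIndex` class, its bucket size, and the sample boxes used with it are hypothetical and only show why a view query needs to test far fewer objects than a full scan.

```python
from collections import defaultdict


def intersects(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Hypothetical bucket index: each box is filed under every square
    bucket that its extent touches."""

    def __init__(self, bucket_size):
        self.bucket_size = bucket_size
        self.buckets = defaultdict(list)
        self.boxes = []

    def _keys(self, box):
        # Yield the (column, row) key of every bucket the box touches.
        s = self.bucket_size
        for ix in range(int(box[0] // s), int(box[2] // s) + 1):
            for iy in range(int(box[1] // s), int(box[3] // s) + 1):
                yield ix, iy

    def add(self, box):
        fid = len(self.boxes)
        self.boxes.append(box)
        for key in self._keys(box):
            self.buckets[key].append(fid)

    def query(self, view):
        """Return (ids of boxes intersecting the view, number of boxes tested)."""
        candidates = {
            fid for key in self._keys(view) for fid in self.buckets.get(key, [])
        }
        hits = {fid for fid in candidates if intersects(self.boxes[fid], view)}
        return hits, len(candidates)
```

With an index like this, a small view only tests the boxes filed under the buckets it touches, whereas an unindexed layer has to test every box on every redraw.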
Table 7.1 shows approximate times taken to:
- Write the grid check files from TUFLOW. Note: the model has approximately 1.3 million 2D cells, all data was written to a local SSD, and the files were loaded into QGIS 3.22.1.
- Load the file into QGIS and view the grid for the entire model.
- Zoom to a small portion of the model and redraw.
GIS Format | Write grid check (seconds) | Load into QGIS and draw all grid cells (seconds) | Zoom into small area (seconds) |
---|---|---|---|
MIF | | | |
SHP | | | |
GPKG | | | |
7.1.1 GeoPackage Projection
A projection can be defined (used to set the projection used by TUFLOW for GeoPackage files), with the following .tcf command:
This needs to reference an existing GeoPackage file which can contain either vector or raster data. To create a new GeoPackage file and set the projection in QGIS the following steps are required:
- From the menu select Layer >> Create Layer >> New GeoPackage Layer.
- In the New GeoPackage Layer dialogue:
- Set a database filename and location (e.g. projection.gpkg).
- Set a table name, typically this would be the same as the filename without the extension.
- Choose a geometry type (it shouldn’t matter which type is selected, but no coordinate system can be set without a type).
- Set the CRS for your model.
- Select “OK”, the GeoPackage file is now ready to be used in TUFLOW.
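Because a .gpkg file is an SQLite database, the CRS recorded by the steps above can be checked without opening a GIS package. The following is a minimal sketch using Python's standard `sqlite3` module. The `gpkg_contents` and `gpkg_spatial_ref_sys` tables are defined by the GeoPackage standard; the function name `layer_crs` is illustrative only and is not part of TUFLOW or QGIS.

```python
import sqlite3
from pathlib import Path


def layer_crs(gpkg_path):
    """List (table name, CRS name, srs_id) for every layer in a GeoPackage."""
    # Open read-only so that the check cannot modify the file.
    uri = Path(gpkg_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return con.execute(
            "SELECT c.table_name, s.srs_name, c.srs_id "
            "FROM gpkg_contents AS c "
            "JOIN gpkg_spatial_ref_sys AS s ON s.srs_id = c.srs_id "
            "ORDER BY c.table_name"
        ).fetchall()
    finally:
        con.close()
```

Calling `layer_crs("projection.gpkg")` on the file created above should list the single table together with the CRS that was selected in the dialogue.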
7.1.2 GeoPackage Vector
7.1.2.1 GeoPackage Vector Inputs
The GIS outputs from TUFLOW can be set to GPKG by using the following command:
Since a GeoPackage database can have more than one layer, when reading data into TUFLOW a GeoPackage file and table may need to be specified. The following options are possible:
To specify a table in a .gpkg use “>>”:
Read GIS Z Shape == gis\2d_zsh.gpkg >> 2d_zsh_L
Specify a .gpkg file path only, in which case TUFLOW assumes the table name is the same as the .gpkg database name. For example, the below commands are equivalent:
Read GIS Z Shape == gis\2d_zsh_L.gpkg
Read GIS Z Shape == gis\2d_zsh_L.gpkg >> 2d_zsh_L
Specify more than one table in a database on the same command line by using “&&”. This is similar to “|”, however there is no need to reference the database again:
Read GIS Z Shape == gis\2d_zsh.gpkg >> 2d_zsh_L && 2d_zsh_P
“&&” can be used in conjunction with “|”:
Read GIS Z Shape == gis\2d_zsh_R.gpkg | gis\2d_zsh.gpkg >> 2d_zsh_L && 2d_zsh_P
Use the command “USE ALL” with “>>” to tell TUFLOW to use all tables in the database:
Read GIS Z Shape == gis\2d_zsh.gpkg >> USE ALL
Specify a database to use for subsequent inputs, which can be changed as inputs are read in:
Spatial Database == gis\2d_zsh.gpkg
Read GIS Z Shape == 2d_zsh_L | 2d_zsh_P
Spatial Database == gis\2d_mat.gpkg
Read GIS Mat == 2d_mat_R
This command can be used in the .tcf, .ecf, .tbc, .tgc, .qcf and .tef. Commands are localised to their relevant control file, with the exception of the .tcf, which acts as a global command; a local spatial database will take precedence. Regardless of the default GIS extension, if there is an open database TUFLOW will not append “.mif” to the end of the table name. The spatial database can be turned off or reverted to the .tcf database with the same command; turning it off is a global command even when used in a control file other than the .tcf:
Spatial Database == OFF | TCF
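The reference rules in this section can be summarised as a short Python sketch that expands the right-hand side of a command into (database, table) pairs. This is an illustration of the rules as described here, not TUFLOW's own parser; the function name `expand_references` and its `spatial_database` argument are hypothetical, and “USE ALL” is simply passed through as a marker in place of a table name.

```python
from __future__ import annotations

from pathlib import PureWindowsPath


def expand_references(value: str, spatial_database: str | None = None) -> list[tuple[str, str]]:
    """Expand e.g. 'a.gpkg | b.gpkg >> t1 && t2' into (database, table) pairs."""
    pairs = []
    for reference in value.split("|"):
        reference = reference.strip()
        if ">>" in reference:
            # 'database >> table [&& table ...]'
            database, tables = (part.strip() for part in reference.split(">>", 1))
            names = [name.strip() for name in tables.split("&&")]
        elif reference.lower().endswith(".gpkg"):
            # File path only: the table name is assumed to equal the file name.
            database = reference
            names = [PureWindowsPath(reference).stem]
        elif spatial_database is not None:
            # Bare table name(s) resolved against the current Spatial Database.
            database = spatial_database
            names = [name.strip() for name in reference.split("&&")]
        else:
            raise ValueError(f"No database available for '{reference}'")
        pairs.extend((database, name) for name in names)
    return pairs
```

For example, `expand_references(r"gis\2d_zsh_L.gpkg")` and `expand_references(r"gis\2d_zsh_L.gpkg >> 2d_zsh_L")` give the same result, matching the equivalence noted above.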
7.1.2.2 GeoPackage Vector Outputs
Outputs can be written into separate databases or grouped:
Grouped databases will group by output folder location. Separate databases will still group geometries together e.g., PLOT_P, PLOT_L, PLOT_R will be written to one database.
7.1.3 GeoPackage Conversion Check
GPKG layers include a primary key attribute (an integer value), which is sometimes shown in the attribute table in GIS programs (this happens in QGIS). This is fine for GPKG layers, as TUFLOW expects this attribute to be present within the format. However, if the layer is converted to a different format (e.g. Shapefile), the primary key is sometimes retained as an attribute in the converted file. This can cause issues that TUFLOW will not always detect as an error, since some TUFLOW layer types are expected to contain an integer value in the first column. For example, with quadtree nesting level (2d_qnl) layers TUFLOW will interpret the primary key as the nesting level value, and the actual nesting level value will be ignored. The primary key in GPKG layers is typically named “fid”, although this is not always the case.
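A converted layer can be screened for a retained primary key before it is used in a model. The sketch below reads the column names from the header of a MIF file and flags a first column named “fid”. Both helper names are hypothetical, the comparison is assumed to be case-insensitive, and the check is only a rough pre-screen rather than a reproduction of any check performed by TUFLOW.

```python
def mif_field_names(mif_text):
    """Return the column names listed in the 'Columns N' block of a MIF header."""
    lines = [line.strip() for line in mif_text.splitlines()]
    for i, line in enumerate(lines):
        parts = line.split()
        if len(parts) == 2 and parts[0].lower() == "columns" and parts[1].isdigit():
            count = int(parts[1])
            # Each of the next N lines is '<name> <type>'.
            return [entry.split()[0] for entry in lines[i + 1 : i + 1 + count] if entry]
    return []


def looks_converted_from_gpkg(field_names):
    """True if the first field is named 'fid' (assumed case-insensitive)."""
    return bool(field_names) and field_names[0].lower() == "fid"
```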
TUFLOW will produce ERROR 0248 if it finds the first field name is ‘fid’ for shapefiles and mif files as this is an indicator that the layer has been incorrectly converted from GPKG. This message can be changed to a warning by using the following command:
7.1.4 GeoPackage Raster
TUFLOW supports the GeoPackage raster format via the ‘Gridded Tiled Coverage’ extension to the standard GeoPackage specification. This extension is supported by QGIS and GDAL; however, to the best of the TUFLOW team's knowledge at the time of the 2023-03 Release, it is not currently supported by either MapInfo or ArcGIS.
The format uses a tiled structure to make rendering and loading faster by enabling the ability to only process the required tiles. The tiled structure also makes pyramids (sometimes referred to as ‘overviews’) inherently available for the format. Note, the 2023-03 Release does not support pyramid creation yet, however, this functionality may be added in a later Release.
7.1.4.1 GeoPackage Raster Inputs
Using a GPKG raster is done in the same manner as a GPKG vector layer (Section 7.1.2) but using the ‘Read GRID’ command, for example either of the below commands:
The GPKG projection command described in Section 7.1.1 will also work for GPKG raster layers.
7.1.4.2 GeoPackage Raster Outputs
The following existing output commands can be extended to use GPKG:
The GPKG raster outputs will also be grouped if the grouping spatial database command is used from Section 7.1.2 (e.g. the DEM_M and DEM_Z will be in the same GPKG database as the 2D check vector layers).
7.1.4.3 GeoPackage Raster Compression
The GPKG raster format supports LZW compression of the data. The data is compressed by default; however, compression can be turned off with the following command:
By default, TUFLOW will also use a compression predictor to improve the compression ratio. This can be turned off using the following command:
Limitation:
- Currently TUFLOW only supports GPKG raster data containing 32-bit floating point data. This is limited to the ‘Gridded Tiled Coverage’ extension as the native raster support in GPKG currently only supports PNG and JPEG encoding.
7.2 GeoTIFF Format
The GeoTIFF raster format is supported in the 2023-03 Release for both inputs and outputs. The following command will set GeoTIFF as the default grid format:
The GeoTIFF raster format can be added as a gridded map output:
A projection can be set for the output GeoTIFF rasters by using the following command in the TCF:
Only the header information is read in from the file with this command, so it is safe to use large files without any negative impacts on start-up speed. A projection is not required to output to the GeoTIFF format, however it is required to include the spatial reference system in the output GeoTIFF. Currently there is no input projection checking of raster layers (like there is for GIS vector inputs) and the projection is only used for the GeoTIFF outputs.
TUFLOW supports several compression methods for GeoTIFF:
- LZW (Read/Write)
- DEFLATE (Read/Write)
- PACKBITS (Read only)
The data is compressed by default when writing to GeoTIFF format using the ‘deflate’ method; however, the method can be changed or compression turned off using the following command:
By default, TUFLOW will also use a compression predictor to improve the compression ratio. This can be turned off using the following command:
Testing has shown that using deflate compression with horizontal differencing will typically achieve better compression than the typical method of using a ZIP file on the same uncompressed data.
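The effect of a horizontal differencing predictor can be demonstrated with Python's standard `zlib` module (which implements deflate). The sketch below is an illustration only: it uses a synthetic row of integer samples, whereas TUFLOW writes 32-bit floating point data and its exact predictor implementation is not described here. The function names and the sample row are hypothetical.

```python
import struct
import zlib
from itertools import accumulate


def horizontal_difference(row):
    """Keep the first sample; replace every other sample with the
    difference from its left-hand neighbour."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]


def undo_difference(diffs):
    """Invert horizontal_difference with a running sum."""
    return list(accumulate(diffs))


def deflated_size(values):
    """Bytes used after packing as little-endian 32-bit integers and deflating."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return len(zlib.compress(raw, 6))


# Synthetic, smoothly rising row of samples (hypothetical values).
row = [1000 + 3 * i + (i * i) % 5 for i in range(4096)]

plain_size = deflated_size(row)
predicted_size = deflated_size(horizontal_difference(row))
```

For smoothly varying data such as this row, the differenced values repeat far more often than the raw values, so `predicted_size` comes out smaller than `plain_size`. The transform is lossless because `undo_difference` restores the original row.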
TUFLOW will default to using all available CPU cores when reading/writing GeoTIFF files which can speed up processing when using compression. This can be changed by specifying the number of threads using the command line argument “-nt[thread count]”.
Limitations:
- Currently TUFLOW only supports GeoTIFFs containing 32-bit floating point data.
- Currently TUFLOW does not support ‘Cloud Optimised GeoTIFFs’ or the tiled GeoTIFF format.
- GeoTIFFs support multiple raster bands, however TUFLOW will currently assume the input dataset is within the first raster band.
- Currently TUFLOW only supports unrotated GeoTIFF rasters.
7.3 NetCDF Grid
NetCDF grids are now supported as standard raster inputs for all “
The NetCDF raster inputs should follow the NetCDF CF Convention and are treated as a database within TUFLOW (similar to GPKG). The new database input command options are:
and
Limitation:
- NetCDF rasters support multiple raster bands, however TUFLOW will currently assume the input dataset is within the first band.
Note, rainfall and external stress NetCDF inputs remain unchanged and do not use the new syntax.
7.4 2023-03-AB Minor Enhancements and Bug Fixes
7.4.1 Spatial Database Command Now Works in Quadtree Control File
Build 2023-03-AB fixes an issue when using the “Spatial Database” command in the Quadtree Control File (.qcf). Previously, when setting either the “Base Cell Size”, “Model Origin and Extent” or “Orientation Angle” to “TGC” the Spatial Database command referenced in the QCF could be overwritten by the spatial database reference in the TGC.
7.4.2 Read GRID Location Now Works for New Raster Formats
Build 2023-03-AB now accepts GeoTIFF, GPKG raster, and NetCDF raster formats for the “Read Grid Location” command in the TGC. This previously only worked for Quadtree in the 2023-03-AA build.
7.4.3 Compression Now Allows for Size Increase
Build 2023-03-AB now allows compressed sizes to become bigger than the uncompressed size. This is applicable for both DEFLATE and LZW compression methods and for GeoTIFF and GPKG raster formats. These formats store data in blocks of tiles or strips and each block of data is compressed individually. Previously, if one of these compressed blocks became larger after compression, it would cause ERROR 0635. The 2023-03-AB release allows compression to get larger. Note this may only be for a single block of data and not necessarily reflect what is happening to the overall file size under compression.
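That a single block can grow under compression is a general property of lossless compressors and is easy to reproduce. The following sketch uses Python's `zlib` (deflate) on two hypothetical 4096-byte blocks: one filled with pseudo-random bytes, which cannot be compressed, and one filled with zeros. The block size and contents are illustrative and are not taken from TUFLOW.

```python
import random
import zlib


def deflated_size(block):
    """Number of bytes produced by deflating one block of data."""
    return len(zlib.compress(block))


rng = random.Random(0)  # fixed seed so the example is repeatable
noisy_block = bytes(rng.getrandbits(8) for _ in range(4096))
flat_block = bytes(4096)  # 4096 zero bytes

noisy_grew = deflated_size(noisy_block) > len(noisy_block)
flat_shrank = deflated_size(flat_block) < len(flat_block)
```

The noisy block gains a few bytes of container overhead while the flat block shrinks dramatically, so a file made of many blocks can still be much smaller overall even when individual blocks grow.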
7.4.4 Increase Primary Key Column Name Length
Build 2023-03-AB increases the allowed length of the GPKG primary key column name from 5 to 50 characters. This only affects reading GPKG layers created by certain applications. For example, the QGIS Kart plugin, when converting from SHP to GPKG, will create a primary key column name exceeding 5 characters (it uses ‘auto_pk’).
7.4.5 Retry Loop for Locked GPKG Databases
Build 2023-03-AB will enter a retry loop while trying to open a GPKG for reading if the database is locked. The retry loop pauses for 1 second before trying again and to prevent an infinite retry loop, TUFLOW will error if it fails to open the database 10 times. Previously this could occur (very rarely) if more than one TUFLOW model was initialising at the same time and using the same input GPKG database(s).
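The behaviour described above follows a common retry pattern, sketched below with Python's `sqlite3` module. The 1 second pause and limit of 10 attempts are taken from the description; everything else (the function name, the injectable `connect` and `sleep` arguments, and the probe query) is a hypothetical illustration rather than TUFLOW's implementation.

```python
import sqlite3
import time


def open_with_retry(path, attempts=10, pause_seconds=1.0,
                    connect=sqlite3.connect, sleep=time.sleep):
    """Open an SQLite database, retrying while it is locked."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            con = connect(path)
            try:
                # A trivial read; this fails if the database is locked.
                con.execute("SELECT count(*) FROM sqlite_master").fetchone()
            except sqlite3.OperationalError:
                con.close()
                raise
            return con
        except sqlite3.OperationalError as error:
            last_error = error
            if attempt < attempts:
                sleep(pause_seconds)  # wait before the next attempt
    raise RuntimeError(f"Could not open {path} after {attempts} attempts") from last_error
```

Capping the number of attempts is what prevents an infinite loop if the lock is never released.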
7.4.6 GPKG XF File Naming
Build 2023-03-AB changes the file naming convention for GPKG XF file names. Previously the convention was to use the database name followed by all layer names (then conventional XF naming suffixes), for example:
2d_ztin_EG07_010.gpkg_2d_ztin_EG07_010_L_2d_ztin_EG07_010_P.d1.5m_T00001.xf4
However, because these file names can become very long, build 2023-03-AB has changed the approach and hashes the layer names into an 8 character long hexadecimal number which allows TUFLOW to check for file name consistency and limits the filename length, for example:
2d_ztin_EG07_010.gpkg_FEBB7271.d1.5m_T0001.xf4
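The idea of replacing a list of layer names with a fixed-length hexadecimal token can be sketched as follows. The hash function used by TUFLOW is not stated in these notes, so CRC-32 is used here purely as an assumption for illustration; the function name and the way the names are joined are also hypothetical, and the output is not expected to match the token shown above.

```python
import zlib


def layer_hash(layer_names):
    """Collapse any number of layer names into 8 hexadecimal characters.

    CRC-32 is an assumed stand-in; TUFLOW's actual hash is not documented here.
    """
    joined = "_".join(layer_names)
    return f"{zlib.crc32(joined.encode('utf-8')):08X}"
```

Whatever hash is used, the same layer names always give the same token, so the file name stays short while still changing when the set of layers changes.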
7.4.7 Bug Fix “Reached maximum concurrent SQLite statements”
Build 2023-03-AB fixes a bug that could occur when another application had the same GPKG database open for editing while TUFLOW was trying to read it. This would only occur if the same database was opened and closed many times by TUFLOW. Previously ERROR 0636 and ERROR 0647 were triggered in these situations.
7.4.8 Bug Fix “ERROR 0636 – Issue stepping through SQLite query”
Build 2023-03-AB fixes a bug that could occur reading a GPKG layer that contained mis-matched ‘rowid’ and ‘fid’ attributes (‘rowid’ is an internal attribute within SQLite tables and ‘fid’ is the common name for the primary key column in a GPKG layer). This could occur for any number of reasons and the user has very little control over this. Note, there is no requirement for the ‘rowid’ and ‘fid’ column to be in sync and the GPKG layers are not considered corrupt or invalid if this happened.
If the ‘rowid’ and ‘fid’ columns became mis-matched, TUFLOW could sometimes produce “ERROR 0636 – Issue occurred stepping through SQLite Query”. This was caused by incorrect parameters being passed to the SQLite API routines, which has now been fixed in TUFLOW 2023-03-AB.
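Whether a table has ‘rowid’ and ‘fid’ values that differ can be checked directly with SQLite. The sketch below uses Python's `sqlite3` module; the function name is hypothetical. In SQLite, a column declared exactly as INTEGER PRIMARY KEY is an alias for ‘rowid’ and so can never differ from it, while a key declared any other way (for example INT PRIMARY KEY) is stored separately. How a given GPKG layer ends up with differing values depends on the application that wrote it.

```python
import sqlite3


def count_rowid_key_mismatches(con, table, key="fid"):
    """Count rows whose internal rowid differs from the primary key column."""
    query = f'SELECT count(*) FROM "{table}" WHERE rowid <> "{key}"'
    return con.execute(query).fetchone()[0]
```

A result of zero means the two columns are in step; any other value simply indicates a difference and, as noted above, does not mean the layer is corrupt or invalid.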
7.4.9 GPKG Multi-Part Polygons
In build 2023-03-AB the treatment of multi-part polygons in GPKG is now identical to SHP/MIF. Previously they were treated slightly differently, as a single polygon with multiple rings. This will not affect the inputs in most cases, except where different polygon parts from the same feature overlap, or for some inputs in TUFLOW that allow multi-part features but not polygons containing multiple rings (e.g. polygons with holes).
7.4.10 ERROR 0305 Triggering When There is No GPKG Projection
Build 2023-03-AB no longer triggers ERROR 0305 or WARNING 0305 when a GPKG Projection is not included in the TCF. Previously this message could trigger if a SHP Projection was included.
7.5 2023-03-AC Minor Enhancements and Bug Fixes
7.5.1 GPKG XF File Creation Causing “ERROR 0645”
Build 2023-03-AC fixes a bug when reading multiple GPKG layers on a single line using the Shapefile convention (e.g.
7.5.2 Bug Fix For GPKG When “fid” Was Not the Same as “rowid”
Build 2023-03-AC fixes a bug when reading a GPKG layer that contained entries where the GPKG “fid” column was not equal to the internal SQLite “rowid” column. This would cause an infinite loop of “ERROR 0636 - Issue occurred stepping through SQLite query” and a “Should not be here [stmntID=0]”.
7.5.3 Exit TUFLOW if an ERROR Occurs Reading GPKG
Build 2023-03-AC will correctly error and exit if an unexpected error occurs reading a GPKG layer. Previously certain errors could cause an infinite loop to occur (e.g. the error reported in Section 7.5.2).
7.5.4 Bug Fix GPKG Reading 1d_pit
Build 2023-03-AC fixes a bug that could cause TUFLOW to exit uncleanly when reading a GPKG 1d_pit layer. This error was caused when TUFLOW checked the 1d_pit column names which was introduced in build 2023-03-AB with the addition of the ability to add a time lag for virtual pits.
7.6 2023-03-AD Minor Enhancements and Bug Fixes
7.6.1 Bug Fix - Crash While Reading GeoTIFF Inputs
Build 2023-03-AD fixes a bug that could cause TUFLOW to crash while reading a GeoTIFF input. This crash was caused by a stack overflow and could occur when reading very wide raster inputs (>15,000 columns).
7.6.2 Raster Cell Shape Check
Build 2023-03-AD adds a check on the cell shape of an input raster. If the cell width is not equal to the cell height (i.e. not square), TUFLOW will produce “ERROR 0657 - Grid does not use square cells”.
7.6.3 GeoTIFF Coordinate Reference System Check
Build 2023-03-AD adds a check when reading a GeoTIFF raster on the Coordinate Reference System (CRS). If the CRS is a geographic type (i.e. it uses spherical coordinates), TUFLOW will produce “ERROR 0635 - GeoTIFF is using a geographic coordinate reference system”.
Similarly, if a GeoTIFF using a geographic CRS is input with the “
7.6.4 GPKG Projection Accepts String Input
Build 2023-03-AD enhances the GPKG projection command so that it can accept a string representation of the projection:
The string is a representation of the table entry that would be entered into the metadata table “gpkg_spatial_ref_sys”, with columns delimited by “|”. Five columns are expected:
An example below is presented using the projection used in the TUFLOW tutorial model:
*Note: the definition is typically written out in a Well Known Text Format (WKT) as published by the Open Geospatial Consortium (OGC) (this is typically the same as what is in the .prj file accompanying Shape Files) and hasn’t been written out fully in this example.
The GPKG Projection has been updated in the TUFLOW Log File (TLF) to write out in the format above such that it can be copied into the TCF.
7.6.5 Bug Fix - Spatial Database == [path/to/gpkg] When Used in a Read File Within The TCF
Build 2023-03-AD fixes a bug that would not correctly override the current TCF spatial database when using “
Note: for all other control files, any “
7.6.6 Bug Fix - Spatial Database == OFF When Used in a Read File
Build 2023-03-AD fixes a bug that could cause the spatial database to not be turned off correctly when using the command “
7.6.7 GPKG Fully Implemented in Package Model Functionality
Build 2023-03-AD incorporates GPKG fully into the package model functionality. Prior to 2023-03-AD, GPKG files would be copied only if specified in “
Build 2023-03-AD also stops “CHECK 0712 - No file extension for GIS layer assuming .mif” from being erroneously written to the package model log file (pm.log) when copying GPKG inputs used in conjunction with “
The package model functionality will not copy GPKG databases multiple times when used in conjunction with “
- “Spatial Database == database.gpkg” - database.gpkg will not be copied multiple times (as layers are read in subsequent commands)
- “database.gpkg >> layer1 && layer2” - database.gpkg will not be copied multiple times
- “database.gpkg >> layer1 | database.gpkg >> layer2” - database.gpkg will be copied twice
- “database.gpkg >> layer1” - database.gpkg will be copied each time it is referenced in this way
- “database.gpkg” - database.gpkg will be copied each time it is referenced in this way
Note: The package model functionality does not process the inputs and copies layers using a system copy. Therefore the entire database will be copied each time, including all layers contained within, which may contain unwanted or superseded inputs.
7.6.8 GPKG Reading - Change in Default Journal Mode
Build 2023-03-AD changes the default SQLite journal mode when TUFLOW reads a GPKG input. TUFLOW now leaves the journal mode of the database unchanged when reading inputs. Prior to 2023-03-AD, TUFLOW would always set the SQLite journal mode to ‘WAL’ (Write Ahead Log).
This change ensures that TUFLOW does not modify the input GPKG when it opens it for reading. Previously, if the GPKG was set to a journal mode other than ‘WAL’, the GPKG file would be modified when TUFLOW opened it and changed the journal mode. This could cause issues if any scripts set up by the user were checking for changes to the file modification date (even though no layer or input was modified).
Users should be aware that QGIS typically opens GPKGs with the journal mode set to ‘DELETE’ when viewing layers and will change to ‘WAL’ for editing/writing. This would cause modification dates to change when alternating between running TUFLOW and viewing inputs in QGIS.
TUFLOW still uses ‘WAL’ journal mode for output GPKG layers.
7.6.9 Support for User SQL When GPKG Databases are Opened
Build 2023-03-AD adds support for custom user SQLite SQL that will be executed when a GPKG database is opened. This has been added with the intention to give users the ability to change general SQLite database settings (e.g. the SQLite journal mode) and negates the need to have custom TUFLOW commands for each available setting. The command is intended for use with SQLite ‘Pragma’ commands and not for adding, removing, or selecting from tables. A list and description of SQLite Pragma commands can be found on sqlite.org.
Optional ‘Read’ and ‘Write’ context keywords can be used to restrict the SQL to only when TUFLOW is reading a GPKG or writing a GPKG. If no additional keyword is used (the default), the SQL is applied to both reading and writing contexts.
An example of using this command to change the SQLite journal mode to ‘WAL’ (generally SQLite is not case sensitive):
User commands are applied after any default settings used by TUFLOW, and will take precedence. Multiple commands can be specified by separating the commands with a semicolon (‘;’) on a single line, otherwise using the TUFLOW command multiple times will overwrite previous settings.
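The general pattern of running user-supplied pragma statements when a database is opened can be sketched with Python's `sqlite3` module. This illustrates the concept only and is not TUFLOW's implementation; the function names are hypothetical, and the simple split on “;” assumes that no statement contains a semicolon inside a quoted string.

```python
import sqlite3


def open_with_user_sql(path, user_sql=""):
    """Open a database, then run each semicolon-separated user statement."""
    con = sqlite3.connect(path)
    for statement in user_sql.split(";"):
        statement = statement.strip()
        if statement:
            # fetchall() consumes any value the pragma reports back.
            con.execute(statement).fetchall()
    return con


def journal_mode(con):
    """Report the journal mode currently in effect for the connection."""
    return con.execute("PRAGMA journal_mode").fetchone()[0]
```

For example, `open_with_user_sql("outputs.gpkg", "PRAGMA journal_mode = WAL; PRAGMA synchronous = NORMAL")` applies both settings in order, and `journal_mode` can then be used to confirm the result. The file name here is a placeholder.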
7.6.10 GPKG Improved Error Messaging if Polygon/Polyline Vertex Count Exceeds TUFLOW Limit
Build 2023-03-AD improves the error messaging reported by TUFLOW when it exceeds the TUFLOW vertex limit for GPKG layers. “ERROR 0310 - Exceeded number of allowable nodes in a single polyline or region. Requested (part no. 1, ring no. 6357) 500034 Limit 500000” is now reported, similar to the message reported by Shapefiles for the same restriction.
In the error message, “part no.” refers to the current polygon part that is being processed (only applicable if reading in a multi-part geometry) and the “ring no.” refers to the polygon ring number that is currently being processed (multiple polygon rings often represent holes in the polygon). GPKG counts the number of vertices as it processes each ring, therefore the requested number may not be the final vertex count (it is the vertex count at the ring it is currently processing).
7.6.11 Bug Fix - GeoTIFF Grids Being Cutoff
Build 2023-03-AD fixes a bug when outputting GeoTIFF grids where the output grid did not cover the entire model extent for very wide or long grids (more likely to happen with high-resolution outputs). This error occurred when the height or width of the grid was greater than 65,535 cells.
7.6.12 Bug Fix - GPKG XF Files Not Updating
Build 2023-03-AD fixes a bug that could occur when using a GPKG input layer, if the GPKG input layer was modified after an XF file had previously been written. Previously, TUFLOW could incorrectly use the XF file even if the GPKG layer was newer, due to a mismatch in time zones between the modification date stored in the GPKG (UTC) and the XF file (local).
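A comparison of this kind is only reliable when both times are expressed in the same time zone. The sketch below shows one way to do this in Python by converting both values to timezone-aware UTC datetimes before comparing them. It assumes the GPKG time is an ISO 8601 UTC string of the form used by the `last_change` column of `gpkg_contents`, and that the XF time is a file modification time in seconds since the epoch; the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_gpkg_time(last_change):
    """Parse a UTC timestamp such as '2023-03-01T02:00:00.000Z'."""
    parsed = datetime.strptime(last_change, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def xf_is_stale(xf_mtime_epoch, gpkg_last_change):
    """True if the GPKG layer changed after the XF file was written."""
    xf_time = datetime.fromtimestamp(xf_mtime_epoch, tz=timezone.utc)
    return parse_gpkg_time(gpkg_last_change) > xf_time
```

Because both values carry an explicit UTC offset, the result does not depend on the local time zone of the machine running the check.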
7.6.13 Enhances Handling of Line Breaks in Attributes
Build 2023-03-AD enhances how accidental line breaks are handled in GIS attributes. If a line break is encountered, build 2023-03-AD will replace the line break with a space and report CHECK 0659.
Example:
The location of the line break in the attribute value will be denoted using “\n”.
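The handling described in this section can be sketched in a few lines of Python. The function below is a hypothetical illustration, not TUFLOW's code: it returns the cleaned value with each line break replaced by a space, plus a second copy in which each break is shown as the two characters “\n” for reporting. Treating a carriage return followed by a line feed as a single break is an assumption.

```python
import re

# One line break: Windows (CR LF), old Mac (CR) or Unix (LF).
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def clean_attribute(value):
    """Return (cleaned value, marked value or None if no break was found)."""
    if not _LINE_BREAK.search(value):
        return value, None
    cleaned = _LINE_BREAK.sub(" ", value)
    # Show each break as a visible backslash followed by 'n'.
    marked = _LINE_BREAK.sub(lambda match: "\\n", value)
    return cleaned, marked
```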