GPU-PostGIS

This chapter describes GPU-PostGIS

Overview

PostGIS is an extension to PostgreSQL to utilize geographic information. PostGIS provides data type (Geometry) for handling geographic data such as points, lines, and polygons, as well as a large number of functions and operators for evaluating geographic data elements, such as distance calculation, inclusion, and intersection determination. In addition, some of the operators can search faster by the R-Tree using GiST(Generalized Search Tree) mechanism included in PostgreSQL. Since the first version was released in 2001, it has been enhanced and maintained by the developer community for over 20 years.

These functions and operators provided by PostGIS are very large, over 500 in total. For this reason, PG-Strom has ported only a few relatively frequently used PostGIS functions to the GPU.

For example:

  • geometry st_point(float8 lon,float8 lat)
    • returns a point with the given longitude and latitude as a Point of Geometry type.
  • bool st_contains(geometry a,geometry b)
    • determines if the geometry a contains the geometry b or not.
  • bool st_crosses(geometry,geometry)
    • determines if the geometries intersect each other.
  • text st_relate(geometry,geometry)

PostGIS Usage

You can use GPU-PostGIS without any configurations.

PG-Strom will automatically determine if the PostGIS functions used in the query are executable on the GPU when PostGIS is installed from the package or the source code and the geometry data types and PostGIS functions are defined using the CREATE EXTENSION syntax.

Please refer to the PostGIS documentaion for installation.

For example, the following query uses the GPU-executable PostGIS funtion st_contains() and st_makepoint() to determine if a two-dimensional point read from the table is contained within the range of the geometry type constant 'polygon ((10 10,30 10,30 20,10 20,10 10))'.

As you can see from the fact that these functions are listed as part of the "GPU Filter:", PG-Strom will automatically detect supported PostGIS functions and attempt to run them on the GPU as much as possible.

=# explain select * from dpoints where st_contains('polygon ((10 10,30 10,30 20,10 20,10 10))', st_makepoint(x,y));

                              QUERY PLAN
------------------------------------------------------------------------------------------
 Custom Scan (GpuScan) on dpoints  (cost=1397205.10..12627630.76 rows=800 width=28)
   GPU Filter: st_contains('01030000000100000005000000000000000000244000000000000024400000000000003E4000000000000024400000000000003E4000000000000034400000000000002440000000000000344000000000000024400000000000002440'::geometry, st_makepoint(x, y))
   GPU Preference: GPU0 (NVIDIA Tesla V100-PCIE-16GB)
(3 rows)

GiST Index

Some of the PostGIS functions that evaluate relationships between geometries, such as st_contains() and st_crosses(), support the GiST index (R-Tree), which enables fast refinement of the search using only the CPU. GpuJoin in PG-Strom sometimes transfers not only the contents of the table but also GiST index (R-Tree) to filter the rows to be joined fast when the join condition between tables can be accelerated. This process is usually executed at a much higher parallelism level than the CPU, so a significant speedup can be expected.

On the other hand, GpuScan does not use GiST index to scan a single table. This is because IndexScan filtering by CPU is often faster.

The following is an example of a SQL statement to create a GiST index on city boundary data ("geom" column of "giscity" table).

=# CREATE INDEX on giscity USING gist (geom);
CREATE INDEX

The following is an execution plan of SQL that joins municipal boundary data ("giscity" table) and latitude and longitude data ("dpoints" table) and outputs the number of latitude and longitude data (points) contained in the area expressed as polygons for each municipality.

The optimizer selects GpuJoin, and GpuGiSTJoin to join "giscity" table with "dpoints" table. The "IndexFilter:" line shows that the filtering condition on the GiST index is (g.geom ~ st_makepoint(d.x, d.y)) and the index giscity_geom_idx will be used. The execution of PostGIS functions is a relatively "heavy" process even for GPU. By using GiST index, we can eliminate combinations that obviously do not match the condition and speed up the search process significantly.

=# EXPLAIN
   SELECT pref, city, count(*)
     FROM giscity g, dpoints d
    WHERE pref = 'Tokyo' AND st_contains(g.geom,st_makepoint(d.x, d.y))
    GROUP BY pref, city;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=5700646.35..5700759.39 rows=5024 width=29)
   Group Key: g.n03_001, g.n03_004
   ->  Sort  (cost=5700646.35..5700658.91 rows=5024 width=29)
         Sort Key: g.n03_004
         ->  Custom Scan (GpuPreAgg)  (cost=5700274.71..5700337.51 rows=5024 width=29)
               Reduction: Local
               Combined GpuJoin: enabled
               GPU Preference: GPU0 (NVIDIA Tesla V100-PCIE-16GB)
               ->  Custom Scan (GpuJoin) on dpoints d  (cost=638671.58..5668511.23 rows=50821573 width=21)
                     Outer Scan: dpoints d  (cost=0.00..141628.18 rows=7999618 width=16)
                     Depth 1: GpuGiSTJoin(nrows 7999618...50821573)
                              HeapSize: 3251.36KB
                              IndexFilter: (g.geom ~ st_makepoint(d.x, d.y)) on giscity_geom_idx
                              JoinQuals: st_contains(g.geom, st_makepoint(d.x, d.y))
                     GPU Preference: GPU0 (NVIDIA Tesla V100-PCIE-16GB)
                     ->  Seq Scan on giscity g  (cost=0.00..8929.24 rows=6353 width=1883)
                           Filter: ((pref)::text = 'Tokyo'::text)
(17 rows)