Index Support

Overview

PostgreSQL supports several index strategies. The default is B-tree that can rapidly fetch records with a particular value. Elsewhere, it also supports Hash, BRIN, GiST, GIN and others that have own characteristics for each. PG-Strom supports only BRIN-index right now.

BRIN-index works efficiently on the dataset we can expect physically neighbor records have similar key values, like timestamp values of time-series data. It allows to skip blocks-range if any records in the range obviously don't match to the scan qualifiers, then, also enables to reduce the amount of I/O due to full table scan.

PG-Strom also utilizes the feature of BRIN-index, to skip obviously unnecessary blocks from the ones to be loaded to GPU.

BRIN-index Ovewview

Configuration

No special configurations are needed to use BRIN-index.

PG-Strom automatically applies BRIN-index based scan if BRIN-index is configured on the referenced columns and scan qualifiers are suitable to the index.

Also see the PostgreSQL Documentation for the BRIN-index feature.

By the GUC parameters below, PG-Strom enables/disables usage of BRIN-index. It usually don't need to change from the default configuration, except for debugging or trouble shooting.

Parameter Type Default Description
pg_strom.enable_brin bool on enables/disables usage of BRIN-index

Operations

By EXPLAIN, we can check whether BRIN-index is in use.

postgres=# EXPLAIN ANALYZE
           SELECT * FROM dt
            WHERE ymd BETWEEN '2018-01-01' AND '2018-12-31'
              AND cat LIKE '%aaa%';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Custom Scan (GpuScan) on dt  (cost=94810.93..176275.00 rows=169992 width=44)
                              (actual time=1777.819..1901.537 rows=175277 loops=1)
   GPU Filter: ((ymd >= '2018-01-01'::date) AND (ymd <= '2018-12-31'::date) AND (cat ~~ '%aaa%'::text))
   Rows Removed by GPU Filter: 4385491
   BRIN cond: ((ymd >= '2018-01-01'::date) AND (ymd <= '2018-12-31'::date))
   BRIN skipped: 424704
 Planning time: 0.529 ms
 Execution time: 2323.063 ms
(7 rows)

In the example above, BRIN-index is configured on the ymd column. BRIN cond shows the qualifier of BRIN-index for concentration. BRIN skipped shows the number of skipped blocks actually.

In this case, 424704 blocks are skipped, then, it filters out 4385491 rows in the loaded blocks by the scan qualifiers.

GpuJoin and GpuPreAgg often pulls up its underlying table scan and runs the scan by itself, to reduce inefficient data transfer. In this case, it also uses the BRIN-index to concentrate the scan.

The example below shows a usage of BRIN-index in a query which includes GROUP BY.

postgres=# EXPLAIN ANALYZE
           SELECT cat,count(*)
             FROM dt WHERE ymd BETWEEN '2018-01-01' AND '2018-12-31'
         GROUP BY cat;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=6149.78..6151.86 rows=26 width=12)
                 (actual time=427.482..427.499 rows=26 loops=1)
   Group Key: cat
   ->  Sort  (cost=6149.78..6150.24 rows=182 width=12)
             (actual time=427.465..427.467 rows=26 loops=1)
         Sort Key: cat
         Sort Method: quicksort  Memory: 26kB
         ->  Custom Scan (GpuPreAgg) on dt  (cost=6140.68..6142.95 rows=182 width=12)
                                            (actual time=427.331..427.339 rows=26 loops=1)
               Reduction: Local
               Outer Scan: dt  (cost=4000.00..4011.99 rows=4541187 width=4)
                               (actual time=78.573..415.961 rows=4560768 loops=1)
               Outer Scan Filter: ((ymd >= '2018-01-01'::date) AND (ymd <= '2018-12-31'::date))
               Rows Removed by Outer Scan Filter: 15564
               BRIN cond: ((ymd >= '2018-01-01'::date) AND (ymd <= '2018-12-31'::date))
               BRIN skipped: 424704
 Planning time: 30.992 ms
 Execution time: 818.994 ms
(14 rows)