In-memory Columnar Cache

Overview

PG-Strom has one another feature related to storage to supply processors data stream.

In-memory columnar cache reads data blocks of the target table, convert the row-format of PostgreSQL to columnar format which is suitable for summary and analytics, and cache them on memory.

This feature requires no special hardware like SSD-to-GPU Direct SQL Execution, on the other hands, RAM capacity is still smaller than SSD, so this feature is suitable to handle "not a large scale data set" up to 60%-75% of the system RAM size.

This feature is not "a columnar store". It means cached and converted data blocks are flashed once PostgreSQL server process has restarted for example. When any cached rows get updated, PG-Strom invalidates the columnar cache block which contains the updated rows. This design on the basis that columnar format is vulnerable to updating workloads. If we try to update columnar-store with keeping consistency towards update of row-store, huge degradation of write performance is not avoidable. On the other hands, it is lightweight operation to invalidate the columnar cache block which contains the updated row. PG-Strom can switch GPU kernels to be invoked for row- or columnar-format according to format of the loading data blocks. So, it works flexibly, even if a columnar cache block gets invalidated thus PG-Strom has to load data blocks from the shared buffer of PostgreSQL.

overview of in-memory columnar cache

System Setup

Location of the columnar cache

The pg_strom.ccache_base_dir parameter allows to specify the path to store the columnar cache. The default is /dev/shm where general Linux distribution mounts tmpfs filesystem, so files under the directory are "volatile", with no backing store.

Custom configuration of the parameter enables to construct columnar cache on larger and reasonably fast storage, like NVMe-SSD, as backing store. However, note that update of the cached rows invalidates whole of the chunk (128MB) which contains the updated rows. It may lead unexpected performance degradation, if workloads have frequent read / write involving I/O operations.

Columnar Cache Builder Configuration

PG-Strom can build in-memory columnar cache automatically and asynchronously using one or multiple background workers. These background workers are called columnar cache builder.

Columnar cache builder scans the target tables to construct columnar cache in the specified database, by round-robin, then converts to columnar format and keep it on the cache. It is an asynchronous job from the backend process which handles user's SQL.

Once a columnar cache is built, any other backend process can reference them. PG-Strom never construct columnar cache on demand, unlike usual disk cache mechanism, even if it is access to the area where columnar cache is not built yet. In this case, PG-Strom loads row-data through the normal storage system of PostgreSQL.

The number of columnar cache builders are fixed on the startup, so you need to setup pg_strom.ccache_num_builders parameters then restart PostgreSQL to increase the number of workers.

The pg_strom.ccache_databases parameter configures the databases associated with columnar cache builders. It requires superuser privilege to setup, and is updatable on PostgreSQL running. (Of course, it is possible to assign by postgresql.conf configuration on startup.)

Once a comma separated list of database names are assigned, columnar cache builders are associated to the specified databases in rotation. For example, if 5 columnar cache builders are running then 3 databases (postgres,my_test,benchmark) are assigned on the pg_strom.ccache_databases, 2 columnar cache builders are assigned on the postgres and my_test database for each, and 1 columnar cache builder is assigned on the benchmark database.

Source Table Configuration

DBA needs to specify the target tables to build columnar cache.

A SQL function pgstrom_ccache_enabled(regclass) adds the supplied table as target to build columnar cache. Other way round, a SQL function pgstrom_ccache_disabled(regclass) drops the supplied table from the target to build.

Internally, it is implemented as a special trigger function which invalidate columnar cache on write to the target tables. It means we don't build columnar cache on the tables which have no way to invalidate columnar cache.

postgres=# select pgstrom_ccache_enabled('t0');
 pgstrom_ccache_enabled
------------------------
 enabled
(1 row)

Operations

Check status of columnar cache

pgstrom.ccache_info provides the status of the current columnar cache.

You can check the table, block number, cache creation time and last access time per chunk.

contrib_regression_pg_strom=# SELECT * FROM pgstrom.ccache_info ;
 database_id | table_id | block_nr | nitems  |  length   |             ctime             |             atime
-------------+----------+----------+---------+-----------+-------------------------------+-------------------------------
       13323 | 25887    |   622592 | 1966080 | 121897472 | 2018-02-18 14:31:30.898389+09 | 2018-02-18 14:38:43.711287+09
       13323 | 25887    |   425984 | 1966080 | 121897472 | 2018-02-18 14:28:39.356952+09 | 2018-02-18 14:38:43.514788+09
       13323 | 25887    |    98304 | 1966080 | 121897472 | 2018-02-18 14:28:01.542261+09 | 2018-02-18 14:38:42.930281+09
         :       :             :         :          :                :                               :
       13323 | 25887    |    16384 | 1963079 | 121711472 | 2018-02-18 14:28:00.647021+09 | 2018-02-18 14:38:42.909112+09
       13323 | 25887    |   737280 | 1966080 | 121897472 | 2018-02-18 14:34:32.249899+09 | 2018-02-18 14:38:43.882029+09
       13323 | 25887    |   770048 | 1966080 | 121897472 | 2018-02-18 14:28:57.321121+09 | 2018-02-18 14:38:43.90157+09
(50 rows)

Check usage of columnar cache

You can check whether a particular query may reference columnar cache, or not, using EXPLAIN command.

The query below joins the table t0 and t1, and the Custom Scan (GpuJoin) which contains scan on the t0 shows CCache: enabled. It means columnar cache may be referenced at the scan on t0, however, it is not certain whether it is actually referenced until query execution. Columnar cache may be invalidated by the concurrent updates.

postgres=# EXPLAIN SELECT id,ax FROM t0 NATURAL JOIN t1 WHERE aid < 1000;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Custom Scan (GpuJoin) on t0  (cost=12398.65..858048.45 rows=1029348 width=12)
   GPU Projection: t0.id, t1.ax
   Outer Scan: t0  (cost=10277.55..864623.44 rows=1029348 width=8)
   Outer Scan Filter: (aid < 1000)
   Depth 1: GpuHashJoin  (nrows 1029348...1029348)
            HashKeys: t0.aid
            JoinQuals: (t0.aid = t1.aid)
            KDS-Hash (size: 10.78MB)
   CCache: enabled
   ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=12)
(10 rows)

EXPLAIN ANALYZE command tells how many times columnar cache is referenced during the query execution.

After the execution of this query, Custom Scan (GpuJoin) which contains scan on t0 shows CCache Hits: 50. It means that columnar cache is referenced 50 times. Because the chunk size of columnar cache is 128MB, storage access is replaced to the columnar cache by 6.4GB.

postgres=# EXPLAIN ANALYZE SELECT id,ax FROM t0 NATURAL JOIN t1 WHERE aid < 1000;

                                    QUERY PLAN

-------------------------------------------------------------------------------------------
 Custom Scan (GpuJoin) on t0  (cost=12398.65..858048.45 rows=1029348 width=12)
                              (actual time=91.766..723.549 rows=1000224 loops=1)
   GPU Projection: t0.id, t1.ax
   Outer Scan: t0  (cost=10277.55..864623.44 rows=1029348 width=8)
                   (actual time=7.129..398.270 rows=100000000 loops=1)
   Outer Scan Filter: (aid < 1000)
   Rows Removed by Outer Scan Filter: 98999776
   Depth 1: GpuHashJoin  (plan nrows: 1029348...1029348, actual nrows: 1000224...1000224)
            HashKeys: t0.aid
            JoinQuals: (t0.aid = t1.aid)
            KDS-Hash (size plan: 10.78MB, exec: 64.00MB)
   CCache Hits: 50
   ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=12)
                       (actual time=0.011..13.542 rows=100000 loops=1)
 Planning time: 23.390 ms
 Execution time: 1409.073 ms
(13 rows)

Attension for DROP DATABASE command

When columnar cache builder constructs columnar cache asynchronously, background worker process has internally connected to the specified database. When DROP DATABASE command tries to drop a database, PostgreSQL checks whether any session connects to the database. At that time, even if no user session connects to the database, DROP DATABASE will fail by columnar cache builder which keeps connection to the database.

Please remove the database name from the pg_strom.ccache_databases parameter prior to execution of DROP DATABASE command. Columnar cache builder will restart soon, then tries to connect databases according to the new configuration.