PG-Strom v2.3 Release

PG-Strom Development Team (1-Apr-2020)

Overview

Major changes in PG-Strom v2.3 include:

GpuJoin supports parallel construction of the inner buffer.
Arrow_Fdw is now writable; it supports INSERT/TRUNCATE.
pg2arrow command supports 'append' mode.
mysql2arrow command was added.

Prerequisites

PostgreSQL v10, v11, v12
CUDA Toolkit 10.1 or later
Linux distributions supported by CUDA Toolkit
Intel x86 64bit architecture (x86_64)
NVIDIA GPU with compute capability (CC) 6.0 or later (Pascal or Volta)

New Features

GpuJoin supports parallel construction of the inner buffer
- Older versions constructed the inner buffer of GpuJoin in the backend process only. This restriction caused a problem: parallel scans of a partitioned table were severely delayed.
- This version allows both the backend and the worker processes to construct the inner buffer. When a partitioned table is scanned, any process assigned to a particular child table can start its GpuJoin operations immediately.
Refactoring of the partition-wise asymmetric GpuJoin
- As a result of refactoring the partition-wise asymmetric GpuJoin, the optimizer now prefers a multi-level GpuJoin when it offers a cheaper execution cost.
Arrow_Fdw is now writable; INSERT/TRUNCATE supported
- Arrow_Fdw foreign tables allow bulk loading by INSERT and data removal by pgstrom.arrow_fdw_truncate.
pg2arrow command supports 'append' mode.
- We added the --append option to the pg2arrow command. As the name suggests, it appends query results to an existing Apache Arrow file.
- Also, the -t table option was added as a shorthand for SELECT * FROM table.
mysql2arrow command was added.
- We added the mysql2arrow command, which connects to a MySQL server instead of PostgreSQL and writes out SQL query results as Apache Arrow files.
- Its functionality is equivalent to that of pg2arrow, except for the enum data type: mysql2arrow saves enum values as flat Utf8 values without DictionaryBatch chunks.
Regression tests were added
- Several test cases were added, based on the PostgreSQL regression test framework.
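
As a conceptual illustration of the parallel inner-buffer construction described earlier in this section, the Python sketch below lets several workers populate one shared join hash table instead of leaving the whole build to a single process. It is an analogy only: PG-Strom implements this in C across PostgreSQL backend and worker processes, and the names build_inner_buffer_parallel, key_of and n_workers are hypothetical, chosen for this example.

```python
import threading

def build_inner_buffer_parallel(inner_rows, key_of, n_workers=4):
    """Build a join hash table from inner_rows using several workers.

    Hypothetical sketch; threads stand in for the backend and worker
    processes, and a dict stands in for the GpuJoin inner buffer.
    """
    table = {}
    lock = threading.Lock()

    def worker(chunk):
        # Each worker first builds a private partial table, so the
        # shared lock is held only for the short merge step.
        partial = {}
        for row in chunk:
            partial.setdefault(key_of(row), []).append(row)
        with lock:
            for key, rows in partial.items():
                table.setdefault(key, []).extend(rows)

    # Round-robin split of the inner rows among the workers.
    chunks = [inner_rows[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table

inner = [(1, "a"), (2, "b"), (1, "c")]
inner_buffer = build_inner_buffer_parallel(inner, key_of=lambda row: row[0])
```

Because no single worker owns the build, none of them has to wait for a designated process to finish before it can start its own part of the work.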

Significant bug fixes

Revised cache invalidation logic for GPU device functions / types
- Older versions invalidated all metadata cache entries of GPU device functions / types whenever an ALTER command was executed. This was revised to invalidate only the entries that are actually updated.
Fixed extreme performance degradation when GROUP BY contains the same grouping key twice or any other even number of times.
- GpuPreAgg combined the hash values of the GROUP BY grouping keys using XOR. So, when the same column appeared an even number of times, its hash values cancelled out and the hash index always became 0. Now we add a randomization for better hash distribution.
Potential infinite loop on GpuScan
- Due to uninitialized values, GpuScan could enter an infinite loop when SSD2GPU Direct SQL is available.
Potential GPU kernel crash on GpuJoin
- Due to uninitialized values, GpuJoin could crash the GPU kernel when 3 or more tables are joined.
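
The GROUP BY hashing problem listed above can be reproduced in a few lines of Python. The sketch below is illustrative only: column_hash uses CRC32 as a stand-in for the real per-column device hash, and combine_rotate_xor is just one simple position-dependent alternative, not the randomization PG-Strom actually applies. All function names here are hypothetical.

```python
import zlib

MASK32 = 0xFFFFFFFF

def column_hash(value):
    # Stand-in per-column hash (CRC32); not PG-Strom's device hash.
    return zlib.crc32(repr(value).encode()) & MASK32

def combine_xor(hashes):
    # Plain XOR: h ^ h == 0, so a key listed an even number of
    # times cancels itself out.
    h = 0
    for v in hashes:
        h ^= v
    return h

def combine_rotate_xor(hashes):
    # Rotate the accumulator left by 5 bits before each XOR, so the
    # position of a key in the list affects the result.
    h = 0
    for v in hashes:
        h = ((h << 5) | (h >> 27)) & MASK32
        h ^= v
    return h

# Simulate GROUP BY a, a over 1000 distinct values of column a.
values = range(1000)
xor_slots = {combine_xor([column_hash(a), column_hash(a)]) for a in values}
rot_slots = {combine_rotate_xor([column_hash(a), column_hash(a)]) for a in values}
print(len(xor_slots), len(rot_slots))
```

With plain XOR every group lands in the same hash slot, so the hash table degenerates into one long collision chain; the position-dependent variant spreads the groups over many slots.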

Deprecated Features

PostgreSQL v9.6 Support
- The CustomScan API in PostgreSQL v9.6 lacks a few APIs for handling dynamic shared memory (DSM), which made it difficult to maintain common code with v10 or later. To avoid this problem, we dropped PostgreSQL v9.6 support in this version.
PL/CUDA
- According to our use-case analysis, users prefer a familiar programming language environment such as Python over a special-purpose environment of our own.
- The combination of Arrow_Fdw's GPU export functionality and CuPy invocation from PL/Python is the successor to PL/CUDA for in-database machine learning / statistical analytics.
Gstore_Fdw
- This feature is replaced by the writable Arrow_Fdw and its GPU export functionality.
Largeobject export to/import from GPU
- According to our use-case analysis, we determined that this feature is not needed.