PG-Strom v2.2 Release

PG-Strom Development Team (1-May-2019)

Overview

Major enhancements in PG-Strom v2.2 include:

  • Table partitioning support
  • Columnar store support with Arrow_Fdw
  • Pre-built GPU binary support
  • Support for implementing GPU functions that return variable-length data
  • GpuSort support on GPU memory store (Gstore_Fdw)
  • NVME-oF support (Experimental)

Prerequisites

  • PostgreSQL v9.6, v10, v11
  • CUDA Toolkit 10.1
  • Linux distributions supported by CUDA Toolkit
  • Intel x86 64bit architecture (x86_64)
  • NVIDIA GPU with compute capability (CC) 6.0 or later (Pascal or Volta)

New Features

  • Table partitioning support
    • In a multi-GPU configuration, PG-Strom chooses an optimal GPU according to the physical distance between the GPU and the child tables that make up the partition. If PG-Strom cannot determine the distance from the PCIe bus topology, as in an NVME-oF configuration, the DBA can configure the relation between GPUs and NVME-SSDs using pg_strom.nvme_distance_map.
    • When a partitioned table is joined with non-partitioned tables, this version can produce a query execution plan that first joins the non-partitioned tables with each partition child table, and then gathers the results from the child tables. This feature has been proposed to the PostgreSQL v13 core as Asymmetric Partition-wise JOIN.
  • Columnar store support with Arrow_Fdw
    • It supports reading external Apache Arrow files through foreign tables.
    • It also supports SSD-to-GPU Direct SQL on Apache Arrow files.
  • Pre-built GPU binary support
    • When GPU binary code was generated from SQL, older versions wrote out the entire CUDA C source code, including static portions such as libraries, and NVRTC (NVIDIA Run-Time Compiler) then built it on the fly. However, some complicated functions took a long time to compile.
    • v2.2 builds the static functions in advance, and only the dynamic portions generated from SQL are built at run time. This reduces the time for GPU binary generation.
  • JSONB data type support
    • This version allows referencing elements of a JSONB object and using them as numeric or text values.
  • Support for implementing GPU functions that return variable-length data
    • This version allows SQL functions that return variable-length data, such as textcat, to be implemented on GPU devices.
  • GpuSort support on GPU memory store (Gstore_Fdw)
    • This version allows data to be read from the GPU memory store for SQL workload execution, not only for PL/CUDA.
  • Addition of regression tests
    • Several simple regression tests are added.
  • NVME-oF support (Experimental)
    • It supports SSD-to-GPU Direct SQL from remote SSD disks mounted using NVME-over-Fabric. Please note that this is an experimental feature, and it requires replacing the nvme_rdma kernel module on Red Hat Enterprise Linux 7.x / CentOS 7.x.
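
The join plan described under table partitioning works because an inner join distributes over the append of child tables: joining each child separately and gathering the results yields the same rows as appending the children first and joining once. The following is a minimal Python sketch of that idea only, not PG-Strom's implementation. The table names (sales_2018, sales_2019, shops), the rows, and the hash_join helper are all hypothetical and exist purely for illustration.

```python
from itertools import chain


def hash_join(left, right, key):
    """Inner-join two lists of dict rows on `key` (hypothetical helper)."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


# Hypothetical child tables of one partitioned table.
children = {
    "sales_2018": [{"id": 1, "shop": "a"}, {"id": 2, "shop": "b"}],
    "sales_2019": [{"id": 3, "shop": "a"}, {"id": 4, "shop": "c"}],
}
# Hypothetical small, non-partitioned table.
shops = [{"shop": "a", "city": "Tokyo"}, {"shop": "b", "city": "Osaka"}]

# Plan 1: append all child tables first, then join once.
all_rows = list(chain.from_iterable(children.values()))
append_then_join = hash_join(all_rows, shops, "shop")

# Plan 2: join each child table with the non-partitioned table,
# then gather the per-child results.
join_then_append = list(
    chain.from_iterable(hash_join(rows, shops, "shop") for rows in children.values())
)


def canon(rows):
    """Order-independent representation of a result set."""
    return sorted(tuple(sorted(r.items())) for r in rows)


# Both plans return the same result set.
assert canon(append_then_join) == canon(join_then_append)
```

In the second plan each per-child join is independent of the others, so in principle each one can run close to the storage that holds that child table before the results are gathered.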

Features to be deprecated

  • PostgreSQL v9.6 support

    • The CustomScan API in PostgreSQL v9.6 lacks a few APIs for handling dynamic shared memory (DSM), so PG-Strom cannot collect run-time statistics on that version.
    • PostgreSQL v9.6 also keeps expression objects internally in a different way from later versions, so we had to put #if ... #endif blocks in quite a few places, which has hurt code maintainability.
    • Due to these problems, this is the last version to support PostgreSQL v9.6. If you run PG-Strom on PostgreSQL v9.6, we recommend moving to PostgreSQL v11 as soon as possible.
  • The pgstrom format of Gstore_Fdw foreign table

    • The internal data format of the GPU memory store (Gstore_Fdw) was originally designed as a data source for PL/CUDA procedures. It is our own format and uses PostgreSQL's data representations as is, such as variable-length data, numeric, and so on.
    • Later, NVIDIA released RAPIDS (cuDF), based on Apache Arrow, for data exchange on GPUs, and its adoption has grown among machine-learning applications and the Python software stack.
    • PG-Strom will switch the internal data format of Gstore_Fdw to improve interoperability with such machine-learning software, and the existing data format will then be deprecated.

Dropped Features

  • In-memory columnar cache
    • As a result of use-case analysis, we concluded that Arrow_Fdw can replace this feature in most cases. Because the two features overlap, we dropped the in-memory columnar cache.