SQL Objects

This chapter introduces SQL objects additionally provided by PG-Strom.

System Information

pgstrom.device_info System View: It shows properties of GPU devices installed for PG-Strom.; Below is schema definition of the view.

name	type	description
gpu_id	`int`	GPU device number
att_name	`text`	Attribute name
att_value	`text`	Attribute value
att_desc	`text`	Attribute description

There are various kind of GPU device properties, but depending on the CUDA driver version where system is running. So, pgstrom.device_info system view identifies the target property by GPU device identifier (gpu_id) and attribute name (att_name).

Below is an example of pgstrom.device_info system view.

postgres=# select * from pgstrom.gpu_device_info limit 10;
 gpu_id |       att_name        |                att_value                 |              att_desc
--------+-----------------------+------------------------------------------+-------------------------------------
      0 | DEV_NAME              | NVIDIA A100-PCIE-40GB                    | GPU Device Name
      0 | DEV_ID                | 0                                        | GPU Device ID
      0 | DEV_UUID              | GPU-13943bfd-5b30-38f5-0473-78979c134606 | GPU Device UUID
      0 | DEV_TOTAL_MEMSZ       | 39.39GB                                  | GPU Total RAM Size
      0 | DEV_BAR1_MEMSZ        | 64.00GB                                  | GPU PCI Bar1 Size
      0 | NUMA_NODE_ID          | -1                                       | GPU NUMA Node Id
      0 | MAX_THREADS_PER_BLOCK | 1024                                     | Maximum number of threads per block
      0 | MAX_BLOCK_DIM_X       | 1024                                     | Maximum block dimension X
      0 | MAX_BLOCK_DIM_Y       | 1024                                     | Maximum block dimension Y
      0 | MAX_BLOCK_DIM_Z       | 64                                       | Maximum block dimension Z
(10 rows)

Arrow_Fdw

fdw_handler pgstrom.arrow_fdw_handler(): FDW handler function of Arrow_Fdw. Usually, users don't need to invoke this function.
void pgstrom.arrow_fdw_validator(text[], oid): FDW options validation function of Arrow_Fdw. Usually, users don't need to invoke this function.
event_trigger pgstrom.arrow_fdw_precheck_schema(): Event trigger function to validate schema definition of Arrow files. Usually, users don't need to invoke this function.
text pgstrom.arrow_fdw_check_pattern(text, text): Checks whether the file name given in the first argument matches the pattern given in the second argument. This is useful for checking how wildcards work when using the pattern option of Arrow_Fdw.

=# select pgstrom.arrow_fdw_check_pattern('data_202408_tokyo.data', 'data__${region}.data');
          arrow_fdw_check_pattern
-------------------------------------------
 true {@ymd]=[202408], $[region]=[tokyo]}
(1 row)

_${region}.data'のうち、ワイルドカード``に相当する部分が202408にマッチし、${region}に相当する部分がtokyo`にマッチしています。 }

In the above example, for the file name data_202408_tokyo.data, the part of the pattern 'data_@ymd_${region}.data' that corresponds to the wildcard ` matches202408, and the part that corresponds to${region}matchestokyo`. }

record pgstrom.arrow_fdw_metadata_info(regclass): Displays the metadata (CustomMetadata) embedded in the supplied Arrow file.

postgres=# select relid, filename, field, key, substring(value, 0, 64) from pgstrom.arrow_fdw_metadata_info('f_lineorder');
    relid    |              filename               |    field     |     key     |                            substring
-------------+-------------------------------------+--------------+-------------+-----------------------------------------------------------------
 f_lineorder | /opt/arrow/f_lineorder_sorted.arrow |              | sql_command | select * from v_lineorder order by lo_orderdate
 f_lineorder | /opt/arrow/f_lineorder_sorted.arrow | lo_orderdate | min_values  | 19920101,19920102,19920103,19920104,19920105,19920107,19920108,
 f_lineorder | /opt/arrow/f_lineorder_sorted.arrow | lo_orderdate | max_values  | 19920102,19920103,19920104,19920105,19920107,19920108,19920109,
(3 rows)

In the above example, the metadata (CustomMetadata) embedded in the f_lineorder foreign table managed by Arrow_Fdw is output in KEY=VALUE form.

Metadata embedded in the schema has the field column set to NULL. Otherwise, the column name is displayed. In this case, you can see that the sql_command metadata is embedded in the schema, and the min_values and max_values are embedded in the lo_orderdate column.

json pgstrom.arrow_fdw_metadata_stats(): Displays the statistics of the Arrow_Fdw metadata-cache

Arrow_Fdw somehow remembers the metadata (schema definition, data placement, etc.) of an Arrow file that has been referenced once in shared memory, and can improve responsiveness when referencing an Arrow_Fdw foreign table from the next time onwards.

This function can display statistical information about the metadata cache in shared memory.

If the metadata cache is insufficient, the reference responsiveness of Arrow_Fdw foreign tables may decrease. In that case, it is necessary to enlarge the metadata cache by increasing arrow_fdw.metadata_cache_size.

s=# select pgstrom.arrow_fdw_metadata_stats();

       arrow_fdw_metadata_stats

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 { "total_cache_sz" : 536870912, "total_cache_allocated" : 9568256, "total_cache_usage" : 9354072, "num_file_entries" : 2, "cache_usage_efficiency" : 0.977615, "cache_num_blocks" : 4096, "cache_active_blocks" : 73, "cache_free_blocks" : 4023, "cache_used_ratio" : 0.017822 "last_allocate_timestamp" : "2025-03-29 13:24:44.447" }
(1 row)

As in the example above, the statistics are displayed in JSON format, with the meaning of each field as follows:

total_cache_sz
- The total size of the metadata cache (in bytes)
total_cache_allocated
- Allocated metadata cache size (in bytes)
total_cache_usage
- Used metadata cache size (in bytes)
num_file_entries
- Number of Arrow files that are cached
cache_usage_efficiency
- The ratio of used metadata cache to the allocated metadata cache
cache_num_blocks
- Number of blocks in the metadata cache (in 128kB units)
cache_active_blocks
- Number of allocated metadata cache blocks
cache_free_blocks
- Number of free metadata cache blocks
cache_used_ratio
- Percentage of metadata cache blocks allocated
last_allocate_timestamp
- The timestamp of the last metadata cache allocation
last_reclaim_timestamp
- The timestamp when the metadata cache was last cleared

void pgstrom.arrow_fdw_import_file(text, text, text = null): This function tries to import Apache Arrow file, and defines a new foreign table. Its first argument is name of the new foreign table, the second argument is path of the Apache Arrow file, and the optional third argument is the schema name.; This function is similar to IMPORT FOREIGN SCHEMA statement, but allows to import Apache Arrow files that have wider fields than the limitation of number of columns in PostgreSQL (MaxTupleAttributeNumber = 1664). So, we recommend to use IMPORT FOREIGN SCHEMA statement for most cases.; The example below shows the steps to import an Apache Arrow file with 2000 of Int16 fields by the pgstrom.arrow_fdw_import_file. The result of \d mytest shows this foreign table has 2000 fields.; Due to the internal data format of PostgreSQL, it is not possible to read all the columns at once, but possible to read a part of columns like the last example.

=# select pgstrom.arrow_fdw_import_file('mytest', '/tmp/wide2000.arrow');
 arrow_fdw_import_file
-----------------------

(1 row)

=# \d
            List of relations
 Schema |  Name  |     Type      | Owner
--------+--------+---------------+--------
 public | mytest | foreign table | kaigai
(1 row)


=# \d mytest
                    Foreign table "public.mytest"
  Column   |   Type   | Collation | Nullable | Default | FDW options
-----------+----------+-----------+----------+---------+-------------
 object_id | integer  |           |          |         |
 c000      | smallint |           |          |         |
 c001      | smallint |           |          |         |
 c002      | smallint |           |          |         |
 c003      | smallint |           |          |         |
   :             :          :          :          :            :
 c1997     | smallint |           |          |         |
 c1998     | smallint |           |          |         |
 c1999     | smallint |           |          |         |
Server: arrow_fdw
FDW options: (file '/tmp/wide2000.arrow')

=# select * from mytest ;
ERROR:  target lists can have at most 1664 entries

=# select c0010,c1234,c1999 from mytest limit 3;
 c0010 | c1234 | c1999
-------+-------+-------
   232 |   232 |   232
   537 |   537 |   537
   219 |   219 |   219
(3 rows)

GPU Cache

pgstrom.gpucache_info System View: It shows the current status of GPU Cache.
Below is schema definition of the view.

name	type	description
`database_oid`	`oid`	Database OID where the table with GPU Cache belongs to.
`database_name`	`text`	Database name where the table with GPU Cache belongs to.
`table_oid`	`oid`	Table OID that has GPU Cache. Note that it may not be in the current database.
`table_name`	`text`	Table name that has GPU Cache. Note that it may not be in the current database.
`signature`	`int8`	An identifier hash value of GPU Cache. It may be changed after `ALTER TABLE` for example.
`phase`	`text`	Phase of GPU cache construction: either `not_built`, `is_empty`, `is_loading`, `is_ready`, or `corrupted`
`rowid_num_used`	`int8`	Number of allocated row-id
`rowid_num_free`	`int8`	Number of free row-id
`gpu_main_sz`	`int8`	Size of the fixed-length values area on the GPU Cache.
`gpu_main_nitems`	`int8`	Number of tuples on the GPU Cache.
`gpu_extra_sz`	`int8`	Size of the variable-length values area on the GPU Cache.
`gpu_extra_usage`	`int8`	Size of the used variable-length values area on the GPU Cache.
`gpu_extra_dead`	`int8`	Size of the free variable-length values area on the GPU Cache.
`redo_write_ts`	`timestamptz`	Last update timestamp on the REDO Log buffer
`redo_write_nitems`	`int8`	Total number of REDO Log entries written to the REDO Log buffer.
`redo_write_pos`	`int8`	Total bytes of REDO Log entries written to the REDO Log buffer.
`redo_read_nitems`	`int8`	Total number of REDO Log entries read from REDO Log buffer, and already applied to.
`redo_read_pos`	`int8`	Total bytes of REDO Log entries read from REDO Log buffer, and already applied to.
`redo_sync_pos`	`int8`	The latest position on the REDO Log buffer, where it is already required the background worker to synchronize onto the GPU Cache. When free space of REDO Log buffer becomes tight, it is internally used to avoid flood of simultaneous asynchronized requests by many sessions.
`config_options`	`text`	Options string of the GPU Cache

Below is an example of pgstrom.gpucache_info system view.

=# select * from pgstrom.gpucache_info ;
 database_oid | database_name | table_oid |    table_name    | signature  |  phase   | rowid_num_used | rowid_num_free | gpu_main_sz | gpu_main_nitems | gpu_extra_sz | gpu_extra_usage | gpu_extra_dead |         redo_write_ts         | redo_write_nitems | redo_write_pos | redo_read_nitems | redo_read_pos |
redo_sync_pos |                                                   config_options
--------------+---------------+-----------+------------------+------------+----------+----------------+----------------+-------------+-----------------+--------------+-----------------+----------------+-------------------------------+-------------------+----------------+------------------+---------------+---------------+---------------------------------------------------------------------------------------------------------------------
       193450 | hoge          |    603029 | cache_test_table | 4529357070 | is_ready |           4000 |           6000 |      439904 |            4000 |      3200024 |          473848 |              0 | 2023-12-18 01:25:42.850193+09 |              4000 |         603368 |             4000 |        603368 |       603368 | gpu_device_id=0,max_num_rows=10000,redo_buffer_size=157286400,gpu_sync_interval=4000000,gpu_sync_threshold=10485760
(1 row)

trigger pgstrom.gpucache_sync_trigger(): A trigger function to synchronize GPU Cache on table updates. See GPU Cache chapter for more details.
bigint pgstrom.gpucache_apply_redo(regclass): If the given table has GPU Cache configured, it forcibly applies the REDO log entries onto the GPU Cache.
bigint pgstrom.gpucache_compaction(regclass): If the given table has GPU Cache configured, it forcibly run compaction of the variable-length data buffer.
bigint pgstrom.gpucache_recovery(regclass): It tries to recover the corrupted GPU cache.

Test Data Generator

void pgstrom.random_setseed(int): It initializes the random seed.
bigint pgstrom.random_int(float=0.0, bigint=0, bigint=INT_MAX): It generates random data in bigint type within the range.
float pgstrom.random_float(float=0.0, float=0.0, float=1.0): It generates random data in float type within the range.
date pgstrom.random_date(float=0.0, date='2015-01-01', date='2025-12-31'): It generates random data in date type within the range.
time pgstrom.random_time(float=0.0, time='00:00:00', time='23:59:59'): It generates random data in time type within the range.
timetz pgstrom.random_timetz(float=0.0, time='00:00:00', time='23:59:59'): It generates random data in timetz type within the range.
timestamp pgstrom.random_timestamp(float=0.0, timestamp='2015-01-01', timestamp='2025-01-01'): It generates random data in timestamp type within the range.
macaddr pgstrom.random_macaddr(float=0.0, macaddr='ab:cd:00:00:00', macaddr='ab:cd:ff:ff:ff:ff'): It generates random data in macaddr type within the range.
inet pgstrom.random_inet(float=0.0, inet='192.168.0.1/16'): It generates random data in inet type within the range.
text pgstrom.random_text(float=0.0, text='test_**'): It generates random data in text type. The '*' characters in 2nd argument shall be replaced randomly.
text pgstrom.random_text_len(float=0.0, int=10): It generates random data in text type within the specified length.
int4range pgstrom.random_int4range(float=0.0, bigint=0, bigint=INT_MAX): It generates random data in int4range type within the range.}
int8range pgstrom.random_int8range(float=0.0, bigint=0, bigint=LONG_MAX): It generates random data in int8range type within the range.
tsrange pgstrom.random_tsrange(float=0.0, timestamp='2015-01-01', timestamp='2025-01-01'): It generates random data in tsrange type within the range.
tstzrange pgstrom.random_tstzrange(float=0.0, timestamptz='2015-01-01', timestamptz='2025-01-01'): It generates random data in tstzrange type within the range.
daterange pgstrom.random_daterange(float=0.0, date='2015-01-01', date='2025-12-31'): It generates random data in daterange type within the range.

Other Functions

text pgstrom.githash(): It displays the hash value of the source code revision from the currently loaded PG-Strom module is based. This value is useful in determining the software revision in the event of a failure.

postgres=# select pgstrom.githash();
                 githash
------------------------------------------
 103984be24cafd1e7ce6330a050960d97675c196

text pgstrom.license_query(): It displays the active commercial subscription, if loaded.

=# select pgstrom.license_query();
                                                                               license_query
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 { "version" : 2, "serial_nr" : "HDB-TRIAL", "issued_at" : "2020-11-24", "expired_at" : "2025-12-31", "gpus" : [ { "uuid" : "GPU-8ba149db-53d8-c5f3-0f55-97ce8cfadb28" } ]}
(1 row)