SQL Objects
This chapter introduces SQL objects additionally provided by PG-Strom.
System Information
pgstrom.device_info
System View-
It shows properties of GPU devices installed for PG-Strom.
-
Below is schema definition of the view.
name | type | description |
---|---|---|
gpu_id | int |
GPU device number |
att_name | text |
Attribute name |
att_value | text |
Attribute value |
att_desc | text |
Attribute description |
There are various kind of GPU device properties, but depending on the CUDA driver version where system is running.
So, pgstrom.device_info
system view identifies the target property by GPU device identifier (gpu_id
) and attribute name (att_name
).
Below is an example of pgstrom.device_info
system view.
postgres=# select * from pgstrom.gpu_device_info limit 10;
gpu_id | att_name | att_value | att_desc
--------+-----------------------+------------------------------------------+-------------------------------------
0 | DEV_NAME | NVIDIA A100-PCIE-40GB | GPU Device Name
0 | DEV_ID | 0 | GPU Device ID
0 | DEV_UUID | GPU-13943bfd-5b30-38f5-0473-78979c134606 | GPU Device UUID
0 | DEV_TOTAL_MEMSZ | 39.39GB | GPU Total RAM Size
0 | DEV_BAR1_MEMSZ | 64.00GB | GPU PCI Bar1 Size
0 | NUMA_NODE_ID | -1 | GPU NUMA Node Id
0 | MAX_THREADS_PER_BLOCK | 1024 | Maximum number of threads per block
0 | MAX_BLOCK_DIM_X | 1024 | Maximum block dimension X
0 | MAX_BLOCK_DIM_Y | 1024 | Maximum block dimension Y
0 | MAX_BLOCK_DIM_Z | 64 | Maximum block dimension Z
(10 rows)
Arrow_Fdw
fdw_handler pgstrom.arrow_fdw_handler()
- FDW handler function of Arrow_Fdw. Usually, users don't need to invoke this function.
void pgstrom.arrow_fdw_validator(text[], oid)
- FDW options validation function of Arrow_Fdw. Usually, users don't need to invoke this function.
event_trigger pgstrom.arrow_fdw_precheck_schema()
- Event trigger function to validate schema definition of Arrow files. Usually, users don't need to invoke this function.
text pgstrom.arrow_fdw_check_pattern(text, text)
- Checks whether the file name given in the first argument matches the pattern given in the second argument. This is useful for checking how wildcards work when using the
pattern
option of Arrow_Fdw.
=# select pgstrom.arrow_fdw_check_pattern('data_202408_tokyo.data', 'data__${region}.data');
arrow_fdw_check_pattern
-------------------------------------------
true {@ymd]=[202408], $[region]=[tokyo]}
(1 row)
_${region}.data'のうち、ワイルドカード``に相当する部分が
202408にマッチし、
${region}に相当する部分が
tokyo`にマッチしています。
}
In the above example, for the file name data_202408_tokyo.data
, the part of the pattern 'data_@ymd_${region}.data'
that corresponds to the wildcard ` matches
202408, and the part that corresponds to
${region}matches
tokyo`.
}
record pgstrom.arrow_fdw_metadata_info(regclass)
- Displays the metadata (CustomMetadata) embedded in the supplied Arrow file.
postgres=# select relid, filename, field, key, substring(value, 0, 64) from pgstrom.arrow_fdw_metadata_info('f_lineorder');
relid | filename | field | key | substring
-------------+-------------------------------------+--------------+-------------+-----------------------------------------------------------------
f_lineorder | /opt/arrow/f_lineorder_sorted.arrow | | sql_command | select * from v_lineorder order by lo_orderdate
f_lineorder | /opt/arrow/f_lineorder_sorted.arrow | lo_orderdate | min_values | 19920101,19920102,19920103,19920104,19920105,19920107,19920108,
f_lineorder | /opt/arrow/f_lineorder_sorted.arrow | lo_orderdate | max_values | 19920102,19920103,19920104,19920105,19920107,19920108,19920109,
(3 rows)
In the above example, the metadata (CustomMetadata) embedded in the f_lineorder
foreign table managed by Arrow_Fdw is output in KEY=VALUE form.
Metadata embedded in the schema has the field
column set to NULL. Otherwise, the column name is displayed. In this case, you can see that the sql_command
metadata is embedded in the schema, and the min_values
and max_values
are embedded in the lo_orderdate
column.
json pgstrom.arrow_fdw_metadata_stats()
- Displays the statistics of the Arrow_Fdw metadata-cache
Arrow_Fdw somehow remembers the metadata (schema definition, data placement, etc.) of an Arrow file that has been referenced once in shared memory, and can improve responsiveness when referencing an Arrow_Fdw foreign table from the next time onwards.
This function can display statistical information about the metadata cache in shared memory.
If the metadata cache is insufficient, the reference responsiveness of Arrow_Fdw foreign tables may decrease. In that case, it is necessary to enlarge the metadata cache by increasing arrow_fdw.metadata_cache_size
.
s=# select pgstrom.arrow_fdw_metadata_stats();
arrow_fdw_metadata_stats
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{ "total_cache_sz" : 536870912, "total_cache_allocated" : 9568256, "total_cache_usage" : 9354072, "num_file_entries" : 2, "cache_usage_efficiency" : 0.977615, "cache_num_blocks" : 4096, "cache_active_blocks" : 73, "cache_free_blocks" : 4023, "cache_used_ratio" : 0.017822 "last_allocate_timestamp" : "2025-03-29 13:24:44.447" }
(1 row)
As in the example above, the statistics are displayed in JSON format, with the meaning of each field as follows:
- total_cache_sz
- The total size of the metadata cache (in bytes)
- total_cache_allocated
- Allocated metadata cache size (in bytes)
- total_cache_usage
- Used metadata cache size (in bytes)
- num_file_entries
- Number of Arrow files that are cached
- cache_usage_efficiency
- The ratio of used metadata cache to the allocated metadata cache
- cache_num_blocks
- Number of blocks in the metadata cache (in 128kB units)
- cache_active_blocks
- Number of allocated metadata cache blocks
- cache_free_blocks
- Number of free metadata cache blocks
- cache_used_ratio
- Percentage of metadata cache blocks allocated
- last_allocate_timestamp
- The timestamp of the last metadata cache allocation
- last_reclaim_timestamp
- The timestamp when the metadata cache was last cleared
void pgstrom.arrow_fdw_import_file(text, text, text = null)
-
This function tries to import Apache Arrow file, and defines a new foreign table. Its first argument is name of the new foreign table, the second argument is path of the Apache Arrow file, and the optional third argument is the schema name.
-
This function is similar to
IMPORT FOREIGN SCHEMA
statement, but allows to import Apache Arrow files that have wider fields than the limitation of number of columns in PostgreSQL (MaxTupleAttributeNumber
= 1664). So, we recommend to useIMPORT FOREIGN SCHEMA
statement for most cases. -
The example below shows the steps to import an Apache Arrow file with 2000 of Int16 fields by the
pgstrom.arrow_fdw_import_file
. The result of\d mytest
shows this foreign table has 2000 fields. -
Due to the internal data format of PostgreSQL, it is not possible to read all the columns at once, but possible to read a part of columns like the last example.
=# select pgstrom.arrow_fdw_import_file('mytest', '/tmp/wide2000.arrow');
arrow_fdw_import_file
-----------------------
(1 row)
=# \d
List of relations
Schema | Name | Type | Owner
--------+--------+---------------+--------
public | mytest | foreign table | kaigai
(1 row)
=# \d mytest
Foreign table "public.mytest"
Column | Type | Collation | Nullable | Default | FDW options
-----------+----------+-----------+----------+---------+-------------
object_id | integer | | | |
c000 | smallint | | | |
c001 | smallint | | | |
c002 | smallint | | | |
c003 | smallint | | | |
: : : : : :
c1997 | smallint | | | |
c1998 | smallint | | | |
c1999 | smallint | | | |
Server: arrow_fdw
FDW options: (file '/tmp/wide2000.arrow')
=# select * from mytest ;
ERROR: target lists can have at most 1664 entries
=# select c0010,c1234,c1999 from mytest limit 3;
c0010 | c1234 | c1999
-------+-------+-------
232 | 232 | 232
537 | 537 | 537
219 | 219 | 219
(3 rows)
GPU Cache
pgstrom.gpucache_info
System View- It shows the current status of GPU Cache.
Below is schema definition of the view.
name | type | description |
---|---|---|
database_oid |
oid |
Database OID where the table with GPU Cache belongs to. |
database_name |
text |
Database name where the table with GPU Cache belongs to. |
table_oid |
oid |
Table OID that has GPU Cache. Note that it may not be in the current database. |
table_name |
text |
Table name that has GPU Cache. Note that it may not be in the current database. |
signature |
int8 |
An identifier hash value of GPU Cache. It may be changed after ALTER TABLE for example. |
phase |
text |
Phase of GPU cache construction: either not_built , is_empty , is_loading , is_ready , or corrupted |
rowid_num_used |
int8 |
Number of allocated row-id |
rowid_num_free |
int8 |
Number of free row-id |
gpu_main_sz |
int8 |
Size of the fixed-length values area on the GPU Cache. |
gpu_main_nitems |
int8 |
Number of tuples on the GPU Cache. |
gpu_extra_sz |
int8 |
Size of the variable-length values area on the GPU Cache. |
gpu_extra_usage |
int8 |
Size of the used variable-length values area on the GPU Cache. |
gpu_extra_dead |
int8 |
Size of the free variable-length values area on the GPU Cache. |
redo_write_ts |
timestamptz |
Last update timestamp on the REDO Log buffer |
redo_write_nitems |
int8 |
Total number of REDO Log entries written to the REDO Log buffer. |
redo_write_pos |
int8 |
Total bytes of REDO Log entries written to the REDO Log buffer. |
redo_read_nitems |
int8 |
Total number of REDO Log entries read from REDO Log buffer, and already applied to. |
redo_read_pos |
int8 |
Total bytes of REDO Log entries read from REDO Log buffer, and already applied to. |
redo_sync_pos |
int8 |
The latest position on the REDO Log buffer, where it is already required the background worker to synchronize onto the GPU Cache. When free space of REDO Log buffer becomes tight, it is internally used to avoid flood of simultaneous asynchronized requests by many sessions. |
config_options |
text |
Options string of the GPU Cache |
Below is an example of pgstrom.gpucache_info
system view.
=# select * from pgstrom.gpucache_info ;
database_oid | database_name | table_oid | table_name | signature | phase | rowid_num_used | rowid_num_free | gpu_main_sz | gpu_main_nitems | gpu_extra_sz | gpu_extra_usage | gpu_extra_dead | redo_write_ts | redo_write_nitems | redo_write_pos | redo_read_nitems | redo_read_pos |
redo_sync_pos | config_options
--------------+---------------+-----------+------------------+------------+----------+----------------+----------------+-------------+-----------------+--------------+-----------------+----------------+-------------------------------+-------------------+----------------+------------------+---------------+---------------+---------------------------------------------------------------------------------------------------------------------
193450 | hoge | 603029 | cache_test_table | 4529357070 | is_ready | 4000 | 6000 | 439904 | 4000 | 3200024 | 473848 | 0 | 2023-12-18 01:25:42.850193+09 | 4000 | 603368 | 4000 | 603368 | 603368 | gpu_device_id=0,max_num_rows=10000,redo_buffer_size=157286400,gpu_sync_interval=4000000,gpu_sync_threshold=10485760
(1 row)
trigger pgstrom.gpucache_sync_trigger()
- A trigger function to synchronize GPU Cache on table updates. See GPU Cache chapter for more details.
bigint pgstrom.gpucache_apply_redo(regclass)
- If the given table has GPU Cache configured, it forcibly applies the REDO log entries onto the GPU Cache.
bigint pgstrom.gpucache_compaction(regclass)
- If the given table has GPU Cache configured, it forcibly run compaction of the variable-length data buffer.
bigint pgstrom.gpucache_recovery(regclass)
- It tries to recover the corrupted GPU cache.
Test Data Generator
void pgstrom.random_setseed(int)
- It initializes the random seed.
bigint pgstrom.random_int(float=0.0, bigint=0, bigint=INT_MAX)
- It generates random data in
bigint
type within the range. float pgstrom.random_float(float=0.0, float=0.0, float=1.0)
- It generates random data in
float
type within the range. date pgstrom.random_date(float=0.0, date='2015-01-01', date='2025-12-31')
- It generates random data in
date
type within the range. time pgstrom.random_time(float=0.0, time='00:00:00', time='23:59:59')
- It generates random data in
time
type within the range. timetz pgstrom.random_timetz(float=0.0, time='00:00:00', time='23:59:59')
- It generates random data in
timetz
type within the range. timestamp pgstrom.random_timestamp(float=0.0, timestamp='2015-01-01', timestamp='2025-01-01')
- It generates random data in
timestamp
type within the range. macaddr pgstrom.random_macaddr(float=0.0, macaddr='ab:cd:00:00:00', macaddr='ab:cd:ff:ff:ff:ff')
- It generates random data in
macaddr
type within the range. inet pgstrom.random_inet(float=0.0, inet='192.168.0.1/16')
- It generates random data in
inet
type within the range. text pgstrom.random_text(float=0.0, text='test_**')
- It generates random data in
text
type. The '*' characters in 2nd argument shall be replaced randomly. text pgstrom.random_text_len(float=0.0, int=10)
- It generates random data in
text
type within the specified length. int4range pgstrom.random_int4range(float=0.0, bigint=0, bigint=INT_MAX)
- It generates random data in
int4range
type within the range.} int8range pgstrom.random_int8range(float=0.0, bigint=0, bigint=LONG_MAX)
- It generates random data in
int8range
type within the range. tsrange pgstrom.random_tsrange(float=0.0, timestamp='2015-01-01', timestamp='2025-01-01')
- It generates random data in
tsrange
type within the range. tstzrange pgstrom.random_tstzrange(float=0.0, timestamptz='2015-01-01', timestamptz='2025-01-01')
- It generates random data in
tstzrange
type within the range. daterange pgstrom.random_daterange(float=0.0, date='2015-01-01', date='2025-12-31')
- It generates random data in
daterange
type within the range.
Other Functions
text pgstrom.githash()
- It displays the hash value of the source code revision from the currently loaded PG-Strom module is based. This value is useful in determining the software revision in the event of a failure.
postgres=# select pgstrom.githash();
githash
------------------------------------------
103984be24cafd1e7ce6330a050960d97675c196
text pgstrom.license_query()
- It displays the active commercial subscription, if loaded.
=# select pgstrom.license_query();
license_query
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{ "version" : 2, "serial_nr" : "HDB-TRIAL", "issued_at" : "2020-11-24", "expired_at" : "2025-12-31", "gpus" : [ { "uuid" : "GPU-8ba149db-53d8-c5f3-0f55-97ce8cfadb28" } ]}
(1 row)