Data Types
PG-Strom supports the following data types for use on GPU devices.
Numeric types
int1
[length: 1 byte] - 8-bit integer data type; an extended data type provided by PG-Strom
int2
(alias smallint) [length: 2 bytes] - 16-bit integer data type
int4
(alias int) [length: 4 bytes] - 32-bit integer data type
int8
(alias bigint) [length: 8 bytes] - 64-bit integer data type
float2
[length: 2 bytes] - Half-precision floating-point data type; an extended data type provided by PG-Strom
Note
Although GPUs support half-precision floating-point numbers in hardware, CPUs (x86_64 processors) do not support them yet. When the CPU processes float2 values, it therefore converts them to float or double for calculation. Unlike the GPU, the CPU gains no calculation-performance advantage from float2. On the CPU side, the type is instead a means of saving storage and memory capacity in machine-learning and statistical-analytics workloads.
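The storage saving described in the note can be illustrated with a minimal Python sketch. It uses the standard `struct` module's IEEE 754 half-precision format (`"e"`) as a stand-in for float2. The helper names `packed_sizes` and `roundtrip_half` are hypothetical and are not part of PG-Strom.

```python
import struct

def packed_sizes(value: float) -> dict:
    """Return the number of bytes needed to store `value` at each precision."""
    return {
        "float2": len(struct.pack("<e", value)),  # IEEE 754 half precision
        "float4": len(struct.pack("<f", value)),  # single precision
        "float8": len(struct.pack("<d", value)),  # double precision
    }

def roundtrip_half(value: float) -> float:
    """Store `value` as half precision, then widen it back to a Python float
    (a double), much as a CPU widens float2 before calculating."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

print(packed_sizes(0.1))    # {'float2': 2, 'float4': 4, 'float8': 8}
print(roundtrip_half(0.1))  # close to 0.1, but not exactly equal
```

Half precision halves the footprint of float4, at the cost of keeping only about three significant decimal digits.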
float4
(alias real) [length: 4 bytes] - Single-precision floating-point data type
float8
(alias double precision) [length: 8 bytes] - Double-precision floating-point data type
numeric
[length: variable] - Real number data type; handled as a 128-bit fixed-point value on the GPU
Note
When the GPU processes numeric values, they are converted to an internal 128-bit fixed-point format for implementation reasons. (This layout is identical to the Decimal type in Apache Arrow.)
The conversion to and from the internal format is transparent. However, PG-Strom cannot convert a numeric datum with a large number of digits, so it attempts to fall back to CPU processing for such values. Supplying numeric data with a large number of digits to the GPU may therefore slow down execution.
To avoid this problem, turn off the GUC option pg_strom.enable_numeric_type so that expressions involving numeric data are not run on GPU devices.
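The digit limit in the note follows from the fixed width of the internal format. Here is a minimal Python sketch of the idea, assuming the value is stored as a signed 128-bit unscaled integer plus a decimal scale. The function `to_fixed128` is hypothetical and does not reproduce PG-Strom's actual conversion code.

```python
from decimal import Decimal
from typing import Optional, Tuple

INT128_MIN = -(2 ** 127)
INT128_MAX = 2 ** 127 - 1

def to_fixed128(value: Decimal) -> Optional[Tuple[int, int]]:
    """Return (unscaled_integer, scale) with value == unscaled * 10**-scale
    if the unscaled integer fits in a signed 128-bit range; otherwise return
    None to signal that a CPU fallback would be needed."""
    if not value.is_finite():
        return None  # NaN and Infinity have no fixed-point form
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(map(str, digits)))
    if exponent > 0:
        # Fold a positive exponent into the integer so the scale is never negative.
        unscaled *= 10 ** exponent
        scale = 0
    else:
        scale = -exponent
    if sign:
        unscaled = -unscaled
    if INT128_MIN <= unscaled <= INT128_MAX:
        return unscaled, scale
    return None

print(to_fixed128(Decimal("123.45")))  # (12345, 2)
print(to_fixed128(Decimal("1e40")))    # None: too many digits for 128 bits
```

Values that return `None` here correspond to the numeric data with a large number of digits that the note describes as falling back to the CPU.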
Date and time types
date
[length: 4 bytes] - Date data type
time
(alias time without time zone) [length: 8 bytes] - Time data type
timetz
(alias time with time zone) [length: 12 bytes] - Time with time zone data type
timestamp
(alias timestamp without time zone) [length: 8 bytes] - Timestamp data type
timestamptz
(alias timestamp with time zone) [length: 8 bytes] - Timestamp with time zone data type
interval
[length: 16 bytes] - Interval data type
Variable length types
bpchar
[length: variable] - Variable-length text type with whitespace padding
varchar
[length: variable] - Variable-length text type
text
[length: variable] - Variable-length text type
bytea
[length: variable] - Variable-length binary type
Unstructured data types
jsonb
[length: variable] - JSON data type with binary indexed keys
Note
Pay attention to the following two points when the GPU processes jsonb data.
jsonb is not a performance-efficient data type, because unreferenced attributes must also be loaded from storage onto the GPU, so I/O bandwidth tends to be consumed by unneeded data.
When the length of a jsonb value exceeds the TOAST threshold, the entire value is written out to the TOAST table. The GPU cannot process such values, and an inefficient CPU fallback is invoked.
For the second problem, you can raise the table's storage option toast_tuple_target to enlarge the TOAST threshold.
Miscellaneous types
boolean
[length: 1 byte] - Boolean data type
money
[length: 8 bytes] - Money data type
uuid
[length: 16 bytes] - UUID data type
macaddr
[length: 6 bytes] - Network MAC address data type
inet
[length: 7 or 19 bytes] - Network address data type
cidr
[length: 7 or 19 bytes] - Network address data type
cube
[length: variable] - Extra data type provided by contrib/cube
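The two possible lengths of inet and cidr listed above correspond to IPv4 and IPv6 addresses. The Python sketch below reproduces the figures, under the assumption that each value carries three bytes of overhead (a one-byte header, the address family, and the netmask length) in front of the raw address. That breakdown and the helper `inet_storage_bytes` are illustrative assumptions, not taken from PG-Strom.

```python
import ipaddress

# Assumed per-value overhead: 1-byte header + 1-byte address family
# + 1-byte netmask length (an illustrative assumption).
OVERHEAD_BYTES = 3

def inet_storage_bytes(address: str) -> int:
    """Estimate the stored length of an inet/cidr value for `address`."""
    packed = ipaddress.ip_address(address).packed  # 4 bytes (IPv4) or 16 bytes (IPv6)
    return OVERHEAD_BYTES + len(packed)

print(inet_storage_bytes("192.168.0.1"))  # 7
print(inet_storage_bytes("2001:db8::1"))  # 19
```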
Geometry data types
geometry
[length: variable] - Geometry object of PostGIS
box2df
[length: 16 bytes] - 2-dimensional bounding box (used for GiST indexes)