This chapter introduces how to implement GPU-executable native programs as SQL functions, using the PL/CUDA procedural language.

PL/CUDA Overview

PG-Strom internally constructs GPU programs in the CUDA language according to the supplied SQL, then generates the GPU's native binary using just-in-time compilation. CUDA is a programming environment provided by NVIDIA that allows implementing parallel programs executable on GPU devices, using C-like statements. This transformation from SQL statement to CUDA program is an internal process, so users don't need to pay attention to what GPU programs are generated and executed.

On the other hand, PostgreSQL supports adding programming languages to implement SQL functions, via the CREATE LANGUAGE statement. PL/CUDA is a language handler that supports the CREATE LANGUAGE command. It allows users to run not only GPU programs automatically generated by PG-Strom from SQL, but also arbitrary GPU programs implemented manually as SQL functions.

Its arguments can take the data types supported by PG-Strom, like numeric, text, or the array-matrix data type. These arguments are implicitly loaded onto GPU device memory by the PL/CUDA infrastructure, so users don't need to pay attention to data loading between the database and GPU devices. In a similar fashion, the return value of a PL/CUDA function (including variable-length data types) is written back from GPU to CPU, then decoded into the result of the SQL function.

You can also use foreign tables defined with gstore_fdw as arguments of a PL/CUDA function. In this case, there is no need to load the data onto the GPU for each invocation, because the foreign table already keeps the data on the device, and it allows using data larger than 1GB, which is the restriction on variable-length data in PostgreSQL.
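A minimal sketch of this pattern, assuming the setup described in the gstore_fdw chapter (the table name, column layout, the pinning option value and my_plcuda_func are hypothetical; reggstore is the type used to pass a gstore_fdw table to a PL/CUDA function):

CREATE FOREIGN TABLE ft (
    id   int,
    x    real,
    y    real
) SERVER gstore_fdw OPTIONS (pinning '0');

-- the foreign table already keeps its contents on GPU device memory,
-- so each invocation skips the per-call data loading step
SELECT my_plcuda_func('ft'::reggstore);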

Therefore, users can focus on productive tasks like implementation of statistical analysis or code optimization, without routine chores like data input/output between GPU and database.

Once a PL/CUDA function is declared with the CREATE FUNCTION statement, PL/CUDA generates, at execution time, a CUDA program that embeds the definition of this function into a GPU kernel function. The generated program contains initialization code to set up references to the PL/CUDA function's arguments and auxiliary code to return run-time errors to the CPU side. It can also include run-time functions that support execution of PG-Strom.

There is no special memory protection mechanism for the native CUDA program made from a PL/CUDA function, so execution of a buggy PL/CUDA function can crash the GPU execution environment or the PostgreSQL infrastructure in some cases. For that reason, only database superusers can define PL/CUDA functions.

Below is an example of a simple PL/CUDA function. This function takes two int arguments, and returns their sum as int.

postgres=# CREATE FUNCTION gpu_add(int, int)
RETURNS int
AS $$
#plcuda_include "cuda_mathlib.h"
#plcuda_begin
  if (get_global_id() == 0)
    *retval = pgfn_int4pl(kcxt, arg1, arg2);
#plcuda_end
$$ LANGUAGE plcuda;
CREATE FUNCTION

The code block enclosed by #plcuda_begin and #plcuda_end is the main portion of the PL/CUDA function. This kernel function can reference the int arguments as arg1 and arg2, which are pg_int4_t variables, and returns the value written to the region pointed to by the retval variable, of pg_int4_t * type, as the result of the PL/CUDA function. pgfn_int4pl() is a run-time function of PG-Strom, declared in cuda_mathlib.h, which adds two pg_int4_t values.

Below is an example execution of this PL/CUDA function. Its two integer arguments (100 and 200) were sent to the GPU device, and the calculated result (300) was written back from the GPU device. Like normal SQL functions, a PL/CUDA function can be used as part of a SQL expression.

postgres=# SELECT gpu_add(100,200);
 gpu_add
---------
     300
(1 row)

The pgstrom.plcuda_function_source function shows the source of the kernel function generated from a PL/CUDA function. The code block enclosed by the comment /* ---- code by pl/cuda function ---- */ is the portion injected from the declaration of the PL/CUDA function.

postgres=# SELECT pgstrom.plcuda_function_source('gpu_add'::regproc);
                     plcuda_function_source
----------------------------------------------------------------
 #include <cuda_device_runtime_api.h>                          +
                                                               +
 #define HOSTPTRLEN 8                                          +
 #define DEVICEPTRLEN 8                                        +
 #define BLCKSZ 8192                                           +
 #define MAXIMUM_ALIGNOF 8                                     +
 #define MAXIMUM_ALIGNOF_SHIFT 3                               +
 #define PGSTROM_KERNEL_DEBUG 1                                +
 #include "cuda_common.h"                                      +
                                                               +
 #define PG_BOOLOID 16                                         +
 #define PG_INT2OID 21                                         +
 #define PG_INT4OID 23                                         +
 #define PG_INT8OID 20                                         +
 #define PG_FLOAT2OID 237809                                   +
 #define PG_FLOAT4OID 700                                      +
 #define PG_FLOAT8OID 701                                      +
 #define PG_CASHOID 790                                        +
 #define PG_UUIDOID 2950                                       +
 #define PG_MACADDROID 829                                     +
 #define PG_INETOID 869                                        +
 #define PG_CIDROID 650                                        +
 #define PG_DATEOID 1082                                       +
 #define PG_TIMEOID 1083                                       +
 #define PG_TIMETZOID 1266                                     +
 #define PG_TIMESTAMPOID 1114                                  +
 #define PG_TIMESTAMPTZOID 1184                                +
 #define PG_INTERVALOID 1186                                   +
 #define PG_BPCHAROID 1042                                     +
 #define PG_VARCHAROID 1043                                    +
 #define PG_NUMERICOID 1700                                    +
 #define PG_BYTEAOID 17                                        +
 #define PG_TEXTOID 25                                         +
 #define PG_INT4RANGEOID 3904                                  +
 #define PG_INT8RANGEOID 3926                                  +
 #define PG_TSRANGEOID 3908                                    +
 #define PG_TSTZRANGEOID 3910                                  +
 #define PG_DATERANGEOID 3912                                  +
                                                               +
 #include "cuda_mathlib.h"                                     +
 typedef union {                                               +
     pg_varlena_t     varlena_v;                               +
     pg_bool_t        bool_v;                                  +
     pg_int2_t        int2_v;                                  +
     pg_int4_t        int4_v;                                  +
     pg_int8_t        int8_v;                                  +
     pg_float2_t      float2_v;                                +
     pg_float4_t      float4_v;                                +
     pg_float8_t      float8_v;                                +
 #ifdef CUDA_NUMERIC_H                                         +
     pg_numeric_t     numeric_v;                               +
 #endif                                                        +
 #ifdef CUDA_MISC_H                                            +
     pg_money_t       money_v;                                 +
     pg_uuid_t        uuid_v;                                  +
     pg_macaddr_t     macaddr_v;                               +
     pg_inet_t        inet_v;                                  +
     pg_cidr_t        cidr_t;                                  +
 #endif                                                        +
 #ifdef CUDA_TIMELIB_H                                         +
     pg_date_t        date_v;                                  +
     pg_time_t        time_v;                                  +
     pg_timestamp_t   timestamp_v;                             +
     pg_timestamptz_t timestamptz_v;                           +
 #endif                                                        +
 #ifdef CUDA_TEXTLIB_H                                         +
     pg_bpchar_t      bpchar_v;                                +
     pg_text_t        text_v;                                  +
     pg_varchar_t     varchar_v;                               +
 #endif                                                        +
 #ifdef CUDA_RANGETYPE_H                                       +
     pg_int4range_t   int4range_v;                             +
     pg_int8range_t   int8range_v;                             +
 #ifdef CUDA_TIMELIB_H                                         +
     pg_tsrange_t     tsrange_v;                               +
     pg_tstzrange_t   tstzrange_v;                             +
     pg_daterange_t   daterange_v;                             +
 #endif                                                        +
 #endif                                                        +
   } pg_anytype_t;                                             +
                                                               +
                                                               +
 #include "cuda_plcuda.h"                                      +
 STATIC_INLINE(void)                                           +
 __plcuda_main_kernel(kern_plcuda *kplcuda,                    +
                    void *workbuf,                             +
                    void *results,                             +
                    kern_context *kcxt)                        +
 {                                                             +
   pg_int4_t *retval __attribute__ ((unused));                 +
   pg_int4_t arg1 __attribute__((unused));                     +
   pg_int4_t arg2 __attribute__((unused));                     +
   assert(sizeof(*retval) <= sizeof(kplcuda->__retval));       +
   retval = (pg_int4_t *)kplcuda->__retval;                    +
   arg1 = pg_int4_param(kcxt,0);                               +
   arg2 = pg_int4_param(kcxt,1);                               +
                                                               +
   /* ---- code by pl/cuda function ---- */                    +
   if (get_global_id() == 0)                                   +
     *retval = pgfn_int4pl(kcxt, arg1, arg2);                  +
   /* ---- code by pl/cuda function ---- */                    +
 }                                                             +
                                                               +
 KERNEL_FUNCTION(void)                                         +
 plcuda_main_kernel_entrypoint(kern_plcuda *kplcuda,           +
             void *workbuf,                                    +
             void *results)                                    +
 {                                                             +
   kern_parambuf *kparams = KERN_PLCUDA_PARAMBUF(kplcuda);     +
   kern_context kcxt;                                          +
                                                               +
   assert(kplcuda->nargs <= kparams->nparams);                 +
   INIT_KERNEL_CONTEXT(&kcxt,plcuda_main_kernel,kparams);      +
   __plcuda_main_kernel(kplcuda, workbuf, results, &kcxt);     +
   kern_writeback_error_status(&kplcuda->kerror_main, &kcxt.e);+
 }                                                             +
                                                               +
                                                               +
 #include "cuda_terminal.h"                                    +

(1 row)

PL/CUDA Structure

A function declaration with PL/CUDA consists of several code blocks separated by directives that begin with #plcuda_.... Only the code block starting with #plcuda_begin is required; the other code blocks can be added on demand.

#plcuda_decl
  [...any declarations...]
#plcuda_prep
  [...function body of prep kernel...]
#plcuda_begin
  [...function body of main kernel...]
#plcuda_post
  [...function body of post kernel...]
#plcuda_end

The declaration block, which begins with #plcuda_decl, can contain declarations of static functions callable from the other code blocks. Unlike the other code blocks, its contents are not injected into a particular kernel function, so you need to declare complete static functions here.

When a kernel function is executed with more parallel threads than the block size on a GPU device, the only way to synchronize across multiple execution units is synchronization at kernel function exit. For example, if an algorithm is implemented under the assumption that the results buffer is correctly initialized, you have to initialize the results buffer first, and must not execute the core of the algorithm until the initialization has completed. If some threads ran against an uninitialized buffer, it would easily lead to incorrect calculation results or a crash of the execution environment, which you always need to avoid.

The contents of the user-defined code blocks (the preparation block beginning with #plcuda_prep, the main block beginning with #plcuda_begin, and the post-process block beginning with #plcuda_post) are injected into the corresponding kernel functions. Implementation of the preparation and post-process blocks is optional, but when these code blocks are defined, PL/CUDA guarantees the launch order: the preparation kernel, then the main kernel, then the post-process kernel. These kernels are intended to initialize the results buffer or working buffer prior to execution of the main kernel, or to summarize the final results after the main kernel. A sketch that uses all three blocks follows.
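A hedged sketch of the three-phase structure (the function and helper names are hypothetical, and the placement of per-block directives right after each block-start directive follows this chapter's description): the preparation kernel zeroes an 8-byte accumulator in the working buffer, the main kernel accumulates into it with one thread per value, and the post-process kernel copies the result into the return value.

CREATE FUNCTION gpu_sum_iota(int)
RETURNS int8
AS $$
#plcuda_working_bufsz 8
#plcuda_prep
#plcuda_num_threads 1
  /* preparation kernel: zero the accumulator in the working buffer */
  *((cl_ulong *)workbuf) = 0;
#plcuda_begin
#plcuda_num_threads gpu_sum_iota_nthreads
  /* main kernel: one thread per value; accumulate 0 + 1 + ... + (arg1 - 1) */
  if (get_global_id() < arg1.value)
      atomicAdd((unsigned long long *)workbuf,
                (unsigned long long)get_global_id());
#plcuda_post
#plcuda_num_threads 1
  /* post-process kernel: copy the accumulated value into the return value */
  retval->isnull = false;
  retval->value  = (cl_long)(*((cl_ulong *)workbuf));
#plcuda_end
$$ LANGUAGE plcuda;

-- helper with the identical argument signature; it tells the runtime
-- how many threads the main kernel needs
CREATE FUNCTION gpu_sum_iota_nthreads(int)
RETURNS bigint
AS 'SELECT GREATEST($1, 1)::bigint'
LANGUAGE sql;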

An invocation of a PL/CUDA function internally calls several SQL functions and launches GPU kernel functions. Prior to the GPU kernel launch, parameters such as the number of threads and the amount of results and working buffer have to be determined. These parameters depend on the arguments, so the PL/CUDA handler determines them with other SQL functions that take the identical argument signature.

Once the parameters to call the GPU kernel function are determined, the PL/CUDA handler loads the arguments of the PL/CUDA function onto the argument buffer on the GPU by DMA copy, on demand.

Then it launches the preparation kernel function (if any), the main kernel function, and the post-process kernel function (if any). Please note that GPU threads cannot be synchronized across the block-size boundary except at the begin/end of a GPU kernel function. This means that if you expect a particular state to exist on the working buffer or results buffer, buffer initialization by the preparation kernel followed by reference to that data structure from the main kernel is required.

Finally, the PL/CUDA handler writes back the contents of the results buffer to the host side. When the PL/CUDA function returns a fixed-length datum, the code block updates the area pointed to by the retval variable, which is initialized prior to execution of the user-defined blocks. When the PL/CUDA function returns a variable-length datum, retval points to an area of pg_varlena_t, and its value, unless NULL, has to be a reference into the results buffer (void *results). Please note that the result shall not be written back if retval points outside the results buffer (see the sketch after the definition below).

typedef struct {
    varlena    *value;      /* reference to the results buffer */
    cl_bool     isnull;     /* true, if NULL */
} pg_varlena_t;
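A hedged sketch of a post-process block returning a variable-length datum, assuming the host-side varlena macros (SET_VARSIZE, VARHDRSZ) are also provided by the CUDA headers; the payload length is hypothetical:

#plcuda_post
  /* the varlena result must be built inside the results buffer; a datum
   * kept in thread-local memory would not be written back */
  if (get_global_id() == 0)
  {
      cl_uint  nbytes = 16;             /* hypothetical payload length */
      varlena *vl = (varlena *)results; /* payload written by the main kernel */
      SET_VARSIZE(vl, VARHDRSZ + nbytes);
      retval->isnull = false;
      retval->value  = vl;              /* a reference into the results buffer */
  }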

PL/CUDA Callflow

The #plcuda_num_threads directive specifies the number of threads used to execute the GPU kernel function. This directive can be used inside a code block, and takes either a constant value or a SQL function name. The SQL function has to be declared to take the identical argument types and return the bigint type.

In a similar fashion, #plcuda_shmem_unitsz specifies the amount of shared memory per thread to be acquired at GPU kernel launch. For example, when a GPU kernel function that consumes 8 bytes per thread is launched with 384 threads per streaming multiprocessor, 3KB of shared memory is available. Please note that the number of threads per streaming multiprocessor is automatically calculated during code optimization; it is a different concept from what is specified with the #plcuda_num_threads directive.

The #plcuda_kernel_maxthreads directive switches the optimization policy of the kernel function for the current code block, from maximization of execution efficiency to maximization of the number of threads per streaming multiprocessor (usually 1024). Increasing the number of threads per streaming multiprocessor improves the performance of workloads that heavily use inter-thread synchronization via shared memory, like reduction operations. On the other hand, it reduces the number of registers per thread, so apply this policy only where it fits.

#plcuda_num_threads (<value>|<function>)
#plcuda_shmem_unitsz (<value>|<function>)
#plcuda_kernel_maxthreads

PL/CUDA References

This section is a reference for PL/CUDA directives and related SQL functions.

PL/CUDA Directives

#plcuda_begin

It marks the beginning of the main kernel code block. This directive is always required. Prior to execution of the code block on the GPU, the arguments of the PL/CUDA function are initialized and can be referenced by the variable names arg1, arg2, and so on. These variables have the same representation PG-Strom uses for SQL data types on the GPU; for example, an argument of the real data type (single-precision floating point) is represented as a pg_float4_t variable, declared as below.

typedef struct {
    cl_float    value;
    cl_bool     isnull;
} pg_float4_t;

These variables are kept in the private area of each thread, so updates to them are not reflected in the kernel function executed at the next step. If you want to share state between kernel functions, the value has to be kept in either the working buffer referenced by the void *workbuf pointer or the results buffer referenced by the void *results pointer.

#plcuda_end

It marks the end of a kernel function code block. Note that if a directive starting a new code block appears inside a different code block, the current code block is implicitly closed, as if by the #plcuda_end directive.

#plcuda_decl

Use of this directive is optional. It marks the beginning of the declaration code block, which contains raw code to be declared prior to the definition of any kernel functions. Unlike the other code blocks, the contents of this code block are not wrapped into a kernel function, so you have to put complete definitions of functions here.
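A minimal sketch of a declaration block (STATIC_INLINE appears in the generated source shown earlier; my_clamp is a hypothetical helper callable from the other code blocks):

#plcuda_decl
STATIC_INLINE(cl_float)
my_clamp(cl_float x)
{
    /* a complete static function; clamps x into the [0,1] range */
    return (x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x));
}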

#plcuda_prep

Use of this directive is optional. It marks the beginning of the preparation code block, which is executed on the GPU prior to the main kernel function that begins with the #plcuda_begin directive. The preparation kernel is expected to initialize the results buffer and working buffer. The main kernel shall not be kicked until completion of the preparation kernel. Arguments of the PL/CUDA function can be referenced just as in the main kernel function.

#plcuda_post

Use of this directive is optional. It marks the beginning of the post-process code block, which is executed on the GPU after the main kernel function that begins with the #plcuda_begin directive. The post-process kernel is expected to set up the final results to be returned to the CPU side. The post-process kernel shall not be kicked until completion of the main kernel. Arguments of the PL/CUDA function can be referenced just as in the main kernel function.

#plcuda_num_threads (<value>|<function>)

Use of this directive is optional. If not specified, the default is the constant value 1. This directive specifies the number of threads used to execute the GPU kernel function when used in the code block of #plcuda_prep, #plcuda_begin, or #plcuda_post. If a constant value is specified, the PL/CUDA runtime kicks that number of GPU threads to run the GPU kernel function. If a SQL function name is specified, the PL/CUDA runtime calls the specified SQL function, and the result of the function is applied as the number of GPU threads. This SQL function takes the identical arguments to the PL/CUDA function, and returns the bigint data type.
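For example, a hedged sketch of a helper that launches one thread per matrix element (the function names are hypothetical; array_matrix_height and array_matrix_width are described in the Array-Matrix Functions section):

-- must have the identical signature to the PL/CUDA function it serves
CREATE FUNCTION gpu_scale_matrix_nthreads(real[], real)
RETURNS bigint
AS 'SELECT array_matrix_height($1)::bigint * array_matrix_width($1)::bigint'
LANGUAGE sql;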

#plcuda_shmem_unitsz (<value>|<function>)

Use of this directive is optional. If not specified, the default is the constant value 0.

This directive specifies the amount of shared memory per thread to be dynamically allocated at GPU kernel execution, when used in the code block of #plcuda_prep, #plcuda_begin, or #plcuda_post.

If a constant value is specified, the PL/CUDA runtime kicks the GPU kernel function with the specified amount of shared memory per thread.

If a SQL function name is specified, the PL/CUDA runtime calls the specified SQL function, and the result of the function is applied as the amount of shared memory per thread. This SQL function takes the identical arguments to the PL/CUDA function, and returns the bigint data type.

Please note that the amount of shared memory actually acquired at execution of the GPU kernel function depends on the number of threads per streaming multiprocessor, not only on the amount of shared memory per thread specified by this directive. (Also note that the number of threads per streaming multiprocessor is a different concept from what is specified with #plcuda_num_threads.) For example, if the amount of shared memory per thread is 8 bytes and the number of threads per streaming multiprocessor is 384, 3KB of shared memory is allocated per streaming multiprocessor. If the total number of threads specified by #plcuda_num_threads is 32768, this GPU kernel is executed as 86 thread blocks. However, it is the role of the scheduler to determine when each block is placed onto a streaming multiprocessor, so it does not mean that 86 x 3KB = 258KB of shared memory is consumed at once.

#plcuda_shmem_blocksz (<value>|<function>)

Use of this directive is optional. If not specified, the default is the constant value 0.

This directive specifies the amount of shared memory per block to be dynamically allocated at GPU kernel execution, when used in the code block of #plcuda_prep, #plcuda_begin, or #plcuda_post.

If a constant value is specified, the PL/CUDA runtime kicks the GPU kernel function with the specified amount of shared memory per block.

If a SQL function name is specified, the PL/CUDA runtime calls the specified SQL function, and the result of the function is applied as the amount of shared memory per block. This SQL function takes the identical arguments to the PL/CUDA function, and returns the bigint data type.

#plcuda_kernel_blocksz (<value>|<function>)

Use of this directive is optional.

This directive specifies the number of threads per streaming multiprocessor (the block size), when used in the code block of #plcuda_prep, #plcuda_begin, or #plcuda_post. It is usually a multiple of the warp size of the device, and equal to or less than 1024. By default, an optimal value is applied according to the resource consumption of the GPU kernel function, so this directive should not be used unless you have a special reason, for example when a larger block size is preferable due to characteristics of the algorithm.

If a constant value is specified, the PL/CUDA runtime kicks the GPU kernel function with the specified block size. If a SQL function name is specified, the PL/CUDA runtime calls the specified SQL function, and the result of the function is applied as the block size used to run the GPU kernel function. This SQL function takes the identical arguments to the PL/CUDA function, and returns the bigint data type.

Increasing the number of threads per streaming multiprocessor allows more threads to synchronize with each other using the shared memory; on the other hand, it decreases the number of registers each thread can use, which may degrade performance, for example by spilling private variables to the (slow) global memory.

#plcuda_include ("library name"|<function name>)

This directive includes a static GPU library of PG-Strom, or a user-defined code block, for use in PL/CUDA functions. Please note that this is NOT a feature to include arbitrary header files on the server system.

If any of the static library names below is specified, the PL/CUDA runtime injects the library at the head of the generated CUDA C program. Honestly, this is a legacy manner, so we expect limited use cases.

If a SQL function name is specified, the PL/CUDA runtime calls the specified SQL function, and the result of the function is injected into the CUDA C code where the #plcuda_include directive exists. This SQL function takes the identical arguments to the PL/CUDA function, and returns the text data type. See the sketch after the table below.

"cuda_dynpara.h"
  A collection of GPU runtime functions related to dynamic parallelism, which launch kernel functions from the GPU. Including this file also links the device runtime library of CUDA.

"cuda_matrix.h"
  A collection of GPU runtime functions to process the SQL array type as a vector/matrix.

"cuda_timelib.h"
  A collection of GPU runtime functions to process the date and time data types of SQL.

"cuda_textlib.h"
  A collection of GPU runtime functions to process the text data type and the LIKE operator.

"cuda_numeric.h"
  A collection of GPU runtime functions to process the numeric data type of SQL.

"cuda_mathlib.h"
  A collection of GPU runtime functions to process the arithmetic operators and mathematical functions of SQL.

"cuda_money.h"
  A collection of GPU runtime functions to process the currency data type of SQL.

"cuda_curand.h"
  A collection of GPU runtime functions to use the cuRAND library, provided by CUDA, which supports random number generation.

#plcuda_results_bufsz (<value>|<function>)

Use of this directive is optional. If not specified, the default is the constant value 0.

This directive specifies the amount of the results buffer in bytes, to be acquired at execution of the PL/CUDA function. If the PL/CUDA function is declared to return a variable-length datum, allocation of a results buffer is needed.

If a constant value is specified, the PL/CUDA language handler acquires the specified amount of GPU RAM as the results buffer, then launches the GPU kernel functions. If a SQL function name is specified, the PL/CUDA language handler calls the specified SQL function, and the result of the function is applied as the amount of GPU RAM for the results buffer before launching the GPU kernel functions. This SQL function takes the identical arguments to the PL/CUDA function, and returns the bigint data type.

GPU kernel functions can access the results buffer as the region pointed to by the void *results argument. If 0 bytes are specified, NULL is set on void *results.
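A hedged sketch of a sizing helper for a function that returns a real[] vector of the same height as its input matrix (the helper name is hypothetical; array_vector_rawsize is described in the Array-Matrix Functions section):

-- must have the identical signature to the PL/CUDA function it serves
CREATE FUNCTION gpu_normalize_bufsz(real[])
RETURNS bigint
AS 'SELECT array_vector_rawsize(''real''::regtype, array_matrix_height($1))'
LANGUAGE sql;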

#plcuda_working_bufsz (<value>|<function>)

Use of this directive is optional. If not specified, the default is the constant value 0.

This directive specifies the amount of the working buffer in bytes, to be acquired at execution of the PL/CUDA function.

If a constant value is specified, the PL/CUDA language handler acquires the specified amount of GPU RAM as the working buffer, then launches the GPU kernel functions. If a SQL function name is specified, the PL/CUDA language handler calls the specified SQL function, and the result of the function is applied as the amount of GPU RAM for the working buffer before launching the GPU kernel functions. This SQL function takes the identical arguments to the PL/CUDA function, and returns the bigint data type.

GPU kernel functions can access the working buffer as the region pointed to by the void *workbuf argument. If 0 bytes are specified, NULL is set on void *workbuf.

#plcuda_sanity_check <function>

It specifies a sanity-check function that preliminarily checks the adequacy of the supplied arguments, prior to GPU kernel launch. No sanity-check function is configured by default. Launching a GPU kernel function is usually a heavier task than calling another function on the CPU, because it also involves initialization of GPU devices. If the supplied arguments have values unacceptable by the specification of the PL/CUDA function, a few thousand or million (or more, in some cases) GPU kernel threads would be launched just to check the arguments and return an error status. If a sanity check can be applied prior to GPU kernel launch at sufficiently small cost, it is worthwhile to raise an error using the sanity-check function before the GPU kernel launch. The sanity-check function takes the identical arguments to the PL/CUDA function, and returns the bool data type.
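A hedged sketch of such a helper (the function names are hypothetical; array_matrix_validation, described in the Array-Matrix Functions section, is intended exactly for this purpose):

-- rejects a malformed matrix or a non-positive threshold before any
-- GPU kernel is launched
CREATE FUNCTION gpu_my_func_sanity(real[], real)
RETURNS bool
AS 'SELECT array_matrix_validation($1) AND $2 > 0.0'
LANGUAGE sql;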

#plcuda_cpu_fallback <function>

It specifies a CPU fallback function that performs the same task as the GPU kernel function. No CPU fallback function is configured by default.

If the GPU kernel function returns the StromError_CpuReCheck error and a CPU fallback function is configured, the PL/CUDA language handler discards the results of processing on the GPU side, then calls the CPU fallback function. This is valuable to implement an alternative remedy when the GPU kernel function is not executable for all possible inputs; for example, the data size may be too large to load onto GPU RAM. Also note that there is a performance trade-off, because the CPU fallback function is executed in a single CPU thread.
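A hedged sketch for the gpu_add example above (the fallback must take the identical arguments and return the identical result type; the name is hypothetical):

-- executed on the CPU whenever the GPU side raises StromError_CpuReCheck
CREATE FUNCTION gpu_add_fallback(int, int)
RETURNS int
AS 'SELECT $1 + $2'
LANGUAGE sql;

Adding #plcuda_cpu_fallback gpu_add_fallback to the declaration of gpu_add would make the handler call this function in that case.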

PL/CUDA Related Functions

plcuda_function_source(regproc) returns text
  It returns the source code of the GPU kernel generated from the PL/CUDA function whose OID is given as the argument.

Array-Matrix Functions

This section introduces the SQL functions that support the array-based matrix types provided by PG-Strom. PG-Strom treats an array as a matrix when it satisfies the following terms:

  • 2-dimensional Array
  • Element of array begins from 1 for each dimension
  • No NULL value is contained
  • Length of the array is less than 1GB, due to the restriction of variable length datum in PostgreSQL
  • Array with smallint, int, bigint, real or float data type

If and when an array satisfies the above terms, the location of the (i,j) element of the array can be determined uniquely from the index, which enables a GPU thread to fetch the datum to be processed very efficiently. Also, an array-based matrix packs only the data used for the calculation, unlike the usual row-based format, so it has advantages in memory consumption and data transfer. A short usage sketch follows.
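A hedged sketch (t is a hypothetical table with real columns x, y and z):

-- build a matrix of 3 columns x N rows from the table
SELECT array_matrix(x, y, z) FROM t;

-- check the shape of the generated matrix
SELECT array_matrix_height(m), array_matrix_width(m)
  FROM (SELECT array_matrix(x, y, z) AS m FROM t) s;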

array_matrix(variadic arg, ...) returns array
  An aggregate function that combines all the supplied rows. For example, when 3 float arguments are supplied over 1000 rows, it returns an array-based matrix of 3 columns x 1000 rows with the float data type. This function is declared to take variable-length arguments; arg takes one or more scalar values of either smallint, int, bigint, real or float, and all of them must have the same data type.

matrix_unnest(array) returns record
  A set-returning function that extracts an array-based matrix into a set of records. array is an array of smallint, int, bigint, real or float data. It returns a record type consisting of one or more columns according to the width of the matrix. For example, for a matrix of 10 columns x 500 rows, each record contains 10 columns of the matrix's element type, and 500 records are generated. It is similar to the standard unnest function, but generates a record type, so it requires specifying the record type to be returned using an AS (colname1 type[, ...]) clause; see the sketch after this table.

rbind(array, array) returns array
  array is an array of smallint, int, bigint, real or float data. This function combines the supplied two matrices vertically. Both matrices need to have the same element data type. If the widths of the matrices are not equal, the padding area is filled with zero.

rbind(array) returns array
  array is an array of smallint, int, bigint, real or float data. Similar to rbind(array, array), but performs as an aggregate function, combining all the input matrices vertically into one result.

cbind(array, array) returns array
  array is an array of smallint, int, bigint, real or float data. This function combines the supplied two matrices horizontally. Both matrices need to have the same element data type. If the heights of the matrices are not equal, the padding area is filled with zero.

cbind(array) returns array
  array is an array of smallint, int, bigint, real or float data. Similar to cbind(array, array), but performs as an aggregate function, combining all the input matrices horizontally into one result.

transpose(array) returns array
  array is an array of smallint, int, bigint, real or float data. This function makes a transposed matrix, swapping the height and width of the supplied matrix.

array_matrix_validation(anyarray) returns bool
  It validates whether the supplied array (anyarray) is adequate as an array-based matrix. It is intended for sanity checks prior to invocation of a PL/CUDA function, or for a check constraint on a domain type definition.

array_matrix_height(array) returns int
  array is an array of either smallint, int, bigint, real or float data. This function returns the height of the supplied matrix.

array_matrix_width(array) returns int
  array is an array of either smallint, int, bigint, real or float data. This function returns the width of the supplied matrix.

array_vector_rawsize(regtype,int) returns bigint
  It returns the byte size required to store an array-based vector (1-dimensional array) with the data type specified by the 1st argument and the height by the 2nd argument. It is intended for use with #plcuda_results_bufsz and #plcuda_working_bufsz.

array_matrix_rawsize(regtype,int,int) returns bigint
  It returns the byte size required to store an array-based matrix with the data type specified by the 1st argument, the height by the 2nd argument and the width by the 3rd argument. It is intended for use with #plcuda_results_bufsz and #plcuda_working_bufsz.

array_cube_rawsize(regtype,int,int,int) returns bigint
  It returns the byte size required to store an array-based cube (3-dimensional array) with the data type specified by the 1st argument, the height by the 2nd argument, the width by the 3rd argument, and the depth by the 4th argument. It is intended for use with #plcuda_results_bufsz and #plcuda_working_bufsz.
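For example, a hedged sketch of expanding a 3-column matrix back into rows using the AS clause (t is the hypothetical table used above):

SELECT * FROM matrix_unnest((SELECT array_matrix(x, y, z) FROM t))
         AS (x real, y real, z real);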