Configuration

This chapter describes the meanings of environment variables that affect operations of database systems and how to set them.

Configuration Settings Reference

Setting parameters

All parameter names are case-sensitive. Each parameter value is one of five types: Boolean (or string), integer, floating point, and enumerated (enum). The data type sets the syntax for setting parameters.

  • Boolean: You can set the value to one of “on, off, true, false, yes, no, 1, 0 (case insensitive)” or one of “t, f, y, n”.

  • String: In general, single quotation marks are placed at both ends; if a value itself contains a single quotation mark, add one more single quotation mark. Quotation marks can be omitted if the value is a common, simple number or identifier.

  • Numeric (integer and floating point): The decimal point is allowed only when it is a floating-point parameter. Do not use thousands separators (e.g. ‘,’ in 1,000,000). Quotes are unnecessary.

  • Numeric with Unit: Some numeric parameters have implicit units because they are used to describe memory or time. Units can be in kilobytes, blocks (usually 8 kilobytes), milliseconds, seconds, and minutes. Among these settings, numeric values without units use the basic unit of setting, which can be found in pg_settings.unit. For you convenience, the setting can specify explicitly-specified units. For instance, if the time value is “120 ms”, the actual unit of the parameter will be converted to “ms.” Note that the value must be written in string (including quotes) to use this feature. Unit names are case-sensitive, and there can be a space between the numeric value and the unit.

    • Valid memory units are kB (kilobytes) and MB (megabytes), GB (gigabytes), and TB (terabytes). The memory multiplier is 1024 (not 1000).

    • Valid time units are ms (milliseconds), s (seconds), min (minutes), h (hour), and d (days).

  • Enumerated: Parameters of “enumerated” type are created in the same way as string parameters, but are limited to a single set of values. The allowed values of these parameters can be found in pg_settings.enumvals. Enum parameter values are not case-sensitive.

Parameter Interaction via the Configuration File

The most basic way to set these parameters is to edit postgresql.conf, which is usually in the data directory. If the database cluster directory is initialized, the default copy is installed. An example similar to this file is as follows:

# This is a comment
log_connections = yes
log_destination = 'syslog'
search_path = '"$user", public'
shared_buffers = 128MB

A parameter per line is specified. The equals sign between name and value is optional. Blank spaces are not important (except for the quoted parameters) and blank lines are ignored. The hash mark (#) means that the rest of the line is a comment. Simple identifiers or non-numeric parameter values should use single quotes. To include a single quotation mark in the parameter value, you must add one more single quotation mark, or use backslashes and quotation marks.

The parameters configured like this are provided in the cluster by default. The settings visible in the active session are these values unless you override them. The following section describes how an administrator or user overrides these defaults.

Each time the main server process receives a SIGHUP signal, the configuration file is read again. Executing pg_ctl reload on the command line or calling SQL function pg_reload_conf () sends SIGHUP. The main server process also spreads this signal to all server processes that are currently running, allowing the new value to be applied to the existing sessions (applied after the currently-running client command is complete). Alternatively, the user may send signals directly to a single server process. Some parameters can only be configured when the server starts. Changing the entries of the configuration file will be ignored until the server restarts. Likewise, incorrect parameter settings in the configuration file are also ignored during the SIGHUP process (but recorded in the log).

In addition to postgresql.conf, the AgesnGraph data directory ($AGDATA) contains postgresql.auto.conf that has the same format as postgresql.conf; postgresql.auto.conf should not be edited directly. This file contains the settings provided by the ALTER SYSTEM command. Every time postgresql.conf is present, this file is automatically read and the corresponding settings are applied in the same manner. The settings in postgresql.auto.conf take precedence over those in postgresql.conf.

Parameter Interaction via SQL

AgensGraph provides three SQL commands for setting configuration defaults. The ALTER SYSTEM command mentioned above provides a way to change global defaults using a SQL syntax, which is equivalent to editing postgresql.conf in function. In addition, there are two commands that enable default settings for each database or role.

  • ALTER DATABASE: Overrides global settings by database.

  • ALTER ROLE: Overrides both global and database-specific settings with custom values.

The values set using ALTER DATABASE and ALTER ROLE shall apply only when starting a new database session. This overrides the value obtained from the configuration file or server command line and applies the default for the rest of the session. As some settings cannot be changed after starting the server, configuration using this command (or one of those listed below) is not possible.

When a client connects to the database, it provides two additional SQL commands (or equivalent functions) that can interact with the AgensGraph session-local configuration settings.

  • You can check the current values of all parameters with SHOW command. The corresponding function is current_setting (setting_name text).

  • SET command allows you to modify the current value of the parameter, which can be set locally in the session. It does not affect other sessions. The corresponding function is set_config (setting_name, new_value, is_local).

The system view pg_settings can also be used to check and change session-local values.

  • View query is similar to SHOW ALL, but shows more detailed results. It is also more flexible as it enables you to specify filter conditions or join with other relations.

  • In the view, using UPDATE to update the setting column is equivalent to executing the SET command.
    For example:

SET configuration_parameter TO DEFAULT;

The above syntax is identical with the one below:

UPDATE pg_settings SET setting = reset_val WHERE name = 'configuration_parameter';

Managing Configuration File Contents

AgensGraph provides features to subdivide the complicated postgresql.conf into smaller files. Though the configuration methods of the features are not identical, they are useful especially for managing related servers.

In addition to setting individual parameters, postgresql.conf has “include, a directive. It specifies other file to read and process it as if the file is inserted into the configuration file. This feature physically divides the configuration file. Here is a simple example of include:

include 'filename'

If the filename is not absolute, it is processed as a relative path to the configuration file directory it references. “include” can be nested.

There is also a directive “include_if_exists”, which works the same as the include directive except when the reference file does not exist or the file cannot be read. “include” treats this as an error condition, but include_if_exists simply logs the message and continues to process the referenced configuration file.

postgresql.conf may also include include_dir, which specifies the path to the configuration file to include as follows:

include_dir 'path'

If it is not an absolute path, it is processed as a relative path to the referenced configuration file directory. Files that are not directories within the specified directory are included only when their names end with .conf. As such files may be hidden on some platforms, file names that start with . are also ignored to prevent mistakes. Files in the include directory are processed in the file name order (according to the C locale conventions; for example, numeric-alphabetical order and uppercase-lowercase order).

Rather than using a single postgresql.conf file, “include” files or directories can be used to logically separate database configuration. Imagine a company that runs two database servers with different memory capacities. In the case of logging, there are likely to be configuration elements shared by two databases. However, the two servers are likely to have different/customized memory-related parameters. The way to manage such situation is to split the changed customized configuration content into three files. Users may include each file by adding the following code to the end of the postgresql.conf file:

include 'shared.conf'
include 'memory.conf'
include 'server.conf'

The shared.conf file on all systems is identical. Each server with a different memory size can share the same memory.conf. Users are able to manage both 8GB RAM and 16GB RAM servers as a single file. Server-specific configuration information is included in server.conf.

You may also create a configuration file directory and add this information into the file. For example, the conf.d directory can be referenced as the last entry in postgresql.conf.

include_dir 'conf.d'

Then, specify the filename of the conf.d directory as follows:

00shared.conf
01memory.conf
02server.conf

This naming convention clarifies the order of file loading. When the server reads the configuration file, only the last parameter setting is applied. In this example, the values set in conf.d/02server.conf override the values set in conf.d/01memory.conf.

You can use this method for a descriptive naming of files.

00shared.conf
01memory-8GB.conf
02server-foo.conf

This placement order assigns a unique name for each configuration file change. By doing so, this reduces ambiguity when several server configurations are stored in a single location (e.g. the version control repository).

File Locations

By default, the three configuration files (postgresql.conf, pg_hba.conf, and pg_ident.conf) are stored in the data directory of the database cluster and the configuration file can be placed in a different location using the following parameters: By keeping configuration files separate, management and correct backup of configuration files can be done in an easier manner.

  • data_directory (string)
    Specifies a directory to store the data.. This parameter can be set only at server startup.

  • hba_file (string)
    Specifies a configuration file for host-based authentication (pg_hba.conf). This parameter can be set only at server startup.

  • ident_file (string)
    Specifies a configuration file for user name mapping (pg_ident.conf). This parameter can be set only at server startup.

  • external_pid_file (string)
    Specifies the name of an additional process ID (PID) file that must be created by the server for use by a server management program. This parameter can be set only at server startup.

The above parameters are not explicitly set in the default installation. The data directory is specified as the AGDATA environment variable, and all configuration files can be found in the data directory.

To keep a configuration file somewhere other than the data directory, the AGDATA environment variable must contain the configuration file and data_directory parameters should be set in postgresql.conf so that the data directory is physically located. data_directory redefines the data directory for the locations of data directories (not the locations of configuration files).

If desired, the name/location of the configuration file can be specified individually using the config_file, hba_file, and ident_file, and can be set within the default configuration file (postgresql.conf). It is not necessary to specify AGDATA if three parameters and data_directory are explicitly set.

When setting these parameters, the relative path is interpreted relative to the directory where Agens starts.

Resource Consumption

Memory

  • shared_buffers (integer)

    • Set the amount of memory used by the database server for shared memory buffers. The default value is typically 128 megabytes (128MB), but this value may not be reached if it is not supported by the kernel configuration (to be determined during initdb). This setting should be at least 128 kilobytes. (The BLCKSZ value (not default) changes the minimum value.) Settings much larger than the minimum value are used when a good performance is needed.

    • If you are using a dedicated database server with more than 1GB of RAM, the appropriate starting value for shared_buffers is 25% of the system memory. Workloads are more effective with larger settings of shared_buffers. However, as AgensGraph also depends on the operating system cache, allocating more than 40% of RAM to shared_buffers is not recommended for system efficiency. If you set a larger shared_buffers to run a large amount of new or changed data write processes over a long period of time, you should also increase the setting accordingly in checkpoint_segments.

    • If the system RAM is less than 1GB, you need an adequate space for the operating system. Thus, making the RAM ratio smaller is a right decision. Increasing the shared_buffers value on Windows is not effective. You may get better results by making this setting smaller, and by making the cache relatively large for the operating system. The useful range for shared_buffers on Windows systems is 64MB to 512MB.

  • huge_pages (enum)

    • Sets the Huge page to Active/Inactive. Valid values are try (default) and on, off.

    • Currently this feature is only supported on Linux. If it is set to try, it is ignored in other systems.

    • Using huge pages results in a higher performance as it consumes smaller page tables and less CPU for memory management.

    • If you set huge_pages to try, the server will use huge pages. However, if it fails, it will fall back to using normal allocation. If set to on, the server will not start if it fails to use huge pages. If off, it does not use huge pages.

  • temp_buffers (integer)

    • Sets the maximum number of temporary buffers used by each database session. There is a session-local buffer that is used only to access temporary tables. The default value is 8 megabytes (8 MB). Settings can be changed within an individual session, but only before the first use of the temporary tables in a session. Subsequent changes to the value will not take effect in the session.

    • The session allocates temporary buffers up to the limit set in temp_buffers. In practice, the cost for setting a large value in a session that does not require a lot of temporary buffers amounts to a buffer descriptor or only about 64 bytes each time temp_buffers increases. However, if the buffer is actually used, it requires an additional 8192 bytes (or typically BLCKSZ bytes).

  • max_prepared_transactions (integer)

    • The maximum number of transactions that can be in a “ready” state at the same time.

    • If you do not plan to use prepared transactions, this parameter should be set to “0” to prevent mistakes in creating prepared transactions. If you set max_prepared_transactions to at least max_connections when using prepared transaction, you may let the session hold the prepared transaction.

    • When the standby server is running, it should be set to be greater than or equal to the master server value. Otherwise, the standby server will not allow queries.

  • work_mem (integer)

    • Specifies the amount of memory to be used by the internal Align command and hash table before writing the temporary disk file. The default value is 4 megabytes (4MB). For complex queries, several Align or hash commands can be executed in parallel. Each command can use the amount of memory specified by this value before writing the data to the temporary file. Sessions that are running may execute this command at the same time as well. The total memory used is a multiple of work_mem. The Align command is used for ORDER BY, DISTINCT, and merge joins. Hash tables are used for hash joins, hash-based aggregations, and hash-based processing of IN subqueries.

  • maintenance_work_mem (integer)

    • Specifies the maximum amount of memory used in maintenance commands such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. The default is 64 megabytes (64 MB). This command may be executed one at a time in a database session, and a normal installation cannot have multiple concurrent commands. It is safe to set this value to a much larger value than work_mem. Large settings can improve vacuuming and the database dump recovery performance.

    • If autovacuum is running, it can be allocated in multiples of autovacuum_max_workers in this memory. Thus, do not set the default too high. It may be useful to set/manage autovacuum_work_mem separately.

  • autovacuum_work_mem (integer)

    • Specifies the maximum amount of memory used by each autovacuum worker process. The default value is -1, which means that maintenance_work_mem should be used instead. This setting does not affect VACUUM when running in a different context.

  • max_stack_depth (integer)

    • Specifies the maximum safe depth of the server execution stack. The ideal configuration is to set it slightly short of the safety margin specified forcibly by the kernel (as set by ulimit -s or to be equivalent to local). The safety margin is required since only important parts of the potential recursive routines (e.g. expression evaluation), not all routines on the server, are checked for stack depth. The default setting is basically small, 2 megabytes (2MB), with a low probability of collision. However, if the setting is too small, it may be difficult to execute composite functions. Only the super user may change the setting.

    • If max_stack_depth is set to a larger value than the actual kernel limit, the runaway recursive function may conflict with the backend process. On a platform where AgensGraph is allowed to determine the kernel limit, the server does not allow this variable to be set to an unstable value. However, as not all platforms provide information, you should be careful when choosing values.

  • dynamic_shared_memory_type (enum)

    • Specifies the dynamic shared memory implementation to be used by the server. Possible values include posix (POSIX shared memory allocated through shm_open) and sysv (System V shared memory allocated through shmget), windows (Windows shared memory), mmap (shared memory simulation that uses memory map stored in the data directory), and none (disabled function). Some of these are not supported on certain platforms. The first support option is the default settings of the given platform. Use of non-default mmap options on platforms is not recommended generally. This is because the operating system repeatedly rewrites the modified page to disk, increasing system I/O load. However, in the case you save the pg_dynshmem directory on a RAM disk or disable other shared memory features, debugging can be useful.

Write Ahead Log

Settings

  • wal_level (enum)

    • wal_level determines the amount of information to be recorded in the WAL. The default is minimal, which records only the information required to recover from a collision or an immediate shutdown. Archive adds only the logging needed for the WAL archive. hot_standby adds more information to the standby server for read-only queries. logical adds necessary information to support logical decoding. Each level includes information logged at a low level. This parameter is set when the server starts.

    • At the minimal level, some bulk-running WAL logging can safely be skipped. This will speed up execution. Executions where this optimization can be applied include:

      1. CREATE TABLE AS

      2. CREATE INDEX

      3. CLUSTER

      4. COPY into tables that were created or truncated in the same transaction.

      Above executions apply to the tables created in the same transaction or the tables in which records were deleted.

    • However, as the minimal WAL does not get enough information to reconstruct the data from the base backup and WAL logs, you must use archive or higher to perform WAL archiving (archive_mode) and streaming replication.

    • Information identical with archive at the hot_standby level and information needed to reconfigure the state of the running transaction are logged from WAL. To use read-only queries on the standby server, set wal_level to hot_standby or higher on the operating server and enable hot_standby on the standby server. We consider that there is no measurable performance difference between the hot_standby and archive levels. If you find any noticeable change in operations, please feel free to give us your feedback.

    • Information that is considered identical with using hot_standby at the logical level and information needed to use a logical change set are logged from WAL. Using the logical level increases the WAL volume. This is true especially when you set multiple tables to REPLICA IDENTITY FULL and run multiple UPDATE and DELETE statements.

  • fsync (boolean)

    • If this parameter is on, the AgensGraph server will attempt to verify whether the update has been physically written to disk or not using the fsync () system call or a corresponding method (see wal_sync_method). Accordingly, after a conflict with an operating system or hardware, it is possible to restore the database cluster to a certain state.

    • Turning off fsync has a performance advantage. Data corruption, however, may not be recoverable in the event of a power outage or system failure. It is therefore desirable to turn off fsync only if you can easily rebuild the entire database with external data.

    • When turning off fsync is considered safe and when you plan to load a new database cluster from a backup file during initial loading, you can batch-process data from the database to be rebuilt, or can rebuild a read-only database that is used only for failover. It is not desirable for only high-performance hardware to turn off fsync.

    • In the case where you disable and re-enable fsync, it is necessary to forcibly move all the changed buffers of the kernel to a durable storage area for recovery reliability. This can be done by executing initdb –sync-only or sync, unmounting the file system, or rebooting the server.

    • If you disable synchronous_commit for non-critical transactions, you can have potential performance benefits as it disables fsync without risking data collisions.

    • fsync can only be set in the postgresql.conf file or on the server command line. If you disable this parameter, you must consider disabling full_page_writes as well.

  • synchronous_commit (enum)

    • Specifies whether the transaction commit should be waiting until the WAL record is written to disk before the command returns a “success” indication to the client. Valid values are on, remote_write, local, and off. The default and safety settings are on. If off, there may be a delay between the time when the “success” indication is delivered to the client and the time when it is guaranteed that the transaction is truly secure without a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter to off does not cause database consistency problems. It is true that operating systems or database failures result in partial loss of so-called recently-committed transactions, but the database state appears to be like a clear shutdown of the transaction. Therefore, disabling synchronous_commit can be a useful alternative if performance is more important than certainty for transactional durability.

    • If synchronous_standby_names is set, this parameter also controls whether the transaction commit should wait until the WAL record of the transaction is replicated to the standby server. If set to on, the commit waits until the transaction commit record is received from the standby server and a response from the current synchronous standby server indicates that it has been written to disk. This ensures that transactions are not lost in the case where there is no corruption of the database store on both the production server and the standby server. When set to remote_write, the commit waits until it receives a response indicating that i) a commit record of the transaction is received and written to the standby server’s operating system, but ii) it is uncertain whether the data has arrived in the stable storage of the standby server. For data retention, this setting is sufficient even if the standby server instance of AgensGraph fails. However, this is not the case if the standby server fails at the operating system level.

    • If you are using synchronous replication, it is generally reasonable to i) wait for it to be written locally to disk, ii) wait for replication of WAL records, or ii) allow transactions to be committed asynchronously. However, the local settings can be used for transactions waiting for local disk writes (rather than synchronous replication). If you do not set synchronous_standby_names, on, remote_write, and local settings will provide the same synchronization level, and transaction commit will only wait for local disk writes.

    • You can change this parameter at any time. The actions of all transactions are determined by the settings when they are actually committed. Thus, it is possible and useful to make some transaction commits synchronous and others asynchronous. For instance, when the default is reversed, in order to make a single transaction commit with multiple statements asynchronous, you should execute SET LOCAL synchronous_commit TO OFF within the transaction.

  • wal_sync_method (enum)

    • This method is used to forcibly update WAL on the disk. If fsync is off, this setting is ignored because the WAL file is not forcibly updated. Possible values are:

      1. open_datasync (write WAL files with open() option O_DSYNC)

      2. fdatasync (call fdatasync() at each commit)

      3. fsync (call fsync() at each commit)

      4. fsync_writethrough (call fsync() at each commit, forcing write-through of any disk write cache)

      5. open_sync (write WAL files with open() option O_SYNC)

    • Use O_DIRECT if you also need the open_* option. The option may not be available on some platforms. The first type in the list above supported by the platform is the default value. In Linux, however, fdatasync is the default. The default value is not necessarily ideal. To create a fail-safe configuration or to optimize performance, it may be necessary to change values or change other aspects of system configuration, which will be covered in Chapter 13.1. This parameter can only be set in the postgresql.conf file or on the server command line.

  • full_page_writes (boolean)

    • If this parameter is on, the AgensGraph server records the whole content of each page on disk to WAL during the first modification of each disk page after a checkpoint. This is necessary because, in the event of an operating system failure, page writes in progress are only partially completed and old/new data may coexist in the pages on disk. In general, you cannot fully recover such pages only with row-level change data stored in WAL during a crash recovery. Storing full-page images would ensure proper recovery of the pages; it would, however, definitely increase the size of data to be written to WAL. (As WAL replay always starts at the checkpoint, doing this job during the first change of each page after the checkpoint will be enough. One way to reduce the cost of writing the entire page to WAL is to increase the checkpoint interval parameter.

    • Turning off this parameter would improve normal operating speed, but in the event of a system failure, corrupted data may become unrecoverable or data corruption may not be revealed. These risks are small, but they are similar to when fsync is turned off. This means that this parameter may be disabled only when the environment is identical with the recommended conditions in fsync.

    • Disabling this parameter does not affect the use of WAL archiving for point-in-time recovery (PITR).

    • This parameter can only be set in the postgresql.conf file or on the server command line. The default value is on.

  • wal_log_hints (boolean)

    • If this parameter is on, the AgensGraph server writes the whole content of the page to WAL during the first modification of each disk page after the checkpoint, including minor modification of the so-called hint bits.

    • If data checksum is enabled, hint bit updates are always WAL-logged and this parameter setting is ignored. If data checksum is enabled in the database, you can use this parameter setting to test how much additional WAL logging occurs.

    • This parameter is set at server startup. The default is off.

  • wal_buffers (integer)

    • Sum of the shared memory that is used for the WAL data and has not been written to disk yet. The size of default setting “-1” is identical with that of 1/32 (about 3%) of shared_buffers (64kB or more, less than one WAL segment, and usually 16MB). If the automatic setting is too large or too small, you can directly select it; if it is less than 32kB, it is processed as 32kB. This parameter is set at server startup.

    • As the content of the WAL buffers is written to disk per transaction commit, an extremely large value is likely to have little advantage. However, setting this value to at least a few megabytes will improve the write performance of busy servers by allowing multiple clients to commit at once. Autotuning with the default setting of -1 produces reasonable results in most cases.

  • wal_writer_delay (integer)

    • Determines the delay between operations of the WAL writer. At each operation, the writer writes WAL to disk. After sleeping for milliseconds (wal_writer_delay), it repeats the process. The default is 200 milliseconds (200ms). In many systems, the efficient sleep delay setting is 10 milliseconds. If wal_writer_delay is set to a value other than a multiple of 10, the result will be identical with the value set by rounding a multiple of 10. This parameter can only be set in postgresql.conf or on the server command line.

  • commit_delay (integer)

    • commit_delay adds the measured time delay in microseconds before initializing the WAL flush. This can improve the group commit throughput by allowing a large volume of transactions to be committed through a single WAL flush if the system load is large enough to be ready to commit additional transactions within a given interval. However, this also increases the waiting time up to commit_delay microseconds per WAL flush. In the case where there is no transaction ready to commit, then the delay is a waste of time. Thus, with the writing about to start, the delay is done when there are the required number of transactions (at least commit_siblings) being performed. In addition, if fsync is disabled, the delay is not performed. The default commit_delay is 0 (no delay). Only the superuser may change the setting.

  • commit_siblings (integer)

    • The minimum number of concurrent open transactions required for performing commit_delay. If the value is large, it is highly likely that at least one other transaction is ready to commit during the delay interval. The default is 5 transactions.

Query Planning

Planner Method Configuration

These configuration parameters provide an approximate way to affect the query plan selected by the query optimizer. If the default plan selected by the optimizer for a particular query is not optimal, you may use one of these configuration parameters as an ad hoc way to force the optimizer to choose another plan. A better way for the optimizer to improve the level of the selected plan is to adjust planner cost constants, run ANALYZE manually, increase the default_statistics_target configuration parameter, and use ALTER TABLE SET STATISTICS; this way, the statistics collected for a particular column can be increased.

  • enable_bitmapscan (boolean)

    • Enable or disable query planner’s use of bitmap-scan plan types. The default value is on.

  • enable_hashagg (boolean)

    • Enable or disable query planner’s use of hashed aggregation plan types. The default value is on.

  • enable_hashjoin (boolean)

    • Enable or disable query planner’s use of hash-join plan types. The default value is on.

  • enable_indexscan (boolean)

    • Enable or disable query planner’s use of index-scan plan types. The default value is on.

  • enable_indexonlyscan (boolean)

    • Enable or disable query planner’s use of index-only-scan plan types. The default value is on.

  • enable_material (boolean)

    • Enable or disable query planner’s use of materialization. It is difficult to completely inhibit materialization, but turning off this variable will prevent the planner from inserting materialize nodes except when accuracy is required. The default value is on.

  • enable_mergejoin (boolean)

    • Enable or disable query planner’s use of merge-join plan types. The default value is on.

  • enable_nestloop (boolean)

    • Enable or disable query planner’s use of nested-loop join plans. It is difficult to completely inhibit nested-loop joins, but turning off this variable will prevent the planner from using it if there are other alternative methods available. The default value is on.

  • enable_seqscan (boolean)

    • Enable or disable query planner’s use of sequential scan plan types. It is difficult to completely inhibit sequential scans, but turning off this variable will prevent the planner from using it if there are other alternative methods available. The default value is on.

  • enable_sort (boolean)

    • Enable or disable query planner’s use of explicit sort steps. It is difficult to completely inhibit explicit sorts, but turning off this variable will prevent the planner from using it if there are other alternative methods available. The default value is on.

  • enable_tidscan (boolean)

    • Enable or disable query planner’s use of TID scan plan types. The default value is on.

The followings are the parameters added in AgensGraph.

  • enable_eager (boolean)

    • Enables or disables the use of eager plans in the query planner. The default value is ON.

  • enable_gathermerge (boolean)

    • Enables or disables the use of gather merge plans in the query planner. The default value is ON.

  • enable_multiple_update (boolean)

    • Users may choose whether to enable/disable enable_multiple_update to allow/disallow multiple modifications of the graph elements. The default value is ON.

Planner Cost Constants

The cost variable described in this section is calculated at an arbitrary scale. As only the relative values are related, the query plan does not change if it is moved up or down with the same coefficient. Basically, these cost variables are based on the cost of fetching sequential pages. Since seq_page_cost is generally set to 1.0, other cost variables are configured based on seq_page_cost. You may use other costs, if you want, such as the number of milliseconds that you actually run on a particular machine.

Note: Unfortunately, there is no suitable way to determine the ideal value for a cost variable. It is best to process this as an average of the total query mixes received by a particular installation. In light of some experience, we should say changing this value can be very dangerous.

  • seq_page_cost (floating point)

    • Sets the disk page fetch cost, which is part of the sequential fetch method expected by the planner. The default value is 1.0. This value can override tables and indexes in a particular tablespace by setting the same named tablespace parameters.

  • random_page_cost (floating point)

    • Sets the processing cost of nonsequentially-fetched disk pages expected by the planner. Sets the disk page fetch cost, which is part of the sequential fetch method expected by the planner. The default value is 4.0. This value can override tables and indexes in a particular tablespace by setting the same named tablespace parameters.

    • Reducing this value in proportion to seq_page_cost causes the system to be biased towards the index scan. Increasing this value makes index scans more expensive; both values can be increased or decreased together to change the importance of disk I/O costs in proportion to the CPU cost. This feature is explained in the following parameters.

    • Random access to disk storage is typically four times more expensive than sequential access. However, as with indexed reads, most of random access to disk occurs in cache; a small default is used (4.0). Even if random access appears to be 40 times slower than sequential access, we may expect 90% of the actual random reads to be cached.

    • If a 90% cache ratio is a false assumption in the user’s workload, you can increase the random_page_cost to reflect the actual costs of random storage reads. Thus, it may be appropriate to reduce random_page_cost if the database is smaller than the total server memory and the data is fully cached. A storage with a relatively low random reading cost, such as an SSD, may be better modeled with a lower random_page_cost value.

    Tip: Although setting random_page_cost below seq_page_cost is allowed in the system, it is not reasonable to do so in practice. However, if the entire database is cached in RAM, no cost is required in touching out-of-sequence pages and the two parameters may be identically configured. Also, since the cost for fetching pages that are already in RAM in an excessively-cached database is much lower than the cost for fetching pages in a normal state, the user must reduce both values in proportion to the CPU parameters.

  • cpu_tuple_cost (floating point)

    • Sets the processing cost of each row during a query expected by the planner. The default value is 0.01.

  • cpu_index_tuple_cost (floating point)

    • Sets the processing cost of each index entry during index scan expected by the planner. The default value is 0.005.

  • cpu_operator_cost (floating point)

    • Sets the processing cost of each operator or function during a query expected by the planner. The default value is 0.0025.

  • effective_cache_size (integer)

    • Sets the effective size of disk cache that can be used for a single query to be estimated by the planner. This parameter is reflected in the cost of using index. The larger the value, the more likely index scan will be used. The smaller the value, the more likely sequential scan will be used. If you set this parameter, you must consider both the shared buffers and disk cache of the kernel used by the AgensGraph data file. The number of concurrent queries that are expected for different tables should be considered as well. This parameter has no effect on the amount of shared memory allocated by AgensGraph, nor does it preserve the kernel disk cache; it is used only for estimation purposes. The system also does not assume that data will remain in the disk cache between queries. The default is 4 gigabytes (4GB).

Error Reporting and Logging

Where To Log

  • log_destination (string)

    • There are several ways to record server messages, including stderr, csvlog, and syslog. Separate these parameters with commas and add them to the desired list of log targets. The default is to write to stderr only. This parameter can only be set in postgresql.conf or on the server command line.

    • If log_destination contains csvlog, the log entries are output in a “comma separated values (CSV)” format for easier loading into the program.

  • logging_collector (boolean)

    • This parameter is a background process that uses logging_collector to capture log messages sent to stderr and redirect them to a log file. This approach is more useful than logging to syslog because some types of messages do not appear in the syslog output. (Common examples include a dynamic link error message and an error message generated by a script such as archive_command). This parameter can be set only at server startup.

  • log_directory (string)

    • If logging_collector is enabled, this parameter determines the directory where the log files will be created. It can be specified as an absolute path or can be compared to the cluster data directory. This parameter can only be set in postgresql.conf or on the server command line. The default setting is pg_log.

  • log_filename (string)

    • If logging_collector is enabled, this parameter sets the filename of the generated log file. As this value is processed as a strftime pattern, it can be used to specify a %-escapes file name. (If there is a timezone-dependent %-escapes, the operation will be performed based on the realm specified in log_timezone.) The supported %-escapes is similar to those listed in the Open Group’s strfectime specification. Platform-specific (non-standard) extensions do not work as the system’s strfectime is not directly used. The default value is postgresql-%Y-%m-%d_%H%M%S.log. If you specify a file name, reuse the log so that it does not fill the entire disk.

      If you enable the CSV format output in log_destination, .csv is added as a timestamp log file name to generate a file name in CSV format (If you write log_filename as .log, .log is created).

      This parameter can be set only in postgresql.conf or on the server command line.

  • log_file_mode (integer)

    • On UNIX systems, this parameter sets the permissions on the log file when logging_collector is enabled (this parameter is ignored on Microsoft Windows). The parameter value is expected to be specified in an approved format mode by the chmod and umask calls (if you want to use a general octet format, the number must start with 0 (zero)).

      The default permissions are 0600, and only the server owner can read or write the log files. Another commonly useful setting is 0640, which allows a group of users to read the files. To use these settings, you need to change log_directory to store files outside the cluster data directory. It is not wise to set the log file world-readable because it may contain sensitive data in some cases.

      This parameter can be set only in postgresql.conf or on the server command line.

  • log_rotation_age (integer)

    • If logging_collectoris enabled, this parameter determines the maximum lifetime of individual log files. After the maximum lifetime expires, a new log file is created. Set this parameter to 0 if you do not want to create new log files on a time basis. This parameter can be set only in postgresql.conf or on the server command line.

  • log_rotation_size (integer)

    • When logging_collector is enabled, this parameter determines the maximum size of individual log files. After a KB file of the specified size is generated, a new log file is created. Set this parameter to 0 if you do not want to create new log files on a size basis. This parameter can be set only in postgresql.conf or on the server command line.

  • log_truncate_on_rotation (boolean)

    • If logging_collector is enabled, this parameter will overwrite existing log files with the same names using AgensGraph rather than adding new ones. However, truncate occurs only when a new file is opened by time-based rotation rather than server restart or size-based rotation. When the parameter is off, existing files are added in all cases. For instance, if log_filename is set to something like postgresql-%H.log, you can create a 24-hour log file and then overwrite it periodically. This parameter can be set only in postgresql.conf or on the server command line.

      Example: To maintain a seven-day log, create a log file named as server_log.Mon, server_log.Tue, etc. everyday. In order to automatically overwrite last week’s log with this week’s log, set log_filename to server_log.%a, and turn on log_truncate_on_rotation, and set log_rotation_age to 1440.

      Example: A log file per hour is required to maintain a 24-hour log. As faster circular logging is possible when the file size exceeds 1GB, set log_filename to server_log.%H%M, turn on log_truncate_on_rotation, set log_rotation_age to 60, and set log_rotation_size to 1000000. If log_filename include %M, you may allow size-driven rotation by choosing a filename that is different from the time-specified filename.

  • syslog_facility (enum)

    • If you enable logging to syslog, this parameter determines which syslog “facility” to use. You may choose one from LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, and LOCAL7. The default value is LOCAL0. Refer to the documentation for the system syslog daemon. This parameter can be set only in postgresql.conf or on the server command line.

  • syslog_ident (string)

    • If you enable logging to syslog, this parameter determines the name of the program used to identify the AgensGraph messages in the syslog log. The default value is PostgreSQL. This parameter can be set only in postgresql.conf or on the server command line.

  • syslog_sequence_numbers (boolean)

    • When logged in syslog and the parameter is on (default), each message is prefixed with an increasing sequence number (e.g. [2]). This cycle means suppression of “— last message iteration frequency—” that requires many syslog implementations. A more recent syslog implementation might not need it (e.g. $RepeatedMsgReduction in rsyslog) because it can configure repetitive message suppression. You can also disable this option if you really want to block repetitive messages.

      This parameter can be set only in postgresql.conf or on the server command line.

  • syslog_split_messages (boolean)

    • If you enable logging to syslog, this parameter determines how messages are passed to syslog. When set to on (default), messages are split by line; long lines are split into 1024 bytes, which is the usual size limit in existing syslog implementations. When set to off, the AgensGraph server log messages are forwarded to the syslog service and are connected to the syslog service to handle potentially-large messages.

      If syslog is ultimately logged in a text file, it is a good idea to leave the settings as they are as most syslog implementations cannot handle large messages or configure them in a unusual way. However, if syslog is ultimately used on other media, keeping the message logically can be more useful.

      This parameter can be set only in postgresql.conf or on the server command line.

  • event_source (string)

    • When you enable logging in the event log, this parameter determines the program name used to identify AgensGraph messages in the log. The default value is PostgreSQL. This parameter can be set only in postgresql.conf or on the server command line.

Version and Platform Compatibility

Other Platforms and Clients

The followings are the parameters added in AgensGraph.

  • allow_null_properties (boolean)

    • You may allow insertion of null properties into vertices and edges. The default is OFF.

  • case_sensitive_ident (boolean)

    • Use of case-sensitive identifiers can be set. If case_sensitive_ident is ON, all identifiers are case-sensitive. The default is OFF.

  • case_compat_type_func (boolean)

    • When case_sensitive_ident is ON, case-insensitive identifiers can be used for type and function names. The default is OFF.