Data Type
AgensGraph provides diverse data types. You can also add new types using the CREATE TYPE command; refer to the [User-defined Type] section for more information on CREATE TYPE.
The following table lists the generic data types provided by default; some types that are used only internally or are rarely used may not be included. ("Alias" is the internally-used name.)
Name | Alias | Description
---|---|---
bigint | int8 | Signed 8-byte integer
bigserial | serial8 | Auto-incrementing 8-byte integer
bit [ (n) ] | | Fixed-length bit string
bit varying [ (n) ] | varbit | Variable-length bit string
boolean | bool | Logical Boolean (true/false)
box | | Rectangular box on a plane
bytea | | Binary data ("byte array")
character [ (n) ] | char [ (n) ] | Fixed-length character string
character varying [ (n) ] | varchar [ (n) ] | Variable-length character string
cidr | | IPv4 or IPv6 network address
circle | | Circle on a plane
date | | Calendar date (year, month, day)
double precision | float8 | Double-precision floating-point number (8 bytes)
inet | | IPv4 or IPv6 host address
integer | int, int4 | Signed 4-byte integer
interval [ fields ] [ (p) ] | | Time span
json | | Textual JSON data
jsonb | | Binary JSON data, decomposed
line | | Infinite line on a plane
lseg | | Line segment on a plane
macaddr | | MAC (Media Access Control) address
money | | Currency amount
numeric [ (p, s) ] | decimal [ (p, s) ] | Exact numeric of selectable precision
path | | Geometric path on a plane
pg_lsn | | AgensGraph Log Sequence Number
point | | Geometric point on a plane
polygon | | Closed geometric path on a plane
real | float4 | Single-precision floating-point number (4 bytes)
smallint | int2 | Signed 2-byte integer
smallserial | serial2 | Auto-incrementing 2-byte integer
serial | serial4 | Auto-incrementing 4-byte integer
text | | Variable-length character string
time [ (p) ] [ without time zone ] | | Time of day (no time zone)
time [ (p) ] with time zone | timetz | Time of day, including time zone
timestamp [ (p) ] [ without time zone ] | | Date and time (no time zone)
timestamp [ (p) ] with time zone | timestamptz | Date and time, including time zone
tsquery | | Text search query
tsvector | | Text search document
txid_snapshot | | User-level transaction ID snapshot
uuid | | Universally unique identifier
xml | | XML data
Numeric Types
Numeric types consist of 2-, 4-, and 8-byte integers, 4- and 8-byte floating-point numbers, and selectable-precision decimals.
The following numeric types are available:
Name | Storage Size | Description | Range
---|---|---|---
smallint | 2 bytes | Small-range integer | -32768 to +32767
integer | 4 bytes | Typical choice for integer | -2147483648 to +2147483647
bigint | 8 bytes | Large-range integer | -9223372036854775808 to +9223372036854775807
decimal | variable | User-specified precision, exact | Up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
numeric | variable | User-specified precision, exact | Up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real | 4 bytes | Variable precision, inexact | 6 decimal digits precision
double precision | 8 bytes | Variable precision, inexact | 15 decimal digits precision
smallserial | 2 bytes | Small auto-incrementing integer | 1 to 32767
serial | 4 bytes | Auto-incrementing integer | 1 to 2147483647
bigserial | 8 bytes | Large auto-incrementing integer | 1 to 9223372036854775807
Integer Types
The smallint, integer, and bigint types store whole numbers (numbers without fractional components) in various ranges. Attempting to store a value outside the allowed range raises an error.
integer type: Generally chosen, as it offers the best balance of range, storage size, and performance.
smallint type: Typically used only when disk space is at a premium.
bigint type: To be used when the range of the integer type is insufficient.
The SQL standard specifies only integer (or int), smallint, and bigint; the names int2, int4, and int8 are also available as extensions.
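A quick sketch of the range check described above (the table and values here are illustrative only):
CREATE TABLE ranges (s smallint);
INSERT INTO ranges VALUES (32767);   -- succeeds: the upper bound of smallint
INSERT INTO ranges VALUES (32768);   -- fails: value out of range for smallint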
Arbitrary Precision Numbers
The numeric type can store numbers with a very large number of digits and perform calculations exactly. It is especially recommended for storing monetary amounts and other quantities where exactness is required. However, arithmetic on numeric values is much slower than on the integer types, or on the floating-point types described in the next section.
In the numeric type, scale means the number of digits to the right of the decimal point, and precision means the total number of significant digits, i.e. the number of digits on both sides of the decimal point. Thus, the number 23.5141 has a precision of 6 and a scale of 4. Whole numbers can be considered to have a scale of 0.
You may configure both the maximum precision and the maximum scale of a numeric column.
To declare a column of numeric type, use the following syntax:
NUMERIC(precision, scale)
Precision must be positive, and scale must be zero or positive. Specifying only the precision selects a scale of 0:
NUMERIC(precision)
Specifying numeric without any precision or scale, as follows, creates a column that can store numeric values of any precision and scale, up to the implementation limits:
NUMERIC
A column of this kind does not coerce input values to any particular scale, whereas a numeric column with a declared scale coerces input values to that scale. (When transferring data, it is best to always specify the precision and scale.)
Note: The maximum precision that can be explicitly specified in a type declaration is 1000. A NUMERIC without a specified precision is subject to the limits described in the table above.
If the scale of a value to be stored is greater than the declared scale of the column, the system rounds the value to the declared scale; if, after rounding, the number of digits to the left of the decimal point exceeds the declared precision minus the declared scale, an error is raised. Numeric values are physically stored without any extra zeroes. Thus, the declared precision and scale of a column are maximums, not fixed allocations. (In this sense the numeric type is more akin to varchar(n) than to char(n).) The actual storage requirement is two bytes for each group of four decimal digits, plus three to eight bytes overhead.
The types decimal and numeric are equivalent; both are part of the SQL standard.
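The rounding and overflow behavior described above can be sketched as follows (table and values are illustrative):
CREATE TABLE prices (amount NUMERIC(5, 2));
INSERT INTO prices VALUES (123.456);  -- stored as 123.46 (rounded to scale 2)
INSERT INTO prices VALUES (1234.5);   -- error: more than 3 digits before the decimal point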
Floating-Point Types
The real and double precision data types are inexact, variable-precision numeric types. Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so storing and retrieving a value may show slight discrepancies.
If you need accurate storage and computation (e.g. amount), use the numeric type instead.
If you need to perform complex calculations using this type for some unavoidable reason (e.g. when you need a specific behavior in boundary cases (infinity, underflow)), you should be careful with the implementation.
Comparing two floating-point values to see if they are equal may not always work as expected.
On most platforms, the real type has a range of at least 1E-37 to 1E+37 with a precision of at least 6 decimal digits. The double precision type generally has a range of around 1E-307 to 1E+308 with a precision of at least 15 digits. Values that are too large or too small cause an error. Rounding may occur if the precision of an input number is too high. Numbers too close to zero that cannot be represented as distinct from zero cause an underflow error.
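A minimal illustration of this inexactness, using the classic 0.1 + 0.2 case (the result is typical of IEEE 754 arithmetic but may vary by platform):
SELECT 0.1::double precision + 0.2::double precision = 0.3::double precision;
-- returns false on most platforms: the sum is stored as approximately 0.30000000000000004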
Serial Types
The smallserial, serial, and bigserial data types are not true types; they are merely a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT attribute supported by some other databases).
The following two statements work in the same manner:
-- SERIAL
CREATE TABLE tablename (
colname SERIAL
);
-- SEQUENCE
CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
These statements create an integer column whose default values are assigned from a sequence generator. A NOT NULL constraint is applied to ensure that null values cannot be inserted. (In most cases you would also want to attach a UNIQUE or PRIMARY KEY constraint to prevent duplicate values from being inserted by accident; such a constraint is not created automatically.)
Finally, the sequence is marked as "owned by" the column, so that it is not dropped unless the column or table is dropped.
To insert the next value of the sequence into the serial column, specify that the serial column should be assigned its default value. This can be done either by excluding the column from the list of columns in the INSERT statement or by using the DEFAULT keyword.
The serial type is the same as serial4; both generate an integer column.
The bigserial and serial8 types work the same way, except that they create a bigint column. bigserial should be used if you anticipate using more than 2^31 identifiers over the lifetime of the table.
The smallserial and serial2 types operate the same except when creating a smallint column.
The sequence created for a serial column is automatically dropped when the owning column is dropped. You can drop the sequence without dropping the column, but this forces removal of the column's default expression.
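The following sketch shows both ways of letting the sequence supply the value (table and values are illustrative):
CREATE TABLE items (id serial PRIMARY KEY, name text);
INSERT INTO items (name) VALUES ('first');     -- id omitted from the column list
INSERT INTO items VALUES (DEFAULT, 'second');  -- DEFAULT keyword used explicitly
SELECT * FROM items;                           -- ids 1 and 2 assigned from the sequence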
Character Types
The following character types are available:
Name | Description
---|---
character varying(n), varchar(n) | Variable-length with limit
character(n), char(n) | Fixed-length, blank-padded
text | Variable, unlimited length
The two basic character types are character varying(n) and character(n), where n is a positive integer. Both can store strings of up to n characters (not bytes) in length.
Attempting to store a longer string raises an error, unless the excess characters are all spaces, in which case the string is truncated to n characters. If the string to be stored is shorter than n, values of type character are padded with spaces, whereas values of type character varying are stored without padding.
Trailing spaces in character values are treated as semantically insignificant and are ignored when comparing two values of type character. Trailing spaces are semantically significant in character varying and text values, and also when using pattern matching, e.g. LIKE and regular expressions.
char(n) and varchar(n) are aliases for character(n) and character varying(n). character without a length specifier is equivalent to character(1), and character varying without a length specifier accepts strings of any size. In addition, the text type stores strings of any length.
An example of using the character type is shown below:
create table test1 (a character (4));
CREATE TABLE
insert into test1 values ('ok');
INSERT 0 1
select a, char_length(a) from test1;
a | char_length
------+-------------
ok | 2
create table test2 (b varchar(5));
CREATE TABLE
insert into test2 values ('ok');
INSERT 0 1
insert into test2 values ('good ');
INSERT 0 1
insert into test2 values ('too long');
ERROR: value too long for type character varying(5)
insert into test2 values ('too long'::varchar(5));
INSERT 0 1
select b, char_length(b) from test2;
b | char_length
-------+-------------
ok | 2
good | 5
too l | 5
Date/Time Types
The following date/time types are available:
Name | Storage Size | Description | Low Value | High Value | Resolution
---|---|---|---|---|---
timestamp [ (p) ] [ without time zone ] | 8 bytes | Both date and time (no time zone) | 4713 BC | 294276 AD | 1 microsecond / 14 digits
timestamp [ (p) ] with time zone | 8 bytes | Both date and time, including time zone | 4713 BC | 294276 AD | 1 microsecond / 14 digits
date | 4 bytes | Date (no time of day) | 4713 BC | 5874897 AD | 1 day
time [ (p) ] [ without time zone ] | 8 bytes | Time of day (no date) | 00:00:00 | 24:00:00 | 1 microsecond / 14 digits
time [ (p) ] with time zone | 12 bytes | Time of day only, with time zone | 00:00:00+1459 | 24:00:00-1459 | 1 microsecond / 14 digits
interval [ fields ] [ (p) ] | 16 bytes | Time interval | -178000000 years | 178000000 years | 1 microsecond / 14 digits
Note: The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone; timestamptz is an abbreviation for timestamp with time zone.
time, timestamp, and interval accept an optional precision value p, which specifies the number of fractional digits retained in the seconds field; by default, there is no explicit bound on precision. The allowed range of p is 0 to 6 for the timestamp and interval types.
For the time type, the allowed range of p is 0 to 6 when eight-byte integer storage is used, and 0 to 10 when floating-point storage is used.
The interval type has an additional option to restrict the set of stored fields by writing one of the following phrases (an illustrative declaration follows the list):
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND
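For example, a column declared with a fields specification stores only the listed fields; the table below is an illustrative sketch:
CREATE TABLE maintenance (
    downtime interval DAY TO SECOND,  -- keeps days through seconds
    warranty interval YEAR TO MONTH   -- keeps years and months only
);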
Date/Time Input
The expected order of the day, month, and year fields in date/time input can be set as follows:
set datestyle to sql, mdy;
set datestyle to sql, dmy;
set datestyle to sql, ymd;
Set the DateStyle parameter to MDY to select the month-day-year interpretation, DMY to select the day-month-year interpretation, or YMD to select the year-month-day interpretation.
A date or time input must be enclosed in single quotation marks, like a text string.
Date
Some possible date inputs:
Example | Description
---|---
1999-01-08 | ISO 8601; January 8 in any mode (recommended format)
January 8, 1999 | Unambiguous in any datestyle input mode
1/8/1999 | January 8 in MDY mode; August 1 in DMY mode
1/18/1999 | January 18 in MDY mode; rejected in other modes
01/02/03 | January 2, 2003 in MDY mode; February 1, 2003 in DMY mode; February 3, 2001 in YMD mode
1999-Jan-08 | January 8 in any mode
Jan-08-1999 | January 8 in any mode
08-Jan-1999 | January 8 in any mode
99-Jan-08 | January 8 in YMD mode, else error
08-Jan-99 | January 8, except error in YMD mode
Jan-08-99 | January 8, except error in YMD mode
19990108 | ISO 8601; January 8, 1999 in any mode
990108 | ISO 8601; January 8, 1999 in any mode
1999.008 | Year and day of year
J2451187 | Julian date
January 8, 99 BC | Year 99 BC
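A short sketch of how the same literal is interpreted under different field orderings (values are illustrative):
SET datestyle TO sql, mdy;
SELECT '1/8/1999'::date;  -- January 8, 1999
SET datestyle TO sql, dmy;
SELECT '1/8/1999'::date;  -- August 1, 1999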
Time
The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time zone; time alone is equivalent to time without time zone. Valid input for these types consists of a time of day followed by an optional time zone (see the table below). If a time zone is specified in the input for time without time zone, it is silently ignored. You can also specify a date, but it is ignored, except when you use a time zone name that involves a daylight-saving rule, such as America/New_York. In that case, the date is required to determine whether standard or daylight-saving time applies, and the appropriate time zone offset is recorded in the time with time zone value.
Table [Time Input]
Example | Description
---|---
04:05:06.789 | ISO 8601
04:05:06 | ISO 8601
04:05 | ISO 8601
040506 | ISO 8601
04:05 AM | Same as 04:05; AM does not affect the value
04:05 PM | Same as 16:05; the input hour must be <= 12
04:05:06.789-8 | ISO 8601
04:05:06-08:00 | ISO 8601
04:05-08:00 | ISO 8601
040506-08 | ISO 8601
04:05:06 PST | Time zone specified by abbreviation
2003-04-12 04:05:06 America/New_York | Time zone specified by full name
Table [Time Zone Input]
Example | Description
---|---
PST | Abbreviation (for Pacific Standard Time)
America/New_York | Full time zone name
PST8PDT | POSIX-style time zone specification
-8:00 | ISO 8601 offset for PST
-800 | ISO 8601 offset for PST
-8 | ISO 8601 offset for PST
zulu | Military abbreviation for UTC
See the [Time Zones] section below for how to specify the time zone.
Time Stamps
Valid input for the timestamp types consists of the concatenation of a date and a time, followed by an optional time zone and an optional AD or BC. (AD/BC may instead appear before the time zone, but this is not the preferred order.)
1999-01-08 04:05:06
1999-01-08 04:05:06 -8:00
These are valid values in accordance with the ISO 8601 standard. The following command format is also supported:
January 8 04:05:06 1999 PST
The SQL standard distinguishes timestamp without time zone and timestamp with time zone literals by the presence of a “+” or “-” sign and the time zone offset after that time.
-- timestamp without time zone
TIMESTAMP '2004-10-19 10:23:54'
-- timestamp with time zone
TIMESTAMP '2004-10-19 10:23:54+02'
AgensGraph does not examine the content of a literal string before determining its type, so it treats both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type:
TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
Special values
Some special date/time input values are supported for convenience, as shown in the table below. The values infinity and -infinity are specially represented inside the system and are displayed unchanged; the others are simply notational shorthands that are converted to ordinary date/time values when read. (In particular, now and related strings are converted to specific time values as soon as they are read.) All of these values must be enclosed in single quotation marks when used as constants in SQL commands.
Input String | Valid Types | Description
---|---|---
epoch | date, timestamp | 1970-01-01 00:00:00+00 (Unix system time zero)
infinity | date, timestamp | Later than all other time stamps
-infinity | date, timestamp | Earlier than all other time stamps
now | date, time, timestamp | Start time of the current transaction
today | date, timestamp | Midnight today
tomorrow | date, timestamp | Midnight tomorrow
yesterday | date, timestamp | Midnight yesterday
allballs | time | 00:00:00.00 UTC
You can also obtain the current time value for the corresponding data type using the SQL-compatible functions CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP.
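A brief sketch of the special values and functions mentioned above (results other than epoch depend on when the query is run):
SELECT 'epoch'::timestamp;            -- 1970-01-01 00:00:00
SELECT 'infinity'::timestamp;         -- later than all other timestamps
SELECT CURRENT_DATE, LOCALTIMESTAMP;  -- current date, and current date and time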
Date/Time Output
The output format of the date/time type can be set to one of four styles: ISO 8601, SQL (Ingres), typical POSTGRES (Unix date format) or German. The default is ISO format.
The following table shows an example of each output style. The output of the date and time types is, of course, only the date part or the time part, in accordance with the given examples.
Style Specification | Description | Example
---|---|---
ISO | ISO 8601, SQL standard | 1997-12-17 07:37:16-08
SQL | Traditional style | 12/17/1997 07:37:16.00 PST
Postgres | Original style | Wed Dec 17 07:37:16 1997 PST
German | Regional style | 17.12.1997 07:37:16.00 PST
In the SQL and Postgres styles, the day appears before the month if DMY field ordering has been specified; otherwise the month appears before the day. The following table shows examples:
datestyle Setting | Input Ordering | Example Output
---|---|---
SQL, DMY | day/month/year | 17/12/1997 15:37:16.00 CET
SQL, MDY | month/day/year | 12/17/1997 07:37:16.00 PST
Postgres, DMY | day/month/year | Wed 17 Dec 07:37:16 1997 PST
The date/time style can be selected by the user using the SET datestyle command, the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client. The formatting function to_char allows date/time output to be formatted in a more flexible manner.
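For instance, a sketch of switching styles and using to_char for a fixed format (output shown is indicative):
SET datestyle TO German;
SELECT now();  -- e.g. 17.12.1997 07:37:16.00 PST
SELECT to_char(now(), 'YYYY-MM-DD HH24:MI:SS');  -- format independent of DateStyle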
Time Zones
Time zones and their notations are affected by political decisions as well as geography. Worldwide time zones became standardized during the twentieth century, but they keep changing due to daylight-saving rules. AgensGraph uses the IANA (Olson) time zone database. For times in the future, it assumes that the latest known rules for a given time zone will continue to be observed indefinitely. However, the SQL standard has an odd mix of date/time types and capabilities.
Two obvious problems are:
The date type cannot have an associated time zone, although the time type can. In the real world, a time zone offset has little meaning unless it is associated with both a date and a time, since the offset can vary across daylight-saving boundaries during the year.
The default time zone is specified as a constant numeric offset from UTC. Thus, it may be impossible to adapt to daylight-saving time when performing date/time arithmetic across DST boundaries.
To address these difficulties, we recommend using date/time types that contain both a date and a time when working with time zones (the time with time zone type is supported for compatibility, but we do not recommend using it). AgensGraph assumes your local time zone for any type containing only a date or only a time. Time-zone-aware dates and times are stored internally in UTC; they are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client. Time zones may be specified in three different formats:
Full time zone names (e.g. America/New_York). The recognized time zone names are listed in the pg_timezone_names view. (Many other software packages recognize the same names.)
Time zone abbreviations (e.g. PST). In contrast to full time zone names, which can imply a set of daylight-saving transition rules, an abbreviation simply defines a particular offset from UTC. The recognized abbreviations are listed in the pg_timezone_abbrevs view. You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
In addition to time zone names and abbreviations, POSIX-style time zone specifications of the form STDoffset or STDoffsetDST are accepted; STD is a zone abbreviation, offset is a numeric offset in hours west of UTC, and DST is an optional daylight-saving zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. In this syntax, a zone abbreviation can be any string of letters, or an arbitrary string surrounded by angle brackets (<>). When a daylight-saving zone abbreviation is present, it is assumed to be used according to the same daylight-saving transition rules as the posixrules entry of the IANA time zone database. Since posixrules is the same as US/Eastern in a standard AgensGraph installation, POSIX-style time zone specifications follow USA daylight-saving rules. If necessary, you can adjust this behavior by replacing the posixrules file.
In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC, whereas many of the full names imply a local daylight-saving rule and so have two possible UTC offsets. For example, 2014-06-04 12:00 America/New_York represents noon local time in New York, which for that date was Eastern Daylight Time (UTC-4). So 2014-06-04 12:00 EDT specifies the same time instant. But 2014-06-04 12:00 EST specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight saving was nominally in effect on that date.
To complicate matters, some jurisdictions have used the same time zone abbreviation to mean different UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and UTC+4 in others. Such abbreviations are interpreted according to whatever they meant (or had most recently meant) on the specified date, but, as with the EST example above, this is not necessarily the same as the local civil time on that date.
Note that the POSIX-style time zone feature does not check zone abbreviations for correctness, so it can silently accept bogus input. For example, SET TIMEZONE TO FOOBAR0 works, leaving the system effectively using a peculiar abbreviation for UTC. Another problem to keep in mind is that POSIX time zone names use positive offsets for locations west of Greenwich. Everywhere else, AgensGraph follows the ISO 8601 convention that positive time zone offsets are east of Greenwich.
Time zone names and abbreviations are not case-sensitive, and they are not built into the server; they are obtained from configuration files stored under .../share/timezone/ and .../share/timezonesets/ of the installation directory. The TimeZone configuration parameter can also be set in standard ways other than postgresql.conf (an illustrative session follows the list):
The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO, with syntax more compatible with the SQL specification.
The PGTZ environment variable is used by the libpq client to send SET TIME ZONE command to the server when connected.
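An illustrative sketch of the three specification forms (zone names are examples only):
SET TIME ZONE 'America/New_York';            -- full name, follows DST rules
SET TIME ZONE 'PST8PDT';                     -- POSIX-style specification
SELECT '2014-06-04 12:00 EDT'::timestamptz;  -- abbreviation in date/time input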
Interval Input
The interval value can be written using the following detailed syntax:
[@] quantity unit [quantity unit...] [direction]
Where quantity is a number and unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, an abbreviation thereof, or a plural form; direction can be ago or empty. The at sign (@) is optional noise. Quantities given in different units are implicitly added with appropriate sign accounting. This syntax is also used for interval output when IntervalStyle is set to postgres_verbose.
Days, hours, minutes and seconds can be specified without explicitly marking units. For example, “1 12:59:10” is read the same as “1 day 12 hours 59 min 10 sec.”
Combinations of years and months can be specified using dashes as well. For example, “200-10” is read the same as “200 years 10 months.” (This short format is actually the only one allowed by the SQL standards, and is used for output if IntervalStyle is set to sql_standard).
The format using specifiers:
P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]
The string must contain P, and may contain T to include the unit of time. Available unit abbreviations are listed in the table below. Units can be omitted and can be specified in any order, but units of less than one day must appear after T. In particular, the meaning of M varies depending on whether it is before or after T.
Table [ISO 8601 Interval Unit Abbreviations]
Abbreviation | Meaning
---|---
Y | Years
M | Months (in the date part)
W | Weeks
D | Days
H | Hours
M | Minutes (in the time part)
S | Seconds
Alternative format:
P [ years-months-days ] [ T hours:minutes:seconds ]
The string must start with P, and T separates the date part and time part of the interval. The values are given as numbers, as in an ISO 8601 date.
When you write an interval constant with a fields specification, or assign a string to an interval column defined with a fields specification, the interpretation of unmarked quantities depends on the fields. For example, INTERVAL '1' YEAR is read as 1 year, whereas INTERVAL '1' means 1 second. Also, input falling to the right of the least significant field allowed by the fields specification is silently discarded. For example, INTERVAL '1 day 2:03:04' HOUR TO MINUTE drops the seconds field (but not the day field).
According to the SQL standard, all fields of an interval value must have the same sign, so a leading negative sign applies to all fields; for example, the negative sign in the interval literal '-1 2:03:04' applies to both the days part and the hour/minute/second part. AgensGraph allows the fields to have different signs, and traditionally treats each field in the textual representation as independently signed, so that the hour/minute/second part is considered positive in this example. If IntervalStyle is set to sql_standard, a leading sign is taken to apply to all fields (but only if no additional signs appear). To avoid ambiguity, it is best to attach an explicit sign to each field if any field is negative.
Internally, interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight-saving transition is involved. The months and days fields are integers, while the seconds field can store fractions. Because intervals are usually created from constant strings or timestamp subtraction, this storage method works well in most cases. The functions justify_days and justify_hours are useful if you want to normalize days and hours that overflow their normal ranges.
In the verbose input format, and in some fields of the more compact formats, field values can have fractional parts, e.g. '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. Fractional months or days are converted into subfields using the conversion factors 1 month = 30 days and 1 day = 24 hours; for example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output. The table below shows some examples of valid interval input.
Table [Interval Input]
Example | Description
---|---
1-2 | SQL standard format: 1 year 2 months
3 4:05:06 | SQL standard format: 3 days 4 hours 5 minutes 6 seconds
1 year 2 months 3 days 4 hours 5 minutes 6 seconds | Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds
P1Y2M3DT4H5M6S | ISO 8601 "format with designators": same as above
P0001-02-03T04:05:06 | ISO 8601 "alternative format": same as above
Interval Output
The output format of the interval type can be set to one of four styles, sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET intervalstyle. The default is postgres. The table below shows an example of each output style.
The sql_standard style produces output that conforms to the SQL standard specification for interval literal strings when the interval value meets the standard's restrictions (year-month only or day-time only, without mixing positive and negative components). Otherwise, the output looks like a standard year-month literal string followed by a day-time literal string, with explicit signs added to avoid ambiguity about mixed-sign intervals.
Table [Example of interval output style]
Style Specification | Year-Month Interval | Day-Time Interval | Mixed Interval
---|---|---|---
sql_standard | 1-2 | 3 4:05:06 | -1-2 +3 -4:05:06
postgres | 1 year 2 mons | 3 days 04:05:06 | -1 year -2 mons +3 days -04:05:06
postgres_verbose | @ 1 year 2 mons | @ 3 days 4 hours 5 mins 6 secs | @ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago
iso_8601 | P1Y2M | P3DT4H5M6S | P-1Y-2M3DT-4H-5M-6S
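Switching output styles can be sketched as follows (output comments show the expected form):
SET intervalstyle TO iso_8601;
SELECT INTERVAL '1 year 2 months 3 days';  -- P1Y2M3D
SET intervalstyle TO postgres;
SELECT INTERVAL '1 year 2 months 3 days';  -- 1 year 2 mons 3 days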
Boolean Type
AgensGraph provides the standard SQL type boolean. In addition to the "true" and "false" states, boolean has a third state, "unknown," which is represented by the SQL null value.
Name | Storage Size | Description
---|---|---
boolean | 1 byte | True or false state
Valid literal values in the true state are:
TRUE
't'
'true'
'y'
'yes'
'on'
'1'
In the false state, you may use the following values:
FALSE
'f'
'false'
'n'
'no'
'off'
'0'
Leading or trailing whitespace is ignored, and case does not matter. Using the keywords TRUE and FALSE is preferred. As the following example shows, boolean values are output using the characters t and f.
An example of using the boolean type:
CREATE TABLE test1 (a boolean, b text);
INSERT INTO test1 VALUES (TRUE, 'sic est');
INSERT INTO test1 VALUES (FALSE, 'non est');
SELECT * FROM test1;
a | b
---+---------
t | sic est
f | non est
SELECT * FROM test1 WHERE a;
a | b
---+---------
t | sic est
Geometric Types
Geometric types represent two-dimensional spatial objects. The available geometric types are as follows:
Name | Storage Size | Description | Representation
---|---|---|---
point | 16 bytes | Point on a plane | (x,y)
line | 32 bytes | Infinite line | {A,B,C}
lseg | 32 bytes | Finite line segment | ((x1,y1),(x2,y2))
box | 32 bytes | Rectangular box | ((x1,y1),(x2,y2))
path | 16+16n bytes | Closed path (similar to a polygon) | ((x1,y1),...)
path | 16+16n bytes | Open path | [(x1,y1),...]
polygon | 40+16n bytes | Polygon (similar to a closed path) | ((x1,y1),...)
circle | 24 bytes | Circle | <(x,y),r> (center point and radius)
Points
Points are the basic two-dimensional building blocks for geometric types. The value of point type is specified using one of the following syntaxes:
( x , y )
x , y
Where x and y are the respective coordinates, as floating-point numbers. Points are output using the first syntax.
Lines
Lines are represented by the linear equation Ax + By + C = 0, where A and B are not both zero. Values of type line are input and output in the following form:
{ A, B, C }
Alternatively, you can use one of the following formats for input:
[ ( x1 , y1 ) , ( x2 , y2 ) ]
( ( x1 , y1 ) , ( x2 , y2 ) )
( x1 , y1 ) , ( x2 , y2 )
x1 , y1 , x2 , y2
Where (x1, y1) and (x2, y2) are two different points on a straight line.
Line Segments
A line segment is represented by a pair of points that form the two end points of a line segment. The value of lseg type is specified using one of the following syntaxes.
[ ( x1 , y1 ) , ( x2 , y2 ) ]
( ( x1 , y1 ) , ( x2 , y2 ) )
( x1 , y1 ) , ( x2 , y2 )
x1 , y1 , x2 , y2
Where (x1, y1) and (x2, y2) are the two end points of the line segment. Line segments are output using the first syntax.
Boxes
A box is represented by a pair of points that form the opposite corners of the box. The value of box type is specified using one of the following syntaxes:
( ( x1 , y1 ) , ( x2 , y2 ) )
( x1 , y1 ) , ( x2 , y2 )
x1 , y1 , x2 , y2
Where (x1, y1) and (x2, y2) are the two opposite corners of the box. A box is output using the second syntax. Two opposite corners are provided as inputs, but values are reordered as needed to be saved as upper right corner and lower left corner.
Paths
A path is represented by a list of connected points. If the first and last points of the list are considered unconnected, it is an open path. If the first and last points are considered to be connected, it is a closed path. The value of path type is specified using one of the following syntaxes:
[ ( x1 , y1 ) , ... , ( xn , yn ) ]
( ( x1 , y1 ) , ... , ( xn , yn ) )
( x1 , y1 ) , ... , ( xn , yn )
( x1 , y1 , ... , xn , yn )
x1 , y1 , ... , xn , yn
Where the points are the end points of the line segments constituting the path. Square brackets ([]) indicate an open path, and parentheses (()) indicate a closed path. If the outermost parentheses are omitted, as in the third through fifth syntaxes, a closed path is assumed. Paths are output using the first or second syntax, as appropriate.
Polygons
Polygons are represented by a list of points (polygon vertices). Polygons are very similar to closed paths, but are stored differently and have their own set of routines supported. The value of polygon type is specified using one of the following syntaxes:
( ( x1 , y1 ) , ... , ( xn , yn ) )
( x1 , y1 ) , ... , ( xn , yn )
( x1 , y1 , ... , xn , yn )
x1 , y1 , ... , xn , yn
Where the points are the end points of the line segments constituting the polygon's boundary. Polygons are output using the first syntax.
Circles
A circle is represented by the center point and radius. The value of circle type is specified using one of the following syntaxes:
< ( x , y ) , r >
( ( x , y ) , r )
( x , y ) , r
x , y , r
Where (x, y) is the center point and r is the radius of the circle. Circles are output using the first syntax.
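A minimal sketch using a few of these input syntaxes (table and values are illustrative):
CREATE TABLE shapes (pt point, seg lseg, circ circle);
INSERT INTO shapes VALUES ('(1,2)', '[(0,0),(1,1)]', '<(0,0),5>');
SELECT * FROM shapes;  -- each value is printed in its type's output syntax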
XML Type
The xml type is used to store XML data. Its advantage over storing XML data in a text field is that it checks input values for well-formedness, and there are support functions to perform type-safe operations on it. This data type requires the installation to have been built with configure --with-libxml.
Creating XML Values
To generate an xml type value from character data, use the function below:
XMLPARSE ( { DOCUMENT | CONTENT } value)
Examples:
XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
The XML type does not check the input value for the document type definition (DTD) even if the input value is specified as DTD. It does not support validation of other XML schema languages (e.g. XML schema).
The inverse operation, producing a character string value from xml, uses the following function:
XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but you can also simply cast the value.
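For example, serializing a content fragment back to text might look like this sketch:
-- returns the text value: abc<foo>bar</foo>
SELECT XMLSERIALIZE(CONTENT 'abc<foo>bar</foo>'::xml AS text);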
When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, the choice between DOCUMENT and CONTENT is determined by the "XML option" session configuration parameter, which can be set using the standard command:
SET xmloption TO { DOCUMENT | CONTENT };
As the default is CONTENT, XML data in any format are allowed.
Note: With the default XML option setting, you cannot directly cast character strings containing a DTD to the xml type, since the definition of an XML content fragment does not accept them. You should use XMLPARSE or change the XML option.
Encoding Handling
Care must be taken when dealing with multiple character encodings on the client and server, and when XML data is passed between them. When using text mode to pass queries to the server and query results to the client (which is the normal mode), all character data passed between the client and the server, and vice versa, is converted to the character encoding of the receiving end. This includes string representations of XML values, such as in the examples above. Ordinarily this would mean that encoding declarations contained in XML data could become invalid, because the embedded encoding declaration is not changed when the character data is converted to another encoding in transit.
To cope with this behavior, encoding declarations contained in character strings presented for input to the xml type are ignored, and the content is assumed to be in the current server encoding. Consequently, for correct processing, character strings of XML data must be sent from the client in the current client encoding. It is the client's responsibility to either convert documents to the current client encoding before sending them to the server, or to adjust the client encoding appropriately. On output, values of type xml will not have an encoding declaration, and clients should assume all data is in the current client encoding.
When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data is observed; if it is absent, the data is assumed to be in UTF-8 (note that this is required by the XML standard; UTF-16 is not supported). On output, the data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it is omitted.
Processing XML data is less error-prone and more efficient when the XML data encoding, the client encoding, and the server encoding are all the same. Since XML data is internally processed in UTF-8, computations are most efficient if the server encoding is also UTF-8.
Note: Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xpath() in particular.
Accessing XML Values
The xml data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence is that you cannot retrieve rows by comparing an xml column against a search value; XML values should therefore usually be accompanied by a separate key field, such as an ID. An alternative for comparing XML values is to convert them to character strings first, but note that string comparison has little to do with any useful XML comparison method.
Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If fast retrieval of XML data is needed, possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath expression. Of course, the actual query has to be adjusted to search by the indexed expression.
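For example, a hypothetical table docs with an xml column body could be indexed on a cast XPath expression; the names and the XPath here are illustrative assumptions, not part of any fixed schema:
CREATE INDEX docs_title_idx
    ON docs (((xpath('/book/title/text()', body))[1]::text));
-- queries must then use the same expression, e.g.:
-- SELECT * FROM docs WHERE (xpath('/book/title/text()', body))[1]::text = 'Manual';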
The text search feature can also be used to speed up retrieval of entire documents in XML data. However, the necessary preprocessing support is not yet available.
JSON Types
The JSON data types are for storing JavaScript Object Notation (JSON) data, as specified in RFC 7159. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. A number of JSON-specific functions and operators are also available for data stored in these types.
There are two JSON data types: JSON and JSONB. These two types accept almost the same set of values as input. A substantial difference between the two is efficiency. The JSON data type stores an exact copy of the input text, and its processing function must be re-parsed for each execution. On the other hand, the JSONB data is stored in decomposed binary format, which is a bit slower on input due to the added conversion overhead, but much faster during processing because no re-parsing is required. JSONB also supports indexing, which can be a significant advantage.
The JSON type stores an exact copy of the input text, preserving semantically-insignificant whitespace between tokens as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value to be the operative one.) Conversely, JSONB does not preserve whitespace, the order of object keys, or duplicate object keys; if duplicate keys are specified in the input, only the last value is kept. In general, most applications should store JSON data as JSONB, unless there is a specific reason not to, such as existing assumptions about the ordering of object keys.
Only one character set encoding is allowed per database. It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, characters that can be represented in the database encoding but not in UTF8 are allowed.
RFC 7159 permits JSON strings to contain Unicode escape sequences denoted by \uXXXX. In the input function for the JSON type, Unicode escapes are allowed regardless of the database encoding, and are checked only for syntactic correctness (i.e. \u followed by four hexadecimal digits). However, the input function for JSONB is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8. The JSONB type also rejects \u0000 (because that cannot be represented in the text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane (BMP) be correct. Valid Unicode escapes are converted to the equivalent ASCII or UTF8 character for storage; this includes folding surrogate pairs into a single character.
Note: Many JSON processing functions convert Unicode escapes to regular characters, and thus return an error even if the input is of type json rather than jsonb. In general, it is best not to mix Unicode escapes with UTF8 database encodings in JSON whenever possible.
When converting textual JSON input into JSONB, the primitive types described by RFC 7159 are effectively mapped onto native AgensGraph types, as shown in the table below. Therefore, there are some minor additional constraints on what constitutes valid JSONB data that do not apply to the JSON type, nor to JSON in the abstract; these correspond to limits on what can be represented by the underlying data types. Notably, JSONB rejects numbers that are outside the range of the AgensGraph numeric data type, while JSON does not. Such implementation-defined restrictions are permitted by RFC 7159. In practice, however, such problems are far more likely to occur in other implementations, as it is common to represent JSON's number primitive type as IEEE 754 double-precision floating point (which RFC 7159 explicitly anticipates and allows). When using JSON as an interchange format with such systems, the danger of losing numeric precision compared to data originally stored in AgensGraph should be considered. Conversely, as noted in the table, there are some minor restrictions on the input format of JSON primitive types that do not apply to the corresponding AgensGraph types.
Table [JSON primitive types and corresponding AgensGraph types]
JSON primitive type | AgensGraph type | Notes
---|---|---
string | text | \u0000 is disallowed, as are non-ASCII Unicode escapes if the database encoding is not UTF8
number | numeric | NaN and infinity values are disallowed
boolean | boolean | Only lowercase true and false spellings are accepted
null | (none) | SQL NULL is a different concept
JSON Input and Output Syntax
The input/output syntaxes in the JSON data type can be specified as in RFC 7159. The following example shows all valid JSON (or JSONB) expressions.
-- Simple scalar/primitive value
-- Primitive values can be numbers, quoted strings, true, false, or null
SELECT '5'::json;
-- Array of zero or more elements (elements need not be of same type)
SELECT '[1, 2, "foo", null]'::json;
-- Object containing pairs of keys and values
-- Note that object keys must always be quoted strings
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;
-- Arrays and objects can be nested arbitrarily
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;
As mentioned earlier, when a JSON value is input and then printed without any additional processing, JSON outputs the same text that was input, while JSONB does not preserve semantically-insignificant details such as whitespace. Note the differences shown below.
SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;
Result :
json
-------------------------------------------------
{"bar": "baz", "balance": 7.77, "active":false}
(1 row)
SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb;
Result :
jsonb
--------------------------------------------------
{"bar": "baz", "active": false, "balance": 7.77}
(1 row)
One semantically-insignificant detail worth noting is that in JSONB, numbers are printed according to the behavior of the underlying numeric type. In practice, this means that numbers entered with E notation are printed without it. Here is an example:
SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
Result :
json | jsonb
-----------------------+-------------------------
{"reading": 1.230e-5} | {"reading": 0.00001230}
(1 row)
However, as shown in this example, JSONB preserves trailing fractional zeroes, even though they are semantically insignificant for purposes such as equality checks.
Designing JSON documents effectively
Representing data as JSON can be considerably more flexible than the traditional relational data model, which is compelling in environments where requirements are fluid. Both approaches can coexist and complement each other within the same application. However, even for applications that need maximum flexibility, it is recommended that JSON documents follow a somewhat fixed structure. The structure is typically unenforced (though it is possible to enforce some business rules declaratively), but having a predictable structure makes it easier to write queries that usefully summarize a set of "documents" (datums) in a table.
JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Note that, although storing large documents is feasible, an update acquires a row-level lock on the whole row. You should consider limiting JSON documents to a manageable size in order to reduce lock contention among updating transactions. Ideally, each JSON document should represent an atomic datum that business rules dictate cannot reasonably be subdivided into smaller datums that could be modified independently.
JSONB Containment and Existence
Testing containment is an important capability of JSONB (there is no parallel set of facilities for the JSON type). Containment tests whether one JSONB document is contained within another. These examples return true except where noted:
-- Simple scalar/primitive values contain only the identical value:
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;
-- The array on the right side is contained within the one on the left:
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;
-- Order of array elements is not significant, so this is also true:
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;
-- Duplicate array elements don't matter either:
SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb;
-- The object with a single pair on the right side is contained
-- within the object on the left side:
SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb
@> '{"version": 9.4}'::jsonb;
-- The array on the right side is not considered contained within the
-- array on the left, even though a similar array is nested within it:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb; -- yields false
-- But with a layer of nesting, it is contained:
SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb;
-- Similarly, containment is not reported here:
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb; -- yields false
-- A top-level key and an empty object is contained:
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"foo": {}}'::jsonb;
The general principle is that the contained object must match the containing object in structure and data contents, possibly after discarding some non-matching array elements or object key/value pairs from the containing object. Remember, however, that the order of array elements is not significant for containment comparisons, and duplicate array elements are effectively considered only once.
As a special exception to the general principle that structures must be matched, arrays may contain primitive values.
-- This array contains the primitive string value:
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;
-- This exception is not reciprocal -- non-containment is reported here:
SELECT '"bar"'::jsonb @> '["bar"]'::jsonb; -- yields false
JSONB also has an existence operator, which is a variation on the theme of containment: it tests whether a string (given as a text value) appears as an object key or array element at the top level of the JSONB value. These examples return true except where noted:
-- String exists as array element:
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';
-- String exists as object key:
SELECT '{"foo": "bar"}'::jsonb ? 'foo';
-- Object values are not considered:
SELECT '{"foo": "bar"}'::jsonb ? 'bar'; -- yields false
-- As with containment, existence must match at the top level:
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar'; -- yields false
-- A string is considered to exist if it matches a primitive JSON string:
SELECT '"foo"'::jsonb ? 'foo';
Unlike arrays, JSON objects are internally optimized for searching and do not need to be searched linearly. This makes them better suited than arrays for testing containment or existence when many keys or elements are involved.
JSONB Indexing
The GIN index can be used to efficiently search for a key or a key/value pair in a number of JSONB documents (datums). Two GIN “operator classes” with different performance and flexibility tradeoffs are provided. The default GIN operator classes for JSONB support queries using operators such as @>, ?, ?&, and ?|. Here is an example of creating an index within this operator class:
CREATE INDEX idxgin ON api USING GIN (jdoc);
jsonb_path_ops, which is not a default GIN operator class, supports indexing of @> operator only. Here is an example of creating an index within this operator class:
CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);
Consider a table example that stores a JSON document retrieved from a third-party Web service using a documented schema definition. The general document will be as follows:
{
"guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
"name": "Angela Barton",
"is_active": true,
"company": "Magnafone",
"address": "178 Howard Place, Gulf, Washington, 702",
"registered": "2009-11-07T08:53:22 +08:00",
"latitude": 19.793713,
"longitude": 86.513373,
"tags": [
"enim",
"aliquip",
"qui"
]
}
Save this document as a JSONB column called jdoc in a table called api. When a GIN index is created in this column, the following query uses the index.
-- Find documents in which the key "company" has value "Magnafone"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';
However, even though the operator “?” can be indexed, it is not directly applied to the indexed column jdoc. Thus, the index cannot be used in the following query.
-- Find documents in which the key "tags" contains key or array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';
Still, the above query can use the index, if an expression index is properly used. If a query for a particular item is common within the “tags” key, an index definition like the one below is useful.
CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));
The WHERE clause jdoc -> 'tags' ? 'qui' is recognized as an application of the indexable operator ? to the indexed expression jdoc -> 'tags'.
Another way to query is to make the best use of containment. For example, a simple GIN index on a jdoc column can support this query.
-- Find documents in which the key "tags" contains array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';
However, while that index stores copies of every key and value in the jdoc column, the expression index of the previous example stores only the data found under the tags key. While the simple-index approach is far more flexible (since it supports queries about any key), targeted expression indexes are likely to be much smaller and faster to search than a simple index.
The jsonb_path_ops operator class supports only queries with the @> operator, but it has notable performance advantages over the default operator class, jsonb_ops. For the same data, a jsonb_path_ops index is usually much smaller than a jsonb_ops index, and its search specificity is better, particularly when queries contain keys that appear frequently in the data. Searches with this operator class are therefore typically much better than with the default operator class.
The technical difference between jsonb_ops and jsonb_path_ops GIN indexes is that the former creates independent index entries for each key and value in the data, while the latter creates index entries only for each value in the data. Basically, each jsonb_path_ops index entry is a hash of the value and the key(s) leading to it; for example, to index {"foo": {"bar": "baz"}}, a single index entry is created incorporating all three of foo, bar, and baz into the hash value. Thus a containment query looking for this structure results in an extremely specific index search, but there is no way at all to find out whether foo appears as a key. On the other hand, a jsonb_ops index creates three index entries representing foo, bar, and baz separately; to execute a containment query, it then looks for rows containing all three of these items. While a GIN index can perform such an AND search fairly efficiently, it is still less specific and slower than the equivalent jsonb_path_ops search, especially if there are a very large number of rows containing any single one of the three index items.
A disadvantage of the jsonb_path_ops approach is that it produces no index entries for JSON structures that contain no values, such as {"a": {}}. If a search for documents containing such a structure is requested, a full-index scan is required, which is quite slow; jsonb_path_ops is therefore ill-suited for applications that often perform such searches.
JSONB also supports btree and hash indexes. These are usually useful only when it is important to check the equality of complete JSON documents. The B-tree ordering of JSONB data is seldom of great interest, but for completeness it is:
Object > Array > Boolean > Number > String > Null
Object with n pairs > object with n - 1 pairs
Array with n elements > array with n - 1 elements
Objects that have the same number of pairs are compared in the following order:
key-1, value-1, key-2 ...
Object keys are compared in the order of storage; in particular, storing short keys in front of long keys leads to non-intuitive results:
{ "aa": 1, "c": 1} > {"b": 1, "d": 1}
Similarly, arrays with the same number of elements are compared in the following order:
element-1, element-2 ...
Primitive JSON values are compared using the same comparison rules as for the underlying AgensGraph data types. Strings are compared using the default database collation.
Arrays
Arrays allow a table column to be defined as a variable-length multidimensional array. Arrays of any built-in or user-defined base type, enum type, or composite type can be created. Arrays of domains are not yet supported.
Declaration of Array Types
In order to explain use of array types, let’s create a table as follows:
CREATE TABLE sal_emp (
name text,
pay_by_quarter integer[],
schedule text[][]
);
As indicated, an array data type is named by adding square brackets ([]) to the data type name of the array elements. The above command creates a table named sal_emp with a column of type text (name), a one-dimensional array of type integer (pay_by_quarter) representing the employee's quarterly salary, and a two-dimensional array of text (schedule) representing the employee's weekly schedule. CREATE TABLE also allows the exact size of arrays to be specified. For example:
CREATE TABLE tictactoe (
squares integer[3][3]
);
However, the current implementation ignores the array size limits provided. That is, the operation is the same as an array of unspecified length. The current implementation does not enforce the declared number of dimensions. Arrays of particular element types are considered to be of the same type, regardless of their size or the number of dimensions. Therefore, declaring an array size or the number of dimensions in CREATE TABLE is simply documentation and does not affect runtime behavior.
You can utilize the keyword ARRAY to use an alternative syntax that conforms to the SQL standard for one-dimensional arrays. pay_by_quarter may have been defined as:
pay_by_quarter integer ARRAY[4],
If the array size is not specified:
pay_by_quarter integer ARRAY,
In any case, however, size constraints are not enforced.
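For example, a complete declaration using this standard syntax might look as follows (a hypothetical table, shown only to illustrate the ARRAY keyword):
CREATE TABLE sal_emp_std (
    name           text,
    pay_by_quarter integer ARRAY[4]
);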
Array Value Input
To write an array value as a literal constant, enclose the element values in braces and separate them with commas. You can put double quotes around any element value, and you must do so if it contains commas or braces. The general form of an array constant is:
'{ val1 delim val2 delim ... }'
In the form above, delim is the delimiter character for the type, as recorded in its pg_type entry. All of the standard data types provided by the AgensGraph distribution use a comma, except for the type box, which uses a semicolon (;). Each val is either a constant of the array element type or a subarray. For example:
'{{1,2,3},{4,5,6},{7,8,9}}'
This constant is a two-dimensional, 3-by-3 array consisting of three integer subarrays. To set an element of an array constant to NULL, write NULL for the element value (any upper- or lower-case variant of NULL will do). If you want the actual string value "NULL", put double quotes around it. (Constants are initially processed as strings and passed to the array input conversion routine, so an explicit type specification may be necessary.)
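For instance, these minimal examples contrast a null element with the literal string "NULL":
SELECT '{1,NULL,3}'::int[];   -- the middle element is a null value
SELECT '{"NULL",a}'::text[];  -- the first element is the string "NULL"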
The following is an example of INSERT statement and its execution result:
INSERT INTO sal_emp
VALUES ('Bill',
'{10000, 10000, 10000, 10000}',
'{{"meeting", "lunch"}, {"training", "presentation"}}');
INSERT INTO sal_emp
VALUES ('Carol',
'{20000, 25000, 25000, 25000}',
'{{"breakfast", "consulting"}, {"meeting", "lunch"}}');
SELECT * FROM sal_emp;
Result :
name | pay_by_quarter | schedule
-------+---------------------------+-------------------------------------------
Bill | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
(2 rows)
A multidimensional array must have a matching range for each dimension, and an inconsistency will cause the following error:
INSERT INTO sal_emp
VALUES ('Bill',
'{10000, 10000, 10000, 10000}',
'{{"meeting", "lunch"}, {"meeting"}}');
ERROR: multidimensional arrays must have array expressions with matching dimensions
The constructor syntax of ARRAY can also be used.
INSERT INTO sal_emp
VALUES ('Bill',
ARRAY[10000, 10000, 10000, 10000],
ARRAY[['meeting', 'lunch'], ['training', 'presentation']]);
INSERT INTO sal_emp
VALUES ('Carol',
ARRAY[20000, 25000, 25000, 25000],
ARRAY[['breakfast', 'consulting'], ['meeting', 'lunch']]);
Each array element value is an ordinary SQL constant or expression; note, for instance, that string literals are written in single quotes, rather than the double quotes used inside an array literal.
Accessing Arrays
Now we can run some queries on the table created above. First, here is how to access a single element of an array. The following query retrieves the names of employees whose salary changed in the second quarter:
SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];
Result :
name
-------
Carol
(1 row)
Array subscripts are written in square brackets. Arrays use one-based numbering by default; that is, an array of n elements starts with array[1] and ends with array[n].
This query retrieves the salary of all employees for the third quarter.
SELECT pay_by_quarter[3] FROM sal_emp;
Result :
pay_by_quarter
----------------
10000
25000
(2 rows)
You can also access any rectangular slice in an array or subarray. An array slice is represented using lower-bound:upper-bound for one or more array dimensions.
For example, the following query retrieves the first item in Bill’s schedule on the first two days of the week.
SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
Result :
schedule
------------------------
{{meeting},{training}}
(1 row)
If any dimension is written as a slice (i.e., contains a colon), then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2] is treated as [1:2], as in the following example:
SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill';
Result :
schedule
-------------------------------------------
{{meeting,lunch},{training,presentation}}
(1 row)
To avoid confusion with the non-slice case, it is best to use the slice syntax for all dimensions, e.g., [1:2][1:1] rather than [2][1:1].
You can omit the lower-bound and/or upper-bound of a slice specifier; the missing bound is replaced by the array's own lower or upper bound, as the following examples show:
SELECT schedule[:2][2:] FROM sal_emp WHERE name = 'Bill';
Result :
schedule
------------------------
{{lunch},{presentation}}
(1 row)
SELECT schedule[:][1:1] FROM sal_emp WHERE name = 'Bill';
Result :
schedule
------------------------
{{meeting},{training}}
(1 row)
An array subscript expression returns null if the array itself or any of the subscript expressions is null. Null is also returned if a subscript is outside the array's bounds (this case does not raise an error). For example, if schedule currently has dimensions [1:3][1:2], then referencing schedule[3][3] yields NULL. Similarly, referencing an array with the wrong number of subscripts yields null rather than an error.
An array slice expression likewise returns null if the array itself or any of the subscript expressions is null. However, in other cases, such as selecting a slice that lies completely outside the current array bounds, the slice expression yields an empty (zero-dimensional) array instead of null. If the requested slice partially overlaps the array bounds, it is silently reduced to just the overlapping region instead of returning null.
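Continuing the sal_emp example (Bill's schedule has dimensions [1:2][1:2]), these behaviors can be observed directly; a minimal sketch:
-- a subscript beyond the array bounds yields null, not an error
SELECT schedule[3][3] FROM sal_emp WHERE name = 'Bill';
-- a slice completely outside the bounds yields an empty array
SELECT schedule[5:6][1:1] FROM sal_emp WHERE name = 'Bill';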
The current dimensions of array values can be retrieved using array_dims function.
SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';
Result :
array_dims
------------
[1:2][1:2]
(1 row)
array_dims produces a text result, which is convenient for humans to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the upper and lower bounds of a specified array dimension, respectively:
SELECT array_upper(schedule, 1) FROM sal_emp WHERE name = 'Carol';
array_upper
-------------
2
(1 row)
array_length returns the length of the specified array dimension.
SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';
Result :
array_length
--------------
2
(1 row)
cardinality returns the total number of elements in an array across all dimensions; it is effectively the number of rows that a call to unnest would produce:
SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';
Result :
cardinality
-------------
4
(1 row)
Modifying Arrays
You may update the array value as a whole;
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
WHERE name = 'Carol';
or use ARRAY expression syntax.
UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
WHERE name = 'Carol';
An array can update a single element;
UPDATE sal_emp SET pay_by_quarter[4] = 15000
WHERE name = 'Bill';
or update it only partially.
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
WHERE name = 'Carol';
The slice syntax with omitted lower-bound and/or upper-bound can be used only when updating an array value that is not NULL or zero-dimensional (otherwise, there is no existing subscript limit to substitute).
A stored array value can be enlarged by assigning to elements that do not already exist. Any positions between those previously present and the newly assigned elements are filled with nulls. For example, if the array myarray currently has 4 elements, it will have 6 elements after an update that assigns to myarray[6]; myarray[5] will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays. Subscripted assignment also allows creation of arrays whose subscripts do not start at one (1). For example, you might assign to myarray[-2:7] to create an array with subscript values from -2 to 7.
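For example, assigning past the current end of pay_by_quarter (a sketch continuing the sal_emp example) enlarges the array and fills the gap with a null:
UPDATE sal_emp SET pay_by_quarter[6] = 13000 WHERE name = 'Bill';
-- pay_by_quarter now has six elements; pay_by_quarter[5] is null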
The new array value can also be created using the concatenation operator ||.
SELECT ARRAY[1,2] || ARRAY[3,4] as array;
Result :
array
-----------
{1,2,3,4}
(1 row)
SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]] as array;
Result :
array
---------------------
{{5,6},{1,2},{3,4}}
(1 row)
The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.
When a single element is pushed onto either the beginning or end of a one-dimensional array, as in the following examples, the result is an array with the same lower bound subscript as the array operand.
SELECT array_dims(1 || '[0:1]={2,3}'::int[]);
Result :
array_dims
------------
[0:2]
(1 row)
SELECT array_dims(ARRAY[1,2] || 3);
Result :
array_dims
------------
[1:3]
(1 row)
When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound subscript of the left operand's outer dimension. The result is an array comprising every element of the left operand followed by every element of the right operand, as the following examples show:
SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);
Result :
array_dims
------------
[1:5]
(1 row)
SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);
Result :
array_dims
------------
[1:5][1:2]
(1 row)
When an N-dimensional array is pushed onto the beginning or end of an N+1-dimensional array, the result is analogous to the element-array case above. Each N-dimensional subarray is essentially an element of the N+1-dimensional array's outer dimension, as shown in the following example:
SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);
Result :
array_dims
------------
[1:3][1:2]
(1 row)
You can also construct an array using the functions array_prepend, array_append, or array_cat. The first two support only one-dimensional arrays, while array_cat supports multidimensional arrays. Note that the concatenation operator discussed above is generally preferred over direct use of these functions; in fact, they exist primarily for use in implementing the concatenation operator. However, they can also be useful in creating user-defined aggregates. For example:
SELECT array_prepend(1, ARRAY[2,3]);
Result :
array_prepend
---------------
{1,2,3}
(1 row)
SELECT array_append(ARRAY[1,2], 3);
Result :
array_append
--------------
{1,2,3}
(1 row)
SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
Result :
array_cat
-----------
{1,2,3,4}
(1 row)
SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
Result :
array_cat
---------------------
{{1,2},{3,4},{5,6}}
(1 row)
SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
Result :
array_cat
---------------------
{{5,6},{1,2},{3,4}}
In simple cases, the concatenation operator discussed above is preferred over direct use of these functions. However, because the concatenation operator is overloaded to serve all three cases, using one of the functions can help avoid ambiguity. For example, consider:
SELECT ARRAY[1, 2] || '{3, 4}' as array; -- the untyped literal is taken as an array
Result :
array
-----------
{1,2,3,4}
SELECT ARRAY[1, 2] || '7'; -- so is this one
ERROR: malformed array literal: "7"
SELECT ARRAY[1, 2] || NULL as array; -- so is an undecorated NULL
Result :
array
-------
{1,2}
(1 row)
SELECT array_append(ARRAY[1, 2], NULL); -- this might have been meant
Result :
array_append
--------------
{1,2,NULL}
In the examples above, the parser sees an integer array on one side of the concatenation operator and a constant of undetermined type on the other. The heuristic it uses to resolve the constant's type is to assume it is the same type as the operator's other input, in this case an integer array. So the concatenation operator is presumed to represent array_cat, not array_append. When that is the wrong choice, you can fix it by casting the constant to the array's element type, but explicit use of array_append may be the preferable solution.
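For example, casting the constant to the element type resolves the ambiguity in favor of element concatenation (a minimal sketch):
SELECT ARRAY[1, 2] || '7'::int AS array;  -- now resolved as array_append; yields {1,2,7}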
Searching in Arrays
To search for a value in an array, each element must be checked. You can do this by hand if you know the size of the array. For example:
SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
pay_by_quarter[2] = 10000 OR
pay_by_quarter[3] = 10000 OR
pay_by_quarter[4] = 10000;
However, this quickly becomes tedious for large arrays, and it does not help when the array size is unknown. The above query can be replaced by:
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
You can also find all rows in which every element of the array equals 10000:
SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
Alternatively, you can use the generate_subscripts function.
SELECT * FROM
(SELECT pay_by_quarter,
generate_subscripts(pay_by_quarter, 1) AS s
FROM sal_emp) AS foo
WHERE pay_by_quarter[s] = 10000;
It is also possible to search an array using the && operator, which checks whether the left operand overlaps the right operand.
SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];
You can also use the array_position and array_positions functions to search for specific values in an array. The former returns the subscript of the first occurrence of a value in the array; the latter returns an array with the subscripts of all occurrences of the value.
SELECT array_position(ARRAY['sun','mon','tue','wed','thu','fri','sat'], 'mon');
Result :
array_position
-----------------
2
SELECT array_positions(ARRAY[1, 4, 3, 1, 3, 4, 2, 1], 1);
Result :
array_positions
-----------------
{1,4,8}
Array Input and Output Syntax
The external text representation of an array value consists of items that are interpreted according to the I/O conversion rules for the array's element type, plus decoration indicating the array structure. The decoration consists of braces at both ends of the array value, with delimiters between adjacent items. The delimiter is usually a comma (,) but can be something else; it is determined by the typdelim setting for the array's element type. All of the standard data types use a comma, except for the type box, which uses a semicolon (;). In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of braces, and delimiters must be written between adjacent brace-enclosed entities at the same level.
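The delimiter for a given element type can be checked in the system catalog, for example:
SELECT typname, typdelim FROM pg_type WHERE typname IN ('int4', 'box');
-- int4 uses ',' while box uses ';'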
The array output routine puts double quotes around element values if they are empty strings, contain braces, delimiters, double quotes, backslashes, or whitespace, or match the word NULL. Double quotes and backslashes embedded in element values are escaped with a backslash. For numeric data types it is safe to assume that double quotes never appear, but for textual data types you must be prepared for either the presence or absence of quotes. By default, the lower bound index value of an array's dimensions is set to 1. To represent an array with other lower bounds, the array subscript ranges can be specified explicitly before writing the array contents. This decoration consists of square brackets ([]) around each array dimension's lower and upper bounds, with a colon (:) between them. The array dimension decoration is followed by an equal sign (=). For example:
SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;
Result :
e1 | e2
----+----
1 | 6
(1 row)
The array output routine includes explicit dimensions in its result only when one or more lower bounds differ from one.
If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value "NULL" to be entered. You can also set the array_nulls configuration parameter to off to prevent NULL from being recognized as NULL. As shown previously, when writing an array value you may use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash into a quoted array element value, precede it with a backslash. Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.
You may add whitespace before a left brace or after a right brace. You may also add whitespace before or after any individual item string. In all of these cases the whitespace is ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored.
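A short sketch of these whitespace rules:
SELECT '{ 1 , 2 }'::int[];   -- whitespace around items is ignored: {1,2}
SELECT '{" a ",b}'::text[];  -- whitespace inside double quotes is kept: {" a ",b}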
Range Types
The range type is a data type that represents a range of values of some element type (called the range's subtype). For example, ranges of timestamp might be used to represent the time ranges during which a meeting room is reserved. In this case the data type is tsrange (short for "timestamp range"), and timestamp is the subtype. The subtype must have a total order so that it is well-defined whether an element value is within, before, or after a range of values.
Range types are useful because they represent many element values as a single range value and clearly express concepts such as overlapping ranges. The use of time and date ranges for scheduling is the most obvious example, but price ranges, measurement ranges from instruments, and so forth can also be useful.
Built-in Range Types
The following built-in range types are provided:
int4range - Range of integer
int8range - Range of bigint
numrange - Range of numeric
tsrange - Range of timestamp without time zone
tstzrange - Range of timestamp with time zone
daterange - Range of date
You can also define your own range types. See User-defined Type for more information.
Examples
CREATE TABLE reservation (room int, during tsrange);
INSERT INTO reservation VALUES
(1108, '[2010-01-01 14:30, 2010-01-01 15:30)');
-- Containment
SELECT int4range(10, 20) @> 3;
-- Overlaps
SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0);
-- Extract the upper bound
SELECT upper(int8range(15, 25));
-- Compute the intersection
SELECT int4range(10, 20) * int4range(15, 25);
-- Is the range empty?
SELECT isempty(numrange(1, 5));
Inclusive and Exclusive Bounds
All non-empty ranges have two bounds, a lower bound and an upper bound, and all points between those values are included in the range. An inclusive bound means that the boundary point itself is included in the range, while an exclusive bound means that the boundary point is not included. In the text form of a range, an inclusive lower bound is represented by "[" and an exclusive lower bound by "(". Likewise, an inclusive upper bound is represented by "]" and an exclusive upper bound by ")".
The functions lower_inc and upper_inc test the lower and upper bounds of the range value, respectively.
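For example:
SELECT lower_inc(numrange(1.0, 14.0, '(]')),  -- false: the lower bound is exclusive
       upper_inc(numrange(1.0, 14.0, '(]'));  -- true: the upper bound is inclusive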
Infinite (Unbounded) Ranges
The lower bound of the range can be omitted, meaning that all points below the upper bound are included in the range. Likewise, if the upper bound of the range is omitted, all points above the lower bound are included in the range. If both the lower and upper bounds are omitted, all values of the element type are considered to be included in the range.
This corresponds to considering that the lower bound is “negative infinity” or the upper bound is “positive infinity.” Note, however, that this infinite value is never a value of the element type of the range, and cannot be part of the range. (Therefore, there is no such thing as inclusive infinite bounds; when you try to create one, it is automatically converted to an exclusive bound).
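For example, an attempt to write inclusive infinite bounds is normalized to exclusive ones:
SELECT '[,]'::int4range;  -- displayed as (,)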
It is true that some element types have an "infinity" notation, but that is just another value so far as the range type mechanism is concerned. For example, in timestamp ranges, [today,] means the same thing as [today,). But [today,infinity] means something different from [today,infinity): the latter excludes the special timestamp value infinity.
The functions lower_inf and upper_inf test infinite lower/upper bounds of each range, respectively.
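For example:
SELECT lower_inf('(,)'::daterange),  -- true
       upper_inf('(,)'::daterange);  -- true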
Range Input/Output
The input for the range value should follow one of the following patterns:
(lower-bound,upper-bound)
(lower-bound,upper-bound]
[lower-bound,upper-bound)
[lower-bound,upper-bound]
empty
The parentheses or square brackets indicate whether the lower and upper bounds are exclusive or inclusive, as described above. The last pattern is empty, representing an empty range (a range containing no points). The lower-bound can be either a string that is valid input for the subtype, or empty to indicate no lower bound; likewise, the upper-bound can be either a string that is valid input for the subtype, or empty to indicate no upper bound. Each bound value can be quoted using " (double quote) characters. This is necessary if the bound value contains parentheses, square brackets, commas, double quotes, or backslashes, since these characters would otherwise be taken as part of the range syntax. To put a double quote or backslash into a quoted bound value, precede it with a backslash. (A pair of double quotes within a double-quoted bound value is also taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can avoid quoting and use backslash-escaping to protect all data characters that would otherwise be taken as range syntax. Also, to write a bound value that is an empty string, write ""; writing nothing at all means an infinite bound. Whitespace is allowed before and after the range value, but any whitespace between the parentheses or brackets is taken as part of the lower or upper bound value. (Depending on the element type, it may or may not be significant.)
Examples:
-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;
-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;
-- includes only the single point 4
SELECT '[4,4]'::int4range;
-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;
Constructing Ranges
Each range type has a constructor function with the same name as the range type. Using the constructor function is frequently more convenient than writing a range literal constant, since it avoids the need for extra quoting of the bound values. The constructor function accepts two or three arguments. The two-argument form constructs a range in standard form (lower bound inclusive, upper bound exclusive), while the three-argument form constructs a range with bounds of the form specified by the third argument. The third argument must be one of the following strings: "()", "(]", "[)", or "[]".
Examples:
-- The full form is: lower bound, upper bound, and text argument indicating
-- inclusivity/exclusivity of bounds.
SELECT numrange(1.0, 14.0, '(]');
-- If the third argument is omitted, '[)' is assumed.
SELECT numrange(1.0, 14.0);
-- Although '(]' is specified here, on display the value will be converted to
-- canonical form, since int8range is a discrete range type (see below).
SELECT int8range(1, 14, '(]');
-- Using NULL for either bound causes the range to be unbounded on that side.
SELECT numrange(NULL, 2.2);
Discrete Range Types
A discrete range is one whose element type has a well-defined "step", such as integer or date. In these types, two elements can be said to be adjacent if there are no valid values between them. This contrasts with a continuous range, in which it is always (or almost always) possible to identify other element values between any two given values. For example, a range over the numeric type is continuous, as is a range over timestamp. (Although timestamp could in theory be treated as discrete because of its precision limitation, it is better to consider it continuous, since the step size is normally of no interest.)
Another way to think about a discrete range type is that there is a clear idea of a "next" or "previous" value for each element value. With that knowledge, it is possible to convert between inclusive and exclusive representations of a range's bounds by choosing the next or previous element value instead of the one originally given. For example, the integer range types [4,8] and (3,9) denote the same set of values; but this would not be so for a range over numeric.
A discrete range type should have a canonicalization function that is aware of the desired step size for the element type. The canonicalization function is charged with converting equivalent values of the range type to have identical representations, in particular consistently inclusive or exclusive bounds. If a canonicalization function is not specified, ranges with different formatting will always be treated as unequal, even though they might actually denote the same set of values. The built-in range types int4range, int8range, and daterange all use a canonical form that includes the lower bound and excludes the upper bound (i.e., "[)"). User-defined range types can use other conventions, however.
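For instance, with the built-in int4range, canonicalization makes differently written but equivalent ranges compare equal:
SELECT int4range(4, 8, '[]') = int4range(3, 9, '()');  -- true: both canonicalize to [4,9)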
Defining New Range Types
You may define your own range type. The most common reason for doing this is to use a subtype range that is not provided as a built-in range type. This is an example of defining a new range type for the subtype float8.
CREATE TYPE floatrange AS RANGE (
subtype = float8,
subtype_diff = float8mi
);
SELECT '[1.234, 5.678]'::floatrange;
Because float8 has no meaningful "step", we do not define a canonicalization function in this example. When the subtype has discrete rather than continuous values, the CREATE TYPE command should specify a canonical function. The canonicalization function takes an input range value and must return an equivalent range value that may have different bounds and formatting. The canonical outputs for two ranges that represent the same set of values (for example, the integer ranges [1,7] and [1,8)) must be identical. It does not matter which representation is chosen as the canonical one, so long as two equivalent values with different formattings are always mapped to the same value with the same formatting. Besides adjusting the inclusive/exclusive bounds format, a canonicalization function might round off boundary values, in case the desired step size is larger than what the subtype is capable of storing. For instance, a range type over timestamp could be defined to have a step size of an hour, in which case the canonicalization function would need to round off bounds that are not a multiple of an hour, or perhaps throw an error instead.
If you define your own range type, you can also specify a different subtype B-tree operator class or collation to use, so as to change the sort order that determines which values fall into a given range. In addition, any range type that is meant to be used with GiST or SP-GiST indexes should define a subtype difference, or subtype_diff, function. (The index will still work without subtype_diff, but it is likely to be considerably less efficient than if a difference function is provided.) The subtype difference function takes two input values of the subtype and returns their difference (i.e., X minus Y) represented as a float8 value. In the example above, the function underlying the regular float8 minus operator can be used; for other subtypes, some type conversion would be necessary, and some creative thought about how to represent differences as numbers might be needed. To the greatest extent possible, the subtype_diff function should agree with the sort order implied by the selected operator class and collation; that is, its result should be positive whenever its first argument is greater than its second according to the sort order. For example:
CREATE FUNCTION time_subtype_diff(x time, y time) RETURNS float8 AS
'SELECT EXTRACT(EPOCH FROM (x - y))' LANGUAGE sql STRICT IMMUTABLE;
CREATE TYPE timerange AS RANGE (
subtype = time,
subtype_diff = time_subtype_diff
);
SELECT '[11:10, 23:00]'::timerange;
For more information on creating range types, see User-defined Type.
Indexing
GiST and SP-GiST indexes can be created on table columns of range types. The following is an example of creating a GiST index:
CREATE INDEX reservation_idx ON reservation USING GIST (during);
GiST or SP-GiST indexes can speed up queries involving range operators such as =, &&, <@, @>, <<, >>, -|-, &<, and &>.
B-tree and hash indexes can be created on table columns of range types as well. For these index types, basically the only useful range operation is equality. There is a B-tree sort ordering defined for range values using the < and > operators, but the ordering itself is rather arbitrary and not usually useful in the real world. B-tree and hash support for range types is intended primarily to allow sorting and hashing internally in queries, rather than the creation of actual indexes.
Constraints on Ranges
While UNIQUE is a natural constraint for scalar values, it is usually unsuitable for range types. Instead, an exclusion constraint is often more appropriate. Exclusion constraints allow the specification of constraints such as "non-overlapping" on a range type.
Examples:
CREATE TABLE reservation (
during tsrange,
EXCLUDE USING GIST (during WITH &&)
);
The constraint prevents overlapping values from being simultaneously present in the table.
INSERT INTO reservation VALUES
('[2010-01-01 11:30, 2010-01-01 15:00)');
INSERT 0 1
INSERT INTO reservation VALUES
('[2010-01-01 14:45, 2010-01-01 15:45)');
ERROR: conflicting key value violates exclusion constraint "reservation_during_excl"
DETAIL: Key (during)=(["2010-01-01 14:45:00","2010-01-01 15:45:00")) conflicts
with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 15:00:00")).
You can use the btree_gist extension to define exclusion constraints on plain scalar data types, which can then be combined with range exclusions for maximum flexibility. For instance, after btree_gist is installed, the following constraint rejects overlapping ranges only when the meeting room numbers are equal.
CREATE EXTENSION btree_gist;
CREATE TABLE room_reservation (
room text,
during tsrange,
EXCLUDE USING GIST (room WITH =, during WITH &&)
);
INSERT INTO room_reservation VALUES
('123A', '[2010-01-01 14:00, 2010-01-01 15:00)');
INSERT 0 1
INSERT INTO room_reservation VALUES
('123A', '[2010-01-01 14:30, 2010-01-01 15:30)');
ERROR: conflicting key value violates exclusion constraint "room_reservation_room_during_excl"
DETAIL: Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts
with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).
INSERT INTO room_reservation VALUES
('123B', '[2010-01-01 14:30, 2010-01-01 15:30)');
INSERT 0 1
User-defined Type
You can add a new type using the CREATE TYPE command. CREATE TYPE has five forms: Composite Type, Enum Type, Range Type, Base Type, and Shell Type.
Syntax :
CREATE TYPE name AS
( [ attribute_name data_type [ COLLATE collation ] [, ... ] ] )
CREATE TYPE name AS ENUM
( [ 'label' [, ... ] ] )
CREATE TYPE name AS RANGE (
SUBTYPE = subtype
[ , SUBTYPE_OPCLASS = subtype_operator_class ]
[ , COLLATION = collation ]
[ , CANONICAL = canonical_function ]
[ , SUBTYPE_DIFF = subtype_diff_function ]
)
CREATE TYPE name (
INPUT = input_function,
OUTPUT = output_function
[ , RECEIVE = receive_function ]
[ , SEND = send_function ]
[ , TYPMOD_IN = type_modifier_input_function ]
[ , TYPMOD_OUT = type_modifier_output_function ]
[ , ANALYZE = analyze_function ]
[ , INTERNALLENGTH = { internallength | VARIABLE } ]
[ , PASSEDBYVALUE ]
[ , ALIGNMENT = alignment ]
[ , STORAGE = storage ]
[ , LIKE = like_type ]
[ , CATEGORY = category ]
[ , PREFERRED = preferred ]
[ , DEFAULT = default ]
[ , ELEMENT = element ]
[ , DELIMITER = delimiter ]
[ , COLLATABLE = collatable ]
)
CREATE TYPE name
Examples :
This is an example of creating a composite type and using it for function definition.
CREATE TYPE compfoo AS (f1 int, f2 text);
CREATE FUNCTION getfoo() RETURNS SETOF compfoo AS $$
SELECT fooid, fooname FROM foo
$$ LANGUAGE SQL;
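Assuming a table foo with columns fooid and fooname exists (as the function body implies), the function can then be called in a FROM clause:
SELECT * FROM getfoo();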
This is an example of creating an enum type and using it for table definition.
CREATE TYPE bug_status AS ENUM ('new', 'open', 'closed');
CREATE TABLE bug (
id serial,
description text,
status bug_status
);
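The enum column then accepts only the declared labels; a brief usage sketch with made-up data:
INSERT INTO bug (description, status) VALUES ('cursor misbehaves', 'open');
SELECT * FROM bug WHERE status = 'open';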
This is an example of creating a range type.
CREATE TYPE float8_range AS RANGE (subtype = float8, subtype_diff = float8mi);
This is an example of creating a base type and then using it for table definition.
CREATE TYPE box;
CREATE FUNCTION my_box_in_function(cstring) RETURNS box AS ... ;
CREATE FUNCTION my_box_out_function(box) RETURNS cstring AS ... ;
CREATE TYPE box (
INTERNALLENGTH = 16,
INPUT = my_box_in_function,
OUTPUT = my_box_out_function
);
CREATE TABLE myboxes (
id integer,
description box
);