Linux Database HOWTO (PostgreSQL Relational Database System): FAQ

34. FAQ - PostgreSQL Features

34.1 How do I specify a KEY or other constraints on a column?

Column constraints are not supported in PostgreSQL. As a consequence, the system does not check for duplicates.

Under 6.0, create a unique index on the column. Attempts to insert a duplicate value into that column will then report an error.

34.2 Does PostgreSQL support nested subqueries?

Subqueries are not implemented, but they can be simulated using SQL functions.

34.3 How do I define unique indices?

PostgreSQL 6.0 supports unique indices.
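As a minimal sketch, assuming a 6.0 server, the statements below show how a unique index might be used to reject duplicates. The table person_demo and the index name are hypothetical, and the exact error text depends on the release.

```sql
-- Hypothetical table and index names, for illustration only.
create table person_demo (name text, age int4);

-- A unique index makes the backend reject duplicate values of name.
create unique index person_demo_name_idx on person_demo using btree (name);

insert into person_demo values ('alice', 30);

-- This second insert repeats the name 'alice', so it is expected to
-- fail with a duplicate-key error instead of storing a second row.
insert into person_demo values ('alice', 41);
```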

34.4 I'm having a lot of problems using rules.

Currently, the rule system in PostgreSQL is mostly broken. It works enough to support the view mechanism, but that's about it. Use PostgreSQL rules at your own peril.

34.5 I can't seem to write into the middle of large objects reliably.

The Inversion large object system in PostgreSQL is also mostly broken. It works well enough for storing large wads of data and reading them back out, but the implementation has some underlying problems. Use PostgreSQL large objects at your own peril.

34.6 Does PostgreSQL have a graphical user interface? A report generator? An embedded query language interface?

No. No. No. Not in the official distribution at least. Some users have reported some success using 'pgbrowse' and 'onyx' as frontends to PostgreSQL. Several contributors are working on Tk-based frontend tools. Ask on the mailing list.

34.7 How can I write client applications to PostgreSQL?

PostgreSQL supports a C-callable library interface called libpq as well as a Tcl-based library interface called libpgtcl.

Others have contributed a Perl interface and a WWW gateway to PostgreSQL. See the PostgreSQL home pages for more details.

34.8 How do I prevent other hosts from accessing my PostgreSQL backend?

Use host-based authentication by modifying the file $PGDATA/pg_hba accordingly.

34.9 How do I set up a pg_group?

Currently, there is no easy interface to set up user groups. You have to explicitly insert/update the pg_group table. For example:


        jolly=> insert into pg_group (groname, grosysid, grolist)
        jolly=>     values ('posthackers', '1234', '{5443, 8261}');
        INSERT 548224
        jolly=> grant insert on foo to group posthackers;
        CHANGE
        jolly=>

The fields in pg_group are:

* groname: the group name. This is a char16 and should be purely alphanumeric. Do not include underscores or other punctuation.
* grosysid: the group id. This is an int4 and should be unique for each group.
* grolist: the list of pg_user ids that belong in the group. This is an int4[].

34.10 What is the exact difference between binary cursors and normal cursors?

Normal cursors return data in ASCII format. Since data is stored natively in binary format, the system must do a conversion to produce the ASCII format. In addition, ASCII representations are often larger than their binary counterparts. Once the attributes come back in ASCII, the client application often has to convert them to a binary format to manipulate them anyway.

Binary cursors give you back the data in the native binary representation. Thus, binary cursors will tend to be a little faster since there's less overhead of conversion.

However, ASCII is architecture-neutral, whereas binary representation can differ between machine architectures. Thus, if your client machine uses a different representation than your server machine, getting back attributes in binary format is probably not what you want. Also, if your main purpose is displaying the data in ASCII, then getting it back in ASCII will save you some effort on the client side.
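The two kinds of cursor are declared almost identically. The following is only a sketch: the table cursor_demo and the cursor names are hypothetical, and a client that fetches from the binary cursor is assumed to decode the native representation itself (an interactive monitor may not display it meaningfully).

```sql
-- Hypothetical table, for illustration only.
create table cursor_demo (name text, age int4);

-- Cursors are used inside a transaction block.
begin;

-- Normal cursor: attribute values come back as ASCII text.
declare ascii_cur cursor for select name, age from cursor_demo;
fetch all in ascii_cur;
close ascii_cur;

-- Binary cursor: attribute values come back in the server's
-- native binary representation, with no conversion to ASCII.
declare bin_cur binary cursor for select name, age from cursor_demo;
fetch all in bin_cur;
close bin_cur;

end;
```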

34.11 Why doesn't the != operator work?

SQL specifies <> as the inequality operator, and that is what we have defined for the built-in types.

In 6.0, != is equivalent to <>.

34.12 What is an R-tree index and what is it used for?

An R-tree index is used for indexing spatial data. A hash index can't handle range searches. A B-tree index only handles range searches in a single dimension. R-trees can handle multi-dimensional data. For example, if an R-tree index can be built on an attribute of type 'point', the system can more efficiently answer queries like "select all points within a bounding rectangle".

The canonical paper that describes the original R-Tree design is:

Guttman, A. "R-Trees: A Dynamic Index Structure for Spatial Searching." Proc of the 1984 ACM SIGMOD Int'l Conf on Mgmt of Data, 45-57.

You can also find this paper in Stonebraker's "Readings in Database Systems".

34.13 What is the maximum size for a tuple?

Tuples are limited to 8K bytes. Taking into account system attributes and other overhead, one should stay well shy of 8,000 bytes to be on the safe side. To use attributes larger than 8K, try using the large objects interface.

Tuples do not cross 8k boundaries so a 5k tuple will require 8k of storage.

34.14 I defined indices but my queries don't seem to make use of them. Why?

PostgreSQL does not automatically maintain statistics. One has to make an explicit 'vacuum' call to update the statistics. After statistics are updated, the optimizer has a better shot at using indices. Note that the optimizer is limited and does not use indices in some circumstances (such as OR clauses).

If the system still does not see the index, it is probably because you have created an index on a field with the improper *_ops type. For example, you have created a CHAR(4) field, but have specified a char_ops index type_class.

See the create_index manual page for information on what type classes are available. It must match the field type.

Postgres does not warn the user when the improper index is created.

Indexes are not used for ORDER BY operations.
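A typical sequence for checking whether an index is being considered might look like the sketch below. It assumes a 6.0 server (so no explicit *_ops class is needed), and the table and index names are hypothetical.

```sql
-- Hypothetical table and index, for illustration only.
create table parts_demo (part_no int4, descr text);
create index parts_demo_no_idx on parts_demo using btree (part_no);

-- Update the statistics the optimizer relies on.
vacuum;

-- Display the chosen plan and check whether it mentions
-- an index scan using parts_demo_no_idx.
explain select * from parts_demo where part_no = 42;
```

On a very small table the optimizer may still prefer a sequential scan, so a test like this is more telling once the table holds a realistic amount of data.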

34.15 Are there ODBC drivers for PostgreSQL?

There are two ODBC drivers available, PostODBC and OpenLink ODBC.

For those interested in PostODBC, there are two mailing lists devoted to its discussion:

* postodbc-users@listserv.direct.net
* postodbc-developers@listserv.direct.net

These lists are ordinary majordomo mailing lists. You can subscribe by sending mail to:

* majordomo@listserv.direct.net

OpenLink ODBC is currently in beta under Linux. You can get it from http://www.openlinksw.com/postgres.html. It works with our standard ODBC client software so you'll have Postgres ODBC available on every client platform we support (Win, Mac, Unix, VMS).

We will probably be selling this product to people who need commercial-quality support, but a freeware version will always be available. Questions to postgres95@openlink.co.uk.

34.16 How do I use postgres for multi-dimensional indexing (> 2 dimensions)?

Built-in R-trees can handle polygons and boxes. In theory, R-trees can be extended to handle a higher number of dimensions. In practice, extending R-trees requires a bit of work and we don't currently have any documentation on how to do it.

34.17 How do I do regular expression searches? case-insensitive regexp searching?


PostgreSQL supports the SQL LIKE syntax as well as more general regular
expression searching with the ~ operator. The !~ is the negated regexp
operator. ~* and !~* are the case-insensitive regular expression operators.
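The operators above can be compared side by side in a short sketch. The table city_demo and the patterns are hypothetical and only illustrate the syntax.

```sql
-- Hypothetical table, for illustration only.
create table city_demo (name text);

-- SQL LIKE: names beginning with 'San'.
select * from city_demo where name like 'San%';

-- Regular expression match (case-sensitive): names beginning with 'San'.
select * from city_demo where name ~ '^San';

-- Negated regular expression match: names that do not end in 'ville'.
select * from city_demo where name !~ 'ville$';

-- Case-insensitive match and its negation.
select * from city_demo where name ~* '^san';
select * from city_demo where name !~* '^san';
```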

34.18 I can't access the database as the 'root' user.

You should not create database users with user id 0 (root). They will be unable to access the database. This is a security precaution because of the ability of any user to dynamically link object modules into the database engine.

34.19 I experienced a server crash during a vacuum. How do I remove the lock file?

If the server crashes during a vacuum command, chances are it will leave a lock file hanging around. Attempts to re-run the vacuum command result in

WARN:can't create lock file -- another vacuum cleaner running?

If you are sure that no vacuum is actually running, you can remove the file called "pg_vlock" in your database directory (which is $PGDATA/base/<dbName>).

34.20 What is the difference between the various character types?


Type            Internal Name   Notes
--------------------------------------------------
CHAR            char            1 character   }
CHAR2           char2           2 characters  }
CHAR4           char4           4 characters  } optimized for a fixed length
CHAR8           char8           8 characters  }
CHAR16          char16          16 characters }
CHAR(#)         bpchar          blank padded to the specified fixed length
VARCHAR(#)      varchar         size specifies maximum length, no padding
TEXT            text            length limited only by maximum tuple length
BYTEA           bytea           variable-length array of bytes

Remember, you need to use the internal name when creating indexes on these
fields or when doing other internal operations.

The last four types above are "varlena" types (i.e. the first four bytes
hold the length, followed by the data). CHAR(#) and VARCHAR(#) allocate the
maximum number of bytes no matter how much data is stored in the field.
TEXT and BYTEA are the only character types that have variable length on
the disk.

34.21 In a query, how do I detect if a field is NULL?

PostgreSQL has two built-in keywords, "isnull" and "notnull" (note no spaces). Versions 1.05 and later, and 6.*, also understand IS NULL and IS NOT NULL.
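Both spellings are used as a test following the field name. In this sketch the table staff_demo and its columns are hypothetical.

```sql
-- Hypothetical table, for illustration only.
create table staff_demo (name text, manager text);

-- Original keywords, written without a space.
select name from staff_demo where manager isnull;
select name from staff_demo where manager notnull;

-- SQL-style spelling, understood by 1.05 and later and by 6.*.
select name from staff_demo where manager is null;
select name from staff_demo where manager is not null;
```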

34.22 How do I see how the query optimizer is evaluating my query?

Place the word 'EXPLAIN' at the beginning of the query, for example:

EXPLAIN SELECT * FROM table1 WHERE age = 23;

34.23 How do I create a serial field?

Postgres does not allow the user to specify a user column as type SERIAL. Instead, you can use each row's oid field as a unique value. However, if you need to dump and reload the database, you need to be using postgres version 1.07 or later, or 6.*, with pg_dump's -o option or COPY's WITH OIDS option to preserve the oids.

Another valid way of doing this is to create a function:

create table my_oids (f1 int4);

insert into my_oids values (1);

create function new_oid () returns int4 as 'update my_oids set f1 = f1 + 1; select f1 from my_oids; ' language 'sql';

then:

create table my_stuff (my_key int4, value text);

insert into my_stuff values (new_oid(), 'hello');

However, keep in mind there is a race condition here where one server could do the update, then another one do an update, and they both could select the same new id. This statement should be performed within a transaction.

Sequences are implemented in 6.1.
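With sequences available, the counter table and function can be replaced by something like the following sketch. It assumes a 6.1 or later server, and the sequence and table names are hypothetical.

```sql
-- Assumes 6.1 or later, where sequences exist.
-- Names are hypothetical, for illustration only.
create sequence my_stuff_seq;

create table my_stuff2 (my_key int4, value text);

-- Each call to nextval() returns the next number from the sequence.
insert into my_stuff2 values (nextval('my_stuff_seq'), 'hello');
insert into my_stuff2 values (nextval('my_stuff_seq'), 'world');
```

A sequence is intended to hand out distinct values to concurrent backends, which avoids the race condition of the hand-written counter.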

34.24 How do I create a multi-column index?

In 6.0, you cannot directly create a multi-column index using create index. You need to define a function which acts on the multiple columns, then use create index with that function.

In 6.1, this feature is available.
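Under 6.1, a multi-column index might be created as in this sketch. The table, column, and index names are hypothetical, and the statement is not expected to work on 6.0.

```sql
-- Hypothetical table, for illustration only.
create table name_demo (lastname text, firstname text);

-- Assumes 6.1 or later: one btree index covering both columns.
create index name_demo_full_idx on name_demo using btree (lastname, firstname);
```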

34.25 What are the temp_XXX files in my database directory?

They are temp_ files generated by the query executor. For example, if a sort needs to be done to satisfy an ORDER BY, some temp files are generated as a result of the sort.

If you have no transactions or sorts running at the time, it is safe to delete the temp_ files.

34.26 Why are my table files not getting any smaller after a delete?

If you run vacuum in pre-6.0, unused rows will be marked for reuse, but the file blocks are not released.

In 6.0, vacuum properly shrinks tables.

34.27 Why can't I connect to my database from another machine?

The default configuration allows only connections from TCP/IP host localhost. You need to add a host entry to the file pgsql/data/pg_hba.

34.28 I get the error 'default index class unsupported' when creating an index. How do I do it?

You probably used:

create index idx1 on person using btree (name);

PostgreSQL indexes are extensible, and therefore in pre-6.0, you must specify a class_type when creating an index. Read the manual page for create index (called create_index).

In version 6.0, if you do not specify a class_type, it defaults to the proper type for the column.
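For pre-6.0 releases, the class is written after the column name. This sketch assumes that person has a name column of type text and an age column of type int4; that table definition is a guess for illustration, not taken from the question.

```sql
-- Assumed definition of the person table, for illustration only.
create table person (name text, age int4);

-- Pre-6.0: name the class that matches the column type.
-- name is text, so text_ops is used.
create index idx1 on person using btree (name text_ops);

-- An int4 column uses int4_ops instead.
create index idx2 on person using btree (age int4_ops);
```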

34.29 Why does creating an index crash the backend server?

You have probably defined an incorrect *_ops type class for the field you are indexing.

34.30 How do I find out what indexes or operations are defined in the database?

Run the file pgsql/src/tutorial/syscat.source. It illustrates many of the 'select's needed to get information out of the database system tables.

34.31 Why do statements require an extra character at the end? Why does 'createuser' return 'unexpected last match in input()'? Why does pg_dump fail?

You have compiled postgres with flex version 2.5.3. There is a bug in this version of flex. Use flex version 2.5.2 or flex 2.5.4 instead. There is a doc/README.flex file which will properly patch the flex 2.5.3 source code.

34.32 All my servers crash under concurrent table access. Why?

This problem can be caused by a kernel that is not configured to support semaphores.

34.33 What tools are available for hooking postgres to Web pages?

For web integration, PHP/FI is an excellent interface. The URL for that is http://www.vex.net/php/

PHP is great for simple stuff, but for more complex stuff, some still use the Perl interface and CGI.pm.

An example of using WWW with C to talk to Postgres can be tried at:

* http://www.postgreSQL.org/~mlc

A WWW gateway based on WDB using Perl can be downloaded from:

* http://www.eol.ists.ca/~dunlop/wdb-p95

34.34 What is the time-warp feature and how does it relate to vacuum?

PostgreSQL handles data changes differently than most database systems. When a row is changed in a table, the original row is marked with the time it was changed, and a new row is created with the current data. By default, only current rows are used in a table. If you specify a date/time in square brackets after the table name in a FROM clause, you can access the data that was current at that time. For example,

SELECT * FROM employees ['July 24, 1996 09:00:00']

displays employee rows in the table at the specified time. You can specify intervals like [date,date], [date,], [,date], or [,]. This last option accesses all rows that ever existed.

INSERTed rows get a timestamp too, so rows that were not in the table at the desired time will not appear.

Vacuum removes rows that are no longer current. This time-warp feature is used by the engine for rollback and crash recovery. Expiration times can be set with purge.

In 6.0, once a table is vacuumed, the creation time of a row may be incorrect, causing time-travel to fail.

The time-travel feature will be removed in 7.0.

34.35 How do I tune the database engine for better performance?

There are two things that can be done. You can use OpenLink's option to disable fsync() by starting the postmaster with the '-o -F' option. This prevents fsync() from flushing to disk after every transaction.

You can also use the postmaster -B option to increase the number of shared memory buffers shared among the backend processes. If you make this parameter too high, the process may fail to start or may crash unexpectedly. Each buffer is 8K and the default is 64 buffers.

34.36 What debugging features are available in PostgreSQL?

PostgreSQL has several features that report status information that can be valuable for debugging purposes.

First, by compiling with DEBUG defined, many assert()'s monitor the progress of the backend and halt the program when something unexpected occurs.

Both postmaster and postgres have several debug options available. First, whenever you start the postmaster, make sure you send the standard output and error to a log file, like:


        cd /usr/local/pgsql
        ./bin/postmaster >server.log 2>&1 &

This will put a server.log file in the top-level PostgreSQL directory. This file can contain useful information about problems or errors encountered by the server. Postmaster has a -d option that allows even more detailed information to be reported. The -d option takes a number 1-3 that specifies the debug level. The query plans in a verbose debug file can be formatted using the 'indent' program. (You may need to remove the '====' lines in 1.* releases.) Be warned that a debug level greater than one generates large log files in 1.* releases.

You can actually run the postgres backend from the command line, and type your SQL statement directly. This is recommended ONLY for debugging purposes. Note that a newline terminates the query, not a semicolon. If you have compiled with debugging symbols, you can perhaps use a debugger to see what is happening. Because the backend was not started from the postmaster, it is not running in an identical environment and locking/backend interaction problems may not be duplicated. On some operating systems, a debugger can attach to a running backend directly to diagnose problems.

The postgres program has -s, -A, and -t options that can be very useful for debugging and performance measurements.

The EXPLAIN command (see this FAQ) allows you to see how PostgreSQL is interpreting your query.

34.37 What is an oid? What is a tid?

Oids are Postgres's answer to unique row ids or serial columns. Every row that is created in Postgres gets a unique oid. All oids generated by initdb are less than 16384 (from backend/access/transam.h). All post-initdb (user-created) oids are equal to or greater than this. All these oids are unique not only within a table or database, but within the entire postgres installation.

Postgres uses oids in its internal system tables to link rows in separate tables. These oids can be used to identify specific user rows and used in joins. It is recommended you use column type oid to store oid values. See the sql(l) manual page to see the other internal columns.

Tids are used to identify specific physical rows with block and offset values. Tids change after rows are modified or reloaded. They are used by index entries to point to physical rows. They cannot be accessed through SQL.
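As a rough sketch of how oids can be inspected and used in a join, consider the statements below. The tables oid_demo and oid_ref_demo are hypothetical.

```sql
-- Hypothetical tables, for illustration only.
create table oid_demo (label text);
insert into oid_demo values ('first');

-- oid is an internal column, so it is named explicitly here
-- rather than relying on "select *".
select oid, label from oid_demo;

-- A column of type oid can hold a reference to a row in another
-- table, and the two tables can then be joined on it.
create table oid_ref_demo (target oid, note text);
select r.note, d.label
    from oid_ref_demo r, oid_demo d
    where r.target = d.oid;
```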

34.38 What is the meaning of some of the terms used in Postgres?

Some of the source code and older documentation use terms that have more common usage. Here are some:

* row, record, tuple
* attribute, field, column
* table, class
* retrieve, select
* replace, update
* append, insert
* oid, serial value
* portal, cursor
* range variable, table name, table alias

Please let me know if you think of any more.