Moving to postgresql and some ignorant questions
I have been a long-time user of MySQL. I am switching to PostgreSQL because
the new features in 5.1 (as of this moment) are nothing to write home
about. The InnoDB stuff is highly advocated, but it has its own set of
issues, and when one looks at things like backup/restore etc., it is
clearly targeted at expensive installs with full DBAs and such, which
we cannot afford.
So I have been reading up feverishly on PGSQL and it surely is a more
serious DB, which is good, but also a bit steep in its learning curve.
I have some pretty silly questions to ask below, just to confirm that
I manage my switch as smoothly as possible!
By way of information, we have new dual AMD Opterons with 3GB of
RAM. The Postgres that has been installed is 8.2.3. Our DB is not
as large as some of those discussed on pgsql-general: no table has more
than 10 million records or is likely to exceed that anytime soon. But I
have heavy simultaneous user connections, much like any web
application for a busy website.
In particular I have a table that needs very high availability: it gets
about 10,000 INSERTs a day and about 500,000 SELECTs a day (with or
without joins), but most importantly about 1 million UPDATEs. (It is
the UPDATEs that are bothering the MySQL "MyISAM" engine, with
frequent data corruption.)
Through other threads I have learned that multiple inserts or updates
can be sped up with:
[QUOTE]
- BEGIN TRANSACTION;
- INSERT OR UPDATE queries, ideally PREPAREd;
- COMMIT;
[/QUOTE]
QUESTION1:
Is this it? Or am I missing something in terms of execution? We use
Perl on our site but may be gradually switching to PHP at some point.
Will the above plan of execution be ok?
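Just to be concrete about what I am planning, I assume the execution
would look something like this (the table and column names here are
made up for illustration):
BEGIN;
PREPARE bump_clicks (bigint) AS
    UPDATE links SET clicks = clicks + 1 WHERE id = $1;
EXECUTE bump_clicks(101);
EXECUTE bump_clicks(102);
EXECUTE bump_clicks(103);
COMMIT;
DEALLOCATE bump_clicks;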
My queries are all optimized and indexed well. But the fragmentation
resulting from UPDATEs can be a pain. I wonder if PGSQL's autovacuum
will help; we have every intention of leaving it on.
Will the following settings be alright?
[QUOTE]
autovacuum = on
vacuum_cost_delay = 30
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 100
autovacuum_analyze_threshold = 50
[/QUOTE]
I am hoping that the frequent vacuum thresholds will help, but:
QUESTION 2:
Are these settings too aggressive? While an autovacuum is running in
the background, will it lock tables or anything? Can the tables still
be operational, with autovacuum automatically resuming from
whatever point it was at? I am worried about how autovacuum will
perform when the tables are under heavy concurrent load.
QUESTION 3:
Data integrity checks in the MySQL world were very patchy, relying on
CGI-level code to enforce them, which does not always work. We are
trying our best to get rid of those checks. With PostgreSQL, I realize
we can have triggers as well as foreign keys. But searching through old
threads on this group suggests that triggers also present performance
problems. On tables that are infrequently updated, can I write triggers
without worrying about performance? Or, how can I make sure their
performance is as good as it can be, i.e., which of the config
variables is responsible for that?
QUESTION 4:
Character sets: In MySQL we had UTF-8 as our default character set.
Yet on some tables we had the collation latin1_general_ci, and
only on specific columns "utf8_general_ci" (where non-English
text needed to be stored). How should I do this in pgsql? When I do a
mysqldump of these tables, and then COPY them back into pgsql, I
always see the error "ERROR: invalid byte sequence for encoding
"UTF8": 0xa7". So I do a
\encoding latin1
And then my COPY import works. But this is not what I want. How can I
set up one of my columns in this table to be utf-8, and the rest to be
latin? Then I would like to import with "\encoding utf8". Can this be
somehow done?
Sorry for this long post, but as exciting as this switch is, it is
also daunting, because I feel like I am moving into serious-database
territory and I don't want to goof up. I have read up a lot and am
continuing to, but it would be great if someone could shed some light
on the above to begin with.
TIA!
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
Through other threads I have learned that multiple inserts or updates
can be sped up with:
[QUOTE]
- BEGIN TRANSACTION;
- INSERT OR UPDATE queries, ideally PREPAREd;
- COMMIT;
[/QUOTE]
QUESTION1:
Is this it? Or am I missing something in terms of execution? We use
Perl on our site but may be gradually switching to PHP at some point.
Will the above plan of execution be ok?
A transaction is a bunch of queries which you want to all get committed or
aborted together. The expensive step in Postgres is the COMMIT. Postgres does
an fsync which forces the data to go to disk (if you're using good hardware,
and don't have fsync=off). That takes from 5-15ms depending on how much data
and how fast your drives are.
Grouping more work into a single transaction makes the delay for the fsync at
COMMIT time less of a problem. Having more connections also helps (but not
too many: more than a few per processor is probably not helping, and more
than 100 is probably slowing you down), since another process can do some
of its work while you're waiting for the fsync.
My queries are all optimized and indexed well. But the fragmentation
resulting from UPDATEs can be a pain. I wonder if PGSQL's autovacuum
will help; we have every intention of leaving it on.
Will the following settings be alright?
Yes, it's a pain. Running vacuum frequently will be necessary. You may also
have to raise your fsm settings to allow Postgres to remember more free space
between vacuums.
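For instance, in postgresql.conf (these numbers are purely illustrative;
the output of VACUUM VERBOSE will tell you what you actually need):
max_fsm_pages = 200000
max_fsm_relations = 1000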
[QUOTE]
autovacuum = on
vacuum_cost_delay = 30
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 100
autovacuum_analyze_threshold = 50
[/QUOTE]
I am hoping that the frequent vacuum thresholds will help, but:
QUESTION 2:
Are these settings too aggressive? While an autovacuum is running in
the background, will it lock tables or anything? Can the tables still
be operational, with autovacuum automatically resuming from
whatever point it was at? I am worried about how autovacuum will
perform when the tables are under heavy concurrent load.
VACUUM doesn't lock tables. It's designed to operate without interfering.
It does still take up I/O bandwidth, which affects performance. The
vacuum_cost_delay above tells it to wait 30ms every few pages to try to
avoid slowing down production. You'll have to judge from experience
whether it's taking too long at that setting. You may be better off
starting with 10ms or 20ms instead.
I don't think the threshold parameters will be relevant to you. You should
look at autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor. I
may be wrong though, someone more familiar with autovacuum in 8.2 might have
to speak up.
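Something along these lines in postgresql.conf, for example (the values
are only a guess at a starting point):
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05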
QUESTION 3:
Data integrity checks in the MySQL world were very patchy, relying on
CGI-level code to enforce them, which does not always work. We are
trying our best to get rid of those checks. With PostgreSQL, I realize
we can have triggers as well as foreign keys. But searching through old
threads on this group suggests that triggers also present performance
problems. On tables that are infrequently updated, can I write triggers
without worrying about performance? Or, how can I make sure their
performance is as good as it can be, i.e., which of the config
variables is responsible for that?
Triggers are quite efficient in Postgres but they still cost something. Nobody
will be able to give you a blanket statement that you can do anything without
testing it. But you're probably better off having them and then considering
removing them later if you have a problem.
Incidentally, most data integrity checks are handled with CHECK constraints
and FOREIGN KEY constraints rather than manual triggers. They're both easier
and cheaper.
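A quick sketch of both kinds of constraint, with invented names:
CREATE TABLE account (
    id     bigint PRIMARY KEY,
    status text NOT NULL CHECK (status IN ('active', 'blocked'))
);
CREATE TABLE login_event (
    -- the FOREIGN KEY enforces that every row points at a real account
    account_id bigint NOT NULL REFERENCES account (id),
    logged_at  timestamp NOT NULL DEFAULT now()
);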
QUESTION 4:
Character sets: In MySQL we had UTF-8 as our default character set.
Yet on some tables we had the collation latin1_general_ci, and
only on specific columns "utf8_general_ci" (where non-English
text needed to be stored). How should I do this in pgsql? When I do a
mysqldump of these tables, and then COPY them back into pgsql, I
always see the error "ERROR: invalid byte sequence for encoding
"UTF8": 0xa7". So I do a
\encoding latin1
And then my COPY import works. But this is not what I want. How can I
set up one of my columns in this table to be utf-8, and the rest to be
latin? Then I would like to import with "\encoding utf8". Can this be
somehow done?
Sorry, this is one of the main deficiencies in Postgres. You will probably
have to convert your data to utf8 across the board and hopefully you'll find a
collation which satisfies all your needs. You can't switch encoding or
collation on the fly.
You could look at the CONVERT function which might help, but I'm not sure
exactly what you would have to do to solve your immediate problem.
If you really need multiple collations in a single database there's a function
pg_strxfrm() which was posted to this list a long time ago. It probably needs
to be updated for 8.2 but what it does is take a string and a collation and
provide a bytea which sorts properly in that collation. The problem is that
it's not very efficient and on some OSes it's extremely inefficient.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On 14/08/07, Gregory Stark <stark@enterprisedb.com> wrote:
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
Through other threads I have learned that multiple inserts or updates
can be sped up with:
[QUOTE]
- BEGIN TRANSACTION;
- INSERT OR UPDATE queries, ideally PREPAREd;
- COMMIT;
[/QUOTE]
QUESTION1:
Is this it? Or am I missing something in terms of execution? We use
Perl on our site but may be gradually switching to PHP at some point.
Will the above plan of execution be ok?
A transaction is a bunch of queries which you want to all get committed or
aborted together. The expensive step in Postgres is the COMMIT. Postgres does
an fsync which forces the data to go to disk (if you're using good hardware,
and don't have fsync=off). That takes from 5-15ms depending on how much data
and how fast your drives are.
If I am reading this right, does this mean it is probably better to
leave fsync as "fsync=off" on production machines? Also, is COMMIT
automatic for my queries? In some minor testing I did (not scientific
at all), some queries through Perl did not update the database at
all. I had "fsync=off" in my conf file, and there was no COMMIT etc. in
my SQL, just plain SQL. So I am a bit confused. What's a good starting
point?
Grouping more work into a single transaction makes the delay for the fsync at
COMMIT time less of a problem.
Agree. That's what I am trying to do: include as many UPDATEs etc. into
the same TRANSACTION block. But my worry is what I have read about
autocommit and how it is enabled by default in PostgreSQL 8.
Will specifying explicit BEGIN and COMMIT blocks only commit when I
want the DB to, or will each UPDATE in the middle of this block get
executed?
Sorry if this is a naive question. I am reading up as much as I can.
Also having more connections (but not too many,
more than a few per processor is probably not helping, more than 100 and it's
probably slowing you down) also means it's less important since another
process can do some of its work while you're waiting for the fsync.
So, again, in the conf file, is this what you recommend:
fsync=off
max_connections=100
?
Yes, it's a pain. Running vacuum frequently will be necessary. You may also
have to raise your fsm settings to allow Postgres to remember more free space
between vacuums.
Thank you for your comments about autovacuum. I have these FSM and
memory-type settings in my conf (picked off the internet :))
--
max_fsm_relations = 1500
max_fsm_pages = 80000
shared_buffers = 21000
effective_cache_size = 21000
sort_mem = 16348
work_mem = 16348
vacuum_mem = 16348
temp_buffers = 4096
authentication_timeout = 10s
ssl = off
Do these sound right?
Incidentally, most data integrity checks are handled with CHECK constraints
and FOREIGN KEY constraints rather than manual triggers. They're both easier
and cheaper.
The problem with simple CHECK constraints is that they can only
reference the primary key in another table. What if I want more than
one column to be the same as the referenced table, but do not want to
have a compound primary key in the referenced table? From reading and
re-reading the manual, I don't think FKs allow for this. Only primary
key references are supported.
Sorry, this is one of the main deficiencies in Postgres. You will probably
have to convert your data to utf8 across the board and hopefully you'll find a
collation which satisfies all your needs.
I am happy to do this, but I don't see an ALTER DATABASE command. I
would really like not to have to execute the CREATE DATABASE command
again! Can I make the entire DB UTF-8 using some command now? I have
not been able to find it. All the manual and Google results seem to
point to the CREATE DATABASE command only.
Many thanks!
On Tue, Aug 14, 2007 at 11:09:36PM +0800, Phoenix Kiula wrote:
If I am reading this right, does this mean it is probably better to
leave fsync as "fsync=off" on production machines?
No, you want "fsync=on" on any machine which holds data you
care about. And you want hardware which doesn't lie to you
so that "fsync is finished" really means the data is
on-disk. Else PostgreSQL cannot ensure ACID compliance.
Specifying explicit BEGIN and COMMIT blocks should only commit when I
want the DB to,
yes
or will each UPDATE in the middle of this block get
executed?
It will get executed, but the effects will only become publicly visible
after COMMIT (assuming no errors in the queries; once an error has
occurred, you could issue COMMITs all day long and still see no change
in the database from the transaction in which the error occurred).
So, again, in the conf file, is this what you recommend:
fsync=off
No.
max_connections=100
Yes, depending on your usage patterns.
The problem with simple CHECK constraints is that they can only
reference the primary key in another table.
Not so. Or you need to explain what you mean by "simple
CHECK constraints".
I am happy to do this, but I don't see an ALTER DATABASE command.
ALTER DATABASE is there, of course, but it doesn't help you.
I would really like not to have to execute the CREATE DATABASE command
again!
You'll have to, unfortunately, I fear. When you go about it, take the
opportunity to make sure the locale and encoding settings of initdb
are compatible with a UTF8 database.
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Thank you AM. Very useful note; I much appreciate the info you shared.
About COPY, I have two simple questions:
1. Is there anything like an ALTER DATABASE command? I would like to
change the character set without having to recreate the DATABASE
again!
2. Also, when I do a mysqldump I seem to be ending up with "\r" in my
lines, or so the COPY command tells me:
ERROR: literal carriage return found in data
HINT: Use "\r" to represent carriage return.
CONTEXT: COPY links, line 962974
But afaik, there is nothing in my data in MySQL that contains
carriage-return characters! I have checked and rechecked the DB for
such characters. I cannot open that text file to check because it
contains a lot of data (from about 7 million records). So I tried a
"grep '\r' FILENAME", but that doesn't help either, because of course
every line ends with a newline character; that is what separates the
lines! Is there anything else I can do to make sure the mysqldump data
comes through? Perhaps separate fields by TABs and lines by a specific
character such as "`"?
TIA for any thoughts.
On 08/14/07 10:09, Phoenix Kiula wrote:
On 14/08/07, Gregory Stark <stark@enterprisedb.com> wrote:
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
[snip]
QUESTION1:
Is this it? Or am I missing something in terms of execution? We use
Perl on our site but may be gradually switching to PHP at some point.
Will the above plan of execution be ok?
A transaction is a bunch of queries which you want to all get committed or
aborted together. The expensive step in Postgres is the COMMIT. Postgres does
an fsync which forces the data to go to disk (if you're using good hardware,
and don't have fsync=off). That takes from 5-15ms depending on how much data
and how fast your drives are.
If I am reading this right, does this mean it is probably better to
leave fsync as "fsync=off" on production machines? Also, is COMMIT
automatic for my queries? In some minor testing I did (not scientific
at all), some queries through Perl did not update the database at
all. I had "fsync=off" in my conf file, and there was no COMMIT etc. in
my SQL, just plain SQL. So I am a bit confused. What's a good starting
point?
[snip]
So, again, in the conf file, is this what you recommend:
fsync=off
I seriously doubt that's what he means.
"fsync=off" *is* faster than "fsync=on", but leaves your data at
risk in case of a hardware crash.
Turning it off during initial data load is quite common, though.
Just remember to turn it back on!!!
--
Ron Johnson, Jr.
Jefferson LA USA
Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
If I am reading this right, does this mean it is probably better to
leave fsync as "fsync=off" on production machines?
Only if you can reconstruct your data from other sources in the case of a
server crash or power failure. I wouldn't recommend it.
Also, is COMMIT automatic for my queries? In some minor testing I did (not
scientific at all), some queries through Perl did not update the
database at all. I had "fsync=off" in my conf file, and there was no COMMIT
etc. in my SQL, just plain SQL. So I am a bit confused. What's a good
starting point?
psql runs in autocommit mode by default. If you want multiple queries in a
transaction you have to issue a BEGIN statement. Drivers may do various things
by default.
Grouping more work into a single transaction makes the delay for the fsync at
COMMIT time less of a problem.
Agree. That's what I am trying to do: include as many UPDATEs etc. into
the same TRANSACTION block. But my worry is what I have read about
autocommit and how it is enabled by default in PostgreSQL 8.
Will specifying explicit BEGIN and COMMIT blocks only commit when I
want the DB to, or will each UPDATE in the middle of this block get
executed?
Sorry if this is a naive question. I am reading up as much as I can.
If you issue a BEGIN then nothing gets committed until you issue a COMMIT. If
anything happens in the meantime then everything you've done since the BEGIN
disappears.
For batch work like loading it makes sense to handle 100-1000 records
per transaction. But for most purposes you want to group things together based
on what you want to happen if an error occurs. Group together into a single
transaction precisely the changes that you want to be committed together or
rolled back together. Don't structure your program around the performance
issues.
For the remaining questions I would say you need to experiment. Perhaps others
will have more ideas.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Phoenix Kiula wrote:
Agree. That's what I am trying to do: include as many UPDATEs etc. into
the same TRANSACTION block. But my worry is what I have read about
autocommit and how it is enabled by default in PostgreSQL 8.
Will specifying explicit BEGIN and COMMIT blocks only commit when I
want the DB to, or will each UPDATE in the middle of this block get
executed?
The database doesn't have autocommit, AFAIK.
Some of the clients have, but it only applies if you don't put your SQL
statements between BEGIN; and COMMIT; (or ROLLBACK;) statements.
I never really use anything but psql for a client, so I can't really say
how other clients (pgadmin, for example) handle this.
Incidentally, most data integrity checks are handled with CHECK constraints
and FOREIGN KEY constraints rather than manual triggers. They're both easier
and cheaper.
The problem with simple CHECK constraints is that they can only
reference the primary key in another table. What if I want more than
one column to be the same as the referenced table, but do not want to
have a compound primary key in the referenced table? From reading and
re-reading the manual, I don't think FKs allow for this. Only primary
key references are supported.
You're confusing CHECK constraints and FOREIGN KEY constraints. They're
different things ;)
CHECK constraints verify that data in a certain column matches a certain
condition. I'm not sure they can reference columns in other tables,
unless you wrap those checks in stored procedures maybe...
For example:
CREATE TABLE test (
age int NOT NULL CHECK (age > 0)
);
Next to that, you can define DOMAINs - basically your own customized
data types that can follow _your_ rules. Admittedly I have never done
that yet, but it's supposed to be one of the key features of the
relational model (I've seen claims that you're actually not supposed to
use the base types, but define domains for all your data types).
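Off the top of my head, and untested, it would look something like this:
CREATE DOMAIN positive_int AS integer
    CHECK (VALUE > 0);
CREATE TABLE test2 (
    -- the domain's CHECK applies wherever the type is used
    age positive_int NOT NULL
);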
*And* you can define compound foreign key constraints,
for example:
CREATE TABLE employee (
    employee_id serial NOT NULL,
    company_id integer NOT NULL,
    name text NOT NULL,
    CONSTRAINT employee_pk
        PRIMARY KEY (employee_id, company_id)
);
CREATE TABLE division (
    employee_id integer NOT NULL,
    company_id integer NOT NULL,
    name text NOT NULL,
    CONSTRAINT division_fk
        FOREIGN KEY (employee_id, company_id)
        REFERENCES employee
        -- the referencing columns are NOT NULL, so ON DELETE SET NULL
        -- would fail at delete time; cascade instead
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
Also a nice trick, when performing DDL statements (CREATE TABLE and
friends), you can wrap them in a transaction and commit (or rollback) if
you like the end result (or not). I believe the only exception to that
rule is CREATE DATABASE.
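For example:
BEGIN;
CREATE TABLE scratch (id integer);
-- changed our minds; as far as other sessions are concerned,
-- the table never existed
ROLLBACK;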
--
Alban Hertroys
alban@magproductions.nl
magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede
// Integrate Your World //
On 8/14/07, Alban Hertroys <alban@magproductions.nl> wrote:
*And* you can define compound foreign key constraints,
for example:
CREATE TABLE employee (
    employee_id serial NOT NULL,
    company_id integer NOT NULL,
    name text NOT NULL,
    CONSTRAINT employee_pk
        PRIMARY KEY (employee_id, company_id)
);
CREATE TABLE division (
    employee_id integer NOT NULL,
    company_id integer NOT NULL,
    name text NOT NULL,
    CONSTRAINT division_fk
        FOREIGN KEY (employee_id, company_id)
        REFERENCES employee
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
You can also have multiple foreign keys to different tables, and to
non-primary keys, as long as they are pointing to columns with a
unique constraint on them.
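For example (made-up tables):
CREATE TABLE company (
    company_id integer PRIMARY KEY,
    tax_code   text NOT NULL UNIQUE
);
CREATE TABLE invoice (
    invoice_id       integer PRIMARY KEY,
    -- referencing a non-primary-key column is fine because
    -- tax_code carries a unique constraint
    company_tax_code text NOT NULL REFERENCES company (tax_code)
);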
Also a nice trick, when performing DDL statements (CREATE TABLE and
friends), you can wrap them in a transaction and commit (or rollback) if
you like the end result (or not). I believe the only exception to that
rule is CREATE DATABASE.
One of my all time favorite features of pgsql.
CREATE TABLESPACE is also non-transactional.
You're confusing CHECK constraints and FOREIGN KEY constraints. They're
different things ;)
[snip]
*And* you can define compound foreign key constraints, for example: [snip]
Thank you for this detailed explanation Alban. But I want to put
FK constraints on table2 referencing a column in table1 whose
values are not unique.
I just want row data to be consistent for the same ID. Yes, this is
repetitive and enough to rile DB purists, but it has its uses (for
performance in certain reporting queries).
Related Questions:
1. Should I explore views for this? I am very skeptical about them,
coming from MySQL, as the performance of MySQL views is horrendous.
Besides, if they are updated every time, there's little use for a view
in the first place; I may as well simply query the table -- or is this
wrong? Does the UPDATE only lock and commit to the table, and then the
view gets auto-updated?
2. Or, I could do this with triggers, and now I realize also with
"rules" (CREATE RULE). Which are faster, rules or triggers, or are they
similar in speed? Basically I want the rule/trigger to cascade updates
to table1.col1 and table1.col2 to similar columns in table2.
I will surely be exploring views, and reading more of this:
http://www.postgresql.org/docs/8.2/interactive/rules-views.html , but
I just wanted to know what the usual thoughts on this are. Are views
updated as soon as their underlying table(s) are updated? Can I control
the duration or timing of their update? I searched for "materialized
views", after having seen that word on the performance list, but most
of the search results and the discussions on that forum are beyond my
comprehension!!
Would appreciate any thoughts on performance of views. PGSQL seems to
treat views just like tables, so I wonder if there's any performance
gain!
If you issue a BEGIN then nothing gets committed until you issue a COMMIT. If
anything happens in the meantime then everything you've done since the BEGIN
disappears.
There are some cases where I would like to bunch queries into a
transaction purely for speed purposes, but they're not interdependent
for integrity. E.g.,
BEGIN TRANSACTION;
UPDATE1;
UPDATE2;
UPDATE3;
COMMIT;
If UPDATE2 fails because it, say, violates a foreign key constraint,
then so be it. I want UPDATE3 to go ahead. Is this not possible, or is
there an option I can use to do these kind of independent-query
transactions?
In response to "Phoenix Kiula" <phoenix.kiula@gmail.com>:
If you issue a BEGIN then nothing gets committed until you issue a COMMIT. If
anything happens in the meantime then everything you've done since the BEGIN
disappears.
There are some cases where I would like to bunch queries into a
transaction purely for speed purposes, but they're not interdependent
for integrity. E.g.,
BEGIN TRANSACTION;
UPDATE1;
UPDATE2;
UPDATE3;
COMMIT;
If UPDATE2 fails because it, say, violates a foreign key constraint,
then so be it. I want UPDATE3 to go ahead. Is this not possible, or is
there an option I can use to do these kind of independent-query
transactions?
That's not possible, by design.
However, your application could keep track of which queries have run, and
if one fails, start the transaction over without the failing query.
But the rule is, if any query within the transaction errors, then all queries
within the transaction are rolled back.
--
Bill Moran
http://www.potentialtech.com
Phoenix Kiula wrote on 14.08.2007 19:46:
There are some cases where I would like to bunch queries into a
transaction purely for speed purposes, but they're not interdependent
for integrity. E.g.,
BEGIN TRANSACTION;
UPDATE1;
UPDATE2;
UPDATE3;
COMMIT;
If UPDATE2 fails because it, say, violates a foreign key constraint,
then so be it. I want UPDATE3 to go ahead. Is this not possible, or is
there an option I can use to do these kind of independent-query
transactions?
You could do this with savepoints which are a kind of sub-transaction inside a
"bigger" transaction.
e.g.:
BEGIN TRANSACTION;
SAVEPOINT sp1;
UPDATE1;
IF (failed) rollback to savepoint sp1;
SAVEPOINT sp2;
UPDATE2;
IF (failed) rollback to savepoint sp2;
COMMIT;
Details here: http://www.postgresql.org/docs/8.2/static/sql-savepoint.html
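Spelled out as real SQL (the statements are invented, and the ROLLBACK
TO SAVEPOINT lines would only be issued by your client after it actually
sees an error):
BEGIN;
SAVEPOINT sp1;
UPDATE links SET clicks = clicks + 1 WHERE id = 101;
-- on error: ROLLBACK TO SAVEPOINT sp1;
SAVEPOINT sp2;
UPDATE links SET clicks = clicks + 1 WHERE id = 102;
-- on error: ROLLBACK TO SAVEPOINT sp2;
COMMIT;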
But I doubt that this would be faster than doing a transaction per update.
Thomas
On 8/14/07, Bill Moran <wmoran@potentialtech.com> wrote:
But the rule is, if any query within the transaction errors, then all
queries
within the transaction are rolled back.
This is the default behaviour, but with psql's ON_ERROR_ROLLBACK setting
the behaviour can be changed. See
http://www.postgresql.org/docs/8.2/interactive/app-psql.html
Regards
MP
You could do this with savepoints which are a kind of sub-transaction inside a
"bigger" transaction.
e.g.:
BEGIN TRANSACTION;
SAVEPOINT sp1;
UPDATE1;
IF (failed) rollback to savepoint sp1;
SAVEPOINT sp2;
UPDATE2;
IF (failed) rollback to savepoint sp2;
Thanks Thomas, this is a great feature. Even if I am not looking for it
right now, I bet I can use it at some point!
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
There are some cases where I would like to bunch queries into a
transaction purely for speed purposes, but they're not interdependent
for integrity. E.g.,
How do you know you need to do this for speed if you haven't run it yet? I
would suggest you build your application around the application needs first,
then later look at how to optimize it.
Remember the two rules of optimization:
1) Don't
2) (for experts only) Don't yet
The only case where you should consider batching together transactions like
that is if you're processing a batch data load of some kind. In that case you
have a large volume of updates and they're all single-threaded. But usually in
that case you want to abort the whole load if you have a problem.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On 8/14/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
*And* you can define compound foreign key constraints,
Thank you for this detailed explanation Alban. But I want to put
FK constraints on table2 referencing a column in table1 whose
values are not unique.
I just want row data to be consistent for the same ID. Yes, this is
repetitive and enough to rile DB purists, but it has its uses (for
performance in certain reporting queries).
I'm not sure I understand this. If the set of values you want to
reference is not unique, what are the semantics of this reference?
What should happen if one of those sets gets deleted, for instance?
Perhaps you mean table1 has columns "id" and "col2", and the "col2"
values are not unique, but "id" and "col2" together are? In that case
you can simply put a UNIQUE constraint on them together, and use both
columns in a single foreign key constraint from table2.
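Roughly, reusing your table1/table2 naming (and assuming table2 carries
matching id and col2 columns):
ALTER TABLE table1
    ADD CONSTRAINT table1_id_col2_key UNIQUE (id, col2);
ALTER TABLE table2
    ADD CONSTRAINT table2_table1_fk
    FOREIGN KEY (id, col2) REFERENCES table1 (id, col2);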
1. Should I explore views for this? I am very skeptical about them,
coming from MySQL, as the performance of MySQL views is horrendous.
Besides, if they are updated every time, there's little use for a view
in the first place; I may as well simply query the table -- or is this
wrong? Does the UPDATE only lock and commit to the table, and then the
view gets auto-updated?
A view is simply an alternate presentation of data in one or more
tables. It's a persistent query, and the performance is effectively
the same as the query itself. (More on "materialized views" below.)
In regard to locking, it sounds like you're thinking in MYISAM terms;
PostgreSQL uses MVCC instead of locking. Two UPDATEs attempted at the
same time may cause one to wait for the other (if both touch the same
rows), but during this time all readers (e.g. SELECT) will continue to
run without waiting. If you were running into concurrent performance
issues with MYISAM due to locking, PostgreSQL might surprise you.
2. Or, I could do this with triggers, and now I realize also with
"rules" (CREATE RULE). Which are faster, rules or triggers, or are they
similar in speed? Basically I want the rule/trigger to cascade updates
to table1.col1 and table1.col2 to similar columns in table2.
They are simply different mechanisms. Rules rewrite a query during
the parsing phase, and then run the eventual query tree as if you'd
entered it yourself. Triggers are procedural actions in response to
events. Normally you pick one based on the semantics of what you want
to do.
If you can use foreign keys as above, ON UPDATE CASCADE will probably
do what you want without having to do anything else.
I will surely be exploring views, and reading more of this:
http://www.postgresql.org/docs/8.2/interactive/rules-views.html , but
I just wanted to know what the usual thoughts on this are. Are views
updated as soon as their underlying table(s) are updated? Can I control
the duration or timing of their update? I searched for "materialized
views", after having seen that word on the performance list, but most
of the search results and the discussions on that forum are beyond my
comprehension!!
Yeah, you're confusing standard views with "materialized views". A
standard view is basically a persistent query, and does not store any
data itself. PostgreSQL implements them with rules. When you SELECT
against a view, the query rewriter combines your conditions with the
stored SELECT used to create the view, and runs the final query
against the table(s) you created the view from, just as if you'd
entered it yourself. If the original query against the table(s) is
complex, a view makes a very nice way to simplify things for
applications.
If you want to create an "updateable view", where applications can
INSERT/UPDATE/DELETE against the view itself, you need to add rules to
it to translate operations on the presented view columns into
operations on the table(s) behind them.
A "materialized view" is basically a view that stores copies of its
data. This is useful in situations where you need some kind of cache,
possibly because the query behind the view takes a long time to
complete. PostgreSQL does not have built in support for materialized
views, but rules and triggers can be used to create them. The
discussions you encountered revolve around the finer points of doing
that.
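To make that concrete, here is one tiny hand-rolled sketch (all names
are invented, a detail table called clicks is assumed to exist, plpgsql
must be installed in the database, and concurrency corner cases are
ignored): a summary table kept fresh by a trigger.
CREATE TABLE click_totals (
    link_id bigint PRIMARY KEY,
    total   bigint NOT NULL DEFAULT 0
);
CREATE FUNCTION bump_click_total() RETURNS trigger AS $$
BEGIN
    -- keep the summary row in step with the detail table
    UPDATE click_totals SET total = total + 1 WHERE link_id = NEW.link_id;
    IF NOT FOUND THEN
        INSERT INTO click_totals (link_id, total) VALUES (NEW.link_id, 1);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER clicks_to_totals
    AFTER INSERT ON clicks
    FOR EACH ROW EXECUTE PROCEDURE bump_click_total();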
Would appreciate any thoughts on performance of views. PGSQL seems to
treat views just like tables, so I wonder if there's any performance
gain!
In general, a view is performance neutral: it's just a mechanism for
simplifying presentation of data.
It also has uses for security, since you can create a view and grant
roles access to it while still denying them access to the underlying
tables. This could be used to hide a particular table column, for
instance. It doesn't sound like this is of any use for your
application though.
On 8/14/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
Thank you for this detailed explanation Alban. But I want to put
FK constraints on table2 referencing a column in table1 whose
values are not unique.
I just want row data to be consistent for the same ID. Yes, this is
repetitive and enough to rile DB purists, but it has its uses (for
performance in certain reporting queries).
Then you need to define a lookup table, and have both of your tables
reference it by foreign key. You can create an update trigger on one
of the child tables to put a row into the lookup table if it doesn't
exist. If I'm not clear, let me know.
1. Should I explore views for this? I am very skeptical about them,
coming from MySQL, as the performance of MySQL views is horrendous.
Besides, if they are updated every time, there's little use for a view
in the first place; I may as well simply query the table -- or is this
wrong? Does the UPDATE only lock and commit to the table, and then the
view gets auto-updated?
In pgsql, views are actually empty tables that are defined by a SQL
statement and expanded by rules when you select from the view. I.e.
CREATE VIEW abc AS SELECT * FROM xyz becomes an empty table abc which
has a rule for selects that runs SELECT * FROM xyz when you access it.
The performance of SELECT * FROM abc will be almost exactly the same
as SELECT * FROM xyz, except for some very small overhead from the
rules engine.
The real uses for views are to allow you to reduce query complexity in
the client. Suppose you have a query that joins and / or unions a
dozen tables with really complex join logic. you can just wrap it in
a view, and when you select from the view, postgresql will execute the
real query behind it as though you passed it in.
2. Or, I could do this with triggers, and now I realize also with
"rules" (CREATE RULE). Which are faster, rules or triggers, or are they
similar in speed? Basically I want the rule/trigger to cascade updates
to table1.col1 and table1.col2 to similar columns in table2.
I think you're talking about updatable views, which you can build with
postgresql. Rules can let you do this pretty easily.
I just wanted to know what the usual thoughts on this are. Are views
updated as soon as their underlying table(s) are updated?
Yep, because views are just enclosed queries.
Note that you CAN do materialized views with pgsql. Once you've
gotten familiar with regular PostgreSQL stuff, look up materialized
views for PostgreSQL again on Google. They're not that hard really,
but most of the time you really don't need them.
Would appreciate any thoughts on performance of views. PGSQL seems to
treat views just like tables, so I wonder if there's any performance
gain!
nope, but no great loss either.
----- Original Message -----
From: "madhtr" <madhtr@schif.org>
To: "Tom Lane" <tgl@sss.pgh.pa.us>
Cc: <pgsql-general@postgresql.org>
Sent: Wednesday, February 14, 2007 22:33
Subject: Re: [GENERAL] pqlib in c++: PQconnectStart PQconnectPoll
Another line of thought, given the reading-between-the-lines conclusion
that you are trying to use PG 7.3 libraries on Windows, is that there
was something broken in the async-connect code back then on that
platform. If you really are trying to do that, do yourself a favor and
move to 8.0 or later. Nobody's going to be very interested in fixing
7.3. (I did try your program with 7.3 on Unix, though, and it seemed
fine except the error message was spelled a bit differently.)
Ty, I'll check that ... :)
Rats ... my source version is 8.2.3, but psql returns 8.1.4 for SELECT
version().
Back to the drawing board ... It worked for you, I just have to make it work
for me ... Maybe I'll go through the source a bit. Ty :)
madhtr
Hi Tom :)
Is this:
libpq-fe.h:
/**
$PostgreSQL: pgsql/src/interfaces/libpq/libpq-fe.h,v 1.134 2006/10/04
00:30:13 momjian Exp $
**/
the latest version of libpq?
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "madhtr" <madhtr@schif.org>
Cc: <pgsql-general@postgresql.org>
Sent: Tuesday, August 14, 2007 21:42
Subject: Re: [GENERAL] pqlib in c++: PQconnectStart PQconnectPoll
"madhtr" <madhtr@schif.org> writes:
From: "Tom Lane" <tgl@sss.pgh.pa.us>
... although it takes a good long while (several seconds) because of the
"sleep(1)" in the interaction with the postmaster.
Sleep(1) should be only 1/1000 of a second. I do that so I don't hammer the
processor with my while loop when I am not using a select().
Ah. I was interpreting it in Unix terms, where sleep() measures in
seconds. With a wait of a few msec it might not be too intolerable.
regards, tom lane