general questions on Postgresql and deployment on win32 platform
I have gone through the documentation that come with version 8 beta 4 and I
have a number of questions.
(1) backup/restore
I notice that in the documentation, it seems to suggest that an online
backup, made via pg_start_backup() and pg_stop_backup() functions would back
up all databases running on the server, rather than any specific instance.
If that's the case, is it correct to state that WAL is instance specific
rather than database specific?
Does it also mean that I must back up and restore all the databases (or
database cluster in Postgresql Speak) even if I am only interested in 1
database?
(2) WAL location
I can't seem to find a way to move WAL log to a different location other
than data/pg_xlog. On *nix platform, it's a simple matter of creating a
symbolic link. But on win32, there is no equivalent. However, even under
*nix system, I believe symbolic link can only be created for directories on
the same hard drive. This seems less than optimal. Typically, one would
place database files on RAID 5 drives (to maximize random access speed) and
log files on mirrored drives (to maximize sequential access speed).
(3) Trigger
Is it correct to state that old and new keywords are only available to
PL/pgSQL function (but not a SQL function) and to row level trigger? Is
there anyway to refer to old and new rows for statement level trigger? The
visibility rule is also quite confusing. Can someone please clarify it in
terms of before/after and row/statement level triggers (i.e., 4 possible
permutations).
(4) Process/thread on win32 platform
On win32 platform, a process simply provides environment for a thread to run
(whereas a thread is more like a *nix process, win32 has something called
fibre that is more lightweight than thread). Typically, it's not recommended
to start a new process per connection for DBMS running on win32 lest its
performance would suffer. For example, SQL Server actually spawns threads to
manage connections rather than processes. Does the Postgresql on win32
platform use process or thread when a connection is made? I notice that the
configuration file lets you specify maximum number of connections. If I have
a very high number of persistent concurrent users (thus preventing
connection pooling from helping much), would performance suffer
significantly?
Thanks for any responses.
Calvin Wood wrote:
Does it also mean that I must back up and restore all the databases (or
database cluster in Postgresql Speak) even if I am only interested in 1
database?
You can use pg_dump to backup individual databases (or tables etc). A
file-level backup does require all databases in one "cluster" (as we
call it).
In fact, if you can use pg_dump to backup your databases I would
recommend doing so. It gives you much more control over restoring
partial databases or to other systems.
(2) WAL location
I can't seem to find a way to move WAL log to a different location other
than data/pg_xlog.
You can move everything else (using tablespaces). There is a symlink
maker tool for Windows (sorry, can't remember where from) but installing
to the logging drivers and then setting the default tablespace to the
RAID drives is probably simplest.
(3) Trigger
Is it correct to state that old and new keywords are only available to
PL/pgSQL function (but not a SQL function) and to row level trigger? Is
there anyway to refer to old and new rows for statement level trigger? The
visibility rule is also quite confusing. Can someone please clarify it in
terms of before/after and row/statement level triggers (i.e., 4 possible
permutations).
There has been some discussion of what OLD/NEW would mean in a statement
level trigger. I don't believe any work has been done on this yet
though. Trigger functions can be written in plpgsql, plperl, pltcl,
plpython or C (although I'm not sure which of these are supported on
Win32 yet).
As far as change visibility is concerned, statement triggers are simple.
Statement/Before = no changes
Statement/After = all changes
For row triggers
Row/Before = some changes (as the statement progresses)
Row/After = all changes
So - for row/before, the first row affected in a 1000-row update will
see the old version of all 1000 rows. The last row affected will see the
new version of 999 rows. Which order the rows are updated in is not defined.
(4) Process/thread on win32 platform
Does the Postgresql on win32
platform use process or thread when a connection is made? I notice that the
configuration file lets you specify maximum number of connections. If I have
a very high number of persistent concurrent users (thus preventing
connection pooling from helping much), would performance suffer
significantly?
PostgreSQL is process-based and this is unlikely to change (you can read
the hackers mailing list archives for details). If you have a very high
number of persistent concurrent users then win32 is probably not the
best platform to run on - your hardware is too constrained. However, in
your scenario process-based systems don't really suffer - the real
bottleneck is in starting up a new process which is noticably slower on
Win32 than most other platforms.
Since 8.0 will be the first release of PG on Windows, I wouldn't expect
performance to be as good as on the various *nix ports. It's probably
too early to say where bottlenecks will be - I would recommend testing
against your specific requirements. If you could share the results of
such tests, I'm sure they'd be gratefully received by the community.
--
Richard Huxton
Archonet Ltd
I have gone through the documentation that come with version
8 beta 4 and I have a number of questions.(1) backup/restore
I notice that in the documentation, it seems to suggest that
an online backup, made via pg_start_backup() and
pg_stop_backup() functions would back up all databases
running on the server, rather than any specific instance.
If that's the case, is it correct to state that WAL is
instance specific rather than database specific?Does it also mean that I must back up and restore all the
databases (or database cluster in Postgresql Speak) even if I
am only interested in 1 database?
If you want to use PITR based backup/recovery, yes. If you use the
"classic" way of doing pg_dump of the database, then you can do it on
individual database location.
(2) WAL location
I can't seem to find a way to move WAL log to a different
location other than data/pg_xlog. On *nix platform, it's a
simple matter of creating a symbolic link. But on win32,
there is no equivalent. However, even under *nix system, I
believe symbolic link can only be created for directories on
the same hard drive. This seems less than optimal. Typically,
one would place database files on RAID 5 drives (to maximize
random access speed) and log files on mirrored drives (to
maximize sequential access speed).
You can use either Directory Junctions (use the junction tool from
www.sysinternals.com for example), or you can mount the new disk
directly into the WAL directory (this can be done using the disk
administrator mmc snapin, or using the diskpart.exe utility)
(3) Trigger
Is it correct to state that old and new keywords are only
available to PL/pgSQL function (but not a SQL function) and
to row level trigger? Is there anyway to refer to old and new
rows for statement level trigger? The visibility rule is also
quite confusing. Can someone please clarify it in terms of
before/after and row/statement level triggers (i.e., 4
possible permutations).
Can't speak for this one specifically.
(4) Process/thread on win32 platform
On win32 platform, a process simply provides environment for
a thread to run (whereas a thread is more like a *nix
process, win32 has something called fibre that is more
lightweight than thread). Typically, it's not recommended to
start a new process per connection for DBMS running on win32
lest its performance would suffer. For example, SQL Server
actually spawns threads to manage connections rather than
processes. Does the Postgresql on win32 platform use process
or thread when a connection is made?
Postgresql on win32 uses a process/connection model. This is far from
ideal, but it's done to make sure you actually use the same database as
you do on Unix.
You can regain a large part of the loss by using pooled connections.
(BTW, SQL Server doesn't necessarily use a thread/connection model
either - it can use both fibres and some other hybrids. But you're right
in that it only uses one process)
I notice that the
configuration file lets you specify maximum number of
connections. If I have a very high number of persistent
concurrent users (thus preventing connection pooling from
helping much), would performance suffer significantly?
It would suffer some, but not very much I think. It uses more shared
memory, but that's not a big issue on win32. Some scans have to be made
over larger memory areas, and that's where you get a loss. I don't
*think* it would be significant, but no testing has been done for that
AFAIK.
//Magnus
Import Notes
Resolved by subject fallback
Calvin Wood <calvin.wood@gmail.com> writes:
symbolic link. But on win32, there is no equivalent. However, even under
*nix system, I believe symbolic link can only be created for directories on
the same hard drive. This seems less than optimal. Typically, one would
place database files on RAID 5 drives (to maximize random access speed) and
log files on mirrored drives (to maximize sequential access speed).
On Unix, a symbolic link can point anywhere. It's hard links that are
limited to the same filesystem (not necessarily a disk drive in the
hardware sense).
-Doug