replace all with * in pg_hba.conf
If people are happy with Tom's suggestion of using '*' instead of 'all'
in pg_hba.conf I will prepare a patch for it.
(I will also replace the ugly long IP6 localhost netmask with a CIDR mask).
cheers
andrew
Andrew Dunstan wrote:
If people are happy with Tom's suggestion of using '*' instead of
'all' in pg_hba.conf I will prepare a patch for it.
Well, while we're breaking stuff in the name of improvement, what about
databases named "*" or databases with spaces in their names?
Peter Eisentraut wrote:
Andrew Dunstan wrote:
If people are happy with Tom's suggestion of using '*' instead of
'all' in pg_hba.conf I will prepare a patch for it.
Well, while we're breaking stuff in the name of improvement, what about
databases named "*" or databases with spaces in their names?
Good point. Perhaps we need to provide for an escaping mechanism in the
routines that parse the file, although personally I have little sympathy
for anyone who names a database '*'. I think it comes into the category
of "Doctor, it hurts when I do this" ... "Then stop doing that." Spaces
are a more likely problem, especially when we get W32 native users.
cheers
andrew
I wrote:
Peter Eisentraut wrote:
Andrew Dunstan wrote:
If people are happy with Tom's suggestion of using '*' instead of
'all' in pg_hba.conf I will prepare a patch for it.
Well, while we're breaking stuff in the name of improvement, what
about databases named "*" or databases with spaces in their names?
Good point. Perhaps we need to provide for an escaping mechanism in
the routines that parse the file, although personally I have little
sympathy for anyone who names a database '*'. I think it comes into
the category of "Doctor, it hurts when I do this" ... "Then stop doing
that." Spaces are a more likely problem, especially when we get W32
native users.
Looking at the code I discover that there is already provision covering
spaces etc., because you can quote names. It's even documented ;-)
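For instance, the existing quoting support already copes with awkward names like these (a hypothetical pg_hba.conf fragment; the database and user names are invented for illustration):

```
# names containing spaces must be double-quoted
local   "sales db"     "John Doe"     trust
```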
The minimal disturbance change might be to teach the parser to
distinguish between a quoted 'all' and an unquoted 'all', and forget the
'*' idea. Alternatively, do the same sort of thing, but replacing 'all'
with '*'. A patch for the first would be quite tiny - similar for '*'
except for extra doc and sample file changes.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
The minimal disturbance change might be to teach the parser to
distinguish between a quoted 'all' and an unquoted 'all', and forget the
'*' idea.
Probably we ought to go with that, on backwards-compatibility grounds.
regards, tom lane
On Fri, 19 Dec 2003, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
The minimal disturbance change might be to teach the parser to
distinguish between a quoted 'all' and an unquoted 'all', and forget the
'*' idea.
Probably we ought to go with that, on backwards-compatibility grounds.
why not do both, but deprecate the use of all in the docs? say with an
eventual goal of removing the use of all altogether in 2 releases?
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
The minimal disturbance change might be to teach the parser to
distinguish between a quoted 'all' and an unquoted 'all', and forget the
'*' idea.
Probably we ought to go with that, on backwards-compatibility grounds.
OK, here's the patch. Should we also do this for "sameuser" and
"samegroup" for the sake of completeness?
cheers
andrew
Attachment: quote-all.patch (text/plain)
Index: hba.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/libpq/hba.c,v
retrieving revision 1.118
diff -c -w -r1.118 hba.c
*** hba.c 5 Dec 2003 15:50:31 -0000 1.118
--- hba.c 19 Dec 2003 15:26:18 -0000
***************
*** 97,102 ****
--- 97,103 ----
char *end_buf = buf + (bufsz - 1);
bool in_quote = false;
bool was_quote = false;
+ bool saw_quote = false;
/* Move over initial whitespace and commas */
while ((c = getc(fp)) != EOF && (pg_isblank(c) || c == ','))
***************
*** 149,155 ****
--- 150,159 ----
was_quote = false;
if (c == '"')
+ {
in_quote = !in_quote;
+ saw_quote = true;
+ }
c = getc(fp);
}
***************
*** 161,167 ****
--- 165,179 ----
if (c != EOF)
ungetc(c, fp);
}
+
+
+ /* append newline to a magical "all" */
+
+ if ( !saw_quote && strncmp(start_buf,"all",3) == 0 )
+ *buf++ = '\n';
+
*buf = '\0';
+
}
/*
***************
*** 446,452 ****
return true;
}
else if (strcmp(tok, user) == 0 ||
! strcmp(tok, "all") == 0)
return true;
}
--- 458,464 ----
return true;
}
else if (strcmp(tok, user) == 0 ||
! strcmp(tok, "all\n") == 0)
return true;
}
***************
*** 463,469 ****
for (tok = strtok(param_str, MULTI_VALUE_SEP); tok != NULL; tok = strtok(NULL, MULTI_VALUE_SEP))
{
! if (strcmp(tok, "all") == 0)
return true;
else if (strcmp(tok, "sameuser") == 0)
{
--- 475,481 ----
for (tok = strtok(param_str, MULTI_VALUE_SEP); tok != NULL; tok = strtok(NULL, MULTI_VALUE_SEP))
{
! if (strcmp(tok, "all\n") == 0)
return true;
else if (strcmp(tok, "sameuser") == 0)
{
Index: pg_hba.conf.sample
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/libpq/pg_hba.conf.sample,v
retrieving revision 1.47
diff -c -w -r1.47 pg_hba.conf.sample
*** pg_hba.conf.sample 13 Sep 2003 16:43:38 -0000 1.47
--- pg_hba.conf.sample 19 Dec 2003 15:26:18 -0000
***************
*** 35,40 ****
--- 35,44 ----
# encrypted passwords. OPTION is the ident map or the name of the PAM
# service.
#
+ # Database and user names containing spaces, commas, quotes and other special
+ # characters can be quoted. Quoting "all" makes the name lose its special
+ # character, and just match a database or username called "all".
+ #
# This file is read on server startup and when the postmaster receives
# a SIGHUP signal. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
***************
*** 59,62 ****
# IPv4-style local connections:
host all all 127.0.0.1 255.255.255.255 trust
# IPv6-style local connections:
! host all all ::1 ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff trust
--- 63,66 ----
# IPv4-style local connections:
host all all 127.0.0.1 255.255.255.255 trust
# IPv6-style local connections:
! host all all ::1/128 trust
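The marking trick the patch uses can be sketched in isolation like this (a hypothetical standalone sketch, not the actual hba.c code; `mark_if_keyword` is an invented name). Because a newline can never survive tokenizing, appending one to an unquoted keyword lets later `strcmp` checks distinguish the magical `all` from a quoted name `"all"`:

```c
#include <string.h>
#include <stdbool.h>

/* Sketch of the patch's technique: append '\n' (a character that can
 * never appear inside a parsed token) to unquoted keywords, so that
 * strcmp(tok, "all\n") matches only the keyword, never a quoted
 * database or user that happens to be named "all". */
static void
mark_if_keyword(char *token, bool saw_quote)
{
    if (saw_quote)
        return;                 /* quoted tokens are always literal names */
    if (strcmp(token, "all") == 0 ||
        strcmp(token, "sameuser") == 0 ||
        strcmp(token, "samegroup") == 0)
        strcat(token, "\n");    /* caller must reserve one spare byte */
}
```

Note the spare byte is why the revised patch shrinks `end_buf` from `bufsz - 1` to `bufsz - 2`.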
Marc G. Fournier wrote:
On Fri, 19 Dec 2003, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
The minimal disturbance change might be to teach the parser to
distinguish between a quoted 'all' and an unquoted 'all', and forget the
'*' idea.
Probably we ought to go with that, on backwards-compatibility grounds.
why not do both, but deprecate the use of all in the docs? say with an
eventual goal of removing the use of all altogether in 2 releases?
The extra value is fairly low IMNSHO if we can distinguish between a
magical and a non-magical 'all' - see the patch I just submitted.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Marc G. Fournier wrote:
why not do both, but deprecate the use of all in the docs? say with an
eventual goal of removing the use of all altogether in 2 releases?
The extra value is fairly low IMNSHO if we can distinguish between a
magical and a non-magical 'all' - see the patch I just submitted.
Also, your point about the special sameuser and samegroup keywords is a
good one. We should make those all work consistently (ie, quoting makes
it not a keyword). Going in the "*" direction would only make sense
if that were the only special case --- but it isn't. I don't think we
want to start choosing random symbols for sameuser, samegroup, and other
stuff we might think of in future.
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Marc G. Fournier wrote:
why not do both, but deprecate the use of all in the docs? say with an
eventual goal of removing the use of all altogether in 2 releases?
The extra value is fairly low IMNSHO if we can distinguish between a
magical and a non-magical 'all' - see the patch I just submitted.
Also, your point about the special sameuser and samegroup keywords is a
good one. We should make those all work consistently (ie, quoting makes
it not a keyword). Going in the "*" direction would only make sense
if that were the only special case --- but it isn't. I don't think we
want to start choosing random symbols for sameuser, samegroup, and other
stuff we might think of in future.
Right. Revised patch sent to patches list.
cheers
andrew
I wrote:
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
The minimal disturbance change might be to teach the parser to
distinguish between a quoted 'all' and an unquoted 'all', and forget
the '*' idea.
Probably we ought to go with that, on backwards-compatibility grounds.
OK, here's the patch. Should we also do this for "sameuser" and
"samegroup" for the sake of completeness?
Revised patch for this as suggested by Tom.
cheers
andrew
Attachment: quote-all.patch (text/plain)
Index: hba.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/libpq/hba.c,v
retrieving revision 1.118
diff -c -w -r1.118 hba.c
*** hba.c 5 Dec 2003 15:50:31 -0000 1.118
--- hba.c 19 Dec 2003 17:42:20 -0000
***************
*** 87,102 ****
* token or EOF, whichever comes first. If no more tokens on line,
* return null string as *buf and position file to beginning of
* next line or EOF, whichever comes first. Allow spaces in quoted
! * strings. Terminate on unquoted commas. Handle comments.
*/
void
next_token(FILE *fp, char *buf, const int bufsz)
{
int c;
char *start_buf = buf;
! char *end_buf = buf + (bufsz - 1);
bool in_quote = false;
bool was_quote = false;
/* Move over initial whitespace and commas */
while ((c = getc(fp)) != EOF && (pg_isblank(c) || c == ','))
--- 87,105 ----
* token or EOF, whichever comes first. If no more tokens on line,
* return null string as *buf and position file to beginning of
* next line or EOF, whichever comes first. Allow spaces in quoted
! * strings. Terminate on unquoted commas. Handle comments. Treat
! * unquoted keywords that might be user names or database names
! * specially, by appending a newline to them.
*/
void
next_token(FILE *fp, char *buf, const int bufsz)
{
int c;
char *start_buf = buf;
! char *end_buf = buf + (bufsz - 2);
bool in_quote = false;
bool was_quote = false;
+ bool saw_quote = false;
/* Move over initial whitespace and commas */
while ((c = getc(fp)) != EOF && (pg_isblank(c) || c == ','))
***************
*** 149,155 ****
--- 152,161 ----
was_quote = false;
if (c == '"')
+ {
in_quote = !in_quote;
+ saw_quote = true;
+ }
c = getc(fp);
}
***************
*** 161,167 ****
--- 167,188 ----
if (c != EOF)
ungetc(c, fp);
}
+
+
+ if ( !saw_quote &&
+ (
+ strncmp(start_buf,"all",3) == 0 ||
+ strncmp(start_buf,"sameuser",8) == 0 ||
+ strncmp(start_buf,"samegroup",9) == 0
+ )
+ )
+ {
+ /* append newline to a magical keyword */
+ *buf++ = '\n';
+ }
+
*buf = '\0';
+
}
/*
***************
*** 446,452 ****
return true;
}
else if (strcmp(tok, user) == 0 ||
! strcmp(tok, "all") == 0)
return true;
}
--- 467,473 ----
return true;
}
else if (strcmp(tok, user) == 0 ||
! strcmp(tok, "all\n") == 0)
return true;
}
***************
*** 463,476 ****
for (tok = strtok(param_str, MULTI_VALUE_SEP); tok != NULL; tok = strtok(NULL, MULTI_VALUE_SEP))
{
! if (strcmp(tok, "all") == 0)
return true;
! else if (strcmp(tok, "sameuser") == 0)
{
if (strcmp(dbname, user) == 0)
return true;
}
! else if (strcmp(tok, "samegroup") == 0)
{
if (check_group(dbname, user))
return true;
--- 484,497 ----
for (tok = strtok(param_str, MULTI_VALUE_SEP); tok != NULL; tok = strtok(NULL, MULTI_VALUE_SEP))
{
! if (strcmp(tok, "all\n") == 0)
return true;
! else if (strcmp(tok, "sameuser\n") == 0)
{
if (strcmp(dbname, user) == 0)
return true;
}
! else if (strcmp(tok, "samegroup\n") == 0)
{
if (check_group(dbname, user))
return true;
***************
*** 1068,1074 ****
errmsg("cannot use Ident authentication without usermap field")));
found_entry = false;
}
! else if (strcmp(usermap_name, "sameuser") == 0)
{
if (strcmp(pg_user, ident_user) == 0)
found_entry = true;
--- 1089,1095 ----
errmsg("cannot use Ident authentication without usermap field")));
found_entry = false;
}
! else if (strcmp(usermap_name, "sameuser\n") == 0)
{
if (strcmp(pg_user, ident_user) == 0)
found_entry = true;
Index: pg_hba.conf.sample
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/libpq/pg_hba.conf.sample,v
retrieving revision 1.47
diff -c -w -r1.47 pg_hba.conf.sample
*** pg_hba.conf.sample 13 Sep 2003 16:43:38 -0000 1.47
--- pg_hba.conf.sample 19 Dec 2003 17:42:20 -0000
***************
*** 35,40 ****
--- 35,45 ----
# encrypted passwords. OPTION is the ident map or the name of the PAM
# service.
#
+ # Database and user names containing spaces, commas, quotes and other special
+ # characters can be quoted. Quoting one of the keywords "all", "sameuser" or
+ # "samegroup" makes the name lose its special character, and just match a
+ # database or username with that name.
+ #
# This file is read on server startup and when the postmaster receives
# a SIGHUP signal. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
***************
*** 59,62 ****
# IPv4-style local connections:
host all all 127.0.0.1 255.255.255.255 trust
# IPv6-style local connections:
! host all all ::1 ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff trust
--- 64,67 ----
# IPv4-style local connections:
host all all 127.0.0.1 255.255.255.255 trust
# IPv6-style local connections:
! host all all ::1/128 trust
That IPv6 cleanup is major!
! host all all ::1 ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff trust
! host all all ::1/128 trust
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
I will try to apply it within the next 48 hours.
---------------------------------------------------------------------------
Andrew Dunstan wrote:
I wrote:
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
The minimal disturbance change might be to teach the parser to
distinguish between a quoted 'all' and an unquoted 'all', and forget
the '*' idea.
Probably we ought to go with that, on backwards-compatibility grounds.
OK, here's the patch. Should we also do this for "sameuser" and
"samegroup" for the sake of completeness?
Revised patch for this as suggested by Tom.
cheers
andrew
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Patch applied. Thanks.
---------------------------------------------------------------------------
Is there a way to force database to load
a frequently-accessed table into cache and keep it there?
Thanks,
Mike.
On Fri, Jan 16, 2004 at 12:00:08PM -0500, Michael Brusser wrote:
Is there a way to force database to load
a frequently-accessed table into cache and keep it there?
If it is frequently accessed, I guess it would be in the cache
permanently...
--
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen reinoud.v@n.leeuwen.net
http://www.xs4all.nl/~reinoud
__________________________________________________
Michael Brusser <michael@synchronicity.com> writes:
Is there a way to force database to load a frequently-accessed table
into cache and keep it there?
No.
BTW, this idea has been suggested in the past, so check the archives
for the prior discussions on this topic. The usual response is that
the PostgreSQL bufmgr should already be placing hot pages into the cache,
so there isn't really a need for this mechanism. (And if the bufmgr
doesn't do this well enough, we should improve the bufmgr -- as Jan
has done for 7.5)
-Neil
On Fri, 16 Jan 2004, Michael Brusser wrote:
Is there a way to force database to load
a frequently-accessed table into cache and keep it there?
Nope. But there is a new cache buffer handler that may make it into 7.5
that would make that happen automagically.
This discussion seems likely to have a major effect on DBT-3 (DSS-type)
performance from PostgreSQL...
On Fri, 16 Jan 2004, Michael Brusser wrote:
Is there a way to force database to load
a frequently-accessed table into cache and keep it there?
Scott Marlow replied...
Nope. But there is a new cache buffer handler that may make it into
7.5
that would make that happen automagically.
The important question here is "what forces blocks out of cache?" rather
than thinking about how to directly keep them there.
Jeroen T. Vermeulen wrote:
Sent: Friday, January 16, 2004 23:02
Subject: [HACKERS] Reverse scans?
Would it be doable, and would it be useful, to try to alternate the
directions of table and index scans every time each table/index was
fully scanned?
I was thinking that it could help cache performance at various levels
in cases where data at the end of a large table, say, that remained in
memory after a scan, would otherwise be flushed out by a new scan of the
same table. If the next scan of the same table was to go in the other
direction, any remains of the last time around that were still in the
filesystem cache, buffer pool, hard disk cache etc. would stand a greater
chance of being reused.
Jeroen's idea is a good one when we consider the current behaviour,
which is that large scans are placed into block buffer cache, which then
forces other data out. I would like to question the latter behaviour, so
we can address the cause and not just the symptom.
Earlier versions of Oracle had a parameter called something like
SMALL_TABLE_THRESHOLD. Any scan of a table larger than that threshold
was NOT placed into buffer cache, but was consumed directly by the
shadow process (the backend). Teradata also uses a similar buffer
control technique for large table scans.
If a table is too large to fit into buffer, it clearly wasn't going to
be cached properly in the first place; Jeroen's idea only works well for
tables near to the size of the cache. If the table is MUCH bigger then
it will have very little gain. Good original thinking, though I'm not
sure it's worth it.
Oracle 9i now offers some variety for buffer cache management (as does
DB2). You can specify at the tablespace and object level whether to use
one of two buffer pools, KEEP or RECYCLE. The idea is to separate out
the two types of blocks - ones that are there because they're well used
and other blocks that are there at the moment, but unlikely to stay.
My suggestion would be to:
- split the buffer cache into two, just as Oracle does: KEEP & RECYCLE.
This could default to KEEP=66% of total memory available, but could also
be settable by init parameter.
[changes to the memory management routines]
- if we do a scan on a table whose size in blocks is more than some
fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE
bufferpool. This can be decided immediately following optimization,
rather than including it within the optimizer decision process since we
aren't going to change the way the statement executes, we're just going
to stop it from having an adverse effect on other current or future
statements.
[additional test to set parameter, then work out where to note it]
Notice that I haven't suggested that the KEEP/RECYCLE option could be
specified at table level. That optionality sounds like a lot of extra
work, when what is needed is the automatic avoidance of cache-spoiling
behaviour. (This would still mean that very large indexes with random
request patterns would still spoil cache...maybe implement that later?)
This would remove most reasons for spoiling the cache and blocks would
then leave the cache only when they were genuinely no longer wanted.
Any comments?? Takers?
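The decision rule proposed above could be sketched roughly as follows (a hypothetical illustration only: the names, the enum, and the 25% threshold are invented here, not PostgreSQL code):

```c
/* Illustrative sketch of the suggested KEEP/RECYCLE split: a scan of a
 * table larger than some fraction (here 25%) of the KEEP pool goes to
 * the RECYCLE pool, so it cannot evict genuinely hot pages. */
typedef enum
{
    POOL_KEEP,                  /* well-used blocks worth retaining */
    POOL_RECYCLE                /* transient blocks from large scans */
} BufferPool;

static BufferPool
choose_pool(long table_blocks, long keep_pool_blocks)
{
    /* table_blocks > 25% of KEEP pool, written without division */
    if (table_blocks * 4 > keep_pool_blocks)
        return POOL_RECYCLE;
    return POOL_KEEP;
}
```

The decision sits outside the optimizer, as suggested: it only picks a destination pool for the blocks, never changes how the statement executes.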
Simon,
have you read src/backend/storage/buffer/README of current CVS tip?
The algorithm in the new replacement strategy is an attempt to figure
that SMALL_TABLE_THRESHOLD automatically. Do you see anything that can
be improved in that algorithm?
Jan
Simon Riggs wrote:
This discussion seems likely to have a major effect on DBT-3 (DSS-type)
performance from PostgreSQL...On Fri, 16 Jan 2004, Michael Brusser wrote:
Is there a way to force database to load
a frequently-accessed table into cache and keep it there?Scott Marlow replied...
Nope. But there is a new cache buffer handler that may make it into
7.5
that would make that happen automagically.
The important question here is "what forces blocks out of cache?" rather
than thinking about how to directly keep them there.Jeroen T. Vermeulen wrote:
Sent: Friday, January 16, 2004 23:02
Subject: [HACKERS] Reverse scans?Would it be doable, and would it be useful, to try to alternate the
directions of table and index scans every time each table/index was
fully scanned?I was thinking that it could help cache performance at various levels
in cases where data at the end of a large table, say, that remained in
memory after a scan, would otherwise be flushed out by a new scan ofthe
same table. If the next scan of the same table was to go in the other
direction, any remains of the last time around that were still in the
filesystem cache, buffer pool, hard disk cache etc. would stand agreater
chance of being reused.
Jeroen's idea is a good one when we consider the current behaviour, which
is that large scans are placed into block buffer cache, which then forces
other data out. I would like to question the latter behaviour, so we can
address the cause and not just the symptom.
Earlier versions of Oracle had a parameter called something like
SMALL_TABLE_THRESHOLD. Any scan of a table larger than that threshold
was NOT placed into buffer cache, but was consumed directly by the
shadow process (the backend). Teradata also uses a similar buffer
control technique for large table scans.
If a table is too large to fit into buffer, it clearly wasn't going to
be cached properly in the first place; Jeroen's idea only works well for
tables near to the size of the cache. If the table is MUCH bigger then
it will have very little gain. Good original thinking, though I'm not
sure it's worth it.
Oracle 9i now offers some variety for buffer cache management (as does
DB2). You can specify at the tablespace and object level whether to use
one of two buffer pools, KEEP or RECYCLE. The idea is to separate out
the two types of blocks - ones that are there because they're well used
and other blocks that are there at the moment, but unlikely to stay.
My suggestion would be to:
- split the buffer cache into two, just as Oracle does: KEEP & RECYCLE.
This could default to KEEP=66% of total memory available, but could also
be settable by init parameter.
[changes to the memory management routines]
- if we do a scan on a table whose size in blocks is more than some
fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE
bufferpool. This can be decided immediately following optimization,
rather than including it within the optimizer decision process since we
aren't going to change the way the statement executes, we're just going
to stop it from having an adverse effect on other current or future
statements.
[additional test to set parameter, then work out where to note it]
Notice that I haven't suggested that the KEEP/RECYCLE option could be
specified at table level. That optionality sounds like a lot of extra
work, when what is needed is the automatic avoidance of cache-spoiling
behaviour. (This would still mean that very large indexes with random
request patterns would spoil cache... maybe implement that later?)
This would remove most reasons for spoiling the cache and blocks would
then leave the cache only when they were genuinely no longer wanted.
Any comments?? Takers?
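The KEEP/RECYCLE split being proposed could be sketched roughly as below. This is purely illustrative: the struct, the function names, the 66% default, and the 25% fraction are placeholders taken from the suggestion above, not actual PostgreSQL code.

```c
#include <stdbool.h>

/* Hypothetical sketch of the proposed KEEP/RECYCLE buffer split.
 * All names and constants are placeholders from the discussion. */

typedef struct BufferPoolConfig
{
    int total_buffers;      /* total shared buffers, in blocks */
    int keep_buffers;       /* KEEP pool size, default 66% of total */
} BufferPoolConfig;

static void
init_buffer_pools(BufferPoolConfig *cfg, int total_buffers)
{
    cfg->total_buffers = total_buffers;
    cfg->keep_buffers = (total_buffers * 66) / 100;
}

/* Decided immediately after planning: a scan covering more than 25%
 * of the KEEP pool puts its blocks into RECYCLE instead. */
static bool
scan_should_use_recycle(const BufferPoolConfig *cfg, int rel_blocks)
{
    return rel_blocks > cfg->keep_buffers / 4;
}
```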
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
"Simon Riggs" <simon@2ndquadrant.com> writes:
Any comments?? Takers?
Um, did you read the discussion of the ARC buffer management algorithm
that's already been implemented for 7.5?
The main objection I have to doing it Oracle's way is that that creates
more parameters that DBAs have to get right for reasonable performance.
Self-tuning algorithms are better, when available.
regards, tom lane
Jan Wieck wrote:
have you read src/backend/storage/buffer/README of current CVS tip?
The algorithm in the new replacement strategy is an attempt to figure
that SMALL_TABLE_THRESHOLD automatically. Do you see anything that can
be improved in that algorithm?
Jan,
I've read src/backend/storage/buffer/README rev 1.6 as you suggest. The
new algorithm looks great - many thanks for implementing that.
I'm not able to improve on this for the general case - I especially like
the automatic management that it gives, allowing you to avoid additional
DBA set parameters (and the coding to add these option
settings/keywords).
My concern was for DBT-3 performance and general Decision Support (DSS)
workloads, where a large proportion of table scans occur (not on the DBT-3
single-threaded test). The new strategy is much better than the older
one and is likely to have a positive effect in this area. I don't think,
right now, that anything further should be changed, in the interests of
stability.
For the record/for the future: My observation was that two commercial
databases focused on DSS use a strategy which in terms of the new ARC
implementation is effectively: "place blocks in T1 (RECENCY/RECYCLE
buffer) and NEVER promote them to T2 (FREQUENCY/KEEP buffer)" when they
do large object scans.
In the new README, you note that:
StrategyHintVacuum(bool vacuum_active)
Because vacuum reads all relations of the entire database
through the buffer manager, it can greatly disturb the
buffer replacement strategy. This function is used by vacuum
to inform that all subsequent buffer lookups are caused
by vacuum scanning relations.
...I would say that scans of very large tables also "greatly disturb the
buffer replacement strategy", i.e. have exactly the same effect on the
cache as the Vacuum utility.
You'd clearly thought of the idea before me, though with regard to
Vacuum.
If we know ahead of time that a large scan is going to have this effect,
why wait for the ARC to play its course, why not take exactly the same
action?
Have large scans call StrategyHint also. (Maybe rename it...?)...of
course, some extra code to establish it IS a large scan...
...large table lookup should wait until a shared catalog cache is
implemented
Anyway, this idea can wait at least until we have extensive performance
tuning on DBT-3 with 7.5. Thanks again for adding the new algorithm.
Best Regards, Simon
...
Simon Riggs wrote:
...
My suggestion would be to:
- split the buffer cache into two, just as Oracle does: KEEP & RECYCLE.
This could default to KEEP=66% of total memory available, but could also
be settable by init parameter.
[changes to the memory management routines]
- if we do a scan on a table whose size in blocks is more than some
fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE
bufferpool. This can be decided immediately following optimization,
rather than including it within the optimizer decision process since we
aren't going to change the way the statement executes, we're just going
to stop it from having an adverse effect on other current or future
statements.
[additional test to set parameter, then work out where to note it]
Jan Wieck wrote:
have you read src/backend/storage/buffer/README of current CVS tip?
Tom Lane wrote:
Um, did you read the discussion of the ARC buffer management algorithm
that's already been implemented for 7.5?
Tom, Jan,
No, I hadn't read this. Thank you both for your time and trouble in
pointing this out; I was not aware of it.
My understanding, possibly faulty, was that, if work was completed, then
it appears on the TODO list with a dash in front of it. The new cache
management strategy isn't mentioned there, so I was not aware that any
work was completed (or even in progress). No finger pointing, just an
observation of how the development process works...
If the TODO-list-with-dash isn't the correct place to have looked, is
there another list of committed changes for the next release? The latest
README in CVS doesn't have a list of "what's new in 7.5" or similar.
Do we need such a list? (I'd be happy to compile and maintain this if it is
agreed that it is a good idea to have such a document or process as
separate from TODO - I'll be doing this anyway before I pass further
comments!)
Regards, Simon
Simon,
thanks for the time to give this further thought.
Simon Riggs wrote:
If we know ahead of time that a large scan is going to have this effect,
why wait for the ARC to play its course, why not take exactly the same
action?
Have large scans call StrategyHint also. (Maybe rename it...?)...of
course, some extra code to establish it IS a large scan...
...large table lookup should wait until a shared catalog cache is
implemented
The problem with this is a) how to detect that something will be a large
scan, and b) how to decide what is a large scan in the first place.
Large sequential scans in warehousing are often part of more complex
join operations. And just because something returns a large number of
result rows doesn't mean that the input data was that much.
As for the definition of "large" itself, this depends on the size of the
buffer cache and the access pattern of the application. As you surely
have noticed, the usual sizing in the algorithm is B1+T1 = T2+B2 = C.
Buffers evicted from T1 are remembered in B1, and because of that even
repeated sequential scans of the same large relation will only cycle
through T1 blocks, never cause any turbulence in T2 or B2.
The only thing that will affect T2 and B2 dramatically by adjusting the
cache split point is multiple scanning of more than one significantly
large but smaller than C table. Scanning the same large but smaller-than-C
table over and over will have it after the second scan in T2, where it
belongs. But having two tables A and B that are both just smaller than C and
having an access pattern like A, A, B, B, A, A, ... will cause many B1
hits and thereby increase the target T1 size. And it must be exactly
that access pattern, because A, A, A, B, B, B, A, A, A, ... produces a
complete MISS on the first, a B1 hit on the second and a B2 hit on the
third scan, so it will move the split point up and down evenly.
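The adaptation described above can be sketched as follows. This is a deliberately simplified, hypothetical version of the ARC split-point adjustment: the real algorithm scales the step size by the relative ghost-list lengths, and this is not the actual PostgreSQL buffer manager code.

```c
/* Simplified sketch of ARC's adaptive split point: a hit in ghost list
 * B1 means T1 was too small, a hit in B2 means T2 was too small.  The
 * unit-step adjustment here only illustrates the direction of change;
 * the published algorithm weights the step by the ghost-list ratio. */

typedef enum
{
    ARC_MISS,       /* block in no queue */
    ARC_T1_HIT,     /* hit in the recency queue */
    ARC_T2_HIT,     /* hit in the frequency queue */
    ARC_B1_HIT,     /* ghost hit: recently evicted from T1 */
    ARC_B2_HIT      /* ghost hit: recently evicted from T2 */
} ArcEvent;

/* c is the cache size in blocks; returns the new target size of T1. */
static int
adjust_t1_target(int target_t1, int c, ArcEvent ev)
{
    if (ev == ARC_B1_HIT && target_t1 < c)
        target_t1++;            /* grow T1 at the expense of T2 */
    else if (ev == ARC_B2_HIT && target_t1 > 0)
        target_t1--;            /* shrink T1 in favour of T2 */
    return target_t1;           /* real hits and misses leave it alone */
}
```

With the A, A, B, B, ... pattern, repeated B1 hits push the target up; with A, A, A, B, B, B, ... the alternating B1 and B2 hits cancel out, as described above.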
Honestly, I don't even know what type of application could possibly
produce such a screwed access pattern. And I am absolutely confident one
can find corner cases to wring down Oracle's complicated configuration
harness more easily.
Anyway, this idea can wait at least until we have extensive performance
tuning on DBT-3 with 7.5. Thanks again for adding the new algorithm.
Everyone is always welcome to try and show that something can be
improved. And we are in the middle of the 7.5 development cycle, so feel
free to hack around.
Jan
"Simon Riggs" <simon@2ndquadrant.com> writes:
If the TODO-list-with-dash isn't the correct place to have looked, is
there another list of committed changes for the next release?
We tend to rely on the CVS commit logs as the definitive source. You
can pull the info from the CVS server (I use cvs2cl.pl to format the
results nicely), or read the archives of pgsql-committers.
In theory there should be a section at the head of release.sgml
mentioning the major changes done-so-far, but for various reasons this
hasn't gotten installed in the 7.5 branch yet. (Look at the CVS
versions during 7.4 development to see how we did it last time.)
As far as the ARC change goes, I believe Jan still considers it a
work-in-progress, so it may not be appropriate to list yet anyway.
(Jan, where are you on that exactly?)
Do we need such a list? (I'd be happy to compile and maintain this if it is
agreed that it is a good idea to have such a document or process as
separate from TODO - I'll be doing this anyway before I pass further
comments!)
If you wanted to go through the existing 7.5 commits and write up a
new done-so-far section, it'd save someone else (like me or Bruce)
from having to do it sometime soon ...
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
In theory there should be a section at the head of release.sgml
mentioning the major changes done-so-far, but for various reasons
this hasn't gotten installed in the 7.5 branch yet. (Look at the
CVS versions during 7.4 development to see how we did it last time.)
Well, keep in mind we didn't do it very effectively in 7.4 :-) The
vast majority of changes weren't recorded there, and the ones that
were had to be fleshed out quite a lot in the actual release notes.
The last time that someone (Peter and myself, IIRC) suggested that we
really incrementally maintain the release notes during the development
cycle, Bruce said that he personally finds it more comfortable to
summarize the CVS changelogs all at once shortly before we release the
first beta. AFAIR that's where the discussion ended.
-Neil
Neil Conway <neilc@samurai.com> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
In theory there should be a section at the head of release.sgml
mentioning the major changes done-so-far, but for various reasons
this hasn't gotten installed in the 7.5 branch yet. (Look at the
CVS versions during 7.4 development to see how we did it last time.)
Well, keep in mind we didn't do it very effectively in 7.4 :-) The
vast majority of changes weren't recorded there, and the ones that
were had to be fleshed out quite a lot in the actual release notes.
The last time that someone (Peter and myself, IIRC) suggested that we
really incrementally maintain the release notes during the development
cycle, Bruce said that he personally finds it more comfortable to
summarize the CVS changelogs all at once shortly before we release the
first beta. AFAIR that's where the discussion ended.
It's fine with me if Bruce prefers to build the release notes directly
from the change logs. As I saw it, the purpose of the temporary list of
things-done-so-far is not to be the raw material for the release notes.
It's to let alpha testers know about major changes that they might want
to test. As such, it's fine that it's incomplete.
The other way we could handle this goal is to be a tad more vigorous about
checking off items as "done" in the TODO list. However, Bruce generally
doesn't bother to make a new entry in the TODO list if someone does
something that wasn't in the list to begin with, and so I'm not sure
it's the right vehicle.
regards, tom lane
OK, I will attempt to draw together this information as currently
stands. If this makes any sense, we can discuss what the
requirement/process is for regular maintenance (daily/weekly/monthly
etc).
Understood to mean "changes in next release (current progress)" - items
that have been completed/committed since last release, for the purpose
of informing developers/testers what's new PRIOR to full release.
Leaving unobstructed the functions of
- TODO list - a combined list of desired work items (Bruce)
- Release Notes - final list of features of a release (Bruce)
This should help alpha testing, which should allow more control of what
actually does get released (and therefore what the contents of Release
Notes should be)
Best Regards, Simon
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, January 23, 2004 20:40
To: Neil Conway
Cc: simon@2ndquadrant.com; 'Jan Wieck'; 'Postgresql Hackers'
Subject: Re: 7.5 change documentation
Neil Conway <neilc@samurai.com> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
In theory there should be a section at the head of release.sgml
mentioning the major changes done-so-far, but for various reasons
this hasn't gotten installed in the 7.5 branch yet. (Look at the
CVS versions during 7.4 development to see how we did it last
time.)
Well, keep in mind we didn't do it very effectively in 7.4 :-) The
vast majority of changes weren't recorded there, and the ones that
were had to be fleshed out quite a lot in the actual release notes.
The last time that someone (Peter and myself, IIRC) suggested that we
really incrementally maintain the release notes during the development
cycle, Bruce said that he personally finds it more comfortable to
summarize the CVS changelogs all at once shortly before we release the
first beta. AFAIR that's where the discussion ended.
It's fine with me if Bruce prefers to build the release notes directly
from the change logs. As I saw it, the purpose of the temporary list of
things-done-so-far is not to be the raw material for the release notes.
It's to let alpha testers know about major changes that they might want
to test. As such, it's fine that it's incomplete.
The other way we could handle this goal is to be a tad more vigorous about
checking off items as "done" in the TODO list. However, Bruce generally
doesn't bother to make a new entry in the TODO list if someone does
something that wasn't in the list to begin with, and so I'm not sure
it's the right vehicle.
regards, tom lane
Jan,
Happy to continue the discussion...though without changing my suggestion
that we defer any further more specialised improvements for now.
Jan Wieck replied to...
Simon Riggs wrote:
If we know ahead of time that a large scan is going to have this effect,
why wait for the ARC to play its course, why not take exactly the same
action?
Have large scans call StrategyHint also. (Maybe rename it...?)...of
course, some extra code to establish it IS a large scan...
...large table lookup should wait until a shared catalog cache is
implemented
The problem with this is a) how to detect that something will be a large
scan, and b) how to decide what is a large scan in the first place.
My thoughts are that we know immediately prior to execution whether or
not a plan calls for a full table scan (FTS). We also know the
table and therefore its size. A large table in this context is one that
would disrupt the cache if it made it onto T2. We can discuss an
appropriate and usefully simple rule, perhaps sizeoftable(T) > 2*C???
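The rule being suggested could be as simple as the predicate below. The 2*C factor is the placeholder from the sentence above, not a tuned constant, and the function and names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical test for the rule suggested above: a full table scan is
 * treated as cache-spoiling when the table is larger than twice the
 * buffer cache size C (in blocks).  2*C is the discussion's placeholder
 * value, not a measured threshold. */
static bool
fts_is_cache_spoiling(long table_blocks, long cache_size_c)
{
    return table_blocks > 2 * cache_size_c;
}
```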
Large sequential scans in warehousing are often part of more complex
join operations.
Yes, I agree. PostgreSQL is particularly prone to this currently,
because of the high number of plans that resolve to FTS. Complexity of
plan shouldn't affect the basic situation: that we are reading all the
blocks of a table, putting them sequentially into T1, and then
working on them. Plan complexity may increase the time that a T1 block
stays in memory, with a subsequent increase in the probability of promotion
to T2.
And just because something returns a large number of
result rows doesn't mean that the input data was that much.
I agree also that overall execution time may be unrelated to whether a
"large" table is involved. The number of output rows shouldn't have any
effect on input rows and thus data blocks that need to be cached.
(Jan gives a detailed analysis...ending with)
Honestly, I don't even know what type of application could possibly
produce such a screwed access pattern. And I am absolutely confident
one
can find corner cases to wring down Oracles complicated configuration
harness more easily.
I agree with everything you say. The algorithm copes well with almost
every sequential pattern of access and there is significant benefit from
ignoring the very very very rare cases that might give it problems.
My thoughts are about multiple concurrent accesses, specifically FTS on
large tables, rather than sequential ones.
Buffers evicted from T1 are remembered in B1, and because of that even
repeated sequential scans of the same large relation will only cycle
through T1 blocks, never cause any turbulence in T2 or B2.
If we have a situation where a single backend makes repeated scans of
the same table, these will be sequential and will have no effect on T1.
In a DW situation, you are likely to have one or more very popular large
tables (maybe think of this as the "Fact table", if you have a
dimensional design). The tables are large and therefore query execution
times will be extended (and accepted by user). In this situation it is
very likely that: i) a single user/app submits multiple requests from
other windows/threads
Or simply,
ii) multiple users access the popular table
The common effect will be concurrent, rather than sequential, access to
the popular table. Different SQL statements will have different plans
and will perform scans of the same table at different rates because of
other joins, more complex WHERE clauses etc. Like waves at a beach
moving at different rates. Every time one scan catches up with another,
it will cause T1 hits for almost the whole of the T1 list, promoting all
of these blocks to the top of the T2 MRU and thus spoiling the cache -
if it hits one it will probably hit most of them. This will not happen
ALL the time, but I don't want it to happen EVER. Even in DW situation,
I still want to be inserting data regularly (that's how the table got
big!), so I have index blocks and other stuff that I want almost
permanently in memory. Concurrent access via an index might have the
same effect, though less dramatically.
The closer the size of a table is to C, the greater the likelihood that
these spurious cache hits will occur. (Of course, it might be argued
that these are worthwhile and genuine cache hits - I argue that they are
not wanted, and this is the main basis of my discussion). Of course, if a
table does fit in memory then that is very good. If a table was, say,
2*C, then spurious cache hits will occur often and spoil the whole of
T2.
The DBT-3 workload is very similar to TPC-D/TPC-H workload. The test
consists of a power test (all queries sequentially in random order) and
a throughput test (1 or more concurrent streams, each stream executing
all queries in a random order). When this benchmark first came out most
vendors chose to perform the throughput test with only 1 stream (though
with parallel processing)...I would say one reason for this is poor
cache management...hence recent changes in various commercial products.
In summary, I believe there is a reasonably common effect in DW
situations where concurrent query access to large and popular tables
will result in undesirable cache spoiling. This effect will still occur
even after the ARC improvements are introduced - though in every other
case I can think of, the ARC code is a major improvement on earlier
strategies and should be hailed as a major improvement in automatic
performance adaptation.
There are two solution ideas:
i) change the code so that FTS on large tables use the "no cache"
strategy that has already been developed to support Vacuum.
ii) more complex: synchronise the FTS of the large table so that all
backends that want scans produce only one set of I/Os and they share the
block many times (yet still don't put it in cache!). FTS don't start at
"the beginning" every time, they start wherever a current scan has got
to, then loop back round at end (so average of two concurrent scans is
1.5 times as much I/O as a single FTS - a 25% saving on I/O). A more
detailed explanation may be required - this technique is in commercial
use within the Teradata rdbms. Implementing it would take some doing...
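The synchronized-scan idea (ii) could be sketched as below: a new scan joins at the shared position of a scan already in progress and wraps around at the end of the relation. All names here are hypothetical; this is not existing PostgreSQL code (the real thing would also need shared-memory coordination and locking, omitted here).

```c
/* Sketch of the Teradata-style synchronized scan described above:
 * new scans start wherever a current scan has got to and loop back
 * round at the end, so concurrent scans share each block read. */

typedef struct SyncScanState
{
    long nblocks;        /* blocks in the relation */
    long current_pos;    /* shared: block the leading scan reads next */
} SyncScanState;

/* A new scan begins wherever the current scan has got to. */
static long
syncscan_start_block(const SyncScanState *s)
{
    return s->current_pos % s->nblocks;
}

/* The i-th block read by a scan that started at 'start', wrapping
 * around at the end of the relation. */
static long
syncscan_block(const SyncScanState *s, long start, long i)
{
    return (start + i) % s->nblocks;
}
```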
Best Regards
Simon
Simon Riggs wrote:
Jan,
[...]
My thoughts are about multiple concurrent accesses, specifically FTS on
large tables, rather than sequential ones.
Single or multiple backends is irrelevant here because a data block only
exists once, and therefore we have only one shared buffer cache.
Buffers evicted from T1 are remembered in B1, and because of that even
repeated sequential scans of the same large relation will only cycle
through T1 blocks, never cause any turbulence in T2 or B2.
If we have a situation where a single backend makes repeated scans of
the same table, these will be sequential and will have no effect on T1.
You really have to look at this a bit more global, not table related.
The strategy of ARC is this:
In an unknown access pattern, if a specific block is accessed less
frequently than every C requests, then it will only go into T1, age, get
evicted and the CDB moves to B1, will get removed from that and is
forgotten. Every block that is accessed more frequently than every C
requests will be, after its last access, in one of the four queues of the
directory and will immediately go into T2.
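The placement rule just described can be put in outline form as below. This is a simplification for illustration only, not the actual buffer manager code.

```c
/* Sketch of the ARC placement rule described above: a block whose
 * cache directory block (CDB) is still in any of the four queues goes
 * straight into T2 on its next access; a block the directory has
 * forgotten (or never saw) enters T1.  Simplified illustration. */

typedef enum { Q_NONE, Q_T1, Q_T2, Q_B1, Q_B2 } ArcQueue;

static ArcQueue
arc_destination(ArcQueue found_in)
{
    /* any directory hit implies access more frequent than every C
     * requests, so the block is promoted to the frequency queue */
    return (found_in == Q_NONE) ? Q_T1 : Q_T2;
}
```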
The adjustment of the target T1 size is an attempt to catch as many
newcomers as possible. If an application does many inserts, it will
access new blocks very soon again, so that a small T1 is sufficient to
hold them in memory until their next access, where they move into T2. For an
application that does non-uniform random access to blocks (there are
always bestsellers and less frequently asked items), a larger T1
might better satisfy that access pattern.
In a DW situation, you are likely to have one or more very popular large
tables (maybe think of this as the "Fact table", if you have a
dimensional design). The tables are large and therefore query execution
times will be extended (and accepted by user). In this situation it is
very likely that: i) a single user/app submits multiple requests from
other windows/threads
Or simply,
ii) multiple users access the popular table
If that causes its blocks to be requested more frequently than every
C lookups, they belong in T2.
The common effect will be concurrent, rather than sequential, access to
the popular table. Different SQL statements will have different plans
and will perform scans of the same table at different rates because of
other joins, more complex WHERE clauses etc. Like waves at a beach
moving at different rates. Every time one scan catches up with another,
it will cause T1 hits for almost the whole of the T1 list, promoting all
of these blocks to the top of the T2 MRU and thus spoiling the cache -
if it hits one it will probably hit most of them. This will not happen
ALL the time, but I don't want it to happen EVER. Even in DW situation,
I still want to be inserting data regularly (that's how the table got
big!), so I have index blocks and other stuff that I want almost
permanently in memory. Concurrent access via an index might have the
same effect, though less dramatically.
The closer the size of a table is to C, the greater the likelihood that
these spurious cache hits will occur. (Of course, it might be argued
that these are worthwhile and genuine cache hits - I argue that they are
not wanted and this is the main basis of my discussion). Of course, if a
table does fit in memory then that is very good. If a table was, say
2*C, then spurious cache hits will occur often and spoil the whole of
T2.
How can any generic algorithm ever sense that when the application is
accessing the same blocks multiple times, it should NOT cache them? Are
you asking for a fine granulated tuning of cache priorities and
behaviour on a per table basis?
The DBT-3 workload is very similar to TPC-D/TPC-H workload. The test
consists of a power test (all queries sequentially in random order) and
a throughput test (1 or more concurrent streams, each stream executing
all queries in a random order). When this benchmark first came out most
vendors chose to perform the throughput test with only 1 stream (though
with parallel processing)...I would say one reason for this is poor
cache management...hence recent changes in various commercial products.
In summary, I believe there is a reasonably common effect in DW
situations where concurrent query access to large and popular tables
will result in undesirable cache spoiling. This effect will still occur
even after the ARC improvements are introduced - though in every other
case I can think of, the ARC code is a major improvement on earlier
strategies and should be hailed as a major improvement in automatic
performance adaptation.
There are two solution ideas:
i) change the code so that FTS on large tables use the "no cache"
strategy that has already been developed to support Vacuum.
ii) more complex: synchronise the FTS of the large table so that all
backends that want scans produce only one set of I/Os and they share the
block many times (yet still don't put it in cache!). FTS don't start at
"the beginning" every time, they start wherever a current scan has got
to, then loop back round at end (so average of two concurrent scans is
1.5 times as much I/O as a single FTS - a 25% saving on I/O). A more
detailed explanation may be required - this technique is in commercial
use within the Teradata rdbms. Implementing it would take some doing...
What will the configuration of all that look like? You are using several
business terms a human brain can imagine to describe various access
patterns you want to be treated specially. In the whole system catalog
and all the way down to the buffer cache, we only have some file and
block number, maybe the size of it too but that's not guaranteed (think
of blind writes by a backend of another DB). So how do we express what
you want in some algorithm that we can put into the strategy?
Jan
Jan,
I think we should suspend further discussion for now...in summary:
ARC Buffer management is an important new performance feature for 7.5;
the implementation is a good one and should have positive benefit for
everybody's workload. ARC will adapt to a variety of situations and has
been designed to allow Vacuum to avoid interfering with user
applications.
That's the important bit: The implementation notes are detailed; I've
read them a few times to ensure I've got it straight. I am confident
that the situation I described CAN exist with regard to multiple
concurrent queries performing full table scans upon a single large
table. Further debate on that point is continuing because of my poor
explanation of that situation; forgive me. Thanks very much for your
further explanations and examples.
I will take a more practical tack on this now: providing evidence of a
real query mix that exhibits the described properties and quantifying
the effects and their frequency. If it IS worth it, and I accept that it
may not be, I'll have a hack at the very specialised improvement I was
suggesting, for very specific workload types.
Best Regards
Simon Riggs
Tom Lane wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
If the TODO-list-with-dash isn't the correct place to have looked, is
there another list of committed changes for the next release?
We tend to rely on the CVS commit logs as the definitive source. You
can pull the info from the CVS server (I use cvs2cl.pl to format the
results nicely), or read the archives of pgsql-committers.
In theory there should be a section at the head of release.sgml
mentioning the major changes done-so-far, but for various reasons this
hasn't gotten installed in the 7.5 branch yet. (Look at the CVS
versions during 7.4 development to see how we did it last time.)
As far as the ARC change goes, I believe Jan still considers it a
work-in-progress, so it may not be appropriate to list yet anyway.
(Jan, where are you on that exactly?)
Do we need such a list? (I'd be happy to compile and maintain this if it is
agreed that it is a good idea to have such a document or process as
separate from TODO - I'll be doing this anyway before I pass further
comments!)
If you wanted to go through the existing 7.5 commits and write up a
new done-so-far section, it'd save someone else (like me or Bruce)
from having to do it sometime soon ...
Doesn't Robert Treat's News Bits list all the major changes weekly?
That would be a good source.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
It's fine with me if Bruce prefers to build the release notes directly
from the change logs. As I saw it, the purpose of the temporary list of
things-done-so-far is not to be the raw material for the release notes.
It's to let alpha testers know about major changes that they might want
to test. As such, it's fine that it's incomplete.
The other way we could handle this goal is to be a tad more vigorous about
checking off items as "done" in the TODO list. However, Bruce generally
doesn't bother to make a new entry in the TODO list if someone does
something that wasn't in the list to begin with, and so I'm not sure
it's the right vehicle.
Right. I see TODO as a way for us to remember our limitations, and to
document them for our users. Once an item is completed, it didn't seem
necessary to put it on the TODO list.
Bruce Momjian wrote:
Tom Lane wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
If the TODO-list-with-dash isn't the correct place to have looked, is
there another list of committed changes for the next release?
We tend to rely on the CVS commit logs as the definitive source. You
can pull the info from the CVS server (I use cvs2cl.pl to format the
results nicely), or read the archives of pgsql-committers.
In theory there should be a section at the head of release.sgml
mentioning the major changes done-so-far, but for various reasons this
hasn't gotten installed in the 7.5 branch yet. (Look at the CVS
versions during 7.4 development to see how we did it last time.)
As far as the ARC change goes, I believe Jan still considers it a
work-in-progress, so it may not be appropriate to list yet anyway.
(Jan, where are you on that exactly?)
Do we need such a list? (I'd be happy to compile and maintain this if
it is agreed that it is a good idea to have such a document or process
as separate from TODO - I'll be doing this anyway before I pass
further comments!)
If you wanted to go through the existing 7.5 commits and write up a
new done-so-far section, it'd save someone else (like me or Bruce)
from having to do it sometime soon ...
Doesn't Robert Treat's News Bits list all the major changes weekly?
That would be a good source.
Bruce - The excellent work that both you and Robert do is a slightly
different view to what I had in mind - I agree they are all aspects of
the same information. I'm posting a first output of this now, so we can
discuss whether such a thing is useful, and or whether it can ever be
all of useful/accurate/timely.
I'll happily add this to the HEAD of release.sgml, though let's agree on
the content/direction first, before I spend time on a more formal
publication mechanism.
Best Regards, Simon Riggs
POSTGRESQL: Summary of Changes since last release (7.4.1)
----------------------------------------------------------
26 Jan 2004
This is a summary of most changes since code versions marked 7_4_1,
rather than a weekly news bulletin, a summary of desired future items,
or the definitive list of what's in any particular release. The
intention is to help everybody understand what's coming and what might
be affected, though most importantly, where you might add value to the
community as a designer, developer, tester, technical author or
advocate.
So far in this release dev cycle, major functionality will affect:
- PERFORMANCE
- OPTIMIZER/EXECUTOR
- ROBUSTNESS
- SECURITY
Other code changes are summarised and their major impacts noted.
These notes cover major changes and are not guaranteed complete, or even
fully tested. Many additional patches to the latest full release have
been submitted and these are appreciated just as much, even though they
*mostly* have more isolated effects. Documentation changes continue but
aren't described here; neither are client-side utilities/interfaces.
Nothing mentioned here is DEFINITELY in 7.5 or any future release;
testing of everything mentioned here is encouraged and appreciated, for
regression, performance and robustness. There is not yet a CVS branch
specifically for any later release than 7_4_1; these changes are not yet
even guaranteed to build into a consistent release when taken together.
Description of changes is designed to highlight benefit and impact, as
well as identifying specific areas of code change and potential knock-on
effects.
MAJOR FUNCTIONALITY
PERFORMANCE
- Major new memory buffer cache algorithm has now been implemented using
the Adaptive Replacement Cache algorithm. The implementation should have
positive benefit for everybody's workload, since ARC will adapt to a
variety of situations and has been designed to allow Vacuum to avoid
interfering with user applications. (Jan) src/backend/buffer
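As a rough illustration of the adaptive idea only (a much-simplified toy
model in Python, not the actual src/backend/buffer code; the real
implementation differs substantially and all names here are invented):

```python
from collections import OrderedDict

class SimpleARC:
    """Toy sketch of the Adaptive Replacement Cache idea: T1 holds
    pages seen once recently, T2 holds pages seen at least twice, and
    the ghost lists B1/B2 remember recently evicted keys.  A hit in a
    ghost list nudges the adaptive target p toward whichever list
    would have kept that page."""

    def __init__(self, capacity):
        self.c = capacity
        self.p = 0                      # adaptive target size for T1
        self.t1, self.t2 = OrderedDict(), OrderedDict()
        self.b1, self.b2 = OrderedDict(), OrderedDict()

    def _replace(self, hit_in_b2):
        # Evict from T1 or T2 depending on the adaptive target p.
        if self.t1 and (not self.t2 or len(self.t1) > self.p
                        or (hit_in_b2 and len(self.t1) == self.p)):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None         # remember the eviction in B1
        elif self.t2:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None         # remember the eviction in B2

    def access(self, key):
        """Returns True on a cache hit, False on a miss."""
        if key in self.t1:              # second recent touch: promote
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:              # frequent page: refresh MRU slot
            self.t2.move_to_end(key)
            return True
        if key in self.b1:              # ghost hit: recency list too small
            self.p = min(self.c, self.p +
                         max(len(self.b2) // max(len(self.b1), 1), 1))
            self._replace(False)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:              # ghost hit: frequency list too small
            self.p = max(0, self.p -
                         max(len(self.b1) // max(len(self.b2), 1), 1))
            self._replace(True)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Complete miss: make room if the cache directory is full.
        total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(False)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total >= 2 * self.c:
                self.b2.popitem(last=False)
            if len(self.t1) + len(self.t2) >= self.c:
                self._replace(False)
        self.t1[key] = None
        return False
```

A page touched once sits in T1; touched again, it moves to T2; and a miss
on a recently evicted page tells the cache which of the two lists it
should have made bigger, which is the self-tuning property the summary
above is describing.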
- New performance profiling of Intel CPU has allowed new spinlock code
to achieve performance/throughput gains of up to 10% using DBT-2 (OLTP)
workloads. Further gains to follow? (Manfred Spraul/Tom)
src/backend/storage/lmgr/s_lock.c
- TIP 9 now needs changing! Cross-data-type comparisons are now
indexable by btrees. All the existing cross-type comparison operators
(int2/int4/int8 and float4/float8) have appropriate support. (Tom)
Implications for user defined types and indices also? [HACKERS] 8-Nov-03
- All operations on TEMP relations are no longer logged in WAL, nor are
they involved in checkpoints, thus improving performance. (Tom)
- Index performance improved when scanning highly non-unique indices;
this will greatly improve performance of cursor/fetch logic. B-tree's
initial-positioning-strategy code has been improved so that we start the
scan at the first matching entry, rather than reading all entries that
share that index value before beginning to scan. (Tom, after Dimitry
Tkach)
- Heap access code is now faster when using compressed columns in-line;
previous assumption was that all compressed columns were also toasted
(Tom)
- Optimized calling performance for dynamically loaded C functions. Hash
table added to cache lookups of 'C'-language functions. Some limited
testing suggests that this puts the lookup speed for external functions
just about on par with built-in functions. (Tom)
OPTIMIZER/EXECUTOR IMPROVEMENTS
- Genetic Optimizer usage has been re-analyzed; geqo defaults have now
been set to more effective values which are expected to significantly
improve plan selection for complex multi-way joins (> 10-way).
geqo_effort setting now offers an easy 1..10 setting (like IBM DB2),
that allows this to be controlled realistically by user/DBA. New
heuristic added to significantly reduce number of join plans attempted
before geqo begins. (Tom)
- Avoid redundant unique-ification step on subqueries where the result
is already known to be unique (e.g. a SELECT DISTINCT ... subquery, or
an IN subquery using UNION/INTERSECT/EXCEPT without ALL).
Also set join_in_selectivity correctly. (Tom)
- Avoid redundant projection step when scanning a table that we need all
the columns from. In case of SELECT INTO, we have to check that the
hasoids flag matches the desired output type, too. (Tom)
- Repair mis-estimation of indexscan CPU costs. When an indexqual
contains a run-time key (that is, a nonconstant expression compared to
the index variable), the key is evaluated just once per scan, but we
were charging costs as though it were evaluated once per visited index
entry. (Tom)
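A toy arithmetic sketch of the charging change (invented numbers and
names, not the planner's real cost model):

```python
def cpu_cost(entries_visited, per_entry_cost, runtime_key_cost,
             charge_key_per_entry):
    """Toy model: a run-time index qual key is evaluated once per
    scan; the old behaviour charged it once per visited entry."""
    key_charges = entries_visited if charge_key_per_entry else 1
    return entries_visited * per_entry_cost + key_charges * runtime_key_cost

# With 10,000 visited entries, the old per-entry charging wildly
# overstates the contribution of the run-time key:
old = cpu_cost(10_000, 0.0025, 0.01, charge_key_per_entry=True)   # 125.0
new = cpu_cost(10_000, 0.0025, 0.01, charge_key_per_entry=False)  # 25.01
```

The inflated estimate could make the planner shy away from an indexscan
that was actually cheap.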
- Avoid planner failure for cases involving Cartesian products inside IN
(sub-SELECT) constructs. (Tom)
ROBUSTNESS
- Local buffer manager is no longer used for newly-created non-TEMP
relations; a new non-TEMP relation goes through the shared bufmgr and
thus will participate normally in checkpoints. TEMP relations use the
local buffer manager throughout their lifespan. (Tom)
- Massive overhaul of pg_dump: make use of dependency information from
pg_depend to determine a safe dump order. Defaults and check constraints
can be emitted either as part of a table or domain definition, or
separately if that's needed to break a dependency loop. Lots of old
half-baked code for controlling dump order removed. Performance work has
also occurred to address some regressions in performance this caused.
(Tom)
- Changes to ALTER .. SET PATH allow ordered dumps to restore without
error; the pg_restore options to select restore order have been removed,
as they are no longer needed (Tom)
- In backend/access/transam/ add warning to AtEOXact_SPI() to catch
cases where the current txn has been committed without SPI_finish()
being called first. Allows detection of resource leaks... (Joe)
- psql memory allocation is being cleaned up, using safer calls
(Bruce/Neil)
- Transaction logic now enhanced to stay intact even at the final stage
of resource-failure conditions, such as running out of disk space (Tom)
SECURITY
- New permission-checking code. Rather than relying on the query context
of a rangetable entry to identify what permissions it wants checked,
store a full AclMode mask in each RTE, and check exactly those bits.
This allows an RTE specifying, say, INSERT privilege on a view to be
copied into a derived UPDATE query without changing meaning. (Tom)
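A minimal sketch of the bitmask idea, with invented names rather than
PostgreSQL's actual AclMode constants:

```python
# Purely illustrative: each range-table entry (RTE) carries the exact
# bitmask of privileges it requires, and the check tests exactly
# those bits rather than inferring the requirement from query context.
ACL_SELECT = 1 << 0
ACL_INSERT = 1 << 1
ACL_UPDATE = 1 << 2
ACL_DELETE = 1 << 3

def check_rte(required_acl, granted_acl):
    """True only if every privilege bit required by the RTE is granted."""
    return (required_acl & granted_acl) == required_acl

# An RTE that requires INSERT keeps that exact requirement even when
# copied into a derived UPDATE query, because the requirement is
# stored on the RTE itself.
rte_required = ACL_INSERT
assert check_rte(rte_required, ACL_INSERT | ACL_SELECT)
assert not check_rte(rte_required, ACL_UPDATE)
```

Storing the mask makes the permission requirement a property of the RTE,
so copying an RTE between queries cannot silently change its meaning.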
OTHER NEW FUNCTIONALITY
- Add "WITH / WITHOUT OIDS" clause to CREATE TABLE AS. This allows the
user to explicitly specify whether OIDs should be included in the
newly-created relation; useful because it provides a way for application
authors to ensure their applications are compatible with future versions
of PostgreSQL (in which the relation created by CREATE TABLE AS won't
include OIDs by default). (Neil)
- Add more kinds of exprs that can be accepted after a CREATE SCHEMA
(Neil)
- Info Schema enhanced further to support named function parameters
(Dennis)
- Change factorial to return a numeric (Gavin)
- Comments can now be set on individual Casts, Conversions, Operator
Classes, Large Objects and Languages (Chris)
- Have psql \dn show only visible temp schemas using current_schemas()
- Have psql '\i ~/<tab><tab>' actually load files it displays from the
home dir
- Allow psql \du to show groups, and add \dg for groups
- Allow pg_dump to dump CREATE CONVERSION (Chris)
SUMMARY OF OTHER CODE CHANGES
- Remove the explicit casting of NULL literals to a pointer in a wide
variety of code locations (Neil)
- Streamline local buffer manager code: Since it's no longer necessary
to fsync relations as they move out of the local buffers into shared
buffers, quite a lot of smgr.c/md.c/fd.c code is no longer needed and
has been removed: there's no concept of a dirty relation anymore in
md.c/fd.c, and we never fsync anything but WAL. (Tom)
- Add operator strategy and comparison-value datatype fields to ScanKey.
Remove the 'strategy map' code; passing the strategy number to the
index AM directly should now be simpler and faster. Changes to the
ScanKeyEntryInitialize() API touch quite a lot of files. (Tom)
- nbtree function _bt_first is now substantially changed/simplified
(Tom)
- Change PG_DELAY from msec to usec and use it consistently rather than
select(). Add Win32 Sleep() for delay. (Bruce)
- Supporting relaxing of ALTER...SET PATH requires changing the API for
GUC assign_hook functions, which touches a lot of places (Tom)
- In src/backend/access/ we have major changes in heap, nbtree, transam
and common code
- Almost all files src/backend/commands have changes; mostly robustness
- JDBC interface has been moved out into its own project to improve the
focus on this popular and important area of client code.
MAJOR IMPACTS NOTED (Upgrades, performance, change of defaults etc)
- initdb forced due to pg_proc change to support Named function
parameters
- initdb forced due to change of stored rule representation.
- initdb recommended to allow picking up Info Schema changes
- geqo defaults have now been set to significantly more effective
values; we should expect increased optimization elapsed times for very
large queries, though hopefully outweighed by a significant improvement
in plan selection
MAJOR WORK IN PROGRESS (Much less accurate than the above...)
- Win32 - Changes to many areas, especially the postmaster, ipc, libpq
etc, to streamline and allow a single source Win32 port - the Win32 port
is coming closer! (Claudio)
- Replication - involving remote copying of WAL logs, then cut-over and
automatic catch-up on secondary node (Jan)
- Background writer work progresses, which is likely to improve overall
scalability/performance by smoothing dirty block writes; it also forms
the basis for increased server availability (Jan)
- Initial stages of Named Function parameter support have been
committed. pg_proc has a column to store names, and CREATE FUNCTION can
insert data into it, but that's all as yet. (Dennis)
- Buffer manager locking changes: lock contention data have shown that
the BufMgrLock is the major source of contention under heavy load,
affecting multi-CPU SMP scalability. A patch to rework the bufmgr's
locking scheme to be more granular, plus further performance testing,
may yield further gains. (Neil)
- Backend internal data structure changes: list rewrite: the linked list
implementation used throughout the backend is being redesigned for
constant-time length and append operations. This was done because
lappend() is called quite frequently, and allows some ugly code
(FastList) to be removed. (Neil)
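The shape of that change can be sketched like this (a hypothetical
Python model, not the backend's actual C List implementation):

```python
class ListCell:
    """One cell of a singly linked list; kept deliberately minimal."""
    __slots__ = ("value", "next")
    def __init__(self, value):
        self.value = value
        self.next = None

class PgList:
    """Linked list that caches its length and a tail pointer, so both
    length() and lappend() are O(1) instead of O(n) walks."""
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def lappend(self, value):
        cell = ListCell(value)
        if self.tail is None:
            self.head = cell            # first element
        else:
            self.tail.next = cell       # O(1): no walk to find the end
        self.tail = cell
        self.length += 1
        return self

    def values(self):
        """Collect the list contents in order (for inspection)."""
        cell, out = self.head, []
        while cell is not None:
            out.append(cell.value)
            cell = cell.next
        return out
```

Since lappend() and length checks are pervasive in the backend, turning
both from linear walks into constant-time operations removes the need
for workarounds like FastList.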
- Hash index changes: complete the wrap-up of unique hash indexes, as
well as some improvements to hash index concurrency performance. (Neil)
- Customizable ANALYZE function for user-definable functions: generic
functionality being added as a base for PostGIS enhancements (Mark)
- Re-evaluation of Genetic Query Optimizer parameters and usage will
likely continue for some time, and any real usage scenarios/observations
are welcome
WHAT **ISN'T** IN THIS RELEASE (YET!)
- Many TODO items still to be claimed...
- SQL Commands, standardisation and compatibility features
- Referential Integrity & Inheritance
- Administration (client interfaces may have changed)
- National/Multi-language support extensions
DEVELOPER FEATURES (logging, debugging etc)
- bootstrap can now be cancelled using CTRL-C (Tom)
- use debug_shared_buffers = <seconds> to show ARC mem buffer contents
(Jan)
All corrections and changes welcome... if this is well received, then I
will monitor pgsql-committers to keep track of things. Please let me
know about even minor technical mistakes, though please let's not
revisit the designs of everything again! -- Simon
On Wednesday 28 January 2004 00:38, Simon Riggs wrote:
POSTGRESQL: Summary of Changes since last release (7.4.1)
All corrections and changes welcome... if this is well received, then I
will monitor pgsql-committers to keep track of things.
Speaking as JustAUser (TM) I find this very useful. I always have trouble
keeping track of what may/probably/will appear in upcoming versions when
people ask on the lists.
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
On Wednesday 28 January 2004 00:38, Simon Riggs wrote:
POSTGRESQL: Summary of Changes since last release (7.4.1)
All corrections and changes welcome... if this is well received, then I
will monitor pgsql-committers to keep track of things.
Speaking as JustAUser (TM) I find this very useful. I always have trouble
keeping track of what may/probably/will appear in upcoming versions when
people ask on the lists.
Yes, very nice. We could link to it from the developers page.
Bruce Momjian wrote:
Richard Huxton wrote:
On Wednesday 28 January 2004 00:38, Simon Riggs wrote:
POSTGRESQL: Summary of Changes since last release (7.4.1)
All corrections and changes welcome... if this is well received, then I
will monitor pgsql-committers to keep track of things.
Speaking as JustAUser (TM) I find this very useful. I always have
trouble keeping track of what may/probably/will appear in upcoming
versions when people ask on the lists.
Yes, very nice. We could link to it from the developers page.
Thanks. I'll update it weekly on Tuesdays (suits me best...) up to the
next release and then we can see whether it was useful and worth
continuing.
Could you help me by suggesting where/how to update it (would it be in
the CVS, or do you want me to just e-mail you/somebody the file?) and
what format would work best (text, HTML, docbook).
I've had comments from a number of others also, so I will incorporate
all of these points into tomorrow's version.
Best regards, Simon
"Simon Riggs" <simon@2ndquadrant.com> writes:
Could you help me by suggesting where/how to update it (would it be in
the CVS, or do you want me to just e-mail you/somebody the file?) and
what format would work best (text, HTML, docbook).
Seems like the way to work this is to keep it on a webpage somewhere in
developer.postgresql.org. Since the material is by nature transient,
archiving it in CVS doesn't seem like the best way to go. People will
want to look at it on the web instead.
Marc, can you help Simon set this up? It sounds like a worthwhile
experiment to me ...
regards, tom lane