pgbench without dbname worked differently with psql and pg_dump
Dear hackers,
Recently the 'd' option of pgbench is unified with other applications like psql and pg_dump,
but I found further difference. pgbebch uses an OS user as the dbanme even when the '-U' is
specified. Please see below experiments.
```
# Assuming the OS user is "hayato", whereas the database admin is "postgres"
$ initdb -U postgres -D data
...
Success. You can now start the database server using:
pg_ctl -D data -l logfile start
$pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
# psql can connect to the database "postgres"
$ psql -U postgres -c "SELECT current_database();"
current_database
------------------
postgres
(1 row)
# pg_dump can connect as well
$ pg_dump -U postgres
--
-- PostgreSQL database dump
--
....
# ... but pgbench cannot
$ pgbench -U postgres
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: database "hayato" does not exist
pgbench: error: could not create connection for setup
```
Also, I found that dbname parameter for both pgbench and pg_dump is described by
the same sentences [1]https://www.postgresql.org/docs/devel/pgbench.html#PGBENCH-OPTION-DBNAME ``` [-d] dbname [--dbname=]dbname Specifies the name of the database to test in. If this is not specified, the environment variable PGDATABASE is used. If that is not set, the user name specified for the connection is used. ``` [2]https://www.postgresql.org/docs/devel/app-pgdump.html#:~:text=of%20the%20output.-,dbname,-Specifies%20the%20name ``` dbname Specifies the name of the database to be dumped. If this is not specified, the environment variable PGDATABASE is used. If that is not set, the user name specified for the connection is used. ```. This is also confusing.
Is it an expected behavior?
[1]: https://www.postgresql.org/docs/devel/pgbench.html#PGBENCH-OPTION-DBNAME ``` [-d] dbname [--dbname=]dbname Specifies the name of the database to test in. If this is not specified, the environment variable PGDATABASE is used. If that is not set, the user name specified for the connection is used. ```
```
[-d] dbname
[--dbname=]dbname
Specifies the name of the database to test in. If this is not specified, the environment variable PGDATABASE is used.
If that is not set, the user name specified for the connection is used.
```
[2]: https://www.postgresql.org/docs/devel/app-pgdump.html#:~:text=of%20the%20output.-,dbname,-Specifies%20the%20name ``` dbname Specifies the name of the database to be dumped. If this is not specified, the environment variable PGDATABASE is used. If that is not set, the user name specified for the connection is used. ```
```
dbname
Specifies the name of the database to be dumped. If this is not specified, the environment variable PGDATABASE is used.
If that is not set, the user name specified for the connection is used.
```
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Tue, Jan 21, 2025 at 12:33 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear hackers,
Recently the 'd' option of pgbench is unified with other applications like psql and pg_dump,
but I found further difference. pgbebch uses an OS user as the dbanme even when the '-U' is
specified. Please see below experiments.```
# Assuming the OS user is "hayato", whereas the database admin is "postgres"
$ initdb -U postgres -D data
...
Success. You can now start the database server using:pg_ctl -D data -l logfile start
$pg_ctl -D data -l logfile start
waiting for server to start.... done
server started# psql can connect to the database "postgres"
$ psql -U postgres -c "SELECT current_database();"
current_database
------------------
postgres
(1 row)# pg_dump can connect as well
$ pg_dump -U postgres
--
-- PostgreSQL database dump
--
....# ... but pgbench cannot
$ pgbench -U postgres
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: database "hayato" does not exist
pgbench: error: could not create connection for setup
```Also, I found that dbname parameter for both pgbench and pg_dump is described by
the same sentences [1] [2]. This is also confusing.Is it an expected behavior?
[1]: https://www.postgresql.org/docs/devel/pgbench.html#PGBENCH-OPTION-DBNAME
```
[-d] dbname
[--dbname=]dbname
Specifies the name of the database to test in. If this is not specified, the environment variable PGDATABASE is used.
If that is not set, the user name specified for the connection is used.
```[2]: https://www.postgresql.org/docs/devel/app-pgdump.html#:~:text=of%20the%20output.-,dbname,-Specifies%20the%20name
```
dbname
Specifies the name of the database to be dumped. If this is not specified, the environment variable PGDATABASE is used.
If that is not set, the user name specified for the connection is used.
```
I see the same behaviour. The code in pgbench which decided the
database to connect to, looks similar to the relevant code in
createdb. But createdb documentation is in agreement with the code.
[1]: https://www.postgresql.org/docs/17/app-createdb.html
```
dbname
Specifies the name of the database to be created. The name must be
unique among all PostgreSQL databases in this cluster. The default is
to create a database with the same name as the current system user.
```
I couldn't locate the code in psql which calculates the name of the
database from a quick read of psql/startup.c. I think pgbench should
behave like psql, not like createdb.
[1]: https://www.postgresql.org/docs/17/app-createdb.html
--
Best Wishes,
Ashutosh Bapat
Dear Ashutosh,
I see the same behaviour. The code in pgbench which decided the
database to connect to, looks similar to the relevant code in
createdb. But createdb documentation is in agreement with the code.
[1]
```
dbnameSpecifies the name of the database to be created. The name must be
unique among all PostgreSQL databases in this cluster. The default is
to create a database with the same name as the current system user.
```
Thanks for the confirmation. I found below executables seem to have the same
logic to decide the dbname:
- pg_amcheck
- pgbench
- clusterdb
- createdb
- reindexdb
- vacuumdb
Also, I found that clusterdb/reindexdb/vacuumdb has the same description like
"If that is not set, the user name specified for the connection is used.",
it was not correct. IIUC createdb has the good description [1]description in created.sgml: ``` The default is to create a database with the same name as the current system user. ``` and it can be re-used
everywhere.
I couldn't locate the code in psql which calculates the name of the
database from a quick read of psql/startup.c. I think pgbench should
behave like psql, not like createdb.
Yes, it is same as feeling. IIUC libpq functions accept that dbname is NULL, so
no need to do something. One consideration is the compatibility issue - Users may
surprise because the default behavior is changed.
Based on the discussion, I've created two patches. 0001 contains changes for the
pgbench, and 0002 is a doc patch for other applications.
[1]: description in created.sgml: ``` The default is to create a database with the same name as the current system user. ```
```
The default is to create a database with the same name as the current system user.
```
------
Best regards,
Hayato Kuroda
Attachments:
0001-pgbench-Use-default-connection-when-dbname-is-not-sp.patchapplication/octet-stream; name=0001-pgbench-Use-default-connection-when-dbname-is-not-sp.patchDownload
From cb84e91574afe67f9685623f8715438130b7e789 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Wed, 22 Jan 2025 10:42:54 +0900
Subject: [PATCH 1/2] pgbench: Use default connection when dbname is not
specified
---
src/bin/pgbench/pgbench.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c415e0f32c..eaaa4188f4 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1567,7 +1567,11 @@ doConnect(void)
if (!conn)
{
- pg_log_error("connection to database \"%s\" failed", dbName);
+ if (dbName)
+ pg_log_error("connection to database \"%s\" failed", dbName);
+ else
+ pg_log_error("connection to default database failed");
+
return NULL;
}
@@ -7088,8 +7092,6 @@ main(int argc, char **argv)
dbName = env;
else if ((env = getenv("PGUSER")) != NULL && *env != '\0')
dbName = env;
- else
- dbName = get_user_name_or_exit(progname);
}
}
--
2.43.5
0002-Unify-descriptions-when-dbname-is-not-specified.patchapplication/octet-stream; name=0002-Unify-descriptions-when-dbname-is-not-specified.patchDownload
From 67169a6bb52c15ea3edabfce819d202a73f96b44 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Wed, 22 Jan 2025 10:46:36 +0900
Subject: [PATCH 2/2] Unify descriptions when dbname is not specified
---
doc/src/sgml/ref/clusterdb.sgml | 6 +++---
doc/src/sgml/ref/reindexdb.sgml | 6 +++---
doc/src/sgml/ref/vacuumdb.sgml | 6 +++---
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index d3145318b3..aee54bc93a 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -92,9 +92,9 @@ PostgreSQL documentation
Specifies the name of the database to be clustered,
when <option>-a</option>/<option>--all</option> is not used.
If this is not specified, the database name is read
- from the environment variable <envar>PGDATABASE</envar>. If
- that is not set, the user name specified for the connection is
- used. The <replaceable>dbname</replaceable> can be a <link
+ from the environment variable <envar>PGDATABASE</envar>. The default
+ is to create a database with the same name as the current system user.
+ The <replaceable>dbname</replaceable> can be a <link
linkend="libpq-connstring">connection string</link>. If so,
connection string parameters will override any conflicting command
line options.
diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index 98c3333228..7d1820dff2 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -129,9 +129,9 @@ PostgreSQL documentation
Specifies the name of the database to be reindexed,
when <option>-a</option>/<option>--all</option> is not used.
If this is not specified, the database name is read
- from the environment variable <envar>PGDATABASE</envar>. If
- that is not set, the user name specified for the connection is
- used. The <replaceable>dbname</replaceable> can be a <link
+ from the environment variable <envar>PGDATABASE</envar>. The default
+ is to create a database with the same name as the current system user.
+ The <replaceable>dbname</replaceable> can be a <link
linkend="libpq-connstring">connection string</link>. If so,
connection string parameters will override any conflicting command
line options.
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2..1cd1e85cae 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -152,9 +152,9 @@ PostgreSQL documentation
Specifies the name of the database to be cleaned or analyzed,
when <option>-a</option>/<option>--all</option> is not used.
If this is not specified, the database name is read
- from the environment variable <envar>PGDATABASE</envar>. If
- that is not set, the user name specified for the connection is
- used. The <replaceable>dbname</replaceable> can be a <link
+ from the environment variable <envar>PGDATABASE</envar>. The default
+ is to create a database with the same name as the current system user.
+ The <replaceable>dbname</replaceable> can be a <link
linkend="libpq-connstring">connection string</link>. If so,
connection string parameters will override any conflicting command
line options.
--
2.43.5
On Tue, Jan 21, 2025 at 6:54 PM Hayato Kuroda (Fujitsu) <
kuroda.hayato@fujitsu.com> wrote:
IIUC createdb has the good description [1] and it can be re-used
everywhere.
As none of those programs "create" a database the wording "The default is
to create a database" seems quite a poor choice to standardize on.
The wording you suggest replacing seems fine as-is.
ISTM the following is the only issue that probably needs to be addressed.
Either by fixing the apparent bug in the code or documenting the correct
behavior in the pgbench docs.
# ... but pgbench cannot
$ pgbench -U postgres
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
FATAL: database "hayato" does not exist
pgbench: error: could not create connection for setup
Your patch 0001 doesn't really seem to do either.
"connection to default database failed" is a strictly worse error message
than the existing one used unconditionally.
There probably should still be the existing "else" block in the other hunk
- we do default to OS user name - but to handle -U shouldn't we just add an
"else if (username) ...block before it?
David J.
Dear David,
As none of those programs "create" a database the wording "The default is to
create a database" seems quite a poor choice to standardize on.
Oh, I misunderstood. Sorry. I wanted to suggest like below:
```
The default is to connect to a database with the same name as the current system
user.
````
My main point was we should clarify that the OS (system) user-name is used.
"connection to default database failed" is a strictly worse error message than the
existing one used unconditionally.
There probably should still be the existing "else" block in the other hunk - we do
default to OS user name - but to handle -U shouldn't we just add an "else if (username)
...block before it?
You meant like attached, right? It seems simpler than my idea.
PSA updated version.
----------
Best regards,
Hayato Kuroda
Attachments:
v2-0001-pgbench-Use-default-connection-when-dbname-is-not.patchapplication/octet-stream; name=v2-0001-pgbench-Use-default-connection-when-dbname-is-not.patchDownload
From 4faeb66ca074926e5a7d0c07a82bd591112784d3 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Wed, 22 Jan 2025 10:42:54 +0900
Subject: [PATCH v2 1/2] pgbench: Use default connection when dbname is not
specified
---
src/bin/pgbench/pgbench.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c415e0f32c..dfb77ff84e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -7088,6 +7088,8 @@ main(int argc, char **argv)
dbName = env;
else if ((env = getenv("PGUSER")) != NULL && *env != '\0')
dbName = env;
+ else if (username)
+ dbName = username;
else
dbName = get_user_name_or_exit(progname);
}
--
2.43.5
v2-0002-Unify-descriptions-when-dbname-is-not-specified.patchapplication/octet-stream; name=v2-0002-Unify-descriptions-when-dbname-is-not-specified.patchDownload
From 777855d544d6bc270beaffb2f6b70e7408c92546 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Wed, 22 Jan 2025 10:46:36 +0900
Subject: [PATCH v2 2/2] Unify descriptions when dbname is not specified
---
doc/src/sgml/ref/clusterdb.sgml | 6 +++---
doc/src/sgml/ref/reindexdb.sgml | 6 +++---
doc/src/sgml/ref/vacuumdb.sgml | 6 +++---
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index d3145318b3..c7c90f56db 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -92,9 +92,9 @@ PostgreSQL documentation
Specifies the name of the database to be clustered,
when <option>-a</option>/<option>--all</option> is not used.
If this is not specified, the database name is read
- from the environment variable <envar>PGDATABASE</envar>. If
- that is not set, the user name specified for the connection is
- used. The <replaceable>dbname</replaceable> can be a <link
+ from the environment variable <envar>PGDATABASE</envar>. The default
+ is to connect to a database with the same name as the current system
+ user. The <replaceable>dbname</replaceable> can be a <link
linkend="libpq-connstring">connection string</link>. If so,
connection string parameters will override any conflicting command
line options.
diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index 98c3333228..ded32d5e52 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -129,9 +129,9 @@ PostgreSQL documentation
Specifies the name of the database to be reindexed,
when <option>-a</option>/<option>--all</option> is not used.
If this is not specified, the database name is read
- from the environment variable <envar>PGDATABASE</envar>. If
- that is not set, the user name specified for the connection is
- used. The <replaceable>dbname</replaceable> can be a <link
+ from the environment variable <envar>PGDATABASE</envar>. The default
+ is to connect to a database with the same name as the current system
+ user. The <replaceable>dbname</replaceable> can be a <link
linkend="libpq-connstring">connection string</link>. If so,
connection string parameters will override any conflicting command
line options.
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2..2a28a2ea7a 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -152,9 +152,9 @@ PostgreSQL documentation
Specifies the name of the database to be cleaned or analyzed,
when <option>-a</option>/<option>--all</option> is not used.
If this is not specified, the database name is read
- from the environment variable <envar>PGDATABASE</envar>. If
- that is not set, the user name specified for the connection is
- used. The <replaceable>dbname</replaceable> can be a <link
+ from the environment variable <envar>PGDATABASE</envar>. The default
+ is to connect to a d atabase with the same name as the current system
+ user. The <replaceable>dbname</replaceable> can be a <link
linkend="libpq-connstring">connection string</link>. If so,
connection string parameters will override any conflicting command
line options.
--
2.43.5
On Wed, Jan 22, 2025 at 7:24 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
```
Thanks for the confirmation. I found below executables seem to have the same
logic to decide the dbname:- pg_amcheck
The documentation just says "If no such options are present, the
default database will be checked.". But it does not mention what the
default database is. So possibly we need to mention what the default
database is.
- clusterdb
- reindexdb
- vacuumdb
I confirm that all these utilities need a documentation update.
Regarding the patch 0002: I would prefer to keep the old construct
i.e. "If that is not set, the current system user name is used as the
database name.". But I think your construct is acceptable too; the app
has to connect to the database before doing anything.
vaccumdb changes need a fix - s/d atabase/database/
Interestingly, all these utilities use the value of PGUSER as the
databasename if it exists. I think the doc change needs to reflect
that as well.
- pgbench
This is the more interesting part and probably changes the perspective
on everything. Digging up history shows that when
412893b4168819955e9bf428036cd95b0832d657 was committed, we really used
the username (variable name login at that time) as the database name.
The username itself defaulted to PGUSER.
f1516ad7b3a9645a316846fa7b2229163bc55907 changed the code to use only
PGUSER and didn't use username at all. That's where I think the
document and the behaviour went out of sync. We should be fixing the
behaviour change caused by f1516ad7b3a9645a316846fa7b2229163bc55907,
which your 0001 tries to do. However, it should check username before
checking existence of PGUSER. It assumes that doConnect() and its
minions would use PGUSER if username is not specified, which is what
412893b4168819955e9bf428036cd95b0832d657 did as well. So it looks
good. I am adding Michael Pacquire and Kota Miyake to this thread for
their opinions.
This makes me wonder whether the real intention of code in clusterdb,
reindexdb and vacuumdb is to use the connection user name as the
database name as the document suggests. But then the current code is
very old - like 22 years old or something. So we should just fix the
documentation. It's quite possible that all the users just pass -d.
That explains why nobody noticed this descripancy for last 22 odd
years. Or may be we are missing something ... .
--
Best Wishes,
Ashutosh Bapat