64-bit queryId?

Started by Robert Haasover 8 years ago50 messages

robertmhaas@gmail.com

over 8 years ago

Our Query currently has space for a 32-bit queryId, but that seems
reasonably likely to result in collisions:

https://en.wikipedia.org/wiki/Birthday_problem#Probability_table

If you have as many as 50,000 queries, there's a 25% probability of
having at least one collision; that doesn't seem particularly
unrealistic. Obviously, normalization reduces the number of distinct
queries a lot, but if queries are dynamically generated you might
still have quite a few of them.

How about widening the value to uint64?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#1)

Re: 64-bit queryId?

Robert Haas <robertmhaas@gmail.com> writes:

How about widening the value to uint64?

Doesn't really seem like that would guarantee no collisions.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Tom Lane (#2)

Re: 64-bit queryId?

On Sat, Sep 30, 2017 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

How about widening the value to uint64?

Doesn't really seem like that would guarantee no collisions.

This moves the possibility of a 25% collision from 50k queries 3.3
billion. Not zero, but safe enough as a minimal change for many years
to come.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#2)

Re: 64-bit queryId?

On Sat, Sep 30, 2017 at 12:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

How about widening the value to uint64?

Doesn't really seem like that would guarantee no collisions.

Well, no duh. If you come up with a hash function that maps an
infinite domain onto a finite range while guaranteeing no collisions,
I will look forward to reading the paper with interest.

Assuming, however, that you don't manage to prove all known
mathematics inconsistent, what one might reasonably hope to do is
render collisions remote enough that one need not worry about them too
much in practice. From that point of view, reducing the probability
of a collision by several orders of magnitude seems worth doing if (1)
the cost isn't too much and (2) the probability of a collision as
things stand is significant. I argue that both of those things are
probably true.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Geoghegan

pg@bowt.ie

over 8 years ago

In reply to: Robert Haas (#4)

Re: 64-bit queryId?

On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Assuming, however, that you don't manage to prove all known
mathematics inconsistent, what one might reasonably hope to do is
render collisions remote enough that one need not worry about them too
much in practice.

Isn't that already true in the case of queryId? I've never heard any
complaints about collisions. Most people don't change
pg_stat_statements.max, so the probability of a collision is more like
1%. And, that's the probability of *any* collision, not the
probability of a collision that the user actually cares about. The
majority of entries in pg_stat_statements among those ten thousand
will not be interesting. Often, 90%+ will be one-hit wonders. If that
isn't true, then you're probably not going to find pg_stat_statements
very useful, because you have nothing to focus on.

I have heard complaints about a number of different things in
pg_stat_statements, like the fact that we don't always manage to
replace constants with '?'/'$n' characters in all cases. I heard about
that quite a few times during my time at Heroku. But never this.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Geoghegan

pg@bowt.ie

over 8 years ago

In reply to: Peter Geoghegan (#5)

Re: 64-bit queryId?

On Sat, Sep 30, 2017 at 8:39 AM, Peter Geoghegan <pg@bowt.ie> wrote:

Isn't that already true in the case of queryId? I've never heard any
complaints about collisions. Most people don't change
pg_stat_statements.max, so the probability of a collision is more like
1%. And, that's the probability of *any* collision, not the
probability of a collision that the user actually cares about. The
majority of entries in pg_stat_statements among those ten thousand
will not be interesting.

Correction: ten thousand is an example value of pg_stat_statements.max
in the docs, not the default value. The default is five thousand.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Peter Geoghegan (#5)

Re: 64-bit queryId?

On Sat, Sep 30, 2017 at 11:39 AM, Peter Geoghegan <pg@bowt.ie> wrote:

On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Assuming, however, that you don't manage to prove all known
mathematics inconsistent, what one might reasonably hope to do is
render collisions remote enough that one need not worry about them too
much in practice.

Isn't that already true in the case of queryId? I've never heard any
complaints about collisions. Most people don't change
pg_stat_statements.max, so the probability of a collision is more like
1%. And, that's the probability of *any* collision, not the
probability of a collision that the user actually cares about. The
majority of entries in pg_stat_statements among those ten thousand
will not be interesting. Often, 90%+ will be one-hit wonders. If that
isn't true, then you're probably not going to find pg_stat_statements
very useful, because you have nothing to focus on.

Well, I think that the fact that pg_stat_statements.max exists at all
is something that could be fixed now that we have DSA. It's hard to
tell right now whether the fact that we don't hear about collisions is
an artifact of that limit or of the underlying workload. Also, if
someone did have collisions, how would they even know?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

over 8 years ago

In reply to: Robert Haas (#7)

Re: 64-bit queryId?

On 2017-09-30 11:50:08 -0400, Robert Haas wrote:

Well, I think that the fact that pg_stat_statements.max exists at all
is something that could be fixed now that we have DSA.

You normally *do* want a limit imo. And given that query strings are now
stored externally, I don't think there's a huge space concern anymore?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Peter Geoghegan (#5)

Re: 64-bit queryId?

Peter Geoghegan <pg@bowt.ie> writes:

On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Assuming, however, that you don't manage to prove all known
mathematics inconsistent, what one might reasonably hope to do is
render collisions remote enough that one need not worry about them too
much in practice.

Isn't that already true in the case of queryId? I've never heard any
complaints about collisions.

More to the point: with 32-bit IDs, it's apparent that you shouldn't
really rely on them being unique, and should design your usage so that
it will survive collisions. Robert seems to be arguing that if we
merely made the IDs wider, it would be okay to design applications that
don't allow for that and would fail hard on a collision. I'm reminded
of Weinberg's famous line "If builders built houses the way programmers
build programs, the first woodpecker to come along would destroy
civilization".

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

over 8 years ago

In reply to: Tom Lane (#9)

Re: 64-bit queryId?

On 2017-09-30 12:03:57 -0400, Tom Lane wrote:

Peter Geoghegan <pg@bowt.ie> writes:

On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Assuming, however, that you don't manage to prove all known
mathematics inconsistent, what one might reasonably hope to do is
render collisions remote enough that one need not worry about them too
much in practice.

Isn't that already true in the case of queryId? I've never heard any
complaints about collisions.

More to the point: with 32-bit IDs, it's apparent that you shouldn't
really rely on them being unique, and should design your usage so that
it will survive collisions. Robert seems to be arguing that if we
merely made the IDs wider, it would be okay to design applications that
don't allow for that and would fail hard on a collision. I'm reminded
of Weinberg's famous line "If builders built houses the way programmers
build programs, the first woodpecker to come along would destroy
civilization".

I think you're putting words and intent into Robert's mouth. If you
design a hashtable you're concerned about unnecessary collisions, even
if you're aware of them. Additionally, it's not clear you always can do
all that much about the collisions, without accepting undue overhead -
see e.g. pg_stat_statements.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#9)

Re: 64-bit queryId?

On Sat, Sep 30, 2017 at 12:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

More to the point: with 32-bit IDs, it's apparent that you shouldn't
really rely on them being unique, and should design your usage so that
it will survive collisions.

But the point is precisely that we do not do that. pg_stat_statements
"survives" collisions, but nothing good happens.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alexander Korotkov

a.korotkov@postgrespro.ru

over 8 years ago

In reply to: Peter Geoghegan (#5)

Re: 64-bit queryId?

On Sat, Sep 30, 2017 at 6:39 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com>
wrote:

Assuming, however, that you don't manage to prove all known
mathematics inconsistent, what one might reasonably hope to do is
render collisions remote enough that one need not worry about them too
much in practice.

Isn't that already true in the case of queryId? I've never heard any
complaints about collisions. Most people don't change
pg_stat_statements.max, so the probability of a collision is more like
1%. And, that's the probability of *any* collision, not the
probability of a collision that the user actually cares about. The
majority of entries in pg_stat_statements among those ten thousand
will not be interesting. Often, 90%+ will be one-hit wonders. If that
isn't true, then you're probably not going to find pg_stat_statements
very useful, because you have nothing to focus on.

I heard from customers that they periodically dump contents of
pg_stat_statements and then build statistics over long period of time. If
even they leave default pg_stat_statements.max unchanged, probability of
collision would be significantly higher.

I have heard complaints about a number of different things in

pg_stat_statements, like the fact that we don't always manage to
replace constants with '?'/'$n' characters in all cases. I heard about
that quite a few times during my time at Heroku. But never this.

I'm not sure that we should be oriented only by users complaints in this
problem by couple reasons.

1) Some defects could leave unrevealed by users during long periods of
time. We have examples of GiST picksplit algorithm to be bad resulting bad
query performance. However, users never complained about that because they
didn't know what is real fair performance for this kind of queries. In the
same way users may suffer from queryId collisions during long time without
noticing it. They might have inaccurate statistics of queries execution,
but they don't notice it because they have nothing to compare.

2) Ideally, we should fix potential problems before they materialize, not
only after users complaints.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

robertmhaas@gmail.com

over 8 years ago

In reply to: Andres Freund (#8)

Re: 64-bit queryId?

On Sat, Sep 30, 2017 at 11:55 AM, Andres Freund <andres@anarazel.de> wrote:

On 2017-09-30 11:50:08 -0400, Robert Haas wrote:

Well, I think that the fact that pg_stat_statements.max exists at all
is something that could be fixed now that we have DSA.

You normally *do* want a limit imo. And given that query strings are now
stored externally, I don't think there's a huge space concern anymore?

Well, that's probably true. I guess you don't want your query
generator to run the system out of memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

over 8 years ago

In reply to: Robert Haas (#13)

Re: 64-bit queryId?

On 2017-09-30 18:17:37 -0400, Robert Haas wrote:

On Sat, Sep 30, 2017 at 11:55 AM, Andres Freund <andres@anarazel.de> wrote:

On 2017-09-30 11:50:08 -0400, Robert Haas wrote:

Well, I think that the fact that pg_stat_statements.max exists at all
is something that could be fixed now that we have DSA.

You normally *do* want a limit imo. And given that query strings are now
stored externally, I don't think there's a huge space concern anymore?

Well, that's probably true. I guess you don't want your query
generator to run the system out of memory.

Hah. I'm just pondering running sqlsmith against a pg without such a
limit.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

stark@mit.edu

over 8 years ago

In reply to: Alexander Korotkov (#12)

Re: 64-bit queryId?

On 30 September 2017 at 21:03, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

I heard from customers that they periodically dump contents of
pg_stat_statements and then build statistics over long period of time. If
even they leave default pg_stat_statements.max unchanged, probability of
collision would be significantly higher.

Indeed. It's simple enough to export stats to prometheus with queryid
as the key. Then even if the query ages out of the database stats you
have graphs and derivative metrics going further back.

I have to admit this was my first reaction to the idea of using sha1
hashes for git commits as well. But eventually I came around. If the
chances of a hash collision are smaller than a cosmic ray flipping a
bit or a digital electronics falling into a meta-stable state then I
had to admit there's not much value in being theoretically more
correct.

In practice if the query has aged out of pg_stat_statements and you're
exporting pg_stat_statements metrics to longer-term storage there's
really nothing more "correct" than just using a long enough hash and
assuming there are no collisions anyways. If 64-bits is still not
sufficient we could just go to 160-bit sha1 or 256-bit sha256.

Actually there's a reason I'm wondering if we shouldn't use a
cryptographic hash or even an HMAC. Currently if you're non-superuser
we, quite sensibly, hide the query text. But we also hide the queryid.
The latter is really inconvenient since it really makes the stats
utterly useless. I'm not sure what the rationale was but the only
thing I can think of is a fear that it's possible to reverse engineer
the query using brute force. An HMAC, or for most purposes even a
simple cryptographic hash with a secret salt would make that
impossible.

(I have other advances in pg_stat_statements I would love to see. It
would be so much more helpful if pg_stat_statements also kept a few
examples of query parameters such as the most recent set, the set that
caused the longest execution, maybe the set with the largest of each
metric.)

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Greg Stark (#15)

Re: 64-bit queryId?

Greg Stark <stark@mit.edu> writes:

Indeed. It's simple enough to export stats to prometheus with queryid
as the key. Then even if the query ages out of the database stats you
have graphs and derivative metrics going further back.

I'm not really ready to buy into that as a supported use-case, because
it immediately leads to the conclusion that we need to promise stability
of query IDs across PG versions, which seems entirely infeasible to me
(at least with anything like the current method of deriving them).

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

stark@mit.edu

over 8 years ago

In reply to: Tom Lane (#16)

Re: 64-bit queryId?

On 1 October 2017 at 16:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <stark@mit.edu> writes:

Indeed. It's simple enough to export stats to prometheus with queryid
as the key. Then even if the query ages out of the database stats you
have graphs and derivative metrics going further back.

I'm not really ready to buy into that as a supported use-case, because
it immediately leads to the conclusion that we need to promise stability
of query IDs across PG versions, which seems entirely infeasible to me
(at least with anything like the current method of deriving them).

Well these kinds of monitoring systems tend to be used by operations
people who are a lot more practical and a lot less worried about
theoretical concerns like that.

What they're going to want to do is things like "alert on any query
using more than 0.1 cpu core" or "alert on any query being the top i/o
consumer that isn't amongst the top 10 over the previous day" or
"alert on any query using more than 4x the cpu or i/o on one database
than it does on average across all our databases". That last one is a
case where it would be nice if the queryid values were stable across
versions but it's hardly surprising that they aren't.

In context the point was merely that the default
pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
values are enough. It wouldn't be hard for there to be 64k different
queries over time and across all the databases in a fleet and at that
point it becomes likely there'll be a 32-bit collision.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Greg Stark (#17)

Re: 64-bit queryId?

On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark <stark@mit.edu> wrote:

Well these kinds of monitoring systems tend to be used by operations
people who are a lot more practical and a lot less worried about
theoretical concerns like that.

+1, well said.

In context the point was merely that the default
pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
values are enough. It wouldn't be hard for there to be 64k different
queries over time and across all the databases in a fleet and at that
point it becomes likely there'll be a 32-bit collision.

Yeah.

I think Alexander Korotkov's points are quite good, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Magnus Hagander

magnus@hagander.net

over 8 years ago

In reply to: Robert Haas (#18)

Re: 64-bit queryId?

On Mon, Oct 2, 2017 at 1:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark <stark@mit.edu> wrote:

Well these kinds of monitoring systems tend to be used by operations
people who are a lot more practical and a lot less worried about
theoretical concerns like that.

+1, well said.

+1 as well. I think these people would be perfectly find by it changing
across a version upgrade as long as they know that's the deal. *Most* of
the time there is no version upgrade going on, so it would work fine that
time.

Most operations people already deal with a lot of such parameters changing
around. I'm sure most of them would be more than happy with an improvement,
even if it's not mathematically perfect.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Joshua D. Drake

jd@commandprompt.com

over 8 years ago

In reply to: Robert Haas (#18)

Re: 64-bit queryId?

On 10/01/2017 04:22 PM, Robert Haas wrote:

On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark <stark@mit.edu> wrote:

Well these kinds of monitoring systems tend to be used by operations
people who are a lot more practical and a lot less worried about
theoretical concerns like that.

+1, well said.

In context the point was merely that the default
pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
values are enough. It wouldn't be hard for there to be 64k different
queries over time and across all the databases in a fleet and at that
point it becomes likely there'll be a 32-bit collision.

Yeah.

I think Alexander Korotkov's points are quite good, too.

+1 to both of these as well.

jD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Joshua D. Drake (#20)

1 attachment(s)

Re: 64-bit queryId?

On Mon, Oct 2, 2017 at 11:02 AM, Joshua D. Drake <jd@commandprompt.com> wrote:

+1 to both of these as well.

OK, so here's a patch. Review appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

64-bit-queryid-v1.patchapplication/octet-stream; name=64-bit-queryid-v1.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a0e7a46871..02da6eb06e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -21,7 +21,7 @@
  * as the collations of Vars and, most notably, the values of constants.
  *
  * This jumble is acquired at the end of parse analysis of each query, and
- * a 32-bit hash of it is stored into the query's Query.queryId field.
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
  * The server then copies this value around, making it available in plan
  * tree(s) generated from the query.  The executor can then use this value
  * to blame query costs on the proper queryId.
@@ -95,7 +95,7 @@ PG_MODULE_MAGIC;
 #define PGSS_TEXT_FILE	PG_STAT_TMP_DIR "/pgss_query_texts.stat"
 
 /* Magic number identifying the stats file format */
-static const uint32 PGSS_FILE_HEADER = 0x20140125;
+static const uint32 PGSS_FILE_HEADER = 0x20171002;
 
 /* PostgreSQL major version number, changes in which invalidate all entries */
 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
@@ -130,7 +130,7 @@ typedef struct pgssHashKey
 {
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
-	uint32		queryid;		/* query identifier */
+	uint64		queryid;		/* query identifier */
 } pgssHashKey;
 
 /*
@@ -303,8 +303,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, char *completionTag);
 static uint32 pgss_hash_fn(const void *key, Size keysize);
 static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
-static uint32 pgss_hash_string(const char *str, int len);
-static void pgss_store(const char *query, uint32 queryId,
+static uint64 pgss_hash_string(const char *str, int len);
+static void pgss_store(const char *query, uint64 queryId,
 		   int query_location, int query_len,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
@@ -781,7 +781,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		prev_post_parse_analyze_hook(pstate, query);
 
 	/* Assert we didn't do this already */
-	Assert(query->queryId == 0);
+	Assert(query->queryId == UINT64CONST(0));
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash)
@@ -797,7 +797,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = 0;
+		query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -812,14 +812,15 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 
 	/* Compute query ID and mark the Query node with it */
 	JumbleQuery(&jstate, query);
-	query->queryId = hash_any(jstate.jumble, jstate.jumble_len);
+	query->queryId =
+		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
 
 	/*
 	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
 	 * prevent confusion with the utility-statement case.
 	 */
-	if (query->queryId == 0)
-		query->queryId = 1;
+	if (query->queryId == UINT64CONST(0))
+		query->queryId = UINT64CONST(1);
 
 	/*
 	 * If we were able to identify any ignorable constants, we immediately
@@ -855,7 +856,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 * counting of optimizable statements that are directly contained in
 	 * utility statements.
 	 */
-	if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0)
+	if (pgss_enabled() && queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -926,9 +927,9 @@ pgss_ExecutorFinish(QueryDesc *queryDesc)
 static void
 pgss_ExecutorEnd(QueryDesc *queryDesc)
 {
-	uint32		queryId = queryDesc->plannedstmt->queryId;
+	uint64		queryId = queryDesc->plannedstmt->queryId;
 
-	if (queryId != 0 && queryDesc->totaltime && pgss_enabled())
+	if (queryId != UINT64CONST(0) && queryDesc->totaltime && pgss_enabled())
 	{
 		/*
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
@@ -1104,10 +1105,10 @@ pgss_match_fn(const void *key1, const void *key2, Size keysize)
  * identifying the query, without normalizing constants.  Used when hashing
  * utility statements.
  */
-static uint32
+static uint64
 pgss_hash_string(const char *str, int len)
 {
-	return hash_any((const unsigned char *) str, len);
+	return hash_any_extended((const unsigned char *) str, len, 0);
 }
 
 /*
@@ -1121,7 +1122,7 @@ pgss_hash_string(const char *str, int len)
  * query string.  total_time, rows, bufusage are ignored in this case.
  */
 static void
-pgss_store(const char *query, uint32 queryId,
+pgss_store(const char *query, uint64 queryId,
 		   int query_location, int query_len,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
@@ -1173,7 +1174,7 @@ pgss_store(const char *query, uint32 queryId,
 	/*
 	 * For utility statements, we just hash the query string to get an ID.
 	 */
-	if (queryId == 0)
+	if (queryId == UINT64CONST(0))
 		queryId = pgss_hash_string(query, query_len);
 
 	/* Set up key for hashtable search */
@@ -2324,7 +2325,7 @@ AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
 
 		if (jumble_len >= JUMBLE_SIZE)
 		{
-			uint32		start_hash = hash_any(jumble, JUMBLE_SIZE);
+			uint64		start_hash = hash_any_extended(jumble, JUMBLE_SIZE, 0);
 
 			memcpy(jumble, &start_hash, sizeof(start_hash));
 			jumble_len = sizeof(start_hash);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dc26ed17a..1b477baecb 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -162,7 +162,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = 0;
+	pstmt->queryId = UINT64CONST(0);
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 2532edc94a..6cb1fa1d57 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -54,6 +54,11 @@ static void outChar(StringInfo str, char c);
 #define WRITE_UINT_FIELD(fldname) \
 	appendStringInfo(str, " :" CppAsString(fldname) " %u", node->fldname)
 
+/* Write an unsigned integer field (anythign written with UINT64_FORMAT) */
+#define WRITE_UINT64_FIELD(fldname) \
+	appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
+					 node->fldname)
+
 /* Write an OID field (don't hard-wire assumption that OID is same as uint) */
 #define WRITE_OID_FIELD(fldname) \
 	appendStringInfo(str, " :" CppAsString(fldname) " %u", node->fldname)
@@ -260,7 +265,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_NODE_TYPE("PLANNEDSTMT");
 
 	WRITE_ENUM_FIELD(commandType, CmdType);
-	WRITE_UINT_FIELD(queryId);
+	WRITE_UINT64_FIELD(queryId);
 	WRITE_BOOL_FIELD(hasReturning);
 	WRITE_BOOL_FIELD(hasModifyingCTE);
 	WRITE_BOOL_FIELD(canSetTag);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 07ba69178c..ccb6a1f4ac 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -33,6 +33,7 @@
 #include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
 #include "nodes/readfuncs.h"
+#include "utils/builtins.h"
 
 
 /*
@@ -70,6 +71,12 @@
 	token = pg_strtok(&length);		/* get field value */ \
 	local_node->fldname = atoui(token)
 
+/* Read an unsigned integer field (anything written using UINT64_FORMAT) */
+#define READ_UINT64_FIELD(fldname) \
+	token = pg_strtok(&length);		/* skip :fldname */ \
+	token = pg_strtok(&length);		/* get field value */ \
+	local_node->fldname = pg_strtouint64(token, NULL, 10)
+
 /* Read an long integer field (anything written as ":fldname %ld") */
 #define READ_LONG_FIELD(fldname) \
 	token = pg_strtok(&length);		/* skip :fldname */ \
@@ -231,7 +238,7 @@ _readQuery(void)
 
 	READ_ENUM_FIELD(commandType, CmdType);
 	READ_ENUM_FIELD(querySource, QuerySource);
-	local_node->queryId = 0;	/* not saved in output format */
+	local_node->queryId = UINT64CONST(0);	/* not saved in output format */
 	READ_BOOL_FIELD(canSetTag);
 	READ_NODE_FIELD(utilityStmt);
 	READ_INT_FIELD(resultRelation);
@@ -1456,7 +1463,7 @@ _readPlannedStmt(void)
 	READ_LOCALS(PlannedStmt);
 
 	READ_ENUM_FIELD(commandType, CmdType);
-	READ_UINT_FIELD(queryId);
+	READ_UINT64_FIELD(queryId);
 	READ_BOOL_FIELD(hasReturning);
 	READ_BOOL_FIELD(hasModifyingCTE);
 	READ_BOOL_FIELD(canSetTag);
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 7054d4f77d..7a61af7905 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3575,7 +3575,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
 List *
 QueryRewrite(Query *parsetree)
 {
-	uint32		input_query_id = parsetree->queryId;
+	uint64		input_query_id = parsetree->queryId;
 	List	   *querylist;
 	List	   *results;
 	ListCell   *l;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f3e4c69753..fba1450ddb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -111,7 +111,7 @@ typedef struct Query
 
 	QuerySource querySource;	/* where did I come from? */
 
-	uint32		queryId;		/* query identifier (can be set by plugins) */
+	uint64		queryId;		/* query identifier (can be set by plugins) */
 
 	bool		canSetTag;		/* do I set the command result tag? */
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a382331f41..dd74efa9a4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -44,7 +44,7 @@ typedef struct PlannedStmt
 
 	CmdType		commandType;	/* select|insert|update|delete|utility */
 
-	uint32		queryId;		/* query identifier (copied from Query) */
+	uint64		queryId;		/* query identifier (copied from Query) */
 
 	bool		hasReturning;	/* is it insert|update|delete RETURNING? */

GavinFlower@archidevsys.co.nz

over 8 years ago

In reply to: Joshua D. Drake (#20)

Re: 64-bit queryId?

On 03/10/17 04:02, Joshua D. Drake wrote:

On 10/01/2017 04:22 PM, Robert Haas wrote:

On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark <stark@mit.edu> wrote:

Well these kinds of monitoring systems tend to be used by operations
people who are a lot more practical and a lot less worried about
theoretical concerns like that.

+1, well said.

In context the point was merely that the default
pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
values are enough. It wouldn't be hard for there to be 64k different
queries over time and across all the databases in a fleet and at that
point it becomes likely there'll be a 32-bit collision.

Yeah.

I think Alexander Korotkov's points are quite good, too.

+1 to both of these as well.

jD

Did a calculation:

#         probability of collision
54561    0.499993
54562        0.500005

Essentially, you hit a greater than 50% chance of a collision before you
get to 55 thousand statements.

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Geoghegan

pg@bowt.ie

over 8 years ago

In reply to: Robert Haas (#21)

Re: 64-bit queryId?

On Mon, Oct 2, 2017 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 2, 2017 at 11:02 AM, Joshua D. Drake <jd@commandprompt.com> wrote:

+1 to both of these as well.

OK, so here's a patch. Review appreciated.

You need to change the SQL interface as well, although I'm not sure
exactly how. The problem is that you are now passing a uint64 queryId
to Int64GetDatumFast() within pg_stat_statements_internal(). That
worked when queryId was a uint32, because you can easily represent
values <= UINT_MAX as an int64/int8. However, you cannot represent the
second half of the range of uint64 within a int64/int8. I think that
this will behave different depending on USE_FLOAT8_BYVAL, if nothing
else.

The background here is that we used int8 as the output type for the
function when queryId was first exposed via the SQL interface because
there was no 32-bit unsigned int type that we could have used (except
possibly Oid, but that's weird). You see the same pattern in a couple
of other places.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Peter Geoghegan (#23)

Re: 64-bit queryId?

Peter Geoghegan <pg@bowt.ie> writes:

You need to change the SQL interface as well, although I'm not sure
exactly how. The problem is that you are now passing a uint64 queryId
to Int64GetDatumFast() within pg_stat_statements_internal(). That
worked when queryId was a uint32, because you can easily represent
values <= UINT_MAX as an int64/int8. However, you cannot represent the
second half of the range of uint64 within a int64/int8. I think that
this will behave different depending on USE_FLOAT8_BYVAL, if nothing
else.

Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alexander Korotkov

a.korotkov@postgrespro.ru

over 8 years ago

In reply to: Tom Lane (#24)

Re: 64-bit queryId?

On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Geoghegan <pg@bowt.ie> writes:

You need to change the SQL interface as well, although I'm not sure
exactly how. The problem is that you are now passing a uint64 queryId
to Int64GetDatumFast() within pg_stat_statements_internal(). That
worked when queryId was a uint32, because you can easily represent
values <= UINT_MAX as an int64/int8. However, you cannot represent the
second half of the range of uint64 within a int64/int8. I think that
this will behave different depending on USE_FLOAT8_BYVAL, if nothing
else.

Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?

+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

#1 causes minor loss of precision which looks rather insignificant in given
context.
#2 might be rather unexpected for users whose previously had non-negative
queryIds. Changing queryId from 32-bit to 64-bit itself might require some
adoption from monitoring software. But queryIds are user-visible, and
negative queryIds would look rather nonlogical.
#3 would be attaching hard and long-term problem by insufficient reason.
Thus, #1 looks like most harmless solution.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Alexander Korotkov (#25)

Re: 64-bit queryId?

On Tue, Oct 3, 2017 at 9:07 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

#1 causes minor loss of precision which looks rather insignificant in given
context.
#2 might be rather unexpected for users whose previously had non-negative
queryIds. Changing queryId from 32-bit to 64-bit itself might require some
adoption from monitoring software. But queryIds are user-visible, and
negative queryIds would look rather nonlogical.

Per the principal of least astonishment perhaps:
https://en.wikipedia.org/wiki/Principle_of_least_astonishment
Negative values tend to be considered as error codes as well.

#3 would be attaching hard and long-term problem by insufficient reason.
Thus, #1 looks like most harmless solution.

In this case going for #1 looks like the safest bet.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

over 8 years ago

In reply to: Alexander Korotkov (#25)

Re: 64-bit queryId?

On 2017-10-03 03:07:09 +0300, Alexander Korotkov wrote:

On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Geoghegan <pg@bowt.ie> writes:

You need to change the SQL interface as well, although I'm not sure
exactly how. The problem is that you are now passing a uint64 queryId
to Int64GetDatumFast() within pg_stat_statements_internal(). That
worked when queryId was a uint32, because you can easily represent
values <= UINT_MAX as an int64/int8. However, you cannot represent the
second half of the range of uint64 within a int64/int8. I think that
this will behave different depending on USE_FLOAT8_BYVAL, if nothing
else.

Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?

+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

4) use numeric, efficiency when querying is not a significant concern here
5) use a custom type that doesn't support arithmetic, similar to pg_lsn.

FWIW, I think we should consider going for something like 5) for
pg_class.relpages.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

over 8 years ago

In reply to: Robert Haas (#21)

Re: 64-bit queryId?

OK, so here's a patch. Review appreciated.

Please correct typo "Write an unsigned integer field (anythign written with
UINT64_FORMAT)". anythign -> anything.

Vladimir

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Andres Freund (#27)

Re: 64-bit queryId?

On Tue, Oct 3, 2017 at 3:12 PM, Andres Freund <andres@anarazel.de> wrote:

On 2017-10-03 03:07:09 +0300, Alexander Korotkov wrote:

On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

4) use numeric, efficiency when querying is not a significant concern here
5) use a custom type that doesn't support arithmetic, similar to pg_lsn.

Why not just returning a hexa-like text?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

over 8 years ago

In reply to: Michael Paquier (#29)

Re: 64-bit queryId?

On 2017-10-03 17:06:20 +0900, Michael Paquier wrote:

On Tue, Oct 3, 2017 at 3:12 PM, Andres Freund <andres@anarazel.de> wrote:

On 2017-10-03 03:07:09 +0300, Alexander Korotkov wrote:

On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

4) use numeric, efficiency when querying is not a significant concern here
5) use a custom type that doesn't support arithmetic, similar to pg_lsn.

Why not just returning a hexa-like text?

Two reasons: First, it'd look fairly different to before, whereas 4/5
would probably just continue to work fairly transparently in a lot of
cases. Secondly, what's the advantage in doing so over 4)?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Alexander Korotkov (#25)

1 attachment(s)

Re: 64-bit queryId?

On Mon, Oct 2, 2017 at 8:07 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

#1 causes minor loss of precision which looks rather insignificant in given
context.
#2 might be rather unexpected for users whose previously had non-negative
queryIds. Changing queryId from 32-bit to 64-bit itself might require some
adoption from monitoring software. But queryIds are user-visible, and
negative queryIds would look rather nonlogical.
#3 would be attaching hard and long-term problem by insufficient reason.
Thus, #1 looks like most harmless solution.

I think we should just allow negative queryIds. I mean, the hash
functions have behaved that way for a long time:

rhaas=# select hashtext('');
hashtext
-------------
-1477818771
(1 row)

It seems silly to me to throw away a perfectly good bit from the hash
value just because of some risk of minor user confusion. I do like
Michael's suggestion of outputing hexa-like text, but changing the
types of the user-visible output columns seems like a job for another
patch. Similarly, if we were to adopt Andres's suggestions of a new
type or using numeric or Alexander's suggestion of implementing a
64-bit unsigned type, I think it should be a separate patch from this
one.

I would note that such a patch will actually create real
incompatibility -- extension upgrades might fail if there are
dependent views, for example -- whereas merely having query IDs start
to sometimes be negative only creates an incompatibility for people
who assumed that the int64 type wouldn't actually use its full range
of allowable values. I don't deny that someone may have done that, of
course, but I wouldn't guess that it's a common thing... maybe I'm
wrong.

Meanwhile, updated patch with a fix for the typo pointed out by
Vladimir Sitnikov attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

64-bit-queryid-v2.patchapplication/octet-stream; name=64-bit-queryid-v2.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a0e7a46871..02da6eb06e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -21,7 +21,7 @@
  * as the collations of Vars and, most notably, the values of constants.
  *
  * This jumble is acquired at the end of parse analysis of each query, and
- * a 32-bit hash of it is stored into the query's Query.queryId field.
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
  * The server then copies this value around, making it available in plan
  * tree(s) generated from the query.  The executor can then use this value
  * to blame query costs on the proper queryId.
@@ -95,7 +95,7 @@ PG_MODULE_MAGIC;
 #define PGSS_TEXT_FILE	PG_STAT_TMP_DIR "/pgss_query_texts.stat"
 
 /* Magic number identifying the stats file format */
-static const uint32 PGSS_FILE_HEADER = 0x20140125;
+static const uint32 PGSS_FILE_HEADER = 0x20171002;
 
 /* PostgreSQL major version number, changes in which invalidate all entries */
 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
@@ -130,7 +130,7 @@ typedef struct pgssHashKey
 {
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
-	uint32		queryid;		/* query identifier */
+	uint64		queryid;		/* query identifier */
 } pgssHashKey;
 
 /*
@@ -303,8 +303,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, char *completionTag);
 static uint32 pgss_hash_fn(const void *key, Size keysize);
 static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
-static uint32 pgss_hash_string(const char *str, int len);
-static void pgss_store(const char *query, uint32 queryId,
+static uint64 pgss_hash_string(const char *str, int len);
+static void pgss_store(const char *query, uint64 queryId,
 		   int query_location, int query_len,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
@@ -781,7 +781,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		prev_post_parse_analyze_hook(pstate, query);
 
 	/* Assert we didn't do this already */
-	Assert(query->queryId == 0);
+	Assert(query->queryId == UINT64CONST(0));
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash)
@@ -797,7 +797,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = 0;
+		query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -812,14 +812,15 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 
 	/* Compute query ID and mark the Query node with it */
 	JumbleQuery(&jstate, query);
-	query->queryId = hash_any(jstate.jumble, jstate.jumble_len);
+	query->queryId =
+		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
 
 	/*
 	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
 	 * prevent confusion with the utility-statement case.
 	 */
-	if (query->queryId == 0)
-		query->queryId = 1;
+	if (query->queryId == UINT64CONST(0))
+		query->queryId = UINT64CONST(1);
 
 	/*
 	 * If we were able to identify any ignorable constants, we immediately
@@ -855,7 +856,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 * counting of optimizable statements that are directly contained in
 	 * utility statements.
 	 */
-	if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0)
+	if (pgss_enabled() && queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -926,9 +927,9 @@ pgss_ExecutorFinish(QueryDesc *queryDesc)
 static void
 pgss_ExecutorEnd(QueryDesc *queryDesc)
 {
-	uint32		queryId = queryDesc->plannedstmt->queryId;
+	uint64		queryId = queryDesc->plannedstmt->queryId;
 
-	if (queryId != 0 && queryDesc->totaltime && pgss_enabled())
+	if (queryId != UINT64CONST(0) && queryDesc->totaltime && pgss_enabled())
 	{
 		/*
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
@@ -1104,10 +1105,10 @@ pgss_match_fn(const void *key1, const void *key2, Size keysize)
  * identifying the query, without normalizing constants.  Used when hashing
  * utility statements.
  */
-static uint32
+static uint64
 pgss_hash_string(const char *str, int len)
 {
-	return hash_any((const unsigned char *) str, len);
+	return hash_any_extended((const unsigned char *) str, len, 0);
 }
 
 /*
@@ -1121,7 +1122,7 @@ pgss_hash_string(const char *str, int len)
  * query string.  total_time, rows, bufusage are ignored in this case.
  */
 static void
-pgss_store(const char *query, uint32 queryId,
+pgss_store(const char *query, uint64 queryId,
 		   int query_location, int query_len,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
@@ -1173,7 +1174,7 @@ pgss_store(const char *query, uint32 queryId,
 	/*
 	 * For utility statements, we just hash the query string to get an ID.
 	 */
-	if (queryId == 0)
+	if (queryId == UINT64CONST(0))
 		queryId = pgss_hash_string(query, query_len);
 
 	/* Set up key for hashtable search */
@@ -2324,7 +2325,7 @@ AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
 
 		if (jumble_len >= JUMBLE_SIZE)
 		{
-			uint32		start_hash = hash_any(jumble, JUMBLE_SIZE);
+			uint64		start_hash = hash_any_extended(jumble, JUMBLE_SIZE, 0);
 
 			memcpy(jumble, &start_hash, sizeof(start_hash));
 			jumble_len = sizeof(start_hash);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dc26ed17a..1b477baecb 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -162,7 +162,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = 0;
+	pstmt->queryId = UINT64CONST(0);
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 2532edc94a..43d62062bc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -54,6 +54,11 @@ static void outChar(StringInfo str, char c);
 #define WRITE_UINT_FIELD(fldname) \
 	appendStringInfo(str, " :" CppAsString(fldname) " %u", node->fldname)
 
+/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
+#define WRITE_UINT64_FIELD(fldname) \
+	appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
+					 node->fldname)
+
 /* Write an OID field (don't hard-wire assumption that OID is same as uint) */
 #define WRITE_OID_FIELD(fldname) \
 	appendStringInfo(str, " :" CppAsString(fldname) " %u", node->fldname)
@@ -260,7 +265,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_NODE_TYPE("PLANNEDSTMT");
 
 	WRITE_ENUM_FIELD(commandType, CmdType);
-	WRITE_UINT_FIELD(queryId);
+	WRITE_UINT64_FIELD(queryId);
 	WRITE_BOOL_FIELD(hasReturning);
 	WRITE_BOOL_FIELD(hasModifyingCTE);
 	WRITE_BOOL_FIELD(canSetTag);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 07ba69178c..ccb6a1f4ac 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -33,6 +33,7 @@
 #include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
 #include "nodes/readfuncs.h"
+#include "utils/builtins.h"
 
 
 /*
@@ -70,6 +71,12 @@
 	token = pg_strtok(&length);		/* get field value */ \
 	local_node->fldname = atoui(token)
 
+/* Read an unsigned integer field (anything written using UINT64_FORMAT) */
+#define READ_UINT64_FIELD(fldname) \
+	token = pg_strtok(&length);		/* skip :fldname */ \
+	token = pg_strtok(&length);		/* get field value */ \
+	local_node->fldname = pg_strtouint64(token, NULL, 10)
+
 /* Read an long integer field (anything written as ":fldname %ld") */
 #define READ_LONG_FIELD(fldname) \
 	token = pg_strtok(&length);		/* skip :fldname */ \
@@ -231,7 +238,7 @@ _readQuery(void)
 
 	READ_ENUM_FIELD(commandType, CmdType);
 	READ_ENUM_FIELD(querySource, QuerySource);
-	local_node->queryId = 0;	/* not saved in output format */
+	local_node->queryId = UINT64CONST(0);	/* not saved in output format */
 	READ_BOOL_FIELD(canSetTag);
 	READ_NODE_FIELD(utilityStmt);
 	READ_INT_FIELD(resultRelation);
@@ -1456,7 +1463,7 @@ _readPlannedStmt(void)
 	READ_LOCALS(PlannedStmt);
 
 	READ_ENUM_FIELD(commandType, CmdType);
-	READ_UINT_FIELD(queryId);
+	READ_UINT64_FIELD(queryId);
 	READ_BOOL_FIELD(hasReturning);
 	READ_BOOL_FIELD(hasModifyingCTE);
 	READ_BOOL_FIELD(canSetTag);
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 7054d4f77d..7a61af7905 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3575,7 +3575,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
 List *
 QueryRewrite(Query *parsetree)
 {
-	uint32		input_query_id = parsetree->queryId;
+	uint64		input_query_id = parsetree->queryId;
 	List	   *querylist;
 	List	   *results;
 	ListCell   *l;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f3e4c69753..fba1450ddb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -111,7 +111,7 @@ typedef struct Query
 
 	QuerySource querySource;	/* where did I come from? */
 
-	uint32		queryId;		/* query identifier (can be set by plugins) */
+	uint64		queryId;		/* query identifier (can be set by plugins) */
 
 	bool		canSetTag;		/* do I set the command result tag? */
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a382331f41..dd74efa9a4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -44,7 +44,7 @@ typedef struct PlannedStmt
 
 	CmdType		commandType;	/* select|insert|update|delete|utility */
 
-	uint32		queryId;		/* query identifier (copied from Query) */
+	uint64		queryId;		/* query identifier (copied from Query) */
 
 	bool		hasReturning;	/* is it insert|update|delete RETURNING? */

Alexander Korotkov

a.korotkov@postgrespro.ru

over 8 years ago

In reply to: Robert Haas (#31)

Re: 64-bit queryId?

On Tue, Oct 3, 2017 at 5:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 2, 2017 at 8:07 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

#1 causes minor loss of precision which looks rather insignificant in

given

context.
#2 might be rather unexpected for users whose previously had non-negative
queryIds. Changing queryId from 32-bit to 64-bit itself might require

some

adoption from monitoring software. But queryIds are user-visible, and
negative queryIds would look rather nonlogical.
#3 would be attaching hard and long-term problem by insufficient reason.
Thus, #1 looks like most harmless solution.

I think we should just allow negative queryIds. I mean, the hash
functions have behaved that way for a long time:

rhaas=# select hashtext('');
hashtext
-------------
-1477818771
(1 row)

It seems silly to me to throw away a perfectly good bit from the hash
value just because of some risk of minor user confusion. I do like
Michael's suggestion of outputing hexa-like text, but changing the
types of the user-visible output columns seems like a job for another
patch. Similarly, if we were to adopt Andres's suggestions of a new
type or using numeric or Alexander's suggestion of implementing a
64-bit unsigned type, I think it should be a separate patch from this
one.

I would note that such a patch will actually create real
incompatibility -- extension upgrades might fail if there are
dependent views, for example -- whereas merely having query IDs start
to sometimes be negative only creates an incompatibility for people
who assumed that the int64 type wouldn't actually use its full range
of allowable values. I don't deny that someone may have done that, of
course, but I wouldn't guess that it's a common thing... maybe I'm
wrong.

BTW, you didn't comment Tom's suggestion about dropping high-order bit
which trades minor user user confusion to minor loss of precision.
TBH, for me it's not so important whether we allow negative queryIds or
drop high-order bit. I would be anyway very good to have 64-(or 63-)bit
queryIds committed.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

robertmhaas@gmail.com

over 8 years ago

In reply to: Alexander Korotkov (#32)

Re: 64-bit queryId?

On Tue, Oct 3, 2017 at 12:04 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

BTW, you didn't comment Tom's suggestion about dropping high-order bit which
trades minor user user confusion to minor loss of precision.

Oh, I thought I did comment on that. I favor allowing negative IDs
rather than minor loss of precision.

TBH, for me it's not so important whether we allow negative queryIds or drop
high-order bit. I would be anyway very good to have 64-(or 63-)bit queryIds
committed.

Great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Robert Haas (#31)

Re: 64-bit queryId?

On Tue, Oct 3, 2017 at 11:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:

It seems silly to me to throw away a perfectly good bit from the hash
value just because of some risk of minor user confusion. I do like
Michael's suggestion of outputing hexa-like text, but changing the
types of the user-visible output columns seems like a job for another
patch. Similarly, if we were to adopt Andres's suggestions of a new
type or using numeric or Alexander's suggestion of implementing a
64-bit unsigned type, I think it should be a separate patch from this
one.

Yeah, any of them would require a version bump of pg_stat_statements.
My suggestion would be actually to just document the use of to_hex in
pg_stat_statements if there is any issue with query ID signed-ness.
Still, I have yet to see any user stories about complains on the
matter, so even just added a note in the docs may be overdoing it.

Thinking about that freshly today (new day, new thoughts of course), I
think that I would go with the 64-bit version. The patch gets more
simple, and there are ways to easily get around negative numbers by
applying anything like to_hex() on top of pg_stat_statements.

+/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
+#define WRITE_UINT64_FIELD(fldname) \
+   appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
+                    node->fldname)
Correct call here.

 {
-   return hash_any((const unsigned char *) str, len);
+   return hash_any_extended((const unsigned char *) str, len, 0);
 }
[...]
-           uint32      start_hash = hash_any(jumble, JUMBLE_SIZE);
+           uint64      start_hash = hash_any_extended(jumble, JUMBLE_SIZE, 0);
Missing two DatumGetUInt64() perhaps? HEAD looks wrong to me as well.
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Michael Paquier (#34)

Re: 64-bit queryId?

On Tue, Oct 3, 2017 at 9:34 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

+/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
+#define WRITE_UINT64_FIELD(fldname) \
+   appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
+                    node->fldname)
Correct call here.

I'm sorry, but I don't understand this comment.

{
-   return hash_any((const unsigned char *) str, len);
+   return hash_any_extended((const unsigned char *) str, len, 0);
}
[...]
-           uint32      start_hash = hash_any(jumble, JUMBLE_SIZE);
+           uint64      start_hash = hash_any_extended(jumble, JUMBLE_SIZE, 0);
Missing two DatumGetUInt64() perhaps? HEAD looks wrong to me as well.

Ah, yes, you're right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Robert Haas (#35)

Re: 64-bit queryId?

On Wed, Oct 4, 2017 at 10:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Oct 3, 2017 at 9:34 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
+/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
+#define WRITE_UINT64_FIELD(fldname) \
+   appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
+                    node->fldname)
Correct call here.
I'm sorry, but I don't understand this comment.

I just mean that your patch is correct here. I don't always complain :)
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Michael Paquier (#36)

1 attachment(s)

Re: 64-bit queryId?

On Tue, Oct 3, 2017 at 9:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I'm sorry, but I don't understand this comment.

I just mean that your patch is correct here. I don't always complain :)

Oh, OK. I'm all right with my patch being correct.

Here's a new version that hopefully fixes the things that you noticed
were incorrect.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

64-bit-queryid-v3.patchapplication/octet-stream; name=64-bit-queryid-v3.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a0e7a46871..e5ad6ae5ee 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -21,7 +21,7 @@
  * as the collations of Vars and, most notably, the values of constants.
  *
  * This jumble is acquired at the end of parse analysis of each query, and
- * a 32-bit hash of it is stored into the query's Query.queryId field.
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
  * The server then copies this value around, making it available in plan
  * tree(s) generated from the query.  The executor can then use this value
  * to blame query costs on the proper queryId.
@@ -95,7 +95,7 @@ PG_MODULE_MAGIC;
 #define PGSS_TEXT_FILE	PG_STAT_TMP_DIR "/pgss_query_texts.stat"
 
 /* Magic number identifying the stats file format */
-static const uint32 PGSS_FILE_HEADER = 0x20140125;
+static const uint32 PGSS_FILE_HEADER = 0x20171004;
 
 /* PostgreSQL major version number, changes in which invalidate all entries */
 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
@@ -130,7 +130,7 @@ typedef struct pgssHashKey
 {
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
-	uint32		queryid;		/* query identifier */
+	uint64		queryid;		/* query identifier */
 } pgssHashKey;
 
 /*
@@ -303,8 +303,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					DestReceiver *dest, char *completionTag);
 static uint32 pgss_hash_fn(const void *key, Size keysize);
 static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
-static uint32 pgss_hash_string(const char *str, int len);
-static void pgss_store(const char *query, uint32 queryId,
+static uint64 pgss_hash_string(const char *str, int len);
+static void pgss_store(const char *query, uint64 queryId,
 		   int query_location, int query_len,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
@@ -781,7 +781,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		prev_post_parse_analyze_hook(pstate, query);
 
 	/* Assert we didn't do this already */
-	Assert(query->queryId == 0);
+	Assert(query->queryId == UINT64CONST(0));
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash)
@@ -797,7 +797,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = 0;
+		query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -812,14 +812,15 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 
 	/* Compute query ID and mark the Query node with it */
 	JumbleQuery(&jstate, query);
-	query->queryId = hash_any(jstate.jumble, jstate.jumble_len);
+	query->queryId =
+		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
 
 	/*
 	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
 	 * prevent confusion with the utility-statement case.
 	 */
-	if (query->queryId == 0)
-		query->queryId = 1;
+	if (query->queryId == UINT64CONST(0))
+		query->queryId = UINT64CONST(1);
 
 	/*
 	 * If we were able to identify any ignorable constants, we immediately
@@ -855,7 +856,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 * counting of optimizable statements that are directly contained in
 	 * utility statements.
 	 */
-	if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0)
+	if (pgss_enabled() && queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -926,9 +927,9 @@ pgss_ExecutorFinish(QueryDesc *queryDesc)
 static void
 pgss_ExecutorEnd(QueryDesc *queryDesc)
 {
-	uint32		queryId = queryDesc->plannedstmt->queryId;
+	uint64		queryId = queryDesc->plannedstmt->queryId;
 
-	if (queryId != 0 && queryDesc->totaltime && pgss_enabled())
+	if (queryId != UINT64CONST(0) && queryDesc->totaltime && pgss_enabled())
 	{
 		/*
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
@@ -1104,10 +1105,11 @@ pgss_match_fn(const void *key1, const void *key2, Size keysize)
  * identifying the query, without normalizing constants.  Used when hashing
  * utility statements.
  */
-static uint32
+static uint64
 pgss_hash_string(const char *str, int len)
 {
-	return hash_any((const unsigned char *) str, len);
+	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											len, 0));
 }
 
 /*
@@ -1121,7 +1123,7 @@ pgss_hash_string(const char *str, int len)
  * query string.  total_time, rows, bufusage are ignored in this case.
  */
 static void
-pgss_store(const char *query, uint32 queryId,
+pgss_store(const char *query, uint64 queryId,
 		   int query_location, int query_len,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
@@ -1173,7 +1175,7 @@ pgss_store(const char *query, uint32 queryId,
 	/*
 	 * For utility statements, we just hash the query string to get an ID.
 	 */
-	if (queryId == 0)
+	if (queryId == UINT64CONST(0))
 		queryId = pgss_hash_string(query, query_len);
 
 	/* Set up key for hashtable search */
@@ -2324,8 +2326,10 @@ AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
 
 		if (jumble_len >= JUMBLE_SIZE)
 		{
-			uint32		start_hash = hash_any(jumble, JUMBLE_SIZE);
+			uint64		start_hash;
 
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
 			memcpy(jumble, &start_hash, sizeof(start_hash));
 			jumble_len = sizeof(start_hash);
 		}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dc26ed17a..1b477baecb 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -162,7 +162,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = 0;
+	pstmt->queryId = UINT64CONST(0);
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 2532edc94a..43d62062bc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -54,6 +54,11 @@ static void outChar(StringInfo str, char c);
 #define WRITE_UINT_FIELD(fldname) \
 	appendStringInfo(str, " :" CppAsString(fldname) " %u", node->fldname)
 
+/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
+#define WRITE_UINT64_FIELD(fldname) \
+	appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
+					 node->fldname)
+
 /* Write an OID field (don't hard-wire assumption that OID is same as uint) */
 #define WRITE_OID_FIELD(fldname) \
 	appendStringInfo(str, " :" CppAsString(fldname) " %u", node->fldname)
@@ -260,7 +265,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_NODE_TYPE("PLANNEDSTMT");
 
 	WRITE_ENUM_FIELD(commandType, CmdType);
-	WRITE_UINT_FIELD(queryId);
+	WRITE_UINT64_FIELD(queryId);
 	WRITE_BOOL_FIELD(hasReturning);
 	WRITE_BOOL_FIELD(hasModifyingCTE);
 	WRITE_BOOL_FIELD(canSetTag);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 07ba69178c..ccb6a1f4ac 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -33,6 +33,7 @@
 #include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
 #include "nodes/readfuncs.h"
+#include "utils/builtins.h"
 
 
 /*
@@ -70,6 +71,12 @@
 	token = pg_strtok(&length);		/* get field value */ \
 	local_node->fldname = atoui(token)
 
+/* Read an unsigned integer field (anything written using UINT64_FORMAT) */
+#define READ_UINT64_FIELD(fldname) \
+	token = pg_strtok(&length);		/* skip :fldname */ \
+	token = pg_strtok(&length);		/* get field value */ \
+	local_node->fldname = pg_strtouint64(token, NULL, 10)
+
 /* Read an long integer field (anything written as ":fldname %ld") */
 #define READ_LONG_FIELD(fldname) \
 	token = pg_strtok(&length);		/* skip :fldname */ \
@@ -231,7 +238,7 @@ _readQuery(void)
 
 	READ_ENUM_FIELD(commandType, CmdType);
 	READ_ENUM_FIELD(querySource, QuerySource);
-	local_node->queryId = 0;	/* not saved in output format */
+	local_node->queryId = UINT64CONST(0);	/* not saved in output format */
 	READ_BOOL_FIELD(canSetTag);
 	READ_NODE_FIELD(utilityStmt);
 	READ_INT_FIELD(resultRelation);
@@ -1456,7 +1463,7 @@ _readPlannedStmt(void)
 	READ_LOCALS(PlannedStmt);
 
 	READ_ENUM_FIELD(commandType, CmdType);
-	READ_UINT_FIELD(queryId);
+	READ_UINT64_FIELD(queryId);
 	READ_BOOL_FIELD(hasReturning);
 	READ_BOOL_FIELD(hasModifyingCTE);
 	READ_BOOL_FIELD(canSetTag);
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 7054d4f77d..7a61af7905 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3575,7 +3575,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
 List *
 QueryRewrite(Query *parsetree)
 {
-	uint32		input_query_id = parsetree->queryId;
+	uint64		input_query_id = parsetree->queryId;
 	List	   *querylist;
 	List	   *results;
 	ListCell   *l;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f3e4c69753..fba1450ddb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -111,7 +111,7 @@ typedef struct Query
 
 	QuerySource querySource;	/* where did I come from? */
 
-	uint32		queryId;		/* query identifier (can be set by plugins) */
+	uint64		queryId;		/* query identifier (can be set by plugins) */
 
 	bool		canSetTag;		/* do I set the command result tag? */
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a382331f41..dd74efa9a4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -44,7 +44,7 @@ typedef struct PlannedStmt
 
 	CmdType		commandType;	/* select|insert|update|delete|utility */
 
-	uint32		queryId;		/* query identifier (copied from Query) */
+	uint64		queryId;		/* query identifier (copied from Query) */
 
 	bool		hasReturning;	/* is it insert|update|delete RETURNING? */

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Robert Haas (#37)

Re: 64-bit queryId?

On Wed, Oct 4, 2017 at 9:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Oct 3, 2017 at 9:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I'm sorry, but I don't understand this comment.

I just mean that your patch is correct here. I don't always complain :)

Oh, OK. I'm all right with my patch being correct.

Here's a new version that hopefully fixes the things that you noticed
were incorrect.

I am still on the learning curve with pg_stat_statements... This still
does not look complete to me. pgss_hash_fn only makes use of the last
four bytes of the query ID. What about computing the hash using as
also the first four bytes? With the current code, if the last four
bytes of two queries match then they would be counted together looking
at pgss_store().

I have spotted as well this comment in pg_stat_statements.c:
/* Increment the counts, except when jstate is not NULL */
if (!jstate)
I think that this should be "when jstate is NULL".
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Michael Paquier (#38)

Re: 64-bit queryId?

On Wed, Oct 4, 2017 at 9:49 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I am still on the learning curve with pg_stat_statements... This still
does not look complete to me. pgss_hash_fn only makes use of the last
four bytes of the query ID. What about computing the hash using as
also the first four bytes? With the current code, if the last four
bytes of two queries match then they would be counted together looking
at pgss_store().

Not really; dynahash won't merge two keys just because their hash
codes come out the same. But you're right; that's probably not the
best way to do it. TBH, why do we even have pgss_hash_fn? It seems
like using tag_hash would be superior.

I have spotted as well this comment in pg_stat_statements.c:
/* Increment the counts, except when jstate is not NULL */
if (!jstate)
I think that this should be "when jstate is NULL".

I think that you're right, but that's unrelated to this patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Robert Haas (#39)

Re: 64-bit queryId?

On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Not really; dynahash won't merge two keys just because their hash
codes come out the same. But you're right; that's probably not the
best way to do it. TBH, why do we even have pgss_hash_fn? It seems
like using tag_hash would be superior.

Yes, using tag_hash would be just better than any custom formula.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Michael Paquier (#40)

1 attachment(s)

Re: 64-bit queryId?

On Wed, Oct 4, 2017 at 10:11 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Not really; dynahash won't merge two keys just because their hash
codes come out the same. But you're right; that's probably not the
best way to do it. TBH, why do we even have pgss_hash_fn? It seems
like using tag_hash would be superior.

Yes, using tag_hash would be just better than any custom formula.

OK, here's v4, which does it that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

64-bit-queryid-v4.patchapplication/octet-stream; name=64-bit-queryid-v4.patchDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a0e7a46871..b04b4d6ce1 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -21,7 +21,7 @@
  * as the collations of Vars and, most notably, the values of constants.
  *
  * This jumble is acquired at the end of parse analysis of each query, and
- * a 32-bit hash of it is stored into the query's Query.queryId field.
+ * a 64-bit hash of it is stored into the query's Query.queryId field.
  * The server then copies this value around, making it available in plan
  * tree(s) generated from the query.  The executor can then use this value
  * to blame query costs on the proper queryId.
@@ -95,7 +95,7 @@ PG_MODULE_MAGIC;
 #define PGSS_TEXT_FILE	PG_STAT_TMP_DIR "/pgss_query_texts.stat"
 
 /* Magic number identifying the stats file format */
-static const uint32 PGSS_FILE_HEADER = 0x20140125;
+static const uint32 PGSS_FILE_HEADER = 0x20171004;
 
 /* PostgreSQL major version number, changes in which invalidate all entries */
 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
@@ -130,7 +130,7 @@ typedef struct pgssHashKey
 {
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
-	uint32		queryid;		/* query identifier */
+	uint64		queryid;		/* query identifier */
 } pgssHashKey;
 
 /*
@@ -301,10 +301,8 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					ProcessUtilityContext context, ParamListInfo params,
 					QueryEnvironment *queryEnv,
 					DestReceiver *dest, char *completionTag);
-static uint32 pgss_hash_fn(const void *key, Size keysize);
-static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
-static uint32 pgss_hash_string(const char *str, int len);
-static void pgss_store(const char *query, uint32 queryId,
+static uint64 pgss_hash_string(const char *str, int len);
+static void pgss_store(const char *query, uint64 queryId,
 		   int query_location, int query_len,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
@@ -500,12 +498,10 @@ pgss_shmem_startup(void)
 	memset(&info, 0, sizeof(info));
 	info.keysize = sizeof(pgssHashKey);
 	info.entrysize = sizeof(pgssEntry);
-	info.hash = pgss_hash_fn;
-	info.match = pgss_match_fn;
 	pgss_hash = ShmemInitHash("pg_stat_statements hash",
 							  pgss_max, pgss_max,
 							  &info,
-							  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);
+							  HASH_ELEM | HASH_BLOBS);
 
 	LWLockRelease(AddinShmemInitLock);
 
@@ -781,7 +777,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 		prev_post_parse_analyze_hook(pstate, query);
 
 	/* Assert we didn't do this already */
-	Assert(query->queryId == 0);
+	Assert(query->queryId == UINT64CONST(0));
 
 	/* Safety check... */
 	if (!pgss || !pgss_hash)
@@ -797,7 +793,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = 0;
+		query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -812,14 +808,15 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 
 	/* Compute query ID and mark the Query node with it */
 	JumbleQuery(&jstate, query);
-	query->queryId = hash_any(jstate.jumble, jstate.jumble_len);
+	query->queryId =
+		DatumGetUInt64(hash_any_extended(jstate.jumble, jstate.jumble_len, 0));
 
 	/*
 	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
 	 * prevent confusion with the utility-statement case.
 	 */
-	if (query->queryId == 0)
-		query->queryId = 1;
+	if (query->queryId == UINT64CONST(0))
+		query->queryId = UINT64CONST(1);
 
 	/*
 	 * If we were able to identify any ignorable constants, we immediately
@@ -855,7 +852,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 * counting of optimizable statements that are directly contained in
 	 * utility statements.
 	 */
-	if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0)
+	if (pgss_enabled() && queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -926,9 +923,9 @@ pgss_ExecutorFinish(QueryDesc *queryDesc)
 static void
 pgss_ExecutorEnd(QueryDesc *queryDesc)
 {
-	uint32		queryId = queryDesc->plannedstmt->queryId;
+	uint64		queryId = queryDesc->plannedstmt->queryId;
 
-	if (queryId != 0 && queryDesc->totaltime && pgss_enabled())
+	if (queryId != UINT64CONST(0) && queryDesc->totaltime && pgss_enabled())
 	{
 		/*
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
@@ -1070,44 +1067,15 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 }
 
 /*
- * Calculate hash value for a key
- */
-static uint32
-pgss_hash_fn(const void *key, Size keysize)
-{
-	const pgssHashKey *k = (const pgssHashKey *) key;
-
-	return hash_uint32((uint32) k->userid) ^
-		hash_uint32((uint32) k->dbid) ^
-		hash_uint32((uint32) k->queryid);
-}
-
-/*
- * Compare two keys - zero means match
- */
-static int
-pgss_match_fn(const void *key1, const void *key2, Size keysize)
-{
-	const pgssHashKey *k1 = (const pgssHashKey *) key1;
-	const pgssHashKey *k2 = (const pgssHashKey *) key2;
-
-	if (k1->userid == k2->userid &&
-		k1->dbid == k2->dbid &&
-		k1->queryid == k2->queryid)
-		return 0;
-	else
-		return 1;
-}
-
-/*
  * Given an arbitrarily long query string, produce a hash for the purposes of
  * identifying the query, without normalizing constants.  Used when hashing
  * utility statements.
  */
-static uint32
+static uint64
 pgss_hash_string(const char *str, int len)
 {
-	return hash_any((const unsigned char *) str, len);
+	return DatumGetUInt64(hash_any_extended((const unsigned char *) str,
+											len, 0));
 }
 
 /*
@@ -1121,7 +1089,7 @@ pgss_hash_string(const char *str, int len)
  * query string.  total_time, rows, bufusage are ignored in this case.
  */
 static void
-pgss_store(const char *query, uint32 queryId,
+pgss_store(const char *query, uint64 queryId,
 		   int query_location, int query_len,
 		   double total_time, uint64 rows,
 		   const BufferUsage *bufusage,
@@ -1173,7 +1141,7 @@ pgss_store(const char *query, uint32 queryId,
 	/*
 	 * For utility statements, we just hash the query string to get an ID.
 	 */
-	if (queryId == 0)
+	if (queryId == UINT64CONST(0))
 		queryId = pgss_hash_string(query, query_len);
 
 	/* Set up key for hashtable search */
@@ -2324,8 +2292,10 @@ AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
 
 		if (jumble_len >= JUMBLE_SIZE)
 		{
-			uint32		start_hash = hash_any(jumble, JUMBLE_SIZE);
+			uint64		start_hash;
 
+			start_hash = DatumGetUInt64(hash_any_extended(jumble,
+														  JUMBLE_SIZE, 0));
 			memcpy(jumble, &start_hash, sizeof(start_hash));
 			jumble_len = sizeof(start_hash);
 		}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dc26ed17a..1b477baecb 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -162,7 +162,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	 */
 	pstmt = makeNode(PlannedStmt);
 	pstmt->commandType = CMD_SELECT;
-	pstmt->queryId = 0;
+	pstmt->queryId = UINT64CONST(0);
 	pstmt->hasReturning = false;
 	pstmt->hasModifyingCTE = false;
 	pstmt->canSetTag = true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 2532edc94a..43d62062bc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -54,6 +54,11 @@ static void outChar(StringInfo str, char c);
 #define WRITE_UINT_FIELD(fldname) \
 	appendStringInfo(str, " :" CppAsString(fldname) " %u", node->fldname)
 
+/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
+#define WRITE_UINT64_FIELD(fldname) \
+	appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
+					 node->fldname)
+
 /* Write an OID field (don't hard-wire assumption that OID is same as uint) */
 #define WRITE_OID_FIELD(fldname) \
 	appendStringInfo(str, " :" CppAsString(fldname) " %u", node->fldname)
@@ -260,7 +265,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_NODE_TYPE("PLANNEDSTMT");
 
 	WRITE_ENUM_FIELD(commandType, CmdType);
-	WRITE_UINT_FIELD(queryId);
+	WRITE_UINT64_FIELD(queryId);
 	WRITE_BOOL_FIELD(hasReturning);
 	WRITE_BOOL_FIELD(hasModifyingCTE);
 	WRITE_BOOL_FIELD(canSetTag);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 07ba69178c..ccb6a1f4ac 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -33,6 +33,7 @@
 #include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
 #include "nodes/readfuncs.h"
+#include "utils/builtins.h"
 
 
 /*
@@ -70,6 +71,12 @@
 	token = pg_strtok(&length);		/* get field value */ \
 	local_node->fldname = atoui(token)
 
+/* Read an unsigned integer field (anything written using UINT64_FORMAT) */
+#define READ_UINT64_FIELD(fldname) \
+	token = pg_strtok(&length);		/* skip :fldname */ \
+	token = pg_strtok(&length);		/* get field value */ \
+	local_node->fldname = pg_strtouint64(token, NULL, 10)
+
 /* Read an long integer field (anything written as ":fldname %ld") */
 #define READ_LONG_FIELD(fldname) \
 	token = pg_strtok(&length);		/* skip :fldname */ \
@@ -231,7 +238,7 @@ _readQuery(void)
 
 	READ_ENUM_FIELD(commandType, CmdType);
 	READ_ENUM_FIELD(querySource, QuerySource);
-	local_node->queryId = 0;	/* not saved in output format */
+	local_node->queryId = UINT64CONST(0);	/* not saved in output format */
 	READ_BOOL_FIELD(canSetTag);
 	READ_NODE_FIELD(utilityStmt);
 	READ_INT_FIELD(resultRelation);
@@ -1456,7 +1463,7 @@ _readPlannedStmt(void)
 	READ_LOCALS(PlannedStmt);
 
 	READ_ENUM_FIELD(commandType, CmdType);
-	READ_UINT_FIELD(queryId);
+	READ_UINT64_FIELD(queryId);
 	READ_BOOL_FIELD(hasReturning);
 	READ_BOOL_FIELD(hasModifyingCTE);
 	READ_BOOL_FIELD(canSetTag);
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 7054d4f77d..7a61af7905 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3575,7 +3575,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
 List *
 QueryRewrite(Query *parsetree)
 {
-	uint32		input_query_id = parsetree->queryId;
+	uint64		input_query_id = parsetree->queryId;
 	List	   *querylist;
 	List	   *results;
 	ListCell   *l;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 50eec730b3..732e5d6788 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -111,7 +111,7 @@ typedef struct Query
 
 	QuerySource querySource;	/* where did I come from? */
 
-	uint32		queryId;		/* query identifier (can be set by plugins) */
+	uint64		queryId;		/* query identifier (can be set by plugins) */
 
 	bool		canSetTag;		/* do I set the command result tag? */
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a382331f41..dd74efa9a4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -44,7 +44,7 @@ typedef struct PlannedStmt
 
 	CmdType		commandType;	/* select|insert|update|delete|utility */
 
-	uint32		queryId;		/* query identifier (copied from Query) */
+	uint64		queryId;		/* query identifier (copied from Query) */
 
 	bool		hasReturning;	/* is it insert|update|delete RETURNING? */

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Robert Haas (#41)

Re: 64-bit queryId?

On Thu, Oct 5, 2017 at 4:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 4, 2017 at 10:11 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Not really; dynahash won't merge two keys just because their hash
codes come out the same. But you're right; that's probably not the
best way to do it. TBH, why do we even have pgss_hash_fn? It seems
like using tag_hash would be superior.

Yes, using tag_hash would be just better than any custom formula.

OK, here's v4, which does it that way.

v4 looks correct to me. Testing it through (pgbench and some custom
queries) I have not spotted issues. If the final decision is to use
64-bit query IDs, then this patch could be pushed.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 8 years ago

In reply to: Michael Paquier (#42)

Re: 64-bit queryId?

On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

v4 looks correct to me. Testing it through (pgbench and some custom
queries) I have not spotted issues. If the final decision is to use
64-bit query IDs, then this patch could be pushed.

Cool. Committed, thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Robert Haas (#43)

Re: 64-bit queryId?

On Thu, Oct 12, 2017 at 9:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

v4 looks correct to me. Testing it through (pgbench and some custom
queries) I have not spotted issues. If the final decision is to use
64-bit query IDs, then this patch could be pushed.

Cool. Committed, thanks for the review.

The final result looks fine for me. Thanks.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

rjuju123@gmail.com

about 8 years ago

In reply to: Michael Paquier (#44)

Re: 64-bit queryId?

On Thu, Oct 12, 2017 at 2:46 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Oct 12, 2017 at 9:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

v4 looks correct to me. Testing it through (pgbench and some custom
queries) I have not spotted issues. If the final decision is to use
64-bit query IDs, then this patch could be pushed.

Cool. Committed, thanks for the review.

The final result looks fine for me. Thanks.

Sorry for replying so late, but I have a perhaps naive question about
the hashtable handling with this new version.

IIUC, the shared hash table is now created with HASH_BLOBS instead of
HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
table will use tag_hash() to compute the hash key.

tag_hash() uses all the bits present in the given struct, so this can
be problematic if padding bits are not zeroed, which isn't garanted by
C standard for local variable.

WIth current pgssHashKey definition, there shouldn't be padding bits,
so it should be safe. But I wonder if adding an explicit memset() of
the key in pgss_store() could avoid extension authors to have
duplicate entries if they rely on this code, or prevent future issue
in the unlikely case of adding other fields to pgssHashKey.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

about 8 years ago

In reply to: Julien Rouhaud (#45)

Re: 64-bit queryId?

On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

Sorry for replying so late, but I have a perhaps naive question about
the hashtable handling with this new version.

IIUC, the shared hash table is now created with HASH_BLOBS instead of
HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
table will use tag_hash() to compute the hash key.

tag_hash() uses all the bits present in the given struct, so this can
be problematic if padding bits are not zeroed, which isn't garanted by
C standard for local variable.

WIth current pgssHashKey definition, there shouldn't be padding bits,
so it should be safe. But I wonder if adding an explicit memset() of
the key in pgss_store() could avoid extension authors to have
duplicate entries if they rely on this code, or prevent future issue
in the unlikely case of adding other fields to pgssHashKey.

I guess we should probably add additional comment to the definition of
pgssHashKey warning of the danger. I'm OK with adding a memset if
somebody can promise me it will get optimized away by all reasonably
commonly-used compilers, but I'm not that keen on adding more cycles
to protect against a hypothetical danger.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

about 8 years ago

In reply to: Robert Haas (#46)

Re: 64-bit queryId?

On Thu, Oct 19, 2017 at 4:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

Sorry for replying so late, but I have a perhaps naive question about
the hashtable handling with this new version.

IIUC, the shared hash table is now created with HASH_BLOBS instead of
HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
table will use tag_hash() to compute the hash key.

tag_hash() uses all the bits present in the given struct, so this can
be problematic if padding bits are not zeroed, which isn't garanted by
C standard for local variable.

WIth current pgssHashKey definition, there shouldn't be padding bits,
so it should be safe. But I wonder if adding an explicit memset() of
the key in pgss_store() could avoid extension authors to have
duplicate entries if they rely on this code, or prevent future issue
in the unlikely case of adding other fields to pgssHashKey.

I guess we should probably add additional comment to the definition of
pgssHashKey warning of the danger. I'm OK with adding a memset if
somebody can promise me it will get optimized away by all reasonably
commonly-used compilers, but I'm not that keen on adding more cycles
to protect against a hypothetical danger.

A comment is an adapted answer for me too. There is no guarantee that
memset improvements will get committed. They will likely be, but
making a hard promise is difficult.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

rjuju123@gmail.com

about 8 years ago

In reply to: Michael Paquier (#47)

1 attachment(s)

Re: 64-bit queryId?

On Thu, Oct 19, 2017 at 1:08 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Oct 19, 2017 at 4:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

WIth current pgssHashKey definition, there shouldn't be padding bits,
so it should be safe. But I wonder if adding an explicit memset() of
the key in pgss_store() could avoid extension authors to have
duplicate entries if they rely on this code, or prevent future issue
in the unlikely case of adding other fields to pgssHashKey.

I guess we should probably add additional comment to the definition of
pgssHashKey warning of the danger. I'm OK with adding a memset if
somebody can promise me it will get optimized away by all reasonably
commonly-used compilers, but I'm not that keen on adding more cycles
to protect against a hypothetical danger.

A comment is an adapted answer for me too.

I agree, and I'm perfectly fine with adding a comment around pgssHashKey.

PFA a patch to warn about the danger.

Attachments:

pgss_paddingbits.difftext/plain; charset=US-ASCII; name=pgss_paddingbits.diffDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index b04b4d6ce1..829ee69f51 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -124,7 +124,10 @@ typedef enum pgssVersion
 
 /*
  * Hashtable key that defines the identity of a hashtable entry.  We separate
- * queries by user and by database even if they are otherwise identical.
+ * queries by user and by database even if they are otherwise identical.  Be
+ * careful when adding new fields, tag_hash() is used to compute the hash key,
+ * so we rely on the fact that no padding bit is present in this structure.
+ * Otherwise, we'd have to zero the key variable in pgss_store.
  */
 typedef struct pgssHashKey
 {

robertmhaas@gmail.com

about 8 years ago

In reply to: Julien Rouhaud (#48)

Re: 64-bit queryId?

On Thu, Oct 19, 2017 at 2:11 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

I agree, and I'm perfectly fine with adding a comment around pgssHashKey.

PFA a patch to warn about the danger.

Committed a somewhat different version of this - hope you are OK with
the result.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

rjuju123@gmail.com

about 8 years ago

In reply to: Robert Haas (#49)

Re: 64-bit queryId?

On Fri, Oct 20, 2017 at 3:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Oct 19, 2017 at 2:11 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

I agree, and I'm perfectly fine with adding a comment around pgssHashKey.

PFA a patch to warn about the danger.

Committed a somewhat different version of this - hope you are OK with
the result.

That's much better than what I proposed. Thanks a lot!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers