Removing faulty hyperLogLog merge function

Started by Peter Geogheganover 9 years ago11 messages
#1Peter Geoghegan
pg@heroku.com
1 attachment(s)

The function hyperLogLogMerge() is faulty [1]/messages/by-id/CAM3SWZT-i6R9JU5YXa8MJUou2_r3LfGJZpQ9tYa1BYxfkj0=cQ@mail.gmail.com -- Peter Geoghegan. It has no current
callers, though. I propose that we rip it out, as in the attached
patch.

[1]: /messages/by-id/CAM3SWZT-i6R9JU5YXa8MJUou2_r3LfGJZpQ9tYa1BYxfkj0=cQ@mail.gmail.com -- Peter Geoghegan
--
Peter Geoghegan

Attachments:

0001-Remove-faulty-hyperLogLog-merge-function.patchtext/x-patch; charset=US-ASCII; name=0001-Remove-faulty-hyperLogLog-merge-function.patchDownload
From 3b1e6db08412bdadd3ad0d5a62a87c71a83e4e04 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 25 Apr 2016 19:24:43 -0700
Subject: [PATCH] Remove faulty hyperLogLog merge function

The approach taken is not valid with the "sparse" HLL states that
Postgres supports.  Fortunately, there were no callers of the faulty
function, so the impact of this bug was quite limited.

Author: Peter Geoghegan
Discussion: CAM3SWZT-i6R9JU5YXa8MJUou2_r3LfGJZpQ9tYa1BYxfkj0=cQ@mail.gmail.com
Backpatch: 9.5, where the hyperLogLog infrastructure was introduced
---
 src/backend/lib/hyperloglog.c | 22 ----------------------
 src/include/lib/hyperloglog.h |  1 -
 2 files changed, 23 deletions(-)

diff --git a/src/backend/lib/hyperloglog.c b/src/backend/lib/hyperloglog.c
index fa7f05a..6d246ce 100644
--- a/src/backend/lib/hyperloglog.c
+++ b/src/backend/lib/hyperloglog.c
@@ -221,28 +221,6 @@ estimateHyperLogLog(hyperLogLogState *cState)
 }
 
 /*
- * Merges the estimate from one HyperLogLog state to another, returning the
- * estimate of their union.
- *
- * The number of registers in each must match.
- */
-void
-mergeHyperLogLog(hyperLogLogState *cState, const hyperLogLogState *oState)
-{
-	int			r;
-
-	if (cState->nRegisters != oState->nRegisters)
-		elog(ERROR, "number of registers mismatch: %zu != %zu",
-			 cState->nRegisters, oState->nRegisters);
-
-	for (r = 0; r < cState->nRegisters; ++r)
-	{
-		cState->hashesArr[r] = Max(cState->hashesArr[r], oState->hashesArr[r]);
-	}
-}
-
-
-/*
  * Worker for addHyperLogLog().
  *
  * Calculates the position of the first set bit in first b bits of x argument
diff --git a/src/include/lib/hyperloglog.h b/src/include/lib/hyperloglog.h
index b999b30..ee88f8f 100644
--- a/src/include/lib/hyperloglog.h
+++ b/src/include/lib/hyperloglog.h
@@ -63,7 +63,6 @@ extern void initHyperLogLog(hyperLogLogState *cState, uint8 bwidth);
 extern void initHyperLogLogError(hyperLogLogState *cState, double error);
 extern void addHyperLogLog(hyperLogLogState *cState, uint32 hash);
 extern double estimateHyperLogLog(hyperLogLogState *cState);
-extern void mergeHyperLogLog(hyperLogLogState *cState, const hyperLogLogState *oState);
 extern void freeHyperLogLog(hyperLogLogState *cState);
 
 #endif   /* HYPERLOGLOG_H */
-- 
2.7.4

#2Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#1)
Re: Removing faulty hyperLogLog merge function

On Mon, Apr 25, 2016 at 10:39 PM, Peter Geoghegan <pg@heroku.com> wrote:

The function hyperLogLogMerge() is faulty [1]. It has no current
callers, though. I propose that we rip it out, as in the attached
patch.

[1] /messages/by-id/CAM3SWZT-i6R9JU5YXa8MJUou2_r3LfGJZpQ9tYa1BYxfkj0=cQ@mail.gmail.com

I'm not prepared to commit this over the objection offered by Tomas
Vondra on that thread.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Peter Geoghegan
pg@heroku.com
In reply to: Robert Haas (#2)
Re: Removing faulty hyperLogLog merge function

On Tue, Apr 26, 2016 at 4:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I'm not prepared to commit this over the objection offered by Tomas
Vondra on that thread.

You don't want to remove buggy code that is currently unused simply
because it might be useful to have that functionality in the future?

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#3)
Re: Removing faulty hyperLogLog merge function

On Tue, Apr 26, 2016 at 7:46 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Tue, Apr 26, 2016 at 4:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I'm not prepared to commit this over the objection offered by Tomas
Vondra on that thread.

You don't want to remove buggy code that is currently unused simply
because it might be useful to have that functionality in the future?

No, I don't want to remove code that somebody thinks we should fix
instead of removing on your say-so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Peter Geoghegan
pg@heroku.com
In reply to: Robert Haas (#4)
Re: Removing faulty hyperLogLog merge function

On Tue, Apr 26, 2016 at 4:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

You don't want to remove buggy code that is currently unused simply
because it might be useful to have that functionality in the future?

No, I don't want to remove code that somebody thinks we should fix
instead of removing on your say-so.

I don't think that that's an efficient use of anyone's time. If
somebody proposes a patch with functionality that needs to merge HLL
states, then they can add an implementation at that time.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#5)
Re: Removing faulty hyperLogLog merge function

On Tue, Apr 26, 2016 at 7:54 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Tue, Apr 26, 2016 at 4:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

You don't want to remove buggy code that is currently unused simply
because it might be useful to have that functionality in the future?

No, I don't want to remove code that somebody thinks we should fix
instead of removing on your say-so.

I don't think that that's an efficient use of anyone's time. If
somebody proposes a patch with functionality that needs to merge HLL
states, then they can add an implementation at that time.

Peter, if there is consensus on this patch, I will commit it. If
there isn't, I won't. If you don't like that, sorry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: Removing faulty hyperLogLog merge function

Robert Haas <robertmhaas@gmail.com> writes:

I'm not prepared to commit this over the objection offered by Tomas
Vondra on that thread.

FWIW, I agree with Peter that we should remove this code. We know that it
is buggy. Leaving it there constitutes an "attractive nuisance" --- that
is, I'm afraid that someone will submit a patch that depends on that
function, and that we might forget that the function is broken and commit
said patch.

Tomas' objection would be reasonable if a fix was simple, but so far as
I can tell from the thread, it's not. In particular, Peter doesn't trust
the upstream patch in question. But whether or not you trust it, doing
nothing is not a sane choice. The reasonable alternatives are to remove
the merge function or sync the upstream patch.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#7)
Re: Removing faulty hyperLogLog merge function

On Tue, Apr 26, 2016 at 9:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm not prepared to commit this over the objection offered by Tomas
Vondra on that thread.

FWIW, I agree with Peter that we should remove this code. We know that it
is buggy. Leaving it there constitutes an "attractive nuisance" --- that
is, I'm afraid that someone will submit a patch that depends on that
function, and that we might forget that the function is broken and commit
said patch.

Tomas' objection would be reasonable if a fix was simple, but so far as
I can tell from the thread, it's not. In particular, Peter doesn't trust
the upstream patch in question. But whether or not you trust it, doing
nothing is not a sane choice. The reasonable alternatives are to remove
the merge function or sync the upstream patch.

Now I agree with that. And now we do not have a 1-1 tie on which
alternative to prefer, which is a good start towards a consensus. Any
other views?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Joe Conway
mail@joeconway.com
In reply to: Robert Haas (#8)
Re: Removing faulty hyperLogLog merge function

On 04/26/2016 07:23 PM, Robert Haas wrote:

On Tue, Apr 26, 2016 at 9:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm not prepared to commit this over the objection offered by Tomas
Vondra on that thread.

FWIW, I agree with Peter that we should remove this code. We know that it
is buggy. Leaving it there constitutes an "attractive nuisance" --- that
is, I'm afraid that someone will submit a patch that depends on that
function, and that we might forget that the function is broken and commit
said patch.

Tomas' objection would be reasonable if a fix was simple, but so far as
I can tell from the thread, it's not. In particular, Peter doesn't trust
the upstream patch in question. But whether or not you trust it, doing
nothing is not a sane choice. The reasonable alternatives are to remove
the merge function or sync the upstream patch.

Now I agree with that. And now we do not have a 1-1 tie on which
alternative to prefer, which is a good start towards a consensus. Any
other views?

I haven't followed this issue all that closely, but to me it seems
pretty clear. If the function is brand new to 9.6, buggy, and not even
used anywhere, I cannot imagine why we would leave it in the tree.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#10Michael Paquier
michael.paquier@gmail.com
In reply to: Joe Conway (#9)
Re: Removing faulty hyperLogLog merge function

On Wed, Apr 27, 2016 at 12:08 PM, Joe Conway <mail@joeconway.com> wrote:

On 04/26/2016 07:23 PM, Robert Haas wrote:

On Tue, Apr 26, 2016 at 9:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm not prepared to commit this over the objection offered by Tomas
Vondra on that thread.

FWIW, I agree with Peter that we should remove this code. We know that it
is buggy. Leaving it there constitutes an "attractive nuisance" --- that
is, I'm afraid that someone will submit a patch that depends on that
function, and that we might forget that the function is broken and commit
said patch.

Tomas' objection would be reasonable if a fix was simple, but so far as
I can tell from the thread, it's not. In particular, Peter doesn't trust
the upstream patch in question. But whether or not you trust it, doing
nothing is not a sane choice. The reasonable alternatives are to remove
the merge function or sync the upstream patch.

Now I agree with that. And now we do not have a 1-1 tie on which
alternative to prefer, which is a good start towards a consensus. Any
other views?

I haven't followed this issue all that closely, but to me it seems
pretty clear. If the function is brand new to 9.6, buggy, and not even
used anywhere, I cannot imagine why we would leave it in the tree.

+1. We should definitely not encourage its use for 3rd-part plugins.
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#10)
Re: Removing faulty hyperLogLog merge function

On Wed, Apr 27, 2016 at 1:37 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I haven't followed this issue all that closely, but to me it seems
pretty clear. If the function is brand new to 9.6, buggy, and not even
used anywhere, I cannot imagine why we would leave it in the tree.

+1. We should definitely not encourage its use for 3rd-part plugins.

So, now there's clearly a consensus. Committed. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers