Issues with generate_series using integer boundaries

Started by Thom Brownalmost 15 years ago34 messages

Thom Brown

thom@linux.com

almost 15 years ago

Hi,

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

SELECT x FROM generate_series(2147483643::int4, 2147483647::int4) AS a(x);

But the same query with int8 returns instantly:

SELECT x FROM generate_series(2147483643::int8, 2147483647::int8) AS a(x);

However, the int8 version of generate_series has the same problem.
This never returns:

SELECT x FROM generate_series(9223372036854775803::int8,
9223372036854775807::int8) AS a(x);

Another issue happens when using the lower boundaries:

postgres=# SELECT x FROM generate_series(-2147483648::int4,
-2147483644::int4) AS a(x);
ERROR: integer out of range
postgres=# SELECT x FROM generate_series(-9223372036854775808::int8,
-9223372036854775804::int8) AS a(x);
ERROR: bigint out of range

I've recreated this on 9.0.1 and 9.1devel on a 64-bit platform.

Bug?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Thom Brown (#1)

Re: Issues with generate_series using integer boundaries

On 1 February 2011 00:15, Thom Brown <thom@linux.com> wrote:

Hi,

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

SELECT x FROM generate_series(2147483643::int4, 2147483647::int4) AS a(x);

But the same query with int8 returns instantly:

SELECT x FROM generate_series(2147483643::int8, 2147483647::int8) AS a(x);

However, the int8 version of generate_series has the same problem.
This never returns:

SELECT x FROM generate_series(9223372036854775803::int8,
9223372036854775807::int8) AS a(x);

Another issue happens when using the lower boundaries:

postgres=# SELECT x FROM generate_series(-2147483648::int4,
-2147483644::int4) AS a(x);
ERROR: integer out of range
postgres=# SELECT x FROM generate_series(-9223372036854775808::int8,
-9223372036854775804::int8) AS a(x);
ERROR: bigint out of range

I've recreated this on 9.0.1 and 9.1devel on a 64-bit platform.

Bug?

Actually, those lower bound errors aren't related to generate_series,
but I'd still like to know why -2147483648::int4 is out of range.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Tom Lane

tgl@sss.pgh.pa.us

almost 15 years ago

In reply to: Thom Brown (#2)

Re: Issues with generate_series using integer boundaries

Thom Brown <thom@linux.com> writes:

Actually, those lower bound errors aren't related to generate_series,
but I'd still like to know why -2147483648::int4 is out of range.

:: binds tighter than - (and everything else too). Write
(-2147483648)::int4 instead.

regards, tom lane

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Tom Lane (#3)

Re: Issues with generate_series using integer boundaries

On 1 February 2011 00:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

Actually, those lower bound errors aren't related to generate_series,
but I'd still like to know why -2147483648::int4 is out of range.

:: binds tighter than - (and everything else too). Write
(-2147483648)::int4 instead.

D'oh. You explained this to me before. This time I'll endeavour to
remember it. At least that explains the problem I created for myself
with lower boundaries.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Thom Brown (#4)

Re: Issues with generate_series using integer boundaries

On 1 February 2011 00:41, Thom Brown <thom@linux.com> wrote:

On 1 February 2011 00:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

Actually, those lower bound errors aren't related to generate_series,
but I'd still like to know why -2147483648::int4 is out of range.

:: binds tighter than - (and everything else too). Write
(-2147483648)::int4 instead.

D'oh. You explained this to me before. This time I'll endeavour to
remember it. At least that explains the problem I created for myself
with lower boundaries.

Okay, so lower boundaries are still affected in the same way as upper
boundaries after all, at least when using a reverse series:

SELECT x FROM generate_series((-2147483644)::int4,
(-2147483648)::int4, -1) AS a(x);

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Tom Lane

tgl@sss.pgh.pa.us

almost 15 years ago

In reply to: Thom Brown (#1)

Re: Issues with generate_series using integer boundaries

Thom Brown <thom@linux.com> writes:

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

I'll bet it's testing "currval > bound" without considering the
possibility that incrementing currval caused an overflow wraparound.
We fixed a similar problem years ago in plpgsql FOR-loops...

regards, tom lane

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Tom Lane (#6)

Re: Issues with generate_series using integer boundaries

On 1 February 2011 01:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

I'll bet it's testing "currval > bound" without considering the
possibility that incrementing currval caused an overflow wraparound.
We fixed a similar problem years ago in plpgsql FOR-loops...

And here's another case:

SELECT x FROM generate_series(2147483643, 2147483644,5) AS a(x);

A step of 1 would work fine, but forcing it to exceed the boundary in
this way means it never returns.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Thom Brown (#7)

Re: Issues with generate_series using integer boundaries

On 1 February 2011 01:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

I'll bet it's testing "currval > bound" without considering the
possibility that incrementing currval caused an overflow wraparound.
We fixed a similar problem years ago in plpgsql FOR-loops...

Yes, you're right. Internally, the current value is checked against
the finish. If it hasn't yet passed it, the current value is
increased by the step. When it reaches the upper bound, since it
hasn't yet exceeded the finish, it proceeds to increment it again,
resulting in the iterator wrapping past the upper bound to become the
lower bound. This then keeps it looping from the lower bound upward,
so the current value stays well below the end.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Alban Hertroys

dalroi@solfertje.student.utwente.nl

almost 15 years ago

In reply to: Thom Brown (#8)

Re: Issues with generate_series using integer boundaries

On 1 Feb 2011, at 21:26, Thom Brown wrote:

On 1 February 2011 01:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

I'll bet it's testing "currval > bound" without considering the
possibility that incrementing currval caused an overflow wraparound.
We fixed a similar problem years ago in plpgsql FOR-loops...

Yes, you're right. Internally, the current value is checked against
the finish. If it hasn't yet passed it, the current value is
increased by the step. When it reaches the upper bound, since it
hasn't yet exceeded the finish, it proceeds to increment it again,
resulting in the iterator wrapping past the upper bound to become the
lower bound. This then keeps it looping from the lower bound upward,
so the current value stays well below the end.

That could actually be used as a feature to create a repeating series. A bit more control would be useful though :P

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d487c1211731974314558!

#10

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Alban Hertroys (#9)

Re: Issues with generate_series using integer boundaries

On 1 February 2011 21:32, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:

On 1 Feb 2011, at 21:26, Thom Brown wrote:

On 1 February 2011 01:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

I'll bet it's testing "currval > bound" without considering the
possibility that incrementing currval caused an overflow wraparound.
We fixed a similar problem years ago in plpgsql FOR-loops...

Yes, you're right. Internally, the current value is checked against
the finish. If it hasn't yet passed it, the current value is
increased by the step. When it reaches the upper bound, since it
hasn't yet exceeded the finish, it proceeds to increment it again,
resulting in the iterator wrapping past the upper bound to become the
lower bound. This then keeps it looping from the lower bound upward,
so the current value stays well below the end.

That could actually be used as a feature to create a repeating series. A bit more control would be useful though :P

I don't quite understand why the code works. As I see it, it always
returns a set with values 1 higher than the corresponding result. So
requesting 1 to 5 actually returns 2 to 6 internally, but somehow it
correctly shows 1 to 5 in the query output. If there were no such
discrepancy, the upper-bound/lower-bound problem wouldn't exist, so
not sure how those output values result in the correct query result
values.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

#11

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Thom Brown (#10)

1 attachment(s)

Re: Issues with generate_series using integer boundaries

On 1 February 2011 23:08, Thom Brown <thom@linux.com> wrote:

On 1 February 2011 21:32, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:

On 1 Feb 2011, at 21:26, Thom Brown wrote:

On 1 February 2011 01:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

I'll bet it's testing "currval > bound" without considering the
possibility that incrementing currval caused an overflow wraparound.
We fixed a similar problem years ago in plpgsql FOR-loops...

Yes, you're right. Internally, the current value is checked against
the finish. If it hasn't yet passed it, the current value is
increased by the step. When it reaches the upper bound, since it
hasn't yet exceeded the finish, it proceeds to increment it again,
resulting in the iterator wrapping past the upper bound to become the
lower bound. This then keeps it looping from the lower bound upward,
so the current value stays well below the end.

That could actually be used as a feature to create a repeating series. A bit more control would be useful though :P

I don't quite understand why the code works. As I see it, it always
returns a set with values 1 higher than the corresponding result. So
requesting 1 to 5 actually returns 2 to 6 internally, but somehow it
correctly shows 1 to 5 in the query output. If there were no such
discrepancy, the upper-bound/lower-bound problem wouldn't exist, so
not sure how those output values result in the correct query result
values.

Okay, I've attached a patch which fixes it. It allows ranges up to
upper and down to lower bounds as well as accounting for the
possibility for the step to cause misalignment of the iterating value
with the end value. The following now works which would usually get
stuck in a loop:

postgres=# SELECT x FROM generate_series(2147483643::int4,
2147483647::int4) AS a(x);
x
------------
2147483643
2147483644
2147483645
2147483646
2147483647
(5 rows)

postgres=# SELECT x FROM generate_series(2147483642::int4,
2147483647::int4, 2) AS a(x);
x
------------
2147483642
2147483644
2147483646
(3 rows)

postgres=# SELECT x FROM generate_series(2147483643::int4,
2147483647::int4, 6) AS a(x);
x
------------
2147483643
(1 row)

It's probably safe to assume the changes in the patch aren't up to
scratch and it's supplied for demonstration purposes only, so could
someone please use the same principals and code in the appropriate
changes?

Thanks

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Attachments:

generate_series_fix.patchapplication/octet-stream; name=generate_series_fix.patchDownload

diff --git a/doc/src/sgml/ref/create_user_mapping.sgml b/doc/src/sgml/ref/create_user_mapping.sgml
index 3094442..c960628 100644
--- a/doc/src/sgml/ref/create_user_mapping.sgml
+++ b/doc/src/sgml/ref/create_user_mapping.sgml
@@ -34,7 +34,7 @@ CREATE USER MAPPING FOR { <replaceable class="parameter">user_name</replaceable>
    <command>CREATE USER MAPPING</command> defines a mapping of a user
    to a foreign server.  A user mapping typically encapsulates
    connection information that a foreign-data wrapper uses together
-   with the information encapsulated be a foreign server to access an
+   with the information encapsulated by a foreign server to access an
    external data resource.
   </para>
 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 78bac40..0441aaf 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1304,7 +1304,6 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 {
 	FuncCallContext *funcctx;
 	generate_series_fctx *fctx;
-	int32		result;
 	MemoryContext oldcontext;
 
 	/* stuff done only on the first call of the function */
@@ -1343,27 +1342,32 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 
 		funcctx->user_fctx = fctx;
 		MemoryContextSwitchTo(oldcontext);
+
+		/* send off first value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(start));
 	}
 
 	/* stuff done on every call of the function */
 	funcctx = SRF_PERCALL_SETUP();
 
 	/*
-	 * get the saved state and use current as the result for this iteration
+	 * get the saved state
 	 */
 	fctx = funcctx->user_fctx;
-	result = fctx->current;
 
-	if ((fctx->step > 0 && fctx->current <= fctx->finish) ||
-		(fctx->step < 0 && fctx->current >= fctx->finish))
+	if ((fctx->step > 0 && fctx->current < fctx->finish &&
+			fctx->current + fctx->step > fctx->current) ||
+		(fctx->step < 0 && fctx->current > fctx->finish &&
+			fctx->current + fctx->step < fctx->current))
 	{
 		/* increment current in preparation for next iteration */
 		fctx->current += fctx->step;
 
-		/* do when there is more left to send */
-		SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
+		/* send off current value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(fctx->current));
 	}
 	else
 		/* do when there is no more left */
 		SRF_RETURN_DONE(funcctx);
 }
+

#12

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Thom Brown (#11)

1 attachment(s)

Re: Issues with generate_series using integer boundaries

On 3 February 2011 11:31, Thom Brown <thom@linux.com> wrote:

On 1 February 2011 23:08, Thom Brown <thom@linux.com> wrote:

On 1 February 2011 21:32, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:

On 1 Feb 2011, at 21:26, Thom Brown wrote:

On 1 February 2011 01:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

I'll bet it's testing "currval > bound" without considering the
possibility that incrementing currval caused an overflow wraparound.
We fixed a similar problem years ago in plpgsql FOR-loops...

Yes, you're right. Internally, the current value is checked against
the finish. If it hasn't yet passed it, the current value is
increased by the step. When it reaches the upper bound, since it
hasn't yet exceeded the finish, it proceeds to increment it again,
resulting in the iterator wrapping past the upper bound to become the
lower bound. This then keeps it looping from the lower bound upward,
so the current value stays well below the end.

That could actually be used as a feature to create a repeating series. A bit more control would be useful though :P

I don't quite understand why the code works. As I see it, it always
returns a set with values 1 higher than the corresponding result. So
requesting 1 to 5 actually returns 2 to 6 internally, but somehow it
correctly shows 1 to 5 in the query output. If there were no such
discrepancy, the upper-bound/lower-bound problem wouldn't exist, so
not sure how those output values result in the correct query result
values.

Okay, I've attached a patch which fixes it. It allows ranges up to
upper and down to lower bounds as well as accounting for the
possibility for the step to cause misalignment of the iterating value
with the end value. The following now works which would usually get
stuck in a loop:

postgres=# SELECT x FROM generate_series(2147483643::int4,
2147483647::int4) AS a(x);
x
------------
2147483643
2147483644
2147483645
2147483646
2147483647
(5 rows)

postgres=# SELECT x FROM generate_series(2147483642::int4,
2147483647::int4, 2) AS a(x);
x
------------
2147483642
2147483644
2147483646
(3 rows)

postgres=# SELECT x FROM generate_series(2147483643::int4,
2147483647::int4, 6) AS a(x);
x
------------
2147483643
(1 row)

It's probably safe to assume the changes in the patch aren't up to
scratch and it's supplied for demonstration purposes only, so could
someone please use the same principals and code in the appropriate
changes?

Thanks

And I see I accidentally included a doc change in there. Removed and
reattached:

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Attachments:

generate_series_fix.v2.patchapplication/octet-stream; name=generate_series_fix.v2.patchDownload

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 78bac40..0441aaf 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1304,7 +1304,6 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 {
 	FuncCallContext *funcctx;
 	generate_series_fctx *fctx;
-	int32		result;
 	MemoryContext oldcontext;
 
 	/* stuff done only on the first call of the function */
@@ -1343,27 +1342,32 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 
 		funcctx->user_fctx = fctx;
 		MemoryContextSwitchTo(oldcontext);
+
+		/* send off first value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(start));
 	}
 
 	/* stuff done on every call of the function */
 	funcctx = SRF_PERCALL_SETUP();
 
 	/*
-	 * get the saved state and use current as the result for this iteration
+	 * get the saved state
 	 */
 	fctx = funcctx->user_fctx;
-	result = fctx->current;
 
-	if ((fctx->step > 0 && fctx->current <= fctx->finish) ||
-		(fctx->step < 0 && fctx->current >= fctx->finish))
+	if ((fctx->step > 0 && fctx->current < fctx->finish &&
+			fctx->current + fctx->step > fctx->current) ||
+		(fctx->step < 0 && fctx->current > fctx->finish &&
+			fctx->current + fctx->step < fctx->current))
 	{
 		/* increment current in preparation for next iteration */
 		fctx->current += fctx->step;
 
-		/* do when there is more left to send */
-		SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
+		/* send off current value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(fctx->current));
 	}
 	else
 		/* do when there is no more left */
 		SRF_RETURN_DONE(funcctx);
 }
+

#13

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Thom Brown (#12)

Re: Issues with generate_series using integer boundaries

On 3 February 2011 11:34, Thom Brown <thom@linux.com> wrote:

On 3 February 2011 11:31, Thom Brown <thom@linux.com> wrote:

On 1 February 2011 23:08, Thom Brown <thom@linux.com> wrote:

On 1 February 2011 21:32, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:

On 1 Feb 2011, at 21:26, Thom Brown wrote:

On 1 February 2011 01:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

I've noticed that if I try to use generate_series to include the upper
boundary of int4, it never returns:

I'll bet it's testing "currval > bound" without considering the
possibility that incrementing currval caused an overflow wraparound.
We fixed a similar problem years ago in plpgsql FOR-loops...

Yes, you're right. Internally, the current value is checked against
the finish. If it hasn't yet passed it, the current value is
increased by the step. When it reaches the upper bound, since it
hasn't yet exceeded the finish, it proceeds to increment it again,
resulting in the iterator wrapping past the upper bound to become the
lower bound. This then keeps it looping from the lower bound upward,
so the current value stays well below the end.

That could actually be used as a feature to create a repeating series. A bit more control would be useful though :P

I don't quite understand why the code works. As I see it, it always
returns a set with values 1 higher than the corresponding result. So
requesting 1 to 5 actually returns 2 to 6 internally, but somehow it
correctly shows 1 to 5 in the query output. If there were no such
discrepancy, the upper-bound/lower-bound problem wouldn't exist, so
not sure how those output values result in the correct query result
values.

Okay, I've attached a patch which fixes it. It allows ranges up to
upper and down to lower bounds as well as accounting for the
possibility for the step to cause misalignment of the iterating value
with the end value. The following now works which would usually get
stuck in a loop:

postgres=# SELECT x FROM generate_series(2147483643::int4,
2147483647::int4) AS a(x);
x
------------
2147483643
2147483644
2147483645
2147483646
2147483647
(5 rows)

postgres=# SELECT x FROM generate_series(2147483642::int4,
2147483647::int4, 2) AS a(x);
x
------------
2147483642
2147483644
2147483646
(3 rows)

postgres=# SELECT x FROM generate_series(2147483643::int4,
2147483647::int4, 6) AS a(x);
x
------------
2147483643
(1 row)

It's probably safe to assume the changes in the patch aren't up to
scratch and it's supplied for demonstration purposes only, so could
someone please use the same principals and code in the appropriate
changes?

Thanks

And I see I accidentally included a doc change in there. Removed and
reattached:

Actually, further testing indicates this causes other problems:

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
x
---
1
(1 row)

Should return no rows.

postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);
x
----
1
4
7
10
(4 rows)

Should return 3 rows.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

#14

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Thom Brown (#13)

1 attachment(s)

Re: Issues with generate_series using integer boundaries

On 3 February 2011 13:32, Thom Brown <thom@linux.com> wrote:

Actually, further testing indicates this causes other problems:

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
x
---
1
(1 row)

Should return no rows.

postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);
x
----
1
4
7
10
(4 rows)

Should return 3 rows.

Still messy code, but the attached patch does the job now:

postgres=# SELECT x FROM
generate_series(2147483643::int4,2147483647::int4) AS a(x);
x
------------
2147483643
2147483644
2147483645
2147483646
2147483647
(5 rows)

postgres=# SELECT x FROM
generate_series(2147483642::int4,2147483647::int4, 2) AS a(x);
x
------------
2147483642
2147483644
2147483646
(3 rows)

postgres=# SELECT x FROM
generate_series(2147483643::int4,2147483647::int4, 6) AS a(x);
x
------------
2147483643
(1 row)

postgres=# SELECT x FROM generate_series((-2147483643)::int4,
(-2147483648)::int4, -1) AS a(x);
x
-------------
-2147483643
-2147483644
-2147483645
-2147483646
-2147483647
-2147483648
(6 rows)

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
x
---
(0 rows)

postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);
x
---
1
4
7
(3 rows)

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Attachments:

generate_series_fix.v3.patchapplication/octet-stream; name=generate_series_fix.v3.patchDownload

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 78bac40..46d2716 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1304,7 +1304,6 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 {
 	FuncCallContext *funcctx;
 	generate_series_fctx *fctx;
-	int32		result;
 	MemoryContext oldcontext;
 
 	/* stuff done only on the first call of the function */
@@ -1343,25 +1342,34 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 
 		funcctx->user_fctx = fctx;
 		MemoryContextSwitchTo(oldcontext);
+
+		/* ensure first value in series should exist */
+		if ((fctx->step > 0 && fctx->current + fctx->step <= fctx->finish) ||
+			(fctx->step < 0 && fctx->current + fctx->step >= fctx->finish))
+		/* send off first value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(start));
 	}
 
 	/* stuff done on every call of the function */
 	funcctx = SRF_PERCALL_SETUP();
 
 	/*
-	 * get the saved state and use current as the result for this iteration
+	 * get the saved state
 	 */
 	fctx = funcctx->user_fctx;
-	result = fctx->current;
 
-	if ((fctx->step > 0 && fctx->current <= fctx->finish) ||
-		(fctx->step < 0 && fctx->current >= fctx->finish))
+	if ((fctx->step > 0 &&
+		fctx->current < fctx->current + fctx->step &&
+		fctx->current + fctx->step <= fctx->finish) ||
+		(fctx->step < 0 &&
+		fctx->current > fctx->current + fctx->step &&
+		fctx->current + fctx->step >= fctx->finish))
 	{
 		/* increment current in preparation for next iteration */
 		fctx->current += fctx->step;
 
-		/* do when there is more left to send */
-		SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
+		/* send off current value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(fctx->current));
 	}
 	else
 		/* do when there is no more left */

#15

David Johnston

polobo@yahoo.com

almost 15 years ago

In reply to: Thom Brown (#14)

Re: Issues with generate_series using integer boundaries

The proposed generate_series(1,9,-1) behavior seems unusual. I think it
should throw a warning if the step direction and the start-end directions do
not match. Alternatively, the series generated could go from 9 -> 1 instead
of returning an empty series (basically the first two arguments are simply
bounds and the step sign determines which is upper and which is lower). The
result where the set contains the sole member { 1 } makes sense to me in
that you wanted to start with 1 and then increment by -1 until you are
either less-than 1 or greater-than 9; which is the same thing you are doing
when you have a positive step value and always treat the first argument as
the initial value. With that behavior you are ALWAYS returning the first
argument, then stepping, then returning any other argument that still fall
within the range. If you do not return the first argument you are
implicitly starting with zero (0) and incrementing and then seeing whether
the first step falls inside the specified range.

David J

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Thom Brown
Sent: Thursday, February 03, 2011 8:58 AM
To: Alban Hertroys
Cc: Tom Lane; PGSQL Mailing List
Subject: Re: [GENERAL] Issues with generate_series using integer boundaries

On 3 February 2011 13:32, Thom Brown <thom@linux.com> wrote:

Actually, further testing indicates this causes other problems:

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
x
---
1
(1 row)

Should return no rows.

postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);
x
----
1
4
7
10
(4 rows)

Should return 3 rows.

Still messy code, but the attached patch does the job now:

postgres=# SELECT x FROM
generate_series(2147483643::int4,2147483647::int4) AS a(x);
x
------------
2147483643
2147483644
2147483645
2147483646
2147483647
(5 rows)

postgres=# SELECT x FROM
generate_series(2147483642::int4,2147483647::int4, 2) AS a(x);
x
------------
2147483642
2147483644
2147483646
(3 rows)

postgres=# SELECT x FROM
generate_series(2147483643::int4,2147483647::int4, 6) AS a(x);
x
------------
2147483643
(1 row)

postgres=# SELECT x FROM generate_series((-2147483643)::int4,
(-2147483648)::int4, -1) AS a(x);
x
-------------
-2147483643
-2147483644
-2147483645
-2147483646
-2147483647
-2147483648
(6 rows)

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x); x
---
(0 rows)

postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x); x
---
1
4
7
(3 rows)

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

#16

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: David Johnston (#15)

Re: Issues with generate_series using integer boundaries

On 3 February 2011 14:37, David Johnston <polobo@yahoo.com> wrote:

The proposed generate_series(1,9,-1) behavior seems unusual.

I haven't proposed this behaviour as it already occurs. I just
include it for testing to ensure no other part of generate series is
affected by such changes. Whether or not that is desired behaviour is
separate as I only wish to fix the bug.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

#17

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Thom Brown (#14)

1 attachment(s)

Re: Issues with generate_series using integer boundaries

On 3 February 2011 13:58, Thom Brown <thom@linux.com> wrote:

On 3 February 2011 13:32, Thom Brown <thom@linux.com> wrote:

Actually, further testing indicates this causes other problems:

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
x
---
1
(1 row)

Should return no rows.

postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);
x
----
1
4
7
10
(4 rows)

Should return 3 rows.

Still messy code, but the attached patch does the job now:

postgres=# SELECT x FROM
generate_series(2147483643::int4,2147483647::int4) AS a(x);
x
------------
2147483643
2147483644
2147483645
2147483646
2147483647
(5 rows)

postgres=# SELECT x FROM
generate_series(2147483642::int4,2147483647::int4, 2) AS a(x);
x
------------
2147483642
2147483644
2147483646
(3 rows)

postgres=# SELECT x FROM
generate_series(2147483643::int4,2147483647::int4, 6) AS a(x);
x
------------
2147483643
(1 row)

postgres=# SELECT x FROM generate_series((-2147483643)::int4,
(-2147483648)::int4, -1) AS a(x);
x
-------------
-2147483643
-2147483644
-2147483645
-2147483646
-2147483647
-2147483648
(6 rows)

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
x
---
(0 rows)

postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);
x
---
1
4
7
(3 rows)

Copying to -hackers.

The issue is that generate_series will not return if the series hits
either the upper or lower boundary during increment, or goes beyond
it. The attached patch fixes this behaviour, but should probably be
done a better way. The first 3 examples above will not return.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Attachments:

generate_series_fix.v3.patchapplication/octet-stream; name=generate_series_fix.v3.patchDownload

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 78bac40..46d2716 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1304,7 +1304,6 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 {
 	FuncCallContext *funcctx;
 	generate_series_fctx *fctx;
-	int32		result;
 	MemoryContext oldcontext;
 
 	/* stuff done only on the first call of the function */
@@ -1343,25 +1342,34 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 
 		funcctx->user_fctx = fctx;
 		MemoryContextSwitchTo(oldcontext);
+
+		/* ensure first value in series should exist */
+		if ((fctx->step > 0 && fctx->current + fctx->step <= fctx->finish) ||
+			(fctx->step < 0 && fctx->current + fctx->step >= fctx->finish))
+		/* send off first value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(start));
 	}
 
 	/* stuff done on every call of the function */
 	funcctx = SRF_PERCALL_SETUP();
 
 	/*
-	 * get the saved state and use current as the result for this iteration
+	 * get the saved state
 	 */
 	fctx = funcctx->user_fctx;
-	result = fctx->current;
 
-	if ((fctx->step > 0 && fctx->current <= fctx->finish) ||
-		(fctx->step < 0 && fctx->current >= fctx->finish))
+	if ((fctx->step > 0 &&
+		fctx->current < fctx->current + fctx->step &&
+		fctx->current + fctx->step <= fctx->finish) ||
+		(fctx->step < 0 &&
+		fctx->current > fctx->current + fctx->step &&
+		fctx->current + fctx->step >= fctx->finish))
 	{
 		/* increment current in preparation for next iteration */
 		fctx->current += fctx->step;
 
-		/* do when there is more left to send */
-		SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
+		/* send off current value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(fctx->current));
 	}
 	else
 		/* do when there is no more left */

#18

Itagaki Takahiro

itagaki.takahiro@gmail.com

almost 15 years ago

In reply to: Thom Brown (#17)

Re: [HACKERS] Issues with generate_series using integer boundaries

On Fri, Feb 4, 2011 at 21:32, Thom Brown <thom@linux.com> wrote:

The issue is that generate_series will not return if the series hits
either the upper or lower boundary during increment, or goes beyond
it. The attached patch fixes this behaviour, but should probably be
done a better way. The first 3 examples above will not return.

There are same bug in int8 and timestamp[tz] versions.
We also need fix for them.
=# SELECT x FROM generate_series(9223372036854775807::int8,
9223372036854775807::int8) AS a(x);
=# SELECT x FROM generate_series('infinity'::timestamp, 'infinity', '1
sec') AS a(x);
=# SELECT x FROM generate_series('infinity'::timestamptz, 'infinity',
'1 sec') AS a(x);

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);

They work as expected in 9.1dev.

--
Itagaki Takahiro

#19

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Itagaki Takahiro (#18)

1 attachment(s)

Re: [HACKERS] Issues with generate_series using integer boundaries

On 7 February 2011 09:04, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:

On Fri, Feb 4, 2011 at 21:32, Thom Brown <thom@linux.com> wrote:

The issue is that generate_series will not return if the series hits
either the upper or lower boundary during increment, or goes beyond
it. The attached patch fixes this behaviour, but should probably be
done a better way. The first 3 examples above will not return.

There are same bug in int8 and timestamp[tz] versions.
We also need fix for them.
=# SELECT x FROM generate_series(9223372036854775807::int8,
9223372036854775807::int8) AS a(x);

Yes, of course, int8 functions are separate. I attach an updated
patch, although I still think there's a better way of doing this.

=# SELECT x FROM generate_series('infinity'::timestamp, 'infinity', '1
sec') AS a(x);
=# SELECT x FROM generate_series('infinity'::timestamptz, 'infinity',
'1 sec') AS a(x);

I'm not sure how this should be handled. Should there just be a check
for either kind of infinity and return an error if that's the case? I
didn't find anything wrong with using timestamp boundaries:

postgres=# SELECT x FROM generate_series('1 Jan 4713 BC
00:00:00'::timestamp, '1 Jan 4713 BC 00:00:05'::timestamp, '1 sec') AS
a(x);
x
------------------------
4713-01-01 00:00:00 BC
4713-01-01 00:00:01 BC
4713-01-01 00:00:02 BC
4713-01-01 00:00:03 BC
4713-01-01 00:00:04 BC
4713-01-01 00:00:05 BC
(6 rows)

Although whether this demonstrates a true timestamp boundary, I'm not sure.

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);

They work as expected in 9.1dev.

Those 2 were to demonstrate that the changes don't affect existing
functionality. My previous patch proposal (v2) caused these to return
unexpected output.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Attachments:

generate_series_fix.v4.patchapplication/octet-stream; name=generate_series_fix.v4.patchDownload

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 78bac40..46d2716 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1304,7 +1304,6 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 {
 	FuncCallContext *funcctx;
 	generate_series_fctx *fctx;
-	int32		result;
 	MemoryContext oldcontext;
 
 	/* stuff done only on the first call of the function */
@@ -1343,25 +1342,34 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 
 		funcctx->user_fctx = fctx;
 		MemoryContextSwitchTo(oldcontext);
+
+		/* ensure first value in series should exist */
+		if ((fctx->step > 0 && fctx->current + fctx->step <= fctx->finish) ||
+			(fctx->step < 0 && fctx->current + fctx->step >= fctx->finish))
+		/* send off first value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(start));
 	}
 
 	/* stuff done on every call of the function */
 	funcctx = SRF_PERCALL_SETUP();
 
 	/*
-	 * get the saved state and use current as the result for this iteration
+	 * get the saved state
 	 */
 	fctx = funcctx->user_fctx;
-	result = fctx->current;
 
-	if ((fctx->step > 0 && fctx->current <= fctx->finish) ||
-		(fctx->step < 0 && fctx->current >= fctx->finish))
+	if ((fctx->step > 0 &&
+		fctx->current < fctx->current + fctx->step &&
+		fctx->current + fctx->step <= fctx->finish) ||
+		(fctx->step < 0 &&
+		fctx->current > fctx->current + fctx->step &&
+		fctx->current + fctx->step >= fctx->finish))
 	{
 		/* increment current in preparation for next iteration */
 		fctx->current += fctx->step;
 
-		/* do when there is more left to send */
-		SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
+		/* send off current value in series */
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(fctx->current));
 	}
 	else
 		/* do when there is no more left */
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index bbab90c..4e85195 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -1337,7 +1337,6 @@ generate_series_step_int8(PG_FUNCTION_ARGS)
 {
 	FuncCallContext *funcctx;
 	generate_series_fctx *fctx;
-	int64		result;
 	MemoryContext oldcontext;
 
 	/* stuff done only on the first call of the function */
@@ -1376,25 +1375,35 @@ generate_series_step_int8(PG_FUNCTION_ARGS)
 
 		funcctx->user_fctx = fctx;
 		MemoryContextSwitchTo(oldcontext);
+
+		/* ensure first value in series should exist */
+		if ((fctx->step > 0 && fctx->current + fctx->step <= fctx->finish) ||
+			(fctx->step < 0 && fctx->current + fctx->step >= fctx->finish))
+		/* send off first value in series */
+		SRF_RETURN_NEXT(funcctx, Int64GetDatum(start));
+
 	}
 
 	/* stuff done on every call of the function */
 	funcctx = SRF_PERCALL_SETUP();
 
 	/*
-	 * get the saved state and use current as the result for this iteration
+	 * get the saved state
 	 */
 	fctx = funcctx->user_fctx;
-	result = fctx->current;
 
-	if ((fctx->step > 0 && fctx->current <= fctx->finish) ||
-		(fctx->step < 0 && fctx->current >= fctx->finish))
+	if ((fctx->step > 0 &&
+		fctx->current < fctx->current + fctx->step &&
+		fctx->current + fctx->step <= fctx->finish) ||
+		(fctx->step < 0 &&
+		fctx->current > fctx->current + fctx->step &&
+		fctx->current + fctx->step >= fctx->finish))
 	{
 		/* increment current in preparation for next iteration */
 		fctx->current += fctx->step;
 
-		/* do when there is more left to send */
-		SRF_RETURN_NEXT(funcctx, Int64GetDatum(result));
+		/* send off current value in series */
+		SRF_RETURN_NEXT(funcctx, Int64GetDatum(fctx->current));
 	}
 	else
 		/* do when there is no more left */

#20

Itagaki Takahiro

itagaki.takahiro@gmail.com

almost 15 years ago

In reply to: Thom Brown (#19)

Re: [HACKERS] Issues with generate_series using integer boundaries

On Mon, Feb 7, 2011 at 20:38, Thom Brown <thom@linux.com> wrote:

Yes, of course, int8 functions are separate. I attach an updated
patch, although I still think there's a better way of doing this.

Thanks. Please add the patch to the *current* commitfest
because it's a bugfix.
https://commitfest.postgresql.org/action/commitfest_view?id=9

I've not tested the patch yet, but if we could drop the following
line in the patch, the code could be much cleaner.
/* ensure first value in series should exist */

I'm not sure how this should be handled. Should there just be a check
for either kind of infinity and return an error if that's the case? I

Maybe so. It also works if we had infinity on timestamp overflow, but
I've not tested yet. Anyway, we need similar fix for timestamp versions.

--
Itagaki Takahiro

#21

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Itagaki Takahiro (#20)

Re: [HACKERS] Issues with generate_series using integer boundaries

On 8 February 2011 09:22, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:

On Mon, Feb 7, 2011 at 20:38, Thom Brown <thom@linux.com> wrote:

Yes, of course, int8 functions are separate. I attach an updated
patch, although I still think there's a better way of doing this.

Thanks. Please add the patch to the *current* commitfest
because it's a bugfix.
https://commitfest.postgresql.org/action/commitfest_view?id=9

I've not tested the patch yet, but if we could drop the following
line in the patch, the code could be much cleaner.
/* ensure first value in series should exist */

I'm not sure how this should be handled. Should there just be a check
for either kind of infinity and return an error if that's the case? I

Maybe so. It also works if we had infinity on timestamp overflow, but
I've not tested yet. Anyway, we need similar fix for timestamp versions.

Well, in its current state, I expect it to get rejected, but I guess
at least it gets a better chance of being looked at. I've added it to
the commitfest now.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

#22

Andrew Dunstan

andrew@dunslane.net

almost 15 years ago

In reply to: Thom Brown (#19)

Re: [HACKERS] Issues with generate_series using integer boundaries

On 02/07/2011 06:38 AM, Thom Brown wrote:

On 7 February 2011 09:04, Itagaki Takahiro<itagaki.takahiro@gmail.com> wrote:

On Fri, Feb 4, 2011 at 21:32, Thom Brown<thom@linux.com> wrote:

The issue is that generate_series will not return if the series hits
either the upper or lower boundary during increment, or goes beyond
it. The attached patch fixes this behaviour, but should probably be
done a better way. The first 3 examples above will not return.

There are same bug in int8 and timestamp[tz] versions.
We also need fix for them.
=# SELECT x FROM generate_series(9223372036854775807::int8,
9223372036854775807::int8) AS a(x);

Yes, of course, int8 functions are separate. I attach an updated
patch, although I still think there's a better way of doing this.

=# SELECT x FROM generate_series('infinity'::timestamp, 'infinity', '1
sec') AS a(x);
=# SELECT x FROM generate_series('infinity'::timestamptz, 'infinity',
'1 sec') AS a(x);

I'm not sure how this should be handled. Should there just be a check
for either kind of infinity and return an error if that's the case? I
didn't find anything wrong with using timestamp boundaries:

postgres=# SELECT x FROM generate_series('1 Jan 4713 BC
00:00:00'::timestamp, '1 Jan 4713 BC 00:00:05'::timestamp, '1 sec') AS
a(x);
x
------------------------
4713-01-01 00:00:00 BC
4713-01-01 00:00:01 BC
4713-01-01 00:00:02 BC
4713-01-01 00:00:03 BC
4713-01-01 00:00:04 BC
4713-01-01 00:00:05 BC
(6 rows)

Although whether this demonstrates a true timestamp boundary, I'm not sure.

postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);

They work as expected in 9.1dev.

Those 2 were to demonstrate that the changes don't affect existing
functionality. My previous patch proposal (v2) caused these to return
unexpected output.

Isn't this all really a bug fix that should be backpatched, rather than
a commitfest item?

cheers

andrew

#23

Itagaki Takahiro

itagaki.takahiro@gmail.com

almost 15 years ago

In reply to: Andrew Dunstan (#22)

Re: [HACKERS] Issues with generate_series using integer boundaries

On Wed, Feb 9, 2011 at 10:17, Andrew Dunstan <andrew@dunslane.net> wrote:

Isn't this all really a bug fix that should be backpatched, rather than a
commitfest item?

Sure, but we don't have any bug trackers...

--
Itagaki Takahiro

#24

Andrew Dunstan

andrew@dunslane.net

almost 15 years ago

In reply to: Itagaki Takahiro (#23)

Re: [HACKERS] Issues with generate_series using integer boundaries

On 02/08/2011 08:19 PM, Itagaki Takahiro wrote:

On Wed, Feb 9, 2011 at 10:17, Andrew Dunstan<andrew@dunslane.net> wrote:

Isn't this all really a bug fix that should be backpatched, rather than a
commitfest item?

Sure, but we don't have any bug trackers...

Quite right, but the commitfest manager isn't meant to be a substitute
for one. Bug fixes aren't subject to the same restrictions of feature
changes.

cheers

andrew

#25

Glenn Maynard

glenn@zewt.org

almost 15 years ago

In reply to: Tom Lane (#3)

Re: Issues with generate_series using integer boundaries

On Mon, Jan 31, 2011 at 7:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thom Brown <thom@linux.com> writes:

Actually, those lower bound errors aren't related to generate_series,
but I'd still like to know why -2147483648::int4 is out of range.

:: binds tighter than - (and everything else too). Write
(-2147483648)::int4 instead.

That's surprising enough that it might be worth generating a warning if the
typecasting operator is used on a mathmatical expression--"a -
b::int4"--rather than a single value (eg. "(a - b)::int4" or "f()::int4").
I don't know the grammar to know if that fits.

--
Glenn Maynard

#26

Robert Haas

robertmhaas@gmail.com

almost 15 years ago

In reply to: Andrew Dunstan (#24)

Re: [HACKERS] Issues with generate_series using integer boundaries

On Tue, Feb 8, 2011 at 8:30 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Quite right, but the commitfest manager isn't meant to be a substitute for
one. Bug fixes aren't subject to the same restrictions of feature changes.

Another option would be to add this here:

http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27

Thom Brown

thom@linux.com

almost 15 years ago

In reply to: Robert Haas (#26)

Re: [HACKERS] Issues with generate_series using integer boundaries

On 9 February 2011 02:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Feb 8, 2011 at 8:30 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Quite right, but the commitfest manager isn't meant to be a substitute for
one. Bug fixes aren't subject to the same restrictions of feature changes.

Another option would be to add this here:

http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items

I've removed it from the commitfest because it really doesn't belong
there, and I've added it to the open items list.

Thanks

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

#28

Robert Haas

robertmhaas@gmail.com

over 14 years ago

In reply to: Thom Brown (#27)

1 attachment(s)

Re: [HACKERS] Issues with generate_series using integer boundaries

On Wed, Feb 9, 2011 at 4:50 AM, Thom Brown <thom@linux.com> wrote:

On 9 February 2011 02:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Feb 8, 2011 at 8:30 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Quite right, but the commitfest manager isn't meant to be a substitute for
one. Bug fixes aren't subject to the same restrictions of feature changes.

Another option would be to add this here:

http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items

I've removed it from the commitfest because it really doesn't belong
there, and I've added it to the open items list.

So, I finally got around to look at this, and I think there is a
simpler solution. When an overflow occurs while calculating the next
value, that just means that the value we're about to return is the
last one that should be generated. So we just need to frob the
context state so that the next call will decide we're done. There are
any of number of ways to do that; I just picked what looked like the
easiest one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

generate-series-overflow.patchapplication/octet-stream; name=generate-series-overflow.patchDownload

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 019fcaa..a367421 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -1382,6 +1382,10 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 		/* increment current in preparation for next iteration */
 		fctx->current += fctx->step;
 
+		/* if next-value computation overflows, this is the final result */
+		if (SAMESIGN(result, fctx->step) && !SAMESIGN(result, fctx->current))
+			fctx->step = 0;
+
 		/* do when there is more left to send */
 		SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
 	}
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 4519164..2e74482 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -1410,6 +1410,10 @@ generate_series_step_int8(PG_FUNCTION_ARGS)
 		/* increment current in preparation for next iteration */
 		fctx->current += fctx->step;
 
+		/* if next-value computation overflows, this is the final result */
+		if (SAMESIGN(result, fctx->step) && !SAMESIGN(result, fctx->current))
+			fctx->step = 0;
+
 		/* do when there is more left to send */
 		SRF_RETURN_NEXT(funcctx, Int64GetDatum(result));
 	}

#29

Thom Brown

thom@linux.com

over 14 years ago

In reply to: Robert Haas (#28)

Re: [HACKERS] Issues with generate_series using integer boundaries

On 17 June 2011 04:44, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 9, 2011 at 4:50 AM, Thom Brown <thom@linux.com> wrote:

On 9 February 2011 02:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Feb 8, 2011 at 8:30 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Quite right, but the commitfest manager isn't meant to be a substitute for
one. Bug fixes aren't subject to the same restrictions of feature changes.

Another option would be to add this here:

http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items

I've removed it from the commitfest because it really doesn't belong
there, and I've added it to the open items list.

So, I finally got around to look at this, and I think there is a
simpler solution. When an overflow occurs while calculating the next
value, that just means that the value we're about to return is the
last one that should be generated. So we just need to frob the
context state so that the next call will decide we're done. There are
any of number of ways to do that; I just picked what looked like the
easiest one.

I knew there'd be a much simpler way of solving this. Works for me.

Thanks Robert.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#30

David Johnston

polobo@yahoo.com

over 14 years ago

In reply to: Robert Haas (#28)

Re: [HACKERS] Issues with generate_series using integer boundaries

On Wed, Feb 9, 2011 at 4:50 AM, Thom Brown <thom@linux.com> wrote:

On 9 February 2011 02:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Feb 8, 2011 at 8:30 PM, Andrew Dunstan

<andrew@dunslane.net> wrote:

Quite right, but the commitfest manager isn't meant to be a
substitute for one. Bug fixes aren't subject to the same restrictions

feature changes.

Another option would be to add this here:

http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items

I've removed it from the commitfest because it really doesn't belong
there, and I've added it to the open items list.

So, I finally got around to look at this, and I think there is a simpler

solution.

When an overflow occurs while calculating the next value, that just means
that the value we're about to return is the last one that should be

generated.

So we just need to frob the context state so that the next call will

decide

we're done. There are any of number of ways to do that; I just picked

what

looked like the easiest one.

Tangential comment but have you considered emitting a warning (and/or log
entry) when you are 10,000-50,000 away from issuing the last available
number in the sequence so that some recognition exists that any code
depending on the sequence is going to fail soon?

Also, during sequence creation you know the integer type being used so that
maximum value is known and an overflow should not need to come into play (I
guess the trade-off is the implicit "try-catch" [or whatever mechanism C
uses] performance hit versus the need to store another full integer in the
data structure).

You could also give access to the "warning threshold" value so that the
developer can change it to whatever value is desired (with a meaningful
default of course).

David J.

#31

Robert Haas

robertmhaas@gmail.com

over 14 years ago

In reply to: David Johnston (#30)

Re: [HACKERS] Issues with generate_series using integer boundaries

On Fri, Jun 17, 2011 at 10:39 AM, David Johnston <polobo@yahoo.com> wrote:

Tangential comment but have you considered emitting a warning (and/or log
entry) when you are 10,000-50,000 away from issuing the last available
number in the sequence so that some recognition exists that any code
depending on the sequence is going to fail soon?

Also, during sequence creation you know the integer type being used so that
maximum value is known and an overflow should not need to come into play (I
guess the trade-off is the implicit "try-catch" [or whatever mechanism C
uses] performance hit versus the need to store another full integer in the
data structure).

You could also give access to the "warning threshold" value so that the
developer can change it to whatever value is desired (with a meaningful
default of course).

There are already tools out there that can monitor this stuff - for
example, check_postgres.pl.

http://bucardo.org/check_postgres/check_postgres.pl.html#sequence

We tend to avoid emitting warnings for this kind of thing because they
can consume vast amounts of disk space, and a lot of times no one's
looking at them anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Robert Haas (#28)

Re: [HACKERS] Issues with generate_series using integer boundaries

Robert Haas <robertmhaas@gmail.com> writes:

So, I finally got around to look at this, and I think there is a
simpler solution. When an overflow occurs while calculating the next
value, that just means that the value we're about to return is the
last one that should be generated. So we just need to frob the
context state so that the next call will decide we're done. There are
any of number of ways to do that; I just picked what looked like the
easiest one.

+1 for this solution.

BTW, there was some mention of changing the timestamp versions of
generate_series as well, but right offhand I'm not convinced that
those need any change. I think you'll get overflow detection there
automatically from the functions being used --- and if not, it's a
bug in those functions, not in generate_series.

regards, tom lane

#33

Robert Haas

robertmhaas@gmail.com

over 14 years ago

In reply to: Tom Lane (#32)

Re: [HACKERS] Issues with generate_series using integer boundaries

On Fri, Jun 17, 2011 at 2:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

So, I finally got around to look at this, and I think there is a
simpler solution. When an overflow occurs while calculating the next
value, that just means that the value we're about to return is the
last one that should be generated. So we just need to frob the
context state so that the next call will decide we're done. There are
any of number of ways to do that; I just picked what looked like the
easiest one.

+1 for this solution.

BTW, there was some mention of changing the timestamp versions of
generate_series as well, but right offhand I'm not convinced that
those need any change. I think you'll get overflow detection there
automatically from the functions being used --- and if not, it's a
bug in those functions, not in generate_series.

Maybe not, because those functions probably throw an error if an
overflow is detected, and that's not really correct. By definition,
the second generate_series() is the point at which we should stop
generating, and that point has to be within the range of the
underlying data type, by definition. So if an overflow occurs, that's
just another way of saying that we've certainly gone past the stop
point and needn't generate anything further. The error is an artifact
of the method we've used to generate the next point.

I'm not sure how much energy it's worth expending on that case. Using
really large dates may be less common that using values that strain
the range of a 4-byte integer. But it might at least be worth a TODO.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#34

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Robert Haas (#33)

Re: [HACKERS] Issues with generate_series using integer boundaries

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jun 17, 2011 at 2:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, there was some mention of changing the timestamp versions of
generate_series as well, but right offhand I'm not convinced that
those need any change. �I think you'll get overflow detection there
automatically from the functions being used --- and if not, it's a
bug in those functions, not in generate_series.

Maybe not, because those functions probably throw an error if an
overflow is detected, and that's not really correct.

Oh, good point.

I'm not sure how much energy it's worth expending on that case. Using
really large dates may be less common that using values that strain
the range of a 4-byte integer. But it might at least be worth a TODO.

Yeah, I can't get excited about it either; restructuring that code
enough to avoid an error seems like a lot more work than the case is
worth. Maybe someday somebody will hit the case in practice and then
be motivated to work on it, but in the meantime ...

regards, tom lane