SELECT '(1, nan, 3)'::cube;

Started by Robert Brewerover 15 years ago6 messagesbugs

fumanchu@aminus.org

over 15 years ago

Hi folks,

I'm working on a hypercube implementation in Postgres using contrib/cube

and need to insert 80,000 rows in one go from Python. Doing so with
INSERT, even multiple statements in one call, is pretty slow. I've been
investigating if using COPY is faster. It is, but there's a problem:
some of the cubes should include NaN. Writing:

INSERT INTO foo (coords) VALUES (cube(ARRAY[1, 'nan', 3]::float[]));

...works fine. But I can't find the magic incantation to do the same
thing using COPY FROM. Indeed, even SELECT chokes on this:

# SELECT '(1, nan, 3)'::cube;
ERROR: bad cube representation
LINE 1: select '(1, nan, 3)'::cube;
^
DETAIL: syntax error at or near "n"

Note that the 'cube' datatype also doesn't have a 'binary' repr, and
COPY FROM exhibits the same problem with CSV as it does with text.

I actually found out I could trick the parser into storing 'Inf' by just

giving it a very large number, like '(1, 1e+309, 3)::cube', but I can't
figure out an equivalent trick for NaN.

Also note that 'nan' and 'inf' are not a problem for float[]

# select '{1, nan, inf}'::float[];
float8
-----------
{1,NaN,Infinity}
(1 row)

...but 1e+309 raises an error for float (where it does not for cube):

# select '{1, 1e+309, 3}'::float[];
ERROR: "1e+309" is out of range for type double precision
LINE 1: select '{1, 1e+309, 3}'::float[];

So:

1. It would be good to have cube parsing in PG behave more like float,

allowing 'nan' and 'inf', and once those are working, deny using large
numbers to fake the parser into returning Infinity.
2. If anyone can think of a way to trick the cube parser in the short
term into returning NaN, I'd love to hear it.
3. If anyone understands cubeparse.c better than I do and can shoot me

a quick patch, I'd appreciate it.

Robert Brewer
fumanchu@aminus.org

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Brewer (#1)

Re: SELECT '(1, nan, 3)'::cube;

"Robert Brewer" <fumanchu@aminus.org> writes:

I'm working on a hypercube implementation in Postgres using contrib/cube

and need to insert 80,000 rows in one go from Python. Doing so with
INSERT, even multiple statements in one call, is pretty slow. I've been
investigating if using COPY is faster. It is, but there's a problem:
some of the cubes should include NaN. Writing:

INSERT INTO foo (coords) VALUES (cube(ARRAY[1, 'nan', 3]::float[]));

...works fine. But I can't find the magic incantation to do the same
thing using COPY FROM.

cube_in doesn't accept either 'nan' or 'inf'. It's probably a bug that
you can get those things into a cube value via cube(float8[]). Or we
could see about upgrading the datatype to allow them, but that would
require looking at all its operations not just cube_in. It seems pretty
likely to me that there are some other things in that module that won't
behave sanely with NaN, because the original author clearly never
thought about it.

I'd suggest rethinking your design to avoid needing NaN in a cube.

regards, tom lane

Robert Brewer

fumanchu@aminus.org

over 15 years ago

In reply to: Tom Lane (#2)

Re: SELECT '(1, nan, 3)'::cube;

Tom Lane wrote:

"Robert Brewer" <fumanchu@aminus.org> writes:

I'm working on a hypercube implementation in Postgres using

contrib/cube

and need to insert 80,000 rows in one go from Python. Doing so with
INSERT, even multiple statements in one call, is pretty slow. I've

been

investigating if using COPY is faster. It is, but there's a problem:

some of the cubes should include NaN. Writing:

INSERT INTO foo (coords) VALUES (cube(ARRAY[1, 'nan',

3]::float[]));

...works fine. But I can't find the magic incantation to do the same

thing using COPY FROM.

cube_in doesn't accept either 'nan' or 'inf'. It's probably a bug

that

you can get those things into a cube value via cube(float8[]). Or we
could see about upgrading the datatype to allow them, but that would
require looking at all its operations not just cube_in. It seems

pretty

likely to me that there are some other things in that module that

won't

behave sanely with NaN, because the original author clearly never
thought about it.

I'd suggest rethinking your design to avoid needing NaN in a cube.

Thanks Tom,

I could, of course, use a sentinel value like -1 or even -Inf to mean
"indeterminate", but I'd like future versions of this app (wink) to be
able to use qnan's as IEEE 754 intended. Note that cube doesn't take
NULL either. It may be surprising and underspecified, but using NaN and
Inf in other cube operations (including gist indices) does indeed work
quite well--probably because they are, after all, just floats. I believe
it's a bug to handle only some subset of values of an extremely
well-known and well-specified datatype like float, but of course I
understand if contrib modules aren't high priority.

Robert Brewer
fumanchu@aminus.org

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Brewer (#3)

Re: SELECT '(1, nan, 3)'::cube;

"Robert Brewer" <fumanchu@aminus.org> writes:

Tom Lane wrote:

I'd suggest rethinking your design to avoid needing NaN in a cube.

I could, of course, use a sentinel value like -1 or even -Inf to mean
"indeterminate", but I'd like future versions of this app (wink) to be
able to use qnan's as IEEE 754 intended. Note that cube doesn't take
NULL either. It may be surprising and underspecified, but using NaN and
Inf in other cube operations (including gist indices) does indeed work
quite well--probably because they are, after all, just floats. I believe
it's a bug to handle only some subset of values of an extremely
well-known and well-specified datatype like float, but of course I
understand if contrib modules aren't high priority.

Well, this is the sort of thing that'll only get fixed when someone who
actually uses the datatype is motivated to improve it. IOW, if you
submitted a patch it'd likely get accepted.

regards, tom lane

bricklen

bricklen@gmail.com

over 15 years ago

In reply to: Robert Brewer (#1)

Re: SELECT '(1, nan, 3)'::cube;

On Tue, Mar 15, 2011 at 9:08 AM, Robert Brewer <fumanchu@aminus.org> wrote:

I'm working on a hypercube implementation in Postgres using contrib/cube

and need to insert 80,000 rows in one go from Python. Doing so with
INSERT, even multiple statements in one call, is pretty slow. I've been
investigating if using COPY is faster.

When you say "multiple statements", do you mean

INSERT INTO foo (coords) VALUES
(cube(ARRAY[1, 'nan', 3]::float[])),
(cube(ARRAY[2, 'nan', 4]::float[])),
(cube(ARRAY[3, 'nan', 5]::float[])),
(cube(ARRAY[4, 'nan', 6]::float[]));

I was going to suggest trying that method, but if you already have
then please ignore me!

Robert Brewer

fumanchu@aminus.org

over 15 years ago

In reply to: bricklen (#5)

Re: SELECT '(1, nan, 3)'::cube;

bricklen wrote:

On Tue, Mar 15, 2011 at 9:08 AM, Robert Brewer <fumanchu@aminus.org>
wrote:

I'm working on a hypercube implementation in Postgres using

contrib/cube

and need to insert 80,000 rows in one go from Python. Doing so with
INSERT, even multiple statements in one call, is pretty slow. I've

been

investigating if using COPY is faster.

When you say "multiple statements", do you mean

INSERT INTO foo (coords) VALUES
(cube(ARRAY[1, 'nan', 3]::float[])),
(cube(ARRAY[2, 'nan', 4]::float[])),
(cube(ARRAY[3, 'nan', 5]::float[])),
(cube(ARRAY[4, 'nan', 6]::float[]));

I was going to suggest trying that method, but if you already have
then please ignore me!

Yes, I'm using the above now. Looking for something faster. I'll probably settle on using -Inf for the short term so I can use COPY.

Bob