bigint out of range

Started by Daulat Ram · almost 7 years ago · 16 messages · general
#1Daulat Ram
Daulat.Ram@exponential.com

Hello team,
We are getting ERROR: bigint out of range. Please help with this.

ERROR: bigint out of range
kbdb=# INSERT INTO kb_dar_ran_url_check (url_hash,stat_date,topic_id,site_id,url,status,user_comments,review_comments) VALUES (72894677781133866997924561390146294513,E'19-04-2019',32793,1035,E'https://bikez.com/search/index.php',1,NULL,NULL);

ERROR: bigint out of range

The table structure is:

                Table "kb_test.kb_dar_ran_url_check"
     Column      |          Type           | Collation | Nullable | Default
-----------------+-------------------------+-----------+----------+---------
 status          | bigint                  |           |          |
 url_hash        | bigint                  |           | not null |
 url             | character varying(4000) |           | not null |
 review_comments | character varying(4000) |           |          |
 user_comments   | character varying(4000) |           |          |
 stat_date       | character varying(128)  |           | not null |
 topic_id        | numeric(38,0)           |           | not null |
 site_id         | numeric(38,0)           |           | not null |
Partition key: LIST (stat_date)

Thanks,
Daulat

#2Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Daulat Ram (#1)
Re: bigint out of range

On Thu, 16 May 2019 at 16:31, Daulat Ram <Daulat.Ram@exponential.com> wrote:

Hello team,

We are getting ERROR: bigint out of range. Please help with this.

Bigint ranges from -9223372036854775808 to 9223372036854775807.

https://www.postgresql.org/docs/current/datatype-numeric.html
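The failing value can be checked against that range directly. A quick sketch (the hash value is taken from the original INSERT, the bounds from the docs page above):

```python
# The bigint (int8) range from the PostgreSQL docs.
BIGINT_MIN = -(2**63)      # -9223372036854775808
BIGINT_MAX = 2**63 - 1     #  9223372036854775807

# The url_hash value from the failing INSERT statement.
url_hash = 72894677781133866997924561390146294513

# 38 digits: far beyond what a signed 64-bit integer can hold,
# hence "ERROR: bigint out of range".
print(len(str(url_hash)))
print(url_hash > BIGINT_MAX)
```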

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Daulat Ram (#1)
Re: bigint out of range

On Thu, May 16, 2019 at 8:31 AM Daulat Ram <Daulat.Ram@exponential.com>
wrote:

url_hash | bigint | | not null |

Change the type of url_hash; make it text instead of bigint.

As a bonus:

Use text instead of arbitrary varchar(4000) fields and add, e.g., check
(length(url) < 4000) or something better.

Needing 38 digit integers for "id" fields seems odd.

Not sure what kind of date would need 128 characters to represent; ideally
you'd just make that a "date" typed field and normalize on input.

David J.

#4Tony Shelver
tshelver@gmail.com
In reply to: Daulat Ram (#1)
Re: bigint out of range

Adding to what David said, I don't see much point of having a bigint
status. Usually status attributes are a fixed set of values that can be
checked programmatically.

#5Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: David G. Johnston (#3)
Re: bigint out of range

On 2019-05-16 08:48:51 -0700, David G. Johnston wrote:

On Thu, May 16, 2019 at 8:31 AM Daulat Ram <Daulat.Ram@exponential.com> wrote:

url_hash        | bigint                  |           | not null |

Change the type of url_hash; make it text instead of bigint.

Or numeric(38, 0). I don't think it's a coincidence that he is trying to
store a 38-digit number in it.

As a bonus:

Use text instead of arbitrary varchar(4000) fields and add, e.g., check (length
(url) < 4000) or something better.

Needing 38 digit integers for "id" fields seems odd.

I guess the schema was copied from Oracle. In Oracle, all numbers are
really 38-digit decimal floating-point numbers, and the limit for
varchar2 is 4000.

hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ |    -- Ross Anderson <https://www.edge.org/>

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Peter J. Holzer (#5)
Re: bigint out of range

On Saturday, May 18, 2019, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

On 2019-05-16 08:48:51 -0700, David G. Johnston wrote:

On Thu, May 16, 2019 at 8:31 AM Daulat Ram <Daulat.Ram@exponential.com>

wrote:

url_hash | bigint | | not null |

Change the type of url_hash; make it text instead of bigint.

Or numeric(38, 0). I don't think it's a coincidence that he is trying to
store a 38-digit number in it.

You don’t perform math on a hash, thus it's not a number, no matter that it
may contain only digits.

David J.

#7Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: David G. Johnston (#6)
Re: bigint out of range

On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:

On Saturday, May 18, 2019, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

On 2019-05-16 08:48:51 -0700, David G. Johnston wrote:

On Thu, May 16, 2019 at 8:31 AM Daulat Ram <Daulat.Ram@exponential.com>

wrote:

     url_hash        | bigint                  |           | not null |

Change the type of url_hash; make it text instead of bigint.

Or numeric(38, 0). I don't think it's a coincidence that he is trying to
store a 38-digit number in it.

You don’t perform math on a hash

That's not generally true. Hashes are used for further computation for
example in hash tables or in cryptography.

thus it's not a number

This is just silly. All hash functions I have ever encountered compute a
single fixed-size integer from a stream of integers. The result may be
larger than a machine word, in which case the representation in C (or a
similar low-level language) may be an array of words (or bytes), but
it's still an integer.
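This is easy to see in a classic example. FNV-1a (constants are the published 64-bit FNV parameters) folds a byte stream into a single fixed-size integer:

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: reduce a stream of bytes to one fixed-size integer."""
    h = 0xcbf29ce484222325            # FNV-1a 64-bit offset basis
    for b in data:
        h ^= b                        # mix in one byte
        h = (h * 0x100000001b3) % 2**64   # multiply by the FNV prime, keep 64 bits
    return h

h = fnv1a_64(b"https://bikez.com/search/index.php")
print(h < 2**64)   # always fits in a machine-word-sized integer
```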

hp


#8Ron
ronljohnsonjr@gmail.com
In reply to: Peter J. Holzer (#7)
Re: bigint out of range

On 5/18/19 2:27 PM, Peter J. Holzer wrote:

On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:

On Saturday, May 18, 2019, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

On 2019-05-16 08:48:51 -0700, David G. Johnston wrote:

On Thu, May 16, 2019 at 8:31 AM Daulat Ram <Daulat.Ram@exponential.com>

wrote:

     url_hash        | bigint                  |           | not null |

Change the type of url_hash; make it text instead of bigint.

Or numeric(38, 0). I don't think it's a coincidence that he is trying to
store a 38-digit number in it.

You don’t perform math on a hash

That's not generally true. Hashes are used for further computation for
example in hash tables or in cryptography.

How is it "using math" to use a hash key in a hash lookup table?

--
Angular momentum makes the world go 'round.

#9Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Ron (#8)
Re: bigint out of range

On 2019-05-18 15:19:22 -0500, Ron wrote:

On 5/18/19 2:27 PM, Peter J. Holzer wrote:

On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:

You don’t perform math on a hash

That's not generally true. Hashes are used for further computation for
example in hash tables or in cryptography.

How is it "using math" to use a hash key in a hash lookup table?

hash modulo table size.

hp


#10Ron
ronljohnsonjr@gmail.com
In reply to: Peter J. Holzer (#9)
Re: bigint out of range

On 5/18/19 3:49 PM, Peter J. Holzer wrote:

On 2019-05-18 15:19:22 -0500, Ron wrote:

On 5/18/19 2:27 PM, Peter J. Holzer wrote:

On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:

You don’t perform math on a hash

That's not generally true. Hashes are used for further computation for
example in hash tables or in cryptography.

How is it "using math" to use a hash key in a hash lookup table?

hash modulo table size.

I've seen that used when the *tablespace* is pre-allocated, and you hash
modulo the tablespace *page number*.  (Yes, performance tanks when you start
filling up pages.)  How do you hash on the (ever growing) table size?


#11Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Ron (#10)
Re: bigint out of range

On 2019-05-18 17:14:59 -0500, Ron wrote:

On 5/18/19 3:49 PM, Peter J. Holzer wrote:

On 2019-05-18 15:19:22 -0500, Ron wrote:

On 5/18/19 2:27 PM, Peter J. Holzer wrote:

On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:

You don’t perform math on a hash

That's not generally true. Hashes are used for further computation for
example in hash tables or in cryptography.

How is it "using math" to use a hash key in a hash lookup table?

hash modulo table size.

I've seen that used when the tablespace is pre-allocated, and you hash modulo
the tablespace page number.  (Yes, performance tanks when you start filling up
pages.)  How do you hash on the (ever growing) table size?

The hash function returns a number in a range much larger than the
possible number of buckets. 64 bits is a good choice today.

To determine the bucket you need to reduce this number to something in
the range [0, nr_buckets). This is where modulo comes in:

i = h % nr_buckets

If the table fills up, you increase nr_buckets, reallocate, and
rehash all entries.

(If nr_buckets is a power of two, the modulo operation can be
implemented efficiently with a bitwise AND.)
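The whole scheme fits in a few lines. A toy in-memory sketch (not how any particular database implements it), including the power-of-two AND trick and the grow-and-rehash step:

```python
class TinyHashSet:
    """Minimal hash table: bucket = hash mod nr_buckets, grow by rehashing."""

    def __init__(self):
        self.nr_buckets = 8                 # kept a power of two
        self.buckets = [[] for _ in range(self.nr_buckets)]
        self.size = 0

    def _bucket(self, key):
        h = hash(key)
        # Equivalent to h % self.nr_buckets, since nr_buckets is a power
        # of two the reduction is a single bitwise AND.
        return h & (self.nr_buckets - 1)

    def add(self, key):
        b = self.buckets[self._bucket(key)]
        if key not in b:
            b.append(key)
            self.size += 1
            # Table "fills up": double nr_buckets and rehash every entry.
            if self.size > self.nr_buckets * 0.75:
                self._grow()

    def __contains__(self, key):
        return key in self.buckets[self._bucket(key)]

    def _grow(self):
        old = [k for b in self.buckets for k in b]
        self.nr_buckets *= 2
        self.buckets = [[] for _ in range(self.nr_buckets)]
        for k in old:                       # the expensive rehash step
            self.buckets[self._bucket(k)].append(k)
```

The `_grow` step is exactly the "reallocate and rehash all entries" cost discussed below.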

hp


#12Ron
ronljohnsonjr@gmail.com
In reply to: Peter J. Holzer (#11)
Re: bigint out of range

On 5/18/19 5:39 PM, Peter J. Holzer wrote:

On 2019-05-18 17:14:59 -0500, Ron wrote:

On 5/18/19 3:49 PM, Peter J. Holzer wrote:

On 2019-05-18 15:19:22 -0500, Ron wrote:

On 5/18/19 2:27 PM, Peter J. Holzer wrote:

On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:

You don’t perform math on a hash

That's not generally true. Hashes are used for further computation for
example in hash tables or in cryptography.

How is it "using math" to use a hash key in a hash lookup table?

hash modulo table size.

I've seen that used when the tablespace is pre-allocated, and you hash modulo
the tablespace page number.  (Yes, performance tanks when you start filling up
pages.)  How do you hash on the (ever growing) table size?

The hash function returns a number in a range much larger than the
possible number of buckets. 64 bits is a good choice today.

To determine the bucket you need to reduce this number to something in
the range [0, nr_buckets). This is where modulo comes in:

i = h % nr_buckets

If the table fills up, you increase nr_buckets, reallocate, and
rehash all entries.

Ouch.  Response time on a big table would take a serious hit if that rehash
happened in the middle of the day on a big OLTP system.  Even worse if it
were a 24x365 system, because you couldn't schedule an enlargement/rehash
during a down period.


#13Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Ron (#12)
Re: bigint out of range

On 2019-05-18 19:16:19 -0500, Ron wrote:

On 5/18/19 5:39 PM, Peter J. Holzer wrote:

On 2019-05-18 17:14:59 -0500, Ron wrote:

On 5/18/19 3:49 PM, Peter J. Holzer wrote:
On 2019-05-18 15:19:22 -0500, Ron wrote:
On 5/18/19 2:27 PM, Peter J. Holzer wrote:
On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:

You don’t perform math on a hash

That's not generally true. Hashes are used for further computation for
example in hash tables or in cryptography.

How is it "using math" to use a hash key in a hash lookup table?

hash modulo table size.

I've seen that used when the tablespace is pre-allocated, and you hash modulo
the tablespace page number.  (Yes, performance tanks when you start filling up
pages.)  How do you hash on the (ever growing) table size?

The hash function returns a number in a range much larger than the
possible number of buckets. 64 bits is a good choice today.

To determine the bucket you need to reduce this number to something in
the range [0, nr_buckets). This is where modulo comes in:

i = h % nr_buckets

If the table fills up, you increase nr_buckets, reallocate, and
rehash all entries.

Ouch.  Response time on a big table would take a serious hit if that rehash
happened in the middle of the day on a big OLTP system.

So that might be a reason not to use hash indexes on tables where the worst-case
insert time is a concern (the average insert time is still O(1)).

But please be aware that I answered your question 'How is it "using
math" to use a hash key?', not 'How are hash indexes in PostgreSQL
implemented?'. So my answer covered the most simple and generic
implementation.

You can split buckets lazily, and in fact PostgreSQL does this. I just
looked at the code briefly, but I don't really understand how it works.
Guess I would have to read Margo Seltzer's paper for that.

Still, even with that optimization (or tradeoff - it may make the lookup
slower) the README notes that splits are expensive.

hp


#14Ron
ronljohnsonjr@gmail.com
In reply to: Peter J. Holzer (#13)
Re: bigint out of range

On 5/19/19 5:43 AM, Peter J. Holzer wrote:
[snip]

But please be aware that I answered your question 'How is it "using
math" to use a hash key?', not 'How are hash indexes in PostgreSQL
implemented?'. So my answer covered the most simple and generic
implementation.

I understand.


#15Stephen Frost
sfrost@snowman.net
In reply to: David G. Johnston (#3)
Re: bigint out of range

Greetings,

* David G. Johnston (david.g.johnston@gmail.com) wrote:

On Thu, May 16, 2019 at 8:31 AM Daulat Ram <Daulat.Ram@exponential.com>
wrote:

url_hash | bigint | | not null |

Change the type of url_hash; make it text instead of bigint.

Making it text wastes a bunch of space actually, since it's really a
binary value. I tend to recommend using a bytea for storing hashes.
Given that hashes are really fixed width, it'd be nice if we had a set
of proper datatypes for them, perhaps not so much to avoid the 1-byte
overhead from storing as a variable-length bytea, but because we could
then avoid having a 7-byte hole due to padding if the hash is followed
by a bigint or such.
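The space argument is easy to see outside the database. Using SHA-256 purely as an illustration (the thread doesn't say which hash the original poster uses):

```python
import hashlib

digest = hashlib.sha256(b"https://bikez.com/search/index.php").digest()
hex_text = digest.hex()

# Raw binary, roughly what a bytea column stores (plus a small length header).
print(len(digest))
# The same hash as hex text: twice as many bytes before any other overhead.
print(len(hex_text))
```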

Thanks,

Stephen

#16Stephen Frost
sfrost@snowman.net
In reply to: Peter J. Holzer (#13)
Re: bigint out of range

Greetings,

* Peter J. Holzer (hjp-pgsql@hjp.at) wrote:

On 2019-05-18 19:16:19 -0500, Ron wrote:

If the table fills up, you increase nr_buckets, reallocate, and
rehash all entries.

Ouch.  Response time on a big table would take a serious hit if that rehash
happened in the middle of the day on a big OLTP system.

As noted below, that isn't actually how it works with PG's hash indexes.

So that might be a reason not to use hash indexes on tables where the worst-case
insert time is a concern (the average insert time is still O(1)).

The worst-case insert time is certainly worse than the average but it's
not nearly as bad as being imagined here.

You can split buckets lazily, and in fact PostgreSQL does this. I just
looked at the code briefly, but I don't really understand how it works.
Guess I would have to read Margo Seltzer's paper for that.

Still, even with that optimization (or tradeoff - it may make the lookup
slower) the README notes that splits are expensive.

There are multiple levels, really, from "best case" where the
new item can just be placed directly on to a page that has free space,
to "slightly worse case" where an overflow page has to be added, to
"even worse case" where a prior page split has to be finished, to
"probably worst case" where a full page split has to happen. There
might even be some pathological cases where a prior page split has to be
finished and then an overflow page has to be added and then another page
split has to be done, or some such. Even so, for equality-based
lookups (where the data set has few or zero duplicates), hash indexes
can work quite well, but it's of course good to understand that inserts
aren't always going to be exactly the same speed every time.

Thanks,

Stephen