BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts
The following bug has been logged on the website:
Bug reference: 13846
Logged by: Paul Hester
Email address: paul@salesintel.com
PostgreSQL version: 9.5rc1
Operating system: Windows 10 Pro
Description:
When using an INSERT statement with an ON CONFLICT clause, if there is a
conflict, and the table being inserted into has a column defaulted to
nextval('seq'), the sequencer 'seq' is always incremented. This can quickly
and artificially consume all values of the sequencer; it behaves as if
conflict detection happens after inserts are attempted, rather than before.
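A minimal reproduction (a sketch; the table and names are illustrative, not from the original report):

```sql
-- On 9.5, defaults are evaluated before conflict detection, so the
-- sequence behind a serial column advances even on the conflict path.
CREATE TABLE device_class (
    id   serial PRIMARY KEY,
    name text UNIQUE NOT NULL
);

INSERT INTO device_class (name) VALUES ('pump');   -- gets id 1
INSERT INTO device_class (name) VALUES ('pump')
    ON CONFLICT (name) DO NOTHING;                 -- no row inserted, but id 2 is consumed
INSERT INTO device_class (name) VALUES ('valve');  -- gets id 3, not 2
```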
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 2016-01-05 15:02:27 +0000, paul@salesintel.com wrote:
Yes. That's by design. You can't reliably do conflict detection before
evaluating column default values.
Andres Freund
If none of the columns with defaults is used to detect conflicts, why must their defaults be evaluated before conflict detection in order to detect a conflict reliably?
Many systems categorize information into 'organizational' data, which is often hierarchical in nature and fairly static, and 'operational' data, which is usually generated from events; those events are organized by the former for batching and reporting purposes, etc. UPSERTs are often most useful when deriving the organizational data from the operational data, meaning a very typical UPSERT workload performs many more updates than inserts, in direct proportion to the number of events being processed. It's not unreasonable, over a relatively short period within the overall lifetime of a system, to process billions of events. That means SERIAL columns used as surrogate keys for the natural keys of organizational data could exhaust their sequences for no good reason; SMALLSERIAL just wouldn't be usable, and BIGSERIAL would work, but it's, uhm.. BIG.
There should be a way to avoid evaluating defaults for columns not used in the conflict_target, so that SERIAL columns not involved in conflict detection don't needlessly exhaust their sequences.... (I think that was roughly a quadruple-negative sentence; sorry about that.)
Regards,
Paul
On Tue, Jan 5, 2016 at 8:13 AM, Andres Freund <andres@anarazel.de> wrote:
Right. If you didn't consume a sequence value, but just did a
"peek-ahead", then several concurrently inserting sessions would all
"peek-ahead" and see the same value. There'd then be a race condition
that broke the useful guarantees that ON CONFLICT DO UPDATE makes.
--
Peter Geoghegan
No peek-ahead necessary, when the sequenced column would not be used in the
conflict_target. UPSERTS to dimension tables in a star schema as part of an
OLAP system are very handy, but typically the records in the dimension
tables use a surrogate key based on sequences for maintenance & performance
reasons, and the 'natural' key/value is the only column that would be used
to detect a conflict. In this case, the 'normal' path would be the
conflict_action nearly all the time, and not the insert. If the typical
path was the INSERT, I'd bet the data would be from some event source, and
I guess in that case if there were a key column defaulted with a sequence,
like an order#, it would have to be used to resolve conflicts. But it's
hard to imagine a case in a process flow where you wouldn't already know if
the event existed or not, limiting the need for an UPSERT on event data.
All I'm saying is that this limitation, consuming the sequences that provide defaults for columns not needed to resolve the conflict, diminishes the viability of Postgres
UPSERTs for large data warehouses (or forces surrogate keys to be 64-bit). It just caught me by surprise in comparison to other
RDBMSs offering some form of UPSERT that does not consume a sequence when
its values aren't required to resolve conflicts.
--
*Paul Hester*
Co-founder & Chief Architect
www.salesintel.com
On 2016-01-05 22:04:41 -0700, Paul Hester wrote:
It'd be very fragile. You could only do that if the value isn't involved
in any unique indexes and if there are no BEFORE triggers (since they need
to see the value).
If you use normal sequences you already need to cope with gaps in the
sequence. And ids needing to be 64 bits, if you're a longer-lived
business and it's a halfway "popular" table, isn't anything new.
So I don't think addressing this is worth adding code for, especially
given the fragility of the situations where it'd be usable.
Paul Hester wrote on 2016-01-06 at 06:04:
This limitation, of consuming sequencers used for defaults on columns
not necessary to resolve conflict, diminishes the viability for using
Postges UPSERTS for large data warehouses, is all I'm saying (or
requires surrogate keys being 64 bit). Just caught me by surprise in
comparison to other RDBMSs offering some form of UPSERTs that would
not consume a sequencer if its values weren't required to resolve
conflicts.
Why does this "diminish" the viability of a sequence?
To be clear, the column defaulted to a sequence is not involved at all in determining conflicts, and therefore it should not matter whether it is used in a unique index, nor would there be any problem with a BEFORE trigger.
Gaps aren't any kind of issue, and it's not about how popular a related table is relative to the table upon which the UPSERT would be applied.
For example, I might be receiving event data from a shop floor where everything is instrumented: tank levels, step counters, etc. There could be millions of events over a relatively short period of time. I might have a field in the event data which indicates the 'class' of device that generated the event, e.g. 'pump', 'actuator', 'regulator', 'thermometer'. There may be fewer than a few hundred classes of devices, and they rarely change. So I have a table with a surrogate SMALLINT key defaulted to a sequence, and a 'name' column which is a device class name.
As the raw event data is processed, I want to do an UPSERT into the device class table, using only the class name to resolve conflicts, to get its surrogate key, which is then used while inserting the related event record, linking that event record to the particular device class. Since device classes are rarely added, the common case is that an actual INSERT is almost never performed.
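In SQL terms, the per-event lookup might look like this (a sketch with illustrative names; on 9.5, DO NOTHING with RETURNING returns no row on conflict, so DO UPDATE on the natural key is the usual way to always get the id back):

```sql
-- UPSERT the device class by natural key and return the surrogate key.
-- Note: this evaluates the id column's default on every call, so the
-- sequence advances once per processed event even though a new row is
-- almost never inserted (exactly the behavior at issue in this thread).
INSERT INTO device_class (name) VALUES ('pump')
    ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name
    RETURNING id;
```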
Using INSERT ON CONFLICT in this case, however, would mean the device class table, which would only ever hold a few hundred records and would only need a SMALLINT for its surrogate key, would need a BIGINT instead; that alone really isn't a big deal. What IS a big deal is that the event table would then also require a BIGINT for the device class column relating each event to its device class. In most cases there are many more 'organizing' columns, similar to the device class in this example, meaning BIGINTs would be used for no good reason throughout event tables, which can hold vast volumes of data. This has a real and measurable impact on memory, network, and storage requirements, which eventually map to dollars.
This basic scenario, linking event data to organizational data, is very common, and one that greatly benefits from an UPSERT capability. It's much less common to have to UPSERT into an event table; i.e. the INSERT side of an UPSERT is usually the path less taken. I used the example of a shop floor with device classes, but it's the same issue for looking at orders by product hierarchy, commissions by territory, stars by classification, web-page hits by marketing campaign.... I could list hundreds of processes across many problem domains that follow this same basic pattern.
My only point is: there's another great capability in Postgres, doing efficient concurrent UPSERTs, but one of its most common and prime use cases is going to cost more money from having to use 64-bit numbers everywhere for no good reason.
On Wed, Jan 6, 2016 at 1:00 PM, Paul <paul@salesintel.com> wrote:
Not really; at least, it won't increase the size of a plain BIGSERIAL
primary key index (which will generally not be composite):
postgres=# create table foo as select (random() * 100000000)::int4
pkey from generate_series(1, 100000);
SELECT 100000
postgres=# create index on foo (pkey);
CREATE INDEX
postgres=# \dt+ foo
List of relations
Schema │ Name │ Type │ Owner │ Size │ Description
────────┼──────┼───────┼───────┼─────────┼─────────────
public │ foo │ table │ pg │ 3544 kB │
(1 row)
postgres=# \di+ foo_pkey_idx
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼──────────────┼───────┼───────┼───────┼─────────┼─────────────
public │ foo_pkey_idx │ index │ pg │ foo │ 2208 kB │
(1 row)
postgres=# alter table foo alter column pkey type int8;
ALTER TABLE
postgres=# \di+ foo_pkey_idx
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼──────────────┼───────┼───────┼───────┼─────────┼─────────────
public │ foo_pkey_idx │ index │ pg │ foo │ 2208 kB │
(1 row)
The unchanged size of the index foo_pkey_idx seen here is due to
alignment considerations. Granted, the heap might still be a bit
larger than it would otherwise be, because it will usually be
"composite", but I think the "cost" of your using int8 will not break
the bank.
--
Peter Geoghegan
All things are relative to how much data there is and how big the bank is. Having built gigantic data warehouse and OLAP solutions (billions of fact records, hundreds of dimensions, millions of members), you really do try to keep things small, because size has a real impact on cost and performance.
On a more conceptual note, if a sequence-defaulted column is used to resolve a conflict, i.e. it's in the conflict_target, then by definition there will never be a conflict!
So the points about 'fragility', 'peek-ahead', and 'BEFORE triggers' not working don't make sense to me. If a sequence is used to resolve a conflict, there will never be a conflict; and if it isn't, there's no need to get the next sequence value before determining which records conflict, so there are no concurrency issues. You would still need to get the next sequence value only for an actual insert.
I'm looking at math more like a single fact table having 500 million records, with 10 dimension columns. If INTs were used for the dimension columns, that's 20GB; if I had to change those dimension columns to BIGINTs, that's 40GB. This can impact how much you can fit into server memory, where OLAP working sets like to live, and just slows down moving stuff around between memory and disk, over the network, into backups, etc.
On a technical note, why is the following flow considered ‘fragile’?
1) Evaluate only the columns used in the conflict_target
a. Conflict-resolving columns with a default of nextval() increment the corresponding sequence
i. And in this case there will never be conflicts by definition, so ON CONFLICT can always be ignored
2) If there is a conflict, DO UPDATE
a. If a nextval()-defaulted column is used in the conflict_target, we never get here
b. Defaults are never evaluated
3) Else evaluate the remaining columns not used in the conflict_target and INSERT
a. Columns with nextval() increment their corresponding sequence
Hi,
On 2016-01-06 15:00:17 -0700, Paul wrote:
I’m looking at math more like a single Fact table having 500 million
records, with 10 dimension columns. If INTs were used for the
dimension columns, that’s 20GB.
If I had to change those dimension columns to BIGINTs, that’s
40GB. This can impact how much you can fit into server memory, where
olaps like to live, and just slow down moving stuff around between
memory and disk and over the network and backups, etc.
Row headers, padding & alignment make that quite different in
reality. It's more like a sub-20% difference.
And in pretty much all cases with sufficient insertion rates you're
going to want bigints anyway. If there are few rows it doesn't matter
anyway.
On a technical note, why is the following flow considered ‘fragile’?
The fragility comes from the fact that it'd only be usable in a subset of
cases. E.g. if a BEFORE trigger is present, the to-be-inserted rows need
to be "materialized" to be presented to the trigger.
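To illustrate the point (a sketch, assuming a table like device_class(id serial, name text unique) as in this thread's example; the trigger and names are not from the original mails):

```sql
-- A BEFORE INSERT row trigger sees the fully materialized row, including
-- the already-evaluated serial default, even when the ON CONFLICT path
-- later discards the insertion.
CREATE FUNCTION log_new_id() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'BEFORE INSERT sees id = %', NEW.id;
    RETURN NEW;
END $$;

CREATE TRIGGER show_id BEFORE INSERT ON device_class
    FOR EACH ROW EXECUTE PROCEDURE log_new_id();

-- Fires the trigger (and consumes a sequence value) even when 'pump'
-- already exists and no row ends up inserted:
INSERT INTO device_class (name) VALUES ('pump')
    ON CONFLICT (name) DO NOTHING;
```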
1) Evaluate only columns used in conflict_target
a. Conflict-resolving Columns with default nextval() increment the corresponding sequencer
i. And in this case there will never be conflicts by definition, so
ON CONFLICT can always be ignored
Wrong. Rows with sequences can very well conflict; there's nothing
forcing the sequence values to always be used.
Also note that sequence default values aren't in any way different from other
default values, and that the relevant pieces of code currently don't know
whether a default value is a nextval() or not.
2) If conflict, DO UPDATE
a. If nextval()-defaulted column used in conflict_target, we never get here
b. Defaults never evaluated
3) Else Evaluate remaining columns not used in conflict_target and INSERT
a. Columns with nextval() increment their corresponding sequencer
Perhaps read the code, and prototype it? I can tell you that it'd be a
significant amount of work, and that I'm personally absolutely not
interested in investing significant amounts of time into it. But that
doesn't have to stop *you*.
Anyway, EOD for me.
On Wed, Jan 6, 2016 at 9:00 PM, Paul <paul@salesintel.com> wrote:
Fwiw, while the implementation of UPSERT is fairly efficient, it's
probably still too much overhead to handle this kind of case.
When ingesting a high volume of sensor data like this you want to be very
fast, with little extra work. It's perfectly sensible to use UPSERT to
insert sensor ids, but you wouldn't want to do so on every single
sensor datum, or you'll end up spending more time and I/O doing that
than just ingesting your raw data.
Luckily, in a case like that you don't expect the sensor ids to be
updated or deleted, so you aren't really concerned about concurrency
except for a race between multiple insertions. So simply keeping a
list of known sensors and using UPSERT only whenever a new sensor id is
seen (and periodically refreshing the list) would work perfectly well.
And independently of how sequences work, you would want to be doing
that anyway for performance.
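The suggestion above might be sketched as follows (illustrative names from this thread's device-class example, not a prescribed API):

```sql
-- Step 1: cheap lookup against the known-class list; consumes no
-- sequence value and does no writes in the common case.
SELECT id FROM device_class WHERE name = 'pump';

-- Step 2: only when step 1 found nothing, insert race-safely and fetch
-- the id in one statement; the sequence now advances only for names
-- that are genuinely new (or when two sessions race on the same name).
INSERT INTO device_class (name) VALUES ('pump')
    ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name
    RETURNING id;
```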
I'm more sympathetic to your concern about storage density than Peter
and Andres seem to be, but unfortunately the reality is that it's much
harder to implement what you describe than it might appear. That often
happens in Postgres, because it's a general-purpose, flexible platform
and has to handle whatever setup users create; so solutions that
make perfect sense for a typical setup often don't work in general.
Implementing UPSERT correctly for the general case was actually really,
really hard already, so avoiding extra complications for features like
this is an important strategy for getting things done. I'm sure you'll
agree that we're better off with UPSERT with this limitation than with no
UPSERT at all...
Fwiw, this is the kind of feedback for which there are beta releases.
We need users to try to develop applications using new features before
they're released, to find exactly these types of mismatches with user
expectations. Even if someone wanted to work on this now, it wouldn't
happen until 9.6, which means any application that it would have helped
would probably already have run into the problem and had to adjust.
--
greg
Hi Andres, I appreciate your replies. I hope you will still read this when you can, as I believe I can make the financial case, and that the solution is relatively simple. I also believe there's still some misunderstanding of the problem. Maybe you could point me to the right place in the code.
Row headers, padding & alignment makes that absolutely different in
reality. It's like a sub 20% difference.
Regarding the space increase going from 10 INTs to 10 BIGINTs per row, using my example of 500 million fact records, and referring to http://www.postgresql.org/docs/9.5/static/storage-page-layout.html:
Using 10 INT columns per record, with row-index, row-header, and 64-bit alignment overhead, requires 4B (row idx) + 24B (row hdr) + 40B (row data) = 68B per row. With a page size of 8192B, minus 24B for the page header, that gives 8168 / 68 = ~120 rows per page.
Using 10 BIGINT columns per record, with row-index, row-header, and 64-bit alignment overhead, requires 4B (row idx) + 24B (row hdr) + 80B (row data) = 108B per row. With a page size of 8192B, minus 24B for the page header, that gives 8168 / 108 = ~75 rows per page.
1 - 75/120 = ~38%
So it's much nearer 40%, not sub-20%. When dealing in tens to hundreds of GBs (there are terabyte warehouses), 38% is significant, and can have a real financial impact with respect to memory, network, and primary and backup storage costs.
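The per-page arithmetic can be double-checked directly (integer division gives whole rows per page):

```sql
SELECT (8192 - 24) / 68  AS int_rows_per_page,      -- 120
       (8192 - 24) / 108 AS bigint_rows_per_page,   -- 75
       round(100 * (1 - 75.0 / 120), 1) AS pct_fewer_rows_per_page;  -- 37.5
```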
And in pretty much all the cases with sufficient insertion rates you're
going to want bigints anyway. If there's few rows it doesn't matter
anyway.
This is true for the surrogate key (generated by a sequence default) of a fact table (which records operational business transactions, like taking orders or web-page visits). As I stated, there are warehouses with billions of fact records, which are just about always inserted sequentially and carry a timestamp of when they occurred. I have never come across a practical need to do UPSERTs on fact tables (i.e. transaction tables, event tables, data generated by things occurring in the real world, etc.).
But the table where the UPSERT is being applied is not the fact table! It's a related dimension table, and in most cases the INSERT path will not be taken!! A fact table will have columns representing members along dimensions. For example, an Order fact table holding records for individual order line items would most likely have a dimension column representing the status of the line item, i.e. Open, Shipped, Invoiced, Closed. There would be a separate Status dimension table, with a SMALLINT key column that would be used in the fact table to relate an order line item to a particular status. It is the Status dimension table upon which an UPSERT would be performed for every order line item ETL'd from the OLTP system into the warehouse.
So, I’m not clear how your above statement applies exactly.
The fragility comes from the fact that it'd only be used in a subset of
cases. E.g. if a BEFORE trigger is present the to-be-inserted rows needs
to be "materialized" to be presented to the trigger.
(I'm assuming you mean the BEFORE INSERT trigger, and I note that you say 'to-be-inserted' rows.)
I would then argue that the BEFORE trigger is being applied inappropriately in the case of an INSERT with an ON CONFLICT clause. First, I think the purpose of INSERT ON CONFLICT is to implement UPSERTs; would you agree? We were always able to implement UPSERT logic before; it was just a pain in the butt because of concurrency. Before, I would have had a separate INSERT statement and a separate UPDATE statement, and if I had a BEFORE INSERT trigger, it would only execute for records that were actually going to be inserted. However, you're now stating that INSERT ON CONFLICT will execute a BEFORE INSERT trigger even for records that are not inserted?? That doesn't seem quite logical, and it's contrary to 'old fashioned' upsert logic. Does this also mean that the BEFORE UPDATE trigger is always called as well, or is it never called?
Wrong. Rows with sequences can very well conflict, there's nothing
forcing sequences to always be used.
My oversight; you are correct here, should an explicit value be provided for a column normally defaulted from a sequence. But that doesn't invalidate the basic premise of what I've been trying to get across.
Also note that sequence default values aren't in any way different from other
default values, and that relevant pieces of code currently don't know
whether a default value is a nextval or not.
The relevant pieces of code wouldn't need to know anything about how the default value is computed for any column. The relevant code merely needs to evaluate default values only for the columns used to resolve conflicts, not for all of the columns. Then, for the records that don't conflict, evaluate any remaining default expressions, and only then call the BEFORE INSERT trigger with the records that are actually going to be inserted!!
I greatly welcome your thoughts, and please point to the right place in the code.
Regards,
paul
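[To make the sequence-consumption behavior under discussion concrete, here is a minimal illustration; the table and values are made up for the example, but the numbering gap is the documented PostgreSQL behavior:]

```sql
-- Each conflicting INSERT still advances the serial's sequence,
-- because column defaults are evaluated before conflict detection.
CREATE TABLE cat (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

INSERT INTO cat (name) VALUES ('open');     -- inserts id = 1
INSERT INTO cat (name) VALUES ('open')
    ON CONFLICT (name) DO NOTHING;          -- no row inserted, but nextval() was called
INSERT INTO cat (name) VALUES ('stalled');  -- inserts id = 3, not 2
```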
From: Andres Freund
Sent: Wednesday, January 6, 2016 3:11 PM
To: Paul
Cc: Peter Geoghegan; pgsql-bugs
Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts
Hi,
On 2016-01-06 15:00:17 -0700, Paul wrote:
I’m looking at math more like a single Fact table having 500 million
records, with 10 dimension columns. If INTs were used for the
dimension columns, that’s 20GB.
If I had to change those dimension columns to BIGINTs, that’s
40GB. This can impact how much you can fit into server memory, where
olaps like to live, and just slow down moving stuff around between
memory and disk and over the network and backups, etc.
Row headers, padding & alignment make that absolutely different in
reality. It's like a sub-20% difference.
And in pretty much all the cases with sufficient insertion rates you're
going to want bigints anyway. If there's few rows it doesn't matter
anyway.
On a technical note, why is the following flow considered ‘fragile’?
The fragility comes from the fact that it'd only be used in a subset of
cases. E.g. if a BEFORE trigger is present the to-be-inserted rows needs
to be "materialized" to be presented to the trigger.
1) Evaluate only columns used in conflict_target
a. Conflict-resolving Columns with default nextval() increment the corresponding sequencer
i. And in this case, there would never be conflicts by definition, so
ON CONFLICT can always be ignored
Wrong. Rows with sequences can very well conflict, there's nothing
forcing sequences to always be used.
Also note that sequence default values aren't in any way different from other
default values, and that relevant pieces of code currently don't know
whether a default value is a nextval or not.
2) If conflict, DO UPDATE
a. If nextval()-defaulted column used in conflict_target, we never get here
b. Defaults never evaluated
3) Else Evaluate remaining columns not used in conflict_target and INSERT
a. Columns with nextval() increment their corresponding sequencer
Perhaps read the code, and prototype it? I can tell you that it'd be a
significant amount of work, and that I'm personally absolutely not
interested in investing significant amounts time into it. But that
doesn't have to stop *you*.
Anyway, EOD for me.
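[For readers looking for a workaround today: the effect can be avoided manually with a guarded insert. This is only a sketch — as Peter Geoghegan notes elsewhere in this thread, inserting on the basis of an *absence* is inherently racy, so callers must be prepared to retry on a unique violation:]

```sql
-- Guarded insert: nextval() on cat.id only fires when the SELECT
-- actually produces a row, i.e. when the key is genuinely absent.
-- NOT race-safe by itself: two concurrent sessions can both pass the
-- NOT EXISTS check; retry on unique_violation.
INSERT INTO cat (name)
SELECT 'open'
WHERE NOT EXISTS (SELECT 1 FROM cat WHERE name = 'open');
```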
On Wed, Jan 6, 2016 at 5:14 PM, Paul <paul@salesintel.com> wrote:
If I had a BEFORE INSERT trigger, it would only execute for records that
were actually going to be inserted. However, you’re now stating that the
INSERT ON CONFLICT will execute a BEFORE INSERT trigger even for records
that are not inserted?? That doesn’t seem quite logical, and contrary to
‘old fashioned’ upsert logic. Does this also mean that the BEFORE UPDATE
trigger is always called as well, or is it never called?
This seems to boil down to the two possible ways of manually implementing
UPSERT:
UPDATE, if not present, INSERT
INSERT, if failing, UPDATE
In the latter, the BEFORE INSERT trigger fires and influences whether the
insert fails. In the former, you are already pretty certain the insert
will succeed because the UPDATE found no records.
We've implemented INSERT, if failing UPDATE. The insert has to be
attempted and right now there is no concept of targeted partial
deferrability when constructing the record to be inserted.
To solve this situation it is likely that some form of "UPDATE ON MISSING
INSERT" would need to be designed. The insert portion would specify
"DEFAULT" for sequence columns and would execute nextval() only if the ON
MISSING portion is executed.
Fundamentally, the difference is that ON MISSING is considerably less
complicated than ON CONFLICT. What is wanted here is an ON MISSING
interpretation but what we've implemented is ON CONFLICT. It seems that
the hackers are in agreement that our implementation of ON CONFLICT is
consistent with its definition. That it doesn't efficiently solve problems
better handled by ON MISSING - while unfortunate - doesn't constitute a
bug: only an opportunity for future enhancement.
David J.
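[The first pattern David describes — UPDATE, if not present, INSERT — is essentially the retry-loop upsert modeled on the merge_db example long shipped in the PostgreSQL documentation. A sketch; with this pattern, a serial column's nextval() would fire only when the INSERT branch actually runs:]

```sql
CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS void AS $$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key;
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db (a, b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```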
On Fri, May 6, 2016 at 12:02 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
To solve this situation it is likely that some form of "UPDATE ON MISSING
INSERT" would need to be designed. The insert portion would specify
"DEFAULT" for sequence columns and would execute nextval() only if the ON
MISSING portion is executed.
That's unworkable, at least without accepting a bunch of new
edge-cases, like having the insert then have a duplicate violation
involving a value that was determined to not exist in the first phase.
IOW, it's unworkable to do an insert on the basis of an *absence* of
something in an index or in a table (and not get those edge-cases).
Doing so on the basis of the *presence* of a value (i.e. INSERT ... ON
CONFLICT DO UPDATE as implemented) lets the implementation clamp down
on race conditions enough to provide those useful user-visible
guarantees about getting 1 of 2 possible outcomes.
There are multiple definitions of a value "existing" that are in
tension here. It's rather complicated.
--
Peter Geoghegan
On Fri, May 6, 2016 at 12:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Fri, May 6, 2016 at 12:02 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
To solve this situation it is likely that some form of "UPDATE ON MISSING
INSERT" would need to be designed. The insert portion would specify
"DEFAULT" for sequence columns and would execute nextval() only if the ON
MISSING portion is executed.
That's unworkable, at least without accepting a bunch of new
edge-cases, like having the insert then have a duplicate violation
involving a value that was determined to not exist in the first phase.
IOW, it's unworkable to do an insert on the basis of an *absence* of
something in an index or in a table (and not get those edge-cases).
Doing so on the basis of the *presence* of a value (i.e. INSERT ... ON
CONFLICT DO UPDATE as implemented) lets the implementation clamp down
on race conditions enough to provide those useful user-visible
guarantees about getting 1 of 2 possible outcomes.
There are multiple definitions of a value "existing" here that are in
tension here. It's rather complicated.
UPDATE ON MISSING INSERT ON CONFLICT DO THE ORIGINAL UPDATE ...
I'm nowhere near close enough to this to contribute deeply - the most I
hope for is to spark an idea in someone else.
David J.
Hi Dave,
Thanks for the informative reply. We absolutely agree it’s not a bug in the implementation, but maybe a bug in conceptualization with regard to usability. Not knowing the initial motive for the feature, we assumed it was meant as an intuitively simpler form of the rather common upsert pattern, but it seems INSERT ON CONFLICT was meant for some other purpose that we don’t quite understand. We stopped using it and went back to a manual approach, which also works great because PostgreSQL is just kinda great 😊.
If any related enhancements were to be done, our ‘holy grail’ would be an implementation of the MERGE statement, like this, or this, or this… We can at least ask for it, right? 😉
Thanks for all your guys’ hard work
-p
From: David G. Johnston
Sent: Friday, May 6, 2016 1:02 PM
To: Paul
Cc: Andres Freund; Peter Geoghegan; pgsql-bugs
Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts
On Wed, Jan 6, 2016 at 5:14 PM, Paul <paul@salesintel.com> wrote:
If I had a BEFORE INSERT trigger, it would only execute for records that were actually going to be inserted. However, you’re now stating that the INSERT ON CONFLICT will execute a BEFORE INSERT trigger even for records that are not inserted?? That doesn’t seem quite logical, and contrary to ‘old fashioned’ upsert logic. Does this also mean that the BEFORE UPDATE trigger is always called as well, or is it never called?
This seems to boil down to the two possible ways of manually implementing UPSERT:
UPDATE, if not present, INSERT
INSERT, if failing, UPDATE
In the latter, the BEFORE INSERT trigger fires and influences whether the insert
fails. In the former, you are already pretty certain the insert will succeed because the UPDATE found no records.
We've implemented INSERT, if failing UPDATE. The insert has to be attempted and right now there is no concept of targeted partial deferrability when constructing the record to be inserted.
To solve this situation it is likely that some form of "UPDATE ON MISSING INSERT" would need to be designed. The insert portion would specify "DEFAULT" for sequence columns and would execute nextval() only if the ON MISSING portion is executed.
Fundamentally, the difference is that ON MISSING is considerably less complicated than ON CONFLICT. What is wanted here is an ON MISSING interpretation but what we've implemented is ON CONFLICT. It seems that the hackers are in agreement that our implementation of ON CONFLICT is consistent with its definition. That it doesn't efficiently solve problems better handled by ON MISSING - while unfortunate - doesn't constitute a bug: only an opportunity for future enhancement.
David J.
Just for clarity on the real financial problem of usability,
In our situation, we have ‘event’ tables, and related ‘category’ tables; I’m simplifying and generalizing a bit. The event tables capture streams of activity. Each record has several columns that categorize each event. Sometimes new categories come along from the raw event data source, but several categories would never have more than a couple hundred discrete values, if that. We also need to filter, group, and aggregate the events along some of those categories.
So the structure is something like this (although with more category columns):
CREATE TABLE event (
id BIGSERIAL NOT NULL,
cat_a_id SMALLINT NOT NULL, /* would have to be BIGINT if using IOC */
cat_b_id SMALLINT NOT NULL, /* would have to be BIGINT if using IOC */
measure FLOAT,
CONSTRAINT pk_event PRIMARY KEY (id)
);
CREATE INDEX ix_event1 ON event USING BTREE(cat_a_id);
CREATE INDEX ix_event2 ON event USING BTREE(cat_b_id, cat_a_id);
CREATE TABLE cat_a (
id SMALLSERIAL NOT NULL, /* would have to be BIGSERIAL if using IOC */
name TEXT NOT NULL,
CONSTRAINT pk_cat_a PRIMARY KEY (id)
);
CREATE UNIQUE INDEX ux_cat_a ON cat_a USING BTREE(name);
CREATE TABLE cat_b (
id SMALLSERIAL NOT NULL, /* would have to be BIGSERIAL if using IOC */
name TEXT NOT NULL,
CONSTRAINT pk_cat_b PRIMARY KEY (id)
);
CREATE UNIQUE INDEX ux_cat_b ON cat_b USING BTREE(name);
Some ‘raw’ input source records might look like this
“so-reg”, “open”, 10
“so-reg”, “stalled”, 1
“no-reg”, “stalled”, 1
“unknown”, “ongoing”, 1
As part of ingesting the raw event source into the event table, we do an upsert on the category tables, get the category’s id, then store the id in the respective category column in the event table.
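[That ingestion step might look like the following sketch — illustrative, using the cat_a table above. In the common case the category already exists, so only the SELECT runs and the SMALLSERIAL sequence is untouched; the INSERT path needs retry logic under concurrent loaders:]

```sql
-- Resolve the category id for one raw value, e.g. 'so-reg'.
SELECT id FROM cat_a WHERE name = 'so-reg';

-- Only if no row was returned (rare path; may need retry on
-- unique_violation if another loader races us):
INSERT INTO cat_a (name) VALUES ('so-reg') RETURNING id;
```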
Just in a few months, one of our event tables, having 9 category type columns, is already up over 200M records; it may hit 1B in a few more months, and in a year or so over 4B.
If we had continued to use INSERT ON CONFLICT [IOC], the cat_a_id and cat_b_id columns in the event table would have to be changed from SMALLINT to BIGINT, just because the related id sequences from the cat_a and cat_b tables are getting consumed on every insert into the event table, even though 99.9% of the time nothing is actually being inserted into the cat_a and cat_b tables.
We’re running our stuff in a cloud host, so we’re paying for memory, cpu, SSD storage, HDD storage, and network bandwidth (backups) on a monthly basis.
So if using IOC, our one event table with 9 category columns at 200M records goes from 9 * 2 = 18 bytes-per-record of category ids to 9 * 8 = 72 bytes-per-record. Or, from 3.6GB to 14.4GB! At 1B events, 18GB to 72GB! And that doesn’t include the indexes on the event table for certain category columns and combinations, and that’s just one table in one replica.
54GB extra means we can fit far fewer of the event rows into memory; it takes up much more expensive SSD space (and HDD for backups), uses more bandwidth, makes backups and replication take longer, and the CPU has to compare a lot more bits when sorting and grouping. This can add up to thousands of dollars or more over time.
Also, if we had kept using IOC and made the category columns just INT, we would have had to convert 4B records from INT to BIGINT sometime next year. Ouch!
So, our problem with using IOC isn’t at all technical, it’s financial! Very expensive just to save a few lines of SQL…
I’m kinda curious what the initial use cases for IOC were. If it actually was for UPSERTs, under what conditions? A huge amount of data is just events and categories; i.e. tweets, emails, sensor readings, purchase orders, etc., categorized by region, person, device, product brand, etc. Ingesting this kind of data is a prime use case for UPSERTs. Just curious what problem IOC is solving?
Anyways, none of this is meant to be at all a knock on anyone contributing to PG.. It’s an absolutely amazing amount of high quality value for free.. Thanks to all of you.
From: Paul
Sent: Friday, May 6, 2016 2:06 PM
To: David G. Johnston
Cc: Andres Freund; Peter Geoghegan; pgsql-bugs
Subject: RE: [BUGS] BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts