xmin and very high number of concurrent transactions

Started by Vijaykumar Jain · about 7 years ago · 8 messages · general

vjain@opentable.com

about 7 years ago

I was asked this question in one of my demos, and it was an interesting one.

We update xmin for new inserts with the current txid.
Now, in a highly concurrent scenario where more than 2000
concurrent users are trying to insert new data,
will updating the xmin value be a bottleneck?

I know we should use pooling solutions to reduce concurrent
connections, but assume we have enough resources to take care of
spawning a new process for each new connection.

Regards,
Vijay

Adrian Klaver

adrian.klaver@aklaver.com

about 7 years ago

In reply to: Vijaykumar Jain (#1)

Re: xmin and very high number of concurrent transactions

On 3/12/19 12:19 PM, Vijaykumar Jain wrote:

> I was asked this question in one of my demos, and it was an interesting one.
>
> We update xmin for new inserts with the current txid.

Why?

--
Adrian Klaver
adrian.klaver@aklaver.com

Vijaykumar Jain

vjain@opentable.com

about 7 years ago

In reply to: Adrian Klaver (#2)

Re: [External] Re: xmin and very high number of concurrent transactions

No, I do not mean that we end users do it; Postgres does it (?) via the xmin and xmax
fields from inherited tables :) if that is what you meant by "why".
Or are you asking whether Postgres even updates those rows, and I am wrong
to assume that it does?

Since the values need to be atomic, consider the analogy below.
Assume I (Postgres) am a person handing out tokens to
people (connections/transactions) in a queue.
If there is a single line (sequential), it is easy for me to
give each person one token, incrementing the value each time.
But if there are thousands of users in parallel lines and I am the only
person handing out tokens, I still operate sequentially, and everyone else
is "blocked" for some time before getting a token with the
required value.
So with thousands of users, could that delay hurt my
performance, because I need to maintain the token value to
know which value to give to the next person?

I do not know if I am explaining it correctly; pardon my analogy.

Regards,
Vijay

Adrian Klaver

adrian.klaver@aklaver.com

about 7 years ago

In reply to: Vijaykumar Jain (#3)

Re: [External] Re: xmin and very high number of concurrent transactions

On 3/12/19 1:02 PM, Vijaykumar Jain wrote:

> No, I do not mean that we end users do it; Postgres does it (?) via the xmin and xmax
> fields from inherited tables :) if that is what you meant by "why".
> Or are you asking whether Postgres even updates those rows, and I am wrong
> to assume that it does?

Not sure where the inherited tables come in?

See below for more info:
https://www.postgresql.org/docs/11/storage-page-layout.html

AFAIK xmin and xmax are just set as part of the insert or delete
operations, so there is no updating involved.

I would say the impact on performance would come from the overhead of
each connection rather than from maintaining xmin/xmax.

--
Adrian Klaver
adrian.klaver@aklaver.com

Noname

reg_pg_stefanz@perfexpert.ch

about 7 years ago

In reply to: Vijaykumar Jain (#1)

Re: xmin and very high number of concurrent transactions

I may have misunderstood the documentation or your question, but my
understanding is that xmin is not updated; it is only set on insert.
(That includes updates, since Postgres executes an update as a
delete plus an insert.)

from https://www.postgresql.org/docs/10/ddl-system-columns.html

> xmin
> The identity (transaction ID) of the inserting transaction for this
> row version. (A row version is an individual state of a row; each update
> of a row creates a new row version for the same logical row.)

Therefore I assume there are no actual updates of xmin values.
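
As a rough illustration of that reading of the docs, here is a toy Python model (not PostgreSQL code; `RowVersion` and `ToyTable` are names made up for this sketch). It assumes only what is described above: xmin is written once when a row version is created, and an update adds a new row version instead of rewriting the old version's xmin.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                   # XID of the transaction that created this version
    xmax: Optional[int] = None  # XID of the transaction that deleted or superseded it


class ToyTable:
    """Hypothetical, heavily simplified heap table; for illustration only."""

    def __init__(self) -> None:
        self.versions: List[RowVersion] = []

    def insert(self, xid: int, value: str) -> RowVersion:
        # xmin is written once, together with the new row version.
        version = RowVersion(value=value, xmin=xid)
        self.versions.append(version)
        return version

    def update(self, xid: int, old: RowVersion, value: str) -> RowVersion:
        # The old version is marked as superseded (its xmax is set) and a new
        # version is inserted. The old version's xmin is left untouched.
        old.xmax = xid
        return self.insert(xid, value)


table = ToyTable()
v1 = table.insert(xid=100, value="a")
v2 = table.update(xid=101, old=v1, value="b")
print(v1)
print(v2)
```

In this model the first version keeps xmin 100 after the update, and the second version carries xmin 101, so nothing ever goes back to change an existing xmin.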

Stefan

Laurenz Albe

laurenz.albe@cybertec.at

about 7 years ago

In reply to: Vijaykumar Jain (#1)

Re: xmin and very high number of concurrent transactions

Vijaykumar Jain wrote:

> I was asked this question in one of my demos, and it was an interesting one.
>
> We update xmin for new inserts with the current txid.
> Now, in a highly concurrent scenario where more than 2000
> concurrent users are trying to insert new data,
> will updating the xmin value be a bottleneck?
>
> I know we should use pooling solutions to reduce concurrent
> connections, but assume we have enough resources to take care of
> spawning a new process for each new connection.

You can read the function GetNewTransactionId in
src/backend/access/transam/varsup.c for details.

Transaction ID creation is serialized with a "light-weight lock",
so it could potentially be a bottleneck.

Often that is dwarfed by the I/O requirements from many concurrent
commits, but if most of your transactions are rolled back or you
use "synchronous_commit = off", I can imagine that it could matter.

It is not a matter of how many clients there are, but of how
often a new writing transaction is started.
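
To make the serialization concrete, here is a minimal Python sketch of a counter guarded by a single lock. It is only an analogy for the behaviour described above, not a translation of GetNewTransactionId; `ToyXidAllocator`, `run_writers`, and the starting value 100 are all invented for the example.

```python
import threading
from typing import List


class ToyXidAllocator:
    """Hypothetical stand-in for a serialized transaction ID counter."""

    def __init__(self, first_xid: int = 100) -> None:
        self._next_xid = first_xid
        self._lock = threading.Lock()

    def get_new_xid(self) -> int:
        # Only one caller at a time can be inside this block, so every
        # allocation is serialized no matter how many threads are running.
        with self._lock:
            xid = self._next_xid
            self._next_xid += 1
            return xid


def run_writers(allocator: ToyXidAllocator, writers: int, xids_per_writer: int) -> List[int]:
    results: List[List[int]] = [[] for _ in range(writers)]

    def worker(slot: int) -> None:
        for _ in range(xids_per_writer):
            results[slot].append(allocator.get_new_xid())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [xid for chunk in results for xid in chunk]


allocator = ToyXidAllocator()
xids = run_writers(allocator, writers=8, xids_per_writer=1000)
print(len(xids), len(set(xids)))
```

The lock is taken once per allocated ID, so in this sketch contention depends on how often `get_new_xid` is called rather than on how many threads merely exist, which mirrors the point about writing transactions versus clients.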

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Julien Rouhaud

rjuju123@gmail.com

about 7 years ago

In reply to: Laurenz Albe (#6)

Re: xmin and very high number of concurrent transactions

On Wed, Mar 13, 2019 at 9:50 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> You can read the function GetNewTransactionId in
> src/backend/access/transam/varsup.c for details.
>
> Transaction ID creation is serialized with a "light-weight lock",
> so it could potentially be a bottleneck.

Also, I think that GetSnapshotData() would be the major bottleneck well
before GetNewTransactionId() becomes problematic, especially with
such a high number of active backends.
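
One way to picture why the number of backends matters is the toy Python model below. It rests on an assumption that is not spelled out in this thread: that building a snapshot means looking at the slot of every backend to see which transactions are in progress. `ToySnapshot`, `take_snapshot`, and the sample XIDs are hypothetical and do not mirror the real GetSnapshotData() code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToySnapshot:
    xmin: int           # oldest XID still running when the snapshot was taken
    xmax: int           # first XID that had not been assigned yet
    running: List[int]  # XIDs in progress at snapshot time


def take_snapshot(backend_xids: Dict[int, Optional[int]], next_xid: int) -> ToySnapshot:
    # One pass over every backend slot, including idle ones (None), so the
    # work grows with the number of backends rather than with the number
    # of writers.
    running = sorted(xid for xid in backend_xids.values() if xid is not None)
    xmin = running[0] if running else next_xid
    return ToySnapshot(xmin=xmin, xmax=next_xid, running=running)


# 2000 backends, of which only two currently hold a transaction ID.
backends: Dict[int, Optional[int]] = {pid: None for pid in range(2000)}
backends[7] = 510
backends[42] = 505

snap = take_snapshot(backends, next_xid=512)
print(snap)
```

Note that the `xmin` of a snapshot in this sketch is a different thing from the per-row xmin system column discussed earlier in the thread; the two only share a name.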

Vijaykumar Jain

vjain@opentable.com

about 7 years ago

In reply to: Julien Rouhaud (#7)

Re: [External] Re: xmin and very high number of concurrent transactions

Thank you, everyone, for responding.
I appreciate your help.

It looks like I need to understand the concepts in a little more detail to
be able to ask the right questions, but at least now I can look at the
relevant docs.

Regards,
Vijay