Postgres code for a query intermediate dataset

Started by Rohit Goyalover 11 years ago17 messages
#1Rohit Goyal
rhtgyl.87@gmail.com

Hi All,

I want to work on the code of intermediate dataset of select and update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different version of
salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions in
dataset. :)

Thanks in advance!!

Regards,
Rohit Goyal

--
Regards,
Rohit Goyal

#2Atri Sharma
atri.jiit@gmail.com
In reply to: Rohit Goyal (#1)
Re: Postgres code for a query intermediate dataset

On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal <rhtgyl.87@gmail.com> wrote:

Hi All,

I want to work on the code of intermediate dataset of select and update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different version of
salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions in
dataset. :)

Thanks in advance!!

Not sure what you are looking for, but each update is an insert of a new
tuple with the new values and marking the old tuple as deleted.

There is no need for tracking the versions of any changes in data set. Each
update operation leaves only one visible tuple. If the transaction commits,
inserted tuple becomes visible and old row is marked deleted. If the
transaction rollbacks, only the old tuple shall remain visible.
--
Regards,

Atri
*l'apprenant*

#3David G Johnston
david.g.johnston@gmail.com
In reply to: Atri Sharma (#2)
Re: Postgres code for a query intermediate dataset

Atri Sharma wrote

On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal &lt;

rhtgyl.87@

&gt; wrote:

Hi All,

I want to work on the code of intermediate dataset of select and update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different version of
salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions
in
dataset. :)

Thanks in advance!!

Not sure what you are looking for, but each update is an insert of a new
tuple with the new values and marking the old tuple as deleted.

There is no need for tracking the versions of any changes in data set.
Each
update operation leaves only one visible tuple. If the transaction
commits,
inserted tuple becomes visible and old row is marked deleted. If the
transaction rollbacks, only the old tuple shall remain visible.
--
Regards,

Atri
*l'apprenant*

Or rather even if you want to be able to reference the older versions of
that record there is nothing in PostgreSQL to facilitate that. You have to
manually create and manage the data so that you know during what time period
a given record is valid.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Postgres-code-for-a-query-intermediate-dataset-tp5818931p5818935.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Atri Sharma
atri.jiit@gmail.com
In reply to: David G Johnston (#3)
Re: Postgres code for a query intermediate dataset

On Sat, Sep 13, 2014 at 11:52 PM, David G Johnston <
david.g.johnston@gmail.com> wrote:

Atri Sharma wrote

On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal &lt;

Or rather even if you want to be able to reference the older versions of
that record there is nothing in PostgreSQL to facilitate that. You have to
manually create and manage the data so that you know during what time
period
a given record is valid.

David J.

Sometimes I do miss 'time travel' we used to have :)

Regards,

Atri
--
Regards,

Atri
*l'apprenant*

#5Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Atri Sharma (#4)
Re: Postgres code for a query intermediate dataset

On 14/09/14 06:35, Atri Sharma wrote:

On Sat, Sep 13, 2014 at 11:52 PM, David G Johnston
<david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:

Atri Sharma wrote

On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal &lt;

Or rather even if you want to be able to reference the older
versions of
that record there is nothing in PostgreSQL to facilitate that. You
have to
manually create and manage the data so that you know during what
time period
a given record is valid.

David J.

Sometimes I do miss 'time travel' we used to have :)

Regards,

Atri
--
Regards,
Atri
/l'apprenant/

That is only because the Guild of Time Travellers was formed, and we are
very selective in whom we allow to join. It was a massive undertaking
to purge the knowledge of effective time travel from the general
populace (H. G. Wells had to be expelled with a partial brain wipe)! :-)

On a more serious note:
I did design and implement a system to allow what the original poster
was after, it involved 2 tables for each logical table, and used both an
EFFECTIVE_DATE & an AS_AT_DATE. This allowed insurance quotes to be
valid for a given of time, even if the insurance rates were set change
after the quote was given (but before the quote expired). This was
about 15 years ago. It was amusing that my wife joined that team 10
years after I left, and found 2 of my original colleagues still there!

Cheers,
Gavin

#6Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Rohit Goyal (#1)
Re: Postgres code for a query intermediate dataset

On 14/09/14 05:36, Rohit Goyal wrote:

Hi All,

I want to work on the code of intermediate dataset of select and update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different version
of salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions
in dataset. :)

Hi Rohit,

Currently in Postgres, these intermediate versions all exist - however a
given session can only see one of them. Also VACUUM is allowed to
destroy versions that no other transactions can see.

So if I'm understanding you correctly, you would like to have some way
for a session to see *all* these versions (and I guess preventing VACUUM
from destroying them).

It is certainly possible (or used to be via snapshot manipulation, I
haven't looked at that code in a while sorry) to enable a session to see
all the old versions, and is quite a cool idea (Postgres used to have
this ability in older versions - called Time Travel).

For pure practicality, this can be achieved without any code
modifications using TRIGGERs and an extra table (as Gavin alludes to).

Do tell us a bit more about what you are wanting to do!

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Atri Sharma
atri.jiit@gmail.com
In reply to: Mark Kirkwood (#6)
Re: Postgres code for a query intermediate dataset

On Sunday, September 14, 2014, Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
wrote:

On 14/09/14 05:36, Rohit Goyal wrote:

Hi All,

I want to work on the code of intermediate dataset of select and update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different version
of salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions
in dataset. :)

Hi Rohit,

Currently in Postgres, these intermediate versions all exist - however a
given session can only see one of them. Also VACUUM is allowed to destroy
versions that no other transactions can see.

So if I'm understanding you correctly, you would like to have some way for
a session to see *all* these versions (and I guess preventing VACUUM from
destroying them).

Any modifications of that sort are bound to introduce lots of pain, not to
mention performance degradation and the added responsibility of ensuring
that dead tuples don't bloat up the system (prevent vacuum from running at
regular intervals and you can have a xid wraparound).

I just mentioned that in case you are planning to go in that direction. If
you only want the data, use the triggers as Gavin mentioned.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*

#8Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Atri Sharma (#7)
Re: Postgres code for a query intermediate dataset

On 14/09/14 19:25, Atri Sharma wrote:

On Sunday, September 14, 2014, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>
wrote:

On 14/09/14 05:36, Rohit Goyal wrote:

Hi All,

I want to work on the code of intermediate dataset of select and
update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different
version
of salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I
want to
know the code portion which i need to look for working on all 4
versions
in dataset. :)

Hi Rohit,

Currently in Postgres, these intermediate versions all exist -
however a given session can only see one of them. Also VACUUM is
allowed to destroy versions that no other transactions can see.

So if I'm understanding you correctly, you would like to have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).

Any modifications of that sort are bound to introduce lots of pain, not
to mention performance degradation and the added responsibility of
ensuring that dead tuples don't bloat up the system (prevent vacuum from
running at regular intervals and you can have a xid wraparound).

I just mentioned that in case you are planning to go in that direction.
If you only want the data, use the triggers as Gavin mentioned.

Obviously in the general case sure - but (as yet) we don't have much
idea about Rohit's use case and workload. If retrieving past versions is
the *primary* workload bias and high update concurrency is not required
then this could well work better than a trigger based solution.

And it does not seem too onerous to have the ability to switch this on
as required, viz:

ALTER TABLE table1 VERSIONING;

(or similar syntax) which makes VACUUM leave this table alone. It might
make more sense to make such a concept apply to a TABLESPACE instead
mind you (i.e things in here are for archive/versioning purposes)...

Clearly we'd need to see the code for any of this and evaluate if it is
good or terrible, but I'm not seeing the idea as bad as stated.

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Rohit Goyal
rhtgyl.87@gmail.com
In reply to: Mark Kirkwood (#6)
Re: Postgres code for a query intermediate dataset

Hi Mark,

On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood <
mark.kirkwood@catalyst.net.nz> wrote:

On 14/09/14 05:36, Rohit Goyal wrote:

Hi All,

I want to work on the code of intermediate dataset of select and update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different version
of salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions
in dataset. :)

Hi Rohit,

Currently in Postgres, these intermediate versions all exist - however a
given session can only see one of them. Also VACUUM is allowed to destroy
versions that no other transactions can see.

Exactly, one visible version is there per session. But, I want to test my

algorithm in which i myself want to work on all the intermediate version
and find the correct one for the session.

So if I'm understanding you correctly, you would like to have some way for
a session to see *all* these versions (and I guess preventing VACUUM from
destroying them).

yes and I also want to know the portion of code where i can find all the
historical or intermediate versions

It is certainly possible (or used to be via snapshot manipulation, I
haven't looked at that code in a while sorry) to enable a session to see
all the old versions, and is quite a cool idea (Postgres used to have this
ability in older versions - called Time Travel).

For pure practicality, this can be achieved without any code modifications
using TRIGGERs and an extra table (as Gavin alludes to).

Can you explain me more about how to starting working using trigger?

Do tell us a bit more about what you are wanting to do!

Cheers

Mark

--
Regards,
Rohit Goyal

#10Atri Sharma
atri.jiit@gmail.com
In reply to: Mark Kirkwood (#8)
Re: Postgres code for a query intermediate dataset

On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood <
mark.kirkwood@catalyst.net.nz> wrote:

On 14/09/14 19:25, Atri Sharma wrote:

On Sunday, September 14, 2014, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>

wrote:

On 14/09/14 05:36, Rohit Goyal wrote:

Hi All,

I want to work on the code of intermediate dataset of select and
update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different
version
of salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I
want to
know the code portion which i need to look for working on all 4
versions
in dataset. :)

Hi Rohit,

Currently in Postgres, these intermediate versions all exist -
however a given session can only see one of them. Also VACUUM is
allowed to destroy versions that no other transactions can see.

So if I'm understanding you correctly, you would like to have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).

Any modifications of that sort are bound to introduce lots of pain, not
to mention performance degradation and the added responsibility of
ensuring that dead tuples don't bloat up the system (prevent vacuum from
running at regular intervals and you can have a xid wraparound).

I just mentioned that in case you are planning to go in that direction.
If you only want the data, use the triggers as Gavin mentioned.

Obviously in the general case sure - but (as yet) we don't have much idea
about Rohit's use case and workload. If retrieving past versions is the
*primary* workload bias and high update concurrency is not required then
this could well work better than a trigger based solution.

And it does not seem too onerous to have the ability to switch this on as
required, viz:

ALTER TABLE table1 VERSIONING;

(or similar syntax) which makes VACUUM leave this table alone.

How do you plan to do all that VACUUM does for this table then?

It seems to me that you are saying to VACUUM that it need not be concerned
with table 'A' and you are assuming ownership of all the tasks performed by
VACUUM for this table. Seems pretty broken to me, not to mention the
performance degradations.

Regards,

Atri

Regards,

Atri

#11Atri Sharma
atri.jiit@gmail.com
In reply to: Mark Kirkwood (#8)
Re: Postgres code for a query intermediate dataset

On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood <
mark.kirkwood@catalyst.net.nz> wrote:

On 14/09/14 19:25, Atri Sharma wrote:

On Sunday, September 14, 2014, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>

wrote:

On 14/09/14 05:36, Rohit Goyal wrote:

Hi All,

I want to work on the code of intermediate dataset of select and
update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different
version
of salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I
want to
know the code portion which i need to look for working on all 4
versions
in dataset. :)

Hi Rohit,

Currently in Postgres, these intermediate versions all exist -
however a given session can only see one of them. Also VACUUM is
allowed to destroy versions that no other transactions can see.

So if I'm understanding you correctly, you would like to have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).

Any modifications of that sort are bound to introduce lots of pain, not
to mention performance degradation and the added responsibility of
ensuring that dead tuples don't bloat up the system (prevent vacuum from
running at regular intervals and you can have a xid wraparound).

I just mentioned that in case you are planning to go in that direction.
If you only want the data, use the triggers as Gavin mentioned.

Obviously in the general case sure - but (as yet) we don't have much idea
about Rohit's use case and workload. If retrieving past versions is the
*primary* workload bias and high update concurrency is not required then
this could well work better than a trigger based solution.

And it does not seem too onerous to have the ability to switch this on as
required, viz:

ALTER TABLE table1 VERSIONING;

(or similar syntax) which makes VACUUM leave this table alone. It might
make more sense to make such a concept apply to a TABLESPACE instead mind
you (i.e things in here are for archive/versioning purposes)...

What I think can be done is have a tuplestore which has the delta of
updated rows i.e. only have the changes made in an update statement stored
in a tuplestore (it could be a part of RelationData). It should be simple
enough to have tuplestore store the oid of the inserted tuple and the
difference between new tuple and the old tuple. No changes need to be done
for old tuple since it can be marked as deleted and VACUUM can remove it as
normal logic.

Not a clean way, but should work for what you proposed.

#12Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Atri Sharma (#10)
Re: Postgres code for a query intermediate dataset

On 14/09/14 20:24, Atri Sharma wrote:

On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>
wrote:

On 14/09/14 19:25, Atri Sharma wrote:

On Sunday, September 14, 2014, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz
<mailto:mark.kirkwood@catalyst.net.nz>
<mailto:mark.kirkwood@__catalyst.net.nz
<mailto:mark.kirkwood@catalyst.net.nz>>>

wrote:

On 14/09/14 05:36, Rohit Goyal wrote:

Hi All,

I want to work on the code of intermediate dataset of
select and
update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4
different
version
of salary.

I want to select salary of person named Rohit. Now
suppose , in
intermediate result, I found 4 different versions of
the data. I
want to
know the code portion which i need to look for working
on all 4
versions
in dataset. :)

Hi Rohit,

Currently in Postgres, these intermediate versions all exist -
however a given session can only see one of them. Also
VACUUM is
allowed to destroy versions that no other transactions can see.

So if I'm understanding you correctly, you would like to
have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).

Any modifications of that sort are bound to introduce lots of
pain, not
to mention performance degradation and the added responsibility of
ensuring that dead tuples don't bloat up the system (prevent
vacuum from
running at regular intervals and you can have a xid wraparound).

I just mentioned that in case you are planning to go in that
direction.
If you only want the data, use the triggers as Gavin mentioned.

Obviously in the general case sure - but (as yet) we don't have much
idea about Rohit's use case and workload. If retrieving past
versions is the *primary* workload bias and high update concurrency
is not required then this could well work better than a trigger
based solution.

And it does not seem too onerous to have the ability to switch this
on as required, viz:

ALTER TABLE table1 VERSIONING;

(or similar syntax) which makes VACUUM leave this table alone.

How do you plan to do all that VACUUM does for this table then?

It seems to me that you are saying to VACUUM that it need not be
concerned with table 'A' and you are assuming ownership of all the tasks
performed by VACUUM for this table. Seems pretty broken to me, not to
mention the performance degradations.

I think the whole point of such a modification is that nothing is done
to such tables, as you want to see all the previous versions.

Clearly this is less performant for standard workloads...but we are
talking about non standard workloads surely...

Regards

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Rohit Goyal (#9)
Re: Postgres code for a query intermediate dataset

On 14/09/14 20:11, Rohit Goyal wrote:

Hi Mark,

On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>
wrote:

Currently in Postgres, these intermediate versions all exist -
however a given session can only see one of them. Also VACUUM is
allowed to destroy versions that no other transactions can see.

Exactly, one visible version is there per session. But, I want to test
my algorithm in which i myself want to work on all the intermediate
version and find the correct one for the session.

So if I'm understanding you correctly, you would like to have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).

yes and I also want to know the portion of code where i can find all the
historical or intermediate versions

Well that's going to be a bit of a learning curve for you :-), the
concept to get familiar with is snapshots (see
src/backand/access/heap/heapam.c to get started).

It is certainly possible (or used to be via snapshot manipulation, I
haven't looked at that code in a while sorry) to enable a session to
see all the old versions, and is quite a cool idea (Postgres used to
have this ability in older versions - called Time Travel).

For pure practicality, this can be achieved without any code
modifications using TRIGGERs and an extra table (as Gavin alludes to).

Can you explain me more about how to starting working using trigger?

I'm not sure we have specif examples in the docs for what you want to
do, but generally see
http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.html

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Rohit Goyal
rhtgyl.87@gmail.com
In reply to: Mark Kirkwood (#13)
Re: Postgres code for a query intermediate dataset

Hi Mark & Atri, :)

Thanks for reply. But, I think i confused you. I am talking about access
using indexes. So, I assume that B+ tree store key-value pair where rohit
is the key and all the versions are its value.

Another way to think is I have a secondary index on emp. name and there are
4 rohit exist in DB. So, now B+ tree gives me 4 different tuple pointer for
each Rohit. I want to know the code portion for this where i can see all 4
tuple pointer before each one have I/O access to fetch its tuple.

Are the suggestions still valid?

On Sun, Sep 14, 2014 at 10:53 AM, Mark Kirkwood <
mark.kirkwood@catalyst.net.nz> wrote:

On 14/09/14 20:11, Rohit Goyal wrote:

Hi Mark,

On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>
wrote:

Currently in Postgres, these intermediate versions all exist -

however a given session can only see one of them. Also VACUUM is
allowed to destroy versions that no other transactions can see.

Exactly, one visible version is there per session. But, I want to test
my algorithm in which i myself want to work on all the intermediate
version and find the correct one for the session.

So if I'm understanding you correctly, you would like to have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).

yes and I also want to know the portion of code where i can find all the
historical or intermediate versions

Well that's going to be a bit of a learning curve for you :-), the concept
to get familiar with is snapshots (see src/backand/access/heap/heapam.c
to get started).

Thanks I will read it. :) Can you please tel me some specifics from this c
file, if you already know :)

It is certainly possible (or used to be via snapshot manipulation, I
haven't looked at that code in a while sorry) to enable a session to
see all the old versions, and is quite a cool idea (Postgres used to
have this ability in older versions - called Time Travel).

For pure practicality, this can be achieved without any code
modifications using TRIGGERs and an extra table (as Gavin alludes to).

Can you explain me more about how to starting working using trigger?

I'm not sure we have specif examples in the docs for what you want to do,
but generally see http://www.postgresql.org/docs/9.4/static/plpgsql-
trigger.html

Cheers

Mark

--
Regards,
Rohit Goyal

#15Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Rohit Goyal (#14)
Re: Postgres code for a query intermediate dataset

On 14/09/14 21:18, Rohit Goyal wrote:

Hi Mark & Atri, :)

Thanks for reply. But, I think i confused you. I am talking about access
using indexes. So, I assume that B+ tree store key-value pair where
rohit is the key and all the versions are its value.

Another way to think is I have a secondary index on emp. name and there
are 4 rohit exist in DB. So, now B+ tree gives me 4 different tuple
pointer for each Rohit. I want to know the code portion for this where i
can see all 4 tuple pointer before each one have I/O access to fetch its
tuple.

Are the suggestions still valid?

Visibility rules mentioned earlier apply equally to tables and indexes
(strictly speaking what happens is index tuples are checked against the
relevant tables to see if your session can see them), so discussion of
whether tuples are retrieved via index or table scans is not really
relevant (i.e query planning/optimization is separate from tuple
visibility).

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#12)
Re: Postgres code for a query intermediate dataset

On 14/09/14 20:43, Mark Kirkwood wrote:

On 14/09/14 20:24, Atri Sharma wrote:

How do you plan to do all that VACUUM does for this table then?

It seems to me that you are saying to VACUUM that it need not be
concerned with table 'A' and you are assuming ownership of all the tasks
performed by VACUUM for this table. Seems pretty broken to me, not to
mention the performance degradations.

I think the whole point of such a modification is that nothing is done
to such tables, as you want to see all the previous versions.

Clearly this is less performant for standard workloads...but we are
talking about non standard workloads surely...

To be fair with respect to what Atri is saying, I should have said
something like:

Clearly this is *horribly* less performant for standard workloads...etc :-)

Also there is the good point he raised about transaction xid wrap, so
some messing about with that part of VACUUM would be required too (it's
the little complications that all add up)!

The TRIGGER based approach is clearly a lot simpler! However for an
interest project to understand Postgres internals the other approach is
worthwhile.

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Robert Haas
robertmhaas@gmail.com
In reply to: Rohit Goyal (#14)
Re: Postgres code for a query intermediate dataset

On Sun, Sep 14, 2014 at 5:18 AM, Rohit Goyal <rhtgyl.87@gmail.com> wrote:

Thanks for reply. But, I think i confused you. I am talking about access
using indexes. So, I assume that B+ tree store key-value pair where rohit is
the key and all the versions are its value.

Another way to think is I have a secondary index on emp. name and there are
4 rohit exist in DB. So, now B+ tree gives me 4 different tuple pointer for
each Rohit. I want to know the code portion for this where i can see all 4
tuple pointer before each one have I/O access to fetch its tuple.

You may want to look at index_getnext(), index_getnext_tid(), and/or
heap_hot_search_buffer().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers