Prototype: In-place upgrade

Started by Zdenek Kotala · 26 messages
#1 Zdenek Kotala
Zdenek.Kotala@Sun.COM
1 attachment(s)

The attached patch is a prototype of the in-place upgrade that was
presented at PGCon this year.

The main idea is to teach postgres to handle different versions of the
page and tuple structures.

1) page - The patch contains a new page API, and all code accesses pages
through this API. The functions check the page version and return the
correct data to the caller. It is mostly complete now; only the ItemId
flags need finishing.

2) tuple - The HeapTuple structure has been extended with a t_ver
attribute which contains the page layout version, and direct access to
HeapTupleHeader is forbidden. It is now possible only through the
HeapTuple* functions (see htup.c). (HeapTupleHeader access still
remains in several functions like heap_form_tuple.)

This patch version still does not allow reading an old database, but it
shows how it should work. The main disadvantage of this approach is the
performance penalty.
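As a rough illustration of the accessor idea above (struct and field
names here are hypothetical sketches, not the actual patch code), a
tuple that carries its layout version can dispatch every field read to
the matching header layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch only: the real PostgreSQL structures differ. */
typedef uint32_t TransactionId;

typedef struct { TransactionId t_xmax; } HeapTupleHeader_03; /* old layout */
typedef struct { TransactionId t_xmax; } HeapTupleHeader_04; /* new layout */

typedef struct {
    int   t_ver;   /* page layout version the tuple was read from */
    void *t_data;  /* points to a version-specific header */
} HeapTupleSketch;

/* Callers use this instead of dereferencing t_data directly. */
static TransactionId HeapTupleGetXmaxSketch(const HeapTupleSketch *tup)
{
    switch (tup->t_ver)
    {
        case 3: return ((const HeapTupleHeader_03 *) tup->t_data)->t_xmax;
        case 4: return ((const HeapTupleHeader_04 *) tup->t_data)->t_xmax;
        default: assert(!"unknown page layout version"); return 0;
    }
}
```

The per-read version switch is exactly where the performance penalty
comes from.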

Please let me know your opinion of this approach.

Future work:
1) teach WAL replay to process different tuple structure versions
2) convert tuples to the new version when they enter the executor
(ExecStoreTuple)
3) a multi-version MaxItemSize constant

Thanks for your comments, Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

Attachments:

upgrade.patch.gz (application/x-gzip)
#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Zdenek Kotala (#1)
Re: Prototype: In-place upgrade

The patch seems to be missing the new htup.c file.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3 Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Heikki Linnakangas (#2)
Re: Prototype: In-place upgrade

Heikki Linnakangas napsal(a):

The patch seems to be missing the new htup.c file.

Oops, I'm sorry. I'm going to fix it and will send a new version ASAP.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#4 Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Heikki Linnakangas (#2)
1 attachment(s)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas napsal(a):

The patch seems to be missing the new htup.c file.

I'm sorry. I attached a new version which is synchronized with the
current head. I would like to add a few more comments as well.

1) The patch also contains changes which were discussed during the July
commit fest:
- the PageGetTempPage modification suggested by Tom
- another hash.h backward-compatibility cleanup

2) I added a tuplimits.h header file which contains the tuple limits for
the different access methods. It is not finished yet, but the idea is to
keep all limits in one file and easily add limits for different page
layout versions - for example, replacing the static computation with a
dynamic one based on the relation (maxtuplesize could be stored in
pg_class for each relation).

I also need this header because I ran into a cycle in the header
dependencies.

3) I already sent Page API performance results in
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php

I replaced the call sequence PageGetItemId, PageGetItem with the
PageGetIndexTuple and PageGetHeapTuple functions. That is the main
difference in this patch. PageGetHeapTuple fills t_ver in the HeapTuple
to identify the correct tuple header version.

It would be good to mention that the page API (and tuple API)
implementation is only a prototype without any performance optimization.

4) This patch contains several topics for decision. The first is the
general one: whether this approach is acceptable. The second is about
the new page API: whether we replace all page access with the newly
proposed macros/(inline) functions. The third is how to name, and where
to store, the different data structure versions. My idea is to use a
suffix with an underscore and the page layout version, and to keep all
versions in the same header file.

5) I got another idea about the usage of the page API. I call it "3 in
1". Because all page access will go through the new API, it could be
used for WAL logging, and other WAL recording could be reduced.
Replication could also easily be added based on page modifications. It
is just an idea worth thinking about.

6) That is probably all for a Friday evening.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

Attachments:

upgrade_02.patch.gz (application/x-gzip)
#5 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Zdenek Kotala (#4)
Re: Prototype: In-place upgrade v02

Zdenek Kotala wrote:

Heikki Linnakangas napsal(a):

The patch seems to be missing the new htup.c file.

I'm sorry. I attached a new version which is synchronized with the
current head. I would like to add a few more comments as well.

1) The patch also contains changes which were discussed during the July
commit fest:
- the PageGetTempPage modification suggested by Tom
- another hash.h backward-compatibility cleanup

It might be a good idea to split that into a separate patch. The sheer
size of this patch is quite daunting, even though the bulk of it is
straightforward search&replace.

2) I added a tuplimits.h header file which contains the tuple limits for
the different access methods. It is not finished yet, but the idea is to
keep all limits in one file and easily add limits for different page
layout versions - for example, replacing the static computation with a
dynamic one based on the relation (maxtuplesize could be stored in
pg_class for each relation).

I also need this header because I ran into a cycle in the header
dependencies.

3) I already sent Page API performance results in
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php

I replaced the call sequence PageGetItemId, PageGetItem with the
PageGetIndexTuple and PageGetHeapTuple functions. That is the main
difference in this patch. PageGetHeapTuple fills t_ver in the HeapTuple
to identify the correct tuple header version.

It would be good to mention that the page API (and tuple API)
implementation is only a prototype without any performance optimization.

You mentioned a 5% performance degradation in that thread. What test
case was that? What would be a worst-case scenario, and how bad is it?

5% is a pretty hefty price, especially when it's paid by not only
upgraded installations, but also freshly initialized clusters. I think
you'll need to pursue those performance optimizations.

4) This patch contains several topics for decision. The first is the
general one: whether this approach is acceptable.

I don't like the invasiveness of this approach. It's pretty invasive
already, and ISTM you'll need similar switch-case handling of all data
types that have changed the internal representation as well.

We've talked about this before, so you'll remember that the approach I
favor is to convert the page format, a page at a time, when the pages
are read in. I grant you that there are non-trivial issues with that as
well, like if the converted data takes more space and doesn't fit in
the page anymore.

I wonder if we could go with some sort of a hybrid approach? Convert the
whole page when it's read in, but if it doesn't fit, fall back to
tricks like loosening the alignment requirements on platforms that can
handle non-aligned data, or support a special truncated page header,
without pd_tli and pd_prune_xid fields. Just a thought, not sure how
feasible those particular tricks are, but something along those lines..
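The convert-on-read flow can be sketched roughly as follows (the page
representation, sizes, and names are invented for illustration, not
PostgreSQL code): convert an old-version page as it is read in, and
signal failure when the converted contents would no longer fit, so one
of the fallback tricks can be applied:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of page conversion at read-in time. */
#define CURRENT_PAGE_VERSION 4
#define PAGE_SIZE            8192

typedef struct {
    int    version;     /* on-disk page layout version */
    size_t used_bytes;  /* space consumed by headers and tuples */
} PageSketch;

/* Assumption: the new layout costs a few extra bytes per page
 * (e.g. a wider page header). */
static size_t converted_size(const PageSketch *page)
{
    return page->used_bytes + 8;
}

/* Returns true if the page is now in the current format, false if the
 * caller must fall back (truncated header, moving a tuple, etc.). */
static bool convert_page_on_read(PageSketch *page)
{
    if (page->version == CURRENT_PAGE_VERSION)
        return true;                      /* nothing to do */
    if (converted_size(page) > PAGE_SIZE)
        return false;                     /* doesn't fit: fallback path */
    page->used_bytes = converted_size(page);
    page->version = CURRENT_PAGE_VERSION; /* rewrite in the new layout */
    return true;
}
```

The rare pages for which convert_page_on_read() returns false are the
ones the hybrid tricks (or a later VACUUM) would have to handle.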

All in all, though. I find it a bit hard to see the big picture. For
upgrade-in-place, what are all the pieces that we need? To keep this
concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3
-> 8.4? That's fine with me as well, but let's pick one) and forget
about hypothetical changes that might occur in a future version. I can see:
1. Handling page layout changes (pd_prune_xid, pd_flags)
2. Handling tuple header changes (infomask2, HOT bits, combocid)
3. Handling changes in data type representation (packed varlens)
4. Toast chunk size
5. Catalogs

After putting all those together, how large a patch are we talking
about, and what's the performance penalty then? How much of all that
needs to be in core, and how much can live in a pgfoundry project or an
extra binary in src/bin or contrib? I realize that none of us have a
crystal ball, and one has to start somewhere, but I feel uneasy
committing to an approach until we have a full plan.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6 Greg Smith
gsmith@gregsmith.com
In reply to: Heikki Linnakangas (#5)
Re: Prototype: In-place upgrade v02

On Fri, 5 Sep 2008, Heikki Linnakangas wrote:

All in all, though. I find it a bit hard to see the big picture.

I've been working on trying to see that myself lately, have been dumping
links to all the interesting material at
http://wiki.postgresql.org/wiki/In-place_upgrade if there's any of that
you haven't seen before.

To keep this concrete, let's focus on PG 8.2 -> PG 8.3 (or are you
focusing on PG 8.3 -> 8.4? That's fine with me as well, but let's pick
one)

From a complexity perspective, the changes needed to go from 8.2->8.3 seem
much larger than what's needed for 8.3->8.4. There's also a huge PR win
if 8.4 goes out the door saying that in-place upgrades are available from
the previous version starting at the 8.4 release. Given the limited time
left, I would think a focus on nailing the 8.3->8.4 conversion down first
and then slipping in support for earlier revs later would be one way to
get this into more manageable chunks. Obviously if you can fit
infrastructure that makes the 8.2 conversion easier that's worth doing,
but I'd hate to see this get bogged down worrying too much about things
that haven't actually changed since 8.3.

The specific areas I am getting up to speed to help out with here are
catalog updates and working on integration/testing.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#7 Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#5)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas wrote:

4) This patch contains several topics for decision. The first is the
general one: whether this approach is acceptable.

I don't like the invasiveness of this approach. It's pretty invasive
already, and ISTM you'll need similar switch-case handling of all data
types that have changed the internal representation as well.

We've talked about this before, so you'll remember that the approach I
favor is to convert the page format, a page at a time, when the pages
are read in. I grant you that there are non-trivial issues with that as
well, like if the converted data takes more space and doesn't fit in
the page anymore.

I 100% agree with Heikki here; having the conversion spill out into the
main backend is very expensive and adds lots of complexity. The only
argument for Zdenek's conversion approach is that it allows conversion
to happen at a more natural time than when the page is read in, but
frankly I think the conversion needs are going to be pretty limited and
are better done in a localized way at page read-in time.

As far as the page not fitting after conversion, what about some user
command that will convert an entire table to the new format if page
expansion fails.

I wonder if we could go with some sort of a hybrid approach? Convert the
whole page when it's read in, but if it doesn't fit, fall back to
tricks like loosening the alignment requirements on platforms that can
handle non-aligned data, or support a special truncated page header,
without pd_tli and pd_prune_xid fields. Just a thought, not sure how
feasible those particular tricks are, but something along those lines..

All in all, though. I find it a bit hard to see the big picture. For
upgrade-in-place, what are all the pieces that we need? To keep this
concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3
-> 8.4? That's fine with me as well, but let's pick one) and forget
about hypothetical changes that might occur in a future version. I can see:
1. Handling page layout changes (pd_prune_xid, pd_flags)
2. Handling tuple header changes (infomask2, HOT bits, combocid)
3. Handling changes in data type representation (packed varlens)
4. Toast chunk size
5. Catalogs

After putting all those together, how large a patch are we talking
about, and what's the performance penalty then? How much of all that
needs to be in core, and how much can live in a pgfoundry project or an
extra binary in src/bin or contrib? I realize that none of us have a
crystal ball, and one has to start somewhere, but I feel uneasy
committing to an approach until we have a full plan.

Yes, another very good point.

I am ready to focus on these issues for 8.4; all this needs to be
fleshed out, perhaps on a wiki. As a starting point, what would be
really nice is to start a wiki that lists all data format changes for
every major release.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#8 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Bruce Momjian (#7)
Re: Prototype: In-place upgrade v02

Bruce Momjian wrote:

As far as the page not fitting after conversion, what about some user
command that will convert an entire table to the new format if page
expansion fails.

VACUUM?

Having to run a manual command defeats the purpose somewhat, though.
Especially if you have no way of knowing which tables it needs to be
run on.

I am ready to focus on these issues for 8.4; all this needs to be
fleshed out, perhaps on a wiki. As a starting point, what would be
really nice is to start a wiki that lists all data format changes for
every major release.

Have you looked at http://wiki.postgresql.org/wiki/In-place_upgrade
already, that Greg Smith mentioned elsewhere in this thread? That's a
good starting point.

In fact, I don't think there's any low-level data format changes yet
between 8.3 and 8.4, so this would be a comparatively easy release to
implement upgrade-in-place. There's just the catalog changes, but AFAICS
nothing that would require scanning through relations.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#9 Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#8)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas wrote:

Bruce Momjian wrote:

As far as the page not fitting after conversion, what about some user
command that will convert an entire table to the new format if page
expansion fails.

VACUUM?

Having to run a manual command defeats the purpose somewhat, though.
Especially if you have no way of knowing which tables it needs to be
run on.

My assumption is that the page not fitting would be a rare case so
requiring something like vacuum to fix it would be OK.

What I don't want to do is add lots of complexity to the code just to
handle the page expansion case, when such a case is rare and perhaps can
be fixed by a vacuum.

I am ready to focus on these issues for 8.4; all this needs to be
fleshed out, perhaps on a wiki. As a starting point, what would be
really nice is to start a wiki that lists all data format changes for
every major release.

Have you looked at http://wiki.postgresql.org/wiki/In-place_upgrade
already, that Greg Smith mentioned elsewhere in this thread? That's a
good starting point.

Agreed.

In fact, I don't think there's any low-level data format changes yet
between 8.3 and 8.4, so this would be a comparatively easy release to
implement upgrade-in-place. There's just the catalog changes, but AFAICS
nothing that would require scanning through relations.

Yep.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#8)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

In fact, I don't think there's any low-level data format changes yet
between 8.3 and 8.4, so this would be a comparatively easy release to
implement upgrade-in-place. There's just the catalog changes, but AFAICS
nothing that would require scanning through relations.

After a quick scan of the catversion.h changelog (which hopefully covers
any such changes): we changed sequences incompatibly, we changed hash
indexes incompatibly (even without the pending patch that would change
their contents beyond recognition), and Teodor did some stuff to GIN
indexes that might or might not represent an on-disk format change,
you'd have to ask him. We also whacked around the sort order of
bpchar_pattern_ops btree indexes.

I didn't see anything that looked like an immediate change in user table
contents, unless they used the "name" type; but what of relation forks?

regards, tom lane

#11 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#10)
Re: Prototype: In-place upgrade v02

Tom Lane wrote:

I didn't see anything that looked like an immediate change in user table
contents, unless they used the "name" type; but what of relation forks?

Relation forks didn't change anything inside relation files, so no
scanning of relations is required because of that. Neither will the FSM
rewrite. Not sure about DSM yet.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12 Gregory Stark
stark@enterprisedb.com
In reply to: Heikki Linnakangas (#11)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Tom Lane wrote:

I didn't see anything that looked like an immediate change in user table
contents, unless they used the "name" type; but what of relation forks?

Relation forks didn't change anything inside relation files, so no scanning of
relations is required because of that. Neither will the FSM rewrite. Not sure
about DSM yet.

And just to confirm -- they don't change the name of the files the postmaster
expects to find in its data directory, right?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

#13 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Gregory Stark (#12)
Re: Prototype: In-place upgrade v02

Gregory Stark wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Relation forks didn't change anything inside relation files, so no scanning of
relations is required because of that. Neither will the FSM rewrite. Not sure
about DSM yet.

And just to confirm -- they don't change the name of the files the postmaster
expects to find in its data directory, right?

Right. But it wouldn't be a big issue anyway. Renaming would be quick
regardless of the relation sizes. FSM and DSM will introduce new files,
though, that probably need to be created as part of the upgrade, but
again they're not very big.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#14 Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Heikki Linnakangas (#5)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:

Heikki Linnakangas napsal(a):

The patch seems to be missing the new htup.c file.

I'm sorry. I attached a new version which is synchronized with the
current head. I would like to add a few more comments as well.

1) The patch also contains changes which were discussed during the July
commit fest:
- the PageGetTempPage modification suggested by Tom
- another hash.h backward-compatibility cleanup

It might be a good idea to split that into a separate patch. The sheer
size of this patch is quite daunting, even though the bulk of it is
straightforward search&replace.

Yes, I will do it.

2) I added a tuplimits.h header file which contains the tuple limits
for the different access methods. It is not finished yet, but the idea
is to keep all limits in one file and easily add limits for different
page layout versions - for example, replacing the static computation
with a dynamic one based on the relation (maxtuplesize could be stored
in pg_class for each relation).

I also need this header because I ran into a cycle in the header
dependencies.

3) I already sent Page API performance results in
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php

I replaced the call sequence PageGetItemId, PageGetItem with the
PageGetIndexTuple and PageGetHeapTuple functions. That is the main
difference in this patch. PageGetHeapTuple fills t_ver in the HeapTuple
to identify the correct tuple header version.

It would be good to mention that the page API (and tuple API)
implementation is only a prototype without any performance optimization.

You mentioned a 5% performance degradation in that thread. What test
case was that? What would be a worst-case scenario, and how bad is it?

Paul van den Bogaart tested a long-running OLTP workload on it. He used the iGen test.

5% is a pretty hefty price, especially when it's paid by not only
upgraded installations, but also freshly initialized clusters. I think
you'll need to pursue those performance optimizations.

5% is the worst scenario. The current version is not optimized; it is
written for easy debugging and (D)tracing. The page header structures
are very similar, and we can easily remove the switches for most of the
attributes and replace the functions with macros or inline functions.
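A hypothetical sketch of that optimization: where a field sits at the
same offset in every supported header layout, the version switch can be
dropped and the accessor collapses to a trivial inline read (the
layouts and names here are invented, not the patch code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented layouts: the first field is common to both versions. */
typedef struct { uint32_t pd_checksum; uint16_t pd_flags_v3; } PageHeader_03;
typedef struct { uint32_t pd_checksum; uint16_t pd_flags_v4; } PageHeader_04;

/* Verified at compile time: the shared field has one offset in all
 * layouts, so no run-time version check is needed to read it. */
_Static_assert(offsetof(PageHeader_03, pd_checksum) ==
               offsetof(PageHeader_04, pd_checksum),
               "common field must share its offset across layouts");

static inline uint32_t PageGetChecksumSketch(const void *page)
{
    return *(const uint32_t *) page; /* same read for every version */
}
```

Only the fields whose offsets actually diverge between layouts would
keep the switch-based accessor.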

4) This patch contains several topics for decision. The first is the
general one: whether this approach is acceptable.

I don't like the invasiveness of this approach. It's pretty invasive
already, and ISTM you'll need similar switch-case handling of all data
types that have changed the internal representation as well.

I agree in general. But, for example, the new page API is not so
invasive, and in my opinion it should be implemented (with or without
multiversion support), because it cleans up the code. HeapTuple
processing is easy too, but unfortunately it requires a lot of
modifications in many places. I was surprised how many pieces of code
access HeapTupleHeader directly and do not use the HeapTuple data
structure. I think we should reach a conclusion on the recommended
usage of HeapTupleHeader versus HeapTuple. Most of the changes in the
code are like replacing HeapTupleHeaderGetXmax(tuple->t_data) with
HeapTupleGetXmax(tuple), and so on. I think it should be cleaned up
anyway.

You mentioned data types, but that is not a problem. You can easily
extend the data type attributes with version information and call the
correct in/out functions, or use a different Oid for each new data type
version. There are more possible easy solutions for data types, and for
conversion you can use the ALTER TABLE command. The main idea is to
allow data in all formats in a relation. This approach could also be
used for the integer/float datetime problem.

We've talked about this before, so you'll remember that the approach I
favor is to convert the page format, a page at a time, when the pages
are read in. I grant you that there are non-trivial issues with that as
well, like if the converted data takes more space and doesn't fit in
the page anymore.

I like conversion on read too, because it is easy, but there are more
problems.

The page that no longer fits is one of them. Other problems are with
indexes. For example, a hash index stores a bitmap in a page, and this
is not marked anywhere; only the hash AM knows which pages contain this
kind of data. It is probably impossible to convert such a page during a
read. :(

I wonder if we could go with some sort of a hybrid approach? Convert the
whole page when it's read in, but if it doesn't fit, fall back to
tricks like loosening the alignment requirements on platforms that can
handle non-aligned data, or support a special truncated page header,
without pd_tli and pd_prune_xid fields. Just a thought, not sure how
feasible those particular tricks are, but something along those lines..

OK, I have backup idea :-). Stay tuned :-)

All in all, though. I find it a bit hard to see the big picture. For
upgrade-in-place, what are all the pieces that we need? To keep this
concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3
-> 8.4? That's fine with me as well, but let's pick one) and forget
about hypothetical changes that might occur in a future version. I can see:
1. Handling page layout changes (pd_prune_xid, pd_flags)
2. Handling tuple header changes (infomask2, HOT bits, combocid)

2.5 + composite data type

3. Handling changes in data type representation (packed varlens)

3.5 Data types generally (cidr/inet)

4. Toast chunk size

4.5 general MaxTupleSize for each different AM

5. Catalogs

6. AM methods

After putting all those together, how large a patch are we talking
about, and what's the performance penalty then? How much of all that
needs to be in core, and how much can live in a pgfoundry project or an
extra binary in src/bin or contrib? I realize that none of us have a
crystal ball, and one has to start somewhere, but I feel uneasy
committing to an approach until we have a full plan.

Unfortunately, I'm still in the analysis phase. The presented patch is a
prototype of one possible approach. I hit a lot of problems and still
don't have answers to all of them. I'm going to update the wiki page to
share all this information.

At this moment, I think that I can implement offline heap conversion
(8.2->8.4), with all indexes being reindexed. That is what we can have
for 8.4. Online conversion has a lot of problems which we are not able
to answer at this moment.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#15 Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Bruce Momjian (#7)
Re: Prototype: In-place upgrade v02

Bruce Momjian napsal(a):

As far as the page not fitting after conversion, what about some user
command that will convert an entire table to the new format if page
expansion fails.

Keep in mind that there are several kinds of pages. Heap is easy, but
each index AM has its own specifics :(. A better approach is to move
the tuple to a new page and invalidate all of the table's related
indexes; the following reindex automatically converts the whole table.

After putting all those together, how large a patch are we talking
about, and what's the performance penalty then? How much of all that
needs to be in core, and how much can live in a pgfoundry project or an
extra binary in src/bin or contrib? I realize that none of us have a
crystal ball, and one has to start somewhere, but I feel uneasy
committing to an approach until we have a full plan.

Yes, another very good point.

I am ready to focus on these issues for 8.4; all this needs to be
fleshed out, perhaps on a wiki. As a starting point, what would be
really nice is to start a wiki that lists all data format changes for
every major release.

As Greg mentioned in his mail, the wiki page is already there.
Unfortunately, I did not have time to put up-to-date information there.
I'm going to do so soon.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#16 Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Bruce Momjian (#9)
Re: Prototype: In-place upgrade v02

Bruce Momjian napsal(a):

Heikki Linnakangas wrote:

Bruce Momjian wrote:

As far as the page not fitting after conversion, what about some user
command that will convert an entire table to the new format if page
expansion fails.

VACUUM?

Having to run a manual command defeats the purpose somewhat, though.
Especially if you have no way of knowing which tables it needs to be
run on.

My assumption is that the page not fitting would be a rare case so
requiring something like vacuum to fix it would be OK.

It is 1-2% of records per heap. I assume that it is more for B-tree.

What I don't want to do is add lots of complexity to the code just to
handle the page expansion case, when such a case is rare and perhaps can
be fixed by a vacuum.

Unfortunately it is not so rare. Only heap pages on the 32-bit x86
platform (4-byte max alignment) are no problem, but all index pages are
affected.

In fact, I don't think there's any low-level data format changes yet
between 8.3 and 8.4, so this would be a comparatively easy release to
implement upgrade-in-place. There's just the catalog changes, but AFAICS
nothing that would require scanning through relations.

Yep.

I have not tested it just now, but the pg_upgrade.sh script worked fine
in May without any modification for the 8.3->8.4 conversion.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#17 Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#10)
Re: Prototype: In-place upgrade v02

Tom Lane napsal(a):

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

In fact, I don't think there's any low-level data format changes yet
between 8.3 and 8.4, so this would be a comparatively easy release to
implement upgrade-in-place. There's just the catalog changes, but AFAICS
nothing that would require scanning through relations.

After a quick scan of the catversion.h changelog (which hopefully covers
any such changes): we changed sequences incompatibly, we changed hash
indexes incompatibly (even without the pending patch that would change
their contents beyond recognition), and Teodor did some stuff to GIN
indexes that might or might not represent an on-disk format change,
you'd have to ask him. We also whacked around the sort order of
bpchar_pattern_ops btree indexes.

Hmm, it seems that reindexing is the only good answer to all these
changes. Sequences should be converted during the catalog conversion.

Another idea is to create backward-compatible AMs and put them into a
separate library. If these AMs also work with the old page structure,
then there should be no reason for reindexing or index page conversion
after an upgrade.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#18 Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Heikki Linnakangas (#11)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas napsal(a):

Tom Lane wrote:

I didn't see anything that looked like an immediate change in user table
contents, unless they used the "name" type; but what of relation forks?

Relation forks didn't change anything inside relation files, so no
scanning of relations is required because of that. Neither will the FSM
rewrite. Not sure about DSM yet.

Does it mean that if you "inject" an old data file after the catalog
upgrade, the FSM will work without any problem?

Zdenek

PS: I plan to review FSM this week.

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#19 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Zdenek Kotala (#18)
Re: Prototype: In-place upgrade v02

Zdenek Kotala wrote:

Heikki Linnakangas napsal(a):

Relation forks didn't change anything inside relation files, so no
scanning of relations is required because of that. Neither will the
FSM rewrite. Not sure about DSM yet.

Does it mean that if you "inject" an old data file after the catalog
upgrade, the FSM will work without any problem?

Yes. You'll need to construct an FSM, but it doesn't necessarily need
to reflect reality. You could just fill it with zeros, meaning that
there's no free space anywhere, and let the next vacuum fill it with
real information. Or you could read the old pg_fsm.cache file and fill
the new FSM accordingly.

PS: I plan to review FSM this week.

Thanks!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Zdenek Kotala (#17)
Re: Prototype: In-place upgrade v02

Zdenek Kotala wrote:

Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

In fact, I don't think there's any low-level data format changes yet
between 8.3 and 8.4, so this would be a comparatively easy release to
implement upgrade-in-place. There's just the catalog changes, but
AFAICS nothing that would require scanning through relations.

After a quick scan of the catversion.h changelog (which hopefully covers
any such changes): we changed sequences incompatibly, we changed hash
indexes incompatibly (even without the pending patch that would change
their contents beyond recognition), and Teodor did some stuff to GIN
indexes that might or might not represent an on-disk format change,
you'd have to ask him. We also whacked around the sort order of
bpchar_pattern_ops btree indexes.

Hmm, it seems that a reindex is the only good answer for all these changes.

Isn't that exactly what we want to avoid with upgrade-in-place? As long
as the conversion can be done page-at-a-time, without consulting other
pages, we can do it when the page is read in.

I'm not sure what the GIN changes were, but I didn't see any changes to
the page layout at a quick glance.

The bpchar_pattern_ops change you mentioned must be this one:

A not-immediately-obvious incompatibility is that the sort order within
bpchar_pattern_ops indexes changes --- it had been identical to plain
strcmp, but is now trailing-blank-insensitive. This will impact
in-place upgrades, if those ever happen.

The way I read that, bpchar_pattern_ops just became less sensitive. Some
values are now considered equal that weren't before, and thus can now be
stored in any order. That's not an incompatible change, right?

Sequences should be converted during the catalog conversion.

Agreed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zdenek Kotala (#17)
Re: Prototype: In-place upgrade v02

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Another idea is to create backward-compatible AMs and put them into a separate
library. If these AMs also work with the old page structure, then there should be
no reason for reindexing or index page conversion after the upgrade.

I don't think that'd be real workable. It would require duplicating all
the entries for that AM in pg_opfamily, pg_amop, etc. Which we could do
for the built-in entries, I suppose, but what happens to user-defined
operator classes?

At least for the index changes proposed so far for 8.4, it seems to me
that the best solution is to mark affected indexes as not "indisvalid"
and require a post-conversion REINDEX to fix 'em. Obviously a better
solution would be nice later, but we have to avoid putting huge amounts
of work into noncritical problems, else the whole feature is just not
going to get finished.

regards, tom lane

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#20)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

The bpchar_pattern_ops change you mentioned must be this one:

A not-immediately-obvious incompatibility is that the sort order within
bpchar_pattern_ops indexes changes --- it had been identical to plain
strcmp, but is now trailing-blank-insensitive. This will impact
in-place upgrades, if those ever happen.

Yup.

The way I read that, bpchar_pattern_ops just became less sensitive. Some
values are now considered equal that weren't before, and thus can now be
stored in any order. That's not an incompatible change, right?

No, consider 'abc^I' vs 'abc ' (^I denoting a tab character). These are
unequal in either case, but the sort order has flipped.

regards, tom lane

#23Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Heikki Linnakangas (#20)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas wrote:

Zdenek Kotala wrote:

Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

In fact, I don't think there's any low-level data format changes yet
between 8.3 and 8.4, so this would be a comparatively easy release
to implement upgrade-in-place. There's just the catalog changes, but
AFAICS nothing that would require scanning through relations.

After a quick scan of the catversion.h changelog (which hopefully covers
any such changes): we changed sequences incompatibly, we changed hash
indexes incompatibly (even without the pending patch that would change
their contents beyond recognition), and Teodor did some stuff to GIN
indexes that might or might not represent an on-disk format change,
you'd have to ask him. We also whacked around the sort order of
bpchar_pattern_ops btree indexes.

Hmm, it seems that a reindex is the only good answer for all these changes.

Isn't that exactly what we want to avoid with upgrade-in-place? As long
as the conversion can be done page-at-a-time, without consulting other
pages, we can do it when the page is read in.

Yes, but I meant what we can do for 8.4.

Zdenek

#24Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Heikki Linnakangas (#19)
Re: Prototype: In-place upgrade v02

Heikki Linnakangas wrote:

Zdenek Kotala wrote:

Heikki Linnakangas wrote:

Relation forks didn't change anything inside relation files, so no
scanning of relations is required because of that. Neither will the
FSM rewrite. Not sure about DSM yet.

Does that mean that if you "inject" an old data file after the catalog
upgrade, the FSM will work without any problem?

Yes. You'll need to construct an FSM, but it doesn't necessarily need to
reflect reality. You could just fill it with zeros, meaning that
there's no free space anywhere, and let the next vacuum fill it with
real information. Or you could read the old pg_fsm.cache file and fill
the new FSM accordingly.

I think a zeroed FSM is good, because new items should not be added onto old pages.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#25Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#21)
Re: Prototype: In-place upgrade v02

Tom Lane wrote:

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

Another idea is to create backward-compatible AMs and put them into a separate
library. If these AMs also work with the old page structure, then there should be
no reason for reindexing or index page conversion after the upgrade.

I don't think that'd be real workable. It would require duplicating all
the entries for that AM in pg_opfamily, pg_amop, etc. Which we could do
for the built-in entries, I suppose, but what happens to user-defined
operator classes?

When the catalog upgrade is performed directly, user-defined op classes should
stay in the catalog. But the question is what happens to the regproc records, and
whether all functions will be compatible with a new server ... This suggests that we
need a stable API for implementing operators and data types. Any datatype that
uses only this API could then be used on new PostgreSQL versions without
recompilation.

At least for the index changes proposed so far for 8.4, it seems to me
that the best solution is to mark affected indexes as not "indisvalid"
and require a post-conversion REINDEX to fix 'em. Obviously a better
solution would be nice later, but we have to avoid putting huge amounts
of work into noncritical problems, else the whole feature is just not
going to get finished.

Agreed.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#26Bruce Momjian
bruce@momjian.us
In reply to: Zdenek Kotala (#14)
Re: Prototype: In-place upgrade v02

Zdenek Kotala wrote:

You mentioned data types, but that is not a problem. You can easily extend the
data type attributes with version information and call the correct in/out
functions, or use a different Oid for a new data type version. There are more
possible easy solutions for data types, and for conversion you can use the ALTER
TABLE command. The main idea is to keep data in any format in a relation. This
approach should also be used for the integer/float datetime problem.

This kind of code structure scares me: our system could become so
complex that it hinders our ability to continue making improvements.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +