large object does not exist after pg_migrator

Started by Jamie Foxalmost 17 years ago17 messageshackersgeneral
Jump to latest
#1Jamie Fox
jfox@directcommerce.com
hackersgeneral

Hi -
After what seemed to be a normal successful pg_migrator migration from 8.3.7
to 8.4.0, in either link or copy mode, vacuumlo fails on both our production
and qa databases:

Jul 1 11:17:03 db2 postgres[9321]: [14-1] LOG: duration: 175.563 ms
statement: DELETE FROM vacuum_l WHERE lo IN (SELECT "xml_data" FROM
"public"."xml_user")
Jul 1 11:17:03 db2 postgres[9321]: [15-1] ERROR: large object 17919608
does not exist
Jul 1 11:17:03 db2 postgres[9321]: [16-1] ERROR: current transaction is
aborted, commands ignored until end of transaction block

I migrated our qa database using pg_dump/pg_restore and vacuumlo has no
problem with it. When I try querying the two databases for large objects
manually I see the same error in the one that was migrated with pg_migrator:

select loread(lo_open(xml_data,262144),1073741819) from xml_user where id =
'10837246';
ERROR: large object 24696063 does not exist
SQL state: 42704

I can also see that the pg_largeobject table is different, in the pg_restore
version the Rows (estimated) is 316286 and Rows (counted) is the same, in
the pg_migrator version the Rows (counted) is only 180507.

Any advice on what I might look for to try and track down this problem?
pg_restore on our production database takes too long so it would be really
nice to use pg_migrator instead.

Thanks,

Jamie

#2Jamie Fox
jfox@directcommerce.com
In reply to: Jamie Fox (#1)
hackersgeneral
Re: large object does not exist after pg_migrator

Hi -
This is probably more helpful - the pg_largeobject table only changed after
vacuumlo, not before. When comparing pre- and post- pg_migrator databases
(no vacuum or vacuumlo):

select * from pg_largeobject where loid = '24696063';

in the pre- there are three rows, having pageno 0 through 3, in the post-
database there are no results.

Thanks for any advice,

Jamie

On Mon, Jul 13, 2009 at 2:13 PM, Jamie Fox <jfox@directcommerce.com> wrote:

Show quoted text

Hi -
After what seemed to be a normal successful pg_migrator migration from
8.3.7 to 8.4.0, in either link or copy mode, vacuumlo fails on both our
production and qa databases:

Jul 1 11:17:03 db2 postgres[9321]: [14-1] LOG: duration: 175.563 ms
statement: DELETE FROM vacuum_l WHERE lo IN (SELECT "xml_data" FROM
"public"."xml_user")
Jul 1 11:17:03 db2 postgres[9321]: [15-1] ERROR: large object 17919608
does not exist
Jul 1 11:17:03 db2 postgres[9321]: [16-1] ERROR: current transaction is
aborted, commands ignored until end of transaction block

I migrated our qa database using pg_dump/pg_restore and vacuumlo has no
problem with it. When I try querying the two databases for large objects
manually I see the same error in the one that was migrated with pg_migrator:

select loread(lo_open(xml_data,262144),1073741819) from xml_user where id =
'10837246';
ERROR: large object 24696063 does not exist
SQL state: 42704

I can also see that the pg_largeobject table is different, in the
pg_restore version the Rows (estimated) is 316286 and Rows (counted) is the
same, in the pg_migrator version the Rows (counted) is only 180507.

Any advice on what I might look for to try and track down this problem?
pg_restore on our production database takes too long so it would be really
nice to use pg_migrator instead.

Thanks,

Jamie

#3Bruce Momjian
bruce@momjian.us
In reply to: Jamie Fox (#1)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Jamie Fox wrote:

Hi -
After what seemed to be a normal successful pg_migrator migration from 8.3.7
to 8.4.0, in either link or copy mode, vacuumlo fails on both our production
and qa databases:

Jul 1 11:17:03 db2 postgres[9321]: [14-1] LOG: duration: 175.563 ms
statement: DELETE FROM vacuum_l WHERE lo IN (SELECT "xml_data" FROM
"public"."xml_user")
Jul 1 11:17:03 db2 postgres[9321]: [15-1] ERROR: large object 17919608
does not exist
Jul 1 11:17:03 db2 postgres[9321]: [16-1] ERROR: current transaction is
aborted, commands ignored until end of transaction block

I migrated our qa database using pg_dump/pg_restore and vacuumlo has no
problem with it. When I try querying the two databases for large objects
manually I see the same error in the one that was migrated with pg_migrator:

select loread(lo_open(xml_data,262144),1073741819) from xml_user where id =
'10837246';
ERROR: large object 24696063 does not exist
SQL state: 42704

I can also see that the pg_largeobject table is different, in the pg_restore
version the Rows (estimated) is 316286 and Rows (counted) is the same, in
the pg_migrator version the Rows (counted) is only 180507.

Any advice on what I might look for to try and track down this problem?
pg_restore on our production database takes too long so it would be really
nice to use pg_migrator instead.

[ Email moved to hackers list.]

Wow, I didn't test large objects specifically, and I am confused why
there would be a count discrepancy. I will need to do some research
unless someone else can guess about the cause.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#4Bruce Momjian
bruce@momjian.us
In reply to: Jamie Fox (#2)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Forwarded to hackers.

---------------------------------------------------------------------------

Jamie Fox wrote:

Hi -
This is probably more helpful - the pg_largeobject table only changed after
vacuumlo, not before. When comparing pre- and post- pg_migrator databases
(no vacuum or vacuumlo):

select * from pg_largeobject where loid = '24696063';

in the pre- there are three rows, having pageno 0 through 3, in the post-
database there are no results.

Thanks for any advice,

Jamie

On Mon, Jul 13, 2009 at 2:13 PM, Jamie Fox <jfox@directcommerce.com> wrote:

Hi -
After what seemed to be a normal successful pg_migrator migration from
8.3.7 to 8.4.0, in either link or copy mode, vacuumlo fails on both our
production and qa databases:

Jul 1 11:17:03 db2 postgres[9321]: [14-1] LOG: duration: 175.563 ms
statement: DELETE FROM vacuum_l WHERE lo IN (SELECT "xml_data" FROM
"public"."xml_user")
Jul 1 11:17:03 db2 postgres[9321]: [15-1] ERROR: large object 17919608
does not exist
Jul 1 11:17:03 db2 postgres[9321]: [16-1] ERROR: current transaction is
aborted, commands ignored until end of transaction block

I migrated our qa database using pg_dump/pg_restore and vacuumlo has no
problem with it. When I try querying the two databases for large objects
manually I see the same error in the one that was migrated with pg_migrator:

select loread(lo_open(xml_data,262144),1073741819) from xml_user where id =
'10837246';
ERROR: large object 24696063 does not exist
SQL state: 42704

I can also see that the pg_largeobject table is different, in the
pg_restore version the Rows (estimated) is 316286 and Rows (counted) is the
same, in the pg_migrator version the Rows (counted) is only 180507.

Any advice on what I might look for to try and track down this problem?
pg_restore on our production database takes too long so it would be really
nice to use pg_migrator instead.

Thanks,

Jamie

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#3)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Bruce Momjian wrote:

Jamie Fox wrote:

I can also see that the pg_largeobject table is different, in the pg_restore
version the Rows (estimated) is 316286 and Rows (counted) is the same, in
the pg_migrator version the Rows (counted) is only 180507.

Wow, I didn't test large objects specifically, and I am confused why
there would be a count discrepancy. I will need to do some research
unless someone else can guess about the cause.

Maybe pg_largeobject is not getting frozen?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#6Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#5)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Alvaro Herrera wrote:

Bruce Momjian wrote:

Jamie Fox wrote:

I can also see that the pg_largeobject table is different, in the pg_restore
version the Rows (estimated) is 316286 and Rows (counted) is the same, in
the pg_migrator version the Rows (counted) is only 180507.

Wow, I didn't test large objects specifically, and I am confused why
there would be a count discrepancy. I will need to do some research
unless someone else can guess about the cause.

Maybe pg_largeobject is not getting frozen?

That would explain the change in count, but I thought we froze
_everything_, and had to.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#6)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Bruce Momjian wrote:

Alvaro Herrera wrote:

Bruce Momjian wrote:

Jamie Fox wrote:

I can also see that the pg_largeobject table is different, in the pg_restore
version the Rows (estimated) is 316286 and Rows (counted) is the same, in
the pg_migrator version the Rows (counted) is only 180507.

Wow, I didn't test large objects specifically, and I am confused why
there would be a count discrepancy. I will need to do some research
unless someone else can guess about the cause.

Maybe pg_largeobject is not getting frozen?

That would explain the change in count, but I thought we froze
_everything_, and had to.

After a quick chat with Bruce it was determined that we don't freeze
anything (it would be horrid for downtime if we did so in pg_migrator;
and it would be useless if ran anywhere else). What we do is migrate
pg_clog from the old cluster to the new. So never mind that hypothesis.

Bruce noticed that the pg_dump/pg_migrator combo is failing to restore
pg_largeobject's relfrozenxid. We're not sure how this is causing the
errors Jamie is seeing, because what I think should happen is that scans
of the table should fail with failures to open pg_clog files
such-and-such, but not missing tuples ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#8Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#7)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Alvaro Herrera wrote:

Bruce Momjian wrote:

Alvaro Herrera wrote:

Bruce Momjian wrote:

Jamie Fox wrote:

I can also see that the pg_largeobject table is different, in the pg_restore
version the Rows (estimated) is 316286 and Rows (counted) is the same, in
the pg_migrator version the Rows (counted) is only 180507.

Wow, I didn't test large objects specifically, and I am confused why
there would be a count discrepancy. I will need to do some research
unless someone else can guess about the cause.

Maybe pg_largeobject is not getting frozen?

That would explain the change in count, but I thought we froze
_everything_, and had to.

After a quick chat with Bruce it was determined that we don't freeze
anything (it would be horrid for downtime if we did so in pg_migrator;
and it would be useless if ran anywhere else). What we do is migrate
pg_clog from the old cluster to the new. So never mind that hypothesis.

FYI, we do freeze the new cluster that has only schema definitions, no
data.

Bruce noticed that the pg_dump/pg_migrator combo is failing to restore
pg_largeobject's relfrozenxid. We're not sure how this is causing the
errors Jamie is seeing, because what I think should happen is that scans
of the table should fail with failures to open pg_clog files
such-and-such, but not missing tuples ...

Yea, I can fix that in PG 8.4.1, but that doesn't seem like the cause of
the missing rows. Alvaro and I are still investigating.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#9Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#7)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Alvaro Herrera wrote:

Bruce Momjian wrote:

Alvaro Herrera wrote:

Bruce Momjian wrote:

Jamie Fox wrote:

I can also see that the pg_largeobject table is different, in the pg_restore
version the Rows (estimated) is 316286 and Rows (counted) is the same, in
the pg_migrator version the Rows (counted) is only 180507.

Wow, I didn't test large objects specifically, and I am confused why
there would be a count discrepancy. I will need to do some research
unless someone else can guess about the cause.

Maybe pg_largeobject is not getting frozen?

That would explain the change in count, but I thought we froze
_everything_, and had to.

After a quick chat with Bruce it was determined that we don't freeze
anything (it would be horrid for downtime if we did so in pg_migrator;
and it would be useless if ran anywhere else). What we do is migrate
pg_clog from the old cluster to the new. So never mind that hypothesis.

Bruce noticed that the pg_dump/pg_migrator combo is failing to restore
pg_largeobject's relfrozenxid. We're not sure how this is causing the
errors Jamie is seeing, because what I think should happen is that scans
of the table should fail with failures to open pg_clog files
such-and-such, but not missing tuples ...

Jamie, is it possible for you to apply the attached patch to the 8.4
server, install the new pg_dump, and run the test again to see if
pg_largeobject is fixed? This patch properly sets the relfrozenxid in
the system tables for each database.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/pgpatches/largeobjecttext/x-diffDownload+44-0
#10Jamie Fox
jfox@directcommerce.com
In reply to: Bruce Momjian (#9)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

On Mon, Jul 13, 2009 at 8:03 PM, Bruce Momjian <bruce@momjian.us> wrote:

Alvaro Herrera wrote:

Bruce Momjian wrote:

Alvaro Herrera wrote:

Bruce Momjian wrote:

Jamie Fox wrote:

I can also see that the pg_largeobject table is different, in the

pg_restore

version the Rows (estimated) is 316286 and Rows (counted) is the

same, in

the pg_migrator version the Rows (counted) is only 180507.

Wow, I didn't test large objects specifically, and I am confused

why

there would be a count discrepancy. I will need to do some research
unless someone else can guess about the cause.

Maybe pg_largeobject is not getting frozen?

That would explain the change in count, but I thought we froze
_everything_, and had to.

After a quick chat with Bruce it was determined that we don't freeze
anything (it would be horrid for downtime if we did so in pg_migrator;
and it would be useless if ran anywhere else). What we do is migrate
pg_clog from the old cluster to the new. So never mind that hypothesis.

Bruce noticed that the pg_dump/pg_migrator combo is failing to restore
pg_largeobject's relfrozenxid. We're not sure how this is causing the
errors Jamie is seeing, because what I think should happen is that scans
of the table should fail with failures to open pg_clog files
such-and-such, but not missing tuples ...

Jamie, is it possible for you to apply the attached patch to the 8.4
server, install the new pg_dump, and run the test again to see if
pg_largeobject is fixed? This patch properly sets the relfrozenxid in
the system tables for each database.

Sorry for the confusion, an addendum meant to be helpful fell out of this
thread during the move from -general. I will try this patch now, but to be
clear - the number of rows in pg_largeobject changed after I ran vacuumlo
(that eventually failed).

Here's what I have found that got broken during pg_migrate: In two side by
side databases (an 8.3.7 copy and 8.4.0 migrated with pg_migrator) the
pg_largeobject table has the same number of rows. However, in the 8.4
database any select for an loid in pg_largeobject returns zero rows. If I
select all loids to a file, and compare to select all loids from 8.3.7
they're the same. When I select != an loid it seems to exclude the one and
return the rest, but all other comparisons <, > or = return zero rows. Or
I'm completely batty. Dereferencing via lo_open of blob_data (an oid) in
other tables fails in the 8.4 database with 'large object xxxxid does not
exist'.

Thanks again,

Jamie

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jamie Fox (#10)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Jamie Fox wrote:

Here's what I have found that got broken during pg_migrate: In two side by
side databases (an 8.3.7 copy and 8.4.0 migrated with pg_migrator) the
pg_largeobject table has the same number of rows. However, in the 8.4
database any select for an loid in pg_largeobject returns zero rows. If I
select all loids to a file, and compare to select all loids from 8.3.7
they're the same. When I select != an loid it seems to exclude the one and
return the rest, but all other comparisons <, > or = return zero rows. Or
I'm completely batty. Dereferencing via lo_open of blob_data (an oid) in
other tables fails in the 8.4 database with 'large object xxxxid does not
exist'.

Oh, so maybe it's pg_largeobject's index that's borked ... Did you try
reindexing it?

How are we transferring pg_largeobject, and are we transferring its
index too?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#12Jamie Fox
jfox@directcommerce.com
In reply to: Alvaro Herrera (#11)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Here's what I have found that got broken during pg_migrate: In two side

by

side databases (an 8.3.7 copy and 8.4.0 migrated with pg_migrator) the
pg_largeobject table has the same number of rows. However, in the 8.4
database any select for an loid in pg_largeobject returns zero rows. If

I

select all loids to a file, and compare to select all loids from 8.3.7
they're the same. When I select != an loid it seems to exclude the one

and

return the rest, but all other comparisons <, > or = return zero rows.

Or

I'm completely batty. Dereferencing via lo_open of blob_data (an oid) in
other tables fails in the 8.4 database with 'large object xxxxid does not
exist'.

Oh, so maybe it's pg_largeobject's index that's borked ... Did you try
reindexing it?

How are we transferring pg_largeobject, and are we transferring its
index too?

Hi -
REINDEX INDEX pg_largeobject_loid_pn_index;

This seems to have fixed the problem, lo_open of lob data is working again -
now to see how vacuumlo likes it.

Thanks,

Jamie

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jamie Fox (#12)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Jamie Fox wrote:

Hi -
REINDEX INDEX pg_largeobject_loid_pn_index;

This seems to have fixed the problem, lo_open of lob data is working again -
now to see how vacuumlo likes it.

So did it work?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#14Jamie Fox
jfox@directcommerce.com
In reply to: Alvaro Herrera (#13)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Worked great, vacuumlo finished, a vacuum -full finished amazingly quickly,
very exciting. We're pointing qa apps at it now for testing.
For some reason though, that index has to be rebuilt after running
pg_migrator.

I'll be testing on our 100GB+ prod copy shortly and will let you know if you
want.

Thanks,

Jamie

On Wed, Jul 15, 2009 at 9:28 AM, Alvaro Herrera
<alvherre@commandprompt.com>wrote:

Show quoted text

Jamie Fox wrote:

Hi -
REINDEX INDEX pg_largeobject_loid_pn_index;

This seems to have fixed the problem, lo_open of lob data is working

again -

now to see how vacuumlo likes it.

So did it work?

--
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#15Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#13)
hackersgeneral
pg_migrator 8.4.1 alpha 1 released with bug mention

To more clearly identify that pg_migrator now has known bugs, I have
released pg_migrator 8.4.1 alpha1, and mentioned in the README that
there are known bugs related to migrating sequences and large objects.
I have removed the 8.4 source file from pgfoundry.

---------------------------------------------------------------------------

Alvaro Herrera wrote:

Jamie Fox wrote:

Hi -
REINDEX INDEX pg_largeobject_loid_pn_index;

This seems to have fixed the problem, lo_open of lob data is working again -
now to see how vacuumlo likes it.

So did it work?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#9)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Bruce Momjian wrote:

After a quick chat with Bruce it was determined that we don't freeze
anything (it would be horrid for downtime if we did so in pg_migrator;
and it would be useless if ran anywhere else). What we do is migrate
pg_clog from the old cluster to the new. So never mind that hypothesis.

Bruce noticed that the pg_dump/pg_migrator combo is failing to restore
pg_largeobject's relfrozenxid. We're not sure how this is causing the
errors Jamie is seeing, because what I think should happen is that scans
of the table should fail with failures to open pg_clog files
such-and-such, but not missing tuples ...

Jamie, is it possible for you to apply the attached patch to the 8.4
server, install the new pg_dump, and run the test again to see if
pg_largeobject is fixed? This patch properly sets the relfrozenxid in
the system tables for each database.

I have applied the attached patch to have pg_dump restore
pg_largeobject.relfrozenxid in binary upgrade mode; backpatched to
8.4.X.

This doesn't fix the reported problem, but it is still a bug.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/difftext/x-diffDownload+47-0
#17Bruce Momjian
bruce@momjian.us
In reply to: Jamie Fox (#12)
hackersgeneral
Re: [GENERAL] large object does not exist after pg_migrator

Jamie Fox wrote:

Here's what I have found that got broken during pg_migrate: In two side

by

side databases (an 8.3.7 copy and 8.4.0 migrated with pg_migrator) the
pg_largeobject table has the same number of rows. However, in the 8.4
database any select for an loid in pg_largeobject returns zero rows. If

I

select all loids to a file, and compare to select all loids from 8.3.7
they're the same. When I select != an loid it seems to exclude the one

and

return the rest, but all other comparisons <, > or = return zero rows.

Or

I'm completely batty. Dereferencing via lo_open of blob_data (an oid) in
other tables fails in the 8.4 database with 'large object xxxxid does not
exist'.

Oh, so maybe it's pg_largeobject's index that's borked ... Did you try
reindexing it?

How are we transferring pg_largeobject, and are we transferring its
index too?

Hi -
REINDEX INDEX pg_largeobject_loid_pn_index;

This seems to have fixed the problem, lo_open of lob data is working again -
now to see how vacuumlo likes it.

I have applied the attached patch to pg_migrator to properly migrate the
pg_largeobject index. I have added large object comment migration as a
TODO item.

This eliminates the last known bug in pg_migrator.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/difftext/x-diffDownload+45-35