vacuum crash on 6.5.3
Although this happens on the old 6.5.3, I would like to know whether it has
already been fixed...
Here is the scenario:
1) before vacuum, table A has 8850 tuples.
2) vacuum on table A makes postgres crash.
3) it crashes at line 1758:
Assert(num_moved == checked_moved);
I examined the variables using gdb: num_moved == 8849, checked_moved ==
8813, num_tuples == 18.
4) if PostgreSQL is not compiled with assertions enabled, vacuum does not
crash. However, after vacuum, the number of tuples decreases from
8850 to 8814!! (I am not sure which number is correct, though)
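A quick arithmetic check on the numbers above, as a minimal sketch: the gap between num_moved and checked_moved is the same as the number of tuples that go missing in the non-assert build. That suggests, but does not prove, that the missing tuples are the ones that failed the check. The function names below are made up for illustration.

```c
#include <assert.h>

/* Values taken from the gdb session and the tuple counts reported above. */
enum {
    NUM_MOVED = 8849,
    CHECKED_MOVED = 8813,
    TUPLES_BEFORE = 8850,
    TUPLES_AFTER = 8814
};

/* Tuples counted as moved but not accounted for by the later check. */
int unchecked_moves(void)
{
    return NUM_MOVED - CHECKED_MOVED;
}

/* Tuples missing from the table after vacuum in a non-assert build. */
int missing_tuples(void)
{
    return TUPLES_BEFORE - TUPLES_AFTER;
}
```

Both functions return 36, so the two discrepancies match exactly.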
I think this is an important problem since data loss might
occur. Any ideas?
--
Tatsuo Ishii
It turns out that this was caused by a bug in vacuum. Thanks to Hiroshi,
who identified the problem. I have checked other versions of
PostgreSQL and found that we have had the bug at least since
6.3.2, and that it has been fixed in 7.0. Included are patches for 6.5.3 and
a test script to reproduce the bug. Both of them were made by Hiroshi.
--
Tatsuo Ishii
Attachments:
movein.diff (text/plain; charset=us-ascii)
*** commands/vacuum.c.orig Tue Dec 26 23:24:01 2000
--- commands/vacuum.c Wed Dec 27 00:36:46 2000
***************
*** 1025,1030 ****
--- 1025,1031 ----
*idcur;
int last_fraged_block,
last_vacuum_block,
+ last_moved_in_block,
i = 0;
Size tuple_len;
int num_moved,
***************
*** 1060,1065 ****
--- 1061,1067 ----
vacuumed_pages = vacuum_pages->vpl_num_pages - vacuum_pages->vpl_empty_end_pages;
last_vacuum_page = vacuum_pages->vpl_pagedesc[vacuumed_pages - 1];
last_vacuum_block = last_vacuum_page->vpd_blkno;
+ last_moved_in_block = 0;
Assert(last_vacuum_block >= last_fraged_block);
cur_buffer = InvalidBuffer;
num_moved = 0;
***************
*** 1073,1078 ****
--- 1075,1083 ----
/* if it's reapped page and it was used by me - quit */
if (blkno == last_fraged_block && last_fraged_page->vpd_offsets_used > 0)
break;
+ /* can't shrink any further if this block has MOVED_IN tuples - quit */
+ if (blkno == last_moved_in_block)
+ break;
buf = ReadBuffer(onerel, blkno);
page = BufferGetPage(buf);
***************
*** 1447,1452 ****
--- 1452,1459 ----
pfree(newtup.t_data);
newtup.t_data = (HeapTupleHeader) PageGetItem(ToPage, newitemid);
ItemPointerSet(&(newtup.t_self), vtmove[ti].vpd->vpd_blkno, newoff);
+ if (vtmove[ti].vpd->vpd_blkno > last_moved_in_block)
+ last_moved_in_block = vtmove[ti].vpd->vpd_blkno;
/*
* Set t_ctid pointing to itself for last tuple in
***************
*** 1579,1584 ****
--- 1586,1593 ----
newtup.t_data = (HeapTupleHeader) PageGetItem(ToPage, newitemid);
ItemPointerSet(&(newtup.t_data->t_ctid), cur_page->vpd_blkno, newoff);
newtup.t_self = newtup.t_data->t_ctid;
+ if (cur_page->vpd_blkno > last_moved_in_block)
+ last_moved_in_block = cur_page->vpd_blkno;
/*
* Mark old tuple as moved_off by vacuum and store vacuum XID
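A minimal sketch of what the patch above appears to do, written as a toy model and not as the real vacuum.c code: remember the highest block number that has received moved-in tuples, and stop the backward scan once it reaches that block, presumably so that a block holding freshly moved-in tuples is never itself treated as a source to be emptied. The names note_moved_in, scan_stop_block and dest_for_block are hypothetical; only last_moved_in_block and its initial value of 0 come from the patch.

```c
#include <assert.h>

/* Toy model only: dest_for_block[b] is the block that tuples from block b
 * are moved into, or -1 if nothing is moved out of block b. */

/* Remember the highest block number that has received moved-in tuples. */
void note_moved_in(int *last_moved_in_block, int dest_blkno)
{
    if (dest_blkno > *last_moved_in_block)
        *last_moved_in_block = dest_blkno;
}

/* Walk blocks from the end of the relation toward the start and return
 * the block at which the scan stops (-1 only if nblocks <= 0). */
int scan_stop_block(int nblocks, const int *dest_for_block)
{
    int last_moved_in_block = 0;    /* same initial value as in the patch */
    int blkno;

    for (blkno = nblocks - 1; blkno >= 0; blkno--)
    {
        /* this block already holds moved-in tuples: stop here */
        if (blkno == last_moved_in_block)
            break;
        if (dest_for_block[blkno] >= 0)
            note_moved_in(&last_moved_in_block, dest_for_block[blkno]);
    }
    return blkno;
}
```

For example, with six blocks where block 5 moves its tuples into block 1 and block 4 moves its tuples into block 3, the scan stops at block 3 instead of continuing down and disturbing the tuples that were just moved there.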
Just a supplement.
Essentially this isn't a crash bug.
It has been a disastrous bug that silently causes data loss.
(This is known as the 'HEAP_MOVED_IN was not expected' bug,
but the result could be more serious than I had recognized.)
Please apply the patch if you still have pre-7.0 PostgreSQL databases and
you don't love data loss.
Regards.
Hiroshi Inoue