vacuum crash on 6.5.3

Started by Tatsuo Ishii, about 25 years ago, 3 messages
#1 Tatsuo Ishii
t-ishii@sra.co.jp

Although this happens on old 6.5.3, I would like to know if this has
already been fixed...

Here is the scenario:

1) before vacuum, table A has 8850 tuples.

2) vacuum on table A makes postgres crash.

3) it crashes at line 1758:

Assert(num_moved == checked_moved);

I examined the variables using gdb: num_moved == 8849, checked_moved ==
8813, num_tuples == 18.

4) if PostgreSQL is not compiled with assertions, vacuum does not
crash. However, after vacuum, the number of tuples decreases from
8850 to 8814!! (I am not sure which number is correct, though)

I think this is an important problem since data loss might
happen. Any ideas?
--
Tatsuo Ishii

#2 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tatsuo Ishii (#1)
2 attachment(s)
Re: vacuum crash on 6.5.3

It turns out that this was caused by a bug in vacuum. Thanks to Hiroshi,
who identified the problem. I have checked other versions of
PostgreSQL and found that we have had the bug at least since
6.3.2, and that it has been fixed in 7.0. Included are a patch for 6.5.3 and
a test script to reproduce the bug. Both of them were made by Hiroshi.
--
Tatsuo Ishii

Attachments:

vacuum_test.sql (text/plain; charset=us-ascii)
movein.diff (text/plain; charset=us-ascii)
*** commands/vacuum.c.orig	Tue Dec 26 23:24:01 2000
--- commands/vacuum.c	Wed Dec 27 00:36:46 2000
***************
*** 1025,1030 ****
--- 1025,1031 ----
  			   *idcur;
  	int			last_fraged_block,
  				last_vacuum_block,
+ 				last_moved_in_block,
  				i = 0;
  	Size		tuple_len;
  	int			num_moved,
***************
*** 1060,1065 ****
--- 1061,1067 ----
  	vacuumed_pages = vacuum_pages->vpl_num_pages - vacuum_pages->vpl_empty_end_pages;
  	last_vacuum_page = vacuum_pages->vpl_pagedesc[vacuumed_pages - 1];
  	last_vacuum_block = last_vacuum_page->vpd_blkno;
+ 	last_moved_in_block = 0;
  	Assert(last_vacuum_block >= last_fraged_block);
  	cur_buffer = InvalidBuffer;
  	num_moved = 0;
***************
*** 1073,1078 ****
--- 1075,1083 ----
  		/* if it's reapped page and it was used by me - quit */
  		if (blkno == last_fraged_block && last_fraged_page->vpd_offsets_used > 0)
  			break;
+ 		/* can't shrink any more if this block has MOVED_IN tuples - quit */
+ 		if (blkno == last_moved_in_block)
+ 			break;
  
  		buf = ReadBuffer(onerel, blkno);
  		page = BufferGetPage(buf);
***************
*** 1447,1452 ****
--- 1452,1459 ----
  					pfree(newtup.t_data);
  					newtup.t_data = (HeapTupleHeader) PageGetItem(ToPage, newitemid);
  					ItemPointerSet(&(newtup.t_self), vtmove[ti].vpd->vpd_blkno, newoff);
+ 					if (vtmove[ti].vpd->vpd_blkno > last_moved_in_block)
+ 						last_moved_in_block = vtmove[ti].vpd->vpd_blkno;
  
  					/*
  					 * Set t_ctid pointing to itself for last tuple in
***************
*** 1579,1584 ****
--- 1586,1593 ----
  			newtup.t_data = (HeapTupleHeader) PageGetItem(ToPage, newitemid);
  			ItemPointerSet(&(newtup.t_data->t_ctid), cur_page->vpd_blkno, newoff);
  			newtup.t_self = newtup.t_data->t_ctid;
+ 			if (cur_page->vpd_blkno > last_moved_in_block)
+ 				last_moved_in_block = cur_page->vpd_blkno;
  
  			/*
  			 * Mark old tuple as moved_off by vacuum and store vacuum XID
#3 Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Tatsuo Ishii (#2)
RE: vacuum crash on 6.5.3

Just a supplement.
Essentially this isn't a crash bug.
It is a disastrous bug that silently causes data loss.
(This is known as the 'HEAP_MOVED_IN was not expected' bug,
but the result could be more serious than I had recognized.)

Please apply the patch if you still have pre-7.0 PostgreSQL databases and
you don't love data loss.

Regards.
Hiroshi Inoue
