unlink large objects

Started by Philip Crotwell · almost 25 years ago · 7 messages · pgsql-hackers
#1 Philip Crotwell
crotwell@seis.sc.edu

Hi

I am having trouble with a 7.1rc4 database filling up my disks. What I do
is put a large number of "small" large objects of seismic data into the
database in one process and use another process to unlink them after they
reach a certain age to form a buffer. The unlink seems to be working, and
some disk space is reclaimed, but the size of the database continues to
grow until the disk fills and the postgres backend dies. I have tried
vacuuming, but that doesn't help.

I poked around in the database directory and found a file named 16948 that
is 960Mb or almost all of the space on my partition. If the unlinks were
completely cleaning up, then my 8 days data buffer should be about 150Mb.
Is there a way to tell what this file is? I guess it is all the large
objects dumped in together??? Does anyone know why my unlinks wouldn't be
completely freeing the disk space?

lgelg pg> ls -l 16948
-rw------- 1 postgres postgres 959438848 Jun 8 14:31 16948
lgelg pg> pwd
/home/postgres/data/base/18721
lgelg pg>

I have put some more info below, if it helps. But basically I think that
the messages are all related to the disk filling, and don't explain why it
filled.

thanks,
Philip

Here is a snippet of the Java code for my unlink, and I am using autocommit
off:
lobj = ((org.postgresql.Connection)conn).getLargeObjectAPI();
...snip...

logger.debug("before large object delete");
// loop over all large objects, deleting them
it = oid.iterator();
while (it.hasNext()) {
    Integer nextId = (Integer) it.next();

    logger.debug("Deleting large object " + nextId);
    // delete large object data
    lobj.delete(nextId.intValue());
}
it = null;

// commit changes
logger.debug("Commiting...");
jdbcDataChunk.commit();
conn.commit();
logger.info("Commiting done.");

Here is the Java exception I get:
An I/O error has occured while flushing the output - Exception:
java.io.IOException: Broken pipe
Stack Trace:

java.io.IOException: Broken pipe
        at java.net.SocketOutputStream.socketWrite(Native Method)
        at java.net.SocketOutputStream.write(SocketOutputStream.java, Compiled Code)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java, Compiled Code)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java, Compiled Code)
        at org.postgresql.PG_Stream.flush(PG_Stream.java, Compiled Code)
        at org.postgresql.Connection.ExecSQL(Connection.java, Compiled Code)
        at org.postgresql.jdbc2.Statement.execute(Statement.java, Compiled Code)
        at org.postgresql.jdbc2.Statement.executeQuery(Statement.java, Compiled Code)
        at org.postgresql.jdbc2.PreparedStatement.executeQuery(PreparedStatement.java, Compiled Code)
        at edu.sc.seis.anhinga.database.JDBCChannelId.getDBId(JDBCChannelId.java, Compiled Code)
        at edu.sc.seis.anhinga.database.JDBCDataChunk.put(JDBCDataChunk.java, Compiled Code)
        at edu.sc.seis.anhinga.symres.Par4ToDB.run(Par4ToDB.java, Compiled Code)
End of Stack Trace

Here are the messages in the serverlog:
DEBUG: MoveOfflineLogs: remove 00000000000000D7
DEBUG: MoveOfflineLogs: remove 00000000000000D8
DEBUG: MoveOfflineLogs: remove 00000000000000D9
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
ERROR: Write to hashjoin temp file failed
DEBUG: MoveOfflineLogs: remove 00000000000000DA
FATAL 2: ZeroFill(/home/postgres/data/pg_xlog/xlogtemp.19371) failed: No such file or directory
ERROR: Write to hashjoin temp file failed
Server process (pid 19371) exited with status 512 at Thu Jun 7 03:32:52 2001
Terminating any active server processes...
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
Server processes were terminated at Thu Jun 7 03:32:53 2001
Reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2001-06-07 03:32:47 UTC
DEBUG: CheckPoint record at (0, 3686817652)
DEBUG: Redo record at (0, 3686817652); Undo record at (0, 0); Shutdown FALSE
DEBUG: NextTransactionId: 9905192; NextOid: 846112
DEBUG: database system was not properly shut down; automatic recovery in progress...
DEBUG: ReadRecord: record with zero len at (0, 3686817716)
DEBUG: redo is not required
DEBUG: database system is in production state

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Crotwell (#1)
Re: unlink large objects

Philip Crotwell <crotwell@seis.sc.edu> writes:

I poked around in the database directory and found a file named 16948 that
is 960Mb or almost all of the space on my partition.

That is pg_largeobject. Vacuuming should recover the lost space; are
you sure you've vacuumed it?

regards, tom lane
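
For anyone else who runs into this, a minimal JDBC sketch of that vacuum step could look like the code below. It is only an outline: it assumes a superuser connection to the thread's "scepp" database (the credentials are placeholders), and it leaves autocommit on because VACUUM cannot run inside a transaction block.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class VacuumLargeObjects {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        // connect as the postgres superuser; placeholder credentials
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/scepp", "postgres", "changeme");
        conn.setAutoCommit(true); // VACUUM cannot run inside a transaction block

        Statement st = conn.createStatement();
        // reclaim space freed by earlier large object deletes
        st.execute("VACUUM pg_largeobject");
        st.close();
        conn.close();
    }
}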

#3 Philip Crotwell
crotwell@seis.sc.edu
In reply to: Tom Lane (#2)
Re: unlink large objects

Hi

I was vacuuming, but as the owner of the database. When I do that there
are messages that should have clued me in, like

NOTICE: Skipping "pg_largeobject" --- only table owner can VACUUM it

From now on I will vacuum as user postgres, but I wonder if there is a
need for individual users to be able to vacuum large objects by themselves.
I assume that is problematic since all large objects are in the same file?

Also, when the disk is nearly full, I am seeing the database crash while
trying to vacuum.
timbr pg> vacuumdb scepp
FATAL 2: ZeroFill(/home/postgres/data/pg_xlog/xlogtemp.7922) failed: No space left on device
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
connection to server was lost
vacuumdb: vacuum scepp failed
timbr pg> df -k
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/hda5 298663 232362 50881 82% /
/dev/hda1 17534 2482 14147 15% /boot
/dev/hda7 1111176 1039436 15296 99% /home
timbr pg> vacuumdb scepp
psql: connectDBStart() -- connect() failed: Connection refused
Is the postmaster running locally
and accepting connections on Unix socket '/tmp/.s.PGSQL.5432'?
vacuumdb: vacuum scepp failed

After this I went to another one of our stations that hasn't been running
very long and tried. Now, instead of getting smaller, the size went up by
50%!!!

myrtl> df -k
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/hda5 417743 320216 75958 81% /
/dev/hda1 17534 2482 14147 15% /boot
/dev/hda7 1458400 277884 1106432 20% /home
myrtl> su - postgres
Password:
test> vacuumdb scepp
VACUUM
test> df -k
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/hda5 417743 320216 75958 81% /
/dev/hda1 17534 2482 14147 15% /boot
/dev/hda7 1458400 419660 964656 30% /home

Looking into the db directory, it seems that the space is in the xlog.
When does this get flushed?

test> du -sk *
4 PG_VERSION
153680 base
616 global
8 pg_hba.conf
4 pg_ident.conf
262468 pg_xlog
4 postgresql.conf
4 postmaster.opts
4 postmaster.pid
24 serverlog

I tried to vacuum again, and the space went down to 23% of disk. But still
worse than before vacuuming.
test> vacuumdb scepp
VACUUM
test> df -k
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/hda5 417743 320216 75958 81% /
/dev/hda1 17534 2482 14147 15% /boot
/dev/hda7 1458400 320804 1063512 23% /home

Any suggestions?

thanks for your help,
Philip

On Fri, 8 Jun 2001, Tom Lane wrote:


Philip Crotwell <crotwell@seis.sc.edu> writes:

I poked around in the database directory and found a file named 16948 that
is 960Mb or almost all of the space on my partition.

That is pg_largeobject. Vacuuming should recover the lost space; are
you sure you've vacuumed it?

regards, tom lane

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Crotwell (#3)
Re: [JDBC] unlink large objects

Philip Crotwell <crotwell@seis.sc.edu> writes:

I was vacuuming, but as the owner of the database. When I do that there
are messages that should have clued me in, like
NOTICE: Skipping "pg_largeobject" --- only table owner can VACUUM it

From now on I will vacuum as user postgres, but I wonder if there is a
need for individual users to be able to vacuum large objects by themselves.

Good point. More generally, it seems like it might be a good idea to
allow the owner of an individual database to vacuum all the system
catalogs in it, even if he's not the superuser. Comments anyone?

Also, when the disk is nearly full, I am seeing the database crash while
trying to vacuum.

From WAL log overflow, no doubt. If you look in the pgsql-patches
archives from the last week or two, there is a patch to alleviate the
growth of the log.

regards, tom lane

#5 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
Re: Re: [JDBC] unlink large objects

Philip Crotwell <crotwell@seis.sc.edu> writes:

I was vacuuming, but as the owner of the database. When I do that there
are messages that should have clued me in, like
NOTICE: Skipping "pg_largeobject" --- only table owner can VACUUM it

From now on I will vacuum as user postgres, but I wonder if there is a
need for individual users to be able to vacuum large objects by themselves.

Good point. More generally, it seems like it might be a good idea to
allow the owner of an individual database to vacuum all the system
catalogs in it, even if he's not the superuser. Comments anyone?

Seems db owner should be able to do whatever they want to the non-global
system tables.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6 Philip Crotwell
crotwell@seis.sc.edu
In reply to: Bruce Momjian (#5)
Re: Re: [JDBC] unlink large objects

On Sat, 9 Jun 2001, Bruce Momjian wrote:

Philip Crotwell <crotwell@seis.sc.edu> writes:

I was vacuuming, but as the owner of the database. When I do that there
are messages that should have clued me in, like
NOTICE: Skipping "pg_largeobject" --- only table owner can VACUUM it

From now on I will vacuum as user postgres, but I wonder if there is a
need for individual users to be able to vacuum large objects by themselves.

Good point. More generally, it seems like it might be a good idea to
allow the owner of an individual database to vacuum all the system
catalogs in it, even if he's not the superuser. Comments anyone?

Seems db owner should be able to do whatever they want to the non-global
system tables.

I would suppose that this has already been thought of, but it would be
really nice, IMHO, if the "vacuuming" could be done continuously by a low
priority backend thread, like garbage collection in Java. Easier said
than done I am sure, but this would mean that it could be done by the
backend without having to worry about ownership of system tables.

On a similar idea, has there been any thought to allowing regular backend
processes to run at lower priority? Either by adopting the caller's
priority if it is local, or by having some way to set a priority. I have
some db cleanup tasks that I would like to run as low-priority background
tasks, but the backend does all the work at the default top priority.

Just some thoughts,
thanks,
Philip

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Crotwell (#6)
Re: Re: [JDBC] unlink large objects

Philip Crotwell <crotwell@seis.sc.edu> writes:

On a similar idea, has there been any thought to allowing regular backend
processes to run at lower priority?

People suggest that from time to time, but it's not an easy thing to do.
The problem is priority inversion: low-priority process acquires a lock,
then some high-priority process starts to run and wants that lock.
Presto, high-priority process is now a low-priority waiter.

Detecting priority inversion situations would be difficult, and doing
anything about them would be even more difficult...

regards, tom lane
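
As a rough illustration of the inversion Tom describes (plain Java threads standing in for backends, with made-up names and arbitrary priorities), the sketch below shows a high-priority thread ending up stuck behind a lock held by a low-priority one:

public class PriorityInversionDemo {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        // low-priority worker grabs the lock and holds it for a while,
        // standing in for a niced-down cleanup/vacuum backend
        Thread low = new Thread(() -> {
            synchronized (lock) {
                try { Thread.sleep(5000); } catch (InterruptedException e) { }
            }
        }, "low-priority-cleanup");
        low.setPriority(Thread.MIN_PRIORITY);

        // high-priority worker wants the same lock, standing in for a query backend
        Thread high = new Thread(() -> {
            synchronized (lock) {
                System.out.println("high-priority work finally got the lock");
            }
        }, "high-priority-query");
        high.setPriority(Thread.MAX_PRIORITY);

        low.start();
        Thread.sleep(100); // let the low-priority thread take the lock first
        high.start();      // the high-priority thread now waits at low effective priority

        low.join();
        high.join();
    }
}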