huge table occupation after updates
Hi,
I have a table ('stato') with an indexed bigint ('Id') and 5 bytea fields
('d0','d1',...,'d4').
I populated the table with 10000 rows; each d.. field initialized with 20
bytes.
Reported table size is 1.5MB. OK.
Now, for 1000 times, I update 2000 different rows each time, changing the d0
field while keeping the same length, and at the end of it all, I issued VACUUM.
Now table size is 29MB.
Why so big? What is an upper bound to estimate a table occupation on disk?
The same test, redone with dX length = 200 bytes instead of 20, reports:
Size before UPDATEs = 11MB. OK.
Size after UPDATEs = 1.7GB. Why?
Attached is a txt file with details of the statistical commands I issued (max
row size, row count, etc.).
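A minimal SQL sketch of the setup described above (table and column names are from the post; the exact script was in the attachment, so this is an assumed reconstruction, not the original):

```sql
-- Assumed reconstruction of the test, not the original script
CREATE TABLE stato (
    "Id" bigint PRIMARY KEY,
    d0 bytea, d1 bytea, d2 bytea, d3 bytea, d4 bytea
);

-- 10000 rows, each dN initialized with 20 bytes
INSERT INTO stato
SELECT i, b, b, b, b, b
FROM generate_series(1, 10000) AS i
CROSS JOIN (SELECT decode(repeat('00', 20), 'hex') AS b) AS payload;

SELECT pg_size_pretty(pg_table_size('stato'));  -- ~1.5MB reported

-- One of the 1000 update passes: rewrite d0 in 2000 rows, same length
UPDATE stato
SET d0 = decode(repeat('01', 20), 'hex')
WHERE "Id" BETWEEN 1 AND 2000;
```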
Regards
Pupillo
Attachment: report huge table.txt (text/plain)
Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
Every (!) update creates a new row version and marks the old row as
'old', but does not delete the old row.
A VACUUM marks old rows as reusable - if there is no running
transaction that can still see the old row version. That's how MVCC works in
PostgreSQL.
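The effect can be observed through the statistics views; a hedged sketch (table name taken from the original post):

```sql
-- Each UPDATE leaves a dead row version behind until vacuum runs
UPDATE stato SET d0 = d0 WHERE "Id" <= 2000;

SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'stato';

VACUUM stato;  -- marks the dead versions reusable; does not shrink the file
```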
Regards, Andreas Kretschmer
--
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi Tom
On Sat, Dec 10, 2016 at 1:15 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
...
Reported table size is 1.5MB. OK.
That's 150 bytes per row, pretty normal.
Now, for 1000 times, I update 2000 different rows each time, changing d0
field keeping the same length, and at the end of all, I issued VACUUM.
And probably autovacuum or something similar kicked in meanwhile. 2M
updates is 200 updates per row; that's pretty heavy traffic, and many
tables do not get that in their whole lifetime.
Now table size is 29MB.
Why so big? What is an upper bound to estimate a table occupation on disk?
Strictly, you could probably calculate an upper bound as row
size × number of insertions, given that an update approximates an insertion
plus a deletion. Given the original insertion used 1.5MB and you
repeated that about 200 times, I would estimate 201*1.5 = 301.5MB as an
upper bound, but I doubt that's of any use.
Those many updates probably left your table badly fragmented, with few
rows per page. On a normal usage you do not need to worry, as
periodic vacuum would mark the space for reuse and the table will not
grow that big. But issuing 1k rounds of updates over 20% of the table is hardly
normal usage, if you need this kind of usage maybe you should rethink
your strategies.
Vacuum full will probably pack the table and 'recover' the space; it
should be fast with just 29MB on disk. Not knowing your intended usage
nothing can be recommended, but I've had some usage patterns where a
heavy update plus vacuum full was successfully used.
Francisco Olarte.
Hi,
you're right, VACUUM FULL recovered the space, completely.
So, at this point I'm worried about my needs.
I cannot issue vacuum full as I read it locks the table.
In my DB, I (would) need to have a table with one bigint id field + 10 bytea
fields, each about 100 bytes long (more or less, not fixed).
5/10000 rows maximum, but let's say 5000.
As traffic, I can suppose 10000 updates per row per day (spread over groups
of hours); each update involves two of those fields, chosen randomly.
Rows are also chosen randomly (in my test I used a block of 2000 just to
try one possibility).
So, it's a total of 50 million updates per day, hence (50 million * 100
bytes * 2 fields updated) 10 GB net per day.
I'm afraid it's not possible, according to my results.
Regards
Pupillo
2016-12-10 13:38 GMT+01:00 Francisco Olarte <folarte@peoplecall.com>:
On Dec 10, 2016, at 6:25 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
Is each update visible to a user or read/analyzed by another activity? If not, you can do most of the updating in memory and flush a snapshot periodically to the database.
Hi,
I'd like to do that! But my DB must be crash proof! Very high reliability
is a must.
I also use sync replication.
Regards
Pupillo
2016-12-10 16:04 GMT+01:00 Rob Sargent <robjsargent@gmail.com>:
On Dec 10, 2016, at 7:27 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
This list discourages top posting. You're asked to place your reply at the bottom.
You haven't laid out your application architecture (how many clients, who is reading, who is writing, etc.). Caching doesn't mean your database is any less crash proof. At that rate of activity, depending on architecture, you could lose updates in all sorts of crash scenarios.
2016-12-10 16:36 GMT+01:00 Rob Sargent <robjsargent@gmail.com>:
As for crash proof, I meant that once my client app is told that its
update request was committed, it mustn't get lost (HDD failure apart, of
course). And I can't wait to flush the cache before telling the app
"committed".
I can replicate also the cache on the standby PC of course.
Regards
Pupillo
On Dec 10, 2016, at 10:01 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
OK: clientA sends an update; you commit and tell clientA it's committed. clientB
updates the same record; do you tell clientA of clientB's update?
Are the two updates cumulative or destructive?
Can you report all updates done by clientA?
2016-12-10 18:10 GMT+01:00 Rob Sargent <robjsargent@gmail.com>:
I have only one direct DB client (let's name it MIDAPP). This client of
the DB is a server for up to 10000 final clients.
Any time MIDAPP is going to reply to a client, it must save a "status
record with some data" related to that client, and only after that
answer/commit to the final client.
The next time the same final client asks something, the same status
record will be updated again (with different content).
Each client can send up to 10000 reqs per day, up to 1 per second.
So, if I lose the cache, I'm done. I don't want to send the status to the
final clients (in order to get it back in case of need).
I could evaluate a periodic snapshot + a sequential log tracking
the newer updates, but that's still to be evaluated.
Regards
Pupillo
A couple of things first.
1.- This list encourages inline replying, editing the text, and frowns
upon top posting.
2.- Your HTML formatting with such a small font size makes it harder for me
(and I can assume some others) to properly read your messages.
If you want to discourage people replying to you, keep doing the two above.
On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
you're right, VACUUM FULL recovered the space, completely.
Well, it always does. ;-)
So, at this point I'm worried about my needs.
I cannot issue vacuum full as I read it locks the table.
Well, first hint of your needs. Bear in mind vacuum fulls can be very
fast on small tables (i.e., if you have the 1.5MB table, and do 2000
updates and then a vacuum full, that will be very fast; time it).
In my DB, I (would) need to have a table with one bigint id field+ 10 bytea
fields, 100 bytes long each (more or less, not fixed).
5/10000 rows maximum, but let's say 5000.
As traffic I can suppose 10000 updates per row per day (spread over groups
of hours; each update involving two of those fields, randomly.
Also rows are chosen randomly (in my test I used a block of 2000 just to try
one possibility).
So, it's a total of 50 millions updates per day, hence (50millions * 100
bytes *2 fields updated) 10Gbytes net per day.
Not at all. That's the volume of updated data; you must multiply by
the ROW size, not just the changed size, in your case 50M * 1100 (to
have some wiggle room), 55 GB.
But this is the UPPER BOUND you asked for. Not the real one.
I'm afraid it's not possible, according to my results.
It certainly is. You can set a very aggressive autovacuum schedule for
the table or, even better, you may vacuum AFTER each hourly update.
This will mark dead tuples for reuse. It will not be as fast, but it
can certainly be fast enough.
And, if you only update once an hour, you may try other tricks (like
copying to a temp table, truncating the original and inserting the temp
table back into the original; I fear this will lock the table too, but
only for a very short time, which your readers may well tolerate).
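The per-table autovacuum override suggested above might look like this; a sketch, with illustrative (not tuned) values:

```sql
-- Make autovacuum fire far more often on this hot table
ALTER TABLE stato SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum when ~1% of rows are dead
    autovacuum_vacuum_threshold    = 1000,  -- ...plus this base count
    autovacuum_vacuum_cost_delay   = 0      -- let vacuum run at full speed
);
```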
Yours seems a special app with special needs; try a few approaches,
measure, it is certainly possible.
Francisco Olarte.
Tom:
On Sat, Dec 10, 2016 at 6:01 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
As for crash proof, I meant that once my client app is told that her update
request was committed, it mustn't get lost (hdd failure apart of course).
And I can't wait to flush the cache before telling to the app :"committed".
I can replicate also the cache on the standby PC of course.
You are making inconsistent requests. When the server tells your app
it's committed, it has flushed the transaction log cache. If your
assertion above is real, you cannot wait for commit, so your
requirements are impossible to satisfy (of course, you could run with
scissors, but that will lose data even without HDD failure).
Francisco Olarte.
2016-12-10 18:33 GMT+01:00 Francisco Olarte <folarte@peoplecall.com>:
Hi, perhaps I was not clear. The cache I mentioned is a possible cache in
my app, outside the PostgreSQL server.
I answered Rob's question with more details regarding my app.
Regards
Pupillo
2016-12-10 18:30 GMT+01:00 Francisco Olarte <folarte@peoplecall.com>:
Hi, I think you're right. I was surprised by the huge size of the tables
in my tests, but I had not considered vacuum properly.
My test had really heavy activity, so perhaps autovacuum didn't have
time to make the rows reusable.
Also, issuing a plain VACUUM command shows nothing visible at once; only
afterwards, when inserting new rows, does the size stop increasing.
I will try again as you suggest.
Thank you very much
Pupillo
On 12/10/2016 09:30 AM, Francisco Olarte wrote:
A couple of things first.
1.- This list encourages inline replying, editing the text, and frowns
upon top posting.
2.- Your HTML formatting with so a small size makes it harder for me (
and I can assume some others ) to properly read your messages.
+1. I either had to Ctrl-+ or put the 'readers' on :)
--
Adrian Klaver
adrian.klaver@aklaver.com
On 12/10/2016 10:15 AM, Tom DalPozzo wrote:
Hi, I think you're right. I was surprised by the huge size of the
tables in my tests but I had not considered the vacuum properly.
My test had a really huge activity so perhaps the autovacuum didn't have
time to make the rows reusable.
Also, issuing plain VACUUM command does nothing visibile at once, but
only after when, inserting new rows, the size doesn't increase.
I will try again as you suggest.
To make more sense of this I would suggest reading the following
sections of the manual:
https://www.postgresql.org/docs/9.5/static/routine-vacuuming.html
https://www.postgresql.org/docs/9.5/static/mvcc.html
There is a lot of ground covered in the above, more than can be digested
in one pass, but it will help provide some context for the
answers/suggestions provided in this thread.
--
Adrian Klaver
adrian.klaver@aklaver.com
Please use a readable font. Your messages are using a font that's so small that my eyes start to hurt. I still try to read them, but I - and I assume others - will stop trying if you keep this up.
Sorry for the top-post, but since it's not directly related to the topic, that's perhaps for the better.
On 10 Dec 2016, at 19:15, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
On Saturday, December 10, 2016, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
I have one direct DB client (let's name it MIDAPP) only. This client of
the DB is a server for up to 10000 final clients.
Any time MIDAPP is going to reply to a client, it must save a "status
record with some data" related to that client and only after that,
answering /committing the final client.
The next time the same final client will ask something, the same status
record will be updated again (with a different content).
Why do you want to pay for concurrency control when you don't seem to need
it? While PostgreSQL likely can do what you need I suspect there are
applications out there that can solve this specific problem better. Even
something as simple as a flat file, one per "final client", written
atomically and fsynced after each write/rename.
David J.
On 12/12/2016 02:42, David G. Johnston wrote:
Hi David,
there are also other DB clients which only perform read queries using
SQL. That's the reason why I chose PostgreSQL over simpler apps. I didn't
mention them so far, as those queries are not a concern in terms of
performance.
Anyway, regarding the huge size of the table, I think the reason
was that autovacuum couldn't keep up, as the update traffic was really high
in my test, with no pause. In fact, if I lower it to
1500 updates/sec, then autovacuum works (I checked the log).
So the table size can grow, but not forever, as the space gets reused.
Thank you very much.
Pupillo