Column reset all values
postgres version 12
I have a very simple update query, like this:
update my_table
set
col = NULL
where
col IS NOT NULL;
my_table contains a few million rows; col is an indexed column.
The fastest way would be ALTER TABLE: drop the column and then add it again.
But I can't do this, because a bunch of views and materialized views depend
on this column.
Not much hope, but still asking: is there another way to just reset all of
a column's values? (to NULL, in my case)
On 2020-05-13 12:13:20 +0400, otar shavadze wrote:
> postgres version 12
>
> I have very simple update query, like this:
>
> update my_table
> set
> col = NULL
> where
> col IS NOT NULL;
>
> my_table contains few million rows, col is indexed column
You might want to drop the index before doing this. You obviously won't
need the index afterwards and the database may be able to use HOT
updates if there is no index on the column (but that depends on the
amount of unused space in each block).
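As a minimal sketch of that sequence (the index name my_table_col_idx is hypothetical), you can also check afterwards how many of the updates actually were HOT via the pg_stat_user_tables statistics view:

```sql
-- Drop the index first so the UPDATE can use HOT (heap-only tuple)
-- updates where block space allows (index name is hypothetical):
DROP INDEX my_table_col_idx;

UPDATE my_table SET col = NULL WHERE col IS NOT NULL;

-- Check how many of the updates were HOT:
SELECT n_tup_upd, n_tup_hot_upd
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';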
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Hi Otar,
On Wed, May 13, 2020 at 10:15 AM otar shavadze <oshavadze@gmail.com> wrote:
> postgres version 12
>
> I have very simple update query, like this:
>
> update my_table
> set
> col = NULL
> where
> col IS NOT NULL;
>
> my_table contains few million rows, col is indexed column
>
> Fastest way would be alter table, drop column and then add column again,
> but I can't do this just because on this column depends bunch of views
> and materialized views.
>
> No much hope, but still asking, Is there some another way to just reset
> column all values? ( with NULL in my case)
If views depend on this column, you may need to drop them (in the right
order...) and then recreate them. Now, if they depend on a column that will
no longer contain significant data, you may wish to remove the column, or
declare it as NULL if you need to maintain compatibility.
Now, if you have time and database downtime is a concern, you may run the
UPDATE in lots of 1000 rows (or whatever number fits you). An UPDATE is
effectively an INSERT/DELETE/VACUUM sequence, and this copying around is
the killer - doing it all in one go can temporarily increase the disk usage.
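As a sketch of that drop/recreate dance (the view name my_view is hypothetical), the definitions can be scripted out with pg_get_viewdef before dropping:

```sql
-- Save the definition of each dependent view before dropping it:
SELECT pg_get_viewdef('my_view'::regclass, true);

-- DROP ... CASCADE also removes views that depend on my_view:
DROP VIEW my_view CASCADE;

-- ... drop/re-add or alter the column here ...

-- Recreate each view from its saved definition, e.g.:
CREATE VIEW my_view AS
SELECT id, col FROM my_table;  -- hypothetical body, as returned by pg_get_viewdef
```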
I've had success with the following pseudo code:
SELECT rowid FROM mytable WHERE col IS NOT NULL
and fed the result to something like:
FOR chunk IN chunk_in_1000_rows(query_result)
DO
BEGIN
UPDATE my_table SET col = NULL WHERE rowid IN chunk
COMMIT
SLEEP(5)
DONE
You may wish to run a VACUUM FULL manually at the end.
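On PostgreSQL 11 or later, that pseudo code can be written as a single DO block with transaction control. This is only a sketch: it assumes my_table has a primary key column named id, which is a hypothetical name.

```sql
DO $$
DECLARE
    rows_updated integer;
BEGIN
    LOOP
        -- Clear the next batch of 1000 non-NULL rows:
        UPDATE my_table
        SET    col = NULL
        WHERE  id IN (SELECT id
                      FROM   my_table
                      WHERE  col IS NOT NULL
                      LIMIT  1000);
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
        COMMIT;              -- release locks and let VACUUM reclaim the dead tuples
        PERFORM pg_sleep(5); -- throttle to reduce load on concurrent traffic
    END LOOP;
END $$;
```

Note that COMMIT inside a DO block only works when the block is not invoked from within an outer transaction.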
In my case, I had to compute individual numbers, so the processing was a bit
more complex, but it happily processed over 60 million rows in a few days.
Hope it helps
--
Olivier Gautherot
Thanks a lot. Dropping and re-creating the views is not an option, because
there are a lot of views (and materialized views).
Also, dropping the index is not an option either, because I would need to
re-create the index anyway: I use this table in a procedure, so the index is
necessary for further queries. So the total runtime would not decrease.
Thanks Olivier, I will test the second option you mentioned.
On Wed, May 13, 2020 at 1:15 PM Olivier Gautherot <ogautherot@gautherot.net>
wrote:
On 2020-05-14 16:32:41 +0400, otar shavadze wrote:
> also nor index drop is an option, because I need re-create index as I use
> this table in procedure, so index is necessary for further queries. So
> total runtime will not decreased.
A full index on a column where all the values are the same (NULL in this
case) is useless. For queries which check for the existing value, the
optimizer will notice that a full table scan is faster. For queries which
use any other value, you expect 0 results: so all the optimizer wants to
know is that the value is indeed not in the column, so that it can skip the
table entirely. You can do that with a partial index (WHERE col IS NOT NULL)
or maybe even a constraint.
So I would drop the full index, update the table and then create a
partial index.
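Concretely, that sequence could look like this (the index names are hypothetical):

```sql
DROP INDEX my_table_col_idx;

UPDATE my_table SET col = NULL WHERE col IS NOT NULL;

-- The partial index is empty after the UPDATE, but it still lets the
-- planner prove that no non-NULL value exists without scanning the table:
CREATE INDEX my_table_col_notnull_idx
    ON my_table (col)
    WHERE col IS NOT NULL;
```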
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"