why select count(*) consumes wal logs

Started by Ravi Krishna, over 7 years ago, 15 messages, general
#1Ravi Krishna
srkrishna1@aol.com

PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
no index at this time. Since I am the only user I don't see any other activity.
Now when I run select count(*) on the table where I just loaded data, it runs forever,
more than 10 minutes and still running. Intrigued, I checked locks and saw nothing. Then I noticed something
strange. While select count(*) runs, PG is writing to the WAL, and a large amount at that. Why?
I suspected vacuum was being triggered, but this is a brand-new table with no updates, so it should not be.

Is there SQL to peek into what PG is doing that writes so much WAL?

#2Ravi Krishna
srkrishna1@aol.com
In reply to: Ravi Krishna (#1)
Re: why select count(*) consumes wal logs

It must have something to do with vacuum, as the second time I ran the SQL it did not generate WAL.

#3Michael Nolan
htfoot@gmail.com
In reply to: Ravi Krishna (#1)
Re: why select count(*) consumes wal logs

On Tue, Nov 6, 2018 at 11:08 AM Ravi Krishna <srkrishna1@aol.com> wrote:

PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY.

It's always a good idea after doing a large scale data load to do a vacuum
analyze on the table (or the entire database.)
--
Mike Nolan
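
As a rough sketch of that advice, a post-load maintenance pass could look like the statement below. The table name big_table is a placeholder and does not come from the thread.

```sql
-- big_table is a hypothetical name for the freshly loaded table.
-- VERBOSE prints progress details for the table;
-- ANALYZE refreshes the planner statistics in the same pass.
VACUUM (VERBOSE, ANALYZE) big_table;
```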

#4Ron
ronljohnsonjr@gmail.com
In reply to: Michael Nolan (#3)
Re: why select count(*) consumes wal logs

On 11/06/2018 11:12 AM, Michael Nolan wrote:

On Tue, Nov 6, 2018 at 11:08 AM Ravi Krishna <srkrishna1@aol.com> wrote:

PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY.

It's always a good idea after doing a large scale data load to do a vacuum
analyze on the table (or the entire database.)

I understand the need to ANALYZE (populate the histograms needed by the
dynamic optimizer), but why VACUUM (which is recommended after updates and
deletes)?

Thanks

--
Angular momentum makes the world go 'round.

#5Kumar, Virendra
Virendra.Kumar@guycarp.com
In reply to: Ron (#4)
RE: why select count(*) consumes wal logs

I concur.
Why VACUUM when there are no updates or deletes?

Regards,
Virendra


#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ravi Krishna (#1)
Re: why select count(*) consumes wal logs

Ravi Krishna <srkrishna1@aol.com> writes:

I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
no index at this time. Since I am the only user I don't see any other activity.
Now when I run select count(*) on the table where I just loaded data, it runs forever,
more than 10 minutes and still running. Intrigued, I checked locks and saw nothing. Then I noticed something
strange. While select count(*) runs, PG is writing to the WAL, and a large amount at that. Why?

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows. The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

regards, tom lane
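
One way to observe this from psql is to compare the WAL position before and after the query. This is only a sketch under a few assumptions: big_table is a placeholder name, \gset is a psql meta-command (so this will not work from other clients as written), and the difference counts all WAL written on the server during that interval, not just WAL caused by this session.

```sql
-- Save the current WAL write position in the psql variable before_lsn.
SELECT pg_current_wal_lsn() AS before_lsn \gset

-- First full scan after the load; this is the query under investigation.
SELECT count(*) FROM big_table;

-- Amount of WAL written since the saved position
-- (function names as of PostgreSQL 10).
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')
       ) AS wal_written;
```

Running the same statements a second time should report far less WAL, which would match the observation in message #2.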

#7Ravi Krishna
srkrishna1@aol.com
In reply to: Tom Lane (#6)
Re: why select count(*) consumes wal logs

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows.  The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

Yes, now I recollect reading this in a blog. Thanks, Tom.

#8Michael Nolan
htfoot@gmail.com
In reply to: Tom Lane (#6)
Re: why select count(*) consumes wal logs

On Tue, Nov 6, 2018 at 11:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows. The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

Tom, does that include ANALYZE?

--
Mike Nolan

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Nolan (#8)
Re: why select count(*) consumes wal logs

Michael Nolan <htfoot@gmail.com> writes:

On Tue, Nov 6, 2018 at 11:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows. The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

Tom, does that include ANALYZE?

Yes, but remember that ANALYZE doesn't scan the whole table; it'll only
set the bit on rows it visits.

(I forget at the moment if it's guaranteed to set the bit on all rows
in each page it examines, or only on the rows it selects to sample.
But in any case it will not examine every page in the table.)

regards, tom lane

#10Bruno Lavoie
bl@brunol.com
In reply to: Tom Lane (#6)
Re: why select count(*) consumes wal logs

On Tue, Nov 6, 2018 at 12:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ravi Krishna <srkrishna1@aol.com> writes:

I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
no index at this time. Since I am the only user I don't see any other activity.
Now when I run select count(*) on the table where I just loaded data, it runs forever,
more than 10 minutes and still running. Intrigued, I checked locks and saw nothing. Then I noticed something
strange. While select count(*) runs, PG is writing to the WAL, and a large amount at that. Why?

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows. The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

regards, tom lane

And IIRC, it can generate heavy WAL traffic, since the first change to each
page after a checkpoint is logged as a full-page write. And you said that it's
happening on a big table with wide rows...
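
The settings that govern this behavior can be inspected with a query like the one below. It is a sketch: the three parameter names are standard PostgreSQL settings, but which values are appropriate depends on the installation. With full_page_writes on, the first modification of a page after a checkpoint puts an image of the whole page into the WAL, so a first scan that touches every page of a large table can log an amount of WAL comparable to the size of the table itself.

```sql
-- Show whether full-page writes are enabled and how often checkpoints
-- are triggered, either by time or by the amount of WAL written.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('full_page_writes', 'checkpoint_timeout', 'max_wal_size');
```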

#11Steven Winfield
Steven.Winfield@cantabcapital.com
In reply to: Bruno Lavoie (#10)
RE: why select count(*) consumes wal logs

As long as you don’t have page checksums turned on, you can prevent this by turning off wal_log_hints.

Steve.


#12Ravi Krishna
srkrishna1@aol.com
In reply to: Steven Winfield (#11)
Re: why select count(*) consumes wal logs

As long as you don’t have page checksums turned on,
you can prevent this by turning off wal_log_hints.

I did not run initdb myself. How can I find out which parameters were used with initdb? For page checksums
to be on, it must have been run with the -k option.

Our wal_log_hints is left at the default, which means off.

thanks

#13Steven Winfield
Steven.Winfield@cantabcapital.com
In reply to: Ravi Krishna (#12)
RE: why select count(*) consumes wal logs

How can I find out which parameters were used with initdb?

pg_controldata -D <datadir> | grep sum
...should give you something like:
Data page checksum version: 0
...and 0 means off.

Similarly, from SQL:
select data_page_checksum_version from pg_control_init();

Steve.


#14Ravi Krishna
srkrishna1@aol.com
In reply to: Steven Winfield (#13)
Re: why select count(*) consumes wal logs

select data_page_checksum_version from pg_control_init()

returned 1. So we have page_checksum turned on, and wal_log_hints off.

#15Thomas Kellerer
spam_eater@gmx.net
In reply to: Ravi Krishna (#14)
Re: why select count(*) consumes wal logs

Ravi Krishna wrote on 07.11.2018 at 15:10:

select data_page_checksum_version from pg_control_init()

returned 1. So we have page_checksum turned on, and wal_log_hints off.

If page checksums are enabled, then wal_log_hints is ignored (or, more precisely, always treated as "on").
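
Putting the last few messages together, both conditions can be checked in one statement. This is a sketch that relies on the read-only data_checksums setting, which reports the same fact as pg_control_init(). Hint-bit changes are WAL-logged when either value is on.

```sql
-- If either column shows 'on', the first read of freshly loaded rows
-- can generate WAL while hint bits are being set.
SELECT current_setting('data_checksums') AS data_checksums,
       current_setting('wal_log_hints')  AS wal_log_hints;
```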