Help in Parsing PG log usings CSV format

Started by Arvind Singhabout 14 years ago6 messagesgeneral
Jump to latest
#1Arvind Singh
arvindps@hotmail.com

Help needed in parsing PostgreSQL CSV Log
Hello friends,
I am working an a section of application which needs to Parse CSV Logs generated by PostgreSql server.
- The Logs are stored C:\Program Files\PostgreSQL\9.0\data\pg_log
- The Server version in 9.0.4
- The application is developed in C Sharp
* The basic utility after Parse the Log is to show contents in a DataGridView.
* There are other filter options like to view log contents for a particular range of Time for a Day.

**However the main problem that is, the Log format is not readable**

A Sample Log data line

2012-03-21 11:59:20.640 IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle",2012-03-21 11:59:20 IST,2/163,0,LOG,00000,"statement: SELECT version()",,,,,,,,"exec_simple_query, .\src\backend\tcop\postgres.c:900",""

As you can see the columns in the Log are comma separated , But however individual values are not Quote Enclosed.

For instance the 1st,4rth,6th .. columns

**Is there a setting in PostgreSQL configuration to place quotes around all columns in a Logfili

I just want to update the columns so that all are within Quotes

what happens wrong is when it reaches the column where sql statement is place. it also has commas set for table columns. The log line is a mix bunch of quote-enclosed and non-quote-enclosed column. is there is a configuration or utility to convert the non-quoted column to quoted column

PS : the copy utility or any other utility cannot be used , as we have to parse the log within a C Sharp application

Thanks for any advice and help

#2Martin Gregorie
martin@gregorie.org
In reply to: Arvind Singh (#1)
Re: Help in Parsing PG log usings CSV format

On Thu, 2012-03-22 at 09:32 +0000, Arvind Singh wrote:

Help needed in parsing PostgreSQL CSV Log
Hello friends,
I am working an a section of application which needs to Parse CSV Logs generated by PostgreSql server.
- The Logs are stored C:\Program Files\PostgreSQL\9.0\data\pg_log
- The Server version in 9.0.4
- The application is developed in C Sharp
* The basic utility after Parse the Log is to show contents in a DataGridView.
* There are other filter options like to view log contents for a particular range of Time for a Day.

**However the main problem that is, the Log format is not readable**

A Sample Log data line

2012-03-21 11:59:20.640 IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle",2012-03-21 11:59:20 IST,2/163,0,LOG,00000,"statement: SELECT version()",,,,,,,,"exec_simple_query, .\src\backend\tcop\postgres.c:900",""

As you can see the columns in the Log are comma separated , But however individual values are not Quote Enclosed.

For instance the 1st,4rth,6th .. columns

Thats fairly standard. A good CSV parser only requires a field to be
quoted if it contains commas or quotes. In the latter case the internal
quotes should be doubled, i.e the three fields in the following:

unquoted field,"contains commas, etc.","Fred said ""Cool!""."

should be handled correctly by a decent CSV parser.

Martin

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Arvind Singh (#1)
Re: Help in Parsing PG log usings CSV format

Arvind Singh wrote:

Help needed in parsing PostgreSQL CSV Log

[...]

**However the main problem that is, the Log format is not readable**

A Sample Log data line

2012-03-21 11:59:20.640

IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle
",2012-03-21 11:59:20

IST,2/163,0,LOG,00000,"statement: SELECT

version()",,,,,,,,"exec_simple_query,

.\src\backend\tcop\postgres.c:900",""

As you can see the columns in the Log are comma separated , But

however individual values are not

Quote Enclosed.

For instance the 1st,4rth,6th .. columns

**Is there a setting in PostgreSQL configuration to place quotes

around all columns in a Logfili

I just want to update the columns so that all are within Quotes

what happens wrong is when it reaches the column where sql statement

is place. it also has commas set

for table columns. The log line is a mix bunch of quote-enclosed and

non-quote-enclosed column. is

there is a configuration or utility to convert the non-quoted column

to quoted column

The columns that are not quoted are guaranteed not to contain a comma.
So it shouldn't be a problem to parse them.

In fact, it is quite easy. As an example, see here:
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgreplay/pgreplay/parse.c?re
v=1.14&content-type=text/x-cvsweb-markup
In the function parse_csvlog_entry, after the comment
"read next line after start timestamp from log file"
you can find code that parses such a line.

The code is in C, so it should be easy to port it to C#, which
is essentially Java, which has C-like syntax.

Yours,
Laurenz Albe

#4Arvind Singh
arvindps@hotmail.com
In reply to: Laurenz Albe (#3)
Re: Parsing PG log usings CSV format

Thank you sir,

i have sorted out the problem on
" The columns that are not quoted are guaranteed not to contain a comma. "

But i have another query, the structure of PG Log CSV as mentioned in manual and as below has 24 columns
http://www.postgresql.org/docs/9.1/static/runtime-config-logging.html
-------------------------------------------

log_time timestamp,
user_name text,
database_name text,
process_id integer,
connection_from text,
session_id text,
session_line_num bigint,
command_tag text,
session_start_time timestamp,
virtual_transaction_id text,
transaction_id bigint,
error_severity text,
sql_state_code text,
message text,
detail text,
hint text,
internal_query text,
internal_query_pos integer,
context text,
query text,
query_pos integer,
location text,
application_name text,
PRIMARY KEY (session_id, session_line_num)
-------------------------------------------

However every Log line added contains only 22 columns, a sample log line is given below
-------------------------------------------

2012-03-22 15:29:53.546 IST,"postgres","stock_apals",2396,"localhost:2766",4f6af819.95c,9,"SELECT",2012-03-22 15:29:53 IST,3/0,0,LOG,00000,"QUERY STATISTICS","! system usage stats:
! 0.015000 elapsed 0.000000 user 0.015625 system sec
! [0.078125 user 0.062500 sys total]",,,,,"SELECT SUBSTR(itemname, 1, 15) as Product,
avg(rate) as Avgrate
FROM issue_stock WHERE extract(year from issue_stock.edate) = '2011'
GROUP BY itemname
order by itemname",,"ShowUsage, .\src\backend\tcop\postgres.c:4305",""
-------------------------------------------

the last column of the Log is not the Primarykey ?
the last column as of yet is unknown because at all occurances it is a empty quoted column ?
the column numbers dont match with the generated log

Is this is a different Log format , can you guide us to a page where the column specifications can be matched.

Yrs truly
arvind pal singh

Show quoted text

Subject: RE: [GENERAL] Help in Parsing PG log usings CSV format
Date: Thu, 22 Mar 2012 16:47:11 +0100
From: laurenz.albe@wien.gv.at
To: arvindps@hotmail.com; pgsql-general@postgresql.org

Arvind Singh wrote:

Help needed in parsing PostgreSQL CSV Log

[...]

**However the main problem that is, the Log format is not readable**

A Sample Log data line

2012-03-21 11:59:20.640

IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle
",2012-03-21 11:59:20

IST,2/163,0,LOG,00000,"statement: SELECT

version()",,,,,,,,"exec_simple_query,

.\src\backend\tcop\postgres.c:900",""

As you can see the columns in the Log are comma separated , But

however individual values are not

Quote Enclosed.

For instance the 1st,4rth,6th .. columns

**Is there a setting in PostgreSQL configuration to place quotes

around all columns in a Logfili

I just want to update the columns so that all are within Quotes

what happens wrong is when it reaches the column where sql statement

is place. it also has commas set

for table columns. The log line is a mix bunch of quote-enclosed and

non-quote-enclosed column. is

there is a configuration or utility to convert the non-quoted column

to quoted column

The columns that are not quoted are guaranteed not to contain a comma.
So it shouldn't be a problem to parse them.

In fact, it is quite easy. As an example, see here:
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgreplay/pgreplay/parse.c?re
v=1.14&content-type=text/x-cvsweb-markup
In the function parse_csvlog_entry, after the comment
"read next line after start timestamp from log file"
you can find code that parses such a line.

The code is in C, so it should be easy to port it to C#, which
is essentially Java, which has C-like syntax.

Yours,
Laurenz Albe

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Arvind Singh (#4)
Re: Parsing PG log usings CSV format

Arvind Singh wrote:

i have sorted out the problem on
" The columns that are not quoted are guaranteed not to contain a

comma. "

Good.

But i have another query, the structure of PG Log CSV as mentioned in

manual and as below has 24

columns
http://www.postgresql.org/docs/9.1/static/runtime-config-logging.html
-------------------------------------------

log_time timestamp,
user_name text,
database_name text,
process_id integer,
connection_from text,
session_id text,
session_line_num bigint,
command_tag text,
session_start_time timestamp,
virtual_transaction_id text,
transaction_id bigint,
error_severity text,
sql_state_code text,
message text,
detail text,
hint text,
internal_query text,
internal_query_pos integer,
context text,
query text,
query_pos integer,
location text,
application_name text,
PRIMARY KEY (session_id, session_line_num)

Actually, that's only 23 columns. The last line is a constraint.

However every Log line added contains only 22 columns, a sample log

line is given below

-------------------------------------------

2012-03-22 15:29:53.546

IST,"postgres","stock_apals",2396,"localhost:2766",4f6af819.95c,9,"SELEC
T",2012-03-22 15:29:53

IST,3/0,0,LOG,00000,"QUERY STATISTICS","! system usage stats:
! 0.015000 elapsed 0.000000 user 0.015625 system sec
! [0.078125 user 0.062500 sys total]",,,,,"SELECT

SUBSTR(itemname, 1, 15) as Product,

avg(rate) as Avgrate
FROM issue_stock WHERE extract(year from issue_stock.edate) =

'2011'

GROUP BY itemname
order by itemname",,"ShowUsage,

.\src\backend\tcop\postgres.c:4305",""

-------------------------------------------

the last column of the Log is not the Primarykey ?
the last column as of yet is unknown because at all occurances it is

a empty quoted column ?

the column numbers dont match with the generated log

Is this is a different Log format , can you guide us to a page where

the column specifications can be

matched.

You must be using PostgreSQL 8.4 or earlier.
The last column, "application_name", was added in 9.0.
Look up the documentation for your version and you'll find it.

Yours,
Laurenz Albe

#6Jasen Betts
jasen@xnet.co.nz
In reply to: Arvind Singh (#1)
Re: Help in Parsing PG log usings CSV format

On 2012-03-22, Martin Gregorie <martin@gregorie.org> wrote:

Thats fairly standard. A good CSV parser only requires a field to be
quoted if it contains commas or quotes.

quotes,commas, or linebreaks

copy ( values (2,'comma, etc'),(3,'and "quote".'),(1,'line
break') ) to stdout with csv;

--
⚂⚃ 100% natural