How to parse xml containing optional elements

Started by Andrusover 9 years ago8 messagesgeneral
Jump to latest
#1Andrus
kobruleht2@hot.ee

SEPA ISO XML transactions file needs to be parsed into flat table in
Postgres 9.1+ in ASP:NET 4.6 MVC controller.

I tried code below but this produces wrong result:

tasusumma orderinr
150.00 PV04131
0.38 PV04131

Since there is no EndToEnd in second row there should be null in second row
orderinr column. Correct result is:

tasusumma orderinr
150.00 PV04131
0.38 null

How to fix this ?

create temp table t(x xml, nsa text[][]) on commit drop;
insert into t values(
'<?xml version=''1.0'' encoding=''UTF-8''?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
<BkToCstmrStmt>
<Stmt>
<Ntry>
<Amt Ccy="EUR">150.00</Amt>
<NtryDtls>
<TxDtls>
<Refs>
<EndToEndId>PV04131</EndToEndId>
</Refs>
</TxDtls>
</NtryDtls>
</Ntry>
<Ntry>
<Amt Ccy="EUR">0.38</Amt>
<NtryDtls>
<TxDtls>
<Refs>
<AcctSvcrRef>2016080100178214-2</AcctSvcrRef>
</Refs>
</TxDtls>
</NtryDtls>
</Ntry>
</Stmt>
</BkToCstmrStmt>
</Document>
', ARRAY[ARRAY['ns','urn:iso:std:iso:20022:tech:xsd:camt.053.001.02']]);

SELECT
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()',
x,nsa))::text::numeric AS tasusumma
,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))::text AS orderinr
FROM t;

Parsing can done in other ways e.q. using xslt stylesheet for tranformation
or in client side ASP.NET 4.6 MVC if this is more reasonable.

Posted also in
http://stackoverflow.com/questions/38888739/how-to-parse-xml-with-optional-elements

Andrus.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Hannes Erven
hannes@erven.at
In reply to: Andrus (#1)
Re: How to parse xml containing optional elements

Hi Andrus,

SELECT
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()',
x,nsa))::text::numeric AS tasusumma
,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))::text AS orderinr
FROM t;

You need to extract all ns:Ntry elements first, and then get the amount
and EndToEndId for each of them:

SELECT
(xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1]
AS orderinr

FROM (
SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
FROM t
) Ntry

Best regards,

-hannes

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Andrus
kobruleht2@hot.ee
In reply to: Hannes Erven (#2)
Re: How to parse xml containing optional elements

Hi!

Thank you.
In "PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real
(Debian 4.4.5-8) 4.4.5, 64-bit"

it returns two empty rows. How to make it work in this version ?

In "PostgreSQL 9.5.2, compiled by Visual C++ build 1800, 32-bit" it works.

Andrus.

-----Algsļæ½num-----
From: Hannes Erven
Sent: Thursday, August 11, 2016 11:51 AM
To: Andrus
Cc: pgsql-general
Subject: Re: [GENERAL] How to parse xml containing optional elements

Hi Andrus,

SELECT
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()',
x,nsa))::text::numeric AS tasusumma
,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))::text AS orderinr
FROM t;

You need to extract all ns:Ntry elements first, and then get the amount
and EndToEndId for each of them:

SELECT
(xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1]
AS orderinr

FROM (
SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
FROM t
) Ntry

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Hannes Erven
hannes@erven.at
In reply to: Andrus (#3)
Re: How to parse xml containing optional elements

Hi,

Thank you.
In "PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by
gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit"
it returns two empty rows. How to make it work in this version ?

I couldn't really believe this so I just installed a VM and a 9.1
postgresql just to test this for you.

It seems you hit a bug in PostgreSQL prior to 9.1.15:
https://www.postgresql.org/docs/9.1/static/release-9-1-15.html

"Fix namespace handling in xpath() (Ali Akbar)

Previously, the xml value resulting from an xpath() call would not have
namespace declarations if the namespace declarations were attached to an
ancestor element in the input xml value, rather than to the specific
element being returned. Propagate the ancestral declaration so that the
result is correct when considered in isolation.
"

Given your current PG version, the queries will probably work if you
remove the "ns:" parts of the first two xpaths like this:

SELECT
(xpath('Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('NtryDtls/TxDtls/Refs/EndToEndId/text()', x,nsa))[1] AS orderinr
FROM (
SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
FROM t
) Ntry

But that is not a good solution:
- when you eventually do upgrade, the query *will* break
- it depends on the exact location of the namespace declaration in the
source document. Your bank might change that in a way that will be still
perfectly valid, but break the assumptions made in that workaround.

So I suggest to upgrade to a supported version of the 9.1 branch from
your more than 5 years old build (which should be easy to do).

Best regards,

-hannes

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Andrus
kobruleht2@hot.ee
In reply to: Hannes Erven (#4)
Re: How to parse xml containing optional elements

Hi!

I couldn't really believe this so I just installed a VM and a 9.1

postgresql just to test this for you.

It seems you hit a bug in PostgreSQL prior to 9.1.15:
https://www.postgresql.org/docs/9.1/static/release-9-1-15.html
"Fix namespace handling in xpath() (Ali Akbar)
Previously, the xml value resulting from an xpath() call would not have
namespace declarations if the namespace declarations were attached to an
ancestor element in the input xml value, rather than to the specific
element being returned. Propagate the ancestral declaration so that the
result is correct when considered in isolation.
"
Given your current PG version, the queries will probably work if you remove
the "ns:" parts of the first two xpaths like this:

SELECT
(xpath('Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('NtryDtls/TxDtls/Refs/EndToEndId/text()', x,nsa))[1] AS orderinr
FROM (
SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
FROM t
) Ntry
But that is not a good solution:
- when you eventually do upgrade, the query *will* break
- it depends on the exact location of the namespace declaration in the
source document. Your bank might change that in a way that will be still
perfectly valid, but break the assumptions made in that workaround.
So I suggest to upgrade to a supported version of the 9.1 branch from your
more than 5 years old build (which should be easy to do).

Psotgres 9.1 run isn Debian Squeeze which is unsupported.
How to upgrade in Debian Squeeze ?

How to add IBAN column to result table? This column has same value for all
rows.

create temp table t(x xml, nsa text[][]) on commit drop;
insert into t values(
'<?xml version=''1.0'' encoding=''UTF-8''?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
<BkToCstmrStmt>
<Stmt>
<Acct>
<Id>
<IBAN>XX00221059842412</IBAN>
</Id>
</Acct>
<Ntry>
<Amt Ccy="EUR">150.00</Amt>
<NtryDtls>
<TxDtls>
<Refs>
<EndToEndId>PV04131</EndToEndId>
</Refs>
</TxDtls>
</NtryDtls>
</Ntry>
<Ntry>
<Amt Ccy="EUR">0.38</Amt>
<NtryDtls>
<TxDtls>
<Refs>
<AcctSvcrRef>2016080100178214-2</AcctSvcrRef>
</Refs>
</TxDtls>
</NtryDtls>
</Ntry>
</Stmt>
</BkToCstmrStmt>
</Document>
', ARRAY[ARRAY['ns','urn:iso:std:iso:20022:tech:xsd:camt.053.001.02']]);

SELECT
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
x,nsa))::text AS endaaa,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()',
x,nsa))::text::numeric AS tasusumma
,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))::text AS orderinr
FROM t;

should produce

endaaa tasusumma orderinr
XX00221059842412 150.00 PV04131
XX00221059842412 0.38 null

Andrus.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Hannes Erven
hannes@erven.at
In reply to: Andrus (#5)
Re: How to parse xml containing optional elements

Andrus,

Psotgres 9.1 run isn Debian Squeeze which is unsupported.
How to upgrade in Debian Squeeze ?

A plain "apt-get upgrade postgresql-9.1" does not work?
It might help to enable the postgresql.org APT repository. For
instructions, see here:
https://www.postgresql.org/download/linux/debian/

How to add IBAN column to result table? This column has same value for
all rows.

SELECT
endaaa,
(xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))[1] AS orderinr

FROM (
SELECT
(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
x,nsa))[1] as endaaa,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
FROM t
) Ntry

Be careful, this will only work when there is exactly one ns:Stmt
element in the document.
Else you will have to build a third query level, first selecting the
ns:Stmt entries, second the IBAN and Ntry from them and third amount and
EndToEndId.

-hannes

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Andrus
kobruleht2@hot.ee
In reply to: Hannes Erven (#6)
Re: How to parse xml containing optional elements

Hi!

Thank you.

How to upgrade in Debian Squeeze ?
A plain "apt-get upgrade postgresql-9.1" does not work?
It might help to enable the postgresql.org APT repository. For
instructions, see here:
https://www.postgresql.org/download/linux/debian/

apt-get upgrade postgresql-9.1 returns

Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt-get -f install' to correct these.
The following packages have unmet dependencies:
openssl : Depends: libssl1.0.0 (>= 1.0.1e-2+deb7u5) but it is not
installable
wkhtmltox : Depends: libssl1.0.0 but it is not installable
E: Unmet dependencies. Try using -f.

so it looks like repository is found but not usable ?

How to add IBAN column to result table? This column has same value for
all rows.

SELECT

endaaa,
(xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))[1] AS orderinr

FROM (
SELECT
(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
x,nsa))[1] as endaaa,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
FROM t
) Ntry

Be careful, this will only work when there is exactly one ns:Stmt element
in the document.
Else you will have to build a third query level, first selecting the
ns:Stmt entries, second the IBAN and Ntry from them and third amount and
EndToEndId.

Hopefully there is only one Stmt element in single file.

I solved it by moving xpath to select IBAN to main select:

SELECT
(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
xo,nsa))[1]::text AS endaaa,
(xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))[1] AS orderinr

FROM (
SELECT
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa, x as xo
FROM t
) Ntry

This references endaaa from single select only. Changing code requires
changing only one line.

Is this OK ?

Andrus.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Hannes Erven
hannes@erven.at
In reply to: Andrus (#7)
Re: How to parse xml containing optional elements

Hi,

apt-get upgrade postgresql-9.1 returns

Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt-get -f install' to correct these.
The following packages have unmet dependencies:
openssl : Depends: libssl1.0.0 (>= 1.0.1e-2+deb7u5) but it is not
installable
wkhtmltox : Depends: libssl1.0.0 but it is not installable
E: Unmet dependencies. Try using -f.

I'm afraid I cannot really help you with this. Is this with the
postgresql.org repository already enabled?
Did you run "apt-get update"?
What about "apt-get upgrade" (a system-wide upgrade)?

Be careful, this will only work when there is exactly one ns:Stmt
element in the document.
Else you will have to build a third query level, first selecting the
ns:Stmt entries, second the IBAN and Ntry from them and third amount
and EndToEndId.

Hopefully there is only one Stmt element in single file.
I solved it by moving xpath to select IBAN to main select:

SELECT
(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
xo,nsa))[1]::text AS endaaa,
(xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))[1] AS orderinr

FROM (
SELECT
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa, x as xo
FROM t
) Ntry

This references endaaa from single select only. Changing code requires
changing only one line.

Is this OK ?

This will still select only the IBAN from the first statement in the file.

-hannes

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general