pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

Started by Michael Paquierover 1 year ago8 messages
#1Michael Paquier
michael@paquier.xyz

Fix overread in JSON parsing errors for incomplete byte sequences

json_lex_string() relies on pg_encoding_mblen_bounded() to point to the
end of a JSON string when generating an error message, and the input it
uses is not guaranteed to be null-terminated.

It was possible to walk off the end of the input buffer by a few bytes
when the last bytes consist of an incomplete multi-byte sequence, as
token_terminator would point to a location defined by
pg_encoding_mblen_bounded() rather than the end of the input. This
commit switches token_terminator so as the error uses data up to the
end of the JSON input.

More work should be done so as this code could rely on an equivalent of
report_invalid_encoding() so as incorrect byte sequences can show in
error messages in a readable form. This requires work for at least two
cases in the JSON parsing API: an incomplete token and an invalid escape
sequence. A more complete solution may be too invasive for a backpatch,
so this is left as a future improvement, taking care of the overread
first.

A test is added on HEAD as test_json_parser makes this issue
straight-forward to check.

Note that pg_encoding_mblen_bounded() no longer has any callers. This
will be removed on HEAD with a separate commit, as this is proving to
encourage unsafe coding.

Author: Jacob Champion
Discussion: /messages/by-id/CAOYmi+ncM7pwLS3AnKCSmoqqtpjvA8wmCdoBtKA3ZrB2hZG6zA@mail.gmail.com
Backpatch-through: 13

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/855517307db8efd397d49163d65a4fd3bdcc41bc

Modified Files
--------------
src/common/jsonapi.c | 4 ++--
src/test/modules/test_json_parser/t/002_inline.pl | 8 ++++++++
2 files changed, 10 insertions(+), 2 deletions(-)

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Michael Paquier (#1)
Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

On 2024-May-09, Michael Paquier wrote:

Fix overread in JSON parsing errors for incomplete byte sequences

I'm getting this error in the new test:

t/002_inline.pl ........................ 1/?
# Failed test 'incomplete UTF-8 sequence, chunk size 3: correct error output'
# at t/002_inline.pl line 134.
# 'Escape sequence "\�1+2" is invalid.'
# doesn't match '(?^:(Token|Escape sequence) ""?\\\x{F5}" is invalid)'
# Looks like you failed 1 test of 850.

Not sure what's going on here, or why it fails for me while the
buildfarm is all happy.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Every machine is a smoke machine if you operate it wrong enough."
https://twitter.com/libseybieda/status/1541673325781196801

#3Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#2)
Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

On 2024-May-10, Alvaro Herrera wrote:

Not sure what's going on here, or why it fails for me while the
buildfarm is all happy.

Ah, I ran 'git clean -dfx' and now it works correctly. I must have had
an incomplete rebuild.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"World domination is proceeding according to plan" (Andrew Morton)

#4Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#3)
Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

On Fri, May 10, 2024 at 02:23:09PM +0200, Alvaro Herrera wrote:

Ah, I ran 'git clean -dfx' and now it works correctly. I must have had
an incomplete rebuild.

I am going to assume that this is an incorrect build. It seems to me
that src/common/ was compiled with a past version not sufficient to
make the new test pass as more bytes got pushed to the error output as
the pre-855517307db8 code could point to some random junk.
--
Michael

#5Peter Eisentraut
peter@eisentraut.org
In reply to: Alvaro Herrera (#3)
Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

On 10.05.24 14:23, Alvaro Herrera wrote:

On 2024-May-10, Alvaro Herrera wrote:

Not sure what's going on here, or why it fails for me while the
buildfarm is all happy.

Ah, I ran 'git clean -dfx' and now it works correctly. I must have had
an incomplete rebuild.

I saw the same thing. The problem is that there is incomplete
dependency information in the makefiles (not meson) between src/common/
and what is using it. So whenever anything changes in src/common/, you
pretty much have to do a forced rebuild of everything.

#6Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#5)
Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

On Tue, May 14, 2024 at 10:39:36AM +0200, Peter Eisentraut wrote:

I saw the same thing. The problem is that there is incomplete dependency
information in the makefiles (not meson) between src/common/ and what is
using it. So whenever anything changes in src/common/, you pretty much have
to do a forced rebuild of everything.

Is that a recent regression? I have some blurry memories from
working on these areas that changing src/common/ reflected on the
compiled pieces effectively at some point.
--
Michael

#7Peter Eisentraut
peter@eisentraut.org
In reply to: Michael Paquier (#6)
Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

On 15.05.24 02:00, Michael Paquier wrote:

On Tue, May 14, 2024 at 10:39:36AM +0200, Peter Eisentraut wrote:

I saw the same thing. The problem is that there is incomplete dependency
information in the makefiles (not meson) between src/common/ and what is
using it. So whenever anything changes in src/common/, you pretty much have
to do a forced rebuild of everything.

Is that a recent regression? I have some blurry memories from
working on these areas that changing src/common/ reflected on the
compiled pieces effectively at some point.

One instance of this problem that I can reproduce at least back to PG12 is

1. touch src/common/exec.c
2. make -C src/bin/pg_dump

This will rebuild libpgcommon, but it will not relink pg_dump.

#8Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Peter Eisentraut (#7)
Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

On Tue, May 14, 2024 at 11:15 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 15.05.24 02:00, Michael Paquier wrote:

Is that a recent regression? I have some blurry memories from
working on these areas that changing src/common/ reflected on the
compiled pieces effectively at some point.

One instance of this problem that I can reproduce at least back to PG12 is

1. touch src/common/exec.c
2. make -C src/bin/pg_dump

This will rebuild libpgcommon, but it will not relink pg_dump.

I remember src/common/unicode changes having similar trouble, as well [1]/messages/by-id/CAFBsxsGZTwzDnTs=TVM38CCTPP3Y0D3=h+UiWt8M83D5THHf9A@mail.gmail.com.

--Jacob

[1]: /messages/by-id/CAFBsxsGZTwzDnTs=TVM38CCTPP3Y0D3=h+UiWt8M83D5THHf9A@mail.gmail.com