[PATCH] Patch to fix a crash of psql

Started by JiangGuiqingabout 13 years ago12 messages
#1JiangGuiqing
jianggq@cn.fujitsu.com
2 attachment(s)

hi

When i test psql under multi-lingual and different encoding environment,
I found a crash of psql.

----------------------------------------------------------------------
$ export PGCLIENTENCODING=SJIS
$ psql
psql (9.2rc1)
Type "help" for help.

postgres=# \i sql
CREATE DATABASE
You are now connected to database "mydb" as user "postgres".
CREATE SCHEMA
Segmentation fault (core dumped)
$
----------------------------------------------------------------------

I'm look into this problem and found that
only some especial character can cause psql crash.
conditions is:
1. some especial character
(my sql file contains japanese comment "-- コメント" . It can cause
psql crash.)
2. PGCLIENTENCODING is SJIS
3. the encoding of input sql file is UTF-8

I investigated this problem. The reasons are as follows.
----------------------------------------------------------------------
src/bin/psql/mainloop.c
-> psql_scan_setup() //Set up to perform lexing of the given input line.
-->prepare_buffer () //Set up a flex input buffer to scan the given data.
---->malloc character buffer.
---->set two \0 characters. (Flex wants two \0 characters after the
actual data.)
---->working in an unsafe encoding, the copy has multibyte sequences
replaced by FFs to avoid fooling the lexer rules.
****the encoding of input sql file is different from PGCLIENTENCODING, two
\0 characters are replaced by FFs. ****

---->yy_scan_buffer() //Setup the input buffer state to scan directly
from a user-specified character buffer.
****because two \0 characters are replaced by FFs,yy_scan_buffer() return
0. input buffer state can not setup correctly.****

-> psql_scan() //Do lexical analysis of SQL command text.
--> yylex() //The main scanner function which does all the work.
****because input buffer state is not setup,so when access the input
buffer state,segmentation fault is happened.****
----------------------------------------------------------------------

I modify src/bin/psql/psqlscan.l to resolve this problem.
The diff file refer to the attachment "psqlscan.l.patch".

Regards,
Jiang Guiqing

Attachments:

psqlscan.l.patchtext/plain; charset=Shift_JIS; name=psqlscan.l.patchDownload
diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l
index d32a12c..6c14298 100644
--- a/src/bin/psql/psqlscan.l
+++ b/src/bin/psql/psqlscan.l
@@ -1807,7 +1807,7 @@ prepare_buffer(const char *txt, int len, char **txtcopy)
 			/* first byte should always be okay... */
 			newtxt[i] = txt[i];
 			i++;
-			while (--thislen > 0)
+			while (--thislen > 0 && i < len)
 				newtxt[i++] = (char) 0xFF;
 		}
 	}

sqltext/plain; charset=Shift_JIS; name=sqlDownload
#2Tatsuo Ishii
ishii@postgresql.org
In reply to: JiangGuiqing (#1)
Re: [PATCH] Patch to fix a crash of psql

I confirmed the problem. Also I confirmed your patch fixes the
problem. In addition to this, all the tests in test/mb and
test/regress are passed.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

hi

When i test psql under multi-lingual and different encoding
environment,
I found a crash of psql.

----------------------------------------------------------------------
$ export PGCLIENTENCODING=SJIS
$ psql
psql (9.2rc1)
Type "help" for help.

postgres=# \i sql
CREATE DATABASE
You are now connected to database "mydb" as user "postgres".
CREATE SCHEMA
Segmentation fault (core dumped)
$
----------------------------------------------------------------------

I'm look into this problem and found that
only some especial character can cause psql crash.
conditions is:
1. some especial character
(my sql file contains japanese comment "-- コメント" . It can cause
psql crash.)
2. PGCLIENTENCODING is SJIS
3. the encoding of input sql file is UTF-8

I investigated this problem. The reasons are as follows.
----------------------------------------------------------------------
src/bin/psql/mainloop.c
-> psql_scan_setup() //Set up to perform lexing of the given input line.
-->prepare_buffer () //Set up a flex input buffer to scan the given data.
---->malloc character buffer.
---->set two \0 characters. (Flex wants two \0 characters after the
actual data.)
---->working in an unsafe encoding, the copy has multibyte sequences
replaced by FFs to avoid fooling the lexer rules.
****the encoding of input sql file is different from PGCLIENTENCODING, two
\0 characters are replaced by FFs. ****

---->yy_scan_buffer() //Setup the input buffer state to scan directly
from a user-specified character buffer.
****because two \0 characters are replaced by FFs,yy_scan_buffer() return
0. input buffer state can not setup correctly.****

-> psql_scan() //Do lexical analysis of SQL command text.
--> yylex() //The main scanner function which does all the work.
****because input buffer state is not setup,so when access the input
buffer state,segmentation fault is happened.****
----------------------------------------------------------------------

I modify src/bin/psql/psqlscan.l to resolve this problem.
The diff file refer to the attachment "psqlscan.l.patch".

Regards,
Jiang Guiqing

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tatsuo Ishii
ishii@postgresql.org
In reply to: JiangGuiqing (#1)
1 attachment(s)
Re: [PATCH] Patch to fix a crash of psql

1. some especial character
(my sql file contains japanese comment "-- コメント" . It can cause
psql crash.)
2. PGCLIENTENCODING is SJIS
3. the encoding of input sql file is UTF-8

Actually the problem can occur even when importing following 3 byte
UTF8 input file:

(in hexa, 0xe3, 0x83, 0x88)

In this paticular case, psql decides that the total character length is
5, not 3. Because it just looks at the each first byte by calling PQmblen:

0xe3 -> 1 bytes in SJIS
0x83 -> 2 bytes in SJIS
0x88 -> 2 bytes in SJIS
total: 5 bytes

which is apparently wrong and causes subsequent segfault. Note that it
is possible that "input file > psql decision" case as well if client
encoding is different from file encoding, which will not be good too.
I think we should detect the cases as much as possible and warn users,
rather than silently ignore that fact client encoding != file
encoding. I don't think we can detect it in a reliable way, but at
least we could check the cases above(sum of PQmblen is not equale to
buffer lenghth). So my proposal is, if prepare_buffer() detects
possible inconsistency between buffer encoding and file encoding, warn
user.

[t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres
Pager usage is off.
psql (9.3devel)
Type "help" for help.

postgres=# \i ~/sql
CREATE DATABASE
You are now connected to database "mydb" as user "t-ishii".
CREATE SCHEMA
psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding
CREATE TABLE

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Attachments:

psqlscan.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l
index d32a12c..a14d6fe 100644
--- a/src/bin/psql/psqlscan.l
+++ b/src/bin/psql/psqlscan.l
@@ -1808,7 +1808,29 @@ prepare_buffer(const char *txt, int len, char **txtcopy)
 			newtxt[i] = txt[i];
 			i++;
 			while (--thislen > 0)
+			{
+				if (i >= len)
+				{
+					/*
+					 * This could happen if cur_state->encoding is
+					 * different from input file encoding.
+					 */
+					psql_error("warning: possible conflict between client encoding %s and input file encoding\n",
+							   pg_encoding_to_char(cur_state->encoding));
+					break;
+				}
 				newtxt[i++] = (char) 0xFF;
+			}
+		}
+
+		if (i != len)
+		{
+			/*
+			 * This could happen if cur_state->encoding is
+			 * different from input file encoding.
+			 */
+			psql_error("warning: possible conflict between client encoding %s and input file encoding\n",
+					   pg_encoding_to_char(cur_state->encoding));
 		}
 	}
 
#4JiangGuiqing
jianggq@cn.fujitsu.com
In reply to: Tatsuo Ishii (#3)
Re: [PATCH] Patch to fix a crash of psql

buffer lenghth). So my proposal is, if prepare_buffer() detects
possible inconsistency between buffer encoding and file encoding, warn
user.

I agree with that.

On 2012/11/30 12:52, Tatsuo Ishii Wrote:

1. some especial character
(my sql file contains japanese comment "-- コメント" . It can cause
psql crash.)
2. PGCLIENTENCODING is SJIS
3. the encoding of input sql file is UTF-8

Actually the problem can occur even when importing following 3 byte
UTF8 input file:

(in hexa, 0xe3, 0x83, 0x88)

In this paticular case, psql decides that the total character length is
5, not 3. Because it just looks at the each first byte by calling PQmblen:

0xe3 -> 1 bytes in SJIS
0x83 -> 2 bytes in SJIS
0x88 -> 2 bytes in SJIS
total: 5 bytes

which is apparently wrong and causes subsequent segfault. Note that it
is possible that "input file > psql decision" case as well if client
encoding is different from file encoding, which will not be good too.
I think we should detect the cases as much as possible and warn users,
rather than silently ignore that fact client encoding != file
encoding. I don't think we can detect it in a reliable way, but at
least we could check the cases above(sum of PQmblen is not equale to
buffer lenghth). So my proposal is, if prepare_buffer() detects
possible inconsistency between buffer encoding and file encoding, warn
user.

[t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres
Pager usage is off.
psql (9.3devel)
Type "help" for help.

postgres=# \i ~/sql
CREATE DATABASE
You are now connected to database "mydb" as user "t-ishii".
CREATE SCHEMA
psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding
CREATE TABLE

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Tatsuo Ishii (#3)
Re: [PATCH] Patch to fix a crash of psql

Tatsuo Ishii wrote:

I think we should detect the cases as much as possible and warn users,
rather than silently ignore that fact client encoding != file
encoding. I don't think we can detect it in a reliable way, but at
least we could check the cases above(sum of PQmblen is not equale to
buffer lenghth). So my proposal is, if prepare_buffer() detects
possible inconsistency between buffer encoding and file encoding, warn
user.

[t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres
Pager usage is off.
psql (9.3devel)
Type "help" for help.

postgres=# \i ~/sql
CREATE DATABASE
You are now connected to database "mydb" as user "t-ishii".
CREATE SCHEMA
psql:/home/t-ishii/sql:7: warning: possible conflict between client

encoding SJIS and input file

encoding
CREATE TABLE

Comments?

I wonder about the "possible".

Could there be false positives with the test?
If yes, I don't like the idea.
If there is no possibility for false positives, I'd say
that the "possible" should go. Maybe it should even be
an error and no warning then.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Albe Laurenz (#5)
Re: [PATCH] Patch to fix a crash of psql

On 11/30/12 3:26 AM, Albe Laurenz wrote:

Tatsuo Ishii wrote:

postgres=# \i ~/sql
CREATE DATABASE
You are now connected to database "mydb" as user "t-ishii".
CREATE SCHEMA
psql:/home/t-ishii/sql:7: warning: possible conflict between client

encoding SJIS and input file

encoding
CREATE TABLE

Comments?

I wonder about the "possible".

Could there be false positives with the test?
If yes, I don't like the idea.
If there is no possibility for false positives, I'd say
that the "possible" should go. Maybe it should even be
an error and no warning then.

Yes, encoding mismatches are generally an error.

I think the message should be more precise. Nobody will know what an
"encoding conflict" is. The error condition is "last multibyte
character ran over end of file" or something like that, which should be
clearer.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#6)
Re: [PATCH] Patch to fix a crash of psql

Peter Eisentraut <peter_e@gmx.net> writes:

On 11/30/12 3:26 AM, Albe Laurenz wrote:

If there is no possibility for false positives, I'd say
that the "possible" should go. Maybe it should even be
an error and no warning then.

Yes, encoding mismatches are generally an error.

I think the message should be more precise. Nobody will know what an
"encoding conflict" is. The error condition is "last multibyte
character ran over end of file" or something like that, which should be
clearer.

TBH I think that a message here is unnecessary; it's sufficient to
ensure psql doesn't crash. The backend will produce a better message
than this anyway once the data gets there, and that way we don't have to
invent a new error recovery path inside psql. As-is, the patch just
creates the question of what to do after issuing the error.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#7)
Re: [PATCH] Patch to fix a crash of psql

Peter Eisentraut <peter_e@gmx.net> writes:

On 11/30/12 3:26 AM, Albe Laurenz wrote:

If there is no possibility for false positives, I'd say
that the "possible" should go. Maybe it should even be
an error and no warning then.

Yes, encoding mismatches are generally an error.

I think the message should be more precise. Nobody will know what an
"encoding conflict" is. The error condition is "last multibyte
character ran over end of file" or something like that, which should be
clearer.

TBH I think that a message here is unnecessary; it's sufficient to
ensure psql doesn't crash. The backend will produce a better message
than this anyway once the data gets there, and that way we don't have to
invent a new error recovery path inside psql. As-is, the patch just
creates the question of what to do after issuing the error.

Problem is, backend does not always detect errors.

For my example backend caches an error:
postgres=# \i ~/a
psql:/home/t-ishii/a:1: warning: possible conflict between client encoding SJIS and input file encoding
ERROR: invalid byte sequence for encoding "SJIS": 0x88
psql:/home/t-ishii/a:1: ERROR: invalid byte sequence for encoding "SJIS": 0x88

However for Jiang Guiqing's example, backend silently ignores error:
postgres=# \i ~/sql
CREATE DATABASE
You are now connected to database "mydb" as user "t-ishii".
CREATE SCHEMA
psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding
CREATE TABLE

IMO it is a benefit for users to detect such errors earlier.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tatsuo Ishii
ishii@postgresql.org
In reply to: Albe Laurenz (#5)
Re: [PATCH] Patch to fix a crash of psql

I wonder about the "possible".

Could there be false positives with the test?
If yes, I don't like the idea.

Yes, there could be false positives. It was just because the input
file was broken. In the real world, I think probably encoding mismatch
is the most possible cause, but this is not 100%.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tatsuo Ishii
ishii@postgresql.org
In reply to: Peter Eisentraut (#6)
Re: [PATCH] Patch to fix a crash of psql

I think the message should be more precise. Nobody will know what an
"encoding conflict" is. The error condition is "last multibyte
character ran over end of file" or something like that, which should be
clearer.

"last multibyte character ran over" is not precise at all because the
error was caused by each byte in the file confused PQmblen, not just
last multibyte character. However to explain those in the messgae is
too technical for users, I'm afraid. Maybe just "encoding mismatch
detected. please make sure that input file encoding is SJIS" or
something like that?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#8)
Re: [PATCH] Patch to fix a crash of psql

Tatsuo Ishii <ishii@postgresql.org> writes:

TBH I think that a message here is unnecessary; it's sufficient to
ensure psql doesn't crash. The backend will produce a better message
than this anyway once the data gets there, and that way we don't have to
invent a new error recovery path inside psql. As-is, the patch just
creates the question of what to do after issuing the error.

Problem is, backend does not always detect errors.

The reason it doesn't produce an error in Jiang Guiqing's example is
that the encoding error is in a comment and thus is never transmitted
to the backend at all. I don't see a big problem with that. If we
did have a problem with it, I think the better fix would be to stop
having psql strip comments from what it sends. (I've considered
proposing such a change anyway, in order that logged statements look
more like what the user typed.)

IMO it is a benefit for users to detect such errors earlier.

It is not a benefit for users to get randomly different (and less
informative) messages and different error behaviors for the same
problem.

I think we should just do this and call it good:

diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l
index d32a12c63c856625aa42c708b24d4b58e3cdd677..6c1429815f2eec597f735d18ea86d9c8c9f1f3a2 100644
*** a/src/bin/psql/psqlscan.l
--- b/src/bin/psql/psqlscan.l
*************** prepare_buffer(const char *txt, int len,
*** 1807,1813 ****
  			/* first byte should always be okay... */
  			newtxt[i] = txt[i];
  			i++;
! 			while (--thislen > 0)
  				newtxt[i++] = (char) 0xFF;
  		}
  	}
--- 1807,1813 ----
  			/* first byte should always be okay... */
  			newtxt[i] = txt[i];
  			i++;
! 			while (--thislen > 0 && i < len)
  				newtxt[i++] = (char) 0xFF;
  		}
  	}

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#2)
Re: [PATCH] Patch to fix a crash of psql

I confirmed the problem. Also I confirmed your patch fixes the
problem. In addition to this, all the tests in test/mb and
test/regress are passed.

Fix committed as you proposed(without any message I proposed). Thanks.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers