Character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"

Started by Zhongpu Chen | 23 days ago | 4 messages | bugs
#1 Zhongpu Chen
chenloveit@gmail.com

## Description

Legacy encodings such as EUC_CN accept some invalid byte sequences on
INSERT, which then cause errors during SELECT operations.

## How to reproduce

```shell
createdb -E EUC_CN -T template0 --locale=C demo_euc_cn_db
```

```sql
demo_euc_cn_db=# CREATE TABLE t(id int, s varchar(10));

demo_euc_cn_db=# INSERT INTO t VALUES(1, E'\xA2\xA3');
INSERT 0 1
demo_euc_cn_db=# SELECT * FROM t WHERE id = 1;
ERROR: character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"
```

--
Zhongpu Chen

#2 Junwang Zhao
zhjwpku@gmail.com
In reply to: Zhongpu Chen (#1)
Re: Character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"

On Fri, May 1, 2026 at 9:59 PM Zhongpu Chen <chenloveit@gmail.com> wrote:

> demo_euc_cn_db=# SELECT * FROM t WHERE id = 1;
> ERROR: character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"

Can you try the following statement before select?

```sql
SET client_encoding TO 'EUC_CN';
```

--
Regards
Junwang Zhao

#3 Zhongpu Chen
chenloveit@gmail.com
In reply to: Junwang Zhao (#2)
Re: Character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"

```
demo_euc_cn_db=# SET client_encoding TO 'EUC_CN';
SET
demo_euc_cn_db=# SELECT * FROM t WHERE id = 1;
 id | s
----+----
  1 | ��
(1 row)
```

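Since 0xA2A3 is not an assigned character in EUC-CN, it cannot be mapped
to any meaningful character. Currently, the EUC_CN check accepts any
2-byte sequence within A1-EF, but this coarse-grained approach is flawed:
the value is stored without complaint and only fails later, when it has
to be converted to another encoding.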
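
To illustrate the gap, here is a minimal Python sketch that contrasts a range-only check with a lookup against the assigned characters. It is not PostgreSQL's code: `coarse_range_ok` and `strict_gb2312_ok` are hypothetical helpers, the A1-EF bounds are simply the ones quoted in this message, and Python's built-in `gb2312` codec stands in for a real conversion table.

```python
# Bounds as stated in this thread (A1-EF); an assumption, not checked
# against the PostgreSQL source.
LO, HI = 0xA1, 0xEF


def coarse_range_ok(seq: bytes) -> bool:
    """Hypothetical range-only check: two bytes, each within [LO, HI]."""
    return len(seq) == 2 and all(LO <= b <= HI for b in seq)


def strict_gb2312_ok(seq: bytes) -> bool:
    """Stricter check: the bytes must decode to assigned GB2312 characters.

    Uses Python's built-in 'gb2312' codec as a stand-in for a mapping table.
    """
    try:
        seq.decode("gb2312")
        return True
    except UnicodeDecodeError:
        return False


sample = bytes([0xA2, 0xA3])
print("range check:", coarse_range_ok(sample))
print("strict check:", strict_gb2312_ok(sample))
```

With these assumptions, the range check accepts 0xA2 0xA3 while the codec lookup rejects it, which matches the behavior above: the INSERT succeeds and the later conversion to UTF8 fails.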

--
Zhongpu Chen

#4 Junwang Zhao
zhjwpku@gmail.com
In reply to: Zhongpu Chen (#3)
Re: Character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"

On Sat, May 2, 2026 at 12:09 AM Zhongpu Chen <chenloveit@gmail.com> wrote:

> Since 0xA2A3 is invalid in EUC-CN, it cannot be mapped to any meaningful
> character. Currently, EUC-CN allows all 2-byte within A1-EF, but this
> coarse-grained approach is flawed.

This seems more like a feature request than a bug. It would make sense
to close the bug report and start a discussion on the hackers mailing
list instead.

--
Regards
Junwang Zhao