psql include text file with bom

Started by Rick Parrish over 2 years ago | 2 messages | bugs
#1 Rick Parrish
ai5jt@unitrunker.net

*Summary:*

The psql "\include" (or "\i") command chokes on UTF-8 text files prefixed with a BOM.

*Steps to reproduce:*

1. Create a UTF-8 file that begins with the three-byte BOM 'EF BB BF'.
2. Include the file from psql via the "\include" or "\i" command.

Example output for file named "test.sql" below:

redacted-# \i test.sql
psql:test.sql:1: ERROR:  syntax error at or near ""
LINE 1: 
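
For anyone who wants to reproduce this without a BOM-writing editor, the file from step 1 can be generated with a short script. This is only a sketch: the helper name `write_bom_sql` and the sample statement `SELECT 1;` are illustrative choices, not part of the original report.

```python
import codecs
import os
import tempfile


def write_bom_sql(path, sql="SELECT 1;\n"):
    """Write `sql` as UTF-8 bytes, prefixed with the three-byte BOM EF BB BF."""
    data = codecs.BOM_UTF8 + sql.encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
    return data


# Write the file into a scratch directory (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), "test.sql")
data = write_bom_sql(path)
print(path)               # pass this path to \i in psql
print(data[:3].hex(" "))  # ef bb bf
```

Including the resulting file with \i should then trigger the error shown above, assuming the server rejects the leading character the same way.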

*Background*

https://en.wikipedia.org/wiki/Byte_order_mark

Some text editors save text to a file prefixed by a BOM (byte order mark).
These include Visual Studio, VSCode and others.

I think it would be reasonable for the include command to skip over any
BOM found in the first two or three bytes of a file.
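
Until something like that exists, the skipping can be done outside psql, before the file is included. The sketch below is a workaround under one stated assumption: the caller already knows the file is UTF-8. It does not describe anything psql does itself, and the name `strip_utf8_bom` is hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (EF BB BF), if one is present.

    Only safe when the caller knows the bytes are UTF-8; in another
    encoding the same three bytes may be ordinary data.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = codecs.BOM_UTF8 + b"SELECT 1;\n"
print(strip_utf8_bom(raw))             # b'SELECT 1;\n'
print(strip_utf8_bom(b"SELECT 1;\n"))  # b'SELECT 1;\n'
```

The cleaned bytes can be written to a new file and included as usual. Python's built-in "utf-8-sig" codec does the same stripping when decoding to text.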

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rick Parrish (#1)
Re: psql include text file with bom

Rick Parrish <ai5jt@unitrunker.net> writes:

> I think it would be reasonable for the include command to skip over any
> BOM found in the first two or three bytes of a file.

This has been proposed before, and rejected before. psql has no
inherent knowledge of what encoding an input file is in, and therefore
no justification to assume that a bit-pattern it sees there is a BOM.
In non-UTF8 encodings it could very easily be valid data.

(For that matter, it's also valid data in UTF8: it's the same bit
pattern as U+FEFF ZERO WIDTH NO-BREAK SPACE. Programs that emit
one into UTF8 streams, and expect it not to be taken as data,
are frankly broken.)

regards, tom lane
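
The point about the bit pattern being valid data can be checked directly. The following is a small illustration, not part of the original reply: it decodes the same three bytes under a few encodings (the choice of latin-1 and cp1252 as comparison encodings is an assumption made for the example).

```python
import unicodedata

BOM_BYTES = b"\xef\xbb\xbf"


def interpretations(data: bytes) -> dict:
    """Decode the same bytes under several encodings and return the results."""
    return {enc: data.decode(enc) for enc in ("utf-8", "latin-1", "cp1252")}


views = interpretations(BOM_BYTES)

# In UTF-8 the three bytes are a single code point, U+FEFF.
print(unicodedata.name(views["utf-8"]))  # ZERO WIDTH NO-BREAK SPACE

# In Latin-1 they are three ordinary printable characters.
print(ascii(views["latin-1"]))           # '\xef\xbb\xbf'
print(len(views["latin-1"]))             # 3
```

In Latin-1 and Windows-1252 the bytes decode to the text "ï»¿", which is why a reader that does not know the file's encoding cannot tell a BOM from data.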