initdb failed to initialize full text search dictionaries
This is a follow-up on bug #17356. PostgreSQL version 15 is also affected:
it is not able to build dictionaries from hunspell packages.
How to reproduce:
# Start a minimal, clean system
docker run -it --rm debian bash
apt update
apt install -y acl ca-certificates curl gzip libbsd0 libbz2-1.0 libc6 \
  libedit2 libffi7 libnettle8 libicu67 libreadline8 libgcc1 libgmp10 \
  libgnutls30 libhogweed6 libidn2-0 libldap-2.4-2 liblz4-1 liblzma5 \
  libncurses6 libp11-kit0 libpcre3 libsasl2-2 libsqlite3-0 libssl1.1 \
  libstdc++6 libtasn1-6 libtinfo6 libunistring2 libuuid1 libxml2 libxslt1.1 \
  libzstd1 locales procps tar zlib1g gnupg dumb-init curl
# Install hunspell-hu and PostgreSQL
apt install -y hunspell hunspell-hu
curl -s \
  https://salsa.debian.org/postgresql/postgresql-common/raw/master/pgdg/apt.postgresql.org.sh | bash
apt update
apt install -y postgresql-15
The last command "apt install -y postgresql-15" gives this error:
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
hu_hu
iconv: illegal input sequence at position 131
ERROR: Conversion of /usr/share/hunspell/hu_HU.aff failed
Removing obsolete dictionary files:
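To see which byte iconv objects to, one can look for the first offset at which the file stops being valid UTF-8. The Python sketch below does that with a strict UTF-8 decode. It assumes the file is meant to be UTF-8 and that iconv's "position" is a byte offset into the input; the helper names are hypothetical and not part of any Debian or PostgreSQL tool.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict mode stops at the first bad sequence
    except UnicodeDecodeError as exc:
        return exc.start  # offset of the first offending byte
    return None


def hex_context(data: bytes, offset: int, width: int = 8) -> str:
    """Hex-dump the bytes from offset - width up to (not including) offset + width."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")
```

For example, first_invalid_utf8_offset(open("/usr/share/hunspell/hu_HU.aff", "rb").read()) would be expected to agree with the 131 reported above if the byte-offset assumption holds, and hex_context() then shows the surrounding bytes.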
I'm not sure where the problem is. It may be in hunspell, hunspell-hu, iconv
or PostgreSQL. I have tried to find the root cause, but I failed.
At least it seems that it is NOT a bug in hunspell or hunspell-hu, because
the author of hunspell wrote this comment in 2018 at
https://github.com/hunspell/hunspell/issues/559#issuecomment-446335091
Not a bug: Hunspell's file format is not an UTF-8 encoded text file in
the case of SET UTF-8 with the default 8-bit FLAG.
That hunspell issue is only still open because "it is a valid request";
according to the author, it is nonetheless not a bug.
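The hunspell author's point can be illustrated with a made-up .aff fragment: the header declares SET UTF-8 and the suffix text is proper UTF-8, yet with the default 8-bit FLAG type each affix flag is a single raw byte, so a flag such as 0xF6 makes the file as a whole invalid UTF-8. The fragment is invented for illustration and is not taken from hu_HU.aff.

```python
# Invented .aff fragment (NOT the real hu_HU.aff): the header says SET UTF-8
# and the suffix text "ság" is proper UTF-8, but the affix flag is the single
# raw byte 0xF6, as the default 8-bit FLAG type allows.
FLAG = b"\xf6"
AFF_FRAGMENT = (
    b"SET UTF-8\n"
    + b"SFX " + FLAG + b" Y 1\n"
    + b"SFX " + FLAG + b" 0 " + "ság".encode("utf-8") + b" .\n"
)


def is_valid_utf8(data: bytes) -> bool:
    """True if the whole byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The suffix text alone is fine; the raw flag byte is what breaks UTF-8.
print(is_valid_utf8("ság".encode("utf-8")))  # True
print(is_valid_utf8(AFF_FRAGMENT))           # False
```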
So the problem might be in iconv itself, or in pg_updatedicts calling iconv
with the wrong parameters. I do not know enough to tell.
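It is not clear from this report how pg_updatedicts actually invokes iconv. Assuming, hypothetically, that it takes the source encoding from the file's SET line and converts to UTF-8, that step can be sketched in Python as below, with Python's codecs standing in for iconv; declared_encoding and convert_aff are hypothetical names. Under that assumption, a SET UTF-8 file with 8-bit flags is "converted" from UTF-8 to UTF-8 and still fails, because its bytes were never valid UTF-8.

```python
from typing import Optional


def declared_encoding(aff: bytes) -> Optional[str]:
    """Return the value of the first 'SET <encoding>' line, if any."""
    for line in aff.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == b"SET":
            return parts[1].decode("ascii")
    return None


def convert_aff(aff: bytes) -> bytes:
    """Hypothetical SET-driven conversion to UTF-8 (codecs stand in for iconv).

    Raises UnicodeDecodeError when the bytes do not match the declared
    encoding, roughly what iconv's "illegal input sequence" reports, and
    LookupError when Python does not know the encoding name.
    """
    source = declared_encoding(aff)
    if source is None:
        raise ValueError("no SET line found")
    return aff.decode(source).encode("utf-8")
```

An ISO8859-2 file converts cleanly this way, while an invented fragment such as b"SET UTF-8\nSFX \xf6 Y 1\n" raises UnicodeDecodeError, which mirrors the failure in the log.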
The effect of this bug is that PostgreSQL is not able to use the hunspell
dictionaries for full text search
(https://www.postgresql.org/docs/15/textsearch-dictionaries.html). I have
not tried ispell or myspell yet, but they are old (ancient, actually) and
hunspell should be preferred. I think this bug has been around for at
least 5 years (since 2018).
Regards,
Laszlo Zsolt Nagy
Les <nagylzs@gmail.com> writes:
> The last command "apt install -y postgresql-15" gives this error:
> Building PostgreSQL dictionaries from installed myspell/hunspell packages...
> hu_hu
> iconv: illegal input sequence at position 131
> ERROR: Conversion of /usr/share/hunspell/hu_HU.aff failed
Sadly, I do not think any of the moving parts there are under the PG
project's control. We certainly can't fix problems in either hunspell
or iconv, and even the fact that iconv is being applied during install
is not something the core project does. I gather that this is
something the Debian packaging of postgres is attempting, so I'd
suggest taking it up with those packagers. It's possible that it's
something easy like they have the wrong idea of what encoding that
particular file is in. Or maybe the best answer is to skip any
files that fail conversion, without aborting the package install
entirely. But we here on pgsql-bugs can't help you.
regards, tom lane
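The "skip any files that fail conversion" idea from the reply above could look roughly like the Python sketch below: each dictionary is converted on its own, failures are reported and collected, and the loop carries on instead of aborting the install. This is only an illustration, not how the Debian packaging currently behaves; build_dictionaries and utf8_only are hypothetical names, and a strict UTF-8 round-trip stands in for the real iconv call.

```python
from typing import Callable, Dict, List, Tuple


def build_dictionaries(
    sources: Dict[str, bytes],
    convert: Callable[[bytes], bytes],
) -> Tuple[Dict[str, bytes], List[str]]:
    """Convert each dictionary; skip (and record) the ones that fail."""
    built: Dict[str, bytes] = {}
    skipped: List[str] = []
    for name, data in sorted(sources.items()):
        try:
            built[name] = convert(data)
        except (UnicodeError, LookupError) as exc:
            # Report and move on rather than failing the whole install.
            print(f"WARNING: skipping {name}: {exc}")
            skipped.append(name)
    return built, skipped


def utf8_only(data: bytes) -> bytes:
    """Stand-in converter: accept only input that is already valid UTF-8."""
    return data.decode("utf-8").encode("utf-8")
```

Called with a valid en_us entry and an hu_hu entry containing a raw 0xF6 flag byte, it prints a warning for hu_hu, returns only en_us as built, and reports ["hu_hu"] as skipped.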