Add support for automatically updating Unicode derived files
Continuing the discussion from [0]/messages/by-id/bbb19114-af1e-513b-08a9-61272794bd5c@2ndquadrant.com and [1]/messages/by-id/77f69366-ca31-6437-079f-47fce69bae1b@2ndquadrant.com, here is a patch that
automates the process of updating Unicode derived files. Summary:
- Edit UNICODE_VERSION and/or CLDR_VERSION in src/Makefile.global.in
- Run make update-unicode
- Commit
I have added that to the release checklist in RELEASE_NOTES.
This also includes the script used in [0]/messages/by-id/bbb19114-af1e-513b-08a9-61272794bd5c@2ndquadrant.com that was not committed at that
time. Other than that, this just refactors existing build code.
Open questions that are currently not handled consistently:
- Should the downloaded files be listed in .gitignore?
- Should the downloaded files be cleaned by make clean (or distclean or
maintainer-clean or none)?
- Should the generated files be excluded from pgindent? Currently, the
generated files will not pass pgindent unchanged, so that could cause
annoying whitespace battles when these files are updated and re-indented
around release time.
[0]: /messages/by-id/bbb19114-af1e-513b-08a9-61272794bd5c@2ndquadrant.com
/messages/by-id/bbb19114-af1e-513b-08a9-61272794bd5c@2ndquadrant.com
[1]: /messages/by-id/77f69366-ca31-6437-079f-47fce69bae1b@2ndquadrant.com
/messages/by-id/77f69366-ca31-6437-079f-47fce69bae1b@2ndquadrant.com
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Add-support-for-automatically-updating-Unicode-deriv.patchtext/plain; charset=UTF-8; name=0001-Add-support-for-automatically-updating-Unicode-deriv.patch; x-mac-creator=0; x-mac-type=0Download+300-87
On Tue, Oct 29, 2019 at 6:06 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
Continuing the discussion from [0] and [1], here is a patch that
automates the process of updating Unicode derived files. Summary:- Edit UNICODE_VERSION and/or CLDR_VERSION in src/Makefile.global.in
- Run make update-unicode
- Commit
Hi Peter,
I gave "make update-unicode" a try. It's unclear to me what the state
of the build tree should be when a maintainer runs this, so I'll just
report what happens when running naively (on MacOS).
After only running configure, "make update-unicode" gives this error
at normalization-check:
ld: library not found for -lpgcommon
clang: error: linker command failed with exit code 1 (use -v to see invocation)
After commenting that out, the next command "$(MAKE) -C
contrib/unaccent $@" failed, seemingly because $(PYTHON) is empty
unless --with-python was specified at configure time.
Open questions that are currently not handled consistently:
- Should the downloaded files be listed in .gitignore?
These files are transient byproducts of a build, and we don't want
them committed, so they seem like a normal candidate for .gitignore.
- Should the downloaded files be cleaned by make clean (or distclean or
maintainer-clean or none)?
It seems one would want to make clean without removing these files,
and maintainer clean is for removing things that are preserved in
distribution tarballs. So I would go with distclean.
- Should the generated files be excluded from pgindent? Currently, the
generated files will not pass pgindent unchanged, so that could cause
annoying whitespace battles when these files are updated and re-indented
around release time.
I see what you mean in the norm table header. I think generated files
should not be pgindent'd, since creating them is already a consistent,
mechanical process, and their presentation is not as important as
other code.
Other comments:
+print "/* generated by
src/common/unicode/generate-unicode_combining_table.pl, do not edit
*/\n\n";
I would print out the full boilerplate like for other generated headers.
Lastly, src/common/unicode/README is outdated (and possibly no longer
useful at all?).
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-12-19 23:48, John Naylor wrote:
I gave "make update-unicode" a try. It's unclear to me what the state
of the build tree should be when a maintainer runs this, so I'll just
report what happens when running naively (on MacOS).
Yeah, that wasn't fully thought through, it appears.
After only running configure, "make update-unicode" gives this error
at normalization-check:ld: library not found for -lpgcommon
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Fixed by adding more make dependencies.
After commenting that out, the next command "$(MAKE) -C
contrib/unaccent $@" failed, seemingly because $(PYTHON) is empty
unless --with-python was specified at configure time.
I'm not sure whether that's worth addressing.
Open questions that are currently not handled consistently:
- Should the downloaded files be listed in .gitignore?
These files are transient byproducts of a build, and we don't want
them committed, so they seem like a normal candidate for .gitignore.
OK done
- Should the downloaded files be cleaned by make clean (or distclean or
maintainer-clean or none)?It seems one would want to make clean without removing these files,
and maintainer clean is for removing things that are preserved in
distribution tarballs. So I would go with distclean.
also done
- Should the generated files be excluded from pgindent? Currently, the
generated files will not pass pgindent unchanged, so that could cause
annoying whitespace battles when these files are updated and re-indented
around release time.I see what you mean in the norm table header. I think generated files
should not be pgindent'd, since creating them is already a consistent,
mechanical process, and their presentation is not as important as
other code.
I've left it alone for now because the little indentation problem
currently present might actually go away with my Unicode normalization
support patch.
Other comments:
+print "/* generated by
src/common/unicode/generate-unicode_combining_table.pl, do not edit
*/\n\n";I would print out the full boilerplate like for other generated headers.
Hmm, you are probably comparing with
src/common/unicode/generate-unicode_norm_table.pl, but other file
generating scripts around the tree print out a small header in the style
that I have. I'd rather adjust the output of
generate-unicode_norm_table.pl to match those. (It's also not quite
correct to make copyright claims about automatically generated output.)
Lastly, src/common/unicode/README is outdated (and possibly no longer
useful at all?).
updated
new patch attached
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v2-0001-Add-support-for-automatically-updating-Unicode-de.patchtext/plain; charset=UTF-8; name=v2-0001-Add-support-for-automatically-updating-Unicode-de.patch; x-mac-creator=0; x-mac-type=0Download+310-95
On Thu, Dec 26, 2019 at 12:39 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2019-12-19 23:48, John Naylor wrote:
I would print out the full boilerplate like for other generated headers.
Hmm, you are probably comparing with
src/common/unicode/generate-unicode_norm_table.pl, but other file
generating scripts around the tree print out a small header in the style
that I have. I'd rather adjust the output of
generate-unicode_norm_table.pl to match those. (It's also not quite
correct to make copyright claims about automatically generated output.)
Hmm, the scripts I'm most familiar with have full headers. Your point
about copyright makes sense, and using smaller file headers would aid
readability of the scripts, but I also see how others may feel
differently.
v2 looks good to me, marked ready for committer.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-01-03 15:13, John Naylor wrote:
On Thu, Dec 26, 2019 at 12:39 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:On 2019-12-19 23:48, John Naylor wrote:
I would print out the full boilerplate like for other generated headers.
Hmm, you are probably comparing with
src/common/unicode/generate-unicode_norm_table.pl, but other file
generating scripts around the tree print out a small header in the style
that I have. I'd rather adjust the output of
generate-unicode_norm_table.pl to match those. (It's also not quite
correct to make copyright claims about automatically generated output.)Hmm, the scripts I'm most familiar with have full headers. Your point
about copyright makes sense, and using smaller file headers would aid
readability of the scripts, but I also see how others may feel
differently.v2 looks good to me, marked ready for committer.
Committed, thanks.
I have added a little tweak so that it works also without --with-python,
to avoid gratuitous annoyances.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
Committed, thanks.
This patch is making src/tools/pginclude/headerscheck unhappy:
./src/include/common/unicode_combining_table.h:3: error: array type has incomplete element type
I guess that header needs another #include, or else you need to
move some declarations around.
regards, tom lane
On 2020-01-15 01:37, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
Committed, thanks.
This patch is making src/tools/pginclude/headerscheck unhappy:
./src/include/common/unicode_combining_table.h:3: error: array type has incomplete element type
I guess that header needs another #include, or else you need to
move some declarations around.
Hmm, this file is only meant to be included inside one particular
function. Making it standalone includable would seem to be unnecessary.
What should we do?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
On 2020-01-15 01:37, Tom Lane wrote:
This patch is making src/tools/pginclude/headerscheck unhappy:
./src/include/common/unicode_combining_table.h:3: error: array type has incomplete element type
Hmm, this file is only meant to be included inside one particular
function. Making it standalone includable would seem to be unnecessary.
What should we do?
Well, we could make it a documented exception in headerscheck and
cpluspluscheck.
regards, tom lane
On 2020-01-20 16:43, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
On 2020-01-15 01:37, Tom Lane wrote:
This patch is making src/tools/pginclude/headerscheck unhappy:
./src/include/common/unicode_combining_table.h:3: error: array type has incomplete element typeHmm, this file is only meant to be included inside one particular
function. Making it standalone includable would seem to be unnecessary.
What should we do?Well, we could make it a documented exception in headerscheck and
cpluspluscheck.
OK, done.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I have committed the first Unicode data update using this new "make
update-unicode" facility.
CLDR is released regularly every 6 months, so around this time every
year would be the appropriate time to pull in the latest updates in
preparation for our own release.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services