delimiter inconsistency in generate-wait_event_types.pl
Hello,
I got bitten by an inconsistency introduced about two years ago. In
the script generate-wait_event_types.pl, the intermediate line format
is parsed using a regular expression that allows multiple tab
characters between fields. However, the fields are then extracted
using split(/\t/, ...), which assumes single-tab delimiters and
misassigns the fields when they are separated by multiple tabs, because
each extra tab produces an empty field. This leads to a somewhat
unclear error when processing input that should otherwise be valid
(*1):
    substr outside of string at ./generate-wait_event_types.pl line 243, <$wait_event_names> line 434.
Since the fields are already captured by the validation regex, using
$1, $2 and $3 instead of split() avoids the inconsistency and makes the
intent clearer. For consistency, the remaining split() calls in the sort
comparator were also changed from /\t/ to /\t+/.
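The mismatch can be reproduced in isolation with a short Perl sketch. The input is a hypothetical intermediate line (wait class, event name, description) modeled on the AioUringCompletion entry in the diff below, with two tabs before the description; the validation regex is the one used in the script.

```perl
use strict;
use warnings;

# Hypothetical intermediate line: two tabs between event name and description.
my $line = "WaitEventLWLock\tAioUringCompletion\t\t"
  . "\"Waiting for another process to complete IO via io_uring.\"";

# Validation regex from the script: accepts one or more tabs between fields.
die "invalid line: $line\n"
  unless $line =~ /^(\w+)\t+(\w+)\t+("\w.*\.")$/;

# Copy the captures right away, before any other regex operation runs.
my ($class_re, $event_re, $doc_re) = ($1, $2, $3);

# split(/\t/) returns an empty string for the "field" between the two tabs,
# so the description ends up as the fourth element and the third is empty.
my ($class_sp, $event_sp, $doc_sp) = split(/\t/, $line);

print "captures: [$class_re] [$event_re] [$doc_re]\n";
print "split:    [$class_sp] [$event_sp] [$doc_sp]\n";
```

With the captures, the description is assigned correctly. With split(/\t/), it is silently replaced by an empty string, which is the kind of value that makes a later substr() call complain.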
This is addressed in the attached patch.
regards.
*1:
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..ba551938ed7 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -405,7 +405,7 @@ SerialSLRU "Waiting to access the serializable transaction conflict SLRU cache."
SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
-AioUringCompletion "Waiting for another process to complete IO via io_uring."
+AioUringCompletion "Waiting for another process to complete IO via io_uring."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
Attachments:
v1-0001-Make-tab-delimiter-handling-consistent-in-generat.patch (text/x-patch; charset=us-ascii)
From 4520e71a9f064b1c01df045d0de878a9cf15b1aa Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 29 Jul 2025 11:45:21 +0900
Subject: [PATCH v1] Make tab delimiter handling consistent in
generate-wait_event_types.pl
Format validation and element extraction for intermediate line strings
are inconsistent in their handling of multiple tab delimiters, which
can result in an unclear error. Extract the elements using regex
captures from the validation regex instead of a separate split() to
avoid the inconsistency. Also replace \t with \t+ in the remaining
split() calls on the same strings for consistency.
---
src/backend/utils/activity/generate-wait_event_types.pl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 424ad9f115d..21abef860de 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -85,7 +85,7 @@ while (<$wait_event_names>)
# Sort the lines based on the second column.
# uc() is being used to force the comparison to be case-insensitive.
my @lines_sorted =
- sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines;
+ sort { uc((split(/\t+/, $a))[1]) cmp uc((split(/\t+/, $b))[1]) } @lines;
# If we are generating code, concat @lines_sorted and then
# @abi_compatibility_lines.
@@ -101,7 +101,7 @@ foreach my $line (@lines_sorted)
unless $line =~ /^(\w+)\t+(\w+)\t+("\w.*\.")$/;
(my $waitclassname, my $waiteventname, my $waitevendocsentence) =
- split(/\t/, $line);
+ ($1, $2, $3);
# Generate the element name for the enums based on the
# description. The C symbols are prefixed with "WAIT_EVENT_".
--
2.47.1
On 29 Jul 2025, at 06:56, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> I got bitten by an inconsistency introduced about two years ago. In
> the script generate-wait_event_types.pl, the intermediate line format
> is parsed using a regular expression that allows multiple tab
> characters between fields. However, the fields were later extracted
> using split(/\t/, ...), which assumes single-tab delimiters and fails
> when fields are separated by multiple tabs. This leads to a somewhat
> unclear error when processing input that should otherwise be valid
Nothing in the documentation for this explicitly states that multiple tab
characters are supported, so the alternative patch could be to remove support
for \t+. That being said, such a restriction seems artificial and I prefer
your approach.
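For comparison, the rejected alternative (dropping \t+ support) would amount to validating with exactly one tab between fields. A minimal sketch of that idea, using a hypothetical helper parse_strict() that is not part of the script:

```perl
use strict;
use warnings;

# Hypothetical strict parser: exactly one tab between fields.
# Returns the three fields on success, or an empty list if the line is rejected.
sub parse_strict {
    my ($line) = @_;
    return unless $line =~ /^(\w+)\t(\w+)\t("\w.*\.")$/;
    return ($1, $2, $3);
}

my @ok = parse_strict("WaitEventLWLock\tXactSLRU\t"
      . "\"Waiting to access the transaction status SLRU cache.\"");
my @bad = parse_strict("WaitEventLWLock\tAioUringCompletion\t\t"
      . "\"Waiting for another process to complete IO via io_uring.\"");

print scalar(@ok), " fields parsed\n";
print "multi-tab line rejected\n" unless @bad;
```

This would also make validation and extraction agree, but it turns harmless alignment tabs into a hard error, which is the artificial restriction the reply argues against.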
> Since the data was already captured via regex, using $1, $2 and $3
> instead of split() avoids the inconsistency and makes the intent
> clearer. A related adjustment was made elsewhere in the script to
> improve consistency.
+1, using the capture groups is clearly more readable.
While looking at this I noticed that the --docs option is incorrectly referred
to as --sgml in the usage output, which is fixed in 0002.
--
Daniel Gustafsson
Attachments:
v2-0002-Fix-incorrect-option-name-in-usage-screen.patch (application/octet-stream)
From 275d26712318d3934be1efd38bbf44bb8d841ed1 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 29 Jul 2025 09:48:56 +0200
Subject: [PATCH v2 2/2] Fix incorrect option name in usage screen
The usage screen incorrectly referred to the --docs option as --sgml.
Backpatch down to v17 where this script was introduced.
Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20250729.135638.1148639539103758555.horikyota.ntt@gmail.com
Backpatch-through: 17
---
src/backend/utils/activity/generate-wait_event_types.pl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 21abef860de..5db13419f25 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -334,12 +334,12 @@ close $wait_event_names;
sub usage
{
die <<EOM;
-Usage: perl [--output <path>] [--code ] [ --sgml ] input_file
+Usage: perl [--output <path>] [--code ] [ --docs ] input_file
Options:
--outdir Output directory (default '.')
--code Generate C and header files.
- --sgml Generate wait_event_types.sgml.
+ --docs Generate wait_event_types.sgml.
generate-wait_event_types.pl generates the SGML documentation and code
related to wait events. This should use wait_event_names.txt in input, or
--
2.39.3 (Apple Git-146)
v2-0001-Consistently-handle-tab-delimiters-for-wait-event.patch (application/octet-stream)
From e85d53dd98ad04069936acb03b37809ec493fd84 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 29 Jul 2025 09:39:38 +0200
Subject: [PATCH v2 1/2] Consistently handle tab delimiters for wait event
names
Format validation and element extraction for intermediate line
strings were inconsistent in their handling of tab delimiters,
which resulted in an unclear error when multiple tab characters
were used as a delimiter. This fixes it by using captures from
the validation regex instead of a separate split() to avoid the
inconsistency. Also, it ensures that \t+ is used consistently
when inspecting the strings.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20250729.135638.1148639539103758555.horikyota.ntt@gmail.com
---
src/backend/utils/activity/generate-wait_event_types.pl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 424ad9f115d..21abef860de 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -85,7 +85,7 @@ while (<$wait_event_names>)
# Sort the lines based on the second column.
# uc() is being used to force the comparison to be case-insensitive.
my @lines_sorted =
- sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines;
+ sort { uc((split(/\t+/, $a))[1]) cmp uc((split(/\t+/, $b))[1]) } @lines;
# If we are generating code, concat @lines_sorted and then
# @abi_compatibility_lines.
@@ -101,7 +101,7 @@ foreach my $line (@lines_sorted)
unless $line =~ /^(\w+)\t+(\w+)\t+("\w.*\.")$/;
(my $waitclassname, my $waiteventname, my $waitevendocsentence) =
- split(/\t/, $line);
+ ($1, $2, $3);
# Generate the element name for the enums based on the
# description. The C symbols are prefixed with "WAIT_EVENT_".
--
2.39.3 (Apple Git-146)
On 29 Jul 2025, at 10:08, Daniel Gustafsson <daniel@yesql.se> wrote:
> While looking at this I noticed that the --docs option is incorrectly referred
> to as --sgml in the usage output, which is fixed in 0002.
I was helpfully reminded about this thread and after taking another look at it
I went ahead and pushed it.
--
Daniel Gustafsson