CI speed improvements for FreeBSD
Hi,
Here are a couple of changes that got FreeBSD down to 4:29 total, 2:40
in test_world in my last run (over 2x speedup), using a RAM disk
backed by a swap partition, and more CPUs. It's still a regular UFS
file system, but FreeBSD is not as good as Linux at avoiding I/O around
short-lived files and directories: it can get hung up on a bunch of
synchronous I/O, and also flushes disk caches for those writes,
without an off switch.
I don't know about Windows, but I suspect the same applies there, i.e.
synchronous I/O blocking system calls around our blizzard of file
creations and unlinks. Anyone know how to try it?
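
One rough way to see the effect described above on a given system is a
create/unlink micro-benchmark. The sketch below is not part of the
patches: the churn helper, the default count of 2000 and the temporary
directory fallback are assumptions made purely for illustration.

```sh
#!/bin/sh
# Rough micro-benchmark: create and remove many short-lived files and
# directories, then report the elapsed wall-clock time.
# "churn" is a hypothetical helper, not something in the PostgreSQL tree.
set -e

churn() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        mkdir "$dir/d$i"        # short-lived directory
        echo x > "$dir/d$i/f"   # short-lived file
        rm "$dir/d$i/f"
        rmdir "$dir/d$i"
        i=$((i + 1))
    done
}

# First argument: directory to test (defaults to a fresh temp directory).
# Second argument: number of create/unlink cycles.
target=${1:-$(mktemp -d)}
count=${2:-2000}

start=$(date +%s)
churn "$target" "$count"
end=$(date +%s)
echo "$count create/unlink cycles in $target: $((end - start))s"
```

Running it once against a directory on the regular disk and once against
the RAM disk mount should give a first impression of the difference.
date +%s only has one-second resolution, so the count may need to be
raised until the runs take several seconds.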
Attachments:
0001-ci-Use-a-RAM-disk-on-FreeBSD.patch (text/x-patch)
From d47d01edbbf88d1cfc5fa2c48024e3bc85b52eae Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sun, 27 Aug 2023 11:01:24 +1200
Subject: [PATCH 1/2] ci: Use a RAM disk on FreeBSD.
Run the tests in a RAM disk. It's still a UFS file system and is backed
by 20GB of disk, but it avoids a lot of I/O. This shaves over a minute
off the test_world time, and scales better.
---
src/tools/ci/gcp_freebsd_repartition.sh | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/src/tools/ci/gcp_freebsd_repartition.sh b/src/tools/ci/gcp_freebsd_repartition.sh
index 2d5e173899..91c0e7f93c 100755
--- a/src/tools/ci/gcp_freebsd_repartition.sh
+++ b/src/tools/ci/gcp_freebsd_repartition.sh
@@ -3,26 +3,24 @@
set -e
set -x
-# The default filesystem on freebsd gcp images is very slow to run tests on,
-# due to its 32KB block size
-#
-# XXX: It'd probably better to fix this in the image, using something like
-# https://people.freebsd.org/~lidl/blog/re-root.html
-
# fix backup partition table after resize
gpart recover da0
gpart show da0
-# kill swap, so we can delete a partition
-swapoff -a || true
-# (apparently we can only have 4!?)
+
+# delete and re-add swap partition with expanded size
+swapoff -a
gpart delete -i 3 da0
-gpart add -t freebsd-ufs -l data8k -a 4096 da0
+gpart add -t freebsd-swap -l swapfs -a 4096 da0
gpart show da0
-newfs -U -b 8192 /dev/da0p3
+swapon -a
+
+# create a file system on a memory disk backed by swap, to minimize I/O
+mdconfig -a -t swap -s20G -u md1
+newfs -b 8192 -U /dev/md1
-# Migrate working directory
+# migrate working directory
du -hs $CIRRUS_WORKING_DIR
mv $CIRRUS_WORKING_DIR $CIRRUS_WORKING_DIR.orig
mkdir $CIRRUS_WORKING_DIR
-mount -o noatime /dev/da0p3 $CIRRUS_WORKING_DIR
+mount -o noatime /dev/md1 $CIRRUS_WORKING_DIR
cp -r $CIRRUS_WORKING_DIR.orig/* $CIRRUS_WORKING_DIR/
--
2.41.0
0002-ci-Use-more-CPUs-on-FreeBSD.patch (text/x-patch)
From ee51fc23a3297e1acdeffc5b4ca09129ba7c889a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sun, 27 Aug 2023 12:09:39 +1200
Subject: [PATCH 2/2] ci: Use more CPUs on FreeBSD.
Reduce test_world time by using all available CPUs.
---
.cirrus.tasks.yml | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e137769850..69e2bcb7ad 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -126,11 +126,9 @@ task:
name: FreeBSD - 13 - Meson
env:
- # FreeBSD on GCP is slow when running with larger number of CPUS /
- # jobs. Using one more job than cpus seems to work best.
- CPUS: 2
- BUILD_JOBS: 3
- TEST_JOBS: 3
+ CPUS: 4
+ BUILD_JOBS: 4
+ TEST_JOBS: 8
IMAGE_FAMILY: pg-ci-freebsd-13
DISK_SIZE: 50
--
2.41.0
And after adding this to the commitfest, here's the first cfbot run.
The gain was due to "test_world" which shows a greater-than-2x speedup
(~4:30 -> ~2:08) from 2x CPUs. That is nice for humans who want the
answer as soon as possible, but note that the resource usage cost
might go up because of the non-parallel parts now wasting more idle
CPUs: git clone, meson configure etc (as they do on every platform).
Hi!
I looked at the changes and I liked them. Here are my thoughts:
0001:
1. I think using a RAM disk is a good idea. It's still UFS, so we lose
nothing in terms of testing, but gain significantly in speed.
2. The change from "swapoff -a || true" to "swapoff -a" is legitimate in
my view, since it's better to fail explicitly than to silence a possible
problem.
3. The man page says that lowercase suffixes should be used for mdconfig.
In fact, either lowercase or uppercase works; this is visible in
mdconfig.c: "else if (*p == 'g' || *p == 'G')".
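
To illustrate point 3, here is a minimal shell sketch of case-insensitive
size-suffix handling in the spirit of the condition quoted above. It is
not mdconfig's actual code: size_to_bytes is a hypothetical helper, only
the g/G case comes from the quoted source line, and the k and m cases are
assumed to behave analogously.

```sh
#!/bin/sh
# Sketch of case-insensitive size suffixes, mirroring the quoted
# "*p == 'g' || *p == 'G'" test. size_to_bytes is a hypothetical helper.
# A bare number without a suffix is rejected to keep the sketch small.
size_to_bytes() {
    num=${1%[A-Za-z]}     # strip one trailing letter, if present
    suffix=${1#"$num"}    # the letter that was stripped (may be empty)
    case $suffix in
        k|K) echo $((num * 1024)) ;;
        m|M) echo $((num * 1024 * 1024)) ;;
        g|G) echo $((num * 1024 * 1024 * 1024)) ;;
        *)   echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Both spellings yield the same byte count.
size_to_bytes 20G
size_to_bytes 20g
```

So "-s20G" in the patch and "-s20g" as the man page suggests should be
treated the same way.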
0002:
1. The resource usage will certainly be a bit higher but, unless I'm
missing something, not drastically so. That said, I do not know how to
measure this increase to get concrete values.
2. Also consider the potential benefits of increasing the number of test
jobs: more concurrent processes, more interactions, better test coverage.
Here are my runs:
FreeBSD @master
https://cirrus-ci.com/task/4934701194936320
Run test_world 05:56
FreeBSD @master + 0001
https://cirrus-ci.com/task/5921385306914816
Run test_world 05:06
FreeBSD @master + 0001, + 0002
https://cirrus-ci.com/task/5635288945393664
Run test_world 02:20
For comparison
Debian @master
https://cirrus-ci.com/task/5143705577848832
Run test_world 02:38
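
As a rough take on the resource-usage question raised under 0002, the
FreeBSD timings above can be turned into speedups and "CPU-seconds" (wall
time multiplied by the number of provisioned CPUs). This is only a sketch:
it covers the test_world step alone, the CPU counts (2 before 0002, 4
after) are taken from the .cirrus.tasks.yml diff, and to_seconds and
report are hypothetical helpers. Wall time times CPUs is a proxy for what
the step occupies, not a measurement of actual CPU utilization.

```sh
#!/bin/sh
# Speedup and provisioned CPU-seconds for the test_world step only.

# Convert "MM:SS" to seconds.
to_seconds() {
    min=${1%%:*}
    sec=${1##*:}
    # Drop one leading zero so "08" or "09" is not parsed as octal.
    min=${min#0}
    sec=${sec#0}
    echo $((${min:-0} * 60 + ${sec:-0}))
}

# report LABEL MM:SS CPUS, relative to the 05:56 master baseline.
report() {
    base=$(to_seconds 05:56)
    secs=$(to_seconds "$2")
    speedup=$((base * 100 / secs))    # hundredths, integer arithmetic
    printf '%-24s %4ds  %d.%02dx  %4d CPU-seconds\n' \
        "$1" "$secs" $((speedup / 100)) $((speedup % 100)) $((secs * $3))
}

report "master"               05:56 2
report "master + 0001"        05:06 2
report "master + 0001 + 0002" 02:20 4
```

With these inputs the test_world step goes from 356s to 306s to 140s,
i.e. 1.00x, 1.16x and 2.54x, while the provisioned CPU-seconds for that
step go from 712 to 612 to 560. By this measure test_world itself does
not get more expensive; any extra cost would come from the serial steps
(git clone, meson configure etc.) that now run with more idle CPUs.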
Overall, I consider these changes useful. CI runs faster, with better
test coverage, in exchange for a presumably slight increase in resource
usage, which I don't think will be significant.
--
Best regards,
Maxim Orlov.