FEAT: Improve interactions of user pools #1009
ColmTalbot wants to merge 16 commits into bilby-dev:main
Conversation
mj-will
left a comment
Looks great, I look forward to trying it! I've added some initial comments but might need to have a second look.
result = apply_conversion_function(
    result=result,
    likelihood=likelihood,
    conversion_function=conversion_function,
    npool=npool,
    pool=pool,
)
Will this always reapply the conversion function? If so, I'm unsure whether that's desirable: rather than bilby quickly exiting when a run is already done, it will spend time redoing the conversion. That said, I don't feel strongly about this, so happy to keep it. Thoughts?
It will reapply it; this maintains the current behaviour. I'd be open to changing that behaviour, maybe by adding a flag to the result file to say whether the conversion has been applied, but I would say to do that as a separate change.
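For illustration, a minimal sketch of that flag idea; the attribute name conversion_applied is hypothetical, not part of bilby:

# Skip the (possibly expensive) conversion on a completed run, using a
# hypothetical flag persisted alongside the rest of the result file.
if not getattr(result, "conversion_applied", False):
    result = apply_conversion_function(
        result=result,
        likelihood=likelihood,
        conversion_function=conversion_function,
        npool=npool,
        pool=pool,
    )
    result.conversion_applied = True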
    parameters=priors.sample(),
) as _pool:
    start_time = datetime.datetime.now()
    sampler.pool = _pool
Is this safe to do? Depending on how the sampler uses the pool, are there settings where the pool the sampler has stored is not updated?
For example, if the sampler has constructed a likelihood using the pool.map from the initial input pool, will this break?
I'm trying to work through what this would look like.
If the initial input pool is not None, all of the pool objects referenced here should be the same, so I don't think that will make a difference.
One potential issue is that if a specific sampler implementation handles the pool internally by itself, this will create two pools, and then in the best case we have a bunch of extra processes we don't need. I think this is what nessai does, so maybe we should game out that specific case.
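As a hedged illustration of that concern, a guard along these lines (names taken from the snippet above) would avoid attaching a second pool when the sampler already manages its own:

# Only hand the externally created pool to the sampler if it has not
# already set one up internally (e.g. a sampler like nessai that
# manages its own worker pool).
if getattr(sampler, "pool", None) is None:
    sampler.pool = _pool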
if (
    os.path.isfile(self.resume_file)
    and os.path.getsize(self.resume_file)
    and os.path.getsize(self.resume_file) > 5
Empty pickle files have nonzero size (5 bytes), so currently the code thinks an empty file is a valid resume file and then fails. Probably this should be fixed on main too.
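A minimal, self-contained demonstration of why a plain size check needs the > 5 guard (exact byte counts depend on the pickle protocol; 5 bytes corresponds to an empty dict under protocol 4/5 on recent CPython):

import os
import pickle
import tempfile

# Pickling an empty container still writes protocol and stop opcodes,
# so the file is small but nonzero and a size-only check passes.
with tempfile.NamedTemporaryFile(suffix=".pickle", delete=False) as ff:
    pickle.dump({}, ff)
    path = ff.name

print(os.path.getsize(path))  # 5 on recent CPython (protocol 5)
os.remove(path)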
    npool=None,
    pool=None,
    parameters=None,
):
I think this would benefit from a docstring with example usage.
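Purely as an illustration (the call signature is taken from the quoted diff; the surrounding names, like setup_pool, priors, and likelihood_fn, are hypothetical), the example-usage part of such a docstring might look like:

    Examples
    --------
    >>> with setup_pool(npool=4, parameters=priors.sample()) as pool:
    ...     log_likelihoods = pool.map(likelihood_fn, samples)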
    npool=None,
    pool=None,
    parameters=None,
):
This would benefit from a docstring. In particular, I think we should make clear what happens if npool is None or 1.
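A hypothetical sketch of the dispatch such a docstring should spell out, assuming create_pool returns None in the serial case (consistent with the "if my_pool is None: map_fn = map" fallback quoted below); the real function presumably also wires likelihood and parameters into the worker processes:

import multiprocessing


def create_pool(likelihood=None, npool=None, pool=None, parameters=None):
    """Return a worker pool for likelihood evaluation, or None to run serially.

    - pool is not None: the user-supplied pool is used as-is
    - npool is None or 1: return None and let callers fall back to builtin map
    - npool > 1: create a multiprocessing.Pool with npool processes
    """
    if pool is not None:
        return pool
    if npool is None or npool <= 1:
        return None
    return multiprocessing.Pool(processes=npool)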
my_pool = create_pool(likelihood=this_logl, npool=npool)
if my_pool is None:
    map_fn = map
else:
    map_fn = partial(my_pool.imap, chunksize=chunksize)
likelihood_fn = partial(_safe_likelihood_call, this_logl)

log_l = list(tqdm(
    map_fn(likelihood_fn, dict_samples[starting_index:]),
    desc='Computing likelihoods',
    total=n,
))
close_pool(my_pool)
Any reason to use open/close over the context manager?
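For reference, the context-manager form being suggested could be a thin wrapper over the existing helpers; a sketch, assuming close_pool (as in the diff above) is a no-op when given None:

from contextlib import contextmanager


@contextmanager
def pooled(likelihood=None, npool=None):
    # hypothetical wrapper around the create_pool/close_pool pair above,
    # guaranteeing the pool is closed even if the mapped call raises
    my_pool = create_pool(likelihood=likelihood, npool=npool)
    try:
        yield my_pool
    finally:
        close_pool(my_pool)

With that, the body above would become "with pooled(likelihood=this_logl, npool=npool) as my_pool: ..." and the explicit close_pool call disappears.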
if _pool is not None:
    subset_samples = _pool.map(
        _compute_per_detector_log_likelihoods,
        fill_args[ii: ii + block]
    )
else:
    from ..core.sampler.base_sampler import _sampling_convenience_dump
    _sampling_convenience_dump.likelihood = likelihood
    subset_samples = [
        list(_compute_per_detector_log_likelihoods(xx))
        for xx in fill_args[ii: ii + block]
    ]
Is it worth simplifying this with something like this?

if _pool is None:
    map_fn = map
    _sampling_convenience_dump.likelihood = likelihood
else:
    map_fn = _pool.map
subset_samples = map_fn(
    _compute_per_detector_log_likelihoods,
    fill_args[ii: ii + block]
)

if _pool is not None:
    subset_samples = _pool.map(fill_sample, fill_args[ii: ii + block])
else:
    subset_samples = [list(fill_sample(xx)) for xx in fill_args[ii: ii + block]]
Same comment about using map, but I'm not sure it's worth it. Thoughts?
I modified it to look closer to this, but I agree it probably isn't worth it in this case.
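For concreteness, one possible shape of that change (a sketch built from the snippet above, not the code as merged):

# Same map_fn dispatch as in the earlier suggestion, applied to fill_sample.
if _pool is None:
    map_fn = map
else:
    map_fn = _pool.map
subset_samples = [
    list(xx) for xx in map_fn(fill_sample, fill_args[ii: ii + block])
]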
ColmTalbot
left a comment
I don't remember exactly what the behaviour is for some of these questions, so I'll go back and check, and ideally write docstrings covering them.
I noticed that it was difficult to pass a user-specified pool through run_sampler, and it isn't used at all in the post-processing. This PR:

- makes it possible to pass a pool opened by the user, e.g., with multiprocessing.Pool(...), through run_sampler and into the post-processing
- adds Sampler._setup_pool and Sampler._close_pool helpers
- may make parts of parallel_bilby moot
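A hypothetical end-to-end example of the workflow this enables, assuming run_sampler accepts the user pool as a pool keyword; the flat likelihood and single-parameter prior are toy placeholders, not part of the PR:

import multiprocessing

import bilby


class FlatLikelihood(bilby.Likelihood):
    """Trivial likelihood so the example is self-contained."""

    def __init__(self):
        super().__init__(parameters={"x": None})

    def log_likelihood(self):
        return 0.0


priors = dict(x=bilby.core.prior.Uniform(0, 1, "x"))

if __name__ == "__main__":
    # The user owns the pool; bilby reuses it for sampling and,
    # with this PR, for the post-processing steps as well.
    with multiprocessing.Pool(processes=4) as pool:
        result = bilby.run_sampler(
            likelihood=FlatLikelihood(),
            priors=priors,
            sampler="dynesty",
            nlive=100,
            pool=pool,
        )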