ComputeThreadPool
public protocol ComputeThreadPool
Allows efficient use of multi-core CPUs by managing a fixed-size collection of threads.
From first principles, a (CPU) compute-bound application runs at peak performance when overheads are minimized. Once enough parallelism is exposed to leverage all cores, among the key overheads to minimize are context switching and thread creation and destruction. The optimal system configuration is thus a fixed-size threadpool with exactly one thread per CPU core (or rather, per hyperthread). This configuration results in zero context switching, no additional kernel calls for thread creation and deletion, and full utilization of the hardware.
Unfortunately, in practice, it is infeasible to statically schedule work a priori onto a fixed pool of threads. Even when applying the same operation to a homogeneous dataset, there will inevitably be variability in execution time. (This can arise from I/O interrupts briefly taking over a core, page faults, or even different memory-access latencies across NUMA domains.) As a result, peak performance requires abstractions that are flexible and dynamic in their work allocation.
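The effect of that variability can be illustrated with a small, purely arithmetic sketch. The two helper functions and the task costs below are hypothetical and are not part of the library; they only compare the finish time of an up-front static split against a greedy "next free worker takes the next task" allocation.

```swift
/// Finish time when tasks are split into contiguous, equally sized blocks up front.
func staticMakespan(costs: [Int], workers: Int) -> Int {
    precondition(workers > 0)
    let chunk = (costs.count + workers - 1) / workers
    var worst = 0
    var start = 0
    while start < costs.count {
        let end = min(start + chunk, costs.count)
        worst = max(worst, costs[start..<end].reduce(0, +))
        start = end
    }
    return worst
}

/// Finish time when each task goes to whichever worker becomes free first.
func dynamicMakespan(costs: [Int], workers: Int) -> Int {
    precondition(workers > 0)
    var finishTimes = [Int](repeating: 0, count: workers)
    for cost in costs {
        // The earliest-free worker takes the next task.
        let idx = finishTimes.indices.min { finishTimes[$0] < finishTimes[$1] }!
        finishTimes[idx] += cost
    }
    return finishTimes.max() ?? 0
}

// Eight nominally identical tasks; one is slowed down (say, by a page fault).
let costs = [10, 10, 10, 40, 10, 10, 10, 10]
let staticTime = staticMakespan(costs: costs, workers: 4)    // 50
let dynamicTime = dynamicMakespan(costs: costs, workers: 4)  // 40
```

In this toy model the static split leaves three workers idle while one finishes the slow block, whereas the dynamic allocation is bounded only by the slow task itself.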
The ComputeThreadPool protocol is a foundational API designed to enable efficient use of hardware resources. There are two APIs exposed to support two kinds of parallelism. For additional details, please see the documentation associated with each.
Note: be sure to avoid executing code on the ComputeThreadPool that is not compute-bound. If you are doing I/O, be sure to use a dedicated threadpool, or use Swift NIO for high-performance non-blocking I/O.
Note: while there should be only one “physical” threadpool process-wide, there can be many virtual threadpools that compose on top of this one to allow configuration and tuning. (This is why ComputeThreadPool is a protocol and not a set of static methods.) Examples of additional threadpool abstractions could include a separate threadpool per NUMA domain, threadpools that support different task priorities, or higher-level parallelism primitives such as “wait-groups”.
See also
ComputeThreadPools
-
Schedules fn to be executed in the threadpool eventually.
Declaration
Swift
func dispatch(_ fn: @escaping () -> Void)
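Because fn runs "eventually", dispatch may return before fn has started, which is why the closure is marked @escaping. The sketch below uses a hypothetical single-threaded stand-in (DeferredPool, not part of the library) that only mimics this ordering; a real pool would run the work on its own threads.

```swift
/// Hypothetical single-threaded stand-in; it only mimics the
/// "runs eventually" contract of `dispatch`.
final class DeferredPool {
    private var pending: [() -> Void] = []

    /// Stores `fn` for later, which is why the closure must be `@escaping`.
    func dispatch(_ fn: @escaping () -> Void) {
        pending.append(fn)
    }

    /// Runs everything scheduled so far, including work scheduled by the tasks themselves.
    func runPending() {
        while !pending.isEmpty {
            let fn = pending.removeFirst()
            fn()
        }
    }
}

let pool = DeferredPool()
var log: [String] = []
pool.dispatch { log.append("task ran") }
log.append("dispatch returned")
pool.runPending()
// log is now ["dispatch returned", "task ran"]
```

Since dispatch offers no completion signal of its own, callers that need the result have to arrange their own synchronization.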
-
Executes a and b optionally in parallel; both are guaranteed to have finished executing before join returns.
Declaration
Swift
func join(_ a: () -> Void, _ b: () -> Void)
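A typical use of this shape is recursive divide-and-conquer. The following is a minimal sketch, assuming a serial stand-in named serialJoin (hypothetical; it runs a and then b, which is a valid behavior because parallelism is optional) and a hypothetical parallelSum helper.

```swift
/// Serial stand-in with the same shape as `join`.
func serialJoin(_ a: () -> Void, _ b: () -> Void) {
    a()
    b()
}

/// Sums `values[range]` by recursively splitting the range in two.
func parallelSum(_ values: [Int], _ range: Range<Int>) -> Int {
    // Small ranges are summed directly to keep per-task overhead low.
    if range.count <= 4 {
        return values[range].reduce(0, +)
    }
    let mid = range.lowerBound + range.count / 2
    var left = 0
    var right = 0
    // Each closure writes to its own variable, so the halves do not conflict.
    serialJoin(
        { left = parallelSum(values, range.lowerBound..<mid) },
        { right = parallelSum(values, mid..<range.upperBound) })
    // Both halves have finished here, so reading the results is safe.
    return left + right
}

let total = parallelSum(Array(1...100), 0..<100)  // 5050
```

The guarantee that both closures have finished before the call returns is what makes it safe to read left and right immediately afterwards.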
-
Executes a and b optionally in parallel; if one throws, it is unspecified whether the other will have started or completed executing. It is also unspecified as to which error will be thrown.
This is the throwing overload.
Declaration
Swift
func join(_ a: () throws -> Void, _ b: () throws -> Void) throws
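A caller of the throwing overload should therefore handle the error without assuming anything about the other closure. A minimal sketch, assuming a serial stand-in (serialThrowingJoin) and an illustrative error type (ValidationError), neither of which is part of the library:

```swift
/// Illustrative error type for this sketch.
struct ValidationError: Error {
    let message: String
}

/// Serial stand-in with the same shape as the throwing `join`.
/// A real pool may have started or finished `b` even when `a` throws.
func serialThrowingJoin(_ a: () throws -> Void, _ b: () throws -> Void) throws {
    try a()
    try b()
}

var caught: Error? = nil
var bRan = false
do {
    try serialThrowingJoin(
        { throw ValidationError(message: "bad input") },
        { bRan = true })
} catch {
    // Do not rely on the value of `bRan` here: with a real pool it is unspecified.
    caught = error
}
```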
-
A function to be invoked in parallel a specified number of times by parallelFor.
Declaration
Swift
typealias ParallelForBody = (_ currentInvocationIndex: Int, _ requestedInvocationCount: Int) -> Void
Parameters
currentInvocationIndex
the index of the invocation executing in the current thread.
requestedInvocationCount
the number of parallel invocations requested.
-
A function that can be executed in parallel.
Declaration
Swift
typealias ThrowingParallelForBody = (_ currentInvocationIndex: Int, _ requestedInvocationCount: Int) throws -> Void
Parameters
currentInvocationIndex
the index of the invocation executing in the current thread.
requestedInvocationCount
the number of parallel invocations requested.
-
A vectorized function that can be executed in parallel.
The first argument is the start index for the vectorized operation, and the second argument corresponds to the end of the range. The third argument contains the total size of the range.
Declaration
Swift
typealias VectorizedParallelForBody = (Int, Int, Int) -> Void
-
A vectorized function that can be executed in parallel.
The first argument is the start index for the vectorized operation, and the second argument corresponds to the end of the range. The third argument contains the total size of the range.
Declaration
Swift
typealias ThrowingVectorizedParallelForBody = (Int, Int, Int) throws -> Void
-
parallelFor(n:_:) (default implementation)
Returns after executing fn n times.
Default Implementation
Implements parallelFor(n:_:) (scalar) in terms of parallelFor(n:_:) (vectorized).
Declaration
Swift
func parallelFor(n: Int, _ fn: ParallelForBody)
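One way the scalar form can be layered on the vectorized form is sketched below. This is not the library's actual default implementation: vectorizedFor is a hypothetical serial stand-in with an arbitrary chunk size, and the sketch assumes the vectorized range is half-open (start inclusive, end exclusive).

```swift
typealias ParallelForBody = (_ currentInvocationIndex: Int, _ requestedInvocationCount: Int) -> Void
typealias VectorizedParallelForBody = (Int, Int, Int) -> Void

/// Serial stand-in for the vectorized requirement: covers 0..<n in chunks.
func vectorizedFor(n: Int, _ fn: VectorizedParallelForBody) {
    let chunk = 3  // arbitrary illustrative chunk size
    var start = 0
    while start < n {
        let end = min(start + chunk, n)
        fn(start, end, n)
        start = end
    }
}

/// Scalar loop expressed through the vectorized one:
/// each chunk is unrolled into one scalar call per index.
func scalarFor(n: Int, _ fn: ParallelForBody) {
    vectorizedFor(n: n) { start, end, total in
        for i in start..<end {
            fn(i, total)
        }
    }
}

var seen: [Int] = []
// Appending to a shared array is only safe here because the stand-in is serial.
scalarFor(n: 8) { i, _ in seen.append(i) }
// seen is now [0, 1, 2, 3, 4, 5, 6, 7]
```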
Parameters
n
The total number of times to execute fn.
-
Returns after executing fn an unspecified number of times, guaranteeing that fn has been called with parameters that perfectly cover the range 0..<n.
Declaration
Swift
func parallelFor(n: Int, _ fn: VectorizedParallelForBody)
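A vectorized body should work for any chunking of 0..<n, since the number and size of the chunks are unspecified. The sketch below uses a hypothetical serial stand-in (sketchParallelFor) that deliberately produces uneven chunks, and assumes half-open ranges (start inclusive, end exclusive).

```swift
typealias VectorizedParallelForBody = (Int, Int, Int) -> Void

/// Serial stand-in that covers 0..<n with chunks of width 1, 2, 4, ...
func sketchParallelFor(n: Int, _ fn: VectorizedParallelForBody) {
    var start = 0
    var width = 1
    while start < n {
        let end = min(start + width, n)
        fn(start, end, n)
        start = end
        width *= 2
    }
}

var data = [Double](repeating: 1.5, count: 10)
var calls = 0
sketchParallelFor(n: data.count) { start, end, _ in
    calls += 1  // safe only because the stand-in is serial
    // A tight inner loop over the chunk amortizes the per-call overhead.
    for i in start..<end {
        data[i] *= 2
    }
}
// Every element was doubled exactly once, in 4 calls: [0,1), [1,3), [3,7), [7,10).
```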
Parameters
n
The range of numbers 0..<n to cover.
-
Returns after executing fn n times.
Declaration
Swift
func parallelFor(n: Int, _ fn: ThrowingParallelForBody) throws
Parameters
n
The total number of times to execute fn.
-
Returns after executing fn an unspecified number of times, guaranteeing that fn has been called with parameters that perfectly cover the range 0..<n.
Declaration
Swift
func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForBody) throws
Parameters
n
The range of numbers 0..<n to cover.
-
The maximum number of concurrent threads of execution supported by this thread pool.
Declaration
Swift
var maxParallelism: Int { get }
-
Returns the index of the current thread in the pool, if running on a thread-pool thread, nil otherwise.
The return value is guaranteed to be either nil, or between 0 and maxParallelism.
Declaration
Swift
var currentThreadIndex: Int? { get }