ComputeThreadPool

public protocol ComputeThreadPool

Allows efficient use of multi-core CPUs by managing a fixed-size collection of threads.

From first principles, a CPU compute-bound application runs at peak performance when overheads are minimized. Once enough parallelism is exposed to use all cores, the key overheads to minimize are context switching and thread creation and destruction. The optimal system configuration is thus a fixed-size threadpool with exactly one thread per CPU core (or rather, per hyperthread). This configuration results in zero context switching, no additional kernel calls for thread creation and deletion, and full utilization of the hardware.

Unfortunately, in practice, it is infeasible to statically schedule work a priori onto a fixed pool of threads. Even when applying the same operation to a homogeneous dataset, there will inevitably be variability in execution time. (This can arise from I/O interrupts briefly taking over a core, from page faults, or even from different memory access latencies across NUMA domains.) As a result, peak performance requires abstractions that are flexible and dynamic in how they allocate work.

The ComputeThreadPool protocol is a foundational API designed to enable efficient use of hardware resources. There are two APIs exposed to support two kinds of parallelism. For additional details, please see the documentation associated with each.

Note: avoid executing code on the ComputeThreadPool that is not compute-bound. If you are doing I/O, use a dedicated threadpool, or use Swift NIO for high-performance non-blocking I/O.

Note: while there should be only one “physical” threadpool process-wide, there can be many virtual threadpools that compose on top of it to allow configuration and tuning. (This is why ComputeThreadPool is a protocol and not a set of static methods.) Examples of additional threadpool abstractions could include a separate threadpool per NUMA domain, threadpools that support different task priorities, or higher-level parallelism primitives such as “wait-groups”.

ParallelForBody

A function that can be executed in parallel.

Declaration

Swift

typealias ParallelForBody 
  = (_ currentInvocationIndex: Int, _ requestedInvocationCount: Int) -> Void

Parameters

`currentInvocationIndex`	the index of the invocation executing in the current thread.
`requestedInvocationCount`	the number of parallel invocations requested.
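
As a rough illustration of the two parameters, the sketch below passes a closure of this shape to a serial driver. The `runSerially` helper is hypothetical and is not part of the documented API. It only stands in for a pool so that the example runs on its own.

```swift
// Local alias mirroring the declaration above, so the sketch compiles alone.
typealias ParallelForBody =
  (_ currentInvocationIndex: Int, _ requestedInvocationCount: Int) -> Void

// Hypothetical serial driver (an assumption, not the library's code):
// invokes `body` once per index, passing the total count each time.
func runSerially(n: Int, _ body: ParallelForBody) {
  for i in 0..<n { body(i, n) }
}

var squares = [Int](repeating: 0, count: 4)
runSerially(n: squares.count) { index, _ in
  // Each invocation writes only to its own slot.
  squares[index] = index * index
}
print(squares)  // [0, 1, 4, 9]
```

Writing to a distinct slot per invocation index is one way to keep invocations independent of each other.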

ThrowingParallelForBody

A function that can be executed in parallel.

Declaration

Swift

typealias ThrowingParallelForBody 
  = (_ currentInvocationIndex: Int, _ requestedInvocationCount: Int) throws -> Void

Parameters

`currentInvocationIndex`	the index of the invocation executing in the current thread.
`requestedInvocationCount`	the number of parallel invocations requested.

VectorizedParallelForBody
A vectorized function that can be executed in parallel.

The first argument is the start index for the vectorized operation, and the second argument corresponds to the end of the range. The third argument contains the total size of the range.
Declaration
Swift

typealias VectorizedParallelForBody = (Int, Int, Int) -> Void
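
A minimal sketch of a body with this shape follows. It assumes the second argument is an exclusive end index, which the text above does not state explicitly, and it calls the body by hand over two sub-ranges rather than through a real pool.

```swift
// Local alias mirroring the declaration above.
typealias VectorizedParallelForBody = (Int, Int, Int) -> Void

let input: [Double] = [1, 2, 3, 4, 5, 6]
var doubled = [Double](repeating: 0, count: input.count)

// Assumption: `end` is exclusive, so the body handles start..<end.
let doubleChunk: VectorizedParallelForBody = { start, end, total in
  precondition(total == input.count)
  for i in start..<end { doubled[i] = 2 * input[i] }
}

// Stand-in for what a pool might do: hand out two disjoint sub-ranges.
doubleChunk(0, 3, input.count)
doubleChunk(3, 6, input.count)
print(doubled)  // [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```

Processing a whole sub-range per call lets the inner loop run without a function call per element.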
ThrowingVectorizedParallelForBody
A vectorized function that can be executed in parallel.

The first argument is the start index for the vectorized operation, and the second argument corresponds to the end of the range. The third argument contains the total size of the range.
Declaration
Swift

typealias ThrowingVectorizedParallelForBody = (Int, Int, Int) throws -> Void
parallelFor(n:_:)

Returns after executing fn n times.

Default Implementation

Implements parallelFor(n:_:) (scalar) in terms of parallelFor(n:_:) (vectorized).

Declaration

Swift

func parallelFor(n: Int, _ fn: ParallelForBody)

Parameters

`n`	The total number of times to execute `fn`.
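
One way a scalar form could be layered on a vectorized one is sketched below. Both functions are hypothetical stand-ins written for this example. The two-chunk split and the exclusive end index are assumptions, and the library's actual default implementation may differ.

```swift
typealias ParallelForBody = (Int, Int) -> Void
typealias VectorizedParallelForBody = (Int, Int, Int) -> Void

// Hypothetical vectorized primitive: covers 0..<n in at most two chunks,
// run one after the other. A real pool would run chunks on its threads.
func vectorizedFor(n: Int, _ fn: VectorizedParallelForBody) {
  let mid = n / 2
  if mid > 0 { fn(0, mid, n) }
  if mid < n { fn(mid, n, n) }
}

// Scalar form built on the vectorized one: each chunk loops over its
// own indices and calls the scalar body once per index.
func scalarFor(n: Int, _ fn: ParallelForBody) {
  vectorizedFor(n: n) { start, end, total in
    for i in start..<end { fn(i, total) }
  }
}

var visited = [Int]()
scalarFor(n: 5) { i, _ in visited.append(i) }
print(visited)  // [0, 1, 2, 3, 4]
```

The ordered output is a property of this serial stand-in only. With real parallel execution the invocation order would not be fixed.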

parallelFor(n:_:)

Returns after executing fn an unspecified number of times, guaranteeing that fn has been called with parameters that perfectly cover the range 0..<n.

Declaration

Swift

func parallelFor(n: Int, _ fn: VectorizedParallelForBody)

Parameters

`n`	The range of numbers `0..<n` to cover.
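
The coverage guarantee can be checked with a small sketch. The `chunkedFor` driver and its `chunkCount` parameter are hypothetical, and the driver runs its chunks serially. It only illustrates calls whose sub-ranges cover `0..<n` exactly once, assuming an exclusive end index.

```swift
typealias VectorizedParallelForBody = (Int, Int, Int) -> Void

// Hypothetical driver: splits 0..<n into up to `chunkCount` contiguous,
// non-overlapping sub-ranges and calls `fn` once per sub-range.
func chunkedFor(n: Int, chunkCount: Int, _ fn: VectorizedParallelForBody) {
  guard n > 0, chunkCount > 0 else { return }
  let chunks = min(chunkCount, n)
  for c in 0..<chunks {
    // Integer arithmetic makes chunk c end where chunk c + 1 starts.
    fn(c * n / chunks, (c + 1) * n / chunks, n)
  }
}

var hits = [Int](repeating: 0, count: 10)
chunkedFor(n: hits.count, chunkCount: 3) { start, end, _ in
  for i in start..<end { hits[i] += 1 }
}
print(hits.allSatisfy { $0 == 1 })  // true
```

Because the number of calls is unspecified, a body should not depend on how many chunks it receives or on their sizes.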

parallelFor(n:_:)

Returns after executing fn n times.

Declaration

Swift

func parallelFor(n: Int, _ fn: ThrowingParallelForBody) throws

Parameters

`n`	The total number of times to execute `fn`.
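
The sketch below shows a throwing body and a caller handling the error. The `runSeriallyThrowing` driver and the `NegativeInput` error type are hypothetical. The driver stops at the first error. How a real pool reports errors thrown on several threads is not described here, so that behavior is an assumption of the sketch only.

```swift
typealias ThrowingParallelForBody = (Int, Int) throws -> Void

// Hypothetical error type for this example.
struct NegativeInput: Error, Equatable { let index: Int }

// Hypothetical serial driver: rethrows the first error it encounters.
func runSeriallyThrowing(n: Int, _ body: ThrowingParallelForBody) throws {
  for i in 0..<n { try body(i, n) }
}

let values = [4.0, 9.0, -1.0, 16.0]
var roots = [Double](repeating: 0, count: values.count)
var failure: NegativeInput? = nil

do {
  try runSeriallyThrowing(n: values.count) { i, _ in
    guard values[i] >= 0 else { throw NegativeInput(index: i) }
    roots[i] = values[i].squareRoot()
  }
} catch let error as NegativeInput {
  failure = error
} catch {
  // No other error type is thrown in this sketch.
}
print(failure?.index ?? -1)  // 2
```

Results written before the failure remain in `roots`, so callers may need to treat partially filled output as invalid.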

parallelFor(n:_:)

Returns after executing fn an unspecified number of times, guaranteeing that fn has been called with parameters that perfectly cover the range 0..<n.

Declaration

Swift

func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForBody) throws

Parameters

`n`	The range of numbers `0..<n` to cover.

maxParallelism
The maximum number of concurrent threads of execution supported by this thread pool.
Declaration
Swift

var maxParallelism: Int { get }
currentThreadIndex
Returns the index of the current thread in the pool if running on a threadpool thread, or nil otherwise.

The return value is guaranteed to be either nil, or between 0 and maxParallelism.
Declaration
Swift

var currentThreadIndex: Int? { get }
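
A common use for a per-thread index is to give each thread its own accumulator slot. The sketch below is only an illustration: `MiniPool` is a hypothetical reduced protocol with just the two properties above, `InlinePool` is a hypothetical single-threaded stand-in, and the code assumes that a non-nil index is strictly less than `maxParallelism`.

```swift
// Hypothetical reduced protocol containing only the members used here.
protocol MiniPool {
  var maxParallelism: Int { get }
  var currentThreadIndex: Int? { get }
}

// Hypothetical stand-in that treats the calling thread as pool thread 0.
struct InlinePool: MiniPool {
  let maxParallelism = 1
  var currentThreadIndex: Int? { 0 }
}

func accumulate<P: MiniPool>(_ values: [Int], on pool: P) -> Int {
  // One slot per pool thread, plus a spare slot for callers that are
  // not running on a pool thread (currentThreadIndex == nil).
  var partials = [Int](repeating: 0, count: pool.maxParallelism + 1)
  for value in values {
    let slot = pool.currentThreadIndex ?? pool.maxParallelism
    partials[slot] += value
  }
  return partials.reduce(0, +)
}

let total = accumulate([1, 2, 3, 4], on: InlinePool())
print(total)  // 10
```

With a real multi-threaded pool, per-slot accumulation of this kind avoids contention on a single shared counter, at the cost of a final reduction step.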