ComputeThreadPool

public protocol ComputeThreadPool

Allows efficient use of multi-core CPUs by managing a fixed-size collection of threads.

From first principles, a (CPU) compute-bound application will run at peak performance when overheads are minimized. Once enough parallelism is exposed to leverage all cores, the key overheads to minimize are context switching and thread creation / destruction. The optimal system configuration is thus a fixed-size threadpool with exactly one thread per CPU core (or rather, hyperthread). This configuration results in zero context switching, no additional kernel calls for thread creation and deletion, and full utilization of the hardware.

Unfortunately, in practice, it is infeasible to statically schedule work a priori onto a fixed pool of threads. Even when applying the same operation to a homogeneous dataset, there will inevitably be variability in execution time. (This can arise from I/O interrupts briefly taking over a core, page faults, or even different memory-access latencies across NUMA domains.) As a result, peak performance requires abstractions that are flexible and dynamic in their work allocation.
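
To make the idea of dynamic work allocation concrete, here is a minimal sketch in which workers claim chunks of a range on demand instead of receiving a fixed share up front. It is not how any ComputeThreadPool implementation is written: `ChunkDispenser`, `Total`, and `dynamicSum` are hypothetical names, and `DispatchQueue.concurrentPerform` stands in for a pool of worker threads.

```swift
import Dispatch
import Foundation

/// Hypothetical helper (not part of ComputeThreadPool): hands out
/// fixed-size chunks of 0..<n to whichever worker asks next.
final class ChunkDispenser {
  private let lock = NSLock()
  private var next = 0
  private let n: Int
  private let chunkSize: Int

  init(n: Int, chunkSize: Int) {
    precondition(chunkSize > 0, "chunkSize must be positive")
    self.n = n
    self.chunkSize = chunkSize
  }

  /// Returns the next unclaimed chunk, or nil once 0..<n is exhausted.
  func claim() -> Range<Int>? {
    lock.lock()
    defer { lock.unlock() }
    guard next < n else { return nil }
    let start = next
    next = min(n, start + chunkSize)
    return start..<next
  }
}

/// Hypothetical lock-protected running total.
final class Total {
  private let lock = NSLock()
  private var value = 0

  func add(_ x: Int) {
    lock.lock()
    value += x
    lock.unlock()
  }

  func get() -> Int {
    lock.lock()
    defer { lock.unlock() }
    return value
  }
}

/// Sums `data` with `workers` concurrent workers. A worker that is delayed
/// (page fault, interrupt, ...) simply claims fewer chunks; the others keep going.
func dynamicSum(_ data: [Int], workers: Int, chunkSize: Int) -> Int {
  let dispenser = ChunkDispenser(n: data.count, chunkSize: chunkSize)
  let total = Total()
  DispatchQueue.concurrentPerform(iterations: workers) { _ in
    var local = 0
    while let range = dispenser.claim() {
      for i in range { local += data[i] }
    }
    total.add(local)  // one synchronized update per worker
  }
  return total.get()
}
```

Smaller chunks balance load better but take the lock more often; the right chunk size depends on the workload.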

The ComputeThreadPool protocol is a foundational API designed to enable efficient use of hardware resources. There are two APIs exposed to support two kinds of parallelism. For additional details, please see the documentation associated with each.

Note: avoid executing code on the ComputeThreadPool that is not compute-bound. If you are doing I/O, use a dedicated threadpool, or use Swift NIO for high-performance non-blocking I/O.

Note: while there should be only one “physical” threadpool process-wide, there can be many virtual threadpools that compose on top of this one to allow configuration and tuning. (This is why ComputeThreadPool is a protocol and not a set of static methods.) Examples of additional threadpool abstractions could include a separate threadpool per NUMA domain, threadpools that support different task priorities, or higher-level parallelism primitives such as “wait-groups”.

  • Schedules fn to be executed in the threadpool eventually.

    Declaration

    Swift

    func dispatch(_ fn: @escaping () -> Void)
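
Because dispatch only promises that fn runs "eventually", a caller that needs the result must bring its own synchronization. The following is a minimal sketch under assumptions: `DispatchOnlyPool` mirrors just this one requirement, `GCDBackedPool` is a hypothetical stand-in that forwards to a global dispatch queue (it is not a fixed-size pool), and `Box` and `sumOnPool` are illustrative helpers.

```swift
import Dispatch

/// Hypothetical mirror of the single `dispatch` requirement.
protocol DispatchOnlyPool {
  func dispatch(_ fn: @escaping () -> Void)
}

/// Stand-in conformer that forwards to GCD; assumed for illustration only.
struct GCDBackedPool: DispatchOnlyPool {
  func dispatch(_ fn: @escaping () -> Void) {
    DispatchQueue.global().async { fn() }
  }
}

/// Reference box so the escaping closure can publish its result.
final class Box {
  var value = 0
}

/// Schedules a computation on the pool and blocks until it has finished.
func sumOnPool<Pool: DispatchOnlyPool>(_ pool: Pool, upTo n: Int) -> Int {
  let box = Box()
  let done = DispatchSemaphore(value: 0)
  pool.dispatch {
    box.value = (0...n).reduce(0, +)
    done.signal()  // tell the caller the work has completed
  }
  done.wait()      // dispatch itself returns without waiting
  return box.value
}
```

Blocking on a semaphore is only appropriate from a thread outside the pool; from inside the pool, the join operation described next is the safer way to wait for work.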
  • Executes a and b optionally in parallel; both are guaranteed to have finished executing before join returns.

    Declaration

    Swift

    func join(_ a: () -> Void, _ b: () -> Void)
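
A typical use of join is divide-and-conquer recursion. The sketch below assumes a hypothetical protocol `JoinCapablePool` that mirrors only this requirement, and a trivial conformer `InlinePool` that runs a and then b on the calling thread, which the "optionally in parallel" contract permits.

```swift
/// Hypothetical mirror of the non-throwing `join` requirement.
protocol JoinCapablePool {
  func join(_ a: () -> Void, _ b: () -> Void)
}

/// Serial stand-in: a valid (if unhelpful) implementation of the contract.
struct InlinePool: JoinCapablePool {
  func join(_ a: () -> Void, _ b: () -> Void) {
    a()
    b()
  }
}

/// Recursively sums `xs`, splitting until a slice is at most `cutoff` long.
func parallelSum<Pool: JoinCapablePool>(
  _ xs: ArraySlice<Int>, on pool: Pool, cutoff: Int = 1024
) -> Int {
  if xs.count <= max(cutoff, 1) { return xs.reduce(0, +) }
  let mid = xs.startIndex + xs.count / 2
  var left = 0
  var right = 0
  // Each closure writes a different local, so the two never touch the same
  // variable. Both writes are complete once `join` returns.
  pool.join(
    { left = parallelSum(xs[xs.startIndex..<mid], on: pool, cutoff: cutoff) },
    { right = parallelSum(xs[mid..<xs.endIndex], on: pool, cutoff: cutoff) })
  return left + right
}
```

The closures are non-escaping, so they can capture and mutate local variables without heap-allocated boxes.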
  • Executes a and b optionally in parallel; if one throws, it is unspecified whether the other will have started or completed executing. It is also unspecified as to which error will be thrown.

    This is the throwing overload.

    Declaration

    Swift

    func join(_ a: () throws -> Void, _ b: () throws -> Void) throws
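
Since it is unspecified which error propagates and whether the other closure has run, callers of the throwing overload should treat any thrown error as "the whole operation failed". A minimal sketch, assuming a hypothetical protocol `ThrowingJoinPool`, a serial stand-in conformer `InlineThrowingPool`, and an illustrative `ValidationError` type:

```swift
/// Illustrative error type; not part of the documented API.
struct ValidationError: Error, Equatable {
  let index: Int
}

/// Hypothetical mirror of the throwing `join` requirement.
protocol ThrowingJoinPool {
  func join(_ a: () throws -> Void, _ b: () throws -> Void) throws
}

/// Serial stand-in: if `a` throws, `b` never starts, which the contract allows.
struct InlineThrowingPool: ThrowingJoinPool {
  func join(_ a: () throws -> Void, _ b: () throws -> Void) throws {
    try a()
    try b()
  }
}

/// Checks both halves of `xs`; throws if any element is negative.
func validateNonNegative<Pool: ThrowingJoinPool>(_ xs: [Int], on pool: Pool) throws {
  let mid = xs.count / 2
  func check(_ range: Range<Int>) throws {
    for i in range where xs[i] < 0 {
      throw ValidationError(index: i)
    }
  }
  try pool.join({ try check(0..<mid) }, { try check(mid..<xs.count) })
}
```

If both halves contain a negative element, either index may be reported, so code that depends on a particular error being thrown is relying on unspecified behavior.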
  • A function to be invoked in parallel a specified number of times by parallelFor.

    Declaration

    Swift

    typealias ParallelForBody 
      = (_ currentInvocationIndex: Int, _ requestedInvocationCount: Int) -> Void

    Parameters

    currentInvocationIndex

    the index of the invocation executing in the current thread.

    requestedInvocationCount

    the number of parallel invocations requested.

  • A throwing function to be invoked in parallel a specified number of times by parallelFor.

    Declaration

    Swift

    typealias ThrowingParallelForBody 
      = (_ currentInvocationIndex: Int, _ requestedInvocationCount: Int) throws -> Void

    Parameters

    currentInvocationIndex

    the index of the invocation executing in the current thread.

    requestedInvocationCount

    the number of parallel invocations requested.

  • A vectorized function that can be executed in parallel.

    The first argument is the start index for the vectorized operation, and the second argument corresponds to the end of the range. The third argument contains the total size of the range.

    Declaration

    Swift

    typealias VectorizedParallelForBody = (Int, Int, Int) -> Void
  • A vectorized function that can be executed in parallel.

    The first argument is the start index for the vectorized operation, and the second argument corresponds to the end of the range. The third argument contains the total size of the range.

    Declaration

    Swift

    typealias ThrowingVectorizedParallelForBody = (Int, Int, Int) throws -> Void
  • parallelFor(n:_:) Default implementation

    Returns after executing fn n times.

    Default Implementation

    Implements parallelFor(n:_:) (scalar) in terms of parallelFor(n:_:) (vectorized).

    Declaration

    Swift

    func parallelFor(n: Int, _ fn: ParallelForBody)

    Parameters

    n

    The total number of times to execute fn.
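
The default implementation described above can be sketched as follows. This is an illustration under assumptions, not the library's source: `VectorizedPool` is a hypothetical protocol mirroring only the vectorized requirement, `TwoChunkPool` is a serial stand-in, and the range passed to the vectorized body is assumed to be half-open (`start..<end`).

```swift
/// Hypothetical mirror of the vectorized requirement and its typealiases.
protocol VectorizedPool {
  typealias ParallelForBody = (Int, Int) -> Void
  typealias VectorizedParallelForBody = (Int, Int, Int) -> Void
  func parallelFor(n: Int, _ fn: VectorizedParallelForBody)
}

extension VectorizedPool {
  /// Scalar overload expressed in terms of the vectorized one:
  /// each chunk is expanded into one call of `fn` per index.
  func parallelFor(n: Int, _ fn: ParallelForBody) {
    parallelFor(n: n) { start, end, total in
      for i in start..<end {
        fn(i, total)
      }
    }
  }
}

/// Serial stand-in that covers 0..<n with two chunks.
struct TwoChunkPool: VectorizedPool {
  func parallelFor(n: Int, _ fn: VectorizedParallelForBody) {
    let mid = n / 2
    fn(0, mid, n)
    fn(mid, n, n)
  }
}

/// Collects the indices visited by the scalar overload.
func visitedIndices(n: Int) -> [Int] {
  var seen = [Int]()
  // Appending is safe only because TwoChunkPool is serial; a truly parallel
  // pool would need synchronization or per-index storage.
  TwoChunkPool().parallelFor(n: n) { i, _ in seen.append(i) }
  return seen
}
```

The two-parameter closure selects the scalar overload and the three-parameter closure selects the vectorized one, so overload resolution needs no annotations.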

  • Returns after executing fn an unspecified number of times, guaranteeing that fn has been called with parameters that perfectly cover the range 0..<n.

    Declaration

    Swift

    func parallelFor(n: Int, _ fn: VectorizedParallelForBody)

    Parameters

    n

    The range of numbers 0..<n to cover.
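
A vectorized body receives a whole sub-range per call, which lets the inner loop run without per-element closure overhead. The sketch below assumes a hypothetical protocol `ChunkedPool`, a serial stand-in `FixedChunkPool`, and a half-open interpretation of the (start, end) pair; the helper `scale` is illustrative.

```swift
/// Hypothetical mirror of the vectorized `parallelFor` requirement.
protocol ChunkedPool {
  func parallelFor(n: Int, _ fn: (Int, Int, Int) -> Void)
}

/// Serial stand-in that covers 0..<n in chunks of `chunkSize`.
struct FixedChunkPool: ChunkedPool {
  let chunkSize: Int

  func parallelFor(n: Int, _ fn: (Int, Int, Int) -> Void) {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var start = 0
    while start < n {
      let end = min(n, start + chunkSize)
      fn(start, end, n)
      start = end
    }
  }
}

/// Multiplies every element of `xs` by `factor`, one chunk per call.
func scale<Pool: ChunkedPool>(_ xs: inout [Double], by factor: Double, on pool: Pool) {
  xs.withUnsafeMutableBufferPointer { buffer in
    let elements = buffer  // copy of the pointer, not of the storage
    pool.parallelFor(n: elements.count) { start, end, _ in
      // Chunks are disjoint, so calls never write the same element.
      for i in start..<end {
        elements[i] *= factor
      }
    }
  }
}
```

Writing through a buffer pointer avoids sharing a mutable Array across calls, which matters once the pool really runs chunks concurrently.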

  • Returns after executing fn n times.

    Declaration

    Swift

    func parallelFor(n: Int, _ fn: ThrowingParallelForBody) throws

    Parameters

    n

    The total number of times to execute fn.

  • Returns after executing fn an unspecified number of times, guaranteeing that fn has been called with parameters that perfectly cover the range 0..<n.

    Declaration

    Swift

    func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForBody) throws

    Parameters

    n

    The range of numbers 0..<n to cover.

  • The maximum number of concurrent threads of execution supported by this thread pool.

    Declaration

    Swift

    var maxParallelism: Int { get }
  • Returns the index of the current thread in the pool, if running on a thread-pool thread, nil otherwise.

    The return value is guaranteed to be either nil, or between 0 and maxParallelism.

    Declaration

    Swift

    var currentThreadIndex: Int? { get }
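
One use for these two properties is lock-free per-thread accumulation: allocate one private row per possible thread index, let each invocation write only to its own row, and merge afterwards. The sketch below rests on several assumptions that the text above does not confirm: that a non-nil index is strictly less than maxParallelism, and that at most one non-pool thread (the caller) takes part in a given parallelFor. `IndexedPool`, `CallerThreadPool`, and `histogram` are hypothetical names.

```swift
/// Hypothetical mirror of the requirements used in this sketch.
protocol IndexedPool {
  var maxParallelism: Int { get }
  var currentThreadIndex: Int? { get }
  func parallelFor(n: Int, _ fn: (Int, Int, Int) -> Void)
}

/// Serial stand-in: all work runs on the caller, which is not a pool thread.
struct CallerThreadPool: IndexedPool {
  var maxParallelism: Int { 1 }
  var currentThreadIndex: Int? { nil }

  func parallelFor(n: Int, _ fn: (Int, Int, Int) -> Void) {
    if n > 0 { fn(0, n, n) }
  }
}

/// Counts non-negative `values` into `buckets` bins by `value % buckets`.
func histogram<Pool: IndexedPool>(
  of values: [Int], buckets: Int, on pool: Pool
) -> [Int] {
  precondition(buckets > 0, "buckets must be positive")
  // One row per pool thread, plus a final row for a non-pool caller.
  // ASSUMPTION: non-nil indices lie in 0..<maxParallelism.
  let rows = pool.maxParallelism + 1
  var partial = [Int](repeating: 0, count: rows * buckets)
  partial.withUnsafeMutableBufferPointer { buffer in
    let counts = buffer
    pool.parallelFor(n: values.count) { start, end, _ in
      let row = pool.currentThreadIndex ?? (rows - 1)
      for i in start..<end {
        counts[row * buckets + values[i] % buckets] += 1
      }
    }
  }
  // Merge the per-thread rows once all invocations have returned.
  var result = [Int](repeating: 0, count: buckets)
  for row in 0..<rows {
    for b in 0..<buckets {
      result[b] += partial[row * buckets + b]
    }
  }
  return result
}
```

If the upper bound on the index turns out to be inclusive, the row count would need to grow by one.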