Glossary

agent. See asynchronous agent.

aggregation. To combine multiple data items into a single result.

alpha blending. Merging different images into a single image by superimposing them in semitransparent layers.

associative operation. A binary operation is associative if, for a sequence of operations, the order in which the operations are performed does not change the result. For example (a + b) + c = a + (b + c).

asynchronous. An operation that does not block the current thread of control when the operation starts.

asynchronous agent. A software component that works asynchronously with other agents as part of a larger computation. Often shortened to agent.

asynchronous pipeline. A pipeline in which tasks are only created when data becomes available.

background thread. A thread that stops when a process shuts down. A running background thread does not keep a process running. Threads in the thread pool are background threads. Compare to foreground thread.

barrier. A synchronization point where all participating threads must stop and wait until every thread reaches it.

block. To pause execution while waiting for some event or condition.

captured variable. A variable defined outside a lambda expression that is used in the lambda expression. The lambda expression can update the captured variable.
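
For example, in the following minimal sketch (the names count and add are only illustrative), the lambda expression captures count by reference and updates it:

    #include <iostream>

    int main()
    {
        int count = 0;                      // variable defined outside the lambda
        auto add = [&count](int n) {        // count is a captured variable
            count += n;                     // the lambda updates the captured variable
        };
        add(5);
        std::cout << count << std::endl;    // prints 5
        return 0;
    }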

closure. A lambda expression that captures variables from an enclosing lexical scope.

cluster. A parallel computing system composed of multiple computers connected by a network, not multiple cores in a single physical processor.

commutative operation. A binary operation is commutative if changing the order of the operands does not change the result. For example, a + b = b + a. Examples of commutative operations are scalar addition and scalar multiplication.

concurrency. Programming with multiple activities at the same time. Concurrency enables programs to respond promptly to external stimuli; its goal is to reduce latency. Concurrency can be implemented with asynchronous operations or with threads, where it is expected that threads will take turns executing on processors. Compare to parallelism.

concurrency safe. When a block of code can be run on multiple cores simultaneously without introducing errors.

Concurrency Visualizer. An addition to the Microsoft® Visual Studio® development system’s profiler that collects and displays information about the execution and performance of parallel programs.

context switch. When one thread stops executing on a processor and a different thread resumes. Excessive context switching can occur when processors are oversubscribed and can result in poor performance.

control flow. A basis for coordination whereby tasks execute according to the steps of an algorithm, as in a parallel loop. Compare to data flow.

control-flow agent. An agent whose run method contains a sequential loop that processes incoming values from a messaging block.
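
The following sketch illustrates the idea with a hypothetical adder_agent class, assuming the Asynchronous Agents Library header and namespace; the run method loops over values received from an unbounded_buffer until a sentinel value arrives:

    #include <agents.h>
    #include <iostream>
    using namespace Concurrency;

    // A control-flow agent: its run method loops, receiving values
    // from a messaging block until a sentinel value (0) arrives.
    class adder_agent : public agent
    {
    public:
        explicit adder_agent(unbounded_buffer<int>& source) : m_source(source) {}
    protected:
        void run()
        {
            int sum = 0;
            for (int v = receive(m_source); v != 0; v = receive(m_source))
                sum += v;                       // sequential loop over incoming values
            std::cout << "sum = " << sum << std::endl;
            done();                             // signal that the agent has finished
        }
    private:
        unbounded_buffer<int>& m_source;
    };

    int main()
    {
        unbounded_buffer<int> buf;
        adder_agent a(buf);
        a.start();
        send(buf, 1); send(buf, 2); send(buf, 3);
        send(buf, 0);                           // sentinel: no more data
        agent::wait(&a);
        return 0;
    }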

cooperative blocking. A programming idiom whereby the current context waits for a resource to become available or for a signal to occur. The task scheduler is notified when a cooperative blocking operation occurs. Compare to noncooperative blocking.

cooperative cancellation. A programming idiom that uses cooperative blocking to implement operations that are capable of being canceled before they are completed.

cooperative context switch. When a worker thread becomes blocked as a result of a cooperative blocking operation. See cooperative blocking.

coordination. Arranging for tasks to work together to ensure a correct outcome. Coordination can be based on data flow or control flow.

core. The part of a physical processor that executes instructions. Most recent physical processor models have more than one core, so they can execute tasks in parallel.

data flow. A basis for coordination where tasks execute when data becomes available, as in a pipeline or task graph. Compare to control flow.

data parallelism. A form of parallel processing whereby the same computation executes in parallel on different data. Data parallelism is supported in the Parallel Patterns Library (PPL) by the parallel_for and parallel_for_each functions. Compare to task parallelism.
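
For example, a minimal parallel_for sketch that applies the same computation to each element of a vector:

    #include <ppl.h>
    #include <vector>

    int main()
    {
        std::vector<double> v(1000, 1.0);
        // The same computation is applied to different elements in parallel.
        Concurrency::parallel_for(0, 1000, [&v](int i) {
            v[i] = v[i] * 2.0;
        });
        return 0;
    }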

data partitioning. Dividing a collection of data into parts, in order to use data parallelism.

data race. When more than one concurrent thread reads and updates a variable without synchronization.

deadlock. When execution stops and cannot resume because the system is waiting for a condition that cannot occur. Threads can deadlock when one holds resources that another needs. Compare to livelock.

decomposition. To break a problem into smaller parts. For parallel processing, decomposition can be by data or by task.

degree of parallelism. The number of parallel tasks that may execute concurrently at any one time.

dependency. When one operation uses the results of another. When there is a dependency between operations, they cannot run in parallel. Compare to independent.

double-checked locking. A technique in which a condition is first tested without holding a lock; only if the condition is true is a lock acquired and the condition tested again, this time to determine whether to update shared data. This maneuver can often avoid the expensive operation of acquiring a lock when it would not be useful.
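
A minimal sketch in standard C++11 (not a PPL-specific facility); the flag is atomic so that the unsynchronized first test is well defined:

    #include <atomic>
    #include <mutex>

    std::atomic<bool> initialized(false);
    std::mutex init_mutex;
    int shared_value = 0;                                     // the shared data to initialize once

    void ensure_initialized()
    {
        if (!initialized.load(std::memory_order_acquire))     // first (cheap) test, no lock
        {
            std::lock_guard<std::mutex> guard(init_mutex);    // acquire the lock
            if (!initialized.load(std::memory_order_relaxed)) // test again under the lock
            {
                shared_value = 42;                            // update shared data
                initialized.store(true, std::memory_order_release);
            }
        }
    }

    int main()
    {
        ensure_initialized();
        return shared_value == 42 ? 0 : 1;
    }

In modern C++, std::call_once or a function-local static often expresses the same one-time initialization more simply.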

dynamic partitioning. Data partitioning whereby the parts are selected as the parallel tasks execute. Compare to static partitioning.

enhanced locality mode. When a scheduler tries to execute related tasks in batches.

foreground thread. A thread that keeps a process running. After all its foreground threads stop, the process shuts down. Compare to background thread.

fork/join. A parallel computing pattern that uses task parallelism. Fork occurs when tasks start; join occurs when all tasks finish.
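
For example, with the PPL, parallel_invoke forks its work functions as parallel tasks and joins by returning only after all of them have finished:

    #include <ppl.h>
    #include <numeric>
    #include <vector>

    int main()
    {
        std::vector<int> a(100, 1), b(100, 2);
        long long sum_a = 0, sum_b = 0;
        // Fork: the two work functions start as parallel tasks.
        // Join: parallel_invoke returns only after both have finished.
        Concurrency::parallel_invoke(
            [&] { sum_a = std::accumulate(a.begin(), a.end(), 0LL); },
            [&] { sum_b = std::accumulate(b.begin(), b.end(), 0LL); });
        return 0;
    }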

forward progress mode. When a scheduler executes one pending task from each of its schedule groups in round-robin fashion.

function object. See functor.

functor. A language feature that allows an instance of a class to be invoked as though it were a function. In C++, a functor is a class that defines operator().
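
For example (Scale is an illustrative name):

    #include <iostream>

    // A functor: a class that defines operator().
    class Scale
    {
    public:
        explicit Scale(double factor) : m_factor(factor) {}
        double operator()(double x) const { return x * m_factor; }
    private:
        double m_factor;
    };

    int main()
    {
        Scale twice(2.0);
        std::cout << twice(3.5) << std::endl;  // invoked as though it were a function; prints 7
        return 0;
    }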

future. A task that returns a value.
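
As an illustration in standard C++11 (this sketch uses std::async and std::future rather than a PPL-specific mechanism):

    #include <future>
    #include <iostream>

    int main()
    {
        // Start a task that returns a value; the future represents that pending result.
        std::future<int> f = std::async(std::launch::async, [] { return 6 * 7; });
        // ... other work can run here while the task executes ...
        std::cout << f.get() << std::endl;   // wait for and retrieve the value (42)
        return 0;
    }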

granularity. The quantity of data in a partition or work in a task. Equivalently, the number of data partitions or tasks. A coarse level of granularity has a few large partitions or tasks; a fine level of granularity has many small partitions or tasks.

hardware thread. An execution pipeline on a core. Simultaneous multithreading (also sometimes known as hyperthreading) enables more than one hardware thread to execute on a single core. Each hardware thread is considered a separate logical processor.

hyperthreading. See simultaneous multithreading.

immutable. Property of data that means it cannot be modified after it's created. For example, strings provided by some libraries are immutable. Compare to mutable.

immutable type. A type whose instances are immutable. Its instances are purely functional data structures.

independent. When one operation does not use the results of another. Independent operations can execute in parallel. Compare to dependency.

kernel mode. The mode of execution in which the Microsoft Windows® kernel runs and has full access to all resources. Compare to user mode.

lambda expression. An anonymous function that can capture variables from its enclosing lexical scope.

livelock. When execution continues but does not make progress toward its goal. Compare to deadlock.

load balancing. When similar amounts of work are assigned to different tasks so that the available processors are used efficiently. Compare to load imbalance.

load imbalance. When different amounts of work are assigned to different tasks so that some tasks don't have enough work to do, and the available processors are not used efficiently. Compare to load balancing.

lock. A synchronization mechanism that ensures that only one thread can execute a particular section of code at a time.

lock convoy. When multiple tasks contend repeatedly for the same lock. Frequent failures to acquire the lock can result in poor performance.

manycore. Multicore, usually with more than eight logical processors.

map. A parallel computation where multiple tasks independently perform the same transformation on different data. An example of data parallelism.

map/reduce. A parallel programming pattern where a data parallel phase (map) is followed by an aggregation phase (reduce).
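
A minimal sketch of the pattern using the PPL for the map phase; the reduce phase is shown sequentially for brevity, although an associative operation such as addition could also be reduced in parallel:

    #include <ppl.h>
    #include <numeric>
    #include <vector>

    int main()
    {
        std::vector<int> input(1000, 3);
        std::vector<long long> squared(input.size());

        // Map phase (data parallel): each element is transformed independently.
        Concurrency::parallel_for(0, 1000, [&](int i) {
            squared[i] = (long long)input[i] * input[i];
        });

        // Reduce phase (aggregation): combine the mapped values into one result.
        long long total = std::accumulate(squared.begin(), squared.end(), 0LL);
        return total > 0 ? 0 : 1;
    }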

memory barrier. A machine instruction that enforces an ordering constraint on memory operations. Memory operations that precede the barrier are guaranteed to occur before operations that follow the barrier.

multicore. Having more than one core, able to execute parallel tasks. Most recent physical processor models are multicore.

multiplicity. The number of times an element occurs in a multiset.

multiset. An unordered collection that may contain duplicates. Each element in the collection is associated with a multiplicity (or count) that indicates how many times it occurs. Compare to set.

multiset union. An operation that combines multisets by merging their elements and adding the multiplicities of each element.

mutable. Property of data that means that it can be modified after it is created. Not to be confused with the C++ mutable keyword. Compare to immutable.

mutable type. A type whose instances are mutable.

nested parallelism. When one parallel programming construct appears within another.

node. A computer in a cluster.

nonblocking algorithm. An algorithm that allows multiple tasks to make progress on a problem without ever blocking each other.

noncooperative blocking. Lower-level blocking operations that are provided by the operating system. The task scheduler is unaware of noncooperative blocking; to the scheduler, the blocked tasks are indistinguishable from running tasks. Compare to cooperative blocking.

NUMA (non-uniform memory access). A computer memory architecture in which memory access time depends on the memory location relative to a processor.

object graph. A data structure consisting of objects that reference each other. Object graphs often contain shared mutable data, which can complicate parallel programming.

overlapped I/O. I/O operations that proceed (or wait) while other tasks are executing.

oversubscription. When there are more threads than processors available to run them. Oversubscription can result in poor performance because time is spent context switching.

parallelism. Programming with multiple threads, when it is expected that threads will execute at the same time on multiple processors. Its goal is to increase throughput. Compare to concurrency.

partitioning. Dividing data into parts in order to use data parallelism.

physical processor. A processor chip, also known as a package or socket. Most recent physical processor models have more than one core and more than one logical processor per core.

pipeline. A series of producer/consumer stages, where each stage consumes the output produced by its predecessor.

priority inversion. When a lower-priority thread runs while a higher-priority thread waits. This can occur when the lower-priority thread holds a resource that the higher-priority thread requires.

process. A running application. Processes can run in parallel and are isolated from one another (they usually do not share data). A process can include several (or many) threads or tasks. Compare to thread, task.

processor affinity mask. A bit mask that tells the scheduler which processor(s) a thread should be run on.

profiler. A tool that collects and displays information for performance analysis. The Concurrency Visualizer is a profiler for concurrent and parallel programs.

pure function. A function (or method or operation) that has no side effects (it does not update any data or produce any output) and returns only a value.

purely functional data structure. A data structure that can only be accessed by pure functions. An instance of an immutable type.

race. When the outcome of a computation depends on which statement executes first, but the order of execution is not controlled or synchronized.

race condition. A situation in which the observable behavior of a program depends on the order in which parallel operations complete. Race conditions are usually errors; good programming practices prevent them.

recursive decomposition. In parallel programming, a decomposition in which tasks can themselves start more tasks.

reduce. A kind of aggregation whereby data is combined by an operation that is associative, which often makes it possible to perform much of the reduction in parallel.

round robin. A scheduling algorithm whereby each thread is given its turn to run in a fixed order in a repeating cycle, so that during each cycle, each thread runs once.

runnable context. A thread that was previously interrupted by a cooperative blocking operation but is now unblocked and ready to resume its work.

scalable. A parallel computation whose performance improves when more processors are available is said to be scalable.

schedule group. An internal data structure used by the scheduler that represents a queue of pending tasks.

semaphore. A synchronization mechanism that ensures that not more than a specified number of threads can execute a particular section of code at a time. Compare to lock.

sentinel value. A user-defined value that acts as an end-of-file token.

serialize. To run in sequence, not in parallel.

set. An unordered collection without duplicates. Compare to multiset.

shared data. Data used by more than one thread. Read/write access to mutable shared data requires synchronization.

simultaneous multithreading (SMT). A technique for executing multiple threads on a single core.

single-threaded data type. A type that is not thread-safe. It cannot be accessed by multiple threads unless there is additional synchronization in user code.

socket. Physical processor.

speculative execution. To execute tasks even though their results may not be needed.

static partitioning. Data partitioning whereby the parts are selected before the program executes. Compare to dynamic partitioning.

synchronization. Coordinating the actions of threads to ensure a correct outcome. A lock is an example of a synchronization mechanism.

task. A parallelizable unit of work. A task executes in a thread, but is not the same as a thread; it is at a higher level of abstraction. Tasks are recommended for parallel programming with the PPL and Asynchronous Agents libraries. Compare to thread, process.

task graph. A directed graph that arises when tasks provide results that are the inputs to other tasks. The nodes are tasks, and the arcs are values that act as inputs and outputs of the tasks.

task group. A group of tasks that are either waited on or cancelled together. The task_group class is thread-safe.
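
For example, a minimal sketch using the PPL task_group class:

    #include <ppl.h>

    int main()
    {
        int left = 0, right = 0;
        Concurrency::task_group tg;
        tg.run([&left]  { left  = 1 + 2; });   // queue a task; run is thread safe
        tg.run([&right] { right = 3 + 4; });
        tg.wait();                             // wait on (or cancel) the whole group together
        return left + right == 10 ? 0 : 1;
    }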

task inlining. When more than one task executes in a single thread concurrently due to one task requesting that the other task run synchronously at the current point of execution.

task parallelism. A form of parallel processing whereby different computations execute in parallel on different data. Compare to data parallelism.

thread. An executing sequence of statements. Several (or many) threads can run within a single process. Threads are not isolated; all the threads in a process share data. A thread can run a task, but is not the same as a task; it is at a lower level of abstraction. Compare to process, task.

thread affinity. When certain operations must only be performed by a particular thread.

thread pool. A collection of threads managed by the system that are used to avoid the overhead of creating and disposing threads.

thread safe. See concurrency safe.

thread starvation. When tasks are unable to run because there are no virtual processor objects available to run them in the scheduler.

thread-local state. Variables that are accessed by only one thread. No locking or other synchronization is needed to safely access thread-local state. The combinable<T> class is a recommended way to establish thread-local state when using the PPL and Asynchronous Agents libraries.
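
For example, a minimal sketch (illustrative names) that uses combinable<T> to give each thread its own running sum and then merges the results:

    #include <ppl.h>
    #include <functional>
    #include <iostream>

    int main()
    {
        // One value per thread, each initialized to zero.
        Concurrency::combinable<long long> partial_sums([] { return 0LL; });

        // Each thread accumulates into its own copy; no locking is needed.
        Concurrency::parallel_for(0, 100000, [&](int i) {
            partial_sums.local() += i;
        });

        // Merge the thread-local values into a single result.
        long long total = partial_sums.combine(std::plus<long long>());
        std::cout << total << std::endl;
        return 0;
    }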

thread-safe. A type that can be used concurrently by multiple threads, without requiring additional synchronization in user code. A thread-safe type ensures that its data can be accessed by only one thread at a time, and its operations are atomic with respect to multiple threads. Compare to single-threaded data type.

torn read. When reading a variable requires more than one machine instruction, and another task writes to the variable between the read instructions. Compare to torn write.

torn write. When writing a variable requires more than one machine instruction, and another task reads the variable between the write instructions. Compare to torn read.

two-step dance. To signal an event while holding a lock, when the waking thread needs to acquire that lock. It will wake only to find that it must wait again. This can cause context switching and poor performance.

undersubscription. When there are fewer tasks than there are processors available to run them, so processors remain idle.

user mode. The mode in which user applications run but have restricted access to resources. Compare to kernel mode.

virtual core. Logical processor.

volatile. A keyword that tells the C++ compiler that a variable can be modified by multiple threads, the operating system, or other hardware.

work functions. The functions or function objects passed as arguments to parallel_invoke or task_group::run.

work stealing. When a thread executes a task queued for another thread, in order to remain busy.

