Parallel Programming and High-Performance Computing
Part 6: Dynamic Load Balancing
Dr. Ralf-Peter Mundani, CeSIM / IGSSE

Overview
- definitions
- examples of load balancing strategies
- space filling curves
- swarm intelligence

"Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done." (Andy Rooney)

Definitions

motivation
- central issue: fair distribution of computations across all processors / nodes in order to optimise run time (user's point of view) or system load (computing centre's point of view)
- so far: division of a problem into a fixed number of processes to be executed in parallel
- problem
  - the amount of work is often not known prior to execution
  - the load situation changes permanently (adaptive mesh refinement within numerical simulations, I/O, searches, ...)
  - different processor speeds (heterogeneous systems, e.g.)
  - different latencies for communication (grid computing, e.g.)
- objective: load distribution or load balancing strategies

static load balancing
- applied before the execution of any process (in contrast to dynamic load balancing, applied during the execution)
- usually referred to as the mapping problem or scheduling problem
- potential static load-balancing techniques
  - round robin: assigning tasks (a more general formulation than "work", covering both data and function parallelism) to processes in sequential order, coming back to the first process once every process has been given a task
  - randomised: selecting processes at random to assign tasks
  - recursive bisection: recursive division of a problem into smaller subproblems of equal computational effort with low communication costs
  - genetic algorithm: finding an optimal distribution of tasks according to a given objective function

static load balancing (cont'd)
- in case of static network topologies, the mapping should reflect the communication pattern of the processes when assigning tasks, i.e. short communication paths between processors / nodes are to be preferred (an NP-complete problem)
- missing knowledge about the execution times of various parts of a program might lead to very inaccurate mappings
- communication delays that vary under different circumstances are difficult to incorporate with static load balancing
- algorithms might have an indeterminate number of steps to reach their solutions (traversing a graph in search algorithms, e.g.)
- hence, different approaches are needed to overcome these problems

dynamic load balancing
- division of tasks dependent upon the execution of parts of the program, as they are being executed
- entails additional overhead (to be kept as small as possible, else bureaucracy wins)
- the assignment of tasks to processes can be classified as
  - centralised: tasks are handed out from a central location within a master-slave structure; one dedicated master process directly controls each of a set of slave processes
  - decentralised: tasks are passed between arbitrary processes; worker processes operate upon the problem and interact among themselves; a worker process may receive tasks from, or send tasks to, other worker processes

centralised dynamic load balancing
- example: work pool
  - the master process holds a collection of tasks to be performed by the slave processes
  - tasks are sent to slave processes; when a task is completed, a slave process requests another task from the master process
  - all slaves are the same (replicated workers), but specialised slaves capable of performing only certain tasks are also possible
  (figure: work pool — the master's queue of tasks, served to the slaves)

centralised dynamic load balancing (cont'd)
- work pool techniques can also readily be applied when tasks are quite different and of different size (in general, it is best to hand out larger tasks first to prevent idle waiting)
- the amount of tasks may change during execution, i.e. the execution of one task might generate new tasks (to be submitted to the master)
- the computation terminates once both of the following are satisfied
  1) the task queue is empty
  2) every process has made a request for another task without any new tasks being generated
  (even if (1) is true, a still running process may provide new tasks for the task queue)
- a slave may detect the program termination condition by some local termination condition (searches, e.g.); it then has to send a termination message to the master for closing down all others

decentralised dynamic load balancing
- example: distributed work pool
  - drawback of the centralised model: the master might become a bottleneck in case of too many slaves
  - hence, the work pool is distributed among several masters, each controlling one group of slaves
  - several layers of decomposition are possible, building up a tree hierarchy with tasks being passed downwards and requests / messages being passed upwards
  (figure: hierarchy of masters M0, M1, ..., MN)

decentralised dynamic load balancing (cont'd)
- example: fully distributed work pool
  - once tasks are (initially) distributed among processes (which moreover are able to generate new tasks), all processes can execute tasks from each other
  - tasks could be transferred by a
    - receiver-initiated method: a process that has only few or no tasks to perform requests tasks from other processes it selects (works well at high system loads)
    - sender-initiated method: a process with heavy load sends tasks to other processes it selects and that are willing to accept them (works well at light overall system loads)
  - in general, avoid passing on the same task that was just received
  - which one to prefer, and what kind of flaws do they have?
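The centralised work pool described above can be sketched in a few lines. This is a minimal sketch, not from the slides: Python threads stand in for the slave processes, a thread-safe queue for the master's task pool, and squaring a number for the actual computation; the simple "pool empty" termination suffices here because no task generates new tasks.

```python
import queue
import threading

def slave(tasks, results, lock):
    """Slave: repeatedly request a task from the master's pool until it is empty."""
    while True:
        try:
            task = tasks.get_nowait()      # request another task from the master
        except queue.Empty:                # pool exhausted -> local termination
            return
        result = task * task               # stand-in for the real computation
        with lock:
            results.append(result)

def run_work_pool(all_tasks, num_slaves=4):
    """Master: fill the task queue and let the slaves hand tasks out to themselves."""
    tasks = queue.Queue()
    for t in sorted(all_tasks, reverse=True):   # hand out larger tasks first
        tasks.put(t)
    results, lock = [], threading.Lock()
    slaves = [threading.Thread(target=slave, args=(tasks, results, lock))
              for _ in range(num_slaves)]
    for s in slaves:
        s.start()
    for s in slaves:
        s.join()
    return results

print(sorted(run_work_pool(range(10))))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Replicated workers all run the same `slave` function; specialised slaves would simply dispatch on the task type before computing.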
load models
- the decisions of any strategy about passing / requesting tasks are based on the local load
- problem: measurement of the load
- reliable load models are based upon load indices
  - simple and composite load indices (one or more quantities)
  - might refer to different functional units (CPU, bus, memory, ...)
  - snapshot, integrated, or averaged quantities
  - stochastic quantities to reflect external influences
- properties of a good load index
  - precisely reflects the target quantity at present
  - allows for accurate predictions concerning the future
  - smoothing behaviour to compensate peaks
  - based upon some simple formula, easy to compute

termination detection
- recognising that the computation has come to an end can be a significant problem in decentralised dynamic load balancing
- distributed termination at time T requires the following conditions to be satisfied (BERTSEKAS and TSITSIKLIS, 1989)
  1) application-specific local termination conditions exist throughout the collection of processes at time T
  2) no messages are in transit between processes at time T
- difference to centralised termination conditions: messages in transit must be taken into account, because a message in transit might restart an already terminated process
- problem: how to recognise a message in transit — waiting for a long enough period of time to allow any message in transit to arrive is not to be favoured

termination detection (cont'd): acknowledgement messages (1)
- each process is in one of the two states active or inactive
- initially, without any task to perform, a process is inactive
- upon receiving a task it changes to the active state, and the sending process becomes its parent (it can itself become a parent by passing a task to another inactive process, thus creating a tree of process hierarchies)
- an active process can receive more tasks from other active processes while it is in the active state, but these processes do not become its parent
- an acknowledgement message (ACK) from another process is expected whenever passing a task to that process
- whenever receiving a task from a process, an immediate ACK is sent, except when receiving a task from the parent process

termination detection (cont'd): acknowledgement messages (2)
- an ACK to the parent process is only sent once a process is ready to become inactive, i.e. when
  1) a local termination condition exists (all tasks are finished)
  2) all ACKs for received tasks have been sent
  3) all ACKs for tasks passed to others have been received
- due to (3), a process must become inactive before its parent; the computation can terminate once the first process becomes idle
  (figure: a process receives its first task from its parent and sends the final ACK back to it; further tasks and ACKs are exchanged with other processes)

termination detection (cont'd): ring termination (1)
- processes are (logically) organised in a ring structure
- the single-pass ring termination algorithm is defined as follows
  1) when P1 terminates, it generates a token that is passed to P2
  2) when Pi (1 < i ≤ N) receives the token, it waits for its local termination condition (or has already terminated) and then passes the token onward to Pi+1 (PN passes the token to P1)
  3) when P1 receives a token, it knows all processes have terminated; a message can be sent to all processes informing them of global termination, if necessary
- the algorithm assumes that a process cannot be reactivated after reaching its local termination condition

termination detection (cont'd): ring termination (2)
- the dual-pass ring termination algorithm can handle processes being reactivated, but requires two passes around the ring
- here, tokens and processes are coloured white or black
- if process Pi passes tasks to Pj (j < i), it becomes a black process; otherwise it is a white process
- black processes colour a token black; white processes pass a token on in its original colour (i.e. black or white)
- P1 (when terminated) generates and passes a white token to P2; when P1 receives a black token, it passes on a white token again; otherwise (white token received) all processes have terminated
  (figure: processes P1, ..., Pi, Pj, ..., PN organised in a ring)

strategy selection
- not all strategies are appropriate for every problem
- crucial task: how to find the best strategy for a given problem
- main aspects to be considered
  - objective: optimisation of load or of run time
  - level of integration: OS, runtime system (MPI, e.g.), application
  - units to distribute: process / thread, parts of the program, data, ...
- further aspects
  - static / dynamic strategies, central / decentral strategies
  - source of initiative: idle slave, overloaded slave, master, ...
  - costs of the chosen strategy (computation should dominate load distribution and not vice versa)
  - placement of new processes or real process migration

Examples of Load Balancing Strategies

diffusion model (a.k.a. first order scheme)
- analogy to physical processes in nature (salt or ink in water, e.g.)
- the original algorithm was introduced by CYBENKO (1989) for static network topologies; meanwhile it has often been studied and extended (second order scheme, dynamic network topologies, e.g.)
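The first-order diffusion scheme can be sketched in a few lines. This is a minimal sketch, not from the slides: it assumes a constant swap ratio alpha for every edge and performs the JACOBI-like variant, computing all balancing factors before updating any workload.

```python
def diffuse(workload, neighbours, alpha=0.25, steps=50):
    """JACOBI-like first-order diffusion: in every step, each process
    swaps the ratio alpha of the load difference with each neighbour."""
    w = list(workload)
    for _ in range(steps):
        w = [w[i] - sum(alpha * (w[i] - w[j]) for j in neighbours[i])
             for i in range(len(w))]
    return w

# 4 processes connected in a ring, all load initially on process 0
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diffuse([100.0, 0.0, 0.0, 0.0], ring))
# converges towards the balanced state [25.0, 25.0, 25.0, 25.0]
```

The total workload is conserved in every step, since whatever one process sheds along an edge its neighbour gains; only the distribution changes.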
- idea: a process Pi balances its load simultaneously with all its neighbours N(i); a ratio α_ij of the load difference between processes Pi and Pj is swapped between them according to

    w_i^(t+1) = w_i^(t) − Σ_{j ∈ N(i)} α_ij (w_i^(t) − w_j^(t)),   1 ≤ i ≤ n,   0 ≤ α_ij ≤ 1,

  where w_i^(t) is the workload of process Pi at time t
- various methods exist to determine the parameter α_ij, such as
  - optimal choice: needs global knowledge of the network
  - BOILLAT choice: needs only local knowledge of the neighbours j

diffusion model (cont'd)
- the update of the workload can be done
  a) after all balancing factors have been computed (JACOBI-like)
  b) during the computation of the balancing factors (GAUSS-SEIDEL-like)
- example: first two iteration steps according to method a) for a 2D grid with a ratio of α = 0.25 for workload swapping
  (figure: workload on the grid at the initial setup (t = 0), after the first step (t = 1), and after the second step (t = 2))

bidding (economic model)
- analogy to the mechanisms of price fixing in markets
- idea
  - a process (with high workload) advertises tasks to its neighbours
  - the neighbours submit their free resources as bids
  - the process with the highest bid (i.e. the largest free resources) wins
- remarks
  - several rounds of bidding may be necessary, successively extending the range of bidders
  - in case of sudden workload peaks, a process might reject the purchased tasks
  - processes with free resources are still allowed to ask for tasks
  - drawback: quite complex analysis of this model

balanced allocation (balls into bins)
- basic idea: placing N balls into N bins at random (an extensively studied problem from probability and statistics)
- variant of the above: each ball comes with D possible destinations, chosen independently and uniformly at random; the ball is then placed in the least full bin among the D possible destinations
- applied to load balancing: a process selects D processes at random and passes some of its workload to the least loaded one
- for temporary tasks (i.e. tasks that finish at unpredictable times), this strategy has a competitive ratio of O(N) compared to the optimal off-line strategy (which has global knowledge)

broker system
- origin of the idea: brokers at the stock exchange
- designed for, and especially well-suited to, hierarchical topologies
- idea
  - each processor has one broker with local knowledge (about the workload in its subtree, e.g.)
  - tasks arrive at the local broker (via an application server) and are, depending on the available budget, processed locally or passed to the parent node
  - on some level (at the latest at the root node), a price-based decision and allocation is made
  - prices have to be paid for using remote resources as well as for the broking itself; local computations are cheaper
- a flexible strategy for hierarchical and heterogeneous topologies

random matching
- origin of the idea: graph theory
- principle
  - construct a matching in the topology graph G = (V, E) of the network (the set of vertices V are the processors, the set of edges E the direct connections between processors)
  - matching: a subset of the edges in which no two edges share a vertex
  - perform perfect load balancing along all edges of the matching
- this is an iterative strategy, hence several steps are necessary
- the matching must be found in parallel
  - start with an empty set of edges
  - in each vertex, local selection (by chance) of one incident edge
  - in each vertex, coordination with the neighbouring vertices and resolution of conflicts

precalculation of the load
- all strategies so far are based on local information only; hence, load balancing is often quite expensive, since (from a global point of view) balancing steps do not always lead to a better load distribution among the processors
- idea
  - global determination of the workload at program start or at certain points in time
  - global determination of an appropriate load distribution
  - workload transfer with less communication
- developed and used for hierarchical network topologies: workload recording and load balancing between child and parent nodes

Space Filling Curves

definition
- origin of the idea: analysis and topology ("topological monsters")
- a nice example of a construct from pure mathematics that gained practical relevance only decades later
- definition
of a space filling curve (SFC)
  - curve: the image of a continuous mapping f: [0,1] → [0,1]^D
  - SFC: a continuous, surjective mapping f: [0,1] → [0,1]^D that covers an area (with a JORDAN content) greater than zero
- prominent representatives
  - HILBERT's SFC (1891): the most famous SFC
  - PEANO's SFC (1890): the oldest SFC
  - LEBESGUE's SFC: the most important SFC for computer science
- further reading: H. Sagan, Space-Filling Curves, Springer (1994)
- nice applet:

HILBERT's space filling curve
- for reasons of simplicity only in 2D: f: I = [0,1] → [0,1]^2 = Q
- the construction of the SFC follows the geometric conception: "If I can be mapped onto Q in the space filling sense, then each of the four congruent subintervals of I can be mapped to one of the four quadrants of Q in the space filling sense, too."
- recursive application of the above preserves
  - neighbourhood relations: neighbouring subintervals in I are mapped onto neighbouring subsquares of Q
  - subset relations (inclusion): from I1 ⊆ I2 follows f(I1) ⊆ f(I2)
- border case: HILBERT's SFC

HILBERT's space filling curve (cont'd)
- the correspondence of nested intervals in I and nested squares in Q provides pairs of points in I with corresponding image points in Q
- of course, the iterative steps of this generation process are of practical relevance, not the border case itself
  1) start with a generator (or "Leitmotiv") that defines the order in which the subsquares are visited
  2) apply the generator in each subsquare (with appropriate similarity transformations if necessary)
  3) connect the open ends
  (figure: generator for HILBERT's SFC)

HILBERT's space filling curve (cont'd)
- the classical version of HILBERT and the variant of MOORE; modulo symmetry, these are the only two possibilities

HILBERT's space filling curve (cont'd)
- all iterations are injective, but HILBERT's SFC itself is not injective (there are image points with more than one original point in I)
- important precondition: there exists a bijective mapping between two finite-dimensional smooth manifolds (CANTOR, 1878), but it cannot be both bijective and continuous (NETTO, 1879)

PEANO's space filling curve
- the ancestor of all SFCs
- subdivision of I and Q into nine congruent subdomains
- definition of a generator that, again, defines the order of visit

PEANO's space filling curve (cont'd)
- now there are (modulo symmetry) 273 different possibilities to recursively apply the generator while preserving neighbourhood and inclusion
- serpentine type (left and centre) and meander type (right)

LEBESGUE's space filling curve
- definition of LEBESGUE's SFC via the CANTOR set
- CANTOR set C: repeatedly deleting the open middle thirds of [0,1]
- C is defined as the set of points not excluded; the total length removed is
    Σ_{n=1..∞} 2^(n−1) · (1/3)^n = 1
- hence the proportion of the remaining set seems to be 1 − 1 = 0, but in fact C has the same cardinality as the unit interval [0,1] (!)

LEBESGUE's space filling curve (cont'd)
- the nested intervals of C can be represented by ternary numbers of the form 0_3.w1w2w3... with w_i ∈ {0, 1, 2} (1 = 1_3.0)
- example: the parameter T = 2/9 can be written via the nested intervals [0,1] ⊇ [0_3.0, 0_3.1] ⊇ [0_3.02, 0_3.10] ⊇ ...
- since all interval borders can be written in two different ways (1_3.0 or 0_3.0222..., e.g.) and the middle third ([0_3.1, 0_3.2], e.g.) is repeatedly deleted, the CANTOR set only contains ternary numbers that consist of the digits 0 and 2

LEBESGUE's space filling curve (cont'd)
- mapping C to [0,1]^2 by halving the ternary digits and de-interleaving them,
    f(0_3.w1w2w3w4...) = ( 0_2.(w1/2)(w3/2)... , 0_2.(w2/2)(w4/2)... ) = (0_2.x1x2..., 0_2.y1y2...),
  and connecting the image points via linear interpolation results in LEBESGUE's SFC, also referred to as Z-order

LEBESGUE's space filling curve (cont'd)
- Z-ordering is well known from quadtrees and octrees when linearising a tree by a depth-first traversal (it provides a common naming scheme for cells: the lexicographic or MORTON index)
- bitwise interleaving of the coordinate values (x, y) leads to the Z-value
- useful for multidimensional range searches, e.g.
  (figure: bit patterns of x and y and the interleaved Z-value for the cells of a grid)

LEBESGUE's space filling curve (cont'd)
- compared to the SFCs studied so far, there are several differences
  - both HILBERT's and PEANO's SFC are nowhere differentiable, whereas LEBESGUE's SFC is differentiable almost everywhere
  - both HILBERT's and PEANO's SFC are self-similar (a close relation to fractals such as KOCH's snowflake or SIERPIŃSKI's triangle), but LEBESGUE's SFC is not self-similar
  - the generation is less complicated than for HILBERT and PEANO
- many applications of this SFC in computer science

applications
- sequentialisation of multidimensional data to one dimension whil
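The bitwise interleaving behind the MORTON index can be sketched directly. This is a minimal sketch, not from the slides: the bits of x are placed on the even bit positions of z and the bits of y on the odd positions (the opposite convention is equally common), so that sorting cells by their Z-value traverses a grid in Z-order.

```python
def morton(x, y, bits=16):
    """Bitwise interleaving of the coordinate values (x, y): the bits of x
    go to the even positions of z, the bits of y to the odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# sorting the cells of a 4x4 grid by their Z-value yields the Z-order traversal
cells = sorted((morton(x, y), (x, y)) for y in range(4) for x in range(4))
print([c for _, c in cells][:4])
# [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Because the Z-value of a cell is a prefix of the Z-values of its four children, this is exactly the depth-first (lexicographic) naming scheme of a quadtree, which is what makes it useful for range searches.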