- Jens Gustedt
- Amelia De Vivo
- Isabelle Guérin Lassous
- 1997-2008 This documentation describes parXXL 1.00
Copyright © 1997-2008 INRIA, France, http://www.inria.fr/,
and Supélec, France, http://www.supelec.fr/
Permission to use, copy, modify, and distribute this software and its documentation under the terms as provided in the file LICENSE is hereby granted. No representations are made about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
- Credits and Rights
- Current Authors and Maintainers
- Authors of previous versions
- Table of Contents
- Two Different Ways to Organize the Rounds and the Granularity
- Supported Communication Interfaces
- parXXL processes and POSIX threads
- Supported Platforms
- Organization of parXXL
- Installing parXXL
- Building the library
- Configuring the library
- Compiling the parXXL library
- Installing parXXL in its destination directory
- Installation if configure doesn't work
- Compile and link parXXL executables
- Tuning parXXL and parXXL executables
- Different target architectures
- Running parXXL
- How to debug a parXXL application
- Missing symbols
- Coding Style
- Opening and closing braces
- Breaking lines and line breaks
- Declarations with derived types
- `>>' clash
- Argument mix up for macros
- Commenting code
parXXL is a library for large-scale computation and communication that executes fine-grained algorithms (where computation and communication are of the same order of magnitude) on so-called coarse-grained architectures (clusters, grids, mainframes).
Historically, it is the result of merging two different projects, ParCeL and SSCRAP, which stand for a consistent modeling and implementation of fine-grained networks (ParCeL) and coarse-grained algorithmics (SSCRAP), respectively.
ParCeL was a family of cellular-oriented parallel programming models. The project started in 1989 at the Supélec Metz campus. Different ParCeL versions have been designed and implemented, as complete languages or as C libraries, for shared- or distributed-memory parallel architectures. The first ParCeLs were designed for Artificial Intelligence applications, like neural networks and multi-agent systems. The last version of ParCeL (ParCeL-6.1, 2004-2005) was designed for more generic fine-grained numerical applications, but was limited to shared-memory multiprocessors.
parXXL inherits the cellular orientation of the ParCeL project. Elementary computing elements (the cells) can be defined and explicitly connected to build cellular networks. parXXL supports hundreds of millions and even billions of cells, which own private variables and are associated with identical or different behavior functions to achieve cellular computations. This computation paradigm is cyclic: each cell is activated once per cycle. It runs its current behavior function, performs some local computations, sends results to other cells and (sometimes) saves results in a global data storage area (in memory or in a file). Cells communicate through output channels, according to different communication modes (more or less synchronous).
The cellular parXXL functionalities are implemented on top of the communication and memory management levels of parXXL, which correspond to the previous SSCRAP levels.
SSCRAP stood for
Soft Synchronized Computing in Rounds for Adequate Parallelization
and parXXL inherits most of what is expressed in that name. Let us explain the terms it contains.
- parXXL provides you with a number par::cntrl::np() of (abstract) parXXL processes. Don't confuse these with the real processors that your machine might have; in some circumstances the two may be very different numbers. E.g., you might use parXXL with some hundred parXXL processes to compute on just a single CPU.
parXXL processes can do any sort of computation that can be expressed in the C++ programming language. In addition, they can communicate among each other; that is, to a certain extent they can share information.
- parXXL should be able to cover very different aspects of computational problems, but in general it will only make sense to use it on large problems. A problem is large if it cannot be solved with your everyday computing environment, usually because it is too large in size (needing too much memory), too time consuming, or both. Observe that this definition gives us a moving target: problems that are big nowadays might not be so 10 years from now.
- Based on an idea that was introduced by Valiant in the early 90's, rounds or supersteps are the fundamental units of abstraction that parXXL uses to describe an algorithm. A superstep groups together a substantial amount of computation and communication to force the efficient use of system resources. In particular, if used correctly, it inhibits certain kinds of nonsense that inexperienced programmers might be tempted to implement. To a certain extent, parXXL forces such correct use. This is a deliberate and severe restriction of your freedom of expression as a programmer. If you don't want it, don't use parXXL.
- soft synchronization
- Unlike what Valiant proposed in his initial paper on the theoretical BSP model, parXXL may be much more relaxed concerning synchronization. In BSP, a synchronization barrier between all processors is established after every superstep. In practice, such a strong requirement is not necessary. What is necessary is that any processor starting round n has terminated all communication with other processors that was initiated in round n-1. So parXXL ensures just this: as soon as it has concluded all communication of some round n-1, a parXXL process can start round n without waiting for the other processes. This often lets us gain a factor of two on problems that can't easily be balanced among the processes. If you think a factor of two in your running times is negligible, please think it over again. Think of paying twice your monthly salary for the computing resources you need, or just once. If you still are not convinced, again, parXXL is probably not what you need.
parXXL gives you two different levels of abstraction to organize the rounds and, in particular, their communication. These influence the granularity that a parXXL program may have, i.e., which relation between the size of the data and the number of parXXL processes may make sense for a particular run of the program.
- targeted communication
- During the round, each parXXL process knows exactly with which other processes it will want to communicate, how much data this will be, etc. A typical example of an algorithm that can effectively use this is round-robin matrix multiplication.
- untargeted communication
- During the round, the parXXL processes discover (usually by inspecting their particular data) to whom and how much they have to communicate.
The emphasis in our development lies on the second, untargeted communication. For the moment, targeted communication is mainly used internally. For an entry point to untargeted communication, please have a look at par::mem::chunk, par::step::array, par::step::bulk and par::step::query_reply. Good entry points are also the implementations of the algorithms in the directory tests&benchs that comes with parXXL.
For targeted communication look for par::cntrl::send.
Currently we fully support two different communication interfaces:
- message passing via MPI
- shared memory via POSIX threads
Preliminary parts of communication via mapped memory are implemented. This will make mixed interfaces between the message passing and shared memory approaches possible. It will use mapped files (mmap, perhaps even via NFS) and mapped segments (shm_open).
PVM as message passing interface is no longer maintained.
Beware that the same problems that we encountered may happen to your code when changing to pthreads: don't use any global variables that should have different values for different parXXL processes. Global variables are shared by threads, so most probably your threads would kill each other. If you feel in urgent need of such global variables, look into par::sys::once and par::sys::specific.
The first provides you with a variable that is global and shared but that you can be sure is initialized only once. After that you are on your own; probably your processes should only read this variable afterwards. par::sys::specific implements variables whose value is guaranteed to be specific to each process.
This problem also holds for system libraries. Some of them are made thread safe the easy way by the system designers: they simply block on concurrent access, so your program might slow down instead of speeding up. Good candidates for that are all kinds of random generators, and memcpy implementations that use global variables to hold their state. The parXXL layer par::sys contains a lot of wrappers that you should use instead of these potential trouble makers; in particular, use par::sys::rand whenever you need some randomness in your program.
We actively tested parXXL on the following platforms:
| CPU | OS | compiler | threads | MPI |
| --- | --- | --- | --- | --- |
| i86, different flavors | Linux v. 2.4 and 2.6 | gcc, v. >3.2 recommended | native pthread | lam, mpich |
| RS 12000 | IRIX v. | CC (native) | POSIX (native) | native |
| sparc v. 9 | SunOS | gcc | POSIX (native) | native |
parXXL should be able to run on other UNIX-like systems without many modifications. Please submit patches and updates if you have managed to compile it on a system other than those indicated above.
If you are interested in porting parXXL to other systems, in particular Windows, please let us know. This should mainly concern the par::sys and par::cntrl layers, which encapsulate the access to the system and to the communication primitives, respectively.
parXXL is organized in different levels that correspond to different C++ namespaces:
- Everything that can be done with a C++ Compiler alone.
- System level add ons.
- The implementation of the particular memory handling model of parXXL.
- The implementation of the run time and cntrl support of parXXL.
- Implementing supersteps.
- The implementation of cellular networks.
- A toolbox of predefined cellular networks.
To illustrate this separation into different levels, consider the family of functions named
quicksort. Such functions are defined on three of the levels:
- par::sys::quicksort is the interface to a template family of classical C++ functions that sort an array of a given type.
- A second version interfaces this with the memory management utilities (par::mem::chunk) of parXXL.
- A third version implements a parallel/distributed sorting algorithm that works on a distributed array (par::step::array). This high-level algorithm uses par::sys::quicksort as the base algorithm to sort items locally.