parXXL

1.00

Credits and Rights

Current Authors and Maintainers

Author:
Jens Gustedt

Stéphane Vialle

Authors of previous versions

Author:
Amelia De Vivo

Mohamed Essaïdi

Isabelle Guérin Lassous

Date:
1997-2008. This documentation describes parXXL 1.00

Copyright

Copyright © 1997-2008 INRIA, France, http://www.inria.fr/, and Supélec, France, http://www.supelec.fr/

Permission to use, copy, modify, and distribute this software and its documentation under the terms as provided in the file LICENSE is hereby granted. No representations are made about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.

Table of Contents

  1. Credits and Rights
    1. Current Authors and Maintainers
    2. Authors of previous versions
    3. Copyright
  2. Table of Contents
  3. Introduction
    1. ParCeL
    2. SSCRAP
  4. Two different ways to Organize the Rounds and the Granularity
  5. Supported Communication Interfaces
    1. parXXL processes and POSIX threads
  6. Supported Platforms
  7. Organization of parXXL
  8. Installing parXXL
    1. Building the library
      1. Configuring the library
      2. Compiling the parXXL library
      3. Installing parXXL in its destination directory
    2. Installation if configure doesn't work
    3. Compile and link parXXL executables
    4. Tuning parXXL and parXXL executables
    5. Different target architectures
    6. Running parXXL
  9. How to debug a parXXL application
  10. Missing symbols
  11. Coding Style
    1. Indentation
    2. Opening and closing braces
    3. Breaking lines and line breaks
    4. Functions
    5. Declarations with derived types.
    6. Templates
      1. `>>' clash
      2. Argument mix up for macros
    7. Commenting code
  12. Documentation

Introduction

parXXL is a library for large scale computation and communication that executes fine grained algorithms (computation and communication are of the same order of magnitude) on so-called coarse grained architectures (clusters, grids, mainframes).

Historically, parXXL is the result of a merge of two different projects, ParCeL and SSCRAP, which stand for a consistent modeling and implementation of fine grained networks (ParCeL) and coarse grained algorithmics (SSCRAP), respectively.

ParCeL

ParCeL was a family of cell-oriented parallel programming models. The project started in 1989 at the Supélec Metz campus. Different versions of ParCeL have been designed and implemented, as complete languages or as C libraries, for shared or distributed memory parallel architectures. The first ParCeLs were designed for Artificial Intelligence applications, like neural networks and multi-agent systems. The last version of ParCeL (ParCeL-6.1, 2004-2005) was designed for more generic fine grained numerical applications, but was limited to shared memory multiprocessors.

parXXL inherits the cellular orientation of the ParCeL project. Elementary computing elements (the cells) can be defined and explicitly connected to build cellular networks. parXXL supports hundreds of millions and even billions of cells, each of which owns private variables and is associated with an identical or different behavior function to achieve cellular computations. This computation paradigm is cyclic: each cell is activated once per cycle. It runs its current behavior function, performs some local computations, sends results to other cells and (sometimes) saves results in a global data storage area (in memory or in a file). Cells communicate through output channels, according to different communication modes (more or less synchronous).

The cellular parXXL functionalities are implemented on top of the communication and memory management levels of parXXL, corresponding to the previous SSCRAP levels.

SSCRAP

SSCRAP stood for

Soft Synchronized Computing in Rounds for Adequate Parallelization

and parXXL inherits most of what is expressed in that name.

Let us explain these terms.

parallelization
parXXL provides you with a number par::cntrl::np() of (abstract) parXXL processes. Don't mix these up with the real processors that your machine might have. In some circumstances these might be two very different numbers. E.g., you might use parXXL with some hundred parXXL processes to compute on just one single CPU.

parXXL processes can do any sort of computation that can be expressed in the C++ programming language. In addition they can communicate amongst each other, that is to a certain extent they can share information.

computing
parXXL should be able to cover very different aspects of computational problems. But in general it will only make sense to use it on large problems. A problem is large if its solution isn't possible with your everyday computing environment, usually because it is too large in size (needing too much memory), too time consuming, or both. Observe that this definition gives us a moving target: problems that are big nowadays might not be so 10 years from now.
rounds
Based on an idea that was introduced by Valiant in the early 90's, rounds or supersteps are the fundamental units of abstraction that parXXL uses to describe an algorithm. A superstep groups together a substantial amount of computation and communication to force the efficient use of system resources. In particular, used correctly it inhibits certain sorts of nonsense that inexperienced programmers might be tempted to implement. To a certain extent, parXXL forces such correct use. This is a voluntary and severe restriction of your freedom of expression as a programmer. If you don't want it, don't use parXXL.
soft synchronization
Unlike what Valiant proposed in his initial paper for the theoretical BSP model, parXXL may be much more relaxed concerning synchronization. In BSP, a synchronization barrier between all processors is established after every superstep. In practice, such a strong requirement is not necessary. What is necessary is that any processor starting round n has terminated all communication with other processors that was initiated in round n-1. So parXXL ensures just this. As soon as it has concluded all communication of some round n-1, a parXXL process can start round n without waiting for the other processes. This often lets us gain a factor of two on problems that can't easily be balanced among the processes. If you think a factor of two in your running times is negligible, please think it over again. Think of paying twice your monthly salary for the computing resources you need, or just once. If you still are not convinced, again, parXXL is probably not what you need.

Two different ways to Organize the Rounds and the Granularity

parXXL gives you two different levels of abstraction to organize the rounds and in particular their communication. These influence the granularity that a parXXL program may have, i.e., which relation between the size of the data and the number of parXXL processes may make sense for a particular run of the program.

targeted communication
During the round, each parXXL process knows exactly to which other process it will want to communicate, how much data this will be etc. A typical example for an algorithm that could effectively use this is round robin matrix multiplication.
untargeted communication
During the round, the parXXL processes discover (usually by inspecting their particular data) to whom and how much they have to communicate.

The emphasis of our development lies on the second, untargeted communication. For the moment, targeted communication is mainly used internally. For an entry point to untargeted communication please have a look at par::mem::chunk, par::step::array, par::step::bulk and par::step::query_reply. Good entry points are also the algorithms in the directory tests&benchs that comes with parXXL.

For targeted communication look for par::cntrl::send.

Supported Communication Interfaces

Currently we fully support two different communication interfaces: MPI for message passing and POSIX threads for shared memory.

Preliminary parts of communication via mapped memory are implemented. This will make mixed interfaces between message passing and shared memory approaches possible. This will use mapped files (mmap, perhaps even via NFS) and mapped segments (shm_open).

PVM as message passing interface is no longer maintained.

parXXL processes and POSIX threads

Beware that the same problems that we encountered may happen to your code when changing to pthreads: don't use any global variables that should have different values for different parXXL processes. Global variables are shared by threads, so most probably your threads would kill each other. If you feel in urgent need of such global variables, look into par::sys::once and par::sys::specific.

par::sys::once provides you with a variable that is global and shared, but for which you may be sure that it is only initialized once. After that you are on your own; probably your processes should only read this variable afterwards. par::sys::specific implements variables whose value is guaranteed to be specific to each process.

This problem also holds for system libraries. Some of them are just made thread safe the easy way by the system designers: they block on concurrent access, and your program might sloooooooooow down instead of speeding up. Good candidates for that are all kinds of random generators and memcpy implementations that use global variables to hold their state. The parXXL layer par::sys contains a lot of wrappers that you should use instead of these potential troublemakers; in particular, use par::sys::rand whenever you need some randomness in your program.

Supported Platforms

We actively tested parXXL on the following platforms:
CPU                      OS                    compiler                  threads          MPI
i86, different flavors   Linux v. 2.4 and 2.6  gcc v. >3.2 recommended   native pthread   lam, mpich
RS 12000                 IRIX                  CC (native)               posix (native)   native
sparc v. 9               SunOS                 gcc                       posix (native)   native

parXXL should be able to run on other UNIX-like systems without many modifications. Please submit patches and updates if you manage to compile on a system other than those indicated above.

If you are interested in porting parXXL to other systems, in particular Windows, please let us know. This should mainly concern the par::sys and par::cntrl layers that encapsulate the access to the system and the communication primitives, respectively.

Organization of parXXL

parXXL is organized in different levels that correspond to different C++ namespaces.

par::cpp
Everything that can be done with a C++ compiler alone.
par::sys
System level add-ons.
par::mem
The implementation of the particular memory handling model of parXXL.
par::cntrl
The implementation of the run time and control support of parXXL.
par::step
Implementing supersteps.
par::cell
The implementation of cellular networks.
par::cellnet
A toolbox of predefined cellular networks.

To illustrate that separation into different levels consider the family of functions named quicksort. Such functions are defined on three of the different levels:

par::sys::quicksort
Is the interface to a template family of a classical C++ function that sorts an array of a given type.
par::mem::quicksort
interfaces this to the memory management utilities (par::mem::chunk) of parXXL.
par::step::quicksort
implements a parallel/distributed sorting algorithm that works on a distributed array (par::step::array). This high-level algorithm uses par::sys::quicksort as the base algorithm to sort items locally.

Generated on Tue Oct 13 22:02:13 2009 for parXXL by  doxygen 1.5.8