H. Dietz, T. Mattox, et al.
School of Electrical and Computer Engineering, Purdue University
The TTL_PAPERS (TTL implementation of Purdue's Adapter for
Parallel Execution and Rapid Synchronization) Library is
designed to provide very low latency barrier synchronization and
aggregate communication operations to C++ or C programs compiled using
the GNU compilers. Fortran programs can use the library via f2c, as can
Pascal programs via p2c. In addition to these basic functions, coherent
shared memory access is now also supported for C++ code.
Besides being usable with any (even heterogeneous) collection of
machines supported by GCC and connected by TTL_PAPERS, this library
can also be compiled to simulate the TTL_PAPERS functionality. This
I/O port-level simulator is called TTL_VAPERS (the
TTL_PAPERS-compatible Virtual Adapter for Parallel Execution and Rapid
Synchronization). TTL_VAPERS consists of two parts: the simulation
engine and the graphical debugging interface. The simulation engine is
linked directly into the user program and uses UNIX pipes to
communicate among the processes for each of the simulated machines and
the simulated TTL_PAPERS unit. The graphical debugging interface,
xvapers, is built with Tcl/Tk and is independent of the user program;
it simply examines and modifies the status of the simulated TTL_PAPERS
unit. The xvapers interface provides graphical and textual status
displays, a variety of execution controls including barrier
single-stepping, and interactive debugging of individual processes.
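
The pipe-based arrangement described above can be pictured with the toy model below. It is only a sketch under stated assumptions, not the TTL_VAPERS protocol: `sim_unit_all` and `SIM_MAXPROC` are hypothetical names, one child process stands in for each simulated machine, and the parent process stands in for the simulated TTL_PAPERS unit. Because the parent replies to nobody until every child has reported, the same exchange acts both as a barrier and as an aggregate AND (roughly what `p_all` computes).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SIM_MAXPROC 16 /* hypothetical limit for this toy model */

/* Toy model of a pipe-based simulator, NOT the TTL_VAPERS protocol.
   Child processes play the simulated machines and the parent plays the
   simulated unit.  Each child sends a one-byte flag up its own pipe, then
   blocks reading its reply pipe; the parent replies only after every flag
   has arrived (a barrier), and the reply is the AND of all the flags.
   Returns that AND (0 or 1), or -1 on any error. */
int sim_unit_all(const int flags[], int nproc)
{
    int up[SIM_MAXPROC][2], down[SIM_MAXPROC][2];
    pid_t pid[SIM_MAXPROC];
    int made = 0, forked = 0, err = 0;
    unsigned char all = 1;

    if (nproc < 1 || nproc > SIM_MAXPROC) return -1;
    for (; made < nproc; made++) {               /* one pipe pair per PE */
        if (pipe(up[made]) != 0) break;
        if (pipe(down[made]) != 0) { close(up[made][0]); close(up[made][1]); break; }
    }
    if (made < nproc) {                          /* undo partial setup */
        for (int i = 0; i < made; i++) {
            close(up[i][0]); close(up[i][1]); close(down[i][0]); close(down[i][1]);
        }
        return -1;
    }

    for (; forked < nproc; forked++) {
        pid_t p = fork();
        if (p < 0) { err = 1; break; }
        if (p == 0) {                            /* child: simulated PE */
            int me = forked;
            unsigned char f = flags[me] ? 1 : 0, reply;
            for (int j = 0; j < nproc; j++) {    /* keep only our two ends */
                close(up[j][0]);
                close(down[j][1]);
                if (j != me) { close(up[j][1]); close(down[j][0]); }
            }
            if (write(up[me][1], &f, 1) != 1) _exit(1);
            if (read(down[me][0], &reply, 1) != 1) _exit(1); /* barrier wait */
            _exit(0);
        }
        pid[forked] = p;
    }

    for (int i = 0; i < nproc; i++) {            /* parent keeps the other ends */
        close(up[i][1]);
        close(down[i][0]);
    }
    for (int i = 0; i < forked && !err; i++) {   /* unit: gather every flag */
        unsigned char f;
        if (read(up[i][0], &f, 1) != 1) err = 1; else all &= f;
    }
    for (int i = 0; i < nproc; i++) {            /* unit: release everyone */
        if (!err && write(down[i][1], &all, 1) != 1) err = 1;
        close(down[i][1]);                       /* EOF wakes any unserved child */
        close(up[i][0]);
    }
    for (int i = 0; i < forked; i++) {           /* reap the children */
        int status;
        if (waitpid(pid[i], &status, 0) != pid[i] ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0) err = 1;
    }
    return err ? -1 : all;
}
```

For example, flags of {1, 0, 1, 1} on four simulated PEs yield 0. The sketch forks fresh processes on every call only to stay self-contained; a simulator linked into a user program would presumably keep its processes alive across many operations.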
Just as TTL_PAPERS can be used with four, eight, or more machines,
TTL_VAPERS versions are available for simulated clusters of various
sizes. The remainder of this handout briefly summarizes the structure
of the library's user interface.
GCC Declaration | Type | Suffix |
---|---|---|
int (used as boolean) | uint1 | 1u |
char | int8 | 8 |
unsigned char | uint8 | 8u |
short | int16 | 16 |
unsigned short | uint16 | 16u |
int | int32 | 32 |
unsigned int | uint32 | 32u |
long long | int64 | 64 |
unsigned long long | uint64 | 64u |
float | f32 | f |
double | f64 | d |
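(varies; see note) | barrier | b |

In the function names below, type stands for one of the suffixes above (for example, p_reduceAdd32 operates on int), and itype stands for any suffix except f or d.

barrier is the type of a barrier mask; its GCC type depends on the configured size of the PAPERS cluster.

Bits: a variant of the form ...type(args, bits) sends only the specified number of bits.

Shared memory (C++) types are named by prefixing s_ to the type name, e.g., s_int8, or written as shared<t>.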
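
As a sketch of how the type names in the table line up with GCC declarations, the typedefs below restate the table in C. They are illustrative assumptions only: the library's own headers define the real types and may do so differently, and barrier is left out because its GCC type depends on the cluster size. One detail worth noting is that plain `char` is signed on some GCC targets and unsigned on others, so an int8 declared as `char` inherits that target dependence.

```c
/* Hypothetical typedefs mirroring the type table; the library's own
   headers define the real types, possibly in a different way. */
typedef int                uint1;   /* int used as a boolean */
typedef char               int8;    /* plain char: signedness is target-dependent */
typedef unsigned char      uint8;
typedef short              int16;
typedef unsigned short     uint16;
typedef int                int32;
typedef unsigned int       uint32;
typedef long long          int64;
typedef unsigned long long uint64;
typedef float              f32;
typedef double             f64;
/* barrier is omitted: its GCC type depends on the configured cluster size. */

/* The width in each name matches the GCC type on typical 32- and 64-bit
   targets; these compile-time checks document that assumption. */
_Static_assert(sizeof(int16) == 2, "short is expected to be 16 bits");
_Static_assert(sizeof(int32) == 4, "int is expected to be 32 bits");
_Static_assert(sizeof(int64) == 8, "long long is expected to be 64 bits");
```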

Header file | Contents |
---|---|
paperslib.h | Header for all PAPERS library routines |
intern.h | Header for inlined primitives |
stypes.h | Header for C++ shared memory classes |

Routine | Purpose |
---|---|
p_init() | Initialize and check in with PAPERS |
p_exit() | Check out with PAPERS |
s_init() | Enable shared memory system |
s_exit() | Disable shared memory system |

Functions with a p_ prefix cannot be used while the shared memory system is active.

Name | Meaning |
---|---|
NPROC | Number of PEs |
IPROC | This PE's number, 0..NPROC-1 |
CPROC | PE number of the console/control PE |

Function | Description |
---|---|
p_enqueue(m) | Enqueue the barrier mask m |
p_wait() | Barrier synchronize with current mask |
s_wait() | Barrier synchronize with current mask (the form to use while the shared memory system is active) |
p_waitvec(f) | Return bit vector assembled from f of each active processor |
p_any(f) | Did any enabled processor have f true? |
p_all(f) | Did all enabled processors have f true? |
p_gathertype(p, d) | Gather an array p[i] = d from PE i |
p_putgettype(d, s) | Put d; get (return) the value put by PE s |
p_bcastPutitype(d) | Broadcast put d from this PE |
p_bcastGetitype() | Broadcast get value from sending PE |
p_reduceAnditype(d) | Return the bitwise AND of d from each PE |
p_reduceOritype(d) | Return the bitwise OR of d from each PE |
p_reduceAddtype(d) | Return the sum of d from each PE |
p_reduceMultype(d) | Return the product of d from each PE |
p_reduceMintype(d) | Return the minimum of d from each PE |
p_reduceMaxtype(d) | Return the maximum of d from each PE |
p_scanAnditype(d) | Return the bitwise AND of d up to this PE |
p_scanOritype(d) | Return the bitwise OR of d up to this PE |
p_scanAddtype(d) | Return the sum of d up to this PE |
p_scanMultype(d) | Return the product of d up to this PE |
p_scanMintype(d) | Return the minimum of d up to this PE |
p_scanMaxtype(d) | Return the maximum of d up to this PE |
p_ranktype(d) | Return the rank of d for each PE, such that the smallest value has rank 0 |
p_population() | Return the total number of active PEs |
p_enumerate() | Return a consecutive number for each active PE |
p_selectFirst() | Return the lowest active PE number |
p_selectOne() | Return any active PE's number |
p_voteCount(v) | Return the number of PEs that voted for this PE |
p_vote(v) | Return a bit mask of the PEs that voted for this PE |
p_matchCounttype(v) | Return the number of PEs that voted as this PE did |
p_matchtype(v) | Return a bit mask of the PEs that voted as this PE did |
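
The semantics of several of these operations can be checked against a single-process model that computes, from every PE's contribution, the value a given PE would receive. This is a hedged sketch, not the library's implementation: all `sim_*` names are hypothetical, bit i of a mask is assumed to stand for PE i, the scan is assumed to include this PE's own value, rank ties are assumed to be broken by PE number, the argument of the vote operations is assumed to be the PE number being voted for, and a PE is assumed to count itself as a match.

```c
#include <assert.h>

#define SIM_MASKBITS 32 /* assumption: one bit per PE in an unsigned mask */

/* Single-process model of what PE `me` would receive.  d[i], f[i] or v[i]
   is PE i's contribution; bit i of `active` is set when PE i takes part.
   All sim_* names are hypothetical, not library functions. */
static int is_active(unsigned active, int i) { return (active >> i) & 1u; }

/* like p_waitvec: bit i is set when active PE i has f[i] true */
unsigned sim_waitvec(const int f[], int nproc, unsigned active)
{
    unsigned vec = 0;
    assert(nproc >= 1 && nproc <= SIM_MASKBITS);
    assert(nproc == SIM_MASKBITS || (active >> nproc) == 0u);
    for (int i = 0; i < nproc; i++)
        if (is_active(active, i) && f[i]) vec |= 1u << i;
    return vec;
}

/* like p_any / p_all: both follow directly from the bit vector */
int sim_any(const int f[], int nproc, unsigned active)
{ return sim_waitvec(f, nproc, active) != 0u; }
int sim_all(const int f[], int nproc, unsigned active)
{ return sim_waitvec(f, nproc, active) == active; }

/* like p_reduceAdd: every PE receives the same total */
int sim_reduceAdd(const int d[], int nproc, unsigned active)
{
    int sum = 0;
    for (int i = 0; i < nproc; i++)
        if (is_active(active, i)) sum += d[i];
    return sum;
}

/* like p_scanAdd, assumed inclusive: sum over active PEs 0..me */
int sim_scanAdd(const int d[], int nproc, unsigned active, int me)
{
    int sum = 0;
    for (int i = 0; i <= me && i < nproc; i++)
        if (is_active(active, i)) sum += d[i];
    return sum;
}

/* like p_rank: count smaller values; ties broken by PE number (assumption) */
int sim_rank(const int d[], int nproc, unsigned active, int me)
{
    int r = 0;
    for (int i = 0; i < nproc; i++)
        if (is_active(active, i) && (d[i] < d[me] || (d[i] == d[me] && i < me)))
            r++;
    return r;
}

/* like p_population / p_enumerate / p_selectFirst, from the active mask */
int sim_population(unsigned active)
{
    int n = 0;
    for (; active != 0u; active &= active - 1u) n++; /* clear lowest set bit */
    return n;
}
int sim_enumerate(unsigned active, int me)
{ return sim_population(active & ((1u << me) - 1u)); } /* active PEs below me */
int sim_selectFirst(unsigned active)
{
    for (int i = 0; i < SIM_MASKBITS; i++)
        if (is_active(active, i)) return i;
    return -1;
}

/* like p_voteCount: v[i] is the PE number that PE i votes for (assumption) */
int sim_voteCount(const int v[], int nproc, unsigned active, int me)
{
    int n = 0;
    for (int i = 0; i < nproc; i++)
        if (is_active(active, i) && v[i] == me) n++;
    return n;
}

/* like p_matchCount: PEs whose vote equals ours, ourselves included */
int sim_matchCount(const int v[], int nproc, unsigned active, int me)
{
    int n = 0;
    for (int i = 0; i < nproc; i++)
        if (is_active(active, i) && v[i] == v[me]) n++;
    return n;
}
```

For instance, with values {5, 3, 5, 1} on four active PEs, the model gives 14 for the sum reduction on every PE and 13 for the sum scan on PE 2.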
Full information and source code, both in the public domain, are available from
http://garage.ecn.purdue.edu/~papers
Alternatively, contact Prof. Hank Dietz, School of Electrical and
Computer Engineering, Purdue University, West Lafayette, IN,
47907-1285, phone: (317) 494 3357, fax: (317) 494 3371.