DistributePP
A distributed parallel processing tool for MATLAB® by Michael D. DeVore
DistributePP is a parallel processing package for MATLAB® that supports coarse-grained parallelism across a heterogeneous
computing network with access to a shared file system. It is designed to be:
- Lightweight
- The entire package consists of eight source files (five MATLAB functions and three C routines) and fewer than 400 lines
of code. Installation is simple: just add the DistributePP directory to the MATLAB search path.
- Robust
- File system locking coordinates the servers, so server processes may be killed and restarted with impunity.
A node crash does not interfere with the operation of server processes on other nodes. It also typically does not leave
requests unsatisfied, because restarted processes or server processes on other nodes automatically service the
abandoned requests. Decentralized coordination leaves the underlying file system as the only single point of failure,
and this can be eliminated with file system redundancy.
- Flexible
- The software supports many-to-many client and server interaction. Client requests are serviced in the order posted,
and new clients may participate at any time. Newly started server processes immediately begin to satisfy any pending
requests.
If you use the DistributePP package, like or dislike the package, have any comments, feedback, or suggestions, or if you have
found useful extensions, I would love to hear from you. Please contact me by e-mail
or through the link to my homepage above.
Terms of Use
This software is distributed freely for use by anyone for any purpose provided that:
1. any modified files are documented internally to clearly indicate that they have been
modified from the original release; and
2. it is recognized that no liability is assumed by Michael D. DeVore nor by Washington University
for the use or misuse of the software.
The intention of point 1 is that there be no confusion about the origin of the software if modified versions
of the code are circulated. The intention of point 2 should be clear.
Operation
To use the software, start server processes on every machine that will participate in the computation. These
processes may run in the background, and any machine may host several of them. To have one of these server processes
carry out a computation, a client invokes a function similar to MATLAB's feval() function. A descriptor for the request
is returned to the client, which may carry out other processing while waiting for the request to complete. The client
can use the descriptor to check the completion status of the request, to block until the request is completed, and/or
to retrieve the result of a completed request.
Server Processes
To start a server process, log into the machine on which the process is to execute, launch MATLAB, ensure that the
DistributePP package is in the search path, and execute the function PP_SERVER. This function accepts two parameters:
- Options
- This is a string of the form 'opt1=val1&opt2=val2&...' that customizes the behavior of this server process.
The option names opt1, etc. identify the properties to customize, and val1, etc. specify their values.
Currently two options are available:
- NAME: a name for the process, used to name the server log file and in the time stamps placed
in request status files.
- PAUSE: a polling period that specifies how frequently the communication directory is checked for outstanding
requests. If unspecified or equal to zero, the server function exits as soon as all outstanding requests
have been picked up by some server process.
- Communication Directory
- This is a path to a directory used for communication between client and server processes. Each server watches
a single communication directory, and a client may specify the directory to which each request is posted.
This allows the set of server processes to be partitioned arbitrarily, with clients directing requests to different
classes of servers. If unspecified, the current working directory is used.
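The option string is a plain list of key=value items joined by '&'. The C sketch below shows one way such a string could be parsed. It is hypothetical and not part of DistributePP, which handles its options in MATLAB. The helper names pp_get_option() and pp_get_number() are invented for illustration. Matching option names exactly (case-sensitively) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up KEY in an option string of the form
 * "opt1=val1&opt2=val2&...". Copies the value into OUT (truncated to
 * OUTLEN-1 characters) and returns 1 if KEY is present, 0 otherwise.
 * Names are matched exactly (case-sensitive); this is an assumption. */
int pp_get_option(const char *opts, const char *key, char *out, size_t outlen)
{
    assert(opts != NULL && key != NULL && out != NULL && outlen > 0);
    size_t keylen = strlen(key);
    const char *item = opts;

    while (*item != '\0') {
        /* Each item ends at the next '&' or at the end of the string. */
        const char *end = strchr(item, '&');
        if (end == NULL)
            end = item + strlen(item);

        const char *eq = memchr(item, '=', (size_t)(end - item));
        if (eq != NULL && (size_t)(eq - item) == keylen &&
            strncmp(item, key, keylen) == 0) {
            size_t vlen = (size_t)(end - eq - 1);
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, eq + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        if (*end == '\0')
            break;
        item = end + 1;   /* skip the '&' separator */
    }
    return 0;
}

/* Hypothetical convenience wrapper: read a numeric option such as the
 * polling period, falling back to DEFAULT_VALUE when it is absent. */
double pp_get_number(const char *opts, const char *key, double default_value)
{
    char buf[64];
    if (!pp_get_option(opts, key, buf, sizeof buf))
        return default_value;
    return strtod(buf, NULL);
}
```

With an options string like the one used by bgserver.m, such as "NAME=node1&PAUSE=15", looking up PAUSE yields 15. If PAUSE is missing, the lookup falls back to a default of zero, which corresponds to the exit-when-done behavior described above.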
Server processes may be run as background jobs under Unix and Unix-like operating systems. Simply create a MATLAB script that
invokes the PP_SERVER() function and then quits, as in the script bgserver.m below:
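addpath DistributePP                            % path is relative to the directory where MATLAB starts
[LV_STATUS,LV_HOSTNAME] = unix('hostname');     % host name, returned with a trailing newline
LV_HOSTNAME = strtrim(LV_HOSTNAME);             % strip the newline
PP_SERVER(['NAME=' LV_HOSTNAME '&PAUSE=15'],'') % empty directory: use the current directory
quit                                            % exit MATLAB once PP_SERVER returns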
Then launch MATLAB as a background process taking standard input from the script and sending standard output and errors to
a log file, as:
matlab < bgserver.m >& bgserver.log &
If the machine is remote, you can use ssh to log in and start the server in one step:
ssh node 'cd bgpath; matlab < bgserver.m >& `hostname`.log &'
where node is the name of the remote machine and bgpath is the directory containing bgserver.m. The output is
redirected to a log file named after the host on which the server executes.
If a request results in an error, the server process checks specifically whether it was caused by an out-of-memory
condition. If so, the request is left unsatisfied but remains in the communication directory, where another server
process, if one exists, will attempt to satisfy it. A server process never picks up the same request more than once.
If the request fails with any other error, the error message is recorded and the request is treated as
satisfied. No other server process will attempt to pick it up.
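The decision rule above fits in a few lines. The following C sketch is a hypothetical model of that logic, not code from the package. The enum and function names are invented. The fixed-size list of attempted requests is a simplification assumed here to illustrate the rule that a server never picks up the same request twice.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result of one attempt to evaluate a request. */
typedef enum { PP_OK, PP_OUT_OF_MEMORY, PP_OTHER_ERROR } pp_outcome;

/* Hypothetical: what the server does with the request afterwards. */
typedef enum {
    PP_RECORD_RESULT,  /* store the result; request is satisfied               */
    PP_RECORD_ERROR,   /* store the error message; request counts as satisfied */
    PP_LEAVE_PENDING   /* leave the request for another server process         */
} pp_action;

pp_action pp_after_attempt(pp_outcome outcome)
{
    switch (outcome) {
    case PP_OK:
        return PP_RECORD_RESULT;
    case PP_OUT_OF_MEMORY:
        return PP_LEAVE_PENDING;   /* another server may have more memory */
    default:
        return PP_RECORD_ERROR;    /* any other error is final */
    }
}

/* Requests this server has already attempted (fixed capacity in this sketch). */
#define PP_MAX_TRIED 128
#define PP_NAME_LEN 64
typedef struct {
    char names[PP_MAX_TRIED][PP_NAME_LEN];
    int count;
} pp_tried;

int pp_already_tried(const pp_tried *t, const char *req)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], req) == 0)
            return 1;
    return 0;
}

void pp_mark_tried(pp_tried *t, const char *req)
{
    assert(t->count < PP_MAX_TRIED);
    strncpy(t->names[t->count], req, PP_NAME_LEN - 1);
    t->names[t->count][PP_NAME_LEN - 1] = '\0';
    t->count++;
}
```

A server built along these lines would call pp_already_tried() before claiming a request and pp_mark_tried() once it starts work on it.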
After satisfying each request, a server process checks whether it should terminate. The function
PP_STOP_SERVERS() can be used to indicate that one or more server processes should exit. Pass the
communication directory and the name of the server as parameters. If no server name is given, all servers
using the communication directory will exit. This provides an orderly termination for the server processes. The function
works by creating either a file STOP_name.ind, where name is the name of the server that should terminate, or a file
STOP_ALL.ind to indicate that all servers should terminate.
You must remove these files manually before starting new server processes with matching names
on the same communication directory.
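The stop protocol comes down to checking for two file names. Below is a minimal C sketch, assuming POSIX access() and '/' as the path separator. pp_should_stop() and pp_stop_servers() are hypothetical stand-ins for the server-side check and for PP_STOP_SERVERS(). They are not the package's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical server-side check: returns 1 if DIR contains STOP_ALL.ind
 * or STOP_<name>.ind, and 0 otherwise. */
int pp_should_stop(const char *dir, const char *name)
{
    char path[4096];
    assert(dir != NULL);

    snprintf(path, sizeof path, "%s/STOP_ALL.ind", dir);
    if (access(path, F_OK) == 0)
        return 1;

    if (name != NULL && name[0] != '\0') {
        snprintf(path, sizeof path, "%s/STOP_%s.ind", dir, name);
        if (access(path, F_OK) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical counterpart of PP_STOP_SERVERS(): creates the indicator
 * file. A NULL or empty NAME addresses every server on DIR.
 * Returns 0 on success, -1 on failure. */
int pp_stop_servers(const char *dir, const char *name)
{
    char path[4096];
    assert(dir != NULL);
    if (name == NULL || name[0] == '\0')
        snprintf(path, sizeof path, "%s/STOP_ALL.ind", dir);
    else
        snprintf(path, sizeof path, "%s/STOP_%s.ind", dir, name);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}
```

In this sketch the indicator files persist after the servers exit. A newly started server with a matching name would therefore also see them, which is why they have to be cleared by hand as noted above.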
Client Processes
To request a computation from one of the server processes, a client need only call PP_FEVAL(), whose syntax is
similar to that of MATLAB's feval() function. Along with some options, the client specifies the name of a MATLAB
function to be executed and the parameters to pass to it. PP_FEVAL() captures the state of MATLAB's search path (so
the server can locate the requested function), prepares a request for a server to satisfy, and returns a descriptor
for the request to the client. The client can use the descriptor to check the status of the request, to block until
the request is satisfied, and/or to collect the function's return value when it has completed. PP_FEVAL() accepts the
following parameters:
- Options
- This is a string of the form 'opt1=val1&opt2=val2&...' that customizes the handling of this request.
The option names opt1, etc. identify the properties to customize, and val1, etc. specify their values.
Currently two options are available:
- A polling period that specifies how frequently the request is checked to see whether it has been
picked up by some server process. If this value is nonzero,
PP_FEVAL() does not return control
to the calling procedure until a server has begun processing the request (it does not wait
until the request has completed). Otherwise PP_FEVAL() returns immediately after posting the
request. This feature can be used to keep the communication directory from filling with
very large numbers of outstanding requests.
- A name that is used as a prefix on all communication files associated with this request. This is provided
for convenience only and makes it possible to see which request files belong to which clients.
By default, the prefix
PP_ is used.
- Communication Directory
- This is a path to a directory used for communication between client and server processes. If unspecified, the
current working directory is used.
- Function Name
- This is the name of the function to be evaluated by a server process. This parameter may refer to any MATLAB function
that is visible in the context of the client, including built-in, external, user-supplied, and compiled functions.
The only requirement is that the function return exactly one value.
- Arguments
- The remaining arguments to
PP_FEVAL() are passed unmodified to the requested function when it is
invoked by a server process.
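A nonzero polling period makes PP_FEVAL() block until some server has begun processing the request. The C sketch below models that wait by polling for the request's .prc indicator file, which is described under Communication Files. It illustrates the idea under several assumptions and is not the package's code. The function name is hypothetical, and the poll interval is assumed to be in whole seconds. The max_polls cutoff is an added safeguard that the original behavior may not include.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical client-side wait. If POLL_SECONDS is zero, return 0 at once
 * (request posted, no waiting). Otherwise check for the .prc indicator every
 * POLL_SECONDS seconds and return 1 as soon as it exists, or 0 after
 * MAX_POLLS unsuccessful checks (a safeguard added in this sketch). */
int pp_wait_for_pickup(const char *prc_path, unsigned poll_seconds, int max_polls)
{
    assert(prc_path != NULL);
    if (poll_seconds == 0)
        return 0;                   /* caller did not ask to wait */

    for (int i = 0; i < max_polls; i++) {
        if (access(prc_path, F_OK) == 0)
            return 1;               /* a server has begun processing */
        sleep(poll_seconds);
    }
    return access(prc_path, F_OK) == 0;
}
```

Used this way, a client that posts many requests in a loop is throttled to the rate at which servers pick them up.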
For example, the following call requests that the sum of the integers from 1 to 10 be computed. It posts the request to
the current directory and returns without waiting for processing to begin:
>> myDescriptor = PP_FEVAL('','','sum',1:10);
The current status of one or more requests can be checked with the function PP_GET_STATUS(). For each descriptor, it
returns 1 or 0 indicating whether the request has completed, the name of the server process that picked up the
request, and the date and time processing began. For example,
>> [myStatus,myNode,myTime]=PP_GET_STATUS(myDescriptor)
myStatus =
1
myNode =
'NODE_1'
myTime =
'02-Feb-2002 12:35:21'
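PP_GET_STATUS() reports completion, but this page does not spell out how completion is detected. The C sketch below shows one plausible reading of the communication files described below: the .req file is deleted by the server on successful completion, and the .prc file is created on pickup and deleted by the client after collection. This mapping is an assumption for illustration only. It ignores the out-of-memory case, in which a request is left in place for another server, and the function and state names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical request states inferred from which files exist. */
typedef enum {
    PP_PENDING,     /* .req present, no .prc: waiting for a server      */
    PP_PROCESSING,  /* .req and .prc present: a server is working on it */
    PP_COMPLETED,   /* .prc present, .req deleted: results are ready    */
    PP_COLLECTED    /* neither present: results already collected       */
} pp_state;

static int pp_exists(const char *dir, const char *name, const char *ext)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/%s%s", dir, name, ext);
    return access(path, F_OK) == 0;
}

/* NAME is a request base name such as "PP_1" (illustrative). */
pp_state pp_request_state(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);
    int req = pp_exists(dir, name, ".req");
    int prc = pp_exists(dir, name, ".prc");
    if (req && !prc)
        return PP_PENDING;
    if (req && prc)
        return PP_PROCESSING;
    if (prc)
        return PP_COMPLETED;
    return PP_COLLECTED;
}
```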
When processing is complete, the result of the computation can be retrieved by calling PP_GET_RESULTS() with a list of
descriptors. An optional polling period dictates how often the routine checks for completion of the requests. If it is
specified, the routine does not return until all requests have been completed. Otherwise the routine returns the results
of all requests that have been satisfied so far. For each descriptor, PP_GET_RESULTS() returns 1 or 0 indicating whether
the request has completed, the return value from the requested function, and a text string describing any errors that
occurred during processing. For example,
>> [myStatus,myResult,myError]=PP_GET_RESULTS(myDescriptor)
myStatus =
1
myResult =
[55]
myError =
{''}
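PP_GET_RESULTS() has two modes: a single non-blocking pass, or polling until every request is done. Both can be modeled as one loop. This C sketch is hypothetical: pp_collect() and the completion callback are invented for illustration, and the poll interval is assumed to be in whole seconds. Retrieving the actual result and error text is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical callback: nonzero if request INDEX has completed. */
typedef int (*pp_done_fn)(int index, void *ctx);

/* Fills DONE[0..N-1] with 1 or 0 and returns how many requests are done.
 * With POLL_SECONDS == 0 this makes a single pass; otherwise it sleeps
 * POLL_SECONDS between passes until all N requests are done. */
int pp_collect(int n, int *done, unsigned poll_seconds, pp_done_fn is_done, void *ctx)
{
    assert(n >= 0 && (n == 0 || done != NULL) && is_done != NULL);
    for (int i = 0; i < n; i++)
        done[i] = 0;

    for (;;) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (!done[i] && is_done(i, ctx))
                done[i] = 1;        /* completed requests stay completed */
            count += done[i];
        }
        if (poll_seconds == 0 || count == n)
            return count;
        sleep(poll_seconds);
    }
}

/* Example callback: CTX points to an int array of completion flags. */
int pp_flags_done(int index, void *ctx)
{
    return ((const int *)ctx)[index] != 0;
}
```

Once a request is seen as complete it is not rechecked. This mirrors the fact that a satisfied request never becomes unsatisfied again.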
Communication Files
The following files are placed in the communication directory:
- prefix.seq
- These files contain the state of various numeric sequence generators. These generators are used to ensure that
unique names are generated for server processes and processing requests.
- prefix_seq.req
- A file indicating the presence of a processing request made by a client. The prefix is taken from an
optional argument to PP_FEVAL(), and the sequence number seq is generated to make the request name unique.
This file is locked by the server that is currently processing the request and is deleted by that server when
processing completes successfully. In future versions, the client may use this file to list requirements a server
must meet (minimum memory, processor type, etc.). Any server session that cannot meet those requirements will not
pick up the request.
- prefix_seq.mat
- A MATLAB data file which contains the name of the function to be evaluated, the parameters to that function,
the search path of the requesting client, and any specified options. When processing of the request is complete, the
server process records in this file the function result and any error messages generated while processing.
- prefix_seq.prc
- An indicator file created by a server process to show that the request has been picked up and that processing
has begun. The client deletes it when it collects the processing results.
- servername.log
- A log file with the processing history of the respective server process.
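The .seq files hold counters that processes on different machines must increment atomically. The C sketch below shows the general technique: read, increment, and write back while holding an exclusive fcntl() lock. It is not the package's PP_SEQUENCE.c, whose details are not given here. The function name, the decimal-text file format, and starting the count at 1 are assumptions. fcntl() locking over NFS also depends on the NFS lock manager being available on both client and server.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical sequence generator: the counter lives in DIR/PREFIX.seq as
 * decimal text. Returns the next value (1, 2, ...) or -1 on error. */
long pp_next_sequence(const char *dir, const char *prefix)
{
    char path[4096];
    char buf[32];
    assert(dir != NULL && prefix != NULL);
    snprintf(path, sizeof path, "%s/%s.seq", dir, prefix);

    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    /* Exclusive lock on the whole file; F_SETLKW waits while another
     * process holds it, so the read-increment-write below is atomic. */
    struct flock lk = {0};
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                   /* 0 means "to end of file" */
    if (fcntl(fd, F_SETLKW, &lk) < 0) {
        close(fd);
        return -1;
    }

    /* Read the current value; an empty (new) file counts as 0. */
    ssize_t n = pread(fd, buf, sizeof buf - 1, 0);
    long value = 0;
    if (n > 0) {
        buf[n] = '\0';
        value = strtol(buf, NULL, 10);
    }
    value++;

    /* Overwrite the file with the new value while still holding the lock. */
    int len = snprintf(buf, sizeof buf, "%ld\n", value);
    if (ftruncate(fd, 0) < 0 || pwrite(fd, buf, (size_t)len, 0) != (ssize_t)len)
        value = -1;

    close(fd);                      /* closing the descriptor releases the lock */
    return value;
}
```

Each prefix gets an independent counter, so request names built from a prefix and a sequence number do not collide across clients that share a prefix.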
Limitations
- Synchronization relies on advisory file locking as supported through NFS. The package has been tested
only under the Sun Solaris and Silicon Graphics operating systems. Three source files perform this
synchronization: PP_OPEN_LOCK.c, PP_CLOSE_LOCK.c, and PP_SEQUENCE.c.
- At this time, the routine to be executed by a server process must return exactly one value; otherwise an error results.
- At this time, a client cannot direct a processing request to a particular server process. Server processes
collect requests on a first-come, first-served basis.
- Other than the search path, the context of the client process is not preserved by the server. Thus, global variables
visible in the client process are not available to the computation executed in parallel. Further, functions such as
evalin() that query the context of the calling routine will not work as expected.
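Claiming a request amounts to locking its .req file. Below is a hypothetical C sketch that uses a non-blocking exclusive fcntl() lock. The function names are invented, and the package's PP_OPEN_LOCK.c and PP_CLOSE_LOCK.c may work differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical: try to claim a request by taking a non-blocking exclusive
 * advisory lock on its .req file. Returns an open descriptor that holds
 * the lock, or -1 if the file is missing or another process holds the lock. */
int pp_claim_request(const char *req_path)
{
    assert(req_path != NULL);
    int fd = open(req_path, O_RDWR);
    if (fd < 0)
        return -1;                  /* request vanished or is not accessible */

    struct flock lk = {0};
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                   /* lock the whole file */

    if (fcntl(fd, F_SETLK, &lk) < 0) {
        close(fd);                  /* typically EACCES or EAGAIN: already claimed */
        return -1;
    }
    return fd;
}

/* Hypothetical: give up the claim. Closing the descriptor releases the
 * lock. Returns 0 on success. */
int pp_release_request(int fd)
{
    return close(fd);
}
```

Two properties of POSIX fcntl() locks matter for this design. First, locks belong to the process, so a second claim of the same file from the same process succeeds. Preventing a server from picking up a request twice therefore needs separate bookkeeping. Second, closing any descriptor the process holds for that file releases all of its locks on the file. The file should therefore not be opened and closed elsewhere while the claim is held.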
Download
The DistributePP package is distributed as a .zip file, which can be expanded with unzip.
It includes compiled binaries of the C routines for the Sun Solaris and SGI platforms. To build for another
platform, run the csh script BUILD_DistributePP. To use the package, simply add the DistributePP directory
to your MATLAB search path.