Working with MPI
Distributed memory parallelism in Tamaas is implemented with MPI. Because the fast Fourier transform is the performance bottleneck of Tamaas' core routines, Tamaas adopts the data layout of FFTW. It therefore inherits some of FFTW's limitations, and MPI only works on systems with a 2D boundary, i.e. the basic_2d, surface_2d and volume_2d model types (which are the most important anyway, since rough contact mechanics can yield different scaling laws in 1D).
MPI support in Tamaas is still experimental, but the following parts are tested:

- Rough surface generation
- Surface statistics computation
- Elastic normal contact
- Elastic-plastic contact (with DFSANECXXSolver)
- Dumping models with H5Dumper and NetCDFDumper
Tip
One can look at examples/plasticity.py
for a full example of an
elastic-plastic contact simulation that can run in MPI.
Transparent MPI context
Some parts of Tamaas work transparently with MPI and no additional work or logic is needed.
Warning
MPI_Init() is automatically called when importing the tamaas module in Python. While this works transparently most of the time, in some situations, e.g. in Singularity containers, the program can hang if tamaas is imported first. It is therefore advised to run from mpi4py import MPI before import tamaas to avoid issues.
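In practice, this simply means importing mpi4py before Tamaas:

# Importing mpi4py first ensures MPI is initialized before tamaas is imported
from mpi4py import MPI  # noqa: F401
import tamaas as tm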
Creating a model
The following snippet creates a model whose global shape is [16, 2048, 2048]:
import tamaas as tm
model = tm.Model(tm.model_type.volume_2d,
                 [0.1, 1, 1], [16, 2048, 2048])
print(model.shape, model.global_shape)
Running this code with mpirun -np 3
will print the following (not necessarily
in this order):
[16, 683, 2048] [16, 2048, 2048]
[16, 682, 2048] [16, 2048, 2048]
[16, 683, 2048] [16, 2048, 2048]
Note that the partitioning occurs on the x dimension of the model (see below for more information on the data layout imposed by FFTW).
Creating a rough surface
Similarly, rough surface generators expect a global shape and return the partitioned data:
iso = tm.Isopowerlaw2D()
iso.q0, iso.q1, iso.q2, iso.hurst = 4, 4, 32, .5
gen = tm.SurfaceGeneratorRandomPhase2D([2048, 2048])
gen.spectrum = iso
surface = gen.buildSurface()
print(surface.shape, tm.mpi.global_shape(surface.shape))
With mpirun -np 3
this should print:
(682, 2048) [2048, 2048]
(683, 2048) [2048, 2048]
(683, 2048) [2048, 2048]
Handling partitioning edge cases
Under certain conditions, FFTW may assign a local size of zero in the x dimension to one or more processes. If that happens, the surface generator raises a RuntimeError which, if left unhandled, results in a deadlock because the exception does not terminate every process. The correct way to handle this edge case is:
from mpi4py import MPI

try:
    gen = tm.SurfaceGeneratorRandomPhase2D([128, 128])
except RuntimeError as e:
    print(e)
    MPI.COMM_WORLD.Abort(1)
This will correctly kill all processes. Alternatively, os._exit()
can be used,
but one should avoid sys.exit()
, as it kills the process by raising an
exception, which still results in a deadlock.
Computing statistics
With a model’s data distributed among independent processes, computing global
properties, like the true contact area, must be done in a collective fashion.
This is transparently handled by the Statistics
class, e.g. with:
contact = tm.Statistics2D.contact(model.traction)
This gives the correct contact fraction, whereas something like
np.mean(model.traction > 0)
will give a different result on each processor.
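As an illustration, here is a minimal sketch (random tractions stand in for an actual contact solution) that contrasts the two computations when run under MPI:

import numpy as np
import tamaas as tm

# Sketch: random tractions stand in for a real contact solution
model = tm.Model(tm.model_type.basic_2d, [1., 1.], [128, 128])
model.traction[...] = np.random.randn(*model.traction.shape)

# Collective computation: every process prints the same global contact fraction
print(tm.mpi.rank(), tm.Statistics2D.contact(model.traction))

# Naive local computation: each process only sees its own slab, so values differ
print(tm.mpi.rank(), np.mean(model.traction > 0))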
Nonlinear solvers
The only nonlinear solver (for plastic contact) that works with MPI is
DFSANECXXSolver, a C++ implementation of DFSANESolver that can run in an MPI
context (see the sketch below).
Note
Scipy and Numpy use optimized BLAS routines for array operations, while Tamaas does not, so the serial performance of the C++ implementation of the DF-SANE algorithm is lower than that of the Scipy version.
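As a rough sketch of how the MPI-capable solver is set up (the constructor of DFSANECXXSolver is assumed to mirror that of DFSANESolver, i.e. to take a Residual, and the material parameters are illustrative; see examples/plasticity.py for the authoritative setup):

import tamaas as tm

# Assumptions: sigma_y and hardening are illustrative values, and
# DFSANECXXSolver is assumed to be built from a Residual like DFSANESolver
model = tm.Model(tm.model_type.volume_2d, [0.1, 1., 1.], [16, 128, 128])
residual = tm.ModelFactory.createResidual(model,
                                          sigma_y=0.1 * model.E,
                                          hardening=0.01 * model.E)
epsolver = tm.DFSANECXXSolver(residual)  # MPI-capable C++ DF-SANE solver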
Dumping models
The only dumpers that properly work with MPI are H5Dumper and NetCDFDumper. Output is then as simple as:
from tamaas.dumpers import H5Dumper
H5Dumper('output', all_fields=True) << model
This is useful for doing post-processing separately from the main simulation: the post-processing can then be done in serial.
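For instance, a serial post-processing script can open the dump with h5py (the file path and field name below are assumptions; adjust them to the actual output of the run):

import h5py

# Path is an assumption: adjust to the file actually written by H5Dumper
with h5py.File('hdf5/output_0000.h5', 'r') as fh:
    print(list(fh.keys()))  # names of the dumped fields
    # individual fields load as plain numpy arrays, e.g. fh['traction'][...]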
MPI convenience methods
Not every use case can be handled transparently, and adapting existing scripts
to an MPI context can require some work, especially if those scripts rely on
numpy and scipy for pre- and post-processing (e.g. constructing a parabolic
surface for Hertzian contact, or computing the total contact area). The mpi
module provides some convenience functions to make that task easier. The
functions mpi.scatter and mpi.gather scatter/gather 2D data using the
partitioning scheme expected by FFTW (see the figure and the sketch below),
while mpi.rank and mpi.size return the local process rank and the
total number of processes respectively.
If finer control is needed, the function mpi.local_shape
gives the 2D shape of the local data if given
the global 2D shape (its counterpart mpi.global_shape
does the exact opposite), while
mpi.local_offset
gives the offset
of the local data in the global \(x\) dimension. These two functions mirror
FFTW’s own data distribution functions.
Figure: 2D data distribution scheme from FFTW. N0 and N1 are the number of points in the \(x\) and \(y\) directions respectively. The array local_N0, indexed by the process rank, gives the local size of the \(x\) dimension, and the local_offset function gives the offset in \(x\) for each process rank.
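As a hedged sketch of how these functions combine (the argument of mpi.local_offset and the behavior of mpi.gather on non-root ranks are assumptions):

import numpy as np
import tamaas as tm

global_shape = [512, 512]

# Per-process view of the FFTW slab decomposition
rank, size = tm.mpi.rank(), tm.mpi.size()
local_shape = tm.mpi.local_shape(global_shape)
offset = tm.mpi.local_offset(global_shape)  # assumed to take the global shape
print(f"rank {rank}/{size}: local shape {local_shape}, x offset {offset}")

# Gather local slabs into the full array (assumed significant on the root rank)
local_data = np.full(local_shape, rank, dtype=float)
full_data = tm.mpi.gather(local_data)
if rank == 0:
    print(full_data.shape)  # (512, 512)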
The mpi
module also contains a function
sequential
whose return value is
meant to be used as a context manager. Within the sequential context the default
communicator is MPI_COMM_SELF
instead of MPI_COMM_WORLD
.
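For example (a sketch that assumes a model like the ones created above):

# Within the sequential context the default communicator is MPI_COMM_SELF,
# so operations that are normally collective act on each process' local data
with tm.mpi.sequential():
    local_contact = tm.Statistics2D.contact(model.traction)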
For other MPI functionality that Tamaas does not cover, one can use mpi4py,
which in conjunction with the methods in mpi should handle just about any use
case.
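For example, mpi4py collectives can operate directly on locally computed quantities (a sketch assuming a model from the snippets above):

from mpi4py import MPI
import numpy as np

# Sum a locally computed quantity over all processes
local_load = float(np.sum(model.traction))
total_load = MPI.COMM_WORLD.allreduce(local_load, op=MPI.SUM)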