Using shared memory with MPI-3 is relatively simple. First, you allocate the shared memory window using MPI_Win_allocate_shared:
MPI_Win win;
MPI_Aint size;
void *baseptr;

if (rank == 0)
{
   size = 2 * ARRAY_LEN * sizeof(T);
   MPI_Win_allocate_shared(size, sizeof(T), MPI_INFO_NULL,
                           MPI_COMM_WORLD, &baseptr, &win);
}
else
{
   int disp_unit;
   MPI_Win_allocate_shared(0, sizeof(T), MPI_INFO_NULL,
                           MPI_COMM_WORLD, &baseptr, &win);
   MPI_Win_shared_query(win, 0, &size, &disp_unit, &baseptr);
}

a_old.data   = baseptr;
a_old.length = ARRAY_LEN;
a_new.data   = a_old.data + ARRAY_LEN;
a_new.length = ARRAY_LEN;
Here, only rank 0 allocates memory. It doesn't really matter which process allocates it, as the memory is shared. It is even possible to have each process allocate a portion of the memory, but since by default the allocation is contiguous, both methods are equivalent. MPI_Win_shared_query is then used by all other processes to find out the location, in their virtual address space, of the beginning of the shared memory block. That address might vary among the ranks, and therefore one should not pass around absolute pointers.
You can now simply load from and store into a_old.data and a_new.data, respectively. As the ranks in your case work on disjoint sets of memory locations, you don't really need to lock the window. Use window locks to implement, e.g., protected initialisation of a_old or other operations that require synchronisation. You might also need to explicitly tell the compiler not to reorder the code and to emit a memory fence so that all outstanding load/store operations have completed before, e.g., you call MPI_Barrier().
The a_old = a_new code suggests copying one array onto the other. Instead, you could simply swap the data pointers and, if necessary, the size fields. Since only the data of the array is in the shared memory block, swapping the pointers is a local operation, i.e. no synchronisation is needed. Assuming that both arrays are of equal length:
T *temp;
temp = a_old.data;
a_old.data = a_new.data;
a_new.data = temp;
You still need a barrier to make sure that all other processes have finished processing before continuing further.
At the very end, simply free the window:
MPI_Win_free(&win);
A complete example (in C) follows:
#include <stdio.h>
#include <mpi.h>

#define ARRAY_LEN 1000

int main (void)
{
   MPI_Init(NULL, NULL);

   int rank, nproc;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &nproc);

   MPI_Win win;
   MPI_Aint size;
   void *baseptr;

   if (rank == 0)
   {
      size = ARRAY_LEN * sizeof(int);
      MPI_Win_allocate_shared(size, sizeof(int), MPI_INFO_NULL,
                              MPI_COMM_WORLD, &baseptr, &win);
   }
   else
   {
      int disp_unit;
      MPI_Win_allocate_shared(0, sizeof(int), MPI_INFO_NULL,
                              MPI_COMM_WORLD, &baseptr, &win);
      MPI_Win_shared_query(win, 0, &size, &disp_unit, &baseptr);
   }

   printf("Rank %d, baseptr = %p\n", rank, baseptr);

   int *arr = baseptr;
   for (int i = rank; i < ARRAY_LEN; i += nproc)
      arr[i] = rank;

   MPI_Barrier(MPI_COMM_WORLD);

   if (rank == 0)
   {
      for (int i = 0; i < 10; i++)
         printf("%4d", arr[i]);
      printf("\n");
   }

   MPI_Win_free(&win);

   MPI_Finalize();
   return 0;
}
Disclaimer: Take this with a grain of salt. My understanding of MPI's RMA is still quite weak.