Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
857 views
in Technique[技术] by (71.8m points)

c++ - MPI-3 Shared Memory for Array Struct

I have a simple C++ struct that basically wraps a standard C array:

struct MyArray {
    T* data;
    int length;
    // ...
}

where T is a numeric type like float or double. length is the number of elements in the array. Typically my arrays are very large (tens of thousands up to tens of millions of elements).

I have an MPI program where I would like to expose two instances of MyArray, say a_old and a_new, as shared memory objects via MPI 3 shared memory. The context is that each MPI rank reads from a_old. Then, each MPI rank writes to certain indices of a_new (each rank only writes to its own set of indices - no overlap). Finally, a_old = a_new must be set on all ranks. a_old and a_new are the same size. Right now I'm making my code work by syncing (Isend/Irecv) each rank's updated values with other ranks. However, due to the data access pattern, there's no reason I need to incur the overhead of message passing and could instead have one shared memory object and just put a barrier before a_old = a_new. I think this would give me better performance (though please correct me if I'm wrong).

I have had trouble finding complete code examples of doing shared memory with MPI 3. Most sites only provide reference documentation or incomplete snippets. Could someone walk me through a simple and complete code example that does the sort of thing I'm trying to achieve (updating and syncing a numeric array via MPI shared memory)? I understand the main concepts of creating shared memory communicators and windows, setting fences, etc., but it would really help my understanding to see one example that puts it all together.

Also, I should mention that I'll only be running my code on one node, so I don't need to worry about needing multiple copies of my shared-memory object across nodes; I just need one copy of my data for the single node on which my MPI processes are running. Despite this, other solutions like OpenMP aren't feasible for me in this case, since I have a ton of MPI code and can't rewrite everything for the sake of one or two arrays I'd like to share.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Using shared memory with MPI-3 is relatively simple.

First, you allocate the shared memory window using MPI_Win_allocate_shared:

MPI_Win win;
MPI_Aint size;
void *baseptr;

if (rank == 0)
{
   size = 2 * ARRAY_LEN * sizeof(T);
   MPI_Win_allocate_shared(size, sizeof(T), MPI_INFO_NULL,
                           MPI_COMM_WORLD, &baseptr, &win);
}
else
{
   int disp_unit;
   MPI_Win_allocate_shared(0, sizeof(T), MPI_INFO_NULL,
                           MPI_COMM_WORLD, &baseptr, &win);
   MPI_Win_shared_query(win, 0, &size, &disp_unit, &baseptr);
}
a_old.data = baseptr;
a_old.length = ARRAY_LEN;
a_new.data = a_old.data + ARRAY_LEN;
a_new.length = ARRAY_LEN;

Here, only rank 0 allocates memory. It doesn't really matter which process allocates it as it is shared. It is even possible to have each process allocate a portion of the memory, but since by the default the allocation is contiguous, both methods are equivalent. MPI_Win_shared_query is then used by all other processes to find out the location in their virtual address space of the beginning of the shared memory block. That address might vary among the ranks and therefore one should not pass around absolute pointers.

You can now simply load from and store into a_old.data respectively a_new.data. As the ranks in your case work on disjoint sets of memory locations, you don't really need to lock the window. Use window locks to implement e.g. protected initialisation of a_old or other operations that require synchronisation. You might also need to explicitly tell the compiler not to reorder the code and to emit a memory fence in order to have all outstanding load/store operations finished before e.g. you call MPI_Barrier().

The a_old = a_new code suggests copying one array onto the other. Instead, you could simply swap the data pointers and eventually the size fields. Since only the data of the array is in the shared memory block, swapping the pointers is a local operation, i.e. no synchronisation needed. Assuming that both arrays are of equal length:

T *temp;
temp = a_old.data;
a_old.data = a_new.data;
a_new.data = temp;

You still need a barrier to make sure that all other processes have finished processing before continuing further.

At the very end, simply free the window:

MPI_Win_free(&win);

A complete example (in C) follows:

#include <stdio.h>
#include <mpi.h>

#define ARRAY_LEN 1000

int main (void)
{
   MPI_Init(NULL, NULL);

   int rank, nproc;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &nproc);

   MPI_Win win;
   MPI_Aint size;
   void *baseptr;

   if (rank == 0)
   {
      size = ARRAY_LEN * sizeof(float);
      MPI_Win_allocate_shared(size, sizeof(int), MPI_INFO_NULL,
                              MPI_COMM_WORLD, &baseptr, &win);
   }
   else
   {
      int disp_unit;
      MPI_Win_allocate_shared(0, sizeof(int), MPI_INFO_NULL,
                              MPI_COMM_WORLD, &baseptr, &win);
      MPI_Win_shared_query(win, 0, &size, &disp_unit, &baseptr);
   }

   printf("Rank %d, baseptr = %p
", rank, baseptr);

   int *arr = baseptr;
   for (int i = rank; i < ARRAY_LEN; i += nproc)
     arr[i] = rank;

   MPI_Barrier(MPI_COMM_WORLD);

   if (rank == 0)
   {
      for (int i = 0; i < 10; i++)
         printf("%4d", arr[i]);
      printf("
");
   }

   MPI_Win_free(&win);

   MPI_Finalize();
   return 0;
}

Disclaimer: Take this with a grain of salt. My understanding of MPI's RMA is still quite weak.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...