This is very similar to this question: How to MPI_Gatherv columns from processor, where each process may send different number of columns. The issue is that columns aren't contiguous in memory, so you have to play around.
As is always the case in C, which lacks real multidimensional arrays, you have to be a little careful about memory layout. A statically-declared array like
float a[nrows][ncols]
will be contiguous in memory, so you should be alright for now. However, be aware that as soon as you go to dynamic allocation this will no longer automatically be the case; you'd have to allocate all the data at once to make sure you get contiguous data, e.g.
float **floatalloc2d(int n, int m) {
    /* allocate the data as one contiguous block, plus an array of row pointers into it */
    float *data = (float *)malloc(n*m*sizeof(float));
    float **array = (float **)malloc(n*sizeof(float *));
    for (int i=0; i<n; i++)
        array[i] = &(data[i*m]);
    return array;
}

void floatfree2d(float **array) {
    free(array[0]);   /* the contiguous data block */
    free(array);      /* the row pointers */
}
/* ... */
float **a;
int nrows = 3;
int ncols = 2;
a = floatalloc2d(nrows, ncols);
but I think you're ok for now.
Now that you have your 2d array one way or another, you have to create your type. The type you've described is fine if you are just sending one column; but the trick here is that if you're sending multiple columns, each column starts only one float past the start of the previous one, even though the column itself spans almost the whole array! So you need to move the upper bound of the type for this to work:
MPI_Datatype col, coltype;
MPI_Type_vector(nrows,       /* count: one element per row */
                1,           /* blocklength: one float at a time */
                ncols,       /* stride: skip a whole row between elements */
                MPI_FLOAT,
                &col);
MPI_Type_commit(&col);
MPI_Type_create_resized(col, 0, 1*sizeof(float), &coltype);
MPI_Type_commit(&coltype);
will do what you want. Note that the receiving processes will have different types than the sending process: they store a smaller number of columns, so the stride between successive elements of a column is smaller.
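If you want to see what the resize actually buys you, MPI_Type_get_extent shows the before-and-after extents (a quick sketch, reusing the col and coltype from above):

MPI_Aint lb, extent;

MPI_Type_get_extent(col, &lb, &extent);
/* extent == ((nrows-1)*ncols + 1)*sizeof(float): one column spans
 * nearly the whole array, so MPI would look for column i+1 far past
 * where it really starts */

MPI_Type_get_extent(coltype, &lb, &extent);
/* extent == 1*sizeof(float): MPI now treats the next column as
 * starting one float past this one, which matches the real layout */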
Finally, you can now do your scatter,
int rank, size;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

float *sendptr;
if (rank == 0) {
    a = floatalloc2d(nrows, ncols);   /* only the root holds the full array */
    sendptr = &(a[0][0]);
} else {
    sendptr = NULL;
}

int ncolsperproc = ncols/size;        /* we're assuming this divides evenly */
float **b = floatalloc2d(nrows, ncolsperproc);

MPI_Datatype acol, acoltype = MPI_DATATYPE_NULL, bcol, bcoltype;
if (rank == 0) {
    /* columns of the full array: stride of ncols floats */
    MPI_Type_vector(nrows, 1, ncols, MPI_FLOAT, &acol);
    MPI_Type_create_resized(acol, 0, 1*sizeof(float), &acoltype);
    MPI_Type_commit(&acoltype);
}
/* columns of the local array: stride of only ncolsperproc floats */
MPI_Type_vector(nrows, 1, ncolsperproc, MPI_FLOAT, &bcol);
MPI_Type_create_resized(bcol, 0, 1*sizeof(float), &bcoltype);
MPI_Type_commit(&bcoltype);

/* the send type is only significant at the root; everyone receives with bcoltype */
MPI_Scatter(sendptr, ncolsperproc, acoltype, &(b[0][0]), ncolsperproc, bcoltype,
            0, MPI_COMM_WORLD);
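If you want to convince yourself it worked, each rank can print its piece afterwards, and you should free the types and arrays when you're done. A quick sanity-check sketch (assuming stdio.h is included; MPI makes no guarantee about the ordering of output to stdout, the barrier just makes interleaving unlikely):

for (int p = 0; p < size; p++) {
    if (rank == p) {
        printf("rank %d received:\n", rank);
        for (int i = 0; i < nrows; i++) {
            for (int j = 0; j < ncolsperproc; j++)
                printf("%6.1f", b[i][j]);
            printf("\n");
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);
}

/* release the types and arrays */
MPI_Type_free(&bcol);
MPI_Type_free(&bcoltype);
if (rank == 0) {
    MPI_Type_free(&acol);
    MPI_Type_free(&acoltype);
    floatfree2d(a);
}
floatfree2d(b);

And if ncols doesn't divide evenly by size, you'd switch to MPI_Scatterv with per-rank counts and displacements, as in the Gatherv question linked at the top.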