Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
77 views
in Technique[技术] by (71.8m points)

c++ - 1D or 2D array, what's faster?

I'm in need of representing a 2D field (axes x, y) and I face a problem: Should I use an 1D array or a 2D array?

I can imagine, that recalculating indices for 1D arrays (y + x*n) could be slower than using 2D array (x, y) but I could image that 1D could be in CPU cache..

I did some googling, but only found pages regarding static array (and stating that 1D and 2D are basically the same). But my arrays must me dynamic.

So, what's

  1. faster,
  2. smaller (RAM)

dynamic 1D arrays or dynamic 2D arrays?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

tl;dr : You should probably use a one-dimensional approach.

Note: One cannot dig into detail affecting performance when comparing dynamic 1d or dynamic 2d storage patterns without filling books since the performance of code is dependent one a very large number of parameters. Profile if possible.

1. What's faster?

For dense matrices the 1D approach is likely to be faster since it offers better memory locality and less allocation and deallocation overhead.

2. What's smaller?

Dynamic-1D consumes less memory than the 2D approach. The latter also requires more allocations.

Remarks

I laid out a pretty long answer beneath with several reasons but I want to make some remarks on your assumptions first.

I can imagine, that recalculating indices for 1D arrays (y + x*n) could be slower than using 2D array (x, y)

Let's compare these two functions:

int get_2d (int **p, int r, int c) { return p[r][c]; }
int get_1d (int *p, int r, int c)  { return p[c + C*r]; }

The (non-inlined) assembly generated by Visual Studio 2015 RC for those functions (with optimizations turned on) is:

?get_1d@@YAHPAHII@Z PROC
push    ebp
mov ebp, esp
mov eax, DWORD PTR _c$[ebp]
lea eax, DWORD PTR [eax+edx*4]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

?get_2d@@YAHPAPAHII@Z PROC
push ebp
mov ebp, esp
mov ecx, DWORD PTR [ecx+edx*4]
mov eax, DWORD PTR _c$[ebp]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

The difference is mov (2d) vs. lea (1d). The former has a latency of 3 cycles and a a maximum throughput of 2 per cycle while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle. (According to Instruction tables - Agner Fog Since the differences are minor, I think there should not be a big performance difference arising from index recalculation. I expect it to be very unlikely to identify this difference itself to be the bottleneck in any program.

This brings us to the next (and more interesting) point:

... but I could image that 1D could be in CPU cache ...

True, but 2d could be in CPU cache, too. See The Downsides: Memory locality for an explanation why 1d is still better.

The long answer, or why dynamic 2 dimensional data storage (pointer-to-pointer or vector-of-vector) is "bad" for simple / small matrices.

Note: This is about dynamic arrays/allocation schemes [malloc/new/vector etc.]. A static 2d array is a contiguous block of memory and therefore not subject to the downsides I'm going to present here.

The Problem

To be able to understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you are required to understand the memory layout of such structures.

Example case using pointer to pointer syntax

int main (void)
{
    // allocate memory for 4x4 integers; quick & dirty
    int ** p = new int*[4];
    for (size_t i=0; i<4; ++i) p[i] = new int[4]; 

    // do some stuff here, using p[x][y] 

    // deallocate memory
    for (size_t i=0; i<4; ++i) delete[] p[i];
    delete[] p;
}

The downsides

Memory locality

For this “matrix” you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated and can therefore result in an arbitrary memory position.

The following image will give you an idea of how the memory may look like.

For the real 2d case:

  • The violet square is the memory position occupied by p itself.
  • The green squares assemble the memory region p points to (4 x int*).
  • The 4 regions of 4 contiguous blue squares are the ones pointed to by each int* of the green region

For the 2d mapped on 1d case:

  • The green square is the only required pointer int *
  • The blue squares ensemble the memory region for all matrix elements (16 x int).

Real 2d vs mapped 2d memory layout

This means that (when using the left layout) you will probably observe worse performance than for a contiguous storage pattern (as seen on the right), due to caching for instance.

Let's say a cache line is "the amount of data transfered into the cache at once" and let's imagine a program accessing the whole matrix one element after another.

If you have a properly aligned 4 times 4 matrix of 32 bit values, a processor with a 64 byte cache line (typical value) is able to "one-shot" the data (4*4*4 = 64 bytes). If you start processing and the data isn't already in the cache you'll face a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once since it fits into a cache line, if and only if it is contiguously stored (and properly aligned). There will probably not be any more misses while processing that data.

In case of a dynamic, "real two-dimensional" system with unrelated locations of each row/column, the processor needs to load every memory location seperately. Eventhough only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would -in a worst case scenario- actually transfer 256 bytes and waste 75% throughput bandwidth. If you process the data using the 2d-scheme you'll again (if not already cached) face a cache miss on the first element. But now, only the first row/colum will be in the cache after the first load from main memory because all other rows are located somewhere else in memory and not adjacent to the first one. As soon as you reach a new row/column there will again be a cache miss and the next load from main memory is performed.

Long story short: The 2d pattern has a higher chance of cache misses with the 1d scheme offering better potential for performance due to locality of the data.

Frequent Allocation / Deallocation

  • As many as N + 1 (4 + 1 = 5) allocations (using either new, malloc, allocator::allocate or whatever) are necessary to create the desired NxM (4×4) matrix.
  • The same number of proper, respective deallocation operations must be applied as well.

Therefore, it is more costly to create/copy such matrices in contrast to a single allocation scheme.

This is getting even worse with a growing number of rows.

Memory consumption overhead

I'll asumme a size of 32 bits for int and 32 bits for pointers. (Note: System dependency.)

Let's remember: We want to store a 4×4 int matrix which means 64 bytes.

For a NxM matrix, stored with the presented pointer-to-pointer scheme we consume

  • N*M*sizeof(int) [the actual blue data] +
  • N*sizeof(int*) [the green pointers] +
  • sizeof(int**) [the violet variable p] bytes.

That makes 4*4*4 + 4*4 + 4 = 84 bytes in case of the present example and it gets even worse when using std::vector<std::vector<int>>. It will require N * M * sizeof(int) + N * sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, that is 4*4*4 + 4*16 + 16 = 144 bytes in total, intead of 64 bytes for 4 x 4 int.

In addition -depending on the used allocator- each single allocation may well (and most likely will) have another 16 bytes of memory overhead. (Some “Infobytes” which store the number of allocated bytes for the purpose of proper deallocation.)

This means the worst case is:

N*(16+M*sizeof(int)) + 16+N*sizeof(int*) + sizeof(int**)
= 4*(16+4*4) + 16+4*4 + 4 = 164 bytes ! _Overhead: 156%_

The share of the overhead will reduce as the size of the matrix grows but will still be present.

Risk of memory leaks

The bunch of allocations requires an appropriate exception handling in order to avoid memory leaks if one of the allocations will fail! You’ll need to keep track of allocated memory blocks and you must not forget them when deallocating the memory.

If new runs of of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.

Example:

In the above mentioned new/delete example, we'll face some more code if we want to avoid leaks in case of bad_alloc exceptions.

  // allocate memory for 4x4 integers; quick & dirty
  size_t const N = 4;
  // we don't need try for this allocation
  // if it fails there is no leak
  int ** p = new int*[N];
  size_t allocs(0U);
  try 
  { // try block doing further allocations
    for (size_t i=0; i<N; ++i) 
    {
      p[i] = new int[4]; // allocate
      ++allocs; // advance counter if no exception occured
    }
  }
  catch (std::bad_alloc & be)
  { // if an exception occurs we need to free out memory
    for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all alloced p[i]s
    delete[] p; // free p
    throw; // rethrow bad_alloc
  }
  /*
     do some stuff here, using p[x][y] 
  */
  // deallocate memory accoding to the number of allocations
  for (size_t i=0; i<allocs; ++i) delete[] p[i];
  delete[] p;

Summary

There are cases where "real 2d" memory layouts fit and make sense (i.e. if the number of columns per row is not constant) but in the most simple and common 2D data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.

Alternative

You should use a contiguous block of memory and map your rows onto that block.

The "C++ way" of doing it is probably to write a class that manages your memory while considering important things like

Example

To provide an idea of how such a class may look like, here's a simple example with some basic features:

  • 2d-size-constructible
  • 2d-resizable
  • operator(size_t, size_t) for 2d- row major element access
  • at(size_t, size_t) for checked 2d-row major element access
  • Fulfills Concept requirements for Container

Source:

#include <vector>
#include <algorithm>
#include <iterator>
#include <utility>

namespace matrices
{

  template<class T>
  class simple
  {
  public:
    // misc types
    using data_type  = std::vector<T>;
    using value_type = typename std::vector<T>::value_type;
    using size_type  = typename 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...