Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
360 views
in Technique[技术] by (71.8m points)

python - Cython interfaced with C++: segmentation fault for large arrays

I am transferring my code from Python/C interfaced using ctypes to Python/C++ interfaced using Cython. The new interface will give me an easier to maintain code, because I can exploit all the C++ features and need relatively few lines of interface-code.

The interfaced code works perfectly with small arrays. However it encounters a segmentation fault when using large arrays. I have been wrapping my head around this problem, but have not gotten any closer to a solution. I have included a minimal example in which the segmentation fault occurs. Please note that it consistently occurs on Linux and Mac, and also valgrind did not give insights. Also note that the exact same example in pure C++ does work without problems.

The example contains (part of) a Sparse matrix class in C++. An interface is created in Cython. As a result the class can be used from Python.

C++ side

sparse.h

#ifndef SPARSE_H
#define SPARSE_H

#include <iostream>
#include <cstdio>

using namespace std;

class Sparse {

  public:
    int* data;
    int  nnz;

    Sparse();
    ~Sparse();
    Sparse(int* data, int nnz);
    void view(void);

};

#endif

sparse.cpp

#include "sparse.h"

Sparse::Sparse()
{
  data = NULL;
  nnz  = 0   ;
}

Sparse::~Sparse() {}

Sparse::Sparse(int* Data, int NNZ)
{
  nnz  = NNZ ;
  data = Data;
}

void Sparse::view(void)
{

  int i;

  for ( i=0 ; i<nnz ; i++ )
    printf("(%3d) %d
",i,data[i]);

}

Cython interface

csparse.pyx

import  numpy as np
cimport numpy as np

# UNCOMMENT TO FIX
#from cpython cimport Py_INCREF

cdef extern from "sparse.h":
  cdef cppclass Sparse:
    Sparse(int*, int) except +
    int* data
    int  nnz
    void view()


cdef class PySparse:

  cdef Sparse *ptr

  def __cinit__(self,**kwargs):

    cdef np.ndarray[np.int32_t, ndim=1, mode="c"] data

    data = kwargs['data'].astype(np.int32)

    # UNCOMMENT TO FIX
    #Py_INCREF(data)

    self.ptr = new Sparse(
      <int*> data.data if data is not None else NULL,
      data.shape[0],
    )

  def __dealloc__(self):
    del self.ptr

  def view(self):
    self.ptr.view()

setup.py

from distutils.core import setup, Extension
from Cython.Build   import cythonize

setup(ext_modules = cythonize(Extension(
  "csparse",
  sources=["csparse.pyx", "sparse.cpp"],
  language="c++",
)))

Python side

import numpy as np
import csparse

data = np.arange(100000,dtype='int32')

matrix = csparse.PySparse(
  data = data
)

matrix.view() # --> segmentation fault

To run:

$ python setup.py build_ext --inplace
$ python example.py

Note that data = np.arange(100,dtype='int32') does work.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The memory is being managed by your numpy arrays. As soon as they go out of scope (most likely at the end of the PySparse constructor) the arrays cease to exist, and all your pointers are invalid. This applies to both large and small arrays, but presumably you just get lucky with small arrays.

You need to hold a reference to all the numpy arrays you use for the lifetime of your PySparse object:

cdef class PySparse:

  # ----------------------------------------------------------------------------

  cdef Sparse *ptr
  cdef object _held_reference # added

  # ----------------------------------------------------------------------------

  def __cinit__(self,**kwargs):
      # ....
      # your constructor code code goes here, unchanged...
      # ....

      self._held_reference = [data] # add any other numpy arrays you use to this list

As a rule you need to be thinking quite hard about who owns what whenever you're dealing with C/C++ pointers, which is a big change from the normal Python approach. Getting a pointer from a numpy array does not copy the data and it does not give numpy any indication that you're still using the data.


Edit note: In my original version I tried to use locals() as a quick way of gathering a collection of all the arrays I wanted to keep. Unfortunately, that doesn't seem to include to cdefed arrays so it didn't manage to keep the ones you were actually using (note here that astype() makes a copy unless you tell it otherwise, so you need to hold the reference to the copy, rather than the original passed in as an argument).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...