Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - numpy.ma (masked) array mean method has inconsistent return type

I noticed that the numpy masked-array mean method returns different types when it probably should not:

import numpy as np

A = np.ma.masked_equal([1,1,0], value=0)
B = np.ma.masked_equal([1,1,1], value=0) # no masked values

type(A.mean())
#numpy.float64
type(B.mean())
#numpy.ma.core.MaskedArray

Other numpy.ma.core.MaskedArray methods seem to be consistent:

type( A.sum()) == type(B.sum())
# True
type( A.prod()) == type(B.prod())
# True
type( A.std()) == type(B.std())
# True
type( A.mean()) == type(B.mean())
# False

Can someone explain this?

UPDATE: As pointed out in the comments, an array built with an explicit mask array returns the same type as A:

C = np.ma.masked_array([1, 1, 1], mask=[False, False, False])
type(C.mean()) == type(A.mean())
# True 


1 Reply


The MaskedArray.mean method starts with:

    if self._mask is nomask:
        result = super(MaskedArray, self).mean(axis=axis, dtype=dtype)

np.ma.nomask is False.

This is the case for your B:

masked_array(data = [1 1 1],
             mask = False,
       fill_value = 0)

For A the mask is an array that matches the data in size. In B it is a scalar, False, and mean is handling that as a special case.
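The difference between the two kinds of mask can be checked directly. This is a minimal sketch: here B is built with no mask argument at all, which is one way to get the nomask sentinel without relying on how masked_equal shrinks an all-False mask. np.ma.getmask returns the stored mask as is, while np.ma.getmaskarray always expands it to a full boolean array.

```python
import numpy as np

A = np.ma.masked_equal([1, 1, 0], value=0)   # one masked entry
B = np.ma.masked_array([1, 1, 1])            # built without any mask
C = np.ma.masked_array([1, 1, 1], mask=[False, False, False])

# getmask returns the stored mask unchanged: the nomask sentinel for B,
# a boolean array for A and C.
b_is_nomask = np.ma.getmask(B) is np.ma.nomask
a_mask_shape = np.ma.getmask(A).shape
c_mask_shape = np.ma.getmask(C).shape

# getmaskarray always returns a full-size boolean array, even for B.
full_shapes = [np.ma.getmaskarray(x).shape for x in (A, B, C)]
```

Code that needs a per-element mask regardless of how the array was built can use getmaskarray and avoid special-casing nomask.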

I need to dig a bit more to see what this implies.

In [127]: np.mean(B)
Out[127]: 
masked_array(data = 1.0,
             mask = False,
       fill_value = 0)

In [141]: super(np.ma.MaskedArray,B).mean()
Out[141]: 
masked_array(data = 1.0,
             mask = False,
       fill_value = 0)

I'm not sure that helps. There is some circular referencing between the np.ndarray methods, the np functions, and the np.ma methods, which makes it hard to identify exactly what code is being used. It looks like it is using the compiled mean method, but it isn't obvious how that handles the masking.

I wonder if the intent is to use

 np.mean(B.data) # or
 B.data.mean()

and the super method fetch isn't the right approach.
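Until the return type is consistent, one workaround is to compute the mean from the unmasked data directly. The helper below is a hypothetical sketch (scalar_mean is not part of NumPy) that handles only the whole-array case, with no axis argument. It returns a plain Python float, or np.ma.masked when every element is masked.

```python
import numpy as np

def scalar_mean(arr):
    """Hypothetical helper: mean of the unmasked entries of arr.

    Returns a Python float, or np.ma.masked if nothing is unmasked.
    """
    arr = np.ma.asarray(arr)
    mask = np.ma.getmaskarray(arr)       # full boolean array, even for nomask
    valid = np.ma.getdata(arr)[~mask]    # 1-D array of unmasked values
    if valid.size == 0:
        return np.ma.masked
    return float(valid.mean())

A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)
mean_a = scalar_mean(A)
mean_b = scalar_mean(B)
```

With this helper, type(mean_a) and type(mean_b) are both float, whichever kind of mask the input carries.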

In any case, the same array with a vector mask returns the scalar.

In [132]: C
Out[132]: 
masked_array(data = [1 1 1],
             mask = [False False False],
       fill_value = 0)

In [133]: C.mean()
Out[133]: 1.0

====================

Trying this method without the nomask shortcut raises an error after these lines:

        dsum = self.sum(axis=axis, dtype=dtype)
        cnt = self.count(axis=axis)
        if cnt.shape == () and (cnt == 0):
            result = masked
        else:
            result = dsum * 1. / cnt

self.count returns a plain scalar in the nomask case, but an np.int32 with regular masking, so the cnt.shape test chokes in the nomask case.
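The sum-and-count computation quoted above can be sketched without the shortcut by passing the count through np.asarray, so that it has a shape attribute whichever type count returns. The function name mean_via_count is hypothetical, and this is an illustration of the idea rather than the fix NumPy adopted.

```python
import numpy as np

def mean_via_count(arr, axis=None):
    """Hypothetical sketch: masked mean as sum / count, no nomask shortcut."""
    arr = np.ma.asarray(arr)
    dsum = arr.sum(axis=axis)
    # count() may return a plain int or a NumPy value; asarray gives
    # both a .shape attribute, so the check below works in either case.
    cnt = np.asarray(arr.count(axis=axis))
    if cnt.shape == () and cnt == 0:
        return np.ma.masked
    return dsum * 1.0 / cnt

A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_array([1, 1, 1])            # nomask case
result_a = mean_via_count(A)
result_b = mean_via_count(B)
all_masked = mean_via_count(np.ma.masked_array([1, 2], mask=[True, True]))
```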

trace is the only other masked method that tries this super(MaskedArray...) 'shortcut'. There's clearly something kludgy about the mean code.

====================

Relevant bug issue: https://github.com/numpy/numpy/issues/5769

According to that issue, the same question was raised here last year: Testing equivalence of means of Numpy MaskedArray instances raises attribute error

Looks like there are a lot of masking issues, not just with mean. There may be fixes in the development master now, or in the near future.

