I'm getting unwanted behaviour from np.vectorize: it changes the datatype of the argument going into the original function. My original question is about the general case, and I'll use this new question to ask about a more specific case.
(Why this second question? I've created this question about a more specific case in order to illustrate the problem - it's always easier to go from the specific to the more general. And I've created this question separately, because I think it's useful to keep the general case, as well as a general answer to it (should one be found), by themselves and not 'contaminated' with thinking about solving any particular problem.)
So, a concrete example. Where I live, Wednesday is Lottery Day. So, let's start with a pandas dataframe with a date column containing all Wednesdays this year:
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2020-01-01', freq='7D', periods=53)})
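As a quick sanity check (a minimal sketch, not part of my actual problem), the following confirms that every generated date really is a Wednesday. It relies on pandas numbering weekdays from Monday = 0, so Wednesday is 2.

```python
import pandas as pd

# Same frame as above: 53 dates, 7 days apart, starting on 2020-01-01.
df = pd.DataFrame({'date': pd.date_range('2020-01-01', freq='7D', periods=53)})

# pandas counts weekdays from Monday=0, so Wednesday is 2.
all_wednesdays = bool((df['date'].dt.dayofweek == 2).all())
print(all_wednesdays)
print(df['date'].iloc[0].date(), df['date'].iloc[-1].date())
```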
I want to see which of these possible days I'll actually play on. I don't feel particularly lucky at the beginning and end of each month, and there are some months I feel especially unlucky about. Therefore I use this function to see if a date qualifies:
def qualifies(dt, excluded_months=[]):
    # Date qualifies, if...
    # - it's on or after the 5th of the month; and
    # - at least 5 days remain till the end of the month (incl. date itself); and
    # - it's not in one of the months in excluded_months.
    if dt.day < 5:
        return False
    if (dt + pd.tseries.offsets.MonthBegin(1) - dt).days < 5:
        return False
    if dt.month in excluded_months:
        return False
    return True
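To make the three rules concrete, here is a small self-contained sketch that calls the function on a few single pd.Timestamp values. The function body is repeated so the snippet runs on its own; the only deviations from my version are an immutable default for excluded_months and pd.offsets as the shorter spelling of pd.tseries.offsets. The sample dates are picked purely for illustration.

```python
import pandas as pd

def qualifies(dt, excluded_months=()):
    # Too early in the month.
    if dt.day < 5:
        return False
    # Fewer than 5 days left in the month, counting the date itself.
    if (dt + pd.offsets.MonthBegin(1) - dt).days < 5:
        return False
    # Month explicitly excluded.
    if dt.month in excluded_months:
        return False
    return True

early = qualifies(pd.Timestamp('2020-01-01'))             # day < 5
late = qualifies(pd.Timestamp('2020-01-29'))              # only 3 days remain
excluded = qualifies(pd.Timestamp('2020-03-11'), [3, 8])  # March is excluded
ok = qualifies(pd.Timestamp('2020-05-27'), [3, 8])        # exactly 5 days remain
print(early, late, excluded, ok)
```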
I hope you realise that this example is still somewhat contrived ;) But it's closer to what I'm trying to do. I try to apply this function in two ways:
df['qualifies1'] = df['date'].apply(lambda x: qualifies(x, [3, 8]))
df['qualifies2'] = np.vectorize(qualifies, excluded=[1])(df['date'], [3, 8])
As far as I know, both should work, and I'd prefer the latter, as the former is slow and frowned upon. Edit: I've learned that also the first is frowned upon lol.
However, only the first one succeeds; the second one fails with an AttributeError: 'numpy.datetime64' object has no attribute 'day'. And so my question is whether there is a way to use np.vectorize on this function qualifies, which takes a datetime/timestamp as an argument.
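For reference, here is a minimal sketch of what seems to be going on, plus one possible workaround. As far as I can tell, np.vectorize converts the column to a plain NumPy array before calling the function, so the elements arrive as numpy.datetime64 rather than pd.Timestamp. The wrapper qualifies_ts is a hypothetical helper of my own (not a NumPy or pandas API) that converts each element back to a pd.Timestamp first; I'm not claiming this is the recommended approach.

```python
import numpy as np
import pandas as pd

def qualifies(dt, excluded_months=()):
    if dt.day < 5:
        return False
    if (dt + pd.offsets.MonthBegin(1) - dt).days < 5:
        return False
    if dt.month in excluded_months:
        return False
    return True

df = pd.DataFrame({'date': pd.date_range('2020-01-01', freq='7D', periods=53)})

# What a single element looks like once the column is a NumPy array.
first = np.asarray(df['date']).flat[0]
print(type(first), hasattr(first, 'day'))

# Hypothetical wrapper (assumption, not a library function): convert
# whatever arrives back into a pd.Timestamp before calling qualifies.
def qualifies_ts(dt, excluded_months=()):
    return qualifies(pd.Timestamp(dt), excluded_months)

expected = df['date'].apply(lambda x: qualifies(x, [3, 8]))
result = np.vectorize(qualifies_ts, excluded=[1])(df['date'], [3, 8])
print(bool((result == expected.to_numpy()).all()))
```

Note that np.vectorize is still essentially a Python-level loop, so this sketch is about getting the types right rather than about speed.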
Many thanks!
PS: for the interested, this is df:
In [15]: df
Out[15]:
date qualifies1
0 2020-01-01 False
1 2020-01-08 True
2 2020-01-15 True
3 2020-01-22 True
4 2020-01-29 False
5 2020-02-05 True
6 2020-02-12 True
7 2020-02-19 True
8 2020-02-26 False
9 2020-03-04 False
10 2020-03-11 False
11 2020-03-18 False
12 2020-03-25 False
13 2020-04-01 False
14 2020-04-08 True
15 2020-04-15 True
16 2020-04-22 True
17 2020-04-29 False
18 2020-05-06 True
19 2020-05-13 True
20 2020-05-20 True
21 2020-05-27 True
22 2020-06-03 False
23 2020-06-10 True
24 2020-06-17 True
25 2020-06-24 True
26 2020-07-01 False
27 2020-07-08 True
28 2020-07-15 True
29 2020-07-22 True
30 2020-07-29 False
31 2020-08-05 False
32 2020-08-12 False
33 2020-08-19 False
34 2020-08-26 False
35 2020-09-02 False
36 2020-09-09 True
37 2020-09-16 True
38 2020-09-23 True
39 2020-09-30 False
40 2020-10-07 True
41 2020-10-14 True
42 2020-10-21 True
43 2020-10-28 False
44 2020-11-04 False
45 2020-11-11 True
46 2020-11-18 True
47 2020-11-25 True
48 2020-12-02 False
49 2020-12-09 True
50 2020-12-16 True
51 2020-12-23 True
52 2020-12-30 False