TL;DR
This is arguably a bug in eval. See the open GitHub issue GH16289.
Why am I getting this error?
This is because pd.eval
cannot parse a Series with more than 100 rows. Here's an example, using a Series s of list-like strings.
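For reference, a Series of this shape can be built as follows (a minimal sketch; the values mirror the parsed output shown further below) -
import pandas as pd
s = pd.Series(['[133, 115, 3, 1]', '[114, 115, 2, 3]', '[51, 59, 1, 1]'] * 100000)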
len(s)
300000
pd.eval(s.head(100)) # returns a parsed result
Whereas,
pd.eval(s.head(101))
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'
This issue persists, regardless of the parser or the engine.
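For example, spelling out the combinations explicitly (a quick check; engine='numexpr' assumes numexpr is installed) raises the same AttributeError every time -
for engine in ['numexpr', 'python']:
    for parser in ['pandas', 'python']:
        try:
            pd.eval(s.head(101), engine=engine, parser=parser)
        except AttributeError as exc:
            print(engine, parser, exc)   # ... object has no attribute 'visit_Ellipsis'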
What does this error mean?
pd.eval
operates on the __repr__
of the Series, rather than the objects contained within it (which is the cause of this bug). The __repr__
truncates rows, replacing them with a ...
(ellipsis). This ellipsis is misinterpreted by the engine as an Ellipsis
object -
...
Ellipsis
pd.eval('...')
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'
pd.eval
technically is not supposed to parse a series of strings (the documentation mentions it is meant to receive strings) and, as described by the accepted answer, will try to make a reasonable guess at the result instead of rejecting the input outright.
Whether this is intended behavior or incomplete behavior (a lot of pandas methods operate differently based on the input - and eval could work on a series by mapping itself onto each row, which is how I initially assumed this was working anyway) is up for discussion, since there's an open issue tracking this.
What can I do to make this work?
Right now, there isn't a solution (the issue is still open as of 12/28/2017); however, there are a couple of workarounds.
Option 1
ast.literal_eval
This option should work out of the box if you can guarantee that you do not have any malformed strings.
from ast import literal_eval
s.apply(literal_eval)
0 [133, 115, 3, 1]
1 [114, 115, 2, 3]
2 [51, 59, 1, 1]
dtype: object
If there is a possibility of malformed data, you'll need to write a little error handling code. You can do that with a function -
import numpy as np   # needed for the np.nan placeholder

def safe_parse(x):
    try:
        return literal_eval(x)
    except (SyntaxError, ValueError):
        return np.nan   # replace with any suitable placeholder value
Pass this function to apply
-
s.apply(safe_parse)
0 [133, 115, 3, 1]
1 [114, 115, 2, 3]
2 [51, 59, 1, 1]
dtype: object
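As a quick check of the error handling (using a made-up malformed string), the fallback value comes back instead of an exception -
safe_parse('[51, 59')   # returns nan, since literal_eval raises SyntaxError here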
ast
works for any number of rows; it is slow, but reliable. You can also use pd.json.loads
for JSON data, applying the same ideas as with literal_eval
.
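If your strings happen to be valid JSON (double-quoted strings, no trailing commas, and so on), the standard library json module does the same job without relying on the older pd.json namespace; a minimal sketch -
import json
s.apply(json.loads)   # same parsed result as above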
Option 2
yaml.load
Another great option for parsing simple data; I picked this up from @ayhan a while ago.
import yaml
s.apply(yaml.load)
0 [133, 115, 3, 1]
1 [114, 115, 2, 3]
2 [51, 59, 1, 1]
dtype: object
I haven't tested this on more complex structures, but this should work for almost any basic string representation of data.
You can find the documentation for PyYAML here. Scroll down a bit and you'll find more details on the load
function.
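One caveat: on newer versions of PyYAML (5.1+), calling yaml.load without an explicit Loader emits a warning; yaml.safe_load is a drop-in replacement for simple data like this -
s.apply(yaml.safe_load)   # same result, no Loader warning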
Note
If you're working with JSON data, it might be suitable to read your file using pd.read_json
or pd.io.json.json_normalize
to begin with.
You can also perform the parsing as you read in your data, using read_csv -
s = pd.read_csv('file.csv', converters={'col': literal_eval}, squeeze=True)
Where 'file.csv' and 'col' are placeholders for your file and column; the converters
argument takes a dict mapping a column to a function that is applied to its values as they are read, so you don't have to deal with parsing later.
Continuing the point above, if you're working with a dataframe, the call is the same, just without squeeze -
df = pd.read_csv('file.csv', converters={'col' : literal_eval})
Where col
is the column that needs to be parsed.
You can also pass pd.json.loads
(for JSON data), or pd.eval
(if you have 100 rows or fewer).
Credits to MaxU and Moondra for uncovering this issue.