Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Creating Loops in Pandas

I am trying to turn this into a loop to save space, but I am unsure what I am doing wrong.

This is the code fully written out:

df0 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[0]], normalize= 'columns')
df1 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[1]], normalize= 'columns')
df2 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[2]], normalize= 'columns')
df3 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[3]], normalize= 'columns')
df4 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[4]], normalize= 'columns')
df5 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[5]], normalize= 'columns')
df6 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[6]], normalize= 'columns')
df7 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[7]], normalize= 'columns')
df8 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[8]], normalize= 'columns')
df9 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[9]], normalize= 'columns')
df10 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[10]], normalize= 'columns')
df11 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[11]], normalize= 'columns')
df12 = pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[12]], normalize= 'columns')

When I try to create a loop with this:

for i in range(0, len(Banner)):
    df[i] = (pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[i]], normalize= 'columns'))

I run into this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 2

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _set_item(self, key, value)
   3564         try:
-> 3565             loc = self._info_axis.get_loc(key)
   3566         except KeyError:

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 

KeyError: 2

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-35-1630d74a377e> in <module>
      1 for i in range(0, len(Banner)):
----> 2         df[i] = (pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[i]], normalize= 'columns'))

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3038         else:
   3039             # set column
-> 3040             self._set_item(key, value)
   3041 
   3042     def _setitem_slice(self, key: slice, value):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3115         self._ensure_valid_index(value)
   3116         value = self._sanitize_column(key, value)
-> 3117         NDFrame._set_item(self, key, value)
   3118 
   3119         # check if we are modifying a copy

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _set_item(self, key, value)
   3566         except KeyError:
   3567             # This item wasn't present, just insert at end
-> 3568             self._mgr.insert(len(self._info_axis), key, value)
   3569             return
   3570 

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in insert(self, loc, item, value, allow_duplicates)
   1187             value = _safe_reshape(value, (1,) + value.shape)
   1188 
-> 1189         block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
   1190 
   1191         for blkno, count in _fast_count_smallints(self.blknos[loc:]):

~\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in make_block(values, placement, klass, ndim, dtype)
   2712         values = DatetimeArray._simple_new(values, dtype=dtype)
   2713 
-> 2714     return klass(values, ndim=ndim, placement=placement)
   2715 
   2716 

~\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
    128         if self._validate_ndim and self.ndim and len(self.mgr_locs) != len(self.values):
    129             raise ValueError(
--> 130                 f"Wrong number of items passed {len(self.values)}, "
    131                 f"placement implies {len(self.mgr_locs)}"
    132             )

ValueError: Wrong number of items passed 2, placement implies 1

I have also tried:

i = 0
while i < 10:
    df[i] = (pd.crosstab(df['Gender: What gender do you identify with?'], df[Banner[i]], normalize= 'columns'))
    i = i+1

And I run into the same error.

Is there a way to loop over these variables in Python, or a faster way than writing them out manually?

Any help would be greatly appreciated.

Thanks

Question from: https://stackoverflow.com/questions/65924951/creating-loops-in-pandas


1 Reply


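You're trying to create variable names dynamically. In your loop, df[i] = ... does not create variables named df0, df1, and so on; it tries to assign a column labelled i on the existing DataFrame df, and the two-column crosstab result does not fit into a single column, which is why pandas raises the error. One way to create the names dynamically is to use the built-in globals() function, which returns the module's global namespace as a dictionary: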

for i in range(13):
    globals()['df{}'.format(i)] = i*1000

Once you run that, you can refer to each variable by name:

In [5]: df12
Out[5]: 12000

Replace the i*1000 with your desired output, for example the pd.crosstab(...) call from your loop.
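
As an alternative to writing into globals(), the results can be kept in an ordinary dictionary keyed by the loop index. This is only a minimal sketch under stated assumptions: the toy DataFrame, its column names ("Gender", "Region", "AgeGroup") and the two-item Banner list are made up for illustration and stand in for the survey data in the question.

```python
import pandas as pd

# Toy data standing in for the survey DataFrame (hypothetical values).
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Region": ["N", "N", "S", "S", "N", "S"],
    "AgeGroup": ["18-34", "35+", "35+", "18-34", "18-34", "35+"],
})

# Stand-in for the Banner list of column names from the question.
Banner = ["Region", "AgeGroup"]

# Build one column-normalized crosstab per banner column and store it
# under its position in Banner, instead of assigning into df itself.
tables = {
    i: pd.crosstab(df["Gender"], df[col], normalize="columns")
    for i, col in enumerate(Banner)
}

print(tables[0])
```

Here tables[0] plays the role of df0, tables[1] the role of df1, and so on. A dictionary keeps the results together so they can be looped over later, and it avoids adding many names to the global namespace.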

