I have a dataset:
from pandas import DataFrame

Cars = {
    '1': [140.8731392, 142.3481116, 146.7621232, 144.9406286, 144.8725356, 145.3976902],
    '2': [147.6279494, 141.4455089, 147.3953295, 144.6467237, 146.406241, 147.0695877],
    '3': [140.7164976, 143.4675429, 145.9967808, 141.7831729, 144.4806287, 147.7805723],
    '4': [149.359966, 147.0236556, 146.2931072, 148.478762, 149.565317, 143.9501002],
    '5': [145.9216418, 143.3376241, 145.2974838, 148.80916, 143.7103238, 145.4369799],
    '6': [146.2192954, 149.0914385, 146.3690445, 143.3845218, 140.1431644, 149.6484708],
}
df = DataFrame(Cars, columns=['1', '2', '3', '4', '5', '6'])
print(df)
I try to identify which columns contain outliers using:
import numpy as np

outlier_numbers = []
explantations = []
for col in df.columns:
    # Interquartile range and the usual 1.5 * IQR fences for this column
    quartile_01, quartile_03 = np.percentile(df[col].dropna(), [25, 75])
    iqr = quartile_03 - quartile_01
    lwer_bound = quartile_01 - (1.5 * iqr)
    upper_bound = quartile_03 + (1.5 * iqr)
    # Number of values that fall outside the fences
    outliers_number = ((df[col] < lwer_bound) | (df[col] > upper_bound)).sum()
    explanation = f"The lower and upper bound of the range for '{col}' respectively is: {lwer_bound} and {upper_bound}"
    # Only columns that actually contain outliers are recorded
    if outliers_number > 0:
        outlier_numbers.append(outliers_number)
        explantations.append(explanation)
a_dict = {key: value for key, value in zip(outlier_numbers, explantations)}
values_checking = len(outlier_numbers) == 0
Then, if I print outliers_number for each column, I get [0, 1, 0, 0, 1, 0], meaning that columns 2 and 5 contain outliers.
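For reference, here is a minimal sketch of one way to reproduce those per-column counts. It assumes the count is recorded for every column rather than only when it is positive, and it uses `DataFrame.quantile`, whose default linear interpolation matches the default of `np.percentile`. The names `q1`, `q3`, `is_outlier` and `counts` are illustrative and not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({
    '1': [140.8731392, 142.3481116, 146.7621232, 144.9406286, 144.8725356, 145.3976902],
    '2': [147.6279494, 141.4455089, 147.3953295, 144.6467237, 146.406241, 147.0695877],
    '3': [140.7164976, 143.4675429, 145.9967808, 141.7831729, 144.4806287, 147.7805723],
    '4': [149.359966, 147.0236556, 146.2931072, 148.478762, 149.565317, 143.9501002],
    '5': [145.9216418, 143.3376241, 145.2974838, 148.80916, 143.7103238, 145.4369799],
    '6': [146.2192954, 149.0914385, 146.3690445, 143.3845218, 140.1431644, 149.6484708],
})

# Per-column quartiles (each is a Series indexed by column name)
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Boolean frame: True where a value lies outside the 1.5 * IQR fences
is_outlier = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)

# One count per column, in column order
counts = is_outlier.sum().tolist()
print(counts)  # [0, 1, 0, 0, 1, 0]
```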
But if I check the zipped a_dict, I get: {1: "The lower and upper bound of the range for '5' respectively is: 141.5670700125 and 148.3405201125"}, which doesn't make sense to me.
Why did only one element, not two, get zipped?
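A likely explanation, sketched under the assumption that the loop appends only for columns 2 and 5: both columns have an outlier count of 1, so both zipped pairs share the dictionary key 1, and the later pair overwrites the earlier one. The strings below are shortened stand-ins for the real explanations, and `columns_with_outliers` and `by_column` are hypothetical names introduced only for this illustration.

```python
# Stand-ins for what the loop collects for columns '2' and '5'
columns_with_outliers = ['2', '5']
outlier_numbers = [1, 1]
explantations = ["bounds for '2'", "bounds for '5'"]

# zip yields two pairs, but both have the key 1, so the dict keeps only the last
pairs = list(zip(outlier_numbers, explantations))
a_dict = dict(pairs)
print(len(pairs))  # 2
print(a_dict)      # {1: "bounds for '5'"}

# Keying by the (unique) column name keeps one entry per column
by_column = {
    col: (n, text)
    for col, n, text in zip(columns_with_outliers, outlier_numbers, explantations)
}
print(by_column)
```

Column names are unique, so using them as keys avoids the collision; the counts themselves make poor keys because several columns can have the same number of outliers.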
Asked by eponkratova (translated from SO)