Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
389 views
in Technique[技术] by (71.8m points)

python - Can anyone explain me StandardScaler?

I am unable to understand the page of the StandardScaler in the documentation of sklearn.

Can anyone explain this to me in simple terms?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Intro

I assume that you have a matrix X where each row/line is a sample/observation and each column is a variable/feature (this is the expected input for any sklearn ML function by the way -- X.shape should be [number_of_samples, number_of_features]).


Core of method

The main idea is to normalize/standardize i.e. μ = 0 and σ = 1 your features/variables/columns of X, individually, before applying any machine learning model.

StandardScaler() will normalize the features i.e. each column of X, INDIVIDUALLY, so that each column/feature/variable will have μ = 0 and σ = 1.


P.S: I find the most upvoted answer on this page, wrong. I am quoting "each value in the dataset will have the sample mean value subtracted" -- This is neither true nor correct.


See also: How and why to Standardize your data: A python tutorial


Example with code

from sklearn.preprocessing import StandardScaler
import numpy as np

# 4 samples/observations and 2 variables/features
data = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

print(data)
[[0, 0],
 [1, 0],
 [0, 1],
 [1, 1]])

print(scaled_data)
[[-1. -1.]
 [ 1. -1.]
 [-1.  1.]
 [ 1.  1.]]

Verify that the mean of each feature (column) is 0:

scaled_data.mean(axis = 0)
array([0., 0.])

Verify that the std of each feature (column) is 1:

scaled_data.std(axis = 0)
array([1., 1.])

Appendix: The maths

enter image description here


UPDATE 08/2020: Concerning the input parameters with_mean and with_std to False/True, I have provided an answer here: StandardScaler difference between “with_std=False or True” and “with_mean=False or True”


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...