If you are on sklearn>0.20.dev0
In [11]: from sklearn.preprocessing import OneHotEncoder
...: cat = OneHotEncoder()
...: X = np.array([['a', 'b', 'a', 'c'], [0, 1, 0, 1]], dtype=object).T
...: cat.fit_transform(X).toarray()
...:
Out[11]: array([[1., 0., 0., 1., 0.],
[0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0.],
[0., 0., 1., 0., 1.]])
If you are on sklearn==0.20.dev0
In [30]: cat = CategoricalEncoder()
In [31]: X = np.array([['a', 'b', 'a', 'c'], [0, 1, 0, 1]], dtype=object).T
In [32]: cat.fit_transform(X).toarray()
Out[32]:
array([[ 1., 0., 0., 1., 0.],
[ 0., 1., 0., 0., 1.],
[ 1., 0., 0., 1., 0.],
[ 0., 0., 1., 0., 1.]])
Another way to do it is to use category_encoders.
Here is an example:
% pip install category_encoders
import category_encoders as ce
le = ce.OneHotEncoder(return_df=False, impute_missing=False, handle_unknown="ignore")
X = np.array([['a', 'dog', 'red'], ['b', 'cat', 'green']])
le.fit_transform(X)
array([[1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1]])