How to transform (encode) a qualitative (categorical) variable into a quantitative variable with scikit-learn in Python?

Active August 25, 2020


Examples of how to transform (encode) a qualitative (categorical) variable into a quantitative variable with scikit-learn in Python.

Input matrix

Let's consider the following input matrix X:

from sklearn import preprocessing

import numpy as np

X = np.array(('A','C','B','A','C','D','A'))

X is a one-dimensional array of shape:

print(X.shape)

(7,)

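The scikit-learn encoders expect a two-dimensional input of shape (n_samples, n_features), so X is reshaped into a single column: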

X = X.reshape(-1,1)

X now has the shape:

print(X.shape)

(7, 1)
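As a small illustration of why the reshape matters, the sketch below passes the one-dimensional array directly to an encoder and catches the resulting error. The variable names X_1d, X_2d and error_raised are chosen here for illustration only.

```python
import numpy as np
from sklearn import preprocessing

X_1d = np.array(['A', 'C', 'B', 'A', 'C', 'D', 'A'])

# Fitting on a 1D array is rejected: the encoder expects
# a 2D input of shape (n_samples, n_features).
try:
    preprocessing.OrdinalEncoder().fit(X_1d)
    error_raised = False
except ValueError as err:
    error_raised = True
    print(err)

# After reshaping into a single column, the fit succeeds.
X_2d = X_1d.reshape(-1, 1)
preprocessing.OrdinalEncoder().fit(X_2d)
print(X_2d.shape)
```

Here reshape(-1,1) means "one column, as many rows as needed", which is the usual way to present a single categorical feature to scikit-learn.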

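Encoding the elements of matrix X using OrdinalEncoder

To encode the elements of matrix X, one solution is to use the OrdinalEncoder class, which replaces each category with an integer code. With categories='auto' the categories are found in the data and sorted, so here A, B, C and D are mapped to 0, 1, 2 and 3: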

enc = preprocessing.OrdinalEncoder(categories='auto')

enc.fit(X)

print( enc.transform(X) )

returns

[[0.]
 [2.]
 [1.]
 [0.]
 [2.]
 [3.]
 [0.]]
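To check which code was assigned to which category, one option is to inspect the fitted encoder. The sketch below uses the categories_ attribute and the inverse_transform method of OrdinalEncoder; the names codes and decoded are only illustrative.

```python
import numpy as np
from sklearn import preprocessing

X = np.array(['A', 'C', 'B', 'A', 'C', 'D', 'A']).reshape(-1, 1)

enc = preprocessing.OrdinalEncoder(categories='auto')
codes = enc.fit_transform(X)  # fit and transform in a single call

# categories_ holds one array per input column; the position of a
# category in that array is the integer code assigned to it.
print(enc.categories_)

# inverse_transform maps the codes back to the original labels.
decoded = enc.inverse_transform(codes)
print(decoded.ravel())
```

Note that these integer codes imply an order (A < B < C < D), which may not be meaningful for purely nominal categories.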

Encoding the elements of matrix X using OneHotEncoder

Another solution is to use the OneHotEncoder class, which creates one binary column per category:

enc = preprocessing.OneHotEncoder(categories='auto')

enc.fit(X)

print( enc.transform(X) )

returns a sparse matrix, printed as (row, column) value entries:

  (0, 0)    1.0
  (1, 2)    1.0
  (2, 1)    1.0
  (3, 0)    1.0
  (4, 2)    1.0
  (5, 3)    1.0
  (6, 0)    1.0

To get a dense array, use toarray():

print( enc.transform(X).toarray() )

which gives:

[[1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [1. 0. 0. 0.]]
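A common follow-up question is what happens when new data contains a category that was not seen during fit. By default OneHotEncoder raises an error in that case; the sketch below assumes the handle_unknown='ignore' option is acceptable for your use case, and uses a hypothetical unseen category 'E' in an illustrative array X_new.

```python
import numpy as np
from sklearn import preprocessing

X = np.array(['A', 'C', 'B', 'A', 'C', 'D', 'A']).reshape(-1, 1)

# handle_unknown='ignore' encodes unseen categories as a row of zeros
# instead of raising an error at transform time.
enc = preprocessing.OneHotEncoder(categories='auto', handle_unknown='ignore')
enc.fit(X)

# 'E' is a hypothetical category that was not present when fitting.
X_new = np.array(['B', 'E']).reshape(-1, 1)
encoded_new = enc.transform(X_new).toarray()
print(encoded_new)
```

The first row has a 1 in the column for B, while the row for the unseen category E contains only zeros.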

Daidalos

Hi, I am Ben.
