For example, we have encoded a set of country names into numerical data.
Example of label encoding: Blue = 1, Green = 2, Red = 3. We then create an object with this mapping and use it to transform new data in the same fashion.

\begin{matrix}
\begin{array}{cc}
\text{Position} & \text{Salary} \\ \hline
\text{Customer Service} & 44000 \\
\text{Manager} & 75000 \\
\text{Assistant Manager} & 65000 \\
\text{Director} & 90000
\end{array}
\end{matrix}

Another set of categorical values: Harvard, Rutgers, UCLA, Berkeley, Stanford.
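The idea of building a mapping object once and reusing it on new data can be sketched in plain Python. This is only an illustration: the class name `SimpleLabelMapper` and its `start` parameter are hypothetical, not part of any library, and the codes are assigned in sorted order as an assumption so that the result matches the Blue/Green/Red example.

```python
class SimpleLabelMapper:
    """Hypothetical helper: learns a value -> integer mapping and reuses it."""

    def __init__(self, start=1):
        self.start = start      # first integer code to hand out
        self.mapping = {}

    def fit(self, values):
        # Assign codes in sorted order so the result is deterministic.
        unique = sorted(set(values))
        self.mapping = {v: i for i, v in enumerate(unique, start=self.start)}
        return self

    def transform(self, values):
        # Raises KeyError for a value that was never seen during fit.
        return [self.mapping[v] for v in values]


mapper = SimpleLabelMapper().fit(["Red", "Blue", "Green", "Blue"])
print(mapper.mapping)

# New data is encoded with the same mapping learned above.
new_codes = mapper.transform(["Green", "Red", "Blue"])
print(new_codes)
```

Keeping the fitted mapping in one object is what makes the encoding of later data consistent with the encoding of the original data.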
What is the difference between one-hot encoding and label encoding? If you're new to machine learning, you might confuse these two terms. Label Encoder:

    >>> from sklearn import preprocessing
    >>> le = preprocessing.LabelEncoder()
    >>> le.fit([1, 2, 2, 6])
    LabelEncoder()
    >>> le.classes_
    array([1, 2, 6])
    >>> le.transform([1, 1, 2, 6])
    array([0, 0, 1, 2])
    >>> le.inverse_transform([0, 0, 1, 2])
    array([1, 1, 2, 6])
One-hot encoding takes a column of categorical data that has been label encoded and splits it into multiple columns, one per category, each holding only 0s and 1s. After applying label encoding, the Height column is converted into numeric labels. We apply label encoding to the iris dataset on the target column, Species.
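To make the column split concrete, here is a minimal plain-Python sketch that label encodes a few species names and then one-hot encodes the resulting codes. The sample values and the helper name `one_hot` are assumptions for illustration; in practice a library encoder would normally do this.

```python
def one_hot(codes, n_classes):
    """Turn a list of integer codes into rows with a single 1 per row."""
    rows = []
    for code in codes:
        row = [0] * n_classes
        row[code] = 1           # mark the column that belongs to this class
        rows.append(row)
    return rows


# Illustrative sample of a Species-like column (assumed values).
species = ["setosa", "virginica", "versicolor", "setosa"]

# Step 1: label encoding, with classes in sorted order.
classes = sorted(set(species))
codes = [classes.index(s) for s in species]

# Step 2: one-hot encoding, one column per class.
encoded = one_hot(codes, len(classes))

print(classes)
print(codes)
for row in encoded:
    print(row)
```

Each output row has exactly one 1, so no ordering between categories is implied.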
When LabelEncoder is used with categorical features that have multiple values, the integer values (0, 1, 2, 3, etc.) can be read by a model as an ordering. The way I see it, label encoding is meaningless since you encode them in a non-ordinal fashion, which makes it impossible for a linear model to learn anything useful. Let's say you use linear regression and encode cat, dog, mouse as 1, 2, 3: then you'll get cat = 1 × coeff and mouse = 3 × coeff, which is a false relation created by the label encoder.
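The false relation can be seen with a small least-squares fit written in plain Python. The target values below are made up for illustration, and `fit_line` is a hypothetical helper implementing ordinary one-variable least squares. Because the codes 1, 2, 3 are evenly spaced, the fitted line forces the prediction for dog to sit exactly halfway between cat and mouse, whatever the data says.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


codes = {"cat": 1, "dog": 2, "mouse": 3}                 # label encoding
targets = {"cat": 10.0, "dog": 50.0, "mouse": 20.0}      # made-up values

animals = list(codes)
slope, intercept = fit_line(
    [codes[a] for a in animals],
    [targets[a] for a in animals],
)

# The model can only express predictions of the form slope * code + intercept.
pred = {a: slope * codes[a] + intercept for a in animals}
for a in animals:
    print(a, "target:", targets[a], "prediction:", round(pred[a], 3))
```

In this sketch the true dog value is far above both others, but the linear model cannot represent that with a single encoded column.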
Observe the State column in the data frame. Label encoding simply converts each value in a column to a number. For example, the feature hsc_s in the example above has three different values: commerce, science and arts.
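As a rough sketch of converting each value to a number, the snippet below encodes a small, made-up sample of an hsc_s-like column in plain Python. Assigning codes in sorted order is an assumption chosen here for determinism; the variable names are illustrative.

```python
# Made-up sample of a column with three categories.
hsc_s = ["commerce", "science", "arts", "commerce", "science"]

# One integer per distinct value, assigned in sorted order.
classes = sorted(set(hsc_s))
to_code = {label: code for code, label in enumerate(classes)}

encoded = [to_code[v] for v in hsc_s]

# The mapping is reversible: codes index back into the class list.
decoded = [classes[c] for c in encoded]

print(to_code)
print(encoded)
print(decoded)
```

With three categories the codes run from 0 to 2, and decoding recovers the original values.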
For instance, if the categorical variable has six different classes, we will use 0, 1, 2, 3, 4 and 5.