If the categorical variable contains 5 distinct classes, we use the labels 0, 1, 2, 3, and 4.
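As a minimal sketch of that 0-to-(n-1) mapping (the five class names below are made up purely for illustration), each code is just the position of the distinct value in sorted order:

# Hypothetical classes; any 5 distinct values would do.
values = ["mango", "apple", "kiwi", "pear", "plum"]
mapping = {cls: code for code, cls in enumerate(sorted(set(values)))}
print(mapping)   # {'apple': 0, 'kiwi': 1, 'mango': 2, 'pear': 3, 'plum': 4}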
Example of label encoding. To understand label encoding with an example, let us take COVID-19 cases in India across states. With scikit-learn, label encoding a column looks like this:

from sklearn.preprocessing import LabelEncoder

lb_make = LabelEncoder()
# Fit the encoder on the "make" column and store the integer codes next to it
obj_df["make_code"] = lb_make.fit_transform(obj_df["make"])
obj_df[["make", "make_code"]].head(11)
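A sketch of the states example, assuming a hypothetical DataFrame covid_df with a State column (the state names are placeholders, not real case data):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

covid_df = pd.DataFrame({"State": ["Maharashtra", "Kerala", "Delhi", "Gujarat", "Kerala"]})
le = LabelEncoder()
# Each state name is replaced by an integer; classes are sorted alphabetically
covid_df["State_code"] = le.fit_transform(covid_df["State"])
print(covid_df)
# Delhi -> 0, Gujarat -> 1, Kerala -> 2, Maharashtra -> 3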
Label encoding applies to any categorical variable, for example which college someone went to, e.g. Harvard, Rutgers, UCLA, Berkeley, Stanford. Take a column whose values are commerce, science, and arts: when LabelEncoder is used, they get assigned the values 1, 2, and 0 respectively.
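A small sketch of that branch example; because LabelEncoder sorts the classes alphabetically, arts gets 0, commerce 1, and science 2, which matches the values quoted above:

from sklearn.preprocessing import LabelEncoder

branches = ["commerce", "science", "arts", "science", "commerce"]
le = LabelEncoder()
print(le.fit_transform(branches))   # [1 2 0 2 1]
print(le.classes_)                  # ['arts' 'commerce' 'science']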
Simply converting the column to factors does not seem enough, because the mapping itself is not persisted anywhere. Suppose we have a column Height in some dataset. When LabelEncoder is used on a categorical feature with multiple values, each value is replaced by an integer such as 0, 1, 2, 3, and so on.
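A hedged sketch of persisting the mapping with scikit-learn, using the Height column mentioned above (the values and the file name height_encoder.joblib are illustrative): the fitted encoder keeps its classes_, can be saved with joblib, and can reverse the codes later. Note that LabelEncoder itself orders the classes alphabetically (medium=0, short=1, tall=2); a tall=0, medium=1, short=2 mapping like the one quoted later would come from a manually specified ordering.

import joblib
from sklearn.preprocessing import LabelEncoder

heights = ["tall", "medium", "short", "medium", "tall"]
le = LabelEncoder().fit(heights)
codes = le.transform(heights)
print(codes)          # [2 0 1 0 2]
print(le.classes_)    # ['medium' 'short' 'tall']

# Persist the fitted encoder so the same mapping can be reused later
joblib.dump(le, "height_encoder.joblib")
le_loaded = joblib.load("height_encoder.joblib")
print(le_loaded.inverse_transform(codes))   # back to the original labels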
Other everyday categorical variables work the same way, such as the state someone lives in, or a color feature having values like red, orange, blue, white, etc.
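As a sketch, the same kind of encoding can also be obtained through pandas' categorical dtype; the color column below is hypothetical:

import pandas as pd

df = pd.DataFrame({"color": ["red", "orange", "blue", "white", "blue"]})
# Categories are ordered lexicographically: blue=0, orange=1, red=2, white=3
df["color_code"] = df["color"].astype("category").cat.codes
print(df)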
Understanding Label Encoding. For instance, if the categorical variable has six different classes, we will use 0, 1, 2, 3, 4, and 5. But depending on the data, label encoding introduces a new problem. Let's say you use linear regression and encode cat, dog, and mouse as 1, 2, and 3; then you'll get cat = 1*coeff and mouse = 3*coeff, which is a false relation made by the label encoder.
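A sketch of why that is a problem, with made-up target values (no claim about any real dataset): a linear model fitted on the single encoded column is forced to treat the codes as points on a line.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [1], [2], [3]])   # cat=1, dog=2, mouse=3
y = np.array([5.0, 9.0, 4.0, 5.5, 8.5, 4.5])   # illustrative targets with no ordinal pattern
model = LinearRegression().fit(X, y)
print(model.predict([[1], [2], [3]]))
# The three predictions lie on a straight line, so the model asserts
# pred(mouse) - pred(dog) == pred(dog) - pred(cat), an order the categories never had.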
Observing the encoded Height column, 0 is the label for tall, 1 is the label for medium, and 2 is the label for short. What one-hot encoding does is take a column of categorical data that has been label encoded and split it into multiple columns, one per class.
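A sketch of that split with pandas get_dummies (the Height values are illustrative; scikit-learn's OneHotEncoder would work as well):

import pandas as pd

df = pd.DataFrame({"Height": ["tall", "medium", "short", "medium"]})
# One indicator column per class: Height_medium, Height_short, Height_tall
one_hot = pd.get_dummies(df["Height"], prefix="Height")
print(pd.concat([df, one_hot], axis=1))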