Encoding for Handling Categorical Data

How to Handle Categorical Variables

<Label encoding>

Label encoding is used for ordinal variables (categories with a natural order). For tree-based models such as decision trees and random forests, you can expect it to work well on such variables.

1. Replace values using the map function

: map each categorical value to a number
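A minimal sketch of step 1 with pandas `Series.map`, assuming a hypothetical ordinal `size` column whose natural order is small < medium < large. Because the dictionary is written by hand, you decide which number each category gets and therefore which order the codes follow.

```python
import pandas as pd

# Hypothetical example data: an ordinal "size" column
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Hand-written mapping that preserves the natural order small < medium < large
size_order = {"small": 0, "medium": 1, "large": 2}

# Series.map replaces each value using the dict; values missing from it become NaN
df["size_encoded"] = df["size"].map(size_order)

print(df["size_encoded"].tolist())  # [0, 2, 1, 0]
```

Since unmapped values silently become NaN, it is worth checking `df["size_encoded"].isna().any()` after mapping.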

2. Label Encoding

: mapping with LabelEncoder
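A minimal sketch of step 2 using scikit-learn's `LabelEncoder` on the same hypothetical sizes. Note that `LabelEncoder` assigns codes in sorted (alphabetical) order, not in the intended small < medium < large order, and the scikit-learn documentation describes it as meant for target labels `y`. For an ordinal feature with an explicit order, `OrdinalEncoder` with its `categories` parameter is shown as an alternative.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

sizes = ["small", "large", "medium", "small"]

# LabelEncoder: codes follow the sorted class names, i.e. large=0, medium=1, small=2
encoder = LabelEncoder()
codes = encoder.fit_transform(sizes)
print(encoder.classes_.tolist())  # ['large', 'medium', 'small']
print(codes.tolist())             # [2, 0, 1, 2]

# inverse_transform maps the integer codes back to the original strings
print(encoder.inverse_transform(codes).tolist())

# OrdinalEncoder: expects 2-D input and lets you fix the category order explicitly
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordered_codes = ordinal.fit_transform([[s] for s in sizes])
print(ordered_codes.ravel().tolist())  # [0.0, 2.0, 1.0, 0.0]
```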

<One-Hot Encoding>

In contrast to label encoding, one-hot encoding does not assume an ordering of the categories. Thus, you can expect this approach to work particularly well if there is no clear ordering in the categorical data.

Categorical variables without an intrinsic ranking are called nominal variables, so one-hot encoding is the method to use for nominal variables.

3. pd.get_dummies( )

: use pd.get_dummies( ) to convert each category into its own 0/1 column, a form the computer can actually read
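A minimal sketch of step 3, assuming a hypothetical DataFrame with a nominal `color` column and a numeric `price` column. `pd.get_dummies` replaces `color` with one 0/1 column per category and leaves `price` untouched; `dtype=int` is passed so the dummies print as 0/1 (recent pandas versions default to booleans).

```python
import pandas as pd

# Hypothetical data: one nominal column and one numeric column
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "price": [10, 12, 9, 11],
})

# One 0/1 column per category of "color"; other columns pass through unchanged
dummies = pd.get_dummies(df, columns=["color"], dtype=int)

print(dummies.columns.tolist())
# ['price', 'color_blue', 'color_green', 'color_red']
print(dummies)
```

Passing `drop_first=True` drops one dummy per encoded column, since that column can always be inferred from the others.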

4. LabelEncoder( ) or LabelBinarizer( )

: one-hot encoding with a scikit-learn encoder

from sklearn.preprocessing import LabelEncoder, LabelBinarizer

The Kaggle tutorial uses LabelEncoder; the article uses LabelBinarizer.
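A minimal sketch of step 4 with `LabelBinarizer`, plus `OneHotEncoder` (the scikit-learn class intended for one-hot encoding input features) for comparison. The `colors` list is hypothetical. `LabelEncoder` on its own only produces integer codes; older scikit-learn versions required integer input for `OneHotEncoder`, which is likely why `LabelEncoder` shows up next to it in older tutorials.

```python
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

colors = ["red", "green", "blue", "green"]

# LabelBinarizer: 1-D input, one 0/1 column per class (classes sorted)
lb = LabelBinarizer()
onehot_lb = lb.fit_transform(colors)
print(lb.classes_.tolist())  # ['blue', 'green', 'red']
print(onehot_lb.tolist())    # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]

# Caution: with exactly two classes, LabelBinarizer returns a single column
print(LabelBinarizer().fit_transform(["yes", "no", "yes"]).shape)  # (3, 1)

# OneHotEncoder: 2-D feature input; the result is sparse, so densify for printing
ohe = OneHotEncoder(handle_unknown="ignore")
onehot_ohe = ohe.fit_transform([[c] for c in colors]).toarray()
print(onehot_ohe.tolist())

# A category unseen during fit is encoded as all zeros instead of raising an error
print(ohe.transform([["purple"]]).toarray().tolist())  # [[0.0, 0.0, 0.0]]
```

The `handle_unknown="ignore"` setting matters when the test set contains categories that never appeared in the training set.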

5. Count/Frequency Encoder

: for high-cardinality variables (many distinct categories), encode each category by replacing it with its count or frequency
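A minimal sketch of step 5 with pandas, assuming a hypothetical `city` column. `value_counts()` gives each category's count and `value_counts(normalize=True)` its share of the rows; `Series.map` then replaces each category with that number, so the column stays a single numeric column no matter how many categories it has.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Seoul", "Busan", "Seoul", "Incheon", "Seoul", "Busan"]})

# Count encoding: how many times each category appears in the column
counts = df["city"].value_counts()
df["city_count"] = df["city"].map(counts)

# Frequency encoding: the same count divided by the number of rows
freqs = df["city"].value_counts(normalize=True)
df["city_freq"] = df["city"].map(freqs)

print(df)
```

Two caveats: categories that happen to share a count receive the same code, and the counts should be computed on the training data and then mapped onto the validation and test data.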

6. Drop the categorical variables

: discard the categorical variables and analyze only the remaining columns

This approach will only work well if the columns did not contain useful information.
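A minimal sketch of step 6, assuming a hypothetical feature table `X`. `select_dtypes(include="number")` keeps only the numeric columns, which drops the text-valued categorical ones. `select_dtypes(exclude=["object"])` is a common alternative, but newer pandas versions may store strings in a dedicated string dtype instead of `object`, so selecting numeric columns is the more robust choice here.

```python
import pandas as pd

# Hypothetical feature table with one text-valued categorical column
X = pd.DataFrame({
    "rooms": [3, 2, 4],
    "area": [84.5, 59.9, 101.2],
    "district": ["Gangnam", "Mapo", "Gangnam"],
})

# Keep only numeric columns, i.e. drop the categorical one
X_numeric = X.select_dtypes(include="number")
print(X_numeric.columns.tolist())  # ['rooms', 'area']
```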

Source: analyticsindiamag.com/complete-guide-to-handling-categorical-data-using-scikit-learn/ ("Complete Guide To Handling Categorical Data Using Scikit-Learn")

Source: www.kaggle.com/alexisbcook/categorical-variables ("Categorical Variables")

Further reading: www.kaggle.com/getting-started/27270 ("How to handle categorical data in scikit with pandas")

Further reading: cromboltz.tistory.com/19 ("[Machine Learning] Random Forest")

Note: one-hot encoding blows up the dimensionality of the feature space, since it adds one column per category.
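A small sketch of the dimensionality problem, using a hypothetical `user_id` column with 1,000 distinct values: one-hot encoding turns that single column into 1,000 dummy columns, whereas count encoding (step 5) keeps it as one column.

```python
import pandas as pd

# Hypothetical high-cardinality column: 1,000 distinct user IDs
df = pd.DataFrame({
    "user_id": [f"user_{i}" for i in range(1000)],
    "clicks": range(1000),
})

onehot = pd.get_dummies(df, columns=["user_id"])
print(df.shape)      # (1000, 2)
print(onehot.shape)  # (1000, 1001): "clicks" + 1,000 dummy columns

# Count encoding replaces the column in place, so the width stays the same
counted = df.assign(user_id=df["user_id"].map(df["user_id"].value_counts()))
print(counted.shape)  # (1000, 2)
```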


꿈 있는 다락방
