Encoding Categorical Data
How to Handle Categorical Variables
<Label encoding>
For tree-based models (like decision trees and random forests),
you can expect label encoding to work well with ordinal variables: use it when the categories have a natural order.
1. Replace using the map function
: map each categorical value to a number
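A minimal sketch of the map approach, assuming a pandas DataFrame. The column name `size` and the category order small < medium < large are made up for illustration.

```python
import pandas as pd

# Hypothetical ordinal column; names and order are assumptions for this example.
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Write the mapping by hand so the numbers follow the intended order.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

print(df)
```

Any value that is missing from the dictionary becomes NaN after `map`, so it is worth checking for unexpected categories first.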
2. Label Encoding
: mapping with LabelEncoder
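The same idea with scikit-learn's LabelEncoder might look like the sketch below. The `size` column is again a made-up example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column, used only for illustration.
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

encoder = LabelEncoder()
df["size_encoded"] = encoder.fit_transform(df["size"])

# classes_ holds the categories in sorted order; the position is the integer code.
print(list(encoder.classes_))
print(df)
```

LabelEncoder assigns codes in sorted (alphabetical) order, so here "large" gets 0 and "small" gets 2. If the real order matters, the hand-written map or OrdinalEncoder with an explicit `categories` list is probably the safer choice.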
<One-hot encoding>
In contrast to label encoding, one-hot encoding does not assume an ordering of the categories. Thus, you can expect this approach to work particularly well if there is no clear ordering in the categorical data.
We refer to categorical variables without an intrinsic ranking as nominal variables: use one-hot encoding for nominal variables.
3. pd.get_dummies( )
: use pd.get_dummies( ) to turn the categories into a form the computer can actually read
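A short sketch of pd.get_dummies( ) on a nominal column. The `color` column and its values are assumptions for the example.

```python
import pandas as pd

# Hypothetical nominal column with no natural order.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One new indicator column per category, named <column>_<category>.
dummies = pd.get_dummies(df, columns=["color"])

print(dummies.columns.tolist())
print(dummies)
```

Depending on the pandas version the indicator columns come back as booleans or as 0/1 integers; both work as model inputs.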
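4. LabelEncoder( ) or LabelBinarizer( )
: one-hot encoding with an encoder
Import either LabelEncoder or LabelBinarizer from sklearn.preprocessing.
The Kaggle tutorial uses LabelEncoder; the article uses LabelBinarizer.
Note that LabelEncoder on its own produces integer codes, while LabelBinarizer produces the one-hot columns directly.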
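A minimal sketch of the LabelBinarizer route, with a made-up list of colors as input.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical nominal values, used only for illustration.
colors = ["red", "green", "blue", "green"]

binarizer = LabelBinarizer()
one_hot = binarizer.fit_transform(colors)

# Columns follow the sorted order stored in classes_.
print(list(binarizer.classes_))
print(one_hot)
```

With three or more categories this gives one column per category. With exactly two categories LabelBinarizer returns a single 0/1 column instead.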
5. Count/Frequency Encoder
: for high-cardinality variables, encode each category by replacing it with its frequency (count)
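Count and frequency encoding can be sketched with plain pandas, as below. The `city` column is a made-up example, and a real high-cardinality column would have far more distinct values.

```python
import pandas as pd

# Hypothetical column; a real high-cardinality feature would have many more values.
df = pd.DataFrame({"city": ["Seoul", "Busan", "Seoul", "Daegu", "Seoul", "Busan"]})

# Count encoding: replace each category with how often it appears.
counts = df["city"].value_counts()
df["city_count"] = df["city"].map(counts)

# Frequency encoding: same idea, but as a share of all rows.
freqs = df["city"].value_counts(normalize=True)
df["city_freq"] = df["city"].map(freqs)

print(df)
```

The result is a single numeric column no matter how many categories there are. One caveat is that two categories with the same count end up with the same code.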
6. Drop the categorical variables
: drop the categorical variables and run the analysis without them
This approach only works well if the columns do not contain useful information.
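Dropping the categorical columns could look like the sketch below, which simply keeps the numeric columns. The DataFrame and its column names are assumptions for the example.

```python
import pandas as pd

# Hypothetical data with one categorical and two numeric columns.
df = pd.DataFrame({
    "color": ["red", "green", "blue"],
    "price": [10.0, 12.5, 9.0],
    "weight": [3, 4, 2],
})

# Keep only numeric columns; every non-numeric column is dropped.
numeric_only = df.select_dtypes(include="number")

print(numeric_only.columns.tolist())
```

This is the quickest baseline, so it can be useful to compare its score against the encoded versions before deciding the columns are not needed.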
Source: Complete Guide To Handling Categorical Data Using Scikit-Learn (analyticsindiamag.com/complete-guide-to-handling-categorical-data-using-scikit-learn/)
Source: Categorical Variables (www.kaggle.com/alexisbcook/categorical-variables)
Additional material: How to handle categorical data in scikit with pandas (www.kaggle.com/getting-started/27270)
[Machine Learning] Random Forest ([기계학습]랜덤 포레스트), cromboltz.tistory.com
Note: one-hot encoding blows up the dimensionality of the feature space when a variable has many categories.
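To make the dimensionality point concrete, here is a small sketch with synthetic data. The `user_id` column and the figure of 500 distinct values are invented for the example.

```python
import pandas as pd

# Synthetic column: 1000 rows with 500 distinct ids (made-up numbers).
df = pd.DataFrame({"user_id": [f"user_{i % 500}" for i in range(1000)]})

# One-hot encoding creates one column per distinct value.
one_hot = pd.get_dummies(df["user_id"])

print(df.shape)       # one original column
print(one_hot.shape)  # one column per distinct id
```

A single column turns into 500 columns here, which is why count/frequency encoding is often preferred for high-cardinality variables.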