unbalanced dataset, anomalies have same distribution as normal data

I am working with a dataset that contains 2 classes (95% and 5%).

The features of these 2 classes have almost the same distribution.

Question: how can I classify these 2 classes, and explain which principle the model uses to classify the test set?


edited Jan 22 at 2:23

thlpswm

553

asked Jan 20 at 15:09

Xuanqi Huang

python data-science anomaly-detection

1 Answer

The per-feature distributions make sense, but you need a more detailed exploratory analysis than looking at each feature's distribution on its own. I suggest looking at some 3D plots. Here are some links about EDA:

https://www.kaggle.com/dejavu23/titanic-eda-to-ml-beginner

https://www.kaggle.com/dejavu23/house-prices-eda-to-ml-beginner
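
To illustrate why per-feature distributions alone can be misleading, here is a minimal sketch on synthetic data (not the asker's dataset). It assumes two made-up features `x` and `y` whose individual distributions are the same in both classes, while only the sign relationship between them differs. The function names `draw` and `summarize` are hypothetical helpers for this illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def draw(n, anomaly):
    """Return n synthetic (x, y) points.

    Each feature on its own is standard normal in both classes;
    only the sign relation between x and y depends on the class.
    """
    points = []
    for _ in range(n):
        x = random.gauss(0, 1)
        magnitude = abs(random.gauss(0, 1))
        y = magnitude if x >= 0 else -magnitude  # y shares the sign of x
        if not anomaly:
            y = -y  # normal points: y has the opposite sign of x
        points.append((x, y))
    return points


def summarize(points):
    """Per-feature statistics plus one joint statistic (mean of x * y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "mean_x": statistics.mean(xs),
        "std_x": statistics.stdev(xs),
        "mean_y": statistics.mean(ys),
        "std_y": statistics.stdev(ys),
        "mean_xy": statistics.mean([x * y for x, y in points]),
    }


normal = draw(9500, anomaly=False)   # 95% class
anomalies = draw(500, anomaly=True)  # 5% class

normal_stats = summarize(normal)
anomaly_stats = summarize(anomalies)

for name in normal_stats:
    print(f"{name}: normal={normal_stats[name]:+.3f} anomaly={anomaly_stats[name]:+.3f}")
```

In this constructed case the means and standard deviations of `x` and `y` are close for both classes, but the joint statistic `mean_xy` has opposite signs, which is the kind of structure a 2D or 3D scatter plot can reveal and a single-feature histogram cannot.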

Regarding classification models, I would suggest using decision-tree-based models, such as Random Forest or Gradient Tree Boosting.
The idea behind a decision tree is to partition the feature space and make the same prediction for every point within a given part. You can plot decision trees using some packages, which helps to understand the principles behind the model. You can read more about all these models in this nice book:

http://www-bcf.usc.edu/~gareth/ISL/
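
As a rough sketch of the idea above, the following fits a single shallow decision tree with scikit-learn and prints its splitting rules as text. Everything here is an assumption for illustration: the data is synthetic (generated with `make_classification` at a 95%/5% split), the feature names are placeholders, and `class_weight="balanced"` is one possible way to handle the imbalance, not something taken from the question. A single tree is used instead of Random Forest or boosting only because its rules are easy to read.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the real data: roughly 95% class 0, 5% class 1.
X, y = make_classification(
    n_samples=2000,
    n_features=4,
    n_informative=2,
    n_redundant=0,
    weights=[0.95, 0.05],
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

# Stratify so that the rare class appears in both train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A shallow tree keeps the rule set small enough to read.
# class_weight="balanced" reweights samples inversely to class frequency.
clf = DecisionTreeClassifier(max_depth=3, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)

# Text rendering of the partition of the feature space.
rules = export_text(clf, feature_names=feature_names)
print(rules)

# Recall on the rare class is more informative than accuracy at a 95/5 split.
minority_recall = recall_score(y_test, clf.predict(X_test))
print(f"recall on the 5% class: {minority_recall:.2f}")
```

Each printed branch is a threshold on one feature, so the output can be read as the list of conditions the model uses to assign a test point to a class.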

Links to packages:

https://lightgbm.readthedocs.io/en/latest/

https://scikit-learn.org/stable/modules/tree.html

https://scikit-learn.org/stable/modules/ensemble.html

You can read about decision tree visualization:

https://medium.com/@rnbrown/creating-and-visualizing-decision-trees-with-python-f8e8fa394176

https://www.kaggle.com/willkoehrsen/visualize-a-decision-tree-w-python-scikit-learn

answered Jan 22 at 6:38

Razmik Melikbekyan

112
