Selecting top n groups with dplyr then plotting other variables
I have a dataset where I am trying to select just the top n groups by counting one category, and then plot using other variables in the dataset. Basically, the top n comes from one level of aggregation, but I need to go back to the full data to plot in ggplot.
In the problem below, I want the two most common examName values, and then to plot them with facet_wrap by count per year.
library(dplyr)
library(tibble)
library(ggplot2)

ap <- tribble(
  ~year, ~examName,
  2014, "Statistics",
  2015, "Statistics",
  2016, "Statistics",
  2016, "Statistics",
  2016, "Statistics",
  2016, "Statistics",
  2017, "Statistics",
  2017, "Statistics",
  2017, "Statistics",
  2017, "Statistics",
  2017, "Statistics",
  2013, "Macroeconomics",
  2013, "Macroeconomics",
  2014, "Macroeconomics",
  2015, "Macroeconomics",
  2016, "Macroeconomics",
  2016, "Macroeconomics",
  2016, "Macroeconomics",
  2016, "Macroeconomics",
  2016, "Macroeconomics",
  2017, "Macroeconomics",
  2017, "Macroeconomics",
  2017, "Macroeconomics",
  2017, "Macroeconomics",
  2017, "Macroeconomics",
  2017, "Macroeconomics",
  2013, "Calculus",
  2014, "Calculus",
  2015, "Calculus",
  2016, "Calculus",
  2017, "Calculus",
  2017, "Psychology",
  2017, "Psychology",
  2017, "Psychology",
  2017, "Psychology",
  2017, "Psychology",
  2018, "Psychology",
  2018, "Psychology"
)

# Find the two most common exams, then join back to the full data
# so that only rows for those exams remain.
ap_top <- ap %>%
  count(examName, sort = TRUE) %>%
  head(2) %>%
  inner_join(ap, by = "examName") %>%
  select(-n)

# Count rows per exam and year, then draw one facet per exam.
ap_top %>%
  count(examName, year) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)
My thought is to get my top n, then inner_join back onto the original dataset and plot using that, essentially using the inner join as a filter.
I know there's a better way to do this, and I would love a more elegant solution! I'm all ears! Example dataset given (sorry it's so long).
r ggplot2 dplyr
asked Jan 18 at 20:09 by talbe009; edited Jan 18 at 20:18 by talbe009
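The "inner join as a filter" idea in the question corresponds to what dplyr calls a filtering join. The sketch below is not from the thread: it uses semi_join(), which keeps the rows of the first table that have a match in the second without adding any columns, so the select(-n) step is no longer needed. The ap data is rebuilt compactly with rep() to keep the example self-contained, and the name top2 is only illustrative.

```r
library(dplyr)

# Compact reconstruction of the question's `ap` data (38 rows).
ap <- tibble(
  year = c(
    2014, 2015, rep(2016, 4), rep(2017, 5),                # Statistics (11 rows)
    rep(2013, 2), 2014, 2015, rep(2016, 5), rep(2017, 6),  # Macroeconomics (15 rows)
    2013:2017,                                             # Calculus (5 rows)
    rep(2017, 5), rep(2018, 2)                             # Psychology (7 rows)
  ),
  examName = rep(
    c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
    times = c(11, 15, 5, 7)
  )
)

# The two most common exams (illustrative name `top2`).
top2 <- ap %>%
  count(examName, sort = TRUE) %>%
  head(2)

# Filtering join: keep only rows of `ap` whose examName appears in `top2`.
# No columns from `top2` are added, so the count column never needs dropping.
ap_top <- semi_join(ap, top2, by = "examName")
```

The result has the same columns as ap and can be passed to count(examName, year) and ggplot() exactly as in the question.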
2 Answers
You don't need inner_join().
I would just determine the top two exams in a separate statement and then filter on those.

# Names of the two exams with the most rows
top_exams <- count(ap, examName) %>%
  top_n(2, n) %>%
  pull(examName)

# Keep only those exams, count per year, and plot one facet per exam
ap %>%
  filter(examName %in% top_exams) %>%
  count(year, examName) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)
answered Jan 18 at 20:21 by dylanjm
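In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). The sketch below is an adaptation rather than part of the answer: it follows the same "find the top exams, then filter" approach, assuming a recent dplyr. The column name n_rows is an illustrative choice that avoids confusion with the n argument of slice_max(), and the ap data is rebuilt compactly so the block runs on its own.

```r
library(dplyr)

# Compact reconstruction of the question's `ap` data (38 rows).
ap <- tibble(
  year = c(
    2014, 2015, rep(2016, 4), rep(2017, 5),                # Statistics
    rep(2013, 2), 2014, 2015, rep(2016, 5), rep(2017, 6),  # Macroeconomics
    2013:2017,                                             # Calculus
    rep(2017, 5), rep(2018, 2)                             # Psychology
  ),
  examName = rep(
    c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
    times = c(11, 15, 5, 7)
  )
)

# Names of the two exams with the most rows.
# with_ties defaults to TRUE, so tied exams are all kept, as with top_n().
top_exams <- ap %>%
  count(examName, name = "n_rows") %>%
  slice_max(n_rows, n = 2) %>%
  pull(examName)

# Per-year counts for the selected exams, ready for ggplot().
plot_data <- ap %>%
  filter(examName %in% top_exams) %>%
  count(year, examName)
```

Setting with_ties = FALSE in slice_max() would return exactly two exams even when counts are tied.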
Another possibility:

ap %>%
  group_by(examName) %>%
  mutate(temp = n()) %>%                      # rows per exam
  ungroup() %>%
  mutate(temp = dense_rank(desc(temp))) %>%   # rank exams by that count
  filter(temp %in% c(1, 2)) %>%               # keep ranks 1 and 2
  select(-temp) %>%
  count(year, examName) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)

This counts the rows per examName and ranks the counts with dense_rank(). It then keeps the rows whose exam has the greatest or second-greatest count.
answered Jan 18 at 20:43 by tmfmnk
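To make the ranking step concrete, here is a small sketch, not taken from the answer, of what dense_rank(desc(...)) returns for the per-exam counts, followed by a shorter variant of the same pipeline. The variant swaps group_by() plus mutate(n()) for add_count(); the column name exam_n and the object name ap_top are illustrative.

```r
library(dplyr)

# Per-exam row counts from the question's data:
# Macroeconomics 15, Statistics 11, Calculus 5, Psychology 7.
exam_counts <- c(15, 11, 5, 7)
dense_rank(desc(exam_counts))       # ranks: 1 2 4 3

# Tied counts share a rank and no rank is skipped, so filtering on
# ranks 1 and 2 can keep more than two groups.
dense_rank(desc(c(15, 11, 11, 7)))  # ranks: 1 2 2 3

# Compact reconstruction of the question's `ap` data (38 rows).
ap <- tibble(
  year = c(
    2014, 2015, rep(2016, 4), rep(2017, 5),                # Statistics
    rep(2013, 2), 2014, 2015, rep(2016, 5), rep(2017, 6),  # Macroeconomics
    2013:2017,                                             # Calculus
    rep(2017, 5), rep(2018, 2)                             # Psychology
  ),
  examName = rep(
    c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
    times = c(11, 15, 5, 7)
  )
)

# Same idea as the answer: attach the per-exam count to every row,
# then keep rows whose exam ranks first or second by that count.
ap_top <- ap %>%
  add_count(examName, name = "exam_n") %>%
  filter(dense_rank(desc(exam_n)) <= 2) %>%
  select(-exam_n)
```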
What's nice about this solution is that you could do things with the dense_rank, like use it in fct_reorder for sorting in the plot. – talbe009 Jan 18 at 21:01
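One way the idea in this comment could look is sketched below. It is an assumption about what the commenter had in mind, not code from the thread. The rank is kept in a column (illustratively named exam_rank) and passed to forcats::fct_reorder() so that the facets follow rank order rather than alphabetical order. The top three exams are kept here, instead of two, only so that the reordering is visible.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Compact reconstruction of the question's `ap` data (38 rows).
ap <- tibble(
  year = c(
    2014, 2015, rep(2016, 4), rep(2017, 5),                # Statistics
    rep(2013, 2), 2014, 2015, rep(2016, 5), rep(2017, 6),  # Macroeconomics
    2013:2017,                                             # Calculus
    rep(2017, 5), rep(2018, 2)                             # Psychology
  ),
  examName = rep(
    c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
    times = c(11, 15, 5, 7)
  )
)

plot_data <- ap %>%
  add_count(examName, name = "exam_n") %>%
  mutate(exam_rank = dense_rank(desc(exam_n))) %>%
  filter(exam_rank <= 3) %>%
  # Order the factor levels by rank, so the most common exam comes first.
  mutate(examName = fct_reorder(examName, exam_rank)) %>%
  count(year, examName)

# facet_wrap() lays out panels in factor-level order.
p <- ggplot(plot_data, aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)
```

Without the fct_reorder() step, the panels would appear alphabetically, with Psychology before Statistics.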