Selecting top n groups with dplyr then plotting other variables

I have a dataset where I am trying to select just the top n by counting one category, but then plotting using other variables in the dataset--basically one level of aggregation for the top n, but needing to go back to the full data to plot in ggplot.

So in the problem below, I want the two most common examNames, and then to plot them with facet_wrap, showing the count per year.

library(dplyr)
library(tibble)
library(ggplot2)

ap <-
  tribble(
    ~year, ~examName,
    2014, "Statistics",
    2015, "Statistics",
    2016, "Statistics",
    2016, "Statistics",
    2016, "Statistics",
    2016, "Statistics",
    2017, "Statistics",
    2017, "Statistics",
    2017, "Statistics",
    2017, "Statistics",
    2017, "Statistics",
    2013, "Macroeconomics",
    2013, "Macroeconomics",
    2014, "Macroeconomics",
    2015, "Macroeconomics",
    2016, "Macroeconomics",
    2016, "Macroeconomics",
    2016, "Macroeconomics",
    2016, "Macroeconomics",
    2016, "Macroeconomics",
    2017, "Macroeconomics",
    2017, "Macroeconomics",
    2017, "Macroeconomics",
    2017, "Macroeconomics",
    2017, "Macroeconomics",
    2017, "Macroeconomics",
    2013, "Calculus",
    2014, "Calculus",
    2015, "Calculus",
    2016, "Calculus",
    2017, "Calculus",
    2017, "Psychology",
    2017, "Psychology",
    2017, "Psychology",
    2017, "Psychology",
    2017, "Psychology",
    2018, "Psychology",
    2018, "Psychology")

# Count rows per exam, keep the two largest, then join back to the full data
ap_top <- ap %>%
  count(examName, sort = TRUE) %>%
  head(2) %>%
  inner_join(ap, by = "examName") %>%
  select(-n)

# Count per exam and year, then draw one panel per exam
ap_top %>%
  count(examName, year) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)

My thought is to get my top n, then inner_join back on the original dataset. Then plot using that; essentially using the inner join as a filter.

I know there's a better way to do this, and I would love a more elegant solution! I'm all ears! Example dataset given (sorry it's so long).

edited Jan 18 at 20:18

asked Jan 18 at 20:09

talbe009

r ggplot2 dplyr


2 Answers

You don't need inner_join(). I would just determine the top two exams in a separate statement and then filter on those.

top_exams <- count(ap, examName) %>%
  top_n(2, n) %>%
  pull(examName)

ap %>%
  filter(examName %in% top_exams) %>%
  count(year, examName) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)
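
As a possible variant of the same idea, here is a minimal sketch that keeps the "join as a filter" approach from the question but uses semi_join(), which is dplyr's filtering join, together with slice_max(), which supersedes top_n() in dplyr 1.0.0 and later. The data frame `ap_small` and its values are made up for illustration and are not the question's data.

```r
library(dplyr)
library(tibble)

# Small stand-in for the question's `ap` data (hypothetical values).
ap_small <- tribble(
  ~year, ~examName,
  2016, "Statistics",
  2017, "Statistics",
  2017, "Statistics",
  2016, "Macroeconomics",
  2017, "Macroeconomics",
  2017, "Macroeconomics",
  2017, "Macroeconomics",
  2017, "Calculus"
)

# Top two exams by row count; with_ties = FALSE returns exactly two rows.
top_exams <- ap_small %>%
  count(examName) %>%
  slice_max(n, n = 2, with_ties = FALSE)

# semi_join() keeps the rows of ap_small that have a match in top_exams
# and adds no columns from top_exams, so no select(-n) is needed afterwards.
ap_top <- ap_small %>%
  semi_join(top_exams, by = "examName")
```

The result can be piped into the same count(year, examName) and ggplot() calls as above.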

answered Jan 18 at 20:21

dylanjm


Another possibility:

ap %>%
  group_by(examName) %>%
  mutate(temp = n()) %>%
  ungroup() %>%
  mutate(temp = dense_rank(desc(temp))) %>%
  filter(temp %in% c(1, 2)) %>%
  select(-temp) %>%
  count(year, examName) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)

It counts the cases per "examName" and ranks the counts. Then it keeps the cases whose exam has the greatest or the second-greatest count.
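
One consequence of ranking the counts this way is worth seeing on a small example: dense_rank() gives tied counts the same rank, so filtering on ranks 1 and 2 can keep more than two groups. The counts below are hypothetical (the question's data has no ties), so this is only a sketch of the behaviour.

```r
library(dplyr)
library(tibble)

# Hypothetical per-exam counts with a tie for second place.
counts <- tibble(
  examName = c("A", "B", "C", "D"),
  n = c(10, 7, 7, 3)
)

# Tied counts share a rank, and dense_rank() leaves no gaps between ranks.
ranked <- counts %>%
  mutate(rank = dense_rank(desc(n)))

# Keeping ranks 1 and 2 retains A, B and C: three groups, not two.
kept <- ranked %>%
  filter(rank %in% c(1, 2))
```

If exactly two groups are required, the ties would need to be broken some other way, for example with row_number() instead of dense_rank().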

answered Jan 18 at 20:43

tmfmnk


What's nice about this solution is that you could do things with the dense_rank, like use it in fct_reorder for sorting in the plot.

– talbe009
Jan 18 at 21:01
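
A rough sketch of what that could look like, assuming the rank column is kept rather than dropped and that the forcats package is available. The data and the column names `group_n` and `rank` are invented for illustration.

```r
library(dplyr)
library(tibble)
library(forcats)

# Hypothetical rows; add_count() attaches each exam's total row count.
ranked <- tibble(
  examName = c("Stats", "Stats", "Macro", "Macro", "Macro", "Calc"),
  year = c(2016, 2017, 2016, 2017, 2017, 2017)
) %>%
  add_count(examName, name = "group_n") %>%
  mutate(rank = dense_rank(desc(group_n)))

# Reorder the factor levels by rank so the most common exam comes first.
# The rank is constant within an exam, so the default summary (median)
# simply returns that rank.
ranked <- ranked %>%
  mutate(examName = fct_reorder(examName, rank))

levels(ranked$examName)
```

Because facet_wrap() lays out panels in factor-level order, plotting `ranked` with facet_wrap(~ examName) would then show the panels from most to least common exam.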


