How to aggregate categorical data in R?

I have a dataframe which consists of two columns with categorical variables (Better, Similar, Worse). I would like to come up with a table which counts the number of times that these categories appear in the two columns.
The dataframe I am using is as follows:

      Category.x Category.y
    1     Better     Better
    2     Better     Better
    3    Similar    Similar
    4      Worse    Similar

I would like to come up with a table like this:

            Category.x Category.y
    Better           2          2
    Similar          1          2
    Worse            1          0

How would you go about it?

Tags: r, aggregate

asked 5 hours ago by Daniel

  • Looks like you need table(df1)
    – akrun, 5 hours ago

  • Is it possible to reformat the table, so that I get it as a 3x2 table instead of a 3x3?
    – Daniel, 5 hours ago

  • I would convert to factor with common levels lvls <- unique(unlist(df1)); df1[] <- lapply(df1, factor, levels = lvls) and then do the table(df1)
    – akrun, 4 hours ago
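
The two comments by akrun can be combined into a short sketch. The data frame below is an assumed reconstruction of the question's data with character columns, and the final sapply(df1, table) step is an addition that is not stated in the comments. It is one possible way to reach the 3x2 layout Daniel asks about, since table(df1) on its own cross-tabulates the two columns against each other and returns a 3x3 table.

```r
# Assumed reconstruction of the question's data (character columns)
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Give both columns the same factor levels, as suggested in the comment
lvls <- unique(unlist(df1))
df1[] <- lapply(df1, factor, levels = lvls)

# table(df1) cross-tabulates Category.x against Category.y (3x3)
cross_tab <- table(df1)

# Not from the comments: tabulating each column separately gives
# one count column per variable (3x2), including zero counts
counts <- sapply(df1, table)
print(counts)
```

Because both columns share the same levels, a category that never occurs in one column (here "Worse" in Category.y) still gets a row with a count of 0.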
3 Answers
As mentioned in the comments, table is standard for this (DT below is the data frame from the question), like

    table(stack(DT))

             ind
    values    Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0

or

    table(value = unlist(DT), cat = names(DT)[col(DT)])

             cat
    value     Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0

or

    with(reshape(DT, direction = "long", varying = 1:2),
      table(value = Category, cat = time)
    )

             cat
    value     x y
      Better  2 2
      Similar 1 2
      Worse   1 0

– Frank, answered 4 hours ago
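
For a self-contained run of the first variant, a minimal sketch follows. It assumes DT is the question's data frame with character columns (stack() ignores non-vector columns such as factors, so those would need converting first). The as.data.frame.matrix() step is an optional addition, not part of the answer, for readers who want a plain data frame rather than a table object.

```r
# Assumed reconstruction of the question's data (character columns)
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# stack() turns the two columns into long form:
# 'values' holds the category, 'ind' holds the source column name
long <- stack(DT)

# Cross-tabulate category by source column
tab <- table(long)
print(tab)

# Optional: convert the table object to an ordinary 3x2 data frame
counts_df <- as.data.frame.matrix(tab)
print(counts_df)
```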
    # For each column, count how often each distinct value
    # (pooled over all columns) occurs in that column
    sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
    #         Category.x Category.y
    # Better           2          2
    # Similar          1          2
    # Worse            1          0

– d.b, answered 4 hours ago

One dplyr and tidyr possibility could be:

    df %>%
      gather(var, val) %>%
      count(var, val) %>%
      spread(var, n, fill = 0)

      val     Category.x Category.y
      <chr>        <dbl>      <dbl>
    1 Better           2          2
    2 Similar          1          2
    3 Worse            1          0

First, it transforms the data from wide to long format, with column "var" holding the variable names and column "val" the corresponding values. Second, it counts the rows per combination of "var" and "val". Finally, it spreads the counts back into the desired wide format.
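
The same three steps can also be sketched with the newer tidyr verbs, since gather() and spread() are superseded by pivot_longer() and pivot_wider() as of tidyr 1.0.0. This is an illustration rather than part of the answer: it assumes tidyr 1.1.0 or later (for a scalar values_fill) and that df is the question's data frame with character columns, reconstructed here by hand. The intermediate names long, counts and wide are hypothetical.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the question's data (character columns)
df <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Step 1: wide to long, one row per (variable name, value) pair
long <- pivot_longer(df, cols = everything(),
                     names_to = "var", values_to = "val")

# Step 2: count rows per combination of var and val
counts <- count(long, var, val)

# Step 3: long back to wide, filling absent combinations with 0
wide <- pivot_wider(counts, names_from = var, values_from = n,
                    values_fill = 0)
print(wide)
```

Keeping the intermediate objects separate makes it easy to inspect what each step produces before chaining them with %>%.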



Or with dplyr and reshape2 you can do:

    df %>%
      mutate(rowid = row_number()) %>%
      melt(., id.vars = "rowid") %>%
      count(variable, value) %>%
      dcast(value ~ variable, value.var = "n", fill = 0)

        value Category.x Category.y
    1  Better          2          2
    2 Similar          1          2
    3   Worse          1          0

– tmfmnk, answered 4 hours ago, edited 3 hours ago

  • Is var = Category.x and val= c('Better', 'Similar', 'Worse')?
    – Daniel, 4 hours ago

  • Please see the updated post for commentary.
    – tmfmnk, 4 hours ago