Regression vs Random Forest - Combination of features

I had a discussion with a friend about the advantages of random forest over linear regression.

At some point, my friend said that one advantage of random forest over linear regression is that it automatically takes combinations of features into account.

By this he meant that if I have a model with

  • Y as the target
  • X, W, Z as the predictors

then the random forest also tests combinations of the features (e.g. X+W), whereas in linear regression you have to build these manually and insert them into the model.

I am quite confused. Is this true?

Also, if it is true, does it hold for any kind of combination of features (e.g. X*W, X+W+Z, etc.) or only for some specific ones (e.g. X+W)?

feature-selection random-forest feature-engineering














asked 8 hours ago by Poete Maudit (edited 40 mins ago)




















2 Answers


















I would say it is partly true. Random forests, which are made up of decision trees, do perform feature selection, but they do not perform feature engineering (feature selection is different from feature engineering). Decision trees use a metric called information gain (the entropy of the parent node minus the weighted entropy of its child nodes) to separate useful features from bad ones. Put simply, at each iteration the feature with the highest information gain, i.e. the one that reduces the entropy (a.k.a. randomness) the most, is chosen as the node on which the tree is split. So if your data is text, trees are split on words. If your data is real-valued numbers, the tree is split on those values. Hope it helps.

For more details check this
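
To make the information-gain criterion described above concrete, here is a minimal sketch in Python. The toy arrays `x`, `w`, `y` and the threshold of 5.0 are made up for illustration, and the helper names are hypothetical. Real implementations also search over thresholds and may use other criteria (Gini impurity, or variance reduction for regression).

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels, threshold):
    """Parent entropy minus the size-weighted entropy of the two children."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that leaves one side empty gains nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical toy data: x separates the two classes cleanly, w barely does.
y = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
w = np.array([1.0, 9.0, 2.0, 8.0, 3.0, 7.0])

gain_x = information_gain(x, y, threshold=5.0)
gain_w = information_gain(w, y, threshold=5.0)
print(f"gain from splitting on x: {gain_x:.3f}")
print(f"gain from splitting on w: {gain_w:.3f}")
```

A tree would split on `x` here because it yields the larger gain. Note that each split looks at one original feature at a time; no new feature such as X*W is constructed.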

















answered 7 hours ago by karthikeyan mg
























I think it is true. Tree-based algorithms, especially those with multiple trees, are capable of capturing different feature interactions. Please see this article from the official xgboost documentation and this discussion. You could say it's a perk of being a non-parametric model (trees are non-parametric and linear regression is not). I hope this sheds some light on the question.
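
One way to check the interaction claim above is a small experiment, sketched here with scikit-learn on synthetic data. The data-generating process (Y = X*W, with Z as an irrelevant predictor), the sample size, and the helper `add_product` are all assumptions made for illustration. A sum such as X+W is already representable by a plain linear model, so the sketch uses the product X*W instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
X = rng.uniform(-1, 1, size=(n, 3))  # columns play the roles of X, W, Z
y = X[:, 0] * X[:, 1]                # target depends only on the product X*W

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Both models see only the raw predictors X, W, Z.
lin = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
r2_lin = lin.score(X_test, y_test)
r2_rf = rf.score(X_test, y_test)

def add_product(A):
    """Append the hand-built interaction column X*W (manual feature engineering)."""
    return np.column_stack([A, A[:, 0] * A[:, 1]])

# Linear regression again, this time with the interaction term added by hand.
lin_eng = LinearRegression().fit(add_product(X_train), y_train)
r2_lin_eng = lin_eng.score(add_product(X_test), y_test)

print(f"linear, raw features:        R^2 = {r2_lin:.3f}")
print(f"random forest, raw features: R^2 = {r2_rf:.3f}")
print(f"linear, with X*W added:      R^2 = {r2_lin_eng:.3f}")
```

On this kind of data the plain linear model should explain almost none of the variance, while the forest approximates the product through successive splits on X and W, and the linear model catches up once the X*W column is supplied manually. The forest's fit is a piecewise-constant approximation, so it is not exact.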















answered 4 hours ago by tam (edited 4 hours ago)











• (+1) As an example, Tree 1 works with features (A, B) and gives 80% accuracy, while Tree 2 works with features (C, D) and gives 60%. A boosting algorithm puts more weight on Tree 1, and thus effectively favors f(A, B) over g(C, D).
  – Esmailian, 3 hours ago
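
The comment does not say which boosting algorithm it has in mind. As a rough illustration only, assuming classic binary AdaBoost, where a learner with weighted error err gets the weight 0.5 * ln((1 - err) / err), the two trees would be weighted as sketched below. Treating the quoted accuracies as AdaBoost's weighted error rates is a simplification, and `adaboost_weight` is a hypothetical helper name.

```python
import math

def adaboost_weight(error_rate):
    """Learner weight in classic binary AdaBoost: 0.5 * ln((1 - err) / err)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)

alpha_tree1 = adaboost_weight(0.20)  # Tree 1 on (A, B): 80% accuracy
alpha_tree2 = adaboost_weight(0.40)  # Tree 2 on (C, D): 60% accuracy

print(f"weight of Tree 1: {alpha_tree1:.3f}")
print(f"weight of Tree 2: {alpha_tree2:.3f}")
```

Under this assumption the more accurate tree receives the larger vote in the final ensemble, which is the sense in which f(A, B) is favored over g(C, D).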


































