Correlation Analysis

Introduction to Correlation, Regression v/s Correlation

The measures of central tendency and the measures of dispersion deal with only one variable at a time (univariate data). For example, we may find the mean height of the students of a class or the standard deviation of their heights. In both cases, a single variable, height, is involved. However, it is often necessary to deal with more than one variable simultaneously. For example, we may wish to find the relationship between the age of a child and his/her height. In such cases, two other statistical tools, namely correlation and regression, are used.

In this lesson, we will study correlation analysis and regression analysis in detail.

CORRELATION ANALYSIS

Types of Data (on the basis of the number of variables)

Univariate Data: One variable is involved
Bivariate Data: Two variables are involved
Multivariate Data: Multiple variables are involved

Meaning of Correlation

Carefully observe your surroundings. You will notice many pairs of variables in which one variable is related to the other. Take, for example, the amount of rainfall and crop yield: the crop yield is directly related to the amount of rainfall. Similar relationships can be found between many other variables, such as the price of a commodity and its supply, or the number of vehicles and the pollution level. The relationship between two variables is studied with the help of a statistical tool called correlation. It studies the degree and intensity of the relationship between the two variables.

Definitions of correlation given by different mathematicians are as follows:

As per Croxton and Cowden, "When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation."

As per Boddington, "Whenever some definite connection exists between two or more groups, classes or series of data, there is said to be a correlation."

As per A.M. Tuttle, "An analysis of the relationship of two or more variables is usually called correlation."

As per Connor, "Correlation analysis is the statistical tool that can be used to determine the degree to which one variable is related to the other."

Significance of Correlation

The study of correlation is important for understanding many practical problems.

i. Formation of laws: In economics, correlation analysis forms the basis for various theories and laws, such as the law of demand, the law of supply and the concept of elasticity. For example, the law of demand is based on the relationship between the price of a commodity and its quantity demanded.

ii. Degree and direction: Correlation helps in measuring the degree and direction of the relationship between two variables. For example, besides establishing that the demand for a commodity and its price are related, it also helps in estimating the extent to which the two are related and in which direction.

iii. Base for regression analysis: Correlation serves as the base for regression analysis. Once it is established that two variables are correlated, the value of one variable for a given value of the other can be estimated using regression analysis.

iv. Business decisions and planning: Correlation analysis is helpful in taking important business decisions. For example, by looking at how an increase in production has led to an increase in profitability, future production plans can be made more easily.

v. Helps in policy formation by the government: As in business, correlation also helps the government in framing plans and policies.
For example, policies regarding poverty alleviation can be framed on the basis of a correlation between expenditure on poverty alleviation programs and percentage poverty reduction.

Bivariate Data

We know that statistical measures such as central tendency, dispersion, etc. relate to only one variable. Such distributions that relate to only one variable are known as univariate distributions. On the other hand, other statistical measures, namely correlation and regression, deal with two variables simultaneously. Such data that relates to two variables is known as bivariate data, and the corresponding distributions are known as bivariate frequency distributions or two-way frequency distributions. To understand the bivariate distribution, consider the example given below.

                              Variable Y
Variable X    5 - 10   10 - 15   15 - 20   20 - 25   25 - 30   Total
5 - 10           1        1         2         1         1         6
10 - 15          1        4         5         6         1        17
15 - 20          1        2         4         2         5        14
20 - 25          0        1         3         5         3        12
25 - 30          0        0         1         0         0         1
Total            3        8        15        14        10        50

Marginal Distribution and Conditional Distribution

From a bivariate distribution, two distributions can be derived. They are as follows:

i. Marginal distribution
ii. Conditional distribution

Marginal Distribution: Marginal distribution is the frequency distribution of each of the variables individually along with the frequency totals/marginal totals.

Conditional Distribution: Under conditional distribution, the frequency values of one variable are obtained when the values of the other variable are given.

REGRESSION ANALYSIS

Correlation analysis helps in identifying the presence and the degree of relationship between two variables. However, correlation analysis fails to identify whether a cause and effect relationship exists between the concerned variables.
For example, while correlation analysis helps in identifying a strong relationship between the price of a commodity and its quantity demanded, it does not identify the cause and effect relationship between the two: through correlation alone, we cannot estimate how a change in the price of the commodity affects its demand. For such estimation, we make use of regression analysis.

Meaning of Regression Analysis

Regression analysis is a statistical tool that helps in estimating the cause and effect relationship between two variables. In other words, it helps in estimating the value of one variable when the values of the other related variables are given. Regression analysis can be done for two or more variables. The analysis that involves only two variables is known as simple regression. Here, we will focus on simple regression analysis; regression analysis involving more than two variables is dealt with in higher studies.

Simple Regression Analysis

As discussed above, simple regression analysis involves two variables. One of these is the dependent variable, i.e. it is influenced by changes in the other variable, which is known as the independent variable. For example, if there are two variables, X and Y, and Y is influenced by changes in the values of X, then Y is the dependent variable and X is the independent variable.

Regression Lines

The regression line is also known as the line of best fit. It is used to estimate the value of the dependent variable for given values of the independent variable. A regression line is obtained by the method of ordinary least squares. In the case of two variables, X and Y, there are two regression lines, which are as follows.

i. Regression line of Y on X: The regression line of Y on X is the line used to estimate the value of Y given the value of X. That is, this line gives the best estimate of Y for a given value of X.

ii. Regression line of X on Y: The regression line of X on Y is the line used to estimate the value of X given the value of Y. That is, this line gives the best estimate of X for a given value of Y.

If the coefficient of correlation (r) is 1 or -1, then the two regression lines become identical (i.e. they coincide).

If the coefficient of correlation is 0, then the two regression lines are perpendicular to each other.
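The two statements about r can be checked numerically. The sketch below is only an illustration: the data are hypothetical, and the slopes are computed with the standard least-squares formulas (sum of cross-deviations divided by the sum of squared deviations), which are assumed here rather than taken from the lesson text.

```python
from math import sqrt, isclose

def mean(values):
    return sum(values) / len(values)

def regression_summary(xs, ys):
    """Return (r, b_yx, b_xy) for paired observations xs, ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    r = s_xy / sqrt(s_xx * s_yy)   # coefficient of correlation
    b_yx = s_xy / s_xx             # slope of the line of Y on X (assumed formula)
    b_xy = s_xy / s_yy             # slope of the line of X on Y (assumed formula)
    return r, b_yx, b_xy

xs = [1, 2, 3, 4, 5]

# Case 1: hypothetical data lying exactly on Y = 2X + 1, so r = 1.
r1, b_yx1, b_xy1 = regression_summary(xs, [3, 5, 7, 9, 11])
# Plotted with X on the horizontal axis, the line of Y on X has slope b_yx
# and the line of X on Y has slope 1 / b_xy. Both lines pass through the
# point of means, so equal slopes mean the two lines coincide.
lines_coincide = isclose(b_yx1, 1 / b_xy1)

# Case 2: hypothetical data with zero correlation.
r2, b_yx2, b_xy2 = regression_summary(xs, [2, 1, 4, 1, 2])
# b_yx = 0 gives the horizontal line Y = mean(Y); b_xy = 0 gives the
# vertical line X = mean(X). These two lines are perpendicular.
lines_perpendicular = (b_yx2 == 0 and b_xy2 == 0)

print(r1, b_yx1, b_xy1, lines_coincide)
print(r2, b_yx2, b_xy2, lines_perpendicular)
```

In both cases the product of the two slopes equals the square of r, which is one way to see why the lines draw closer together as the correlation becomes stronger.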

Regression Equations (also known as Estimating Equations)

The algebraic forms of the regression lines are called regression equations. Corresponding to the two regression lines, there are two regression equations.

i. Regression equation of Y on X

$Y = a + bX$ or $Y - \bar{Y} = b_{yx}(X - \bar{X})$

Where,
a and b: Constants
$b_{yx}$: Regression coefficient, or slope, of Y on X
$\bar{X}$ and $\bar{Y}$: Means of X and Y
Y: Dependent variable
X: Independent variable

ii. Regression equation of X on Y

$X = a' + b'Y$ or $X - \bar{X} = b_{xy}(Y - \bar{Y})$

Where,
a' and b': Constants
$b_{xy}$: Regression coefficient, or slope, of X on Y
$\bar{X}$ and $\bar{Y}$: Means of X and Y
X: Dependent variable
Y: Independent variable

Method of Obtaining Regression Lines - Ordinary Least Squares (OLS) Method

The regression lines are determined using the method of ordinary least squares. Under this method, the regression lines are drawn in such a way that the sum of the squares of the deviations of the observed (actual) values from the estimated values is minimum. In other words, the regression line of Y on X is derived by minimizing the sum of the squared vertical distances of the points from the line in the scatter diagram. Similarly, the regression line X on ...
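The least-squares criterion for the line of Y on X can be sketched as follows. This is a minimal illustration under stated assumptions: the data are hypothetical, the function names `fit_y_on_x` and `sum_squared_errors` are invented for this example, and the closed-form formulas for a and b are the standard least-squares results rather than something derived in the text above.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for the line Y = a + bX fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # regression coefficient of Y on X
    a = y_bar - b * x_bar    # the fitted line passes through the means
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_y_on_x(xs, ys)
sse_best = sum_squared_errors(xs, ys, a, b)

# Any other line gives a larger sum of squared vertical deviations.
sse_shifted = sum_squared_errors(xs, ys, a + 0.5, b)
sse_tilted = sum_squared_errors(xs, ys, a, b + 0.2)

print(a, b, sse_best)
print(sse_shifted > sse_best, sse_tilted > sse_best)
```

Comparing the fitted line with a shifted line and a tilted line shows the sense in which the fitted line is "best": moving the intercept or the slope away from the fitted values increases the sum of squared deviations.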
