What are the benefits of correlation analysis?


The correlation describes the quantitative degree of dependency (primarily the linear relationship) between different features (usually two). In the case of a positive (negative) correlation between two features, an increase in the first feature is accompanied by an increase (a decrease) in the second. The correlation of the observed value pairs is measured by the correlation coefficient.
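
The correlation coefficient described above can be computed directly from paired observations. The following sketch implements the standard PEARSON formula from scratch (the sample values are made up for illustration):

```python
def pearson_r(xs, ys):
    """PEARSON correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Pairs moving in the same direction give r near +1,
# pairs moving in opposite directions give r near -1.
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 9))   # 1.0
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 9))   # -1.0
```
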

A statistical measure showing the dependency between different variables, for example the dependence of an individual share price on the overall market's price development. For stocks it is used to assess future prospects, for example in connection with futures; in technical stock analysis it enters into the calculation of the beta factor.
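
The beta factor mentioned above relates a stock's returns to the market's returns via their covariance. A minimal sketch, using made-up return series and illustrative names:

```python
def beta_factor(stock_returns, market_returns):
    """Beta = Cov(stock, market) / Var(market), from sample moments."""
    n = len(market_returns)
    ms = sum(stock_returns) / n
    mm = sum(market_returns) / n
    cov = sum((s - ms) * (m - mm)
              for s, m in zip(stock_returns, market_returns)) / (n - 1)
    var_m = sum((m - mm) ** 2 for m in market_returns) / (n - 1)
    return cov / var_m

# Hypothetical returns: the stock moves twice as strongly as the market.
market = [0.01, -0.02, 0.03, 0.00, 0.02]
stock  = [0.02, -0.04, 0.06, 0.00, 0.04]
print(round(beta_factor(stock, market), 9))   # 2.0
```

A beta above 1 means the stock amplifies market movements; a beta below 1 means it dampens them.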

is a statistical measure of the relationship between two or more characteristics (example: between income and expenditure on consumer goods). It indicates to what extent a change in one characteristic is related to a change in the other(s). The correlation can take values between -1 and +1, where -1 denotes a perfect opposite relationship (e.g. income increases by 10%, expenses decrease by 10%), 0 no relationship, and +1 a perfect equal-direction relationship (e.g. income increases by 10%, expenses increase by 10%). A correlation establishes that a connection exists, but this need not be a direct cause-effect (causal) relationship.

In economic sociology: a general term for the joint occurrence or the joint variation, in the same or opposite direction, of two or more characteristics. A correlation of two features is not necessarily synonymous with a functional (causal) relationship and always requires additional interpretation. Statistics offers a number of different measures (coefficients) to describe and characterize such relationships. Some authors use the term correlation only for interval-scaled data and refer to the relationship for nominal- or ordinal-scaled data as association, contingency, rank correlation, or concordance. The use of these terms varies considerably.

is the relationship between two or more statistical variables. The mere presence of a correlation says nothing about causality; a correlation does not necessarily imply a cause-effect relationship. With a positive correlation, an increase in one variable is accompanied by an increase in the other, and vice versa. With a negative correlation, the variables move in opposite directions. If there is no correlation, no conclusions about one variable can be drawn from the development of the other.
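
The caveat above cuts both ways: a vanishing correlation coefficient rules out only a linear relationship, not a relationship altogether. A minimal sketch, using a deterministic but non-monotone (parabolic) dependence as an example:

```python
def pearson_r(xs, ys):
    """PEARSON correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# y depends completely on x, yet the linear correlation vanishes
# because increases and decreases of y cancel symmetrically.
x = [1, 2, 3, 4, 5]
y = [(v - 3) ** 2 for v in x]   # [4, 1, 0, 1, 4]
print(pearson_r(x, y))          # 0.0
```
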

indicates the statistical strength of the relationship between variables.
This complements regression analysis. In simple linear regression the coefficient of determination r² is used, which indicates how large a proportion of the scatter of the dependent variable Y can be explained by the linear regression relationship between Y and X, i.e. by the variation in X. Its range is 0 ≤ r² ≤ 1. Often the preferred measure is r, the (PEARSON) correlation coefficient, for which -1 ≤ r ≤ +1 holds. Its advantage is that it gives not only the strength but also the direction of the relationship between X and Y. For example, r < 0 means that positive Y values are essentially associated with negative X values and vice versa. For r = 0 the hypothesis of general linear independence between X and Y cannot be rejected; but neither can it be rejected merely because a sample yields r < 0 or r > 0. To be sure, at a given error probability, that linear independence between X and Y can be ruled out in general, i.e. in the population, the confidence interval for the population correlation coefficient must not cover the value zero. The construction of such confidence intervals, and the corresponding tests, assumes that the random variables X and Y follow a bivariate normal distribution. Just as for the special case of a linear relationship between two quantitative variables, correlation coefficients or coefficients of determination can be determined for observations of two variables ordered only by ranks (rank correlation coefficient), for empirical regression relationships, for nonlinear regressions, or for multiple regressions. The multiple coefficient of determination, 0 ≤ R² ≤ 1, indicates how large a proportion of the scatter of Y can be explained by the variation of all explicitly included predetermined variables X1, ..., Xk. R² can be expressed in terms of the matrix of the simple correlation coefficients among the exogenous variables and the vector of the simple correlation coefficients between the endogenous variable and each exogenous variable.
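
The relationship between r and r² described above can be checked numerically. The following sketch fits a least-squares line and verifies that the coefficient of determination equals the square of the PEARSON correlation coefficient (the data are hypothetical):

```python
def pearson_r(xs, ys):
    """PEARSON correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def r_squared(xs, ys):
    """Coefficient of determination of the simple linear regression of y on x:
    the share of the scatter of y explained by the fitted line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))          # slope
    a = my - b * mx                                  # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - my) ** 2 for y in ys)                        # total
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [1.1, 1.9, 3.2, 3.8, 5.0]    # hypothetical observations
print(round(r_squared(x, y), 9) == round(pearson_r(x, y) ** 2, 9))  # True
```
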
If one wants to know how much only a subset of the variables X1, ..., Xk (in the extreme case a single predetermined variable) explains of the scatter of Y while the remaining X variables are held constant, so-called partial coefficients of determination or partial correlation coefficients have to be determined. For rank values, the PEARSON correlation coefficient can be calculated in a simplified form (SPEARMAN rank correlation coefficient). An alternative is KENDALL's τ. For qualitative variables, other measures of association should be used, e.g. YULE's association measures. Separate indicators have also been developed for correlations between dichotomous and metrically scaled variables (biserial and point-biserial correlation coefficients), which can be generalized to the multiple case (polyserial and polychoric correlation coefficients).
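
The rank-based measures mentioned above can be sketched as follows, assuming tie-free data (ties would require average ranks, which this sketch omits):

```python
from itertools import combinations

def pearson_r(xs, ys):
    """PEARSON correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Rank positions 1..n (no tie handling in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    """SPEARMAN's rho: PEARSON's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

def kendall_tau(xs, ys):
    """KENDALL's tau: (concordant - discordant pairs) / all pairs."""
    conc = disc = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        conc += s > 0
        disc += s < 0
    return (conc - disc) / (len(xs) * (len(xs) - 1) // 2)

# A monotone but nonlinear relationship: the rank measures report a
# perfect association, while the linear PEARSON coefficient stays below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(round(spearman_rho(x, y), 9), round(kendall_tau(x, y), 9))  # 1.0 1.0
print(pearson_r(x, y) < 1)                                        # True
```
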

