Year: 2021 | Volume: 11 | Issue: 1 | Page: 31-38
Understanding statistical association and correlation
Ramesh Lal Sapra1, Satish Saluja2
1 GRIPMER, SGRH, New Delhi, India
2 Department of Neonatology, SGRH, New Delhi, India
Date of Submission: 15-Nov-2020
Date of Decision: 15-Dec-2020
Date of Acceptance: 06-Jan-2021
Date of Web Publication: 19-Feb-2021
Dr. Ramesh Lal Sapra
GRIPMER, SGRH, Rajinder Nagar, New Delhi - 110 060
Source of Support: None, Conflict of Interest: None
In medical research, the words ‘association’ and ‘correlation’ between two attributes/variables are frequently used and often interchanged. Simplifying these concepts may help researchers apply the appropriate test. This article attempts to simplify the concepts of statistical association and correlation, especially for clinical practitioners and researchers. It discusses various measures of association and relationship, both for testing significance and for assessing strength. It also discusses three popular measures of association used in medical research, namely odds ratio (OR), relative risk (RR) and hazard ratio, which measure the association of an outcome between two groups. The Pearson Chi-square test is the most common test and has been used extensively for studying association, often without regard to its limitations or to the strength of the association. Researchers often take it for granted that the OR and RR are one and the same. Our calculations show that with probabilities of outcome of 0.5 and 0.1, the OR is 9, whereas the RR is 5. Tools for studying statistical association and correlation should be used cautiously, and appropriate tests should be chosen, particularly when assumptions are violated. While studying an association, its strength should be assessed using the appropriate statistic. OR and RR both measure association for assessing risk. However, we should avoid equating OR with RR, particularly when the probabilities of outcome are not small.
Keywords: Association, correlation, effect size, Pearson's Chi-square, strength
How to cite this article:
Sapra RL, Saluja S. Understanding statistical association and correlation. Curr Med Res Pract 2021;11:31-8.
Introduction
In medical research, the words ‘association’ and ‘correlation’ between two attributes/variables are frequently used and often interchanged. In everyday English they may carry the same meaning, but statistically they do not. Association refers to a general relationship and is normally used for studying the relationship between two nominal/categorical/ordinal attributes, whereas correlation refers to a linear relationship between two quantitative attributes. It is worth mentioning that the relationship between two quantitative variables can also be non-linear, for example curvilinear or exponential.
The medical research literature includes plenty of studies of association between attributes. In most of them, the Pearson Chi-square test is the test of choice and has been used extensively, often without regard to its limitations or to the strength of the association. Statistical theory offers a few more tests which take care of these limitations. We give an overview of the concept of association, including the various statistics/tests used for testing association and assessing its strength. We also discuss the controversy that surrounded the Pearson Chi-square test about a century ago. In addition, the present article discusses three popular measures of association used in medical research, namely odds ratio (OR), relative risk (RR) and hazard ratio (HR), which measure the association of an outcome between two groups. These measures give an idea of the comparative risk under contrasting/different conditions and are sometimes misunderstood and interchanged. The present article attempts to simplify the concepts of association and correlation, and cautions researchers to use the appropriate test.
Association is a general relationship between two attributes when the attributes are nominal or ordinal. A nominal variable is a categorical variable with two or more mutually exclusive (non-overlapping) categories, for example, smoking habit (Yes, No), hypertension (Yes, No) and colour of an eye (black, grey and blue). An ordinal variable is a categorical variable whose values are ordered; it is often derived from a quantitative variable. For example, body mass index (BMI) is a quantitative variable and we can convert it into an ordinal variable as follows:
- Underweight (if BMI is <18.5)
- Normal (if BMI is 18.5 to <25)
- Overweight (if BMI is 25 to <30)
- Obese (if BMI is 30 or higher).
Other examples of ordinal variables are income (1. low, 2. medium and 3. high) and education (1. uneducated, 2. school level, 3. college level, and 4. highly educated).
Two attributes are said to be associated if the response of one attribute changes over the different states/categories of the other attribute. Conversely, if the response does not change, they are said to be independent. For example, if we notice in the community that people who smoke are more prone to lung cancer than those who do not, we have reason to suspect that smoking is linked or associated with the occurrence of lung cancer. Had they been independent, we would have observed a more or less similar response (prevalence) of lung cancer among smokers and non-smokers. Similarly, if we observe a higher prevalence of kidney failure among those who have diabetes than among those who do not, it is likely that diabetes and kidney failure are associated.
We try to understand the concept of association between smoking and lung cancer using data from a randomly drawn sample of size 1200 from a large population. [Table 1] shows the distribution of the sample: of 200 smokers, 20 had lung cancer and 180 did not; of 1000 non-smokers, 10 had lung cancer and 990 did not.
Let us analyse the data of [Table 1]. The prevalence of lung cancer is 10% among the smokers and 1% among the non-smokers. Alternatively, we can say that the cancer-free proportion is 90% among the smokers and 99% among the non-smokers. This shows that the prevalence of lung cancer changes when we move from the smoking category to the non-smoking category. In other words, lung cancer and smoking are associated. Had the two attributes been independent, we would have observed more or less similar values of prevalence among the smokers and the non-smokers. From the table alone, we cannot say whether this association is statistically significant or has occurred by chance and, if it is significant, how strong the association is.
The concept of association can be easily understood if we visualise [Figure 1]. Under association, the two bars differ in prevalence, whereas under independence they do not.
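The prevalence figures quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch: the cell counts are those implied by the odds calculations later in the article (20/180 for smokers and 10/990 for non-smokers), and the variable names are our own.

```python
# Cell counts implied by the article's later calculations (20/180 and 10/990).
table = {
    "smokers": {"cancer": 20, "no_cancer": 180},
    "non_smokers": {"cancer": 10, "no_cancer": 990},
}

def prevalence(group):
    """Proportion of the given group that has lung cancer."""
    counts = table[group]
    return counts["cancer"] / (counts["cancer"] + counts["no_cancer"])

prev_smokers = prevalence("smokers")          # 20 / 200
prev_non_smokers = prevalence("non_smokers")  # 10 / 1000
print(f"Smokers: {prev_smokers:.0%}, non-smokers: {prev_non_smokers:.0%}")
```

A difference between the two proportions is what the text calls association; whether it is statistically significant is a separate question.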
Testing association between attributes
In the previous section, we discussed association on the basis of changes in the prevalence of one attribute over the different states of the other attribute. However, this gives only a rough idea of the association, which may not be statistically significant. If the observed and expected counts differ, we may suspect an association between the attributes. The most commonly used test for detecting association statistically is the Pearson Chi-square test, which can be used for a 2 × 2 or higher classification (m × n classification, with m categories for the first attribute and n for the other). The same statistic is also used in the Chi-square ‘test of goodness-of-fit.’ The test calculates the expected count/frequency of each cell assuming the attributes to be independent. The expected count of any cell is obtained by multiplying its row total by its column total and dividing by the total sample size. Let us calculate the expected count for the data in [Table 2].
[Table 2] shows that in each cell, the observed count differs from the expected count. It is this difference which points to an association between the attributes. The Pearson Chi-square (χ2) statistic is the sum, over all the cells, of the squared differences between the observed and expected counts, each divided by the expected count, as shown in the formula in the adjoining box. For testing the association or independence of attributes, the Chi-square value is calculated and a P value corresponding to it is obtained. This is all done by the software automatically and remains hidden from the user. If the calculated P < 0.05, we reject the null hypothesis of independence of attributes, i.e., we conclude that there is likely to be a significant association between the two attributes [Figure 2].
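The expected-count rule (row total × column total / sample size) can be sketched in Python as follows. We assume here that [Table 2] uses the same smoking and lung cancer counts as the earlier example (20, 180, 10 and 990, as implied by the article's odds calculations); the function name is hypothetical.

```python
# Observed counts (assumed to match the smoking example).
# Rows: smokers, non-smokers. Columns: lung cancer, no lung cancer.
observed = [[20, 180],
            [10, 990]]

def expected_counts(obs):
    """Expected count of each cell under independence:
    row total * column total / total sample size."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, exp_row)
```

Under these assumed counts, the expected number of smokers with lung cancer is 200 × 30 / 1200 = 5, against 20 observed.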
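As a hedged illustration of what the software does behind the scenes, the sketch below computes the Pearson Chi-square statistic, i.e., the sum of (observed − expected)² / expected over all cells, for the smoking example. The counts are those implied by the article's odds calculations and the expected counts follow from the row and column totals. The P value shortcut via `math.erfc` is valid only for 1 degree of freedom (a 2 × 2 table); a statistical package would be needed for larger tables.

```python
import math

# Observed counts (smokers / non-smokers by lung cancer / no lung cancer)
# and the expected counts under independence (row total * column total / n).
observed = [[20, 180], [10, 990]]
expected = [[5.0, 195.0], [25.0, 975.0]]

# Pearson Chi-square: sum over cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an m x n table: (m - 1) * (n - 1); here 1.
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only, the upper-tail probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"Chi-square = {chi_square:.2f}, df = {df}, P = {p_value:.2e}")
```

With these counts the statistic is about 55.4 and the P value is far below 0.05, so the null hypothesis of independence would be rejected.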
Common tests for testing association
The Statistical Package for the Social Sciences (SPSS) by IBM reports four tests under the Chi-square option of its crosstab module for testing the independence of nominal/ordinal attributes. These are the Pearson Chi-square test, the Chi-square test with Yates's continuity correction, the likelihood ratio test and Fisher's exact test. There is often confusion about which test to use for drawing conclusions, particularly when different tests give conflicting results. The following points will help in making the decision:
- For a 2 × 2 contingency table, i.e., two categories for each variable, the best choice is Fisher's exact test. Unlike the Pearson Chi-square test, Fisher's test is also suited to small samples, particularly when more than 20% of the cells have an expected count <5
- For an m × n contingency table (for example 2 × 3 or 3 × 4), use the Pearson Chi-square test if at least 80% of the cells have an expected count of more than 5. However, if more than 20% of the cells have an expected count <5, use the likelihood ratio Chi-square test. The likelihood ratio statistic is based on the ratio of the observed to the expected frequency and is most often used when the sample size is too small to meet the Chi-square assumption of at least 80% of the cells having an expected frequency >5. Likelihood ratio tests are increasingly being used in situations where Chi-square tests were previously recommended. The likelihood ratio test is also known as the ‘log-likelihood ratio test,’ the ‘G-test of goodness-of-fit’ or the ‘G2 test’
- The Chi-square test assumes independence of observations and cannot be used for matched or paired observations. For such data, McNemar's test may be more appropriate.
We elaborate the above points through hypothetical data [Table 3] for testing whether chronic hypertension is associated with low birth weight.
Observed and expected counts differ in each cell of [Table 3], so we may suspect an association between hypertension and birth weight. Here, 50% of the cells of the contingency table, i.e., two of the four cells, have an expected count <5. Which test should be used to test the null hypothesis? SPSS outputs four tests [Table 4] for testing the statistical significance of the association. Of the four tests, only the likelihood ratio test gives P = 0.041 (<0.05), suggesting evidence against the null hypothesis of ‘no association.’ In contrast, the remaining three tests have P > 0.05. What should our decision be in this case? Since our data form a 2 × 2 contingency table and 50% of the cells have expected counts <5, we suggest basing the decision on Fisher's exact test rather than on the likelihood ratio test, which is an approximate method suited to contingency tables larger than 2 × 2. It is worth mentioning that an alternative exact test, Barnard's exact test, is also available in the literature and is more powerful, particularly in 2 × 2 tables. However, that test is computationally intensive. Most of the popular statistical packages give Fisher's exact test for a 2 × 2 contingency table. The test is also available for bigger contingency tables, up to 5 × 5, on the web page ‘Social Science Statistics’ (http://www.socscistatistics.com/tests/chisquare2/Default2.aspx).
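The rules of thumb in the points above can be written down as a small helper, together with the likelihood ratio statistic G = 2 Σ O ln(O/E). This is a sketch under our own assumptions: the function names are hypothetical, the thresholds are the ones stated in the text, and the 2 × 3 tables are made-up counts used only to exercise the rule.

```python
import math

def expected_counts(observed):
    """Expected counts under independence (row total * column total / n)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def suggest_test(observed, paired=False):
    """Encode the article's guidance for choosing a test of association."""
    if paired:
        return "McNemar's test"
    if len(observed) == 2 and len(observed[0]) == 2:
        return "Fisher's exact test"
    cells = [e for row in expected_counts(observed) for e in row]
    share_small = sum(e < 5 for e in cells) / len(cells)
    if share_small > 0.20:
        return "likelihood ratio test"
    return "Pearson Chi-square test"

def g_statistic(observed):
    """Likelihood ratio statistic; empty cells contribute zero."""
    expected = expected_counts(observed)
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )

choice_2x2 = suggest_test([[20, 180], [10, 990]])        # smoking example
choice_sparse = suggest_test([[2, 3, 1], [3, 2, 4]])     # made-up sparse 2 x 3 table
choice_large = suggest_test([[30, 40, 30], [20, 50, 30]])  # made-up large 2 x 3 table
g_smoking = g_statistic([[20, 180], [10, 990]])
print(choice_2x2, choice_sparse, choice_large, round(g_smoking, 2), sep=" | ")
```

The helper only selects a test; it does not replace checking the study design, for example whether the observations are really independent.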
Table 4: Statistical Package for the Social Sciences (SPSS) output for tests of significance of association
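For readers who want to see what Fisher's exact test computes, here is a minimal pure-Python sketch for a 2 × 2 table. The counts are purely illustrative and are not those of [Table 3]; the function name is hypothetical. The two-sided P value follows one common convention (summing the probabilities of all tables with the same margins that are no more likely than the observed one), which may differ slightly from what a given package reports.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the P value sums the probabilities of every possible
    table that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Illustrative counts only: rows = hypertensive / normotensive mothers,
# columns = low / normal birth weight.
p_fisher = fisher_exact_two_sided(3, 1, 1, 3)
print(f"Fisher's exact test, two-sided P = {p_fisher:.4f}")
```

Because the calculation enumerates tables rather than relying on a large-sample approximation, it remains valid when expected counts are small.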
We have not yet discussed the fourth method, the Chi-square test with Yates's continuity correction. Yates's correction is applied to a 2 × 2 contingency table, especially when the sample size is small. The method corrects for the discontinuity in small samples by subtracting 0.5 from the absolute value of the difference between the observed and expected count for each cell before calculating the Chi-square statistic. This correction tends to overcorrect, and Sokal and Rohlf suggested that there is no need for Yates's correction even with quite low sample sizes.
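The effect of the correction can be seen in a short sketch. We reuse the smoking example counts implied by the article's odds calculations (not a small sample, so the correction matters little here); the function name is hypothetical.

```python
observed = [[20, 180], [10, 990]]
expected = [[5.0, 195.0], [25.0, 975.0]]  # row total * column total / n

def chi_square(obs, exp, yates=False):
    """Pearson Chi-square; with yates=True, 0.5 is subtracted from
    |O - E| in each cell before squaring, as described in the text."""
    correction = 0.5 if yates else 0.0
    total = 0.0
    for obs_row, exp_row in zip(obs, exp):
        for o, e in zip(obs_row, exp_row):
            total += (abs(o - e) - correction) ** 2 / e
    return total

chi_plain = chi_square(observed, expected)
chi_yates = chi_square(observed, expected, yates=True)
print(f"Uncorrected: {chi_plain:.2f}, Yates-corrected: {chi_yates:.2f}")
```

The corrected statistic is always the smaller of the two when every |O − E| exceeds 0.5, which is why the correction makes the test more conservative.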
Strength of association
Above we have discussed at length what statistical association is and how it is tested. If P < 0.05, we reject the null hypothesis that the two attributes are independent. However, a significant P value does not tell us whether the association is ‘weak’ or ‘strong.’ When the sample size is large, even a ‘weak association’ becomes significant. Most researchers mention rejecting the null hypothesis in favour of the alternative hypothesis but do not go on to investigate the strength of association, i.e., the effect size (ES). Cramer's V is the most useful statistic for assessing the strength of association when the Chi-square test is found to be significant. The Phi (φ) statistic is used for assessing the strength in a 2 × 2 contingency table, whereas Cramer's V is used for classifications larger than 2 × 2. Cramer's V is calculated as shown in the box [Figure 3].
Cramer's V is a measure of the strength of association and is thus a measure of ES. The strength can be categorised using the thresholds suggested by Rea and Parker [Table 5].
The Chi-square statistic developed by Karl Pearson in 1900 is one of the most useful statistics and is applied in almost all fields of science. It is considered among the top 20 discoveries since 1900 that have changed our world. Hacking says that Karl Pearson's Chi-square test ushered ‘in a new kind of decision making.’ Most medical researchers remain unaware that the statistic we are discussing was the subject of a great controversy almost a century ago. The controversy, which involved Pearson and Fisher, pertained primarily to the degrees of freedom rather than to the formula itself. The degrees of freedom is a number that determines the critical value for testing significance. Pearson did not recognise that the degrees of freedom required for testing significance depend on the number of parameters (expected proportions) to be estimated from the data under the null hypothesis. Fisher, in his article published in 1922, strongly argued that Pearson had made a fundamental mistake. However, Pearson did not agree with Fisher's arguments and reacted swiftly and with great hostility. We reproduce his reaction verbatim, as published in the 1922 issue of Biometrika, in the following paragraph.
‘The above re-description of what seems to me very elementary considerations would be unnecessary had not a recent writer in the Journal of the Royal Statistical Society appeared to have wholly ignored them. He considers that I have made serious blunders in not limiting my degrees of freedom by the number of moments I have taken;… I hold that such a view is entirely erroneous and that the writer has done no service to the science of statistics by giving it broadcast circulation in the pages of the Journal of the Royal Statistical Society.’
And finally, ‘the move Fisher made in Chi-squared controversy has been adopted throughout the theory of statistical inference.’
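The boxed formula is not reproduced here, so the sketch below assumes the standard definition, V = sqrt(χ² / (N × min(r − 1, c − 1))), which reduces to φ = sqrt(χ² / N) for a 2 × 2 table. The Chi-square value of about 55.38 and N = 1200 are those of the smoking example under the counts implied by the article's odds calculations; the function name is our own.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V (standard definition, assumed here); for a 2 x 2 table
    this equals the phi statistic."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

# Smoking example: Chi-square of about 55.38 on a 2 x 2 table with N = 1200.
v_smoking = cramers_v(55.3846, 1200, 2, 2)
print(f"Cramer's V (phi) = {v_smoking:.3f}")
```

Here a highly significant Chi-square corresponds to a V of only about 0.21, which illustrates why the P value alone says little about strength.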
Measures of association in medical literature
In the previous sections, we discussed general tools for testing and assessing the strength of association between two attributes, which can be applied in any scientific field. In addition, the medical literature uses three popular measures of association, namely OR, RR and HR. All of them look at the association of an event (e.g., occurrence of a cancer, recurrence of a disease or survival/mortality) with two conditions. These two conditions can be, for example, exposure versus non-exposure, treatment versus non-treatment or surgery versus conservative treatment. All three measures give an idea of the comparative risk under contrasting/different conditions and are sometimes misunderstood and interchanged. They are known as relative measures of association. In addition to relative measures, there are absolute measures of association such as risk difference, rate difference and number needed to treat. These absolute measures are less frequently used and have their own merits and demerits.
There is often confusion among researchers about which relative measure to use. The following points will help in choosing the right measure:
- OR is simply the ratio of two odds under two conditions, for example, the odds under exposure and the odds under non-exposure. OR is generally used for assessing risk in case–control studies. We rewrite the data of [Table 1] as a case–control study in [Table 6], where a sample of 30 cases and a sample of 1170 controls have been selected from the two populations.
The OR for the above data can be calculated as follows:
OR = Odds1/Odds2
where Odds1 is the odds of the outcome under exposure (smoking), equal to the ratio of the probability of having lung cancer to the probability of not having it among the smokers, i.e., 0.1/0.9 = 1/9. This can also be calculated directly by dividing the counts, i.e., 20/180 = 1/9.
Similarly, Odds2 under non-exposure = 10/990 = 1/99.
OR is calculated by dividing Odds1 by Odds2: OR = (1/9)/(1/99) = 11.
- RR is the ratio of two probabilities and is generally calculated when we study outcomes for two groups, for example, recurrence of a cancer among patients of two cohorts where patients of one cohort receive a new drug and those of the other receive the old drug. Unlike case–control studies, where one sample consists of ‘cases’ and the other of ‘controls,’ in cohort studies the samples pertain to the two groups and enable us to estimate the probability of the outcome in each group correctly. Thus, we should calculate RR rather than OR whenever we can estimate the probability of the outcome for each group
- OR may or may not be equal to RR, depending on the probabilities of outcome in the two groups. For the case–control study discussed above, the OR is 11 and the RR would be 10. However, it is wrong to calculate the RR here, because we cannot estimate the probabilities of outcome in the exposed and non-exposed groups: the samples were not drawn separately from the population that had the exposure and the one that did not. OR and RR are related, and the adjoining box gives the relationship between the two [Figure 4].
OR and RR are bound to differ if there is a difference between the two probabilities. However, OR is close to RR if the probabilities of the outcome are small.
- OR does not change even if the ratio of the number of cases to controls changes. However, RR does. For example, if one cell count in the above table increases from 180 to 360 and the other from 990 to 1980, the OR remains the same, i.e., 11. However, the RR changes from 10 to 10.5
- HR is broadly equivalent to RR and is useful when the risk varies with time. It is generally used in survival analysis, where information is collected at different intervals of time. The Cox model is generally used to estimate the HR and for drawing time-to-event curves. HR is the ratio of the hazard function among the exposed to that among the unexposed, or the ratio of the risk in the treated group to that in the untreated group at a given interval of time. For example, an HR of 0.60 means that the treated group has a 40% reduction in risk compared to the untreated group
- ESs play a major role in clinical and practical interpretation and need to be reported along with confidence intervals. Researchers may refer to the ESs and their classification suggested by Oliver et al. for various association measures [Table 7].
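The arithmetic in the points above can be checked with a short Python sketch. The function names are our own, and the conversion from OR to RR assumes the commonly used relation RR = OR / (1 − p0 + p0 × OR), where p0 is the probability of the outcome in the unexposed group; we assume, without being able to confirm it, that this is the relationship shown in the article's box. The counts of 50/50 and 10/90 are chosen by us to give outcome probabilities of 0.5 and 0.1.

```python
def odds_ratio(a, b, c, d):
    """a, b = events / non-events among the exposed;
    c, d = events / non-events among the unexposed."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """Ratio of the outcome probabilities in the two groups."""
    return (a / (a + b)) / (c / (c + d))

def rr_from_or(or_value, p_unexposed):
    """Assumed relation: RR = OR / (1 - p0 + p0 * OR)."""
    return or_value / (1 - p_unexposed + p_unexposed * or_value)

# Smoking example: 20/180 among smokers, 10/990 among non-smokers.
or_original = odds_ratio(20, 180, 10, 990)
rr_original = relative_risk(20, 180, 10, 990)

# Doubling the number of controls (180 -> 360, 990 -> 1980).
or_doubled = odds_ratio(20, 360, 10, 1980)
rr_doubled = relative_risk(20, 360, 10, 1980)

# Recovering RR from OR when the unexposed risk is 10/1000.
rr_converted = rr_from_or(or_original, 10 / 1000)

# Common outcome: probabilities of 0.5 and 0.1 in the two groups.
or_common = odds_ratio(50, 50, 10, 90)
rr_common = relative_risk(50, 50, 10, 90)

print(f"OR = {or_original:.1f}, RR = {rr_original:.1f}")
print(f"After doubling controls: OR = {or_doubled:.1f}, RR = {rr_doubled:.1f}")
print(f"Common outcome: OR = {or_common:.1f}, RR = {rr_common:.1f}")
```

The OR is unchanged when the controls are doubled because each odds is halved by the same factor, whereas the RR depends on the group totals and therefore shifts.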
Correlation between quantitative variables
Above we have discussed at length the concept of association and how to assess its strength when studying the relationship between two nominal/categorical/ordinal attributes. However, association tools cannot be used for studying the relationship when the attributes have been measured on a continuous numeric scale. For example, we may be interested in studying the relationship between age and height of school-going children, body weight and cholesterol level, maternal age and anxiety, etc. When our attributes are numeric, we assess the relationship through the Pearson correlation coefficient, whose value ranges from -1 to +1. Minus 1 and plus 1 are perfect negative and positive correlations, respectively. When there is no linear relationship between the two attributes/variables, the correlation coefficient is 0. One of the limitations of the Pearson correlation coefficient is that it assesses only a linear relationship between the two variables. If we draw a scatter plot of the data points by taking one variable on the X-axis and the other on the Y-axis, a linear relationship will appear as a straight line with a constant slope, as shown in [Figure 1]. The Pearson correlation coefficient cannot adequately assess non-linear relationships (curvilinear, exponential, polynomial, etc.), including non-linear monotonic ones, as shown in [Figure 5] and [Figure 6]. If we calculate a correlation coefficient for data that follow a non-linear relationship, the coefficient may be non-significant or weak, because we are trying to fit a straight line to data which are unfit for a straight-line relationship. Such a correlation coefficient may lead to erroneous interpretation. Thus, it is always advisable to draw the scatter plot of the data points and visually explore the relationship before calculating the correlation coefficient. Let us understand this through the graph in [Figure 7], where the data follow a curvilinear relationship.
When we fit a straight line to the curvilinear data, the R-square value drops from 0.9735 to 0.0292, indicating that a straight-line relationship is highly unfit for these data, as R-square is close to 0. We advise researchers to use the Microsoft Excel scatter-plot tool, which provides options for plotting non-linear relationships (exponential, power, logarithmic, polynomial and moving average) in addition to the linear (straight-line) relationship, along with a measure of the fit of the model (R²).
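The same point can be illustrated without a plotting tool. The sketch below computes the Pearson correlation coefficient from its definition for two made-up data sets (not the data of [Figure 7]): one exactly linear and one following a symmetric curve. The function name and the data are our own assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the
    standard deviations (computed from sums of squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(-5, 6))              # made-up X values: -5 ... 5
ys_linear = [2 * x + 1 for x in xs]  # exact straight line
ys_curved = [x ** 2 for x in xs]     # symmetric U-shaped curve

r_linear = pearson_r(xs, ys_linear)
r_curved = pearson_r(xs, ys_curved)
print(f"Linear data: r = {r_linear:.2f}; curved data: r = {r_curved:.2f}")
```

Although Y is completely determined by X in the curved data, the coefficient is zero, which is why a scatter plot should be inspected before the coefficient is interpreted.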
Correlation between ordinal variables
Above we have discussed the bivariate relationship when both variables are continuous numeric (interval or ratio scale). A researcher may ask how to study a relationship when the variables are ordinal in nature, such as pain score, satisfaction level, MELD score and intelligence level. For ordinal variables, we cannot use the Pearson correlation coefficient, which assumes the variables to be normally distributed. For such variables, we can use either Kendall's tau or Spearman's rank correlation. Spearman's correlation coefficient is the more widely used of the two and is appropriate for ordinal as well as continuous variables. Like Pearson's correlation coefficient, it ranges from -1 to +1. The basic difference between the two coefficients is that Spearman's coefficient measures the strength and direction of a monotonic relationship, whereas Pearson's coefficient measures a linear relationship. A monotonic relationship shows either a falling trend (orange) or a rising trend (blue) but not both, as shown in [Figure 6].
A researcher may want to know which correlation coefficient to choose for studying the relationship. If both variables are normally distributed, Pearson's correlation coefficient is used; otherwise, Spearman's correlation coefficient is used. It is worth mentioning that Pearson's correlation coefficient is highly sensitive to outliers (extreme values) and may give an unexpected value; therefore, outliers should be examined carefully before calculating the coefficient. Spearman's correlation coefficient is more robust to outliers than Pearson's correlation coefficient.
For assessing the strength of the relationship, Cohen's 1992 guidelines may be used as the ES for Pearson's correlation coefficient: an r of about 0.10 is considered small, 0.30 medium and 0.50 large.
It is to be noted that a bivariate relationship assessed by a correlation coefficient in no way establishes that one variable is dependent and the other independent. If our aim is to study the relationship between a dependent variable and a set of independent variables, often called predictors, features or explanatory variables, then we should use regression analysis, which is beyond the scope of the present article.
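A minimal sketch of the difference between the two coefficients is given below. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values given the average of their ranks; the exponential data are made up for illustration and the function names are our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = list(range(1, 9))
ys = [math.exp(x) for x in xs]  # made-up monotonic but non-linear data

r_pearson = pearson_r(xs, ys)
rho_spearman = spearman_rho(xs, ys)
print(f"Pearson r = {r_pearson:.2f}, Spearman rho = {rho_spearman:.2f}")
```

Because the data rise steadily, the ranks agree perfectly and Spearman's coefficient is 1, while Pearson's coefficient is noticeably lower since the trend is not a straight line.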
Discussion
Statistical ‘association’ and ‘correlation’ are used for testing relationships. However, the former is used for nominal attributes and the latter for numeric/ordinal attributes. Tools for studying statistical association and correlation should be used cautiously, and appropriate tests should be chosen, particularly when assumptions are violated. A significant P value may indicate strong evidence against the null hypothesis that the two attributes are independent. However, the P value does not tell us the strength of association or ES. When the sample is large enough, even a weak association may prove to be statistically significant. Therefore, researchers must report the ES along with the P value. We discussed the three most commonly used relative measures of association, i.e., OR, RR and HR. Each of these measures assesses comparative risk in a specific scenario, and they cannot be substituted for one another when interpreting results and drawing conclusions. Researchers often take it for granted that the OR and RR are one and the same. Our calculations show that with probabilities of outcome of 0.5 and 0.1, the OR is 9, whereas the RR is 5. However, with smaller probabilities of outcome, i.e., 0.1 and 0.01, the corresponding values are 11 and 10. Thus, we should avoid equating OR with RR, particularly when the probabilities of outcome are not small. We hope that the present article will help researchers in understanding the concepts of association and correlation, and in interpreting their results.
Conclusions
Tools for studying statistical association and correlation should be used cautiously, and appropriate tests should be chosen, particularly when assumptions are violated. The P value may indicate a significant association between the two attributes even if the strength of association is very weak. Therefore, while studying an association, its strength should be assessed using the appropriate statistic. OR and RR both measure association for assessing risk. However, we should avoid equating OR with RR, particularly when the probabilities of outcome are not small.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Kim HY. Statistical notes for clinical researchers: Chi-squared test and Fisher's exact test. Restor Dent Endod 2017;42:152-5.
McHugh ML. The Chi-square test of independence. Biochem Med (Zagreb) 2013;23:143-9.
McDonald JH. G-test of goodness-of-fit. In: Handbook of Biological Statistics. 3rd ed. Baltimore, Maryland: Sparky House Publishing; 2014. p. 53-8.
Berger RL. Power Comparison of Exact Unconditional Tests for Comparing Two Binomial Proportions. Institute of Statistics Mimeo Series No. 2266; 1994. p. 1-19.
Sokal RR, Rohlf FJ. Biometry: The Principles and Practice of Statistics in Biological Research. Oxford: W.H. Freeman; 1981.
Rea LM, Parker RA. Designing and Conducting Survey Research. 1st ed. San Francisco, CA: Jossey-Bass; 1992.
Canal L, Micciolo R. The Chi-square controversy: What if Pearson had R? J Stat Comput Simul 2014;84:1015-21.
Hacking I. Trial by number. Science 1984;84:69-70.
Baird D. The Fisher/Pearson Chi-squared controversy: A turning point for inductive inference. Br J Philos Sci 1983;34:105-18.
Knol MJ, Algra A, Groenwold RH. How to deal with measures of association: A short guide for the clinician. Cerebrovasc Dis 2012;33:98-103.
Zhang J, Yu KF. What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA 1998;280:1690-1.
Davies HT, Crombie IK, Tavakoli M. When can odds ratio mislead? BMJ 1998;316:989-91.
Stare J, Maucort-Boulch D. Odds ratio, hazard ratio and relative risk. Metodoloski zvezki 2016;13:59-67.
Oliver J, May WL, Bell ML. Relative effect sizes for measures of risk. Commun Stat Theor Methods 2017;46:1134575.
Mukaka MM. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 2012;24:69-71.
Cohen J. A power primer. Psychol Bull 1992;112:155-9.