Correlation analysis is a statistical technique
used to measure and describe the strength and direction of the
linear relationship between two continuous variables. The
result is a correlation coefficient, which ranges from -1 to +1:
- +1 indicates a perfect positive linear relationship (as
one variable increases, the other also increases).
- -1 indicates a perfect negative linear relationship (as
one variable increases, the other decreases).
- 0 indicates no linear relationship between the
variables; a non-linear relationship may still exist.
The most commonly used correlation coefficient is Pearson’s
correlation coefficient (denoted as r), which
measures the linear relationship between two variables. Other
correlation coefficients include Spearman’s rank
correlation (a rank-based, non-parametric measure suited to ordinal
data and monotonic relationships) and Kendall’s tau (also rank-based).
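To make the definition of Pearson’s r concrete, the sketch below computes it by hand as the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations, and compares the result with R’s built-in cor(). The vectors x and y are made-up illustrative values, not data from this tutorial.

```r
# Illustrative data (invented for this sketch)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Deviations of each observation from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r from its definition
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in calculation (method = "pearson" is the default)
r_builtin <- cor(x, y)

print(r_manual)
print(r_builtin)
print(all.equal(r_manual, r_builtin))
```

Both approaches should agree up to floating-point rounding, which is what all.equal() checks.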
Types of Correlation:
- Positive Correlation: When one variable
increases, the other variable tends to increase (e.g., height and
weight).
- Negative Correlation: When one variable
increases, the other variable tends to decrease (e.g., speed and time
taken to cover a fixed distance).
- No Correlation: No apparent relationship between
the two variables (e.g., shoe size and intelligence).
Advantages:
- Simple to calculate and easy to interpret.
- Helps identify the strength and direction of the relationship
between two variables.
- Provides insights into linear relationships that can guide further
analysis.
Disadvantages:
- Only captures linear relationships. It cannot measure more complex
(non-linear) relationships.
- Sensitive to outliers, which can distort the correlation.
- Correlation does not imply causation. Even if two variables are
correlated, it does not mean that one causes the other.
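The first two limitations can be illustrated with a short sketch. All values below are invented for illustration: the first part uses a perfectly deterministic but non-linear relationship, and the second adds a single extreme point to an otherwise strongly correlated sample.

```r
# 1) Non-linear relationship: y is fully determined by x,
#    yet the symmetric parabola has no linear trend
x <- -5:5
y <- x^2
r_parabola <- cor(x, y)
print(r_parabola)

# 2) Outlier sensitivity: a strong upward trend ...
a <- 1:10
b <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)
r_clean <- cor(a, b)
print(r_clean)

# ... is distorted by one extreme observation
a_out <- c(a, 11)
b_out <- c(b, -40)
r_outlier <- cor(a_out, b_out)
print(r_outlier)
```

In the first case the coefficient is zero even though the variables are clearly related; in the second, one added point is enough to turn a strongly positive coefficient negative.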
Applications:
- In finance, to study the relationship between stock prices and
interest rates.
- In healthcare, to examine the relationship between smoking and lung
disease.
- In education, to measure the correlation between study hours and
exam performance.
Pros:
- Quick and easy way to assess relationships between two
variables.
- Widely applicable across fields like economics, finance, biology,
etc.
Cons:
- Only suitable for linear relationships.
- It doesn’t give information about the cause-and-effect
relationship.
- Cannot be used for categorical data without special
adaptations.
Pearson’s Correlation Example in R
In this example, we’ll calculate Pearson’s correlation between
students’ study hours and their test
scores to see if there is a linear relationship between the two
variables.
Step 1: Create the Data
# Sample data: study hours and test scores (fixed values, so no random seed is needed)
study_hours <- c(4, 6, 8, 10, 12, 14, 16, 18, 20, 22) # Hours spent studying
test_scores <- c(50, 55, 65, 70, 78, 80, 85, 88, 90, 95) # Corresponding test scores
# Combine data into a data frame
data <- data.frame(study_hours, test_scores)
head(data)
Step 2: Calculate Pearson’s Correlation
To calculate the Pearson correlation coefficient between study hours
and test scores:
# Calculate Pearson's correlation coefficient
correlation <- cor(data$study_hours, data$test_scores)
print(correlation)
[1] 0.9814153
Interpretation:
- The Pearson correlation coefficient is approximately 0.981, which
indicates a very strong positive linear relationship between study hours
and test scores. In this sample, test scores tend to increase as study
hours increase.
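A coefficient from ten observations says nothing by itself about sampling uncertainty. As a possible follow-up, the sketch below uses base R’s cor.test() to obtain a p-value and a 95% confidence interval for the same data. The data vectors are repeated so the block runs on its own; the name result is chosen here for illustration.

```r
# Same data as in Step 1, repeated so this block is self-contained
study_hours <- c(4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
test_scores <- c(50, 55, 65, 70, 78, 80, 85, 88, 90, 95)

# cor.test() reports the estimate together with a t-test of the null
# hypothesis of zero correlation (Pearson is the default method)
result <- cor.test(study_hours, test_scores)

print(result$estimate)  # sample correlation coefficient
print(result$p.value)   # p-value for H0: true correlation is 0
print(result$conf.int)  # 95% confidence interval for the correlation
```

The p-value and interval rely on the normality assumption of Pearson’s correlation, so they should be read with caution for a sample this small.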
Step 3: Visualize the Relationship
To better understand the relationship, let’s draw a
scatterplot and overlay a fitted linear regression line.
# Plot the relationship between study hours and test scores
plot(data$study_hours, data$test_scores,
     main = "Scatterplot of Study Hours vs Test Scores",
     xlab = "Study Hours", ylab = "Test Scores",
     pch = 19, col = "blue")
# Add a linear regression line to the plot
abline(lm(test_scores ~ study_hours, data = data), col = "red")
Interpretation of the Plot:
- The scatterplot shows a clear upward trend, consistent with the strong
positive correlation between study hours and test scores.
- The points lie close to the red regression line, which suggests that
a straight line describes the relationship well.
Assumptions of Pearson’s Correlation:
- Linearity: The relationship between the two
variables should be linear.
- Normality: For significance tests and confidence intervals, the
variables should be approximately normally distributed (this matters
most for small samples).
- Homoscedasticity: The spread of one variable
should be roughly constant across the range of the other.
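These assumptions can be checked informally. The sketch below is one possible approach, not a definitive procedure: it applies the Shapiro-Wilk test (shapiro.test() in base R) to each variable as a rough normality check, and plots residuals from a straight-line fit to look for curvature or changing spread. The data are repeated from the example so the block runs on its own; the names sw_hours, sw_scores, and fit are chosen here for illustration.

```r
# Same data as in the example, repeated so this block is self-contained
study_hours <- c(4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
test_scores <- c(50, 55, 65, 70, 78, 80, 85, 88, 90, 95)

# Normality: a small p-value suggests a departure from normality
sw_hours  <- shapiro.test(study_hours)
sw_scores <- shapiro.test(test_scores)
print(sw_hours$p.value)
print(sw_scores$p.value)

# Linearity and homoscedasticity: residuals from a linear fit should
# scatter around zero with no curve and no funnel shape
fit <- lm(test_scores ~ study_hours)
plot(fitted(fit), resid(fit),
     main = "Residuals vs Fitted Values",
     xlab = "Fitted test scores", ylab = "Residuals",
     pch = 19)
abline(h = 0, lty = 2)
```

With only ten observations these checks have little power, so a visual inspection of the plots is usually more informative than the test p-values.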
Summary:
Correlation analysis helps determine the strength and direction of
relationships between two variables. Pearson’s
correlation is used for linear relationships, while
Spearman’s rank correlation is used for monotonic (possibly
non-linear) relationships or ordinal data. In R, the cor() function
computes both coefficients through its method argument. Additionally,
passing a data frame or matrix of numeric columns to cor() returns a
correlation matrix, a convenient way to assess relationships
between multiple variables simultaneously.
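The sketch below illustrates both points from the summary. The speed and time values are invented (time to cover an assumed 120 km at each speed), giving a relationship that is monotonic but not linear; the correlation matrix uses three columns of R’s built-in mtcars dataset, chosen only as a convenient example.

```r
# Monotonic but non-linear: time needed to cover a fixed distance
speed <- c(20, 40, 60, 80, 100)  # km/h (illustrative values)
time  <- 120 / speed             # hours for an assumed 120 km trip

pearson  <- cor(speed, time)                       # default method
spearman <- cor(speed, time, method = "spearman")  # rank-based
kendall  <- cor(speed, time, method = "kendall")   # rank-based

print(c(pearson = pearson, spearman = spearman, kendall = kendall))

# Correlation matrix: pairwise Pearson correlations for several columns
cor_matrix <- cor(mtcars[, c("mpg", "hp", "wt")])
print(round(cor_matrix, 2))
```

Because time falls strictly as speed rises, the rank-based coefficients equal -1, while Pearson’s r is weaker than -1 because the relationship is curved rather than a straight line.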