Correlation Coefficient Calculator

 
Number of samples  =  5
Mean `\mu_X`  =  4
Mean `\mu_Y`  =  49
σx  =  2.4495
σy  =  35.8329
Correlation coefficient  =  0.9684

Correlation Coefficient (ρ) - work with steps

Input Data :
Data set x = 1, 2, 4, 5, 8
Data set y = 5, 20, 40, 80, 100
Total number of elements = 5

Objective :
Find the correlation coefficient for the given input data.

Solution :
`x_i = `1, 2, 4, 5, 8   Mean `\mu_X = 20/5 = 4`
`y_i = `5, 20, 40, 80, 100   Mean `\mu_Y = 245/5 = 49`

| `(x_i - \mu_X)` | `(x_i - \mu_X)^2` | `(y_i - \mu_Y)` | `(y_i - \mu_Y)^2` | `(x_i - \mu_X)(y_i - \mu_Y)` |
|---|---|---|---|---|
| -3 | 9 | -44 | 1936 | 132 |
| -2 | 4 | -29 | 841 | 58 |
| 0 | 0 | -9 | 81 | 0 |
| 1 | 1 | 31 | 961 | 31 |
| 4 | 16 | 51 | 2601 | 204 |

`\sum(x_i - \mu_X)^2 = 30`   `\sum(y_i - \mu_Y)^2 = 6420`   `\sum(x_i - \mu_X)(y_i - \mu_Y) = 425`
`\sigma_X=\sqrt{\frac{30}{5}}`
`=\sqrt{6}`
`\sigma_X=2.4495`
`\sigma_Y=\sqrt{\frac{6420}{5}}`
`=\sqrt{1284}`
`\sigma_Y=35.8329`
`\rho_{XY}=\frac{1}{5}\times\frac{425}{2.4495\times35.8329}`
`=\frac{1}{5}\times\frac{425}{87.7724}`
`=\frac{425}{438.8622}`
`\rho_{XY}=0.9684`
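
The tabular work above can be reproduced with a short script. This is only a sketch of the same arithmetic, not the calculator's own code; the variable names (`rows`, `sum_dx2`, `sum_dy2`, `sum_dxdy`) are chosen here for illustration. It uses the population form, dividing by the number of elements, as the steps above do.

```python
import math

x = [1, 2, 4, 5, 8]
y = [5, 20, 40, 80, 100]
n = len(x)

# Means of both data sets (4 and 49 for this input).
mu_x = sum(x) / n
mu_y = sum(y) / n

# One table row per observation:
# deviation of x, its square, deviation of y, its square, cross product.
rows = [
    (xi - mu_x, (xi - mu_x) ** 2, yi - mu_y, (yi - mu_y) ** 2,
     (xi - mu_x) * (yi - mu_y))
    for xi, yi in zip(x, y)
]

# Column sums used in the formulas.
sum_dx2 = sum(row[1] for row in rows)
sum_dy2 = sum(row[3] for row in rows)
sum_dxdy = sum(row[4] for row in rows)

# Population standard deviations (divide by n, not n - 1).
sigma_x = math.sqrt(sum_dx2 / n)
sigma_y = math.sqrt(sum_dy2 / n)

# Population correlation coefficient.
rho = sum_dxdy / (n * sigma_x * sigma_y)

print(round(sigma_x, 4), round(sigma_y, 4), round(rho, 4))
# prints: 2.4495 35.8329 0.9684
```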

The Correlation Coefficient calculator measures the degree of dependence, or linear correlation, between two random samples $X$ and $Y$ or two sets of population data. It is an online statistics and probability tool that takes two random samples $X$ and $Y$, or two sets of population data, as input. In other words, it measures how strong the linear relationship between the two data sets is and in which direction it runs.
To use the calculator, follow these steps:

  1. Enter two samples $X$ and $Y$ (observed values) in the boxes. These values must be real numbers or variables, separated by commas. The values can be copied from a text document or a spreadsheet.
  2. Press the "GENERATE WORK" button to make the computation.
  3. The correlation coefficient calculator will give the linear correlation between the data sets.
Input : Two comma-separated lists of real numbers
Output : A real number

The correlation coefficient calculator shows the stepwise procedure and gives insight into every step of the calculation. Before deriving the final correlation coefficient, it calculates the mean and the standard deviation of each of the two data sets. These means and standard deviations can also be useful when solving further problems.

Correlation Coefficient Formula :
Sample Correlation Coefficient Formulas:
Sample correlation coefficient of $X$ and $Y$ is determined by the formula
$$\begin{align} r_{XY}&=\frac{1}{n-1}\sum_{i=1}^n\frac{(x_i-\bar{X})(y_i-\bar{Y})}{s_Xs_Y}\\\\ &=\frac{\sum_{i=1}^n(x_i-\bar{X})(y_i-\bar{Y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{X})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{Y})^2}}\end{align}$$
where $s_X$ and $s_Y$ are the sample standard deviations and $\bar{X}$ and $\bar{Y}$ are the sample means.

Population Correlation Coefficient Formula:
Population correlation coefficient of $X$ and $Y$ is determined by the formula
$$\begin{align} \rho_{XY}&=\frac{1}{N}\sum_{i=1}^N\frac{(x_i-\mu_X)(y_i-\mu_Y)}{\sigma_X\sigma_Y}\end{align}$$
where $\sigma_X$ and $\sigma_Y$ are the population standard deviations and $\mu_X$ and $\mu_Y$ are the population means.

What is Correlation Coefficient?

Let us consider two variables,

$$X=(x_1,\ldots,x_n)\quad \mbox{and}\quad Y=(y_1,\ldots, y_n)$$
If high values of $X$ are associated with high values of $Y$, then a positive correlation exists. If high values of $X$ are associated with low values of $Y$, then a negative correlation exists. These correlations can be read off scatter plots. A scatter plot is a graph that uses Cartesian coordinates to show values for two variables of a data set.
A correlation coefficient, usually denoted by $r_{XY}$, measures how close a set of data points is to being linear. In other words, it measures the degree of dependence or linear correlation (statistical relationship) between two random samples or two sets of population data. The correlation coefficient takes values between $-1$ and $1$.
For example, in the first picture, $r_{XY}=1$, and the data points lie on a line with positive slope. In the second picture, $r_{XY}=-1$, and the data points lie on a line with negative slope.
[Figures: correlation coefficient with positive slope; correlation coefficient with negative slope]
Two special cases are worth noting:
  • If $|r_{XY}|=1$, there is perfect linear correlation between $X$ and $Y$;
  • If $r_{XY}=0$, there is no linear correlation between $X$ and $Y$.
The greater the absolute value of the coefficient, the stronger the linear correlation between data sets $X$ and $Y$; the smaller the absolute value, the weaker the linear correlation. Coefficients with equal absolute values indicate the same strength.
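
The extreme values can be checked numerically. The sketch below uses a small helper, `pearson_r`, written here for illustration (it is not part of the calculator), and three made-up data sets: points on a rising line, points on a falling line, and points on a parabola. The parabola case illustrates that a coefficient of zero only rules out a linear relationship, since $Y$ there is still fully determined by $X$.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
rising = [2 * a + 1 for a in xs]      # points on a line with positive slope
falling = [-3 * a + 10 for a in xs]   # points on a line with negative slope
parabola = [a * a for a in xs]        # symmetric curve, no linear trend

r_rising = pearson_r(xs, rising)      # approximately  1
r_falling = pearson_r(xs, falling)    # approximately -1
r_parabola = pearson_r(xs, parabola)  # approximately  0

print(r_rising, r_falling, r_parabola)
```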

How to Calculate Correlation Coefficient?

Let $X=(x_1,\ldots,x_n)$ and $Y=(y_1,\ldots, y_n)$ be samples of $n$ outcomes. The means of these samples are

$$\bar {X} =\frac{x_1+ \ldots+x_n}{n}\quad \mbox{and}\quad \bar{Y} =\frac{y_1+ \ldots+y_n}{n}$$
The sample standard deviations of these samples are
$$s_X=\sqrt{\frac1{n-1} \sum_{i=1}^n(x_i-\bar{X})^2}\quad \mbox{and}\quad s_Y=\sqrt{\frac1{n-1} \sum_{i=1}^n(y_i-\bar{Y})^2} $$
A correlation coefficient of $X$ and $Y$ is determined by the formula
$$\begin{align} r_{XY}&=\frac{1}{n-1}\sum_{i=1}^n\frac{(x_i-\bar{X})(y_i-\bar{Y})}{s_Xs_Y}\\ &=\frac{\sum_{i=1}^n(x_i-\bar{X})(y_i-\bar{Y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{X})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{Y})^2}}\end{align}$$
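The factor $n-1$ drops out of the coefficient once the definitions of $s_X$ and $s_Y$ are substituted. A brief sketch of that step, using only the definitions given above:
$$\begin{align} s_Xs_Y&=\frac{1}{n-1}\sqrt{\sum_{i=1}^n(x_i-\bar{X})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{Y})^2}\\ r_{XY}&=\frac{1}{n-1}\cdot\frac{\sum_{i=1}^n(x_i-\bar{X})(y_i-\bar{Y})}{s_Xs_Y}=\frac{\sum_{i=1}^n(x_i-\bar{X})(y_i-\bar{Y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{X})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{Y})^2}}\end{align}$$
The same cancellation happens with $N$ when the standard deviations are computed by dividing by $N$, so for a given data set both normalizations lead to the same value of the coefficient.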
Sample data are used to find the sample correlation coefficient. If we have data for the entire population, we can find the population correlation coefficient, usually denoted by $\rho_{XY}$, which is defined by the following equation
$$\begin{align} \rho_{XY}&=\frac{1}{N}\sum_{i=1}^N\frac{(x_i-\mu_X)(y_i-\mu_Y)}{\sigma_X\sigma_Y}\end{align}$$
where $\sigma_X$ and $\sigma_Y$ are the population standard deviations and $\mu_X$ and $\mu_Y$ are the population means.
To find the sample correlation coefficient, follow these steps:
  1. Find the sample mean $\bar{X}$ for data set $X$;
  2. Find the sample mean $\bar{Y}$ for data set $Y$;
  3. Find the sample standard deviation $s_X$ for data set $X$;
  4. Find the sample standard deviation $s_Y$ for data set $Y$;
  5. Substitute values in the formula for correlation coefficient to get the result.
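
The five steps above can be sketched in Python with the standard `statistics` module. The function name `sample_correlation` is chosen here for illustration. `statistics.stdev` computes the sample standard deviation (dividing by $n-1$), which matches $s_X$ and $s_Y$ as defined above. The sketch assumes both lists have the same length, at least two values, and are not constant (otherwise a standard deviation of zero would make the division fail).

```python
import statistics

def sample_correlation(xs, ys):
    """Sample correlation coefficient, following the five steps in order."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)

    x_bar = statistics.mean(xs)   # step 1: sample mean of X
    y_bar = statistics.mean(ys)   # step 2: sample mean of Y
    s_x = statistics.stdev(xs)    # step 3: sample standard deviation of X
    s_y = statistics.stdev(ys)    # step 4: sample standard deviation of Y

    # Step 5: substitute into the formula for the correlation coefficient.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

r = sample_correlation([1, 2, 4, 5, 8], [5, 20, 40, 80, 100])
print(round(r, 4))  # prints: 0.9684
```

For the example data this agrees with the value obtained in the work with steps, even though that work divides by the number of elements rather than by $n-1$.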
In many cases, we can calculate the correlation coefficient by hand, especially for small data sets. But if we have a large set of data or we want to avoid arithmetic errors, then we should use the correlation coefficient calculator.
The work with steps shows the complete step-by-step calculation of the correlation coefficient of the two samples $X: 1,2,4,5,8$ and $Y: 5,20,40,80,100$ using the tabular method. The worked steps use the population formulas (dividing by the number of elements); the resulting coefficient is the same as with the sample formulas. For any other samples, just supply two lists of numbers and click on the "GENERATE WORK" button. Grade school students may use this calculator to generate the work, verify results derived by hand, or do their homework problems efficiently.

Practice Problems for Correlation Coefficient

The correlation coefficient is useful in finance, for example in determining how well a mutual fund performs relative to its benchmark index or to another fund. Practice problems on the correlation coefficient are provided using data from statistical simulations as well as real data.

Practice Problem 1: Find the correlation coefficient of the data in the table which shows the relationship between temperature and the weakness felt in various extremities.

| Body temperature | Number of extremities |
|---|---|
| $38.2^\circ$ | $4$ |
| $37.5^\circ$ | $7$ |
| $37.9^\circ$ | $6$ |
| $39.2^\circ$ | $10$ |
| $40^\circ$ | $12$ |
| $36.9^\circ$ | $2$ |
| $39.1^\circ$ | $5$ |
Practice Problem 2 : Find the correlation coefficient of the number of borrowed geometry and statistics books in week five from Monday to Friday.
| Day | Geometry | Statistics |
|---|---|---|
| Monday | 134 | 231 |
| Tuesday | 156 | 127 |
| Wednesday | 234 | 276 |
| Thursday | 214 | 265 |
| Friday | 301 | 124 |

The correlation coefficient calculator, formula, work with steps (tabular method) and practice problems are intended to help grade school students in K-12 education learn what the correlation coefficient of a data set is in statistics and probability, and how to find it. Its applications in real life are of great significance.