
# Correlation

Robert Washbourne - a year ago - coding, correlation, numpy

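Correlation, as used here, is a measure of similarity between two arrays of equal length. (This is the uncentered form, also known as cosine similarity; Pearson correlation additionally subtracts each list's mean first.) The formula is this, where n is the length of the lists and k runs over the indices 0 to n-1:

```
Sum_k(0,n-1)(a_k*b_k)/sqrt(Sum_k(0,n-1)(a_k*a_k)*Sum_k(0,n-1)(b_k*b_k))
```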

Thus, for the lists [1,2,3] and [2,5,1], we have

```
(1*2+2*5+3*1)/sqrt((1*1+2*2+3*3)*(2*2+5*5+1*1)) = 15/sqrt(420) ≈ 0.732
```
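As a quick check of the arithmetic above, here is a minimal sketch that evaluates the formula step by step for the same two lists. The variable names (`a`, `b`, `numerator`, and so on) are illustrative choices, and `math.sqrt` stands in for the square root.

```python
import math

a = [1, 2, 3]
b = [2, 5, 1]

numerator = sum(x * y for x, y in zip(a, b))   # 1*2 + 2*5 + 3*1 = 15
sum_sq_a = sum(x * x for x in a)               # 1 + 4 + 9 = 14
sum_sq_b = sum(y * y for y in b)               # 4 + 25 + 1 = 30

value = numerator / math.sqrt(sum_sq_a * sum_sq_b)
print(round(value, 3))  # 0.732 (the unrounded value is about 0.7319)
```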

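The value always lies between -1 and 1, and the closer it is to 1, the more similar the lists are. A value of exactly 1 means that one list is a positive multiple of the other (for example [1,2,3] and [2,4,6]); identical lists are one such case. Now to write this in Python. We use `**0.5` to take the square root. First we need a product function.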

```
def prod2(list1, list2):
    return sum([x * y for x, y in zip(list1, list2)])
```
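To see what `prod2` computes, here is a small sketch that prints the intermediate list of products for the example lists before summing it. The names `a`, `b`, and `products` are illustrative only, and `prod2` is repeated so the snippet runs on its own.

```python
def prod2(list1, list2):
    return sum([x * y for x, y in zip(list1, list2)])

a = [1, 2, 3]
b = [2, 5, 1]

# The intermediate list that prod2 builds before summing
products = [x * y for x, y in zip(a, b)]
print(products)      # [2, 10, 3]
print(prod2(a, b))   # 15
print(prod2(a, a))   # 14, the sum of squares of a
```

Note that `zip` stops at the end of the shorter list, so lists of unequal length are silently truncated rather than rejected.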

`prod2` reads as follows: make a list by multiplying each element of `list1` by the corresponding element of `list2`, then sum that list and return the value. Now we use the correlation formula with this product function:

```
def corr(list1, list2):
    return prod2(list1, list2) / (prod2(list1, list1) * prod2(list2, list2)) ** 0.5
```
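A short usage sketch, assuming the two functions above. They are repeated here so the snippet runs on its own, and the extra test lists are made up for illustration.

```python
import math

def prod2(list1, list2):
    return sum([x * y for x, y in zip(list1, list2)])

def corr(list1, list2):
    return prod2(list1, list2) / (prod2(list1, list1) * prod2(list2, list2)) ** 0.5

# The worked example from earlier in the post
example = corr([1, 2, 3], [2, 5, 1])
print(round(example, 3))  # 0.732

# A positive multiple of a list gives 1 (compared with a tolerance for floats)
scaled = corr([1, 2, 3], [2, 4, 6])
print(math.isclose(scaled, 1.0))  # True

# Lists whose nonzero entries never line up give 0
disjoint = corr([1, 0], [0, 1])
print(disjoint)  # 0.0
```

One caveat: if either list is all zeros, the denominator is zero and `corr` raises `ZeroDivisionError`, so callers may want to check for that case first.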