Correlation

Robert Washbourne - a year ago - coding, correlation, numpy

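Correlation is a measure of similarity between two arrays of equal length. The version used here does not subtract the means first (it is also known as cosine similarity). The formula is this, where n is the length of the lists: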

Sum_k(0,n-1)(a_k*b_k) / sqrt(Sum_k(0,n-1)(a_k*a_k) * Sum_k(0,n-1)(b_k*b_k))

For example, for the lists [1,2,3] and [2,5,1], we have

(1*2+2*5+3*1)/sqrt((1*1+2*2+3*3)*(2*2+5*5+1*1)) = 15/sqrt(14*30) ≈ 0.732
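
As a quick check of the arithmetic above, here is a minimal sketch that computes the three sums separately before combining them. The variable names (a, b, numerator, sum_a, sum_b) are chosen here for illustration only.

```python
a = [1, 2, 3]
b = [2, 5, 1]

# Numerator: sum of the element-wise products a_k * b_k.
numerator = sum(x * y for x, y in zip(a, b))

# Denominator terms: each list multiplied with itself and summed.
sum_a = sum(x * x for x in a)
sum_b = sum(y * y for y in b)

# Combine, using ** 0.5 for the square root.
value = numerator / (sum_a * sum_b) ** 0.5

print(numerator, sum_a, sum_b)  # 15 14 30
print(round(value, 3))          # 0.732
```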

The closer the value is to 1, the more similar the lists are. If the correlation is exactly 1, one list is a positive multiple of the other, for example [1,2,3] and [2,4,6]; identical lists are just the special case where the multiple is 1. Now to write this in Python. We use **0.5 to take the square root. First we need a product function.

def prod2(list1, list2):
    return sum(x * y for x, y in zip(list1, list2))

This reads: multiply each element of list1 by the corresponding element of list2, then sum those products and return the value.

Now we use the correlation formula with this product function:

def corr(list1, list2):
    return prod2(list1, list2) / (prod2(list1, list1) * prod2(list2, list2)) ** 0.5
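
To try the function out, the sketch below repeats the two definitions so that it runs on its own, then applies corr to the example lists from earlier. The extra test lists ([2,4,6] and [-1,-2,-3]) and the variable names example, scaled, and flipped are made up here for illustration and are not from the original post.

```python
def prod2(list1, list2):
    # Sum of the element-wise products of the two lists.
    return sum(x * y for x, y in zip(list1, list2))


def corr(list1, list2):
    # Product of the two lists divided by the square root of the
    # product of each list with itself.
    return prod2(list1, list2) / (prod2(list1, list1) * prod2(list2, list2)) ** 0.5


example = corr([1, 2, 3], [2, 5, 1])     # the worked example, about 0.732
scaled = corr([1, 2, 3], [2, 4, 6])      # second list is twice the first, about 1
flipped = corr([1, 2, 3], [-1, -2, -3])  # signs reversed, about -1

print(round(example, 3), round(scaled, 3), round(flipped, 3))
```

One caveat: if either list contains only zeros, the denominator is zero and corr raises ZeroDivisionError, so a guard may be worth adding before using it on real data.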