Mike’s Data Science Tutorial For Chemists: Automating Code to Calculate Distance From Chemical Source
The amount of time I’m reminded of the importance of my high school trigonometry class in daily life has been surprisingly often. The other day, I was trying to create a supporting geochemical data visualization where I showcase the source chemical decreasing the further out from the release point you travel. Super easy task if you’re looking from point A to point B on a flat surface, however when dealing with points on a map you have to account for the geodesic distance. My stance is that the Earth is a globe, my apologies to the flat Earthers around the globe reading this (pun intended). However, what if you are dealt a big data set with hundreds to thousands of different data points all in latitude/longitude coordinate systems? Believe it or not this code is based on the fundamental knowledge that Sir James Inman came up with in 1835 when writing his book, “Astronomy for Sea-men”. This key theorem is called the Haversine.
The first concept that Sir James Inman introduced was that the Earth, by no surprise, is a big spherical object. If were to use traditional Pythagorean theorem (A2 + B2 = C2 sing it with me) you’ll notice that it becomes extremely inaccurate over large distances since we are travelling around a sphere and not a flat plane. However, we can substitute the traditional Pythagorean trigonometry for some spherical trigonometric functions. There are some things to think about: lines from point A to point B are actually circular and the angle between two lines is the angle between the planes of the corresponding great circles.
Did you know that there are more than 3 trigonometric functions other than sine, cosine, and tangent? Unless I was distracted in my grade school math class (very likely possibility) I though I knew all the different functions, however there are others that are now seemingly obsolete. These additional functions include versine, haversine, coversine, hacoversine, exsecant, and excosecant. The reason why we don’t see them often these days is that you can sub in our 3 familiar ones as an alternative. For example, haversine(θ) = sin²(θ/2). So why did we make them obsolete. Well, the times before came before calculators and directions were calculated by hang using log tables. If you ever go back in time, bring a calculator with you, and blow the minds of some sailors.
So, the formula is calculating the arc in the Figure below.
Let’s dive into the coding. You’ll notice how I don’t say whether to use R, Python or SAS, since the concept is all the same regardless of what you use. Since, I’m a spoiled chemist I’m using JMP to code since most of my files are in this format. The main concept to take away after reading this code is that it can be automated through a simple loop. For instance, try calculating all this by hand with thousands of samples, you’ll be in for a bad time!
Clear Globals(); Names Default To Here( 1 ); // reference the data table dt = current data table(); // introduce the column with the distance to be updated in the dots table. Feel free to change the name from "Distance" dt << add multiple columns( "Distance", 1, after( 85 ), numeric ); // extract the longitude and the latitude values from the table - as radians. Need to have your release point as "Reference" coordinates HHx = (Column( dt, "Reference Long" ) << getValues) * Pi() / 180 ; HHy = (Column( dt, "Reference Lat" ) << getValues) * Pi() / 180 ; PSx = (Column( dt, "Longitude" ) << getValues) * Pi() / 180 ; PSy = (Column( dt, "Latitude" ) << getValues) * Pi() / 180 ; // declare the calculation function in KMs. This is the haversine calculation! Where R is the mean radius of Earth haversine = Function( {long1, lat1, long2, lat2, R = 6371}, {Default Local}, a = Sin( (lat1 - lat2) / 2 ) ^ 2 + Cos( lat1 ) * Cos( lat2 ) * Sin( (long1 - long2) / 2 ) ^ 2; c = 2 * ATan( a ^ 0.5, (1 - a) ^ 0.5 ); d = (R * c)// KMS ); wait (0.01); // run the function along the rows of the table. Loops make life very very very easy. For( i = 1, i <= N Row( dt ) - 1, i++, dt:distance[i] = haversine( HHx[i], HHy[i], PSx[i], PSy[i] ) );
Always happy to help with any bugs in the code! Feel free to connect and we can chat more.