Sunday, June 17, 2018

Using R: consistent treatment of data in the lab


Geochemical data is often treated rather poorly: it starts as a clean Excel spreadsheet and ends as a mess of calculations and ugly plots. In this post I share what I have learned about manipulating data in R using the tidyverse packages. In particular, I focus on keeping the data clean, keeping the columns in good shape, and treating the data consistently.

For example, I have been working in a stable isotope lab for the last 4 years and have produced a lot of measurements. Having a lot of measurements compiled in data sets is great, but now I have a hard time treating the data consistently.

The main problem is the small day-to-day deviations in the data, which are monitored by analyzing standards. For that reason, each session includes a set of standards along with the unknowns. The difference between the measured and nominal values of the standards is subtracted from the rest of the data within the same analytical session. Because this has to be done for every analytical session, it is almost impossible to do in Excel.

TASK: adjust the values of unknowns based on the standards measured during different analytical sessions. Each session has a different date, which helps to automate the process.

DETAILS: a standard named UOG is measured for d18O within each analytical session. It should have a value of 6.52 per mil, but deviations of 0.5 - 1 per mil occur. The difference (excess or deficiency) between the measured value of UOG and 6.52 should be applied to the data set from that particular day.
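To make the arithmetic concrete, here is a minimal sketch for a single session in base R. Only the nominal value of 6.52 per mil comes from the text above; the measured values and the variable names are invented for illustration.

```r
# Hypothetical single-session example; all measured values are made up.
nominal_uog      <- 6.52                 # nominal d18O of UOG, per mil (from the text)
uog_measured     <- c(6.90, 7.00, 6.95)  # invented UOG measurements from one session
unknown_measured <- c(5.40, 12.10)       # invented unknowns from the same session

# Session offset: average measured standard minus its nominal value
# (about 0.43 per mil for these invented numbers)
offset <- mean(uog_measured) - nominal_uog

# Subtract the offset from every unknown measured in that session
unknown_corrected <- unknown_measured - offset
```

A positive offset means the standard read too high that day, so the unknowns are shifted down by the same amount; a negative offset shifts them up.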

LOGICAL APPROACH: compile all the data in a single file in which each column is a separate variable. At least three columns are needed: date of analytical session, sample name, and measured d18O value. I will instruct the machine to average the standards within each session and apply the difference between the nominal and average values to the rest of the data from that session.
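The logic above can be sketched with dplyr roughly as follows. This is an illustration under stated assumptions, not necessarily the exact code used for the lab data: the column names (date, sample, d18O), the sample labels other than UOG, and all the numbers are invented.

```r
library(dplyr)

# Invented example data: two sessions, each with UOG standards and unknowns.
# Column names and values are assumptions made for this sketch.
d18O_data <- data.frame(
  date   = as.Date(c("2018-01-10", "2018-01-10", "2018-01-10",
                     "2018-02-05", "2018-02-05", "2018-02-05")),
  sample = c("UOG", "UOG", "A-1", "UOG", "B-1", "B-2"),
  d18O   = c(6.90, 7.00, 5.40, 6.20, 8.00, 9.10),
  stringsAsFactors = FALSE
)

nominal_uog <- 6.52  # nominal d18O of UOG, per mil

corrected <- d18O_data %>%
  group_by(date) %>%                       # one group per analytical session
  mutate(
    # average of the UOG standards in this session minus the nominal value
    offset = mean(d18O[sample == "UOG"]) - nominal_uog,
    # subtract the session offset from every measurement in the session
    d18O_corrected = d18O - offset
  ) %>%
  ungroup()
```

Grouping by date keeps each session's correction separate, so no standard from one day leaks into another day's data. A session without any UOG measurement would get NaN as its offset, which makes missing standards easy to spot.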

COMPUTING APPROACH is provided below. The instructions are written as an R Markdown document. With some basic familiarity with R this should be easy to follow.