Let’s recap a little…
If you remember from the last post, we left off editing the historic data files (all files prior to 2007). Using the metadata documents, we recoded all the 2003–2006 data files and added the suspect data flag and the appropriate CDMO code to each data file. We then hit a little bit of a hiccup when merging the newly edited files with the unedited files (2007–2012). After far too long trying to figure out the problem, I opened the unedited files in Notepad and discovered that their .csv formatting was different. I then opened all the post-2006 files, re-saved them as .csv files in Excel, and the files merged nicely in R.
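The exact formatting difference isn't something I can pin down here. Common culprits are line endings (Windows CRLF, Unix LF, or old Mac-style bare CR), inconsistent quoting, and a byte-order mark at the start of the file. Assuming it was one of those, re-saving every file in one consistent format can also be scripted. Below is a minimal sketch in Python using only the standard csv module. The function names normalize_csv_text and normalize_csv_file are hypothetical, and this is not what was done in this project (there the files were re-saved by hand in Excel).

```python
import csv
import io


def normalize_csv_text(text):
    """Re-emit CSV text in one consistent format.

    Accepts LF, CRLF, or bare CR line endings (the csv module recognizes
    all three when the input is read with newline=""), drops blank rows,
    and writes every row back with the csv module's default dialect:
    comma-delimited, quoting only where needed, CRLF line endings.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out)
    for row in reader:
        if row:  # csv.reader yields an empty list for a blank line
            writer.writerow(row)
    return out.getvalue()


def normalize_csv_file(in_path, out_path, encoding="utf-8-sig"):
    """Rewrite one CSV file on disk in the normalized format.

    Assumes the input is UTF-8 or plain ASCII; "utf-8-sig" also strips a
    byte-order mark if one is present. Pass a different encoding otherwise.
    """
    with open(in_path, newline="", encoding=encoding) as f:
        text = f.read()
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        f.write(normalize_csv_text(text))
```

Running normalize_csv_file over every file before merging is a rough scripted stand-in for re-saving them in Excel, so all files share the same line endings and quoting.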
Okay, so what is R?
R is a free, open-source programming language that can be used to visualize our SWMP data and perform statistical analyses. R is well suited to large, complex data sets like our SWMP data, especially when analyzing multiple years of data.
Have you ever reached the end of an Excel spreadsheet? I have! At 1,048,576 rows!
Not only did I reach Excel's limit, but it then took a long time to produce a simple graph. R cuts out that long processing time, lets you run powerful analyses quickly, and is free (probably its biggest appeal).
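To put that row limit in perspective, here is a back-of-the-envelope calculation in Python. It assumes one row per reading at a 15-minute logging interval and a 365-day year; the station count is a hypothetical parameter, not a figure from this project.

```python
EXCEL_MAX_ROWS = 2 ** 20  # 1,048,576 rows per worksheet (Excel 2007 and later)


def rows_per_year(interval_minutes=15, days=365):
    """Rows one station produces in a year at one reading per interval."""
    readings_per_day = 24 * 60 // interval_minutes
    return readings_per_day * days


def full_years_that_fit(interval_minutes=15, stations=1):
    """Complete years of data that fit on one worksheet.

    One row is reserved for the column header.
    """
    rows_needed_per_year = rows_per_year(interval_minutes) * stations
    return (EXCEL_MAX_ROWS - 1) // rows_needed_per_year


print(rows_per_year())                  # 35040
print(full_years_that_fit())            # 29
print(full_years_that_fit(stations=4))  # 7
```

Under those assumptions one station's 15-minute record fills about 35,000 rows a year, so a single worksheet holds 29 full years for one station but only 7 for four stations combined, which is not enough for ten years (2003–2012) of data.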
Another great aspect of R is that it bundles shareable code into easy-to-use ‘packages’. Packages are great! There is a huge variety of them, and if you want to perform a specific task with your data, chances are someone else has already developed a package for it.
Hats off to Marcus Beck for creating an awesome R package for analyzing our SWMP data. This package is a great resource and tool for anyone and everyone using SWMP data. The package consists of several functions used to retrieve, organize, and analyze SWMP data. The only catch is that the data files need to be in the same format used by the Centralized Data Management Office (CDMO).
Tip: If you download your data files directly from the CDMO (www.nerrsdata.org), make sure you remember which type of data export you used: the Data Export System or the Advanced Query System. The two produce files with different headers, which matters for future downloads. Let’s say you are only interested in analyzing data from 2001–2016 now, but next year you want to add the 2017 data file. You need to remember which type of export you used so the files merge cleanly.
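Before adding a new year's download to an existing set, one quick check is to compare the headers of all the files. Here is a minimal Python sketch, assuming the column names sit on the first line of each file (raise n_lines if an export includes extra header lines). The function names header_of and group_by_header are hypothetical; this is not part of SWMPr and doesn't know the specific differences between the two export systems, it only reports whether files match.

```python
import csv
from collections import defaultdict
from itertools import islice


def header_of(lines, n_lines=1):
    """Return the first n_lines CSV rows as a hashable tuple of tuples.

    `lines` can be an open file (opened with newline="") or any
    iterable of strings.
    """
    return tuple(tuple(row) for row in islice(csv.reader(lines), n_lines))


def group_by_header(paths, n_lines=1, encoding="utf-8-sig"):
    """Map each distinct header to the list of files that use it."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, newline="", encoding=encoding) as f:
            groups[header_of(f, n_lines)].append(path)
    return dict(groups)
```

If group_by_header returns more than one key, the files do not share a header (for example, because they came from different export types) and should be reconciled before merging.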
For more information about the SWMPr package, cool widgets used to visualize SWMP data, and an awesome SWMP Forum, visit www.swmprats.net.
R is great; however, there are some negatives.
But, I’ll save that for next time…