Re-Coding Our Data Files

By Katie Petrinec

Let me begin with where we left off in the last post.

After discovering that our dataset contained turbidity values exceeding 1000 NTU, we realized that further steps were needed to prepare the data before we could begin any sort of analysis. If we were noticing these patterns in the turbidity data, what about the other parameters? It seems that the “Historic” coding was keeping all the data and not giving us the ability to choose whether to include these “suspect/anomalous” values in our analysis.


How do I know what data to edit?

The first step in the re-coding process was to track down all the “suspect/anomalous” data that were included in the dataset.  How can I track down all the “suspect” data?  That’s easy!  It’s included in our (often overlooked but super important) metadata document.

The metadata document contains valuable information about our dataset, and the good news is that it is included when you download data files from the CDMO (www.nerrsdata.org). So don’t overlook it: it provides information about each instrument deployment, how the instruments performed upon retrieval from the field, and a “suspect/anomalous” data section (nowadays referred to as the See Metadata [GSM] (CSM) section).

fig1
An excerpt of the suspect/anomalous data section taken from our 2006 metadata document

So, armed with the handy-dandy “suspect/anomalous” metadata sections and the 2003 – 2006 data files, I began the arduous process of re-coding.


How I edited the files

Good news!  The historic data files (pre-2007) already have the flag columns included, but if you remember from the last post, the flag columns only contain the <4> flag.  Rather than changing all the <4> values to <0> (meaning good data), I left the <4> flags in the data file and only added suspect (<1>) flags.  Reading through the metadata “suspect/anomalous” data sections line by line, I tracked down the corresponding records in the data file and applied the suspect flags following the CDMO’s current data management practices.
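I did this step by hand, but for anyone who would rather script it, here is a minimal Python sketch of the same idea. It is not what I actually ran. The column names (DateTimeStamp, Turb, F_Turb), the timestamp format, and the sample rows are assumptions made up for illustration, and the date ranges would still have to be typed in by hand from the metadata document.

```python
from datetime import datetime

# Assumed layout: one timestamp column plus a value/flag column pair
# per parameter. Names and timestamp format are illustrative only.
TIME_COL = "DateTimeStamp"
TIME_FMT = "%m/%d/%Y %H:%M"


def apply_suspect_flags(rows, suspect_ranges):
    """Set the flag to <1> for every record inside a suspect range.

    rows: list of dicts, one per record in the data file.
    suspect_ranges: list of (flag_column, start, end) tuples transcribed
        from the metadata; start and end are inclusive datetimes.
    Returns the number of flags that were changed.
    """
    changed = 0
    for row in rows:
        stamp = datetime.strptime(row[TIME_COL], TIME_FMT)
        for flag_col, start, end in suspect_ranges:
            if start <= stamp <= end and row[flag_col] != "<1>":
                row[flag_col] = "<1>"
                changed += 1
    return changed


# Made-up records, not real data from our files.
rows = [
    {"DateTimeStamp": "06/01/2006 00:00", "Turb": "12", "F_Turb": "<4>"},
    {"DateTimeStamp": "06/01/2006 00:30", "Turb": "1450", "F_Turb": "<4>"},
    {"DateTimeStamp": "06/01/2006 01:00", "Turb": "15", "F_Turb": "<4>"},
]
# One hypothetical metadata entry: turbidity suspect at 00:30 only.
suspect_ranges = [
    ("F_Turb", datetime(2006, 6, 1, 0, 30), datetime(2006, 6, 1, 0, 30)),
]
changed = apply_suspect_flags(rows, suspect_ranges)
```

Only the flag column changes; the data values themselves are left alone, so the suspect readings can still be included or excluded later.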

Tip: Don’t leave the <4> historic flags in the data files.  Go ahead and change them all to <0> flags.  After our 10-year report is complete, I am going back to change all the <4> flags.  Unfortunately, that was one of those hindsight discoveries.
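If the files are already loaded as rows, that cleanup could look something like the Python sketch below. It assumes every flag column name starts with "F_" and that a historic flag is stored as exactly <4>; both are assumptions for illustration, so check them against your own files first.

```python
def clear_historic_flags(rows, flag_prefix="F_"):
    """Change every <4> (historic) flag to <0> in the flag columns.

    rows: list of dicts, one per record.
    flag_prefix: assumed naming convention for flag columns.
    Returns the number of flags that were changed.
    """
    changed = 0
    for row in rows:
        for col in list(row):
            if col.startswith(flag_prefix) and row[col].strip() == "<4>":
                row[col] = "<0>"
                changed += 1
    return changed


# Made-up records: one still historic, one already marked suspect.
example_rows = [
    {"DateTimeStamp": "06/01/2006 00:00", "Turb": "12", "F_Turb": "<4>"},
    {"DateTimeStamp": "06/01/2006 00:30", "Turb": "1450", "F_Turb": "<1>"},
]
n_cleared = clear_historic_flags(example_rows)
```

Suspect (<1>) flags that were already added are not touched, so this can be run after the re-coding is done.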

fig2
Example of the edited data file

After I finished editing each annual file, I re-saved it as a .csv (comma-delimited) file and placed all the files into the same folder.

Hiccup: I ran into a problem with merging the edited historic files and the non-edited data files together.  After opening the non-edited data files in Notepad, I discovered that the formatting of those .csv files was different.  I opened all the post-2006 files and re-saved them as .csv files in Excel, and then the files merged nicely.
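Re-saving in Excel worked for me, but the same normalizing step can be sketched in Python. This is only a sketch under assumptions: it guesses that the formatting differences were things like quoting, stray spaces, or line endings (which Python's csv module handles when reading), and that every file has the same columns in the same order. The function name merge_csv_files is made up for this example.

```python
import csv
import glob
import os


def merge_csv_files(folder, out_path):
    """Merge every .csv file in folder into one uniformly formatted file.

    Each file is read with the csv module and written back out with a
    single consistent dialect. Returns the number of data rows written.
    """
    header = None
    n_rows = 0
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            # Skip the output file if it lives in the same folder.
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue
            with open(path, newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to merge
                file_header = [name.strip() for name in file_header]
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path}: columns differ from first file")
                for record in reader:
                    if record:  # ignore blank lines
                        writer.writerow([field.strip() for field in record])
                        n_rows += 1
    return n_rows
```

Calling merge_csv_files with the folder of annual files and an output path writes one merged file. It stops with an error if a file's header does not match the first one, which is a quick way to find the file that is formatted differently.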

All the historic data files are now edited.  What’s next?  Merging the files in R… huh?

What’s R?
