Environmental Data Literacy

Developing quantitative analysis skills for data visualization, manipulation, analysis, and communication using the R statistical language.

Image: scatter plots of a dozen two-variable data sets that share the same mean, standard deviation, and correlation (to at least two decimal places), meant to show why data literacy is needed to differentiate between quantitatively similar content.
Semester course; 3 lecture hours. 3 credits. Enrollment is restricted to students with graduate standing or those with one course in statistics and permission of instructor. Develop quantitative skills for the visualization, manipulation, analysis, and communication of environmental "big data." This course focuses on spatial environmental data analysis, interpretation, and communication, using real-time data from the Rice Rivers Center and the R statistical analysis environment.

As both a student and instructor in statistics classes, I found I spent a vast amount of time and effort describing the characteristics of statistics (derivations, expectations, etc.). These skills do not translate into a skillset that allows you, as a graduate student, to apply statistical analyses to real-world data sets in an efficient manner. This course is not about statistics per se; it is about data: how to collect it, how to visualize it, how to work with and manipulate it, how to apply standard analysis models to it, and how to communicate about data.

Workflow in Data Analysis

Below is a brief graphical depiction of how analysis works in the real world. In this class, we will work on all of these components using the open-source R language.

  • Collect: Getting data from an external source into a format that you can use is often the most time-consuming step in the analysis. The content of this class will provide training in data import from local, online, and database sources.
  • Visualize: Visualizing data is key to understanding. In the scatter plot image at the top of this page, notice that the variables X and Y in all the displayed data sets have the same means, standard deviations, and correlations to two decimal places! We will emphasize visualization, both static and dynamic, throughout this class.
  • Transform: Pulling data into your analysis ecosystem is not sufficient. Often, the data needs to be reformatted and reconfigured before it is actually usable.
  • Model: The application of models to subsets of data is often the step that takes the least amount of time and effort. However, the application of a model to data is not the endpoint. The model must be visualized, and often, the underlying data or derived data must be transformed and submitted to subsequent models.
  • Communicate: The effort we put into research and analyses is meaningless without effective communication of our data and findings to a broad audience. Here, we will focus on developing effective data communication strategies and formats.
Schematic of a typical data analysis workflow: collection, visualization, transformation, modeling, and communication.
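As a minimal sketch of why the Visualize step matters, the R code below summarizes and then plots Anscombe's quartet, the `anscombe` data frame that ships with R. Using the quartet as a stand-in for the dozen data sets in the image is an assumption made for illustration, and the name `quartet_summary` is hypothetical.

```r
# Anscombe's quartet ships with R as the `anscombe` data frame
# (columns x1..x4 and y1..y4). It is used here only as a stand-in
# for the dozen data sets shown in the image.
data(anscombe)

# Compute the same summary statistics for each of the four x/y pairs.
quartet_summary <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set    = i,
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x   = sd(x),
    sd_y   = sd(y),
    cor_xy = cor(x, y)
  )
}))

# Print the summaries rounded to two decimal places.
print(round(quartet_summary, 2))

# Plot the four sets side by side to compare their shapes.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i))
}
par(op)
```

The summary rows agree closely, while the four panels look very different from one another, which is the point of visualizing before modeling.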

Course Learning Objectives

The purpose of this course is to help you build your data skills and to develop a foundational understanding upon which subsequent courses will build. The overarching goal is to develop a working knowledge of the R statistical computing language and sufficient proficiency to import raw data and then iterate through the visualization, manipulation, and analysis steps in creating output that is easily communicated to a scientific audience.

The content of this course is built upon the following general course learning objectives (CLOs):

  1. Reproducible Research: By the end of this course, students will demonstrate habits, knowledge, and toolsets that support robust data management methods and reproducible research associated with commonly used environmental analyses.
  2. Environmental Data Types: By the end of this course, students will be able to find, load, manipulate, and summarize common data types found in environmental and ecological studies.
  3. Environmental Analyses: By the end of this course, students will be able to identify, perform, and communicate appropriate statistical analyses for common data types found in environmental studies.

Course Content & Assessment

This course is designed as a sequence of individual, stand-alone modules. Each is self-contained and includes a lecture, slides, a larger narrative document, a video demonstration, and an assessment.

  1. Welcome & Logistics: Setting up the logistics for the class, installing R, RStudio, and Quarto on each of your machines, and providing a tour of the IDE.
  2. Git, GitHub, & Markdown: Establish a functional working knowledge of Git as a collaborative tool for reproducible research and begin working with Markdown as an output for data analysis.
  3. Data Types & Containers: Understanding the fundamental data types and containers within R and how to import, work with, and export raw data with ease.
  4. Tidyverse: Data manipulation. Like a boss.
  5. Graphics That DON’T Suck: Hello publication-quality graphics, using the grammar of graphics approach.
  6. Statistical Confidence: Basic understanding of statistical inference and the properties of sampled data.
  7. Binomial Inferences: Analyses based upon expectations.
  8. Categorical~f(Categorical): Contingency tables and categorical count data.
  9. Continuous~f(Categorical): Analysis of Variance (or equality of means).
  10. Continuous~f(Continuous): Correlation & Regression approaches.
  11. Categorical~f(Continuous): Logistic regression.
  12. Points, Lines, & Polygons: Spatial data in vector format.
  13. Raster Data: Continuously distributed spatial data.
  14. Spatial Analyses: Performing spatially explicit analyses of point processes and habitats.
  15. Raytracing: Higher-dimensional visualization of spatial extents.
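The response~f(predictor) notation in modules 8 through 11 mirrors R's formula syntax. As a rough sketch, each pairing can be matched to a standard base R function. The built-in `mtcars` data set, the variable choices, and the object names below are assumptions made for illustration, not necessarily what the modules themselves use.

```r
# Illustrative only: mtcars is a built-in data set chosen for convenience,
# not necessarily the data used in the course modules.
data(mtcars)
mt <- transform(mtcars,
                am  = factor(am),   # transmission type (0 or 1)
                vs  = factor(vs),   # engine shape (0 or 1)
                cyl = factor(cyl))  # number of cylinders

# Categorical ~ f(Categorical): test of independence on a contingency table
tab_fit <- chisq.test(table(mt$am, mt$vs))

# Continuous ~ f(Categorical): one-way analysis of variance
aov_fit <- aov(mpg ~ cyl, data = mt)

# Continuous ~ f(Continuous): correlation and simple linear regression
r_mpg_wt <- cor(mt$mpg, mt$wt)
lm_fit   <- lm(mpg ~ wt, data = mt)

# Categorical ~ f(Continuous): logistic regression
glm_fit <- glm(am ~ wt, data = mt, family = binomial)

# Inspect each fitted object.
print(tab_fit)
print(summary(aov_fit))
print(summary(lm_fit))
print(summary(glm_fit))
```

In each call, the type of the response and the type of the predictor determine which function applies, which is why the modules are organized by that pairing.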

Logistics

  • Course Instructor: Professor Rodney Dyer
  • Email: rjdyer@vcu.edu
  • Webpage: dyerlab.org.
  • Office Hours: Wednesdays from 10-11 am via Zoom or by appointment.
  • Meeting Times: T/R 12:30 - 13:45
  • Meeting Location: MCALC 1104
  • Final Exam Scheduled: Tuesday, December 9, 2023, 12:30 – 15:20.

Required Materials

This course requires that you bring your laptop or other computing device that is capable of running RStudio and the R statistical language. There is no required book, and all content is provided via online resources.

Assignments & Grading Policy

The grade for this course is based on the total points earned across all assignments, plus a single large data analysis project due at the end of the semester. This final project will account for 10% of your overall grade. Grades will be determined using the standard 10% scale:

  • A (>= 90%),
  • B (>= 80% & < 90%),
  • C (>= 70% & < 80%),
  • D (>= 60% & < 70%), and
  • F (< 60%).

All percentage cutoffs are firm, scores will be rounded to the nearest integer, and no extra credit will be given.

Late Policy

All of the content in this class is given as take-home assignments and tests. You will have a full seven days to complete and submit the work. The intention here is to provide you with more than sufficient time to complete the job because we do not rush data analysis. On the due date, I will post the answers so you can check your work. After the answers are posted, no points will be awarded for late work.

Attendance Policy

All content is provided in the form of slides, handouts, and video content. Much of the work in this class will be conducted during the in-class session. As such, you must attend class if you intend to receive the content. Data analysis is a hands-on experience, and the more you do it, the more efficient you will become.

Disclaimer

Note that the specifics of this Course Syllabus may be changed at any time during the semester. You will be responsible for abiding by any such changes that are communicated to you via email, course announcement, and/or posting in the course discussion forums.

VCU University Policies

Students should visit http://go.vcu.edu/syllabus and thoroughly review all of the listed syllabus statement information. The full university syllabus statement includes information such as safety, registration, the VCU Honor Code, student conduct, withdrawal from courses, and more.