Chapter 1 Introduction
1.1 What to expect in each chapter
This textbook is intended as applied, in that a primary goal is to help the scientist-practitioner-advocate use a variety of statistics in research problems and writing them up for a program evaluation, dissertation, or journal article. In support of that goal, I try to provide just enough conceptual information so that the researcher can select the appropriate statistic (i.e., distinguishing between when ANOVA is appropriate and when regression is appropriate) and assign variables to their proper role (e.g., covariate, moderator, mediator).
This conceptual approach does include occasional, step-by-step, hand-calculations (only we calculate them arithmetically in R) to provide a visceral feeling of what is happening within the statistical algorithm that may be invisible to the researcher. Additionally, the conceptual review includes a review of the assumptions about the characteristics of the data and research design that are required for the statistic. Statistics can be daunting, so I have worked hard to establish a workflow through each analysis. When possible, I include a flowchart that is referenced frequently in each chapter and assists the the researcher keep track of their place in the many steps and choices that accompany even the simplest of analyses.
As with many statistics texts, each chapter includes a research vignette. Somewhat unique to this resource is that the vignettes are selected from recently published articles. Each vignette is chosen with the intent to meet as many of the following criteria as possible:
- the statistic that is the focus of the chapter was properly used in the article,
- the author’s identity is from a group where scholarship is historically marginalized (e.g., BIPOC, LGBTQ+, LMIC [low middle income countries]),
- the research has a justice, equity, inclusion, diversity, and social responsivity focus and will contribute positively to a social justice pedagogy, and
- the data is available in a repository or there is sufficient information in the article to simulate the data for the chapter example(s) and practice problem(s).
In each chapter we employ R packages that will efficiently calculate the statistic and the dashboard of metrics (e.g., effect sizes, confidence intervals) that are typically reported in psychological science.
1.2 Strategies for Accessing and Using this OER
There are a number of ways you can access this resource. You may wish to try several strategies and then select which works best for you. I demonstrate these in the screencast that accompanies this chapter.
- Simply follow along in the .html formatted document that is available on via GitHub Pages, and then
- open a fresh .rmd file of your own, copying (or retyping) the script and running it
- Locate the original documents at the GitHub repository . You can
- open them to simply take note of the “behind the scenes” script
- copy/download individual documents that are of interest to you
- fork a copy of the entire project to your own GitHub site and further download it (in its entirety) to your personal workspace. The GitHub Desktop app makes this easy!
- Listen to the accompanying lectures (I think sound best when the speed is 1.75). The lectures are being recorded in Panopto and should include the closed captioning.
- Provide feedback to me! If you fork a copy to your own GitHub repository, you can
- open up an editing tool and mark up the document with your edits,
- start a discussion by leaving comments/questions, and then
- sending them back to me by committing and saving. I get an e-mail notiying me of this action. I can then review (accepting or rejecting) them and, if a discussion is appropriate, reply back to you.
1.3 If You are New to R
R can be oveRwhelming. Jumping right into advanced statistics might not be the easiest way to start. However, in these chapters, I provide complete code for every step of the process, starting with uploading the data. To help explain what R script is doing, I sometimes write it in the chapter text; sometimes leave hastagged-comments in the chunks; and, particularly in the accompanying screencasted lectures, try to take time to narrate what the R script is doing.
I’ve found that, somewhere on the internet, there’s almost always a solution to what I’m trying to do. I am frequently stuck and stumped and have spent hours searching the internet for even the tiniest of things. When you watch my videos, you may notice that in my R studio, there is a “scRiptuRe” file. I takes notes on the solutions and scripts here – using keywords that are meaningful to me so that when I need to repeat the task, I can hopefully search my own prior solutions and find a fix or a hint.
1.3.1 Base R
The base program is free and is available here: https://www.r-project.org/
Because R is already on my machine (and because the instructions are sufficient), I will not walk through the instllation, but I will point out a few things.
- Follow the instructions for your operating system (Mac, Windows, Linux)
- The “cran” (I think “cranium”) is the Comprehensive R Archive Network. In order for R to run on your computer, you have to choose a location. Because proximity is somewhat related to processing speed, select one that is geographically “close to you.”
- You will see the results of this download on your desktop (or elsewhere if you chose to not have it appear there) but you won’t ever use R through this platform.
1.3.2 R Studio
R Studio is the desktop application I work in R. It’s a separate download. Choose the free, desktop, option that is appropriate for your operating system: https://www.rstudio.com/products/RStudio/
- Upper right window: Includes several tabs; we frequently monitor the
- Environment: it lists the objects that are available to you (e.g., dataframes)
- Lower right window: has a number of helpful tabs.
- Files: Displays the file structure in your computer’s environment. Make it a practice to (a) organize your work in small folders and (b) navigating to that small folder that is holding your project when you are working on it.
- Packages: Lists the packages that have been installed. If you navigate to it, you can see if it is “on.” You can also access information about the package (e.g., available functions, examples of script used with the package) in this menu. This information opens in the Help window.
- Viewer and Plots are helpful, later, when we can simultaneously look at our output and still work on our script.
- Primary window
- R Studio runs in the background(in the console). Very occasionally, I can find useful troubleshooting information here.
- More commonly, I open my R Markdown document so that it takes the whole screen and I work directly, right here.
- R Markdown is the way that many analysts write script, conduct analyses, and even write up results. These are saved as .rmd files.
- In R Studio, open an R Markdown document through File/New File/R Markdown
- Specify the details of your document (title, author, desired ouput)
- In a separate step, SAVE this document (File/Save] into a NEW FILE FOLDER that will contain anything else you need for your project (e.g., the data).
- Packages are at the heart of working in R. Installing and activating packages require writing script.
1.3.3 R Hygiene
Many initial problems in R can be solved with good R hygiene. Here are some suggestions for basic practices. It can be tempting to “skip this.” However, in the first few weeks of class, these are the solutions I am presenting to my students.
1.3.3.1 Everything is documented in the .rmd file
Although others do it differently, everything is in my .rmd file. That is, for uploading data and opening packages I write the code in my .rmd file. Why? Because when I read about what I did hours or years later, I have a permanent record of very critical things like (a) where my data is located, (b) what version I was using, and (c) what package was associated with the functions.
1.3.3.2 File organization
File organization is a critical key to this:
- Create a project file folder.
- Put the data file in it.
- Open an R Markdown file.
- Save it in the same file folder.
- When your data and .rmd files are in the same folder (not your desktop, but a shared folder), they can be connected.
1.3.3.3 Chunks
The R Markdown document is an incredible tool for integrating text, tables, and analyses. This entire OER is written in R Markdown. A central feature of this is “chunks.”
The easiest way to insert a chunk is to use the INSERT/R command at the top of this editor box. You can also insert a chunk with the keyboard shortcut: CTRL/ALT/i
“Chunks” start and end with with those three tic marks and will show up in a shaded box, like this:
#hashtags let me write comments to remind myself what I did
#here I am simply demonstrating arithmetic (but I would normally be running code)
2021 - 1966
## [1] 55
Each chunk must open and close. If one or more of your tic marks get deleted, your chunk won’t be read as such and your script will not run. The only thing in the chunks should be script for running R; you can hashtag-out script so it won’t run.
Although unnecessary, you can add a brief title for the chunk in the opening row, after the “r.” These create something of a table of contents of all the chunks – making it easier to find what you did. You can access them in the “Chunks” tab at the bottom left of R Studio. If you wish to knit a document, you cannot have identical chunk titles.
You can put almost anything you want in the space outside of tics. Syntax for simple formatting in the text areas (e.g,. using italics, making headings, bold, etc.) is found here: https://rmarkdown.rstudio.com/authoring_basics.html
1.3.3.4 Packages
As scientist-practitioners (and not coders), we will rely on packages to do our work for us. At first you may feel overwhelmed about the large number of packages that are available. Soon, though, you will become accustomed to the ones most applicable to our work (e.g., psych, tidyverse, lavaan, apaTables).
Researchers treat packages differently. In these lectures, I list all the packages we will use in an opening chunk that asks R to check to see if the package is installed, and if not, installs it.
## Loading required package: psych
## Warning: package 'psych' was built under R version 4.0.5
To make a package operable, you need to open it through the library. This process must be repeated each time you restart R. I don’t open the package (through the “library(package_name)”) command until it is time to use it. Especially for new users, I think it’s important to connect the functions with the specific packages.
If you type in your own “install.packages” code, hashtag it out once it’s been installed. It is problematic to continue to re-run this code .
1.3.3.5 Knitting
An incredible feature of R Markdown is its capacity to knit to HTML, powerpoint, or word. If you access the .rmd files for this OER, you can use annotate or revise them to suit your purposes. If you redistribute them, though, please honor the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License with a citation.
1.3.4 tRoubleshooting in R maRkdown
Hiccups are normal. Here are some ideas that I have found useful in getting unstuck.
- In an R script, you must have everything in order – Every. Single. Time.
- All the packages have to be in your library and activated; if you restart R, you need to reload each package.
- If you open an .rmd file and want a boxplot, you cannot just scroll down to that script. You need to run any prerequisite script (like loading the package, importing data, putting the data in the global environment, etc.)
- Do you feel lost? clear your global environment (broom) and start at the top of the R script. Frequent, fresh starts are good.
- Your .rmd file and your data need to be stored in the same file folder. These should be separate for separate projects, no matter how small.
- Type any warnings you get into a search engine. Odds are, you’ll get some decent hints in a manner of seconds. Especially at first, these are common errors:
- The package isn’t loaded (if you restarted R, you need to reload your packages)
- The .rmd file has been saved yet, or isn’t saved in the same folder as the data
- Errors of punctuation or spelling
- Restart R (it’s quick – not like restarting your computer)
- If you receive an error indicating that a function isn’t working or recognized, and you have loaded the package, type the name of the package in front of the function with two colons (e.g., psych::describe(df). If multiple packages are loaded with functions that have the same name, R can get confused.
1.3.5 stRategies for success
- Engage with R, but don’t let it overwhelm you.
- The mechanical is also the conceptual. Especially when it is simpler, do try to retype the script into your own .rmd file and run it. Track down the errors you are making and fix them.
- If this stresses you out, move to simply copying the code into the .rmd file and running it. If you continue to have errors, you may have violated one of the best practices above (Is the package loaded? Are the data and .rmd files in the same place? Is all the prerequisite script run?).
- Still overwhelmed? Keep moving forward by downloading a copy of the .rmd file that accompanies any given chapter and just “run it along” with the lecture. Spend your mental power trying to understand what each piece does. Then select a practice problem that is appropriate for your next level of growth.
- Copy script that works elsewhere and replace it with your datafile, variables, etc.
- The leaRning curve is steep, but not impossible. Gladwell(2008) reminds us that it takes about 10,000 hours to get GREAT at something (2,000 to get reasonably competent). Practice. Practice. Practice.
- Updates to R, R Studio, and the packages are NECESSARY, but can also be problematic. It could very well be that updates cause programs/script to fail (e.g., “X has been deprecated for version X.XX”). Moreover, this very well could have happened between my distribution of these resources and your attempt to use it. My personal practice is to update R, R Studio, and the packages a week or two before each academic term.
- Embrace your downward dog. Also, walk away, then come back.
1.3.6 Resources for getting staRted
R for Data Science: https://r4ds.had.co.nz/
R Cookbook: http://shop.oreilly.com/product/9780596809164.do
R Markdown homepage with tutorials: https://rmarkdown.rstudio.com/index.html
R has cheatsheets for everything, here’s one for R Markdown: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
R Markdown Reference guide: https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf
Using R Markdown for writing reproducible scientific papers: https://libscie.github.io/rmarkdown-workshop/handout.html
LaTeX equation editor: https://www.codecogs.com/latex/eqneditor.php
References
Gladwell, M. (2008). Outliers: The story of success (First edition.). Little, Brown; Company.