GIT for Scales Lab (PART I)
Part I of a how-to guide for integrating Git and Github with RStudio. Developed for Scales Lab at UniSC by Jessica Bolin jessica.bolin@research.usc.edu.au
GITTING started
Git is a version control system. Think of Git as being like Dropbox for your R scripts, with the added bonus of version control: you can track every change to your scripts and keep a detailed record of what has happened to your code over time. Git is a fantastic tool to ensure you never lose your work again, and to facilitate easy collaboration.
I’ve created this guide using my own knowledge, in addition to help from https://happygitwithr.com/index.html, https://inbo.github.io/git-course/course_rstudio.html#213_Create_a_branch_to_experiment and https://jennybc.github.io/2014-05-12-ubc/ubc-r/session03_git.html. If you run into issues, check out these websites for a likely solution.
Note: Ensure you have installed the latest versions of R and RStudio.
Step 1: Download
macOS
If you are using a new-ish machine with macOS, you’ll likely already have git installed by default. To check, open Terminal and type git version, and it will display the version you have installed. If it shows a version number, then great! You’re good to go!
If not, download Git from here: https://git-scm.com/download/mac and follow the prompts. I used the Homebrew option by running brew install git in the terminal.
Note that you will need Homebrew (https://brew.sh) installed for this to work. Homebrew is a software/package manager for macOS. If you haven’t already installed Homebrew, simply execute the following in the terminal and you’ll be good to go:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
If that doesn’t work, try one of the other options outlined on the Git website.
Windows
Check if git is already installed by opening a shell and typing git version. If it doesn’t work, download Git here: https://git-scm.com/download/win. Ensure you download the latest version.
Step 2: Create a Github account
We eventually want to integrate git with RStudio. But, before we do this, we need to create a GitHub account. GitHub is one of the most commonly used code-hosting platforms for version control and collaboration with Git, hence the name.
- Create an account: https://github.com
- Make sure you write down your username and password, as you will need these for the next step.
Step 3: Configure git
In the terminal/shell, do this:
git config --global user.name 'Jane Doe'
git config --global user.email 'jane@example.com'
substituting your name and the email associated with your GitHub account.
To check that it worked, run git config --global --list and the output should include your name and email, amongst other things.
Sometimes Windows can mess this up, because you might be using the wrong type of shell. You want to be in a “Git Bash” shell, as opposed to PowerShell or the legacy cmd.exe command prompt. Another way to do the config above is to use the usethis package from within R.
If this is the case for you, open RStudio, and execute the following:
## install if needed (do this once):
## install.packages("usethis")
library(usethis)
use_git_config(user.name = "Jane Doe", user.email = "jane@example.org")
Step 4: Activate git in RStudio
Now things start to get a bit tricky…
In RStudio:
1. Go to Global Options (from the Tools menu)
2. Click Git/SVN
3. Click Enable version control interface for RStudio projects
4. If necessary, enter the path for your Git executable where provided. The ‘Git executable’ is the path to wherever the git program file (git.exe on Windows) is on your computer. On my Mac it is /usr/bin/git, but on Windows it will likely be something along the lines of C:/Program Files/Git/bin/git.exe. Ignore the SVN executable and SSH key fields.
5. Click OK
WARNING #1: On Windows, do NOT use C:/Program Files/Git/cmd/git.exe. bin in the path is good! cmd in the path is bad!
WARNING #2: On Windows, do NOT set this to git-bash.exe. Something that ends in git.exe is good! git-bash.exe is bad!
Step 5: Set up a PAT
This is the most important part of the setup process, because a Personal Access Token (PAT) is how Git within RStudio ‘talks’ to GitHub. Whenever RStudio ‘talks’ to your GitHub account, it needs to include credentials in the request. A PAT proves that we are a GitHub user who’s allowed to do whatever we’re asking RStudio to do.
There are two ways you can communicate from RStudio to Github: HTTPS or SSH. HTTPS and SSH are cryptographic network protocols (meaning they encrypt stuff). HTTPS is used to encrypt communication between a browser and a server (e.g., when logging into Facebook); whereas SSH is used for encrypting communication between two computers (e.g., one computer remotely accessing another computer).
With HTTPS, we will use a personal access token (PAT).
- Go to https://github.com/settings/tokens and click “Generate new token”.
- Look over the scopes - I ticked all of them.
- Note that the default expiration date is 30 days. This means that every 30 days, you will need to generate a new token and reconfigure git within RStudio. If you’re lazy, tick ‘no expiration’. However, if you’re working on sensitive code/projects or want to be a bit more hardcore about your security, this isn’t advised.
- Give it a title.
- Click “Generate token”.
- Copy the generated PAT to your clipboard. DON’T LOSE YOUR PAT! This will be the only time you will ever see your PAT. Copy and save it somewhere, in case you lose it.
Provide this PAT next time a Git operation asks for your password in RStudio (we’ll get to this in a second).
Step 6: Create and sync a repo
OK, now we’re getting somewhere!
- In Github, go to ‘Your Repositories’ and click the green ‘New’ button. This creates a new repository, or folder, where we will eventually put our files and scripts. It’s essentially a self-contained directory.
- Give it a name, description, choose either public or private, and tick ‘add a README file’. It’s really good practice to add a README file that contains a brief description and/or file directory of the repo.
- Tick add .gitignore. A .gitignore file tells git which files in the repo to ignore when syncing between RStudio and GitHub. For example, you usually have your .Rhistory, .Rproj, and .RData files in your .gitignore file, since these are all specific to your own local machine (i.e., it wouldn’t make sense for your collaborators to have these files).
- Ignore the licence option for now, and click ‘create repository’.
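To get a feel for what a .gitignore does, here is a minimal sketch you can run in a terminal. It assumes a throwaway repo in a temporary folder, and the file names project.Rproj and analysis.R are made up for illustration. The patterns simply mirror the examples mentioned above; your own .gitignore may well differ.

```shell
# Scratch repository in a temporary folder (hypothetical; it does not touch your real projects)
repo="$(mktemp -d)"
git init --quiet "$repo"

# Write a small .gitignore using the patterns mentioned in the text
cat > "$repo/.gitignore" <<'EOF'
.Rhistory
.RData
*.Rproj
EOF

# Create two machine-specific files and one script we do want to track
touch "$repo/.RData" "$repo/project.Rproj" "$repo/analysis.R"

# git check-ignore prints only the paths that match a .gitignore rule
ignored="$(git -C "$repo" check-ignore .RData project.Rproj analysis.R)"
echo "Ignored by git:"
echo "$ignored"
```

Anything that is not printed (here, the script) will still show up in the Git tab in RStudio, ready to be staged.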
Now, you’ll be inside the repo. Click on the green ‘Code’ button, and copy the HTTPS URL to your clipboard. We will use this URL in RStudio to make a local copy of the repository, which is called ‘cloning’ a repository.
Nearly there.
- Open RStudio.
- Click on the ‘Create project’ icon (the box with an R in it, with a green plus icon - next to the create new script icon). In RStudio, Git works via RStudio Projects.
- Click ‘Version Control’
- Click ‘Git’
- Paste your repo URL into the box. Change your directory to wherever you want the repo to be stored locally on your computer. In this case, mine is a folder on my Desktop.
Note that it will ask you for a password or PAT at some point. When it does, input your PAT.
R will now create a new project window, that is linked to your Github! You will know that it’s linked, because there will be a tab called ‘Git’ next to the environment pane in the top right of the RStudio window.
Step 7: Let’s git pushin’
Now it’s time to create some code and sync to Github!
- Within your new RStudio Project, create a new script and save it to the directory. Here, I have created a new script called ‘gitty’ containing one line of code.
- Look at the ‘Git’ tab near the Environment tab. It has some files and the option to check them.
- Check all of the files. You will need to do this at the start of creating a new repo. In future, only tick the files that you wish to commit. You will notice that the colour of the box turns green with an A for ‘Added’. The act of ‘checking’ a file here is called ‘staging’ a file for a commit.
- Click ‘Commit’. This opens a new window.
- Ensure all of your files are still ticked, and write a short message on what you have done in the top right box. You always need to write a message here. It’s really good practice to get into the habit of writing succinct and informative messages so your collaborators know what you have done when they see your changes.
- Click Commit. A terminal window pops up. It displays the commit message that I wrote in the top right ‘Commit message’ box, and how many files have been changed.
- Click Close on the terminal window.
- Notice how there is a new grey rectangular box towards the top that says ‘Your branch is ahead of origin/main by 1 commit’. What we have just done is commit three files to our local repo. That is, we have ‘saved’ our changes on our local computer, but these won’t yet be reflected in our GitHub repository. We need to do one more step to sync our committed changes to our online GitHub repository…
- In the top right, click ‘Push’. Push is where you sync the changes from your local repo on your personal computer, to your online Github repository. If it worked, your terminal message will look like below.
- Click close and exit out of the window back to RStudio.
NOTE: Only push small files to GitHub. It’s ill-advised to try and push huge .nc files, for example. GitHub blocks files larger than 100 MB, and recommends keeping repositories small (ideally under 1 GB). To share large data sets with other developers, I recommend (i) sharing the data set via Dropbox, (ii) getting all collaborators to copy the Dropbox folder to their local (i.e., cloned) repository, and (iii) getting everyone to add their local file path containing the data to their respective .gitignore files.
- Let’s check that it worked. Go to your GitHub account, and open the repository that is synced to your RStudio. In my case, it’s ‘bolin_test’.
- If it worked, your file (gitty.R) should appear in the directory, along with the commit message and the time it was pushed (5 minutes ago)!
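Coming back to the note on file sizes: one way to catch big files before you stage them is to search your repo folder from the terminal. The sketch below is only an illustration under some assumptions: it uses a temporary folder, a made-up big_data.nc dummy file, and a small 1 MB threshold so it runs quickly. For a real check against GitHub's 100 MB limit you would swap the size for -size +100M.

```shell
# Scratch folder standing in for your local repo (hypothetical)
repo="$(mktemp -d)"
printf 'x <- 1\n' > "$repo/gitty.R"

# Create a ~2 MB dummy file to play the role of a big .nc file
dd if=/dev/zero of="$repo/big_data.nc" bs=1024 count=2048 2>/dev/null

# List files above the size threshold, skipping Git's own .git folder
large_files="$(find "$repo" -type f -size +1024k ! -path '*/.git/*')"
echo "$large_files"

# Add each large file to .gitignore (as a path relative to the repo) so it is never staged.
# Note: this simple loop assumes the paths contain no spaces.
for f in $large_files; do
  echo "${f#"$repo"/}" >> "$repo/.gitignore"
done
cat "$repo/.gitignore"
```

Once a file is listed in .gitignore it no longer appears in the Git tab, so you can't push it by accident.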
So remember, there are 5 main steps to syncing scripts from RStudio to your Github repo:
1. Write the code and save the script to your local repo
2. Stage the changed file
3. Commit the file and add a message
4. Push the file
5. (optional) Check it worked on Github
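If you prefer the terminal, the same five steps can be sketched with plain git commands. This is a rough equivalent of the button clicks, not exactly what RStudio runs behind the scenes. To keep it self-contained, a local 'bare' repository in a temporary folder stands in for GitHub (an assumption made purely for this demo); with a real repo, origin would be your GitHub HTTPS URL instead.

```shell
# Temporary sandbox (hypothetical paths; nothing here touches GitHub)
work="$(mktemp -d)"

# A bare repository plays the role of the GitHub remote
git init --quiet --bare "$work/remote.git"

# The local repo, i.e. your RStudio project folder
git init --quiet "$work/local"
cd "$work/local"
git config user.name "Jane Doe"            # identity for this sandbox repo only
git config user.email "jane@example.com"
git remote add origin "$work/remote.git"

# 1. Write the code and save the script
echo 'print("hello git")' > gitty.R

# 2. Stage the changed file (same as ticking the box in the Git tab)
git add gitty.R

# 3. Commit the file and add a message
git commit --quiet -m "Add gitty script"
git branch -M main                         # make sure the branch is called 'main'

# 4. Push to the remote
git push --quiet -u origin main

# 5. (optional) Check that the remote now holds the commit
pushed_message="$(git --git-dir="$work/remote.git" log -1 --format=%s main)"
echo "Latest commit on remote: $pushed_message"
```

The commit only exists locally after step 3; it is step 4 that copies it to the remote.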
Step 8: Let’s git pullin’
A ‘pull’ is the opposite of a push. You can press the blue down arrow in the ‘Git’ window, which will download any changes from your online repository that you don’t have on your local computer. This is useful when you have collaborators working on your scripts, who upload their changes, and you want to download them. It is also useful if you manually add a file to your repo on GitHub, and want to sync that to your local repo in RStudio.
NOTE: If you ‘pull’ after a collaborator has pushed changes to a script that you have also edited, you may run into a merge conflict. A merge conflict is when two or more people have changed the same section of code, causing a ‘conflict’. This is fine (but slightly annoying) and easily resolved by choosing whose changes to keep. But, to keep things simple, it’s better to communicate with your colleagues so you know who is working on which files and when, so you don’t run into issues.
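Here is a small terminal sketch of what a pull does when a collaborator has pushed a change. As before, a local bare repository stands in for GitHub, and the names (Jane Doe, Sam Smith, the folder names) are made up for the demo. It shows the simple, conflict-free case only.

```shell
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"   # stand-in for the GitHub repo

# Your copy of the repo, with one commit pushed to the remote
git init --quiet "$work/mine"
cd "$work/mine"
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git remote add origin "$work/remote.git"
echo 'x <- 1' > gitty.R
git add gitty.R
git commit --quiet -m "Add gitty script"
git branch -M main
git push --quiet -u origin main

# A collaborator clones the repo, edits the script and pushes their change
git clone --quiet --branch main "$work/remote.git" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Sam Smith"
git config user.email "sam@example.com"
echo 'y <- 2' >> gitty.R
git commit --quiet -am "Add y"
git push --quiet origin main

# Back in your copy: pull downloads the collaborator's commit and updates your files
cd "$work/mine"
git pull --quiet --ff-only origin main
cat gitty.R
```

The --ff-only flag tells git to stop rather than attempt a merge if your local history has diverged from the remote, which is a cautious default while you are learning.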
Step 9: Let’s git collabin’
Often, we will want to add collaborators to our repositories, like supervisors and other students. To do this:
- Go to your Github account
- Click on the relevant repository
- Click ‘Settings’
- Click ‘Collaborators’
- Click the green ‘Add people’ button
- Input their Github username or email address. Note that they will need to have an active Github account.
- Once your collaborators accept your invite, they can now edit your scripts!
Step 10: Let’s git clonin’
We have already cloned a repo in Step 6, when we cloned our ‘bolin_test’ repo from GitHub into RStudio. But, I’ll reiterate the steps here.
- Go to the Github repository
- Above the list of files, click Code
- Click the copy button in the Clone with HTTPS option. This will copy the git URL
- Go to RStudio
- Create a new project -> Version control -> Git
- Paste the git URL and choose a location on your local machine for the repo to live
- Click Create Project
- The repo will clone (can take a while if it’s a big repo) and then, presto!
When you clone a repository, you copy the repository from GitHub.com to your local machine. Cloning a repository pulls down a full copy of all the repository data that GitHub.com has at that point in time, including all versions of every file and folder for the project.
When you create a repository on GitHub.com, it exists as a remote repository. You can clone your repository to create a local copy on your computer and sync between the two locations.
Cloning is different to pulling. Cloning is ‘copying’ a repo from Github as it exists at that point in time. Pulling is ‘updating’ your local repo with any changes that have been made to the online repo by other users.
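To see that a clone really does bring down the full history (and not just the latest files), here is a minimal sketch. It again assumes a local bare repository as a stand-in for GitHub, with made-up folder names; with a real repo you would paste the HTTPS URL in place of the local path.

```shell
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"   # stand-in for the GitHub repo

# An author makes two commits and pushes them
git init --quiet "$work/author"
cd "$work/author"
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git remote add origin "$work/remote.git"
echo 'x <- 1' > gitty.R
git add gitty.R
git commit --quiet -m "Add gitty script"
echo 'y <- 2' >> gitty.R
git commit --quiet -am "Add y"
git branch -M main
git push --quiet -u origin main

# Cloning copies the whole repository, including every earlier commit
git clone --quiet --branch main "$work/remote.git" "$work/clone"
n_commits="$(git -C "$work/clone" rev-list --count HEAD)"
echo "Commits in the fresh clone: $n_commits"

# The clone remembers where it came from, which is what lets you pull later
origin_url="$(git -C "$work/clone" remote get-url origin)"
echo "origin points to: $origin_url"
```

You clone once to get started, then pull whenever you want to catch up with changes made since.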
Note that you won’t be able to see who has cloned your repository, so if I’m working on a project with sensitive info, I keep the repo as private.
Step 11: Let’s git forkin’
A fork is a copy of someone else’s repository that you manage. Forks let you freely experiment and make changes to a repository, without affecting the original repository. It’s a personal copy of someone else’s repo. You can ‘fetch’ updates from the original repository, or submit your own changes to the main repository with a ‘pull request’, that will merge your changes with the original repo, if the repo owner accepts your changes.
NOTE: A fork is different to a clone. A fork lives online in your GitHub account, whereas a clone lives on your local machine.
I like this analogy of forking and pull requests from https://stackoverflow.com/questions/24939843/what-does-it-mean-to-fork-on-github:
Lets take a scenario in which the teacher is conducting a pop quiz in their class. They usually make copies of the question paper and distribute it (Forking) to the students so that they can work on it and mark the correct answer. The teacher still has the master copy. On the completion of the test, they can collect the copies from students so that they can assess it (Pull request).
To fork someone’s repo, you need to click ‘Fork’ in the top right. Wait a few seconds, and the repo will now be added to your own account. Here, I have forked Dave’s seabird code from Sydeman et al 2021 Science (https://github.com/DavidSchoeman/sydeman_et_al_seabirds)
Once I’ve forked the repo, I can then clone it to my local machine, and make as many changes as I want, without messing around with Dave’s original repo. I have the option to submit a pull request if I want Dave to integrate any changes I’ve made on my local repo, which we’ll go through another time.
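For the curious, here is a rough terminal sketch of how a fork, a clone and the original repo relate to each other. Everything in it is a stand-in: local bare repositories play the roles of the original repo and the fork on GitHub, the 'Fork' button is imitated with a bare clone, and the file name seabirds.R is invented for the demo (it is not taken from Dave's actual repo). Calling the original repo 'upstream' is a common convention, not a requirement.

```shell
work="$(mktemp -d)"

# 'original.git' stands in for the owner's repo on GitHub
git init --quiet --bare "$work/original.git"
git init --quiet "$work/owner"
cd "$work/owner"
git config user.name "Repo Owner"
git config user.email "owner@example.com"
git remote add origin "$work/original.git"
echo 'a <- 1' > seabirds.R
git add seabirds.R
git commit --quiet -m "Initial analysis"
git branch -M main
git push --quiet -u origin main

# Clicking 'Fork' on GitHub is imitated here by a server-side copy of the repo
git clone --quiet --bare "$work/original.git" "$work/fork.git"

# Clone your fork to your machine, then register the original as 'upstream'
git clone --quiet --branch main "$work/fork.git" "$work/my_copy"
cd "$work/my_copy"
git remote add upstream "$work/original.git"

# Later, the owner adds a commit to the original repo...
cd "$work/owner"
echo 'b <- 2' >> seabirds.R
git commit --quiet -am "Extend analysis"
git push --quiet origin main

# ...and you fetch it from upstream and fast-forward your local branch
cd "$work/my_copy"
git fetch --quiet upstream
git merge --quiet --ff-only upstream/main
upstream_url="$(git remote get-url upstream)"
cat seabirds.R
```

Your local copy now has two remotes: origin (your fork, which you can push to) and upstream (the original, which you only fetch from).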
Step 12: Let’s git outta here…
That’s enough for now. We’ll talk about pull requests, issues, branches, actions and other Git features in a Part II document (TBA).