Version control for your research
2026-02-08
We’ll focus on a single code writer.
Today:
Why backup matters: Your code is a research output

Importance of version control
thesis_final_v2_FINAL.docx, analysis_old_backup.R
When you try things (new parameters, a different method, a refactor):
git diff shows every line added or removed.

| Tool | Limitation |
|---|---|
| Drive / Dropbox | No real history — only “versions” and sync. Conflicts when two copies change. No “why” for each change. |
| Cluster / shared disk | Same: no per-file history, no clear snapshots. You still end up with run_v1.sh, run_v2.sh. |
| Git | History + messages for every snapshot. Works offline on your laptop, then sync to cluster or server. One workflow everywhere. |
Git = tool that keeps a history of changes in a folder (a repository).
commit A    commit B    commit C
   ● --------- ● --------- ●      ← main branch
                \
                 ● --- ●          ← feature branch
                 D     E
Each ● is a snapshot. Branches let you try things without breaking the main line.
Check if Git is installed:
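The check is a single command. It prints the installed version, or a "command not found" error if Git is missing:

```shell
# Prints a line starting with "git version" when Git is installed.
git --version
```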
If not: git-scm.com — use default options.
Windows
Use “Git Bash” for the terminal in this workshop (same commands as Mac/Linux).
One-time per machine:
Use the same email as on GitHub/GitLab if you use them.
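A minimal sketch of that one-time setup. The name and email below are placeholders chosen for illustration; substitute your own:

```shell
# Placeholder identity: replace both values with your own.
git config --global user.name "Ada Lovelace"
git config --global user.email "ada@example.org"

# Read the values back to confirm what Git stored.
git config --global --get user.name
git config --global --get user.email
```

Git attaches this name and email to every commit you make on this machine.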
Start from scratch (new folder, new repo):
Start from an existing repo (e.g. GitHub, GitLab, a colleague’s copy):
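Both starting points can be sketched in a scratch folder. The folder names are hypothetical, and the clone below copies a local folder so the example runs offline; in practice you would usually pass a GitHub/GitLab URL instead:

```shell
# Scratch area so nothing touches your real projects.
work="$(mktemp -d)"

# Start from scratch: new folder, new repo.
mkdir "$work/my-analysis"
cd "$work/my-analysis"
git init

# Start from an existing repo: git clone accepts a URL or a path.
# Here the "existing repo" is the folder created above.
cd "$work"
git clone my-analysis my-analysis-copy
```

Cloning an empty repository prints a warning, which is expected here because no commit has been made yet.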
- git init: Git starts tracking the current folder; you make the first commit.
- git clone: You get a full copy (history included); you’re ready to work and push back if there’s a remote.

Every time you finish a small, logical change:
1. git add
2. git commit -m "message"

Golden rule
You never commit without staging first. Stage (add) → Commit. Every time.
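The stage-then-commit cycle can be sketched end to end in a throwaway repository. The file name, its contents, and the identity values are assumptions made for illustration:

```shell
# Scratch repository so the sketch does not touch a real project.
cd "$(mktemp -d)"
git init
# Local placeholder identity, in case no global one is configured.
git config user.name "Ada Lovelace"
git config user.email "ada@example.org"

# 1. Make a small, logical change (hypothetical file).
echo "threshold <- 0.05" > analysis.R

# 2. Stage it.
git add analysis.R

# 3. Commit the staged snapshot with a message.
git commit -m "Add analysis script with initial threshold"

# The commit now appears in the history.
git log --oneline
```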
| What you want | Command |
|---|---|
| What changed? (files) | git status |
| Short status | git status -s |
| List of commits | git log |
| One line per commit | git log --oneline |
Run git status often. It tells you what’s modified, staged, or untracked.
git status
Tracked: Git already knows the file (it was in the last commit or you ran git add). Git shows changes: modified, staged, or clean.
Untracked: The file is in the folder but Git is not tracking it yet. It won’t be in the next commit until you git add it.
git status groups files into: changes to be committed (staged), changes not staged for commit (modified), and untracked files.
git status: example
Figure: example of git status output
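To produce comparable output yourself, the sketch below builds a scratch repository with one modified, one staged, and one untracked file. All file names and the identity values are hypothetical:

```shell
cd "$(mktemp -d)"
git init
# Local placeholder identity, in case no global one is configured.
git config user.name "Ada Lovelace"
git config user.email "ada@example.org"

# One committed (tracked) file.
echo "x <- 1" > tracked.R
git add tracked.R
git commit -m "Add tracked.R"

echo "x <- 2" > tracked.R        # modified, not staged
echo "y <- 1" > staged.R
git add staged.R                 # new file, staged
echo "notes" > untracked.txt     # new file, never added

git status       # long form, grouped under headings
git status -s    # short form, one line per file
```

In the short form, `A ` marks a staged new file, ` M` a modified but unstaged file, and `??` an untracked file.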
| What you do | Command |
|---|---|
| Stage one file | git add <filename> |
| Stage all changes | git add . |
| Commit with message | git commit -m "Add results for experiment 1" |
Good messages: short, present tense, and specific about what changed and why (“Fix threshold in step 2”, not “changes”).
Structure:
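One widely used convention, offered here as an assumption rather than a fixed rule, is a short summary line, a blank line, then a body that explains why. On the command line each -m adds one paragraph. The file, values, and reasoning in the message are made up for illustration:

```shell
cd "$(mktemp -d)"
git init
# Local placeholder identity, in case no global one is configured.
git config user.name "Ada Lovelace"
git config user.email "ada@example.org"

echo "threshold <- 0.01" > step2.R    # hypothetical file
git add step2.R

# First -m: summary line. Second -m: body, separated by a blank line.
git commit -m "Fix threshold in step 2" \
           -m "The previous value let noisy samples through; see lab notes."

# Show the full message of the last commit.
git log -1 --format=%B
```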
Why it matters: Good messages make git log readable. Future you (and collaborators) can understand and find changes without opening every commit. Bad messages (“fix”, “update”, “changes”) make history useless.
Tip: AI can help — paste your git diff and ask for a commit message, or use your editor’s AI to suggest one from the staged changes.
| What you want | Command |
|---|---|
| See changes not yet staged | git diff |
| See changes already staged | git diff --staged |
| Discard changes in a file (careful!) | git restore <file> |
| Unstage a file (keep changes) | git restore --staged <file> |
git diff = working copy vs staging area (the same as the last commit when nothing is staged); git diff --staged = staging area vs last commit. Very useful before committing.
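A short sketch of the four commands in the table, run in a scratch repository. The file name and identity values are hypothetical, and git restore needs Git 2.23 or newer:

```shell
cd "$(mktemp -d)"
git init
# Local placeholder identity, in case no global one is configured.
git config user.name "Ada Lovelace"
git config user.email "ada@example.org"

echo "alpha <- 1" > params.R
git add params.R
git commit -m "Add params.R"

echo "alpha <- 2" > params.R     # edit the tracked file
git diff                         # shows the unstaged edit

git add params.R
git diff --staged                # shows the same edit, now staged

git restore --staged params.R    # unstage; the edit stays in the file
git restore params.R             # discard the edit (cannot be undone)
cat params.R                     # back to the committed content
```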
Working directory       Staging area       Repository (history)
  (your files)     →     (git add)     →      (git commit)
     edited                staged               snapshot
You edit → you add (stage) → you commit. History is only what you committed.
Same Git workflow — status, stage, commit — from your editor instead of the terminal.
Today we practice in the terminal so you understand what the IDE is doing; then you can use either.
Git pane (top-right when in a Git repo):
First time in RStudio
Tools → Global Options → Git/SVN: ensure “Git executable” points to your Git (e.g. /usr/bin/git). RStudio will use the same user.name / user.email as the terminal.

New project from Git: File → New Project → Version Control → Git; paste repo URL to clone.


Source Control view (sidebar icon or Ctrl/Cmd+Shift+G):
First time in VS Code
Git support is built in and usually already active; you only need Git itself installed. For GitHub: the “GitHub Pull Requests and Issues” extension. VS Code uses your system Git and config (user.name / user.email).
Clone repo: Ctrl/Cmd+Shift+P → “Git: Clone” → paste URL → choose folder.
Figure: VS Code Source Control / Git pane
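- git switch -c my-feature: create a branch and switch to it
- git switch main: go back to the main line
- git merge my-feature: bring the branch’s commits into main

Branches let you try ideas without touching the main line. We’ll try this in the workshop.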
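A minimal sketch of one full branch cycle in a scratch repository. The file, its contents, and the identity values are hypothetical, and git init -b assumes Git 2.28 or newer:

```shell
cd "$(mktemp -d)"
git init -b main                 # -b names the first branch
# Local placeholder identity, in case no global one is configured.
git config user.name "Ada Lovelace"
git config user.email "ada@example.org"

echo "method <- 'A'" > model.R
git add model.R
git commit -m "Add model with method A"

git switch -c my-feature         # create the branch and switch to it
echo "method <- 'B'" > model.R
git add model.R
git commit -m "Try method B"

git switch main                  # main still has method A
cat model.R
git merge my-feature             # bring the branch's commit into main
cat model.R                      # main now has method B
```

Because main did not change while the branch existed, this merge is a fast-forward: Git simply moves main to the branch's latest commit.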
When you want a copy on a server (GitHub/GitLab) — for backup or later sharing:
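- git remote add origin <url>: tell your repo where the server copy lives
- git push -u origin main: upload your commits (the first time, -u remembers the remote branch)
- git pull: download and merge what is on the server

Today we focus on local workflow; remote is one short exercise at the end if time.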
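The remote commands can be tried offline. In this sketch a bare repository in a scratch folder stands in for the server; with GitHub/GitLab you would pass the repository URL instead of a path. All names are hypothetical, and git init -b assumes Git 2.28 or newer:

```shell
work="$(mktemp -d)"
git init --bare "$work/remote.git"    # stand-in for the server copy

mkdir "$work/project"
cd "$work/project"
git init -b main
# Local placeholder identity, in case no global one is configured.
git config user.name "Ada Lovelace"
git config user.email "ada@example.org"

echo "# My analysis" > README.md
git add README.md
git commit -m "Add README"

git remote add origin "$work/remote.git"
git push -u origin main    # first push: upload and remember the upstream
git pull                   # later: fetch and merge what the remote has
```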
A .gitignore file in the repo root tells Git which files or folders not to track.
Why: Avoid committing build outputs, large data, secrets, IDE/OS junk — and keep git status clean.
Example (e.g. for R/Python + data):
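A minimal sketch of such a file, written from the shell so it can be run as-is. The patterns and file names are illustrative assumptions; adapt them to your project:

```shell
cd "$(mktemp -d)"
git init

# Write an illustrative .gitignore (patterns are examples, not a standard).
cat > .gitignore <<'EOF'
# R session files
.Rhistory
.RData
.Rproj.user/

# Python caches and virtual environments
__pycache__/
*.pyc
.venv/

# Large or raw data and generated outputs
data/raw/
output/

# Secrets and OS junk
.env
.DS_Store
EOF

# Create a few files to see the effect.
mkdir -p data/raw
echo "id,value" > data/raw/big.csv
touch .RData analysis.R

git status -s                           # lists .gitignore and analysis.R only
git check-ignore -v data/raw/big.csv    # shows which pattern matched
```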
Git will ignore these even if they’re in the folder; files that are already tracked stay tracked until you untrack them (git rm --cached <file>). Add .gitignore early and commit it.
Open workshop/exercises.html (or the workshop/exercises.qmd source).
We’ll do in order:
- status → add → commit.
- git status, git add, git commit -m "...", git log / git diff; branches and remote when you need them.

Take-home message
Start small. Pick one project, commit at the end of the day, and move from there.
Things to learn when you’re ready:
- Merging (git merge); fast-forward vs merge commit.
- Stashing (git stash / git stash pop); switch context without committing.
- Resolving conflicts: edit the conflicting files, then git add and finish merge/rebase.
- Tags: git tag paper-initial-submission.
- Bisecting: git bisect start / good / bad; Git binary-searches history.

Resources: git-scm.com/book, GitHub Docs
Any questions? We can redo any exercise or go deeper on workflow or branches.
You’re always welcome to ask me anything in the future.