Website for the UCSB Data Science Capstone Preparation Workshop
Welcome!
Optional but recommended:
We really hope that we don’t have to convince you of the many benefits of properly keeping track of different versions of your work. Image source: PhD Comics
git and GitHub are two currently popular tools for keeping track of changes in text-based files. NB: while you can also use them with binary files, such as images, .doc or .xlsx files, they are most powerful for textual data, since you can easily see/preview the differences between versions.
What’s the difference between git and GitHub?
A lot of times you’ll hear people use these terms interchangeably but that’s not entirely correct.
git runs locally on your computer and keeps track of the changes that you make to the files on your machine. You won’t be able to share these changes with your teammates using git alone, and that’s where GitHub comes in.
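To make the “git runs locally” point concrete, here is a minimal sketch of tracking a single text file on your own machine. The file name notes.txt, the commit messages, and the name and email are placeholders chosen for illustration; nothing in this example talks to GitHub.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so none of your real files are touched.
DEMO_DIR="$(mktemp -d)"
cd "$DEMO_DIR"

# Turn the directory into a git repository (this creates a hidden .git folder).
git init -q

# Placeholder identity, set for this demo repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create a text file and record its first version.
echo "First draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Change the file and preview what differs from the last recorded version.
echo "Second thoughts" >> notes.txt
git diff

# Record the second version (-a stages files git already tracks), then list the history.
git commit -q -a -m "Expand notes"
git log --oneline
```

Everything above happens inside the temporary directory; the history lives in its .git folder and nowhere else.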
GitHub is a web service, a “cloud” platform that hosts your projects in repositories and allows you to share your repos with the world.
There are alternative solutions to GitHub, such as GitLab and Bitbucket, but they are all designed to let you push your local changes to the cloud to enable backup, sharing, and collaboration.
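Here is a rough sketch of what “pushing your local changes” looks like on the command line. To keep it runnable without an account, a local bare repository (hosted.git) stands in for the repository that GitHub would host; with a real GitHub repo you would use its URL in place of that path. The directory names and the identity are assumptions made for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

WORK="$(mktemp -d)"

# Stand-in for a hosted repository (on GitHub you would create it in the browser).
git init -q --bare "$WORK/hosted.git"

# Your local repository with one commit in it.
git init -q "$WORK/local"
cd "$WORK/local"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# Tell git where the hosted copy lives; "origin" is the conventional name.
git remote add origin "$WORK/hosted.git"

# Upload the current branch and remember origin as its default destination.
git push -q -u origin HEAD

# A teammate can now get their own copy from the hosted repository.
git clone -q "$WORK/hosted.git" "$WORK/teammate"
```

The teammate's copy only exists because the commit was pushed; committing alone would have kept it on your machine.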
For some people, especially if they are not used to working on the command line, working with git can be confusing.
Image source: XKCD
GitHub lets you perform many of the same operations as git, except that you can do them through a web browser or the GitHub Desktop app. Additionally, some collaboration features, such as forking a repo or issuing a Pull Request, are not part of git itself and are provided by GitHub.
Below is a list of a few resources to help you get started with using git and GitHub. If you don’t look at any other resources, I expect you to at least work through the first two links below, which should take you less than 1 hour to complete.
Here’s a GitHub Basics Tutorial - How to Use GitHub (31:20) https://youtube.com/watch?v=x0EYpi38Yp4 to get you started. This video covers:
These visual guides might also be helpful in exposing what’s going on behind each git command you run:
You might also find it useful to look over the Git Tutorial: Get Started with Version Control and Command Line Tutorial: Usage in Linux and macOS by Tania Rascia.
If you are working with Jupyter notebooks, I recommend running through the following step-by-step walkthrough, which we provided to the first Data Science Capstone cohort: Hands-on with GitHub https://team-repo.github.io/ucsb-2020-capstone#hands-on-with-github. Note that this walkthrough expects that you can run command-line git commands.
If you have any other resources that helped you, please contact me, so that we can include them here for everyone’s benefit (see the last section for my contact info).
As you can imagine, there are many ways in which one can use git/GitHub and structure their workflow. We recommend using an open-source collaboration model for team projects. In this model, there’s one central repository and several forks (i.e., separate copies) from which the Pull Requests can be submitted.
To be continued…
We’ve created a playground repository for you to practice with: https://github.com/ucsb-ds/workshop2021-playground.
We will refer to it as the PROJECT_REPO.
Fork the repo
Fork the PROJECT_REPO using the GitHub interface by clicking the “Fork” button.
Note that the forked repo will have a different URL (i.e., web address) for each person who forks it. We will refer to it as a PROJECT_REPO_FORK.
Modify the (forked) repo
Now that you have your own fork (i.e., copy) PROJECT_REPO_FORK, which is connected to the main PROJECT_REPO, let’s change its contents.
Open up the forked repo and modify the README.md file by clicking on the pencil icon on the top right.
Add your name and your GitHub ID to that file.
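If you prefer the command line to the pencil icon, the same change can be sketched roughly as follows. Since every fork has its own URL, this example simulates PROJECT_REPO and PROJECT_REPO_FORK with local bare repositories; with the real playground you would clone the URL of your own fork instead. The README contents, the names, and the ID your-github-id are placeholders, not values from the actual repo.

```shell
#!/usr/bin/env bash
set -euo pipefail

WORK="$(mktemp -d)"

# --- Simulation setup (on GitHub these two repositories already exist) ---
git init -q "$WORK/seed"
cd "$WORK/seed"
git config user.name "Workshop Staff"      # placeholder identity
git config user.email "staff@example.com"
echo "# Playground" > README.md
git add README.md
git commit -q -m "Initial README"
git clone -q --bare "$WORK/seed" "$WORK/PROJECT_REPO.git"
# Stand-in for clicking the "Fork" button:
git clone -q --bare "$WORK/PROJECT_REPO.git" "$WORK/PROJECT_REPO_FORK.git"

# --- What you would do with your own fork ---
# 1. Download a working copy of the fork.
git clone -q "$WORK/PROJECT_REPO_FORK.git" "$WORK/my-clone"
cd "$WORK/my-clone"
git config user.name "Your Name"           # placeholder identity
git config user.email "you@example.com"

# 2. Add your name and GitHub ID to README.md.
echo "- Your Name (your-github-id)" >> README.md

# 3. Record the change and upload it to the fork.
git commit -q -a -m "Add my name and GitHub ID"
git push -q origin HEAD
```

After the push, the change is in the fork only; the main PROJECT_REPO stays untouched until a Pull Request is merged.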
Issue a Pull Request (PR)
In order for your ID to show up on the main site, you now need to let us know about it by issuing a Pull Request (usually abbreviated as PR). Pull Requests (PRs) are typically issued through the GitHub web interface, so find the link that says “Pull Requests” and follow the steps to submit one.
Important: make sure to select the “compare across forks” link and then set the base and head repositories and branches accordingly.
Once we approve and merge your pull request, others should be able to run git pull in their PROJECT_REPO_FORK to update their files locally, or see that their repo is now behind the original PROJECT_REPO.
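One way to bring a local copy of a fork up to date from the command line is sketched below. It rests on a few assumptions: a local clone of a fork pulls from the fork by default, so the sketch adds a second remote pointing at PROJECT_REPO; the remote name upstream is a common convention rather than a requirement; and both repositories are simulated with local bare repos, so the paths stand in for real GitHub URLs.

```shell
#!/usr/bin/env bash
set -euo pipefail

WORK="$(mktemp -d)"

# --- Simulation setup: PROJECT_REPO and a fork of it ---
git init -q "$WORK/seed"
cd "$WORK/seed"
git config user.name "Workshop Staff"      # placeholder identity
git config user.email "staff@example.com"
echo "# Playground" > README.md
git add README.md
git commit -q -m "Initial README"
git clone -q --bare "$WORK/seed" "$WORK/PROJECT_REPO.git"
git clone -q --bare "$WORK/PROJECT_REPO.git" "$WORK/PROJECT_REPO_FORK.git"

# Simulate a merged Pull Request: a new commit lands in PROJECT_REPO only.
echo "- A Teammate (teammate-id)" >> README.md
git commit -q -a -m "Add a teammate"
git push -q "$WORK/PROJECT_REPO.git" HEAD

# --- What you would do in a local clone of your fork ---
git clone -q "$WORK/PROJECT_REPO_FORK.git" "$WORK/my-clone"
cd "$WORK/my-clone"
BRANCH="$(git rev-parse --abbrev-ref HEAD)"

# Register the original repository under the name "upstream" and download its history.
git remote add upstream "$WORK/PROJECT_REPO.git"
git fetch -q upstream

# Count the commits that PROJECT_REPO has and your copy lacks.
BEHIND="$(git rev-list --count "HEAD..upstream/$BRANCH")"
echo "Behind PROJECT_REPO by $BEHIND commit(s)"

# Update the local files, then upload the result so the fork catches up too.
git pull -q --ff-only upstream "$BRANCH"
git push -q origin HEAD
```

The --ff-only flag makes git pull stop instead of creating a merge commit if your copy has diverged from PROJECT_REPO, which is a cautious default while you are practicing.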
To be added…
I’ll leave this section as a placeholder for the answers to the questions that I’ll be asked.
These materials are released under the CC BY 4.0 license by Yekaterina Kharitonova.
If you have any questions or suggestions, don’t hesitate to reach out to me via ykk@ucsb edu (I rely on your knowledge of how to convert this into an email address ;-)).
Page last updated on Sep 9, 2021