Intro
It is not uncommon to find new data scientists entering the workforce with great programming and analytical skills, but with limited experience related to coding on team projects. This can happen for a variety of reasons, and sometimes limits the productivity of these individuals until they get used to the team's workflow.
The goal of this post is to shed some light on a simple but effective git workflow based on git rebase, to help new data scientists hit the ground running when joining team projects.
Git is one of the most popular version control systems used by developers to track and store changes to almost any kind of file in a software project. Its code synchronization and backup features are especially helpful for developers working on bigger projects.
The following workflow might not be suited for every situation, but it has been great for our team at Auto ML Station.
Git workflow based on git rebase
The following explanation assumes that the team in question uses the develop branch as their main branch for features in development and that you are about to start a new task on my_new_branch.
Since other people are also working on the project, two things have to be considered:
- It is not recommended to work directly on the develop branch;
- You have to pull all the latest changes from the remote develop branch to your local repository to make sure your work is up to date with the rest of the team.
This can be done with:
git checkout develop
git pull
After synchronizing your local and remote repos, you can see below that E is the team's latest commit and you want to start your work based on it.
A---B---C---D---E 'develop'
You can use the following commands to create my_new_branch and switch into it to start working:
git branch my_new_branch
(this step can be done in a web browser via Jira, Bitbucket or some other similar tool that might be synchronized to your remote repository)
git checkout my_new_branch
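As a side note, the two commands above can usually be combined into one. The sketch below is only an illustration: it builds a throwaway repository in a temporary directory (the placeholder identity and the empty commit E are assumptions made for the demo) and then creates and checks out my_new_branch from develop in a single step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the example does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop    # name the initial branch 'develop'
git config user.name "Example User"         # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "E"          # stands in for the team's latest commit

# Create my_new_branch starting at develop and switch to it in one command.
git checkout -q -b my_new_branch develop

current_branch="$(git rev-parse --abbrev-ref HEAD)"
echo "Now on: $current_branch"
```

On Git 2.23 or newer, git switch -c my_new_branch develop should behave the same way.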
After completing your task, let's assume you used the commands below to add 3 new commits (H, I and J) to your new branch.
git status
(to check for modified files)
git add <file_names>
(to stage modified files)
git commit -m 'my new commit message'
(to commit staged modifications)
Your git tree should look like the following:
      H---I---J 'my_new_branch'
     /
D---E 'develop'
Suppose that while you were working on my_new_branch, one of your teammates was also working on commits F and G, and their git tree looks like this:
      F---G 'teammate_branch'
     /
D---E 'develop'
At this point, either of you can merge into develop without conflicts, but as soon as that's done, the other one will have to complete some extra steps to update their local work branch before being able to merge into develop without issues.
Let's assume your colleague pushed their branch before you and your team merged that work into the remote develop.
You now have to pull those new changes into your local develop branch with:
git checkout develop
git pull
After this synchronization your git tree will be as follows:
      H---I---J 'my_new_branch'
     /
D---E---F---G 'develop'
Now you can switch to my_new_branch and update it with the latest develop commits with the commands below.
git checkout my_new_branch
git rebase develop
The rebase command moves the starting point of your branch to the latest commit on develop (G) and then replays your commits on top of it. Because their parent commit changed, the replayed commits get new hashes (H*, I* and J*), as seen below.
              H*--I*--J* 'my_new_branch'
             /
D---E---F---G 'develop'
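To see the hash change for yourself, the following sketch reproduces a reduced version of the scenario (one commit per branch instead of several) in a throwaway repository. The make_commit helper, the file names and the placeholder identity are assumptions made for the demo and are not part of the workflow itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the example does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop    # name the initial branch 'develop'
git config user.name "Example User"         # placeholder identity for the demo
git config user.email "user@example.com"

# Hypothetical helper: write a file named after the commit and commit it.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit E                               # shared starting point on develop
git checkout -q -b my_new_branch
make_commit H                               # your work
old_hash="$(git rev-parse HEAD)"

git checkout -q develop
make_commit G                               # teammate's work, already merged into develop

git checkout -q my_new_branch
git rebase -q develop                       # replay H on top of G
new_hash="$(git rev-parse HEAD)"

echo "H before rebase: $old_hash"
echo "H after rebase:  $new_hash"
git log --oneline --graph develop my_new_branch
```

The two printed hashes differ, and the log shows the history as a single straight line with H sitting directly on top of G.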
With that, all that's left to do is push your work into the remote repo:
git push --set-upstream origin my_new_branch
or just git push in case the branch already exists remotely and is set as the upstream of your local branch.
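One caveat worth knowing: if you had already pushed my_new_branch before rebasing, the remote copy still holds the old commits, and a plain git push is rejected because the histories no longer line up. A common way around this, assuming you are the only person working on that branch, is git push --force-with-lease. The sketch below simulates it with a local bare repository standing in for the remote; the directory names, the make_commit helper and the placeholder identity are assumptions for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# A bare repository in a temp directory plays the role of the remote.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/develop    # name the initial branch 'develop'
git config user.name "Example User"         # placeholder identity for the demo
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

# Hypothetical helper: write a file named after the commit and commit it.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit E
git push -q origin develop
git checkout -q -b my_new_branch
make_commit H
git push -q --set-upstream origin my_new_branch   # branch published BEFORE the rebase

git checkout -q develop
make_commit G                                     # develop moves on
git checkout -q my_new_branch
git rebase -q develop                             # H is rewritten on top of G

# A plain push is refused, because the remote branch still has the old H.
if git push -q 2>/dev/null; then
  plain_push="accepted"
else
  plain_push="rejected"
fi
echo "plain push: $plain_push"

# Overwrite the remote branch only if nobody else has updated it meanwhile.
git push -q --force-with-lease
remote_hash="$(git --git-dir="$work/remote.git" rev-parse my_new_branch)"
```

Force-pushing rewrites the remote branch, so it is best avoided on branches that teammates have already pulled.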
Now your team can review and merge your branch into develop (probably through a pull request, which I expect to explore in a future post).
The final result will be a develop branch that looks like this:
D---E---F---G---H*--I*--J* 'develop'
Summary
git checkout develop
git pull
git branch my_new_branch
(this step can be done in a web browser via Jira, Bitbucket or some other similar tool that might be synchronized to your remote repository)
git checkout my_new_branch
- Work on your code...
git status
git add <file_names>
git commit -m 'my new commit message'
git checkout develop
git pull
git checkout my_new_branch
git rebase develop
git push --set-upstream origin my_new_branch
or just git push in case the branch already exists remotely and is set as the upstream of your local branch
For a more detailed explanation of git rebase and its options, check the official git rebase documentation.