A common situation when managing Git repositories is needing one (or many) repositories within another repository.
The laziest way to handle this is a monorepo. A Git monorepo is a single Git repo in which a team develops multiple projects, related or not, so that code is easy to share between them. For example, if you have a library that you want to use in different projects, you just put it in its own folder and reference it from your projects:
- /your-git-project
  - /one-project
  - /project-related-to-project-two
  - /project-unrelated-to-the-two-other-projects
  - /library-used-in-all-other-projects
This is very easy to do. But to keep things organized, and to keep your repo from becoming too big, it’s sometimes a better idea to keep things in separate repos. For that, Git has two built-in features: Git submodules and Git subtrees. Let’s see what they are and how to use them.
We can think of a submodule as a kind of link that points to another repository. For example, imagine we are starting a repo for a web application that will contain both server-side and client-side code.
```
$ mkdir webapp
$ cd webapp
$ git init
$ echo "This will be a web app" > README.md
$ git add .
$ git commit -m "Initial commit"
```
Now, imagine we want to add Pikaday as a submodule, since we will need it in our web app. Doing so is as simple as:
```
$ git submodule add https://github.com/owenmead/Pikaday pikaday
$ git add .
$ git commit -m "Add Pikaday as a submodule"
```
The repo structure will look like this:
- /webapp
  - /.git
  - /README.md
  - /.gitmodules
  - /pikaday
    - /.git
    - ... # pikaday source files
What happened here is:
- The Pikaday repository was cloned into the `pikaday` folder
- A `.gitmodules` file was added with metadata about the submodules in the repo
Let’s take a look at the `.gitmodules` file:
```
$ cat .gitmodules
[submodule "pikaday"]
    path = pikaday
    url = https://github.com/owenmead/Pikaday
```
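Because `.gitmodules` uses the same syntax as Git’s config files, it can also be queried with `git config -f` rather than read by eye. The sketch below recreates the file shown above in a throwaway directory (the `mktemp -d` scratch location is just for illustration) and reads it back:

```shell
set -e
cd "$(mktemp -d)"

# Recreate the entries shown above; git config creates .gitmodules if missing
git config -f .gitmodules submodule.pikaday.path pikaday
git config -f .gitmodules submodule.pikaday.url https://github.com/owenmead/Pikaday

# Read a single value back
git config -f .gitmodules --get submodule.pikaday.url

# List the path of every submodule declared in the file
git config -f .gitmodules --get-regexp '^submodule\..*\.path$'
```

Scripts can use the `--get-regexp` form to loop over every declared submodule without hard-coding paths.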
The `.gitmodules` file lists each submodule along with its path and the URL it was cloned from. Now let’s take a look at the commit we just made:
```
$ git show
commit 0387cc12229bd31fc3a4f299225ce3c1e1b6aec3
Author: You <firstname.lastname@example.org>
Date:   Sun Apr 3 15:06:32 2016 -0300

    Add Pikaday as a submodule

diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..affc969
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "pikaday"]
+    path = pikaday
+    url = https://github.com/owenmead/Pikaday
diff --git a/pikaday b/pikaday
new file mode 160000
index 0000000..d57fa05
--- /dev/null
+++ b/pikaday
@@ -0,0 +1 @@
+Subproject commit d57fa05193f46a1394635f11bbbcd9c55da2a54c
```
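Here’s the interesting part. Internally, Git doesn’t store the submodule’s files at all. The outer repo’s tree holds a special entry for `pikaday` (mode `160000`, known as a gitlink) that records the exact commit it points to, and diffs display it as a one-line text file:

```
# pikaday file
Subproject commit d57fa05193f46a1394635f11bbbcd9c55da2a54c
```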
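The gitlink is visible directly with `git ls-tree`, which lists the submodule with mode `160000` and object type `commit` instead of a blob. To stay offline, this sketch uses hypothetical local repositories, `lib` (standing in for Pikaday) and `webapp`, plus a throwaway commit identity; `protocol.file.allow=always` is set because recent Git versions refuse local-path submodule URLs by default:

```shell
set -e
# Throwaway identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical stand-in for Pikaday: a local repo with one commit
git init -q lib
echo "lib v1" > lib/README.md
git -C lib add README.md
git -C lib commit -qm "lib v1"

# Hypothetical superproject that adds "lib" as a submodule
git init -q webapp
cd webapp
git -c protocol.file.allow=always submodule add "$WORK/lib" lib
git commit -qm "Add lib as a submodule"

# The tree entry for lib is a gitlink: mode 160000, type "commit", pinned SHA
git ls-tree HEAD lib
```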
This means that the Pikaday source was not committed to the repository when we made the commit. Remember, a Git submodule is just a link to a specific commit in another repository. When someone else clones your repository, they won’t see the Pikaday source there. To get it, they have to run:
```
$ git submodule init
$ git submodule update
```
An alternative is cloning with the `--recursive` flag:
```
$ git clone --recursive <repo-path>
```
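To see why those commands are needed, this sketch clones a superproject without `--recursive`: the submodule’s directory is created but left empty until `git submodule update --init` (which performs the `init` and `update` steps in one command) fetches the pinned commit. The `lib`, `webapp`, and `teammate` repositories are hypothetical local stand-ins, so no network access is needed:

```shell
set -e
# Throwaway identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical library repo standing in for Pikaday
git init -q lib
echo "lib v1" > lib/README.md
git -C lib add README.md
git -C lib commit -qm "lib v1"

# Hypothetical superproject that records lib as a submodule
git init -q webapp
git -C webapp -c protocol.file.allow=always submodule add "$WORK/lib" lib
git -C webapp commit -qm "Add lib as a submodule"

# A teammate clones without --recursive: lib/ exists but is empty
git clone -q webapp teammate
cd teammate
ls -A lib

# init + update in one command: fetch and check out the pinned commit
git -c protocol.file.allow=always submodule update --init
cat lib/README.md
```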
Git submodules gotchas
There are many gotchas to be aware of when dealing with submodules. One of them is that Git often leaves your submodules in a detached HEAD state. Imagine we want to make a change in the Pikaday repo. First, we need to make sure we have a branch checked out (often master):
```
$ cd pikaday
$ git checkout master
# now we are ready to work
```
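A quick way to tell whether a submodule is detached is `git rev-parse --abbrev-ref HEAD`, which prints the literal string `HEAD` when no branch is checked out. The sketch below reproduces the situation with a fresh `--recursive` clone of hypothetical local repositories, where the library’s branch is assumed to be named `main`:

```shell
set -e
# Throwaway identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical library repo on branch "main"
git init -q lib
echo "lib v1" > lib/README.md
git -C lib add README.md
git -C lib commit -qm "lib v1"
git -C lib branch -M main

# Hypothetical superproject, then a fresh recursive clone of it
git init -q webapp
git -C webapp -c protocol.file.allow=always submodule add "$WORK/lib" lib
git -C webapp commit -qm "Add lib as a submodule"
git -c protocol.file.allow=always clone -q --recursive webapp teammate
cd teammate

# Prints "HEAD" because the submodule was checked out detached
BEFORE=$(git -C lib rev-parse --abbrev-ref HEAD)
echo "before: $BEFORE"

# Check out a branch before making changes, as described above
git -C lib checkout -q main
AFTER=$(git -C lib rev-parse --abbrev-ref HEAD)
echo "after: $AFTER"
```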
Another gotcha: after making (and pushing) changes to Pikaday, we still have to manually update the ref in the outer repo:
```
/pikaday $ echo "Foo" > README.md
/pikaday $ git add .
/pikaday $ git commit -m "Update Pikaday README.md"
/pikaday $ git push <your-pikaday-fork-path> master
/pikaday $ cd ..
/ $ git add .
/ $ git commit -m "Update Pikaday submodule ref"
```
These additional steps make everything a little more tedious and error-prone:
- Someone may forget to update the ref after making changes in a submodule
- Someone may forget to run `git submodule update` after pulling and end up with a different build
- Someone not very familiar with Git might have trouble dealing with detached HEADs
- What if you don’t want to maintain your own fork of a library just to make changes to it?
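Forgetting to update the ref is easy to catch: `git submodule status` prefixes a submodule with `+` when its checked-out commit differs from the one recorded in the outer repo, and with a space once they match. A sketch using hypothetical local `lib` and `webapp` repositories:

```shell
set -e
# Throwaway identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical library repo
git init -q lib
echo "lib v1" > lib/README.md
git -C lib add README.md
git -C lib commit -qm "lib v1"

# Hypothetical superproject with lib as a submodule
git init -q webapp
cd webapp
git -c protocol.file.allow=always submodule add "$WORK/lib" lib
git commit -qm "Add lib as a submodule"

# Commit inside the submodule but "forget" to update the outer ref
echo "Foo" > lib/README.md
git -C lib commit -qam "Update lib README.md"

# A leading "+" means the checked-out commit differs from the recorded one
STALE=$(git submodule status)
echo "$STALE"

# Recording the new ref puts them back in sync (leading space)
git add lib
git commit -qm "Update lib submodule ref"
git submodule status
```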
These and other problems make many people prefer subtrees over submodules, which we will see next.
Subtrees are much simpler than submodules. Unlike with submodules, a subtree’s source files are stored in the outer repo: it’s not just a link, the code is really there. Subtrees also require fewer steps and fewer changes to your workflow.
Subtrees started as a set of scripts that were later bundled with Git itself. They rely on conventions, like metadata written into commit messages, which let them work without changing how Git works internally. Let’s reproduce the example above, but using subtrees instead:
```
$ mkdir webapp
$ cd webapp
$ git init
$ echo "My webapp" > README.md
$ git add .
$ git commit -m "Initial commit"
# Here's the important part
# Do not forget the ending slash (/) in the prefix
# Also do not forget the "--squash" flag, otherwise you will
# end up with a very polluted Git history
$ git remote add pikaday https://github.com/owenmead/Pikaday
$ git subtree add --squash --prefix=pikaday/ pikaday master
```
Let’s take a look at the log:
```
$ git log
commit a0a9a576b8ce5a73422f6f3f1489faabe7b26dd0
Merge: 230ef84 0277a19
Author: You <email@example.com>
Date:   Sun Apr 3 16:07:11 2016 -0300

    Merge commit '0277a193131f68b873ab83b2618dea89217db757' as 'pikaday'

commit 0277a193131f68b873ab83b2618dea89217db757
Author: You <firstname.lastname@example.org>
Date:   Sun Apr 3 16:07:11 2016 -0300

    Squashed 'pikaday/' content from commit d57fa05

    git-subtree-dir: pikaday
    git-subtree-split: d57fa05193f46a1394635f11bbbcd9c55da2a54c

commit 230ef8475baeeb9ce9e9940c84d54c214135e5ce
Author: You <email@example.com>
Date:   Sun Apr 3 16:06:47 2016 -0300

    Initial commit
```
What happened is: Git squashed the entire Pikaday history into a single commit and merged it into our repo’s history.
There isn’t another `.git` folder, just one. Unlike with submodules, someone who clones your repo won’t have to do anything else to get all the code.
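The same behavior can be checked locally without GitHub. This sketch uses a hypothetical local `lib` repository (branch `main`) in place of Pikaday and assumes the `git subtree` command is installed (some distributions ship it as a separate package). It confirms that the library’s file is tracked by the outer repo, that no nested `.git` exists, and that the squashed commit carries the `git-subtree-*` metadata seen in the log:

```shell
set -e
# Throwaway identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical library repo standing in for Pikaday, on branch "main"
git init -q lib
echo "lib v1" > lib/README.md
git -C lib add README.md
git -C lib commit -qm "lib v1"
git -C lib branch -M main

# Superproject with an initial commit, as in the example above
git init -q webapp
cd webapp
echo "My webapp" > README.md
git add README.md
git commit -qm "Initial commit"

# Bring lib's content in under lib/ as one squashed commit plus a merge
git subtree add --squash --prefix=lib "$WORK/lib" main

git ls-files lib               # lib/README.md is tracked by webapp itself
ls -A lib                      # only README.md, no nested .git
git log -1 --format=%B HEAD^2  # the squash commit with git-subtree-* lines
```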
If, in the future, you have to pull Pikaday changes from its original repository, do:
```
$ git subtree pull --squash --prefix=pikaday/ pikaday master
```
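Continuing with hypothetical local repositories, this sketch adds a new commit upstream and then pulls it in with `--squash`; `-m` supplies the merge message so no editor opens. As before, `git subtree` is assumed to be installed:

```shell
set -e
# Throwaway identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical library repo standing in for Pikaday, on branch "main"
git init -q lib
echo "lib v1" > lib/README.md
git -C lib add README.md
git -C lib commit -qm "lib v1"
git -C lib branch -M main

# Superproject that already contains lib as a squashed subtree
git init -q webapp
cd webapp
echo "My webapp" > README.md
git add README.md
git commit -qm "Initial commit"
git subtree add --squash --prefix=lib "$WORK/lib" main

# Upstream moves on
echo "lib v2" > "$WORK/lib/README.md"
git -C "$WORK/lib" commit -qam "lib v2"

# Squash the new upstream commits and merge them into lib/
git subtree pull --squash --prefix=lib -m "Update lib subtree" "$WORK/lib" main
cat lib/README.md
```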
If you have write access to the original repository, you can also push changes you made in the subtree back to it:
```
$ echo "Imagine this is a bug fix" > pikaday/README.md
$ git add .
$ git commit -m "Pikaday: fix #123"
$ git subtree push --prefix=pikaday/ pikaday master
```
When you run `git subtree push`, Git collects the commits that changed files inside the folder given in the `--prefix` option, rewrites them so their paths are relative to that folder, and pushes just those commits to the given repo.
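This round trip can also be tried locally. In the sketch below, a bare clone named `lib.git` plays the role of the upstream repository you have write access to (pushing to a bare repository avoids Git’s refusal to update a checked-out branch); all repository names are hypothetical and `git subtree` is assumed to be installed:

```shell
set -e
# Throwaway identity so commits work in a clean environment
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical library repo, plus a bare copy acting as the writable upstream
git init -q lib
echo "lib v1" > lib/README.md
git -C lib add README.md
git -C lib commit -qm "lib v1"
git -C lib branch -M main
git clone -q --bare lib lib.git

# Superproject containing lib.git's content as a squashed subtree
git init -q webapp
cd webapp
echo "My webapp" > README.md
git add README.md
git commit -qm "Initial commit"
git subtree add --squash --prefix=lib "$WORK/lib.git" main

# Fix something inside the subtree and commit it in the outer repo
echo "Imagine this is a bug fix" > lib/README.md
git commit -qam "Lib: fix #123"

# Split out the commits that touched lib/ and push them upstream
git subtree push --prefix=lib "$WORK/lib.git" main

git -C "$WORK/lib.git" log --oneline main
git -C "$WORK/lib.git" show main:README.md
```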
Here’s a quick comparison of the two:

| Submodules | Subtrees |
|---|---|
| Harder (especially for Git beginners) | Easier |
| Just a link to a commit in another repository | Code is merged into the outer repository’s history |
| Requires the submodule to be accessible on a server (like GitHub) | Decentralized |
| Requires additional steps | Clone, pull, and push work the way you’re already used to |
| Smaller repository size | Bigger repository size |