Notes on Subproject Support
===========================
Junio C Hamano

Scenario
--------

The examples in the following discussion show how this proposal
plans to help this:

. A project to build an embedded Linux appliance "gadget" is
  maintained with git.

. The project uses linux-2.6 kernel as its subcomponent.  It
  starts from a particular version of the mainline kernel, but
  adds its own code and build infrastructure to fit the
  appliance's needs.

. The working tree of the project is laid out this way:
+
------------
 Makefile    - Builds the whole thing.
 linux-2.6/  - The kernel, perhaps modified for the project.
 appliance/  - Applications that run on the appliance, and
               other bits.
------------

. The project is willing to maintain its own changes out of tree
  of the Linux kernel project, but would want to be able to feed
  the changes upstream, and incorporate upstream changes to its
  own tree, taking advantage of the fact that both itself and
  the Linux kernel project are version controlled with git.

. To make the story a bit more interesting, later in the history
  of development, `linux-2.6/` and `appliance/` directories will
  be renamed to `kernel/` and `gadget/`.

The idea here is to:

. Keep `linux-2.6/` part as an independent project.  The work by
  the project on the kernel part can be naturally exchanged with
  the other kernel developers this way.  Specifically, a tree
  object contained in commit objects belonging to this
  sub-project does *not* have `linux-2.6/` directory at the top.

. Keep the `appliance/` part as another independent project.
  Applications are supposed to be more or less independent from
  the kernel version, but some other bits might be tied to a
  specific kernel version.  Again, a tree object contained in
  commit objects belonging to this sub-project does *not* have
  `appliance/` directory at the top.

. Have another project that combines the whole thing together,
  so that the project can keep track of which versions of the
  parts are built together.
  The Makefile is illustrated above, but there might be other
  files and directories.

We will call the project that binds things together the
'toplevel project'.  Other projects that hold `linux-2.6/` part
and `appliance/` part are called 'subprojects'.


Setting up
----------

Let's say we have been working on the appliance software,
independently version controlled with git.  Also the kernel part
has been version controlled separately, like this:

------------
$ ls -dF current/*/.git current/*
current/Makefile    current/appliance/.git/  current/linux-2.6/.git/
current/appliance/  current/linux-2.6/
------------

Now we would want to get a combined project.  First we would
clone from these repositories (which is not strictly needed --
we could use `$GIT_ALTERNATE_OBJECT_DIRECTORIES` instead):

------------
$ mkdir combined && cd combined
$ cp ../current/Makefile .
$ git init-db
$ mkdir -p .git/refs/subs/{kernel,gadget}/{heads,tags}
$ git clone-pack ../current/linux-2.6/ master | read kernel_commit junk
$ git clone-pack ../current/appliance/ master | read gadget_commit junk
------------

We will introduce a new command to set up a combined project:

------------
$ git bind-projects \
	$kernel_commit linux-2.6/ \
	$gadget_commit appliance/
------------

This would probably do an equivalent of:

------------
$ rm -f "$GIT_DIR/index"
$ git read-tree --prefix=linux-2.6/ $kernel_commit
$ git read-tree --prefix=appliance/ $gadget_commit
$ git update-index --bind linux-2.6/ $kernel_commit
$ git update-index --bind appliance/ $gadget_commit
------------

[NOTE]
============
Earlier outlines sent to the git mailing list talked about
`$GIT_DIR/bind` to record which subprojects are bound to which
subtrees in the current working tree and index.  This proposal
instead records that information in the index file with
`update-index --bind` command.

Also note that in this round of proposal, there are no separate
branches that keep track of heads of subprojects.

`update-index --bind` is not implemented on the core side yet;
it would involve backward incompatible changes to the index
format.
============

Let's not forget to add the `Makefile`, and check the whole
thing out from the index file.

------------
$ git add Makefile
$ git checkout-index -f -u -q -a
------------

Now our directory should be identical with the `current`
directory.  After making sure of that, we should be able to
commit the whole thing:

------------
$ diff -x .git -r ../current ../combined
$ git commit -m 'Initial toplevel project commit'
------------

This should create a new commit object that records what is in
the index file as its tree, with `bind` lines to record which
subproject commit objects are bound at what subdirectory, and
update `$GIT_DIR/refs/heads/master`.  Such a commit object
might look like this:

------------
tree 04803b09c300c8325258ccf2744115acc4c57067
bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
author Junio C Hamano 1137965565 -0800
committer Junio C Hamano 1137965565 -0800

Initial toplevel project commit
------------

Notice that `Makefile` at the top is part of the toplevel
project in this example, but this is not required.  We could
instead have the appliance subproject include this file.  In
such a setup, the appliance subproject would have had `Makefile`
and `appliance/` directory at the toplevel.  The `bind` line for
that project would have said "the rest is bound at `/`" and
`write-tree \--exclude=linux-2.6/` would have been used to write
the tree for that subproject out of the combined index.


Making further commits
----------------------

The easiest case is when you updated the Makefile without
changing anything in the subprojects.  In such a case, we just
need to create a new commit object that records the new tree
with the current `HEAD` as its parent, and with the same set of
`bind` lines.

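The bookkeeping for this easy case can be sketched as plain text
processing on the commit object.  This is only an illustration under
stated assumptions: the function names `bind_lines` and
`next_commit_header` are hypothetical, the `bind` header is the format
proposed in this note (no released git writes it), and placing the
`parent` line before the `bind` lines is a guess, not something this
proposal specifies.

```shell
# Print the `bind` header lines of a commit object read from stdin.
# Only the header (everything before the first blank line) is examined,
# so a log message line that starts with "bind " is not picked up.
bind_lines () {
	sed -n -e '/^$/q' -e '/^bind /p'
}

# Compose the first header lines of the follow-up commit: the new tree,
# the current HEAD as its parent, and the bind lines carried over
# unchanged.
#   $1    = name of the new tree object
#   $2    = name of the current HEAD commit
#   stdin = text of the current HEAD commit object
# The position of `parent` relative to `bind` is an assumption.
next_commit_header () {
	printf 'tree %s\n' "$1"
	printf 'parent %s\n' "$2"
	bind_lines
}
```

Feeding the example commit object shown earlier to
`next_commit_header` would reproduce its two `bind` lines verbatim
under the new `tree` and `parent` lines; the author, committer and log
message would still have to be appended by whatever creates the commit.
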
When we have changes to the subproject part, we would make a
separate commit to the subproject part and then record the whole
thing by making a commit to the toplevel project.  The user
interaction might go this way:

------------
$ git commit
error: you have changes to the subproject bound at linux-2.6/.
$ git commit --subproject linux-2.6/
$ git commit
------------

With the new `\--subproject` option, the directory structure
rooted at `linux-2.6/` part is written out as a tree, and a new
commit object that records that tree object with the commit
bound to that portion of the tree (`5b2bcc7b` in the above
example) as its parent is created.  Then the final `git commit`
would record the whole tree with updated `bind` line for the
`linux-2.6/` part.


Checking out
------------

After cloning such a toplevel project, `git clone` without `-n`
option would check out the working tree.  This is done by
reading the tree object recorded in the commit object (which
records the whole thing), and adding the information from the
"bind" line to the index file.

------------
$ cd ..
$ git clone -n combined cloned ;# clone the one we created earlier
$ cd cloned
$ git checkout
------------

This round of proposal does not maintain separate branch heads
for subprojects.  The bound commits and their subdirectories
are recorded in the index file from the commit object, so there
is no need to do anything other than updating the index and the
working tree.


Switching branches
------------------

Along with the traditional two-way merge by `read-tree -m -u`,
we would need to look at:

. `bind` lines in the current `HEAD` commit.

. `bind` lines in the commit we are switching to.

. subproject binding information in the index file.

to make sure we do sensible things.

Just like until very recently we did not allow switching
branches when two-way merge would lose local changes, we can
start by refusing to switch branches when the subprojects bound
in the index do not match what is recorded in the `HEAD` commit.

Because in this round of the proposal we do not use the
`$GIT_DIR/bind` file nor separate branches to keep track of
heads of the subprojects, there is nothing else other than the
working tree and the index file that needs to be updated when
switching branches.


Merging
-------

Merging two branches of the toplevel project can use the
traditional merging mechanism mostly unchanged.  The merge base
computation can be done using the `parent` ancestry information
taken from the two toplevel project branch heads being merged,
and merging of the whole tree can be done with a three-way merge
of the whole tree using the merge base and two head commits.
For reasons described later, we would not merge the subproject
parts of the trees during this step, though.

When the two branch heads use different versions of subproject,
things get a bit tricky.  First, let's forget for a moment about
the case where they bind the same project at different
locations.  We would refuse if they do not have the same number
of `bind` lines that bind something at the same subdirectories.

------------
$ git merge 'Merge in a side branch' HEAD side
error: the merged heads have subprojects bound at different places.
 ours:
    linux-2.6/
    appliance/
 theirs:
    kernel/
    gadget/
    manual/
------------

Such renaming can be handled by first moving the bind points in
our branch, and redoing the merge (this is a rare operation
anyway).
It might go like this:

------------
$ git reset
$ git update-index --unbind linux-2.6/
$ git update-index --unbind appliance/
$ git update-index --bind $kernel_commit kernel/
$ git update-index --bind $gadget_commit gadget/
$ git commit -m 'Prepare for merge with side branch'
$ git merge 'Merge in a side branch' HEAD side
error: the merged heads have subprojects bound at different places.
 ours:
    kernel/
    gadget/
 theirs:
    kernel/
    gadget/
    manual/
------------

[NOTE]
============
Again, `update-index --unbind` is not implemented yet on the
core side.
============

Their branch added another subproject, so this did not work (or
it could be the other way around -- we might have been the one
with `manual/` subproject while they didn't).  This suggests
that we may want an option to `git merge` to allow taking a
union of subprojects.  Again, this is a rare operation, and
always taking a union would have created a toplevel project that
had both `kernel/` and `linux-2.6/` bound to the same Linux
kernel project, possibly of different vintages, so it would be
prudent to require the set of bound subprojects to exactly match
and give the user an option to take a union.

------------
$ git merge --union-subprojects 'Merge in a side branch' HEAD side
error: the subproject at 'kernel/' needs to be merged first.
------------

Here, the version of the Linux kernel project in the `side`
branch was different from what our branch had on our `bind`
line.  On what kind of difference should we give this error?
Initially, I think we could require that one be a fast-forward
of the other (ours might be ahead of theirs, or the other way
around), and take the descendant.

Or we could do an independent merge of subproject heads, using
the `parent` ancestry of the bound subproject heads to find
their merge-base and doing a three-way merge.  This would leave
the merge result in the subproject part of the working tree and
the index.

[NOTE]
This is the reason we did not do the whole-tree three way
merge earlier.
The subproject commit bound to the merge base commit used for
the toplevel project may not be the merge base between the
subproject commits bound to the two toplevel project commits.

So let's deal with the case to merge only a subproject part into
our tree first.


Merging subprojects
-------------------

An operation of more practical importance is to be able to merge
in changes done outside to the projects bound to our toplevel
project.

------------
$ git pull --subproject=kernel/ git://git.kernel.org/.../linux-2.6/
------------

might do:

. fetch the current `HEAD` commit from Linus.
. find the subproject commit bound at kernel/ subtree.
. perform the usual three-way merge of these two commits, in
  `kernel/` part of the working tree.

After that, `git commit \--subproject` option would be needed to
make a commit.

[NOTE]
This suggests that we would need to have something similar to
`MERGE_HEAD` for merging the subproject part.  In the case of
merging two toplevel project commits, we probably can read the
`bind` lines from the `MERGE_HEAD` commit and either our `HEAD`
commit or our index file.  Further, we probably would require
that the latter two must match, just as we currently require the
index file matches our `HEAD` commit before `git merge`.

Just like the current `pull = fetch + merge` semantics, the
subproject aware version `git pull \--subproject=frotz/` would
be a `git fetch \--subproject=frotz/` followed by a `git merge
\--subproject=frotz/`.  So the above would be:

. Fetch the head.
+
------------
$ git fetch --subproject=kernel/ git://git.kernel.org/.../linux-2.6/
------------
+
which would fetch the commit chain from the remote repository, and
write something like this to `FETCH_HEAD`:
+
------------
3ee68c4...\tfor-merge-into kernel/\tbranch 'master' of git://.../linux-2.6
------------

. Run `git merge`.
+
------------
$ git merge --subproject=kernel/ \
    'Merge git://.../linux-2.6 into kernel/' HEAD 3ee68c4...
------------

. In case it does not cleanly automerge, `git merge` would write
  the necessary information for a later `git commit` to use in
  `MERGE_HEAD`.  It may look like this:
+
------------
3ee68c4af3fd7228c1be63254b9f884614f9ebb2 kernel/
------------
+
Similarly, `MERGE_MSG` file will hold the merge message.

With this, a later invocation of `git commit` to record the
result of hand resolving would be able to notice that:

. We should be first resolving `kernel/` subproject, not the
  whole thing.

. The remote `HEAD` is `3ee68c4\...` commit.

. The merge message is `Merge git://\.../linux-2.6 into kernel/`.

and would make a merge commit, and register that resulting
commit in the index file using `update-index \--bind` instead of
updating *any* branch head.


Management of Subprojects
-------------------------

While the above as a mechanism would support version controlling
of subprojects as a part of *one* larger toplevel project, it
probably is worth pointing out that having a separate repository
to manage the subproject independently would be a good idea.
The same subproject can be incorporated into more than one
toplevel project, and after all, a subproject should be
something that can stand on its own.

In our example scenario, the `kernel/` project is used as a
subproject for the "gadget" product, but at the same time, the
organization that runs the "gadget" project may use Linux on
their development machines, and have their own kernel hackers,
not necessarily related to the use of the kernel in the "gadget"
product.

What this suggests is that not only do we need to be able to
pull the kernel development history *into* the subproject of the
"gadget" project, but we also need to be able to push the
development history of the kernel part alone *out* *of* the
"gadget" project to another repository that deals only with the
kernel part.

It might go this way.
First the setup:

------------
$ git clone git://git.kernel.org/.../linux-2.6 Linux
$ ls -dF *
cloned/   combined/   current/   Linux/
------------

That is, in addition to the `combined/` which we have been using
to develop the "gadget" product in, we now have a repository for
the kernel, cloned from Linus.  In the previous section, we have
outlined how we update the kernel subproject part of `combined/`
repository from the `kernel.org` repository.  The same procedure
would work for pulling from `Linux/` repository here.

We are now going the other way; propagate the kernel work done
in the "gadget" project repository `combined/` back to `Linux/`.
We might do this at the lowest level:

------------
$ cd combined
$ git cat-file commit HEAD |
  sed -ne 's|^bind \([0-9a-f]*\) kernel/$|\1|p' >.git/refs/heads/linux26
$ git push ../Linux linux26:master
------------

Or, more realistically, since the `Linux` project might already
have their own commits on its `master`:

------------
$ cd Linux
$ git pull ../combined linux26
------------

Either way we would need an easy way to maintain the `linux26`
branch in the above example, and that will have to be part of
the wrapper scripts like `git commit` (more likely, that would
be a job for `git commit \--subproject`) for usability's sake;
in other words, the `cat-file commit` piped to `sed` above is
not something the end user would do, but something that is done
by the wrapper scripts.

Hopefully the people who work in `Linux/` repository would run
`format-patch` and feed their changes back to the kernel
community.

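The `cat-file commit` piped to `sed` step is the kind of thing a
wrapper script could hide behind a small helper.  What follows is a
minimal sketch, not part of the proposal: the name `bound_commit` is
hypothetical, it assumes the proposed `bind <commit> <directory>`
header lines, and it reads the commit text from standard input so it
can be tried without a `bind`-aware git.  Unlike the `sed` one-liner,
it compares the directory as a plain string (the `.` in `linux-2.6/`
is not treated as a wildcard) and stops at the end of the header.
Directory names containing whitespace are not handled.

```shell
# Print the commit bound at directory $1, given the text of a toplevel
# commit object on stdin.  Hypothetical helper for illustration only.
bound_commit () {
	awk -v dir="$1" '
		/^$/ { exit }    # the header ends at the first blank line
		$1 == "bind" && $3 == dir { print $2 }
	'
}
```

A wrapper would then run something like `git cat-file commit HEAD |
bound_commit kernel/` and store the result in the `linux26` ref.
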