Friday, February 28, 2020

Understanding Git, Part 3

The final part of my talk about Git. Here are Part 1 and Part 2.

After familiarizing ourselves with the areas of Git repository and basic commands, let’s dive into practical application of this knowledge. In this last part of the talk we will consider how to manipulate changes to achieve your goals, how to track what other people have done, and how to get out of trouble.

Let’s start with understanding how to make amendments to changes. This is useful to correct any mistakes you’ve discovered after commits have already been submitted into local repository. Also, amending your previous changes is required if you are working with a distributed code review system like Gerrit.

As you can remember, the actual commit objects residing in the repository are immutable and can’t be changed once you have created them. However, what we can do is come up with another set of commits, trees, and blobs, connect the new commits to the commit graph, and update the branch reference to point to the new set of commits.

This is illustrated on this diagram. Imagine we have a master branch which points to commit D. We have realized that we need to amend the last two commits: C and D. We use them as starting points for creating new commits, beginning from the oldest one—C. When we create this new commit we can optionally use a different parent commit for it.

After we have created the last new commit—D' in this case, we reset the master branch reference to point to it. Now although the old commits C and D are still in the repository, Git will consider them garbage at some point as they are not referenced by any branch.

Note that even if we didn’t change the trees of C' and D' compared to the original commits, but just reparented C, it will still be a different commit because the commit hash is calculated from entire commit contents which includes the parent reference. And since C' becomes a different commit with a different hash, D' is also different from D even if it still references the same tree.

That was a generic case of history rewriting, let’s consider the simplest one—amending just the last commit. Git has a special option for the git commit command, which is called --amend. If you haven’t staged any changes, executing this command will only result in updating the commit message of the last commit. If you have staged any files, they will replace corresponding files in the last commit.

As always, you can skip the intermediate step of staging by passing the -a option together with --amend.

And this is an illustration of what happens when you amend the last change. The tree of the commit may change or it may stay the same. The commit itself changes if you change the commit description. The parent of the amended commits stays the same. Your current branch gets updated to point to the new commit object.

Let’s consider a more interesting example of amending commits. Imagine we were developing two features: X and Y in dedicated branches, and now want to integrate them into a single branch.

As we mentioned earlier, one way of doing that is to create a merge commit. If that is not an option, then we can take the changes specific to one of the branches and replant them onto another branch.

This is how we do that using git rebase command. Imagine we decided to take the changes from the featureX branch and rebase them onto the featureY branch. For that, we make the featureX branch current, then we provide git rebase with the identifier of the change that ends the commit chain, in this case this is commit B, and the destination branch: featureY.

What happens next is Git takes the commits from that chain and, one by one, plants them onto the destination branch. Then it updates the branch reference. Note that the current branch remain the same after rebasing, thus the HEAD meta-reference doesn’t change.

Here is another practical example of rebasing, this time using a tracking branch. This scenario happens when you are working on a branch which is being changed in a remote repository by someone else. Recall that by default when you do a git pull Git merges your local changes with remote changes. Instead we can do a rebase by specifying --rebase argument. In this case git will automagically rebase your local changes on top of remote changes. Here is how this happens.

To demonstrate what happens, let’s do the same that git pull --rebase does, but this time manually. First we sync with a remote repository using git fetch. By comparing positions of the tracking and upstream branches, Git can find out that there are 100 new changes happened upstream, and we have a couple of changes, too. This is in fact the same situation as we have considered earlier with two features. The only difference is that this time one of the branches is a remote branch reference.

Now we do a rebase. We don’t have to specify the fork point and the destination branch to git rebase this time, because Git assumes we are rebasing onto the upstream branch, and it can figure out the fork point itself. After rebasing we have our changes on top of the changes from the upstream, as expected.

Since while we are rebasing we are creating new commits, it’s possible to apply more radical changes rather than just changing a parent or amending files. With Git interactive rebase we can also re-apply changes in different order, skip some changes, or add new ones, and combine multiple changes together. Basically, if you recall our “Patch algebra”, we can do all those operations on actual commits.

For that we use -i (for “interactive”) option to git rebase.

This power of being able to rearrange our changes in the hindsight allows us to apply “working in small steps” philosophy. What this means in practice is that we commit our changes as often as possible. Usually it’s after we have completed some logical step and the code at least compiles, although this is not a requirement. This allows us to rearrange our changes later, or to find a change that has broke everything.

Even if we use a code review system for the project, working in small steps is beneficial because by the time when we are ready to get our changes reviewed, we can group them into bigger chunks.

Imagine we were working on adding a resampler to our audio code. Let’s say, we have implemented the resampler partially, then hooked it up to existing code, then finished implementation, and then have fixed a bug in the integration code.

Now what we can do is fire up an interactive rebase to organize those changes into a more logical form.

After we have run git rebase -i we end up in our favorite text editor. Here we are presented with a list of changes. By default, Git will just pick them one by one, and this will result in no new commits, because no data will change. However, the interactive rebase offers a wide choice of operations listed below.

It is important to note that the commits are listed in direct chronological order (oldest go first), which is opposite to the order normally used by git log.

So what we have done here is we rearranged the changes in order to group together the two changes that implement the resampler. We use the squash (“s”) command to combine these changes and to edit the message of the resulting commit. On top of them we put the plumbing change and the fix for it. For the fix, we use fixup (“f”) command which simply uses the commit message from the first commit.

After we have done with editing the interactive rebase instructions, Git follow them, and this results in a new sequence of commits.

There is also a manual alternative to rebasing, known as “cherry-picking”. We can basically replant changes manually one by one in the order we want. This can be used, for example, if you want to try a new approach in a new branch, but want to reuse some of the work you have done on another branch.

What you do is you use git cherry-pick command providing a list of commit identifiers as parameters. There is no option to squash commits.

In this example, we decided to try to hook up our resampler in a different way, so we start another branch using master branch as a starting point, then we find out the hashes of the commits we are interested in, and apply them in the right order to our new branch. Obviously, this will result in new commits being created.

In the examples above we used a couple of times special syntax to reference commits that are ancestors of HEAD. The time has come to understand it. I called this section “Mini-Languages” to refer to the concept of Domain-Specific Languages. I also introduce Git-specific jargon here that is encountered in the documentation.

Let’s start with what is called “refname”, which I guess means “the name of a reference”. If you recall, HEAD is the name of the metarefence to the current branch.

Obviously, the name of the branch is a refname too, because as we know, any branch is a reference. If a local branch tracks a remote branch, we can retrieve it using @{upstream} syntax.

Another Git jargon word is “rev” which is a short name for “revision”, and that means specifying a commit. The canonical way is to use the full SHA-1 hash of the commit. However, it’s usually enough to specify just a few first characters of the hash. For small repositories, something like 5 first characters is enough, and for big ones we might need to provide up to 12.

Since the contents of a branch reference is a commit hash, a branch name can be trivially resolved into a commit. That’s why any Git command requiring commit hash will also accept a branch name. If we have a rev, which as we remember is equivalent to having a commit, we can reference the Nth parent of this commit by using the ^N syntax. If N isn’t specified, it’s assumed to be 1.

Since the parent of a rev is also a rev, we can apply the ^ “operator” multiple times moving one parent away with each application. There is a short syntax for this—~. If N isn’t specified it’s assumed to be 1, thus a ~ alone is equivalent to ^.

And finally, this is the syntax we’ve seen when we were talking about reflogs. Recall that a reflog is the list of changes happened to a reference. In order to reference the Nth previous state, we use @{N} syntax.

As you remember, every commit has a tree object associated with it. You can reach that object by adding a colon (:) after the rev. And once you’ve got a tree, you can go down by it and reference files it contains. This way you can designate the state of any file at any commit—kind of a time machine.

Since you typically execute Git commands from some subdirectory of your working tree, Git can use the current path for resolving a relative path. The relative path is specified by prefixing it with ./ character sequence.

Here are a couple of examples. Both assume we are working with the frameworks/av repository of Android. The first example references the state of the file services/audioflinger/Threads.cpp 3 commits ago. The second example assumes that we are currently in the services subdirectory of that repository, and references the state of the same file at the specific commit.

Consult the git revisions help page for the full list of options.

So far, we were talking about how to reference just one commit. It is also useful to be able to specify a section of a path in the commit graph. This is called a “commit range”. A commit range may also denote a set of commits which are disjoint in the commit graph. Commit ranges are frequently used with git log command.

If you provide git log with just one rev, it will interpret it as “all the paths that go down from the commit <rev> to the initial commit”. Actually, if you don’t provide any revision specification to git log it will simply take HEAD. The problem with that is the fact that the path from HEAD to the initial commit is quite long. In order to restrict the range we can use .. syntax, as shown here: <rev1>..<rev2>. This means “all commits not reachable from rev1 but reachable from rev2 (including rev2 itself)”.

In case of a linear history (when no merge commits present), this is simply the path from the commit rev2 down to but not including rev1. However, with non-linear history the result can be more interesting, as we can see on the diagram on the right.

In you don’t specify rev1 or rev2 in the range, Git will use HEAD.

Similar to printf function in C, Git has a template-based mini-language for specifying how to format the output of git log and other commands that can display commits, like git show. There is a set of predefined formats, called “pretty” in Git jargon. You can also construct a format that suits your needs by using placeholders prefixed with the percent symbol %. I’m not listing them here because usage of this mini-language is pretty straightforward and you can look up all the formatting instructions on the git pretty-formats help page.

Now we are armed with the knowledge of how to specify objects we want to view and how to represent the results. Let’s consider the commands that can be used for viewing different kinds of objects in Git.

A versatile command git show can display any repository object: a blob, a tree, or a commit. You can provide either the hash of the object, or a rev, or a path specification we have considered earlier.

The next command is well-known git log. It accepts a commit range, and its output can be limited to only consider certain paths so you can view changes specific only to a certain file or a directory. Don’t forget that you can specify output format and number of commit entries to display to make the output more manageable.

And finally, git status command which we have seen before. Its output can also be scoped to a certain path.

As we have discussed before, Git doesn’t store any diffs in the repository but generates them on request, for example when you invoke git diff command. Remember that diff is always generated from two objects, even if you specify only one or even no arguments to git diff. Here on the slide I listed possible useful options.

Sometimes we are not interested in seeing an entire diff. What we can do then—first, limit diff output to a certain path or paths. We can also ask diff to only display what is called in Git the “diff stats”—the amount of lines changed in every file. We can even only display the names of the changed files, with no extra information.

Of course you know about git blame command in Git. It’s perfect when you can immediately find the commit that made the change in question to the file. However, it’s not always that easy because in big and long-living repositories a lot of refactorings happen that just move lines between files or change formatting. But there is always a way to find the first functional change.

In this example, let’s consider this fragment of code from Android’s frameworks/av repository. Let’s say we want to find out the reason why “FLUSHED” state of a track is also considered as “stopped” state. From git blame output we can see that this line was last touched by someone named Eric in the commit with the hash starting from 81784.

We use git log to check the description of this commit (we could also use git show to view the entire commit), and unfortunately this last change was just code reorganization, so we need to look deeper.

It’s a bit of a problem that the line has been moved from one file to another. We need to find which file had this line before. One way to do that is to go manually through the change, and that’s doable if the change is small. In this case, the change was quite big so I used grep in order to find this line in the output of git show. I instruct grep to show a lot of lines (300) before the match to see the name of the original file and then grep the output once more just to show the line with the file name which diff prefixes with three dashes (---), as we can recall from our discussion of the patch format.

So I can see that the line was originally in AudioFlinger.h file. Now I instruct git blame to start looking from the ancestor of the refactoring commit. I use the hat (^) here to specify that revision, I could also use a tilde (~) if you recall the slide about revision specification. And I specify the name of the file I want to blame.

Now I see the last commit that actually changed this code, it was also made by Eric, and from looking at the description and probably diff, which didn’t fit on the slide so it’s not shown, I can see that I finally have found a functional change.

So git blame is really useful for figuring out commits that have changed a line of code. But changes can also remove lines of code. Obviously, we will not see the removed lines in the output of git blame. In order to see the history of removed lines, we can use the “pickaxe” tool of Git.

Recall that in diffs, each modification of a line of text is described as the removal of the old line of text, and the addition of the new version of that line. Any line can also be removed in one place of one file and added to some other place, maybe in some other file. Obviously, if the change adds a new line, there will be no old version of it. If the change removes a line, there will be no new version.

So what “pickaxe” does—it counts the amount of diff lines that remove a line containing a specified string, and the amount of diff lines that add it back. If these numbers do not match, that means the line has been added or removed in this diff. And this is what pickaxe reports. In this example we are looking for changes that add or remove string 4.0 from the set of files. We provide -S argument to git log which invokes the pickaxe tool.

Some commits add or remove not just lines but entire files. Once our colleague Jean-Michel couldn’t anymore find a file that he knew existed before. He wanted to find the change that removed that file. That’s actually easy with git log which supports --compact-summary argument which only displays “diff stats”. We limit the scope of git log to that particular file and here is what we see—the last commit that had removed the file.

Managing commits is a complex business and sometimes things do not go as you expect. Remember that Git always allows you to pull out from a multi-step operation like rebasing or merging and get back to a clean slate. To pull out, just re-run the same command with --abort argument. Because all of the listed commands check before starting that there are no changes to the working tree, they guarantee that after aborting you end up in the state you were initially.

Here are some more Git commands that can help getting your working tree and index back to some good known state. First, there is git reset command which, if called with only a revision parameter resets the branch reference to the specified revision. You can always look up any previous state of any branch by examining its reflog, as we discussed earlier.

Second, recall about the git stash command which you can use to clear out of your way any uncommited changes without losing them. If you are sure you want to lose them, use git reset --hard command.

And finally, you can always restore any particular file to a previous state using git checkout command. Recall that Git repository is a time machine that stores the state of any file at any previously committed revision.

There is one thing in version control management that people hate as much as going to the dentist or paying taxes—it’s resolving merge conflicts. Let’s try to understand this problem better using the “patch algebra” we have discussed in the first part of this talk. In terms of patches, on one branch we have a file in state A to which a change M had been applied. Then on another branch the same file has received a change P. Now we are trying to merge these changes together. Recall that any change in the diff has a context. A merge conflict occurs if the change M modifies parts of the file that are included in the context of the change P.

A lot of times Git can work around these discrepancies and still apply the patch. However, sometimes the help of a human is required. What I highly recommend to do is to configure Git to use so called “diff3 conflict resolution style” using the command shown on the slide because it provides the most comprehensive information about the conflict.

Let’s consider an example. The change A which is common to both branches adds a file with 3 lines of text. Now let’s imagine that two persons have made two different changes to this file. The change M was done by a humble developer, while the change P was done by a more optimistic marketing person.

This is how Git presents this conflict to you using diff3 style. First goes the part of the file that hasn’t been changed. Then the change on our branch (change M). Then how was this part looking prior to the change on the remote branch, that is, the parent of the change P. In our case this is actually the same as the state A. And finally, how this part looks with the change P applied.

Now what we need to do is to either select one of the versions, or produce a new version which for example combines both changes. We also need to remove the lines with merge conflict markers. And we repeat this process for every conflicting hunk.

Another common scenario we encounter when working on Android is the need to transfer a change from one repository to another. Here Git offers several options.

The first one is to add the second repository as a remote, fetch its contents into our repository, and then cherry-pick the commit containing the change. The problem we encounter here is that with large repositories fetching objects from another repository will take a lot of time and end up consuming a lot of disk space.

So if we are not interested in the history of the change we are transferring, another option is to use so called “mail exchange” scenario. We use git format-patch command on the source repository to export the change in the form of a diff file that contains all the commit metadata, and then apply this diff to the target repository using git am command. git format-patch produces a series of files, one per commit, so we can end up with lots of files.

Finally, instead of using git format-patch, we can produce a diff using git diff that contains all the changes we are interested in, possibly from multiple commits, and then apply it using git apply command. But since in this case the diff will not contain commit metadata, we will need to provide it manually.

Note that while transferring changes from one repository to another it’s a common situation to encounter a merge conflict, because repositories are likely diverged significantly one from another, like Android internal master and AOSP. If there is a conflict you will be presented with the following instruction from Git.

What you need to do is to go over the files that have conflicts, resolve them, and then stage updated files using git add. Then you can let Git to continue.

As I’ve mentioned before, if anything goes wrong, cherry-picking can be aborted at any moment.

These are the final tips for your Git journeys. First, don’t panic! Remember that Git is a time machine and there is always a way out. Second, remember that with the way Git repository is organized it’s actually hard to lose anything. Most probably, your changes are somewhere in the repository, you just need to find them using one of the tools we have discussed. Third, the most important thing you must care about is files in your working directory—if you haven’t saved changes to them and you overwrite your working tree, that’s it—they are lost forever. So take care of the changes you are doing in your working tree and let Git takes care about the rest.

And finally, if you are losing track of what’s going on, recall the building blocks diagram. Commit objects are the keys to everything, so if you need to find something, always start with finding the corresponding commit.

And there is always more to learn about Git! Here are several recommended sources. First, it’s the free book about Git called “Pro Git”.

If you are a visual thinker like myself, then “A visual Git reference” provides a lot of illustrations on what Git commands are doing to Git objects.

Finally, I’m pretty confident that if you’ve got through this talk you are now able to understand the text of official Git documentation pages :) Invoking git help command with the name of another command as a parameter provides you with pages of text you might now even find helpful.

Sunday, February 23, 2020

Understanding Git, Part 2

Publishing the next part of my talk about Git. See the first part here.

Let’s continue to the second part of the talk. Whereas the building blocks we were considering in the first part are more or less abstract and can be used for understanding generic principles of any distributed version control system, the areas and especially commands are specific to Git.

Let’s start with Working tree and Index. The Working tree is something that is understood by all Git users—it’s their sandbox where they do all your edits and file structure manipulations. It can be also referred to as “Working Directory”.

As for the Index, it’s a more confusing area and depending on your experience with other version control systems, index can be either a benefit or a hazard (rephrasing Rick Dekkard). Some version control systems do not have index, thus changes in the working tree are immediately committed into the repository. Not so with Git. Here, Index is an intermediate area where the next commit is staged. That’s why Index is also referred to as “Staging Area”. In order to commit a change to the file, it must be first registered with Index.

Because Index sits between the working tree and the repository, it is also referred to as “Cache”. No wonder why reading Git’s messages and help pages related to Index can be very confusing! Speaking of the building blocks we were considering earlier, both Working tree and Index can be represented as Tree objects. Also note that both Working tree and Index are optional and are only needed to prepare new commits. On a headless Git server they do not exist as there are no users who would commit changes to the server repository directly.

As an example, let’s consider typical output from the git status command. The above output appears when there are changes both to the index and to the working tree. The changes that have already been registered with Index are listed first. Changes that haven’t yet been registered are listed at the bottom.

Note that the same file can be listed in both sections meaning that some of the changes in it are in the Index and some are not. That illustrates the “benefit or a hazard” behavior. On one hand, index allows you to split the changes your are making to the working tree into several commits. But this also can produce an unfinished commit that misses some changes you’ve done in the working tree but forgot to stage.

In this output Git hints us about some commands that can be used in order to move files between the index and the working directory. Let’s explore them.

git add is used to stage to the index any changes in the file. It’s also used to put a file under version control. If you want to undo the changes done to the file in the working tree and use previously staged version, use git restore command. This is a relatively new command and long time git users know that a more universal git checkout command can also be used for the same purpose.

If we want to remove the file from under version control, we use git rm command which erases it in the working tree and also removes from the index. If there are any untracked garbage files or directories in the working tree, they can be easily removed using git clean command.

If we want to see the changes done to the files in the working tree compared to the staging area, we need to use git diff command which generates a diff object.

Moving onto the next level, let’s consider how do we move changes between the index and the repository. The git commit command is well known—it creates a commit object that points to the tree object of the index and adds both to the repository.

In order to see the changes before we commit them we can use git diff with --cached argument—it shows the difference between the index tree and the tree from the last commit.

Previously mentioned git rm and git restore commands can be used to remove or undo changes in the index. Note that there is an inconsistency between the argument names they use.

And of course, there is no direct way to remove a committed change from the repository.

An interesting question: what does it contain in a “clean” state, e.g. after doing git checkout or git reset? A good answer to this question is in the “Reset demystified” section of “Pro Git” book. On checkout, the tree of the index is the same as the tree from the checked out commit (and same as your working dir assuming you haven’t made any changes to it yet). Thus, both git diff and git diff --cached produce an empty diff immediately after a checkout. If we make a change in the working directory to an already tracked file, git diff shows it. Once we stage this change (git add), it is now shown by git diff --cached, but isn't shown by git diff.

Finally, for those users who don’t want to fiddle with the cache, there is a way to work through it, as if it didn’t exist. Using -a argument with git commit command automatically puts all the current changes from the working tree into the index before creating a commit.

git reset with --hard argument can be used to replace the contents of the working tree and the index from the last commit in the repository directly. And git diff HEAD (all caps) can be used to show combined differences between the working tree plus index and the last commit. git diff HEAD puts the tree of the working dir on top of the tree of Index and diffs the result against the tree of the last commit (so if there are both staged and unstaged changes to the same line the latter will be shown).

What is HEAD, exactly? It’s a special reference in the repository that we consider next.

This is the already familiar diagram of the commit graph with branches. HEAD usually points to what is called “current” branch—the line of changes we are working on. Note that HEAD is a meta-reference—instead of pointing to a commit via its hash it instead contains the branch name, thus when we use HEAD, double dereferencing is needed to get to the commit.

This is how HEAD can be used and changed. When we do a commit, git commit uses HEAD to get to the last commit on the current branch and make it the parent of the freshly created commit. Then git commit updates that branch to point to the new commit.

git branch command can be used for creating a new branch without switching to it. A relatively new command for changing the current branch, and thus updating HEAD is git switch. Note that git switch will refuse to work if there are any uncommitted changes in the working tree, because switching to a branch also means populating the working tree contents using the last commit of the branch, and any uncommitted changes will be lost. In case we actually want to lose them, we can use git reset --hard and provide it with the branch name.

git switch with -c argument is used for creating a new branch and switching to it (that is, updating HEAD).

People often ask, what are the differences between relatively new git restore and git switch commands vs. good old git checkout and git reset. There is an excellent write-up on GitHub blog about this. In short, git restore and git switch are safer alternatives to old commands, but they don’t replace them fully. There are some similarities in their arguments, e.g. for interactive piece-by-piece restoration both git reset and git restore use -p argument, but there are differences too, e.g. git switch -c is the same as git checkout -b. In general, the arguments of the new commands should be more straightforward and easier to understand.

The old multi-purpose git checkout command can be used for changing HEAD, too. Unlike git switch which only accept branch names, git checkout can switch HEAD to any commit. However, if the commit is not referenced by any branch, switching to it creates a so called “detached” HEAD. This isn’t a normal state for development as git commit will not be able to refer the new commit from any branch. That means, the new commit will be considered “garbage” by Git and removed from the repository sooner or later.

That’s why Git displays this lengthy message when you switch to a detached HEAD. BTW, it also hints you about a quick way to flip between the last two branches by passing minus (-) argument to git switch. The takeaway is—if you see this message, don’t commit any changes unless you really know what you are doing.

As Git was created for distributed development, it natively supports synchronization with remote repositories. “Remote” doesn’t necessarily means that data is on another machine, it can reside in a different folder on the same hard drive.

To add a remote, use git remote add command. As you can guess, Git sets no artificial restrictions on how many remote repositories you may have.

Let’s do a brief recap on the main commands dealing with remote repositories. git clone allows starting a new local repository from a remote one. It’s a shorthand for creating an empty repository, adding a remote and syncing from it. There are actually two commands for syncing: git fetch and git pull, we will consider the differences between them later. git push is used to sync your local changes back to the remote.

You never manipulate the contents of remote repositories directly. When you sync from a remote repository, all its objects and references get imported into your local repository, thus it effectively becomes a cache for remote data. After you’ve made any changes which means you have created new objects and changed references, these changes can be synced back to the remote using the same synchronization process.

As Git was designed as a peer-to-peer system, it doesn’t matter which side initiates data transfer. The remote side can request data from you, or you can push your local data to remote side. In the latter case you must have the necessary permissions.

Now let’s talk about the difference between git fetch and git pull. The former doesn’t change the state of your branches or the working tree, it just does the synchronization. git pull by default is a shorthand for doing git fetch and then executing git merge. UPDATE: You can pass --rebase argument to do a rebase instead.

Here is an example of using git remote command. First, we list the remotes by passing the -v argument. There are two remotes here, one called “aosp” which is accessed via HTTPS, and another called “internal” which lives on the same machine. It’s possible to use different URLs for fetching and pushing.

Then we execute git fetch command to receive any new objects and references from the remote called “internal”. After it has finished you can access the contents of objects received from the remote.

As we know, branches are the entry points to repository objects. Obviously, it’s a good idea to use branches of the remote repository to access its commits. From the building blocks point of view they don’t represent anything new—they are also files pointing to commits. However, because conceptually they are in remote repository, they can’t be updated in any other way than syncing from a remote. For the same reason, if you attempt to do a git checkout using a remote branch name, this will result in a detached HEAD.

The name of a remote branch reference is the same as the name of the branch in the corresponding remote repository. For namespacing purposes, names of remote branch references are prefixed with the name of the remote.

And here is another term from Git vocabulary—a ”tracking” branch. Since remote branch references are read-only, in order to do any development on a branch coming from a remote repository—it is called “upstream” branch—you need to have a local branch. Usually it has the same name as the upstream branch, but that’s not mandatory. The name of the upstream branch is stored in the metadata of the “tracking” branch.

If you attempt to switch to a non-existing branch using the same name as one of remote branches has, Git will automatically assume that you want to create a new branch that should track that remote branch. In order to have control over this behavior, you can use --track argument of the git switch command.

Here is an example of what happens when you switch to a local tracking branch after doing git fetch. If there were remote changes on the upstream branch since the last sync—the last common commit is labelled “C” on the diagram—Git will offer to merge your local branch with the remote branch to integrate the changes together. Since these branches have diverged, a fast forward is not possible, and a merge commit will need to be created.

If you are not allowed to have merge commits, then instead of doing a merge you should rebase your changes on top of remote changes. We will consider the mechanics of this a bit later.

Enough with remotes, let’s talk about the tool that helps handling work interruptions. Sometimes you are in the middle of working on a change and then something unexpected happens: your previous change has caused a build break and a simple fix is needed immediately, or you’ve got some ideas and need to test them quickly on some other branch. You don’t want to lose your current changes, but you also don’t want to commit them as is.

One possible way of handling this is to create a branch and commit your current changes there, then checkout another branch to restore the working tree contents to a known state. However, there is another more convenient solution—to stash the change away using git stash command. This command saves your current index and working tree modifications into the repository as commits and brings the working tree and index into the previous committed state.

In terms of building blocks, Stash is just another reference, like a branch. However, the last stashed commit does not use the hash of the previously stashed commit as a parent. It’s actually interesting to understand how stashing is implemented.

Let’s say we have the last commit called “M”. We have made some changes and added them to the staging area, then we have also made some changes to the working tree and haven’t yet staged them. Now we execute git stash, what happens?

First, Git creates a commit from the contents of the index, setting “M” as a parent—as if we have executed git commit command. This commit is shown as “I” on the diagram. Git also snapshots working tree contents and creates another commit for them, called “W” on the diagram. This commit has two parents: “M” and “I”, so this is a merge commit. The stash reference is then set to point to that commit. After that, Git resets the index and the working tree to the state of the commit “M”.

This brings up a question: if we do git stash multiple times, how can we find previously stashed commits?

To answer the question, we introduce another powerful tool of Git—the Reflog. Speaking in terms of building blocks, this is just a list of commit hashes. The power comes from the fact that Git adds a node to this list every time any local reference changes. This includes HEAD, branch references, and Stash.

So, answering the question we have previously stated—in order to find previously stashed commits we can look into the reflog of Stash. git stash list command does exactly that. Another useful application for reflog is finding commits that you have accidentally unlinked from branches. You can find their hashes either in the scroll buffer of your terminal—if they are still there— or in the reflog.

Here is an example of executing git reflog command. It lists the history of changes to HEAD. The reference specifications in the reflog commits list use special syntax which we will consider later.