I use Git a lot, in my daily job as well as for this blog. When using it, I often rebase locally before pushing, to have a clean and readable history.
A sample workflow
For my blog, the branching model looks like the following:
o---o---o---o master
             \
              \---o---o feature/newposts
- master
  As expected, this branch is the production site.
- feature/newposts
  This branch is dedicated to new posts, with one post per commit. To speed up rendering, it only keeps a handful of the latest posts: the first commit after master removes most of the older ones.
To publish a new post, I cherry-pick its commit from feature/newposts to the master branch.
Also, when I make changes to master, I rebase feature/newposts onto master, to have the latest updates.
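The workflow above can be sketched on a throwaway repository. This is only an illustration: the file names, commit messages and the demo identity are made up, and the real branch contains many more commits.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a hypothetical identity, so commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"

# master: the production site.
echo "site" > index.txt
git add index.txt
git commit -q -m "Initial site"

# feature/newposts: one post per commit.
git checkout -q -b feature/newposts
echo "draft 1" > post1.txt
git add post1.txt
git commit -q -m "Post 1"
post1=$(git rev-parse HEAD)
echo "draft 2" > post2.txt
git add post2.txt
git commit -q -m "Post 2"

# Publish the first post: cherry-pick its commit onto master.
git checkout -q master
git cherry-pick "$post1" > /dev/null

# Bring feature/newposts up to date with master.
git rebase -q master feature/newposts

git log --oneline --graph --all
```

Since the first post is already on master after the cherry-pick, the rebase leaves only the second post on top of master in feature/newposts.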
The impact of rebasing
Things start to get interesting when I rebase interactively on master.
- Initial state

A---B---C---D master
             \
              \---a---b feature/newposts

- Interactive rebase of master

A---B---C---D
     \       \
      \       \---a---b feature/newposts
       \
        \---D' master

- Rebase of feature/newposts onto master

A---B---C---D
     \
      \--C'---D' master
                \
                 \---a---b feature/newposts
See commits C and D? They are no longer referenced by any branch, so git log does not display them by default.
Still, they can be displayed via git reflog.
Likewise, GUI clients such as SourceTree do not display those commits.
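To see this behaviour first-hand, here is a minimal sketch on a throwaway repository. It assumes a hypothetical demo identity and made-up file names. It also replaces the interactive rebase with a reset followed by two new commits, which leaves the branch in a comparable state without needing an editor.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a hypothetical identity.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Build A---B---C---D on master.
for name in A B C D; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done
old_d=$(git rev-parse HEAD)   # remember the original D

# Stand-in for the interactive rebase: go back to B,
# then recreate C and D as new commits C' and D'.
git reset -q --hard HEAD~2
for name in C D; do
    echo "$name (reworked)" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name'"
done

# Only A, B, C' and D' show up in the history of master...
git log --oneline
# ...and no branch contains the original D any more (prints nothing)...
git branch --contains "$old_d"
# ...but the reflog still remembers it.
git reflog
```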
Dangling and unreachable commits
Time for some definitions:
- unreachable object
An object which is not reachable from a branch, tag, or any other reference.
- dangling object
An unreachable object which is not reachable even from other unreachable objects; a dangling object has no references to it from any reference or object in the repository.
https://git-scm.com/docs/gitglossary/
Using those definitions, commits C and D in the above diagrams are considered unreachable because no branch or tag points to either of them.
Moreover, commit D is also dangling, because no other object references it, while commit C is not, because D points to it as its parent.
To list those dangling and unreachable objects, one can use the git fsck command:
git-fsck - Verifies the connectivity and validity of the objects in the database
https://git-scm.com/docs/git-fsck
For example, to display unreachable commits:
git fsck --unreachable
If an expected commit is not displayed, it is probably because a reflog still references it. In that case, there’s an option to ignore reflog references:
git fsck --unreachable --no-reflogs
The same command can be used to list dangling commits only: replace --unreachable with --dangling.
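The difference between the two options is easier to see on a concrete repository. The sketch below is only an illustration, under the same kind of assumptions as the diagrams: a throwaway repository, a hypothetical demo identity, and a reset followed by new commits as a stand-in for the interactive rebase.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a hypothetical identity.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Build A---B---C---D on master and remember the original C and D.
for name in A B C D; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done
old_c=$(git rev-parse HEAD~1)
old_d=$(git rev-parse HEAD)

# Rewrite C and D (stand-in for an interactive rebase).
git reset -q --hard HEAD~2
for name in C D; do
    echo "$name (reworked)" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name'"
done

# Reflog entries count as references, so no commit is reported here.
git fsck --unreachable 2>/dev/null | grep commit || echo "no unreachable commit"

# Ignoring reflogs, both the old C and the old D are unreachable.
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep '^unreachable commit')
echo "$unreachable"

# Only the old D is dangling: the old C is still referenced by the old D.
dangling=$(git fsck --dangling --no-reflogs 2>/dev/null | grep '^dangling commit')
echo "$dangling"
```

The grep filters keep only commits; without them, the unreachable listing also includes the trees and blobs that belonged to the old commits.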
Cleanup proper
Git is quite efficient at storing text. And yet, there’s no point in keeping either reflogs or unreachable commits past a certain point.
There’s a garbage collector in Git. Some commands trigger it automatically. You can tell the GC has run when a command prints output like the following:
Counting objects: 9451, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4657/4657), done.
Writing objects: 100% (9451/9451), done.
Total 9451 (delta 3843), reused 8900 (delta 3584)
It’s also possible to run it explicitly:
git gc
Running the GC removes unreachable objects; by default, it only prunes those older than two weeks.
It also compresses file revisions.
However, remember that most unused objects are still referenced by reflogs. As long as that is the case, Git does not consider them unreachable, and the GC keeps them. The question now is: how to expire reflogs so that those objects become unreachable?
Reflogs expiry
To expire the reflogs of all references, run:
git reflog expire --all
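Reflog entries are split into two kinds: standard entries, and unreachable entries, which point to commits that are no longer reachable from the current tip of the branch:

| | Standard | Unreachable |
|---|---|---|
| Expired after (by default, days) | 90 | 30 |
| Command parameter | --expire | --expire-unreachable |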
For example, to expire reflog entries older than two weeks instead of the default 90 days, use:
git reflog expire --expire=2.weeks.ago --all
Once the reflog entries have expired, the relevant commits truly become unreachable, and the garbage collector can finally remove them.
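The whole sequence can be tried safely on a throwaway repository. The sketch below rests on a few assumptions: a hypothetical demo identity, a reset followed by new commits as a stand-in for the interactive rebase, and the values --expire=now and --prune=now, chosen only to make the effect immediate. They are far more aggressive than the defaults and should not be copied blindly to a real repository.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a hypothetical identity.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Build A---B---C---D on master, then rewrite C and D.
for name in A B C D; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done
old_d=$(git rev-parse HEAD)
git reset -q --hard HEAD~2
for name in C D; do
    echo "$name (reworked)" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name'"
done

# A plain GC keeps the old D, because the reflog still references it.
git gc --quiet --prune=now
before=$(git cat-file -t "$old_d")
echo "old D after a first gc: $before"

# Expire every reflog entry, then collect garbage again.
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$old_d" 2>/dev/null; then
    after="still stored"
else
    after="removed"
fi
echo "old D after reflog expiry and gc: $after"
```

Running the GC twice is deliberate: the first call shows that the reflog protects the old commit, and the second shows that nothing protects it once the reflog entries are gone.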
Conclusion
This post has looked into how commits reference each other in Git, and how they can be cleaned up. In most cases, however, the default automated cleanup should be enough. Remember that by removing reflogs and commits, you make it harder for yourself to recover from your mistakes.