
Highlights from Git 2.46

Git 2.46 is here with new features like pseudo-merge bitmaps, more capable credential helpers, and a new git config command. Check out our coverage on some of the highlights here.


The open source Git project just released Git 2.46 with features and bug fixes from over 96 contributors, 31 of them new. We last caught up with you on the latest in Git back when 2.45 was released.

Before we get into the details of this latest release, we wanted to remind you that Git Merge, the conference for Git users and developers, is back this year on September 19-20 in Berlin. GitHub and GitButler are co-hosting the event, which will feature talks from developers working on Git, as well as those developing tools in the Git ecosystem. The call for proposals (CFP) closes on August 8. Check out the website to learn more.

With that out of the way, here is GitHub’s look at some of the most interesting features and changes introduced since last time.

Faster traversals with pseudo-merge bitmaps

Returning readers of these posts will remember our coverage of reachability bitmaps. If you’re new around here or need a refresher, here’s a quick overview. When Git needs to fulfill a fetch request, it starts with a set of commits that the client wants but does not have, and another set that the client already has and does not want. Given this information, Git needs to determine which commits (and trees, blobs, etc.) exist between the two sets in order to fulfill the client’s fetch request. Once it has this information, it generates a packfile to send to the client containing the objects that it requested.

So, how does Git determine the set of objects to send between the “haves” and “wants”? One way to do this is to walk backwards through history, starting with each commit in the “wanted” set, marking them as interesting, then examining their parents, and so forth until you have no more interesting commits left. When Git encounters a commit along the way, it can either continue the walk (if the commit isn’t reachable from any of the “haves”), or not (otherwise).
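To make the walk concrete, here is a minimal sketch in Python. It is not Git's implementation: the commit names and the `PARENTS` table are made up for illustration, only commits are modelled (no trees or blobs), and the "haves" side is expanded up front for simplicity, whereas a real revision walk interleaves both sides.

```python
from collections import deque

# Hypothetical toy history: each commit maps to its list of parents.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["B"],  # side branch forked from B
}

def reachable(parents, tips):
    """Return every commit reachable from `tips` by following parents."""
    seen = set()
    queue = deque(tips)
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents[commit])
    return seen

def commits_to_send(parents, wants, haves):
    """Walk back from the wants, stopping at anything the client has."""
    uninteresting = reachable(parents, haves)
    to_send = set()
    queue = deque(wants)
    while queue:
        commit = queue.popleft()
        if commit in uninteresting or commit in to_send:
            continue  # the client already has it, or we already visited it
        to_send.add(commit)
        queue.extend(parents[commit])
    return to_send

print(sorted(commits_to_send(PARENTS, wants=["D", "E"], haves=["B"])))
```

With the client wanting D and E and already having B, the sketch prints `['C', 'D', 'E']`: every commit between the two sets, and nothing at or below B.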

This naive object walk determines the set of objects to send, but can be slow depending on the size of the “haves” or “wants” set, the number of commits between the two, how deep the intermediate trees are, and so on. In order to optimize this computation, Git uses reachability bitmaps to more quickly determine the set of objects to send. Bitmaps store the set of objects reachable from some set of commits, encoding this set as a bitset where each bit uniquely identifies a single object. Here’s an example of that in action:

Reachability traversal using pack-bitmaps without pseudo-merge bitmaps enabled.

In the above example, each commit is represented by a circle, with boxes indicating a handful of references (like refs/heads/main, refs/heads/foo, etc). In the bottom-left there are three bitmaps for commits C(0,1), C(0,3), and C(0,5). To determine the set of objects to send, Git starts by walking from each of the given references backwards along each commit’s parents, marking reachable commits with the color green. If a commit is already colored green during the traversal, we can stop early. When Git encounters commit C(0,5), it sees that that commit has a bitmap, which indicates the set of objects reachable from it. Whenever this happens, Git can quickly mark all of those objects reachable without actually having to walk along that portion of history.

So, bitmaps can speed up many object traversals like the one above. But notice that there is still a lot of manual object traversal taking place from the other branches, like foo, bar, and so on. Git could write more bitmaps, but doing so can be expensive (see note 1).
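The effect of a stored bitmap on the walk can be sketched in a few lines of Python. This is a toy model under stated assumptions rather than Git's on-disk format: Python integers stand in for compressed bitsets, the `PARENTS` history and the choice of which commit gets a bitmap are invented for the example, and only commits are given bits.

```python
# Hypothetical toy history: commit -> parents. Only commits are modelled;
# a real bitmap also has bits for trees, blobs, and tags.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}

# Give every object a fixed bit position, as a bitmap index would.
BIT = {name: 1 << i for i, name in enumerate(sorted(PARENTS))}

def full_closure(commit):
    """Bitset of everything reachable from `commit`, computed the slow way."""
    bits = BIT[commit]
    for parent in PARENTS[commit]:
        bits |= full_closure(parent)
    return bits

# Pretend only commit C was selected to receive a stored bitmap.
STORED = {"C": full_closure("C")}

def walk(tips):
    """Return (bitset of reachable objects, commits visited by hand)."""
    result, visited_by_hand = 0, 0
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if result & BIT[commit]:
            continue                  # already marked, so stop early
        if commit in STORED:
            result |= STORED[commit]  # one OR replaces a whole sub-walk
            continue
        result |= BIT[commit]
        visited_by_hand += 1
        stack.extend(PARENTS[commit])
    return result, visited_by_hand

bits, steps = walk(["D", "E"])
print(sorted(name for name, bit in BIT.items() if bits & bit), steps)
```

Here only D and E are visited by hand; everything at or below C arrives through a single OR. Each stored bitmap that gets used costs one such OR (plus decompression in the real format), which is part of why writing many more per-commit bitmaps is not free.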

Git 2.46 introduces experimental support for a new kind of bitmap, called a pseudo-merge reachability bitmap. Instead of storing the objects reachable from a single commit, a pseudo-merge bitmap stores the set of objects reachable from a group of commits. If the client wants commits reachable from, say, foo, bar, and baz, it suffices to have a single pseudo-merge bitmap corresponding to those commits. Here’s an example of the same traversal from earlier, but this time with pseudo-merge bitmaps:

Reachability traversal using pack-bitmaps with pseudo-merge bitmaps enabled.

Notice that there’s a new bitmap in the lower left-hand corner, corresponding to the pseudo-merge between commits C(1,1), C(2,2), C(0,7), and C(3,4). The first three are all part of the explicit “wants” set, and C(3,4) is implicitly wanted since it is reachable from the baz and quux branches.

Git starts its traversal in the same way, walking backwards from each of the starting commits. At each point, it checks whether or not any pseudo-merges are usable: that is, whether the commits they describe are all either explicitly or implicitly wanted by the client. When Git starts walking backwards from refs/heads/baz, it marks C(3,4), and determines that the pseudo-merge is usable. Once this happens, all of the commits corresponding to the bits in the pseudo-merge are marked. Finally, Git performs a couple of remaining manual traversal steps backwards from refs/heads/quux until the whole walk is complete.
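The usability check described above amounts to a subset test on bitsets. The following Python sketch shows that idea only; the `PseudoMerge` class, the helper names, and the toy history are hypothetical, Python integers stand in for compressed bitmaps, and Git's actual selection and lookup logic is considerably more involved.

```python
from dataclasses import dataclass

# Hypothetical toy history: X, Y, Z are branch tips; W sits below Z.
PARENTS = {"A": [], "B": ["A"], "X": ["B"], "Y": ["B"], "W": ["A"], "Z": ["W"]}
BIT = {name: 1 << i for i, name in enumerate(sorted(PARENTS))}

def closure(commit):
    """Bitset of everything reachable from `commit`."""
    bits = BIT[commit]
    for parent in PARENTS[commit]:
        bits |= closure(parent)
    return bits

@dataclass
class PseudoMerge:
    commits: int  # bitset of the commits this pseudo-merge describes
    objects: int  # bitset of everything reachable from any of them

def make_pseudo_merge(tips):
    commits = objects = 0
    for tip in tips:
        commits |= BIT[tip]
        objects |= closure(tip)
    return PseudoMerge(commits, objects)

def walk(wants, pseudo_merges):
    """Return (bitset of wanted objects, commits visited by hand)."""
    wanted = 0
    for tip in wants:
        wanted |= BIT[tip]  # explicit wants are known up front
    done = 0                # objects whose history needs no further walking
    steps = 0
    stack = list(wants)
    while stack:
        # A pseudo-merge is usable once every commit it describes is wanted.
        for pm in pseudo_merges:
            if pm.commits & wanted == pm.commits:
                wanted |= pm.objects
                done |= pm.objects
        commit = stack.pop()
        if done & BIT[commit]:
            continue
        wanted |= BIT[commit]  # reached by walking: implicitly wanted
        done |= BIT[commit]
        steps += 1
        stack.extend(PARENTS[commit])
    return wanted, steps

pm = make_pseudo_merge(["X", "Y", "W"])
print(walk(["X", "Y", "Z"], [pm])[1], walk(["X", "Y", "Z"], [])[1])
```

In this toy run the pseudo-merge over X, Y, and W becomes usable as soon as the walk from Z marks W, so only two commits are visited by hand instead of six. The saving depends on the order in which this simplified walk happens to visit the tips.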

This post just covers the tip of the iceberg with pseudo-merge bitmaps, which have a ton of new configuration options to determine how pseudo-merges get selected and organized in large repositories. The new configuration (as well as the feature itself) is still considered experimental, but you can get started by enabling pseudo-merges in your repository like so:

# configure pseudo-merge bitmaps
$ git config bitmapPseudoMerge.all.pattern 'refs/(heads|tags)/'
$ git config bitmapPseudoMerge.all.threshold now
$ git config bitmapPseudoMerge.all.stableThreshold never

# then generate a new *.bitmap file
$ git repack -adb

GitHub will be rolling out pseudo-merge bitmaps soon, so expect a follow-up post from us with an even deeper dive into their inner workings when we do.

[source, source]


  • At the end of our coverage of pseudo-merge bitmaps above, we used the git config command to show how to tweak your repository’s configuration file to enable pseudo-merge bitmaps. Veteran Git users will know that this command does a whole lot more than just setting configuration options. It can list all configuration settings, get a single one (or multiple, matching a regular expression), unset settings, rename or remove sections, and even open a configuration file in your $EDITOR of choice.

    In the past, these modes were all hidden behind different command-line options, like --get, --get-all, --unset, and --remove-section, to name a few.

    Git 2.46 shipped a new user interface for this command, grouping its many capabilities into a few top-level sub-commands, making it much easier to use. For instance, if you want to list all of the configuration settings in your repository, you can simply run git config list. If you want to get a single setting, you can run git config get <name>. If you want to narrow your results down further to just those matching a regular expression, you can use the --regexp option along with the get sub-command.

    These new command-line options and sub-command modes organize git config’s various functions much more neatly, and make the command a lot easier to use while still retaining backwards compatibility with all existing invocations. To learn more about the new modes, check out the documentation for git-config(1).

    [source]

  • We’ve written about Git’s credential helper mechanism in previous posts from this series. If you’re not familiar, this mechanism is used to provide credentials when accessing repositories that require them. Credential helpers translate between Git’s credential helper protocol and other applications like Keychain.app or libsecret.

    However, HTTP authentication in Git has been practically limited to schemes that require a username and password. Services that want to use Bearer authentication have had to place the token in the http.extraHeader configuration option, which means storing sensitive information in plaintext.

    Git 2.46 extends the credential helper protocol with new authtype and credential fields. The protocol can now also hold onto arbitrary state for each credential helper, and supports multi-round authentication for protocols like NTLM and Kerberos. Together, the new credential helper capabilities enable removing sensitive data from the http.extraHeader configuration, and pave the way for implementing protocols like NTLM and Digest.

    If you’re curious to learn more, you can check out some of the new protocol changes here.

    [source]

  • In our last post, we talked about Git’s preliminary support for a new kind of reference storage backend, reftable. Readers who are curious to learn about more of the details behind this new storage backend are encouraged to check out that post. But a high-level description is that reftable is a new, binary format for reference storage. It is designed to have near constant-time lookup for individual references, efficient lookup of the entire reference namespace through prefix compression, and atomic updates that scale with the size of the update, not the total number of pre-existing references.

    Reftable support is still evolving within Git. In our last update, we said that you can initialize a new repository with the reftable backend by running git init --ref-format=reftable /path/to/repo.git. In Git 2.46, you can now convert existing repositories to use the reftable backend by running the new git refs migrate --ref-format=reftable command.

    Note that reftable support is still considered experimental, and the git refs migrate command has some known limitations when converting repositories to use the new reftable backend. But if you like to live on the bleeding edge, or you have kept a copy of the repository in its original format, you can experiment with reftable today.

    [source]

  • If you’ve used Git long enough, you may have encountered some of its “advice” messages when performing subtle or unsafe operations, like checking out a detached HEAD state:

    $ git checkout HEAD^
    Note: switching to 'HEAD^'.
    
    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by switching back to a branch.
    
    [...] 
    

    These advice messages are meant to provide helpful tips when performing operations that may have unintended consequences (like committing in a detached HEAD state, which makes it easier to lose your commits if you aren’t careful). You can also disable different kinds of advice messages if you are comfortable with the potential consequences. In the above example, you can run git config set advice.detachedHead false to tell Git to suppress the aforementioned advice message.

    But each advice message must be enabled or disabled individually. So, it can be a hassle when scripting, for example, to constantly maintain and update the list of different kinds of advice messages.

    Git 2.46 adds a new top-level option, --no-advice, which disables all advice messages. You can use this new option when scripting to avoid cluttering your stderr output with advice messages like the above.

    [source]

  • Git has a comprehensive test suite written primarily in Shell scripts, containing tens of thousands of integration tests (see note 2). Though these Shell scripts have been an effective way to test many components of Git, they are not without drawbacks. They can be slower on platforms where the overhead to spawn new processes is higher, like Windows.

    They also can sometimes prove awkward for testing lower-level components of Git, like its implementation of the progress meter. For components like the progress meter, Git often implements a low-level test helper which can manipulate whatever component it is testing through a line-oriented protocol that can then be driven from shell scripts.

    Git has begun to convert some of its integration tests to unit tests, making it easier to directly test some of Git’s lower-level components. For some of the details on how these conversions have gone thus far, check out some of the source links below.

    [source, source, source, source, source, source, source, source, source]
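Returning to the credential helper item in the list above: the sketch below shows roughly what a helper that answers with the new authtype and credential fields could look like. It is a hedged illustration, not a reference implementation. The `TOKENS` store, the host name, and the function names are hypothetical, and the `capability[]=authtype` handshake reflects my reading of the git-credential(1) documentation, so check that page before relying on the exact field semantics.

```python
import sys

# Hypothetical token store; a real helper would consult a keychain or
# secret service instead of a hard-coded dict.
TOKENS = {"git.example.com": "s3cr3t-token"}

def parse_request(lines):
    """Parse `key=value` lines; keys ending in [] collect into lists."""
    fields = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            break  # a blank line ends the request
        key, _, value = line.partition("=")
        if key.endswith("[]"):
            fields.setdefault(key, []).append(value)
        else:
            fields[key] = value
    return fields

def handle_get(fields):
    """Build the response lines for a `get` request."""
    token = TOKENS.get(fields.get("host", ""))
    if token is None:
        return []  # nothing to offer for this host
    if "authtype" not in fields.get("capability[]", []):
        return []  # caller did not announce the new fields, so stay silent
    return [
        "capability[]=authtype",
        "authtype=Bearer",
        f"credential={token}",
    ]

# Git invokes a helper with the operation as its first argument and the
# request on standard input; only `get` is handled in this sketch.
if __name__ == "__main__" and sys.argv[1:2] == ["get"]:
    for response_line in handle_get(parse_request(sys.stdin)):
        print(response_line)
```

Because the helper hands back a ready-made credential together with its scheme, the token no longer has to live in http.extraHeader. Storing, erasing, per-helper state, and multi-round exchanges are left out to keep the example short.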

The rest of the iceberg

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.46, or any previous version in the Git repository.

Notes


  1. Both expensive to generate, and expensive to use. Expensive to generate because it necessarily means walking through more of history to generate *.bitmap files. But also expensive to use, since bitmaps are OR’d together, so having lots of bitmaps means we spend a lot of time decompressing those bitmaps and then OR’ing them together at query time. 
  2. The exact number of tests varies based on what machine you’re running on, since some tests only run when certain packages are installed, or certain file system properties are met. On my machine, running make test ran more than 31,000 tests. 
