52+ years of progress... :-)
July, 2026: Google Copybara allows one to move code between two prod repositories
March, 1974: IBM COPY allows one to move code between two prod partitioned data sets: OS/MVT and OS/VS2 TSO Data Utilities COPY, FORMAT, LIST, MERGE User's Guide and Reference https://www.computinghistory.org.uk/downloads/8987
Does anyone know if it is useful for bidirectional sync between codeberg and GitHub?
At my previous company we tried to use this tool to sync parts of the code between two different git repos. The tool turned out to be unacceptably slow.
Handwritten bash scripts using git-replace and git-filter-repo [1] did a much better job.
[1]: https://github.com/newren/git-filter-repo
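For anyone wondering what the filter-repo half of such a script might look like, here is a minimal sketch. It is not the script described above: the directory names are hypothetical, it only covers extracting one folder with its history, and it says nothing about the git-replace part. `--path` and `--path-rename` are git-filter-repo options; the tool expects to be run on a fresh clone.

```python
import subprocess
from typing import List


def filter_repo_args(src_dir: str, dest_dir: str) -> List[str]:
    """Build a git-filter-repo command that keeps only src_dir and moves it to dest_dir.

    Both arguments are assumed to be non-empty relative directory paths.
    """
    src = src_dir.strip("/") + "/"
    dest = dest_dir.strip("/") + "/"
    return [
        "git", "filter-repo",
        "--path", src,                     # keep only history touching this folder
        "--path-rename", f"{src}:{dest}",  # relocate it in the rewritten history
    ]


def extract_folder(clone_path: str, src_dir: str, dest_dir: str) -> None:
    """Rewrite history in place; clone_path should be a fresh, throwaway clone."""
    subprocess.run(filter_repo_args(src_dir, dest_dir), cwd=clone_path, check=True)
```

Something like `extract_folder("/tmp/monorepo-clone", "libs/common", "common")` (paths made up) would then leave a repo containing only that folder's history, ready to push to the other remote.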
To those who have used it: is it handy for situations where you have multiple repos that want to share a little code, but it's not worth the trouble of extracting a library, referencing it, publishing versioned releases, updating dependent repos, etc?
And instead just "sync" a code folder from one main repo (perhaps containing common domain models) to other repos?
Basically the Go philosophy that a little bit of copying is better than a lot of dependency?
It’s largely used for syncing external open source projects with the monorepo. Policy is to require source code imports over built artifacts. Though you can get exceptions.
Some projects are also developed in the monorepo and exported via Copybara.
My team also uses it to version Starlark rule sets internally.
I suppose it mitigates the potential risk of libraries being poisoned?
Source code imports versus artifacts is really neither here nor there. Go is source code imports too.
The key part for Copybara is that Google will make changes to the OSS projects from within the internal repo while everyone else makes changes to the same OSS projects from outside.
It's for when you have a monorepo internally, and want to publish parts of it as open source to the world. They still need to live in the monorepo, so this is the solution.
Having a public repo as a dependency for your private corporate repo is a pain in the ass development-wise. Having a tree of such dependencies is a migraine.
It can also be used if you want part of your monorepo to track something open source from the world.
Say, to rebase upstream MySQL changes onto a fork in the monorepo (in a random, non-specific example)
Yeah, that's the fun part. Probably built first for exporting monolith slices to OSS, but the reverse direction is more interesting to me. Tracking an upstream or keeping a private fork in sync. That's what makes Copybara useful well beyond the monorepo use case.
Copybara can do that but I think it will be annoying and tedious to use it that way. More annoying than the problem of extracting a library or shoving some files in a separate repo.
Been using this for a while, mostly when I make a tool as part of a larger project and the tool is big enough to deserve its own release.
It’s powerful enough to do a whole bidirectional shipping operation where you export and import code, but no thanks, that’s a hassle. I use it mostly for a simple fire-and-forget export, where I take a folder out of its original repo and preserve the history. Then I just move development to the new repo. The new project layout can be completely different, but Git blame works and I’m happy with that.
The one-way pattern is actually how Google uses it internally too, syncing outward from their monorepo to GitHub. Bidirectional gets messy because transforms (path remapping, file exclusions, header stripping) are easy to apply in one direction but can't always be cleanly inverted. When both sides have diverged, Copybara's baseline tracking starts producing confusing results because semantically equivalent commits generate different SHAs after transform.
One thing worth knowing: history "preservation" is actually cherry-picks with rewritten commits, not a true transplant. Git blame works because the file content and authorship carry over, but the SHAs are new. Copybara embeds the original SHA in a commit message trailer (GitOrigin-RevId), which is useful to know if you ever need to correlate commits across repos after the fact.
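To make the correlation step concrete, here is a small sketch that pulls the origin SHA out of commit messages and builds a lookup table. The trailer label comes from the comment above; the exact line format (label, colon, hex SHA) and the function names are assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Assumed trailer shape: "GitOrigin-RevId: <hex sha>" on a line of its own.
TRAILER = re.compile(r"^GitOrigin-RevId:[ \t]*([0-9a-fA-F]{7,64})[ \t]*$", re.MULTILINE)


def origin_rev(commit_message: str) -> Optional[str]:
    """Return the origin SHA recorded in a commit message, or None if absent."""
    matches = TRAILER.findall(commit_message)
    # If the trailer appears more than once, use the last occurrence.
    return matches[-1] if matches else None


def build_index(commits: Dict[str, str]) -> Dict[str, str]:
    """Map origin SHA -> destination SHA, given {destination_sha: commit_message}."""
    index = {}
    for dest_sha, message in commits.items():
        rev = origin_rev(message)
        if rev is not None:
            index[rev] = dest_sha
    return index
```

Feed it the output of whatever you use to list commits in the destination repo, and you can answer "which public commit is internal commit X?" with a dictionary lookup.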
> The one-way pattern is actually how Google uses it internally too, syncing outward from their monorepo to GitHub
Do they not support contributions on the public repos back into the internal monorepo?
There are three ways I've seen it done, though it being Google I assume there are more.
One is to try the bidirectional support with copybara itself, though that usually requires more effort than it's worth.
Another is to have the external repo be the source of truth and then always import into google3. Kythe used to do this at least, though I gather it's not done that way anymore.
The third is to just replicate the patches externally (which is pretty easy to automate or semi-automate on a case by case basis), and verify that a re-copybara-export keeps zero diff.
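The "zero diff" check in that third approach can be as simple as comparing two checked-out trees. A rough sketch, assuming you have a fresh export in one directory and the public repo checked out in another (function names are made up, and this is not how Copybara itself verifies anything):

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def tree_digest(root: str, ignore: Tuple[str, ...] = (".git",)) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base)
        # Skip anything under an ignored directory such as .git.
        if any(part in ignore for part in rel.parts):
            continue
        if path.is_file():
            digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def tree_diff(exported: str, public: str) -> List[str]:
    """Return paths that are missing on one side or differ in content."""
    a, b = tree_digest(exported), tree_digest(public)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from `tree_diff` means the replicated patch and the re-export agree; anything else is the list of files to look at.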
Some other interesting tools in the space. Rust is using a tool called Josh to sync commits:
https://josh-project.dev
The blog post from the Rust people:
https://blog.rust-lang.org/inside-rust/2026/06/04/how-josh-h...
Meta used to have an open source tool called fbshipit. But according to its open source repo they no longer use it:
https://github.com/facebookarchive/fbshipit
Any others in this space?
git subtree was the OG tool: https://apenwarr.ca/log/20090430
It has since been merged into git proper:
https://manpages.debian.org/testing/git-man/git-subtree.1.en...
https://docs.github.com/en/get-started/using-git/about-git-s...
The Rust blog post covers this. Subtree performance is terrible on larger repos and they didn’t land something that would fix it on medium-sized ones. That’s why they went with a better maintained solution that scales.
Plus Josh seems to do waaay more - dynamically exposing monorepo directories as a separate repo.
I wish all the effort into things like JJ and Pijul was going into solving those sorts of things instead!
If all you need is to sync repos without exclusions or transformations, I wouldn't bother. It could work for you until it doesn't, when they archive it or kill it like kaniko or so many other Google products/tools.
GitLab has a really simple way to mirror from GitLab to GitHub or other Git vendors/servers.
I really doubt copybara ever gets killed. AFAIK it’s a pretty central tool to google3 and how they maintain and vendor OSS projects at scale
yea copybara is really for doing transforms in one or both directions.
E.g. translating from external bzl to internal blaze BUILD compatibility, changing between external imports and internal third_party style imports, etc etc.
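A toy version of that kind of import rewrite, in plain Python rather than Copybara config, to show what "in both directions" asks of a transform. The two import prefixes are hypothetical; the point is only that the reverse transform restores the original in the common case and silently fails to in others.

```python
# Hypothetical import styles, chosen only for illustration.
EXTERNAL = "from examplelib import"
INTERNAL = "from third_party.examplelib import"


def to_internal(source: str) -> str:
    """Rewrite external-style imports to the internal third_party style."""
    return source.replace(EXTERNAL, INTERNAL)


def to_external(source: str) -> str:
    """Reverse transform: rewrite internal-style imports back to external style."""
    return source.replace(INTERNAL, EXTERNAL)


def round_trips(source: str) -> bool:
    """True if importing and then re-exporting reproduces the input exactly."""
    return to_external(to_internal(source)) == source
```

`round_trips("from examplelib import foo\n")` holds, but a file that already contains the internal-style line on the external side does not survive the round trip, which is the small-scale version of why bidirectional setups take more care than one-way exports.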
If it's a pure mirror, copybara is super overkill
Does this tool allow changes in both repositories? (with a 3 way merge strategy) git subtrees come close, but I have a use case where I need transformations/file filters on top.
Nice, I built something similar ~5 years ago using nested git repos and scripts to accomplish a similar purpose of combined private and public repos.
My shell script definitely wasn't google scale tho!
Yep, same. I thought it might be a wrapper around git subtree but looks like it’s doing quite a lot more!
For example, altering commit author emails during sync.
Interesting. Does anyone know how this compares to using git submodules and subtrees?
I had used those to create a separate repo for website artifacts while the same files also remain plugged into the webapp dev repo. (Both sides remain modifiable, and changes can be merged to the other side.)
Thx.
Copybara is one of those things that you should have set up yesterday.
It works great and I've seen many teams gain significant productivity when collaborating in a monorepo with public bits.
If you're even toying with an internal monorepo you owe it to yourself to give it a try.
We’re in the process of open-sourcing a few sub-projects within a monorepo, and didn’t know this existed!
I’m curious what downsides folks have experienced with this tool?
Any tips?
If you're exporting more than a few commits, I suggest using local repos (/path/to/.git) for both the source and destination. Otherwise it'll be quite slow.
If you are using Jujutsu you can achieve a basic way to maintain a public repo from a private monorepo with very little code and without Copybara. I wrote up how to do it here: https://vihren.dev/blog/20260625-jj-public-private-workflow/
The main function of copybara is not moving code but modifying code to make it suitable for a different repository structure, build system, etc.
i used this tool when i was at google, extremely helpful in open-sourcing things from google3 to github. still, i'm glad to just directly develop on github now :)
That seems like a tool easily adoptable by folks engaging in dark patterns on GitHub, particularly the malware bait repos.
Cute name. (Naming is hard and important.)
That tune is in my head again... again...
Wild times when one can go from an HN post about an interesting open source project to a port to a new language in a matter of hours (WIP, but almost complete: https://github.com/theolivenbaum/copybara)