
Stop Letting Large LFS Files Break Your CI/CD Pipelines
You've just pushed a feature branch. The build starts, the tests run, and then—silence. Ten minutes later, the CI runner crashes with a generic "out of memory" or "network timeout" error. You look at the logs and realize your team accidentally committed a 500MB binary file or a massive uncompressed dataset directly to the Git history. This isn't just a nuisance; it's a bottleneck that kills deployment velocity and drives up your cloud provider's storage bills. This post covers how to identify these culprits and move them into Git Large File Storage (LFS) to keep your repository lightweight.
Standard Git isn't designed to handle large binary blobs. Every time someone pulls the repo, they end up downloading the entire history of those heavy files. If you've accidentally committed a large video file or a heavy machine learning model, simply deleting it in a later commit won't fix the problem—the file still lives in your .git folder. You have to actually strip it from the history.
How do I identify large files in my Git history?
Before you can fix a bloated repository, you need to know exactly what is taking up space. You can't just guess. One of the most effective ways to find the offenders is a dedicated history-analysis tool like BFG Repo-Cleaner or git filter-repo. If you want a quick look without specialized tools, you can use a standard Git command to list the largest objects in your current tree:
git rev-list --objects --all | grep "$(git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print $1}')"

This command is a bit of a mouthful, but it shows you the ten largest objects currently bloating your packfiles. Once you identify a file, say a 200MB data_dump.sql, you know exactly what needs to be migrated to LFS. If you're looking for a more modern approach to repository maintenance, the git-filter-repo documentation provides extensive guidance on stripping heavy artifacts from your history effectively.
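If you have git filter-repo installed, its analysis mode produces far more readable reports than the pipeline above. A minimal sketch (run it inside a fresh clone, since filter-repo refuses to operate on a repo with uncommitted changes):

```shell
# Generates human-readable reports under .git/filter-repo/analysis/
# without rewriting anything.
git filter-repo --analyze

# path-all-sizes.txt lists every path with its total packed size,
# which makes a 200MB data_dump.sql easy to spot.
less .git/filter-repo/analysis/path-all-sizes.txt
```

Because --analyze is read-only, it's a safe first step before committing to any history rewrite.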
How do I set up Git LFS for my project?
Setting up LFS is a two-step process: installation and tracking. First, make sure you have the LFS extension installed on your local machine. If you use macOS, brew install git-lfs is your best friend. Once the binary is on your system, you need to tell Git which file types should be handled by the LFS extension rather than the standard Git object store. You do this by creating or updating a .gitattributes file in your root directory.
Here is the typical workflow for adding a new file type to LFS:
- Step 1: Run git lfs track "*.psd". This adds the pattern to your .gitattributes file.
- Step 2: Add the .gitattributes file to your staging area with git add .gitattributes.
- Step 3: Add your large files as usual. Git will now intercept these files and replace them with tiny text pointers.
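The steps above can be sketched as a command sequence (the .psd pattern and file paths are illustrative):

```shell
# One-time setup per machine: registers the LFS filters in your Git config.
git lfs install

# Step 1: track a pattern; this appends a rule to .gitattributes.
git lfs track "*.psd"

# Step 2: stage and commit the attributes file BEFORE adding large files.
git add .gitattributes
git commit -m "Track Photoshop files with LFS"

# Step 3: add large files as usual; Git now commits small text pointers.
git add assets/hero.psd
git commit -m "Add hero artwork"
```

The order matters: the tracking rule must be in place before the large file is staged, or Git will store the file as a regular blob.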
The beauty of this approach is that your Git history only tracks the small text pointer, while the actual heavy lifting happens on a dedicated LFS server. This keeps your git clone times fast and prevents your CI runners from choking on massive payloads. For a deeper look at how these pointers work under the hood, the official Git LFS documentation is the gold standard for understanding the mechanics of pointer-based storage.
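You can see the pointer mechanism for yourself by asking Git what it actually stored for a tracked file. A sketch, assuming the illustrative path from the workflow above:

```shell
# Show the committed content of a tracked file; for an LFS-tracked path
# this prints the pointer, not the binary. Per the LFS pointer spec it
# looks roughly like:
#
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:<64-hex digest of the real content>
#   size 209715200
git show HEAD:assets/hero.psd
```

The oid identifies the real object on the LFS server, which is fetched lazily on checkout.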
Can I fix a repository that is already bloated?
Yes, but it's a surgical procedure. If you've already committed a massive file and it's stuck in your history, simply adding it to .gitattributes now won't help. The damage is already done; the file is baked into your past commits. To fix this, you need to rewrite the history of the repository.
Using BFG Repo-Cleaner is often much faster and simpler than using git filter-branch. If you have a file named huge_video.mp4 that is killing your CI, you can run a command to strip all instances of that file from every commit in your history. After the rewrite, you'll need to perform a git push --force to update the remote. Warning: this is a destructive action. Coordinate with your team before doing this, as anyone else working on the repo will have a broken local state once you've rewritten the history.
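A typical BFG session looks something like the following sketch (the remote URL is illustrative; BFG is distributed as a runnable jar):

```shell
# Work on a fresh --mirror clone so a mistake never touches your working copy.
git clone --mirror git@example.com:team/project.git

# Strip every historical copy of the file. BFG protects the current
# HEAD commit by default, so only past commits are rewritten.
java -jar bfg.jar --delete-files huge_video.mp4 project.git

cd project.git
# Expire old refs and repack so the freed space is actually reclaimed.
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Destructive: rewrites the remote history. Coordinate with your team first.
git push --force
```

After the force push, every collaborator must re-clone (or hard-reset onto the rewritten branches); their old local history no longer matches the remote.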
Common Pitfalls to Avoid
A common mistake is forgetting to commit the .gitattributes file before adding the large files. If you add the file first and then run the git lfs track command, the file will be tracked as a standard Git object, and you'll be right back where you started. Always verify your .gitattributes is present and correctly formatted before you start your work.
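You can verify the tracking rule is in effect before committing anything. This self-contained sketch simulates the rule that git lfs track writes and asks Git which filter applies to a hypothetical path:

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# This is the line `git lfs track "*.psd"` writes into .gitattributes.
printf '*.psd filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes

# check-attr reports which filter will handle a given path.
git check-attr filter design/mockup.psd
# -> design/mockup.psd: filter: lfs
```

If check-attr reports "unspecified" instead of "lfs", the pattern is wrong or the .gitattributes file isn't where Git expects it. In an LFS-enabled repo, git lfs ls-files gives the same sanity check for files already committed.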
Another issue involves the CI environment. Many developers set up LFS locally but forget to configure their CI runners (like GitHub Actions or GitLab CI) to fetch the LFS objects. Without the proper setup, your build might succeed, but your tests might fail because they can't find the actual data files—they only see the tiny text pointers. Ensure your CI configuration includes a step to fetch LFS assets, typically through a git lfs pull command or a specific action plugin.
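On a runner whose checkout step skipped LFS objects, the fix is usually a two-line step early in the pipeline:

```shell
# Register the LFS filters for this repository only, then download the
# real file contents behind the committed pointers.
git lfs install --local
git lfs pull
```

Some CI systems also offer a built-in toggle; on GitHub Actions, for example, the checkout action accepts an option to fetch LFS objects during checkout, which removes the need for an explicit pull step.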
If you find your repository is still slow, check your .gitconfig. Stale local settings, such as a custom lfs.url override or an unusually low transfer concurrency, can conflict with how LFS handles remote transfers. Keeping your tools updated is a constant battle, but it pays off in the long run when your deployment pipelines actually complete without manual intervention.
