I would like to put a Git project on GitHub but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for Capistrano).
I know I can add these filenames to .gitignore, but this would not remove their history within Git.
I also don't want to start over again by deleting the /.git directory.
Is there a way to remove all traces of a particular file in your Git history?
For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your Git repository is entirely local or whether you have a remote repository elsewhere yet; if it is remote and not secured from others, you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with the file gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.
With that out of the way, here's how to fix it. GitHub answered exactly this question as an FAQ:
Note for Windows users: use double quotes (") instead of single quotes in this command
git filter-branch --index-filter \
'git update-index --remove PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' <introduction-revision-sha1>..HEAD
git push --force --verbose --dry-run
git push --force
Update 2019:
This is the current code from the FAQ:
git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA" \
--prune-empty --tag-name-filter cat -- --all
git push --force --verbose --dry-run
git push --force
Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.
To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.
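If collaborators have no local work worth keeping, the simplest recovery is to re-point their branch at the rewritten history. A minimal sketch, assuming the rewritten branch is master and the remote is origin:
# WARNING: this discards any local commits on master
git fetch origin
git checkout master
git reset --hard origin/master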
Tip: Execute git rebase --interactive
In the future, if you accidentally commit some changes with sensitive information but you notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:
git commit -a --amend
That will amend the previous commit with any new changes you've made, including entire file removals done with a git rm. If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase:
git rebase -i origin/master
That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, and save and quit. Git will walk through the changes, and leave you at a spot where you can:
$EDITOR file-to-fix
git commit -a --amend
git rebase --continue
Repeat this for each change with sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
Changing your passwords is a good idea, but for the process of removing passwords from your repo's history, I recommend the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch explicitly designed for removing private data from Git repos.
Create a private.txt file listing the passwords, etc, that you want to remove (one entry per line) and then run this command:
$ java -jar bfg.jar --replace-text private.txt my-repo.git
All files under a threshold size (1MB by default) in your repo's history will be scanned, and any matching string (that isn't in your latest commit) will be replaced with the string "***REMOVED***". You can then use git gc to clean away the dead data:
$ git gc --prune=now --aggressive
The BFG is typically 10-50x faster than running git-filter-branch and the options are simplified and tailored around these two common use-cases:
Removing Crazy Big Files
Removing Passwords, Credentials & other Private data
Full disclosure: I'm the author of the BFG Repo-Cleaner.
git filter-repo is now officially recommended over git filter-branch
This is mentioned in the manpage of git filter-branch itself in recent versions of Git.
With git filter-repo, you can either remove certain files (see also: Remove folder and its contents from git/GitHub's history):
pip install git-filter-repo
git filter-repo --path path/to/remove1 --path path/to/remove2 --invert-paths
This automatically removes empty commits.
Or you can replace certain strings (see also: How to replace a string in whole Git history?):
git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx')
If you pushed to GitHub, force pushing is not enough: delete the repository or contact support.
Even if you force push one second afterwards, it is not enough, as explained below.
The only valid courses of action are:

Is what leaked a changeable credential, like a password?
  Yes: change your passwords immediately, and consider using more OAuth and API keys!
  No (e.g. naked pics): do you care if all issues in the repository get nuked?
    No: delete the repository.
    Yes: contact support. If the leak is very critical to you, to the point that you are willing to accept some repository downtime to make it less likely to leak, make the repository private while you wait for GitHub support to reply.
Force pushing a second later is not enough because:
GitHub keeps dangling commits for a long time.
GitHub staff does have the power to delete such dangling commits if you contact them however.
I experienced this first hand when I uploaded all GitHub commit emails to a repo: GitHub asked me to take it down, so I did, and they ran a gc. However, pull requests that contain the data have to be deleted as well: because of this, that repo's data remained accessible for up to one year after the initial takedown.
Dangling commits can be seen either through:
the commit web UI: https://github.com/cirosantilli/test-dangling/commit/53df36c09f092bbb59f2faa34eba15cd89ef8e83 (Wayback machine)
the API: https://api.github.com/repos/cirosantilli/test-dangling/commits/53df36c09f092bbb59f2faa34eba15cd89ef8e83 (Wayback machine)
One convenient way to get the source at that commit is then to use the zip download method, which can accept any reference, e.g.: https://github.com/cirosantilli/myrepo/archive/SHA.zip
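For example, a quick way to grab such a snapshot from the command line (SHA below is a placeholder for the dangling commit's hash):
# download the repository tree at a given commit as a zip archive
curl -L -o snapshot.zip https://github.com/cirosantilli/myrepo/archive/SHA.zip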
It is possible to fetch the missing SHAs either by:
listing API events with "type": "PushEvent" (a small sketch of this follows the list). E.g. mine: https://api.github.com/users/cirosantilli/events/public (Wayback machine)
more conveniently sometimes, by looking at the SHAs of pull requests that attempted to remove the content
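As a rough sketch of the first approach, assuming jq is installed and USERNAME is a placeholder, the pushed commit SHAs can be extracted from the public events feed like this:
# list commit SHAs from recent PushEvents of a user (the API only returns recent events)
curl -s https://api.github.com/users/USERNAME/events/public \
  | jq -r '.[] | select(.type == "PushEvent") | .payload.commits[].sha'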
There are scrapers like http://ghtorrent.org/ and https://www.githubarchive.org/ that regularly poll GitHub data and store it elsewhere.
I could not find out whether they scrape the actual commit diffs, and that is unlikely because there would be too much data, but it is technically possible, and the NSA and friends likely have filters to archive only content linked to people or commits of interest.
If you delete the repository instead of just force pushing, however, commits do disappear even from the API immediately and give 404, e.g. https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824. This works even if you recreate another repository with the same name.
To test this out, I have created a repo: https://github.com/cirosantilli/test-dangling and did:
git init
git remote add origin git@github.com:cirosantilli/test-dangling.git
touch a
git add .
git commit -m 0
git push
touch b
git add .
git commit -m 1
git push
touch c
git rm b
git add .
git commit --amend --no-edit
git push -f
See also: How to remove a dangling commit from GitHub?
I recommend this script by David Underhill; it worked like a charm for me.
It adds these commands on top of natacado's filter-branch to clean up the mess that filter-branch leaves behind:
rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune
Full script (all credit to David Underhill)
#!/bin/bash
set -o errexit
# Author: David Underhill
# Script to permanently delete files/folders from your git repository. To use
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2
if [ $# -eq 0 ]; then
    exit 0
fi
# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi
# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter \
    "git rm -rf --cached --ignore-unmatch $files" HEAD
# remove the temporary history git-filter-branch
# otherwise leaves behind for a long time
rm -rf .git/refs/original/ && \
git reflog expire --all && \
git gc --aggressive --prune
The last two commands may work better if changed to the following:
git reflog expire --expire=now --all && \
git gc --aggressive --prune=now
You can use git forget-blob.
The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:
https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/
The file will disappear from all the commits in your history, reflog, tags, and so on.
I run into the same problem every now and then, and every time I have to come back to this post and others; that's why I automated the process.
Credits to contributors from Stack Overflow that allowed me to put this together
Here is my solution on Windows:
git filter-branch --tree-filter "rm -f 'filedir/filename'" HEAD
git push --force
Make sure that the path is correct, otherwise it won't work.
I hope it helps
Use filter-branch:
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch *file_path_relative_to_git_repo*' --prune-empty --tag-name-filter cat -- --all
git push origin *branch_name* -f
To be clear: The accepted answer is correct. Try it first. However, it may be unnecessarily complex for some use cases, particularly if you encounter obnoxious errors such as 'fatal: bad revision --prune-empty', or really don't care about the history of your repo.
An alternative would be:
cd to project's base branch
Remove the sensitive code / file
rm -rf .git/ # Remove all Git info from your code
Go to GitHub and delete your repository
Follow this guide to push your code to a new repository as you normally would -
https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/
This will of course remove all commit history, branches, and issues from both your GitHub repo and your local Git repo. If this is unacceptable, you will have to use an alternate approach.
Call this the nuclear option.
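For reference, a minimal sketch of this nuclear option, assuming the sensitive files have already been removed from the working tree and a new, empty GitHub repository exists at the hypothetical URL below:
# run from the project root
rm -rf .git                         # wipe all local Git history
git init                            # start a brand-new repository (default branch may be 'main' on newer Git)
git add .
git commit -m "Initial commit"
git remote add origin git@github.com:USERNAME/NEW-REPO.git
git push -u origin master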
In my Android project I had admob_keys.xml as a separate XML file in the app/src/main/res/values/ folder. To remove this sensitive file I used the script below, and it worked perfectly.
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch app/src/main/res/values/admob_keys.xml' \
--prune-empty --tag-name-filter cat -- --all
I've had to do this a few times to date. Note that this only works on one file at a time.
Get a list of all commits that modified a file. The one at the bottom will be the first commit:
git log --pretty=oneline --branches -- pathToFile
To remove the file from history use the first commit sha1 and the path to file from the previous command, and fill them into this command:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path-to-file>' -- <sha1-where-the-file-was-first-added>..
Remove the cache for the tracked file from Git and add that file to the .gitignore list. So it looks something like this:
git rm --cached /config/deploy.rb
echo /config/deploy.rb >> .gitignore
Considering that the OP is using GitHub: if one commits sensitive data into a Git repo, one can remove it entirely from the history by using one of the following options (read more about them below):
The git filter-repo tool (view source on GitHub).
The BFG Repo-Cleaner tool (it is open source - view source on GitHub).
After using one of these options, there are additional steps to follow. Check the Additional section below.
If the goal is to remove a file that was added in the most recent unpushed commit, read the section Alternative below.
For future considerations, to prevent similar situations, check the For the Future section below.
Option 1
Using git filter-repo. Before moving forward, note that
If you run git filter-repo after stashing changes, you won't be able to retrieve your changes with other stash commands. Before running git filter-repo, we recommend unstashing any changes you've made. To unstash the last set of changes you've stashed, run git stash show -p | git apply -R. For more information, see Git Tools - Stashing and Cleaning.
Let us now remove one file from the history of one's repo and add it to .gitignore (to prevent re-committing it again).
Before moving forward, make sure that one has git filter-repo installed (read here how to install it), and that one has a local copy of one's repo (if that is not the case, see here how to clone a repository).
Open GitBash and access the repository.
cd YOUR-REPOSITORY
(Optional) Backup the .git/config file.
Run
git filter-repo --invert-paths --path PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA
replacing PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA with the path to the file you want to remove, not just its filename. This will:
Force Git to process, but not check out, the entire history of every branch and tag.
Remove the specified file (as well as any empty commits generated as a result).
Remove some configs (such as the remote URL) stored in the .git/config file.
Overwrite one's existing tags.
Add the file with sensitive data to .gitignore
echo "YOUR-FILE-WITH-SENSITIVE-DATA" >> .gitignore
git add .gitignore
git commit -m "Add YOUR-FILE-WITH-SENSITIVE-DATA to .gitignore"
Check that everything was removed from the repository history and that all branches are checked out. Only then move to the next step.
Force-push the local changes to overwrite your repository on GitHub.com, as well as all the branches you've pushed up. A force push is required to remove sensitive data from your commit history. Read the first note at the bottom of this answer for more details on this.
git push origin --force --all
Option 2
Using BFG Repo-Cleaner. This is faster and simpler than git filter-branch.
For example, to remove a file with sensitive data and leave your latest commit untouched, run
bfg --delete-files YOUR-FILE-WITH-SENSITIVE-DATA
To replace all text listed in passwords.txt wherever it can be found in your repository's history, run
bfg --replace-text passwords.txt
After the sensitive data is removed, one must force push one's changes to GitHub.
git push --force
Additional
After using one of the options above:
Contact GitHub Support.
(If working with a team) Tell them to rebase, not merge, any branches they created off of one's old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that one just went to the trouble of purging.
After some time has passed and you're confident the cleanup had no unintended side effects, you can force all objects in your local repository to be dereferenced and garbage collected with the following commands (using Git 1.8.5 or newer):
git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
Alternative
If the file was added with the most recent commit, and one has not pushed to GitHub.com, one can delete the file and amend the commit:
Open GitBash and access the repository.
cd YOUR-REPOSITORY
To remove the file, enter git rm --cached:
git rm --cached GIANT_FILE
# Stage our giant file for removal, but leave it on disk
Commit this change using --amend -CHEAD:
git commit --amend -CHEAD
# Amend the previous commit with your change
# Simply making a new commit won't work, as you need
# to remove the file from the unpushed history as well
Push one's commits to GitHub.com:
git push
# Push our rewritten, smaller commit
For the Future
In order to prevent sensitive data from being exposed, other good practices include:
Use a visual program to commit the changes. There are various alternatives (such as GitHub Desktop, GitKraken, gitk, ...), and it can be easier to track the changes.
Avoid the catch-all commands git add . and git commit -a. Instead, use git add filename and git rm filename to individually stage files.
Use git add --interactive to individually review and stage changes within each file.
Use git diff --cached to review the changes that one has staged for commit. This is the exact diff that git commit will produce as long as one doesn't use the -a flag (a short sketch of this workflow follows this list).
Generate secret keys in secure hardware (HSM boxes, hardware keys like YubiKey / SoloKey) that never leave it.
Train the team on X.509.
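A short sketch of the staging practices above (the file name in the commit message is hypothetical):
git add --interactive     # pick hunks to stage, file by file, instead of the catch-all 'git add .'
git diff --cached         # review exactly what will be committed
git commit -m "Update deploy config without credentials"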
Notes:
When one force pushes, it rewrites the repository history, which removes sensitive data from the commit history. That may overwrite commits that other people have based their work on.
This answer uses content from these GitHub posts:
Removing sensitive data from a repository
About large files on GitHub
I have a Mercurial repository which contains a monolithic project that I am trying to gradually split up. While doing this, I figured I would convert the new sub-projects to Git, hence the one-way sync.
A few more details about what I am doing:
The hg repo and the new Git repos are located in a private Bitbucket Cloud account.
I want to keep the commit history while doing the split.
All our development is Windows based (although I'm open to doing the migration on a Unix-based system).
The initial repo is 7 years old; it has all sorts of tags, closed branches, and some branches with characters unsupported by Git. More importantly, I am happy to migrate only default/master if it helps me get the job done and doesn't imply losing history.
As we are gradually converting some projects inside the repo (let's say I have 30 projects and I want to move them incrementally), I need to do one-way syncs from hg to Git. I am not afraid of the merges, and I am happy to keep my new repo work outside of master and then just rebase with the hg changes as we go.
I get the idea our Mercurial repo is not properly configured (I saw multiple heads, etc.), but I am outside my comfort zone when I dig deep into Mercurial's internals.
So far I have tried several tools, such as fast-export and the Mercurial hg-git plugin (and almost all of the approaches in this thread: Convert Mercurial project to Git). However, I am struggling to find a good step-by-step guide.
fast-export was the tool that gave me the best results; I was able to migrate the project once, but when I tried to resync I started to get errors, like "branch modified outside" and multiple heads.
Now that I explained my problem in more detail I can ask the question.
What would be the best approach and tools to use for me to be able to do a one way hg to git migration?
Also, how can I make sure my mercurial repository is correctly configured to avoid any potential issues when migrating to git?
After countless tries, I think I found a way to do what I wanted in a consistent way. For future reference, these were the steps:
Installing Necessary tools
Install Git for Windows
Install Tortoise HG or Mercurial standalone
Install Python 2.7 (fast-export does not support Python 3.X at the moment)
Open a Command Line prompt (Run as Admin).
Check that you can run Git, Mercurial, and Python as follows:
$ git
$ hg
$ python
If you have installed the tools above and you are getting errors, you need to set the path; in my case I only had to do it for Python. So I did:
$ setx path "%path%;C:\Python27"
Restart the command prompt and everything should be ready to go.
Install fast-export and clone the mercurial and git repos
Create a clean directory so the work will be contained in there (in my case I won't use the repos inside this directory for anything other than syncing the projects), e.g.:
c:\syncprojects
From inside c:\syncprojects start by cloning fast-export
$ git clone https://github.com/frej/fast-export.git fast-export
Then clone the mercurial project
$ hg clone https://bitbucket.org/user/mercurialrepo
Then clone the git project you want to sync into
$ git clone https://bitbucket.org/user/gitrepo gitrepo
It helped me to have an authors file configured correctly, so I did:
$ cd mercurialrepo
$ hg log | grep user: | sort | uniq | sed 's/user: *//' > ../authors
Then open the authors file which was created in c:\syncprojects and make sure it matches something similar to this:
bob=Bob Jones <bob@company.com>
bob@localhost=Bob Jones <bob@company.com>
bob <bob@company.com>=Bob Jones <bob@company.com>
The next step is to start the actual migration. For this step I felt more comfortable using Git Bash, so in Windows Explorer I right-clicked on the gitrepo folder and selected "Git Bash Here".
Then I made my local Git repo case sensitive. This helps with the process, and it's a good thing to have anyway, as I have run into problems with case-sensitive folders in the past. Just do:
git config core.ignoreCase false
Trigger the sync
Finally I did:
$ c:\syncprojects\fast-export\hg-fast-export.sh -r c:\syncprojects\mercurialrepo -A c:\syncprojects\authors --force
If all goes well (and this does not necessarily happen every time; for multiple reasons I had issues with the heads in Mercurial and issues with local changes in the Git repo I am trying to sync into), all we need to do is check out the head and push the changes to the remote, as such:
$ git checkout HEAD
$ git push -u origin master
The next time you want to do a sync just repeat the final part:
$ c:\syncprojects\fast-export\hg-fast-export.sh -r c:\syncprojects\mercurialrepo -A c:\syncprojects\authors --force
$ git checkout HEAD
$ git push -u origin master
Unfortunately the steps are not as straightforward as they look, but this was the most concise and consistent way I found to tackle my problem.
A few more tips:
I did not merge any new code to master in the newly created Git repo. Until I am totally satisfied and able to stop the sync, I will have a branch that contains the changes I make day to day, and I periodically merge master back into that branch.
Do not use these repos for development; fast-export stores data inside the repos that might get lost and make re-syncing very hard to achieve. Clone the repos in a separate location (and please be careful with checking other branches out in these repos, for the same reason).
I was switching branches in JBoss Developer Studio (which is just like Eclipse). It asked me to commit my changes before switching. I entered "temporary commit" as the commit message. Then the IDE went about its work.
But while the IDE was doing all this, the system crashed due to a power outage.
When the power was back, I saw that many of my files were blank, the IDE no longer recognizes the project as a Git project, and git status says it is not a git repository.
How can I recover the Git repository data if the system crashed while switching branches?
Try to check what's inside the reflog with:
git reflog
If you are lucky, you should find the temporary commit in the reflog list, and then you can restore it from there.
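If the commit does show up, a minimal sketch of restoring it (SHA is a placeholder for whatever the reflog shows):
git reflog                      # note the SHA next to the "temporary commit" entry
git checkout -b recovered SHA   # put that commit back on a branch named "recovered"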
In case you don't find anything inside the reflog, try this command:
git fsck --full --no-reflogs --unreachable --lost-found
The listed commits are copied into .git/lost-found/commit/, and non-commit objects are copied into .git/lost-found/other/ .
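From there, something like the following can help inspect and keep what was found (SHA is again a placeholder):
ls .git/lost-found/commit/       # each file name here is a commit SHA
git show SHA                     # inspect a recovered commit and its diff
git branch recovered-2 SHA       # attach it to a branch so garbage collection won't delete it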
We had a number of developers working on a large website project using Git. We have a GitHub repository and then we have the website on the server, plus all the developers have their local versions.
When we finally launched the project, I got lazy (hangs head in shame) and started making changes directly on the server, without pushing them back to the GitHub repo. However, other people made changes to the repo, for reasons I don't quite understand, that were never pushed down to the server and are now either outdated or wrong. We have been doing this for almost seven months.
Now the server and repo are hopelessly out of sync. I would now like to get the most updated version of the site (which is the server) back up to the Git repository so we can begin another round of development. I basically want to start with a fresh copy of what is on the server.
How would you recommend I proceed? That was the first time I had used Git. It didn't seem like such a big deal at the time but now seems like it is harder to start up again than I thought.
I have looked for instructions and don't really see anything that fits. Because I am not super confident in my Git skills, I am afraid to just start trying the few ideas I did find and losing what I have on the server.
(I know I could restore from a backup if I really messed it up but would prefer not to do that as it would take the site down.)
Can I uninstall git and start again with a fresh repo? Or is there a safe way to push the current version up to the repo?
Thanks for your help.
UPDATE: I found this answer elsewhere (Replace GitHub repo while preserving issues, wiki, etc) but I am not sure how to do this:
cd into "new" repository
git remote add origin git@github.com:myusername/myrepository (replacing myusername & myrepository accordingly)
git push --force origin master
Possibly delete other remote branches and push new ones.
Not sure what they mean by "new repository"
Make a new branch and push it to GitHub (a sketch of all these steps follows the list).
Make a new branch based on the previous branch.
Switch to the new branch (created in step 2).
Delete all the files and folders in this branch's working copy except the .git folder and its contents (keep README.md, .gitignore, and other files if you want them).
Copy all the files from the server, except the .git folder.
Commit.
Switch to local master (created in step 1).
Merge this new branch with the previous one.
Solve conflicts (I use SmartGit, which has a visual conflict solver and helps me a lot, but you can use git diff if you don't want a visual interface).
Commit.
Push it to GitHub.
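A rough sketch of these steps, with hypothetical branch names and assuming the GitHub remote is origin:
git checkout -b backup                # step 1: safety branch of the current repo state
git push origin backup
git checkout -b server-state          # steps 2-3: branch that will receive the server snapshot
# steps 4-5: delete everything in the working tree except .git, then copy the server files in
git add -A
git commit -m "Snapshot of the live server"
git checkout master                   # steps 7-10: merge the snapshot, resolving any conflicts
git merge server-state
git push origin master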
I hope this helps
I figured this out. What I did was:
Make a new branch on GitHub to effectively store a backup.
$ git add . to stage all changes
$ git commit -m "Commit message" to commit changes
$ git push --force origin master to force changes from server to remote branch master
Once I did this, there were still hundreds of files I had deleted on the server that were not reflected on the remote github.com repository. I used the following:
$ git rm $(git ls-files --deleted)
See Removing multiple files from a Git repo that have already been deleted from disk
Then repeated git commit and git push. Now my github repo matches my server exactly.
I have not yet deleted the "backup" branch I created on GitHub, but I will.
Hope that helps someone.
I'm thinking about migrating a project from SourceForge to GitHub. Besides the SVN-to-Git conversion, what about migrating things like the issue tracker? Is there an easy way to do that?
For the SVN-to-GitHub part, this is now the easiest way: https://help.github.com/en/github/importing-your-projects-to-github/importing-a-repository-with-github-importer
But it doesn't import issues.
I've written a Python script to migrate issues. It's at https://github.com/ttencate/sf2github.
Beware: Sunday afternoon software. Use at your own risk, etc. etc. Pull requests welcome!
Since I have just done this, here is my approach.
Create a local Git repository from the remote SVN repository:
git svn clone http://svn/repo/here/trunk
Now push the repository to GitHub:
git remote rename origin upstream
git remote add origin git@github.com:myname/myproject.git
git push origin master
This script uses rsync to sync the raw SVN repo onto your /tmp directory and requires the svn2git Ruby gem for importing the SVN commit info into Git.
If you happen to use a newer version of the SVN infrastructure provided by SourceForge (aka SVN 2.0 dev), you can use this script instead - I forked the original just to make changes to the rsync command. :)