Clean a repo by rewriting Git history

0. Motivation

Maintaining a clean and organized Git history is crucial for project management and collaboration. Over time, repositories can become cluttered with unnecessary commits, making it challenging to track changes effectively. In this guide, we’ll walk through the process of cleaning up your Git history by rewriting commits, ensuring a streamlined and efficient version control system.

1. Why Clean Git History Matters

A cluttered Git history can obscure meaningful changes and make it difficult for team members to understand the evolution of a project. By rewriting Git history, you can eliminate unnecessary commits, squash related changes into logical units, and improve the overall clarity and readability of your repository.

2. Step-by-Step Guide to Rewriting Git History

Read the whole post before rewriting any history. The script at the end may let you automate most of the process.

You might also want to read Git Rewriting History.

2.1. Create a New main Branch

To begin, create a new branch named main (or any preferred branch name) to perform the cleanup process without affecting the existing master branch.

If you already have a main branch, rename it to old_main or something similar first. If that branch holds the history you want to clean, adapt the code in this post so that every reference to master is replaced by old_main.
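The branch setup could be sketched roughly as below. The helper name `branch_setup_commands` is made up for illustration, and the sketch assumes the new branch should start with no history (an orphan branch), which this post does not state explicitly. It only builds the commands, so they can be reviewed before anything is run.

```python
def branch_setup_commands(new_branch="main", backup_name="old_main", main_exists=False):
    """Build (but do not run) the git commands that prepare the new branch."""
    commands = []
    if main_exists:
        # Keep the existing branch around under another name.
        commands.append(["git", "branch", "-m", new_branch, backup_name])
    # Assumption: start from an orphan branch so no old commits come along.
    # `git switch --orphan` needs Git 2.23 or newer.
    commands.append(["git", "switch", "--orphan", new_branch])
    return commands


for command in branch_setup_commands(main_exists=True):
    print(" ".join(command))
```

Once the printed commands look right, each list can be passed to `subprocess.run(command, check=True)` from inside the repository.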

2.2. Cherry Pick the Oldest Commit on master

Cherry pick the oldest commit from the master branch onto the new main branch using the following command:

git cherry-pick <commit_hash>
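If you don't know the hash of the oldest commit, one possible way to look it up is sketched here. The function names are hypothetical. `git rev-list <branch>` prints the newest commit first, so the sketch takes the last line of its output.

```python
import subprocess


def oldest_commit(rev_list_output):
    """Return the last hash in `git rev-list` output (newest is printed first)."""
    hashes = [line.strip() for line in rev_list_output.splitlines() if line.strip()]
    if not hashes:
        raise ValueError("no commits found")
    return hashes[-1]


def read_rev_list(branch="master", repo_path="."):
    """Run `git rev-list <branch>` inside repo_path and return its raw output."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", branch],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Inside a repository, `oldest_commit(read_rev_list("master"))` gives the hash to pass to `git cherry-pick`.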

You might prefer using a Git client such as Sublime Merge instead.

2.3. Edit the New Commit Date

Modify the date of the newly cherry-picked commit to match the original commit date using the following commands:

export GIT_COMMITTER_DATE="YYYY-MM-DDTHH:MM:SS"
git commit --amend --no-edit --date="YYYY-MM-DD HH:MM:SS"
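Two dates are involved here: `--date` sets the author date, while the `GIT_COMMITTER_DATE` environment variable sets the committer date. A minimal sketch of the same step driven from Python follows. The helper name `amend_date` and its `dry_run` flag are my own choices for illustration, not part of any library.

```python
import os
import subprocess
from datetime import datetime


def amend_date(original, dry_run=True):
    """Build, and optionally run, the amend that resets both commit dates."""
    stamp = original.strftime("%Y-%m-%dT%H:%M:%S")
    command = ["git", "commit", "--amend", "--no-edit", f"--date={stamp}"]
    env = {"GIT_COMMITTER_DATE": stamp}
    if not dry_run:
        # Pass the committer date through the environment of this one call.
        subprocess.run(command, env={**os.environ, **env}, check=True)
    return env, command


env, command = amend_date(datetime(2020, 1, 31, 9, 30, 0))
print(env["GIT_COMMITTER_DATE"], " ".join(command))
```

With the default `dry_run=True` nothing is changed, so you can check the date first. The original date of a commit can be read with `git show -s --format=%aI <commit_hash>`.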

2.4. Remove the Tag from the Cherry-Picked Commit

Remove the tag associated with the cherry-picked commit both locally and on GitHub to avoid conflicts with the newly created commit.
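As a rough sketch, removing a tag locally and on the remote takes two commands. The helper name `delete_tag_commands` is hypothetical, and the remote is assumed to be called origin.

```python
def delete_tag_commands(tag, remote="origin"):
    """Build the commands that delete a tag locally and on the remote."""
    return [
        ["git", "tag", "-d", tag],
        # Pushing an empty source to refs/tags/<tag> deletes the remote tag.
        ["git", "push", remote, f":refs/tags/{tag}"],
    ]


for command in delete_tag_commands("v1.0"):
    print(" ".join(command))
```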

2.5. Apply the Tag to the Newly Created Commit

Apply the tag to the newly created commit on the main branch to maintain versioning consistency.
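A possible way to build the tagging command is sketched below. The helper name `create_tag_command` is made up, and using the tag name as the default message is my assumption; any message works.

```python
def create_tag_command(tag, message=None):
    """Build the command that creates an annotated tag on the current HEAD."""
    # Assumption: fall back to the tag name when no message is given.
    text = message if message is not None else tag
    return ["git", "tag", "-a", tag, "-m", text]


print(" ".join(create_tag_command("v1.0")))
```

Run it while the new commit is checked out, since `git tag` without an explicit commit tags HEAD.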

2.6. Push the New Tag to GitHub

Ensure the updated tag is pushed to the remote repository on GitHub using the following command:

git push origin <tag_name> -f

3. Python Script for Automating the Process

Before running the automated process, I strongly suggest trying it in a demo repository first. Rewriting history and force-pushing tags is destructive, so a problem with the code can mean losing important data that can’t be recovered.

You can use GitPython and some Python code to automate the process. The script below also uses pandas, so install both with:

pip install gitpython pandas

Then you can use this script, which:

  1. Exports the tagged commits of the current Git history
  2. Creates a script (commands.sh) with the git commands needed to rewrite the history

# Python script for automating Git history cleanup
import git
import pandas as pd

# Initialize Git repository
repo = git.Repo('/repo/path') # Update with your repository path

# Extract commit information (one row per tag)
data = []
for tag in repo.tags:
    commit = tag.commit
    data.append([
        tag.name,
        commit.hexsha,
        commit.authored_datetime,
        commit.message.strip(),
    ])

# Convert data to DataFrame, oldest tag first.
# The "datetime" column name must match the {datetime} placeholder in COMMANDS
df = pd.DataFrame(data, columns=["tag", "commit", "datetime", "message"])
df = df.sort_values("datetime").reset_index(drop=True)

# Export commit information to CSV
df.to_csv("commits.csv") # Optional step, helps to review what will be rewritten

# Shell commands for one tag.
# "-m 1" picks a merge commit relative to its first parent.
# The date formats carry no UTC offset, so git reads them as local time.
COMMANDS = """
git cherry-pick {commit} -m 1
export GIT_COMMITTER_DATE="{datetime:%Y-%m-%dT%H:%M:%S}"
git commit --amend --no-edit --date="{datetime:%Y-%m-%d %H:%M:%S}"
git tag -a {tag} -m "" -f
git push origin {tag} -f
"""

# Generate shell script for executing commands.
# "set -e" stops at the first failing command (e.g. a cherry-pick conflict)
out = "set -e\n"
for _, row in df.iterrows():
    out += COMMANDS.format(**row)

# Write commands to shell script
with open("commands.sh", "w") as stream:
    stream.write(out)

Execute the generated shell script (commands.sh) to automate the Git history cleanup process.
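Before executing it, it can help to check what the generated script will do. The sketch below is only an illustration: `summarize_script` is a hypothetical helper, and it assumes the script follows the command layout produced above. It counts the cherry-picks and lists the tags that will be force-pushed.

```python
import shlex


def summarize_script(script_text):
    """Count cherry-picks and collect force-pushed tags in a generated script."""
    picks, pushed_tags = 0, []
    for line in script_text.splitlines():
        words = shlex.split(line)
        if words[:2] == ["git", "cherry-pick"]:
            picks += 1
        elif words[:3] == ["git", "push", "origin"] and "-f" in words:
            pushed_tags.append(words[3])
    return picks, pushed_tags


sample = """
git cherry-pick abc123 -m 1
export GIT_COMMITTER_DATE="2020-01-31T09:30:00"
git commit --amend --no-edit --date="2020-01-31 09:30:00"
git tag -a v1.0 -m "" -f
git push origin v1.0 -f
"""
print(summarize_script(sample))
```

On a real run, pass the file contents instead of `sample`, for example `summarize_script(open("commands.sh").read())`, and compare the result with commits.csv.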

4. Conclusion

By following these steps, you can effectively clean up your Git history and maintain a well-organized repository. Remember to review the changes carefully before pushing them to the remote repository. Embrace the power of Git’s version control capabilities to enhance collaboration and project management within your development workflow.