Home

diffhouse is a Python tool that extracts Git metadata into a structured form, designed to enable large-scale codebase analysis at practical speeds.

Key features are:

  • 🚀 Fast access to commit data, file changes and more
  • 📊 Easy integration with pandas and Polars
  • 🐍 Simple-to-use Python interface

Performance

[Figure: tweenjs/tween.js benchmark results. Processing times for tween.js; lower is better.]

For more details, see benchmarks.

Requirements

Python 3.10 or higher
Git 2.22 or higher

Git must also be available on the system PATH.
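
The requirements above can be verified before installing. The following is a minimal sketch using only the Python standard library; it is not part of diffhouse, and the helper names `parse_git_version` and `check_requirements` are hypothetical. It assumes `git --version` prints text containing a `major.minor` version number, such as `git version 2.43.0`.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the Requirements list above.
MIN_PYTHON = (3, 10)
MIN_GIT = (2, 22)


def parse_git_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from `git --version` output (hypothetical helper)."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized git version output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements() -> list[str]:
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems = []

    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10 or higher is required")

    # shutil.which returns None when the executable is not on the PATH.
    git_path = shutil.which("git")
    if git_path is None:
        problems.append("git was not found on the system PATH")
    else:
        result = subprocess.run(
            [git_path, "--version"], capture_output=True, text=True, check=True
        )
        if parse_git_version(result.stdout) < MIN_GIT:
            problems.append("Git 2.22 or higher is required")

    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Running the script prints one line per unmet requirement and prints nothing when the environment satisfies all three.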

Limitations

At its core, diffhouse is a data extraction tool, so it does not calculate software metrics such as code churn or cyclomatic complexity. If you need these, take a look at PyDriller instead.