Skip to content

Home

diffhouse is a Python solution for structuring Git metadata, designed to enable large-scale codebase analysis at practical speeds.

Key features are:

  • Fast access to commit data, file changes and more
  • Easy integration with pandas and polars
  • Simple-to-use Python interface

Requirements

Python 3.10 or higher
Git 2.22 or higher

Git also needs to be added to the system PATH.

Limitations

At its core, diffhouse is a data extraction tool and therefore does not calculate software metrics like code churn or cyclomatic complexity; if this is needed, take a look at PyDriller instead.

Also note that revision data is limited to default branches only.