Repo
Wrapper around a Git repository.
This class is the main entry point for mining repositories with diffhouse.
When used in a with statement, it creates a non-persistent clone of the
target repository from which data can be extracted.
Examples:
    with Repo('https://github.com/user/repo') as r:
        for c in r.commits:
            print(c.commit_hash[:10], c.date, c.author_email)
        if len(r.branches.to_list()) > 100:
            print('🎉')
        df = r.diffs.to_pandas()
__init__
__init__(source: str, blobs: bool = True)
Initialize the repository.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | URL or local path pointing to a Git repository. | required |
| blobs | bool | Whether to download file contents. | True |
filemods
File modifications across the commit history.
Holds one record per modified file per commit. Note that this property
is unavailable if blobs=False.
diffs
Source code changes across the commit history.
Holds one record per code chunk per file per commit. Note that this
property is unavailable if blobs=False.
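Because filemods and diffs are unavailable when blobs=False, code that supports both modes has to guard access to them. The following is a minimal sketch that relies only on members shown on this page (branches.to_list() and diffs.to_pandas()). The helper mine_repo is hypothetical and not part of diffhouse, and it assumes the caller passes the same blobs value that was used to construct the Repo.

```python
def mine_repo(repo, blobs=True):
    """Collect branches, plus diffs only if file contents were downloaded.

    Hypothetical helper, not part of diffhouse. `repo` is an unopened
    Repo-like object; `blobs` must match the value given to its constructor.
    """
    with repo as r:
        # Branch data is read in both modes, as in the example above.
        result = {'branches': r.branches.to_list()}
        if blobs:
            # Only touch diffs when blobs were requested.
            result['diffs'] = r.diffs.to_pandas()
    return result
```

A call might look like mine_repo(Repo(url, blobs=False), blobs=False), which returns only the branch list and never touches diffs.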
source
source: str
Location where the repository was cloned from.
Either a remote URL or a local file URI, depending on the original input.
clone
clone() -> Repo
Set up a temporary clone of the repository.
This method is an alternative to with statements. Call dispose() to
free up resources when done.
Returns:
| Type | Description |
|---|---|
| Repo | self |
dispose
dispose() -> None
Free up resources associated with the object.
Only needed when clone() is used directly.
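When clone() is used instead of a with statement, pairing it with try/finally ensures dispose() still runs if mining raises an exception. The sketch below assumes only the members documented on this page (clone(), dispose(), commits, commit_hash). The function first_commit_hashes and its limit parameter are hypothetical and not part of diffhouse.

```python
def first_commit_hashes(repo, limit=5):
    """Return up to `limit` abbreviated commit hashes via clone()/dispose().

    Hypothetical helper, not part of diffhouse. `repo` is an unopened
    Repo-like object exposing clone(), dispose(), and an iterable `commits`.
    """
    r = repo.clone()  # documented to return the Repo itself
    try:
        hashes = []
        for c in r.commits:
            if len(hashes) >= limit:
                break
            hashes.append(c.commit_hash[:10])
        return hashes
    finally:
        # Runs even if iteration fails, so the temporary clone is released.
        r.dispose()
```

For example, first_commit_hashes(Repo('https://github.com/user/repo')) would clone the repository, read a few hashes, and free the clone before returning.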