Repo

Wrapper around a Git repository.

This class is the main entry point for mining repositories with diffhouse. When used in a with statement, it creates a non-persistent clone of the target repository from which data can be extracted.

Examples:

from diffhouse import Repo

with Repo('https://github.com/user/repo') as r:
    for c in r.commits:
        print(c.commit_hash[:10], c.date, c.author_email)

    if len(r.branches.to_list()) > 100:
        print('🎉')

    df = r.diffs.to_pandas()

__init__

__init__(source: str, blobs: bool = True)

Initialize the repository.

Parameters:

Name    Type  Description                                      Default
source  str   URL or local path pointing to a Git repository.  required
blobs   bool  Whether to download file contents.               True

commits

commits: Extractor[Commit]

Commit history of the repository.

Holds one record per commit.

filemods

filemods: Extractor[FileMod]

File modifications across the commit history.

Holds one record per modified file per commit. Note that this property is unavailable if blobs=False.

diffs

diffs: Extractor[Diff]

Source code changes across the commit history.

Holds one record per code chunk per file per commit. Note that this property is unavailable if blobs=False.
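The three granularities described above (one record per commit, per modified file per commit, and per code chunk per file per commit) can be illustrated with a toy history. This is only a sketch: `ToyCommit`, `ToyFile`, and the `hunks` field are hypothetical stand-ins invented for the illustration, not diffhouse types, and the real records carry different fields.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    # Hypothetical stand-in for a file touched by a commit.
    path: str
    hunks: list = field(default_factory=list)  # one entry per changed chunk


@dataclass
class ToyCommit:
    # Hypothetical stand-in for a commit.
    commit_hash: str
    files: list = field(default_factory=list)


history = [
    ToyCommit("a1", [
        ToyFile("README.md", ["@@ -1 +1 @@"]),
        ToyFile("app.py", ["@@ -1,2 +1,3 @@", "@@ -10 +11 @@"]),
    ]),
    ToyCommit("b2", [
        ToyFile("app.py", ["@@ -4 +4 @@"]),
    ]),
]

# One row per commit (the granularity of `commits`).
commit_rows = [c.commit_hash for c in history]

# One row per modified file per commit (the granularity of `filemods`).
filemod_rows = [(c.commit_hash, f.path) for c in history for f in c.files]

# One row per chunk per file per commit (the granularity of `diffs`).
diff_rows = [
    (c.commit_hash, f.path, h)
    for c in history
    for f in c.files
    for h in f.hunks
]

print(len(commit_rows), len(filemod_rows), len(diff_rows))  # prints: 2 3 4
```

Each level is a flattening of the one above it, so row counts can only stay equal or grow when moving from commits to filemods to diffs.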

branches

branches: Extractor[Branch]

Branches of the repository.

tags

tags: Extractor[Tag]

Tag names of the repository.

source

source: str

Location where the repository was cloned from.

Either a remote URL or a local file URI, depending on the original input.

clone

clone() -> Repo

Set up a temporary clone of the repository.

This method is an alternative to with statements. Call dispose() to free up resources when done.

Returns:

Type  Description
Repo  self

dispose

dispose() -> None

Free up resources associated with the object.

Only needed when clone() is used directly.
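The relationship between the with statement and an explicit clone()/dispose() pair can be sketched as follows. So that the example runs without diffhouse or network access, it uses `StandInRepo`, a hypothetical class that only mimics the documented contract (clone() returns self, dispose() frees resources); the `workdir` attribute and the temporary-directory handling are assumptions made for illustration, not a description of how diffhouse is implemented.

```python
import shutil
import tempfile
from pathlib import Path


class StandInRepo:
    """Hypothetical stand-in mimicking the documented clone()/dispose() contract."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.workdir = None  # set by clone(), cleared by dispose()

    def clone(self) -> "StandInRepo":
        # A real implementation would clone `source` here; this sketch
        # only creates an empty temporary directory.
        self.workdir = Path(tempfile.mkdtemp(prefix="standin-repo-"))
        return self

    def dispose(self) -> None:
        # Remove the temporary directory if one exists.
        if self.workdir is not None:
            shutil.rmtree(self.workdir, ignore_errors=True)
            self.workdir = None

    def __enter__(self) -> "StandInRepo":
        return self.clone()

    def __exit__(self, exc_type, exc, tb) -> None:
        self.dispose()


def use_explicitly(source: str) -> tuple:
    """Return (directory existed during use, directory exists after dispose)."""
    repo = StandInRepo(source).clone()
    path = repo.workdir
    try:
        existed = path.is_dir()
    finally:
        repo.dispose()  # runs even if the body raises
    return existed, path.exists()
```

When clone() is called directly, wrapping the work in try/finally gives the same cleanup guarantee that the with statement provides automatically.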