
diff-annotate

Applies specified annotation rules to selected patches or selected commits, creating JSON files with annotation data, one file per patch or commit.

Common options can be used to specify annotation rules for changed files (which files are considered documentation, where the test files are, etc.), for changed lines (for example, whether empty lines should get their own "whitespace" annotation label), and for the changeset itself (whether to compute size and spread metrics for the patchset).
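The file-level rules form a chain: a path pattern assigns a file its purpose, and that purpose selects the annotation label for the file's changed lines. The following is a minimal sketch of that chain, not diff-annotate's implementation. It assumes glob-style patterns matched with Python's fnmatch, that the first matching pattern wins, and a fallback label of "code". The rule tables and the function name are hypothetical.

```python
from fnmatch import fnmatch
from typing import Dict

# Hypothetical rule tables in the shape of --pattern-to-purpose (PATTERN:PURPOSE)
# and --purpose-to-annotation (PURPOSE:ANNOTATION); not the tool's defaults.
PATTERN_TO_PURPOSE: Dict[str, str] = {
    "*.md": "documentation",
    "tests/*": "test",
}
PURPOSE_TO_ANNOTATION: Dict[str, str] = {
    "documentation": "documentation",
    "test": "test",
}


def label_for_changed_line(path: str, default: str = "code") -> str:
    """Return the annotation label for a changed line in the file at `path`."""
    for pattern, purpose in PATTERN_TO_PURPOSE.items():
        # Assumption: the first matching pattern decides the file's purpose
        if fnmatch(path, pattern):
            # Purposes without an explicit annotation fall back to `default`
            return PURPOSE_TO_ANNOTATION.get(purpose, default)
    # Files matching no pattern get the (assumed) default label
    return default


print(label_for_changed_line("docs/README.md"))  # documentation
print(label_for_changed_line("src/app.py"))      # code
```

Note that fnmatch's `*` also matches `/`, so "tests/*" covers nested test directories too; the real tool may use different matching rules.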

Usage:

$ diff-annotate [OPTIONS] COMMAND [ARGS]...

Options:

  • -V, --version: Output version information and exit.
  • --use-pylinguist: Use the Python clone of github/linguist, if available.
  • --update-languages / --no-update-languages: Use the tool's own version of 'languages.yml' [default: update-languages]
  • --sizes-and-spreads / --no-sizes-and-spreads: Compute patch size and spread metrics [default: sizes-and-spreads]
  • --ext-to-language EXT:LANGUAGE: Mapping from file extension to file language. An empty value resets the mapping.
  • --filename-to-language FILENAME:LANGUAGE: Mapping from filename to file language. An empty value resets the mapping.
  • --purpose-to-annotation PURPOSE:ANNOTATION: Mapping from file purpose to line annotation. An empty value resets the mapping.
  • --pattern-to-purpose PATTERN:PURPOSE: Mapping from a pattern matching the file path to that file's purpose. An empty value resets the mapping.
  • --line-callback CALLBACK: Body of the line_callback(tokens) callback function. See the documentation and examples.
  • --help: Show this message and exit.
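The --line-callback option takes only the body of a line_callback(tokens) function. This page does not say what tokens contains. The sketch below assumes each token is a (position, token_type, text) triple, like those Pygments lexers yield from get_tokens_unprocessed(), and that returning None keeps the default annotation. Both points are assumptions, and so is the "documentation" label it returns.

```python
from typing import Any, Iterable, Optional, Tuple

# Assumed token shape: (position, token_type, text), as yielded by Pygments'
# Lexer.get_tokens_unprocessed(); token_type is compared via its string form.
TokenTriple = Tuple[int, Any, str]


def line_callback(tokens: Iterable[TokenTriple]) -> Optional[str]:
    # Keep only tokens that carry something other than whitespace
    significant = [(ttype, text) for _, ttype, text in tokens if text.strip()]
    if not significant:
        return None  # assumed to mean "keep the default annotation"
    # Label lines made up entirely of comment tokens as documentation
    if all("Comment" in str(ttype) for ttype, _ in significant):
        return "documentation"
    return None
```

Under those assumptions, the string passed to --line-callback would be the indented body only, without the def line. Check the project's documentation and examples for the exact contract.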

Commands:

  • dataset: Annotate all bugs in provided DATASETS
  • patch: Annotate a single PATCH_FILE, writing results to RESULT_JSON
  • from-repo: Create annotation data for commits from a local Git repository

diff-annotate dataset

Annotate all bugs in the provided DATASETS

By default, each DATASET is expected to be an existing directory with the following structure:

<dataset_directory>/<bug_directory>/patches/<patch_file>.diff

You can change the /patches/ part with the --patches-dir option. For example, with --patches-dir='' the script expects the data to have the following structure:

<dataset_directory>/<bug_directory>/<patch_file>.diff

Each DATASET can consist of many BUGs; each BUG should include the patch to annotate as a *.diff file in the 'patches/' subdirectory (or in the subdirectory given with the --patches-dir option).
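The layout rules above amount to a short directory walk. The following minimal sketch uses pathlib and assumes that every immediate subdirectory of a DATASET is a BUG directory and that only files ending in .diff count as patches. iter_patch_files is a hypothetical helper, not part of diff-annotate.

```python
from pathlib import Path
from typing import Iterator


def iter_patch_files(dataset: Path, patches_dir: str = "patches") -> Iterator[Path]:
    """Yield <dataset>/<bug>/<patches_dir>/*.diff files in sorted order."""
    for bug_dir in sorted(p for p in dataset.iterdir() if p.is_dir()):
        # With patches_dir == '', the .diff files sit directly in the bug directory
        patch_dir = bug_dir / patches_dir if patches_dir else bug_dir
        if patch_dir.is_dir():
            yield from sorted(patch_dir.glob("*.diff"))
```

Bugs whose patch subdirectory is missing are skipped silently here. The real command may report them instead.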

Usage:

$ diff-annotate dataset [OPTIONS] DATASETS...

Arguments:

  • DATASETS...: [required]

Options:

  • --output-prefix DIRECTORY: Where to save files with annotation data.
  • --patches-dir DIR_NAME: Subdirectory with patches; use '' for no subdirectory [default: patches]
  • --annotations-dir DIR_NAME: Subdirectory to write annotations to; use '' for no subdirectory [default: annotation]
  • --uses-fanout / --no-uses-fanout: The dataset was generated with fan-out [default: no-uses-fanout]
  • --help: Show this message and exit.

diff-annotate patch

Annotate a single PATCH_FILE, writing results to RESULT_JSON

Usage:

$ diff-annotate patch [OPTIONS] PATCH_FILE RESULT_JSON

Arguments:

  • PATCH_FILE: Unified diff file to annotate [required]
  • RESULT_JSON: JSON file to write the annotation data to [required]

Options:

  • --help: Show this message and exit.
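PATCH_FILE only needs to be a unified diff. If you want to try the patch command without a repository, Python's standard difflib module can produce a small patch. The helper name write_patch and the a/ and b/ path prefixes (a Git convention) are illustrative choices. This page does not say whether diff-annotate needs anything beyond plain unified-diff headers.

```python
import difflib
from pathlib import Path


def write_patch(path: Path, old: str, new: str, filename: str) -> None:
    """Write a unified diff turning `old` into `new` for `filename` to `path`."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{filename}",  # pre-image name
        tofile=f"b/{filename}",    # post-image name
    )
    path.write_text("".join(diff))
```

For example, after write_patch(Path("fix.diff"), old, new, "hello.py"), running `diff-annotate patch fix.diff fix.json` would write the annotation data to fix.json.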

diff-annotate from-repo

Create annotation data for commits from a local Git repository

You can add extra options and arguments, which are passed to the git log -p command. With them you can specify which commits to operate on (by default, all commits).

See https://git-scm.com/docs/git-log or man git-log (or git log --help).

When no <revision-range> is specified, it defaults to HEAD (i.e., the whole history leading to the current commit). For example, origin..HEAD specifies all the commits reachable from the current commit (i.e., HEAD), but not from origin. For a complete list of ways to spell <revision-range>, see the "Specifying Ranges" section of the gitrevisions(7) manpage:

https://git-scm.com/docs/gitrevisions#_specifying_revisions
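The extra arguments go to git log, so you may want to preview which commits a revision range selects before running from-repo. Here is a minimal sketch that assumes git is on PATH. For plain revision ranges, git rev-list accepts the same syntax as git log and prints one commit ID per line. Unlike git log, rev-list does not default to HEAD, so the sketch supplies that default explicitly. The function names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def rev_list_command(repo_path: str, revision_args: Sequence[str]) -> List[str]:
    """Build a `git rev-list` call for the given revision arguments."""
    return ["git", "-C", repo_path, "rev-list", *revision_args]


def preview_commits(repo_path: str, revision_args: Sequence[str] = ("HEAD",)) -> List[str]:
    """Return the IDs of the commits that the revision arguments select."""
    result = subprocess.run(
        rev_list_command(repo_path, revision_args),
        capture_output=True, text=True, check=True,  # raise if git fails
    )
    return result.stdout.split()
```

For example, preview_commits(".", ["origin..HEAD"]) lists the commits that the origin..HEAD range would cover.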

Note that --use-fanout and --bugsinpy-layout are mutually exclusive.

Usage:

$ diff-annotate from-repo [OPTIONS] REPO_PATH

Arguments:

  • REPO_PATH: Path to git repository. [required]

Options:

  • --output-dir DIRECTORY: Where to save generated annotated data. [required]
  • --use-fanout / --no-use-fanout: Use fan-out when saving annotation data [default: no-use-fanout]
  • --bugsinpy-layout / --no-bugsinpy-layout: Create layout like the one in BugsInPy [default: no-bugsinpy-layout]
  • --annotations-dir DIR_NAME: Subdirectory to write annotations to; use '' for no subdirectory [default: annotation]
  • --use-repo / --no-use-repo: Retrieve pre-/post-image file contents from the repository and use them for lexing [default: use-repo]
  • -j, --n_jobs INTEGER: Number of processes to use (joblib); 0 turns this feature off [default: 0]
  • --help: Show this message and exit.
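This page does not define fan-out. In Git's object store, the term means splitting files into subdirectories named after the first two hex digits of each object ID. The sketch below assumes --use-fanout follows that convention for commit IDs and that flat output is one <commit>.json file per commit. Both path shapes and the function name are assumptions. Only the mutual exclusion with --bugsinpy-layout comes from the text above, and the BugsInPy layout itself is not modelled.

```python
from pathlib import Path


def annotation_path(output_dir: Path, commit_id: str,
                    use_fanout: bool = False, bugsinpy_layout: bool = False) -> Path:
    """Return where the annotation JSON for `commit_id` would be written."""
    if use_fanout and bugsinpy_layout:
        # Mirrors the documented rule: the two layouts are mutually exclusive
        raise ValueError("--use-fanout and --bugsinpy-layout are mutually exclusive")
    if bugsinpy_layout:
        raise NotImplementedError("BugsInPy layout is not modelled in this sketch")
    if use_fanout:
        # Assumed Git-style fan-out: the first two hex digits become a subdirectory
        return output_dir / commit_id[:2] / f"{commit_id[2:]}.json"
    # Assumed flat layout: one JSON file per commit
    return output_dir / f"{commit_id}.json"
```

Fan-out keeps any single directory from accumulating thousands of files when a long history is annotated.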