CLI Reference

This page documents the PatchScope command-line tools.

diff-generate

Create patches from the local Git repository at the provided REPO_PATH

You can add additional options and parameters, which will be passed to the git format-patch command. With those options and arguments you can specify which commits to operate on.

  1. A single commit, <since>, specifies that the commits leading to the tip of the current branch that are not in the history leading to <since> are to be output. Example: 'HEAD~2'. Not supported with '--use-fanout'.

  2. A generic <revision-range> expression means the commits in the specified range. Example: 'origin/main..main', or '--root HEAD', or '--user=joe --root HEAD'.

If neither <since> nor <revision-range> is provided, a single patch for the current commit on the current branch ('HEAD') will be created.

To create patches for everything from the beginning of history up to <commit>, use '--root <commit>' as extra options.

Usage:

diff-generate [OPTIONS] REPO_PATH

Options:

  REPO_PATH                       Path to git repository.  [required]
  --output-dir DIRECTORY          Where to save generated patches.
  --use-fanout / --no-use-fanout  Use fan-out when saving patches, save as
                                  *.diff  [default: no-use-fanout]
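
The commit-selection rules above can be illustrated with a minimal sketch that only assembles argument lists and runs nothing. The helper `diff_generate_argv` is hypothetical (not part of PatchScope), and placing the extra git format-patch arguments after REPO_PATH is an assumption.

```python
import shlex


def diff_generate_argv(repo_path, output_dir=None, use_fanout=False, extra=()):
    """Assemble an argument list for diff-generate (nothing is executed)."""
    argv = ["diff-generate"]
    if output_dir is not None:
        argv += ["--output-dir", output_dir]
    if use_fanout:
        # Per the description above, a single <since> commit is not
        # supported together with --use-fanout; this sketch does not check it.
        argv.append("--use-fanout")
    argv.append(repo_path)
    # Assumption: extra git format-patch arguments follow REPO_PATH.
    argv += list(extra)
    return argv


examples = [
    diff_generate_argv("repo"),                    # no extras: one patch for HEAD
    diff_generate_argv("repo", extra=["HEAD~2"]),  # single commit, <since>
    diff_generate_argv("repo", output_dir="patches",
                       extra=["origin/main..main"]),        # <revision-range>
    diff_generate_argv("repo", extra=["--root", "HEAD"]),   # whole history
]

for argv in examples:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Each resulting list could be handed to `subprocess.run` once the tool is installed and on PATH.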

diff-annotate

Usage:

 diff-annotate [OPTIONS] COMMAND [ARGS]...

Options:

  -V, --version                   Output version information and exit.
  --use-pylinguist                Use Python clone of github/linguist, if
                                  available.
  --update-languages / --no-update-languages
                                  Use own version of 'languages.yml'
                                  [default: update-languages]
  --sizes-and-spreads / --no-sizes-and-spreads
                                  Compute patch size and spread metrics
                                  [default: sizes-and-spreads]
  --ext-to-language EXT:LANGUAGE  Mapping from extension to file language.
                                  Empty value resets mapping.
  --filename-to-language FILENAME:LANGUAGE
                                  Mapping from filename to file language.
                                  Empty value resets mapping.
  --purpose-to-annotation PURPOSE:ANNOTATION
                                  Mapping from file purpose to line
                                  annotation. Empty value resets mapping.
  --pattern-to-purpose PATTERN:PURPOSE
                                  Mapping from pattern to match file path, to
                                  that file purpose. Empty value resets
                                  mapping.
  --line-callback CALLBACK        Body for `line_callback(tokens)` callback
                                  function.  See documentation and examples.

dataset

Annotate all bugs in the provided DATASETS

Each DATASET is expected to be an existing directory with the following structure, by default:

<dataset_directory>/<bug_directory>/patches/<patch_file>.diff

You can change the /patches/ part with the --patches-dir option. For example, with --patches-dir='' the script expects the data to have the following structure:

<dataset_directory>/<bug_directory>/<patch_file>.diff

Each DATASET can consist of many BUGs; each BUG should include the patch to annotate as a *.diff file in the 'patches/' subdirectory (or in the subdirectory you provide via the --patches-dir option).

Usage:

 diff-annotate dataset [OPTIONS] DATASETS...

Options:

  DATASETS...                     [required]
  --output-prefix DIRECTORY       Where to save files with annotation data.
  --patches-dir DIR_NAME          Subdirectory with patches; use '' to do
                                  without such  [default: patches]
  --annotations-dir DIR_NAME      Subdirectory to write annotations to; use ''
                                  to do without such  [default: annotation]
  --uses-fanout / --no-uses-fanout
                                  Dataset was generated with fan-out
                                  [default: no-uses-fanout]
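
The directory layout described above can be walked with a few lines of standard-library Python. This is a minimal sketch of the layout only, not the tool's own implementation; `find_patches` is a hypothetical helper, and its `patches_dir` parameter merely mirrors the --patches-dir option (an empty string means the patches sit directly in the bug directory). Fan-out layouts are not handled.

```python
from pathlib import Path


def find_patches(dataset_dir, patches_dir="patches"):
    """Yield (bug_name, patch_path) for every *.diff file in a dataset."""
    for bug_dir in sorted(Path(dataset_dir).iterdir()):
        if not bug_dir.is_dir():
            continue
        # Empty patches_dir: look for *.diff files in the bug directory itself.
        search_dir = bug_dir / patches_dir if patches_dir else bug_dir
        if not search_dir.is_dir():
            continue
        for patch_path in sorted(search_dir.glob("*.diff")):
            yield bug_dir.name, patch_path
```

Calling `list(find_patches("my_dataset"))` before running the annotation is one way to check that a dataset directory matches the expected structure.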

from-repo

Create annotation data for commits from a local Git repository

You can add additional options and parameters, which will be passed to the git log -p command. With those options and arguments you can specify which commits to operate on (defaults to all commits).

See https://git-scm.com/docs/git-log or man git-log (or git log --help).

When no <revision-range> is specified, it defaults to HEAD (i.e. the whole history leading to the current commit). origin..HEAD specifies all the commits reachable from the current commit (i.e. HEAD), but not from origin. For a complete list of ways to spell <revision-range>, see the "Specifying Ranges" section of the gitrevisions(7) manpage:

https://git-scm.com/docs/gitrevisions#_specifying_revisions

Note that --use-fanout and --bugsinpy-layout are mutually exclusive.

Usage:

 diff-annotate from-repo [OPTIONS] REPO_PATH

Options:

  REPO_PATH                       Path to git repository.  [required]
  --output-dir DIRECTORY          Where to save generated annotated data.
                                  [required]
  --use-fanout / --no-use-fanout  Use fan-out when saving annotation data
                                  [default: no-use-fanout]
  --bugsinpy-layout / --no-bugsinpy-layout
                                  Create layout like the one in BugsInPy
                                  [default: no-bugsinpy-layout]
  --annotations-dir DIR_NAME      Subdirectory to write annotations to; use ''
                                  to do without such  [default: annotation]
  --use-repo / --no-use-repo      Retrieve pre-/post-image contents from repo,
                                  and use it for lexing  [default: use-repo]
  -j, --n_jobs INTEGER            Number of processes to use (joblib); 0 turns
                                  feature off  [default: 0]
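
When the command is driven from a script, the constraint that --use-fanout and --bugsinpy-layout are mutually exclusive can be checked before anything is launched. The sketch below is illustrative only: `from_repo_argv` is a hypothetical helper, it executes nothing, and appending the git log arguments after REPO_PATH is an assumption.

```python
def from_repo_argv(repo_path, output_dir, use_fanout=False,
                   bugsinpy_layout=False, n_jobs=0, log_args=()):
    """Assemble an argument list for 'diff-annotate from-repo'."""
    if use_fanout and bugsinpy_layout:
        # Mirrors the documented restriction on these two flags.
        raise ValueError(
            "--use-fanout and --bugsinpy-layout are mutually exclusive"
        )
    argv = ["diff-annotate", "from-repo", "--output-dir", str(output_dir)]
    if use_fanout:
        argv.append("--use-fanout")
    if bugsinpy_layout:
        argv.append("--bugsinpy-layout")
    if n_jobs:
        # 0 is the documented default and turns parallel processing off.
        argv += ["--n_jobs", str(n_jobs)]
    argv.append(str(repo_path))
    # Assumption: extra git log arguments follow REPO_PATH.
    argv += list(log_args)
    return argv


# Annotate only the commits reachable from HEAD but not from origin.
print(from_repo_argv("repo", "annotated", n_jobs=4, log_args=["origin..HEAD"]))
```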

patch

Annotate a single PATCH_FILE, writing results to RESULT_JSON

Usage:

 diff-annotate patch [OPTIONS] PATCH_FILE RESULT_JSON

Options:

  PATCH_FILE   unified diff file to annotate  [required]
  RESULT_JSON  JSON file to write annotation to  [required]
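
For experimenting with this command, a small unified diff can be produced with Python's standard `difflib` module. This is only a sketch of what a unified diff looks like: the file name `greet.py` and its contents are invented for illustration, and whether the tool also expects Git-specific headers (such as a `diff --git` line) is not stated here.

```python
import difflib

# Two versions of a hypothetical file, as lists of lines with newlines kept.
before = ["def greet():\n", "    print('helo')\n"]
after = ["def greet():\n", "    print('hello')\n"]

# unified_diff yields the '---'/'+++' header, hunk headers, and changed lines.
patch_text = "".join(
    difflib.unified_diff(before, after, fromfile="a/greet.py", tofile="b/greet.py")
)
print(patch_text)
```

Saving `patch_text` to a file gives a candidate PATCH_FILE to try.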

diff-gather-stats

Usage:

 diff-gather-stats [OPTIONS] COMMAND [ARGS]...

Options:

  --annotations-dir DIR_NAME  Subdirectory to read annotations from; use '' to
                              do without such  [default: annotation]

lines-stats

Calculate per-bug and per-file counts of line types in the provided datasets

Each dataset is expected to be an existing directory with the following structure:

<dataset_directory>/<bug_directory>/annotation/<patch_file>.json

Each dataset can consist of many BUGs; each BUG should include its annotated patch as a *diff.json file in the 'annotation/' subdirectory.

Usage:

 diff-gather-stats lines-stats [OPTIONS] OUTPUT_FILE DATASETS...

Options:

  OUTPUT_FILE                     JSON file to write gathered results to
                                  [required]
  DATASETS...                     list of dirs with datasets to process
                                  [required]
  --purpose-to-annotation PURPOSE:LINE_TYPE|PURPOSE
                                  Mapping from file PURPOSE to line type
                                  LINE_TYPE. Each line of such file will be
                                  treated as if it had given type. As a
                                  shortcut, giving PURPOSE is the same as
                                  PURPOSE:PURPOSE. Can be given multiple
                                  times.
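
The PURPOSE:LINE_TYPE|PURPOSE shortcut described above can be sketched as a small parser. This is a minimal illustration, not the tool's own code: `parse_purpose_to_annotation` is a hypothetical name, and the treatment of repeated or empty values is an assumption.

```python
def parse_purpose_to_annotation(values):
    """Turn repeated PURPOSE:LINE_TYPE (or bare PURPOSE) values into a dict."""
    mapping = {}
    for value in values:
        purpose, sep, line_type = value.partition(":")
        # A bare PURPOSE is shorthand for PURPOSE:PURPOSE.
        # Assumption: when a purpose is repeated, the last value wins.
        mapping[purpose] = line_type if sep else purpose
    return mapping


# Corresponds to giving the option twice on the command line.
print(parse_purpose_to_annotation(["data", "documentation:doc"]))
```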

list-added-lines

List added lines from all bugs in the provided datasets

Each dataset is expected to be an existing directory with the following structure:

<dataset_directory>/<bug_directory>/annotation/<patch_file>.json

Each dataset can consist of many bugs; each bug should include its annotated patch as a *diff.json file in the 'annotation/' subdirectory.

Usage:

 diff-gather-stats list-added-lines [OPTIONS] DATASETS...

Options:

  DATASETS...  [required]

purpose-counter

Calculate the count of purposes from all bugs in the provided datasets

Each dataset is expected to be an existing directory with the following structure:

<dataset_directory>/<bug_directory>/annotation/<patch_file>.json

Each dataset can consist of many bugs; each bug should include its annotated patch as a *diff.json file in the 'annotation/' subdirectory.

Usage:

 diff-gather-stats purpose-counter [OPTIONS] DATASETS...

Options:

  DATASETS...             [required]
  -o, --output JSON_FILE  JSON file to write gathered results to

purpose-per-file

Calculate the per-file count of purposes from all bugs in the provided datasets

Each dataset is expected to be an existing directory with the following structure:

<dataset_directory>/<bug_directory>/annotation/<patch_file>.json

Each dataset can consist of many BUGs; each BUG should include its annotated patch as a *diff.json file in the 'annotation/' subdirectory.

Usage:

 diff-gather-stats purpose-per-file [OPTIONS] RESULT_JSON DATASETS...

Options:

  RESULT_JSON  JSON file to write gathered results to  [required]
  DATASETS...  list of dirs with datasets to process  [required]

timeline

Calculate a timeline of bugs with per-bug counts of different types of lines

For each bug (bugfix commit), compute the count of lines removed and added by the patch (commit) in all changed files, keeping separate counts for lines with different types, and (separately) with different purposes.

The gathered data is then saved in a format that is easy to load into a dataframe.

Each DATASET is expected to have been generated by annotating a dataset or by creating annotations from a repository, and should be an existing directory with the following structure:

<dataset_directory>/<bug_directory>/annotation/<patch_file>.json

Each dataset can consist of many BUGs; each BUG should include a JSON file with its diff/patch annotations as a *.json file in the 'annotation/' subdirectory (by default).

Saves gathered timeline results to the OUTPUT_FILE.

Usage:

 diff-gather-stats timeline [OPTIONS] OUTPUT_FILE DATASETS...

Options:

  OUTPUT_FILE                     file to write gathered results to
                                  [required]
  DATASETS...                     list of dirs with datasets to process
                                  [required]
  --purpose-to-annotation PURPOSE:LINE_TYPE|PURPOSE
                                  Mapping from file PURPOSE to line type
                                  LINE_TYPE. Each line of such file will be
                                  treated as if it had given type. As a
                                  shortcut, giving PURPOSE is the same as
                                  PURPOSE:PURPOSE. Can be given multiple
                                  times.