CLI Reference¶
This page documents the PatchScope command-line tools.
diff-generate¶
Create patches from a local Git repository at the given REPO_PATH.
You can add additional options and parameters, which will be passed to the git format-patch command. With those options and arguments you can specify which commits to operate on.
- A single commit, <since>, specifies that the commits leading to the tip of the current branch that are not in the history that leads to <since> are to be output. Example: 'HEAD~2'. Not supported with '--use-fanout'.
- A generic <revision-range> expression means the commits in the specified range. Example: 'origin/main..main', or '--root HEAD', or '--user=joe --root HEAD'.
If not provided
To create patches for everything since the beginning of history up until a given commit, pass '--root' followed by that commit, as in the '--root HEAD' example.
Usage:
diff-generate [OPTIONS] REPO_PATH
Options:
REPO_PATH  Path to git repository. [required]
--output-dir DIRECTORY  Where to save generated patches.
--use-fanout / --no-use-fanout  Use fan-out when saving patches, save as *.diff [default: no-use-fanout]
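When diff-generate is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of PatchScope: the helper name `build_diff_generate_argv` is hypothetical, and it assumes that the extra arguments meant for git format-patch are placed after REPO_PATH.

```python
import shlex


def build_diff_generate_argv(repo_path, output_dir=None, use_fanout=False, git_args=()):
    """Assemble an argument list for diff-generate (hypothetical helper)."""
    argv = ["diff-generate"]
    if output_dir is not None:
        argv += ["--output-dir", str(output_dir)]
    if use_fanout:
        argv.append("--use-fanout")
    argv.append(str(repo_path))
    # Assumption: arguments for `git format-patch` follow REPO_PATH.
    argv += list(git_args)
    return argv


argv = build_diff_generate_argv(
    "path/to/repo", output_dir="patches", git_args=["origin/main..main"]
)
# Print a shell-quoted command line; pass `argv` to subprocess.run() to execute it.
print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids quoting problems when a revision range or path contains spaces.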
diff-annotate¶
Usage:
diff-annotate [OPTIONS] COMMAND [ARGS]...
Options:
-V, --version  Output version information and exit.
--use-pylinguist  Use Python clone of github/linguist, if available.
--update-languages / --no-update-languages  Use own version of 'languages.yml' [default: update-languages]
--sizes-and-spreads / --no-sizes-and-spreads  Compute patch size and spread metrics [default: sizes-and-spreads]
--ext-to-language EXT:LANGUAGE  Mapping from extension to file language. Empty value resets mapping.
--filename-to-language FILENAME:LANGUAGE  Mapping from filename to file language. Empty value resets mapping.
--purpose-to-annotation PURPOSE:ANNOTATION  Mapping from file purpose to line annotation. Empty value resets mapping.
--pattern-to-purpose PATTERN:PURPOSE  Mapping from pattern to match file path, to that file purpose. Empty value resets mapping.
--line-callback CALLBACK  Body for `line_callback(tokens)` callback function. See documentation and examples.
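The KEY:VALUE mapping options above can be given several times. The sketch below shows one way such repeated values could be folded into a dictionary. It is an illustration only, not PatchScope's implementation: the function name `collect_mapping` is hypothetical, and it assumes that "Empty value resets mapping" means an empty string discards everything collected up to that point.

```python
def collect_mapping(values):
    """Fold repeated KEY:VALUE option values into a dict (illustrative only)."""
    mapping = {}
    for value in values:
        if value == "":
            # Assumption: an empty value discards everything collected so far.
            mapping.clear()
            continue
        key, sep, target = value.partition(":")
        if not sep or not key or not target:
            raise ValueError(f"expected KEY:VALUE, got {value!r}")
        mapping[key] = target
    return mapping


# Values as they might arrive from repeated --ext-to-language options.
ext_to_language = collect_mapping([".cfg:INI", "", ".toml:TOML", ".lock:JSON"])
print(ext_to_language)
```

In this example the '.cfg' entry is dropped by the empty value that follows it, so only the two later entries remain.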
dataset¶
Annotate all bugs in the provided DATASETS.
Each DATASET is expected to be an existing directory with the following structure, by default:
<dataset_directory>/<bug_directory>/patches/<patch_file>.diff
You can change the /patches/ part with the --patches-dir option. For example, with --patches-dir='' the script expects the data to have the following structure:
<dataset_directory>/<bug_directory>/<patch_file>.diff
Each DATASET can consist of many BUGs; each BUG should include the patch to annotate as a *.diff file in the 'patches/' subdirectory (or in the subdirectory you provide via the --patches-dir option).
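The directory layout described above can be checked before running the command. The following is a minimal sketch using only the standard library. The function name `iter_patch_files` is hypothetical, and the traversal mirrors the documented layout rather than PatchScope's own code.

```python
from pathlib import Path
import tempfile


def iter_patch_files(dataset_dir, patches_dir="patches"):
    """Yield (bug_name, patch_path) pairs for one dataset directory."""
    for bug_dir in sorted(p for p in Path(dataset_dir).iterdir() if p.is_dir()):
        # An empty patches_dir means the *.diff files sit directly in the bug directory.
        search_dir = bug_dir / patches_dir if patches_dir else bug_dir
        if not search_dir.is_dir():
            continue
        for patch_path in sorted(search_dir.glob("*.diff")):
            yield bug_dir.name, patch_path


# Build a tiny throwaway dataset to show the traversal.
with tempfile.TemporaryDirectory() as tmp:
    patch = Path(tmp, "bug-1", "patches", "fix.diff")
    patch.parent.mkdir(parents=True)
    patch.write_text("--- a/f\n+++ b/f\n")
    found = [(bug, p.name) for bug, p in iter_patch_files(tmp)]
    print(found)
```

Calling the function with patches_dir="" corresponds to the --patches-dir='' case, where patches are expected directly inside each bug directory.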
Usage:
dataset [OPTIONS] DATASETS...
Options:
DATASETS...  [required]
--output-prefix DIRECTORY  Where to save files with annotation data.
--patches-dir DIR_NAME  Subdirectory with patches; use '' to do without such [default: patches]
--annotations-dir DIR_NAME  Subdirectory to write annotations to; use '' to do without such [default: annotation]
--uses-fanout / --no-uses-fanout  Dataset was generated with fan-out [default: no-uses-fanout]
from-repo¶
Create annotation data for commits from a local Git repository.
You can add additional options and parameters, which will be passed to the git log -p command. With those options and arguments you can specify which commits to operate on (defaults to all commits). See https://git-scm.com/docs/git-log or man git-log (or git log --help).
When no <revision-range> is specified, it defaults to HEAD (that is, the whole history leading to the current commit). For the ways to spell a revision range, see https://git-scm.com/docs/gitrevisions#_specifying_revisions
Note that --use-fanout and --bugsinpy-layout are mutually exclusive.
Usage:
from-repo [OPTIONS] REPO_PATH
Options:
REPO_PATH  Path to git repository. [required]
--output-dir DIRECTORY  Where to save generated annotated data. [required]
--use-fanout / --no-use-fanout  Use fan-out when saving annotation data [default: no-use-fanout]
--bugsinpy-layout / --no-bugsinpy-layout  Create layout like the one in BugsInPy [default: no-bugsinpy-layout]
--annotations-dir DIR_NAME  Subdirectory to write annotations to; use '' to do without such [default: annotation]
--use-repo / --no-use-repo  Retrieve pre-/post-image contents from repo, and use it for lexing [default: use-repo]
-j, --n_jobs INTEGER  Number of processes to use (joblib); 0 turns feature off [default: 0]
patch¶
Annotate a single PATCH_FILE, writing results to RESULT_JSON
Usage:
patch [OPTIONS] PATCH_FILE RESULT_JSON
Options:
PATCH_FILE  unified diff file to annotate [required]
RESULT_JSON  JSON file to write annotation to [required]
diff-gather-stats¶
Usage:
diff-gather-stats [OPTIONS] COMMAND [ARGS]...
Options:
--annotations-dir DIR_NAME  Subdirectory to read annotations from; use '' to do without such [default: annotation]
lines-stats¶
Calculate per-bug and per-file counts of line types in the provided datasets.
Each dataset is expected to be an existing directory with the following structure:
<dataset_directory>/<bug_directory>/annotation/<patch_file>.json
Each dataset can consist of many BUGs; each BUG should include the annotation data for its patch as a *.json file in the 'annotation/' subdirectory.
Usage:
lines-stats [OPTIONS] OUTPUT_FILE DATASETS...
Options:
OUTPUT_FILE  JSON file to write gathered results to [required]
DATASETS...  list of dirs with datasets to process [required]
--purpose-to-annotation PURPOSE:LINE_TYPE|PURPOSE  Mapping from file PURPOSE to line type LINE_TYPE. Each line of such file will be treated as if it had given type. As a shortcut, giving PURPOSE is the same as PURPOSE:PURPOSE. Can be given multiple times.
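The PURPOSE:LINE_TYPE|PURPOSE syntax, including the bare-PURPOSE shortcut, can be illustrated with a short parser. This is a sketch under stated assumptions rather than PatchScope's code: the function name `parse_purpose_to_annotation` is hypothetical, and the purpose and line-type names in the example are made up for illustration.

```python
def parse_purpose_to_annotation(values):
    """Map file purposes to line types from repeated option values."""
    mapping = {}
    for value in values:
        purpose, sep, line_type = value.partition(":")
        # A bare PURPOSE is shorthand for PURPOSE:PURPOSE.
        mapping[purpose] = line_type if sep else purpose
    return mapping


# Hypothetical values, as if the option had been given twice.
mapping = parse_purpose_to_annotation(["data", "documentation:doc"])
print(mapping)
```

Here every line of a file whose purpose is 'data' would be counted with line type 'data', while lines of 'documentation' files would be counted as 'doc'.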
list-added-lines¶
List added lines from all bugs in the provided datasets.
Each dataset is expected to be an existing directory with the following structure:
<dataset_directory>/<bug_directory>/annotation/<patch_file>.json
Each dataset can consist of many bugs; each bug should include the annotation data for its patch as a *.json file in the 'annotation/' subdirectory.
Usage:
list-added-lines [OPTIONS] DATASETS...
Options:
DATASETS...  [required]
purpose-counter¶
Calculate the count of purposes from all bugs in the provided datasets.
Each dataset is expected to be an existing directory with the following structure:
<dataset_directory>/<bug_directory>/annotation/<patch_file>.json
Each dataset can consist of many bugs; each bug should include the annotation data for its patch as a *.json file in the 'annotation/' subdirectory.
Usage:
purpose-counter [OPTIONS] DATASETS...
Options:
DATASETS...  [required]
-o, --output JSON_FILE  JSON file to write gathered results to
purpose-per-file¶
Calculate the per-file count of purposes from all bugs in the provided datasets.
Each dataset is expected to be an existing directory with the following structure:
<dataset_directory>/<bug_directory>/annotation/<patch_file>.json
Each dataset can consist of many BUGs; each BUG should include the annotation data for its patch as a *.json file in the 'annotation/' subdirectory.
Usage:
purpose-per-file [OPTIONS] RESULT_JSON DATASETS...
Options:
RESULT_JSON  JSON file to write gathered results to [required]
DATASETS...  list of dirs with datasets to process [required]
timeline¶
Calculate timeline of bugs with per-bug count of different types of lines
For each bug (bugfix commit), compute the count of lines removed and added by the patch (commit) in all changed files, keeping separate counts for lines with different types, and (separately) with different purposes.
The gathered data is then saved in a format that is easy to load into a dataframe.
Each DATASET is expected to be generated by annotating a dataset or by creating annotations from a repository, and should be an existing directory with the following structure:
<dataset_directory>/<bug_directory>/annotation/<patch_file>.json
Each dataset can consist of many BUGs; each BUG should include a JSON file with its diff/patch annotations as a *.json file in the 'annotation/' subdirectory (by default).
Saves gathered timeline results to the OUTPUT_FILE.
Usage:
timeline [OPTIONS] OUTPUT_FILE DATASETS...
Options:
OUTPUT_FILE  file to write gathered results to [required]
DATASETS...  list of dirs with datasets to process [required]
--purpose-to-annotation PURPOSE:LINE_TYPE|PURPOSE  Mapping from file PURPOSE to line type LINE_TYPE. Each line of such file will be treated as if it had given type. As a shortcut, giving PURPOSE is the same as PURPOSE:PURPOSE. Can be given multiple times.