About generating code scanning results with CodeQL CLI
Once you've made the CodeQL CLI available to servers in your CI system, and ensured that they can authenticate with GitHub, you're ready to generate data.
You use three different commands to generate results and upload them to GitHub:
- database create to create a CodeQL database to represent the hierarchical structure of each supported programming language in the repository.
- database analyze to run queries to analyze each CodeQL database and summarize the results in a SARIF file.
- github upload-results to upload the resulting SARIF files to GitHub where the results are matched to a branch or pull request and displayed as code scanning alerts.
You can display the command-line help for any command using the
Note: Uploading SARIF data to display as code scanning results in GitHub is supported for organization-owned repositories with GitHub Advanced Security enabled, and public repositories on GitHub.com. For more information, see "Managing security and analysis settings for your repository."
Creating CodeQL databases to analyze
Check out the code that you want to analyze:
- For a branch, check out the head of the branch that you want to analyze.
- For a pull request, check out either the head commit of the pull request or a GitHub-generated merge commit of the pull request.
Set up the environment for the codebase, making sure that any dependencies are available. For more information, see "Creating CodeQL databases."
Find the build command, if any, for the codebase. Typically this is available in a configuration file in the CI system.
Run codeql database create from the checkout root of your repository and build the codebase.
# Single supported language - create one CodeQL database
codeql database create <database> --command=<build> --language=<language-identifier>

# Multiple supported languages - create one CodeQL database per language
codeql database create <database> --command=<build> \
    --db-cluster --language=<language-identifier>,<language-identifier>
Note: If you use a containerized build, you need to run the CodeQL CLI inside the container where your build task takes place.
|Specify the name and location of a directory to create for the CodeQL database. The command will fail if you try to overwrite an existing directory. If you also specify |
|Specify the identifier for the language to create a database for, one of: |
|Recommended. Use to specify the build command or script that invokes the build process for the codebase. Commands are run from the current folder or, where it is defined, from |
|Use in multi-language codebases to generate one database for each language specified by |
|Use if you run the CLI outside the checkout root of the repository. By default, the |
|Advanced. Use if you have a configuration file that specifies how to create the CodeQL databases and what queries to run in later steps. For more information, see "Customizing code scanning" and "database create."|
For more information, see "Creating CodeQL databases."
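The choice between the single-language and multi-language forms of the command can be scripted. The following is a minimal sketch, not an official wrapper: the function name create_databases and the CODEQL override variable are hypothetical, and it assumes that a comma in the language list is the only signal needed to add --db-cluster.

```shell
# Hypothetical helper: builds the 'codeql database create' invocation.
# Set CODEQL=echo to preview the command instead of running it.
create_databases() {
  local db_dir="$1" languages="$2" build_cmd="${3:-}"
  set -- database create "$db_dir" "--language=$languages"
  # More than one comma-separated language means one database per language.
  case "$languages" in
    *,*) set -- "$@" --db-cluster ;;
  esac
  # --command is recommended, but some languages (like Python) need no build.
  if [ -n "$build_cmd" ]; then
    set -- "$@" "--command=$build_cmd"
  fi
  "${CODEQL:-codeql}" "$@"
}
```

For example, create_databases codeql-dbs java,python ./myBuildScript would request a database cluster, while create_databases codeql-db javascript would create a single database with no build command.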
Single language example
This example creates a CodeQL database for the repository checked out at
Multiple language example
This example creates two CodeQL databases for the repository checked out at
/checkouts/example-repo-multi. It uses:
- --db-cluster to request analysis of more than one language.
- --language to specify which languages to create databases for.
- --command to tell the tool the build command for the codebase, here
- --no-run-unnecessary-builds to tell the tool to skip the build command for languages where it is not needed (like Python).
The resulting databases are stored in
cpp subdirectories of
$ codeql database create /codeql-dbs/example-repo-multi \
    --db-cluster --language python,cpp \
    --command make --no-run-unnecessary-builds \
    --source-root /checkouts/example-repo-multi
Initializing databases at /codeql-dbs/example-repo-multi.
Running build command: [make]
[build-stdout] Calling python3 /codeql-bundle/codeql/python/tools/get_venv_lib.py
[build-stdout] Calling python3 -S /codeql-bundle/codeql/python/tools/python_tracer.py -v -z all -c /codeql-dbs/example-repo-multi/python/working/trap_cache -p ERROR: 'pip' not installed.
[build-stdout] /usr/local/lib/python3.6/dist-packages -R /checkouts/example-repo-multi
[build-stdout] [INFO] Python version 3.6.9
[build-stdout] [INFO] Python extractor version 5.16
[build-stdout] [INFO]  Extracted file /checkouts/example-repo-multi/hello.py in 5ms
[build-stdout] [INFO] Processed 1 modules in 0.15s
[build-stdout] <output from calling 'make' to build the C/C++ code>
Finalizing databases at /codeql-dbs/example-repo-multi.
Successfully created databases at /codeql-dbs/example-repo-multi.
$
Analyzing a CodeQL database
- Create a CodeQL database (see above).
- Run codeql database analyze on the database and specify which packs and/or queries to use.
codeql database analyze <database> --format=<format> \
    --output=<output> --download <packs,queries>
Note: If you analyze more than one CodeQL database for a single commit, you must specify a SARIF category for each set of results generated by this command. When you upload the results to GitHub, code scanning uses this category to store the results for each language separately. If you forget to do this, each upload overwrites the previous results.
codeql database analyze <database> --format=<format> \
    --sarif-category=<language-specifier> --output=<output> \
    <packs,queries>
|Specify the path for the directory that contains the CodeQL database to analyze.|
|Specify CodeQL packs or queries to run. To run the standard queries used for code scanning, omit this parameter. To see the other query suites included in the CodeQL CLI bundle, look in |
|Specify the format for the results file generated by the command. For upload to GitHub this should be: |
|Specify where to save the SARIF results file.|
|Optional for single database analysis. Required to define the language when you analyze multiple databases for a single commit in a repository.|
Specify a category to include in the SARIF results file for this analysis. A category is used to distinguish multiple analyses for the same tool and commit, but performed on different languages or different parts of the code.
|Use if you want to include any available markdown-rendered query help for custom queries used in your analysis. Any query help for custom queries included in the SARIF output will be displayed in the code scanning UI if the relevant query generates an alert. For more information, see "Analyzing databases with the CodeQL CLI."|
|Use if you want to include CodeQL query packs in your analysis. For more information, see "Downloading and using CodeQL packs."|
|Use if some of your CodeQL query packs are not yet on disk and need to be downloaded before running queries.|
|Use if you want to use more than one thread to run queries. The default value is |
|Use to get more detailed information about the analysis process and diagnostic data from the database creation process.|
For more information, see "Analyzing databases with the CodeQL CLI."
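When several databases are analyzed for one commit, each analysis needs its own SARIF category. One way to keep that consistent is to loop over the languages, as in the sketch below. It assumes the databases were created with --db-cluster so that each one sits in a subdirectory named after its language. The function name analyze_cluster, the CODEQL override variable, and the output file naming are all hypothetical. No query argument is passed, so the standard code scanning queries are used.

```shell
# Hypothetical helper: analyze every database in a --db-cluster directory,
# using the language name as the SARIF category for each results file.
# Set CODEQL=echo to preview the commands instead of running them.
analyze_cluster() {
  local cluster_dir="$1" out_dir="$2" languages="$3"
  local lang
  # Split the comma-separated language list into words.
  for lang in $(echo "$languages" | tr ',' ' '); do
    "${CODEQL:-codeql}" database analyze "$cluster_dir/$lang" \
      --format=sarif-latest \
      "--sarif-category=$lang" \
      "--output=$out_dir/$lang-results.sarif"
  done
}
```

Calling analyze_cluster codeql-dbs results java,python would produce results/java-results.sarif and results/python-results.sarif, each tagged with a distinct category.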
Basic example of analyzing a CodeQL database
This example analyzes a CodeQL database stored at
/codeql-dbs/example-repo and saves the results as a SARIF file:
/temp/example-repo-js.sarif. It uses
Uploading results to GitHub
You can check that the SARIF properties have the supported size for upload and that the file is compatible with code scanning. For more information, see "SARIF support for code scanning".
Before you can upload results to GitHub, you must determine the best way to pass the GitHub App or personal access token you created earlier to the CodeQL CLI (see Installing CodeQL CLI in your CI system). We recommend that you review your CI system's guidance on the secure use of a secret store. The CodeQL CLI supports:
- Passing the token to the CLI via standard input using the
- Saving the secret in the environment variable
GITHUB_TOKEN and running the CLI without including the
When you have decided on the most secure and reliable method for your CI server, run
codeql github upload-results on each SARIF results file and include
--github-auth-stdin unless the token is available in the environment variable
echo "$UPLOAD_TOKEN" | codeql github upload-results --repository=<repository-name> \
    --ref=<ref> --commit=<commit> --sarif=<file> \
    --github-auth-stdin
|Specify the OWNER/NAME of the repository to upload data to. The owner must be an organization within an enterprise that has a license for GitHub Advanced Security and GitHub Advanced Security must be enabled for the repository, unless the repository is public. For more information, see "Managing security and analysis settings for your repository."|
|Specify the name of the |
|Specify the full SHA of the commit you analyzed.|
|Specify the SARIF file to load.|
|Use to pass the CLI the GitHub App or personal access token created for authentication with GitHub's REST API via standard input. This is not needed if the command has access to a |
For more information, see "github upload-results."
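The two authentication methods described above can be wrapped in one function so that CI jobs do not need to repeat the decision. This is only a sketch under stated assumptions: upload_sarif and the CODEQL override variable are hypothetical names, and UPLOAD_TOKEN is assumed to be the variable your CI secret store populates.

```shell
# Hypothetical wrapper around 'codeql github upload-results'.
# Prefers a token passed on standard input; otherwise relies on GITHUB_TOKEN.
upload_sarif() {
  local repo="$1" ref="$2" commit="$3" sarif="$4"
  set -- github upload-results "--repository=$repo" "--ref=$ref" \
    "--commit=$commit" "--sarif=$sarif"
  if [ -n "${UPLOAD_TOKEN:-}" ]; then
    # Pipe the token to the CLI rather than placing it on the command line.
    printf '%s\n' "$UPLOAD_TOKEN" | "${CODEQL:-codeql}" "$@" --github-auth-stdin
  elif [ -n "${GITHUB_TOKEN:-}" ]; then
    # The token is already in the environment, so no extra flag is added.
    "${CODEQL:-codeql}" "$@"
  else
    echo "upload_sarif: set UPLOAD_TOKEN or GITHUB_TOKEN" >&2
    return 1
  fi
}
```

Failing early when neither variable is set gives a clearer error in CI logs than letting the upload be rejected.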
Basic example of uploading results to GitHub
This example uploads results from the SARIF file
/temp/example-repo-js.sarif to the repository
my-org/example-repo. It tells the code scanning API that the results are for the commit
deb275d2d5fe9a522a0b7bd8b6b6a1c939552718 on the
$ echo $UPLOAD_TOKEN | codeql github upload-results --repository=my-org/example-repo \
    --ref=refs/heads/main --commit=deb275d2d5fe9a522a0b7bd8b6b6a1c939552718 \
    --sarif=/temp/example-repo-js.sarif --github-auth-stdin
There is no output from this command unless the upload was unsuccessful. The command prompt returns when the upload is complete and data processing has begun. On smaller codebases, you should be able to explore the code scanning alerts in GitHub shortly afterward. You can see alerts directly in the pull request or on the Security tab for branches, depending on the code you checked out. For more information, see "Triaging code scanning alerts in pull requests" and "Managing code scanning alerts for your repository."
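The value passed to --ref has to match what you checked out. The example above uses refs/heads/main for a branch. For pull requests, the sketch below assumes GitHub's conventional refs/pull/NUMBER/head and refs/pull/NUMBER/merge refs, which this section does not spell out. The helper name code_scanning_ref and its kind keywords are hypothetical.

```shell
# Hypothetical helper: build the --ref value for an upload.
# kind is "branch", "pr-head", or "pr-merge";
# name is a branch name or a pull request number.
code_scanning_ref() {
  local kind="$1" name="$2"
  case "$kind" in
    branch)   printf 'refs/heads/%s\n' "$name" ;;
    pr-head)  printf 'refs/pull/%s/head\n' "$name" ;;
    pr-merge) printf 'refs/pull/%s/merge\n' "$name" ;;
    *) echo "code_scanning_ref: unknown kind '$kind'" >&2; return 1 ;;
  esac
}
```

The full commit SHA for --commit can be read from the checkout with git rev-parse HEAD.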
Downloading and using CodeQL query packs
Note: The CodeQL package management functionality, including CodeQL packs, is currently in beta and subject to change.
The CodeQL CLI bundle includes queries that are maintained by GitHub experts, security researchers, and community contributors. If you want to run queries developed by other organizations, CodeQL query packs provide an efficient and reliable way to download and run queries. For more information, see "About code scanning with CodeQL."
Before you can use a CodeQL pack to analyze a database, you must download any packages you require from the GitHub Container registry. You can do this either by using the --download flag as part of the codeql database analyze command, or by running codeql pack download beforehand. If a package is not publicly available, you will need to use a GitHub App or personal access token to authenticate. For more information and an example, see "Uploading results to GitHub" above.
|Specify the scope and name of one or more CodeQL query packs to download using a comma-separated list. Optionally, include the version to download and unzip. By default the latest version of this pack is downloaded. Optionally, include a path to a query, directory, or query suite to run. If no path is included, then run the default queries of this pack.|
|Pass the GitHub App or personal access token created for authentication with GitHub's REST API to the CLI via standard input. This is not needed if the command has access to a |
Note: If you specify a particular version of a query pack to use, be aware that the version you specify may eventually become too old for the latest version of CodeQL to make efficient use of. To ensure optimal performance, if you need to specify exact query pack versions, you should reevaluate which versions you pin to whenever you upgrade the CodeQL CLI you're using.
For more information about pack compatibility, see "Publishing and using CodeQL packs."
Basic example of downloading and using query packs
This example runs the
codeql database analyze command with the
--download option to:
- Download the latest version of the
- Download a version of the
octo-org/optional-security-queries pack that is compatible with version 1.0.1 (in this case, it is version 1.0.2). For more information on semver compatibility, see npm's semantic version range documentation.
- Run all the default queries in
- Run only the query
$ echo $OCTO_ORG_ACCESS_TOKEN | codeql database analyze --download /codeql-dbs/example-repo \
    octo-org/security-queries \
    octo-org/optional-security-queries@~1.0.1:queries/csrf.ql \
    --format=sarif-latest --output=/temp/example-repo-js.sarif

> Download location: /Users/mona/.codeql/packages
> Installed fresh octo-org/security-queries@1.0.0
> Installed fresh octo-org/optional-security-queries@1.0.2
> Running queries.
> Compiling query plan for /Users/mona/.codeql/packages/octo-org/security-queries/1.0.0/potential-sql-injection.ql.
> [1/2] Found in cache: /Users/mona/.codeql/packages/octo-org/security-queries/1.0.0/potential-sql-injection.ql.
> Starting evaluation of octo-org/security-queries/query1.ql.
> Compiling query plan for /Users/mona/.codeql/packages/octo-org/optional-security-queries/1.0.2/queries/csrf.ql.
> [2/2] Found in cache: /Users/mona/.codeql/packages/octo-org/optional-security-queries/1.0.2/queries/csrf.ql.
> Starting evaluation of octo-org/optional-security-queries/queries/csrf.ql.
> [2/2 eval 694ms] Evaluation done; writing results to octo-org/security-queries/query1.bqrs.
> Shutting down query evaluator.
> Interpreting results.
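The ~1.0.1 range in this example follows npm's tilde semantics: any version with the same major and minor numbers and a patch number of at least 1, which is why 1.0.2 is selected. The sketch below illustrates that rule for plain MAJOR.MINOR.PATCH versions only. It is an illustration of the range rule, not the resolver the CodeQL CLI uses, and the function name satisfies_tilde is hypothetical.

```shell
# Hypothetical helper: succeed if version $2 falls in the tilde range ~$1.
# Handles plain MAJOR.MINOR.PATCH versions only (no pre-release tags).
satisfies_tilde() {
  local base="$1" candidate="$2" rest
  local b_major b_minor b_patch c_major c_minor c_patch
  # Split the base version into its three numeric parts.
  b_major=${base%%.*};      rest=${base#*.}
  b_minor=${rest%%.*};      b_patch=${rest#*.}
  # Split the candidate version the same way.
  c_major=${candidate%%.*}; rest=${candidate#*.}
  c_minor=${rest%%.*};      c_patch=${rest#*.}
  # Same major and minor, and a patch level at least as high as the base.
  [ "$c_major" -eq "$b_major" ] &&
    [ "$c_minor" -eq "$b_minor" ] &&
    [ "$c_patch" -ge "$b_patch" ]
}
```

With this rule, satisfies_tilde 1.0.1 1.0.2 succeeds, while 1.1.0 and 1.0.0 are both rejected.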
Direct download of CodeQL packs
If you want to download a CodeQL pack without running it immediately, then you can use the
codeql pack download command. This is useful if you want to avoid accessing the internet when running CodeQL queries. When you run the CodeQL analysis, you can specify packs, versions, and paths in the same way as in the previous example:
echo $OCTO_ORG_ACCESS_TOKEN | codeql pack download <scope/name@version:path> <scope/name@version:path> ...
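Splitting the download from the analysis might look like the following sketch. It assumes the packs are publicly available, so authentication is left out, and it reuses the pack specifications from the earlier example. The function names download_packs and analyze_with_packs and the CODEQL override variable are hypothetical.

```shell
# Step 1 (hypothetical helper): fetch packs while network access is available.
# Arguments are pack specifications such as scope/name@version:path.
download_packs() {
  "${CODEQL:-codeql}" pack download "$@"
}

# Step 2 (hypothetical helper): analyze later using packs already on disk.
# --download is deliberately omitted here.
analyze_with_packs() {
  local database="$1" output="$2"
  shift 2
  "${CODEQL:-codeql}" database analyze "$database" "$@" \
    --format=sarif-latest "--output=$output"
}
```

A CI job could call download_packs octo-org/security-queries in a setup stage, then call analyze_with_packs /codeql-dbs/example-repo /temp/example-repo-js.sarif octo-org/security-queries in the analysis stage.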
Downloading CodeQL packs from multiple GitHub container registries
If your CodeQL packs reside on multiple container registries, then you must instruct the CodeQL CLI where to find each pack. For more information, see "Customizing code scanning."
Example CI configuration for CodeQL analysis
This is an example of the series of commands that you might use to analyze a codebase with two supported languages and then upload the results to GitHub.
# Create CodeQL databases for Java and Python in the 'codeql-dbs' directory
# Call the normal build script for the codebase: 'myBuildScript'

codeql database create codeql-dbs --source-root=src \
    --db-cluster --language=java,python --command=./myBuildScript

# Analyze the CodeQL database for Java, 'codeql-dbs/java'
# Tag the data as 'java' results and store in: 'java-results.sarif'

codeql database analyze codeql-dbs/java java-code-scanning.qls \
    --format=sarif-latest --sarif-category=java --output=java-results.sarif

# Analyze the CodeQL database for Python, 'codeql-dbs/python'
# Tag the data as 'python' results and store in: 'python-results.sarif'

codeql database analyze codeql-dbs/python python-code-scanning.qls \
    --format=sarif-latest --sarif-category=python --output=python-results.sarif

# Upload the SARIF file with the Java results: 'java-results.sarif'

echo $UPLOAD_TOKEN | codeql github upload-results --repository=my-org/example-repo \
    --ref=refs/heads/main --commit=deb275d2d5fe9a522a0b7bd8b6b6a1c939552718 \
    --sarif=java-results.sarif --github-auth-stdin

# Upload the SARIF file with the Python results: 'python-results.sarif'

echo $UPLOAD_TOKEN | codeql github upload-results --repository=my-org/example-repo \
    --ref=refs/heads/main --commit=deb275d2d5fe9a522a0b7bd8b6b6a1c939552718 \
    --sarif=python-results.sarif --github-auth-stdin
Troubleshooting the CodeQL CLI in your CI system
Viewing log and diagnostic information
When you analyze a CodeQL database using a code scanning query suite, in addition to generating detailed information about alerts, the CLI reports diagnostic data from the database generation step and summary metrics. For repositories with few alerts, you may find this information useful for determining if there are genuinely few problems in the code, or if there were errors generating the CodeQL database. For more detailed output from
codeql database analyze, use the
For more information about the type of diagnostic information available, see "Viewing code scanning logs".
Code scanning only shows analysis results from one of the analyzed languages
By default, code scanning expects one SARIF results file per analysis for a repository. Consequently, when you upload a second SARIF results file for a commit, it is treated as a replacement for the original set of data.
If you want to upload more than one set of results to the code scanning API for a commit in a repository, you must identify each set of results as a unique set. For repositories where you create more than one CodeQL database to analyze for each commit, use the
--sarif-category option to specify a language or other unique category for each SARIF file that you generate for that repository.
Issues with Python extraction
We are deprecating Python 2 support for the CodeQL CLI, more specifically for the CodeQL database generation phase (code extraction).
If you use the CodeQL CLI to run CodeQL code scanning on code written in Python, you must make sure that your CI system has Python 3 installed.
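A CI job can fail fast when Python 3 is missing, before database creation starts. The check below is a minimal sketch. The function name require_python3 is hypothetical, and it assumes the interpreter is exposed on PATH as python3.

```shell
# Hypothetical pre-flight check: stop early if Python 3 is not on PATH.
require_python3() {
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found on PATH; Python extraction requires Python 3" >&2
    return 1
  fi
  # Print the detected version so that it is recorded in the CI log.
  python3 --version
}
```

Running require_python3 as the first step of the job makes a missing interpreter obvious, instead of surfacing later as an extraction error.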