Skip to main content

The DfE technical guidance and its content is intended for internal use by the DfE community.

Storing source code

GOV.UK Service Manual

We follow principles set out within the GOV.UK Service Manual and CDDO Secure by Design for managing code we write.

Service Manual Sections of relevance:

Summary / Highlights:

  • Changes to source code must be tracked
  • Code we produce should be made available via an internet source code repository
  • Published code should be under an Open Source initiative compatible licence
  • Due-care must be given to security considerations, including:
    • Suitable protection of confidential information and secrets
    • Departmental/governmental rules related to the use of cloud/3rd-party tooling
    • Proper process and accountability/approvals for making code changes

Additional detail and information is available via the links above.

Types of source code

Source code is broader and wider than just business and presentation code.

Examples of source code types and purposes:

  • Project source code
    • Code used to meet a user need - i.e., what is normally considered when describing “source code”
  • Test code
    • Code used to evaluate the correctness of the project code
    • Depending on the project, test code may involve provisioning infrastructure, deploying a build, and even running the project code (e.g. a headless browser to test the presentation and accessibility of a web page)
  • Infrastructure as code
    • Code used to provision and configure the infrastructure a project runs upon
  • CI configuration
    • Code used to inspect, validate, and potentially gate-keep changes being made to project code
    • May include GitHub Actions and Azure Pipelines
    • Typically triggered on a merge/PR event, but other examples include being triggered on creation of a particular tag (e.g., one in the format vX.Y.Z) or on a timer/cron-basis
  • Deployment code
    • Code used to build, test, and deploy project source code into a running environment

How this source code is stored and structured will vary by project, based on needs and historical convention. For example:

  • A project may be composed of multiple services where each has its own repository
  • A monolith may have all source code for all purposes stored within the same source code repository
  • A mixture may apply where project source code and tests are within one repository, while infrastructure code may be stored within a separate repository

Source Code Versioning: Git

At the Department for Education (DfE) we use Git for source code versioning.

  • Git is decentralised - this means all copies of the repository include the whole history of the repository, not just a snapshot
  • Branches are “cheap” - creating a new branch (or tag) involves just a new pointer at a specific commit (thus minimal compute and storage implications)
  • Hashes/checkums for each file and commit depend on the entire tree - thus, the repository is safe from surreptitious / malicious / accidental changes to earlier versions of a file without it being very visible to other users

Git Repository Hosting - GitHub and Azure DevOps (ADO)

While not required, most git users will nominate one copy of the git repository to be the authoritative copy.

  • It is possible to self-host a git server for this purpose but, often, this will be a hosted solution such as GitHub, Azure DevOps, GitLab, or any of the numerous other commercial services available.
  • A “hub and spoke” is easier to reason about and keep synchronised
  • Integrations with other tools will work with less friction, where they have a single copy to work with (e.g., automated test/deployment tools, issue/bug management)

GitHub

Historically, some projects use (and may remain on) private Azure DevOps and/or private GitHub repositories for legacy reasons, though we are now required to make new source code open.

Specifically, we use GitHub for new and migrated work.

GitHub Organisations

Department for Education (DfE) source code repositories on GitHub should be stored under an appropriate organisation, thereby giving appropriate oversight and protections to these source code repositories.

Specifically:

If your account is added to a repository without the account being a member of the owning organisation, it will be counted and labelled as an “outside collaborator”.

To join a GitHub organisation, follow the guidance and request forms available via Digital Tools Support. As there is a small cost implication for accounts to be added to a GitHub organisation, this should normally be done via / with support from your Delivery Manager.

Work vs Personal GitHub accounts

You may use your personal GitHub account, but you should:

Repository Requirements

There are a number of requirements repositories in DfE must meet in order to be compliant with service standards and Secure by Design principles.

Repositories containing production/ service code must:

  • have branch protection turned on for the default branch in your repository either via traditional branch protection or via repository rules. Branch protection should include:
    • at least one approval before merging
    • dismissal of stale approvals when new commits are pushed
    • codeowners review requirement
    • approval of most recent push
    • conversation resolution before merging
    • status checks to have passed
  • be clearly named
  • have an appropriate licence
  • have a CONTRIBUTING.md file to give guidance on how users can collaborate and contribute to the project
  • have a CODEOWNERS file that outlines the users or teams that own the code in your repository. Team’s are preferred for maintainability reasons.
  • include topics that help to provide metadata on ownership, teams, type of product, whether the code is production or prototype

For repositories containing application code, the repository should:

  • run Software Composition Analysis (SCA) scans for dependencies
  • run Static Application Security Testing (SAST) scans against the application code
    • teams can utilise the DfE SAST reusable workflow which can run CodeQL or Semgrep
    • alternatively teams may already be running Sonar
  • run Dynamic Application Security Testing (DAST) scans against testing environments before code is pushed to production
    • open source options with GitHub Actions include zaproxy and nuclei
  • run Unit Tests
  • run linters
  • run Infrastructure as Code scanning to look for potential cloud infrastructure misconfigurations
    • checkov is a reliable and open source IaC scanner, it supports Terraform, ARM, Bicep, Helm charts and Dockerfiles (and more)
  • run container vulnerability scanning to look for vulnerabilities in container images
  • turn on private vulnerability reporting
  • turn secret scanning with push protection on
  • block builds from deploying to production when high and critical risk issues are found that are over DfE SLA’s (this can be done via the DfE SAST reusable workflow)
  • include a webhook to receive security events
  • use a .gitignore file

For repositories that are particularly sensitive, or considered higher risk systems we should ensure that:

  • commits are signed using a gpg key
  • SBOM’s are generated and verified
  • artifact attestations are used to provide cryptographically signed provenance of software

Dependabot configuration options

The below example shows some recommended options you can include in your dependabot.yml file under the .github directory.

version: 2

registries:
  dockerhub: # Define access for a private registry
    type: docker-registry
    url: registry.hub.docker.com
    username: dfe-digital
    password: ${{ secrets.DFE_DOCKERHUB_PASSWORD }}

updates:

  - package-ecosystem: "docker"
    directory: "/docker" # docker directory
    registries:
      - dockerhub # Allow version updates for dependencies in this registry defined above
    schedule:
      interval: "weekly"

  - package-ecosystem: "nuget" # .NET
    directory: "/src" # src directory
    schedule:
      interval: "weekly"
    commit-message: # prefixes commit message with specified text
      prefix: "chore(deps): "

  - package-ecosystem: "bundler" # ruby (gems)
    directory: "/" # root
    schedule:
      interval: "weekly"
    groups: # Group all dependabot PRs for production into one single PR (based on wildcard)
      production-dependencies:
        dependency-type: "production"
        patterns:
          - "*"

  - package-ecosystem: "npm" # JavaScript (npm and yarn)
    directory: "/" # root
    schedule:
      interval: "weekly"
    labels: # adds the following labels to pull requests
      - "npm"
      - "dependencies"

  - package-ecosystem: "terraform" # Terraform
    directory: "/terraform" # terraform directory
    schedule:
      interval: "weekly"
    reviewers: # select required reviewers
      - "dfe-digital/security-engineers"

  - package-ecosystem: "github-actions" # GitHub Actions
    directory: "/" # For GHA '/' actually correlates to '.github/workflows'
    schedule:
      interval: "weekly"

Managing vulnerabilities with GitHub Advanced Security

When using SCA and SAST tools such as dependabot, CodeQL and Sonar you will find that they have the ability to send the results of their scans to the security tab on a repository.

The security tab collects this data so it can be easily viewed by developers and triaged. This could include:

  • marking as a false positive where necessary
  • reading the suggestion to produce a PR fix
  • removing and rotating secrets that have been accidentally pushed to the repository
  • reviewing and merging dependabot Pull Requests

Developers should keep the vulnerabilities in their repositories to a minimum by triaging and fixing vulnerabilities within DfE’s Service Level Agreement timetables, following the DfE guidance on vulnerability triaging.

Note that GitHub Advanced Security is currently only available for public repositories unless you have a licence via GitHub Enterprise.

Data Protection Considerations - Git Repositories

Personal Data

Storage of a git repository must be treated with due care and consideration. This applies whether it is within a central hosted environment or stored elsewhere such on a developer’s computer.

Places where we may normally find personally-identifiable information:

  • Changes to source code, commits, are annotated with authorship details. Typically, this is a real name (or username) and an email address.
  • Where a commit is cryptographically-signed, the GPG key used will also have personally-identifying information associated with it (such as an email address).

Additionally, note that git is explicitly a decentralised source versioning and control system.

  • It is, therefore, not possible to delete/change information within one copy of the repository (e.g., GitHub) and force all other copies to be updated also
  • It is, therefore, extremely important to prevent non-public content from ever being added to the git repository in the first place because it cannot be removed with 100% confidence (being able to do so is an edge case, not the norm)

Secrets

You must keep secrets separate from source code, and keep them private.

Private repositories are a poor way to protect secrets, and may only be used where access to the code might reveal draft policy decisions.

Teams should ensure GitHub Secrets Detection and Push Protection is turned on.

Secrets should be managed at the platform level, at DfE we can use:

If a secret has been pushed to a repository by accident, this should trigger an incident.

Archiving Repositories

If a repository becomes redundant and the codebase is no longer in use or required then it should be archived using the following guideline before deletion:

Repository Type Archive Duration Recovery after Deletion
Public 12 months 90 days
Private 6 months 90 days

Archiving of repositories should be completed following agreement with the relevant parties.