Scalable GitHub Actions for Modern Repos

0. Intro

Maintainable, efficient GitHub Actions pipelines become critical once you manage multiple repositories or modular projects. This post walks through four patterns (reusable hooks, matrix jobs, gate jobs, and concurrency) that keep a CI/CD setup scalable, reliable, and fast.

1. Reusable Hooks Across Repositories

When multiple repositories share CI logic, extract it into a reusable composite action that lives in a dedicated repo (e.g., villoro/vhooks). This pattern keeps your CI/CD logic consistent across projects while avoiding duplication.

1.1 Repository Layout

Start by organizing your hooks into separate folders, each representing a single reusable action.

vhooks/
├─ check_version/
│  ├─ action.yml          # composite action definition
│  ├─ requirements.txt    # python deps (optional)
│  └─ check_version.py    # hook logic
└─ tag_version/
   ├─ action.yml
   ├─ requirements.txt
   └─ tag_version.py

Each folder under vhooks is a standalone hook. The action.yml defines its interface, while the Python file contains the logic.

1.2 Composite Action Definition

The composite action serves as the glue between GitHub Actions and your Python script. It defines the inputs, runs any required setup, and calls your Python logic.

tag_version/action.yml

name: Tag Version
description: Tag with the version from a file only when selected paths change.
author: Arnau Villoro

inputs:
  branch:
    description: Branch to check the version from
    required: false
    default: main

  file:
    description: File to extract the version from (supports .toml, .json, .yml)
    required: false
    default: pyproject.toml

  path:
    description: Path inside the file to extract the version
    required: false
    default: project/version

  filters:
    description: |
      YAML for dorny/paths-filter. Must define a 'code' key.
      Example:
        code:
          - 'src/**'
    required: false
    default: |
      code:
        - '**'

runs:
  using: composite
  steps:
    - name: Checkout
      uses: actions/checkout@v4
      with:
        fetch-depth: 0

    - name: Detect changes
      id: changes
      uses: dorny/paths-filter@v2
      with:
        filters: ${{ inputs.filters }}

    - name: Install dependencies
      if: steps.changes.outputs.code == 'true'
      shell: bash
      run: pip install tomli loguru click pyyaml

    - name: Extract version
      if: steps.changes.outputs.code == 'true'
      shell: bash
      run: python "$GITHUB_ACTION_PATH/tag_version.py" --file="${{ inputs.file }}" --path="${{ inputs.path }}"

    - name: Check if tag exists
      if: steps.changes.outputs.code == 'true'
      id: check_tag
      uses: mukunku/tag-exists-action@v1.4.0
      with:
        tag: ${{ env.VERSION }}

    - name: Create tag
      if: steps.changes.outputs.code == 'true' && steps.check_tag.outputs.exists != 'true'
      uses: actions/github-script@v7
      with:
        script: |
          // Await the request so the step finishes only after the tag is created
          await github.rest.git.createRef({
            owner: context.repo.owner,
            repo: context.repo.repo,
            ref: `refs/tags/${{ env.VERSION }}`,
            sha: context.sha
          })

1.3 Python Implementation

The Python script extracts the version from your file and pushes a new tag if it doesn’t exist yet.

tag_version/tag_version.py

import json
import os
import subprocess
from pathlib import Path

import click

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for older interpreters


def read_version(file_path, key_path):
    """Read the value at key_path (e.g. "project/version") from a TOML, YAML, or JSON file."""
    p = Path(file_path)
    if p.suffix == ".toml":
        data = tomllib.loads(p.read_text())
    elif p.suffix in {".yml", ".yaml"}:
        import yaml  # imported lazily: only needed for YAML files

        data = yaml.safe_load(p.read_text())
    elif p.suffix == ".json":
        data = json.loads(p.read_text())
    else:
        raise click.ClickException(f"Unsupported file type: {p.suffix}")

    # Walk the nested mapping one key at a time
    node = data
    for part in key_path.split("/"):
        node = node[part]
    return str(node).strip()


def git(cmd):
    """Run a git command and return its trimmed stdout (raises if git fails)."""
    return subprocess.check_output(cmd, text=True).strip()


@click.command()
@click.option("--file", default="pyproject.toml")
@click.option("--path", default="project/version")
def main(file, path):
    version = read_version(file, path)
    tag = version

    # Publish the version so later steps in action.yml can read it as env.VERSION
    github_env = os.environ.get("GITHUB_ENV")
    if github_env:
        with open(github_env, "a") as stream:
            stream.write(f"VERSION={version}\n")

    existing_tags = git(["git", "tag", "--list", tag])
    if existing_tags:
        click.echo(f"⚠️ Tag {tag} already exists, skipping.")
        return

    git(["git", "tag", tag])
    git(["git", "push", "origin", tag])
    click.echo(f"✅ Created and pushed tag: {tag}")


if __name__ == "__main__":
    main()
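
The path option is a slash-separated key path into the parsed file, so project/version means "the version key inside the project table". The sketch below isolates that lookup on an in-memory dictionary. The helper name lookup and the sample data are illustrative assumptions, not part of vhooks, and the explicit error message is an addition the script above does not have.

```python
def lookup(data, key_path):
    """Follow a slash-separated key path through nested dictionaries."""
    node = data
    walked = []
    for part in key_path.split("/"):
        walked.append(part)
        # Fail with the portion of the path that could not be resolved
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"Key path not found: {'/'.join(walked)}")
        node = node[part]
    return node


# Hypothetical parsed pyproject.toml content
pyproject = {"project": {"name": "demo", "version": "1.3.0"}}

print(lookup(pyproject, "project/version"))  # 1.3.0
```

Reporting the unresolved prefix makes a typo in the path input easier to spot in the workflow logs than a bare KeyError would.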

1.4 Consuming the Hook from Any Repo

To use your new hook, create a simple workflow that triggers on pushes to main and automatically tags new versions.

name: Tag Version

on:
  push:
    branches: [main]
permissions:
  contents: write

jobs:
  tag_version:
    runs-on: ubuntu-latest
    steps:
      - uses: villoro/vhooks/tag_version@1.3.0
        with:
          file: pyproject.toml
          path: project/version

Pin to a version tag such as @1.3.0 rather than main, so changes to the hook reach your repository only when you bump the version.
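
If you want to audit how the uses: references across your workflows are pinned, a small classifier is enough to start with. This is a rough sketch: the function name classify_ref and its four categories are assumptions made for illustration, and the version pattern only recognizes simple forms such as 1.3.0 or v4.

```python
import re

SHA_RE = re.compile(r"[0-9a-f]{40}")
VERSION_RE = re.compile(r"v?\d+(\.\d+){0,2}")


def classify_ref(uses):
    """Classify the ref after '@' in a `uses:` value."""
    _, sep, ref = uses.rpartition("@")
    if not sep:
        return "unpinned"  # no '@' at all, e.g. a local action path
    if SHA_RE.fullmatch(ref):
        return "sha"  # full commit SHA
    if VERSION_RE.fullmatch(ref):
        return "version"  # version-like tag
    return "other"  # branch name or any other ref


print(classify_ref("villoro/vhooks/tag_version@1.3.0"))  # version
print(classify_ref("villoro/vhooks/tag_version@main"))   # other
```

A version tag can still be moved by the action's owner, whereas a full commit SHA cannot, so the "sha" category is the strictest form of pinning.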

2. Matrix Jobs

Matrix jobs let you apply the same logic across multiple packages without duplicating YAML. Keep the matrix minimal (only what varies), derive everything else at runtime, and filter work so each package runs only when its files change.

2.1 Minimal Matrix, Clear Names

Only include the fields you truly need (here: name and tag_prefix). Everything else (paths, tags) is computed per package.

.github/workflows/tag_version.yml

name: Tag Version
on:
  push:
    branches: [main]

permissions:
  contents: write

jobs:
  tag_packages:
    name: "tag / ${{ matrix.name }}"
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          - name: nt_common
            tag_prefix: "common_"

          - name: nt_api
            tag_prefix: "api_"

          - name: nt_ml
            tag_prefix: "ml_"

          - name: nt_rds
            tag_prefix: "rds_"

          - name: nt_sdk
            tag_prefix: "sdk_"

    steps:
      - name: Tag ${{ matrix.name }} version
        uses: villoro/vhooks/tag_version@1.3.1
        with:
          file: src/${{ matrix.name }}/pyproject.toml
          path: project/version
          tag-prefix: ${{ matrix.tag_prefix }}
          filters: |
            code:
              - 'src/${{ matrix.name }}/**'

The hook receives a different file and tag-prefix per package. dorny/paths-filter inside the hook ensures we only tag when that package actually changed.
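
The prefix matters because packages in a monorepo often share version numbers, and a bare 1.2.0 tag can only exist once per repository. The sketch below shows the idea; plan_tag is a hypothetical helper, and it assumes the hook simply concatenates prefix and version, which this post does not show.

```python
def plan_tag(tag_prefix, version, existing_tags):
    """Return the tag to create, or None if it already exists."""
    tag = f"{tag_prefix}{version}"
    return None if tag in existing_tags else tag


# Hypothetical tags already present in the repository
existing = {"api_1.2.0", "common_1.2.0"}

print(plan_tag("api_", "1.2.0", existing))  # None
print(plan_tag("ml_", "1.2.0", existing))   # ml_1.2.0
```

Here nt_ml can release 1.2.0 even though nt_api and nt_common already did, because each package tags in its own namespace.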

2.2 Practical Tips

  • Use strategy.fail-fast: false so one failure doesn’t cancel all packages.
  • Keep job names informative (e.g., tag / nt_api).
  • Pass filters down to the hook rather than duplicating filtering logic in the workflow.

Minimal matrices + in‑hook filtering = fast runs and clean YAML as your monorepo grows.
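
To make the filtering concrete, here is a simplified model of what a filter like 'src/<name>/**' decides for each matrix entry. It is only an approximation: dorny/paths-filter evaluates real glob patterns, while this sketch (with the assumed helper changed_packages) just checks whether a changed file sits somewhere under src/<name>/.

```python
from pathlib import PurePosixPath


def changed_packages(changed_files, packages, root="src"):
    """Return the packages that own at least one changed file."""
    changed = set()
    for file in changed_files:
        parts = PurePosixPath(file).parts
        # Expect root/<package>/<something>, at any depth below the package
        if len(parts) > 2 and parts[0] == root and parts[1] in packages:
            changed.add(parts[1])
    return sorted(changed)


packages = ["nt_common", "nt_api", "nt_ml", "nt_rds", "nt_sdk"]
files = ["src/nt_api/app.py", "README.md", "src/nt_ml/model/train.py"]

print(changed_packages(files, packages))  # ['nt_api', 'nt_ml']
```

With this commit, only the nt_api and nt_ml matrix jobs would do real work; the other three would skip their tagging steps.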

3. Gate Jobs

Branch protection becomes noisy if you require every matrix job. Instead, add a gate job that depends on the matrix and fails if any package failed, then protect only the gate.

3.1 Aggregate Matrix Results

Create a tiny job that always runs, inspects the matrix result, and exits accordingly.

.github/workflows/tag_version.yml (continued)

  tag_gate:
    name: tag_result
    needs: [tag_packages]
    runs-on: ubuntu-latest
    if: always()
    steps:
      - name: Summarize matrix outcome
        run: |
          echo "Matrix result: ${{ needs.tag_packages.result }}"
          if [ "${{ needs.tag_packages.result }}" != "success" ]; then
            echo "Some package tagging jobs failed. Check the matrix logs."
            exit 1
          fi

Protect only the tag_result check in your branch rules. This keeps PR status simple while still enforcing success across all packages.
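
The rule the gate applies is "anything other than success fails the check". The sketch below spells that out over per-package results; gate_exit_code and the sample results are illustrative assumptions. In the workflow above GitHub already aggregates the matrix into a single needs.tag_packages.result value, so this only models the decision, not GitHub's own aggregation.

```python
def gate_exit_code(results):
    """Return 0 only if every job reported 'success', else 1."""
    # GitHub reports one of: success, failure, cancelled, skipped
    not_ok = {name: result for name, result in results.items() if result != "success"}
    for name, result in sorted(not_ok.items()):
        print(f"{name}: {result}")
    return 1 if not_ok else 0


# Hypothetical per-package outcomes
results = {"nt_api": "success", "nt_ml": "failure", "nt_sdk": "cancelled"}
exit_code = gate_exit_code(results)
print(f"gate exit code: {exit_code}")
```

Treating cancelled and skipped as failures is the conservative choice: a required check should pass only when every package was actually verified.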

Common Pitfall: hashFiles() cannot be used inside the matrix definition, because the strategy block is evaluated before any step has checked out the repository. Compute cache keys inside a step instead, combining them with values like matrix.name or matrix.path.
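
As an illustration of a key computed at runtime, the sketch below derives a per-package cache key inside a script that a step could run after checkout. It is not how hashFiles() computes its hash; cache_key is a hypothetical helper, and the temporary directory stands in for a package folder such as src/nt_api.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(package_name, package_dir):
    """Build a cache key from the package name and a hash of its files."""
    digest = hashlib.sha256()
    root = Path(package_dir)
    # Sort paths so the key does not depend on filesystem ordering
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return f"{package_name}-{digest.hexdigest()[:16]}"


# Demo on a throwaway directory standing in for a package folder
with tempfile.TemporaryDirectory() as tmp:
    deps = Path(tmp) / "requirements.txt"
    deps.write_text("click==8.1.7\n")
    key_before = cache_key("nt_api", tmp)

    deps.write_text("click==8.1.8\n")
    key_after = cache_key("nt_api", tmp)

print(key_before != key_after)  # True
```

Because the package name is part of the key, two packages with identical files still get separate cache entries.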

4. Concurrency

When you push new commits to a PR, older runs become obsolete. Cancel them automatically to save resources.

name: CI_global

on:
  pull_request:

# This is the important part
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  pre_commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - uses: pre-commit/action@v3.0.0

This ensures that only the latest run of the workflow for each PR stays active; runs for older commits are cancelled.
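
The group key is what makes this work: two runs compete only if they resolve to the same string. The toy model below mimics that behaviour with a dictionary. The function start_run and the run IDs are made up for illustration, and the model ignores details of the real scheduler such as queued runs.

```python
def start_run(active, workflow, ref, run_id):
    """Register a run and return the ID of the run it cancels, if any."""
    group = f"{workflow}-{ref}"  # mirrors github.workflow + github.ref
    cancelled = active.get(group)
    active[group] = run_id
    return cancelled


active = {}
first = start_run(active, "CI_global", "refs/pull/7/merge", 101)
second = start_run(active, "CI_global", "refs/pull/7/merge", 102)
other_pr = start_run(active, "CI_global", "refs/pull/8/merge", 103)

print(first, second, other_pr)  # None 101 None
```

A second push to PR 7 cancels run 101, while PR 8 is untouched because its ref produces a different group.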

5. Putting It All Together

Reusable hooks, matrix jobs, gate checks, and concurrency form a scalable CI/CD pattern:

  • Hooks keep logic centralized and DRY.
  • Matrix jobs scale across packages efficiently.
  • Gate jobs collapse the matrix into a single, reliable required check.
  • Concurrency cancels redundant runs to save resources.

Together, they make your workflows modular, efficient, and production-ready.