Scaling ECS Python Deployments with a Modular Monorepo

0. Intro

Every project starts small. In our case, it was a straightforward ELT setup focused on extracting data from APIs and storing it cleanly. At the time, we didn’t need a complex monorepo—just a single workspace with a small set of interdependent packages.

As the team grew and the project evolved to cover more diverse domains (scraping, ML jobs, RDS interactions, etc.), we needed a way to modularize, test, and deploy independently. That’s when we shifted to a multi-workspace monorepo.

This post explains how we made that evolution: from a single-package workspace to a scalable, cleanly isolated monorepo with CI/CD tailored for each package.

1. Monorepo Design Options

When you start a Python monorepo, you typically have two options:

  1. A single shared workspace (like one uv venv covering everything).
  2. Multiple independent workspaces (one per package).

Let’s look at each.

1.1 Single Workspace: Easy at First

This is a great starting point. It keeps things simple while you’re focused on building a single flow or service.

In the initial phase of a project, having a single workspace is ideal:

  • Just one pyproject.toml for all code
  • One venv (via uv venv) to manage everything
  • Easy to run tests, build, or deploy

This is the setup described in Fast and Reproducible Python Deployments on ECS with uv.

But as the codebase expands to include APIs, ML pipelines, or connectors, the single workspace model starts to break down—dependency conflicts emerge (e.g., prefect needing package_x<3 while pandas needs package_x>=4), and even simple jobs end up dragging massive venvs due to heavyweight libraries.

1.2 Multi-Workspace: Scales Better

When packages become logically independent (e.g. nt_sdk, nt_rds, nt_ml), a multi-workspace monorepo makes more sense:

  • Each package has its own pyproject.toml
  • Each has its own uv venv
  • Each defines only what it needs (faster builds, smaller deploys)
src/
├─ nt_common/       # shared code (utils, schemas, etc.)
├─ nt_rds/          # jobs that interact with databases
├─ nt_api/          # API jobs
└─ nt_ml/           # ML jobs

Multi-workspace setups improve modularity, CI/CD speed, and team ownership. They also make it easier to plug different packages into different deploy targets.

In the next section, we’ll explain these benefits in more detail and show how we organized the transition.

2. Monorepo with Multiple Workspaces

When splitting a growing project into independent parts, the first challenge is organizing the codebase.

I recommend the following structure:

src/
├── nt_common/
|   ├── nt_common/
|   ├── tests/
|   ├── pyproject.toml
|   └── uv.lock
├── nt_api/
|   ├── nt_api/
|   ├── tests/
|   ├── pyproject.toml
|   └── uv.lock
└── nt_rds/
    ├── nt_rds/
    ├── tests/
    ├── pyproject.toml
    └── uv.lock

Each top-level folder inside src/ represents a standalone package with its own dependencies, tests, and isolated environment.

While it’s possible to have a single workspace that includes all packages, using independent workspaces provides better isolation, easier testing, and faster deployment for modular systems.

2.1 Creating Shared Code with nt_common

Sometimes, packages need to share common logic—models, helpers, utilities. That’s where nt_common comes in. It behaves like any other package but is added as a local dependency to others.

Steps:

  1. Create src/nt_common/pyproject.toml and define your shared code in src/nt_common/nt_common/
  2. From any other workspace (e.g., nt_api):
cd src/nt_api
uv add ../nt_common

This installs nt_common as a locally built wheel; its declared dependencies are resolved into nt_api’s own uv.lock alongside everything else.

If you update nt_common, run uv sync in the dependent package (e.g., nt_api) to apply the latest version.
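For reference, nt_common’s own pyproject.toml stays minimal. The contents below are illustrative (the dependency is a placeholder, not from the actual repo):

```toml
[project]
name = "nt-common"
version = "0.1.0"
description = "Shared models, helpers, and utilities"
requires-python = ">=3.10, <3.13"
dependencies = [
    "pydantic>=2",
]
```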

2.2 Minimal pyproject.toml example

Here’s a clean example for nt_api:

src/nt_api/pyproject.toml

[project]
name = "nt-api"
version = "0.1.0"
description = "Code for interacting with APIs"
readme = "README.md"
requires-python = ">=3.10, <3.13"
dependencies = [
    "nt-common",
]

[tool.setuptools.packages.find]
where = ["."]
include = ["nt_api*"]

[tool.uv.sources]
nt-common = { path = "../nt_common" }

[tool.setuptools.package-data]
nt_api = [
    "**/*.html",
    "**/*.j2",
    "**/*.yaml",
    "**/*.yml"
]

[dependency-groups]
dev = [
    "pytest>=9.0.0",
]

Key points:

  • include explicitly declares which folder should be packaged
  • package-data makes sure config files are bundled
  • pytest under dev allows for easy testing without affecting prod dependencies
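Files bundled via package-data can then be read at runtime with importlib.resources. A minimal sketch (the example package and template path in the comment are hypothetical):

```python
from importlib import resources

def load_resource(package: str, relpath: str) -> str:
    """Read a bundled text file (e.g. a .yaml or .j2 template) from an installed package."""
    return (resources.files(package) / relpath).read_text(encoding="utf-8")

# e.g. load_resource("nt_api", "templates/report.j2")  # hypothetical path
```

Because this reads through the package, it works the same whether the code runs from source or from the built wheel inside a deployed venv.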

3. Import Testing in CI

To guarantee correctness before packaging or deploying any Python package, we include a minimal test that confirms all modules can be imported. This prevents common issues such as missing dependencies, misconfigured paths, or broken module declarations.

This test runs as part of every pull request and is required to pass before merge. It’s fast and effective at catching mistakes early.

3.1. How It Works

We use pytest and importlib to:

  • Recursively discover all .py files in the package
  • Skip hidden directories and irrelevant content
  • Try importing each module individually

This ensures the codebase reflects the dependencies declared in pyproject.toml, and any structural issues are caught early.

3.2. Minimal Test Code

src/nt_xxx/tests/test_imports.py

import importlib
from pathlib import Path
import pytest

PACKAGE_DIR = Path(__file__).resolve().parent.parent
PACKAGE_NAME = PACKAGE_DIR.name
CODE_DIR = PACKAGE_DIR / PACKAGE_NAME

def is_hidden_path(py_file: Path) -> bool:
    parts = py_file.relative_to(CODE_DIR).parts[:-1]  # exclude filename
    return any(part.startswith(".") for part in parts)

def iter_modules():
    """Yield the dotted module name for every .py file in the package."""
    for py_file in CODE_DIR.rglob("*.py"):
        if is_hidden_path(py_file):
            continue
        # e.g. src/nt_xxx/nt_xxx/foo/bar.py -> "nt_xxx.foo.bar"
        module_path = py_file.with_suffix("").relative_to(PACKAGE_DIR)
        yield ".".join(module_path.parts)

MODULES = list(iter_modules())

@pytest.mark.parametrize("module_name", MODULES)
def test_import_module(module_name):
    importlib.invalidate_caches()
    importlib.import_module(module_name)

This file is placed in each package and runs on every PR.

4. CI/CD with Matrix Strategy

To keep packages isolated but uniformly tested, we use GitHub Actions with a matrix job. This approach scales easily as new packages are added by just updating the matrix list.

4.1 Matrix Setup for Pytest

We use a shared CI workflow that:

  • Loops over each defined package (e.g., nt_common, nt_api, etc.)
  • Sets up the virtual environment via uv sync
  • Installs the project using uv pip install .
  • Runs pytest inside the venv

Example:

.github/workflows/CI_pytest.yaml

name: CI_pytest
on: [pull_request]

jobs:
  pytest:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        name:
          - nt_common
          - nt_api
          - nt_rds
          # Add new packages here
    steps:
      - uses: actions/checkout@v4
      # followed by uv sync, uv pip install ., and pytest per package

You can read more about this at Scalable GitHub Actions for Modern Repos.

4.2 Generic Dockerfile.venv

To build reproducible, isolated environments for each package, we use a shared Dockerfile.venv:

  • Accepts --build-arg PACKAGE_NAME=nt_api and PACKAGE_VERSION=0.3.1
  • Builds the venv using uv, copies the code, and installs the package non-editably
  • Packages the virtual environment into a versioned .tar.gz
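A sketch of what such a Dockerfile.venv can look like. This is an assumption-heavy reconstruction, not the actual file: the base image, paths, and flags are all placeholders:

```dockerfile
# Sketch only: build one package's venv and archive it
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
ARG PACKAGE_NAME
ARG PACKAGE_VERSION
WORKDIR /build
COPY src/${PACKAGE_NAME}/ .
# resolve from the package's own uv.lock, then install the package non-editably
RUN uv sync --frozen --no-editable && uv pip install .
RUN mkdir -p /out && \
    tar -czf /out/${PACKAGE_NAME}__venv_${PACKAGE_VERSION}.tar.gz -C .venv .
```

Because the package name and version arrive as build args, one Dockerfile serves every workspace in the matrix.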

Example output:

nt_api__venv_0.3.1.tar.gz

Or optionally, structured by folder:

nt_api/venv_0.3.1.tar.gz

4.3 Uploading and Referencing Artifacts

Because each package produces its own archive, downstream systems (e.g., deployment scripts or S3 uploads) must:

  • Handle versioned and namespaced paths
  • Support prefix-based lookup

This makes it trivial to deploy or cache specific venvs for small jobs, without reusing bloated ML dependencies.

Ensure your upload_all.py or similar code respects the naming convention so each file is easy to find and consume.
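To keep that convention in one place, a tiny helper like the following (hypothetical; it simply mirrors the two layouts shown in 4.2) can generate the keys that both the upload and deploy sides use:

```python
def venv_artifact_key(package_name: str, version: str, nested: bool = False) -> str:
    """Build the archive key for a package's venv, matching the naming convention."""
    if nested:
        return f"{package_name}/venv_{version}.tar.gz"   # folder-structured layout
    return f"{package_name}__venv_{version}.tar.gz"      # flat, double-underscore layout

print(venv_artifact_key("nt_api", "0.3.1"))              # nt_api__venv_0.3.1.tar.gz
print(venv_artifact_key("nt_ml", "1.2.0", nested=True))  # nt_ml/venv_1.2.0.tar.gz
```

Sharing one function between upload and lookup code removes an easy class of "artifact not found" bugs.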

5. Closing Thoughts

Starting with a single workspace and a simple uv-based setup (as shown in Fast and Reproducible Python Deployments on ECS with uv) is the most efficient way to bootstrap a project. It keeps complexity low and lets you move fast.

But as your project grows—adding new modules, APIs, scrapers, ML pipelines, or connectors—the overhead of maintaining everything in one workspace starts to show. Splitting into multiple workspaces allows better isolation of dependencies, targeted CI/CD, and smaller deployable units.

This pattern provides the best of both worlds: code sharing when you need it, and independence when you don’t.

If you’re hitting the limits of a single Python package, this structure is a great next step.