
Database migrations with FastAPI, Alembic and AWS Lambda

2024-08-19 | 8 min read
Armand Rego
Management of database schema changes can be a significant challenge, especially in serverless environments. Automated, continuous deployment of application and infrastructure code is commonplace; managing database migrations in a similar way can greatly increase development speed and minimise issues with what can otherwise be a highly error-prone procedure.

Introduction

In our last blog post we presented a demo serverless FastAPI application deployed on AWS Lambda and configured using CDK. Part of what made that architecture very simple and very scalable was the use of DynamoDB as the persistence layer. But what if the data you are storing is relational? DynamoDB can model simple relationships between data models, but sometimes an SQL database is what you really need to capture the complexity of highly related data.

An example of a highly related data model from https://vertabelo.com/blog/vertabelo-tips-good-er-diagram-layout/.

Of course, one of the core features of any SQL database is a defined schema and database migrations to modify said schema, which raises the important question of how to best manage the migrations in a deployed environment. In this blog, we demonstrate a method of running database migrations for a serverless application using the AWS Custom Resource CDK construct.

Getting started

Key technologies

Two of the key technologies used in the application are SQLAlchemy, a popular Python ORM commonly used with web application frameworks such as FastAPI and Flask, and Alembic, its sister database migration tool (written by the same author). The code examples below assume use of these libraries.
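To make the pairing concrete, here's a rough sketch of how the two fit together - a SQLAlchemy model declares the schema in Python, and Alembic generates and applies the migrations that keep the database in line with it. The User model below is purely illustrative and not part of the demo application.

# An illustrative SQLAlchemy 2.0-style model (not from the demo application)
from sqlalchemy import String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(String(255), unique=True)

With Alembic pointed at Base.metadata in its env.py, alembic revision --autogenerate produces a migration script for this table and alembic upgrade head applies it to the database.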

Setup

The starting point is a FastAPI application deployed as a Lambda function using CDK. The function is placed within the VPC and given the appropriate security groups to enable access to the RDS instance.

api_function = _lambda.DockerImageFunction(
    self,
    "ApiFunction",
    code=_lambda.DockerImageCode.from_image_asset(
        directory="api_function"
    ),
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
    security_groups=[rds_access_security_group],
)
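
The vpc and rds_access_security_group variables are assumed to be defined earlier in the stack. As a rough sketch (the construct names here are assumptions, not taken from the demo code), they might look something like this:

# Illustrative networking setup - construct names are assumptions
vpc = ec2.Vpc(self, "Vpc", max_azs=2)

# Group for anything that needs to reach the database; the RDS instance's own
# security group should allow ingress from this group on the database port
rds_access_security_group = ec2.SecurityGroup(
    self, "RdsAccessSecurityGroup", vpc=vpc
)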

In the previous blog post, we used the mangum library to wrap the FastAPI application and convert the incoming HTTP requests into Lambda events. This time, we use the DockerImageFunction construct with the following Dockerfile to allow us to bring in the Lambda Web Adapter, as well as package up the application and install dependencies.

FROM public.ecr.aws/docker/library/python:3.12-slim

# Copy Lambda web adapter extension
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.3 /lambda-adapter /opt/extensions/lambda-adapter

# Configure port and working directory as the base image is not a Lambda image
ENV PORT=8000
WORKDIR /var/task

# Copy requirements and install dependencies
COPY requirements.txt ./
RUN pip install -r ./requirements.txt

# Copy all contents of current directory
COPY . ./

CMD uvicorn --port=$PORT main:app
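
The adapter turns incoming Lambda events back into ordinary HTTP requests against the uvicorn server, which means main.py contains nothing Lambda-specific. As a rough illustration (the demo application's actual routes and models are not shown here), it might look something like:

# Hypothetical main.py served by uvicorn inside the container - illustrative only
from fastapi import FastAPI

app = FastAPI()


@app.get("/health")
def health_check() -> dict[str, str]:
    # The real application would define its API routes and database access here
    return {"status": "ok"}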

Handling the migrations

Up to here we have only set up the FastAPI application to run in Lambda. We can now set up the migrations to run on every cdk deploy command - this involves configuring two CDK constructs in the stack:

  • A migration Lambda function
  • An AWS Custom Resource

Why a migration function?

If we deployed the application in a traditional, server-full way, we would simply run the migration command on the server itself, i.e. alembic upgrade head. Given we are deploying on Lambda, we need a way of circumventing the Docker CMD directive that starts the uvicorn server and instead invoking the Alembic CLI command.

We do this by configuring another DockerImageFunction that points to the same source code but uses a new Dockerfile: Dockerfile.migrate.

migration_function = _lambda.DockerImageFunction(
    self,
    "MigrationFunction",
    code=_lambda.DockerImageCode.from_image_asset(
        directory="api_function",
        file="Dockerfile.migrate",
    ),
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
    security_groups=[rds_access_security_group],
)

The new Dockerfile.migrate is given below, and simply points to the entrypoint for the Alembic console runner.

FROM public.ecr.aws/lambda/python:3.12

# Copy requirements and install dependencies
COPY requirements.txt ./
RUN pip install -r ./requirements.txt

# Copy all contents of current directory
COPY . ./

CMD ["alembic.config.main"]
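
Because the handler is alembic.config.main, whatever argument list the function is invoked with gets handed straight to Alembic, just as if it had been typed on the command line. You can check the equivalence locally - the snippet below is a sketch that assumes an alembic.ini in the current directory:

# Calling Alembic's console entrypoint directly is equivalent to running the CLI,
# e.g. `alembic --config alembic.ini upgrade head`
from alembic.config import main

main(argv=["--config=alembic.ini", "upgrade", "head"])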

AWS Custom Resource

Now that we have the migration function configured, we need a way of triggering it when the stack is deployed. There are multiple ways in which this can be done, such as via CDK Pipelines, directly from a CI pipeline (e.g. GitHub Actions), or using an AWS Custom Resource.

Using an AWS Custom Resource brings certain advantages that make it a good mechanism for invoking the migration function:

  • Automatic dependency chain - the custom resource depends on the migration Lambda function, so it will only trigger after the function has been deployed
  • Automatic rollback of the entire stack if the migrations fail to run
  • Can achieve automated database migrations even without a CI pipeline - useful for ephemeral or test environments

custom_resources.AwsCustomResource(
    self,
    "MigrationCustomResource",
    policy=custom_resources.AwsCustomResourcePolicy.from_statements(
        statements=[
            iam.PolicyStatement(
                actions=["lambda:InvokeFunction"],
                effect=iam.Effect.ALLOW,
                resources=[migration_function.function_arn],
            )
        ]
    ),
    on_update=custom_resources.AwsSdkCall(
        service="Lambda",
        action="invoke",
        parameters={
            "FunctionName": migration_function.function_name,
            "InvocationType": "RequestResponse",
            "Payload": '["--config=/var/task/alembic.ini", "upgrade", "head"]',
        },
        # Like an idempotency key to trigger the resource on every deploy
        physical_resource_id=custom_resources.PhysicalResourceId.of(
            datetime.now(tz=timezone.utc).isoformat()
        ),
    ),
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
    timeout=migration_function.timeout,
)

There are a few key points to note about the Custom Resource. Firstly, the principle of least privilege is applied by giving it only the permission required to invoke the migration function - that is all it needs to do, after all!

Secondly, we specify the payload of the Lambda invocation as the arguments to pass to the alembic.config.main entrypoint that we configured in the Dockerfile.migrate file. Note that the path to the Alembic config file is manually specified with "--config=/var/task/alembic.ini".
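
The same payload can also be sent to the migration function by hand, which is handy for testing a migration outside of a full deploy. A minimal sketch using boto3 is below - the function name is an assumption, so substitute the one from your deployed stack:

# Manually invoke the migration Lambda with the same Alembic arguments
import json

import boto3

lambda_client = boto3.client("lambda")

response = lambda_client.invoke(
    FunctionName="migration-function",  # assumption - use your stack's function name
    InvocationType="RequestResponse",
    Payload=json.dumps(["--config=/var/task/alembic.ini", "upgrade", "head"]).encode(),
)

# The response payload contains whatever the handler returned (or the error details)
print(response["Payload"].read().decode())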

Lastly, the physical_resource_id is set to the current time. This acts as a key that determines whether the SDK call is actually made - because the value is unique on every deployment, the call is guaranteed to run on each deploy.

Conclusion

In this post we've shown how an AWS Custom Resource can be used to streamline database migrations in a serverless FastAPI application that uses SQLAlchemy and Alembic as the ORM and migrations tool.

If you'd like to know more, or have any other questions about AWS infrastructure, serverless or Python, please do book in a meeting with me!
