
Using FFmpeg in AWS Lambda with Docker

2025-05-25 | 9 min read
Armand Rego
Audio and video processing in serverless environments like AWS Lambda has traditionally been challenging due to the lack of native support for media tools. However, we can now package powerful media processing libraries like FFmpeg directly into our deployments by containerising Lambda functions with Docker. In this post, we'll explore how to effectively use FFmpeg within AWS Lambda by examining a proof of concept transcription service we recently built for a client.

The challenge

Processing audio and video files in AWS Lambda presents several challenges:

1. Lambda's default runtime environments don't include FFmpeg
2. The /tmp directory is the only writable location
3. Memory and execution time are limited
4. Deploying binary dependencies can be complicated

Use case

We recently built an audio transcription service proof of concept for a client where the primary requirement was to enable significantly faster speech-to-text transcription with AWS Transcribe.

The solution we landed on was to chunk the input audio files; instead of submitting a single large file as one transcription job, we split the audio file into multiple chunks using FFmpeg and sent the chunks to be handled by Transcribe simultaneously. This parallel processing approach drastically reduces the overall transcription time for large audio files.

For example, a 30-minute audio file that might take 10-15 minutes to transcribe as a single job could be split into 60 thirty-second chunks and transcribed in roughly 1-2 minutes total, as all chunks are processed in parallel.
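The chunk count itself is simple arithmetic. PyDub (which we use below) measures durations in milliseconds, so for the example figures above:

# 30-minute file, 30-second chunks, both in milliseconds (PyDub's unit)
duration_ms = 30 * 60 * 1000
chunk_size_ms = 30 * 1000

# Ceiling division: a partial final chunk still needs its own transcription job
num_chunks = -(-duration_ms // chunk_size_ms)
print(num_chunks)  # 60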

Why serverless for this proof of concept?

This architecture was deliberately built using serverless technologies like AWS Lambda for several key reasons:

1. Speed of development - Using Lambda functions allowed the team to rapidly build and iterate on the solution without managing infrastructure
2. Pay-as-you-go pricing - With a proof of concept where usage patterns are unknown, serverless provides cost efficiency by only charging for actual usage
3. Auto-scaling - The solution automatically scales from processing a single file to handling many files concurrently
4. Managed services integration - Lambda functions integrate seamlessly with S3, DynamoDB, and AWS Transcribe

[Figure: The overall architecture of the transcription service.]

Including FFmpeg in Lambda

The key to running FFmpeg in Lambda is properly configuring the Docker container. Let's examine the Dockerfile:

FROM public.ecr.aws/lambda/python:3.12

# Install git (for git-based Python dependencies) plus tar, xz and wget for fetching FFmpeg
RUN dnf update -y && dnf install -y git tar xz wget

# Install ffmpeg
RUN mkdir -p ${LAMBDA_TASK_ROOT}/ffmpeg && \
    cd ${LAMBDA_TASK_ROOT}/ffmpeg && \
    wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz && \
    tar -xvf ffmpeg-release-amd64-static.tar.xz && \
    mv ffmpeg-*-amd64-static/* . && \
    rm -rf ffmpeg-*-amd64-static && \
    rm ffmpeg-release-amd64-static.tar.xz

ENV PATH="${LAMBDA_TASK_ROOT}/ffmpeg:${PATH}"
RUN chmod -R +x ${LAMBDA_TASK_ROOT}/ffmpeg

# Download the public key for github.com
RUN mkdir -p -m 0600 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts

# Copy requirements and install dependencies
COPY requirements.txt ./
RUN --mount=type=ssh pip install -r ./requirements.txt

# Copy all contents of current directory
COPY . ./

CMD ["main.handler"]

We first install the tools needed to download and extract the FFmpeg build (wget, tar, and xz), along with git so that git-based Python dependencies can be installed.

The FFmpeg install is where the magic happens. We:
1. Create a directory for FFmpeg within ${LAMBDA_TASK_ROOT}, which is /var/task in the Lambda environment
2. Download a static build of FFmpeg (no system dependencies required)
3. Extract the files and clean up
4. Add FFmpeg to the PATH environment variable
5. Make the FFmpeg binaries executable

Using a static build is important since it eliminates dependencies on system libraries that might not be available in the Lambda environment.
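As a quick sanity check, you can confirm from inside the container that the binary resolves and runs. A minimal sketch (the helper name is ours, not part of the service code):

import shutil
import subprocess

def assert_ffmpeg_available() -> None:
    # Resolve the binary the same way PyDub will: via PATH
    ffmpeg_path = shutil.which("ffmpeg")
    if ffmpeg_path is None:
        raise RuntimeError("ffmpeg not found on PATH")

    # Print the build string, e.g. "ffmpeg version ...-static"
    result = subprocess.run(
        [ffmpeg_path, "-version"], capture_output=True, text=True, check=True
    )
    print(result.stdout.splitlines()[0])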

Finally, we copy our requirements and application code, installing dependencies with pip. Note the --mount=type=ssh flag, which allows the pip install process to access private GitHub repositories if your project depends on them (you'll need to pass --ssh default to docker build for the mount to receive your SSH agent).

Using FFmpeg via PyDub

In our example, we don't call FFmpeg directly. Instead, we use PyDub, a Python library that provides a convenient abstraction layer over FFmpeg. Here's how our code processes audio files:

import concurrent.futures as cf
import os

from aws_lambda_powertools.utilities.parser import event_parser
from aws_lambda_powertools.utilities.parser.models import S3Model
from pydub import AudioSegment

# ...

@event_parser
def handler(event: S3Model, context):
    # ...
    for record in event.Records:
        filename = record.s3.object.key

        # Download the audio file from S3
        file_path = os.path.join("/tmp", filename)
        s3.download_file(record.s3.bucket.name, filename, file_path)

        # Load the audio file (PyDub shells out to FFmpeg here)
        audio = AudioSegment.from_file(file_path)

        # Process chunks in parallel using ThreadPoolExecutor
        with cf.ThreadPoolExecutor() as tpe:
            futures = {
                tpe.submit(
                    _extract_chunk_and_upload, audio, filename, chunk_idx, start_idx
                ): start_idx
                for chunk_idx, start_idx in enumerate(
                    range(0, len(audio), settings.chunk_size)
                )
            }
        # ...

The key line here is AudioSegment.from_file(file_path), which internally calls FFmpeg to load the audio file into memory. PyDub automatically detects and uses the FFmpeg binary available on our PATH.
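If PATH-based discovery ever fails (say, because you install FFmpeg somewhere other than a PATH directory), PyDub's AudioSegment.converter attribute lets you point at the binary explicitly. A one-line sketch using the install location from our Dockerfile:

from pydub import AudioSegment

# /var/task is ${LAMBDA_TASK_ROOT}, where the Dockerfile placed the static build
AudioSegment.converter = "/var/task/ffmpeg/ffmpeg"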

Extracting and processing chunks

Once we have the audio loaded, we can process it in chunks:

def _extract_chunk_and_upload(
    audio: AudioSegment, filename: str, chunk_idx: int, start_idx: int
):
    # Get end index (clamped to the end of the audio) and create chunk
    end_index = min(start_idx + settings.chunk_size, len(audio))
    chunk: AudioSegment = audio[start_idx:end_index]  # type: ignore

    # Define filepaths and names
    filename_prefix, extension = os.path.splitext(filename)
    chunk_filename = f"{filename_prefix}_{chunk_idx}{extension}"
    chunk_filepath = f"/tmp/{chunk_filename}"

    # Export chunk in the original format and upload to S3
    chunk.export(chunk_filepath, format=extension.removeprefix("."))
    s3.upload_file(
        chunk_filepath,
        settings.transcription_bucket_name,
        f"{filename_prefix}/{chunk_idx}{extension}",
    )
    return f"{Chunk.SK_PREFIX}{chunk_idx}"

The chunk.export() method also uses FFmpeg internally to encode the audio segment back to the desired format.
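For comparison, the kind of FFmpeg invocation PyDub spares us from writing looks roughly like this. The flags are real, but this is an illustrative equivalent, not the exact command PyDub constructs internally:

import subprocess

# Extract a 30-second chunk starting at the 60-second mark:
# -ss seeks to the start offset, -t caps the output duration, -y overwrites
subprocess.run(
    ["ffmpeg", "-y", "-ss", "60", "-t", "30",
     "-i", "/tmp/input.mp3", "/tmp/input_2.mp3"],
    check=True,
)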

Performance considerations

When working with FFmpeg in Lambda, keep these performance considerations in mind:

1. Memory allocation: Audio and video processing is memory-intensive, and Lambda allocates CPU power in proportion to memory, so a larger allocation also speeds up FFmpeg. For our chunking function, allocate at least 1-2GB of memory.

2. Execution timeouts: Processing large files takes time. Set your Lambda timeout appropriately (up to 15 minutes).

3. Temporary storage: Lambda provides 512MB of ephemeral storage in /tmp by default (configurable up to 10GB). For larger files, consider processing in smaller batches and cleaning up as you go (see the sketch after this list).

4. Parallel processing: As shown in our example, using ThreadPoolExecutor allows you to process multiple chunks in parallel, significantly improving performance.
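On the storage point, the simplest mitigation is to delete each chunk from /tmp as soon as it has been uploaded. A minimal sketch; the helper below is illustrative and not part of the service code:

import os

def _cleanup_chunk(chunk_filepath: str) -> None:
    # /tmp persists across invocations on a warm container, so freeing
    # space eagerly stops ephemeral storage filling up over time
    try:
        os.remove(chunk_filepath)
    except FileNotFoundError:
        pass  # Already removed; nothing to do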

DynamoDB for state management and persistence

A critical component of our architecture is the use of DynamoDB for state management and persistence. We use the dyntastic library as an elegant model layer over DynamoDB:

# Save chunk entity to DB
chunk_sk = future.result()
_chunk = Chunk(pk=pk, sk=chunk_sk, status=ChunkStatus.PENDING)
_chunk.save()

DynamoDB serves several important functions in our pipeline:

1. Processing state - Tracks which chunks are pending, in progress, or completed
2. Deduplication - Prevents processing the same file multiple times using condition expressions (sketched after this list)
3. Reassembly metadata - Stores information needed to correctly order chunks during reassembly
4. Durability - Persists state information even if Lambda functions are restarted
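To make the deduplication point concrete, here is the underlying idea expressed with the raw boto3 API rather than dyntastic (the table and key names are placeholders, not the project's actual schema):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

try:
    # The condition expression makes the write fail if the file is already registered
    dynamodb.put_item(
        TableName="transcription-state",  # placeholder name
        Item={"pk": {"S": "FILE#example.mp3"}, "status": {"S": "PENDING"}},
        ConditionExpression="attribute_not_exists(pk)",
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("File is already being processed; skipping")
    else:
        raise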

For example, in our processor function, we check if all chunks are complete before reassembling the transcript:

def _all_chunks_completed(pk: str, sk: str):
    """
    Check all chunks in the file are transcribed. First update the current chunk
    status to COMPLETED, then check whether all chunks are COMPLETED.
    """
    _chunk = Chunk.get(pk, sk)
    _chunk.update(A.status.set(ChunkStatus.COMPLETED))

    # Count chunks not yet COMPLETED, using a consistent read so the query
    # sees the update we just made
    num_incomplete_chunks = 0
    for _ in Chunk.query(
        pk,
        range_key_condition=A.sk.begins_with(Chunk.SK_PREFIX),
        filter_condition=A.status.ne(ChunkStatus.COMPLETED),
        consistent_read=True,
    ):
        num_incomplete_chunks += 1

    if num_incomplete_chunks != 0:
        logger.info(
            f"Chunks are not all completed. {num_incomplete_chunks} chunks remain incomplete."
        )
        return False
    return True

Using DynamoDB with the expressive dyntastic models fits perfectly with our rapid development approach, providing a powerful persistence layer without requiring traditional database setup or maintenance.

Conclusion

In this post we showed how container image support for Lambda has made using tools like FFmpeg much more straightforward, unlocking serverless media processing workflows. You can now build scalable audio and video processing pipelines without managing servers.

The approach shown here—using a static build of FFmpeg to enable parallel audio processing and transcription—demonstrates how Lambda functions can be leveraged to dramatically improve processing times for media workloads.

At Weird Sheep Labs, we specialise in serverless application development on AWS. If this sounds like something your company could benefit from, don't hesitate to get in touch!
