The challenge
Processing audio and video files in AWS Lambda presents several challenges:
1. Lambda's default runtime environments don't include FFmpeg
2. The /tmp directory is the only writable location
3. Memory and execution time limitations
4. Deploying binary dependencies can be complicated
Use case
We recently built a proof-of-concept audio transcription service for a client whose primary requirement was significantly faster speech-to-text transcription with Amazon Transcribe.
The solution we landed on was to chunk the input audio files; instead of submitting a single large file as one transcription job, we split the audio file into multiple chunks using FFmpeg and sent the chunks to be handled by Transcribe simultaneously. This parallel processing approach drastically reduces the overall transcription time for large audio files.
For example, a 30-minute audio file that might take 10-15 minutes to transcribe as a single job could be split into 60 thirty-second chunks and transcribed in roughly 1-2 minutes total, as all chunks are processed in parallel.
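The chunk arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration rather than the project's code: the function name chunk_boundaries and the 30-second default are assumptions made for the example.

```python
def chunk_boundaries(duration_ms: int, chunk_ms: int = 30_000) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) pairs that cover the whole file.

    The final chunk is shorter when the duration is not an exact
    multiple of chunk_ms.
    """
    if duration_ms < 0 or chunk_ms <= 0:
        raise ValueError("duration_ms must be >= 0 and chunk_ms must be > 0")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


# A 30-minute file split into 30-second chunks.
boundaries = chunk_boundaries(30 * 60 * 1000)
print(len(boundaries))  # 60
```

Each pair can then be turned into one transcription job, so the total wall-clock time is governed by the slowest chunk rather than by the length of the whole file.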
Why serverless for this proof of concept?
This architecture was deliberately built using serverless technologies like AWS Lambda for several key reasons:
1. Speed of development - Using Lambda functions allowed the team to rapidly build and iterate on the solution without managing infrastructure
2. Pay-as-you-go pricing - With a proof of concept where usage patterns are unknown, serverless provides cost efficiency by only charging for actual usage
3. Auto-scaling - The solution automatically scales from processing a single file to handling many files concurrently
4. Managed services integration - Lambda functions integrate seamlessly with S3, DynamoDB, and Amazon Transcribe

Including FFmpeg in Lambda
The key to running FFmpeg in Lambda is properly configuring the Docker container. Let's examine the Dockerfile:
We first install the necessary tools and libraries for downloading and extracting FFmpeg.
The FFmpeg install is where the magic happens. We:
1. Create a directory for FFmpeg within ${LAMBDA_TASK_ROOT}, which is /var/task in the Lambda environment
2. Download a static build of FFmpeg (no system dependencies required)
3. Extract the files and clean up
4. Add FFmpeg to the PATH environment variable
5. Make the FFmpeg binaries executable
Using a static build is important since it eliminates dependencies on system libraries that might not be available in the Lambda environment.
Finally, we copy our requirements and application code, installing dependencies with pip. Note the --mount=type=ssh flag, which allows the pip install process to access private GitHub repositories if your project depends on them.
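Since everything downstream depends on the binary being discoverable, it can be worth checking for it at start-up so a broken image fails fast. The following is a minimal sketch, not part of the original project: the helper names find_ffmpeg and require_ffmpeg are hypothetical, and the /var/task/ffmpeg directory in the usage note is an assumed install location.

```python
import os
import shutil


def find_ffmpeg(extra_dirs=()):
    """Return the path of the ffmpeg binary, or None if it cannot be found.

    Directories in extra_dirs are searched before the regular PATH.
    """
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return shutil.which("ffmpeg", path=search_path)


def require_ffmpeg(extra_dirs=()):
    """Like find_ffmpeg, but raise a clear error when the binary is missing."""
    path = find_ffmpeg(extra_dirs)
    if path is None:
        raise RuntimeError("ffmpeg not found; check the PATH set in the image")
    return path
```

Calling require_ffmpeg(["/var/task/ffmpeg"]) at module import time would surface a misconfigured PATH or a missing executable bit during the first cold start, rather than midway through processing a file.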
Using FFmpeg via PyDub
In our example, we don't call FFmpeg directly. Instead, we use PyDub, a Python library that provides a convenient abstraction layer over FFmpeg. Here's how our code processes audio files:
The key line here is AudioSegment.from_file(file_path), which internally calls FFmpeg to load the audio file into memory. PyDub automatically detects and uses the FFmpeg binary available in our PATH.
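As a rough sketch of the loading step (not the project's exact code), the helpers below map an object key to a path under /tmp and decode the file with PyDub. The names local_path_for, load_audio and describe are hypothetical, and the example assumes the file has already been downloaded to that path.

```python
import os


def local_path_for(key: str, tmp_dir: str = "/tmp") -> str:
    """Map an object key to a file path inside Lambda's writable /tmp."""
    return os.path.join(tmp_dir, os.path.basename(key))


def load_audio(file_path: str):
    """Decode an audio file into memory as a PyDub AudioSegment."""
    # Imported lazily so the module can be imported without pydub installed.
    from pydub import AudioSegment

    return AudioSegment.from_file(file_path)


def describe(audio) -> dict:
    """Summarise a loaded segment; len() of an AudioSegment is in milliseconds."""
    return {
        "duration_ms": len(audio),
        "channels": audio.channels,
        "frame_rate": audio.frame_rate,
    }
```

Because the whole decoded file is held in memory, the duration reported by describe is a useful early signal for whether the function's memory setting is adequate.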
Extracting and processing chunks
Once we have the audio loaded, we can process it in chunks:
The chunk.export() method also uses FFmpeg internally to encode the audio segment back to the desired format.
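A minimal sketch of slicing and exporting might look like the following. It relies on PyDub's millisecond-based slicing and on export(); the function names, the chunk file naming scheme and the WAV default are assumptions made for illustration.

```python
import os


def split_into_chunks(audio, chunk_ms: int = 30_000) -> list:
    """Slice a loaded segment into consecutive pieces of at most chunk_ms.

    Works on any object that supports len() and slicing; a PyDub
    AudioSegment is indexed in milliseconds.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be > 0")
    return [audio[start:start + chunk_ms] for start in range(0, len(audio), chunk_ms)]


def export_chunk(chunk, index: int, out_dir: str = "/tmp", fmt: str = "wav") -> str:
    """Encode one chunk to a file under out_dir and return its path."""
    out_path = os.path.join(out_dir, f"chunk_{index:04d}.{fmt}")
    chunk.export(out_path, format=fmt)
    return out_path
```

Zero-padding the index in the file name keeps the chunks in playback order when listed, which helps when matching transcripts back to their position in the original file.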
Performance considerations
When working with FFmpeg in Lambda, keep these performance considerations in mind:
1. Memory allocation: Audio and video processing is memory intensive. For our chunking function, allocate at least 1-2GB of memory.
2. Execution timeouts: Processing large files takes time. Set your Lambda timeout appropriately (up to 15 minutes).
3. Temporary storage: Lambda provides 512MB of non-persistent temporary storage in /tmp by default, configurable up to 10GB. For larger files, consider raising this limit or processing in smaller batches.
4. Parallel processing: As shown in our example, using ThreadPoolExecutor allows you to process multiple chunks in parallel, significantly improving performance.
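The parallel processing point can be illustrated with a small wrapper around ThreadPoolExecutor. This is a sketch under stated assumptions: process_in_parallel and the default of eight workers are hypothetical, and worker stands in for whatever function exports or uploads a single chunk.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(items, worker, max_workers: int = 8) -> list:
    """Apply worker to every item on a thread pool, preserving input order.

    An exception raised by any worker propagates to the caller when the
    results are collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))
```

Threads suit this workload because most of the time is spent waiting on FFmpeg subprocesses and network calls rather than running Python code. The worker count is worth tuning against the function's memory setting, since Lambda allocates CPU in proportion to memory.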
DynamoDB for state management and persistence
A critical component of our architecture is the use of DynamoDB for state management and persistence. We use the dyntastic library as an elegant model layer over DynamoDB:
DynamoDB serves several important functions in our pipeline:
1. Processing state - Tracks which chunks are pending, in progress, or completed
2. Deduplication - Prevents processing the same file multiple times with condition expressions
3. Reassembly metadata - Stores information needed to correctly order chunks during reassembly
4. Durability - Persists state information even if Lambda functions are restarted
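The deduplication point rests on DynamoDB condition expressions. As a hedged sketch using the low-level boto3 client rather than dyntastic, a conditional put can claim a file exactly once. The function names and the file_id and job_status attributes are assumptions, not the project's schema.

```python
def build_claim_request(table_name: str, file_id: str) -> dict:
    """Build put_item arguments that succeed only if the item is new."""
    return {
        "TableName": table_name,
        "Item": {
            "file_id": {"S": file_id},        # hypothetical hash key
            "job_status": {"S": "pending"},   # hypothetical status attribute
        },
        # Reject the write when an item with this key already exists.
        "ConditionExpression": "attribute_not_exists(file_id)",
    }


def claim_file(client, table_name: str, file_id: str) -> bool:
    """Return True if this call claimed the file, False if it was already claimed."""
    try:
        client.put_item(**build_claim_request(table_name, file_id))
    except client.exceptions.ConditionalCheckFailedException:
        return False
    return True
```

Here client is expected to be a boto3 DynamoDB client. Because the check and the write happen in a single request, two Lambda invocations triggered for the same file cannot both succeed.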
For example, in our processor function, we check if all chunks are complete before reassembling the transcript:
Using DynamoDB with the expressive dyntastic models fits perfectly with our rapid development approach, providing a powerful persistence layer without requiring traditional database setup or maintenance.
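The completeness check and ordered reassembly described in this section can be sketched independently of the storage layer. The example below uses a plain dataclass as a stand-in for a persisted model; ChunkRecord, its fields and the status values are assumptions for illustration and do not reflect the project's dyntastic models.

```python
from dataclasses import dataclass


@dataclass
class ChunkRecord:
    """In-memory stand-in for a persisted chunk item (hypothetical schema)."""
    file_id: str
    index: int
    status: str  # "pending", "in_progress" or "completed"
    transcript: str = ""


def all_chunks_complete(chunks) -> bool:
    """True only when there is at least one chunk and every chunk is completed."""
    chunks = list(chunks)
    return bool(chunks) and all(c.status == "completed" for c in chunks)


def reassemble(chunks) -> str:
    """Join chunk transcripts in index order, skipping empty ones."""
    ordered = sorted(chunks, key=lambda c: c.index)
    return " ".join(c.transcript.strip() for c in ordered if c.transcript.strip())
```

Transcription jobs finish in arbitrary order, so sorting by the stored index is what guarantees the final transcript reads in the same order as the original audio.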
Conclusion
In this post we showed how container image support for Lambda makes tools like FFmpeg much more straightforward to use. This enables serverless media processing workflows, so you can build scalable audio and video processing pipelines without managing servers.
The approach shown here—using a static build of FFmpeg to enable parallel audio processing and transcription—demonstrates how Lambda functions can be leveraged to dramatically improve processing times for media workloads.
At Weird Sheep Labs, we specialise in serverless application development on AWS. If this sounds like something your company could benefit from, don't hesitate to get in touch!