
Using FFmpeg in AWS Lambda with Docker

2025-05-25 | 9 min read
Armand Rego
Audio and video processing in serverless environments like AWS Lambda has traditionally been challenging due to the lack of native support for media tools. However, we can now package powerful media processing libraries like FFmpeg directly into our deployments by containerising Lambda functions with Docker. In this post, we'll explore how to effectively use FFmpeg within AWS Lambda by examining a proof of concept transcription service we recently built for a client.

The challenge

Processing audio and video files in AWS Lambda presents several challenges:

1. Lambda's default runtime environments don't include FFmpeg
2. The /tmp directory is the only writable location
3. Memory and execution time are limited
4. Deploying binary dependencies can be complicated

Use case

We recently built an audio transcription service proof of concept for a client where the primary requirement was to enable significantly faster speech-to-text transcription with AWS Transcribe.

The solution we landed on was to chunk the input audio files; instead of submitting a single large file as one transcription job, we split the audio file into multiple chunks using FFmpeg and sent the chunks to be handled by Transcribe simultaneously. This parallel processing approach drastically reduces the overall transcription time for large audio files.

For example, a 30-minute audio file that might take 10-15 minutes to transcribe as a single job could be split into 60 thirty-second chunks and transcribed in roughly 1-2 minutes total, as all chunks are processed in parallel.
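The chunk count itself is simple arithmetic. PyDub (which we use below) measures durations in milliseconds, so for the example figures above:

# 30-minute file, 30-second chunks, both in milliseconds (PyDub's unit)
duration_ms = 30 * 60 * 1000
chunk_size_ms = 30 * 1000

# Ceiling division: a partial final chunk still needs its own transcription job
num_chunks = -(-duration_ms // chunk_size_ms)
print(num_chunks)  # 60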

Why serverless for this proof of concept?

This architecture was deliberately built using serverless technologies like AWS Lambda for several key reasons:

1. Speed of development - Using Lambda functions allowed the team to rapidly build and iterate on the solution without managing infrastructure
2. Pay-as-you-go pricing - With a proof of concept where usage patterns are unknown, serverless provides cost efficiency by only charging for actual usage
3. Auto-scaling - The solution automatically scales from processing a single file to handling many files concurrently
4. Managed services integration - Lambda functions integrate seamlessly with S3, DynamoDB, and AWS Transcribe

[Figure: The overall architecture of the transcription service.]

Including FFmpeg in Lambda

The key to running FFmpeg in Lambda is properly configuring the Docker container. Let's examine the Dockerfile:

FROM public.ecr.aws/lambda/python:3.12

# Install git (for git-based Python dependencies) plus tar, xz and wget for fetching FFmpeg
RUN dnf update -y && dnf install -y git tar xz wget

# Install ffmpeg
RUN mkdir -p ${LAMBDA_TASK_ROOT}/ffmpeg && \
    cd ${LAMBDA_TASK_ROOT}/ffmpeg && \
    wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz && \
    tar -xvf ffmpeg-release-amd64-static.tar.xz && \
    mv ffmpeg-*-amd64-static/* . && \
    rm -rf ffmpeg-*-amd64-static && \
    rm ffmpeg-release-amd64-static.tar.xz

ENV PATH="${LAMBDA_TASK_ROOT}/ffmpeg:${PATH}"
RUN chmod -R +x ${LAMBDA_TASK_ROOT}/ffmpeg

# Download the public key for github.com
RUN mkdir -p -m 0600 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts

# Copy requirements and install dependencies
COPY requirements.txt ./
RUN --mount=type=ssh pip install -r ./requirements.txt

# Copy all contents of current directory
COPY . ./

CMD ["main.handler"]

We first install the tools needed to download and extract the FFmpeg build (wget, tar, and xz), along with git so that git-based Python dependencies can be installed.

The FFmpeg install is where the magic happens. We:
1. Create a directory for FFmpeg within ${LAMBDA_TASK_ROOT}, which is /var/task in the Lambda environment
2. Download a static build of FFmpeg (no system dependencies required)
3. Extract the files and clean up
4. Add FFmpeg to the PATH environment variable
5. Make the FFmpeg binaries executable

Using a static build is important since it eliminates dependencies on system libraries that might not be available in the Lambda environment.
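As a quick sanity check, you can confirm from inside the container that the binary resolves and runs. A minimal sketch (the helper name is ours, not part of the service code):

import shutil
import subprocess

def assert_ffmpeg_available() -> None:
    # Resolve the binary the same way PyDub will: via PATH
    ffmpeg_path = shutil.which("ffmpeg")
    if ffmpeg_path is None:
        raise RuntimeError("ffmpeg not found on PATH")

    # Print the build string, e.g. "ffmpeg version ...-static"
    result = subprocess.run(
        [ffmpeg_path, "-version"], capture_output=True, text=True, check=True
    )
    print(result.stdout.splitlines()[0])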

Finally, we copy our requirements and application code, installing dependencies with pip. Note the --mount=type=ssh flag, which allows the pip install process to access private GitHub repositories if your project depends on them (you'll need to pass --ssh default to docker build for the mount to receive your SSH agent).

Using FFmpeg via PyDub

In our example, we don't call FFmpeg directly. Instead, we use PyDub, a Python library that provides a convenient abstraction layer over FFmpeg. Here's how our code processes audio files:

import concurrent.futures as cf
import os

from aws_lambda_powertools.utilities.parser import event_parser
from aws_lambda_powertools.utilities.parser.models import S3Model
from pydub import AudioSegment

# ...

@event_parser
def handler(event: S3Model, context):
    # ...
    for record in event.Records:
        filename = record.s3.object.key

        # Download the audio file from S3
        file_path = os.path.join("/tmp", filename)
        s3.download_file(record.s3.bucket.name, filename, file_path)

        # Load the audio file (PyDub shells out to FFmpeg here)
        audio = AudioSegment.from_file(file_path)

        # Process chunks in parallel using ThreadPoolExecutor
        with cf.ThreadPoolExecutor() as tpe:
            futures = {
                tpe.submit(
                    _extract_chunk_and_upload, audio, filename, chunk_idx, start_idx
                ): start_idx
                for chunk_idx, start_idx in enumerate(
                    range(0, len(audio), settings.chunk_size)
                )
            }
        # ...

The key line here is AudioSegment.from_file(file_path), which internally calls FFmpeg to load the audio file into memory. PyDub automatically detects and uses the FFmpeg binary available on our PATH.
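If PATH-based discovery ever fails (say, because you install FFmpeg somewhere other than a PATH directory), PyDub's AudioSegment.converter attribute lets you point at the binary explicitly. A one-line sketch using the install location from our Dockerfile:

from pydub import AudioSegment

# /var/task is ${LAMBDA_TASK_ROOT}, where the Dockerfile placed the static build
AudioSegment.converter = "/var/task/ffmpeg/ffmpeg"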

Extracting and processing chunks

Once we have the audio loaded, we can process it in chunks:

def _extract_chunk_and_upload(
    audio: AudioSegment, filename: str, chunk_idx: int, start_idx: int
):
    # Get end index (clamped to the end of the audio) and create chunk
    end_index = min(start_idx + settings.chunk_size, len(audio))
    chunk: AudioSegment = audio[start_idx:end_index]  # type: ignore

    # Define filepaths and names
    filename_prefix, extension = os.path.splitext(filename)
    chunk_filename = f"{filename_prefix}_{chunk_idx}{extension}"
    chunk_filepath = f"/tmp/{chunk_filename}"

    # Export chunk in the original format and upload to S3
    chunk.export(chunk_filepath, format=extension.removeprefix("."))
    s3.upload_file(
        chunk_filepath,
        settings.transcription_bucket_name,
        f"{filename_prefix}/{chunk_idx}{extension}",
    )
    return f"{Chunk.SK_PREFIX}{chunk_idx}"

The chunk.export() method also uses FFmpeg internally to encode the audio segment back to the desired format.
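For comparison, the kind of FFmpeg invocation PyDub spares us from writing looks roughly like this. The flags are real, but this is an illustrative equivalent, not the exact command PyDub constructs internally:

import subprocess

# Extract a 30-second chunk starting at the 60-second mark:
# -ss seeks to the start offset, -t caps the output duration, -y overwrites
subprocess.run(
    ["ffmpeg", "-y", "-ss", "60", "-t", "30",
     "-i", "/tmp/input.mp3", "/tmp/input_2.mp3"],
    check=True,
)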

Performance considerations

When working with FFmpeg in Lambda, keep these performance considerations in mind:

1. Memory allocation: Audio and video processing is memory-intensive, and Lambda allocates CPU power in proportion to memory, so a larger allocation also speeds up FFmpeg. For our chunking function, allocate at least 1-2GB of memory.

2. Execution timeouts: Processing large files takes time. Set your Lambda timeout appropriately (up to 15 minutes).

3. Temporary storage: Lambda provides 512MB of ephemeral storage in /tmp by default (configurable up to 10GB). For larger files, consider processing in smaller batches and cleaning up as you go (see the sketch after this list).

4. Parallel processing: As shown in our example, using ThreadPoolExecutor allows you to process multiple chunks in parallel, significantly improving performance.
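On the storage point, the simplest mitigation is to delete each chunk from /tmp as soon as it has been uploaded. A minimal sketch; the helper below is illustrative and not part of the service code:

import os

def _cleanup_chunk(chunk_filepath: str) -> None:
    # /tmp persists across invocations on a warm container, so freeing
    # space eagerly stops ephemeral storage filling up over time
    try:
        os.remove(chunk_filepath)
    except FileNotFoundError:
        pass  # Already removed; nothing to do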

DynamoDB for state management and persistence

A critical component of our architecture is the use of DynamoDB for state management and persistence. We use the dyntastic library as an elegant model layer over DynamoDB:

# Save chunk entity to DB
chunk_sk = future.result()
_chunk = Chunk(pk=pk, sk=chunk_sk, status=ChunkStatus.PENDING)
_chunk.save()

DynamoDB serves several important functions in our pipeline:

1. Processing state - Tracks which chunks are pending, in progress, or completed
2. Deduplication - Prevents processing the same file multiple times using condition expressions (sketched after this list)
3. Reassembly metadata - Stores information needed to correctly order chunks during reassembly
4. Durability - Persists state information even if Lambda functions are restarted
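To make the deduplication point concrete, here is the underlying idea expressed with the raw boto3 API rather than dyntastic (the table and key names are placeholders, not the project's actual schema):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

try:
    # The condition expression makes the write fail if the file is already registered
    dynamodb.put_item(
        TableName="transcription-state",  # placeholder name
        Item={"pk": {"S": "FILE#example.mp3"}, "status": {"S": "PENDING"}},
        ConditionExpression="attribute_not_exists(pk)",
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("File is already being processed; skipping")
    else:
        raise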

For example, in our processor function, we check if all chunks are complete before reassembling the transcript:

def _all_chunks_completed(pk: str, sk: str):
    """
    Check all chunks in the file are transcribed. First update the current chunk
    status to COMPLETED, then check whether all chunks are COMPLETED.
    """
    _chunk = Chunk.get(pk, sk)
    _chunk.update(A.status.set(ChunkStatus.COMPLETED))

    # Count chunks not yet COMPLETED, using a consistent read so the query
    # sees the update we just made
    num_incomplete_chunks = 0
    for _ in Chunk.query(
        pk,
        range_key_condition=A.sk.begins_with(Chunk.SK_PREFIX),
        filter_condition=A.status.ne(ChunkStatus.COMPLETED),
        consistent_read=True,
    ):
        num_incomplete_chunks += 1

    if num_incomplete_chunks != 0:
        logger.info(
            f"Chunks are not all completed. {num_incomplete_chunks} chunks remain incomplete."
        )
        return False
    return True

Using DynamoDB with the expressive dyntastic models fits perfectly with our rapid development approach, providing a powerful persistence layer without requiring traditional database setup or maintenance.

Conclusion

In this post we showed how container image support for Lambda has made using tools like FFmpeg much more straightforward, unlocking serverless media processing workflows. You can now build scalable audio and video processing pipelines without managing servers.

The approach shown here—using a static build of FFmpeg to enable parallel audio processing and transcription—demonstrates how Lambda functions can be leveraged to dramatically improve processing times for media workloads.

At Weird Sheep Labs, we specialise in serverless application development on AWS. If this sounds like something your company could benefit from, don't hesitate to get in touch!
