---
title: Automatic captioning for video uploads · Cloudflare Reference Architecture docs
description: By integrating automatic speech recognition technology into video
  platforms, content creators, publishers, and distributors can reach a broader
  audience, including individuals with hearing impairments or those who prefer
  to consume content in different languages.
lastUpdated: 2025-10-13T13:40:40.000Z
chatbotDeprioritize: false
tags: AI
source_url:
  html: https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-video-caption/
  md: https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-video-caption/index.md
---

## Introduction

Automatic Speech Recognition (ASR) models have revolutionized the accessibility of video content by enabling the generation of subtitles and translations. These models utilize advanced algorithms to transcribe spoken words into text with high accuracy. By integrating ASR technology into video platforms, content creators, publishers, and distributors can reach a broader audience, including individuals with hearing impairments or those who prefer to consume content in different languages.

The process begins with capturing the audio from the video source, which is then fed into the ASR model. This model analyzes the audio waveform and converts it into a textual representation, capturing the spoken content in the form of subtitles. Furthermore, you can also use ASR models for language translation, enabling the creation of multilingual subtitles. Once the subtitles are generated, they can be displayed alongside the video, providing a synchronized text representation of the spoken content.

## Automatic captioning on upload

![Figure 1: Automatic captioning on upload](https://developers.cloudflare.com/_astro/ai-auto-caption-architecture-diagram.CyBpgQKS_2g5MiU.svg)

1. **Client upload**: Send POST request with both video and audio to API endpoint.
2. **Audio transcription**: Generate timestamped transcriptions by calling [Workers AI](https://developers.cloudflare.com/workers-ai/) [automatic speech recognition (ARS) model](https://developers.cloudflare.com/workers-ai/models/) with audio as input. Use [Workers](https://developers.cloudflare.com/workers/) to convert the output to a supported subtitled format.
3. **Store subtitles**: Store the subtitle file(s) on [R2](https://developers.cloudflare.com/r2/).
4. **Store video**: Store the video files on [R2](https://developers.cloudflare.com/r2/).
5. **Client request**: Send GET requests for video and subtitle(s) to origin. Use global [Cache](https://developers.cloudflare.com/cache/) to increase performance.
6. **Origin request**: Fetch file(s) from [R2](https://developers.cloudflare.com/r2/) on cache `MISS` by using [Public Buckets](https://developers.cloudflare.com/r2/buckets/public-buckets/).

## Related resources

* [Community project: automatic captioning demo](https://auto-caption.pages.dev/)
* [Workers AI: Automatic speech recognition (ARS) model](https://developers.cloudflare.com/workers-ai/models/)
* [R2: Object storage for all your data](https://developers.cloudflare.com/r2/)
