Convert audio into text transcriptions and integrate speech recognition into applications with easy-to-use APIs.
New customers also get up to $300 in free credits to try Speech-to-Text and other Google Cloud products.
Features
Speech-to-Text can use Chirp 3, Google Cloud’s foundation model for speech, trained on millions of hours of audio data and billions of text sentences. This contrasts with traditional speech recognition techniques, which rely on large amounts of language-specific supervised data. Chirp 3's approach gives users improved recognition and transcription for more spoken languages and accents.
Build for a global user base with extensive language support. Transcribe short, long, and even streaming audio data. Speech-to-Text also offers users more accurate and globe-spanning deployments for transcription with Chirp 3, the next generation of universal speech models.
Chirp 3: Transcription was built using self-supervised training on millions of hours of audio and 28 billion sentences of text spanning 100+ languages.
Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage).
Speech-to-Text uses model adaptation to improve the accuracy of frequently used words, expand the vocabulary available for transcription, and improve transcription from noisy audio. Model adaptation lets users customize Speech-to-Text to recognize specific words or phrases more frequently than other options that might otherwise be suggested. For example, you could bias Speech-to-Text towards transcribing "weather" over "whether."
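The "weather" over "whether" example can be sketched as a request configuration. This is a minimal sketch, not an official sample: the field names (`adaptation`, `phraseSets`, `inlinePhraseSet`, `boost`), the model identifier `chirp_3`, and the default boost value are assumptions based on the general shape of the Speech-to-Text V2 REST schema and should be checked against the API reference before use. The helper `build_adaptation_config` is hypothetical.

```python
import json


def build_adaptation_config(phrases, boost=10.0, language_code="en-US", model="chirp_3"):
    """Build a recognition config dict that biases results toward `phrases`.

    Assumption: keys mirror the V2 REST RecognitionConfig JSON; verify the
    exact names and the allowed boost range in the API reference.
    """
    if not phrases:
        raise ValueError("at least one phrase is required")
    return {
        "autoDecodingConfig": {},          # let the service detect the audio encoding
        "languageCodes": [language_code],
        "model": model,                    # assumed model identifier
        "adaptation": {
            "phraseSets": [
                {
                    "inlinePhraseSet": {
                        # One entry per phrase to favor during transcription.
                        "phrases": [{"value": p, "boost": boost} for p in phrases]
                    }
                }
            ]
        },
    }


config = build_adaptation_config(["weather"])
print(json.dumps(config, indent=2))
```

The dict is plain data, so it can be serialized as the `config` portion of a recognize request or translated into the equivalent client-library objects.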
Speech-to-Text API v2 helps enterprise and business customers meet added security and regulatory requirements out of the box. Data residency lets you invoke transcription models through a fully regionalized service in Google Cloud regions such as Singapore and Belgium. Logs for resource generation and transcription are readily available in the Google Cloud console. Speech-to-Text API v2 also offers enterprise-grade encryption with customer-managed encryption keys for all resources as well as batch transcription.
Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started.
Speech-to-Text can recognize distinct channels in multichannel situations (for example, video conference) and annotate the transcripts to preserve the order.
Speech-to-Text can handle noisy audio from many environments without requiring additional noise cancellation.
Choose from a selection of trained models for voice control and for phone call and video transcription, optimized for domain-specific quality requirements. For example, our enhanced phone call model is tuned for audio originating from telephony, such as phone calls recorded at an 8 kHz sampling rate.
Profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter out profane words in text results.
Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on your configuration.
Speech-to-Text accurately punctuates transcriptions, adding commas, question marks, and periods.
Know who said what by receiving automatic predictions about which of the speakers in a conversation spoke each utterance.
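Several of the features above (multichannel recognition, the profanity filter, automatic punctuation, and speaker diarization) are typically switched on per request. The sketch below assembles such a set of options as a plain dict. The key names (`enableAutomaticPunctuation`, `profanityFilter`, `multiChannelMode`, `diarizationConfig`) and the enum string are assumptions modeled on the V2 REST recognition features, and `build_features` is a hypothetical helper, so treat this as an illustration rather than the definitive schema.

```python
def build_features(punctuation=True, profanity_filter=False,
                   separate_channels=False, diarization=False):
    """Return a features dict for a recognition request.

    Assumption: key names follow the V2 REST JSON for recognition features;
    confirm them, and which models support each one, in the API reference.
    """
    features = {
        "enableAutomaticPunctuation": punctuation,  # commas, periods, question marks
        "profanityFilter": profanity_filter,        # mask profane words in results
    }
    if separate_channels:
        # Transcribe each audio channel on its own (for example, a video conference).
        features["multiChannelMode"] = "SEPARATE_RECOGNITION_PER_CHANNEL"
    if diarization:
        # An empty config asks the service to label speakers with its defaults.
        features["diarizationConfig"] = {}
    return features


call_center = build_features(profanity_filter=True, separate_channels=True)
meeting = build_features(diarization=True)
print(call_center)
print(meeting)
```

Keeping the options in one small function makes it easy to compare configurations when evaluating transcription quality on your own audio.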
Compare Speech-to-Text Chirp model in API and Vertex AI Studio
Product | What it is | Best for | Key features |
---|---|---|---|
Chirp 3: Transcription in Vertex AI | A simple-to-use, no-code, web-based graphical user interface. | Rapidly testing audio files, quick prototyping, creating audio transcriptions, and uploading audio or recordings directly in a web browser. | - Enhanced multilingual language detection and transcription - Supports transcription in 85+ languages and variants - Supports speaker diarization and model adaptation - Automatic speech recognition, transcribing audio into text |
Chirp 3: Transcription on Speech-to-Text V2 API | An API that is the next generation of Google's universal Speech-to-Text model, unifying data from multiple languages. | Building scalable, enterprise-grade applications. Easy transcription integration into existing software. | - Enhanced multilingual language detection and transcription - Supports transcription in 85+ languages and variants - Supports speaker diarization and model adaptation - Automatic speech recognition, transcribing audio into text |
How It Works
Speech-to-Text has three main methods to perform speech recognition: synchronous, asynchronous, and streaming. Each method returns text results, and which one to use depends on whether you need the transcription in post-processing, periodically, or in real time. Simply put, you input audio data and receive a text-based response.
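The choice among the three methods can be sketched as a small decision helper. This is an illustrative sketch only: `choose_method` is a hypothetical function, and the 60-second cutoff for synchronous requests is an assumption about the service limit, so check the current quotas before relying on it.

```python
from enum import Enum


class Method(Enum):
    SYNCHRONOUS = "synchronous"    # send short audio, wait for the full result
    ASYNCHRONOUS = "asynchronous"  # start a long-running job, collect results later
    STREAMING = "streaming"        # receive results while audio is still arriving


# Assumed upper bound for synchronous requests; verify against current limits.
SYNC_LIMIT_SECONDS = 60


def choose_method(duration_seconds=None, live=False):
    """Pick a recognition method from audio length and latency needs."""
    if live:
        # Microphone input or any case that needs results in real time.
        return Method.STREAMING
    if duration_seconds is None:
        raise ValueError("duration_seconds is required for prerecorded audio")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return Method.SYNCHRONOUS
    return Method.ASYNCHRONOUS


print(choose_method(duration_seconds=20).value)
print(choose_method(duration_seconds=45 * 60).value)
print(choose_method(live=True).value)
```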
Demo
Quickly create audio transcription from a file upload or directly speaking into a mic.
Common Uses
Create an audio transcription
Learn how to use the Speech-to-Text API from within the Cloud Console by creating an audio transcription in just a few steps. You can also transcribe short, long, and streaming audio.
Create subtitles for videos using AI
Transcribe your audio and video to include captions. Add subtitles to existing content or, in real time, to streaming content. Our video transcription model is ideal for indexing or subtitling video and multispeaker content, and uses machine learning technology similar to what YouTube uses for video captioning. This tutorial shows you how to use the Google Cloud AI services Speech-to-Text API and Translation API to add subtitles to videos and to provide localized subtitles in other languages.
How to add Speech-to-Text to apps
Learn how to enable Speech-to-Text for your application with Google Cloud. This video covers how to add AI to your application without extensive machine learning model experience, using the pretrained Speech-to-Text API.
Language, speech, text, and translation with Google Cloud APIs
In this course, you'll use the Speech-to-Text API to transcribe an audio file into a text file, translate with the Google Cloud Translation API, and create synthetic speech with Natural Language AI.
Pricing
How Speech-to-Text pricing works
Speech-to-Text pricing is based on the API version, channels, batch methods, and any additional Google Cloud service costs like storage.
API version | Service and capability | Pricing |
---|---|---|
Speech-to-Text V2 API | V2 offers data residency for multi-region and single-region deployments of Chirp 3. V2 also includes audit logging and support for customer-managed encryption keys. | $0.016 per min |
View pricing details for Speech-to-Text.
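As a rough illustration of the per-minute rate, the sketch below estimates a transcription bill from audio length. It uses only the $0.016 per minute figure listed here. Multiplying by the number of channels, ignoring any billing rounding, tiers, or discounts, and rounding the result to cents are all assumptions, and `estimate_cost` is a hypothetical helper, so use the pricing details page for real budgeting.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed V2 rate in US dollars per minute of audio.
PRICE_PER_MINUTE = Decimal("0.016")


def estimate_cost(audio_seconds, channels=1):
    """Estimate transcription cost in dollars, rounded to the nearest cent.

    Assumptions: each channel is billed as separate audio, and no minimums,
    rounding increments, or volume discounts apply.
    """
    if audio_seconds < 0 or channels < 1:
        raise ValueError("audio_seconds must be >= 0 and channels >= 1")
    minutes = Decimal(audio_seconds) / Decimal(60)
    cost = minutes * channels * PRICE_PER_MINUTE
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(90 * 60))              # 90 minutes, one channel: 1.44
print(estimate_cost(60 * 60, channels=2))  # 60 minutes, two channels: 1.92
```

`Decimal` is used instead of floats so that small per-minute amounts add up without binary rounding surprises.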