This repo is not the same as the original autosub repo.
This repo has been modified by several people. See the Changelog.
autosub icon designed by BingLingGroup.
Software: inkscape
Font: source-han-sans (SIL)
Color: Solarized
- Description
- License
- Dependencies
- Download and Installation
- 4.1 Branches
- 4.2 Install on Ubuntu
- 4.3 Install on Windows
- Workflow
- Usage
- 6.1 Typical usage
- 6.1.1 Pre-process Audio
- 6.1.2 Detect Regions
- 6.1.3 Split Audio
- 6.1.4 Transcribe Audio To Subtitles
- 6.1.4.1 Google Speech V2
- 6.1.4.2 Google Cloud Speech-to-Text
- 6.1.4.3 Google speech config
- 6.1.4.4 Output API full response
- 6.1.4.5 Xfyun speech config
- 6.1.4.6 Baidu speech config
- 6.1.5 Translate Subtitles
- 6.2 Options
- 6.3 Internationalization
- FAQ
- 7.1 Other API support
- 7.2 Batch processing
- 7.3 Proxy support
- 7.4 macOS locale issue
- 7.5 Accuracy
- Bug report
- Build
Click up arrow to go back to TOC.
Autosub is an automatic subtitle generating utility. It can detect speech regions automatically by using Auditok, split the audio files according to those regions by using ffmpeg, transcribe the speech through several APIs, and translate the subtitles' text by using py-googletrans.
The new features mentioned above are only available in the latest alpha branch. They are not available on PyPI or in the original repo.
This repo has a different license from the original repo.
Autosub depends on the following third-party software and Python packages. Much appreciation goes to all of these projects.
For Windows users:
- Build Tools for Visual Studio 2019
  - Used to build marisa-trie during installation.
  - marisa-trie is a dependency of langcodes.
  - Components you will probably need to install: MSVC C++ build tools, Windows 10 SDK.
For how to install these dependencies, see Download and Installation.
Except for the PyPI version, all versions include code that is not from the original repository.
0.4.0 > autosub
- These versions are only compatible with Python 2.7.
0.5.6a >= autosub >= 0.4.0
- These versions are compatible with both Python 2.7 and Python 3. It doesn't matter which Python version you use in the installation commands below.
autosub >= 0.5.7a
- These versions are only compatible with Python 3.
ffmpeg, ffprobe and ffmpeg-normalize need to be placed in one of the following locations so that autosub can detect and use them. The lookup code is in constants.py. Priority follows the order below.
- Set the environment variables FFMPEG_PATH, FFPROBE_PATH and FFMPEG_NORMALIZE_PATH before running the program. They override the executables found through the PATH environment variable, which is helpful if you don't want to use the ones in PATH (see the example after this list).
- Add them to the PATH environment variable. No need to worry about this if you install them with a package manager, such as pip for ffmpeg-normalize and chocolatey for ffmpeg.
- Add them to the same directory as the autosub executable.
- Add them to the current command line working directory.
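For example, the dedicated variables can be set for the current shell session like this. The paths below are only placeholders; point them at wherever your executables actually live.
On Windows (cmd):
set FFMPEG_PATH=C:\ffmpeg\bin\ffmpeg.exe
set FFPROBE_PATH=C:\ffmpeg\bin\ffprobe.exe
On Ubuntu (bash):
export FFMPEG_PATH=/usr/local/bin/ffmpeg
export FFPROBE_PATH=/usr/local/bin/ffprobe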
About the git installation: if you don't want to install git just to use pip's VCS support for installing Python packages, or you are confused by git environment variables, you can manually click the clone and download button to download the source code, then use pip to install it locally with these commands:
cd the_directory_contains_the_source_code
pip install .
Because the autosub PyPI project is maintained by the original autosub repo's owner, I can't modify it or upload a project with the same name. Perhaps later, when this version of autosub becomes more stable, I will rename and duplicate this repo and then upload it to PyPI.
- The alpha branch includes many changes from the original repo; details are in the Changelog. Its code is updated when an alpha version is released, and it is more stable than the dev branch.
- The origin branch includes the fewest changes from the original repo and none of the new features in the alpha branch. Its changes just make sure there are no critical bugs when the program runs on Windows. It currently isn't maintained.
- The dev branch is where the latest code is pushed. If it works fine, it is merged into the alpha branch when a new version is released.
- The remaining branches are only used for testing or pull requests. Don't install from them unless you know what you are doing.
The commands below include dependency installation.
Install from the alpha branch (latest autosub alpha release).
apt install ffmpeg python3 python3-dev curl git -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
pip install git+https://github.com/BingLingGroup/autosub.git@alpha ffmpeg-normalize langcodes
Install from the dev branch (latest autosub dev version).
apt install ffmpeg python3 python3-dev curl git -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
pip install git+https://github.com/BingLingGroup/autosub.git@dev ffmpeg-normalize langcodes
Install from the origin branch (autosub-0.4.0a).
apt install ffmpeg python python-pip git -y
pip install git+https://github.com/BingLingGroup/autosub.git@origin
Install from PyPI (autosub-0.3.12).
apt install ffmpeg python python-pip -y
pip install autosub
Using python3 and python3-pip instead of python and python-pip is recommended after autosub-0.4.0.
You can just go to the release page and download the latest release (standalone version) for Windows. The release version can run without a Python environment. The click-and-run batch scripts are also in the package; you can edit them manually using Notepad++. Or add the executables' directory to the system environment variables so you can use autosub as a command anywhere in the system, if you have the permission.
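If you choose the PATH approach, here is a hypothetical snippet that extends PATH for the current PowerShell session only; C:\autosub is a placeholder for the directory you actually unpacked the release into:
$env:Path += ";C:\autosub"
autosub -h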
Tip: Shift + Right Click is the shortcut for opening a PowerShell window in the current directory. To run an executable from the current directory, prefix it with .\, e.g. .\autosub.
Or you can just open the executable directly and input the arguments manually, though I don't recommend this because it's less efficient.
- The release without the pyinstaller suffix is compiled by Nuitka. It is faster than the pyinstaller one because Nuitka actually compiles the code, while pyinstaller just bundles the application.
- ffmpeg and ffmpeg-normalize are also included in the package. The original ffmpeg-normalize doesn't have a standalone version; the standalone version of ffmpeg-normalize is built separately. The code is here.
- If there's anything wrong with either release, or the package size or anything else annoys you, you can just use the traditional pip installation method below.
Or install a Python environment (if you don't have one yet) from choco and then install the package.
Using chocolatey on Windows to install the environment and dependencies is recommended.
The choco installation command is for cmd (not PowerShell).
@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"
If you don't have Build Tools for Visual Studio 2019, please install autosub without langcodes.
Install from the alpha branch (latest autosub alpha release).
choco install git python curl ffmpeg -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install git+https://github.com/BingLingGroup/autosub.git@alpha ffmpeg-normalize langcodes
Install from the dev branch (latest autosub dev version).
choco install git python curl ffmpeg -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install git+https://github.com/BingLingGroup/autosub.git@dev ffmpeg-normalize langcodes
Install from the origin branch (autosub-0.4.0a).
choco install git python2 curl ffmpeg -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install git+https://github.com/BingLingGroup/autosub.git@origin
The PyPI version (autosub-0.3.12) is not recommended on Windows because it simply can't run successfully there. See the changelog on the origin branch for details.
Using the python package instead of python2 is recommended for autosub-0.4.0 and later.
A video/audio/subtitles file.
If the input is a video or audio file, autosub uses ffmpeg to convert it into the proper format for the API. Any format supported by ffmpeg is OK as input, but the processed format sent to the API is limited by the API and by autosub's code.
Supported formats below:
Google Speech V2
- 24bit/44100Hz/mono FLAC (default)
- Other formats like OGG_OPUS aren't supported by the API. (I've tried modifying the request headers and the JSON requests and it just doesn't work.) Formats like PCM use fewer bits per sample but more storage than FLAC; although the API supports them, I think it's unnecessary to modify the code for them.
Google Cloud Speech-to-Text API v1p1beta1
- Supported
  - 24bit/44100Hz/mono FLAC (default)
- Supported but not with the default args (more info in Transcribe Audio To Subtitles)
  - 8000Hz|12000Hz|16000Hz|24000Hz|48000Hz/mono OGG_OPUS
  - MP3
  - 16bit/mono PCM
Xfyun Speech-to-Text WebSocket API/Baidu ASR API/Baidu ASR Pro API
- Supported
  - 16bit/16000Hz/mono PCM
Also, you can use the built-in audio pre-processing function, though Google doesn't recommend it. Honestly speaking, if your audio volume is not standardized, e.g. too loud or too quiet, it's recommended to use some tool, or just the built-in function, to standardize it. The default pre-processing depends on ffmpeg-normalize and ffmpeg and consists of three commands: the first converts stereo to mono, the second filters out sound outside the frequency range of speech, and the third normalizes the audio so it is neither too loud nor too quiet (a rough sketch is shown below). If you are not satisfied with the default commands, you can modify them yourself via the -apc option. Still, it currently only supports the 24bit/44100Hz/mono FLAC format.
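The actual default commands live in constants.py and are not reproduced here; the following is only a rough, hypothetical sketch of the three steps using plain ffmpeg and ffmpeg-normalize, with made-up file names and filter cutoffs:
ffmpeg -i input.mp4 -ac 1 step1_mono.flac
ffmpeg -i step1_mono.flac -af "highpass=f=200,lowpass=f=3000" step2_speech_band.flac
ffmpeg-normalize step2_speech_band.flac -o step3_normalized.flac -c:a flac -ar 44100
The first command downmixes to mono, the second keeps roughly the speech frequency band, and the third performs loudness normalization at the 44100Hz sample rate autosub expects.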
If the input is a subtitles file and you give the proper arguments, autosub only translates it using py-googletrans.
Audio length limits:
Google Speech V2
- No longer than 10 to 15 seconds.
- In autosub it is set to a 60-second limit.
Google Cloud Speech-to-Text API
- No longer than 1 minute.
- In autosub it is currently set to the same 60-second limit.
- Currently only sync-recognize is supported, which means only short audio is supported.
Xfyun Speech-to-Text WebSocket API/Baidu ASR API/Baidu ASR Pro API
- Same limit as above.
Autosub uses Auditok to detect speech regions, then uses them to split and convert the video/audio into many audio fragments, one fragment per region per API request. All these audio fragments are converted directly from the input to avoid any extra quality loss.
Alternatively, it can use external regions from a file that pysubs2 supports, such as .ass or .srt. This allows you to manually adjust the regions to get better recognition results.
Autosub then makes parallel requests to generate transcriptions for those regions, one audio fragment per request. Recognition speed mostly depends on your network upload speed.
- Manual post-processing of the subtitle lines may be needed, since some of them are too long to fit on a single line at the bottom of the video frame.
After speech-to-text, autosub translates the subtitles into a different language, combining multiple lines of text into one chunk of text per request; details are in issue #49. Finally, it saves the resulting subtitles to local storage.
The description below covers only Google API language codes. For other APIs, see Xfyun speech config and Baidu speech config.
The speech-to-text language codes are different from the translation language codes because the two APIs differ. And of course, they are in Google's formats, which don't follow the ISO standards, making them even more confusing to use.
To solve this problem, autosub uses langcodes to detect the input language code and convert it to the best match in the language code lists. This is not enabled by default. To enable it for all phases, use the -bm all option.
To manually match or see the full list of the language codes, run the utility with the argument -lsc/--list-speech-codes and -ltc/--list-translation-codes, or open constants.py and check.
To detect the language of the first line of the subtitles file, you can use -dsl.
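For example, the list options can be used to browse the accepted codes before starting a long job (depending on the version they may accept an optional filter argument; check the help message):
autosub -lsc
autosub -ltc
The first prints the speech language codes, e.g. to check whether something like zh-CN is accepted; the second prints the translation language codes.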
- Currently, autosub allows you to send language codes that are not in --list-speech-codes, which means the program won't stop in this case.
- Though you can input whatever speech language code you want, it needs to be pointed out that if you don't use a code from the list and the API somehow accepts it anyway, Google-Speech-v2 recognizes your audio in a way that depends on your IP address, which you can't control. This is a known issue and I have asked for a pull request in the original repo.
- On the other hand, py-googletrans is stricter. When it receives a language code not on its list, it throws an exception and stops the translation.
- Apart from the user input handling, another notable change is that I split the -S option into two parts, -S and -SRC. The -S option is for the speech recognition language code; -SRC is for the translation source language. When -SRC is not given, autosub automatically matches the -S argument using langcodes and gets a best-match language code for the translation source language, though py-googletrans can also auto-detect the source language. Of course you can manually specify one with the -SRC option. -D is for the translation destination language, still the same as before. A combined example is shown after this list.
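To make the split concrete, a command along these lines transcribes with one code and translates from/to others; the codes en-US, en and zh-cn are placeholders for whatever your input actually needs:
autosub -i input_file -S en-US -SRC en -D zh-cn ...(other options)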
The following output formats are currently supported.
OUTPUT_FORMAT = {
'srt': 'SubRip',
'ass': 'Advanced SubStation Alpha',
'ssa': 'SubStation Alpha',
'sub': 'MicroDVD Subtitle',
'mpl2.txt': 'Similar to MicroDVD',
'tmp': 'TMP Player Subtitle Format',
'vtt': 'WebVTT',
'json': 'json(Only times and text)',
'ass.json': 'json(Complex ass content json)',
'txt': 'Plain Text(Text or times)'
}
There are also other subtitle types/output modes, depending on what you need. More info is in the help message, and a combined example follows the mode list below.
DEFAULT_MODE_SET = {
'regions',
'src',
'full-src',
'dst',
'bilingual',
'dst-lf-src',
'src-lf-dst'
}
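For instance, the format and mode options can be combined in one run. The example below assumes the output format option is -F as in the original autosub and uses placeholder language codes; check the help message before copying it:
autosub -i input_file -S lang_code -D lang_code -F ass -of bilingual ...(other options)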
For the original autosub usage, see the Simplified Chinese usage guide (简体中文使用指南).
For the modified alpha branch version, see the typical usage below.
Use the default audio pre-processing.
Pre-processing only.
autosub -i input_file -ap o
Pre-processing as a part.
autosub -i input_file -ap y ...(other options)
Detect regions by using Auditok.
Getting regions only.
autosub -i input_file
Getting regions as a part.
autosub -i input_file -of regions ...(other options)
Get audio fragments according to the regions.
Only get audio fragments according to auto regions detection.
autosub -i input_file -ap s
Only get audio fragments according to external regions.
autosub -i input_file -ap s -er external_regions_subtitles
Getting audio fragments as a part.
autosub -i input_file -k ...(other options)
Transcribe speech audio fragments into speech-language subtitles.
Use default Google-Speech-v2 to transcribe speech language subtitles only.
autosub -i input_file -S lang_code
Use default Google-Speech-v2 to transcribe speech language subtitles as a part.
autosub -i input_file -S lang_code -of src ...(other options)
Use a Google Cloud Speech-to-Text API service account (GOOGLE_APPLICATION_CREDENTIALS has already been set as a system environment variable) to transcribe.
autosub -i input_file -sapi gcsv1 -S lang_code ...(other options)
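If the variable isn't set yet, it can be set for the current shell session before running the command above; the key file paths here are placeholders:
On Windows (cmd):
set GOOGLE_APPLICATION_CREDENTIALS=C:\keys\service_account.json
On Ubuntu (bash):
export GOOGLE_APPLICATION_CREDENTIALS=/home/user/keys/service_account.json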
Use a Google Cloud Speech-to-Text API service account (GOOGLE_APPLICATION_CREDENTIALS set by -sa) to transcribe. (Currently not available in the Nuitka build.)
autosub -i input_file -sapi gcsv1 -S lang_code -sa path_to_key_file ...(other options)
Use Google Cloud Speech-to-Text API key to transcribe.
autosub -i input_file -sapi gcsv1 -S lang_code -skey API_key ...(other options)
Use 48000Hz OGG_OPUS with the Google Cloud Speech-to-Text API. The conversion commands will be automatically adjusted by the code.
autosub -i input_file -sapi gcsv1 -asf .ogg -asr 48000 ...(other options)
Use MP3 with the Google Cloud Speech-to-Text API. (Not recommended because OGG_OPUS is better than MP3.)
autosub -i input_file -sapi gcsv1 -asf .mp3 ...(other options)
Use a customized speech config file to send requests to the Google Cloud Speech API. The config file overrides these options: -S, -asr, -asf.
language_code will be replaced by the best-matching one if the option -bm src or -bm all is used. The encoding string will be replaced by the enum in google.cloud.speech_v1p1beta1.enums.RecognitionConfig.AudioEncoding if service account credentials are used. The default encoding is FLAC. The default sample_rate_hertz is 44100.
Example speech config file:
{
"language_code": "zh",
"enable_word_time_offsets": true
}
If the -asr and -asf options are not provided, this is equal to:
{
"language_code": "zh",
"sample_rate_hertz": 44100,
"encoding": "FLAC",
"enable_word_time_offsets": true
}
otherwise:
{
"language_code": "zh",
"sample_rate_hertz": "from --api-sample-rate",
"encoding": "from --api-suffix",
"enable_word_time_offsets": true
}