HTK Speech Recognition Toolkit

What is HTK?

The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research, although it has been used for numerous other applications, including research into speech synthesis, character recognition and DNA sequencing. HTK is in use at hundreds of sites worldwide.

HTK consists of a set of library modules and tools available in C source form. The tools provide sophisticated facilities for speech analysis, HMM training, testing and results analysis. The software supports HMMs using both continuous density mixture Gaussians and discrete distributions and can be used to build complex HMM systems. The HTK release contains extensive documentation and examples.

HTK was originally developed at the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED), where it has been used to build CUED's large vocabulary speech recognition systems (see CUED HTK LVR). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK, and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999, when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. See History of HTK for more details.

While Microsoft retains the copyright to the original HTK code, everybody is encouraged to make changes to the source code and contribute them for inclusion in HTK3.

Join the HTK Team at CUED

If you are interested in joining the HTK Team to work on software development or speech recognition algorithm research (either as an RA or PhD student), please send an email with your CV to Phil Woodland <pcw@eng.cam.ac.uk>.

Cambridge University Engineering Department (CUED) is currently able to offer a number of well-funded three-year PhD research studentships. For more details, please see the MIL Laboratory jobs page.

Those interested in a research studentship working with the HTK team might also consider first applying for a place on the MPhil in Machine Learning, Speech and Language Technology http://www.mlsalt.eng.cam.ac.uk/

Current releases

HTK version 3.4.1 is the current stable release.

HTK version 3.5 beta is the most recent release.

Getting HTK

HTK is available for free download but you must first agree to this license. You must then register for a username and password which will allow you to download the HTK Book and source code. Registration is free but does require a valid e-mail address; your password for site access will be sent to this address.

HTK News

28 June 2016 (pcw):

We have released another beta (beta 3) of HTK 3.5.
This includes:
- some minor bug fixes
- source code changes to allow easier compilation on Windows (although we haven’t included a Visual Studio setup)
- proper integration of the RNNLM rescoring functions as discussed in the HTK Book for HTK 3.5 (alpha).
Thanks for the feedback and suggestions from various users.
We are still working on more substantial updates to HTK 3.5 to include further functionality.
Phil & the Cambridge HTK Team

31 December 2015 (pcw):

HTK 3.5 beta is released. This can be downloaded from the HTK downloads page. Note that the samples package is now included with the HTK 3.5 beta download. HDecode is still an additional download due to its separate license. HTK 3.4.1 continues to be available.

Key features of HTK 3.5 are described in the 24 August news item; the UK Speech presentation and Interspeech paper referenced there provide further background.

The HTK 3.5 beta source code package has been developed and tested for use on Linux. Only a simple build procedure is included which will require some manual configuration. A more automatic configuration will be available in future as well as support for other platforms. Compilation options include builds for standard CPU; use with Intel MKL libraries; and use with NVIDIA GPUs.

HTK 3.5 also includes a new version of the HTK Book. This is an alpha version of the book and so is incomplete in some places. The HTK Book for HTK 3.5 documents the new features of HTK, including the new tools for acoustic modelling with neural networks and the use of recurrent neural network language models. The book also includes extended tutorial information for using the new HTK features, and a new section of tutorial examples using the Resource Management task that illustrate new (and old) functionality. The scripts that are provided for this task may well be of use more generally.

In future we intend to both extend the functionality of HTK 3.5 with additional neural network models and also include recipes for standard current speech recognition tasks.

24 August 2015 (pcw):

We are currently preparing a new major release, HTK 3.5. The key features of HTK 3.5 are:
- Built-in support for artificial neural network (ANN) models while maintaining compatibility with most existing functions.
  - Flexible input feature configurations
  - ANN structures can be any directed acyclic graph
  - Stochastic gradient descent supporting frame/sequence training
  - CPU/GPU math kernels for ANNs
  - Decoders extended to support both tandem and hybrid systems
- Support for decoding RNN language models
  - Lattice rescoring using RNNLMs
  - Class / Full word outputs, interpolation with n-grams
- 64-bit compatible throughout
- Bug fixes
- Updated documentation and examples
More details of our plans for HTK 3.5 can be found in:
- Slides from the talk "An Overview of HTK V3.5" (invited talk given at the UK Speech Conference, Jul 2015, UEA)
- Chao Zhang and Phil Woodland, "A General Artificial Neural Network Extension for HTK", Proc. Interspeech 2015, Dresden.

13 March 2009 (mjfg):

HTK 3.4.1 has now been released.
- Tutorial examples for discriminative training and HDecode have been added
- HDecode lattice generation has been improved
- Windows build procedure and recipes updated

13 March 2009 (xl):

You are warmly invited again this year to join us for an HTK meeting in Taipei during ICASSP 2009. The meeting will be held from 6pm to 8:30pm, Thursday, 23rd April in meeting room 203A of the Taipei International Convention Center (TICC). We will have a short presentation covering new features in the 3.4.1 release. Some examples of using the HTK large vocabulary decoder and discriminative training tools will also be shown. This will be followed by an open discussion and networking. We will also provide some liquid refreshments.
Please feel free to forward this announcement to other researchers interested in HTK.
We hope to see many of you in Taipei.
Mark Gales, Andrew Liu, Anton Ragni and Phil Woodland

13 March 2007 (xl):

As many of you will know, the Cambridge University speech group traditionally organises an HTK meeting at ICASSP. These meetings are intended to provide a forum for users of HTK and other researchers interested in speech recognition toolkits to exchange ideas and discuss future plans. We'd like to invite you to join us at this year's meeting in Hawaii.
The meeting will be held from 6pm to 10pm, Thursday, 19th April 2007 in the "Ilima" meeting room, Hotel Ala Moana (right across Atkinson Drive, opposite the Hawaii Convention Center).
We will have a short presentation covering:
- overview of new features in the 3.4 release
- support of STC/HLDA linear transformations
- large vocabulary decoder
- discriminative training tools
followed by an open discussion and networking. We will also provide some liquid refreshments.
Please feel free to forward this announcement to other researchers interested in HTK.
We hope to see many of you in Honolulu.
Andrew Liu, Mark Gales and Phil Woodland

13 December 2006 (mjfg):

HTK 3.4 has now been released.
- HMMIRest has been added as a tool for discriminative training
- HERest now supports estimation of HLDA and semi-tied transforms
- HDecode is a large vocabulary decoder available as an extension to HTK

23 February 2006 (jal):

ICASSP Tutorial: Recent advances in large vocabulary continuous speech recognition: An HTK perspective

25 July 2005 (jal):

HTK 3.3 has now been released.
- HERest now incorporates the adaptation transform generation that was previously performed in HEAdapt. A range of linear transformations and the ability to combine transforms hierarchically have now been included. The system also now supports adaptive training with constrained MLLR transforms.
- Many other smaller changes and bug fixes have been integrated
- Build procedure for Linux/Unix now uses autoconf

02 June 2004 (ge):

A new version (1.4) of the ATK Real-Time API for HTK is available for download. For further information, see the ATK page.