This is an original article from AI Front.
Wav2letter: Facebook's open-source, end-to-end automatic speech recognition system


Translator | Xue Mingdeng


Editor | Natalie

Wav2letter is a simple and efficient end-to-end automatic speech recognition system developed by the Facebook AI Research team. It implements the architectures described in two papers: "Wav2Letter: an End-to-End ConvNet-based Speech Recognition System" and "Letter-Based Speech Recognition with Gated ConvNets".

Paper links:

(1) Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

Arxiv.org/abs/1609.03…

(2) Letter-Based Speech Recognition with Gated ConvNets

Arxiv.org/abs/1712.09…


Citation

If you use wav2letter or one of its pre-trained models, please cite both of the papers listed above.


Requirements

  • A computer running macOS or Linux.
  • The Torch framework.
  • For training on CPU: Intel MKL.
  • For training on GPU: NVIDIA CUDA Toolkit (cuDNN v5.1 for CUDA 8.0).
  • For reading audio files: libsndfile.
  • For standard speech features: FFTW.
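
Before installing anything, it can help to check which of these prerequisites are already present. The following is a minimal Python sketch, not part of wav2letter: the function name check_prerequisites is hypothetical, the library names "sndfile" and "fftw3" are assumed to match how libsndfile and FFTW are installed on your system, and the presence of nvcc is used only as a rough hint that the CUDA Toolkit is installed.

```python
import ctypes.util
import platform
import shutil


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False."""
    report = {}
    # The article lists macOS ("Darwin") and Linux as supported platforms.
    report["os"] = platform.system() in ("Darwin", "Linux")
    # find_library returns None when the shared library cannot be located.
    # The names "sndfile" and "fftw3" are assumptions about the install.
    report["libsndfile"] = ctypes.util.find_library("sndfile") is not None
    report["fftw"] = ctypes.util.find_library("fftw3") is not None
    # nvcc ships with the CUDA Toolkit; it is only relevant for GPU training.
    report["cuda"] = shutil.which("nvcc") is not None
    return report


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" result only means the check could not locate the component in the default search paths; it may still be installed in a custom location.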

Installation

MKL

If you plan to train on a CPU, it is highly recommended to install Intel MKL.

Edit your .bashrc file and add the following content:

LuaJIT and LuaRocks

Next, install LuaJIT and LuaRocks in the $HOME/usr directory. For a system-wide installation, remove the -DCMAKE_INSTALL_PREFIX=$HOME/usr option.

The rest of this guide assumes that luarocks and luajit are on your $PATH. If they are not, call them by their full path. For example, if they are installed under $HOME/usr, call ~/usr/bin/luarocks and ~/usr/bin/luajit.
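
The lookup rule above (use $PATH first, otherwise fall back to ~/usr/bin) can be sketched in Python. This is an illustrative helper only: the name resolve_tool and its fallback_dir parameter are hypothetical and not part of wav2letter.

```python
import os
import shutil


def resolve_tool(name, fallback_dir="~/usr/bin"):
    """Return a path to `name`, preferring $PATH over the fallback directory.

    Returns None when the tool is found in neither place.
    """
    found = shutil.which(name)
    if found:
        return found
    candidate = os.path.join(os.path.expanduser(fallback_dir), name)
    # Accept the fallback only if it is an executable regular file.
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    for tool in ("luajit", "luarocks"):
        print(tool, "->", resolve_tool(tool) or "not found")
```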

KenLM language model toolkit

To use the wav2letter decoder, you need to install KenLM, which in turn requires Boost.


OpenMPI and TorchMPI

OpenMPI and TorchMPI are needed if you plan to train your model on multiple CPUs/GPUs (or multiple machines).

Tip: It is recommended to compile OpenMPI from source. The binaries shipped by standard distributions are built with widely varying compilation flags, and some of those flags are critical for compiling and running TorchMPI successfully.

Install OpenMPI first:

Note: openmpi-3.0.0.tar.bz2 can also be used, but the --enable-mpi-thread-multiple option must then be removed.

TorchMPI can now be installed:

Torch and other libraries

The wav2letter package


Training a wav2letter model

Data preprocessing

The data directory contains scripts for preprocessing various datasets. Currently only LibriSpeech and TIMIT are supported.

Here is an example of how to preprocess the LibriSpeech ASR corpus:

Training

Training on multiple GPUs

Use OpenMPI to launch multiple training processes, one per GPU. This assumes that mpirun is on your $PATH.

Running the decoder (inference)

Before running the decoder, we need to do some pre-processing.

First, create a letter dictionary that includes the special repetition letters used by wav2letter.

Next, obtain a language model and preprocess it. Here we use the pre-trained LibriSpeech language model, but you can also train your own with KenLM. Preprocessing converts the words to lowercase and generates their letter transcriptions, using the repetition letters. The preprocessing script may warn that some words cannot be transcribed correctly because there are not enough repetition letters (2 in this case, because of the -r 2 argument). Such words are rare, so this is not a problem in practice.
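
To illustrate the idea of repetition letters, here is a minimal Python sketch of one plausible encoding. It is an assumption for illustration, not the scheme implemented by the wav2letter preprocessing script: the digits "1" and "2" stand in for the repetition letters, the function names are hypothetical, and long runs are split here, whereas the actual script reports such words with a warning.

```python
def encode_repetitions(word, max_rep=2):
    """Lowercase `word` and replace repeated letters with a repetition symbol.

    A letter followed by k extra copies of itself becomes letter + str(k),
    with k capped at max_rep (assumed to mirror the -r argument).
    """
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        # Count the length of the run of `ch` starting at position i.
        run = 1
        while i + run < len(word) and word[i + run] == ch:
            run += 1
        extra = min(run - 1, max_rep)
        out.append(ch)
        if extra > 0:
            out.append(str(extra))
        # Consume the letter and the copies covered by the symbol.
        i += 1 + extra
    return "".join(out)


def longest_run(word):
    """Return the length of the longest run of identical letters."""
    best = current = 0
    previous = None
    for ch in word.lower():
        current = current + 1 if ch == previous else 1
        best = max(best, current)
        previous = ch
    return best


if __name__ == "__main__":
    for w in ("HELLO", "cat", "aaaa"):
        print(w, "->", encode_repetitions(w), "| longest run:", longest_run(w))
```

Under this sketch, a word whose longest run exceeds max_rep + 1 cannot be covered by a single repetition symbol, which is presumably the kind of word the warning refers to.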

Note: 4gram.arpa.gz can be used instead, but it takes longer to preprocess.

Optional step: converting the model to a binary file with KenLM speeds up subsequent loading. This assumes that the KenLM tools are on your $PATH:
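
As a rough sketch, the conversion could be driven from Python by calling KenLM's build_binary tool, which takes an ARPA file and an output path. The helper names and the file names below are hypothetical placeholders, not the names used in the original instructions.

```python
import shutil
import subprocess


def build_binary_command(arpa_path, binary_path, tool="build_binary"):
    """Return the argument list for KenLM's build_binary tool."""
    return [tool, arpa_path, binary_path]


def convert_to_binary(arpa_path, binary_path):
    """Convert an ARPA language model to KenLM's binary format."""
    if shutil.which("build_binary") is None:
        raise FileNotFoundError("build_binary was not found on $PATH")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_binary_command(arpa_path, binary_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_binary_command("lm.arpa", "lm.bin")))
```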

Now run test.lua on the dataset. The script also displays the Letter Error Rate (LER) and the Word Error Rate (WER). The WER reported here is computed directly from the acoustic model output, without any post-processing.
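
For readers unfamiliar with these metrics, the sketch below shows how LER and WER are commonly defined: the edit distance between the reference and the hypothesis, divided by the reference length, computed over letters or over words. It is a generic illustration and not the code used by test.lua. Counting spaces as letters is an assumption made here for simplicity.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def word_error_rate(ref_text, hyp_text):
    """WER: error rate over whitespace-separated words."""
    return error_rate(ref_text.split(), hyp_text.split())


def letter_error_rate(ref_text, hyp_text):
    """LER: error rate over characters (spaces included, by assumption)."""
    return error_rate(list(ref_text), list(hyp_text))


if __name__ == "__main__":
    ref, hyp = "the cat sat", "the cat sit"
    print("WER:", word_error_rate(ref, hyp))
    print("LER:", letter_error_rate(ref, hyp))
```

In this example one of three words is wrong and one of eleven characters is wrong, so the WER is much higher than the LER, which is typical for letter-based systems.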

Next run the decoder:


Pre-trained model

We provide a pre-trained model for LibriSpeech:

To use the pre-trained model, follow the installation and decoding steps described in the README.

Note: The pre-trained model was trained on Facebook's infrastructure, so you need to pass the appropriate parameters to test.lua when running it:
