aeneas Library Tutorial
Overview
Although a majority of aeneas users work with the built-in command line tools,
aeneas is primarily designed to be used as a Python library.
Even the aeneas.tools can be used programmatically,
thanks to their standard I/O interface.
Example
Create a Task and process it, outputting the resulting sync map to file:
#!/usr/bin/env python
# coding=utf-8
from aeneas.executetask import ExecuteTask
from aeneas.task import Task
# create Task object
config_string = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config_string)
task.audio_file_path_absolute = u"/path/to/input/audio.mp3"
task.text_file_path_absolute = u"/path/to/input/plain.txt"
task.sync_map_file_path_absolute = u"/path/to/output/syncmap.json"
# process Task
ExecuteTask(task).execute()
# output sync map to file
task.output_sync_map_file()
You can also use ExecuteTaskCLI:
#!/usr/bin/env python
# coding=utf-8
from aeneas.tools.execute_task import ExecuteTaskCLI
ExecuteTaskCLI(use_sys=False).run(arguments=[
    None,  # dummy program name argument
    u"/path/to/input/audio.mp3",
    u"/path/to/input/plain.txt",
    u"task_language=eng|is_text_type=plain|os_task_file_format=json",
    u"/path/to/output/syncmap.json"
])
Clearly, you can also manipulate objects programmatically.
Example
Create a Task, process it, and print all fragments in the resulting sync map whose duration is less than five seconds:
#!/usr/bin/env python
# coding=utf-8
from aeneas.executetask import ExecuteTask
from aeneas.task import Task
# create Task object
config_string = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config_string)
task.audio_file_path_absolute = u"/path/to/input/audio.mp3"
task.text_file_path_absolute = u"/path/to/input/plain.txt"
# process Task
ExecuteTask(task).execute()
# print fragments with a duration < 5 seconds
for fragment in task.sync_map_leaves():
    if fragment.length < 5.0:
        print(fragment)
Instead of passing around configuration strings, you can set properties explicitly, using the library functions and constants.
Example
Create a Task, process it, and print the resulting sync map:
#!/usr/bin/env python
# coding=utf-8
from aeneas.exacttiming import TimeValue
from aeneas.executetask import ExecuteTask
from aeneas.language import Language
from aeneas.syncmap import SyncMapFormat
from aeneas.task import Task
from aeneas.task import TaskConfiguration
from aeneas.textfile import TextFileFormat
import aeneas.globalconstants as gc
# create Task object
config = TaskConfiguration()
config[gc.PPN_TASK_LANGUAGE] = Language.ENG
config[gc.PPN_TASK_IS_TEXT_FILE_FORMAT] = TextFileFormat.PLAIN
config[gc.PPN_TASK_OS_FILE_FORMAT] = SyncMapFormat.JSON
task = Task()
task.configuration = config
task.audio_file_path_absolute = u"/path/to/input/audio.mp3"
task.text_file_path_absolute = u"/path/to/input/plain.txt"
# process Task
ExecuteTask(task).execute()
# print produced sync map
print(task.sync_map)
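The configuration strings used in the first two examples are sequences of key=value pairs joined by the | character. The following is a minimal sketch of how such a string could be assembled or split in your own code. The helper names build_config_string and parse_config_string are hypothetical and are not part of aeneas, which performs its own parsing and validation.

```python
# Hypothetical helpers, not part of aeneas: they only illustrate
# the "key=value|key=value" layout of a configuration string.


def build_config_string(parameters):
    """Join (key, value) pairs into a 'key=value|key=value' string."""
    return u"|".join(u"%s=%s" % (key, value) for key, value in parameters)


def parse_config_string(config_string):
    """Split a 'key=value|key=value' string into a dictionary."""
    pairs = [
        item.split(u"=", 1)
        for item in config_string.split(u"|")
        if u"=" in item
    ]
    return dict((key, value) for key, value in pairs)


config_string = build_config_string([
    (u"task_language", u"eng"),
    (u"is_text_type", u"plain"),
    (u"os_task_file_format", u"json"),
])
print(config_string)
print(parse_config_string(config_string))
```

Building the string from a list of pairs keeps the parameter order explicit and avoids typos in the separators.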
Dependencies
- numpy (v1.9 or later)
- lxml (v3.6.0 or later)
- BeautifulSoup (v4.5.1 or later)
Only numpy is actually needed, as it is heavily used for the alignment computation.
The other two dependencies (lxml and BeautifulSoup) are needed
only if you use XML-like input or output formats.
However, since they are popular Python packages, to avoid complex import testing,
they are listed as requirements.
This choice might change in the future.
Depending on which aeneas classes you want to use,
you might need to install the following optional dependencies:
- boto3 (for using the AWS Polly TTS API wrapper)
- requests (for using the Nuance TTS API wrapper)
- Pillow (for plotting waveforms with plotter)
- tgt (for outputting sync maps to TextGrid format)
- youtube-dl (for downloading audio from the Internet with Downloader)
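To see which of the packages listed above are available in your environment, you can probe for their import names. The following is a minimal sketch, assuming Python 3 (it relies on importlib.util.find_spec) and the usual import names of these distributions (for example, bs4 for BeautifulSoup and PIL for Pillow). The helper names is_installed and missing_packages are made up for this illustration.

```python
import importlib.util

# Map each distribution name to the module name it is imported as.
REQUIRED = {"numpy": "numpy", "lxml": "lxml", "BeautifulSoup": "bs4"}
OPTIONAL = {
    "boto3": "boto3",
    "requests": "requests",
    "Pillow": "PIL",
    "tgt": "tgt",
    "youtube-dl": "youtube_dl",
}


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_packages(packages):
    """Return the sorted distribution names whose module cannot be found."""
    return sorted(
        name for name, module in packages.items() if not is_installed(module)
    )


print("missing required:", missing_packages(REQUIRED))
print("missing optional:", missing_packages(OPTIONAL))
```

The check only locates the modules and does not verify the minimum versions listed above.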
Speeding Critical Sections Up: Python C/C++ Extensions
Forced alignment is a computationally demanding task, both CPU-intensive and memory-intensive. Aligning a dozen minutes of audio might require an hour if done with pure Python code.
Hence, critical sections of the alignment code are written as Python C/C++ extensions, that is, C/C++ code that receives input from Python code, performs the heavy computation, and returns results to the Python code. The rule of thumb is that the C/C++ code only performs “computation-like”, low-level functions, while “house-keeping”, high-level functions are done in Python land.
With this approach, aligning a dozen minutes of audio
requires only a few seconds, and even aligning hours of audio
can be done in a few minutes.
The drawback is that your environment must be able to compile
Python C/C++ extensions. If you install aeneas via PyPI
(e.g., pip install aeneas), the compilation step is done automatically for you.
Warning
Due to the Python C/C++ extension compile and setup mechanism,
you must install numpy before installing aeneas,
and there is no (sane) way for the aeneas setup.py
to install numpy before compiling the aeneas source code.
Hence, you really need to (manually) install numpy before installing aeneas.
Hopefully this inconvenience will be removed in the future.
The Python C/C++ extensions included in aeneas are:
- aeneas.cdtw, for computing the DTW;
- aeneas.cew, for synthesizing text via the eSpeak C API;
- aeneas.cfw, for synthesizing text via the Festival C++ API;
- aeneas.cmfcc, for computing an MFCC representation of a WAVE (RIFF) audio file;
- aeneas.cwave, for reading WAVE (RIFF) audio files.
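To give a sense of why the DTW step benefits from a compiled extension, here is a textbook dynamic time warping (DTW) cost computation in pure Python. It is only an illustrative sketch over sequences of numbers, and it is not the algorithm implemented in aeneas.cdtw. The function name dtw_cost is hypothetical.

```python
def dtw_cost(a, b):
    """Return the minimum cumulative DTW cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local distance between the two current elements
            distance = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor cells
            cost[i][j] = distance + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
print(dtw_cost([0, 0], [1, 1]))
```

The nested loops visit every cell of an n-by-m matrix, so both run time and memory grow with the product of the two sequence lengths. This is why long audio files are expensive to align in pure Python.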
Note
Currently aeneas.cew is available on Linux, Mac OS X, and Windows.
On Windows 64 bit it does not seem to work, probably because
eSpeak is available only as a 32 bit program/library,
and hence aeneas will fall back to running the pure Python code.
Starting with v1.5.0, the pure Python code
for synthesizing text with eSpeak via subprocess
is only 2-3 times slower than aeneas.cew.
Unless you work with thousands of text fragments,
the performance difference is negligible.
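The fallback behavior described in this note can be pictured with a generic import-with-fallback pattern. This is only an illustrative sketch, not how aeneas selects its backend internally. The helper load_first_available and the module name fast_c_backend are hypothetical, and json merely stands in for a pure Python implementation.

```python
import importlib


def load_first_available(module_names):
    """Return the first module in module_names that can be imported, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # this candidate is not available: try the next one
            continue
    return None


# "fast_c_backend" is a made-up name that is expected to be missing;
# "json" plays the role of the always-available pure Python fallback.
backend = load_first_available(["fast_c_backend", "json"])
print(backend.__name__)
```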
Note
Currently aeneas.cfw is experimental and disabled by default.
Probably it works only on Linux.
To compile it, make sure you have installed
the Festival and speech_tools libraries
(e.g., install the festival-dev package on DEB-based OSes) and
set the environment variable AENEAS_FORCE_CFW=True
before running pip install aeneas or python setup.py.
Note
Currently aeneas.cwave is not used.
It will be enabled in a future version of aeneas.
Concepts
Except for “enumeration” classes (e.g., TextFileFormat) and
“data-only” classes (e.g., TextFragment), most classes
are subclasses of Loggable,
which provides the ability to log events using a shared
Logger object (logger),
and to inject runtime execution parameters using a shared
RuntimeConfiguration object (rconf).
The logger can tee (i.e., store messages and print them to stdout)
or dump to file.
The rconf provides a way to fine tune aeneas
by changing its internal behavior.
The library defaults should be fine for most use cases,
and they do not require explicitly passing an rconf object.
Example
Process a task with custom parameters, and log messages:
from aeneas.exacttiming import TimeValue
from aeneas.executetask import ExecuteTask
from aeneas.logger import Logger
from aeneas.runtimeconfiguration import RuntimeConfiguration
# create Logger which logs and tees
logger = Logger(tee=True)
# create RuntimeConfiguration object, with custom MFCC length and shift
rconf = RuntimeConfiguration()
rconf[RuntimeConfiguration.MFCC_WINDOW_LENGTH] = TimeValue(u"0.150")
rconf[RuntimeConfiguration.MFCC_WINDOW_SHIFT] = TimeValue(u"0.050")
# create Task object
task = ...
# process Task with custom parameters
ExecuteTask(task, rconf=rconf, logger=logger).execute()
If you read from/write to file, you should be fine
interacting only with Task functions.
For example, setting a path in audio_file_path_absolute
(resp., text_file_path_absolute)
forces the library to load the given file,
and to create an AudioFile (resp., TextFile)
object behind the scenes, storing it inside the Task object.
However, you can also build, e.g., your own TextFile
and then assign it to your Task.
Example
Create a TextFile programmatically, and assign it to Task:
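from aeneas.language import Language
from aeneas.task import Task
from aeneas.textfile import TextFile, TextFragment
# create an empty Task and an empty TextFile
task = Task()
textfile = TextFile()
# each fragment has an identifier and a list of text lines
for identifier, frag_text in [
    (u"f001", [u"first fragment"]),
    (u"f002", [u"second fragment"]),
    (u"f003", [u"third fragment"])
]:
    textfile.add_fragment(TextFragment(identifier, Language.ENG, frag_text, frag_text))
# assign the TextFile to the Task
task.text_file = textfile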
Starting with v1.5.0, both TextFile and SyncMap
are backed by the Tree structure, which can represent multilevel I/O files.
Both have a “virtual” (empty) root node, to which the “level 1” nodes
are attached.
Note that single-level text files and sync maps are a special case,
where only “level 1” nodes are present, producing a tree with a root node
and a list of children, effectively equivalent to the “list” structure pre-v1.5.0.
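The idea of a virtual root with “level 1” children can be sketched with a tiny tree class. The Node class below is hypothetical and much simpler than the actual aeneas Tree. It only illustrates why a single-level file behaves like a plain list, while a multilevel file exposes its innermost fragments as leaves.

```python
class Node(object):
    """A minimal tree node (illustrative only, not the aeneas Tree class)."""

    def __init__(self, value=None):
        self.value = value
        self.children = []

    def add_child(self, node):
        """Attach node as the last child and return it."""
        self.children.append(node)
        return node

    def leaves(self):
        """Yield the leaf nodes below this node, from left to right."""
        for child in self.children:
            if child.children:
                for leaf in child.leaves():
                    yield leaf
            else:
                yield child


# single-level case: the empty root holds a flat list of fragments
root = Node()
for text in [u"first fragment", u"second fragment", u"third fragment"]:
    root.add_child(Node(text))
single_level = [leaf.value for leaf in root.leaves()]
print(single_level)

# multilevel case: a paragraph node containing two sentence nodes
multi = Node()
paragraph = multi.add_child(Node(u"paragraph 1"))
paragraph.add_child(Node(u"sentence 1"))
paragraph.add_child(Node(u"sentence 2"))
print([leaf.value for leaf in multi.leaves()])
```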
Miscellanea
- Ensuring that all the strings you pass to aeneas are Unicode strings will save you a lot of headaches. If you read from files, be sure they are encoded using UTF-8.
- You can use any audio file format that is supported by ffprobe and ffmpeg. If unsure, just try to play your audio file on the console: if it works there, it should work inside aeneas too.
- Enumeration classes usually have an ALLOWED_VALUES class member, which lists all the allowed values. This list is used, for example, by the validator to check input values.
- Most classes are optimized for reducing memory consumption. For example, if you create an AudioFileMFCC with a file path, the input audio file will be converted to a temporary WAVE file, audio samples will be read into memory, MFCCs will be computed, and then audio data will be discarded from memory and the temporary WAVE file will be deleted, keeping only the MFCC matrix in memory. If you prefer persistence, you need to build intermediate objects yourself (i.e., FFMPEGWrapper, AudioFile, etc.) and properly dispose of them in your code.
- Wherever possible, NumPy views are used to avoid data copying. Similarly, built-in NumPy functions are used to improve run time.
- To avoid numerical issues, always use TimeValue to hold time values with arbitrary precision. Note that doing so incurs a negligible execution slowdown, because the heaviest computations are done with integer NumPy indices and arrays and the transformation to TimeValue takes place only when the sync map is output to file.
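The numerical issues mentioned in the last point can be illustrated with the standard library alone. The sketch below uses decimal.Decimal as a stand-in, under the assumption that TimeValue offers comparable exact decimal arithmetic. Check the TimeValue documentation of your aeneas version for its actual behavior.

```python
from decimal import Decimal

# binary floating point cannot represent 0.1 and 0.2 exactly
float_sum = 0.1 + 0.2
# decimal values built from strings are stored exactly
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)                # False
print(decimal_sum == Decimal("0.3"))   # True
```

Building the values from strings, as the rconf example does with TimeValue(u"0.150"), avoids importing a rounding error from a float literal.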
Package aeneas
The main aeneas package contains several subpackages:
- aeneas.cdtw (Python C extension)
- aeneas.cew (Python C extension)
- aeneas.cfw (Python C++ extension)
- aeneas.cmfcc (Python C extension)
- aeneas.cwave (Python C extension)
- aeneas.extra
- aeneas.syncmap
- aeneas.tests
- aeneas.tools
- aeneas.ttswrappers
and the following modules:
- adjustboundaryalgorithm
- analyzecontainer
- audiofile
- audiofilemfcc
- cewsubprocess
- configuration
- container
- diagnostics
- downloader
- dtw
- exacttiming
- executejob
- executetask
- ffmpegwrapper
- ffprobewrapper
- globalconstants
- globalfunctions
- hierarchytype
- idsortingalgorithm
- job
- language
- logger
- mfcc
- plotter
- runtimeconfiguration
- sd
- syncmap
- synthesizer
- task
- textfile
- vad
- validator
Package aeneas.extra
The aeneas.extra package contains some extra Python source files
which provide experimental and not officially supported functions,
mainly custom, not built-in TTS engine wrappers.
For example, if you want to write your own custom TTS engine wrapper,
have a look at the aeneas/extra/ctw_espeak.py source file,
which is heavily commented and should be easy to modify for your own TTS engine.
Package aeneas.tests
The aeneas.tests package contains the unit test files for aeneas.
Resources needed to run the tests,
for example audio and text files,
are located in the aeneas/tests/res/ directory.
Package aeneas.tools
The aeneas.tools package contains the built-in command line tools for aeneas.
The two main tools are:
- aeneas.tools.execute_job
- aeneas.tools.execute_task
which are described in the aeneas Built-in Command Line Tools Tutorial.
Moreover, the aeneas.tools package also contains the following programs,
useful for debugging or converting between different file formats:
- aeneas.tools.convert_syncmap: convert a sync map from one format to another
- aeneas.tools.download: download a file from a Web resource (currently, audio from a YouTube video)
- aeneas.tools.extract_mfcc: extract MFCCs from a monaural WAVE file
- aeneas.tools.ffmpeg_wrapper: a wrapper around ffmpeg
- aeneas.tools.ffprobe_wrapper: a wrapper around ffprobe
- aeneas.tools.plot_waveform: plot a waveform and sets of labels to file
- aeneas.tools.read_audio: read the properties of an audio file
- aeneas.tools.read_text: read a text file and show the extracted text fragments
- aeneas.tools.run_sd: read an audio file and the corresponding text file and detect the audio head/tail
- aeneas.tools.run_vad: read an audio file and compute speech/nonspeech time intervals
- aeneas.tools.synthesize_text: synthesize several text fragments read from file into a single WAVE file
- aeneas.tools.validate: validate a job container or configuration strings/files
Run each program without arguments to get its help manual and usage examples.
Resources needed to run the live examples,
for example audio and text files,
are located in the aeneas/tools/res/ directory.
The package also contains the aeneas.tools.hydra script,
which can run any of the tools listed above.
Run it without arguments to get its manual.
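Since each tool is a regular Python module, it can also be launched as a separate process from your own code. The following is a minimal sketch that only assembles the command line for aeneas.tools.execute_task, using the same argument order as the ExecuteTaskCLI example in the Overview. The helper name build_execute_task_command is hypothetical, and the paths are placeholders.

```python
import sys


def build_execute_task_command(audio_path, text_path, config_string, output_path):
    """Return the argument list for running execute_task in a child process."""
    return [
        sys.executable,               # the current Python interpreter
        "-m",
        "aeneas.tools.execute_task",
        audio_path,
        text_path,
        config_string,
        output_path,
    ]


command = build_execute_task_command(
    u"/path/to/input/audio.mp3",
    u"/path/to/input/plain.txt",
    u"task_language=eng|is_text_type=plain|os_task_file_format=json",
    u"/path/to/output/syncmap.json",
)
print(command)

# To actually run it (requires aeneas to be installed and real input files):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than as a single shell string, means the | characters in the configuration string are not interpreted by a shell.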
Package aeneas.ttswrappers
The aeneas.ttswrappers package contains the wrappers for
several built-in TTS engines which can be used
in the synthesis step of the alignment procedure.