.. _libtutorial: aeneas Library Tutorial ======================= Overview ~~~~~~~~ Although a majority of ``aeneas`` users work with the built-in command line tools, ``aeneas`` is primarily designed for being used as a Python library. Even the ``aeneas.tools`` can be used programmatically, thanks to their standard I/O interface. .. Topic:: Example Create a Task and process it, outputting the resulting sync map to file: .. code-block:: python #!/usr/bin/env python # coding=utf-8 from aeneas.executetask import ExecuteTask from aeneas.task import Task # create Task object config_string = u"task_language=eng|is_text_type=plain|os_task_file_format=json" task = Task(config_string=config_string) task.audio_file_path_absolute = u"/path/to/input/audio.mp3" task.text_file_path_absolute = u"/path/to/input/plain.txt" task.sync_map_file_path_absolute = u"/path/to/output/syncmap.json" # process Task ExecuteTask(task).execute() # output sync map to file task.output_sync_map_file() You can also use :class:`~aeneas.tools.execute_task.ExecuteTaskCLI`: .. code-block:: python #!/usr/bin/env python # coding=utf-8 from aeneas.tools.execute_task import ExecuteTaskCLI ExecuteTaskCLI(use_sys=False).run(arguments=[ None, # dummy program name argument u"/path/to/input/audio.mp3", u"/path/to/input/plain.txt", u"task_language=eng|is_text_type=plain|os_task_file_format=json", u"/path/to/output/syncmap.json" ]) Clearly, you can also manipulate objects programmatically. .. Topic:: Example Create a Task, process it, and print all fragments in the resulting sync map whose duration is less than five seconds: .. code-block:: python #!/usr/bin/env python # coding=utf-8 from aeneas.executetask import ExecuteTask from aeneas.task import Task # create Task object config_string = u"task_language=eng|is_text_type=plain|os_task_file_format=json" task = Task(config_string=config_string) task.audio_file_path_absolute = u"/path/to/input/audio.mp3" task.text_file_path_absolute = u"/path/to/input/plain.txt" # process Task ExecuteTask(task).execute() # print fragments with a duration < 5 seconds for fragment in task.sync_map_leaves(): if fragment.length < 5.0: print(fragment) Instead of passing around configuration strings, you can set properties explicitly, using the library functions and constants. .. Topic:: Example Create a Task, process it, and print the resulting sync map: .. code-block:: python #!/usr/bin/env python # coding=utf-8 from aeneas.exacttiming import TimeValue from aeneas.executetask import ExecuteTask from aeneas.language import Language from aeneas.syncmap import SyncMapFormat from aeneas.task import Task from aeneas.task import TaskConfiguration from aeneas.textfile import TextFileFormat import aeneas.globalconstants as gc # create Task object config = TaskConfiguration() config[gc.PPN_TASK_LANGUAGE] = Language.ENG config[gc.PPN_TASK_IS_TEXT_FILE_FORMAT] = TextFileFormat.PLAIN config[gc.PPN_TASK_OS_FILE_FORMAT] = SyncMapFormat.JSON task = Task() task.configuration = config task.audio_file_path_absolute = u"/path/to/input/audio.mp3" task.text_file_path_absolute = u"/path/to/input/plain.txt" # process Task ExecuteTask(task).execute() # print produced sync map print(task.sync_map) Dependencies ------------ * ``numpy`` (v1.9 or later) * ``lxml`` (v3.6.0 or later) * ``BeautifulSoup`` (v4.5.1 or later) Only ``numpy`` is actually needed, as it is heavily used for the alignment computation. The other two dependencies (``lxml`` and ``BeautifulSoup``) are needed only if you use XML-like input or output formats. However, since they are popular Python packages, to avoid complex import testing, they are listed as requirements. This choice might change in the future. Depending on what ``aeneas`` classes you want to use, you might need to install the following optional dependencies: * ``boto3`` (for using the AWS Polly TTS API wrapper) * ``requests`` (for using the Nuance TTS API wrapper) * ``Pillow`` (for plotting waveforms with :mod:`~aeneas.plotter`) * ``tgt`` (for outputting sync maps to TextGrid format) * ``youtube-dl`` (for downloading audio from Internet with :class:`~aeneas.downloader.Downloader`) Speeding Critical Sections Up: Python C/C++ Extensions ------------------------------------------------------ Forced alignment is a computationally demanding task, both CPU-intensive and memory-intensive. Aligning a dozen minutes of audio might require an hour if done with pure Python code. Hence, critical sections of the alignment code are written as Python C/C++ extensions, that is, C/C++ code that receives input from Python code, performs the heavy computation, and returns results to the Python code. The rule of thumb is that the C/C++ code only perform "computation-like", low-level functions, while "house-keeping", high-level functions are done in Python land. With this approach, aligning a dozen minutes of audio requires only few seconds, and even aligning hours of audio can be done in few minutes. The drawback is that your environment must be able to compile Python C/C++ extensions. If you install ``aeneas`` via ``PyPI`` (e.g., ``pip install aeneas``), the compilation step is done automatically for you. .. warning:: Due to the Python C/C++ extension compile and setup mechanism, you must install ``numpy`` before installing ``aeneas``, and there is no (sane) way for the ``aeneas`` ``setup.py`` to install ``numpy`` before compiling the ``aeneas`` source code. Hence, you really need to (manually) install ``numpy`` before installing ``aeneas``. Hopefully this inconvenience will be removed in the future. The Python C/C++ extensions included in ``aeneas`` are: .. toctree:: :maxdepth: 3 cdtw cew cfw cmfcc cwave * :mod:`aeneas.cdtw`, for computing the DTW; * :mod:`aeneas.cew`, for synthesizing text via the ``eSpeak`` C API; * :mod:`aeneas.cfw`, for synthesizing text via the ``Festival`` C++ API; * :mod:`aeneas.cmfcc`, for computing a MFCC representation of a WAVE (RIFF) audio file; * :mod:`aeneas.cwave`, for reading WAVE (RIFF) audio files. .. note:: Currently :mod:`aeneas.cew` is available on Linux, Mac OS X, and Windows. On Windows 64 bit it does not seem to work, probably because eSpeak is available only as a 32 bit program/library, and hence ``aeneas`` will fall back to run the pure Python code. Starting with v1.5.0, the pure Python code for synthesizing text with eSpeak via ``subprocess`` is only 2-3 times slower than :mod:`aeneas.cew`. Unless you work with thousands of text fragments, the performance difference is negligible. .. note:: Currently :mod:`aeneas.cfw` is experimental and disabled by default. Probably it works only on Linux. To compile it, make sure you have installed the ``Festival`` and ``speech_tools`` libraries (e.g., install the ``festival-dev`` package on DEB-based OSes) and set the environment variable ``AENEAS_FORCE_CFW=True`` before running ``pip install aeneas`` or ``python setup.py``. .. note:: Currently :mod:`aeneas.cwave` is not used. It will be enabled in a future version of ``aeneas``. Concepts -------- Except for "enumeration" classes (e.g., :class:`~aeneas.textfile.TextFileFormat`) and "data-only" classes (e.g., :class:`~aeneas.textfile.TextFragment`), most classes are subclasses of :class:`~aeneas.logger.Loggable`, which provides the ability to log events using a shared :class:`~aeneas.logger.Logger` object (``logger``), and to inject runtime execution parameters using a shared :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` object (``rconf``). The ``logger`` can tee (i.e., store messages and print them to stdout) or dump to file. The ``rconf`` provides a way to fine tune ``aeneas`` by changing its internal behavior. The library defaults should fine for most use cases, and they do not require explicitly passing an ``rconf`` object. .. Topic:: Example Process a task with custom parameters, and log messages: .. code-block:: python # create Logger which logs and tees logger = Logger(tee=True) # create RuntimeConfiguration object, with custom MFCC length and shift rconf = RuntimeConfiguration() rconf[RuntimeConfiguration.MFCC_WINDOW_LENGTH] = TimeValue(u"0.150") rconf[RuntimeConfiguration.MFCC_WINDOW_SHIFT] = TimeValue(u"0.050") # create Task object task = ... # process Task with custom parameters ExecuteTask(task, rconf=rconf, logger=logger).execute() If you read from/write to file, you should be fine interacting only with :class:`~aeneas.task.Task` functions. For example, setting a path in :func:`~aeneas.task.Task.audio_file_path_absolute` (resp., :func:`~aeneas.task.Task.text_file_path_absolute`) force the library to load the given file, and to create a :class:`~aeneas.audiofile.AudioFile` (resp., :class:`~aeneas.textfile.TextFile`) object behind the scenes, storing it inside the Task object. However, you can also build e.g. your own :class:`~aeneas.textfile.TextFile` and then assign it to your Task. .. Topic:: Example Create a TextFile programmatically, and assign it to Task: .. code-block:: python task = Task() textfile = TextFile() for identifier, frag_text in [ (u"f001", [u"first fragment"]), (u"f002", [u"second fragment"]), (u"f003", [u"third fragment"]) ]: textfile.add_fragment(TextFragment(identifier, Language.ENG, frag_text, frag_text)) task.text_file = textfile Starting with v1.5.0, both :class:`~aeneas.textfile.TextFile` and :class:`~aeneas.syncmap.SyncMap` are backed by the :class:`~aeneas.tree.Tree` structure, which can represent multilevel I/O files. Both have a "virtual" (empty) root node, to which the "level 1" nodes are attached. Note that single-level text files and sync maps are a special case, where only "level 1" nodes are present, producing a tree with a root node and a list of children, effectively equivalent to the "list" structure pre-v1.5.0. Miscellanea ----------- * Ensuring that all the strings you pass to ``aeneas`` are Unicode strings will save you a lot of headaches. If you read from files, be sure they are encoded using ``UTF-8``. * You can use any audio file format that is supported by ``ffprobe`` and ``ffmpeg``. If unsure, just try to play them on your audio file on the console: if it works there, it should work inside ``aeneas`` too. * Enumeration classes usually have an ``ALLOWED_VALUE`` class member, which lists all the allowed values. For example: :data:`~aeneas.textfile.TextFileFormat.ALLOWED_VALUES`. This list is used for example by the validator to check input values. * Most classes are optimized for reducing memory consumption. For example, if you create an :class:`~aeneas.audiofilemfcc.AudioFileMFCC` with a file path, the input audio file will be converted to a temporary WAVE file, audio samples will be read into memory, MFCCs will be computed, and then audio data will be discarded from memory and the temporary WAVE file will be deleted, keeping only the MFCC matrix into memory. If you prefer persistence, you need to build intermediate objects yourself (i.e., :class:`~aeneas.ffmpegwrapper.FFMPEGWrapper`, :class:`~aeneas.audiofile.AudioFile`, etc.) and properly dispose of them in your code. * Wherever possible, ``NumPy`` views are used to avoid data copying. Similarly, built-in ``NumPy`` functions are used to improve run time. * To avoid numerical issues, always use :class:`~aeneas.exacttiming.TimeValue` to hold time values with arbitrary precision. Note that doing so incurs in a negligible execution slow down, because the heaviest computations are done with integer ``NumPy`` indices and arrays and the transformation to :class:`~aeneas.exacttiming.TimeValue` takes place only when the sync map is output to file. Package ``aeneas`` ~~~~~~~~~~~~~~~~~~ The main ``aeneas`` package contains several subpackages: * :mod:`aeneas.cdtw` (Python C extension) * :mod:`aeneas.cew` (Python C extension) * :mod:`aeneas.cfw` (Python C++ extension) * :mod:`aeneas.cmfcc` (Python C extension) * :mod:`aeneas.cwave` (Python C extension) * :mod:`aeneas.extra` * :mod:`aeneas.syncmap` * :mod:`aeneas.tests` * :mod:`aeneas.tools` * :mod:`aeneas.ttswrappers` and the following modules: .. toctree:: :maxdepth: 3 adjustboundaryalgorithm analyzecontainer audiofile audiofilemfcc cewsubprocess configuration container diagnostics downloader dtw exacttiming executejob executetask ffmpegwrapper ffprobewrapper globalconstants globalfunctions hierarchytype idsortingalgorithm job language logger mfcc plotter runtimeconfiguration sd syncmap synthesizer task textfile vad validator Package ``aeneas.extra`` ~~~~~~~~~~~~~~~~~~~~~~~~ The ``aeneas.extra`` package contains some extra Python source files which provide **experimental** and **not officially supported** functions, mainly custom, not built-in TTS engine wrappers. For example, if you want to write your own custom TTS engine wrapper, have a look at the ``aeneas/extra/ctw_espeak.py`` source file, which is heavily commented and should be easy to modify for your own TTS engine. Package ``aeneas.tests`` ~~~~~~~~~~~~~~~~~~~~~~~~ The ``aeneas.tests`` package contains the **unit test** files for ``aeneas``. Resources needed to run the tests, for example audio and text files, are located in the ``aeneas/tests/res/`` directory. .. _libtutorial_tools: Package ``aeneas.tools`` ~~~~~~~~~~~~~~~~~~~~~~~~ The ``aeneas.tools`` package contains the built-in command line tools for ``aeneas``. The two main tools are: * ``aeneas.tools.execute_job`` * ``aeneas.tools.execute_task`` which are described in the :ref:`clitutorial`. Moreover, the ``aeneas.tools`` package also contains the following programs, useful for debugging or converting between different file formats: * ``aeneas.tools.convert_syncmap``: convert a sync map from a format to another * ``aeneas.tools.download``: download a file from a Web resource (currently, audio from a YouTube video) * ``aeneas.tools.extract_mfcc``: extract MFCCs from a monoaural WAVE file * ``aeneas.tools.ffmpeg_wrapper``: a wrapper around ``ffmpeg`` * ``aeneas.tools.ffprobe_wrapper``: a wrapper around ``ffprobe`` * ``aeneas.tools.plot_waveform``: plot a waveform and sets of labels to file * ``aeneas.tools.read_audio``: read the properties of an audio file * ``aeneas.tools.read_text``: read a text file and show the extracted text fragments * ``aeneas.tools.run_sd``: read an audio file and the corresponding text file and detect the audio head/tail * ``aeneas.tools.run_vad``: read an audio file and compute speech/nonspeech time intervals * ``aeneas.tools.synthesize_text``: synthesize several text fragments read from file into a single wav file * ``aeneas.tools.validate``: validate a job container or configuration strings/files Run each program without arguments to get its help manual and usage examples. Resources needed to run the live examples, for example audio and text files, are located in the ``aeneas/tools/res/`` directory. The package also contains the ``aeneas.tools.hydra`` script, which can run any of the tools listed above. Run it without arguments to get its manual. Package ``aeneas.ttswrappers`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The ``aeneas.ttswrappers`` package contains the wrappers for several built-in **TTS engines** which can be used in the synthesis step of the alignment procedure. .. toctree:: :maxdepth: 3 awsttswrapper basettswrapper espeakttswrapper espeakngttswrapper festivalttswrapper macosttswrapper nuancettswrapper