aeneas Built-in Command Line Tools Tutorial

This tutorial explains how to process tasks and jobs with the command line tools aeneas.tools.execute_task and aeneas.tools.execute_job.

(If you are interested in using aeneas as a Python package in your own application, please consult the aeneas Library Tutorial.)

Processing Tasks

First, we need some definitions:

Audio File

An audio file is a file on disk containing audio data, usually text narrated by a human being. The audio format can be any of those supported by ffprobe and ffmpeg, including: FLAC, MP3, MP4/AAC, OGG, WAVE, etc.

Example: /home/rb/audio.mp3

Text File

A text file is a file on disk containing the textual data to be aligned with a matching audio file. The format of the text file can be any format listed in the ALLOWED_VALUES of TextFileFormat. The contents of the text file define, explicitly or implicitly, a segmentation of the entire text into fragments, which can have arbitrary granularity (paragraph, sentence, sub-sentence, word, etc.), can be nested in a hierarchical structure, can consist of multiple lines, and can be associated with unique identifiers. Certain input formats require the user to specify additional parameters to parse the input file.

Example of a text file /home/rb/text.txt in PLAIN format, with three fragments:

Text of the first fragment
Text of the second fragment
Text of the third fragment

Sync Map File

A sync map file is a file on disk which expresses the correspondence between an audio file and a text file. Specifically, for each fragment in the text file, it declares a time interval in the audio file where the text of the fragment is spoken. The actual format of the sync map file depends on the intended application. Available formats are listed in the ALLOWED_VALUES of SyncMapFormat. Text fragments can be represented by their full text and/or by their unique identifiers.

Example of a sync map file in CSV format:

f001,0.000,1.234,First fragment text
f002,1.234,5.678,Second fragment text
f003,5.678,7.890,Third fragment text
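
As a rough illustration of how such a file can be consumed, the following Python sketch parses the CSV lines above into records. The field order (identifier, begin, end, text) is read off the example; the Fragment type and the parse_csv_sync_map helper are hypothetical names introduced here and are not part of aeneas.

```python
import csv
import io
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One sync map row (hypothetical helper type, not an aeneas class)."""
    identifier: str
    begin: float
    end: float
    text: str


def parse_csv_sync_map(data: str) -> List[Fragment]:
    """Parse CSV rows of the form id,begin,end,text into Fragment records."""
    fragments = []
    for row in csv.reader(io.StringIO(data)):
        if not row:
            continue  # skip blank lines
        # Assumes exactly four fields per row, as in the example above
        identifier, begin, end, text = row
        fragments.append(Fragment(identifier, float(begin), float(end), text))
    return fragments


SAMPLE = """f001,0.000,1.234,First fragment text
f002,1.234,5.678,Second fragment text
f003,5.678,7.890,Third fragment text
"""

fragments = parse_csv_sync_map(SAMPLE)
for fragment in fragments:
    # Print each identifier with the duration of its time interval
    print(fragment.identifier, round(fragment.end - fragment.begin, 3))
```

Each record carries the time interval in seconds, so the duration of a fragment is simply end minus begin.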

Task

A Task is a triple (audio file, text file, parameters). When a task is processed (executed), a sync map is computed for the given audio and text files. The parameters control how the alignment is computed, for example:

  • specifying the language and the format of the input text;
  • setting the format of the sync map file to be output;
  • excluding the head/tail of the audio file because they contain speech not present in the text;
  • modifying the time step of the aligner;
  • etc.

Example (continued):

  • audio file: /home/rb/audio.mp3
  • text file: /home/rb/text.txt
  • parameters:
    • text in PLAIN format
    • language is ENGLISH
    • output in JSON format

The aeneas.tools.execute_task tool processes a Task and writes the corresponding sync map to file. Therefore, it requires at least four arguments:

  • the path of the input audio file;
  • the path of the input text file;
  • the parameters, formatted as a key1=value1|key2=value2|...|keyN=valueN string;
  • the path of the sync map to be created.
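
If you generate the parameter string from a script, a small helper can assemble it. The following Python sketch is only an illustration of the key1=value1|key2=value2 format described above; the helper names are hypothetical, and the sketch assumes that neither keys nor values contain the | character.

```python
from typing import Dict


def build_config_string(parameters: Dict[str, str]) -> str:
    """Join parameters into a key1=value1|key2=value2 string."""
    return "|".join(f"{key}={value}" for key, value in parameters.items())


def parse_config_string(config_string: str) -> Dict[str, str]:
    """Split a key1=value1|key2=value2 string back into a dictionary."""
    parameters = {}
    for pair in config_string.split("|"):
        # Split on the first '=' only, so values may contain '='
        key, _, value = pair.partition("=")
        parameters[key] = value
    return parameters


config = build_config_string({
    "task_language": "eng",
    "is_text_type": "plain",
    "os_task_file_format": "json",
})
print(config)
```

On the command line, remember to quote the resulting string, since most shells interpret | as a pipe.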

Showing Help Messages

If you execute the program without arguments, it will print the following help message:

$ python -m aeneas.tools.execute_task

NAME
  execute_task - Execute a Task.

SYNOPSIS
  python -m aeneas.tools.execute_task [-h|--help|--help-rconf|--version]
  python -m aeneas.tools.execute_task --list-parameters
  python -m aeneas.tools.execute_task --list-values[=PARAM]
  python -m aeneas.tools.execute_task AUDIO_FILE  TEXT_FILE CONFIG_STRING OUTPUT_FILE [OPTIONS]
  python -m aeneas.tools.execute_task YOUTUBE_URL TEXT_FILE CONFIG_STRING OUTPUT_FILE -y [OPTIONS]

OPTIONS
  --faster-rate : print fragments with rate > task_adjust_boundary_rate_value
  --help : print full help and exit
  --help-rconf : list all runtime configuration parameters
  --keep-audio : do not delete the audio file downloaded from YouTube (-y only)
  --largest-audio : download largest audio stream (-y only)
  --list-parameters : list all parameters
  --list-values : list all parameters for which values can be listed
  --list-values=PARAM : list all allowed values for parameter PARAM
  --output-html : output HTML file for fine tuning
  --presets-word : apply presets for word-level alignment (MFCC masking)
  --rate : print rate of each fragment
  --skip-validator : do not validate the given config string
  --version : print the program name and version and exit
  --zero : print fragments with zero duration
  -h : print short help and exit
  -l[=FILE], --log[=FILE] : log verbose output to tmp file or FILE if specified
  -r=CONF, --runtime-configuration=CONF : apply runtime configuration CONF
  -v, --verbose : verbose output
  -vv, --very-verbose : verbose output, print date/time values
  -y, --youtube : download audio from YouTube video

EXAMPLES
  python -m aeneas.tools.execute_task --examples
  python -m aeneas.tools.execute_task --examples-all

If you pass the --help argument, it will print a slightly more verbose version:

$ python -m aeneas.tools.execute_task --help

NAME
  execute_task - Execute a Task.

SYNOPSIS
  python -m aeneas.tools.execute_task [-h|--help|--help-rconf|--version]
  python -m aeneas.tools.execute_task --list-parameters
  python -m aeneas.tools.execute_task --list-values[=PARAM]
  python -m aeneas.tools.execute_task AUDIO_FILE  TEXT_FILE CONFIG_STRING OUTPUT_FILE [OPTIONS]
  python -m aeneas.tools.execute_task YOUTUBE_URL TEXT_FILE CONFIG_STRING OUTPUT_FILE -y [OPTIONS]

OPTIONS
  --faster-rate : print fragments with rate > task_adjust_boundary_rate_value
  --help : print full help and exit
  --help-rconf : list all runtime configuration parameters
  --keep-audio : do not delete the audio file downloaded from YouTube (-y only)
  --largest-audio : download largest audio stream (-y only)
  --list-parameters : list all parameters
  --list-values : list all parameters for which values can be listed
  --list-values=PARAM : list all allowed values for parameter PARAM
  --output-html : output HTML file for fine tuning
  --presets-word : apply presets for word-level alignment (MFCC masking)
  --rate : print rate of each fragment
  --skip-validator : do not validate the given config string
  --version : print the program name and version and exit
  --zero : print fragments with zero duration
  -h : print short help and exit
  -l[=FILE], --log[=FILE] : log verbose output to tmp file or FILE if specified
  -r=CONF, --runtime-configuration=CONF : apply runtime configuration CONF
  -v, --verbose : verbose output
  -vv, --very-verbose : verbose output, print date/time values
  -y, --youtube : download audio from YouTube video

EXAMPLES
  python -m aeneas.tools.execute_task --examples
  python -m aeneas.tools.execute_task --examples-all

EXIT CODES
  0 : no error
  1 : error
  2 : help shown, no command run

AUTHOR
  Alberto Pettarin, http://www.albertopettarin.it/

REPORTING BUGS
  Please use the GitHub Issues Web page : https://github.com/ReadBeyond/aeneas/issues/

COPYRIGHT
  2012-2016, Alberto Pettarin and ReadBeyond Srl
  This software is available under the terms of the GNU Affero General Public License Version 3

SEE ALSO
  Code repository  : https://github.com/ReadBeyond/aeneas/
  Documentation    : http://www.readbeyond.it/aeneas/docs/
  Project Web page : http://www.readbeyond.it/aeneas/
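
A script that calls the tool can branch on the exit codes listed in the full help. The following Python sketch maps them to their documented meanings; the describe_exit_code helper is a hypothetical name, and the return code would typically come from something like subprocess.run(...).returncode.

```python
# Exit codes as listed in the EXIT CODES section of the --help output
EXIT_CODE_MEANINGS = {
    0: "no error",
    1: "error",
    2: "help shown, no command run",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning of an execute_task exit code."""
    return EXIT_CODE_MEANINGS.get(code, "unknown exit code")


for code in (0, 1, 2):
    print(code, describe_exit_code(code))
```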

Showing And Running Built-In Examples

aeneas includes some example input files which cover common use cases, enabling the user to run live examples. To list them, pass the --examples switch:

$ python -m aeneas.tools.execute_task --examples

Example 1 (input: plain text, output: EAF)
  $ python -m aeneas.tools.execute_task --example-eaf

Example 2 (input: plain text, output: JSON)
  $ python -m aeneas.tools.execute_task --example-json

Example 3 (input: multilevel plain text (mplain), output: SMIL)
  $ python -m aeneas.tools.execute_task --example-mplain-smil

Example 4 (input: multilevel unparsed text (munparsed), output: SMIL)
  $ python -m aeneas.tools.execute_task --example-munparsed-smil

Example 5 (input: unparsed text, output: SMIL)
  $ python -m aeneas.tools.execute_task --example-smil

Example 6 (input: subtitles text, output: SRT)
  $ python -m aeneas.tools.execute_task --example-srt

Example 7 (input: parsed text, output: TextGrid)
  $ python -m aeneas.tools.execute_task --example-textgrid

Example 8 (input: parsed text, output: TSV)
  $ python -m aeneas.tools.execute_task --example-tsv

Example 9 (input: single word granularity plain text, output: AUD)
  $ python -m aeneas.tools.execute_task --example-words

Example 10 (input: audio from YouTube, output: TXT)
  $ python -m aeneas.tools.execute_task --example-youtube

Similarly, the --examples-all switch prints the full list of built-in examples (36 in the output below), covering more specific input/output/parameter combinations.

$ python -m aeneas.tools.execute_task --examples-all

Example 1 (input: plain text (plain), output: AUD, aba beforenext 0.200)
  $ python -m aeneas.tools.execute_task --example-aftercurrent

Example 2 (input: plain text (plain), output: AUD, aba beforenext 0.200)
  $ python -m aeneas.tools.execute_task --example-beforenext

Example 3 (input: plain text, output: TSV, run via cewsubprocess)
  $ python -m aeneas.tools.execute_task --example-cewsubprocess

Example 4 (input: plain text, output: TSV, tts engine: ctw espeak)
  $ python -m aeneas.tools.execute_task --example-ctw-espeak

Example 5 (input: plain text, output: TSV, tts engine: ctw speect)
  $ python -m aeneas.tools.execute_task --example-ctw-speect

Example 6 (input: plain text, output: EAF)
  $ python -m aeneas.tools.execute_task --example-eaf

Example 7 (input: plain text (plain), output: SRT, print faster than 12.0 chars/s)
  $ python -m aeneas.tools.execute_task --example-faster-rate

Example 8 (input: plain text, output: TSV, tts engine: Festival)
  $ python -m aeneas.tools.execute_task --example-festival

Example 9 (input: mplain text (multilevel), output: JSON, levels to output: 1 and 2)
  $ python -m aeneas.tools.execute_task --example-flatten-12

Example 10 (input: mplain text (multilevel), output: JSON, levels to output: 2)
  $ python -m aeneas.tools.execute_task --example-flatten-2

Example 11 (input: mplain text (multilevel), output: JSON, levels to output: 3)
  $ python -m aeneas.tools.execute_task --example-flatten-3

Example 12 (input: plain text, output: TSV, explicit head and tail)
  $ python -m aeneas.tools.execute_task --example-head-tail

Example 13 (input: plain text, output: JSON)
  $ python -m aeneas.tools.execute_task --example-json

Example 14 (input: multilevel plain text (mplain), output: JSON)
  $ python -m aeneas.tools.execute_task --example-mplain-json

Example 15 (input: multilevel plain text (mplain), output: SMIL)
  $ python -m aeneas.tools.execute_task --example-mplain-smil

Example 16 (input: multilevel plain text (mplain), different TTS engines, output: JSON)
  $ python -m aeneas.tools.execute_task --example-multilevel-tts

Example 17 (input: multilevel unparsed text (munparsed), output: JSON)
  $ python -m aeneas.tools.execute_task --example-munparsed-json

Example 18 (input: multilevel unparsed text (munparsed), output: SMIL)
  $ python -m aeneas.tools.execute_task --example-munparsed-smil

Example 19 (input: plain text, output: JSON, resolution: 0.500 s)
  $ python -m aeneas.tools.execute_task --example-mws

Example 20 (input: multilevel plain text (mplain), output: JSON, no zero duration)
  $ python -m aeneas.tools.execute_task --example-no-zero

Example 21 (input: plain text (plain), output: AUD, aba offset 0.200)
  $ python -m aeneas.tools.execute_task --example-offset

Example 22 (input: plain text (plain), output: AUD, aba percent 50)
  $ python -m aeneas.tools.execute_task --example-percent

Example 23 (input: plain text, output: JSON, pure python)
  $ python -m aeneas.tools.execute_task --example-py

Example 24 (input: plain text (plain), output: SRT, max rate 14.0 chars/s, print rates)
  $ python -m aeneas.tools.execute_task --example-rate

Example 25 (input: plain text (plain), output: SRT, remove nonspeech >=0.500 s)
  $ python -m aeneas.tools.execute_task --example-remove-nonspeech

Example 26 (input: plain text (plain), output: SRT, remove nonspeech >=0.500 s, max rate 14.0 chars/s, print rates)
  $ python -m aeneas.tools.execute_task --example-remove-nonspeech-rateaggressive

Example 27 (input: plain text (plain), output: AUD, replace nonspeech >=0.500 s with (sil))
  $ python -m aeneas.tools.execute_task --example-replace-nonspeech

Example 28 (input: plain text, output: TSV, head/tail detection)
  $ python -m aeneas.tools.execute_task --example-sd

Example 29 (input: unparsed text, output: SMIL)
  $ python -m aeneas.tools.execute_task --example-smil

Example 30 (input: subtitles text, output: SRT)
  $ python -m aeneas.tools.execute_task --example-srt

Example 31 (input: parsed text, output: TextGrid)
  $ python -m aeneas.tools.execute_task --example-textgrid

Example 32 (input: parsed text, output: TSV)
  $ python -m aeneas.tools.execute_task --example-tsv

Example 33 (input: single word granularity plain text, output: AUD)
  $ python -m aeneas.tools.execute_task --example-words

Example 34 (input: single word granularity plain text, output: AUD, tts engine: Festival, TTS cache on)
  $ python -m aeneas.tools.execute_task --example-words-festival-cache

Example 35 (input: mplain text (multilevel), output: AUD, levels to output: 3)
  $ python -m aeneas.tools.execute_task --example-words-multilevel

Example 36 (input: audio from YouTube, output: TXT)
  $ python -m aeneas.tools.execute_task --example-youtube

Running a built-in example is a quick way to learn the options and parameters available in aeneas.

For example, passing the --example-json switch will produce:

$ python -m aeneas.tools.execute_task --example-json

[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=json
  Sync map file: output/sonnet.json
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.json'

Warning

If the above command generates an error, be sure to have a directory named output in your current working directory. If one does not exist, create it.

As you can see in the example above, built-in examples print the command line arguments they stand for. Therefore, the example above is essentially equivalent to:

$ python -m aeneas.tools.execute_task aeneas/tools/res/audio.mp3 aeneas/tools/res/plain.txt "task_language=eng|is_text_type=plain|os_task_file_format=json" output/sonnet.json

[INFO] Validating config string (specify --skip-validator to bypass)...
[INFO] Validating config string... done
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.json'
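
If you drive the tool from a script, the same four arguments can be assembled programmatically. The following Python sketch only builds the argument list and does not run anything; the helper names are hypothetical, and actually executing the command assumes that aeneas is installed for the Python interpreter in use.

```python
import os
import sys
from typing import List


def build_execute_task_command(
    audio_path: str, text_path: str, config_string: str, sync_map_path: str
) -> List[str]:
    """Build the argument list equivalent to the command line shown above."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio_path, text_path, config_string, sync_map_path,
    ]


def ensure_parent_directory(path: str) -> None:
    """Create the directory that will contain path, if it does not exist."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)


command = build_execute_task_command(
    "aeneas/tools/res/audio.mp3",
    "aeneas/tools/res/plain.txt",
    "task_language=eng|is_text_type=plain|os_task_file_format=json",
    "output/sonnet.json",
)
print(command[1:])

# To actually run it (requires aeneas to be installed):
#   import subprocess
#   ensure_parent_directory("output/sonnet.json")
#   subprocess.run(command, check=True)
```

Passing the arguments as a list means no shell is involved, so the | characters in the config string need no quoting.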

Note

There is one formal difference: when running a built-in example, the input files and parameters are not validated. When you pass your own arguments, they are validated by default using a Validator object, created and run automatically for you. If a validation error occurs, the execution of the Task does not begin. You can skip this safety check with the --skip-validator switch.

In both cases, a new file output/sonnet.json is created, containing the sync map in JSON format:

{
 "fragments": [
  {
   "begin": "0.000", 
   "children": [], 
   "end": "2.640", 
   "id": "f000001", 
   "language": "eng", 
   "lines": [
    "1"
   ]
  }, 
  {
   "begin": "2.640", 
   "children": [], 
   "end": "5.880", 
   "id": "f000002", 
   "language": "eng", 
   "lines": [
    "From fairest creatures we desire increase,"
   ]
  }, 
  {
   "begin": "5.880", 
   "children": [], 
   "end": "9.240", 
   "id": "f000003", 
   "language": "eng", 
   "lines": [
    "That thereby beauty's rose might never die,"
   ]
  }, 
  {
   "begin": "9.240", 
   "children": [], 
   "end": "11.920", 
   "id": "f000004", 
   "language": "eng", 
   "lines": [
    "But as the riper should by time decease,"
   ]
  }, 
  {
   "begin": "11.920", 
   "children": [], 
   "end": "15.280", 
   "id": "f000005", 
   "language": "eng", 
   "lines": [
    "His tender heir might bear his memory:"
   ]
  }, 
  {
   "begin": "15.280", 
   "children": [], 
   "end": "18.800", 
   "id": "f000006", 
   "language": "eng", 
   "lines": [
    "But thou contracted to thine own bright eyes,"
   ]
  }, 
  {
   "begin": "18.800", 
   "children": [], 
   "end": "22.760", 
   "id": "f000007", 
   "language": "eng", 
   "lines": [
    "Feed'st thy light's flame with self-substantial fuel,"
   ]
  }, 
  {
   "begin": "22.760", 
   "children": [], 
   "end": "25.680", 
   "id": "f000008", 
   "language": "eng", 
   "lines": [
    "Making a famine where abundance lies,"
   ]
  }, 
  {
   "begin": "25.680", 
   "children": [], 
   "end": "31.240", 
   "id": "f000009", 
   "language": "eng", 
   "lines": [
    "Thy self thy foe, to thy sweet self too cruel:"
   ]
  }, 
  {
   "begin": "31.240", 
   "children": [], 
   "end": "34.400", 
   "id": "f000010", 
   "language": "eng", 
   "lines": [
    "Thou that art now the world's fresh ornament,"
   ]
  }, 
  {
   "begin": "34.400", 
   "children": [], 
   "end": "36.920", 
   "id": "f000011", 
   "language": "eng", 
   "lines": [
    "And only herald to the gaudy spring,"
   ]
  }, 
  {
   "begin": "36.920", 
   "children": [], 
   "end": "40.640", 
   "id": "f000012", 
   "language": "eng", 
   "lines": [
    "Within thine own bud buriest thy content,"
   ]
  }, 
  {
   "begin": "40.640", 
   "children": [], 
   "end": "43.640", 
   "id": "f000013", 
   "language": "eng", 
   "lines": [
    "And tender churl mak'st waste in niggarding:"
   ]
  }, 
  {
   "begin": "43.640", 
   "children": [], 
   "end": "48.080", 
   "id": "f000014", 
   "language": "eng", 
   "lines": [
    "Pity the world, or else this glutton be,"
   ]
  }, 
  {
   "begin": "48.080", 
   "children": [], 
   "end": "53.240", 
   "id": "f000015", 
   "language": "eng", 
   "lines": [
    "To eat the world's due, by the grave and thee."
   ]
  }
 ]
}

for the input file:

1
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
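
A JSON sync map like the one above is easy to post-process. The following Python sketch loads a shortened copy of it and computes the duration of each fragment; the fragment_durations helper is a hypothetical name, and the sketch relies only on the "fragments", "id", "begin" and "end" keys visible in the output above.

```python
import json

# First two fragments of the sync map shown above
SAMPLE = """
{
 "fragments": [
  {"begin": "0.000", "children": [], "end": "2.640", "id": "f000001",
   "language": "eng", "lines": ["1"]},
  {"begin": "2.640", "children": [], "end": "5.880", "id": "f000002",
   "language": "eng", "lines": ["From fairest creatures we desire increase,"]}
 ]
}
"""


def fragment_durations(sync_map: dict) -> dict:
    """Map each fragment id to its duration in seconds."""
    durations = {}
    for fragment in sync_map["fragments"]:
        # begin and end are stored as strings in the JSON output
        durations[fragment["id"]] = float(fragment["end"]) - float(fragment["begin"])
    return durations


durations = fragment_durations(json.loads(SAMPLE))
for identifier, seconds in durations.items():
    print(f"{identifier}: {seconds:.3f} s")
```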

Verbose Output And Logging To File

If you want more verbose output, you can pass the -v or --verbose switch:

$ python -m aeneas.tools.execute_task --example-json -v

[DEBU] CLI: Formal arguments: [u'/home/alberto/ebooks/cloned/rb/aeneas/aeneas/tools/execute_task.py', u'--example-json', u'-v']
[DEBU] CLI: Actual arguments: [u'--example-json']
[DEBU] CLI: Runtime configuration: 'allow_unlisted_languages=False|c_extensions=True|cew_subprocess_enabled=False|cew_subprocess_path=python|dtw_algorithm=stripe|dtw_margin=60.000|ffmpeg_path=ffmpeg|ffmpeg_sample_rate=16000|ffprobe_path=ffprobe|job_max_tasks=0|mfcc_emphasis_factor=0.97|mfcc_fft_order=512|mfcc_filters=40|mfcc_lower_frequency=133.3333|mfcc_size=13|mfcc_upper_frequency=6855.4976|mfcc_window_length=0.100|mfcc_window_length_l1=0.500|mfcc_window_length_l2=0.100|mfcc_window_length_l3=0.020|mfcc_window_shift=0.040|mfcc_window_shift_l1=0.200|mfcc_window_shift_l2=0.040|mfcc_window_shift_l3=0.005|nuance_tts_api_retry_attempts=5|nuance_tts_api_sleep=1.000|task_max_audio_length=7200.0|task_max_text_length=0|tts=espeak|tts_path=espeak|vad_extend_speech_after=0.000|vad_extend_speech_before=0.000|vad_log_energy_threshold=0.699|vad_min_nonspeech_length=0.200'
[INFO] CLI: Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=json
  Sync map file: output/sonnet.json
[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=json
  Sync map file: output/sonnet.json
[INFO] CLI: Creating task...
[INFO] Creating task...
[DEBU] Task: Populate audio file...
[DEBU] Task: audio_file_path_absolute is None
[DEBU] Task: Populate audio file... done
[DEBU] Task: Populate text file...
[DEBU] Task: text_file_path_absolute and/or language is None
[DEBU] Task: Populate text file... done
[DEBU] Task: Populate audio file...
[DEBU] Task: audio_file_path_absolute is 'aeneas/tools/res/audio.mp3'

...

[DEBU] Task: Output sync map to output/sonnet.json
[DEBU] Task: sync_map_format is json
[DEBU] Task: page_ref is None
[DEBU] Task: audio_ref is None
[DEBU] Task: Calling sync_map.write...
[DEBU] Task: Calling sync_map.write... done
[INFO] CLI: Creating output sync map file... done
[INFO] Creating output sync map file... done
[SUCC] CLI: Created file 'output/sonnet.json'
[INFO] Created file 'output/sonnet.json'
[DEBU] CLI: Execution completed with code 0

There is also a -vv or --very-verbose switch, which additionally prints date/time values.

Sometimes it is easier to dump the log to file, and then inspect it with a text editor. To do so, just specify the -l switch:

$ python -m aeneas.tools.execute_task --example-json -l

[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=json
  Sync map file: output/sonnet.json
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.json'
[INFO] Log written to file '/tmp/tmpyS_VBv.log'

The path of the log file will be printed. By default, the log file will be created in the temporary directory of your OS. If you want your log file to be created at a specific path, use --log=/path/to/your.log instead of -l.

Note that you can specify both -v/-vv and -l/--log.

Input Text Formats

aeneas is able to read several text file formats, listed in TextFileFormat:

  1. PLAIN, one fragment per line (example: --example-json):

    Text of the first fragment
    Text of the second fragment
    Text of the third fragment
    
  2. PARSED, one fragment per line, starting with an explicit identifier (example: --example-tsv):

    f001|Text of the first fragment
    f002|Text of the second fragment
    f003|Text of the third fragment
    
  3. SUBTITLES, fragments separated by a blank line, each fragment might span multiple lines. This format is suitable for creating subtitle sync map files (example: --example-srt):

    Fragment on a single row
    
    Fragment on two rows
    because it is quite long
    
    Another one liner
    
    Another fragment
    on two rows
    
  4. UNPARSED, XML file from which text fragments will be extracted by matching id and/or class attributes (example: --example-smil):

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
     <head>
      <meta charset="utf-8"/>
      <link rel="stylesheet" href="../Styles/style.css" type="text/css"/>
      <title>Sonnet I</title>
     </head>
     <body>
      <div id="divTitle">
       <h1><span class="ra" id="f001">I</span></h1>
      </div>
      <div id="divSonnet">
       <p>
        <span class="ra" id="f002">From fairest creatures we desire increase,</span><br/>
        <span class="ra" id="f003">That thereby beauty’s rose might never die,</span><br/>
        <span class="ra" id="f004">But as the riper should by time decease,</span><br/>
        <span class="ra" id="f005">His tender heir might bear his memory:</span><br/>
        <span class="ra" id="f006">But thou contracted to thine own bright eyes,</span><br/>
        <span class="ra" id="f007">Feed’st thy light’s flame with self-substantial fuel,</span><br/>
        <span class="ra" id="f008">Making a famine where abundance lies,</span><br/>
        <span class="ra" id="f009">Thy self thy foe, to thy sweet self too cruel:</span><br/>
        <span class="ra" id="f010">Thou that art now the world’s fresh ornament,</span><br/>
        <span class="ra" id="f011">And only herald to the gaudy spring,</span><br/>
        <span class="ra" id="f012">Within thine own bud buriest thy content,</span><br/>
        <span class="ra" id="f013">And tender churl mak’st waste in niggarding:</span><br/>
        <span class="ra" id="f014">Pity the world, or else this glutton be,</span><br/>
        <span class="ra" id="f015">To eat the world’s due, by the grave and thee.</span>
       </p>
      </div>
     </body>
    </html>
    
  5. MPLAIN, the multilevel equivalent to PLAIN, with paragraphs separated by a blank line, one sentence per line, and words separated by blank spaces (example: --example-mplain-json):

    First sentence of Paragraph One.
    Second sentence of Paragraph One.
    
    First sentence of Paragraph Two.
    
    First sentence of Paragraph Three.
    Second sentence of Paragraph Three.
    Third sentence of Paragraph Three.
    
  6. MUNPARSED, the multilevel equivalent to UNPARSED (example: --example-munparsed-json):

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
     <head>
      <meta charset="utf-8"/>
      <link rel="stylesheet" href="../Styles/style.css" type="text/css"/>
      <title>Sonnet I</title>
     </head>
     <body>
      <div id="divTitle">
       <h1>
        <span id="p000001">
         <span id="p000001s000001">
          <span id="p000001s000001w000001">I</span>
         </span>
        </span>
       </h1>
      </div>
      <div id="divSonnet">
       <p class="stanza" id="p000002">
        <span id="p000002s000001">
         <span id="p000002s000001w000001">From</span>
         <span id="p000002s000001w000002">fairest</span>
         <span id="p000002s000001w000003">creatures</span>
         <span id="p000002s000001w000004">we</span>
         <span id="p000002s000001w000005">desire</span>
         <span id="p000002s000001w000006">increase,</span>
        </span><br/>
        <span id="p000002s000002">
         <span id="p000002s000002w000001">That</span>
         <span id="p000002s000002w000002">thereby</span>
         <span id="p000002s000002w000003">beauty’s</span>
         <span id="p000002s000002w000004">rose</span>
         <span id="p000002s000002w000005">might</span>
         <span id="p000002s000002w000006">never</span>
         <span id="p000002s000002w000007">die,</span>
        </span><br/>
        ...
       </p>
       ...
      </div>
     </body>
    </html>
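
To make the segmentation rules of the first three formats concrete, here is a minimal Python sketch that splits PLAIN, PARSED and SUBTITLES text into fragments. It is not the parser used by aeneas: the function names are hypothetical, and the sketch assumes that blank lines are skipped in PLAIN and PARSED files and that | is the identifier separator, as in the example above.

```python
from typing import List, Tuple


def parse_plain(text: str) -> List[str]:
    """PLAIN: one fragment per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_parsed(text: str, separator: str = "|") -> List[Tuple[str, str]]:
    """PARSED: each line is 'identifier|fragment text'."""
    fragments = []
    for line in parse_plain(text):
        # Split on the first separator only
        identifier, _, content = line.partition(separator)
        fragments.append((identifier, content))
    return fragments


def parse_subtitles(text: str) -> List[List[str]]:
    """SUBTITLES: fragments separated by blank lines, possibly multi-line."""
    fragments, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the fragment collected so far
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments


plain = parse_plain("Text of the first fragment\nText of the second fragment\n")
parsed = parse_parsed("f001|Text of the first fragment\nf002|Text of the second fragment\n")
subtitles = parse_subtitles(
    "Fragment on a single row\n\nFragment on two rows\nbecause it is quite long\n\nAnother one liner\n"
)
print(len(plain), len(parsed), len(subtitles))
```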
    

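If you use UNPARSED files, you need to provide additional parameters to select the XML elements to extract and to sort them, such as is_text_unparsed_id_regex and is_text_unparsed_id_sort in the config string of the following example: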

$ python -m aeneas.tools.execute_task --example-smil

[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/page.xhtml
  Config string: task_language=eng|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric|os_task_file_format=smil|os_task_file_smil_audio_ref=p001.mp3|os_task_file_smil_page_ref=p001.xhtml
  Sync map file: output/sonnet.smil
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.smil'

Note

Even if you only specify the PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX regex, your XML elements still need to have id attributes. This is required for e.g. SMIL output to make sense. (Although the EPUB 3 Media Overlays specification allows you to specify an EPUB CFI instead of an id value, it is recommended to use id values for maximum reading system compatibility, and hence aeneas only outputs SMIL files with id references.)
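
To illustrate what selecting elements by id regex and sorting them numerically amounts to, here is a minimal Python sketch using only the standard library. It is not the extraction code used by aeneas: the helper names are hypothetical, and the sketch assumes that the regex must match the whole id value and that a numeric sort orders fragments by the digits in their id.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, deliberately shuffled version of the XHTML example above
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
 <body>
  <div id="divSonnet">
   <p>
    <span class="ra" id="f003">That thereby beauty's rose might never die,</span><br/>
    <span class="ra" id="f001">I</span>
    <span class="ra" id="f002">From fairest creatures we desire increase,</span><br/>
   </p>
  </div>
 </body>
</html>"""


def numeric_key(identifier: str) -> int:
    """Sort key: the digits contained in the identifier (0 if none)."""
    digits = re.sub(r"[^0-9]", "", identifier)
    return int(digits) if digits else 0


def extract_fragments(xml_text: str, id_regex: str):
    """Return (id, text) pairs for elements whose id fully matches id_regex."""
    pattern = re.compile(id_regex)
    fragments = []
    for element in ET.fromstring(xml_text).iter():
        identifier = element.get("id")
        if identifier is not None and pattern.fullmatch(identifier):
            text = "".join(element.itertext()).strip()
            fragments.append((identifier, text))
    fragments.sort(key=lambda item: numeric_key(item[0]))
    return fragments


fragments = extract_fragments(SAMPLE, r"f[0-9]+")
for identifier, text in fragments:
    print(identifier, text)
```

Note that the div with id divSonnet is ignored, because its id does not match the regex.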

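Similarly, for MUNPARSED files you need to provide additional parameters, one id regex per level (is_text_munparsed_l1_id_regex, is_text_munparsed_l2_id_regex, and is_text_munparsed_l3_id_regex), as in the config string of the following example: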

$ python -m aeneas.tools.execute_task --example-munparsed-smil

[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/munparsed.xhtml
  Config string: task_language=eng|is_text_type=munparsed|is_text_munparsed_l1_id_regex=p[0-9]+|is_text_munparsed_l2_id_regex=p[0-9]+s[0-9]+|is_text_munparsed_l3_id_regex=p[0-9]+s[0-9]+w[0-9]+|os_task_file_format=smil|os_task_file_smil_audio_ref=p001.mp3|os_task_file_smil_page_ref=p001.xhtml
  Sync map file: output/sonnet.munparsed.smil
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.munparsed.smil'
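
The three regexes in the config string above assign each id to a level. The following Python sketch shows the idea under the assumption that a regex must match the whole id value; the tutorial does not state how aeneas applies the regexes internally, and the level_of helper is a hypothetical name.

```python
import re
from typing import Optional

# Regexes taken from the config string of the example above
LEVEL_REGEXES = {
    1: r"p[0-9]+",
    2: r"p[0-9]+s[0-9]+",
    3: r"p[0-9]+s[0-9]+w[0-9]+",
}


def level_of(identifier: str) -> Optional[int]:
    """Return the level whose regex fully matches the id, or None."""
    for level, regex in LEVEL_REGEXES.items():
        if re.fullmatch(regex, identifier):
            return level
    return None


for identifier in ("p000002", "p000002s000001", "p000002s000001w000003", "divSonnet"):
    print(identifier, level_of(identifier))
```

With prefix matching instead of full matching, the level 1 regex would also match level 2 and level 3 ids, which is why the sketch uses re.fullmatch.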

Note

If you are interested in synchronizing at word granularity, it is strongly recommended to use:

  1. MFCC nonspeech masking;
  2. a multilevel text format, even if you are going to use only the timings for the finer granularity;
  3. better TTS engines, like Festival or AWS/Nuance TTS API;

as they generally yield more accurate timings.

(If you do not want the output sync map file to contain the multilevel tree hierarchy for the timings, you can “flatten” it, retaining only the word-level timings, by setting the configuration parameter PPN_TASK_OS_FILE_LEVELS to 3.)
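
Conceptually, flattening keeps only the fragments found at the requested depth of the tree. The following Python sketch does this on a small JSON tree; it assumes that multilevel JSON output nests finer levels under the "children" key, the timings are made up for illustration, and the flatten helper is a hypothetical name rather than the aeneas implementation.

```python
import json

# Illustrative tree: one paragraph, one sentence, two words (made-up timings)
SAMPLE = """
{
 "fragments": [
  {"id": "p000001", "begin": "0.000", "end": "2.640", "children": [
   {"id": "p000001s000001", "begin": "0.000", "end": "2.640", "children": [
    {"id": "p000001s000001w000001", "begin": "0.000", "end": "1.200", "children": []},
    {"id": "p000001s000001w000002", "begin": "1.200", "end": "2.640", "children": []}
   ]}
  ]}
 ]
}
"""


def flatten(fragments, target_level, current_level=1):
    """Collect the fragments located at target_level, dropping their children."""
    result = []
    for fragment in fragments:
        if current_level == target_level:
            # Copy the fragment without its subtree
            result.append(dict(fragment, children=[]))
        else:
            result.extend(flatten(fragment["children"], target_level, current_level + 1))
    return result


tree = json.loads(SAMPLE)
words = flatten(tree["fragments"], target_level=3)
for word in words:
    print(word["id"], word["begin"], word["end"])
```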

Since aeneas v1.7.0, the aeneas.tools.execute_task tool has a --presets-word switch that enables MFCC nonspeech masking for single-level tasks, or MFCC nonspeech masking on level 3 (word) for multilevel tasks. For example:

$ python -m aeneas.tools.execute_task --example-words
$ python -m aeneas.tools.execute_task --example-words --presets-word
$ python -m aeneas.tools.execute_task --example-words-multilevel
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word

The other default settings should be fine for most users. However, if you need finer control, feel free to experiment with the following parameters.

Starting with aeneas v1.5.1, you can specify different MFCC parameters for each level, for example the runtime configuration parameters mfcc_window_length_l1, mfcc_window_length_l2, and mfcc_window_length_l3, and the corresponding mfcc_window_shift_l1, mfcc_window_shift_l2, and mfcc_window_shift_l3.

Starting with aeneas v1.6.0, you can also specify a different TTS engine for each level.

Starting with aeneas v1.7.0, you can enable MFCC nonspeech masking for both single-level and multilevel tasks. In the latter case, you can apply it to each level separately.

If you are using a multilevel text format, you might want to enable MFCC masking only for level 3 (word), as enabling it for level 1 and 2 does not seem to yield significantly better results.

The aeneas mailing list contains some interesting threads about using aeneas for word-level synchronization.

Output Sync Map Formats

aeneas is able to write the sync map into several formats, listed in SyncMapFormat.

As with the input text, certain output sync map formats require the user to specify additional parameters to correctly create the output file. For example, SMIL requires the os_task_file_smil_audio_ref and os_task_file_smil_page_ref parameters.

Example:

$ python -m aeneas.tools.execute_task --example-smil

[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/page.xhtml
  Config string: task_language=eng|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric|os_task_file_format=smil|os_task_file_smil_audio_ref=p001.mp3|os_task_file_smil_page_ref=p001.xhtml
  Sync map file: output/sonnet.smil
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.smil'
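
The configuration string in the example above is a list of key=value pairs joined by '|'. As a minimal sketch (the helper name build_config_string is hypothetical and not part of aeneas), such a string could be assembled from a mapping before being passed to the tool:

```python
from collections import OrderedDict


def build_config_string(params):
    """Join key=value pairs with '|', the separator used in the config strings above.

    Hypothetical helper for illustration; it is not part of aeneas.
    """
    pairs = []
    for key, value in params.items():
        value = str(value)
        # This sketch performs no escaping, so it rejects values that would
        # become ambiguous once joined with '|'.
        if "|" in value:
            raise ValueError("value for %r contains '|'" % key)
        pairs.append("%s=%s" % (key, value))
    return "|".join(pairs)


# Same parameters as the --example-smil run shown above.
smil_params = OrderedDict([
    ("task_language", "eng"),
    ("is_text_type", "unparsed"),
    ("is_text_unparsed_id_regex", "f[0-9]+"),
    ("is_text_unparsed_id_sort", "numeric"),
    ("os_task_file_format", "smil"),
    ("os_task_file_smil_audio_ref", "p001.mp3"),
    ("os_task_file_smil_page_ref", "p001.xhtml"),
])
config_string = build_config_string(smil_params)
print(config_string)
```

Keeping the parameters in a mapping makes it harder to misspell the separator or to forget one of the two SMIL references.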

Listing Parameter Names And Values

Since there are dozens of parameter names and values, it is easy to forget their correct spelling. You can use the --list-parameters switch to print the list of parameter names that you can use in the configuration string.

$ python -m aeneas.tools.execute_task --list-parameters

[INFO] You can use --list-values=PARAM on parameters marked by '*'
[INFO] Parameters marked by 'REQ' are required
[INFO] Available parameters:

is_audio_file_detect_head_max           : detect audio head, at most this many seconds (float, None)
is_audio_file_detect_head_min           : detect audio head, at least this many seconds (float, None)
is_audio_file_detect_tail_max           : detect audio tail, at most this many seconds (float, None)
is_audio_file_detect_tail_min           : detect audio tail, at least this many seconds (float, None)
is_audio_file_head_length               : ignore this many seconds at begin of audio (float, None)
is_audio_file_process_length            : process this many seconds of audio (float, None)
is_audio_file_tail_length               : ignore this many seconds at end of audio (float, None)
is_text_file_ignore_regex               : for the alignment, ignore text matched by regex
is_text_file_transliterate_map          : for the alignment, apply this transliteration map to text
is_text_mplain_word_separator           : word separator (mplain)
is_text_munparsed_l1_id_regex           : regex matching level 1 id attributes (munparsed)
is_text_munparsed_l2_id_regex           : regex matching level 2 id attributes (munparsed)
is_text_munparsed_l3_id_regex           : regex matching level 3 id attributes (munparsed)
is_text_type                            : text format (REQ, *)
is_text_unparsed_class_regex            : regex matching class attributes (unparsed)
is_text_unparsed_id_regex               : regex matching id attributes (unparsed)
is_text_unparsed_id_sort                : algorithm to sort matched element (unparsed) (*)
os_task_file_eaf_audio_ref              : audio ref value (eaf)
os_task_file_format                     : sync map format (REQ, *)
os_task_file_head_tail_format           : audio head/tail format (*)
os_task_file_id_regex                   : regex to build sync map id's (subtitles, plain)
os_task_file_levels                     : output the specified levels only (mplain, munparserd)
os_task_file_name                       : sync map file name (ignored)
os_task_file_smil_audio_ref             : audio ref value (smil, smilh, smilm)
os_task_file_smil_page_ref              : text ref value (smil, smilh, smilm)
task_adjust_boundary_aftercurrent_value : offset value, in s (aftercurrent) (float, None)
task_adjust_boundary_algorithm          : algorithm to adjust sync map values (*)
task_adjust_boundary_beforenext_value   : offset value, in s (beforenext) (float, None)
task_adjust_boundary_no_zero            : if True, do not allow zero-length fragments (bool, None)
task_adjust_boundary_nonspeech_min      : minimum long nonspeech duration, in s (float, None)
task_adjust_boundary_nonspeech_string   : replace long nonspeech with this string or specify REMOVE
task_adjust_boundary_offset_value       : offset value, in s (offset) (float, None)
task_adjust_boundary_percent_value      : percent value in [0..100] (percent) (int, None)
task_adjust_boundary_rate_value         : max rate, in chars/s (rate, rateaggressive) (float, None)
task_custom_id                          : custom ID
task_description                        : description
task_language                           : language (REQ, *)
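
The 'REQ' marker in the listing above can be used to check a configuration string before running a task. The following is only a sketch: it parses an excerpt of the listing pasted as text, and the function names required_parameters and missing_parameters are hypothetical, not aeneas APIs.

```python
import re

# Excerpt of the --list-parameters output shown above, pasted as text.
LISTING = """\
is_text_type                            : text format (REQ, *)
os_task_file_format                     : sync map format (REQ, *)
os_task_file_name                       : sync map file name (ignored)
task_language                           : language (REQ, *)
"""


def required_parameters(listing):
    """Return the names of parameters whose trailing parenthesis contains REQ."""
    required = []
    for line in listing.splitlines():
        name, separator, description = line.partition(":")
        if separator and re.search(r"\([^)]*\bREQ\b[^)]*\)\s*$", description):
            required.append(name.strip())
    return required


def missing_parameters(config_string, required):
    """Return the required names that do not appear as keys in config_string."""
    present = {pair.split("=", 1)[0] for pair in config_string.split("|") if pair}
    return [name for name in required if name not in present]


required = required_parameters(LISTING)
print(required)
print(missing_parameters("task_language=eng|is_text_type=plain", required))
```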

For parameters that accept a restricted set of values, you can list the allowed values with --list-values=PARAM. For example:

$ python -m aeneas.tools.execute_task --list-values

[INFO] Parameters for which values can be listed:
aws
espeak
espeak-ng
festival
is_text_type
is_text_unparsed_id_sort
nuance
os_task_file_format
os_task_file_head_tail_format
task_adjust_boundary_algorithm
task_language


$ python -m aeneas.tools.execute_task --list-values=is_text_type

[INFO] Available values for parameter 'is_text_type':
mplain
munparsed
parsed
plain
subtitles
unparsed


$ python -m aeneas.tools.execute_task --list-values=espeak

[INFO] Available values for parameter 'espeak':
af	    Afrikaans
afr	    Afrikaans
an	    Aragonese (not tested)
arg	    Aragonese (not tested)
...
yue     Yue Chinese (not tested)
zh	    Mandarin Chinese (not tested)
zh-yue	Yue Chinese (not tested)
zho	    Chinese (not tested)

Downloading Audio From YouTube

aeneas can download the audio stream from a YouTube video. Instead of the audio file path, you provide the YouTube URL, and add the -y switch at the end:

$ python -m aeneas.tools.execute_task --example-youtube

[INFO] Running example task with arguments:
  YouTube URL:   https://www.youtube.com/watch?v=rU4a7AA8wM0
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=txt
  Sync map file: output/sonnet.txt
  Options:       -y
[INFO] Downloading audio from 'https://www.youtube.com/watch?v=rU4a7AA8wM0' ...
[INFO] Downloading audio from 'https://www.youtube.com/watch?v=rU4a7AA8wM0' ... done
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.txt'

Warning

The download feature is experimental, and it might become unavailable in the future, for example if YouTube disables API access to audio/video content. Also note that the download sometimes fails for network/backend reasons: just wait a few seconds and try executing again.
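
Since a failed download can often be fixed by simply trying again, the invocation can be wrapped in a small retry loop. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the positional order (audio or URL, text file, configuration string, output file) is assumed from the tool's usage rather than stated in this section.

```python
import subprocess
import sys
import time


def youtube_task_command(url, text_path, config_string, output_path):
    """Build the argument list, with the URL in place of the audio path and -y last."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        url, text_path, config_string, output_path, "-y",
    ]


def run_with_retries(run_once, attempts=3, delay=5.0):
    """Call run_once() until it returns 0; return the attempt number, or None."""
    for attempt in range(1, attempts + 1):
        if run_once() == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # wait a few seconds before trying again
    return None


command = youtube_task_command(
    "https://www.youtube.com/watch?v=rU4a7AA8wM0",
    "aeneas/tools/res/plain.txt",
    "task_language=eng|is_text_type=plain|os_task_file_format=txt",
    "output/sonnet.txt",
)
print(command)

# To actually run it (requires aeneas and network access):
# run_with_retries(lambda: subprocess.call(command))
```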

The Runtime Configuration

Although the default settings should be fine for most users, sometimes it might be useful to modify certain internal parameters affecting the processing of tasks, for example changing the directory where temporary files are created, modifying processing parameters like the time resolution, etc.

To do so, the user can use the -r or --runtime-configuration switch, providing a suitable configuration string as its value.

Warning

Using the runtime configuration switch is advisable only for expert users, or when explicitly suggested by expert users, since there are (almost) no sanity checks on the values provided this way, and setting wrong values might lead to erratic behavior of the aligner.

The available parameter names are listed in RuntimeConfiguration.

Examples:

  1. disable checks on the language codes:

    python -m aeneas.tools.execute_task --example-json -r="allow_unlisted_languages=True"
    
  2. disable the Python C/C++ extensions, running the pure Python code:

    python -m aeneas.tools.execute_task --example-json -r="c_extensions=False"
    
  3. disable only the cew Python C/C++ extension, while cdtw and cmfcc will still run (if compiled):

    python -m aeneas.tools.execute_task --example-json -r="cew=False"
    
  4. set the DTW margin to 10.000 seconds:

    python -m aeneas.tools.execute_task --example-json -r="dtw_margin=10"
    
  5. specify the path to the ffprobe and ffmpeg executables:

    python -m aeneas.tools.execute_task --example-json -r="ffmpeg_path=/path/to/my/ffmpeg|ffprobe_path=/path/to/my/ffprobe"
    
  6. set the time resolution of the aligner to 0.050 seconds:

    python -m aeneas.tools.execute_task --example-json -r="mfcc_window_length=0.150|mfcc_window_shift=0.050"
    
  7. use the eSpeak-ng TTS, via the espeak-ng executable available on $PATH, instead of eSpeak:

    python -m aeneas.tools.execute_task --example-json -r="tts=espeak-ng"
    
  8. use the eSpeak-ng TTS, via the espeak-ng executable at a custom location, instead of eSpeak:

    python -m aeneas.tools.execute_task --example-json -r="tts=espeak-ng|tts_path=/path/to/espeak-ng"
    
  9. use the Festival TTS, via the text2wave executable available on $PATH, instead of eSpeak:

    python -m aeneas.tools.execute_task --example-json -r="tts=festival"
    
  10. use the Festival TTS, via the text2wave executable at a custom location, instead of eSpeak:

    python -m aeneas.tools.execute_task --example-json -r="tts=festival|tts_path=/path/to/text2wave"
    
  11. use the AWS Polly TTS API instead of eSpeak (with TTS caching enabled):

    python -m aeneas.tools.execute_task --example-json -r="tts=aws|tts_cache=True"
    
  12. use the Nuance TTS API instead of eSpeak (with TTS caching enabled):

    python -m aeneas.tools.execute_task --example-json -r="tts=nuance|nuance_tts_api_id=YOUR_NUANCE_API_ID|nuance_tts_api_key=YOUR_NUANCE_API_KEY|tts_cache=True"
    
  13. use a custom TTS wrapper located at /path/to/your/wrapper.py (see the aeneas/extra/ directory for examples):

    python -m aeneas.tools.execute_task --example-json -r="tts=custom|tts_path=/path/to/your/wrapper.py"
    
  14. set the temporary directory:

    python -m aeneas.tools.execute_task --example-json -r="tmp_path=/path/to/tmp/"
    
  15. allow processing tasks with audio files at most 1 hour (= 3600 seconds) long:

    python -m aeneas.tools.execute_task --example-json -r="task_max_audio_length=3600"
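
Several runtime settings can be combined into a single -r value, separated by '|', as in examples 5 and 6 above. As a sketch (the helper runtime_switch is hypothetical), the switch could be rendered from a dictionary when the tool is launched from a script:

```python
import sys


def runtime_switch(settings):
    """Render runtime settings as one -r=NAME=VALUE|NAME=VALUE argument.

    Hypothetical helper; insertion order of the dict is preserved.
    """
    return "-r=" + "|".join("%s=%s" % (name, value) for name, value in settings.items())


# Settings taken from examples 2 and 8 above.
switch = runtime_switch({
    "c_extensions": False,
    "tts": "espeak-ng",
    "tts_path": "/path/to/espeak-ng",
})
command = [sys.executable, "-m", "aeneas.tools.execute_task", "--example-json", switch]
print(switch)
```

When the command is passed to subprocess as a list, no shell is involved, so the double quotes used on the command line above are not needed around the value.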
    

Miscellanea

  1. --example-head-tail: ignore the first 0.400 seconds and the last 0.500 seconds of the audio file for alignment purposes
  2. --example-no-zero: ensure that no fragment in the output sync map has zero length
  3. --example-percent: adjust the output sync map, setting each boundary between adjacent fragments to the middle of the nonspeech interval, using the PERCENT algorithm with value 50 (i.e., 50%)
  4. --example-rate: adjust the output sync map, trying to ensure that no fragment has a rate of more than 14 characters/s, using the RATE algorithm
  5. --example-sd: detect the audio head/tail, each at most 10.000 seconds long
  6. --example-multilevel-tts: use different TTS engines for different levels (mplain multilevel input text)

Processing Jobs

If you have several Tasks that share the same parameters (configuration strings) and differ only in their audio/text files, you can either write your own Bash/BAT script or create a Job:

Job

A Job is a container (compressed file or uncompressed directory), containing:

  • one or more pairs of audio/text files, and
  • a configuration file (config.txt or config.xml) specifying the parameters needed to locate each Task's assets inside the Job, to process each Task, and to create the output container holding the output sync map files.

Example: /home/rb/job.zip, containing the following files, corresponding to three Tasks:

.
├── config.txt
└── OEBPS
    └── Resources
        ├── sonnet001.mp3
        ├── sonnet001.txt
        ├── sonnet002.mp3
        ├── sonnet002.txt
        ├── sonnet003.mp3
        └── sonnet003.txt
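
A container with this layout can be produced with any ZIP tool. The following sketch uses Python's standard zipfile module; the function name make_job_zip is hypothetical, and the demo uses empty placeholder files in a temporary directory instead of real audio and text assets.

```python
import os
import tempfile
import zipfile


def make_job_zip(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to source_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                relative = os.path.relpath(full_path, source_dir)
                # ZIP member names always use forward slashes.
                archive.write(full_path, relative.replace(os.sep, "/"))
        return sorted(archive.namelist())


# Demo: recreate the tree above with empty placeholder files.
source_dir = tempfile.mkdtemp()
resources = os.path.join(source_dir, "OEBPS", "Resources")
os.makedirs(resources)
open(os.path.join(source_dir, "config.txt"), "w").close()
for number in (1, 2, 3):
    for extension in ("mp3", "txt"):
        open(os.path.join(resources, "sonnet%03d.%s" % (number, extension)), "w").close()

# The archive is written outside source_dir so that it does not include itself.
zip_path = os.path.join(tempfile.mkdtemp(), "job.zip")
names = make_job_zip(source_dir, zip_path)
print("\n".join(names))
```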

The aeneas.tools.execute_job tool processes a Job and writes the corresponding output container to file. Therefore, it requires at least two arguments:

  • the path of the input job container;
  • the path of an existing directory where the output container should be created.

The --help, -v, -l, and -r switches have the same meaning for aeneas.tools.execute_job as described above. For example, the help message reads:

$ python -m aeneas.tools.execute_job

NAME
  execute_job - Execute a Job, passed as a container.

SYNOPSIS
  python -m aeneas.tools.execute_job [-h|--help|--help-rconf|--version]
  python -m aeneas.tools.execute_job --list-parameters
  python -m aeneas.tools.execute_job CONTAINER OUTPUT_DIR [CONFIG_STRING] [OPTIONS]

OPTIONS
  --cewsubprocess : run cew in separate process (see docs)
  --help : print full help and exit
  --help-rconf : list all runtime configuration parameters
  --skip-validator : do not validate the given container and/or config string
  --version : print the program name and version and exit
  -h : print short help and exit
  -l[=FILE], --log[=FILE] : log verbose output to tmp file or FILE if specified
  -r=CONF, --runtime-configuration=CONF : apply runtime configuration CONF
  -v, --verbose : verbose output
  -vv, --very-verbose : verbose output, print date/time values

EXAMPLES
  python -m aeneas.tools.execute_job aeneas/tools/res/job.zip output/
  python -m aeneas.tools.execute_job aeneas/tools/res/job.zip output/ --cewsubprocess
  python -m aeneas.tools.execute_job aeneas/tools/res/job_no_config.zip output/ "is_hierarchy_type=flat|is_hierarchy_prefix=assets/|is_text_file_relative_path=.|is_text_file_name_regex=.*\.xhtml|is_text_type=unparsed|is_audio_file_relative_path=.|is_audio_file_name_regex=.*\.mp3|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric|os_job_file_name=demo_sync_job_output|os_job_file_container=zip|os_job_file_hierarchy_type=flat|os_job_file_hierarchy_prefix=assets/|os_task_file_name=\$PREFIX.xhtml.smil|os_task_file_format=smil|os_task_file_smil_page_ref=\$PREFIX.xhtml|os_task_file_smil_audio_ref=../Audio/\$PREFIX.mp3|job_language=eng|job_description=Demo Sync Job"

Currently aeneas.tools.execute_job does not have built-in example shortcuts (--example-*), but you can run a built-in example:

$ python -m aeneas.tools.execute_job aeneas/tools/res/job.zip output/

[INFO] Validating the container (specify --skip-validator to bypass)...
[INFO] Validating the container... done
[INFO] Loading job from container...
[INFO] Loading job from container... done
[INFO] Executing...
[INFO] Executing... done
[INFO] Creating output container...
[INFO] Creating output container... done
[INFO] Created output file 'output/demo_sync_job_output.zip'

TXT Config File (config.txt)

A ZIP container with the following files:

.
├── config.txt
└── OEBPS
    └── Resources
        ├── sonnet001.mp3
        ├── sonnet001.txt
        ├── sonnet002.mp3
        ├── sonnet002.txt
        ├── sonnet003.mp3
        └── sonnet003.txt

where the config.txt config file reads:

is_hierarchy_type=flat
is_hierarchy_prefix=OEBPS/Resources/
is_text_file_relative_path=.
is_text_file_name_regex=.*\.txt
is_text_type=parsed
is_audio_file_relative_path=.
is_audio_file_name_regex=.*\.mp3

os_job_file_name=output_example1
os_job_file_container=zip
os_job_file_hierarchy_type=flat
os_job_file_hierarchy_prefix=OEBPS/Resources/
os_task_file_name=$PREFIX.smil
os_task_file_format=smil
os_task_file_smil_page_ref=$PREFIX.xhtml
os_task_file_smil_audio_ref=$PREFIX.mp3

job_language=en
job_description=Example 1 (flat hierarchy, parsed text files)

will generate three tasks (sonnet001, sonnet002 and sonnet003), output a SMIL file for each of them, and finally compress them into a ZIP file with the following structure:

.
└── OEBPS
    └── Resources
        ├── sonnet001.smil
        ├── sonnet002.smil
        └── sonnet003.smil

Note that the paths in config.txt are relative to (the directory containing) the config.txt file, and that you can use the PPV_OS_TASK_PREFIX placeholder ($PREFIX) that will be replaced with the Task id.
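
To make the role of the file name regexes and of $PREFIX more concrete, here is a small illustration. It is not the aeneas implementation: the function names are hypothetical, and pairing audio and text files by their name without extension is an assumption of this sketch.

```python
import re

# Subset of the config.txt shown above.
CONFIG_TXT = r"""
is_hierarchy_type=flat
is_hierarchy_prefix=OEBPS/Resources/
is_text_file_name_regex=.*\.txt
is_audio_file_name_regex=.*\.mp3
os_task_file_name=$PREFIX.smil
os_task_file_smil_page_ref=$PREFIX.xhtml
os_task_file_smil_audio_ref=$PREFIX.mp3
"""


def parse_config_txt(text):
    """Parse key=value lines, skipping blank lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            key, _separator, value = line.partition("=")
            config[key] = value
    return config


def expand_prefix(config, file_names):
    """Pair text/audio files by base name and substitute $PREFIX in the config values."""
    text_re = re.compile(config["is_text_file_name_regex"])
    audio_re = re.compile(config["is_audio_file_name_regex"])
    # Requiring a full match is a choice made by this sketch.
    texts = {n.rsplit(".", 1)[0] for n in file_names if text_re.fullmatch(n)}
    audios = {n.rsplit(".", 1)[0] for n in file_names if audio_re.fullmatch(n)}
    plan = {}
    for prefix in sorted(texts & audios):
        plan[prefix] = {
            key: value.replace("$PREFIX", prefix)
            for key, value in config.items()
            if "$PREFIX" in value
        }
    return plan


files = ["sonnet001.mp3", "sonnet001.txt", "sonnet002.mp3", "sonnet002.txt",
         "sonnet003.mp3", "sonnet003.txt"]
plan = expand_prefix(parse_config_txt(CONFIG_TXT), files)
for prefix, values in plan.items():
    print(prefix, values["os_task_file_name"])
```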

XML Config File (config.xml)

While config.txt is concise and easy to write, it constrains all the tasks of the job to share the same execution settings (language, output format, and so on).

If you need to specify different values for execution parameters of different tasks, you must use an XML config file, named config.xml.

The following config.xml is equivalent to the example above:

<?xml version = "1.0" encoding="UTF-8" standalone="no"?>
<job>
    <job_language>en</job_language>
    <job_description>Example 4 (XML, flat hierarchy, parsed text files)</job_description>
    <tasks>
        <task>
            <task_language>en</task_language>
            <task_description>Sonnet 1</task_description>
            <task_custom_id>sonnet001</task_custom_id>
            <is_text_file>OEBPS/Resources/sonnet001.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/sonnet001.mp3</is_audio_file>
            <os_task_file_name>sonnet001.smil</os_task_file_name>
            <os_task_file_format>smil</os_task_file_format>
            <os_task_file_smil_page_ref>sonnet001.xhtml</os_task_file_smil_page_ref>
            <os_task_file_smil_audio_ref>sonnet001.mp3</os_task_file_smil_audio_ref>
        </task>
        <task>
            <task_language>en</task_language>
            <task_description>Sonnet 2</task_description>
            <task_custom_id>sonnet002</task_custom_id>
            <is_text_file>OEBPS/Resources/sonnet002.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/sonnet002.mp3</is_audio_file>
            <os_task_file_name>sonnet002.smil</os_task_file_name>
            <os_task_file_format>smil</os_task_file_format>
            <os_task_file_smil_page_ref>sonnet002.xhtml</os_task_file_smil_page_ref>
            <os_task_file_smil_audio_ref>sonnet002.mp3</os_task_file_smil_audio_ref>
        </task>
        <task>
            <task_language>en</task_language>
            <task_description>Sonnet 3</task_description>
            <task_custom_id>sonnet003</task_custom_id>
            <is_text_file>OEBPS/Resources/sonnet003.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/sonnet003.mp3</is_audio_file>
            <os_task_file_name>sonnet003.smil</os_task_file_name>
            <os_task_file_format>smil</os_task_file_format>
            <os_task_file_smil_page_ref>sonnet003.xhtml</os_task_file_smil_page_ref>
            <os_task_file_smil_audio_ref>sonnet003.mp3</os_task_file_smil_audio_ref>
        </task>
    </tasks>
    <os_job_file_name>output_example4</os_job_file_name>
    <os_job_file_container>zip</os_job_file_container>
    <os_job_file_hierarchy_type>flat</os_job_file_hierarchy_type>
    <os_job_file_hierarchy_prefix>OEBPS/Resources/</os_job_file_hierarchy_prefix>
</job>
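
Because config.xml repeats the same block for every task, it can be convenient to generate it. The sketch below uses Python's standard xml.etree.ElementTree to build a document with the same elements as the example above; the function names are hypothetical, and the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET


def add_children(parent, pairs):
    """Append one child element per (tag, text) pair."""
    for tag, text in pairs:
        ET.SubElement(parent, tag).text = text


def build_config_xml(prefixes):
    """Build a config.xml string with one <task> per prefix."""
    job = ET.Element("job")
    add_children(job, [
        ("job_language", "en"),
        ("job_description", "Example 4 (XML, flat hierarchy, parsed text files)"),
    ])
    tasks = ET.SubElement(job, "tasks")
    for index, prefix in enumerate(prefixes, start=1):
        task = ET.SubElement(tasks, "task")
        add_children(task, [
            ("task_language", "en"),
            ("task_description", "Sonnet %d" % index),
            ("task_custom_id", prefix),
            ("is_text_file", "OEBPS/Resources/%s.txt" % prefix),
            ("is_text_type", "parsed"),
            ("is_audio_file", "OEBPS/Resources/%s.mp3" % prefix),
            ("os_task_file_name", "%s.smil" % prefix),
            ("os_task_file_format", "smil"),
            ("os_task_file_smil_page_ref", "%s.xhtml" % prefix),
            ("os_task_file_smil_audio_ref", "%s.mp3" % prefix),
        ])
    add_children(job, [
        ("os_job_file_name", "output_example4"),
        ("os_job_file_container", "zip"),
        ("os_job_file_hierarchy_type", "flat"),
        ("os_job_file_hierarchy_prefix", "OEBPS/Resources/"),
    ])
    return ET.tostring(job, encoding="unicode")


xml_text = build_config_xml(["sonnet001", "sonnet002", "sonnet003"])
print(xml_text)
```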

Now note that config.xml allows you to bundle together Tasks with different languages, output formats, etc.:

<?xml version = "1.0" encoding="UTF-8" standalone="no"?>
<job>
    <job_language>en</job_language>
    <job_description>Example 7 (XML, multiple languages, multiple output formats)</job_description>
    <tasks>
        <task>
            <task_language>en</task_language>
            <task_description>Sonnet 1</task_description>
            <task_custom_id>sonnet001</task_custom_id>
            <is_text_file>OEBPS/Resources/en.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/en.mp3</is_audio_file>
            <os_task_file_name>en.smil</os_task_file_name>
            <os_task_file_format>smil</os_task_file_format>
            <os_task_file_smil_page_ref>en.xhtml</os_task_file_smil_page_ref>
            <os_task_file_smil_audio_ref>en.mp3</os_task_file_smil_audio_ref>
        </task>
        <task>
            <task_language>de</task_language>
            <task_description>Simplicissimus</task_description>
            <task_custom_id>simplicissimus</task_custom_id>
            <is_text_file>OEBPS/Resources/de.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/de.mp3</is_audio_file>
            <os_task_file_name>de.csv</os_task_file_name>
            <os_task_file_format>csv</os_task_file_format>
        </task>
        <task>
            <task_language>es</task_language>
            <task_description>Capitan Veneno</task_description>
            <task_custom_id>capitan veneno</task_custom_id>
            <is_text_file>OEBPS/Resources/es.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/es.mp3</is_audio_file>
            <os_task_file_name>es.srt</os_task_file_name>
            <os_task_file_format>srt</os_task_file_format>
        </task>
    </tasks>
    <os_job_file_name>output_example7</os_job_file_name>
    <os_job_file_container>zip</os_job_file_container>
    <os_job_file_hierarchy_type>flat</os_job_file_hierarchy_type>
    <os_job_file_hierarchy_prefix>OEBPS/Resources/</os_job_file_hierarchy_prefix>
</job>
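
Before running a Job with mixed settings, it can help to list what each task declares. The following sketch reads an abridged copy of the config.xml above (only three elements per task are kept) with xml.etree.ElementTree; the function name summarize_tasks is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the config.xml above.
CONFIG_XML = """
<job>
    <job_language>en</job_language>
    <tasks>
        <task>
            <task_language>en</task_language>
            <task_custom_id>sonnet001</task_custom_id>
            <os_task_file_format>smil</os_task_file_format>
        </task>
        <task>
            <task_language>de</task_language>
            <task_custom_id>simplicissimus</task_custom_id>
            <os_task_file_format>csv</os_task_file_format>
        </task>
        <task>
            <task_language>es</task_language>
            <task_custom_id>capitan veneno</task_custom_id>
            <os_task_file_format>srt</os_task_file_format>
        </task>
    </tasks>
</job>
"""


def summarize_tasks(xml_text):
    """Return (custom id, language, output format) for each <task> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        (
            task.findtext("task_custom_id"),
            task.findtext("task_language"),
            task.findtext("os_task_file_format"),
        )
        for task in root.iter("task")
    ]


summary = summarize_tasks(CONFIG_XML)
for custom_id, language, output_format in summary:
    print(custom_id, language, output_format)
```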