runtimeconfiguration

This module contains the following classes:

  • RuntimeConfiguration, representing a runtime configuration.

New in version 1.4.1.

class aeneas.runtimeconfiguration.RuntimeConfiguration(config_string=None)[source]

A structure representing a runtime configuration, that is, a set of parameters for the algorithms which process jobs and tasks.

Allowed keys are listed below as class members.

Parameters: config_string (string) – the configuration string
Raises: TypeError: if config_string is not None and it is not a Unicode string
Raises: KeyError: if trying to access a key not listed below
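
As a rough illustration, a configuration string can be assembled from key/value pairs. The sketch below assumes that the pairs are written as key=value and joined with a | separator, as in the aeneas command-line -r option; the helper name build_config_string is hypothetical and not part of aeneas.

```python
# Hypothetical helper (not part of aeneas): join key/value pairs into a
# configuration string, assuming "key=value" pairs separated by "|".
def build_config_string(params):
    return "|".join("{}={}".format(key, value) for key, value in params.items())


# Key names are the string values of the class members documented on this page.
params = {
    "tts": "espeak-ng",
    "mfcc_mask_nonspeech": True,
    "dtw_margin": 30,
}
config_string = build_config_string(params)
print(config_string)

# With aeneas installed, the string would then be passed to the constructor:
# from aeneas.runtimeconfiguration import RuntimeConfiguration
# rconf = RuntimeConfiguration(config_string)
```

The constructor call is left commented out so that the sketch runs without aeneas installed.
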
ABA_NONSPEECH_TOLERANCE = 'aba_nonspeech_tolerance'

Tolerance, in seconds, for considering a given time value inside a nonspeech interval.

Default: 0.080 seconds.

New in version 1.7.0.

ABA_NO_ZERO_DURATION = 'aba_no_zero_duration'

Offset, in seconds, to be added to fragments with zero length.

Default: 0.001 seconds.

New in version 1.7.0.

ALLOW_UNLISTED_LANGUAGES = 'allow_unlisted_languages'

If True, allow using a language code not listed in the TTS supported languages list; otherwise, generate an error if the user attempts to use a language not listed.

Default: False.

New in version 1.4.1.

CDTW = 'cdtw'

If True and the Python C extension cdtw is available, use it. Otherwise, use the pure Python code.

Default: True.

New in version 1.5.1.

CEW = 'cew'

If True and the Python C extension cew is available, use it. Otherwise, use the pure Python code.

Default: True.

New in version 1.5.1.

CEW_SUBPROCESS_ENABLED = 'cew_subprocess_enabled'

If True, calls to aeneas.cew will be done via subprocess, using the CEWSubprocess helper class.

Default: False.

New in version 1.5.0.

CEW_SUBPROCESS_PATH = 'cew_subprocess_path'

Use the given path to the python executable when calling aeneas.cew via subprocess.

You might need to use a full path, like /path/to/your/python.

Default: python.

New in version 1.5.0.

CFW = 'cfw'

If True and the Python C++ extension cfw is available, use it. Otherwise, use the pure Python code.

Default: True.

New in version 1.6.0.

CMFCC = 'cmfcc'

If True and the Python C extension cmfcc is available, use it. Otherwise, use the pure Python code.

Default: True.

New in version 1.5.1.

C_EXTENSIONS = 'c_extensions'

If True and the Python C/C++ extensions are available, use them. Otherwise, use the pure Python code.

This option is equivalent to setting CDTW, CEW, CFW, and CMFCC to True or False at once.

Default: True.

New in version 1.4.1.
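
The equivalence stated above can be pictured as a simple expansion of one flag into four. This is only a sketch with plain dictionaries; the helper expand_c_extensions is hypothetical and does not reproduce how aeneas applies the flag internally.

```python
# String values of the CDTW, CEW, CFW, and CMFCC members documented above.
PER_EXTENSION_KEYS = ("cdtw", "cew", "cfw", "cmfcc")


def expand_c_extensions(enabled):
    """Hypothetical helper: map the single flag onto the four per-extension keys."""
    return {key: enabled for key in PER_EXTENSION_KEYS}


flags = expand_c_extensions(False)
print(flags)
```
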

DOWNLOADER_RETRY_ATTEMPTS = 'downloader_retry_attempts'

Retry an HTTP POST request generated by the Downloader up to this number of times before giving up. It must be an integer greater than zero.

Default: 5.

New in version 1.7.2.

DOWNLOADER_SLEEP = 'downloader_sleep'

Wait this number of seconds before the next HTTP POST request of the Downloader. This parameter can be used to throttle the HTTP usage. It cannot be a negative value.

Default: 1.000.

New in version 1.7.2.
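
The interplay between a retry count and a sleep interval can be sketched with a generic retry loop. This is not the Downloader's code: the function retry and the stand-in flaky_request are hypothetical, and the real client may handle errors differently.

```python
import time


def retry(operation, attempts=5, sleep_seconds=1.0):
    """Call operation() up to `attempts` times, sleeping between failed tries."""
    if attempts < 1:
        raise ValueError("attempts must be an integer greater than zero")
    if sleep_seconds < 0:
        raise ValueError("sleep_seconds cannot be negative")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(sleep_seconds)
    raise last_error


# Stand-in for an HTTP request that fails twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("temporary failure")
    return "ok"


result = retry(flaky_request, attempts=5, sleep_seconds=0)
print(result, "after", calls["count"], "calls")
```

The argument checks mirror the constraints stated above: the attempt count must be greater than zero and the sleep value cannot be negative.
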

DTW_ALGORITHM = 'dtw_algorithm'

DTW aligner algorithm.

Allowed values:

New in version 1.4.1.

DTW_MARGIN = 'dtw_margin'

DTW aligner margin, in seconds, for the stripe algorithm.

Default: 60, corresponding to 60 s ahead and behind (i.e., 120 s total margin).

New in version 1.4.1.

DTW_MARGIN_L1 = 'dtw_margin_l1'

DTW aligner margin, in seconds, for the stripe algorithm at level 1 (paragraph).

Default: 60, corresponding to 60 s ahead and behind (i.e., 120 s total margin).

New in version 1.7.0.

DTW_MARGIN_L2 = 'dtw_margin_l2'

DTW aligner margin, in seconds, for the stripe algorithm at level 2 (sentence).

Default: 30, corresponding to 30 s ahead and behind (i.e., 60 s total margin).

New in version 1.7.0.

DTW_MARGIN_L3 = 'dtw_margin_l3'

DTW aligner margin, in seconds, for the stripe algorithm at level 3 (word).

Default: 10, corresponding to 10 s ahead and behind (i.e., 20 s total margin).

New in version 1.7.0.
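
To get a feeling for the size of these margins, the sketch below converts each default margin into a number of MFCC frames by dividing it by the default window shift of the same level, as documented on this page. Dividing the margin by the shift is an assumption made for illustration; the function margin_in_frames is hypothetical.

```python
# Defaults from this page: (margin in seconds, window shift in seconds).
LEVEL_DEFAULTS = {
    1: (60, 0.040),  # paragraph
    2: (30, 0.020),  # sentence
    3: (10, 0.005),  # word
}


def margin_in_frames(margin_seconds, window_shift_seconds):
    """Number of MFCC frames covered by one side of the margin (assumed conversion)."""
    return int(round(margin_seconds / window_shift_seconds))


frames = {}
for level, (margin, shift) in sorted(LEVEL_DEFAULTS.items()):
    frames[level] = margin_in_frames(margin, shift)
    print("level", level, "one side:", frames[level], "frames; total:", 2 * frames[level])
```

Under this assumption, a smaller margin at word level does not imply a narrower stripe in frames, because the window shift shrinks as well.
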

FFMPEG_PATH = 'ffmpeg_path'

Path to the ffmpeg executable.

You might need to use a full path, like /path/to/your/ffmpeg.

Default: ffmpeg.

New in version 1.4.1.

FFMPEG_SAMPLE_RATE = 'ffmpeg_sample_rate'

Sample rate for ffmpeg, in Hertz.

Default: 16000.

New in version 1.4.1.

FFPROBE_PATH = 'ffprobe_path'

Path to the ffprobe executable.

You might need to use a full path, like /path/to/your/ffprobe.

Default: ffprobe.

New in version 1.4.1.

JOB_MAX_TASKS = 'job_max_tasks'

Maximum number of Tasks of a Job. If a Job has more Tasks than this value, it will not be executed and an error will be raised. Use 0 for disabling this check.

Default: 0 (disabled).

New in version 1.4.1.

MFCC_EMPHASIS_FACTOR = 'mfcc_emphasis_factor'

Emphasis factor to be applied to MFCCs.

Default: 0.970.

New in version 1.4.1.

MFCC_FFT_ORDER = 'mfcc_fft_order'

Order of the RFFT for extracting MFCCs. It must be a power of two.

Default: 512.

New in version 1.4.1.
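
A candidate value can be checked against the power-of-two requirement before it is placed in a configuration. The helper below is a generic bit trick and is not taken from aeneas.

```python
def is_power_of_two(n):
    """True if n is a positive integer power of two."""
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0


print(is_power_of_two(512))  # the default value
print(is_power_of_two(500))
```
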

MFCC_FILTERS = 'mfcc_filters'

Number of filters for extracting MFCCs.

Default: 40.

New in version 1.4.1.

MFCC_GRANULARITY_MAP = {1: ('dtw_margin_l1', 'mfcc_mask_nonspeech_l1', 'mfcc_window_length_l1', 'mfcc_window_shift_l1'), 2: ('dtw_margin_l2', 'mfcc_mask_nonspeech_l2', 'mfcc_window_length_l2', 'mfcc_window_shift_l2'), 3: ('dtw_margin_l3', 'mfcc_mask_nonspeech_l3', 'mfcc_window_length_l3', 'mfcc_window_shift_l3')}

Map level numbers to DTW_MARGIN_*, MFCC_MASK_NONSPEECH_*, MFCC_WINDOW_LENGTH_*, and MFCC_WINDOW_SHIFT_* keys.

New in version 1.5.0.
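
The map can be read as a lookup table from a level number to the level-specific key names. The sketch below pairs each generic key with its counterpart for a given level; the helper level_keys and the GENERIC_KEYS tuple are hypothetical names introduced for illustration.

```python
# Copied from the class member above.
MFCC_GRANULARITY_MAP = {
    1: ("dtw_margin_l1", "mfcc_mask_nonspeech_l1", "mfcc_window_length_l1", "mfcc_window_shift_l1"),
    2: ("dtw_margin_l2", "mfcc_mask_nonspeech_l2", "mfcc_window_length_l2", "mfcc_window_shift_l2"),
    3: ("dtw_margin_l3", "mfcc_mask_nonspeech_l3", "mfcc_window_length_l3", "mfcc_window_shift_l3"),
}

# Generic keys, in the same order as the tuples in the map.
GENERIC_KEYS = ("dtw_margin", "mfcc_mask_nonspeech", "mfcc_window_length", "mfcc_window_shift")


def level_keys(level):
    """Pair each generic key with its level-specific counterpart."""
    return dict(zip(GENERIC_KEYS, MFCC_GRANULARITY_MAP[level]))


print(level_keys(3))
```
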

MFCC_LOWER_FREQUENCY = 'mfcc_lower_frequency'

Lower frequency to be used for extracting MFCCs, in Hertz.

Default: 133.3333.

New in version 1.4.1.

MFCC_MASK_EXTEND_SPEECH_INTERVAL_AFTER = 'mfcc_mask_extend_speech_after'

Extend to the right (after/future) a speech interval found by the VAD algorithm, by this many frames, when masking nonspeech out.

Default: 0.

New in version 1.7.0.

MFCC_MASK_EXTEND_SPEECH_INTERVAL_BEFORE = 'mfcc_mask_extend_speech_before'

Extend to the left (before/past) a speech interval found by the VAD algorithm, by this many frames, when masking nonspeech out.

Default: 0.

New in version 1.7.0.

MFCC_MASK_LOG_ENERGY_THRESHOLD = 'mfcc_mask_log_energy_threshold'

Threshold for the VAD algorithm to decide that a given frame contains speech, when masking nonspeech out. Note that this is the log10 of the energy coefficient.

Default: 0.699 = log10(5), that is, a frame must have an energy at least 5 times higher than the minimum to be considered a speech frame.

New in version 1.7.0.
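
Because the threshold is a base-10 logarithm, it translates into a linear energy ratio. The sketch below checks the relation between 0.699 and 5, and shows one possible decision rule; the function is_speech is an assumption made for illustration and is not the VAD implementation used by aeneas.

```python
import math

THRESHOLD = 0.699  # default from this page

# The threshold is a log10 value, so it corresponds to a linear ratio.
ratio = 10 ** THRESHOLD
print(round(ratio, 3))
print(round(math.log10(5), 3))


def is_speech(frame_energy, min_energy, threshold=THRESHOLD):
    """Assumed rule: the frame exceeds the minimum energy by `threshold` in log10 units."""
    return math.log10(frame_energy) >= math.log10(min_energy) + threshold


print(is_speech(6.0, 1.0))
print(is_speech(4.0, 1.0))
```
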

MFCC_MASK_MIN_NONSPEECH_LENGTH = 'mfcc_mask_min_nonspeech_length'

Minimum length, in frames, of a nonspeech interval to be masked out.

Default: 1.

New in version 1.7.0.

MFCC_MASK_NONSPEECH = 'mfcc_mask_nonspeech'

If True, compute the DTW path ignoring nonspeech frames. Setting this parameter to True might help when aligning at word-level granularity.

Default: False.

New in version 1.7.0.

MFCC_MASK_NONSPEECH_L1 = 'mfcc_mask_nonspeech_l1'

If True, computes the DTW path ignoring nonspeech frames at level 1 (paragraph).

Default: False.

New in version 1.7.0.

MFCC_MASK_NONSPEECH_L2 = 'mfcc_mask_nonspeech_l2'

If True, computes the DTW path ignoring nonspeech frames at level 2 (sentence).

Default: False.

New in version 1.7.0.

MFCC_MASK_NONSPEECH_L3 = 'mfcc_mask_nonspeech_l3'

If True, computes the DTW path ignoring nonspeech frames at level 3 (word).

Default: False.

New in version 1.7.0.

MFCC_SIZE = 'mfcc_size'

Number of MFCCs to extract, including the 0th.

Default: 13.

New in version 1.4.1.

MFCC_UPPER_FREQUENCY = 'mfcc_upper_frequency'

Upper frequency to be used for extracting MFCCs, in Hertz.

Default: 6855.4976.

New in version 1.4.1.

MFCC_WINDOW_LENGTH = 'mfcc_window_length'

Length of the window for extracting MFCCs, in seconds. It is usual to set it between 1.5 and 4 times the value of MFCC_WINDOW_SHIFT.

Default: 0.100.

New in version 1.4.1.

MFCC_WINDOW_LENGTH_L1 = 'mfcc_window_length_l1'

Length of the window, in seconds, for extracting MFCCs at level 1 (paragraph). It is usual to set it between 1.5 and 4 times the value of MFCC_WINDOW_SHIFT_L1.

Default: 0.100.

New in version 1.5.0.

MFCC_WINDOW_LENGTH_L2 = 'mfcc_window_length_l2'

Length of the window, in seconds, for extracting MFCCs at level 2 (sentence). It is usual to set it between 1.5 and 4 times the value of MFCC_WINDOW_SHIFT_L2.

Default: 0.050.

New in version 1.5.0.

MFCC_WINDOW_LENGTH_L3 = 'mfcc_window_length_l3'

Length of the window, in seconds, for extracting MFCCs at level 3 (word). It is usual to set it between 1.5 and 4 times the value of MFCC_WINDOW_SHIFT_L3.

Default: 0.020.

New in version 1.5.0.

MFCC_WINDOW_SHIFT = 'mfcc_window_shift'

Shift of the window for extracting MFCCs, in seconds. This parameter is effectively the time step of the synchronization map output.

Default: 0.040.

New in version 1.4.1.

MFCC_WINDOW_SHIFT_L1 = 'mfcc_window_shift_l1'

Shift of the window, in seconds, for extracting MFCCs at level 1 (paragraph). This parameter is basically the time step of the synchronization map output at level 1.

Default: 0.040.

New in version 1.5.0.

MFCC_WINDOW_SHIFT_L2 = 'mfcc_window_shift_l2'

Shift of the window, in seconds, for extracting MFCCs at level 2 (sentence). This parameter is basically the time step of the synchronization map output at level 2.

Default: 0.020.

New in version 1.5.0.

MFCC_WINDOW_SHIFT_L3 = 'mfcc_window_shift_l3'

Shift of the window, in seconds, for extracting MFCCs at level 3 (word). This parameter is basically the time step of the synchronization map output at level 3.

Default: 0.005.

New in version 1.5.0.
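
The guideline that the window length is usually between 1.5 and 4 times the window shift can be checked against the defaults listed on this page. The sketch uses Decimal to keep the arithmetic exact; the helper length_to_shift_ratio is a hypothetical name.

```python
from decimal import Decimal

# (window length, window shift) defaults from this page, in seconds.
DEFAULTS = {
    "generic": ("0.100", "0.040"),
    "l1": ("0.100", "0.040"),
    "l2": ("0.050", "0.020"),
    "l3": ("0.020", "0.005"),
}


def length_to_shift_ratio(length, shift):
    """Exact ratio between window length and window shift."""
    return Decimal(length) / Decimal(shift)


ratios = {name: length_to_shift_ratio(*pair) for name, pair in DEFAULTS.items()}
for name, ratio in ratios.items():
    in_range = Decimal("1.5") <= ratio <= Decimal("4")
    print(name, ratio, in_range)
```
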

NUANCE_TTS_API_ID = 'nuance_tts_api_id'

Your ID value to use the Nuance TTS API.

You will be billed according to your Nuance Developers account plan.

Important: this feature is experimental; use it at your own risk. It is recommended not to use this TTS at word-level granularity, as it will create many requests and will therefore be expensive. If you still want to use it, you can enable the TTS caching mechanism by setting TTS_CACHE to True.

New in version 1.5.0.

NUANCE_TTS_API_KEY = 'nuance_tts_api_key'

Your KEY value to use the Nuance TTS API.

You will be billed according to your Nuance Developers account plan.

Important: this feature is experimental; use it at your own risk. It is recommended not to use this TTS at word-level granularity, as it will create many requests and will therefore be expensive. If you still want to use it, you can enable the TTS caching mechanism by setting TTS_CACHE to True.

New in version 1.5.0.

SAFETY_CHECKS = 'safety_checks'

If True, perform safety checks on input files and parameters. If set to False, it disables:

  • checks performed by Validator;
  • converting the audio file synthesized by the TTS engine so that its sample rate times the MFCC shift is an integer value.

Warning

Setting this parameter to False might result in runtime errors. Please be sure to understand the implications.

Default: True.

New in version 1.7.0.
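
The second condition above, that the sample rate times the MFCC shift is an integer, can be verified with exact arithmetic. The sketch below is only a check on candidate values, not the conversion performed by aeneas; the function names are hypothetical, and the 22050 Hz rate is an arbitrary counterexample.

```python
from fractions import Fraction


def samples_per_shift(sample_rate, window_shift):
    """Exact number of samples per MFCC shift; window_shift is a decimal string."""
    return sample_rate * Fraction(window_shift)


def shift_is_whole_samples(sample_rate, window_shift):
    """True if one shift spans a whole number of samples."""
    return samples_per_shift(sample_rate, window_shift).denominator == 1


print(samples_per_shift(16000, "0.040"))
print(shift_is_whole_samples(16000, "0.005"))
print(shift_is_whole_samples(22050, "0.005"))
```

Passing the shift as a string avoids binary floating-point rounding, which could otherwise hide or create a fractional remainder.
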

TASK_MAX_AUDIO_LENGTH = 'task_max_audio_length'

Maximum length of the audio file of a Task, in seconds. If a Task has an audio file longer than this value, it will not be executed and an error will be raised.

Use 0 to disable this check.

Default: 0 seconds.

New in version 1.4.1.

TASK_MAX_TEXT_LENGTH = 'task_max_text_length'

Maximum number of text fragments in the text file of a Task. If a Task has more text fragments than this value, it will not be executed and an error will be raised.

Use 0 to disable this check.

Default: 0 (disabled).

New in version 1.4.1.
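
The limits JOB_MAX_TASKS, TASK_MAX_AUDIO_LENGTH, and TASK_MAX_TEXT_LENGTH share the convention that 0 disables the check. A minimal sketch of that convention, with a hypothetical helper name:

```python
def exceeds_limit(value, maximum):
    """True if `value` is over `maximum`; a maximum of 0 disables the check."""
    return maximum > 0 and value > maximum


print(exceeds_limit(10000, 0))   # check disabled
print(exceeds_limit(101, 100))
print(exceeds_limit(100, 100))
```
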

TMP_PATH = 'tmp_path'

Path to the temporary directory to be used.

Default: None, meaning that the default temporary directory will be set by TMP_PATH_DEFAULT_POSIX or TMP_PATH_DEFAULT_NONPOSIX, depending on your OS.

New in version 1.4.1.

TTS = 'tts'

The TTS engine to use for synthesizing text.

Allowed values are listed in ALLOWED_VALUES.

The default value is ESPEAK (espeak) which will use the built-in eSpeak TTS wrapper. You might need to provide a /full/path/to/your/espeak value to the TTS_PATH parameter if the command espeak is not available in one of the directories listed in your PATH environment variable.

Specify the value ESPEAKNG (espeak-ng) to use the eSpeak-ng TTS wrapper. You might need to provide a /full/path/to/your/espeak-ng value to the TTS_PATH parameter if the command espeak-ng is not available in one of the directories listed in your PATH environment variable.

Specify the value FESTIVAL (festival) to use the built-in Festival TTS wrapper. You might need to provide a /full/path/to/your/text2wave value to the TTS_PATH parameter if the command text2wave is not available in one of the directories listed in your PATH environment variable.

Specify the value AWS (aws) to use the built-in AWS Polly TTS API wrapper; you will need to provide your AWS API Access Key and Secret Access Key by either storing them on disk (e.g., in ~/.aws/credentials and ~/.aws/config) or setting them in environment variables. Please refer to http://boto3.readthedocs.io/en/latest/guide/configuration.html for further details.

Specify the value NUANCE (nuance) to use the built-in Nuance TTS API wrapper; you will need to provide your Nuance Developer API ID and API Key using the NUANCE_TTS_API_ID and NUANCE_TTS_API_KEY parameters. Please note that you will be billed according to your Nuance Developers account plan.

Specify the value CUSTOM (custom) to use a custom TTS; you will need to provide the path to the Python source file containing your TTS wrapper using the TTS_PATH parameter.

New in version 1.5.0.
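
Whether a full path must be supplied through TTS_PATH depends on whether the engine command can be found via the PATH environment variable. The sketch below uses the standard library function shutil.which to perform that check; the helper needs_explicit_path is hypothetical, and the command names are those mentioned in the descriptions above.

```python
import shutil

# Command names taken from the TTS descriptions above.
DEFAULT_COMMANDS = {
    "espeak": "espeak",
    "espeak-ng": "espeak-ng",
    "festival": "text2wave",
}


def needs_explicit_path(command):
    """True if `command` cannot be found via the PATH environment variable."""
    return shutil.which(command) is None


for engine, command in DEFAULT_COMMANDS.items():
    if needs_explicit_path(command):
        print(engine, "-> set tts_path to the full path of", command)
    else:
        print(engine, "-> found at", shutil.which(command))
```
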

TTS_API_RETRY_ATTEMPTS = 'tts_api_retry_attempts'

Retry an HTTP POST request to the Nuance TTS API up to this number of times before giving up. It must be an integer greater than zero.

Note that this parameter was called nuance_tts_api_retry_attempts before v1.7.0.

Default: 5.

New in version 1.5.0.

TTS_API_SLEEP = 'tts_api_sleep'

Wait this number of seconds before the next HTTP POST request to the Nuance TTS API. This parameter can be used to throttle the HTTP usage. It cannot be a negative value.

Note that this parameter was called nuance_tts_api_sleep before v1.7.0.

Default: 1.000.

New in version 1.5.0.

TTS_CACHE = 'tts_cache'

If set to True, synthesize each distinct text fragment only once, caching the resulting audio data as a file on disk.

The cache files will be removed after the synthesis is completed.

This option is useful when calling TTS engines, via subprocess or remote APIs, on text files with many identical fragments, for example when aligning at word-level granularity.

Enabling this option will create the cache files in TMP_PATH, so make sure that this path has enough free space.

Default: False.

New in version 1.6.0.
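
The benefit of synthesizing each distinct fragment only once can be illustrated with a small memoization sketch. It keeps results in an in-memory dictionary rather than in files on disk, so it is a simplification of the mechanism described above; synthesize_all and fake_synthesize are hypothetical names.

```python
def synthesize_all(fragments, synthesize):
    """Synthesize each distinct fragment once and reuse the result."""
    cache = {}
    output = []
    for text in fragments:
        if text not in cache:
            cache[text] = synthesize(text)
        output.append(cache[text])
    return output


# Stand-in for a TTS call; it only records how often it is invoked.
calls = []


def fake_synthesize(text):
    calls.append(text)
    return "audio({})".format(text)


words = ["the", "cat", "and", "the", "dog", "and", "the", "bird"]
audio = synthesize_all(words, fake_synthesize)
print(len(words), "fragments,", len(calls), "synthesis calls")
```

With word-level text, repeated words are common, so the number of synthesis calls can be much lower than the number of fragments.
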

TTS_GRANULARITY_MAP = {1: ('tts_l1', 'tts_path_l1'), 2: ('tts_l2', 'tts_path_l2'), 3: ('tts_l3', 'tts_path_l3')}

Map level numbers to TTS_* and TTS_PATH_* keys.

New in version 1.6.0.

TTS_L1 = 'tts_l1'

The TTS engine to use for synthesizing text at level 1 (paragraph).

See also TTS.

Default: espeak.

New in version 1.6.0.

TTS_L2 = 'tts_l2'

The TTS engine to use for synthesizing text at level 2 (sentence).

See also TTS.

Default: espeak.

New in version 1.6.0.

TTS_L3 = 'tts_l3'

The TTS engine to use for synthesizing text at level 3 (word).

See also TTS.

Default: espeak.

New in version 1.6.0.

TTS_PATH = 'tts_path'

Path to the TTS engine executable or the Python CustomTTSWrapper .py source file (see the aeneas/extra directory for examples).

You might need to use a full path, like /path/to/your/ttsengine or /path/to/your/ttswrapper.py.

Default: None, meaning that the default path defined by each TTS wrapper is used if the wrapper calls the TTS engine via subprocess (otherwise this value is ignored).

New in version 1.5.0.

TTS_PATH_L1 = 'tts_path_l1'

Path to the TTS engine executable to use for synthesizing text at level 1 (paragraph).

See also TTS_PATH.

Default: None.

New in version 1.6.0.

TTS_PATH_L2 = 'tts_path_l2'

Path to the TTS engine executable to use for synthesizing text at level 2 (sentence).

See also TTS_PATH.

Default: None.

New in version 1.6.0.

TTS_PATH_L3 = 'tts_path_l3'

Path to the TTS engine executable to use for synthesizing text at level 3 (word).

See also TTS_PATH.

Default: None.

New in version 1.6.0.

TTS_VOICE_CODE = 'tts_voice_code'

The code of the TTS voice to use. If you specify this value, it will override the default voice code associated with the language of your text.

Default: None.

New in version 1.5.0.

VAD_EXTEND_SPEECH_INTERVAL_AFTER = 'vad_extend_speech_after'

Extend to the right (after/future) a speech interval found by the VAD algorithm, by this many seconds.

Default: 0 seconds.

New in version 1.4.1.

VAD_EXTEND_SPEECH_INTERVAL_BEFORE = 'vad_extend_speech_before'

Extend to the left (before/past) a speech interval found by the VAD algorithm, by this many seconds.

Default: 0 seconds.

New in version 1.4.1.
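
The effect of the two extension parameters can be sketched on a list of speech intervals. This is an illustration only: clamping to the start and end of the audio is an assumption, overlapping intervals are not merged, and extend_intervals is a hypothetical helper rather than the aeneas VAD code.

```python
def extend_intervals(intervals, before=0.0, after=0.0, audio_length=None):
    """Widen each (begin, end) speech interval, with all values in seconds."""
    extended = []
    for begin, end in intervals:
        new_begin = max(0.0, begin - before)  # assumed clamp at the audio start
        new_end = end + after
        if audio_length is not None:
            new_end = min(audio_length, new_end)  # assumed clamp at the audio end
        extended.append((new_begin, new_end))
    return extended


speech = [(1.0, 2.5), (4.0, 6.0)]
print(extend_intervals(speech, before=0.5, after=0.25, audio_length=6.0))
```
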

VAD_LOG_ENERGY_THRESHOLD = 'vad_log_energy_threshold'

Threshold for the VAD algorithm to decide that a given frame contains speech. Note that this is the log10 of the energy coefficient.

Default: 0.699 = log10(5), that is, a frame must have an energy at least 5 times higher than the minimum to be considered a speech frame.

New in version 1.4.1.

VAD_MIN_NONSPEECH_LENGTH = 'vad_min_nonspeech_length'

Minimum length, in seconds, of a nonspeech interval.

Default: 0.200 seconds.

New in version 1.4.1.

dtw_margin

Return the value of the DTW_MARGIN key stored in this configuration object.

Return type: TimeValue
mmn

Return the value of the MFCC_MASK_NONSPEECH key stored in this configuration object.

Return type: bool
mwl

Return the value of the MFCC_WINDOW_LENGTH key stored in this configuration object.

Return type: TimeValue
mws

Return the value of the MFCC_WINDOW_SHIFT key stored in this configuration object.

Return type: TimeValue
safety_checks

Return the value of the SAFETY_CHECKS key stored in this configuration object.

If False, safety checks are not performed.

Return type: bool
sample_rate

Return the value of the FFMPEG_SAMPLE_RATE key stored in this configuration object.

Return type: int
set_granularity(level)[source]

Set the values for MFCC_WINDOW_LENGTH and MFCC_WINDOW_SHIFT matching the given granularity level.

Currently supported levels:

  • 1 (paragraph)
  • 2 (sentence)
  • 3 (word)
Parameters: level (int) – the desired granularity level
set_tts(level)[source]

Set the values for TTS and TTS_PATH matching the given granularity level.

Currently supported levels:

  • 1 (paragraph)
  • 2 (sentence)
  • 3 (word)
Parameters: level (int) – the desired granularity level
tts

Return the value of the TTS key stored in this configuration object.

Return type: string
tts_path

Return the value of the TTS_PATH key stored in this configuration object.

Return type: string