sd

This module contains the following classes:

  • SD, for detecting the audio head and tail of a given audio file.

Warning

This module is likely to be refactored in a future version.

New in version 1.2.0.

class aeneas.sd.SD(real_wave_mfcc, text_file, rconf=None, logger=None)[source]

The SD (“start detector”).

Given an audio file and a text, detects the audio head and/or tail, using a voice activity detector (via VAD) and aligning the audio with a portion of the text (via DTWAligner).

This implementation relies on the following heuristic:

  1. synthesize text until max_head_length times aeneas.sd.SD.QUERY_FACTOR seconds are reached;
  2. consider only the first max_head_length times aeneas.sd.SD.AUDIO_FACTOR seconds of the audio file;
  3. compute the best partial alignment of the synthesized text from step 1 with the audio from step 2, and return the corresponding time value.

(Similarly for the audio tail.)
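The amounts of text and audio involved in steps 1 and 2 follow from plain arithmetic. The following is a minimal sketch of that arithmetic, not the library's own code: the helper name search_budget is hypothetical, Decimal stands in for TimeValue, and the two factors are copied from the class attributes documented below.

```python
from decimal import Decimal

# Values copied from the SD class attributes documented on this page.
QUERY_FACTOR = Decimal("1.0")
AUDIO_FACTOR = Decimal("2.5")


def search_budget(max_head_length):
    """Return (query_seconds, audio_seconds) for a maximum head length.

    Hypothetical helper, for illustration only.
    """
    max_head_length = Decimal(str(max_head_length))
    # Step 1: seconds of synthesized text to reach.
    query_seconds = max_head_length * QUERY_FACTOR
    # Step 2: seconds of audio, from the start of the file, to consider.
    audio_seconds = max_head_length * AUDIO_FACTOR
    return query_seconds, audio_seconds


query_seconds, audio_seconds = search_budget("10.000")
print(query_seconds, audio_seconds)
```

With the default maximum length of 10 seconds, about 10 seconds of synthesized text are matched against the first 25 seconds of audio.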

Parameters:
  • real_wave_mfcc (AudioFileMFCC) – the audio file
  • text_file (TextFile) – the text file
  • rconf (RuntimeConfiguration) – a runtime configuration
  • logger (Logger) – the logger object
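A possible way to call the class is sketched below, using only the constructor and method signatures documented on this page. The function name detect_speech_interval is hypothetical, and the AudioFileMFCC and TextFile objects are assumed to have been created elsewhere.

```python
def detect_speech_interval(real_wave_mfcc, text_file,
                           max_head_length=None, max_tail_length=None):
    """Return the (begin, end) interval reported by SD.detect_interval.

    Hypothetical wrapper: real_wave_mfcc is assumed to be an AudioFileMFCC
    and text_file a TextFile, both built by the caller.
    """
    # Imported lazily, so this sketch can be loaded without aeneas installed.
    from aeneas.sd import SD

    detector = SD(real_wave_mfcc, text_file)
    # Parameters left as None fall back to the defaults described
    # under detect_interval.
    return detector.detect_interval(
        max_head_length=max_head_length,
        max_tail_length=max_tail_length,
    )
```

Passing the prebuilt objects in keeps the sketch independent of how the audio and text files are loaded.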
AUDIO_FACTOR = Decimal('2.5')

Multiply the max head/tail length by this factor to get the minimum length of audio that will be searched. It should be at least 1.0 + QUERY_FACTOR * 1.5. Default: 2.5.

New in version 1.5.0.
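The recommended lower bound for AUDIO_FACTOR can be checked with a few lines of arithmetic. This is only a sketch; the helper name min_audio_factor is hypothetical and not part of the module.

```python
from decimal import Decimal


def min_audio_factor(query_factor):
    """Smallest AUDIO_FACTOR meeting the bound 1.0 + QUERY_FACTOR * 1.5.

    Hypothetical helper, for illustration only.
    """
    return Decimal("1.0") + Decimal(str(query_factor)) * Decimal("1.5")


# The default QUERY_FACTOR of 1.0 yields exactly the default AUDIO_FACTOR.
default_bound = min_audio_factor("1.0")
assert default_bound == Decimal("2.5")

# Doubling QUERY_FACTOR raises the bound to 4.0.
doubled_bound = min_audio_factor("2.0")
assert doubled_bound == Decimal("4.0")
```

If QUERY_FACTOR is changed, AUDIO_FACTOR should be raised accordingly so that the bound still holds.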

MAX_LENGTH = TimeValue('10.000')

Try detecting audio head or tail up to this many seconds. Default: 10.000.

New in version 1.2.0.

MIN_LENGTH = TimeValue('0.000')

Try detecting audio head or tail of at least this many seconds. Default: 0.000.

New in version 1.2.0.

QUERY_FACTOR = Decimal('1.0')

Multiply the max head/tail length by this factor to get the minimum query length to be synthesized. Default: 1.0.

New in version 1.5.0.

detect_head(min_head_length=None, max_head_length=None)[source]

Detect the audio head, returning its duration, in seconds.

Parameters:
  • min_head_length (TimeValue) – estimated minimum head length
  • max_head_length (TimeValue) – estimated maximum head length
Return type:

TimeValue

Raises:
  • TypeError – if one of the parameters is neither None nor a number
  • ValueError – if one of the parameters is negative
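The parameter rules above (None allowed, numbers only, no negative values) can be sketched as a small validation helper. This is an illustration under stated assumptions, not the library's implementation: the name sanitize_length is hypothetical, Decimal stands in for TimeValue, and the substitution of a default for None follows the behavior described for detect_interval.

```python
from decimal import Decimal, InvalidOperation


def sanitize_length(value, default):
    """Validate a head/tail length, mirroring the documented errors.

    Hypothetical helper, for illustration only.
    """
    if value is None:
        # Assumption: None means "use the default length".
        value = default
    try:
        value = Decimal(str(value))
    except InvalidOperation as exc:
        raise TypeError("the parameter is not None or a number") from exc
    if not value.is_finite():
        # NaN and infinities are not usable lengths either.
        raise TypeError("the parameter is not a finite number")
    if value < 0:
        raise ValueError("the parameter is negative")
    return value


print(sanitize_length(None, "10.000"))
print(sanitize_length(2.5, "0.000"))
```

Calling sanitize_length("abc", "0.000") raises TypeError, and sanitize_length(-1, "0.000") raises ValueError.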

detect_interval(min_head_length=None, max_head_length=None, min_tail_length=None, max_tail_length=None)[source]

Detect the interval of the audio file containing the fragments in the text file.

Return the audio interval as a tuple of two TimeValue objects, representing the begin and end time, in seconds, with respect to the full wave duration.

If a parameter is None, its default value is used: 0.0 for the minimum lengths and 10.0 for the maximum lengths.

Parameters:
  • min_head_length (TimeValue) – estimated minimum head length
  • max_head_length (TimeValue) – estimated maximum head length
  • min_tail_length (TimeValue) – estimated minimum tail length
  • max_tail_length (TimeValue) – estimated maximum tail length
Return type:

(TimeValue, TimeValue)

Raises:
  • TypeError – if one of the parameters is neither None nor a number
  • ValueError – if one of the parameters is negative
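Because the returned times are expressed with respect to the full wave duration, the interval can be pictured as what remains after trimming the head from the start and the tail from the end. The sketch below assumes exactly that relationship; the helper name interval_from_head_tail is hypothetical, and Decimal stands in for TimeValue.

```python
from decimal import Decimal


def interval_from_head_tail(audio_length, head, tail):
    """Return (begin, end) given the audio length and head/tail durations.

    Hypothetical helper: assumes begin equals the head duration and
    end equals the audio length minus the tail duration.
    """
    begin = head
    end = audio_length - tail
    return begin, end


# A 60-second file with a 3.5-second head and a 4.25-second tail.
begin, end = interval_from_head_tail(
    Decimal("60.000"), Decimal("3.500"), Decimal("4.250")
)
print(begin, end)
```

In this example the text is spoken between 3.5 and 55.75 seconds from the start of the file.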

detect_tail(min_tail_length=None, max_tail_length=None)[source]

Detect the audio tail, returning its duration, in seconds.

Parameters:
  • min_tail_length (TimeValue) – estimated minimum tail length
  • max_tail_length (TimeValue) – estimated maximum tail length
Return type:

TimeValue

Raises:
  • TypeError – if one of the parameters is neither None nor a number
  • ValueError – if one of the parameters is negative