sd

This module contains the following classes:

SD, for detecting the audio head and tail of a given audio file.

Warning

This module is likely to be refactored in a future version.

New in version 1.2.0.

class aeneas.sd.SD(real_wave_mfcc, text_file, rconf=None, logger=None)

The SD (“start detector”).

Given an audio file and a text, detects the audio head and/or tail, using a voice activity detector (via VAD) and aligning the audio with a portion of the text (via DTWAligner).

This implementation relies on the following heuristic:

1. synthesize text until max_head_length times aeneas.sd.SD.QUERY_FACTOR seconds are reached;
2. consider only the first max_head_length times aeneas.sd.SD.AUDIO_FACTOR seconds of the audio file;
3. compute the best partial alignment of 1. with 2., and return the corresponding time value.

(Similarly for the audio tail.)
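The first two steps of the heuristic reduce to simple arithmetic on max_head_length. The following is a minimal sketch of that arithmetic only, not the library's implementation: the function name search_windows is hypothetical, and plain Decimal values stand in for TimeValue objects.

```python
from decimal import Decimal

# Values copied from the class attributes documented on this page.
QUERY_FACTOR = Decimal("1.0")
AUDIO_FACTOR = Decimal("2.5")


def search_windows(max_head_length, query_factor=QUERY_FACTOR, audio_factor=AUDIO_FACTOR):
    """Hypothetical helper: return (query_seconds, audio_seconds).

    query_seconds is how much synthesized text is needed (step 1),
    audio_seconds is how much of the real audio is considered (step 2).
    """
    max_head_length = Decimal(str(max_head_length))
    query_seconds = max_head_length * query_factor
    audio_seconds = max_head_length * audio_factor
    return query_seconds, audio_seconds


query_seconds, audio_seconds = search_windows("10.000")
print(query_seconds, audio_seconds)
```

With the default factors and a 10-second maximum head, the query covers 10 seconds of synthesized text and the search covers the first 25 seconds of audio.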

Parameters:	real_wave_mfcc (`AudioFileMFCC`) – the audio file
	text_file (`TextFile`) – the text file
	rconf (`RuntimeConfiguration`) – a runtime configuration
	logger (`Logger`) – the logger object

AUDIO_FACTOR = Decimal('2.5'): Multiply the max head/tail length by this factor to get the minimum length of audio that will be searched. Set it to at least 1.0 + QUERY_FACTOR * 1.5. Default: 2.5.

New in version 1.5.0.
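The recommended lower bound for AUDIO_FACTOR can be checked with a few lines of arithmetic. This is an illustrative sketch: both helper names are hypothetical and are not part of aeneas.

```python
from decimal import Decimal


def min_audio_factor(query_factor):
    """Hypothetical helper: lower bound 1.0 + QUERY_FACTOR * 1.5 from the text above."""
    return Decimal("1.0") + Decimal(str(query_factor)) * Decimal("1.5")


def audio_factor_is_sufficient(audio_factor, query_factor):
    """Return True if audio_factor meets the recommended lower bound."""
    return Decimal(str(audio_factor)) >= min_audio_factor(query_factor)


# The documented defaults (AUDIO_FACTOR 2.5, QUERY_FACTOR 1.0) sit exactly on the bound.
print(min_audio_factor("1.0"))
print(audio_factor_is_sufficient("2.5", "1.0"))
```

If QUERY_FACTOR were raised to 2.0, the same rule would call for an AUDIO_FACTOR of at least 4.0.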

MAX_LENGTH = TimeValue('10.000'): Try detecting audio head or tail up to this many seconds. Default: 10.000.

New in version 1.2.0.

MIN_LENGTH = TimeValue('0.000'): Try detecting audio head or tail of at least this many seconds. Default: 0.000.

New in version 1.2.0.

QUERY_FACTOR = Decimal('1.0'): Multiply the max head/tail length by this factor to get the minimum query length to be synthesized. Default: 1.0.

New in version 1.5.0.

detect_head(min_head_length=None, max_head_length=None)

Detect the audio head, returning its duration, in seconds.

Parameters:	min_head_length (`TimeValue`) – estimated minimum head length
	max_head_length (`TimeValue`) – estimated maximum head length
Return type:	`TimeValue`
Raises:	TypeError: if one of the parameters is neither `None` nor a number
Raises:	ValueError: if one of the parameters is negative
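The argument contract described above (None or a non-negative number) can be sketched as follows. This is not the library's code: normalize_length is a hypothetical helper, Decimal stands in for TimeValue, and the fallback from None to MIN_LENGTH or MAX_LENGTH is an assumption based on the defaults documented on this page.

```python
from decimal import Decimal

# Defaults copied from SD.MIN_LENGTH and SD.MAX_LENGTH as documented on this page.
MIN_LENGTH = Decimal("0.000")
MAX_LENGTH = Decimal("10.000")


def normalize_length(value, default):
    """Hypothetical helper: validate one length argument, in seconds."""
    if value is None:
        # Assumption: None means "use the class default".
        return default
    # bool is a subclass of int; rejecting it is a choice made in this sketch only.
    if isinstance(value, bool) or not isinstance(value, (int, float, Decimal)):
        raise TypeError("length must be None or a number")
    value = Decimal(str(value))
    if value < 0:
        raise ValueError("length must not be negative")
    return value


print(normalize_length(None, MAX_LENGTH))
print(normalize_length(2.5, MIN_LENGTH))
```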

detect_interval(min_head_length=None, max_head_length=None, min_tail_length=None, max_tail_length=None)

Detect the interval of the audio file containing the fragments in the text file.

Return the audio interval as a tuple of two TimeValue objects, representing the begin and end time, in seconds, with respect to the full wave duration.

If one of the parameters is None, the default value (0.0 for min, 10.0 for max) will be used.

Parameters:	min_head_length (`TimeValue`) – estimated minimum head length
	max_head_length (`TimeValue`) – estimated maximum head length
	min_tail_length (`TimeValue`) – estimated minimum tail length
	max_tail_length (`TimeValue`) – estimated maximum tail length
Return type:	(`TimeValue`, `TimeValue`)
Raises:	TypeError: if one of the parameters is neither `None` nor a number
Raises:	ValueError: if one of the parameters is negative
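Since the returned interval is expressed with respect to the full wave duration, one plausible reading is that the begin time equals the detected head length and the end time equals the total duration minus the detected tail length. The sketch below illustrates that reading only; it is an assumption, not the library's implementation, and interval_from_head_tail is a hypothetical name.

```python
from decimal import Decimal


def interval_from_head_tail(audio_duration, head_length, tail_length):
    """Hypothetical helper: return (begin, end), in seconds from the start of the file.

    Assumes begin = head length and end = duration - tail length.
    """
    duration = Decimal(str(audio_duration))
    begin = Decimal(str(head_length))
    end = duration - Decimal(str(tail_length))
    return begin, end


# A 60-second file with a 2.5-second head and a 3-second tail.
begin, end = interval_from_head_tail("60.000", "2.500", "3.000")
print(begin, end)
```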

detect_tail(min_tail_length=None, max_tail_length=None)

Detect the audio tail, returning its duration, in seconds.

Parameters:	min_tail_length (`TimeValue`) – estimated minimum tail length
	max_tail_length (`TimeValue`) – estimated maximum tail length
Return type:	`TimeValue`
Raises:	TypeError: if one of the parameters is neither `None` nor a number
Raises:	ValueError: if one of the parameters is negative