sd
This module contains the following classes:
SD, for detecting the audio head and tail of a given audio file.
Warning: This module is likely to be refactored in a future version.
New in version 1.2.0.
-
class aeneas.sd.SD(real_wave_mfcc, text_file, rconf=None, logger=None)
The SD (“start detector”).
Given an audio file and a text, detects the audio head and/or tail, using a voice activity detector (via VAD) and performing an alignment with a partial portion of the text (via DTWAligner).
This implementation relies on the following heuristic:
1. synthesize text until max_head_length times aeneas.sd.SD.QUERY_FACTOR seconds are reached;
2. consider only the first max_head_length times aeneas.sd.SD.AUDIO_FACTOR seconds of the audio file;
3. compute the best partial alignment of 1. with 2., and return the corresponding time value.
(Similarly for the audio tail.)
Parameters:
- real_wave_mfcc (AudioFileMFCC) – the audio file
- text_file (TextFile) – the text file
- rconf (RuntimeConfiguration) – a runtime configuration
- logger (Logger) – the logger object
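The heuristic described for this class reduces to two durations derived from the maximum head (or tail) length. The following is a minimal sketch of that arithmetic only, not the aeneas implementation: `search_windows` is a hypothetical helper, and plain `Decimal` values stand in for `TimeValue`.

```python
from decimal import Decimal

# Values copied from the documented class constants of aeneas.sd.SD.
QUERY_FACTOR = Decimal("1.0")
AUDIO_FACTOR = Decimal("2.5")


def search_windows(max_length, query_factor=QUERY_FACTOR, audio_factor=AUDIO_FACTOR):
    """Return (query_seconds, audio_seconds) for a given max head/tail length.

    Hypothetical helper: query_seconds is how much text would be synthesized,
    audio_seconds is how much of the audio file would be considered.
    """
    max_length = Decimal(max_length)
    return (max_length * query_factor, max_length * audio_factor)


query_seconds, audio_seconds = search_windows("10.000")
print(query_seconds, audio_seconds)
```

With the documented defaults, the audio window is 2.5 times as long as the synthesized query, which leaves room for the partial alignment to locate the query inside the audio.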
-
AUDIO_FACTOR = Decimal('2.5')
Multiply the max head/tail length by this factor to get the minimum length in the audio that will be searched for. Set it to be at least 1.0 + QUERY_FACTOR * 1.5. Default: 2.5.
New in version 1.5.0.
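The recommended lower bound for AUDIO_FACTOR can be checked with a few lines of arithmetic. This is a sketch under the assumption that both factors are given as decimal values; `min_audio_factor` and `audio_factor_is_sufficient` are hypothetical names, not part of aeneas.

```python
from decimal import Decimal


def min_audio_factor(query_factor):
    """Lower bound suggested by the documentation: 1.0 + QUERY_FACTOR * 1.5."""
    return Decimal("1.0") + Decimal(query_factor) * Decimal("1.5")


def audio_factor_is_sufficient(audio_factor, query_factor):
    """Return True if audio_factor meets the documented lower bound."""
    return Decimal(audio_factor) >= min_audio_factor(query_factor)


# The documented defaults (QUERY_FACTOR 1.0, AUDIO_FACTOR 2.5) sit exactly on the bound.
print(audio_factor_is_sufficient("2.5", "1.0"))
```

If QUERY_FACTOR is raised, AUDIO_FACTOR should be raised along with it; for example, a query factor of 2.0 gives a bound of 4.0.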
-
MAX_LENGTH = TimeValue('10.000')
Try detecting audio head or tail up to this many seconds. Default: 10.000.
New in version 1.2.0.
-
MIN_LENGTH = TimeValue('0.000')
Try detecting audio head or tail of at least this many seconds. Default: 0.000.
New in version 1.2.0.
-
QUERY_FACTOR = Decimal('1.0')
Multiply the max head/tail length by this factor to get the minimum query length to be synthesized. Default: 1.0.
New in version 1.5.0.
-
detect_head(min_head_length=None, max_head_length=None)
Detect the audio head, returning its duration, in seconds.
Raises:
- TypeError – if one of the parameters is not None or a number
- ValueError – if one of the parameters is negative
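The error contract above (None is accepted, non-numbers raise TypeError, negative numbers raise ValueError) can be illustrated with a small stand-alone validator. This is a sketch of the documented behavior, not the aeneas code: `normalize_length` is a hypothetical helper, and `Decimal` stands in for `TimeValue`.

```python
from decimal import Decimal


def normalize_length(value, default):
    """Return value as a Decimal, or default when value is None.

    Hypothetical helper mirroring the documented contract:
    TypeError for non-numbers, ValueError for negative numbers.
    """
    if value is None:
        return default
    if not isinstance(value, (int, float, Decimal)):
        raise TypeError("length must be None or a number")
    # Converting through str() keeps floats such as 0.1 from expanding
    # into their long binary representation.
    result = Decimal(str(value))
    if result < 0:
        raise ValueError("length must not be negative")
    return result


# None falls back to the supplied default (here, the documented MAX_LENGTH value).
max_head_length = normalize_length(None, Decimal("10.000"))
print(max_head_length)
```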
-
detect_interval(min_head_length=None, max_head_length=None, min_tail_length=None, max_tail_length=None)
Detect the interval of the audio file containing the fragments in the text file.
Return the audio interval as a tuple of two TimeValue objects, representing the begin and end time, in seconds, with respect to the full wave duration.
If one of the parameters is None, the default value (0.0 for min, 10.0 for max) will be used.
Raises:
- TypeError – if one of the parameters is not None or a number
- ValueError – if one of the parameters is negative
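To make "with respect to the full wave duration" concrete, the sketch below turns a head and a tail duration into a (begin, end) pair. It rests on an assumption not spelled out in this page: that begin equals the head duration and end equals the full duration minus the tail duration. `interval_from_head_tail` is a hypothetical helper, and `Decimal` stands in for `TimeValue`.

```python
from decimal import Decimal


def interval_from_head_tail(audio_length, head_length, tail_length):
    """Return (begin, end) in seconds, measured from the start of the full wave.

    Assumption for illustration: begin is the head duration, and
    end is the full duration minus the tail duration.
    """
    audio_length = Decimal(audio_length)
    begin = Decimal(head_length)
    end = audio_length - Decimal(tail_length)
    if begin > end:
        raise ValueError("head and tail overlap")
    return (begin, end)


# A 60 s file with a 3.5 s head and a 5 s tail.
begin, end = interval_from_head_tail("60.000", "3.500", "5.000")
print(begin, end)
```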