sd
This module contains the following classes:
- SD, for detecting the audio head and tail of a given audio file.
Warning: This module is likely to be refactored in a future version.
New in version 1.2.0.
class aeneas.sd.SD(real_wave_mfcc, text_file, rconf=None, logger=None)
The SD (“start detector”).
Given an audio file and a text, detects the audio head and/or tail, using a voice activity detector (via VAD) and performing an alignment with a portion of the text (via DTWAligner).
This implementation relies on the following heuristic:
1. synthesize text until max_head_length times aeneas.sd.SD.QUERY_FACTOR seconds are reached;
2. consider only the first max_head_length times aeneas.sd.SD.AUDIO_FACTOR seconds of the audio file;
3. compute the best partial alignment of 1. with 2., and return the corresponding time value.
(Similarly for the audio tail.)
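The lengths used in steps 1 and 2 follow directly from the two factors. The following is a minimal sketch of that arithmetic only, not of the detection itself; `search_windows` is a hypothetical helper that is not part of aeneas, and the factor values are copied from the class attributes documented on this page.

```python
from decimal import Decimal

# Values copied from the SD class attributes documented on this page.
QUERY_FACTOR = Decimal("1.0")
AUDIO_FACTOR = Decimal("2.5")


def search_windows(max_head_length):
    """Return (query_seconds, audio_seconds) for a given max head length.

    Hypothetical helper, not part of aeneas: it only mirrors the
    multiplications described in steps 1 and 2 of the heuristic.
    """
    max_head_length = Decimal(str(max_head_length))
    query_seconds = max_head_length * QUERY_FACTOR  # step 1: text to synthesize
    audio_seconds = max_head_length * AUDIO_FACTOR  # step 2: audio to consider
    return query_seconds, audio_seconds


# With the default maximum of 10 seconds and the default factors:
query_seconds, audio_seconds = search_windows("10.000")
print(query_seconds, audio_seconds)
```

With the defaults, at least 10 seconds of text are synthesized and the first 25 seconds of audio are considered.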
Parameters:
- real_wave_mfcc (AudioFileMFCC) – the audio file
- text_file (TextFile) – the text file
- rconf (RuntimeConfiguration) – a runtime configuration
- logger (Logger) – the logger object
AUDIO_FACTOR = Decimal('2.5')
Multiply the max head/tail length by this factor to get the minimum length in the audio that will be searched for. Set it to be at least 1.0 + QUERY_FACTOR * 1.5. Default: 2.5.
New in version 1.5.0.
MAX_LENGTH = TimeValue('10.000')
Try detecting audio head or tail up to this many seconds. Default: 10.000.
New in version 1.2.0.
MIN_LENGTH = TimeValue('0.000')
Try detecting audio head or tail of at least this many seconds. Default: 0.000.
New in version 1.2.0.
QUERY_FACTOR = Decimal('1.0')
Multiply the max head/tail length by this factor to get the minimum query length to be synthesized. Default: 1.0.
New in version 1.5.0.
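The lower bound stated for AUDIO_FACTOR (at least 1.0 + QUERY_FACTOR * 1.5) can be checked against the documented defaults. This is a small sketch of that check; `min_audio_factor` is a hypothetical helper, not an aeneas function.

```python
from decimal import Decimal

# Defaults as documented for the SD class attributes.
QUERY_FACTOR = Decimal("1.0")
AUDIO_FACTOR = Decimal("2.5")


def min_audio_factor(query_factor):
    """Smallest recommended AUDIO_FACTOR for a given QUERY_FACTOR.

    Hypothetical helper: it just evaluates 1.0 + QUERY_FACTOR * 1.5.
    """
    return Decimal("1.0") + query_factor * Decimal("1.5")


# The default AUDIO_FACTOR sits exactly on the recommended lower bound.
defaults_ok = AUDIO_FACTOR >= min_audio_factor(QUERY_FACTOR)
print(defaults_ok)
```

If QUERY_FACTOR is raised, AUDIO_FACTOR should be raised accordingly so that the bound still holds.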
detect_head(min_head_length=None, max_head_length=None)
Detect the audio head, returning its duration, in seconds.
Raises:
- TypeError: if one of the parameters is not None or a number
- ValueError: if one of the parameters is negative
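The two error conditions above can be illustrated with a short validation sketch. `check_length` is a hypothetical helper written for this example; it is not the aeneas implementation and only mirrors the documented behavior (None or a number is accepted, negative values are rejected).

```python
from decimal import Decimal
from numbers import Number


def check_length(value, name):
    """Validate a head/tail length as the Raises entries describe.

    Hypothetical helper, not part of aeneas.
    """
    if value is None:
        return None  # None is allowed and passed through unchanged
    if not isinstance(value, Number):
        raise TypeError("%s is not None or a number" % name)
    if value < 0:
        raise ValueError("%s is negative" % name)
    return value


# Accepted inputs: None, or any non-negative number (Decimal included).
check_length(None, "min_head_length")
check_length(Decimal("1.500"), "max_head_length")
```

Passing a string such as "abc" would raise TypeError in this sketch, and passing -1 would raise ValueError.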
detect_interval(min_head_length=None, max_head_length=None, min_tail_length=None, max_tail_length=None)
Detect the interval of the audio file containing the fragments in the text file.
Return the audio interval as a tuple of two TimeValue objects, representing the begin and end time, in seconds, with respect to the full wave duration.
If one of the parameters is None, the default value (0.0 for min, 10.0 for max) will be used.
Raises:
- TypeError: if one of the parameters is not None or a number
- ValueError: if one of the parameters is negative
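To make the returned tuple concrete, the sketch below derives a (begin, end) pair from a detected head and tail. It assumes that the begin time equals the head duration and that the end time equals the full audio duration minus the tail duration; this is an interpretation of "with respect to the full wave duration", not something confirmed by the text above. `audio_interval` is a hypothetical helper, and plain Decimal values stand in for TimeValue objects.

```python
from decimal import Decimal


def audio_interval(head_length, tail_length, audio_length):
    """Return (begin, end) in seconds, relative to the full audio.

    Hypothetical helper, not part of aeneas. Assumes begin is the head
    duration and end is the audio duration minus the tail duration.
    """
    begin = Decimal(str(head_length))
    end = Decimal(str(audio_length)) - Decimal(str(tail_length))
    return begin, end


# Example: 60 s of audio with a 1.2 s head and a 2.3 s tail.
begin, end = audio_interval("1.200", "2.300", "60.000")
print(begin, end)
```

Under these assumptions, the fragments in the text file would span from 1.2 seconds to 57.7 seconds of the audio.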