vad
This module contains the following classes:
- VAD, a simple voice activity detector based on the energy of the 0-th MFCC.

Given an energy vector representing an audio file, it returns a boolean mask with elements set to True where speech occurs and False where nonspeech occurs.
New in version 1.0.4.
class aeneas.vad.VAD(logger=None, rconf=None)
The voice activity detector (VAD).
Parameters:
- rconf (RuntimeConfiguration) – a runtime configuration
- logger (Logger) – the logger object
run_vad(wave_energy, log_energy_threshold=None, min_nonspeech_length=None, extend_before=None, extend_after=None)
Compute the time intervals containing speech and nonspeech, and return a boolean mask with speech frames set to True and nonspeech frames set to False.
The last four parameters may be None: in this case, the corresponding RuntimeConfiguration values are applied.
Parameters:
- wave_energy (numpy.ndarray (1D)) – the energy vector of the audio file (0-th MFCC)
- log_energy_threshold (float) – the minimum log energy threshold to consider a frame as speech
- min_nonspeech_length (int) – the minimum length, in frames, of a nonspeech interval
- extend_before (int) – extend each speech interval by this number of frames to the left (before)
- extend_after (int) – extend each speech interval by this number of frames to the right (after)
Return type: numpy.ndarray (1D)
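
The way the run_vad parameters interact can be illustrated with a small, self-contained NumPy sketch. This is not the aeneas implementation: the function name sketch_vad is hypothetical, the default values are illustrative only, and the sketch assumes that the threshold is measured as an offset above the lowest energy in the vector, which the text above does not state.

```python
import numpy as np


def sketch_vad(wave_energy, log_energy_threshold=0.699, min_nonspeech_length=3,
               extend_before=0, extend_after=0):
    """Hypothetical energy-based VAD sketch; NOT the aeneas implementation."""
    energy = np.asarray(wave_energy, dtype=float)
    n = len(energy)
    if n == 0:
        return np.zeros(0, dtype=bool)

    # Assumption: the threshold is an offset above the quietest frame.
    threshold = energy.min() + log_energy_threshold
    quiet = energy < threshold

    # Start with every frame marked as speech, then mark as nonspeech only
    # those quiet runs that last at least min_nonspeech_length frames.
    mask = np.ones(n, dtype=bool)
    i = 0
    while i < n:
        if quiet[i]:
            j = i
            while j < n and quiet[j]:
                j += 1
            if j - i >= min_nonspeech_length:
                mask[i:j] = False
            i = j
        else:
            i += 1

    # Locate each speech interval [start, end) and widen it on both sides.
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[0::2], changes[1::2]
    extended = mask.copy()
    for s, e in zip(starts, ends):
        extended[max(0, s - extend_before):min(n, e + extend_after)] = True
    return extended


# Synthetic energy vector: silence, speech with a one-frame dip, silence.
energy = np.array([0, 0, 0, 0, 5, 5, 0, 5, 5, 0, 0, 0, 0], dtype=float)
mask = sketch_vad(energy, min_nonspeech_length=3, extend_before=1, extend_after=2)
print(mask.astype(int))
```

In this example the one-frame dip inside the speech region is shorter than min_nonspeech_length, so it stays labeled as speech, and the single speech interval is then widened by one frame on the left and two on the right.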