dtw
This module contains the implementation of dynamic time warping (DTW) algorithms to align two audio waves, represented by their Mel-frequency cepstral coefficients (MFCCs).
This module contains the following classes:
- DTWAlgorithm, an enumeration of the available algorithms;
- DTWAligner, the actual wave aligner;
- DTWExact, a DTW aligner implementing the exact (full) DTW algorithm;
- DTWStripe, a DTW aligner implementing the Sakoe-Chiba band heuristic.
To align two wave files:
- build a DTWAligner object, passing to the constructor either the paths of the two wave files or their MFCC representations;
- call compute_path() to compute the min cost path between the MFCC representations of the two wave files.
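The two steps above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the helper name `align_waves` is hypothetical, and it only uses the `DTWAligner` constructor keywords and the `compute_path()` method documented on this page. The import is done lazily so the sketch can be loaded even when aeneas is not installed.

```python
def align_waves(real_wave_path, synt_wave_path):
    """Hypothetical helper: align two wave files and return index pairs."""
    # Imported lazily so this sketch can be loaded without aeneas installed.
    from aeneas.dtw import DTWAligner

    # Step 1: build the aligner from the two file paths
    # (MFCCs are extracted upon object creation).
    aligner = DTWAligner(
        real_wave_path=real_wave_path,
        synt_wave_path=synt_wave_path,
    )

    # Step 2: compute the min cost path.
    path = aligner.compute_path()
    if path is None:
        # Documented case: one of the two waves is empty after masking.
        return None

    real_indices, synt_indices = path
    # Pair up the indices: (index in real wave, index in synthesized wave).
    return list(zip(real_indices, synt_indices))
```

The function returns `None` in the documented case where the path cannot be computed, so callers should check for it before iterating.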
Warning
This module might be refactored in a future version.
class aeneas.dtw.DTWAlgorithm
Enumeration of the DTW algorithms that can be used for the alignment of two audio waves.
ALLOWED_VALUES = ['exact', 'stripe']
List of all the allowed values.
EXACT = 'exact'
Classical (exact) DTW algorithm.
This implementation has O(nm) time and space complexity, where n (respectively, m) is the number of MFCC window shifts (vectors) of the real (respectively, synthesized) wave.
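To make the O(nm) cost concrete, here is a minimal pure-Python sketch of the classical accumulated cost recurrence. It is not the module's implementation: the function name `accumulated_cost_exact` is hypothetical, and the Euclidean distance between vectors is an assumption chosen for simplicity (the module may use a different local cost).

```python
def accumulated_cost_exact(real, synt):
    """Return the full n x m accumulated cost matrix (list of lists)."""

    def dist(a, b):
        # Assumption: Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n, m = len(real), len(synt)
    acc = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best = 0.0
            elif i == 0:
                best = acc[0][j - 1]          # only a horizontal move is possible
            elif j == 0:
                best = acc[i - 1][0]          # only a vertical move is possible
            else:
                best = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = dist(real[i], synt[j]) + best
    return acc


# Toy one-dimensional "MFCC" vectors, for illustration only.
real = [(0.0,), (1.0,), (2.0,), (3.0,)]
synt = [(0.0,), (2.0,), (3.0,)]
acc = accumulated_cost_exact(real, synt)
```

Every one of the n·m cells is visited and stored once, which is where the quadratic time and space bound comes from.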
STRIPE = 'stripe'
DTW algorithm restricted to a stripe around the main diagonal (Sakoe-Chiba band), for reducing memory usage and run time.
Note that this is a heuristic approximation of the optimal (exact) path.
This implementation has O(nd) time and space complexity, where n is the number of MFCC window shifts (vectors) of the real wave, and d is the number of MFCC window shifts corresponding to the margin.
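The band restriction can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the module's code: the name `accumulated_cost_stripe`, the way the band is centered on the diagonal, and the Euclidean local cost are all assumptions. Only `delta` cells per row are stored, giving the O(nd) footprint; cells outside the band are treated as unreachable.

```python
def accumulated_cost_stripe(real, synt, delta):
    """Return (acc, starts): an n x delta matrix and the first column of each row's band."""

    def dist(a, b):
        # Assumption: Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n, m = len(real), len(synt)
    delta = max(1, min(delta, m))
    inf = float("inf")
    starts = []
    acc = [[inf] * delta for _ in range(n)]

    def lookup(i, j):
        # Accumulated cost at full-matrix cell (i, j), or inf if outside the band.
        if i < 0 or j < 0:
            return inf
        k = j - starts[i]
        return acc[i][k] if 0 <= k < delta else inf

    for i in range(n):
        # Column nearest to the main diagonal for row i.
        center = (i * (m - 1)) // max(1, n - 1)
        start = max(0, min(center - delta // 2, m - delta))
        starts.append(start)
        for k in range(delta):
            j = start + k
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(lookup(i - 1, j), lookup(i, j - 1), lookup(i - 1, j - 1))
            acc[i][k] = dist(real[i], synt[j]) + best
    return acc, starts


# Toy one-dimensional "MFCC" vectors, for illustration only.
real = [(0.0,), (1.0,), (2.0,), (3.0,)]
synt = [(0.0,), (2.0,), (3.0,)]
acc, starts = accumulated_cost_stripe(real, synt, delta=2)
```

If `delta` is too small relative to the length difference of the two waves, consecutive bands may not overlap and the end cell stays at infinity; this is one reason the stripe result is only an approximation of the exact path.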
class aeneas.dtw.DTWAligner(real_wave_mfcc=None, synt_wave_mfcc=None, real_wave_path=None, synt_wave_path=None, rconf=None, logger=None)
The audio wave aligner.
The two waves, henceforth named real and synthesized, can be passed as AudioFileMFCC objects or as file paths. In the latter case, MFCCs will be extracted upon object creation.
Parameters:
- real_wave_mfcc (AudioFileMFCC) – the real audio file
- synt_wave_mfcc (AudioFileMFCC) – the synthesized audio file
- real_wave_path (string) – the path to the real audio file
- synt_wave_path (string) – the path to the synthesized audio file
- rconf (RuntimeConfiguration) – a runtime configuration
- logger (Logger) – the logger object
Raises:
- ValueError: if real_wave_mfcc or synt_wave_mfcc is not None but not of type AudioFileMFCC
- ValueError: if real_wave_path or synt_wave_path is not None but cannot be read
compute_accumulated_cost_matrix()
Compute the accumulated cost matrix, and return it.
Return None if the accumulated cost matrix cannot be computed because one of the two waves is empty after masking (if requested).
Return type: numpy.ndarray (2D)
Raises: RuntimeError: if both the C extension and the pure Python code did not succeed.
New in version 1.2.0.
compute_boundaries(synt_anchors)
Compute the min cost path between the two waves, and return a list of boundary points, representing the argmin values with respect to the provided synt_anchors timings.
If synt_anchors has k elements, the returned array will have k+1 elements, accounting for the tail fragment.
Parameters: synt_anchors (list of TimeValue) – the anchor time values (in seconds) of the synthesized fragments, each representing the begin time in the synthesized wave of the corresponding fragment
Return the list of boundary indices.
Return type: numpy.ndarray (1D)
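One plausible way to turn a min cost path into k+1 boundary indices is sketched below. This is an assumption about the general technique, not a description of the module's internals: the function name `boundaries_from_path`, the `window_shift` parameter (seconds per MFCC index), and the choice of the last real index for the tail fragment are all hypothetical.

```python
from bisect import bisect_left


def boundaries_from_path(real_indices, synt_indices, synt_anchors, window_shift):
    """Map anchor times in the synthesized wave to indices in the real wave."""
    # Convert anchor times (seconds) to MFCC indices of the synthesized wave.
    anchor_indices = [int(round(t / window_shift)) for t in synt_anchors]

    boundaries = []
    for a in anchor_indices:
        # synt_indices is non-decreasing along the path, so a binary search
        # finds the first path step that reaches the anchor index.
        pos = min(bisect_left(synt_indices, a), len(real_indices) - 1)
        boundaries.append(real_indices[pos])

    # Assumption: the extra (k+1)-th element marks the end of the tail fragment.
    boundaries.append(real_indices[-1])
    return boundaries


# Toy path and anchors; 0.04 s per index is an illustrative value.
real_indices = [0, 1, 2, 3]
synt_indices = [0, 0, 1, 2]
result = boundaries_from_path(real_indices, synt_indices, [0.0, 0.04, 0.08], 0.04)
# result == [0, 2, 3, 3]
```

With three anchors the result has four elements, matching the k+1 shape described above.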
compute_path()
Compute the min cost path between the two waves, and return it.
Return the computed path as a tuple with two elements, each being a numpy.ndarray (1D) of int indices:
([r_1, r_2, ..., r_k], [s_1, s_2, ..., s_k])
where r_i are the indices in the real wave, s_i are the indices in the synthesized wave, and k is the length of the min cost path.
Return None if the accumulated cost matrix cannot be computed because one of the two waves is empty after masking (if requested).
Return type: tuple (see above)
Raises: RuntimeError: if both the C extension and the pure Python code did not succeed.
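The shape of the returned path can be illustrated by backtracking through a small accumulated cost matrix. This is a generic DTW backtracking sketch, not the module's code: the name `best_path`, the tie-breaking order (diagonal first), and the toy matrix are assumptions, and plain lists stand in for the numpy.ndarray values the method returns.

```python
def best_path(acc):
    """Backtrack from the last cell to (0, 0) and return (real_indices, synt_indices)."""
    i, j = len(acc) - 1, len(acc[0]) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))  # diagonal move
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))          # vertical move
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))          # horizontal move
        # Pick the cheapest predecessor; on ties the first listed (diagonal) wins.
        _, i, j = min(candidates, key=lambda c: c[0])
        path.append((i, j))
    path.reverse()
    real_indices = [p[0] for p in path]
    synt_indices = [p[1] for p in path]
    return real_indices, synt_indices


# Toy 4 x 3 accumulated cost matrix (rows: real wave, columns: synthesized wave).
acc = [
    [0.0, 2.0, 5.0],
    [1.0, 1.0, 3.0],
    [3.0, 1.0, 2.0],
    [6.0, 2.0, 1.0],
]
real_indices, synt_indices = best_path(acc)
# real_indices == [0, 1, 2, 3], synt_indices == [0, 0, 1, 2]
```

Both lists have the same length k, and the i-th elements together identify one step of the alignment.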