dtw

This module contains the implementation of dynamic time warping (DTW) algorithms to align two audio waves, represented by their Mel-frequency cepstral coefficients (MFCCs).

This module contains the following classes:

  • DTWAlgorithm, an enumeration of the available algorithms;
  • DTWAligner, the actual wave aligner;
  • DTWExact, a DTW aligner implementing the exact (full) DTW algorithm;
  • DTWStripe, a DTW aligner implementing the Sakoe-Chiba band heuristic.

To align two wave files:

  1. build a DTWAligner object, passing to the constructor either the paths of the two wave files or their MFCC representations;
  2. call compute_path() to compute the min cost path between the MFCC representations of the two wave files.
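
The two steps above can be sketched as follows. This is a minimal sketch that assumes aeneas is installed; the helper name align_waves and the file paths in the usage comment are hypothetical, while the keyword arguments and compute_path() are the ones documented on this page.

```python
def align_waves(real_wave_path, synt_wave_path):
    """Return the min cost path between two wave files, or None."""
    # Imported lazily so that this sketch can be loaded even if aeneas
    # is not installed; the import fails only when the function is called.
    from aeneas.dtw import DTWAligner

    # Step 1: build the aligner from the two file paths.
    # MFCCs are extracted upon object creation.
    aligner = DTWAligner(
        real_wave_path=real_wave_path,
        synt_wave_path=synt_wave_path,
    )

    # Step 2: compute the min cost path.
    # The result is None if one of the waves is empty after masking.
    return aligner.compute_path()


# Hypothetical usage (the paths are placeholders):
# path = align_waves("/path/to/real.wav", "/path/to/synt.wav")
```

Passing real_wave_mfcc and synt_wave_mfcc instead of the paths should avoid re-extracting the MFCCs when AudioFileMFCC objects are already available.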

Warning

This module might be refactored in a future version.

class aeneas.dtw.DTWAlgorithm[source]

Enumeration of the DTW algorithms that can be used for the alignment of two audio waves.

ALLOWED_VALUES = ['exact', 'stripe']

List of all the allowed values.

EXACT = 'exact'

Classical (exact) DTW algorithm.

This implementation has O(nm) time and space complexity, where n (respectively, m) is the number of MFCC window shifts (vectors) of the real (respectively, synthesized) wave.
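
To illustrate where the O(nm) bound comes from, here is a minimal pure-Python sketch of classical DTW. It is not the aeneas implementation: the function name exact_dtw is hypothetical, the inputs are plain sequences of numbers rather than MFCC vectors, and the local cost is assumed to be the absolute difference. Every one of the n × m cells is filled once and kept for backtracking.

```python
def exact_dtw(real, synt):
    """Return (accumulated cost matrix, min cost path), or None if empty."""
    n, m = len(real), len(synt)
    if n == 0 or m == 0:
        return None
    inf = float("inf")

    # Accumulated cost matrix: n rows, m columns, hence O(nm) space.
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = abs(real[i] - synt[j])  # assumed local cost
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    acc[i - 1][j] if i > 0 else inf,                # from above
                    acc[i][j - 1] if j > 0 else inf,                # from the left
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,  # diagonal
                )
            acc[i][j] = cost + best

    # Backtrack from the last cell to (0, 0), always moving to the
    # cheapest admissible predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc, path
```

For example, exact_dtw([1, 2, 3], [1, 2, 2, 3]) aligns the repeated 2 in the second sequence with the single 2 in the first one, at zero total cost.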

STRIPE = 'stripe'

DTW algorithm restricted to a stripe around the main diagonal (Sakoe-Chiba Band), for reducing memory usage and run time.

Note that this is a heuristic approximation of the optimal (exact) path.

This implementation has O(nd) time and space complexity, where n is the number of MFCC window shifts (vectors) of the real wave, and d is the number of MFCC window shifts corresponding to the margin.
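
The band restriction can be sketched as follows. This is an illustrative pure-Python sketch, not the aeneas implementation: the name stripe_dtw_cost, the delta parameter (the margin expressed as a number of frames), the absolute-difference local cost, and the way the band is centred on the rescaled main diagonal are all assumptions. Only cells within delta columns of the diagonal are evaluated, so each of the n rows costs O(d) work.

```python
def stripe_dtw_cost(real, synt, delta):
    """Return the min accumulated cost inside a band of half-width delta.

    Returns None if either sequence is empty, and float("inf") if the
    band is too narrow to connect the first cell to the last one.
    """
    n, m = len(real), len(synt)
    if n == 0 or m == 0:
        return None
    inf = float("inf")

    prev = {}  # column index -> accumulated cost, previous row only
    for i in range(n):
        # Centre of the band on row i (rescaled main diagonal).
        center = (i * (m - 1)) // (n - 1) if n > 1 else 0
        lo = max(0, center - delta)
        hi = min(m - 1, center + delta)

        curr = {}
        for j in range(lo, hi + 1):
            cost = abs(real[i] - synt[j])  # assumed local cost
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    prev.get(j, inf),      # from (i-1, j)
                    curr.get(j - 1, inf),  # from (i, j-1)
                    prev.get(j - 1, inf),  # from (i-1, j-1)
                )
            curr[j] = cost + best
        prev = curr

    return prev.get(m - 1, inf)
```

Because this sketch returns only the final cost, it keeps just two rows in memory; recovering the path requires storing the band cells of all n rows, which gives the O(nd) space stated above. A band that is too narrow can miss the optimal path, or any path at all, which is why the result is a heuristic.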

class aeneas.dtw.DTWAligner(real_wave_mfcc=None, synt_wave_mfcc=None, real_wave_path=None, synt_wave_path=None, rconf=None, logger=None)[source]

The audio wave aligner.

The two waves, henceforth named real and synthesized, can be passed as AudioFileMFCC objects or as file paths. In the latter case, MFCCs will be extracted upon object creation.

Parameters:
  • real_wave_mfcc (AudioFileMFCC) – the real audio file
  • synt_wave_mfcc (AudioFileMFCC) – the synthesized audio file
  • real_wave_path (string) – the path to the real audio file
  • synt_wave_path (string) – the path to the synthesized audio file
  • rconf (RuntimeConfiguration) – a runtime configuration
  • logger (Logger) – the logger object
Raises:
  • ValueError: if real_wave_mfcc or synt_wave_mfcc is not None and is not of type AudioFileMFCC
  • ValueError: if real_wave_path or synt_wave_path is not None but cannot be read

compute_accumulated_cost_matrix()[source]

Compute the accumulated cost matrix, and return it.

Return None if the accumulated cost matrix cannot be computed because one of the two waves is empty after masking (if requested).

Return type: numpy.ndarray (2D)
Raises: RuntimeError: if neither the C extension nor the pure Python code succeeded.

New in version 1.2.0.

compute_boundaries(synt_anchors)[source]

Compute the min cost path between the two waves, and return a list of boundary points, representing the argmin values with respect to the provided synt_anchors timings.

If synt_anchors has k elements, the returned array will have k+1 elements, accounting for the tail fragment.

Parameters: synt_anchors (list of TimeValue) – the anchor time values (in seconds) of the synthesized fragments, each representing the begin time in the synthesized wave of the corresponding fragment

Return the list of boundary indices.

Return type: numpy.ndarray (1D)
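
One way to read the mapping from anchors to boundaries is sketched below. This is an assumption about the general technique, not the aeneas implementation: the name boundaries_from_path is hypothetical, the anchors are assumed to be already converted from seconds to frame indices, and the tail is approximated by the last real index on the path. For each anchor, the sketch finds the first step of the min cost path whose synthesized index reaches the anchor and takes the real index at that step.

```python
from bisect import bisect_left


def boundaries_from_path(real_indices, synt_indices, anchor_indices):
    """Map k synthesized anchor indices to k+1 real-wave boundary indices.

    real_indices and synt_indices are the two halves of a min cost path,
    both non-decreasing and of equal length.
    """
    if len(real_indices) == 0:
        return []
    last = len(synt_indices) - 1

    boundaries = []
    for anchor in anchor_indices:
        # First path step whose synthesized index is >= the anchor,
        # clipped so that anchors past the end map to the last step.
        pos = min(bisect_left(synt_indices, anchor), last)
        boundaries.append(real_indices[pos])

    # One extra element accounting for the tail fragment (assumed here
    # to start at the last real index reached by the path).
    boundaries.append(real_indices[-1])
    return boundaries
```

With real_indices [0, 1, 1, 2, 3, 4], synt_indices [0, 0, 1, 2, 2, 3], and anchors [0, 2], the sketch yields three boundaries for two anchors, matching the k+1 rule described above.
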
compute_path()[source]

Compute the min cost path between the two waves, and return it.

Return the computed path as a tuple with two elements, each being a numpy.ndarray (1D) of int indices:

([r_1, r_2, ..., r_k], [s_1, s_2, ..., s_k])

where r_i are the indices in the real wave and s_i are the indices in the synthesized wave, and k is the length of the min cost path.

Return None if the accumulated cost matrix cannot be computed because one of the two waves is empty after masking (if requested).

Return type: tuple (see above)
Raises: RuntimeError: if neither the C extension nor the pure Python code succeeded.
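
Since compute_path() may return either None or a tuple of two index arrays, callers typically need to handle both cases. The helper below is a hypothetical sketch (the name path_to_pairs is not part of aeneas) that turns the documented return value into a list of (real index, synthesized index) pairs; it works on any pair of equal-length sequences, including NumPy arrays.

```python
def path_to_pairs(path):
    """Convert a (real_indices, synt_indices) tuple into a list of pairs.

    Returns an empty list when path is None, that is, when the path
    could not be computed because one of the waves was empty.
    """
    if path is None:
        return []
    real_indices, synt_indices = path
    # int() converts NumPy integer scalars into plain Python ints.
    return [(int(r), int(s)) for r, s in zip(real_indices, synt_indices)]
```

Each pair states that frame r_i of the real wave is matched with frame s_i of the synthesized wave.
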
exception aeneas.dtw.DTWAlignerNotInitialized[source]

Error raised when trying to compute using a DTWAligner object whose real and/or synthesized waves are not initialized yet.