syncmap

A synchronization map, or sync map, is a map from text fragments to time intervals.

This package contains the following classes:

class aeneas.syncmap.SyncMap(tree=None, rconf=None, logger=None)

A synchronization map, that is, a tree of SyncMapFragment objects.

Parameters:tree (Tree) – the tree of fragments; if None, an empty one will be created

add_fragment(fragment, as_last=True)

Add the given sync map fragment, as the first or last child of the root node of the sync map tree.

Parameters:
  • fragment (SyncMapFragment) – the sync map fragment to be added
  • as_last (bool) – if True, append fragment; otherwise prepend it
Raises:

TypeError: if fragment is None or it is not an instance of SyncMapFragment

clear()

Clear the sync map, removing all the current fragments.

clone()

Return a deep copy of this sync map.

New in version 1.7.0.

Return type:SyncMap

fragments

The current list of sync map fragments which are (the values of) the children of the root node of the sync map tree.

Return type:list of SyncMapFragment

fragments_tree

Return the current tree of fragments.

Return type:Tree

has_adjacent_leaves_only

Return True if the sync map fragments which are the leaves of the sync map tree are all adjacent.

Return type:bool

New in version 1.7.0.

has_zero_length_leaves

Return True if at least one of the leaves of the sync map tree is a sync map fragment with zero length.

Return type:bool

New in version 1.7.0.

is_single_level

Return True if the sync map has only one level, that is, if it is a list of fragments rather than a hierarchical tree.

Return type:bool

json_string

Return a JSON representation of the sync map.

Return type:string

New in version 1.3.1.

leaves(fragment_type=None)

The current list of sync map fragments which are (the values of) the leaves of the sync map tree. If fragment_type is not None, only the leaves of that type are returned.

Return type:list of SyncMapFragment

New in version 1.7.0.

leaves_are_consistent

Return True if the sync map fragments which are the leaves of the sync map tree (except for HEAD and TAIL leaves) are all consistent, that is, their intervals do not overlap in forbidden ways.

Return type:bool

New in version 1.7.0.

output_html_for_tuning(audio_file_path, output_file_path, parameters=None)

Output an HTML file for fine tuning the sync map manually.

Parameters:
  • audio_file_path (string) – the path to the associated audio file
  • output_file_path (string) – the path to the output file to write
  • parameters (dict) – additional parameters

New in version 1.3.1.

read(sync_map_format, input_file_path, parameters=None)

Read sync map fragments from the given file in the specified format, and add them to the current (this) sync map.

Return True if the call succeeded, False if an error occurred.

Parameters:
  • sync_map_format (SyncMapFormat) – the format of the sync map
  • input_file_path (string) – the path to the input file to read
  • parameters (dict) – additional parameters (e.g., for SMIL input)
Raises:
  • ValueError – if sync_map_format is None or it is not an allowed value
  • OSError – if input_file_path does not exist

write(sync_map_format, output_file_path, parameters=None)

Write the current sync map to file in the requested format.

Return True if the call succeeded, False if an error occurred.

Parameters:
  • sync_map_format (SyncMapFormat) – the format of the sync map
  • output_file_path (string) – the path to the output file to write
  • parameters (dict) – additional parameters (e.g., for SMIL output)
Raises:
  • ValueError – if sync_map_format is None or it is not an allowed value
  • TypeError – if a required parameter is missing
  • OSError – if output_file_path cannot be written

class aeneas.syncmap.fragment.SyncMapFragment(text_fragment=None, interval=None, begin=None, end=None, fragment_type=0, confidence=1.0)

A sync map fragment, that is, a text fragment and an associated time interval.

Parameters:
  • text_fragment (TextFragment) – the text fragment
  • interval (TimeInterval) – the time interval of the audio
  • begin (TimeValue) – the begin time of the audio interval
  • end (TimeValue) – the end time of the audio interval
  • fragment_type (int) – the type of fragment
  • confidence (float) – the confidence of the audio timing

HEAD = 1

Head fragment

NONSPEECH = 3

Nonspeech fragment (neither head nor tail)

NOT_REGULAR_TYPES = [1, 2, 3]

Fragment types other than REGULAR

REGULAR = 0

Regular fragment

TAIL = 2

Tail fragment

begin

The begin time of this sync map fragment.

Return type:TimeValue

chars

Return the number of characters of the text fragment, not including the line separators.

Return type:int

New in version 1.2.0.

confidence

The confidence of the audio timing, from 0.0 to 1.0.

Currently this value is not used, and it is always 1.0.

Return type:float

end

The end time of this sync map fragment.

Return type:TimeValue

fragment_type

The type of fragment.

Possible values are:

  • REGULAR (0)
  • HEAD (1)
  • TAIL (2)
  • NONSPEECH (3)

Return type:int

has_zero_length

Return True if this sync map fragment has zero length, that is, if its begin and end values coincide.

Return type:bool

New in version 1.7.0.

identifier

The identifier of this sync map fragment.

Return type:string

New in version 1.7.0.

interval

The time interval corresponding to this fragment.

Return type:TimeInterval

is_head_or_tail

Return True if the fragment is HEAD or TAIL.

Return type:bool

New in version 1.7.0.

is_regular

Return True if the fragment is REGULAR.

Return type:bool

New in version 1.7.0.

length

The audio duration of this sync map fragment, as end time minus begin time.

Return type:TimeValue

pretty_print

Pretty print representation of this fragment, as (identifier, begin, end, text).

Return type:string

New in version 1.7.0.

rate

The rate, in characters/second, of this fragment.

If the fragment is not REGULAR or its duration is zero, return None.

Return type:None or Decimal

New in version 1.2.0.
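The rate rule above can be sketched in plain Python. This is not the aeneas implementation: fragment_rate is a hypothetical helper, the fragment text is passed as a list of lines so that line separators are not counted (as described for chars), and Decimal stands in for TimeValue.

```python
from decimal import Decimal

# Fragment type values, as listed for SyncMapFragment
REGULAR, HEAD, TAIL, NONSPEECH = 0, 1, 2, 3


def fragment_rate(lines, begin, end, fragment_type=REGULAR):
    """Hypothetical helper: characters per second of a fragment, or None."""
    length = Decimal(end) - Decimal(begin)
    if fragment_type != REGULAR or length == 0:
        return None
    # Count the characters of each line, excluding the line separators
    chars = sum(len(line) for line in lines)
    return Decimal(chars) / length
```

For example, a single 11-character line spanning 1.100 seconds has a rate of 10 characters/second.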

rate_lack(max_rate)

The additional time this fragment would need in order to respect the given max rate.

A positive value means that the current fragment is faster than the max rate (bad). A negative or zero value means that the current fragment has a rate slower than or equal to the max rate (good).

Always return 0.000 for fragments that are not REGULAR.

Parameters:max_rate (Decimal) – the maximum rate (characters/second)
Return type:TimeValue

New in version 1.7.0.

rate_slack(max_rate)

The maximum time interval that can be stolen from this fragment while it still respects the given max rate.

For REGULAR fragments this value is the opposite of rate_lack. For NONSPEECH fragments this value is equal to the length of the fragment. For HEAD and TAIL fragments this value is 0.000, meaning that no time can be stolen from them.

Parameters:max_rate (Decimal) – the maximum rate (characters/second)
Return type:TimeValue

New in version 1.7.0.
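Following the two definitions above, rate_lack is the time the fragment would need at max_rate (its characters divided by max_rate) minus its actual length, and rate_slack is derived from it by fragment type. A minimal sketch with hypothetical free functions over Decimal values, not the aeneas methods themselves:

```python
from decimal import Decimal

REGULAR, HEAD, TAIL, NONSPEECH = 0, 1, 2, 3


def rate_lack(chars, length, max_rate, fragment_type=REGULAR):
    """Time (s) missing for the fragment to stay at or below max_rate."""
    if fragment_type != REGULAR:
        return Decimal("0.000")
    # Time needed at max_rate minus the time actually available
    return Decimal(chars) / Decimal(max_rate) - Decimal(length)


def rate_slack(chars, length, max_rate, fragment_type=REGULAR):
    """Time (s) that can be taken from the fragment without exceeding max_rate."""
    if fragment_type == REGULAR:
        return -rate_lack(chars, length, max_rate)
    if fragment_type == NONSPEECH:
        return Decimal(length)  # the whole nonspeech interval is available
    return Decimal("0.000")  # HEAD and TAIL cannot give up time
```

For example, 40 characters in 1.5 s with max_rate = 20 need 2.0 s, so rate_lack is 0.5 (too fast) and rate_slack is -0.5.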

text

The text of this sync map fragment.

Return type:string

New in version 1.7.0.

text_fragment

The text fragment associated with this sync map fragment.

Return type:TextFragment

class aeneas.syncmap.fragmentlist.SyncMapFragmentList(begin, end, rconf=None, logger=None)

A type representing a list of sync map fragments, with some constraints:

  • the begin and end time of each fragment should be within the list begin and end times;
  • two fragments can overlap only at their boundaries;
  • the list is kept sorted.

This class has some convenience methods for clipping, offsetting, moving fragment boundaries, and fixing fragments with zero length.

Parameters:
  • begin (TimeValue) – the begin time of the list
  • end (TimeValue) – the end time of the list
Raises:
  • TypeError – if begin or end are not instances of TimeValue
  • ValueError – if begin is negative or if begin is greater than end

New in version 1.7.0.
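The three constraints listed above can be checked with a small validator. This is a simplified sketch, not the aeneas code: check_fragment_list is hypothetical, intervals are plain (begin, end) Decimal pairs, and "overlap only at the boundary" is read as "each interval begins no earlier than the previous one ends", which ignores the finer-grained ALLOWED_POSITIONS rules.

```python
from decimal import Decimal


def check_fragment_list(intervals, list_begin, list_end):
    """Hypothetical validator: return the intervals sorted, or raise ValueError."""
    if list_begin < 0 or list_begin > list_end:
        raise ValueError("begin is negative or greater than end")
    for begin, end in intervals:
        # Each interval must lie within the list boundaries
        if begin < list_begin or end > list_end:
            raise ValueError(f"({begin}, {end}) is outside [{list_begin}, {list_end}]")
    ordered = sorted(intervals)  # keep the list sorted by (begin, end)
    for (_, prev_end), (next_begin, _) in zip(ordered, ordered[1:]):
        # Two intervals may touch at a boundary, but must not overlap further
        if next_begin < prev_end:
            raise ValueError(f"interval starting at {next_begin} overlaps the previous one")
    return ordered
```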

ALLOWED_POSITIONS = [0, 1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 24, 25]

Allowed positions for any pair of time intervals in the list

add(fragment, sort=True)

Add the given fragment to the list (and keep the latter sorted).

An error is raised if the fragment cannot be added, for example if its interval violates the list constraints.

Parameters:
  • fragment (SyncMapFragment) – the fragment to be added
  • sort (bool) – if True ensure that after the insertion the list is kept sorted
Raises:
  • TypeError – if the interval of fragment is not an instance of TimeInterval
  • ValueError – if the interval of fragment does not respect the boundaries of the list or overlaps an existing fragment, or if sort=True but the list is not guaranteed sorted

clone()

Return a deep copy of this fragment list.

Return type:SyncMapFragmentList

fix_zero_length_fragments(duration=TimeValue('0.001'), min_index=None, max_index=None, ensure_adjacent=True)

Fix fragments with zero length, enlarging them to have length duration, reclaiming the difference from the next fragment(s), or moving the next fragment(s) forward.

This function assumes the fragments to be adjacent.

Parameters:
  • duration (TimeValue) – set the zero length fragments to have this duration
  • min_index (int) – examine fragments with index greater than or equal to this index (i.e., included)
  • max_index (int) – examine fragments with index smaller than this index (i.e., excluded)
Raises:

ValueError – if min_index is negative or max_index is bigger than the current number of fragments
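A simplified sketch of the idea behind fix_zero_length_fragments, assuming adjacent (begin, end) Decimal pairs. The helper fix_zero_length is hypothetical, ignores min_index, max_index and ensure_adjacent, and does not reproduce the exact aeneas strategy: each zero-length fragment is enlarged to duration, and the following fragment either gives up the needed time (if it is long enough) or is shifted forward with its length unchanged.

```python
from decimal import Decimal


def fix_zero_length(intervals, duration=Decimal("0.001")):
    """Hypothetical sketch over adjacent (begin, end) pairs; returns a new list."""
    fixed = []
    cursor = None  # end of the previous fragment, after any fix
    for begin, end in intervals:
        length = end - begin
        if cursor is not None and begin < cursor:
            if end > cursor:
                begin = cursor  # reclaim the time from this fragment
            else:
                begin, end = cursor, cursor + length  # shift it forward unchanged
        if begin == end:
            end = begin + duration  # enlarge a zero-length fragment
        fixed.append((begin, end))
        cursor = end
    return fixed
```

Runs of consecutive zero-length fragments are handled one at a time, each pushing the next one forward by duration.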

fragments

Iterates through the fragments in the list (which are sorted).

Return type:generator of SyncMapFragment

fragments_ending_inside_nonspeech_intervals(nonspeech_intervals, tolerance)

Determine a list of pairs (nonspeech interval, fragment index), such that the nonspeech interval contains exactly one fragment ending inside it (within the given tolerance) and adjacent to the next fragment.

Parameters:
  • nonspeech_intervals (list of TimeInterval) – the list of nonspeech intervals to be examined
  • tolerance (TimeValue) – the tolerance to be applied when checking if the end point falls within a given nonspeech interval
Return type:

list of (TimeInterval, int)
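One possible reading of the rule above, as a sketch: for each nonspeech interval, collect the fragments whose end falls inside it (widened by tolerance on both sides), and keep the pair only if there is exactly one such fragment and it touches the next one. The function name and the plain (begin, end) tuples are assumptions; in this sketch the last fragment never qualifies because it has no next fragment.

```python
from decimal import Decimal


def ending_inside_nonspeech(fragments, nonspeech_intervals, tolerance):
    """Hypothetical sketch: return [((ns_begin, ns_end), fragment_index), ...]."""
    pairs = []
    for ns_begin, ns_end in nonspeech_intervals:
        # Fragments whose end point falls within the widened nonspeech interval
        hits = [
            i for i, (_, end) in enumerate(fragments)
            if ns_begin - tolerance <= end <= ns_end + tolerance
        ]
        if len(hits) != 1:
            continue
        i = hits[0]
        # Require adjacency with the next fragment (the last one has none)
        if i + 1 < len(fragments) and fragments[i + 1][0] == fragments[i][1]:
            pairs.append(((ns_begin, ns_end), i))
    return pairs
```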

has_adjacent_fragments_only(min_index=None, max_index=None)

Return True if the list contains only adjacent fragments, that is, if it does not have gaps.

Parameters:
  • min_index (int) – examine fragments with index greater than or equal to this index (i.e., included)
  • max_index (int) – examine fragments with index smaller than this index (i.e., excluded)
Raises:

ValueError – if min_index is negative or max_index is bigger than the current number of fragments

Return type:

bool

has_zero_length_fragments(min_index=None, max_index=None)

Return True if the list has at least one interval with zero length within min_index and max_index. If the latter are not specified, check all intervals.

Parameters:
  • min_index (int) – examine fragments with index greater than or equal to this index (i.e., included)
  • max_index (int) – examine fragments with index smaller than this index (i.e., excluded)
Raises:

ValueError – if min_index is negative or max_index is bigger than the current number of fragments

Return type:

bool
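Both index-bounded checks above share the same range semantics: min_index is included, max_index is excluded, and an out-of-range bound raises ValueError. A sketch over plain (begin, end) pairs; the names mirror the documented methods, but these are hypothetical free functions, not the aeneas API.

```python
def _check_indices(count, min_index, max_index):
    """Resolve default bounds and validate them against the list size."""
    min_index = 0 if min_index is None else min_index
    max_index = count if max_index is None else max_index
    if min_index < 0 or max_index > count:
        raise ValueError("min_index is negative or max_index exceeds the number of fragments")
    return min_index, max_index


def has_adjacent_fragments_only(intervals, min_index=None, max_index=None):
    lo, hi = _check_indices(len(intervals), min_index, max_index)
    # Each fragment in range must end exactly where the next one begins
    return all(intervals[i][1] == intervals[i + 1][0] for i in range(lo, hi - 1))


def has_zero_length_fragments(intervals, min_index=None, max_index=None):
    lo, hi = _check_indices(len(intervals), min_index, max_index)
    return any(begin == end for begin, end in intervals[lo:hi])
```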

inject_long_nonspeech_fragments(pairs, replacement_string)

Inject nonspeech fragments corresponding to the given intervals in this fragment list.

It is assumed that pairs are consistent, e.g., that they were produced by fragments_ending_inside_nonspeech_intervals.

Parameters:
  • pairs (list) – list of (TimeInterval, int) pairs, each identifying a nonspeech interval and the corresponding fragment index ending inside it
  • replacement_string (string) – the string to be applied to the nonspeech intervals

is_guaranteed_sorted

Return True if the list is sorted, and False if it might not be sorted (for example, because an add(..., sort=False) operation was performed).

Return type:bool

move_transition_point(fragment_index, value)

Move the transition point between the fragment at index fragment_index and the next fragment to the time given by value.

This method fails silently (without changing the fragment list) if at least one of the following conditions holds:

  • fragment_index is negative
  • fragment_index is the last or the second-to-last
  • value is after the current end of the next fragment
  • the current fragment and the next one are not adjacent and both proper intervals (not zero length)

The above conditions ensure that the move makes sense and that it keeps the list satisfying the constraints.
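The conditions above translate directly into guard clauses. A sketch on a list of mutable [begin, end] pairs; the function is a hypothetical stand-in for the method and, like the method, returns silently without changes when a guard fails.

```python
def move_transition_point(intervals, fragment_index, value):
    """Hypothetical sketch: move the end of fragment_index and the begin of the next one."""
    # Negative index, or last/second-to-last fragment: do nothing
    if fragment_index < 0 or fragment_index >= len(intervals) - 2:
        return
    current = intervals[fragment_index]
    following = intervals[fragment_index + 1]
    # The new point cannot go past the current end of the next fragment
    if value > following[1]:
        return
    # Non-adjacent proper (non-zero-length) fragments are left alone
    adjacent = current[1] == following[0]
    both_proper = current[1] > current[0] and following[1] > following[0]
    if not adjacent and both_proper:
        return
    current[1] = value
    following[0] = value
```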

Parameters:
  • fragment_index (int) – the fragment index whose end should be moved
  • value (TimeValue) – the new transition point

nonspeech_fragments

Iterates through the nonspeech fragments in the list (which are sorted).

Return type:generator of (int, SyncMapFragment)

offset(offset)

Move all the intervals in the list by the given offset.

Parameters:offset (TimeValue) – the shift to be applied
Raises:TypeError – if offset is not an instance of TimeValue

regular_fragments

Iterates through the regular fragments in the list (which are sorted).

Return type:generator of (int, SyncMapFragment)

remove(indices)

Remove the fragments corresponding to the given list of indices.

Parameters:indices (list of int) – the list of indices to be removed
Raises:ValueError – if one of the indices is not valid

remove_nonspeech_fragments(zero_length_only=False)

Remove NONSPEECH fragments from the list.

If zero_length_only is True, remove only those fragments with zero length, and make all the others REGULAR.

Parameters:zero_length_only (bool) – remove only zero length NONSPEECH fragments

sort()

Sort the fragments in the list.

Raises:ValueError – if there is a fragment which violates the list constraints

class aeneas.syncmap.format.SyncMapFormat

Enumeration of the supported output formats for a synchronization map.

ALLOWED_VALUES = ['aud', 'audh', 'audm', 'csv', 'csvh', 'csvm', 'dfxp', 'eaf', 'json', 'rbse', 'sbv', 'smil', 'smilh', 'smilm', 'srt', 'ssv', 'ssvh', 'ssvm', 'sub', 'tab', 'textgrid', 'textgrid_long', 'textgrid_short', 'tsv', 'tsvh', 'tsvm', 'ttml', 'txt', 'txth', 'txtm', 'vtt', 'xml', 'xml_legacy']

List of all the allowed values

AUD = 'aud'

Alias for AUDM.

AUDH = 'audh'

Tab-separated plain text, with human-readable time values and fragment text:

00:00:00.000   00:00:01.234   Text of fragment 1
00:00:01.234   00:00:05.678   Text of fragment 2
00:00:05.678   00:00:07.890   Text of fragment 3
  • Multiple levels: no
  • Multiple lines: no

See also http://manual.audacityteam.org/man/label_tracks.html#export

New in version 1.5.0.

AUDM = 'audm'

Tab-separated plain text, with machine-readable time values and fragment text, compatible with Audacity:

0.000   1.234   Text fragment 1
1.234   5.678   Text fragment 2
5.678   7.890   Text fragment 3
  • Multiple levels: no
  • Multiple lines: no

See also http://manual.audacityteam.org/man/label_tracks.html#export

New in version 1.5.0.
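The H and M variants differ only in how times are written: human-readable HH:MM:SS.mmm versus seconds with three decimals. A sketch of the conversion and of AUDH/AUDM lines; the helper names are hypothetical, and rounding to the nearest millisecond (half up) is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_human(seconds, sep="."):
    """Hypothetical helper: seconds -> HH:MM:SS.mmm (sep="," gives the SRT style)."""
    millis = int((Decimal(seconds) * 1000).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


def from_human(text):
    """Hypothetical helper: HH:MM:SS.mmm (or HH:MM:SS,mmm) -> seconds."""
    hours, minutes, secs = text.replace(",", ".").split(":")
    return Decimal(hours) * 3600 + Decimal(minutes) * 60 + Decimal(secs)


def audm_line(begin, end, text):
    """Machine-readable line: seconds with three decimals, tab-separated."""
    return f"{Decimal(begin):.3f}\t{Decimal(end):.3f}\t{text}"


def audh_line(begin, end, text):
    """Human-readable line: HH:MM:SS.mmm values, tab-separated."""
    return f"{to_human(begin)}\t{to_human(end)}\t{text}"
```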

CSV = 'csv'

Alias for CSVM.

CSVH = 'csvh'

Comma-separated values (CSV), with human-readable time values:

f001,00:00:00.000,00:00:01.234,"First fragment text"
f002,00:00:01.234,00:00:05.678,"Second fragment text"
f003,00:00:05.678,00:00:07.890,"Third fragment text"
  • Multiple levels: no
  • Multiple lines: no

Please note that the text is assumed to be contained in double quotes ("..."), which are stripped when reading from file and added back when writing to file.

New in version 1.0.4.

CSVM = 'csvm'

Comma-separated values (CSV), with machine-readable time values:

f001,0.000,1.234,"First fragment text"
f002,1.234,5.678,"Second fragment text"
f003,5.678,7.890,"Third fragment text"
  • Multiple levels: no
  • Multiple lines: no

Please note that the text is assumed to be contained in double quotes ("..."), which are stripped when reading from file and added back when writing to file.

New in version 1.2.0.
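The quoting rule above can be sketched with Python's csv module, which strips the surrounding double quotes when reading. read_csvm and write_csvm are hypothetical helpers; the writer assumes the text itself contains no double quotes, since how aeneas escapes them is not described here.

```python
import csv
from decimal import Decimal


def read_csvm(lines):
    """Hypothetical helper: parse CSVM rows into (id, begin, end, text) tuples."""
    return [
        (identifier, Decimal(begin), Decimal(end), text)
        for identifier, begin, end, text in csv.reader(lines)
    ]


def write_csvm(rows):
    """Hypothetical helper: add the double quotes back around the text only."""
    return [f'{identifier},{begin:.3f},{end:.3f},"{text}"' for identifier, begin, end, text in rows]
```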

DFXP = 'dfxp'

Alias for TTML.

New in version 1.4.1.

EAF = 'eaf'

ELAN EAF:

<?xml version="1.0" encoding="UTF-8"?>
<ANNOTATION_DOCUMENT AUTHOR="aeneas" DATE="2016-01-01T00:00:00+00:00" FORMAT="2.8" VERSION="2.8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.mpi.nl/tools/elan/EAFv2.8.xsd">
    <HEADER MEDIA_FILE="" TIME_UNITS="milliseconds" />
    <TIME_ORDER>
        <TIME_SLOT TIME_SLOT_ID="ts001b" TIME_VALUE="0"/>
        <TIME_SLOT TIME_SLOT_ID="ts001e" TIME_VALUE="1234"/>
        <TIME_SLOT TIME_SLOT_ID="ts002b" TIME_VALUE="1234"/>
        <TIME_SLOT TIME_SLOT_ID="ts002e" TIME_VALUE="5678"/>
        <TIME_SLOT TIME_SLOT_ID="ts003b" TIME_VALUE="5678"/>
        <TIME_SLOT TIME_SLOT_ID="ts003e" TIME_VALUE="7890"/>
    </TIME_ORDER>
    <TIER LINGUISTIC_TYPE_REF="utterance" TIER_ID="tier1">
        <ANNOTATION>
            <ALIGNABLE_ANNOTATION ANNOTATION_ID="f001" TIME_SLOT_REF1="ts001b" TIME_SLOT_REF2="ts001e">
                <ANNOTATION_VALUE>First fragment text</ANNOTATION_VALUE>
            </ALIGNABLE_ANNOTATION>
        </ANNOTATION>
        <ANNOTATION>
            <ALIGNABLE_ANNOTATION ANNOTATION_ID="f002" TIME_SLOT_REF1="ts002b" TIME_SLOT_REF2="ts002e">
                <ANNOTATION_VALUE>Second fragment text</ANNOTATION_VALUE>
            </ALIGNABLE_ANNOTATION>
        </ANNOTATION>
        <ANNOTATION>
            <ALIGNABLE_ANNOTATION ANNOTATION_ID="f003" TIME_SLOT_REF1="ts003b" TIME_SLOT_REF2="ts003e">
                <ANNOTATION_VALUE>Third fragment text</ANNOTATION_VALUE>
            </ALIGNABLE_ANNOTATION>
        </ANNOTATION>
    </TIER>
    <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="utterance" TIME_ALIGNABLE="true"/>
</ANNOTATION_DOCUMENT>
  • Multiple levels: no
  • Multiple lines: no

See also https://tla.mpi.nl/tla-news/documentation-of-eaf-elan-annotation-format/

New in version 1.5.0.

JSON = 'json'

JSON:

{
 "fragments": [
  {
   "id": "f001",
   "language": "en",
   "begin": 0.000,
   "end": 1.234,
   "children": [],
   "lines": [
    "First fragment text"
   ]
  },
  {
   "id": "f002",
   "language": "en",
   "begin": 1.234,
   "end": 5.678,
   "children": [],
   "lines": [
    "Second fragment text",
    "Second line of second fragment"
   ]
  },
  {
   "id": "f003",
   "language": "en",
   "begin": 5.678,
   "end": 7.890,
   "children": [],
   "lines": [
    "Third fragment text",
    "Second line of third fragment"
   ]
  }
 ]
}
  • Multiple levels: yes (output only)
  • Multiple lines: yes

New in version 1.2.0.
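A sketch that produces the structure shown above with the standard json module. to_json_syncmap is hypothetical; it emits a single level (empty children lists), and begin/end are floats, so json writes 0.0 rather than the 0.000 shown in the example.

```python
import json


def to_json_syncmap(fragments, language="en"):
    """Hypothetical helper. fragments: (identifier, begin, end, lines) tuples."""
    return json.dumps(
        {
            "fragments": [
                {
                    "id": identifier,
                    "language": language,
                    "begin": begin,
                    "end": end,
                    "children": [],  # single-level sketch: no nested fragments
                    "lines": list(lines),
                }
                for identifier, begin, end, lines in fragments
            ]
        },
        indent=1,
    )
```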

RBSE = 'rbse'

JSON compatible with rb_smil_emulator.js:

{
 "smil_ids": [
  "f001",
  "f002",
  "f003"
 ],
 "smil_data": [
  { "id": "f001", "begin": 0.000, "end": 1.234 },
  { "id": "f002", "begin": 1.234, "end": 5.678 },
  { "id": "f003", "begin": 5.678, "end": 7.890 }
 ]
}
  • Multiple levels: no
  • Multiple lines: no

See also https://github.com/pettarin/rb_smil_emulator

Deprecated; it will be removed in v2.0.0.

Deprecated since version 1.5.0.

New in version 1.2.0.

SBV = 'sbv'

SubViewer (SBV/SUB) caption/subtitle format, with multiple lines per fragment separated by a newline character:

[SUBTITLE]
00:00:00.000,00:00:01.234
First fragment text

00:00:01.234,00:00:05.678
Second fragment text
Second line of second fragment

00:00:05.678,00:00:07.890
Third fragment text
Second line of third fragment
  • Multiple levels: no
  • Multiple lines: yes

See also https://wiki.videolan.org/SubViewer/

Note that the [INFORMATION] header is ignored when reading, and it is not produced when writing. Moreover, extensions (i.e., [COLF], [SIZE], [FONT]) are not supported.

SMIL = 'smil'

Alias for SMILH.

SMILH = 'smilh'

SMIL (as in the EPUB 3 Media Overlay specification), with human-readable time values:

<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
 <body>
  <seq id="seq000001" epub:textref="p001.xhtml">
   <par id="par000001">
    <text src="p001.xhtml#f001"/>
    <audio clipBegin="00:00:00.000" clipEnd="00:00:01.234" src="../Audio/p001.mp3"/>
   </par>
   <par id="par000002">
    <text src="p001.xhtml#f002"/>
    <audio clipBegin="00:00:01.234" clipEnd="00:00:05.678" src="../Audio/p001.mp3"/>
   </par>
   <par id="par000003">
    <text src="p001.xhtml#f003"/>
    <audio clipBegin="00:00:05.678" clipEnd="00:00:07.890" src="../Audio/p001.mp3"/>
   </par>
  </seq>
 </body>
</smil>
  • Multiple levels: yes (output only)
  • Multiple lines: no

See also http://www.idpf.org/epub3/latest/mediaoverlays

New in version 1.2.0.

SMILM = 'smilm'

SMIL (as in the EPUB 3 Media Overlay specification), with machine-readable time values:

<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
 <body>
  <seq id="seq000001" epub:textref="p001.xhtml">
   <par id="par000001">
    <text src="p001.xhtml#f001"/>
    <audio clipBegin="0.000" clipEnd="1.234" src="../Audio/p001.mp3"/>
   </par>
   <par id="par000002">
    <text src="p001.xhtml#f002"/>
    <audio clipBegin="1.234" clipEnd="5.678" src="../Audio/p001.mp3"/>
   </par>
   <par id="par000003">
    <text src="p001.xhtml#f003"/>
    <audio clipBegin="5.678" clipEnd="7.890" src="../Audio/p001.mp3"/>
   </par>
  </seq>
 </body>
</smil>
  • Multiple levels: yes (output only)
  • Multiple lines: no

See also http://www.idpf.org/epub3/latest/mediaoverlays

New in version 1.2.0.

SRT = 'srt'

SubRip (SRT) caption/subtitle format (it might have multiple lines per fragment):

1
00:00:00,000 --> 00:00:01,234
First fragment text

2
00:00:01,234 --> 00:00:05,678
Second fragment text
Second line of second fragment

3
00:00:05,678 --> 00:00:07,890
Third fragment text
Second line of third fragment
  • Multiple levels: no
  • Multiple lines: yes

See also https://wiki.videolan.org/SubRip/

Note that extensions (i.e., <b>, <s>, <u>, <i>, <font>) are not supported.
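A sketch that writes the SRT layout shown above: numbered blocks, a comma before the milliseconds, lines joined by newlines, and a blank line between blocks. to_srt and srt_time are hypothetical helpers; rounding to the nearest millisecond is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def srt_time(seconds):
    """Hypothetical helper: seconds -> HH:MM:SS,mmm (comma before milliseconds)."""
    millis = int((Decimal(seconds) * 1000).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(fragments):
    """Hypothetical helper. fragments: (begin, end, lines) tuples in order."""
    blocks = []
    for number, (begin, end, lines) in enumerate(fragments, start=1):
        header = f"{number}\n{srt_time(begin)} --> {srt_time(end)}\n"
        blocks.append(header + "\n".join(lines))
    # Blocks are separated by one blank line
    return "\n\n".join(blocks) + "\n"
```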

SSV = 'ssv'

Alias for SSVM.

New in version 1.0.4.

SSVH = 'ssvh'

Space-separated plain text, with human-readable time values:

00:00:00.000 00:00:01.234 f001 "First fragment text"
00:00:01.234 00:00:05.678 f002 "Second fragment text"
00:00:05.678 00:00:07.890 f003 "Third fragment text"
  • Multiple levels: no
  • Multiple lines: no

Please note that the text is assumed to be contained in double quotes ("..."), which are stripped when reading from file and added back when writing to file.

New in version 1.0.4.

SSVM = 'ssvm'

Space-separated plain text, with machine-readable time values:

0.000 1.234 f001 "First fragment text"
1.234 5.678 f002 "Second fragment text"
5.678 7.890 f003 "Third fragment text"
  • Multiple levels: no
  • Multiple lines: no

Please note that the text is assumed to be contained in double quotes ("..."), which are stripped when reading from file and added back when writing to file.

New in version 1.2.0.

SUB = 'sub'

SubViewer (SBV/SUB) caption/subtitle format, with multiple lines per fragment separated by [br]:

[SUBTITLE]
00:00:00.000,00:00:01.234
First fragment text

00:00:01.234,00:00:05.678
Second fragment text[br]Second line of second fragment

00:00:05.678,00:00:07.890
Third fragment text[br]Second line of third fragment
  • Multiple levels: no
  • Multiple lines: yes

See also https://wiki.videolan.org/SubViewer/

Note that the [INFORMATION] header is ignored when reading, and it is not produced when writing. Moreover, extensions (i.e., [COLF], [SIZE], [FONT]) are not supported.

New in version 1.4.1.
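In the examples above, SBV and SUB differ only in the separator used between lines of the same fragment. A sketch of both directions; the helper names are hypothetical, and the [SUBTITLE] header is not handled.

```python
def subviewer_block(begin, end, lines, sub_style=False):
    """Hypothetical helper: SUB joins lines with [br], SBV with newlines."""
    separator = "[br]" if sub_style else "\n"
    return f"{begin},{end}\n{separator.join(lines)}"


def subviewer_lines(text, sub_style=False):
    """Hypothetical helper: split the text part of a block back into lines."""
    return text.split("[br]") if sub_style else text.split("\n")
```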

TAB = 'tab'

Deprecated; it will be removed in v2.0.0. Use TSV instead.

Deprecated since version 1.0.3.

TEXTGRID = 'textgrid'

Alias for TEXTGRID_LONG.

TEXTGRID_LONG = 'textgrid_long'

Praat full TextGrid format:

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0.0
xmax = 7.89
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "Token"
        xmin = 0.0
        xmax = 7.89
        intervals: size = 3
        intervals [1]:
            xmin = 0.0
            xmax = 1.234
            text = "First fragment text"
        intervals [2]:
            xmin = 1.234
            xmax = 5.678
            text = "Second fragment text"
        intervals [3]:
            xmin = 5.678
            xmax = 7.89
            text = "Third fragment text"
  • Multiple levels: no (not yet)
  • Multiple lines: no

See also http://www.fon.hum.uva.nl/praat/manual/TextGrid_file_formats.html

Note that at the moment reading support is limited to the first tier in the TextGrid file.

New in version 1.7.0.

TEXTGRID_SHORT = 'textgrid_short'

Praat short TextGrid format:

File type = "ooTextFile"
Object class = "TextGrid"

0.0
7.89
<exists>
1
"IntervalTier"
"Token"
0.0
7.89
3
0.0
1.234
"First fragment text"
1.234
5.678
"Second fragment text"
5.678
7.89
"Third fragment text"
  • Multiple levels: no (not yet)
  • Multiple lines: no

See also http://www.fon.hum.uva.nl/praat/manual/TextGrid_file_formats.html

Note that at the moment reading support is limited to the first tier in the TextGrid file.

New in version 1.7.0.

TSV = 'tsv'

Alias for TSVM.

TSVH = 'tsvh'

Tab-separated plain text, with human-readable time values:

00:00:00.000   00:00:01.234   f001
00:00:01.234   00:00:05.678   f002
00:00:05.678   00:00:07.890   f003
  • Multiple levels: no
  • Multiple lines: no

New in version 1.0.4.

TSVM = 'tsvm'

Tab-separated plain text, with machine-readable time values, compatible with Audacity:

0.000   1.234   f001
1.234   5.678   f002
5.678   7.890   f003
  • Multiple levels: no
  • Multiple lines: no

New in version 1.2.0.

TTML = 'ttml'

TTML caption/subtitle format (it might have multiple lines per fragment):

<?xml version="1.0" encoding="UTF-8" ?>
<tt xmlns="http://www.w3.org/ns/ttml">
 <body>
  <div>
   <p xml:id="f001" begin="0.000" end="1.234">
    First fragment text
   </p>
   <p xml:id="f002" begin="1.234" end="5.678">
    Second fragment text<br/>Second line of second fragment
   </p>
   <p xml:id="f003" begin="5.678" end="7.890">
    Third fragment text<br/>Second line of third fragment
   </p>
  </div>
 </body>
</tt>

See also https://www.w3.org/TR/ttml1/

  • Multiple levels: yes (output only)
  • Multiple lines: yes

TXT = 'txt'

Alias for TXTM.

TXTH = 'txth'

Space-separated plain text with human-readable time values:

f001 00:00:00.000 00:00:01.234 "First fragment text"
f002 00:00:01.234 00:00:05.678 "Second fragment text"
f003 00:00:05.678 00:00:07.890 "Third fragment text"
  • Multiple levels: no
  • Multiple lines: no

Please note that the text is assumed to be contained in double quotes ("..."), which are stripped when reading from file and added back when writing to file.

New in version 1.0.4.

TXTM = 'txtm'

Space-separated plain text, with machine-readable time values, compatible with Sonic Visualiser:

f001 0.000 1.234 "First fragment text"
f002 1.234 5.678 "Second fragment text"
f003 5.678 7.890 "Third fragment text"
  • Multiple levels: no
  • Multiple lines: no

Please note that the text is assumed to be contained in double quotes ("..."), which are stripped when reading from file and added back when writing to file.

New in version 1.2.0.

VTT = 'vtt'

WebVTT caption/subtitle format:

WEBVTT

1
00:00:00.000 --> 00:00:01.234
First fragment text

2
00:00:01.234 --> 00:00:05.678
Second fragment text
Second line of second fragment

3
00:00:05.678 --> 00:00:07.890
Third fragment text
Second line of third fragment
  • Multiple levels: no
  • Multiple lines: yes

See also https://w3c.github.io/webvtt/

Note that WebVTT files using tabs as separators cannot be read at the moment. Use spaces instead or pre-process your files, replacing tabs with spaces.
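A minimal pre-processing sketch for the workaround above: replace every tab with a space before passing the file to aeneas. Both functions are hypothetical helpers, and they also replace tabs inside the cue text.

```python
def normalize_vtt_tabs(content):
    """Hypothetical helper: replace each tab character with a single space."""
    return content.replace("\t", " ")


def normalize_vtt_file(input_path, output_path):
    """Hypothetical helper: read a WebVTT file, replace tabs, write a new file."""
    with open(input_path, encoding="utf-8") as source:
        content = source.read()
    with open(output_path, "w", encoding="utf-8") as target:
        target.write(normalize_vtt_tabs(content))
```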

XML = 'xml'

XML:

<?xml version="1.0" encoding="UTF-8" ?>
<map>
 <fragment id="f001" begin="0.000" end="1.234">
  <line>First fragment text</line>
  <children></children>
 </fragment>
 <fragment id="f002" begin="1.234" end="5.678">
  <line>Second fragment text</line>
  <line>Second line of second fragment</line>
  <children></children>
 </fragment>
 <fragment id="f003" begin="5.678" end="7.890">
  <line>Third fragment text</line>
  <line>Second line of third fragment</line>
  <children></children>
 </fragment>
</map>
  • Multiple levels: yes (output only)
  • Multiple lines: yes

XML_LEGACY = 'xml_legacy'

XML, legacy format:

<?xml version="1.0" encoding="UTF-8" ?>
<map>
 <fragment>
  <identifier>f001</identifier>
  <start>0.000</start>
  <end>1.234</end>
 </fragment>
 <fragment>
  <identifier>f002</identifier>
  <start>1.234</start>
  <end>5.678</end>
 </fragment>
 <fragment>
  <identifier>f003</identifier>
  <start>5.678</start>
  <end>7.890</end>
 </fragment>
</map>
  • Multiple levels: no
  • Multiple lines: no

Deprecated; it will be removed in v2.0.0. Use XML instead.

Deprecated since version 1.2.0.

class aeneas.syncmap.headtailformat.SyncMapHeadTailFormat

Enumeration of the supported output formats for the head and tail of the synchronization maps.

New in version 1.2.0.

ADD = 'add'

Add two empty sync map fragments, one at the beginning and one at the end of the sync map, corresponding to the head and the tail.

For example:

0.000 0.500 HEAD
0.500 1.234 First fragment
1.234 5.678 Second fragment
5.678 7.000 Third fragment
7.000 7.890 TAIL

becomes:

0.000 0.500
0.500 1.234 First fragment
1.234 5.678 Second fragment
5.678 7.000 Third fragment
7.000 7.890

ALLOWED_VALUES = ['add', 'hidden', 'stretch']

List of all the allowed values

HIDDEN = 'hidden'

Do not output sync map fragments for the head and tail.

For example:

0.000 0.500 HEAD
0.500 1.234 First fragment
1.234 5.678 Second fragment
5.678 7.000 Third fragment
7.000 7.890 TAIL

becomes:

0.500 1.234 First fragment
1.234 5.678 Second fragment
5.678 7.000 Third fragment

STRETCH = 'stretch'

Set the begin attribute of the first sync map fragment to 0, and the end attribute of the last sync map fragment to the length of the audio file.

For example:

0.000 0.500 HEAD
0.500 1.234 First fragment
1.234 5.678 Second fragment
5.678 7.000 Third fragment
7.000 7.890 TAIL

becomes:

0.000 1.234 First fragment
1.234 5.678 Second fragment
5.678 7.890 Third fragment
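The three head/tail transformations in the examples above can be sketched as one function over (begin, end, text) tuples. apply_head_tail_format is hypothetical, and it assumes the first tuple is the HEAD and the last is the TAIL; for STRETCH it also assumes, as in the example, that the head starts at 0 and the tail ends at the audio length.

```python
def apply_head_tail_format(fragments, head_tail_format):
    """Hypothetical sketch. fragments: (begin, end, text) tuples, HEAD first and TAIL last."""
    head, *body, tail = fragments
    if head_tail_format == "add":
        # Keep head and tail as fragments, but with empty text
        return [(head[0], head[1], "")] + body + [(tail[0], tail[1], "")]
    if head_tail_format == "hidden":
        return body  # drop head and tail entirely
    if head_tail_format == "stretch":
        # First fragment starts where the head started (0)
        _, end, text = body[0]
        body[0] = (head[0], end, text)
        # Last fragment ends where the tail ended (the audio length)
        begin, _, text = body[-1]
        body[-1] = (begin, tail[1], text)
        return body
    raise ValueError(f"unknown head/tail format: {head_tail_format}")
```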
exception aeneas.syncmap.missingparametererror.SyncMapMissingParameterError

Error raised when a parameter required by the current SyncMapFormat value is missing.