Learn how to use the Multimodal Annotation Tool effectively
| Column | Type | Description |
|---|---|---|
| turn_id | integer | Unique identifier for each utterance |
| speaker | string | Speaker identifier (e.g., "S1", "S2") |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| utterance | string | The spoken text |

    turn_id,speaker,start,end,utterance
    1,S1,0.0,2.5,"Hello, how are you?"
    2,S2,3.0,5.2,"I'm doing well, thanks!"
    3,S1,6.1,8.9,"That's great to hear."

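Before loading a transcript, it can help to check that it matches the column layout above. The following is a minimal sketch, not part of the tool itself: `validate_transcript` and `REQUIRED_COLUMNS` are hypothetical names, and the rules it enforces (unique `turn_id`, `end` not before `start`) are assumptions drawn from the column descriptions.

```python
import csv
import io

# Column names taken from the format table; the checks themselves are assumptions.
REQUIRED_COLUMNS = ["turn_id", "speaker", "start", "end", "utterance"]


def validate_transcript(csv_text):
    """Return a list of human-readable problems found in a transcript CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen_ids = set()
    for row_no, row in enumerate(reader, start=1):
        try:
            turn_id = int(row["turn_id"])
            start = float(row["start"])
            end = float(row["end"])
        except (TypeError, ValueError):
            # TypeError covers short rows, where DictReader fills in None.
            problems.append(f"row {row_no}: turn_id, start, or end is not numeric")
            continue
        if turn_id in seen_ids:
            problems.append(f"row {row_no}: duplicate turn_id {turn_id}")
        seen_ids.add(turn_id)
        if end < start:
            problems.append(f"row {row_no}: end {end} is before start {start}")
    return problems


SAMPLE = '''turn_id,speaker,start,end,utterance
1,S1,0.0,2.5,"Hello, how are you?"
2,S2,3.0,5.2,"I'm doing well, thanks!"
3,S1,6.1,8.9,"That's great to hear."
'''

if __name__ == "__main__":
    print(validate_transcript(SAMPLE) or "transcript looks valid")
```

Note that utterances containing commas must be quoted, as in the sample, so that the CSV parser keeps them in a single field.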
MP3 and MP4 files are highly compressed, so they are smaller and faster to process.
Keep CSV files under 10 MB for best performance. Large transcripts with thousands of rows may cause UI lag.
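A quick way to check a transcript against the 10 MB guideline before loading it is sketched below. The helper name `csv_size_ok` is hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the tool's exact threshold is not stated.

```python
import os

# 10 MB guideline; interpreting MB as 1024 * 1024 bytes is an assumption.
MAX_CSV_BYTES = 10 * 1024 * 1024


def csv_size_ok(path, limit=MAX_CSV_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```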
Define custom annotation columns with your own labels. Set up your schema before loading data for automatic column matching.
The transcript automatically highlights the current utterance during playback. Click any utterance to jump directly to that point in the media.
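The highlighting behavior described above amounts to mapping a playback time to the utterance whose time range contains it. The sketch below shows one way to do that lookup with a binary search. It is an illustration only, not the tool's actual implementation: `current_utterance` is a hypothetical name, and treating each range as start-inclusive and end-exclusive is an assumption.

```python
from bisect import bisect_right


def current_utterance(utterances, t):
    """Return the index of the utterance playing at time t, or None.

    `utterances` is a list of (start, end, text) tuples sorted by start time.
    """
    starts = [u[0] for u in utterances]
    # Rightmost utterance whose start is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < utterances[i][1]:
        return i
    return None  # t falls in a gap between utterances or outside the transcript


UTTERANCES = [
    (0.0, 2.5, "Hello, how are you?"),
    (3.0, 5.2, "I'm doing well, thanks!"),
    (6.1, 8.9, "That's great to hear."),
]
```

Jumping to an utterance on click is the reverse mapping: seek the media to that utterance's `start` value.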
Drag media and CSV files directly onto the window to load them quickly.
Export your annotated data as a CSV file, preserving all original columns plus your annotations.
Adjust the layout by dragging panel dividers to customize your workspace.
Click the "Save Annotations" button to export your annotated data. The exported CSV will include all original columns plus your annotation columns.
The exported file will be named original_filename_annotated.csv.
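For downstream scripts that need to locate or reproduce the export, the naming and column layout can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: `annotated_name` and `export_rows` are hypothetical helpers, the example assumes `original_filename` is the loaded file's name without its extension, and the `sentiment` column in the usage note is an invented annotation label.

```python
import csv
import io
from pathlib import Path


def annotated_name(original_path):
    """Build the export name; assumes the original file's stem is reused."""
    return f"{Path(original_path).stem}_annotated.csv"


def export_rows(rows, original_columns, annotation_columns):
    """Return CSV text with the original columns first, then annotation columns."""
    fieldnames = list(original_columns) + [
        c for c in annotation_columns if c not in original_columns
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Unannotated cells are written as empty strings.
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return out.getvalue()
```

For example, `annotated_name("interview.csv")` gives `interview_annotated.csv`, and calling `export_rows` with `annotation_columns=["sentiment"]` appends a `sentiment` column after the original ones.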