Media Player Queue Support API Reference

Complete API reference and implementation guide for the Media Player queue support feature. Includes data models, listener interface, usage examples, and speech marks integration with AWS Polly for synchronized audio playback.

Overview

This enhancement adds support for queuing multiple audio tracks, with callbacks for track playback events. It is designed specifically for playing AWS Polly-generated MP3s together with their corresponding speech marks JSON files.

Features

  • Queue Management: Play multiple tracks in sequence
  • Track Callbacks: Get notified when tracks start and complete
  • Duration Info: Access an approximate track duration derived from the speech marks
  • Speech Marks: Full access to word and sentence timing data from Polly
  • Error Handling: Playback errors are reported through a callback, and missing speech marks fall back to an empty list

Data Models

AudioTrack

Represents a single audio track with its associated speech marks.

data class AudioTrack(
    val mp3Url: String,       // URL to the MP3 file
    val marksJsonUrl: String  // URL to the marks.json file from Polly
)

SpeechMark

Represents a single speech mark from Polly (word or sentence timing).

data class SpeechMark(
    val time: Int,      // Time in milliseconds from start
    val type: String,   // "word" or "sentence"
    val start: Int,     // Byte offset of the start in the input text
    val end: Int,       // Byte offset of the end in the input text
    val value: String   // The word or sentence text
)

TrackInfo

Contains the track, its speech marks, and calculated duration.

data class TrackInfo(
    val track: AudioTrack,
    val marks: List<SpeechMark>,
    val durationMs: Int  // Approximate: timestamp of the last speech mark
)
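Since durationMs comes from the speech marks rather than from the MP3 itself, it helps to be precise about what that number means. The sketch below restates the calculation as a standalone function, assuming the duration is the largest mark timestamp (the same as the last mark's timestamp for time-ordered Polly output) and assuming 0 when no marks are available; approxDurationMs is a hypothetical name, not part of this API.

```kotlin
// SpeechMark as defined in this post.
data class SpeechMark(
    val time: Int,
    val type: String,
    val start: Int,
    val end: Int,
    val value: String
)

// Hypothetical helper: the latest mark timestamp, or 0 when no marks were loaded.
// Polly's time field is when a word starts, so this undershoots the real audio length.
fun approxDurationMs(marks: List<SpeechMark>): Int =
    marks.maxOfOrNull { it.time } ?: 0
```

Because the last mark records when the final word starts, the result falls short of the real audio length by roughly the length of that word plus any trailing silence. Where an exact value matters on Android, ExoPlayer's own Player.getDuration() is the more reliable source once the media is prepared (it returns C.TIME_UNSET while the duration is still unknown).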

TrackPlaybackListener Interface

Implement this interface to receive playback events:

interface TrackPlaybackListener {
    fun onTrackStarted(trackIndex: Int, trackInfo: TrackInfo)
    fun onTrackCompleted(trackIndex: Int)
    fun onQueueCompleted()
    fun onError(trackIndex: Int, error: String)
}
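Because the interface has four abstract callbacks, every listener has to implement all of them, even the ones it ignores. A common Kotlin pattern is a no-op base class. TrackPlaybackAdapter below is a hypothetical helper (not part of this API), sketched against the interface exactly as declared above.

```kotlin
data class AudioTrack(val mp3Url: String, val marksJsonUrl: String)
data class SpeechMark(val time: Int, val type: String, val start: Int, val end: Int, val value: String)
data class TrackInfo(val track: AudioTrack, val marks: List<SpeechMark>, val durationMs: Int)

interface TrackPlaybackListener {
    fun onTrackStarted(trackIndex: Int, trackInfo: TrackInfo)
    fun onTrackCompleted(trackIndex: Int)
    fun onQueueCompleted()
    fun onError(trackIndex: Int, error: String)
}

// Hypothetical convenience base class: every callback is a no-op,
// so callers override only the events they care about.
open class TrackPlaybackAdapter : TrackPlaybackListener {
    override fun onTrackStarted(trackIndex: Int, trackInfo: TrackInfo) {}
    override fun onTrackCompleted(trackIndex: Int) {}
    override fun onQueueCompleted() {}
    override fun onError(trackIndex: Int, error: String) {}
}
```

A caller that only needs to know when the queue ends can then write object : TrackPlaybackAdapter() { override fun onQueueCompleted() { ... } }. Kotlin also allows default method bodies directly on an interface, which would remove the need for the adapter, but that would change the published interface rather than wrap it.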

Usage Examples

Basic Queue Playback with Listener

suspend fun playDemo() {
    val tracks = listOf(
        AudioTrack(
            mp3Url = "https://example.com/001.mp3",
            marksJsonUrl = "https://example.com/001.marks.json"
        ),
        AudioTrack(
            mp3Url = "https://example.com/002.mp3",
            marksJsonUrl = "https://example.com/002.marks.json"
        )
    )
    
    val listener = object : TrackPlaybackListener {
        override fun onTrackStarted(trackIndex: Int, trackInfo: TrackInfo) {
            Logger.d("Track $trackIndex started - Duration: ${trackInfo.durationMs}ms")
            // Access speech marks for word-by-word timing
            trackInfo.marks.forEach { mark ->
                Logger.d("${mark.type} at ${mark.time}ms: ${mark.value}")
            }
        }
        
        override fun onTrackCompleted(trackIndex: Int) {
            Logger.d("Track $trackIndex completed")
        }
        
        override fun onQueueCompleted() {
            Logger.d("All tracks completed!")
        }
        
        override fun onError(trackIndex: Int, error: String) {
            Logger.d("Error on track $trackIndex: $error")
        }
    }
    
    mediaPlayer.playQueue(tracks, listener)
}
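onTrackStarted hands over the full list of speech marks, which is enough to drive word-by-word highlighting. The sketch below shows one way to do the lookup, assuming a current playback position in milliseconds is available. The MediaPlayer interface in this post does not expose one, so on Android it would have to come from ExoPlayer (for example Player.getCurrentPosition()) or from a timer started in onTrackStarted. wordMarks and currentWordAt are hypothetical helpers.

```kotlin
data class SpeechMark(
    val time: Int,
    val type: String,
    val start: Int,
    val end: Int,
    val value: String
)

// Hypothetical helper: keep only word marks, ordered by time. Do this once per track.
fun wordMarks(marks: List<SpeechMark>): List<SpeechMark> =
    marks.filter { it.type == "word" }.sortedBy { it.time }

// Hypothetical helper: the word being spoken at positionMs, or null before the first word.
fun currentWordAt(words: List<SpeechMark>, positionMs: Long): SpeechMark? {
    // binarySearch returns the index on an exact match, else -(insertionPoint) - 1.
    val idx = words.binarySearch { it.time.toLong().compareTo(positionMs) }
    // On a miss, the active word is the last one that started before positionMs.
    val active = if (idx >= 0) idx else -idx - 2
    return words.getOrNull(active)
}
```

Filtering and sorting once per track (for example in onTrackStarted) keeps each later lookup to a binary search, which matters if the UI polls the position many times per second.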

Simple Playback Without Listener

suspend fun playSimple() {
    val tracks = listOf(
        AudioTrack(
            mp3Url = "https://example.com/audio.mp3",
            marksJsonUrl = "https://example.com/audio.marks.json"
        )
    )
    
    mediaPlayer.playQueue(tracks)
}

Control Playback

// Stop current playback
mediaPlayer.stop()

// Clear the queue
mediaPlayer.clearQueue()

Speech Marks JSON Format

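The marks.json files generated by AWS Polly contain speech mark objects with the following structure. Polly itself emits speech marks as newline-delimited JSON (one object per line); the example below shows them wrapped in a JSON array: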

[
  {
    "time": 0,
    "type": "sentence",
    "start": 7,
    "end": 24,
    "value": "Welcome to Krill!"
  },
  {
    "time": 25,
    "type": "word",
    "start": 7,
    "end": 14,
    "value": "Welcome"
  },
  {
    "time": 337,
    "type": "word",
    "start": 15,
    "end": 17,
    "value": "to"
  }
]

Field Descriptions

  • time: Milliseconds from the start of audio
  • type: “word” or “sentence”
  • start/end: Byte offsets (not character offsets) into the input text, counting any SSML markup; for ASCII-only input they equal character positions
  • value: The actual word or sentence
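Because Polly reports start and end as byte offsets into the input text, SSML tags included, slicing the original string with substring() only works for ASCII input. A minimal sketch of recovering a mark's text correctly, assuming the caller still has the exact SSML string that was sent to Polly (markText is a hypothetical helper):

```kotlin
data class SpeechMark(
    val time: Int,
    val type: String,
    val start: Int,
    val end: Int,
    val value: String
)

// Hypothetical helper: the slice of the original Polly input that a mark covers.
// start/end are UTF-8 byte offsets (SSML tags included), so slice bytes, not chars.
fun markText(input: String, mark: SpeechMark): String =
    input.encodeToByteArray().decodeToString(mark.start, mark.end)
```

With "<speak>Welcome to Krill!</speak>" the seven bytes of "<speak>" explain why the first word starts at offset 7. For ASCII input, byte and character offsets coincide; with a word like "Café", every later byte offset is one higher than the matching character index, which is exactly where a substring() based version would start returning the wrong text.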

Implementation Details

Android Implementation

The Android implementation uses ExoPlayer (AndroidX Media3):

  • Fetches speech marks from the provided URLs
  • Approximates duration from the last mark’s timestamp (the start time of the final mark, so it is slightly shorter than the actual audio)
  • Uses ExoPlayer’s listener system to detect track transitions
  • Provides callbacks on the Main dispatcher

Track Transition Detection

stateDiagram-v2
    [*] --> Idle
    Idle --> Loading: playQueue()
    Loading --> Ready: STATE_READY
    Ready --> Playing: Auto-play
    Playing --> TrackStarted: onTrackStarted()
    TrackStarted --> Transitioning: End of track
    Transitioning --> TrackCompleted: onTrackCompleted()
    TrackCompleted --> Playing: Next track
    TrackCompleted --> QueueEnded: No more tracks
    QueueEnded --> [*]: onQueueCompleted()

Callback Sequence

  • onTrackStarted: Called when ExoPlayer enters STATE_READY and starts playing a track
  • onTrackCompleted: Called when ExoPlayer transitions to the next media item
  • onQueueCompleted: Called when ExoPlayer reaches STATE_ENDED
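The callback order above can be modeled independently of ExoPlayer. The following is a hypothetical bridge (QueueCallbackBridge is not a class from this post) that turns player-level events into TrackPlaybackListener calls. Two behaviors are assumptions: it ignores repeated READY events for a track it has already reported, and, following the state diagram, it reports the final track as completed before calling onQueueCompleted. Whether the Android implementation does the same for the last track is not stated above.

```kotlin
data class AudioTrack(val mp3Url: String, val marksJsonUrl: String)
data class SpeechMark(val time: Int, val type: String, val start: Int, val end: Int, val value: String)
data class TrackInfo(val track: AudioTrack, val marks: List<SpeechMark>, val durationMs: Int)

interface TrackPlaybackListener {
    fun onTrackStarted(trackIndex: Int, trackInfo: TrackInfo)
    fun onTrackCompleted(trackIndex: Int)
    fun onQueueCompleted()
    fun onError(trackIndex: Int, error: String)
}

// Hypothetical bridge from player-level events to TrackPlaybackListener calls.
// A real Android version would call these methods from ExoPlayer's listener callbacks.
class QueueCallbackBridge(
    private val infos: List<TrackInfo>,          // one entry per queued track, marks already fetched
    private val listener: TrackPlaybackListener
) {
    private var current = -1                    // index last reported as started, -1 if none

    // Player is READY and playing item `index`; repeated READY events are ignored.
    fun onReadyAndPlaying(index: Int) {
        if (index != current) start(index)
    }

    // Player moved on to the next media item: finish the old track, start the new one.
    fun onItemTransition(newIndex: Int) {
        if (current >= 0) listener.onTrackCompleted(current)
        start(newIndex)
    }

    // Player reached its ended state. Assumption: the last track is reported as completed first.
    fun onEnded() {
        if (current >= 0) listener.onTrackCompleted(current)
        current = -1
        listener.onQueueCompleted()
    }

    // Player reported an error; the index is -1 if no track had started yet.
    fun onPlayerError(message: String) = listener.onError(current, message)

    private fun start(index: Int) {
        current = index
        listener.onTrackStarted(index, infos[index])
    }
}
```

Keeping the index bookkeeping in one small class like this makes the callback order unit-testable without a device; the ExoPlayer listener only has to forward its events.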

Error Handling

  • Network errors when fetching marks.json are logged but don’t stop playback
  • Playback errors are reported via onError callback
  • If fetching marks.json fails, an empty marks list is used as the fallback
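The fallback rule for speech marks can be expressed as a small wrapper. In this sketch the HTTP fetch and the JSON decoding are passed in as functions because the post does not say which client or decoder the app uses; loadMarksOrEmpty and its parameters are hypothetical.

```kotlin
data class SpeechMark(
    val time: Int,
    val type: String,
    val start: Int,
    val end: Int,
    val value: String
)

// Hypothetical helper: fetch and decode one marks.json, falling back to an empty list.
// fetch and parse stand in for the app's HTTP client and JSON decoder.
fun loadMarksOrEmpty(
    url: String,
    fetch: (String) -> String,
    parse: (String) -> List<SpeechMark>,
    log: (String) -> Unit = { println(it) }
): List<SpeechMark> =
    try {
        parse(fetch(url))
    } catch (e: Exception) {
        // Logged, not rethrown: a missing marks file should not stop playback.
        log("Could not load speech marks from $url: ${e.message}")
        emptyList()
    }
```

This version is deliberately not a suspend function. A suspending variant that catches Exception should rethrow CancellationException, so that cancelling the caller still cancels the fetch instead of being swallowed as a "missing marks" case.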

Generating Content with AWS Polly

The Python script python/synth_narration.py generates the required files:

./python/synth_narration.py \
  --input content/narration/krillapp \
  --out-dir build/narration \
  --voice Matthew \
  --engine neural

For each .ssml file, the script creates:

  • *.mp3 - The audio file
  • *.marks.json - Speech marks data
  • *.vtt - WebVTT subtitles
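The app needs one AudioTrack per generated file pair. Assuming the .mp3 and .marks.json outputs are uploaded side by side under a single base URL (the post does not say where they are hosted), the list can be built from the file names alone. tracksFor is a hypothetical helper, and the .vtt files are not used by AudioTrack.

```kotlin
data class AudioTrack(
    val mp3Url: String,
    val marksJsonUrl: String
)

// Hypothetical helper: one AudioTrack per generated file stem, assuming
// <stem>.mp3 and <stem>.marks.json are hosted side by side under baseUrl.
fun tracksFor(baseUrl: String, stems: List<String>): List<AudioTrack> {
    val base = baseUrl.trimEnd('/')
    return stems.map { stem ->
        AudioTrack(
            mp3Url = "$base/$stem.mp3",
            marksJsonUrl = "$base/$stem.marks.json"
        )
    }
}
```

With the names from the usage example earlier, tracksFor("https://example.com", listOf("001", "002")) yields the same two AudioTrack values used in playDemo().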

Example SSML Input

<speak>
    <amazon:domain name="conversational">
        Welcome to Krill! This is a demonstration of the audio queue system.
    </amazon:domain>
</speak>

Platform Support

Currently implemented for:

  • ✅ Android (full support with ExoPlayer / AndroidX Media3)
  • ❌ iOS (TODO)
  • ❌ Desktop/JVM (TODO)
  • ❌ WebAssembly (TODO)

On the other platforms, mediaPlayer.playQueue() currently calls TODO("Not yet implemented"), which throws a NotImplementedError rather than returning.
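Since TODO() throws NotImplementedError, shared code that may run on an unsupported platform can guard the call. A minimal sketch, with runIfImplemented as a hypothetical helper name:

```kotlin
// Hypothetical helper: returns null instead of crashing when a platform
// implementation is still a TODO() stub (TODO() throws NotImplementedError).
inline fun <T> runIfImplemented(block: () -> T): T? =
    try {
        block()
    } catch (e: NotImplementedError) {
        null
    }
```

Because the helper is inline, the block may call the suspending playQueue() when the surrounding function is itself a suspend function, for example runIfImplemented { mediaPlayer.playQueue(tracks) } ?: Logger.d("Audio queue not supported on this platform"). Catching only NotImplementedError, rather than every Throwable, keeps genuine playback failures visible.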

MediaPlayer Interface

The complete MediaPlayer interface declares:

interface MediaPlayer {
    suspend fun playQueue(tracks: List<AudioTrack>, listener: TrackPlaybackListener? = null)
    fun stop()
    fun clearQueue()
}

Method Descriptions

playQueue()

Plays a queue of audio tracks sequentially.

Parameters:

  • tracks: List of AudioTrack objects to play
  • listener: Optional callback listener for playback events

Behavior:

  • Fetches all marks.json files in parallel before starting playback
  • Queues all MP3s in ExoPlayer
  • Calls listener callbacks as tracks play

stop()

Stops current playback immediately.

Behavior:

  • Stops ExoPlayer
  • Does not call onQueueCompleted()
  • Queue remains intact (use clearQueue() to clear)

clearQueue()

Clears the playback queue.

Behavior:

  • Removes all queued tracks
  • Stops playback if currently playing
  • Resets internal state
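For shared unit tests, especially on platforms where playQueue() is still a TODO(), a fake that follows the stop() and clearQueue() semantics above can stand in for the real player. This is a hypothetical sketch: FakeMediaPlayer and runWithoutSuspending are not part of the post, the fake assumes a new playQueue() call replaces any existing queue, and it never fetches marks or invokes the listener. In a real test suite, runTest from kotlinx-coroutines-test would normally replace the tiny runner, which exists here only so the sketch needs nothing beyond the standard library.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

data class AudioTrack(val mp3Url: String, val marksJsonUrl: String)
data class SpeechMark(val time: Int, val type: String, val start: Int, val end: Int, val value: String)
data class TrackInfo(val track: AudioTrack, val marks: List<SpeechMark>, val durationMs: Int)

interface TrackPlaybackListener {
    fun onTrackStarted(trackIndex: Int, trackInfo: TrackInfo)
    fun onTrackCompleted(trackIndex: Int)
    fun onQueueCompleted()
    fun onError(trackIndex: Int, error: String)
}

interface MediaPlayer {
    suspend fun playQueue(tracks: List<AudioTrack>, listener: TrackPlaybackListener? = null)
    fun stop()
    fun clearQueue()
}

// Hypothetical test double: records state only. It fetches no marks,
// plays no audio, and never invokes the listener.
class FakeMediaPlayer : MediaPlayer {
    private val queued = mutableListOf<AudioTrack>()
    val queue: List<AudioTrack> get() = queued.toList()
    var isPlaying = false
        private set
    var listener: TrackPlaybackListener? = null
        private set

    // Assumption: a new playQueue() call replaces whatever was queued before.
    override suspend fun playQueue(tracks: List<AudioTrack>, listener: TrackPlaybackListener?) {
        queued.clear()
        queued.addAll(tracks)
        this.listener = listener
        isPlaying = tracks.isNotEmpty()
    }

    // Stops playback but keeps the queue; onQueueCompleted() is not called.
    override fun stop() {
        isPlaying = false
    }

    // Stops playback, removes every queued track, and resets the listener.
    override fun clearQueue() {
        isPlaying = false
        queued.clear()
        listener = null
    }
}

// Runs a suspend block that never actually suspends (such as the fake above)
// with the standard library only; runTest would normally be used instead.
fun <T> runWithoutSuspending(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation<T>(EmptyCoroutineContext) { outcome = it })
    return checkNotNull(outcome) { "block suspended; use a real coroutine runner" }.getOrThrow()
}
```

Keeping the fake's state observable (queue, isPlaying, listener) lets shared tests assert the documented difference between stop(), which leaves the queue intact, and clearQueue(), which empties it.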

This post is licensed under CC BY 4.0 by the author.