Media Player Queue Support API Reference

Complete API reference and implementation guide for the Media Player queue support feature. Includes data models, listener interface, usage examples, and speech marks integration with AWS Polly for synchronized audio playback.

Posted Nov 16, 2025

By Benjamin Sautner

Overview

This enhancement adds support for queuing multiple audio tracks and receiving callbacks for track playback events. It is designed for playing AWS Polly-generated MP3s together with their corresponding speech marks JSON files.

Features

Queue Management: Play multiple tracks in sequence
Track Callbacks: Get notified when tracks start and complete
Duration Info: Access track duration from speech marks
Speech Marks: Full access to word and sentence timing data from Polly
Error Handling: Graceful error handling with callbacks

Data Models

AudioTrack

Represents a single audio track with its associated speech marks.

  
data class AudioTrack(
    val mp3Url: String,       // URL to the MP3 file
    val marksJsonUrl: String  // URL to the marks.json file from Polly
)

SpeechMark

Represents a single speech mark from Polly (word or sentence timing).

  
data class SpeechMark(
    val time: Int,      // Time in milliseconds from start
    val type: String,   // "word" or "sentence"
    val start: Int,     // Start offset in the input text (Polly reports byte offsets)
    val end: Int,       // End offset in the input text (Polly reports byte offsets)
    val value: String   // The word or sentence text
)

TrackInfo

Contains the track, its speech marks, and calculated duration.

  
data class TrackInfo(
    val track: AudioTrack,
    val marks: List<SpeechMark>,
    val durationMs: Int  // Approximate duration in ms, taken from the last mark's timestamp
)

TrackPlaybackListener Interface

Implement this interface to receive playback events:

  
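interface TrackPlaybackListener {
    // Called when a track starts playing; trackInfo carries its marks and duration
    fun onTrackStarted(trackIndex: Int, trackInfo: TrackInfo)
    // Called when the player moves on from the track at trackIndex
    fun onTrackCompleted(trackIndex: Int)
    // Called once after the last track in the queue has finished
    fun onQueueCompleted()
    // Called when playback of the track at trackIndex fails
    fun onError(trackIndex: Int, error: String)
}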

Usage Examples

Basic Queue Playback with Listener

  
suspend fun playDemo() {
    val tracks = listOf(
        AudioTrack(
            mp3Url = "https://example.com/001.mp3",
            marksJsonUrl = "https://example.com/001.marks.json"
        ),
        AudioTrack(
            mp3Url = "https://example.com/002.mp3",
            marksJsonUrl = "https://example.com/002.marks.json"
        )
    )
    
    val listener = object : TrackPlaybackListener {
        override fun onTrackStarted(trackIndex: Int, trackInfo: TrackInfo) {
            Logger.d("Track $trackIndex started - Duration: ${trackInfo.durationMs}ms")
            // Access speech marks for word-by-word timing
            trackInfo.marks.forEach { mark ->
                Logger.d("${mark.type} at ${mark.time}ms: ${mark.value}")
            }
        }
        
        override fun onTrackCompleted(trackIndex: Int) {
            Logger.d("Track $trackIndex completed")
        }
        
        override fun onQueueCompleted() {
            Logger.d("All tracks completed!")
        }
        
        override fun onError(trackIndex: Int, error: String) {
            Logger.d("Error on track $trackIndex: $error")
        }
    }
    
    mediaPlayer.playQueue(tracks, listener)
}
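
The marks delivered in onTrackStarted can drive word-by-word highlighting. The following is a minimal sketch, not part of the Media Player API: activeWordAt is a hypothetical helper, and how the current playback position is obtained is an assumption left outside this reference. SpeechMark is repeated so the snippet compiles on its own.

```kotlin
// Mirrors the SpeechMark model from this article so the sketch is self-contained.
data class SpeechMark(
    val time: Int,
    val type: String,
    val start: Int,
    val end: Int,
    val value: String
)

// Hypothetical helper: returns the latest "word" mark whose timestamp is at or
// before positionMs, or null if playback has not reached the first word yet.
// Assumes the marks are sorted by ascending time.
fun activeWordAt(marks: List<SpeechMark>, positionMs: Int): SpeechMark? =
    marks.lastOrNull { it.type == "word" && it.time <= positionMs }
```

With the sample marks from the Speech Marks JSON Format section, a position of 100 ms resolves to the "Welcome" mark and a position of 10 ms resolves to null, because the first word only starts at 25 ms.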

Simple Playback Without Listener

  
suspend fun playSimple() {
    val tracks = listOf(
        AudioTrack(
            mp3Url = "https://example.com/audio.mp3",
            marksJsonUrl = "https://example.com/audio.marks.json"
        )
    )
    
    mediaPlayer.playQueue(tracks)
}

Control Playback

  
// Stop current playback
mediaPlayer.stop()

// Clear the queue
mediaPlayer.clearQueue()

Speech Marks JSON Format

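The marks.json files hold the speech marks as a JSON array. Polly's raw speech mark output is a stream of one JSON object per line, so the array form shown here assumes the objects have been collected into a single array: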

  
[
  {
    "time": 0,
    "type": "sentence",
    "start": 7,
    "end": 24,
    "value": "Welcome to Krill!"
  },
  {
    "time": 25,
    "type": "word",
    "start": 7,
    "end": 14,
    "value": "Welcome"
  },
  {
    "time": 337,
    "type": "word",
    "start": 15,
    "end": 17,
    "value": "to"
  }
]

Field Descriptions

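time: Milliseconds from the start of the audio
type: "word" or "sentence"
start/end: Offsets of the word or sentence in the original input text (Polly documents these as byte offsets, which equal character positions only for single-byte text)
value: The actual word or sentence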
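
Polly documents start and end as byte offsets into the input text, which differ from character indices once the text contains multi-byte characters. The sketch below shows one way to slice the original text by those offsets. textForOffsets is a hypothetical helper, and treating the offsets as UTF-8 and end-exclusive is an assumption that happens to match the sample above.

```kotlin
// Hypothetical helper: returns the part of the original input text that a mark
// points at. Offsets are treated as UTF-8 byte offsets with an exclusive end
// (an assumption based on the sample marks in this article).
fun textForOffsets(source: String, start: Int, end: Int): String =
    source.encodeToByteArray().copyOfRange(start, end).decodeToString()
```

For an illustrative input of `<speak>Welcome to Krill!</speak>`, the offsets 7 and 24 select "Welcome to Krill!" and the offsets 15 and 17 select "to", since the opening `<speak>` tag occupies the first 7 bytes.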

Implementation Details

Android Implementation

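The Android implementation uses ExoPlayer (AndroidX Media3):

Fetches speech marks from the provided URLs
Calculates duration from the last mark’s timestamp (an approximation, since that timestamp marks where the last word starts rather than where the audio ends)
Uses ExoPlayer’s listener system to detect track transitions
Delivers callbacks on the Main dispatcher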
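
The duration step in the list above can be sketched as follows. This is an illustration rather than the actual Android code: durationFromMarks is a hypothetical name, and returning 0 for an empty marks list is an assumption about the fallback case.

```kotlin
// Mirrors the SpeechMark model from this article so the sketch is self-contained.
data class SpeechMark(
    val time: Int,
    val type: String,
    val start: Int,
    val end: Int,
    val value: String
)

// Hypothetical helper: uses the largest mark timestamp as the track duration.
// Returns 0 when there are no marks (assumed fallback, not confirmed by the source).
fun durationFromMarks(marks: List<SpeechMark>): Int =
    marks.maxOfOrNull { it.time } ?: 0
```

Taking the maximum instead of the last element keeps the result stable even if the marks arrive unsorted.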

Track Transition Detection

stateDiagram-v2
    [*] --> Idle
    Idle --> Loading: playQueue()
    Loading --> Ready: STATE_READY
    Ready --> Playing: Auto-play
    Playing --> TrackStarted: onTrackStarted()
    TrackStarted --> Transitioning: End of track
    Transitioning --> TrackCompleted: onTrackCompleted()
    TrackCompleted --> Playing: Next track
    TrackCompleted --> QueueEnded: No more tracks
    QueueEnded --> [*]: onQueueCompleted()

Callback Sequence

onTrackStarted: Called when ExoPlayer enters READY state and starts playing
onTrackCompleted: Called when ExoPlayer transitions to the next media item
onQueueCompleted: Called when ExoPlayer reaches STATE_ENDED
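
When testing a listener, it can help to write down the callback order expected for a queue of a given size. The sketch below is a hypothetical model of the sequence, not the ExoPlayer-backed implementation. It follows the state diagram, which shows onTrackCompleted() firing for the final track before the queue ends. Behavior for an empty queue is not documented, so the helper rejects it.

```kotlin
// Hypothetical model of the documented callback order for a queue of
// trackCount tracks that plays through without errors.
fun expectedCallbacks(trackCount: Int): List<String> {
    require(trackCount > 0) { "empty-queue behavior is not documented" }
    return buildList {
        for (index in 0 until trackCount) {
            add("onTrackStarted($index)")
            add("onTrackCompleted($index)")
        }
        add("onQueueCompleted()")
    }
}
```

A recording TrackPlaybackListener that appends the same strings from its callbacks could then be compared against this list in a test.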

Error Handling

Network errors when fetching marks.json are logged but don’t stop playback
Playback errors are reported via onError callback
Empty marks list is used as fallback if marks.json fetch fails
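
The empty-list fallback described above could look roughly like this. It is a sketch under assumptions: marksOrEmpty is a hypothetical name, fetchMarks stands in for whatever downloads and parses marks.json, and println stands in for the app's logger.

```kotlin
// Mirrors the SpeechMark model from this article so the sketch is self-contained.
data class SpeechMark(
    val time: Int,
    val type: String,
    val start: Int,
    val end: Int,
    val value: String
)

// Hypothetical wrapper: a failed fetch is logged and replaced by an empty list,
// so the MP3 can still be queued and played without timing data.
fun marksOrEmpty(
    url: String,
    fetchMarks: (String) -> List<SpeechMark>
): List<SpeechMark> =
    try {
        fetchMarks(url)
    } catch (e: Exception) {
        println("Failed to fetch marks from $url: ${e.message}")
        emptyList()
    }
```

A listener should therefore be prepared for trackInfo.marks to be empty even when the audio itself plays normally.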

Generating Content with AWS Polly

The Python script python/synth_narration.py generates the required files:

  
./python/synth_narration.py \
  --input content/narration/krillapp \
  --out-dir build/narration \
  --voice Matthew \
  --engine neural

For each .ssml file, this creates:

*.mp3 - The audio file
*.marks.json - Speech marks data
*.vtt - WebVTT subtitles

Example SSML Input

  
<speak>
    <amazon:domain name="conversational">
        Welcome to Krill! This is a demonstration of the audio queue system.
    </amazon:domain>
</speak>

Platform Support

Currently implemented for:

✅ Android (full support with ExoPlayer from AndroidX Media3)
❌ iOS (TODO)
❌ Desktop/JVM (TODO)
❌ WebAssembly (TODO)

On other platforms, calling mediaPlayer.playQueue() runs TODO("Not yet implemented"), which throws a NotImplementedError rather than returning a value.
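
Because Kotlin's TODO() throws NotImplementedError, shared code that may run on an unsupported platform can guard the call instead of crashing. The following is a minimal sketch: playQueueStub and tryPlay are hypothetical names, and the stub only imitates what an unimplemented platform would do.

```kotlin
// Stand-in for the playQueue body on a platform without an implementation.
fun playQueueStub(): Unit = TODO("Not yet implemented")

// Hypothetical guard: runs the given playback call and reports whether the
// platform supports it. Declared inline so the lambda may also contain a
// suspend call (such as the real playQueue) when used from a coroutine.
inline fun tryPlay(play: () -> Unit): Boolean =
    try {
        play()
        true
    } catch (e: NotImplementedError) {
        false
    }
```

Calling `tryPlay { playQueueStub() }` yields false, which lets the caller hide audio controls or fall back to text-only content.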

MediaPlayer Interface

The complete MediaPlayer interface includes:

  
interface MediaPlayer {
    suspend fun playQueue(tracks: List<AudioTrack>, listener: TrackPlaybackListener? = null)
    fun stop()
    fun clearQueue()
}

Method Descriptions

playQueue()

Plays a queue of audio tracks sequentially.

Parameters:

tracks: List of AudioTrack objects to play
listener: Optional callback listener for playback events

Behavior:

Fetches all marks.json files in parallel before starting playback
Queues all MP3s in ExoPlayer
Calls listener callbacks as tracks play

stop()

Stops current playback immediately.

Behavior:

Stops ExoPlayer
Does not call onQueueCompleted()
Queue remains intact (use clearQueue() to clear)

clearQueue()

Clears the playback queue.

Behavior:

Removes all queued tracks
Stops playback if currently playing
Resets internal state

Krill App, Media Player

This post is licensed under CC BY 4.0 by the author.
