Krill Peer Mesh Network Architecture
Deep dive into Krill's peer-to-peer mesh networking including beacon discovery, server handshake, SSE real-time updates, and server settings bootstrap
Overview
This document describes how Krill’s peer-to-peer mesh networking operates: beacon discovery, the server handshake, the SSE (Server-Sent Events) real-time update lifecycle, and the /trust endpoint that bootstraps server-to-server communication.
Core Components
Key Classes and Their Responsibilities
| Class | Location | Purpose |
|---|---|---|
| BeaconSupervisor | shared | Manages beacon listening and broadcasting lifecycle |
| BeaconProcessor | shared | Processes incoming beacons, detects new/reconnected peers |
| BeaconSender | shared | Sends beacon signals with rate limiting |
| PeerSessionManager | shared | Tracks known peers by installId/sessionId for deduplication |
| ServerHandshakeProcess | shared | Handles trust establishment and node synchronization |
| CertificateCache | shared | Caches validated connections to avoid redundant cert downloads |
| SSEBoss | shared | Manages SSE connections to servers for real-time updates |
| ServerNodeManager | shared | Server-side node management with nodeUpdates SharedFlow for SSE broadcasting |
| ServerServerProcessor | shared | Processes Server node state changes, triggers handshake |
NodeWire - Beacon Data Structure
```kotlin
data class NodeWire(
    val timestamp: Long,   // When the beacon was sent
    val installId: String, // Stable peer identity (UUID)
    val host: String,      // Hostname or IP
    val port: Int,         // Server port (0 for apps)
    val sessionId: String  // Current session ID (changes on restart)
)
```
Key Identity Rules:
- `installId` is the primary peer identity (stable across restarts, IP changes)
- `sessionId` changes on each restart (used to detect peer restarts)
- `host:port` is used for network connectivity but NOT for identity keying
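The identity rules above can be illustrated with a small registry keyed by `installId`. This is a minimal sketch under stated assumptions, not Krill's actual `PeerSessionManager`: the names `PeerRegistry`, `BeaconKind`, and `observe` are hypothetical and exist only for illustration.

```kotlin
// Mirrors the NodeWire fields documented above.
data class NodeWire(
    val timestamp: Long,
    val installId: String,
    val host: String,
    val port: Int,
    val sessionId: String,
)

// Hypothetical classification of an incoming beacon.
enum class BeaconKind { NEW_PEER, RESTARTED_PEER, DUPLICATE }

// Hypothetical registry: keyed by installId only, never by host:port.
class PeerRegistry {
    private val latest = mutableMapOf<String, NodeWire>()

    fun observe(wire: NodeWire): BeaconKind {
        val previous = latest[wire.installId]
        latest[wire.installId] = wire // always keep the freshest host/port
        return when {
            previous == null -> BeaconKind.NEW_PEER
            previous.sessionId != wire.sessionId -> BeaconKind.RESTARTED_PEER
            else -> BeaconKind.DUPLICATE
        }
    }

    fun size(): Int = latest.size
}
```

Because the map key is `installId`, a peer that moves to a new IP address with the same `sessionId` is still classified as a duplicate, and the registry never grows a second entry for it.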
Beacon Discovery Flow
Beacon Classification
- Server beacons: `port > 0` (servers listen on a specific port)
- App beacons: `port == 0` (apps don’t run a server)
Startup Beacon Flow
```mermaid
sequenceDiagram
    participant App as App/Server
    participant BS as BeaconSupervisor
    participant MC as Multicast
    participant BP as BeaconProcessor
    participant PSM as PeerSessionManager
    participant SHP as ServerHandshakeProcess
    App->>BS: startBeaconProcess()
    BS->>MC: receiveBeacons(callback)
    BS->>BS: sentStartupBeacon = true
    BS->>MC: sendBeacon(wire)
    Note over MC: Beacon broadcast on startup
    MC-->>BS: Incoming beacon received
    BS->>BS: handleIncomeWire(wire)
    alt wire.port > 0 (Server beacon)
        BS->>BP: processWire(wire)
        BP->>PSM: isKnownSession(wire)?
        alt Duplicate beacon (same session)
            PSM-->>BP: true
            Note over BP: Skip - already known
        else Known host, new session (restart)
            PSM-->>BP: false (but hasKnownHost = true)
            BP->>PSM: add(wire)
            BP->>SHP: trustServer(wire)
        else New host
            PSM-->>BP: false
            BP->>PSM: add(wire)
            BP->>SHP: trustServer(wire)
        end
    else wire.port == 0 (App beacon)
        alt SystemInfo.isServer()
            BS->>MC: sendBeacon(ourWire)
            Note over BS: Server responds to app beacon
        end
    end
```
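The top-level branch of the flow above (server beacon versus app beacon) reduces to a small decision function. The sketch below is illustrative only: `BeaconAction`, `isServerBeacon`, and `dispatchBeacon` are hypothetical names, and the real `BeaconSupervisor.handleIncomeWire()` may be structured differently.

```kotlin
data class NodeWire(
    val timestamp: Long,
    val installId: String,
    val host: String,
    val port: Int,
    val sessionId: String,
)

// Hypothetical outcome of inspecting one incoming beacon.
enum class BeaconAction { PROCESS_AS_SERVER, REPLY_WITH_OWN_BEACON, IGNORE }

// Servers advertise the port they listen on; apps send port 0.
fun NodeWire.isServerBeacon(): Boolean = port > 0

fun dispatchBeacon(wire: NodeWire, weAreServer: Boolean): BeaconAction = when {
    // Server beacon: hand over for deduplication and handshake.
    wire.isServerBeacon() -> BeaconAction.PROCESS_AS_SERVER
    // App beacon seen by a server: answer so the app can discover us.
    weAreServer -> BeaconAction.REPLY_WITH_OWN_BEACON
    // App beacon seen by another app: nothing to do.
    else -> BeaconAction.IGNORE
}
```

Note that this sketch treats every non-positive port as an app beacon; the source only specifies the `port == 0` case.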
Server Handshake + Node Download Pipeline
Connection Result States
```kotlin
enum class ConnectionResult {
    SUCCESS,           // Connected and synced
    CERTIFICATE_ERROR, // SSL/TLS issue - need cert download
    NETWORK_ERROR,     // Network unreachable
    AUTH_ERROR         // No API key or unauthorized
}
```
Handshake Pipeline Flow
```mermaid
sequenceDiagram
    participant BP as BeaconProcessor
    participant SHP as ServerHandshakeProcess
    participant CC as CertificateCache
    participant NM as NodeManager
    participant SS as ServerSettings
    participant CSM as ClientSocketManager
    participant HTTP as HTTP Client
    BP->>SHP: trustServer(wire)
    Note over SHP: Mutex lock with installId-based job key
    SHP->>SHP: Check existing job for installId
    alt Job already running
        Note over SHP: Skip - idempotent
    else New job
        SHP->>CC: hasValidConnection(installId, host, port, sessionId)?
        SHP->>NM: nodeAvailable(wire.installId)?
        alt Node not available (no API key yet)
            SHP->>SHP: return AUTH_ERROR
            SHP->>NM: setErrorState(stubNode)
            Note over NM: Creates stub with ERROR state
        else Node available
            SHP->>SS: read(installId)
            SS-->>SHP: ServerSettingsData
            alt No API key
                SHP->>SHP: return AUTH_ERROR
            else Has API key
                SHP->>CSM: start(wire)
                SHP->>HTTP: GET /nodes
                alt Success
                    SHP->>NM: Update/sync all nodes
                    SHP->>CC: markValid(installId, ...)
                    SHP->>SHP: return SUCCESS
                else SSL Error
                    SHP->>CC: invalidate(installId)
                    SHP->>HTTP: Download cert from /trust
                    SHP->>SHP: Retry connection
                else Unauthorized
                    SHP->>SHP: return AUTH_ERROR
                end
            end
        end
    end
```
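Two ideas in the pipeline above are easy to miss in a diagram: the job-key check that makes `trustServer` idempotent, and the single retry after a certificate download. The following is a simplified, synchronous sketch of both. `HandshakeRunner` and its lambda parameters are hypothetical; the real `ServerHandshakeProcess` uses a mutex and coroutines rather than the plain concurrent set assumed here.

```kotlin
import java.util.concurrent.ConcurrentHashMap

enum class ConnectionResult { SUCCESS, CERTIFICATE_ERROR, NETWORK_ERROR, AUTH_ERROR }

// Hypothetical stand-in for the handshake pipeline.
class HandshakeRunner {
    // Keys of handshakes currently in flight.
    private val activeJobs = ConcurrentHashMap.newKeySet<String>()

    /** Returns null when a job for [jobKey] is already running. */
    fun trust(
        jobKey: String,
        apiKey: String?,
        attemptConnection: () -> ConnectionResult,
        downloadCertificate: () -> Unit,
    ): ConnectionResult? {
        if (!activeJobs.add(jobKey)) return null // skip: idempotent
        try {
            // No API key means no network work at all.
            if (apiKey.isNullOrBlank()) return ConnectionResult.AUTH_ERROR
            val first = attemptConnection()
            if (first != ConnectionResult.CERTIFICATE_ERROR) return first
            downloadCertificate()      // e.g. fetch the peer certificate
            return attemptConnection() // retry exactly once
        } finally {
            activeJobs.remove(jobKey)
        }
    }
}
```

Retrying only once keeps a persistently bad certificate from looping; the second `CERTIFICATE_ERROR` is simply returned to the caller.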
/trust Endpoint - Server Settings Bootstrap
Two Entry Points for Peer Connection
| Entry Point | Trigger | Use Case |
|---|---|---|
| Beacon | UDP multicast | Automatic discovery on LAN |
| POST /trust | HTTP from app | User provides API key for server-to-server connection |
/trust Flow (App → Server → Peer Server)
```mermaid
sequenceDiagram
    participant User
    participant App as App UI (ExpandServer)
    participant ServerA as Server A (connected)
    participant NM as NodeManager
    participant SS as ServerSettings
    participant SSP as ServerServerProcessor
    participant SHP as ServerHandshakeProcess
    participant ServerB as Server B (target peer)
    User->>App: Click peer server in ExpandServer
    User->>App: Enter API key for Server B
    App->>ServerA: POST /trust {id: serverB.installId, apiKey, trustCert}
    ServerA->>NM: nodeAvailable(settingsData.id)?
    alt Peer known (from beacon)
        ServerA->>SS: write(id, settingsData)
        ServerA->>NM: update(peer with USER_EDIT state)
        NM->>SSP: post(peerNode)
        SSP->>SSP: state == USER_EDIT
        SSP->>SHP: trustServer(wire from meta)
        SHP->>ServerB: Handshake + node download
    else Peer unknown
        ServerA-->>App: 404 Not Found
        Note over App: Peer must be discovered via beacon first
    end
```
Why Beacon Must Come First
The beacon provides essential information that /trust cannot:
- Hostname/IP: Where to connect
- Port: Which port the server listens on
- SessionId: Current session for deduplication
The /trust endpoint only provides:
- installId: Peer identity
- apiKey: Authentication credential
- trustCert: Whether to trust self-signed certificates
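The split of responsibilities described above (the beacon supplies the endpoint, /trust supplies the credentials) can be sketched as a handler that refuses to store settings for a peer it has never seen. This is an illustration rather than the code in Routes.kt: `PeerEndpoint`, `TrustOutcome`, and `TrustHandler` are hypothetical, and the `ServerSettingsData` shape is assumed from the request body shown in the diagram.

```kotlin
// Assumed shape of the POST /trust body: {id, apiKey, trustCert}.
data class ServerSettingsData(val id: String, val apiKey: String, val trustCert: Boolean)

// What only a beacon can tell us about a peer.
data class PeerEndpoint(val host: String, val port: Int, val sessionId: String)

sealed interface TrustOutcome {
    // Settings stored; the handshake can target this endpoint.
    data class Accepted(val endpoint: PeerEndpoint) : TrustOutcome
    // Corresponds to the 404 Not Found branch.
    object PeerUnknown : TrustOutcome
}

// Hypothetical handler; knownPeers is populated by beacon discovery.
class TrustHandler(private val knownPeers: Map<String, PeerEndpoint>) {
    val savedSettings = mutableMapOf<String, ServerSettingsData>()

    fun handle(settings: ServerSettingsData): TrustOutcome {
        // Without a beacon there is no host, port, or session to connect to.
        val endpoint = knownPeers[settings.id] ?: return TrustOutcome.PeerUnknown
        savedSettings[settings.id] = settings
        return TrustOutcome.Accepted(endpoint)
    }
}
```

Nothing is persisted on the unknown-peer path, so a mistyped `installId` cannot leave orphaned credentials behind.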
SSE Real-Time Updates
Architecture Overview
Real-time node updates flow from server to clients using Server-Sent Events (SSE). This replaces the previous WebSocket-based approach with a simpler, more reliable unidirectional stream.
```mermaid
sequenceDiagram
    participant Client as App Client
    participant SSEBoss as SSEBoss
    participant Server as Ktor Server
    participant NM as ServerNodeManager
    participant Actor as Actor Channel
    Client->>SSEBoss: connect(serverNode)
    SSEBoss->>Server: GET /sse (HTTPS)
    Server-->>SSEBoss: Initial server state
    Note over NM,Actor: Node state change occurs
    NM->>Actor: update(node)
    Actor->>Actor: updateInternal(node)
    Actor->>NM: emit to _nodeUpdates SharedFlow
    NM-->>Server: nodeUpdates.collect { node }
    Server-->>SSEBoss: SSE event: node JSON
    SSEBoss->>Client: nodeManager.update(node)
```
SSE Connection Flow
```mermaid
stateDiagram-v2
    [*] --> Disconnected
    Disconnected --> Connecting: SSEBoss.connect(node)
    Connecting --> Connected: SSE stream established
    Connecting --> Disconnected: Connection error
    Connected --> Receiving: Receive node update
    Receiving --> Connected: Update local NodeManager
    Connected --> Disconnected: Connection closed
    Connected --> Disconnected: Error
    Disconnected --> [*]: Server removed
    note right of Disconnected: Reconnection attempted on next beacon
    note right of Connected: Real-time updates flow via SSE
```
Key Components
Server-Side (Routes.kt):
```kotlin
sse("/sse", serialize = { _, node ->
    fastJson.encodeToString<Node>(node as Node)
}) {
    // Send current server state on connect
    send(nodeManager.readNodeState(installId()).value)
    // Collect from the nodeUpdates SharedFlow for real-time updates
    nodeManager.nodeUpdates.collect { node ->
        send(node)
    }
}
```
Client-Side (SSEBoss.kt):
```kotlin
client.sse(urlString = sseUrl.toString()) {
    incoming.collect { event ->
        deserialize<Node>(event.data)?.let { node ->
            nodeManager.update(node)
        }
    }
}
```
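For context on what `event.data` carries: on the wire, an SSE stream is plain text in which each event is a group of `field: value` lines terminated by a blank line. The SSE client does this parsing internally, so the function below is only a standalone sketch of the framing, with `parseSseData` as a hypothetical name. It handles `data` fields and comment lines and ignores the other fields (`event`, `id`, `retry`).

```kotlin
/** Extracts the data payload of each complete event from raw SSE text. */
fun parseSseData(stream: String): List<String> {
    val events = mutableListOf<String>()
    val dataLines = mutableListOf<String>()
    for (line in stream.lineSequence()) {
        when {
            // A blank line dispatches the event collected so far.
            line.isEmpty() -> {
                if (dataLines.isNotEmpty()) {
                    events += dataLines.joinToString("\n")
                    dataLines.clear()
                }
            }
            // Lines starting with a colon are comments (often keep-alives).
            line.startsWith(":") -> Unit
            // One optional space after the colon is not part of the value.
            line.startsWith("data:") ->
                dataLines += line.removePrefix("data:").removePrefix(" ")
        }
    }
    // An event without a terminating blank line is incomplete and dropped.
    return events
}
```

Each returned string would be one serialized `Node` in Krill's case; multiple `data:` lines within one event are joined with newlines.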
NodeManager SharedFlow Integration
The ServerNodeManager uses the actor pattern for thread-safe updates and emits to a SharedFlow for SSE broadcasting:
```kotlin
// In ServerNodeManager.updateInternal()
private fun updateInternal(node: Node) {
    // Update the StateFlow (triggers NodeObserver → Processor)
    val f = nodes.getOrPut(node.id) { MutableStateFlow(node).also { observe(it) } }
    f.value = node
    // Broadcast to SSE clients (only for owned nodes)
    if (node.isMine() && node.state != NodeState.DELETING) {
        scope.launch { _nodeUpdates.emit(node) }
    }
}
```
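The broadcast rule in `updateInternal` (owned nodes only, never `DELETING`) can be isolated and exercised without coroutines. In the sketch below a plain listener list stands in for the `SharedFlow`; `NodeBroadcaster`, `ownerInstallId`, and the reduced `Node` and `NodeState` types are assumptions made for illustration.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList

// Subset of states mentioned in this document.
enum class NodeState { NONE, USER_EDIT, ERROR, DELETING }

// Reduced node model; ownerInstallId is a hypothetical stand-in for isMine().
data class Node(val id: String, val ownerInstallId: String, val state: NodeState)

class NodeBroadcaster(private val localInstallId: String) {
    private val subscribers = CopyOnWriteArrayList<(Node) -> Unit>()

    fun subscribe(listener: (Node) -> Unit) {
        subscribers.add(listener)
    }

    /** Fans the node out to subscribers; returns false when it was filtered. */
    fun publish(node: Node): Boolean {
        val eligible = node.ownerInstallId == localInstallId &&
            node.state != NodeState.DELETING
        if (eligible) subscribers.forEach { it(node) }
        return eligible
    }
}
```

Filtering on ownership means a server relays only its own nodes, so updates learned from a peer are not echoed back out over SSE.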
Disconnect Handling
When an SSE connection is lost:
- `SSEBoss` catches the exception and logs it
- The job is removed from the active connections map
- On next beacon from peer, a new SSE connection is attempted
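The three steps above amount to bookkeeping around a set of active connections. The following sketch shows that bookkeeping in isolation; `SseConnectionTracker` and its method names are hypothetical, and the log text is invented for the example rather than taken from `SSEBoss`.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical tracker for which peers currently have an SSE stream.
class SseConnectionTracker(
    private val log: (String) -> Unit = { message -> println(message) },
) {
    private val active = ConcurrentHashMap.newKeySet<String>()

    /** Called per server beacon; true means a new connection should be opened. */
    fun onBeacon(installId: String): Boolean = active.add(installId)

    /** Called when the stream for [installId] fails or is closed. */
    fun onDisconnected(installId: String, cause: Throwable?) {
        log("SSE connection to $installId ended: ${cause?.message ?: "closed"}")
        active.remove(installId)
    }

    fun isActive(installId: String): Boolean = installId in active
}
```

Because reconnection is driven by the next beacon, no separate retry timer is needed: a peer that stays offline simply never triggers `onBeacon` again.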
Peer Server Node State Transitions
```mermaid
stateDiagram-v2
    [*] --> NONE: Beacon received + API key exists + sync success
    NONE --> USER_EDIT: User updates settings
    USER_EDIT --> NONE: Handshake success
    USER_EDIT --> ERROR: Handshake failure
    NONE --> ERROR: SSE disconnect
    ERROR --> NONE: Beacon + successful resync
    [*] --> ERROR: Beacon received + no API key
    ERROR --> USER_EDIT: User sets API key
    note right of ERROR: ERROR state does NOT trigger network work
    note right of USER_EDIT: Triggers handshake pipeline
```
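The transitions in this diagram can be written as a pure function, which makes the legal moves easy to test. This is a sketch under stated assumptions: `PeerState`, `PeerEvent`, `nextState`, and `stateChangeTriggersHandshake` are hypothetical names, and the two initial transitions from `[*]` are left out.

```kotlin
enum class PeerState { NONE, USER_EDIT, ERROR }

// Hypothetical event names for the transition labels in the diagram.
enum class PeerEvent {
    USER_UPDATED_SETTINGS,
    HANDSHAKE_SUCCEEDED,
    HANDSHAKE_FAILED,
    DISCONNECTED,
    RESYNC_SUCCEEDED,
}

/** Returns the next state, or null when the diagram defines no transition. */
fun nextState(state: PeerState, event: PeerEvent): PeerState? = when (state to event) {
    PeerState.NONE to PeerEvent.USER_UPDATED_SETTINGS -> PeerState.USER_EDIT
    PeerState.USER_EDIT to PeerEvent.HANDSHAKE_SUCCEEDED -> PeerState.NONE
    PeerState.USER_EDIT to PeerEvent.HANDSHAKE_FAILED -> PeerState.ERROR
    PeerState.NONE to PeerEvent.DISCONNECTED -> PeerState.ERROR
    PeerState.ERROR to PeerEvent.RESYNC_SUCCEEDED -> PeerState.NONE
    PeerState.ERROR to PeerEvent.USER_UPDATED_SETTINGS -> PeerState.USER_EDIT
    else -> null
}

// Only entering USER_EDIT starts the handshake pipeline from a state change;
// beacon-driven handshakes are outside the scope of this sketch.
fun stateChangeTriggersHandshake(state: PeerState): Boolean = state == PeerState.USER_EDIT
```

Returning `null` for undefined pairs keeps unexpected events visible to the caller instead of silently leaving the state unchanged.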
Invariants and Guarantees
Startup Invariants
- Client Node EXECUTE once: On app start, client node is created/loaded and executed exactly once
- Server Node EXECUTE once: On server start, server node is created/loaded and executed exactly once
- Beacon listening starts: Both apps and servers start beacon listening on startup
Beacon Handling Invariants
- Server beacons identified by port > 0
- Duplicate beacons ignored: Same installId + sessionId = already known
- Peer restarts detected: Same installId + different sessionId = reconnect flow
- Server responds to app beacons: Enables discovery
Connection Pipeline Invariants
- Idempotent operations: Job keys use `installId-sessionId` to prevent duplicates
- No API key = STOP: No handshake/download/SSE connect without API key
- ERROR state = STOP: ERROR nodes don’t trigger new network work
- Servers don’t broadcast deleting nodes: DELETING states are not sent over SSE
Identity Invariants
- installId is primary identity: Used for all peer tracking
- sessionId detects restarts: New session = peer restarted
- host:port for connectivity only: NOT used for identity keying
Troubleshooting Guide
Symptom: Server appears in ERROR state
Likely Causes:
- No API key configured
- Wrong API key
- Network unreachable
- Certificate trust issue
Log Lines to Look For:
Received beacon from ${wire.host()}:${wire.port}
Cannot update settings for unknown peer - peer must be discovered via beacon first
Unauthorised, you may need to set your api key for this server
SSL/Certificate error for peer ${installId}
Symptom: Duplicate connections or handshakes
Likely Causes:
- Using host:port as key instead of installId (fixed in this update)
- Beacon storm without proper deduplication
Log Lines to Look For:
Handshake job already in progress for ${jobKey}
Connection already active for peer ${installId}
Symptom: Node updates not received
Likely Causes:
- SSE connection disconnected
- ERROR state on peer node
- Server not emitting to nodeUpdates SharedFlow
Log Lines to Look For:
SSEBoss connecting failed
SSE sending update: ${type} ${state}
node updated ${details}
Diagram Source Map
These diagrams were derived from the actual code in the repository:
| Diagram | Source Classes/Functions |
|---|---|
| Beacon Discovery Flow | BeaconSupervisor.startBeaconListener(), BeaconSupervisor.handleIncomeWire(), BeaconProcessor.processWire(), PeerSessionManager.isKnownSession() |
| Handshake Pipeline Flow | ServerHandshakeProcess.trustServer(), ServerHandshakeProcess.attemptConnection(), ServerHandshakeProcess.downloadAndSyncServerData(), CertificateCache.hasValidConnection() |
| /trust Flow | Routes.kt POST /trust, ServerServerProcessor.post(), ServerHandshakeProcess.trustServer() |
| SSE Connection Flow | SSEBoss.connect(), Routes.kt /sse endpoint, ServerNodeManager.updateInternal(), NodeManager.nodeUpdates |
| Peer Node State Transitions | ServerHandshakeProcess connection results, NodeManager.setErrorState(), NodeManager.complete() |