Krill Peer Mesh Network Architecture
Deep dive into Krill's peer-to-peer mesh networking including beacon discovery, server handshake, WebSocket lifecycle, and server settings bootstrap
Krill Peer Mesh Network Architecture
Overview
This document provides a comprehensive analysis of how Krill’s peer-to-peer mesh networking operates, including beacon discovery, server handshake processes, WebSocket connection lifecycle, and the /trust endpoint for server-to-server communication bootstrap.
Core Components
Key Classes and Their Responsibilities
| Class | Location | Purpose |
|---|---|---|
BeaconSupervisor | krill-sdk | Manages beacon listening and broadcasting lifecycle |
BeaconProcessor | krill-sdk | Processes incoming beacons, detects new/reconnected peers |
BeaconSender | krill-sdk | Sends beacon signals with rate limiting |
PeerSessionManager | krill-sdk | Tracks known peers by installId/sessionId for deduplication |
ServerHandshakeProcess | krill-sdk | Handles trust establishment and node synchronization |
CertificateCache | krill-sdk | Caches validated connections to avoid redundant cert downloads |
ClientSocketManager | krill-sdk | Manages WebSocket connections to servers |
ServerSocketManager | server | Manages WebSocket connections from clients |
ServerEventBusProcessor | server | Broadcasts node updates over WebSocket (excludes ERROR states) |
ServerServerProcessor | krill-sdk | Processes Server node state changes, triggers handshake |
NodeWire - Beacon Data Structure
1
2
3
4
5
6
7
data class NodeWire(
val timestamp: Long, // When beacon was sent
val installId: String, // Stable peer identity (UUID)
val host: String, // Hostname or IP
val port: Int, // Server port (0 for apps)
val sessionId: String // Current session ID (changes on restart)
)
Key Identity Rules:
installIdis the primary peer identity (stable across restarts, IP changes)sessionIdchanges on each restart (used to detect peer restarts)host:portis used for network connectivity but NOT for identity keying
Beacon Discovery Flow
Beacon Classification
- Server beacons:
port > 0(servers listen on a specific port) - App beacons:
port == 0(apps don’t run a server)
Startup Beacon Flow
sequenceDiagram
participant App as App/Server
participant BS as BeaconSupervisor
participant MC as Multicast
participant BP as BeaconProcessor
participant PSM as PeerSessionManager
participant SHP as ServerHandshakeProcess
App->>BS: startBeaconProcess()
BS->>MC: receiveBeacons(callback)
BS->>BS: sentStartupBeacon = true
BS->>MC: sendBeacon(wire)
Note over MC: Beacon broadcast on startup
MC-->>BS: Incoming beacon received
BS->>BS: handleIncomeWire(wire)
alt wire.port > 0 (Server beacon)
BS->>BP: processWire(wire)
BP->>PSM: isKnownSession(wire)?
alt Duplicate beacon (same session)
PSM-->>BP: true
Note over BP: Skip - already known
else Known host, new session (restart)
PSM-->>BP: false (but hasKnownHost = true)
BP->>PSM: add(wire)
BP->>SHP: trustServer(wire)
else New host
PSM-->>BP: false
BP->>PSM: add(wire)
BP->>SHP: trustServer(wire)
end
else wire.port == 0 (App beacon)
alt SystemInfo.isServer()
BS->>MC: sendBeacon(ourWire)
Note over BS: Server responds to app beacon
end
end
Server Handshake + Node Download Pipeline
Connection Result States
1
2
3
4
5
6
enum class ConnectionResult {
SUCCESS, // Connected and synced
CERTIFICATE_ERROR, // SSL/TLS issue - need cert download
NETWORK_ERROR, // Network unreachable
AUTH_ERROR // No API key or unauthorized
}
Handshake Pipeline Flow
sequenceDiagram
participant BP as BeaconProcessor
participant SHP as ServerHandshakeProcess
participant CC as CertificateCache
participant NM as NodeManager
participant SS as ServerSettings
participant CSM as ClientSocketManager
participant HTTP as HTTP Client
BP->>SHP: trustServer(wire)
Note over SHP: Mutex lock with installId-based job key
SHP->>SHP: Check existing job for installId
alt Job already running
Note over SHP: Skip - idempotent
else New job
SHP->>CC: hasValidConnection(installId, host, port, sessionId)?
SHP->>NM: nodeAvailable(wire.installId)?
alt Node not available (no API key yet)
SHP->>SHP: return AUTH_ERROR
SHP->>NM: setErrorState(stubNode)
Note over NM: Creates stub with ERROR state
else Node available
SHP->>SS: read(installId)
SS-->>SHP: ServerSettingsData
alt No API key
SHP->>SHP: return AUTH_ERROR
else Has API key
SHP->>CSM: start(wire)
SHP->>HTTP: GET /nodes
alt Success
SHP->>NM: Update/sync all nodes
SHP->>CC: markValid(installId, ...)
SHP->>SHP: return SUCCESS
else SSL Error
SHP->>CC: invalidate(installId)
SHP->>HTTP: Download cert from /trust
SHP->>SHP: Retry connection
else Unauthorized
SHP->>SHP: return AUTH_ERROR
end
end
end
end
/trust Endpoint - Server Settings Bootstrap
Two Entry Points for Peer Connection
| Entry Point | Trigger | Use Case |
|---|---|---|
| Beacon | UDP multicast | Automatic discovery on LAN |
| POST /trust | HTTP from app | User provides API key for server-to-server connection |
/trust Flow (App → Server → Peer Server)
sequenceDiagram
participant User
participant App as App UI (ExpandServer)
participant ServerA as Server A (connected)
participant NM as NodeManager
participant SS as ServerSettings
participant SSP as ServerServerProcessor
participant SHP as ServerHandshakeProcess
participant ServerB as Server B (target peer)
User->>App: Click peer server in ExpandServer
User->>App: Enter API key for Server B
App->>ServerA: POST /trust {id: serverB.installId, apiKey, trustCert}
ServerA->>NM: nodeAvailable(settingsData.id)?
alt Peer known (from beacon)
ServerA->>SS: write(id, settingsData)
ServerA->>NM: update(peer with USER_EDIT state)
NM->>SSP: post(peerNode)
SSP->>SSP: state == USER_EDIT
SSP->>SHP: trustServer(wire from meta)
SHP->>ServerB: Handshake + node download
else Peer unknown
ServerA-->>App: 404 Not Found
Note over App: Peer must be discovered via beacon first
end
Why Beacon Must Come First
The beacon provides essential information that /trust cannot:
- Hostname/IP: Where to connect
- Port: Which port the server listens on
- SessionId: Current session for deduplication
The /trust endpoint only provides:
- installId: Peer identity
- apiKey: Authentication credential
- trustCert: Whether to trust self-signed certificates
WebSocket Connection Lifecycle
Connection Keying
Connections are keyed by installId:sessionId (NOT host:port) for:
- Consistent peer identity across IP changes
- Proper handling of peer restarts (new sessionId)
- Avoiding duplicate connections
WebSocket Connection Flow
stateDiagram-v2
[*] --> Disconnected
Disconnected --> Connecting: start(wire)
Connecting --> Connected: WebSocket established
Connecting --> Disconnected: Connection error
Connected --> Receiving: Receive node updates
Receiving --> Connected: Process update
Connected --> Disconnected: Connection closed
Connected --> Disconnected: Error
Disconnected --> [*]: Peer removed
note right of Disconnected: onDisconnect sets peer to ERROR
note right of Connected: Updates flow via WebSocket
Disconnect Handling
When a WebSocket disconnects:
nodeManager.onDisconnect(wire)is called- If node exists, sets state to
ERROR - Connection tracking is cleaned up
- On next beacon from peer, reconnection is attempted
Peer Server Node State Transitions
stateDiagram-v2
[*] --> NONE: Beacon received + API key exists + sync success
NONE --> USER_EDIT: User updates settings
USER_EDIT --> NONE: Handshake success
USER_EDIT --> ERROR: Handshake failure
NONE --> ERROR: WebSocket disconnect
ERROR --> NONE: Beacon + successful resync
[*] --> ERROR: Beacon received + no API key
ERROR --> USER_EDIT: User sets API key
note right of ERROR: ERROR state does NOT trigger network work
note right of USER_EDIT: Triggers handshake pipeline
Invariants and Guarantees
Startup Invariants
- Client Node EXECUTE once: On app start, client node is created/loaded and executed exactly once
- Server Node EXECUTE once: On server start, server node is created/loaded and executed exactly once
- Beacon listening starts: Both apps and servers start beacon listening on startup
Beacon Handling Invariants
- Server beacons identified by port > 0
- Duplicate beacons ignored: Same installId + sessionId = already known
- Peer restarts detected: Same installId + different sessionId = reconnect flow
- Server responds to app beacons: Enables discovery
Connection Pipeline Invariants
- Idempotent operations: Job keys use
installId-sessionIdto prevent duplicates - No API key = STOP: No handshake/download/socket connect without API key
- ERROR state = STOP: ERROR nodes don’t trigger new network work
- Servers don’t broadcast errors: ERROR states are not sent over eventBus/WebSocket
Identity Invariants
- installId is primary identity: Used for all peer tracking
- sessionId detects restarts: New session = peer restarted
- host:port for connectivity only: NOT used for identity keying
Troubleshooting Guide
Symptom: Server appears in ERROR state
Likely Causes:
- No API key configured
- Wrong API key
- Network unreachable
- Certificate trust issue
Log Lines to Look For:
1
2
3
4
Received beacon from ${wire.host}:${wire.port}
Cannot update settings for unknown peer - peer must be discovered via beacon first
Unauthorised, you may need to set your api key for this server
SSL/Certificate error for peer ${installId}
Symptom: Duplicate connections or handshakes
Likely Causes:
- Using host:port as key instead of installId (fixed in this update)
- Beacon storm without proper deduplication
Log Lines to Look For:
1
2
Handshake job already in progress for ${jobKey}
Connection already active for peer ${installId}
Symptom: Node updates not received
Likely Causes:
- WebSocket disconnected
- ERROR state on peer node
- Traffic control rejecting as duplicate
Log Lines to Look For:
1
2
3
WebSocket connection closed for peer ${installId}
Error receiving message
ServerEventBusProcessor skipping broadcast for ${type} ${state}
Diagram Source Map
These diagrams were derived from the actual code in the repository:
| Diagram | Source Classes/Functions |
|---|---|
| Beacon Discovery Flow | BeaconSupervisor.startBeaconListener(), BeaconSupervisor.handleIncomeWire(), BeaconProcessor.processWire(), PeerSessionManager.isKnownSession() |
| Handshake Pipeline Flow | ServerHandshakeProcess.trustServer(), ServerHandshakeProcess.attemptConnection(), ServerHandshakeProcess.downloadAndSyncServerData(), CertificateCache.hasValidConnection() |
| /trust Flow | Routes.kt POST /trust, ServerServerProcessor.post(), ServerHandshakeProcess.trustServer() |
| WebSocket Connection Flow | ClientSocketManager.start(), ClientSocketManager.connectWebSocket(), ClientNodeManager.onDisconnect() |
| Peer Node State Transitions | ServerHandshakeProcess connection results, NodeManager.setErrorState(), NodeManager.complete() |