Deep dive into Krill's peer-to-peer mesh networking including beacon discovery, server handshake, WebSocket lifecycle, and server settings bootstrap

Krill Peer Mesh Network Architecture

Overview

This document describes how Krill’s peer-to-peer mesh networking operates: beacon discovery, the server handshake, the WebSocket connection lifecycle, and the /trust endpoint that bootstraps server-to-server communication.

Core Components

Key Classes and Their Responsibilities

| Class | Location | Purpose |
| --- | --- | --- |
| BeaconSupervisor | krill-sdk | Manages beacon listening and broadcasting lifecycle |
| BeaconProcessor | krill-sdk | Processes incoming beacons, detects new/reconnected peers |
| BeaconSender | krill-sdk | Sends beacon signals with rate limiting |
| PeerSessionManager | krill-sdk | Tracks known peers by installId/sessionId for deduplication |
| ServerHandshakeProcess | krill-sdk | Handles trust establishment and node synchronization |
| CertificateCache | krill-sdk | Caches validated connections to avoid redundant cert downloads |
| ClientSocketManager | krill-sdk | Manages WebSocket connections to servers |
| ServerSocketManager | server | Manages WebSocket connections from clients |
| ServerEventBusProcessor | server | Broadcasts node updates over WebSocket (excludes ERROR states) |
| ServerServerProcessor | krill-sdk | Processes Server node state changes, triggers handshake |

NodeWire - Beacon Data Structure

data class NodeWire(
    val timestamp: Long,      // When beacon was sent
    val installId: String,    // Stable peer identity (UUID)
    val host: String,         // Hostname or IP
    val port: Int,            // Server port (0 for apps)
    val sessionId: String     // Current session ID (changes on restart)
)

Key Identity Rules:

  • installId is the primary peer identity (stable across restarts, IP changes)
  • sessionId changes on each restart (used to detect peer restarts)
  • host:port is used for network connectivity but NOT for identity keying
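The identity rules above can be illustrated with a small tracker keyed by installId. This is a minimal sketch, not the actual PeerSessionManager: `PeerTracker`, `BeaconOutcome`, and `observe` are hypothetical names, and the real class may store more than the last session.

```kotlin
// Mirrors the NodeWire fields described above.
data class NodeWire(
    val timestamp: Long,
    val installId: String,
    val host: String,
    val port: Int,
    val sessionId: String
)

// Hypothetical outcome type, not part of the Krill SDK.
enum class BeaconOutcome { NEW_PEER, RESTARTED_PEER, DUPLICATE }

// Hypothetical tracker: remembers the last sessionId seen per installId.
// host and port are deliberately not part of the key.
class PeerTracker {
    private val sessions = mutableMapOf<String, String>()

    fun observe(wire: NodeWire): BeaconOutcome {
        val previous = sessions[wire.installId]
        return when {
            previous == null -> {
                sessions[wire.installId] = wire.sessionId
                BeaconOutcome.NEW_PEER
            }
            previous == wire.sessionId -> BeaconOutcome.DUPLICATE
            else -> {
                // Same installId, different sessionId: the peer restarted.
                sessions[wire.installId] = wire.sessionId
                BeaconOutcome.RESTARTED_PEER
            }
        }
    }
}
```

With this keying, a beacon from the same peer on a new IP address is still classified as a duplicate, while a new sessionId is classified as a restart.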

Beacon Discovery Flow

Beacon Classification

  • Server beacons: port > 0 (servers listen on a specific port)
  • App beacons: port == 0 (apps don’t run a server)

Startup Beacon Flow

sequenceDiagram
    participant App as App/Server
    participant BS as BeaconSupervisor
    participant MC as Multicast
    participant BP as BeaconProcessor
    participant PSM as PeerSessionManager
    participant SHP as ServerHandshakeProcess
    
    App->>BS: startBeaconProcess()
    BS->>MC: receiveBeacons(callback)
    BS->>BS: sentStartupBeacon = true
    BS->>MC: sendBeacon(wire)
    Note over MC: Beacon broadcast on startup
    
    MC-->>BS: Incoming beacon received
    BS->>BS: handleIncomeWire(wire)
    
    alt wire.port > 0 (Server beacon)
        BS->>BP: processWire(wire)
        BP->>PSM: isKnownSession(wire)?
        
        alt Duplicate beacon (same session)
            PSM-->>BP: true
            Note over BP: Skip - already known
        else Known host, new session (restart)
            PSM-->>BP: false (but hasKnownHost = true)
            BP->>PSM: add(wire)
            BP->>SHP: trustServer(wire)
        else New host
            PSM-->>BP: false
            BP->>PSM: add(wire)
            BP->>SHP: trustServer(wire)
        end
    else wire.port == 0 (App beacon)
        alt SystemInfo.isServer()
            BS->>MC: sendBeacon(ourWire)
            Note over BS: Server responds to app beacon
        end
    end
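
The branching in the diagram above can be sketched as a pure routing function. `BeaconAction` and `routeBeacon` are hypothetical names used for illustration; the real logic lives in BeaconSupervisor.handleIncomeWire(). The sketch also assumes an app that receives an app beacon does nothing, since the diagram shows no action for that case.

```kotlin
// Mirrors the NodeWire fields described earlier.
data class NodeWire(
    val timestamp: Long,
    val installId: String,
    val host: String,
    val port: Int,
    val sessionId: String
)

// Hypothetical action type, not part of the Krill SDK.
enum class BeaconAction { PROCESS_SERVER_BEACON, REPLY_WITH_OWN_BEACON, IGNORE }

// weAreServer stands in for SystemInfo.isServer().
fun routeBeacon(wire: NodeWire, weAreServer: Boolean): BeaconAction = when {
    // Server beacon: hand off to deduplication and the handshake.
    wire.port > 0 -> BeaconAction.PROCESS_SERVER_BEACON
    // App beacon received by a server: answer so the app can discover us.
    weAreServer -> BeaconAction.REPLY_WITH_OWN_BEACON
    // App beacon received by an app (assumption: no action).
    else -> BeaconAction.IGNORE
}
```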

Server Handshake + Node Download Pipeline

Connection Result States

enum class ConnectionResult {
    SUCCESS,            // Connected and synced
    CERTIFICATE_ERROR,  // SSL/TLS issue - need cert download
    NETWORK_ERROR,      // Network unreachable
    AUTH_ERROR          // No API key or unauthorized
}

Handshake Pipeline Flow

sequenceDiagram
    participant BP as BeaconProcessor
    participant SHP as ServerHandshakeProcess
    participant CC as CertificateCache
    participant NM as NodeManager
    participant SS as ServerSettings
    participant CSM as ClientSocketManager
    participant HTTP as HTTP Client
    
    BP->>SHP: trustServer(wire)
    
    Note over SHP: Mutex lock with installId-based job key
    SHP->>SHP: Check existing job for installId
    
    alt Job already running
        Note over SHP: Skip - idempotent
    else New job
        SHP->>CC: hasValidConnection(installId, host, port, sessionId)?
        
        SHP->>NM: nodeAvailable(wire.installId)?
        
        alt Node not available (no API key yet)
            SHP->>SHP: return AUTH_ERROR
            SHP->>NM: setErrorState(stubNode)
            Note over NM: Creates stub with ERROR state
        else Node available
            SHP->>SS: read(installId)
            SS-->>SHP: ServerSettingsData
            
            alt No API key
                SHP->>SHP: return AUTH_ERROR
            else Has API key
                SHP->>CSM: start(wire)
                SHP->>HTTP: GET /nodes
                
                alt Success
                    SHP->>NM: Update/sync all nodes
                    SHP->>CC: markValid(installId, ...)
                    SHP->>SHP: return SUCCESS
                else SSL Error
                    SHP->>CC: invalidate(installId)
                    SHP->>HTTP: Download cert from /trust
                    SHP->>SHP: Retry connection
                else Unauthorized
                    SHP->>SHP: return AUTH_ERROR
                end
            end
        end
    end
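
Two ideas from the pipeline above, the idempotent job guard and the certificate retry, can be sketched without any networking. Everything here other than ConnectionResult is hypothetical: `HandshakeJobs`, `runOnce`, and `handshake` are illustrative names, the network calls are passed in as lambdas, and the single retry after a certificate download is an assumption (the diagram does not say how many retries occur).

```kotlin
import java.util.concurrent.ConcurrentHashMap

enum class ConnectionResult { SUCCESS, CERTIFICATE_ERROR, NETWORK_ERROR, AUTH_ERROR }

// Hypothetical guard: at most one in-flight handshake per job key.
// The key format is left to the caller.
class HandshakeJobs {
    private val inFlight = ConcurrentHashMap.newKeySet<String>()

    // Returns null when a job for this key is already running.
    fun runOnce(jobKey: String, job: () -> ConnectionResult): ConnectionResult? {
        if (!inFlight.add(jobKey)) return null
        try {
            return job()
        } finally {
            inFlight.remove(jobKey)
        }
    }
}

// Hypothetical pipeline: fetchNodes stands in for GET /nodes,
// downloadCert for the certificate download from /trust.
fun handshake(
    apiKey: String?,
    fetchNodes: () -> ConnectionResult,
    downloadCert: () -> Boolean
): ConnectionResult {
    // No API key: stop before any network work.
    if (apiKey.isNullOrBlank()) return ConnectionResult.AUTH_ERROR

    val first = fetchNodes()
    if (first != ConnectionResult.CERTIFICATE_ERROR) return first

    // Certificate problem: download the cert, then retry once (assumption).
    if (!downloadCert()) return ConnectionResult.CERTIFICATE_ERROR
    return fetchNodes()
}
```

A caller would wrap `handshake` in `runOnce` so that a burst of beacons from the same peer starts only one handshake.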

/trust Endpoint - Server Settings Bootstrap

Two Entry Points for Peer Connection

| Entry Point | Trigger | Use Case |
| --- | --- | --- |
| Beacon | UDP multicast | Automatic discovery on LAN |
| POST /trust | HTTP from app | User provides API key for server-to-server connection |

/trust Flow (App → Server → Peer Server)

sequenceDiagram
    participant User
    participant App as App UI (ExpandServer)
    participant ServerA as Server A (connected)
    participant NM as NodeManager
    participant SS as ServerSettings
    participant SSP as ServerServerProcessor
    participant SHP as ServerHandshakeProcess
    participant ServerB as Server B (target peer)
    
    User->>App: Click peer server in ExpandServer
    User->>App: Enter API key for Server B
    App->>ServerA: POST /trust {id: serverB.installId, apiKey, trustCert}
    
    ServerA->>NM: nodeAvailable(settingsData.id)?
    
    alt Peer known (from beacon)
        ServerA->>SS: write(id, settingsData)
        ServerA->>NM: update(peer with USER_EDIT state)
        NM->>SSP: post(peerNode)
        SSP->>SSP: state == USER_EDIT
        SSP->>SHP: trustServer(wire from meta)
        SHP->>ServerB: Handshake + node download
    else Peer unknown
        ServerA-->>App: 404 Not Found
        Note over App: Peer must be discovered via beacon first
    end
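
The server-side decision in the flow above can be sketched as plain Kotlin with no HTTP framework. The request fields (id, apiKey, trustCert) come from the diagram; `TrustRequest`, `TrustResult`, and `TrustHandler` are hypothetical names, and the in-memory maps stand in for NodeManager and ServerSettings.

```kotlin
// Payload fields as shown in the POST /trust call above.
data class TrustRequest(val id: String, val apiKey: String, val trustCert: Boolean)

enum class NodeState { NONE, USER_EDIT, ERROR }

// Hypothetical result type; NotFound corresponds to the 404 response.
sealed class TrustResult {
    object NotFound : TrustResult()
    data class Accepted(val state: NodeState) : TrustResult()
}

// Hypothetical handler: knownPeers stands in for NodeManager,
// settings for ServerSettings.
class TrustHandler(private val knownPeers: MutableMap<String, NodeState>) {
    val settings = mutableMapOf<String, TrustRequest>()

    fun handle(request: TrustRequest): TrustResult {
        // The peer must already be known from a beacon.
        if (request.id !in knownPeers) return TrustResult.NotFound

        // Persist the settings, then mark the node USER_EDIT so the
        // handshake pipeline picks it up.
        settings[request.id] = request
        knownPeers[request.id] = NodeState.USER_EDIT
        return TrustResult.Accepted(NodeState.USER_EDIT)
    }
}
```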

Why Beacon Must Come First

The beacon provides essential information that /trust cannot:

  • Hostname/IP: Where to connect
  • Port: Which port the server listens on
  • SessionId: Current session for deduplication

The /trust endpoint only provides:

  • installId: Peer identity
  • apiKey: Authentication credential
  • trustCert: Whether to trust self-signed certificates

WebSocket Connection Lifecycle

Connection Keying

Connections are keyed by installId:sessionId (NOT host:port) for:

  • Consistent peer identity across IP changes
  • Proper handling of peer restarts (new sessionId)
  • Avoiding duplicate connections
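
A short sketch of this keying, assuming the key is the literal string installId:sessionId. `connectionKey` and `ConnectionRegistry` are hypothetical names, not the ClientSocketManager API.

```kotlin
// Mirrors the NodeWire fields described earlier.
data class NodeWire(
    val timestamp: Long,
    val installId: String,
    val host: String,
    val port: Int,
    val sessionId: String
)

// Key built from identity fields only; host and port are ignored.
fun connectionKey(wire: NodeWire): String = "${wire.installId}:${wire.sessionId}"

// Hypothetical registry of active connections.
class ConnectionRegistry {
    private val active = mutableSetOf<String>()

    // Returns true when no connection exists yet for this key,
    // meaning the caller should open one.
    fun register(wire: NodeWire): Boolean = active.add(connectionKey(wire))

    // Called when the connection closes.
    fun release(wire: NodeWire) {
        active.remove(connectionKey(wire))
    }
}
```

Under this scheme a peer that changes IP address keeps its existing key, while a restarted peer (new sessionId) produces a new key and therefore a new connection.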

WebSocket Connection Flow

stateDiagram-v2
    [*] --> Disconnected
    
    Disconnected --> Connecting: start(wire)
    Connecting --> Connected: WebSocket established
    Connecting --> Disconnected: Connection error
    
    Connected --> Receiving: Receive node updates
    Receiving --> Connected: Process update
    
    Connected --> Disconnected: Connection closed
    Connected --> Disconnected: Error
    
    Disconnected --> [*]: Peer removed
    
    note right of Disconnected: onDisconnect sets peer to ERROR
    note right of Connected: Updates flow via WebSocket

Disconnect Handling

When a WebSocket disconnects:

  1. nodeManager.onDisconnect(wire) is called
  2. If node exists, sets state to ERROR
  3. Connection tracking is cleaned up
  4. On next beacon from peer, reconnection is attempted

Peer Server Node State Transitions

stateDiagram-v2
    [*] --> NONE: Beacon received + API key exists + sync success
    
    NONE --> USER_EDIT: User updates settings
    USER_EDIT --> NONE: Handshake success
    USER_EDIT --> ERROR: Handshake failure
    
    NONE --> ERROR: WebSocket disconnect
    ERROR --> NONE: Beacon + successful resync
    
    [*] --> ERROR: Beacon received + no API key
    ERROR --> USER_EDIT: User sets API key
    
    note right of ERROR: ERROR state does NOT trigger network work
    note right of USER_EDIT: Triggers handshake pipeline
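
The transitions in the diagram above can be written as a small pure function. The state names come from the diagram; `PeerEvent` and `transition` are hypothetical, and leaving the state unchanged for events the diagram does not mention is an assumption.

```kotlin
enum class NodeState { NONE, USER_EDIT, ERROR }

// Hypothetical event names derived from the diagram's edge labels.
enum class PeerEvent {
    USER_UPDATED_SETTINGS,
    HANDSHAKE_SUCCESS,
    HANDSHAKE_FAILURE,
    SOCKET_DISCONNECTED,
    RESYNC_SUCCESS
}

fun transition(state: NodeState, event: PeerEvent): NodeState = when (state) {
    NodeState.NONE -> when (event) {
        PeerEvent.USER_UPDATED_SETTINGS -> NodeState.USER_EDIT
        PeerEvent.SOCKET_DISCONNECTED -> NodeState.ERROR
        else -> state // assumption: other events are ignored
    }
    NodeState.USER_EDIT -> when (event) {
        PeerEvent.HANDSHAKE_SUCCESS -> NodeState.NONE
        PeerEvent.HANDSHAKE_FAILURE -> NodeState.ERROR
        else -> state
    }
    NodeState.ERROR -> when (event) {
        // Setting an API key is modeled as a settings update.
        PeerEvent.USER_UPDATED_SETTINGS -> NodeState.USER_EDIT
        // A new beacon followed by a successful resync clears the error.
        PeerEvent.RESYNC_SUCCESS -> NodeState.NONE
        else -> state
    }
}
```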

Invariants and Guarantees

Startup Invariants

  1. Client Node EXECUTE once: On app start, client node is created/loaded and executed exactly once
  2. Server Node EXECUTE once: On server start, server node is created/loaded and executed exactly once
  3. Beacon listening starts: Both apps and servers start beacon listening on startup

Beacon Handling Invariants

  1. Server beacons identified by port > 0
  2. Duplicate beacons ignored: Same installId + sessionId = already known
  3. Peer restarts detected: Same installId + different sessionId = reconnect flow
  4. Server responds to app beacons: Enables discovery

Connection Pipeline Invariants

  1. Idempotent operations: Job keys use installId-sessionId to prevent duplicates
  2. No API key = STOP: No handshake/download/socket connect without API key
  3. ERROR state = STOP: ERROR nodes don’t trigger new network work
  4. Servers don’t broadcast errors: ERROR states are not sent over eventBus/WebSocket
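
The last invariant amounts to a filter in front of the broadcast. A minimal sketch, assuming a simplified update type: `NodeUpdate`, `shouldBroadcast`, and `broadcastable` are hypothetical names, not the ServerEventBusProcessor API.

```kotlin
enum class NodeState { NONE, USER_EDIT, ERROR }

// Hypothetical, simplified view of a node update.
data class NodeUpdate(val nodeId: String, val state: NodeState)

// ERROR states stay local; everything else may be sent to peers.
fun shouldBroadcast(update: NodeUpdate): Boolean = update.state != NodeState.ERROR

// Keeps only the updates that may go out over the WebSocket.
fun broadcastable(updates: List<NodeUpdate>): List<NodeUpdate> =
    updates.filter(::shouldBroadcast)
```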

Identity Invariants

  1. installId is primary identity: Used for all peer tracking
  2. sessionId detects restarts: New session = peer restarted
  3. host:port for connectivity only: NOT used for identity keying

Troubleshooting Guide

Symptom: Server appears in ERROR state

Likely Causes:

  1. No API key configured
  2. Wrong API key
  3. Network unreachable
  4. Certificate trust issue

Log Lines to Look For:

Received beacon from ${wire.host}:${wire.port}
Cannot update settings for unknown peer - peer must be discovered via beacon first
Unauthorised, you may need to set your api key for this server
SSL/Certificate error for peer ${installId}

Symptom: Duplicate connections or handshakes

Likely Causes:

  1. Using host:port as key instead of installId (fixed in this update)
  2. Beacon storm without proper deduplication

Log Lines to Look For:

Handshake job already in progress for ${jobKey}
Connection already active for peer ${installId}

Symptom: Node updates not received

Likely Causes:

  1. WebSocket disconnected
  2. ERROR state on peer node
  3. Traffic control rejecting as duplicate

Log Lines to Look For:

WebSocket connection closed for peer ${installId}
Error receiving message
ServerEventBusProcessor skipping broadcast for ${type} ${state}

Diagram Source Map

These diagrams were derived from the actual code in the repository:

| Diagram | Source Classes/Functions |
| --- | --- |
| Beacon Discovery Flow | BeaconSupervisor.startBeaconListener(), BeaconSupervisor.handleIncomeWire(), BeaconProcessor.processWire(), PeerSessionManager.isKnownSession() |
| Handshake Pipeline Flow | ServerHandshakeProcess.trustServer(), ServerHandshakeProcess.attemptConnection(), ServerHandshakeProcess.downloadAndSyncServerData(), CertificateCache.hasValidConnection() |
| /trust Flow | Routes.kt POST /trust, ServerServerProcessor.post(), ServerHandshakeProcess.trustServer() |
| WebSocket Connection Flow | ClientSocketManager.start(), ClientSocketManager.connectWebSocket(), ClientNodeManager.onDisconnect() |
| Peer Node State Transitions | ServerHandshakeProcess connection results, NodeManager.setErrorState(), NodeManager.complete() |

This post is licensed under CC BY 4.0 by the author.