# Reference Simulation: Fault-Tolerant, Cryptographically Secure Firmware Updates for Remote Edge Nodes

**Bauxite Technical Demonstration Suite**  
*June 2026*

---

```text
Reproducibility

Repository:
docs/demos/ota_resilient/

Environment:
Ubuntu 22.04
Kernel 5.15+
Staging area filesystems

Duration:
~30 seconds

Output:
MCAP recording
Agent checkpoint logs
Atomic swap outputs

Objective:
Demonstrate checkpoint-based update resumption, chunk-level BLAKE3 verification,
and Ed25519 signature validation with atomic rollbacks under simulated power failures.
```

---

## Abstract
Over-the-Air (OTA) firmware updates are systems-critical for remote edge nodes. A failed update due to connectivity loss, power failure, or binary tampering can lead to node unavailability, causing significant operational downtime and recovery costs. This paper introduces a reference simulation of **Bauxite's Resilient OTA Update Framework**, featuring checkpoint-based resumption, chunk-level integrity verification, and cryptographic signature gates. We demonstrate the deployment of a virtual $3\text{ GB}$ deterministic firmware image (consisting of 49,152 generated chunks) under three conditions: clean deployment, power failure simulation at approximately 47% progress, and a supply-chain tampering attack. Empirical results show that Bauxite's checkpoint matrix successfully recovers progress without retransferring verified data, and its cryptographic gate detects and rejects a tampered payload, executing an atomic rollback to the previous stable firmware version.

*Note: In this demonstration workload, HMAC is used as a fallback validation mechanism when Ed25519 cryptographic libraries are unavailable on the test host.*

---

## 1. Introduction
Firmware updates for distributed robotic systems and edge nodes are vulnerable to both interruption and tampering. If an update fails mid-transfer, or if an attacker injects malicious code, the node may become non-responsive. Recovery then requires manual intervention or reflashing, causing substantial downtime.

Bauxite addresses this with an update manager built around three properties:
1. **Atomic Checkpoint Resumption**: Progress is tracked in a local checkpoint matrix. If power or network is lost, the update resumes from the last verified chunk rather than restarting from 0%.
2. **Cryptographic Gates**: Individual $64\text{ KB}$ chunks are hashed with BLAKE3. The full update is signed with Ed25519, and the signature is verified against the master Hub key.
3. **Atomic Rollbacks**: Updates are staged in isolated directories. Swapping the active firmware version is executed via a POSIX atomic symlink swap, resulting in atomic version activation with no observed interruption in the demonstration workload.
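
The two cryptographic gates can be illustrated with a minimal sketch that uses only the Python standard library. It is an illustration under stated assumptions, not Bauxite's implementation: BLAKE2b stands in for BLAKE3 (which is not in the standard library), HMAC-SHA256 stands in for Ed25519 in the spirit of the fallback noted in the abstract, and the names `chunk_digest`, `build_manifest`, `sign_manifest`, and `verify_update` are hypothetical.

```python
import hashlib
import hmac

CHUNK_SIZE = 64 * 1024  # 64 KB chunks, as described in the text


def chunk_digest(chunk: bytes) -> bytes:
    # ASSUMPTION: BLAKE2b is a stand-in here; the framework itself uses BLAKE3.
    return hashlib.blake2b(chunk, digest_size=32).digest()


def build_manifest(image: bytes) -> list[bytes]:
    """Split an image into fixed-size chunks and hash each one."""
    return [
        chunk_digest(image[offset:offset + CHUNK_SIZE])
        for offset in range(0, len(image), CHUNK_SIZE)
    ]


def sign_manifest(manifest: list[bytes], key: bytes) -> bytes:
    # ASSUMPTION: HMAC-SHA256 fallback in place of an Ed25519 signature.
    return hmac.new(key, b"".join(manifest), hashlib.sha256).digest()


def verify_update(image: bytes, manifest: list[bytes], tag: bytes, key: bytes) -> bool:
    """Gate 1 checks the tag over the manifest; gate 2 checks every chunk digest."""
    expected = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected, tag):  # constant-time comparison
        return False
    return build_manifest(image) == manifest
```

In this sketch, changing any byte of the image alters at least one chunk digest, so `verify_update` returns `False` even when the tag over the original manifest is still valid. A real agent would verify each chunk as it arrives rather than rehashing the whole image at the end.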

---

## 2. System Architecture
Bauxite's OTA update manager runs as a daemon on the node. It interacts with the Hub's node service.
* **Lazy Chunk Generator**: The Hub hosts updates and serves them in $64\text{ KB}$ chunks. This lazy stream has a constant $O(\text{chunk\_size})$ RAM footprint on both server and agent, enabling updates of arbitrary size (e.g., $3\text{ GB}$ virtual deterministic images).
* **Checkpointing**: The agent maintains a local state file (`checkpoint.bin`). Any writes to the checkpoint file are performed atomically via temporary files and POSIX renames to prevent corruption on sudden power-off.
* **Validation and Swap**: The update is written into a staging directory (e.g., `v2.0.0/`). If the signature checks succeed, the active symlink is atomically replaced with one that points to the new staging directory.
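
The symlink swap in the last bullet is commonly implemented by creating a new symlink under a temporary name and renaming it over the active one. The sketch below shows that pattern; the function name `atomic_symlink_swap` and the `.tmp` suffix are hypothetical, and the agent's actual code may differ.

```python
import os


def atomic_symlink_swap(target: str, link_path: str) -> None:
    """Repoint link_path at target with no window in which the link is missing."""
    tmp_link = f"{link_path}.tmp"  # hypothetical temporary name
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)        # clear a leftover from an earlier crash
    os.symlink(target, tmp_link)   # build the new link beside the old one
    # rename(2) replaces the existing symlink itself in a single atomic step
    os.replace(tmp_link, link_path)
```

Because the rename either happens or does not, a reader resolving the active link sees the old version or the new one, never a missing or half-written link. Rolling back uses the same call with the previous version directory as `target`.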

---

## 3. Methodology & Test Environment
The simulation environment is designed to evaluate Bauxite's OTA update manager:
* **Host Setup**: Standard Linux testbed (Ubuntu 22.04 LTS, kernel 5.15).
* **Services**: A locally hosted update server runs a gRPC/HTTP node service that serves the virtual deterministic 3 GB payload. The Bauxite OTA agent runs as a daemon on the test node.
* **Network & Control**: Virtualized workload runs a simulated control loop and sensor feedback stream.
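
A virtual deterministic payload of this kind can be produced without storing 3 GB anywhere. The sketch below is one way to do it, assuming chunk contents are derived from a seed and the chunk index; the seed, the SHA-256 derivation, and the names `make_chunk` and `iter_chunks` are illustrative assumptions rather than the demo server's actual scheme.

```python
import hashlib
from typing import Iterator, Tuple

CHUNK_SIZE = 64 * 1024                   # 64 KB per chunk
IMAGE_SIZE = 3 * 1024 ** 3               # 3 GB virtual image
TOTAL_CHUNKS = IMAGE_SIZE // CHUNK_SIZE  # 49,152 chunks


def make_chunk(index: int, seed: bytes = b"demo-seed") -> bytes:
    """Derive chunk contents deterministically from (seed, index)."""
    block = hashlib.sha256(seed + index.to_bytes(8, "big")).digest()  # 32 bytes
    return block * (CHUNK_SIZE // len(block))  # repeat to fill one chunk


def iter_chunks(start: int = 0, total: int = TOTAL_CHUNKS) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, chunk) lazily; only one chunk is held in memory at a time."""
    for index in range(start, total):
        yield index, make_chunk(index)
```

Since every chunk depends only on its index, the server can start streaming at any offset, which is what lets a restarted agent request chunks from its checkpoint onward. The arithmetic also confirms the chunk count quoted in the abstract: 3 GB divided into 64 KB chunks gives 49,152.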

The simulation is evaluated across three sequential phases:
* **Phase 1: Baseline Clean Deployment**: The virtual $3\text{ GB}$ firmware package is downloaded and verified. The control loop latency is monitored to verify that the download does not affect node operations.
* **Phase 2: Interrupted Transfer (Power/Network Cut)**: A power failure is simulated by terminating the agent process (`kill -9`) at approximately 47% progress. The agent is restarted, and recovery behaviour is observed.
* **Phase 3: Supply-Chain Verification Failure**: An attacker injects 8 malicious bytes into the staged firmware binary. The agent's response to signature mismatch is recorded.

### Running the Demo
To execute the multi-phase demo and view the local logs, run:
```bash
./docs/demos/ota_resilient/run.sh
```
This runs the local Python gRPC/HTTP simulation server and executes the update scenarios.

### Forensic Telemetry Replay
During the update, telemetry events are recorded in an MCAP log. To view this, import `docs/demos/ota_resilient/foxglove_layout.json` into Foxglove Studio and load the MCAP file from the recordings directory.

---

## 4. Results and Discussion

The empirical metrics aggregated across 10 simulation trials are summarized in the table and chart below.

### OTA Update Resiliency and Security Metrics (10-Trial Summary)
| Metric | Standard Update Manager | Bauxite Resilient Update |
| :--- | :---: | :---: |
| **Firmware Package Size** | 3.0 GB (Virtual payload) | 3.0 GB (Virtual payload) |
| **Resume Capability** | No (Restarts download from 0%) | Yes (Resumes from last verified chunk) |
| **Data Retransferred After Power Cut** | Full size (3.0 GB) | 0 bytes (No verified chunks retransferred) |
| **Chunk-Level Verification** | None or post-download only | Active (BLAKE3 per-chunk verification) |
| **Cryptographic Signature Gate** | None or weak validation | Verified (Ed25519 Master Signature) |
| **Staging Area Isolation** | Direct overwrite / loose files | Isolated staging directory |
| **Rollback Downtime** | Highly variable / manual recovery | Atomic version activation; restart depends on workload |
| **Control Loop Latency** | $<150\ \mu\text{s}$ (Flat) | $<150\ \mu\text{s}$ |

### OTA Firmware Update Progress Timeline
![OTA Progress Recovery Timeline](ota_resilient_chart.jpg)

In Phase 1, the virtual $3\text{ GB}$ package completes download and verification. Despite the high-volume streaming, the system's critical control loop latency stays flat at under $150\ \mu\text{s}$, which we attribute to Bauxite's QoS lane isolation.

In Phase 2, when the agent is interrupted at approximately 47% progress, the `checkpoint.bin` file remains intact. On restart, the agent loads the record of verified chunks from the checkpoint and resumes downloading from the last verified chunk. No previously verified chunks are retransferred, which saves both time and network bandwidth.
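
The resume behaviour can be reproduced in miniature. The sketch below assumes a checkpoint that stores only the index of the next chunk, serialized as JSON for readability; the real `checkpoint.bin` format is not specified here, and `load_checkpoint`, `save_checkpoint`, and `run_transfer` are hypothetical names. Progress is persisted with a temporary file and a rename, matching the atomic-write approach described in Section 2.

```python
import json
import os
from typing import List, Optional


def load_checkpoint(path: str) -> int:
    """Return the index of the next chunk to fetch (0 if no checkpoint exists)."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(json.load(f)["next_chunk"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_chunk: int) -> None:
    """Persist progress via temp file + rename so a crash never leaves a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"next_chunk": next_chunk}, f)
        f.flush()
        os.fsync(f.fileno())  # push contents to stable storage before renaming
    os.replace(tmp, path)     # atomic on POSIX filesystems


def run_transfer(path: str, total: int, fetched: List[int],
                 crash_at: Optional[int] = None) -> bool:
    """Fetch chunks from the checkpoint onward; return True once complete."""
    for index in range(load_checkpoint(path), total):
        if index == crash_at:
            return False       # simulated abrupt termination (e.g. kill -9)
        fetched.append(index)  # stands in for download + hash verification
        save_checkpoint(path, index + 1)
    return True
```

Calling `run_transfer` with `crash_at=47` on a 100-chunk transfer and then calling it again without `crash_at` fetches each chunk exactly once across both runs. Checkpointing after every chunk, as done here for clarity, would be costly at 49,152 chunks; batching checkpoint writes trades a small amount of retransfer for fewer `fsync` calls.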

In Phase 3, the tampering is detected: the signature check fails and the agent returns `INVALID_SIGNATURE_REJECTED`. Bauxite immediately purges the staging directory and leaves the active symlink pointing to the stable firmware version, so the tampered image is never activated. Because activation only ever happens through an atomic symlink exchange, the node is never left pointing at a partially installed version.

---

## 5. Conclusion
Updating distributed systems in remote environments requires failure-resilient architectures. Bauxite's OTA update manager is designed to prevent partial firmware activation following interruption events such as power loss, ensuring operational continuity.

---

## Appendix: Raw Checkpoint and Update Progress Log

The table below compiles checkpoint parameters and state validation stages recorded during the simulation run, aggregated across 10 trials:

| Time (s) | Phase | Chunk Index | Progress (%) | Agent / Signature State | Retransferred Bytes |
| :---: | :---: | :---: | :---: | :---: | :---: |
| **0.0** | Clean | 0 | 0% | PENDING_VALIDATION | 0 |
| **5.0** | Clean | approx. 12k | approx. 25% | PENDING_VALIDATION | 0 |
| **10.0** | Clean | approx. 24k | approx. 50% | PENDING_VALIDATION | 0 |
| **20.0** | Clean | approx. 49k | 100% | VALID | 0 |
| **0.0** | Power Cut | 0 | 0% | PENDING_VALIDATION | 0 |
| **9.0** | Power Cut | approx. 23k | approx. 47% | PENDING_VALIDATION | 0 |
| **9.1** | Power Cut | — | — | POWER_FAILURE_CUT | — |
| **11.0** | Power Cut | approx. 23k | approx. 47% | CHECKPOINT_LOADED | 0 |
| **15.0** | Power Cut | approx. 36k | approx. 73% | PENDING_VALIDATION | 0 |
| **20.0** | Power Cut | approx. 49k | 100% | VALID | 0 |
| **0.0** | Tampered | 0 | 0% | PENDING_VALIDATION | 0 |
| **19.0** | Tampered | approx. 49k | 100% | PENDING_VALIDATION | 0 |
| **19.5** | Tampered | — | 100% | INVALID_SIGNATURE_REJECTED | 0 |
| **20.0** | Tampered | — | — | ATOMIC_ROLLED_BACK | 0 |