
NATS Architecture for Fleet Scale

The messaging topology determines how telemetry flows, where state persists, and how the system behaves when connectivity degrades. This architecture scales from tens to thousands of vehicles while maintaining low latency and high reliability.


The Challenge

Fleet messaging must solve competing requirements:

Requirement        Challenge
-----------------  ----------------------------------------------------------------
Low latency        Vehicle-to-vehicle communication needs milliseconds, not seconds
WAN resilience     Cellular connections drop, latency spikes, bandwidth varies
State persistence  Digital twin data must survive restarts and reconnections
Scale              Thousands of concurrent publishers and subscribers
Isolation          Vehicle failures shouldn’t cascade to the fleet

Traditional architectures force tradeoffs. Centralized brokers add latency. Peer-to-peer meshes don’t scale. Edge-only approaches lose fleet visibility.


Hierarchical Topology

The solution is a three-tier hierarchy:

┌─────────────────────────────────────────────────────────────────┐
│                         GLOBAL (optional)                        │
│                    Cross-region replication                      │
│                    Fleet-wide aggregation                        │
└────────────────────────────┬────────────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  REGIONAL HUB   │ │  REGIONAL HUB   │ │  REGIONAL HUB   │
│   Americas      │ │   Europe        │ │   Asia-Pacific  │
│  3-node cluster │ │  3-node cluster │ │  3-node cluster │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │
    ┌────┼────┐         ┌────┼────┐         ┌────┼────┐
    │    │    │         │    │    │         │    │    │
    ▼    ▼    ▼         ▼    ▼    ▼         ▼    ▼    ▼
┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐┌─────┐
│LEAF ││LEAF ││LEAF ││LEAF ││LEAF ││LEAF ││LEAF ││LEAF ││LEAF │
│VID-1││VID-2││VID-3││VID-4││VID-5││VID-6││VID-7││VID-8││VID-9│
└─────┘└─────┘└─────┘└─────┘└─────┘└─────┘└─────┘└─────┘└─────┘

Tier 1: Vehicle Leaf Nodes

Each drone runs a NATS leaf node on the Jetson:

Aspect      Configuration
----------  -------------------------------
Process     nats-server in leaf mode
Storage     Local JetStream on SSD/eMMC
Connection  Single upstream to regional hub
Resources   ~50 MB RAM, minimal CPU

What Leaf Nodes Provide:

Local Pub/Sub

On-vehicle services communicate through the local NATS instance:

Vehicle Gateway ──publish──▶ fleet.prod.veh.VID-001.state.position
                                       │
                                       ▼
                            ┌──────────────────┐
                            │   Local NATS     │
                            │   (Leaf Node)    │
                            └──────────────────┘
                                       │
                                       ▼
AI Perception ◀──subscribe── fleet.prod.veh.VID-001.state.position

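Delivery between on-vehicle services is handled entirely by the local server, so it never depends on the WAN link and latency is sub-millisecond. A message crosses the uplink only when the hub has a matching subscription.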
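The delivery rule behind local pub/sub can be sketched as a toy model: the on-vehicle server serves local subscribers directly, and a message is forwarded upstream only when the hub holds a matching subscription. This is the same interest-based rule that keeps unsubscribed subjects off the WAN link. `LeafNodeModel`, `subject_matches`, and the `perception.frame` subject are hypothetical names for illustration; this is not the nats-server implementation.

```python
from collections import defaultdict


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style subject match: '*' matches exactly one token, '>' one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' is only valid as the last token and needs at least one subject token left
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens) or (tok != "*" and tok != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)


class LeafNodeModel:
    """Toy model of a leaf node's delivery rule; not the nats-server implementation."""

    def __init__(self):
        self.local_subs = defaultdict(list)  # pattern -> callbacks on this vehicle
        self.hub_interest = set()  # patterns the hub has subscribed to
        self.uplink = []  # messages that would cross the WAN link

    def subscribe(self, pattern, callback):
        self.local_subs[pattern].append(callback)

    def publish(self, subject, payload):
        # On-vehicle subscribers are served directly by the local server.
        for pattern, callbacks in self.local_subs.items():
            if subject_matches(pattern, subject):
                for callback in callbacks:
                    callback(subject, payload)
        # The message is forwarded upstream only if the hub has matching interest.
        if any(subject_matches(p, subject) for p in self.hub_interest):
            self.uplink.append((subject, payload))


leaf = LeafNodeModel()
received = []
leaf.subscribe("fleet.prod.veh.VID-001.state.position", lambda s, p: received.append(p))
leaf.hub_interest.add("fleet.prod.veh.*.state.>")

leaf.publish("fleet.prod.veh.VID-001.state.position", b"lat=1,lon=2")
leaf.publish("fleet.prod.veh.VID-001.perception.frame", b"frame-bytes")
print(len(received), len(leaf.uplink))  # 1 1
```

The perception frame reaches neither the local position subscriber nor the uplink, because nothing on either side subscribed to it.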

Store-and-Forward

When the upstream connection drops, the leaf node:

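  1. Continues accepting publishes from local services
  2. Persists captured subjects in a local JetStream stream
  3. Keeps those messages on the vehicle while the hub is unreachable
  4. Reconnects when connectivity returns, after which the hub-side copy of the stream catches up from the last sequence it received

Core NATS messages are not buffered across a dropped leaf connection; only data captured in a JetStream stream survives the outage. Vehicles therefore keep operating during connectivity loss, and state synchronizes when the link recovers.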
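The catch-up step can be sketched as a simplified model, assuming the hub keeps a copy that remembers the last vehicle sequence it stored (similar in spirit to how a JetStream mirror or source tracks its position). `VehicleStream`, `HubCopy`, and `catch_up` are hypothetical names, not the JetStream API.

```python
from dataclasses import dataclass, field


@dataclass
class VehicleStream:
    """Stand-in for a JetStream stream on the vehicle: an append-only, sequenced log."""
    messages: list = field(default_factory=list)  # (seq, subject, payload) tuples

    def append(self, subject: str, payload: bytes) -> int:
        seq = len(self.messages) + 1  # sequences start at 1, as in JetStream
        self.messages.append((seq, subject, payload))
        return seq

    def after(self, seq: int) -> list:
        return [m for m in self.messages if m[0] > seq]


@dataclass
class HubCopy:
    """Hub-side copy that remembers the last vehicle sequence it stored."""
    last_seq: int = 0
    stored: list = field(default_factory=list)

    def catch_up(self, source: VehicleStream) -> int:
        pending = source.after(self.last_seq)
        self.stored.extend(pending)
        if pending:
            self.last_seq = pending[-1][0]
        return len(pending)


vehicle = VehicleStream()
hub = HubCopy()
link_up = [True, True, False, False, False]  # cellular link drops after message 2

for i, up in enumerate(link_up):
    # Local publishing never waits on the WAN link
    vehicle.append("fleet.prod.veh.VID-001.state.position", f"fix-{i}".encode())
    if up:
        hub.catch_up(vehicle)

print(len(vehicle.messages), hub.last_seq)  # 5 2
# Link recovers: the hub pulls only what it missed
print(hub.catch_up(vehicle), hub.last_seq)  # 3 5
```

Tracking a sequence number rather than re-sending everything means a reconnect costs only the gap, however long the vehicle has been online.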

WAN Isolation

The leaf node acts as a buffer between the vehicle and WAN:

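  • Backpressure: if the hub can’t keep up, data captured in the local JetStream stream waits on the vehicle until the hub catches up
  • Filtering: only subjects the hub has subscribed to traverse the WAN link
  • Compression: leaf node connections support optional S2 compression (nats-server 2.10 and later)
  • Reconnection: the leaf node retries the hub connection automatically at a configurable interval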

Tier 2: Regional Hub Clusters

Regional hubs aggregate vehicles by geographic area:

Aspect      Configuration
----------  ----------------------------------------
Deployment  3-node NATS cluster (minimum)
Location    Cloud region or edge data center
Storage     JetStream on fast SSDs
Capacity    Hundreds of leaf connections per cluster

Why Regional:

  • Latency — Leaf nodes connect to nearby hubs
  • Bandwidth — Telemetry stays regional by default
  • Compliance — Data residency requirements
  • Failure isolation — Regional outages don’t affect other regions

Hub Responsibilities:

Function              Description
--------------------  ------------------------------------------
Leaf aggregation      Accept connections from vehicle leaf nodes
Stream storage        Persist digital twin data in JetStream
Consumer support      Serve fleet dashboards, APIs, analytics
Cross-region routing  Forward to global tier when needed

Cluster Configuration

A 3-node cluster provides:

  • High availability — Survives single node failure
  • Data replication — JetStream streams replicated across nodes
  • Load distribution — Leaf connections balanced across nodes
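
```
# Example hub cluster configuration (nats-server config format)
# Shown for hub-americas-1; each node sets its own server_name.
server_name: hub-americas-1

jetstream {
  store_dir: /data/jetstream
  max_memory_store: 4GB
  max_file_store: 100GB
}

cluster {
  name: hub-americas
  port: 6222
  routes: [
    nats://hub-americas-1:6222
    nats://hub-americas-2:6222
    nats://hub-americas-3:6222
  ]
}

leafnodes {
  port: 7422
  # Single shared user for brevity; the Security section calls for per-vehicle credentials
  authorization {
    user: vehicle
    # Resolved from the environment at startup
    password: $VEHICLE_PASSWORD
  }
}
```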
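JetStream replicates clustered streams with Raft, so a replicated stream stays writable only while a majority of its replicas is reachable. The arithmetic below shows why three nodes is the practical minimum for surviving a single failure, and why adding a fourth node does not raise fault tolerance. The helper names are illustrative.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a Raft group with the given number of members."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Members that can fail while a majority remains available."""
    return replicas - quorum(replicas)


for n in (1, 2, 3, 4, 5):
    print(f"R{n}: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows R2 tolerates zero failures (both replicas are needed for a majority), R3 and R4 each tolerate one, and R5 tolerates two. Odd replica counts are therefore the efficient choice.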

Tier 3: Global (Optional)

For fleets spanning multiple regions, a global tier enables:

Function                Description
----------------------  ---------------------------------------
Cross-region mirroring  Replicate streams between regional hubs
Global aggregation      Fleet-wide dashboards and analytics
Command routing         Send commands to vehicles in any region
Disaster recovery       Failover between regions

Implementation Options:

  1. NATS Supercluster — Federate regional clusters via gateway connections
  2. JetStream Mirroring — Replicate specific streams to central location
  3. Application-level — Custom sync between regional APIs

Most fleets operate within a single region and don’t need a global tier initially.


Benefits of This Topology

Low Latency Local Pub/Sub

On-vehicle communication is sub-millisecond:

MAVLink message received
    └─▶ Vehicle Gateway publishes to local NATS
        └─▶ AI Perception subscribes from local NATS
            └─▶ Total latency: <1ms

No WAN round-trip for on-vehicle communication.

WAN Isolation

Vehicle operations continue during connectivity loss:

  • Perception systems keep running
  • State accumulates locally
  • Commands queue for execution
  • Reconnection synchronizes automatically

Fleet-Wide Digital Twin Replay

JetStream enables temporal queries across the fleet:

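```shell
# Replay position updates from the last 10 minutes, then continue with live updates
# (assumes the TWIN_STATE stream captures these subjects)
nats sub "fleet.prod.veh.*.state.position" --since=10m

# Subscribe to real-time state updates only
nats sub "fleet.prod.veh.*.state.position"
```

Every vehicle’s state history, within the stream’s retention limits, is queryable from the regional hub.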
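Because the `*` wildcard in `fleet.prod.veh.*.state.position` stands for the vehicle ID token, a fleet-wide consumer can recover which vehicle sent each message from the subject alone. The helpers below are a hypothetical sketch that assumes the `fleet.<env>.veh.<id>.state.position` layout used in this section; they also reject tokens containing `.`, `*`, `>`, or whitespace, which would otherwise change the subject's structure.

```python
from typing import NamedTuple

# Characters that would break NATS subject token structure
_FORBIDDEN = set(".*> \t\r\n")


class PositionSubject(NamedTuple):
    env: str
    vehicle_id: str


def position_subject(env: str, vehicle_id: str) -> str:
    """Build fleet.<env>.veh.<vehicle_id>.state.position, rejecting unsafe tokens."""
    for token in (env, vehicle_id):
        if not token or _FORBIDDEN & set(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return f"fleet.{env}.veh.{vehicle_id}.state.position"


def parse_position_subject(subject: str) -> PositionSubject:
    """Split a position subject into its env and vehicle ID tokens."""
    tokens = subject.split(".")
    if (
        len(tokens) != 6
        or tokens[0] != "fleet"
        or tokens[2] != "veh"
        or tokens[4:] != ["state", "position"]
    ):
        raise ValueError(f"not a position subject: {subject!r}")
    return PositionSubject(env=tokens[1], vehicle_id=tokens[3])


subject = position_subject("prod", "VID-001")
print(subject)  # fleet.prod.veh.VID-001.state.position
print(parse_position_subject(subject).vehicle_id)  # VID-001
```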

Scalable Architecture

The topology scales horizontally:

Scale Point        Approach
-----------------  ------------------------------------
More vehicles      Add leaf connections to existing hub
Higher throughput  Add nodes to hub cluster
More regions       Deploy additional regional hubs
Global reach       Connect hubs via supercluster

Connection Flow

When a vehicle powers on:

1. Jetson boots, NATS leaf node starts
2. Leaf node reads hub address from config
3. TLS connection established to regional hub
4. Authentication via credentials
5. Leaf node propagates its local subscription interest to the hub
6. Hub propagates its subscription interest to the leaf
7. Bidirectional message flow begins

When connectivity drops:

1. Leaf node detects connection loss
2. Local pub/sub continues uninterrupted
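3. Messages captured in the local JetStream stream accumulate on the vehicle
4. Leaf node retries the hub connection at its configured reconnect interval
5. On reconnect, subscription interest is re-exchanged and live traffic resumes
6. The hub-side copy of the stream catches up from the last sequence it received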

Security

Every connection is authenticated and encrypted:

Connection    Security
------------  ----------------------------------
Leaf → Hub    TLS 1.3, credential authentication
Hub cluster   TLS, cluster routes authenticated
Hub → Global  TLS, gateway authentication

Credentials are provisioned per-vehicle, enabling:

  • Individual vehicle revocation
  • Audit trails per vehicle
  • Rate limiting per connection
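Per-vehicle credentials make revocation a matter of removing one identity. The toy registry below assumes a hypothetical scoping rule in which each vehicle may publish only under its own `fleet.<env>.veh.<id>` subtree; `CredentialRegistry`, `vehicle_scope`, and `may_publish` are illustrative names. In a real deployment the NATS server enforces the permissions attached to each credential, not application code.

```python
from dataclasses import dataclass, field


def vehicle_scope(env: str, vehicle_id: str) -> str:
    """Hypothetical scoping rule: a vehicle may only publish under its own subtree."""
    return f"fleet.{env}.veh.{vehicle_id}"


@dataclass
class CredentialRegistry:
    """Toy registry showing per-vehicle revocation and subject scoping."""
    env: str = "prod"
    revoked: set = field(default_factory=set)

    def revoke(self, vehicle_id: str) -> None:
        self.revoked.add(vehicle_id)

    def may_publish(self, vehicle_id: str, subject: str) -> bool:
        if vehicle_id in self.revoked:
            return False
        prefix = vehicle_scope(self.env, vehicle_id) + "."
        # Same effect as the NATS pattern "<scope>.>": at least one token after the prefix
        return subject.startswith(prefix) and len(subject) > len(prefix)


registry = CredentialRegistry()
print(registry.may_publish("VID-001", "fleet.prod.veh.VID-001.state.position"))  # True
print(registry.may_publish("VID-001", "fleet.prod.veh.VID-002.state.position"))  # False
registry.revoke("VID-001")
print(registry.may_publish("VID-001", "fleet.prod.veh.VID-001.state.position"))  # False
```

Revoking `VID-001` leaves every other vehicle's access untouched, which is the property individual revocation relies on.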

The authorization model supports decentralized security with grants—third parties can receive scoped, time-bounded access without central coordination. See Authorization & Grants for details on credential management and third-party integration.


Summary

Tier               Purpose                              Deployment
-----------------  -----------------------------------  ---------------------
Leaf (Vehicle)     Local pub/sub, store-and-forward     On Jetson
Hub (Regional)     Aggregation, persistence, consumers  Cloud/edge datacenter
Global (Optional)  Cross-region, fleet-wide views       Central cloud

This topology ensures vehicles operate independently while maintaining fleet-wide visibility and control.


Deployment Options

Open Source Foundation

NATS JetStream is 100% open source under the Apache 2.0 license:

  • Source code: github.com/nats-io/nats-server
  • No commercial license required — use freely in production
  • Full feature parity — open source has all features
  • Active development — backed by Synadia, used by thousands

You run the exact same code we run. No lock-in.


Option 1: Self-Hosted

Deploy the entire stack on your infrastructure:

Component     Your Responsibility
------------  ------------------------------
Hub clusters  Provision, configure, maintain
Leaf nodes    Deploy on each vehicle
Monitoring    Set up observability
Updates       Manage upgrades

We provide:

  • Reference configurations (this documentation)
  • Architecture consulting
  • Implementation support

Best for: Organizations with ops teams, data residency requirements, or existing infrastructure.


Option 2: Connect to Our Infrastructure

Tap into Ubuntu Software’s managed NATS infrastructure:

Component     Responsibility
------------  ----------------------
Hub clusters  We manage
Leaf nodes    You deploy on vehicles
Monitoring    We provide dashboards
Scaling       We handle

How it works:

  1. We provision credentials for your fleet
  2. Your leaf nodes connect to our regional hubs
  3. Your data stays isolated (dedicated subjects)
  4. You get fleet dashboards and API access

Benefits:

  • No infrastructure to build or maintain
  • Pre-tuned for drone fleet patterns
  • Global coverage across regions
  • Focus on your drones, not your messaging

Pricing:

  • Free tier — Up to 10 vehicles, perfect for development and small deployments
  • Scale tier — Per-vehicle pricing as you grow

Contact Us → to get started.


Comparison

Aspect         Self-Hosted            Managed
-------------  ---------------------  ---------------
Setup time     Days to weeks          Hours
Ops burden     You maintain           We maintain
Cost model     Infrastructure + time  Per-vehicle
Customization  Full control           Standard config
Data location  Your choice            Our regions
Best for       Large ops teams        Fast deployment
