Version: 0.0.0 (Beta)

Delivery Strategy

This document explains how logs are routed and delivered once they leave the logging path.

The SDK follows a strict and deterministic delivery order.

Delivery order

Logs are delivered in the following priority order:

Primary server
EchoPost
Fallback server
Drop with persistence

Each step exists to reduce data loss while maintaining system stability.

Primary server delivery

The primary server is always the first choice.

Endpoint and protocol

URL: Configured via server.host or DATANADHI_SERVER_HOST
Endpoint: POST /log
Format: JSON with pipelines and log_data
Authentication: DATANADHI_API_KEY header
Timeout: 10 seconds

Request format

{
  "pipelines": ["pipeline-id-1", "pipeline-id-2"],
  "log_data": {
    "message": "...",
    "trace_id": "...",
    "timestamp": "...",
    "module_name": "...",
    "log_record": { ... },
    "context": { ... }
  }
}

Response handling

Status Code	Action
2xx	Success - mark item as done
3xx-5xx	Failure - drop item, store to disk
5xx+	Unavailable - requeue item, mark server unhealthy
Timeout/Connection error	Unavailable - requeue item, mark server unhealthy

Health monitoring

When marked unhealthy:

Background health check thread starts
Checks GET / every 500ms
When server responds with 2xx, marks healthy
Routing automatically resumes

EchoPost delivery

EchoPost is used only when the primary server is unavailable.

Behavior:

Logs are sent locally over gRPC
EchoPost buffers logs on disk
Logs are replayed later by EchoPost

If EchoPost fails:

It is disabled
Logs move to fallback delivery

Fallback server delivery

The fallback server is used when:

Primary server is down AND
EchoPost is disabled

Endpoint and protocol

URL: Configured via server.fallback_host or DATANADHI_FALLBACK_SERVER_HOST
Endpoint: POST /upload
Format: Gzipped JSONL (JSON Lines)
Authentication: DATANADHI_API_KEY header
Timeout: 30 seconds (longer for batch)
Batch size: Up to 100 items per request

Batch format

The worker collects up to 100 items and sends them as compressed JSONL:

# Each line is one log entry
{"pipelines": [...], "log_data": {...}}\n
{"pipelines": [...], "log_data": {...}}\n
# Compressed with gzip, sent as application/octet-stream

Response handling

Status Code	Action
2xx	Success - mark all items as done
3xx-5xx	Failure - drop batch, store to disk
5xx+	Unavailable - requeue batch, mark server unhealthy
Timeout/Connection error	Unavailable - requeue batch

Batching logic

When a worker sends to fallback:

Takes first item from queue
Calls queue.get_batch(99) to get up to 99 more
Sends all as a single compressed batch
Marks all as done on success

Drain worker

When the async queue reaches 90% capacity:

A drain worker is started automatically
Logs are drained in batches to the fallback server
Draining continues until the queue drops to 10%
The drain worker exits when finished

The drain worker exists purely for overload protection.

Health monitoring

Health is tracked independently for primary and fallback servers using ServerHealthMonitor.

Implementation

class ServerHealthMonitor:
    # Track health state
    _is_healthy = {"server_url": True, "fallback:server_url": True}
    
    # Active health check threads
    _check_threads = {}

Health state transitions

Mark server as down:

Worker encounters unavailability (5xx, timeout, connection error)
Calls health_monitor.set_health_down(server_host)
Monitor spawns daemon health check thread

Health check loop:

while True:
    sleep(0.5)
    if check_server_health(server_host):  # GET /
        mark_server_healthy()
        break  # Exit thread

Automatic recovery:

Workers check health_monitor.is_server_up(server) before routing
When health check succeeds, routing resumes immediately
No manual intervention required

Key properties

Primary and fallback servers tracked separately
Each server gets its own health check thread
Health check threads are daemon (don't block shutdown)
Single check thread per server (protected by lock)
Polling interval: 500ms
Timeout: 2 seconds per check

Delivery order​

Primary server delivery​

Endpoint and protocol​

Request format​

Response handling​

Health monitoring​

EchoPost delivery​

Fallback server delivery​

Endpoint and protocol​

Batch format​

Response handling​

Batching logic​

Drain worker​

Health monitoring​

Implementation​

Health state transitions​

Key properties​

Delivery order

Primary server delivery

Endpoint and protocol

Request format

Response handling

Health monitoring

EchoPost delivery

Fallback server delivery

Endpoint and protocol

Batch format

Response handling

Batching logic

Drain worker

Health monitoring

Implementation

Health state transitions

Key properties