How to Handle WebSocket Reconnections Without Losing Events

June 12, 2026 · 6 min read · #websocket #eth_subscribe #rpc #tutorial

The worst RPC bug is the one that doesn't throw. Your service connects over WebSocket, subscribes to contract logs with eth_subscribe, and processes events for hours. Then the connection drops — an idle timeout, a load balancer cycling a backend, the provider shipping a deploy — and your code keeps running against a dead socket. No exception, no log line, just silence. Twenty minutes later the socket reconnects (or your wrapper reconnects it), the events start flowing again, and everything looks healthy.

Except you lost every event that fired during those twenty minutes. They were never queued anywhere; eth_subscribe is fire-and-forget, and a subscription does not survive the connection that created it. For an indexer, an accounting service, or a bot that acts on transfers, a silent gap is a correctness bug that surfaces days later as "why is our balance off."

This post describes the pattern we use to make WebSocket consumers actually reliable. A reliable consumer has four jobs, and most reconnect code only does the first two.

Why WebSocket connections drop (all the time)

Long-lived WebSockets are not a stable resource. In production you will see disconnects from:

Idle timeouts. Many providers and proxies close a socket that hasn't sent a frame in 30–120 seconds. A subscription that happens to be quiet looks idle.
Load balancer recycling. Behind any real RPC endpoint there's a fleet. Backends get drained for deploys and health-check failures; your socket goes with them.
Network blips. NAT rebinding, Wi-Fi handoff, a container migration — the TCP connection just dies.
Server-side resource caps. Hit a per-connection subscription limit or a memory ceiling and the server hangs up.

The takeaway: design as if the socket dies every few minutes instead of treating a drop as a rare event. If your reconnect path is well-worn, drops become a non-event.

The four jobs of a reliable consumer

1. Detect the drop quickly (don't trust onclose alone).
2. Reconnect with backoff so you don't hammer a struggling endpoint.
3. Re-subscribe to everything you were watching.
4. Backfill the events you missed while disconnected, and de-duplicate.

Job 4 is the one almost everyone skips, and it's the only one that prevents data loss.

Job 1: detect the drop with a heartbeat

onclose and onerror fire eventually, but a half-open socket — TCP alive, no data flowing — can sit silent for a long time. Add a heartbeat: send a cheap request on an interval and reset a watchdog whenever any data arrives. If the watchdog expires, treat the socket as dead and tear it down yourself.

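let lastData = Date.now();
ws.on("message", () => { lastData = Date.now(); });

const heartbeat = setInterval(() => {
  // no data for 30s: assume a half-open socket and force onclose -> reconnect
  if (Date.now() - lastData > 30_000) return ws.terminate();
  // any cheap call works as a liveness ping; skip it unless the socket is open
  if (ws.readyState === ws.OPEN) {
    ws.send(JSON.stringify({ jsonrpc: "2.0", id: "ping", method: "net_version", params: [] }));
  }
}, 10_000);
ws.on("close", () => clearInterval(heartbeat)); // stop pinging a socket that is gone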

Job 2: reconnect with exponential backoff and jitter

When an endpoint is having a bad minute, fifty clients reconnecting in a tight loop make it worse. Back off, cap the delay, and add jitter so a fleet of your own workers doesn't reconnect in lockstep.

Attempt	Base delay	With jitter (±30%)
1	1s	0.7–1.3s
2	2s	1.4–2.6s
3	4s	2.8–5.2s
4	8s	5.6–10.4s
5	16s	11.2–20.8s
6+	30s (cap)	21–39s

// `attempt` counts from 0, so attempt 0 here is attempt 1 in the table (1s base)
function backoff(attempt) {
  const base = Math.min(30_000, 1000 * 2 ** attempt);
  return base * (0.7 + Math.random() * 0.6); // ±30% jitter
}
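
As a minimal sketch of how that delay schedule might drive a reconnect loop, the helper below retries an async open function until it succeeds. The names connectWithRetry, open, sleep, and random are illustrative and not part of any library; backoff is repeated here (with a zero-based attempt) only so the block runs on its own.

```javascript
// Zero-based attempt -> delay in ms: 1s, 2s, 4s, ... capped at 30s, then ±30% jitter.
// `random` is injectable purely so the jitter can be pinned in tests.
function backoff(attempt, random = Math.random) {
  const base = Math.min(30_000, 1000 * 2 ** attempt);
  return base * (0.7 + random() * 0.6);
}

// Hypothetical helper: call `open` until it resolves, sleeping between failures.
// `sleep` is injectable so the loop can be exercised without real waiting.
async function connectWithRetry(open, {
  maxAttempts = Infinity,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await open();
    } catch (err) {
      lastError = err;
      // only wait if another attempt is actually coming
      if (attempt + 1 < maxAttempts) await sleep(backoff(attempt));
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts`, { cause: lastError });
}
```

The attempt counter lives inside one call, so every fresh call to connectWithRetry starts again from the 1s delay. That matters for a long-running service: a drop after hours of healthy traffic should not inherit the 30s delay from an earlier outage.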

Jobs 3 + 4: re-subscribe, then backfill the gap

This is the heart of it. On reconnect you re-create your subscriptions — but a fresh subscription only delivers events from now. The window between your last received event and the new subscription is a hole. Fill it with eth_getLogs.

The trick is to track the last block you fully processed. On reconnect, query logs from that block forward to the current head, replay them, then let the live subscription take over. Because the boundary overlaps, you must de-duplicate on a stable key: blockHash + logIndex (or transactionHash + logIndex).

import { createPublicClient, webSocket, http } from "viem";
import { mainnet } from "viem/chains";

const WSS = "wss://rpc.swiftnodes.io/ws/eth?key=YOUR_API_KEY";
const HTTPS = "https://rpc.swiftnodes.io/rpc/eth?key=YOUR_API_KEY";

// viem's webSocket transport reconnects on its own; we add the backfill.
const wsClient = createPublicClient({ chain: mainnet, transport: webSocket(WSS, {
  reconnect: { attempts: 10, delay: 1_000 },
}) });
const httpClient = createPublicClient({ chain: mainnet, transport: http(HTTPS) });

const seen = new Set();          // `${blockHash}:${logIndex}` for the overlap window
let lastProcessed = 0n;          // highest block we have fully handled
const FILTER = { address: "0xYourContract", event: /* parseAbiItem(...) */ undefined };

function handle(log) {
  const id = `${log.blockHash}:${log.logIndex}`;
  if (seen.has(id)) return;
  seen.add(id);
  lastProcessed = log.blockNumber > lastProcessed ? log.blockNumber : lastProcessed;
  // ... your event handler ...
}

async function backfill() {
  const from = lastProcessed;              // snapshot before any await; live logs may advance it
  if (from === 0n) return;                 // nothing to catch up on yet
  const head = await httpClient.getBlockNumber();
  // lag the tip by a few blocks so a reorg doesn't replay logs you'll un-see
  const safeHead = head - 3n;
  if (safeHead < from) return;
  // start AT the last block, not after it: the drop may have cut that block short,
  // and `seen` filters out the logs we already handled
  const logs = await httpClient.getLogs({ ...FILTER, fromBlock: from, toBlock: safeHead });
  for (const log of logs) handle(log);
}

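// on every (re)connect: backfill first, then resume live.
// Note: nothing in this snippet calls backfill() yet; invoke it from whatever
// signals a reconnect in your setup, before live logs advance lastProcessed.
wsClient.watchEvent({
  ...FILTER,
  onLogs: (logs) => logs.forEach(handle),
  onError: (err) => console.error("subscription error", err), // transport will retry the socket
});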

A few things make this robust:

Backfill over HTTP, not WS. A range query is a request/response — it belongs on HTTP. Keep WS for the live tail. (And mind the provider's range cap; if the gap is large, page it. We covered those limits in eth_getLogs range caps.)
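Lag the tip by a few blocks. The very head reorgs. If you backfill all the way to head, a reorg can make you replay or act on logs that get orphaned. Stay a handful of confirmations back for anything you act on irreversibly, and query the blocks you skipped again once they have those confirmations so the lag itself doesn't become a gap.
Bound the dedup set. Don't let seen grow forever: clear entries older than the overlap window (e.g. anything below lastProcessed - 50). That means storing each entry's block number next to its key, since a `${blockHash}:${logIndex}` string alone doesn't say which block it came from.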
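
Two of the points above, paging a large gap and bounding the dedup set, could be sketched as follows. blockRanges and RecentLogs are hypothetical helpers, not viem APIs. The 1_000-block span in the usage note is a placeholder for whatever range cap your provider enforces, and the 50-block retention window simply mirrors the example figure above.

```javascript
// Split the inclusive BigInt range [from, to] into chunks of at most `maxSpan` blocks,
// so each eth_getLogs call stays under the provider's range cap.
function* blockRanges(from, to, maxSpan) {
  if (maxSpan <= 0n) throw new RangeError("maxSpan must be positive");
  for (let start = from; start <= to; start += maxSpan) {
    const end = start + maxSpan - 1n;
    yield [start, end < to ? end : to];
  }
}

// Dedup set that remembers each entry's block number so old entries can be dropped.
class RecentLogs {
  constructor(keepBlocks = 50n) {
    this.keepBlocks = keepBlocks;
    this.byId = new Map(); // `${blockHash}:${logIndex}` -> blockNumber
  }

  // Returns true the first time a log is seen, false for a repeat.
  add(log) {
    const id = `${log.blockHash}:${log.logIndex}`;
    if (this.byId.has(id)) return false;
    this.byId.set(id, log.blockNumber);
    return true;
  }

  // Forget every entry more than `keepBlocks` below the watermark.
  prune(lastProcessed) {
    const floor = lastProcessed - this.keepBlocks;
    for (const [id, blockNumber] of this.byId) {
      if (blockNumber < floor) this.byId.delete(id);
    }
  }
}
```

In the backfill, the single getLogs call would become a loop such as `for (const [fromBlock, toBlock] of blockRanges(from, safeHead, 1_000n))`, with one request per chunk. In handle, `if (!recent.add(log)) return;` would replace the seen check, followed by an occasional `recent.prune(lastProcessed)`.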

The ethers v6 version

ethers v6's WebSocketProvider does not reconnect itself, so you wrap it: recreate the provider on close, re-attach listeners, and run the same backfill.

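import { WebSocketProvider, JsonRpcProvider } from "ethers";

const http = new JsonRpcProvider("https://rpc.swiftnodes.io/rpc/eth?key=YOUR_API_KEY");

function connect(attempt = 0) {
  const ws = new WebSocketProvider("wss://rpc.swiftnodes.io/ws/eth?key=YOUR_API_KEY");

  // addEventListener, not `onopen =`: the provider installs its own onopen handler
  // to start itself, and assigning over it would keep the provider from starting
  ws.websocket.addEventListener("open", () => {
    attempt = 0;                       // healthy again: restart the backoff schedule
    backfill().catch(console.error);   // reads the watermark before any live log arrives
    subscribe(ws);
  });
  // a failed connect emits "error" before "close"; listen so it isn't an unhandled error
  ws.websocket.addEventListener("error", () => console.error("ws error"));
  ws.websocket.addEventListener("close", () => {
    const delay = Math.min(30_000, 1000 * 2 ** attempt) * (0.7 + Math.random() * 0.6);
    setTimeout(() => connect(attempt + 1), delay);
  });
}

function subscribe(ws) {
  // re-attach on every reconnect. ethers v6 logs expose `index` (not `logIndex`) and a
  // numeric blockNumber, so adapt them to the shape handle() expects
  ws.on({ address: "0xYourContract" }, (log) =>
    handle({ ...log, logIndex: log.index, blockNumber: BigInt(log.blockNumber) }));
}

connect();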

Same shape in web3.py: catch the ConnectionClosed, reconnect in a loop with backoff, re-create the filter, and run an eth_getLogs backfill from your last stored block.

Persist the watermark

One last piece: lastProcessed has to survive a process restart, not just a reconnect. If your service crashes and comes back, the in-memory watermark is gone and you'll either re-process from genesis or skip the gap. Write the last fully-processed block to Redis or your database after each batch, and load it on boot. Then the same backfill that recovers from a dropped socket also recovers from a deploy.
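
A minimal sketch of that watermark, assuming a key-value store with async get and set. The store interface, createWatermark, memoryStore, and the "indexer:lastProcessed" key are all hypothetical names. In production the store would be a thin wrapper around your Redis or database client; an in-memory Map stands in here so the example runs on its own.

```javascript
// Stand-in for Redis or a database row: async get(key) and set(key, value).
function memoryStore() {
  const data = new Map();
  return {
    get: async (key) => data.get(key),
    set: async (key, value) => { data.set(key, value); },
  };
}

function createWatermark(store, key = "indexer:lastProcessed") {
  // Load on boot; 0n means "no watermark stored yet".
  async function load() {
    const raw = await store.get(key);
    return raw == null ? 0n : BigInt(raw);
  }

  // Save after each batch. The BigInt is stored as a decimal string, and the
  // value never moves backwards, so a late write cannot rewind the watermark.
  // Assumes a single writer: the read-then-write here is not atomic.
  async function save(block) {
    const current = await load();
    if (block > current) await store.set(key, block.toString());
  }

  return { load, save };
}
```

On boot, `lastProcessed = await watermark.load()` would run before the first backfill, and `await watermark.save(lastProcessed)` after each processed batch. Saving only after the batch's handlers have finished keeps the stored value meaning "fully processed".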

The mental model

A WebSocket subscription is a best-effort live tail, not a delivery guarantee. Treat it as one half of a pair: WS for low-latency live events, eth_getLogs for the authoritative gap-fill. With a heartbeat to detect drops, jittered backoff to reconnect politely, re-subscription, and a watermark-driven backfill with dedup, your consumer can lose its connection a hundred times a day and never lose an event.

SwiftNodes runs flat-rate WebSocket and HTTP endpoints across 50+ chains — same ?key= on both transports, no compute-unit surprises when a reconnect storm makes you fire a burst of eth_getLogs backfills. Spin up a free key at swiftnodes.io and point both your wss:// tail and your https:// backfill at our Ethereum RPC or any chain you build on. For the WS quirks that differ by network, see Arbitrum WebSocket gotchas.

