Retrying RPC Calls the Right Way: Backoff, Idempotency, and Failover

June 21, 2026 · 6 min read · #rpc #reliability #retry #tutorial

Every RPC call fails eventually. A node restarts, a rate limit kicks in, a packet drops, a provider has a bad ten seconds. None of that is avoidable — what is avoidable is the way most code reacts to it. The default reaction is either "let the exception bubble up and crash the job" or "wrap it in while (true) retry," and both are wrong. The first turns a 200ms blip into a failed batch. The second turns a rate limit into a self-inflicted DDoS, hammering an already-struggling endpoint until it bans you.

Getting retries right comes down to three questions: how long do I wait between attempts, is this call even safe to repeat, and where do I send the retry. Backoff, idempotency, failover. Get all three right and transient failures become invisible. Get them wrong and you build an outage amplifier.

1. Backoff: wait longer each time, and add jitter

The single most important rule: never retry immediately in a tight loop. If a node is rate-limiting you or overloaded, retrying after 0ms just adds load at the exact moment it can least handle it. Use exponential backoff — double the wait each attempt — and cap it so you don't wait forever.

The less obvious rule: add jitter. If 50 workers all hit a rate limit at the same instant and all back off by exactly 200ms, 400ms, 800ms, they retry in lockstep and collide again — a "thundering herd." Randomizing the delay spreads them out. Full jitter (a random value between 0 and the backoff ceiling) is the simplest version that works.
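
To make the schedule concrete, here is a small sketch of the two rules as plain functions. The names `backoffCeiling` and `fullJitterDelay` are illustrative rather than taken from any library, and the defaults (200 ms base, 5 s cap) are assumptions.

```javascript
// Illustrative helpers (hypothetical names, not from any library).

// Upper bound for the wait before retry number `attempt` (1-based):
// doubles each time and never exceeds `cap`.
function backoffCeiling(attempt, base = 200, cap = 5000) {
  return Math.min(cap, base * 2 ** (attempt - 1));
}

// Full jitter: a uniformly random wait between 0 and the ceiling.
// `rand` is injectable so the function can be tested deterministically.
function fullJitterDelay(attempt, base = 200, cap = 5000, rand = Math.random) {
  return rand() * backoffCeiling(attempt, base, cap);
}

const ceilings = [1, 2, 3, 4, 5, 6].map((a) => backoffCeiling(a));
console.log(ceilings); // [ 200, 400, 800, 1600, 3200, 5000 ]
```

Without jitter, every worker waits exactly those values and they all come back together. With full jitter, each worker picks its own point below the ceiling, so the retries spread across the window.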

Here's a framework-agnostic wrapper you can drop around any call:

const RETRYABLE = new Set([429, 500, 502, 503, 504]);

async function withRetry(fn, { tries = 5, base = 200, cap = 5000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = err?.status ?? err?.statusCode;
      const retryable =
        RETRYABLE.has(status) ||
        err?.code === "ETIMEDOUT" ||
        err?.code === "ECONNRESET";
      if (!retryable || attempt >= tries) throw err;

      // Respect Retry-After (seconds) on a 429; otherwise full jitter.
      const retryAfter = Number(err?.headers?.["retry-after"]) * 1000;
      const ceiling = Math.min(cap, base * 2 ** (attempt - 1));
      const delay = retryAfter || Math.random() * ceiling;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}

Two details that matter. Honor Retry-After if the server sends it on a 429: it's telling you exactly when it'll accept you again, so guessing is pointless. And only retry retryable failures: a 429, a 5xx, or a network timeout is worth repeating; an "invalid params" or an "execution reverted" is a deterministic error that will fail identically every time. Retrying those just wastes your retry budget and hides the real bug.

The Python equivalent:

import random
import time

RETRYABLE = {429, 500, 502, 503, 504}

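def with_retry(fn, tries=5, base=0.2, cap=5.0):
    for attempt in range(1, tries + 1):
        try:
            return fn()
        except Exception as e:
            # Assumes a requests/httpx-style exception that carries a .response.
            response = getattr(e, "response", None)
            status = getattr(response, "status_code", None)
            # Built-in timeout/connection errors only; add your HTTP client's
            # own exception classes here if they don't subclass these.
            retryable = status in RETRYABLE or isinstance(e, (TimeoutError, ConnectionError))
            if not retryable or attempt == tries:
                raise
            # Respect Retry-After (seconds) if present; otherwise full jitter.
            headers = getattr(response, "headers", None) or {}
            try:
                delay = max(0.0, float(headers.get("Retry-After")))
            except (TypeError, ValueError):
                ceiling = min(cap, base * 2 ** (attempt - 1))
                delay = random.random() * ceiling
            time.sleep(delay)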
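
One gap worth noting in the JavaScript wrapper above: HTTP allows Retry-After to be either a number of seconds or an HTTP date, and multiplying a date string by 1000 yields NaN, so the wrapper quietly falls back to jitter. A small parser can handle both forms. This is a sketch: `parseRetryAfter` is a hypothetical helper name, and how you read the header off the error depends on your HTTP client.

```javascript
// Hypothetical helper: convert a Retry-After header value to a wait in ms.
// Returns null when the header is missing or unparseable, so the caller
// can fall back to jittered backoff.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null) return null;
  const text = String(value).trim();
  if (text === "") return null;

  // Form 1: delay in seconds, e.g. "120".
  if (/^\d+$/.test(text)) return Number(text) * 1000;

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const when = Date.parse(text);
  return Number.isNaN(when) ? null : Math.max(0, when - now);
}

console.log(parseRetryAfter("2")); // 2000
console.log(parseRetryAfter(undefined)); // null
```

Inside a retry loop this would likely be used as `parseRetryAfter(header) ?? Math.random() * ceiling`. The `??` matters because a wait of 0 is a valid answer. It may also be worth capping a server-dictated wait against your own deadline.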

2. Idempotency: know what's safe to repeat

Backoff decides when to retry. Idempotency decides whether you're allowed to. An idempotent call produces the same result and the same side effects no matter how many times you make it — so retrying is free. A non-idempotent call has side effects, and blindly repeating it can do real damage.

Reads are always safe. eth_call, eth_getLogs, eth_getBalance, eth_blockNumber, eth_estimateGas, eth_getTransactionReceipt — these are pure queries. Retry them as much as your budget allows.

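Writes need care, but are safer than people assume, provided you sign locally. A signed transaction has a fixed hash and a fixed nonce. Broadcasting the same signed bytes twice via eth_sendRawTransaction is idempotent: the second attempt gets you already known or nonce too low. After a retry of the same bytes, treat either as success rather than failure, because the transaction is already out there. The one caveat: nonce too low only proves that the nonce has been used, so if anything else can sign for the same account, confirm by looking up your transaction hash.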

async function broadcast(raw) {
  try {
    return await withRetry(() => client.sendRawTransaction({ serializedTransaction: raw }));
  } catch (err) {
    const msg = String(err?.message ?? err).toLowerCase();
    if (msg.includes("already known") || msg.includes("nonce too low")) {
      return; // already in the mempool / mined — not an error
    }
    throw err;
  }
}

The trap is the opposite pattern: re-building and re-signing the transaction with a fresh nonce on each attempt. Do that and a retry after a timeout — where the first broadcast actually succeeded but the response got lost — can put two transactions on chain. Sign once, retry the broadcast of those exact bytes. And never retry eth_sendTransaction (the node-signs-for-you variant) blindly; you don't control the nonce, so you can't reason about idempotency.
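
A more cautious variant of the broadcast helper confirms by hash before declaring success, since "nonce too low" on its own does not say which transaction used the nonce. The sketch below is library-agnostic and rests on assumptions: `broadcastAndConfirm` is a hypothetical name, `sendRaw` and `getTransaction` are functions you inject, `getTransaction` is assumed to resolve to null for an unknown hash, and `hash` is assumed to have been computed locally from the signed bytes.

```javascript
// Hypothetical helper: broadcast signed bytes once more, and on a
// "duplicate" style error verify that OUR transaction is the one the node knows.
async function broadcastAndConfirm({ raw, hash, sendRaw, getTransaction }) {
  try {
    await sendRaw(raw);
    return hash;
  } catch (err) {
    const msg = String(err?.message ?? err).toLowerCase();
    const looksDuplicate =
      msg.includes("already known") || msg.includes("nonce too low");
    if (!looksDuplicate) throw err;

    // Assumption: getTransaction resolves to null when the hash is unknown.
    const tx = await getTransaction(hash);
    if (tx) return hash; // our transaction is pending or mined

    // The nonce was consumed, but not by these bytes.
    throw new Error(`nonce used by a different transaction; ${hash} not found`);
  }
}
```

The extra lookup costs one read, which is itself safe to retry, and it turns a silent "probably fine" into either a confirmed hash or an explicit error.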

3. Failover: don't retry into the same hole

If a node is down, retrying the same node — however politely — won't help. Real resilience means a second, independent endpoint to fail over to. The good news is both major JS libraries do this for you.

viem combines backoff and failover in its transport layer. Each http transport retries with backoff internally, and fallback rotates to the next provider when one keeps failing:

import { createPublicClient, fallback, http } from "viem";
import { mainnet } from "viem/chains";

const client = createPublicClient({
  chain: mainnet,
  transport: fallback([
    http("https://rpc.swiftnodes.io/rpc/eth?key=YOUR_API_KEY", {
      retryCount: 3,
      retryDelay: 150, // base; viem backs off exponentially from here
    }),
    http("https://your-backup-provider.example/eth"),
  ]),
});

ethers v6 has FallbackProvider, which queries across a set of providers and applies a quorum:

import { FallbackProvider, JsonRpcProvider } from "ethers";

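const provider = new FallbackProvider(
  [
    new JsonRpcProvider("https://rpc.swiftnodes.io/rpc/eth?key=YOUR_API_KEY"),
    new JsonRpcProvider("https://your-backup-provider.example/eth"),
  ],
  1, // network: chain ID 1 (Ethereum mainnet); in v6 the second argument is the network
  { quorum: 1 }, // quorum goes in the options object: one healthy answer is enough
);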
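
For code that uses neither library, the core of failover is small enough to sketch by hand. The version below is a minimal illustration, not a description of what viem or ethers do internally: `withFailover` and the `shouldFailover` predicate are hypothetical names, and `call` is whatever function performs your request against a given endpoint.

```javascript
// Hypothetical helper: try each endpoint in order until one succeeds.
async function withFailover(endpoints, call, shouldFailover = () => true) {
  const errors = [];
  for (const endpoint of endpoints) {
    try {
      return await call(endpoint);
    } catch (err) {
      // Deterministic errors fail the same way everywhere, so rethrow them
      // instead of burning through the remaining endpoints.
      if (!shouldFailover(err)) throw err;
      errors.push(err);
    }
  }
  throw new AggregateError(errors, "all endpoints failed");
}

// Usage with stand-in endpoints: the first one is "down".
withFailover(["primary", "backup"], async (name) => {
  if (name === "primary") throw new Error("503 from primary");
  return `answered by ${name}`;
}).then(console.log); // answered by backup
```

In practice `call` would wrap a single request, possibly already wrapped in a backoff helper, so that each endpoint gets a few polite attempts before the next one is tried.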

A few failover principles regardless of library:

Use genuinely independent endpoints. Two URLs that resolve to the same upstream don't protect you from that upstream's outage.
For broadcasts, fanning out is fine. Sending the same signed transaction to two providers is safe — same hash, deduplicated in the mempool. It can even land your transaction faster.
Watch out for "node behind" on reads. Failing over to a lagging node can return stale data — an eth_getTransactionReceipt that says "not found" for a transaction that's actually mined. For read-after-write, pin to one provider or poll by block number until the data catches up.
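
The "poll by block number" option from the last point can be sketched as a small wait loop. This is one possible shape under stated assumptions: `waitForBlock` is a hypothetical name, `getBlockNumber` is an injected function that asks the node you are about to read from for its current head, and the interval and timeout defaults are arbitrary.

```javascript
// Hypothetical helper: wait until a node's head reaches `targetBlock`,
// e.g. the block in which your transaction was mined, before reading from it.
async function waitForBlock(
  getBlockNumber,
  targetBlock,
  { intervalMs = 500, timeoutMs = 15000 } = {},
) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const head = await getBlockNumber();
    if (head >= targetBlock) return head; // node has caught up

    // Give up if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`node still at block ${head}, wanted ${targetBlock}`);
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```

Once this resolves, a "not found" from the same node is more likely to be real than a symptom of lag. The helper only compares heights, so it works the same whether the client reports block numbers as numbers or bigints.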

Putting it together

A production RPC client layers all three: failover transport on the outside (viem fallback or ethers FallbackProvider), backoff with jitter on each attempt, and a clear-eyed view of which methods are safe to repeat. Set a total deadline too: five attempts, each waiting out its own request timeout plus backoff or a server-dictated Retry-After, can stretch past 30 seconds, which is fine for a background job and a disaster for a request blocking a user.
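
A total deadline can be layered onto the backoff loop by checking the elapsed time before each sleep. The following is a sketch rather than a finished implementation: `retryUntilDeadline` and `isRetryable` are hypothetical names, the defaults are assumptions, and the deadline only limits how long the loop keeps scheduling retries. A single call that hangs still needs its own request timeout.

```javascript
// Hypothetical helper: full-jitter retries bounded by a total time budget
// instead of (or in addition to) an attempt count.
async function retryUntilDeadline(
  fn,
  { deadlineMs = 10000, base = 200, cap = 5000, isRetryable = () => true } = {},
) {
  const start = Date.now();
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const ceiling = Math.min(cap, base * 2 ** (attempt - 1));
      const delay = Math.random() * ceiling;
      const elapsed = Date.now() - start;

      // Stop if the error is deterministic or the next wait would
      // carry us past the overall budget.
      if (!isRetryable(err) || elapsed + delay >= deadlineMs) throw err;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

A request handler might pass a budget of a second or two, while a background job could allow far more. The same function serves both by changing one number.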

It also helps to start from infrastructure that fails less often. SwiftNodes serves flat-rate, multi-region RPC across Ethereum, Base, BNB Smart Chain, and dozens of other chains — a sensible primary endpoint that makes your retry logic the safety net it should be, not the thing holding production together. Pair it with a backup provider in your fallback list and you've got the full picture.

For the failure mode that triggers most retries in the first place, see Surviving Solana RPC 429s, and if you're retrying subscriptions rather than calls, How to Handle WebSocket Reconnections Without Losing Events covers the streaming side.

Grab a free API key at swiftnodes.io and point your fallback list at it — the free tier is enough to wire up and test your whole retry path.
