What "99.99% Gateway" Actually Means

We commit to 99.99% gateway availability. Here's what that covers, what it doesn't, how we measure it, and why it's the only honest SLA in messaging.

Flowstates Team · Customer messaging operations · 18 December 2025 · 5 min read

SLA inflation in messaging

"99.999% delivery guaranteed" appears on a surprising number of vendor websites. It can't be true. No vendor controls the operator network, the recipient handset, or the regulatory environment that determines whether a given message reaches a given person.

What a vendor *can* control is its own gateway: the API, the routing engine, the connections to upstream vendors, and the monitoring. That's where Flowstates' 99.99% commitment lives, so it's worth spelling out what's in and out of scope.

What's in scope

Our 99.99% gateway availability covers:

API ingestion: every API request is accepted and queued
Routing decisions: each message is dispatched to a vendor route
Vendor connection health: we maintain warm connections to your vendors
DLR processing: status updates from vendors are received and forwarded
Webhook delivery: status callbacks reach your application

Measured: percentage of API requests successfully accepted and routed in a given month. Excluded: planned maintenance windows announced 7+ days in advance.

For scale: 99.99% of a 30-day month leaves about 4 minutes 19 seconds (4.32 minutes) of unplanned downtime.
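The downtime figure is simple arithmetic, sketched below so you can check it for other targets or month lengths. The function names are illustrative, not part of any Flowstates API, and the conversion assumes traffic is spread evenly over the month, because the SLA itself is measured on requests rather than on clock time.

```python
def downtime_budget_seconds(availability_pct: float, days_in_month: int = 30) -> float:
    """Seconds of unplanned downtime allowed in one month at the given target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    month_seconds = days_in_month * 24 * 60 * 60
    # The allowed failure fraction is (100 - target) / 100.
    return month_seconds * (100 - availability_pct) / 100


def format_budget(seconds: float) -> str:
    """Render a budget as 'Xm Ys', rounded to whole seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs}s"


print(format_budget(downtime_budget_seconds(99.99)))       # 4m 19s
print(format_budget(downtime_budget_seconds(99.99, 31)))   # 4m 28s
```

Running the same function with a 99.999% target gives about 26 seconds a month, which shows how little room a five-nines claim leaves.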

What's out of scope

Honest answer: a lot of things.

Operator network outages. If a major MNO has an SS7 incident, messages to that operator won't deliver, regardless of gateway uptime.
Vendor outages on routes you control. If your primary SMPP vendor goes down and you haven't configured a fallback, the gateway is up but your messages aren't getting through.
Carrier filtering decisions. Operator spam filters can drop or delay traffic without notice. Detecting and rerouting around it is part of operations, not the SLA.
Recipient handset state. Powered off, out of coverage, blocked sender — none of these are gateway issues.
Regulator-driven changes. New 10DLC rules, sender-ID re-registrations, template re-approvals — these create downstream impact the SLA doesn't cover.

This is why "delivery guaranteed" is meaningless. No vendor SLA can credibly cover the parts of the system the vendor doesn't run.

How we measure it

Three inputs:

Synthetic transactions every 30 seconds against the API and against test routes through each connected vendor. These run from multiple geographic origins.

Production telemetry from the API gateway and routing layer, aggregated per minute, retained for 13 months for SLA reporting.

Vendor connection health continuously monitored — if any connected vendor route degrades, we surface it, route around it where possible, and log the event.

The monthly availability number is calculated from the production telemetry, not the synthetics. Synthetics are an early-warning system, not the official measure.
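As a rough illustration of that calculation, here is a minimal sketch that turns per-minute telemetry into a monthly availability figure. The `MinuteBucket` shape, its field names, and the `monthly_availability` function are assumptions made for the example, not the actual Flowstates schema. It also assumes the maintenance windows passed in have already been confirmed as announced far enough ahead to qualify for exclusion.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MinuteBucket:
    """One minute of aggregated gateway telemetry (hypothetical shape)."""
    start: datetime   # start of the one-minute window
    received: int     # API requests received in that minute
    routed: int       # requests accepted and dispatched to a vendor route


def monthly_availability(buckets, maintenance_windows) -> float:
    """Percentage of requests accepted and routed, excluding planned maintenance.

    maintenance_windows is a list of (start, end) datetimes, treated as
    half-open intervals: start is excluded from the SLA, end is not.
    """
    received = routed = 0
    for bucket in buckets:
        in_maintenance = any(
            start <= bucket.start < end for start, end in maintenance_windows
        )
        if in_maintenance:
            continue  # planned maintenance does not count for or against the SLA
        received += bucket.received
        routed += bucket.routed
    if received == 0:
        return 100.0  # no countable traffic, so nothing was missed
    return 100.0 * routed / received
```

Because the ratio is weighted by request volume, a failed minute at peak traffic costs more than a failed minute at 4am. That is one reason a request-based number and a clock-based downtime figure can differ.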

What you get if we miss

The standard remedy for missing the gateway SLA is service credits, applied as a percentage of that month's gateway fee. Credits are not calculated on per-message vendor costs, which we don't charge. Specific terms are in your customer agreement.

More importantly: every SLA-relevant incident gets a written postmortem within 5 business days, regardless of whether we missed the threshold.

Where the operations layer sits

The reason we're conservative about what the SLA covers is that the value of working with a managed gateway is mostly outside the gateway itself. It's the:

Continuous monitoring of routes you don't have time to watch
Vendor escalations during incidents on the operator side
Routing changes when carrier filtering shifts
Compliance and registration handling as rules change per market

These don't fit into a single uptime number. They show up in your conversion rates, your incident frequency, and the 3am pages your team didn't have to take.

What to ask any messaging vendor

Three questions worth asking when you evaluate a gateway or CPaaS provider:

What does your SLA actually cover, in technical terms — API only, or end-to-end delivery?
How is it measured, and can I see the underlying telemetry?
What's the published incident response time for a route degradation that's *outside* the SLA?

The third question is usually the most revealing. The vendors who do this well have an answer ready. The ones who treat operations as the customer's problem don't.

If you'd like to see how we report against the SLA, or talk through what an operational SLO covering your specific markets would look like, book a 30-minute messaging review.
