QIS vs PagerDuty: Your On-Call Engineers Know the Alert. They Don't Know What Resolved It Everywhere Else.
Architecture Comparisons #61 | Article #319
Previous in series: QIS vs ServiceNow | QIS vs Zendesk | QIS vs Freshdesk
It is 3am. Your phone fires. You unlock the screen, squint at the PagerDuty notification, and read the alert class you have been dreading: Kubernetes OOMKilled cascade, three pods down, memory pressure spreading across the node. You acknowledge the page, pull up the runbook, and start digging.
At the same moment, across the 19,999 other companies using PagerDuty, engineers are not getting paged for this. They already resolved it — last month, last quarter, last year. Eight hundred and forty-seven of them saw the exact alert pattern you are staring at right now. They debugged it. They fixed it. They logged the retrospective. They closed the incident. Some of them found the fix in eleven minutes. Others took two hours, tried three wrong approaches first, and found the answer buried in a Slack thread that no longer exists.
Not one of those resolutions reaches you.
You start from scratch. Every time.
That gap — the space between the alert you received and the 847 validated resolutions that already exist in the network — is not a PagerDuty failure. PagerDuty did exactly what it was built to do. It routed the alert to the right person, on the right escalation path, with deduplicated noise and full incident lifecycle support. The problem is that routing alerts to engineers and routing validated resolutions to engineers are two fundamentally different architectural problems. One is solved. The other has never had a protocol.
What PagerDuty Does — and Does Exceptionally Well
Before describing what does not exist, it is worth being precise about what does. PagerDuty is genuinely good at the problem it was designed to solve, and understanding its strengths clearly is the only way to understand where the architectural boundary sits.
PagerDuty's core function is alert routing and escalation management. When a monitoring system fires — whether that is Datadog, Prometheus, New Relic, CloudWatch, or any of its 700+ integrations — PagerDuty receives the signal, applies deduplication logic to suppress noise, and routes the alert to the correct on-call engineer according to the team's escalation policy. If the first responder does not acknowledge within a defined window, it escalates. If the incident is critical, it can alert multiple people simultaneously. The routing layer is fast, configurable, and well-tested across more than 20,000 customer organizations.
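To make the mechanics concrete, here is a minimal sketch of that dedupe-then-escalate pattern in Python. It illustrates the general shape only; it is not PagerDuty's implementation or API.

```python
# A minimal sketch of the dedupe-then-escalate pattern -- illustrative only,
# not PagerDuty's implementation or API.
import time
from dataclasses import dataclass

@dataclass
class EscalationPolicy:
    levels: list[list[str]]   # responders per level, e.g. [["alice"], ["bob", "carol"]]
    ack_timeout_s: int = 300  # escalate if no acknowledgment within this window

class AlertRouter:
    def __init__(self) -> None:
        self.open_alerts: dict[str, float] = {}  # dedup_key -> first-seen time

    def receive(self, dedup_key: str, policy: EscalationPolicy) -> None:
        if dedup_key in self.open_alerts:
            return  # duplicate signal: suppress the noise, do not re-page
        self.open_alerts[dedup_key] = time.time()
        for level in policy.levels:  # walk the escalation path until acknowledged
            if self.page_and_wait(level, dedup_key, policy.ack_timeout_s):
                return

    def page_and_wait(self, responders: list[str], key: str, timeout_s: int) -> bool:
        # Stand-in for notifying responders and waiting for an acknowledgment.
        print(f"paging {responders} for {key}; escalating in {timeout_s}s if unacked")
        return False
```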
The incident lifecycle management layer handles everything after the page fires: acknowledgment, status updates, cross-team coordination, stakeholder communication, and post-incident retrospectives. PagerDuty's timeline view gives incident commanders a clean record of who did what and when. The retrospective tools feed structured data back into the system. That structured history matters — it is the foundation of PagerDuty's AIOps capabilities.
PagerDuty's intelligent alert grouping and service dependency mapping are meaningfully useful. When an upstream database goes down and triggers 400 downstream alerts across 30 services, PagerDuty can correlate those alerts into a single incident, surface the likely root cause service, and suppress the downstream noise so the on-call engineer is not buried in 400 individual pages. Service graphs give teams visibility into how their infrastructure is connected, which makes blast radius assessment faster.
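The correlation step can be pictured with a small sketch: given a service dependency graph and the set of currently firing services, the root-cause candidates are the firing services with no firing upstream dependency. This is a simplification for illustration, not PagerDuty's actual grouping algorithm.

```python
# Illustrative dependency-based correlation, not PagerDuty's actual algorithm:
# collapse an alert storm to the most-upstream firing services and treat
# everything downstream of them as suppressible noise.
def root_cause_candidates(firing: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """A firing service is a root-cause candidate when nothing it depends on
    is also firing."""
    return {
        svc for svc in firing
        if not any(dep in firing for dep in depends_on.get(svc, set()))
    }

# Example: web and api both page, but both sit downstream of the failing db.
deps = {"web": {"api"}, "api": {"db"}, "db": set()}
print(root_cause_candidates({"web", "api", "db"}, deps))  # {'db'}
```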
The AIOps feature layer — similar incidents, anomaly detection, alert correlation — learns from your incident history. If your team has resolved this class of alert before, PagerDuty will surface those past incidents when the same pattern recurs. It can tell you: You've seen something like this 12 times in the past 90 days. Here are the 12 incident timelines. That is genuinely useful. The average MTTR reduction attributed to PagerDuty implementations runs between 20% and 50% depending on team maturity and alert volume. On-call fatigue reduction is real. SLA compliance improvements are measurable.
PagerDuty is excellent infrastructure. Twenty thousand customer organizations is not a vanity number. It represents the actual surface area of incident resolution data flowing through one ecosystem — and it sets up a very specific mathematical problem.
The Boundary PagerDuty Cannot Cross
PagerDuty's AIOps capability learns from your incident history. That word — your — is the precise location of the architectural boundary.
When PagerDuty surfaces similar past incidents, it is searching within your organization's data. When it detects anomalies, it is modeling against your historical baselines. When it correlates alerts, it is working from your service topology. This is not a design oversight. It reflects a correct architectural choice given the problem PagerDuty was built to solve: incident management within an organization.
But consider what that boundary means at network scale.
Twenty thousand teams using PagerDuty. Each team generating incident data. Each team resolving incidents — debugging Kubernetes memory issues, tracing cascading database failures, identifying misconfigured load balancers, catching subtle race conditions in distributed caches. The sum of validated operational knowledge accumulating across that network is enormous. Every resolved incident is a data point: here is the alert class, here is the service topology, here is the failure pattern, here is what fixed it, here is how long it took.
None of that knowledge travels between organizations. Ever.
The reason is architectural, not aspirational. Routing validated resolutions between organizations would require doing several things that PagerDuty is not designed to do and that no incident management tool has been designed to do:
First, it would require distilling resolution outcomes into privacy-safe packets — structured summaries that capture the operational insight of a resolution without containing raw logs, proprietary system names, company-identifiable infrastructure details, or sensitive operational data. This is a protocol design problem, not a feature addition.
Second, it would require routing those packets by semantic similarity of the alert class — not by organization, not by account, but by the structural fingerprint of the problem itself. Two companies running completely different infrastructure might encounter the same failure pattern described in different monitoring systems with different labels. The resolution is the same. Matching on semantic similarity across organizational boundaries requires a different kind of addressing system.
Third, it would require enabling cross-organization synthesis without centralizing raw incident data. No organization will share raw incident logs, service topology details, or infrastructure specifics with a central aggregator. The protocol has to make that sharing unnecessary — extracting only the operational insight while leaving the raw data where it belongs.
This is not a roadmap limitation. The problem requires a protocol layer that operates beneath incident management tools, defining how operational knowledge gets encoded, addressed, routed, and retrieved across organizational boundaries without a central data store.
That protocol has not existed. Until the architecture for it was discovered.
The Math Behind the Gap
The scale of idle synthesis potential becomes concrete when you apply the combinatorics.
With 20,000 organizations generating and resolving incidents on a single platform, the number of potential knowledge-sharing pairs is:
N(N-1)/2 = 20,000 × 19,999 / 2 = 199,990,000 synthesis pairs
Nearly 200 million potential connections between organizations that share overlapping operational knowledge and could benefit from each other's validated resolutions. Every one of those pairs is idle. Not because anyone chose to idle them — because the routing layer does not exist.
The math scales down to familiar numbers:
- 1,000 teams: 1,000 × 999 / 2 = 499,500 idle synthesis pairs
- 500 teams: 500 × 499 / 2 = 124,750 idle synthesis pairs
- 100 teams: 100 × 99 / 2 = 4,950 idle synthesis pairs
Even at 100 teams — a modest SaaS ecosystem or cloud provider partner network — nearly 5,000 potential resolution-sharing relationships exist that never activate. When the Kubernetes OOMKilled cascade hits team 47, the 23 other teams in that network who resolved the same pattern last quarter have no mechanism to share what they learned.
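A few lines of Python confirm the figures:

```python
# Sanity-checking the pair counts above: unordered pairs among N organizations.
def synthesis_pairs(n: int) -> int:
    return n * (n - 1) // 2

for n in (100, 500, 1_000, 20_000):
    print(f"{n:>6,} teams -> {synthesis_pairs(n):>11,} idle pairs")
# 100 -> 4,950 | 500 -> 124,750 | 1,000 -> 499,500 | 20,000 -> 199,990,000
```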
This is the precise cost of the missing layer. Not hypothetical. Nearly 200 million synthesis pairs, multiplied by the cost difference between an incident that takes 90 minutes to resolve and one that takes 12, across 20,000 organizations, compounding year over year.
What QIS Adds
Christopher Thomas Trevethan's discovery of the Quadratic Intelligence Swarm (QIS) protocol fills this layer — not by replacing incident management, not by centralizing operational data, but by adding the outcome routing architecture that currently does not exist between incident management tools.
The QIS protocol works through a complete architectural loop. Understanding it requires following the full cycle, not any single component in isolation.
When an incident closes, a QIS-connected edge node performs a single operation: distillation. The resolved incident is summarized into a structured packet of approximately 512 bytes. The packet contains:
- Alert class: the category of failure, not the raw alert text
- Service topology signature: the structural pattern of what was affected, not the service names
- Failure pattern: the root cause category
- Remediation category: the class of fix, not the specific commands run
- Time-to-resolution: the operational measurement
- Confidence measure: derived from resolution clarity and retrospective completeness
What the packet does not contain: raw logs, company-identifiable data, proprietary system names, infrastructure specifics, personnel information, or any data traceable to the originating organization. The packet is the operational insight, stripped of everything that makes it sensitive.
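As a concrete illustration, a packet along these lines might look like the sketch below. The field names and the JSON encoding are assumptions made for readability; the QIS specification, not this sketch, defines the actual wire format.

```python
# Illustrative packet shape for the distillation step. Field names, types,
# and the JSON encoding are assumptions for readability -- not the QIS wire
# format, which the protocol specification defines.
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ResolutionPacket:
    alert_class: str           # failure category, never the raw alert text
    topology_signature: str    # structural pattern of what was affected, no service names
    failure_pattern: str       # root-cause category
    remediation_category: str  # class of fix, never the specific commands run
    ttr_minutes: int           # time-to-resolution
    confidence: float          # from resolution clarity and retro completeness

    def encode(self) -> bytes:
        raw = json.dumps(asdict(self), separators=(",", ":")).encode()
        assert len(raw) <= 512, "packet must stay within the ~512-byte budget"
        return raw

pkt = ResolutionPacket("k8s-oomkill-cascade", "stateless-replicaset-3x",
                       "memory-limit-undersized", "raise-limits-and-add-hpa",
                       ttr_minutes=23, confidence=0.86)
print(len(pkt.encode()), "bytes")  # comfortably under 512
```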
A semantic fingerprint is generated from the packet: a vector encoding of the problem class, derived from the combination of alert category, service type, and failure signature. This fingerprint determines the packet's address in the protocol. Two engineers at two different companies, facing structurally identical failure patterns described in completely different terms, will generate similar fingerprints and be routed to overlapping address space.
The packet is deposited at the deterministic address for that problem class. Any routing mechanism that can post a packet to a deterministic address and retrieve packets by problem similarity qualifies — approaches such as DHT-based addressing, semantic database indexing, or vector similarity search can all implement this layer. The protocol defines the packet structure and the addressing scheme, not the underlying transport.
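One minimal way to picture deterministic addressing is a hash over canonicalized problem-class fields, as in the sketch below. A real semantic fingerprint would be a vector embedding with nearest-neighbor retrieval, so that similar rather than merely identical problem classes land in overlapping address space; the hash version captures only the determinism.

```python
# One way to realize deterministic addressing: hash the canonicalized
# problem-class fields. A real semantic fingerprint would be a vector
# embedding with nearest-neighbor lookup, so that similar -- not merely
# identical -- problem classes map to overlapping address space; this
# hash-based sketch captures only the determinism.
import hashlib

def address_for(alert_class: str, service_type: str, failure_signature: str) -> str:
    canonical = "|".join(s.strip().lower()
                         for s in (alert_class, service_type, failure_signature))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Two teams that normalize to the same problem class derive the same address,
# regardless of what their own monitoring systems call the alert.
print(address_for("k8s-oomkill-cascade", "stateless-replicaset",
                  "memory-limit-undersized"))
```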
When the 3am engineer acknowledges the page and queries the protocol, the retrieval path is the inverse of the deposit path: generate the semantic fingerprint for the current alert class, query that address, retrieve what the network has learned.
The response comes back structured: For alerts of this class on this service topology, the remediation category with the highest validated resolution rate across 847 similar incidents in the network is X, with 73% first-attempt success rate and median 23-minute resolution time. The second-ranked remediation category appears in 19% of resolutions with a 54% first-attempt success rate and median 41-minute resolution time.
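The synthesis itself is a local aggregation over the retrieved packets. The sketch below assumes an extra success_first_try outcome field to show where figures like the first-attempt success rate could come from.

```python
# Sketch of the local synthesis step: aggregate packets retrieved at one
# address into ranked remediation categories. The success_first_try field is
# an assumed extra outcome field, included to show where the success-rate
# figures in a response like the one above could come from.
from collections import defaultdict
from statistics import median

def synthesize(packets: list[dict]) -> list[dict]:
    by_fix: dict[str, list[dict]] = defaultdict(list)
    for p in packets:
        by_fix[p["remediation_category"]].append(p)
    ranked = [{
        "remediation_category": fix,
        "share": len(group) / len(packets),
        "first_attempt_success": sum(p["success_first_try"] for p in group) / len(group),
        "median_ttr_minutes": median(p["ttr_minutes"] for p in group),
    } for fix, group in by_fix.items()]
    return sorted(ranked, key=lambda r: r["share"], reverse=True)
```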
No raw incident data crossed an organizational boundary. No central aggregator holds anyone's logs. The engineer did not receive raw incident reports from 847 other companies. They received the distilled operational knowledge of 847 resolutions, encoded in the packet structure the protocol defines, routed to the address that the semantic fingerprint of their current problem identifies.
The complete loop — distillation, fingerprinting, addressing, routing, retrieval, synthesis — is the architecture. The breakthrough is not any single component. Each component existed before. The discovery is that when you close this loop — when you route pre-distilled operational outcomes by semantic similarity instead of centralizing raw incident data — knowledge scales quadratically while compute scales logarithmically. Intelligence compounds as the network grows. Every resolved incident makes the next incident's resolution faster, for every team with a similar alert pattern, without any team having to share data they are unwilling to share.
The Integration Point
PagerDuty routes alerts to people. QIS routes validated resolutions to teams. These are complementary functions at different layers of the stack. Neither makes the other redundant.
The integration is a single connection: when PagerDuty records a resolved incident, a lightweight QIS client receives the outcome hook, distills the resolution into the ~512-byte packet, generates the semantic fingerprint, and deposits the packet at the protocol address for that problem class. The operation is asynchronous. It adds no latency to incident resolution. It requires no changes to how engineers work during an active incident.
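A minimal version of that hook might look like the following sketch. The incident.resolved event type follows PagerDuty's published v3 webhook format, but the endpoint path and the distill, address_for_packet, and deposit helpers are hypothetical stand-ins for a QIS client library.

```python
# Sketch of the outcome hook as a small edge service. The incident.resolved
# event type follows PagerDuty's published v3 webhook format (verify against
# current docs); the endpoint path and the distill/address_for_packet/deposit
# helpers are hypothetical stand-ins for a QIS client library.
from flask import Flask, request

app = Flask(__name__)

@app.post("/qis/outcome-hook")
def outcome_hook():
    event = request.get_json(force=True).get("event", {})
    if event.get("event_type") != "incident.resolved":
        return "", 204                       # ignore everything but resolutions
    packet = distill(event["data"])          # -> ~512-byte privacy-safe packet
    addr = address_for_packet(packet)        # semantic fingerprint -> address
    deposit(addr, packet)                    # queued asynchronously in practice
    return "", 202
```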
The deposit takes milliseconds. The impact is cumulative. Every resolved incident strengthens the network's knowledge at that address. Every subsequent engineer who queries that address retrieves a more refined synthesis of what the network has learned.
On the retrieval side, the integration surfaces during the acknowledgment phase — the first moments after a page fires, when an engineer is orienting to the problem. Before diving into runbooks or starting fresh debugging, a QIS query returns what the broader network knows about this alert class. The engineer still uses PagerDuty's escalation path, deduplication logic, on-call management, and incident timeline. QIS means they do not start from scratch. The first hypothesis they test is the one that has resolved 73% of similar incidents in 23 minutes.
The 3am experience changes materially. Not because the alert stops firing. Not because incident management becomes automated. Because the engineer is no longer isolated from the operational knowledge of every team that debugged the same pattern before them.
Who Builds This Layer
The QIS routing layer is not built by PagerDuty. It is not built by any incident management company, because a protocol owned by any vendor cannot function as neutral infrastructure for the organizations that vendor also sells to.
Protocol layers require independence. TCP/IP was not built by a telephone company. DNS was not built by a browser vendor. The resolution-routing layer cannot be built by an incident management company because the first requirement of the protocol is neutrality — it routes operational knowledge between organizations that are also competitors, also customers of the same vendors, also operating under different contractual and regulatory regimes. None of them will trust a competitor's infrastructure as the routing layer for their operational knowledge.
The protocol is open architecture: specified formally, implementable by any team, running at the edge of any incident management deployment. Christopher Thomas Trevethan's discovery covers the complete loop — the distillation mechanism, the semantic fingerprinting, the deterministic address generation, the routing protocol, and the local synthesis layer. Thirty-nine provisional patents have been filed. The routing layer is transport-agnostic by design: DHT-based routing is one strong option (O(log N) or better cost, fully decentralized, no single point of failure), but the quadratic scaling comes from the complete architectural loop, not from any specific routing implementation.
The Number That Stays With You
Twenty thousand teams. 199,990,000 idle synthesis pairs.
Every time your on-call engineer pages in at 3am and starts debugging from scratch, somewhere in that network, 847 engineers already resolved the same pattern. The intelligence that could cut your mean time to resolution is distributed across 19,999 other teams who already solved your problem. It never reaches you because the routing layer does not exist.
QIS is that layer.
PagerDuty routes the alert to the right engineer. QIS routes the validated resolution from every engineer who already fixed it. Both layers are necessary. Only one exists today.
Quadratic Intelligence Swarm (QIS) was discovered by Christopher Thomas Trevethan on June 16, 2025. The breakthrough is the complete architecture — the loop that enables real-time quadratic intelligence scaling without quadratic compute cost. Thirty-nine provisional patents filed. QIS is free for humanitarian, research, and educational use. For protocol documentation: qisprotocol.com.
Patent Pending