Incident management that doesn't assume you have an incident commander.
A timeline per incident, a status page that updates itself, and acknowledgements that work from your inbox at 11pm. Built for teams where the person on call is also the one answering tickets.
No war room to staff. No second tool to wire up to your status page. The incident is the status update.
API latency elevated for some customers
Investigating
Timeline
We're seeing elevated API latency for some customers and are looking into it. We'll post an update here as we learn more.
Priya
Looks correlated with this morning's deploy — pulling the slow-query logs now.
Sam
You don't need a war room. You need fewer tabs.
Here's how an incident actually goes on a team your size. Something looks off. Someone posts in the Slack channel — "is the dashboard slow for anyone else?" An hour in, a customer emails to ask. Two of you are now looking at logs. Somebody says "should we update the status page?" and nobody does, because the status page is a different tool nobody has open, and the person who'd update it is the same person reading the logs.
The incident gets sorted. The status page still says "all systems operational". The customer who emailed never hears back about the thing they noticed. There was no war room — there was you, a Slack thread, and three tabs you were switching between at 11pm. The tool you need isn't one that choreographs a response team you don't have. It's one that means you're not the integration layer between your logs, your customers, and your status page.
A timeline, not a war room.
An incident moves through four states, and that's the whole flow. No phases to configure, no roles to assign before you can start.
Something's wrong and you're figuring out what. The default starting state.
You know the cause. You may not have fixed it yet, but you can tell customers what's going on.
The fix is in; you're watching to be sure. This state doesn't count against your uptime — a fix you're verifying isn't an outage.
Done. The incident's resolved time is set the moment you post the resolving update.
Severity, and what your customers see.
Severity is yours to set internally — but it also decides what your customers see on the status page, automatically. Four severity levels, four public states, mapped one-to-one so the customer-facing status is never a guess.
Two outage states, because "down for everyone" and "broken for some" aren't the same promise: a partial outage (orange) means some customers are affected; a major outage (red) means the service is down. Add degraded performance (amber) and operational (green) and that's the whole key. You set the severity once; the public label follows.
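If it helps to see the one-to-one mapping written down, here's a minimal sketch in Python. It isn't StayUpfront's code or API. The four public states and their colours come from the key above; the internal severity names and which one maps to which public state are assumptions made up for illustration.

```python
from enum import Enum


class PublicStatus(Enum):
    """The four public states and colours from the status-page key."""
    OPERATIONAL = ("Operational", "green")
    DEGRADED = ("Degraded performance", "amber")
    PARTIAL_OUTAGE = ("Partial outage", "orange")
    MAJOR_OUTAGE = ("Major outage", "red")

    @property
    def label(self) -> str:
        return self.value[0]

    @property
    def colour(self) -> str:
        return self.value[1]


# ASSUMPTION: these severity names, and the pairing with each public
# state, are hypothetical. The point is only that the mapping is a
# fixed one-to-one lookup, so the public label is never typed by hand.
SEVERITY_TO_PUBLIC = {
    "info": PublicStatus.OPERATIONAL,
    "minor": PublicStatus.DEGRADED,
    "major": PublicStatus.PARTIAL_OUTAGE,
    "critical": PublicStatus.MAJOR_OUTAGE,
}


def public_status(severity: str) -> PublicStatus:
    """Look up the customer-facing state for an internal severity."""
    return SEVERITY_TO_PUBLIC[severity]
```

You'd call `public_status("major")` once when the severity is set, and everything customer-facing reads from the result.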
Set the severity. Your status page already knows.
This is the part most tools make you do twice. You manage the incident in one place, then re-type it into a separate status-page tool so customers see it. Two surfaces, kept in sync by hand, by the same person who's mid-incident.
In StayUpfront there's no second surface to keep in sync. The status page is a view of the incident, not a copy of it. You set the severity and pick the affected components, and the public status page reflects it the moment you post: the right components flip to the matching public state (degraded performance, partial outage, or major outage), the active incident shows up at the top of the portal, and your ninety-day uptime history records it. When you move to resolved, the components clear themselves.
Your internal severity stays internal. What customers see is the plain-English version: this component is degraded, here's the incident, here's the latest update. You decide what's public per update — without leaving the incident.
The status page is a derivative. Severity, component status, public label — automatically. No second tool, no manual paste.
Resolve once. Move the incident to resolved and the affected components clear themselves. No "remember to flip the status page back".
Public is a choice, per update. Each update can be public or internal. Customers see the wording you chose — not your internal severity label.
API latency elevated for some customers
Monitoring
We've rolled back the slow query and load times have recovered. We're watching to be sure before we resolve.
Updated
One incident. The left is what you set; the right is what your customers see. You only touch the left.
Incidents start themselves. Or you start them.
Most incidents you'd rather not find out about from a customer. So your monitors create them for you. When an uptime check fails, a scheduled job misses its heartbeat, or a domain check spots an expiring certificate or a DNS change you didn't make, StayUpfront opens the incident automatically — already linked back to the monitor that raised it, so the same alert manager that opened it can resolve it when the monitor recovers.
And when you spot something a monitor can't — a customer-reported bug, a degraded third-party dependency, a planned change going sideways — you declare the incident yourself in a couple of fields. Either way it's the same incident, with the same timeline, the same severity-to-status mapping, and the same public surface.
Auto-created from monitors. Uptime checks, heartbeat monitors, and domain checks (which cover DNS and certificate expiry) open incidents on their own — tagged as automated, linked to their source.
Or declared by hand. Something a monitor can't see? Open an incident in a couple of fields. Same timeline, same public surface.
Heartbeat overdue: nightly-export job
Investigating
Resolves automatically when the monitor recovers.
One timeline. Internal notes and public updates, in order.
Every incident is a single timeline, posted in order. Some entries are public — the updates your customers read on the status page. Some are internal notes for the team, marked with a lock and never shown publicly. They live in the same thread, so the team context and the customer-facing story aren't in two different places.
The timeline is the record. When you change an incident — its title, severity, status, which components are affected — the change is logged into the timeline automatically as an internal note, so the audit trail captures what changed and when, not just the words you typed. You can correct a timestamp, but it's validated to stay in order, and the first update can't be deleted. The history holds.
Sue — the AI inside StayUpfront — drafts the update from what's already on the incident, so you're editing a sentence instead of starting from a blank box at 11pm. She drafts; you decide what ships. Nothing reaches a customer that you didn't send.
Timeline
We've rolled back the slow query and load times have recovered. We're watching to be sure before we resolve.
Priya
Slow query traced to this morning's deploy. Rollback running now — hold the public update until latency drops.
Sam
Severity changed from major to minor.
Priya acknowledged this incident.
Acknowledge from the email. At 11pm. Without opening anything.
When an incident pages you, the alert email has an acknowledge link in it. Tap it and you've acknowledged — from your phone, from your inbox, without logging into anything first. There's an in-app button too, for when you're already at your desk. Either way, the acknowledgement lands on the incident timeline so the rest of the team can see someone's got it.
And once you've acknowledged, the system stops escalating. If a voice call was queued to wake you, it checks first — if you've already acknowledged, it doesn't call. The point of an alert is to reach someone, not to keep punishing the person it already reached.
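The "check before it dials" behaviour is easy to picture as code. This is a minimal sketch under stated assumptions, not how StayUpfront is implemented: the `Incident` class, `acknowledge`, and `run_queued_voice_call` are hypothetical names, and `dial` stands in for whatever actually places the call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Incident:
    """Hypothetical incident record, just enough for the ack check."""
    title: str
    acknowledged_by: Optional[str] = None
    timeline: list = field(default_factory=list)


def acknowledge(incident: Incident, who: str) -> None:
    """Record the first acknowledgement and log it on the timeline."""
    if incident.acknowledged_by is None:
        incident.acknowledged_by = who
        incident.timeline.append(f"{who} acknowledged this incident.")


def run_queued_voice_call(incident: Incident, dial: Callable[[], None]) -> bool:
    """Run when a queued voice call comes due.

    The ack status is checked at dial time, not at queue time, so an
    acknowledgement that arrived in between cancels the call.
    Returns True if the call was placed.
    """
    if incident.acknowledged_by is not None:
        return False
    dial()
    return True
```

The design choice worth noticing is the late check: the call is queued unconditionally, and the decision to dial is made only when it's about to happen.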
Ack by email link. A tap on the alert email acknowledges the incident — no app, no login. There's an in-app button too.
Acknowledged means acknowledged. Escalation stops, and a queued voice call checks your ack status before it dials. Already on it? It won't call.
An incident on API needs an owner. Acknowledge to let the team know you're on it and stop the escalation.
Acknowledge
Works straight from this email. No app to open.
The ticket that reported it, right there on the incident.
On most teams your size, the person on call is also the person answering support. So when a customer raises a ticket about the thing that's breaking, you shouldn't have to play detective across two tools to connect them.
In StayUpfront, incidents and support tickets link both ways. The incident's sidebar lists the tickets customers raised about it; each linked ticket shows the active incident right on the ticket. Declare an incident from a ticket and it pre-fills the severity, the visibility, and the affected components from what you already know. The agent picking up the ticket sees "this is a known incident, here's the status" instead of replying "have you tried a hard refresh?" to someone who's already read your status page.
Linked tickets
Dashboard slow to load this afternoon
Getting timeouts on the API since ~2pm
On-call, for when you grow into it.
You don't need an on-call rotation to use any of this. A founder and one engineer can run incidents perfectly well by both just getting the alert. But when you grow into wanting a rotation, it's here and it's real — not a checkbox.
Each team can have on-call schedules with weekly or daily rotations, a handoff day and time, and active hours so the rotation only pages someone when it's meant to. Need to swap a shift or cover a holiday? Add an override and it takes priority over the rotation. Your team dashboard shows who's on call right now across every schedule and a week-at-a-time timeline of who's got it next, and it flags gaps in red where nobody's covered, so you find the holes before an incident does. When an alert targets the team, it goes to whoever's on call; if it's off-hours and nobody is on call, it falls back to the whole team rather than alerting no one.
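The order of precedence described here (override first, then the rotation during active hours, then the whole team) can be sketched in a few lines of Python. Treat it as an illustration only: `Schedule`, `Override`, and `alert_targets` are hypothetical names, and the sketch assumes an override applies even outside active hours, which the description above doesn't spell out.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import List, Optional


@dataclass
class Override:
    start: datetime
    end: datetime
    person: str


@dataclass
class Schedule:
    rotation: List[str]              # people, in handoff order
    first_handoff: datetime          # when rotation[0] first takes over
    shift_days: int = 7              # 7 for weekly, 1 for daily
    active_from: time = time(0, 0)
    active_until: time = time(23, 59, 59)
    overrides: List[Override] = field(default_factory=list)

    def on_call(self, now: datetime) -> Optional[str]:
        # 1. An override takes priority over the rotation.
        #    ASSUMPTION: it applies regardless of active hours.
        for o in self.overrides:
            if o.start <= now < o.end:
                return o.person
        # 2. Outside active hours nobody is on call.
        if not (self.active_from <= now.time() <= self.active_until):
            return None
        # 3. Otherwise count whole shifts since the first handoff.
        shifts = (now - self.first_handoff).days // self.shift_days
        return self.rotation[shifts % len(self.rotation)]


def alert_targets(schedule: Schedule, team: List[str], now: datetime) -> List[str]:
    """Whoever is on call, or the whole team if nobody is."""
    person = schedule.on_call(now)
    return [person] if person else list(team)
```

The fallback lives in `alert_targets` rather than in the schedule, so a gap in coverage is still visible as a gap (`on_call` returns `None`) even though the alert itself always reaches someone.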
Slack threads, both ways.
Incident updates post to your Slack channel, and subsequent updates reply in the same thread — so the channel reads as one conversation, not a stream of disconnected pings. Replies in that Slack thread get pulled back and shown in the incident's sidebar, so the discussion that happened in Slack is part of the incident record, not lost in a channel nobody scrolls back through.
Big enough? Still fits.
Plenty of incident tools assume a platform team, a reliability function, and a formal incident-commander rotation before you can run a clean incident. This doesn't. It's built for the team where one person is the on-call engineer, the support agent, and the person posting the update — and it still holds up when that's three different people, or a whole team. The shape stays the same: declare it, work it, tell your customers, resolve it. You won't outgrow the way it works just because the team grew.
You'll know if this is you.
You're a founder and one engineer. "Incident management" sounds like a thing bigger teams do — but you're the one who notices the site's slow, fixes it, and should be telling customers. The incident, the status update, and the customer who reported it are one workflow here.
You're a small team tired of paying for three tools. A status-page tool, an on-call tool, and a support tool that don't know each other. Here the monitor opens the incident, the incident updates the status page, and the ticket that reported it is right there in the sidebar. One workspace, one bill.
Either way: incident management sized for the team you actually have.
Want this before your next incident? Get in early.
Private beta is a few weeks out, and I'm letting people in a small group at a time so I can actually work with each of you — what's breaking, what's missing, what to ship next. The earliest customers shape how this handles a real incident, because they're the ones running real incidents through it.
Drop your email
Direct email from Rob when your slot's ready. No drip sequence.