<p>Incidents are events that affect the availability or performance of your services. Recording an incident lets you communicate issues to users and track the resolution process with detailed updates.</p><h2>Incident Properties</h2><p>Each incident has the following properties:</p><ul><li><p><strong>Title</strong> - Clear description of the issue (e.g., "API latency affecting mobile app")</p></li><li><p><strong>Status Type</strong> - Severity level (degraded_performance, partial_outage, major_outage, incident)</p></li><li><p><strong>State</strong> - Current stage (active or resolved)</p></li><li><p><strong>Impact</strong> - How widely and severely users are affected (minor, major, critical)</p></li><li><p><strong>Affected Services</strong> - List of services impacted by this incident</p></li><li><p><strong>Description</strong> - Initial description of the incident</p></li><li><p><strong>Created At</strong> - When the incident was first reported</p></li><li><p><strong>Resolved At</strong> - When the incident was marked as resolved (if applicable)</p></li></ul><h2>Creating an Incident</h2><p>When an issue occurs, create an incident to communicate with users:</p><ol><li><p><strong>Identify the Issue</strong> - Determine which services are affected and how severe the issue is</p></li><li><p><strong>Create the Incident</strong> - Add it to your status page with a clear title</p></li><li><p><strong>Set the Status Type</strong> - Choose the appropriate severity (partial_outage, major_outage, etc.)</p></li><li><p><strong>Mark Affected Services</strong> - Select all services impacted by this incident</p></li><li><p><strong>Add Initial Description</strong> - Provide initial details about what's happening</p></li><li><p><strong>Publish</strong> - Make the incident visible to users on your status page</p></li></ol><h2>Incident Lifecycle</h2><p>An incident moves through four states:</p><h3>1. Investigating</h3><p>The initial state when an incident is first created. 
The team is gathering information and assessing the impact.</p><h3>2. Identified</h3><p>The root cause has been identified and a fix is being prepared. This provides users with confidence that the issue is understood.</p><h3>3. Monitoring</h3><p>A fix has been deployed and the team is monitoring for resolution. Services should be improving at this stage.</p><h3>4. Resolved</h3><p>The incident has been fixed and services have been restored to normal operation. Users can resume using affected services.</p><h2>Incident Updates</h2><p>Each incident can have multiple updates that track the resolution progress:</p><ul><li><p><strong>Initial Update</strong> - Created automatically when the incident is created</p></li><li><p><strong>Status Updates</strong> - Add updates when the incident state changes</p></li><li><p><strong>Progress Updates</strong> - Communicate progress even when state doesn't change</p></li><li><p><strong>Resolution Update</strong> - Final update when the incident is resolved</p></li></ul><h2>Updating an Incident</h2><p>As you work on resolving an incident, update it regularly:</p><ol><li><p><strong>Change State</strong> - Update to "Identified", "Monitoring", or "Resolved"</p></li><li><p><strong>Add Message</strong> - Describe what's new (e.g., "Deploying fix now")</p></li><li><p><strong>Update Affected Services</strong> - Add or remove services as needed</p></li><li><p><strong>Save Update</strong> - Each update creates a timeline entry</p></li></ol><h2>Impact Levels</h2><h3>Minor Impact</h3><ul><li><p>Affects a small subset of users</p></li><li><p>Service degradation is minimal</p></li><li><p>Workarounds may be available</p></li><li><p>Example: Mobile app shows error for 5% of users</p></li></ul><h3>Major Impact</h3><ul><li><p>Affects many users</p></li><li><p>Significant service degradation</p></li><li><p>Core functionality impaired</p></li><li><p>Example: API response times are 3x normal</p></li></ul><h3>Critical Impact</h3><ul><li><p>Affects most 
or all users</p></li><li><p>Services are completely unavailable</p></li><li><p>No workarounds available</p></li><li><p>Example: Entire API is down</p></li></ul><h2>Affected Services</h2><p>Mark which services are impacted by each incident:</p><ul><li><p><strong>Multi-select</strong> - Choose multiple affected services</p></li><li><p><strong>Dynamic Updates</strong> - Update affected services as you learn more</p></li><li><p><strong>Service Status</strong> - Affected services automatically show outage status</p></li><li><p><strong>Visual Indicators</strong> - Users can see which services are impacted</p></li></ul><h2>Incident History</h2><p>Resolved incidents are displayed in the "Past Incidents" section of your status page:</p><ul><li><p><strong>Title and Date</strong> - Shows what happened and when</p></li><li><p><strong>Duration</strong> - Displays how long the incident lasted</p></li><li><p><strong>Impact Badge</strong> - Shows severity level</p></li><li><p><strong>Resolved Badge</strong> - Indicates the incident is resolved</p></li><li><p><strong>Clickable</strong> - Users can click to view full incident details</p></li></ul><h2>Best Practices</h2><ul><li><p>Create incidents promptly when issues are discovered</p></li><li><p>Update incidents at least every 30 minutes during active resolution</p></li><li><p>Use clear, non-technical language in incident descriptions</p></li><li><p>Always mark incidents as resolved when services are fully restored</p></li><li><p>Conduct post-incident reviews to prevent recurrence</p></li><li><p>Keep incident titles concise but informative</p></li><li><p>Include affected services to help users understand scope</p></li></ul><h2>Example Incident Timeline</h2><ol><li><p><strong>10:00 AM</strong> - Incident created: "API latency affecting mobile app" (Investigating)</p></li><li><p><strong>10:15 AM</strong> - Update: "We're investigating increased response times on our API servers"</p></li><li><p><strong>10:30 AM</strong> - State change 
to "Identified": "Database load caused by recent deployment"</p></li><li><p><strong>10:45 AM</strong> - Update: "Rolling back problematic changes"</p></li><li><p><strong>11:00 AM</strong> - State change to "Monitoring": "Rollback complete. Monitoring API performance"</p></li><li><p><strong>11:15 AM</strong> - State change to "Resolved": "API performance back to normal"</p></li></ol>
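<p>The example timeline above can be sketched in code to show how updates, state changes, and duration relate to one another. This is a minimal illustration, not the product's API: the <code>Incident</code> and <code>IncidentUpdate</code> classes, their field names, and the <code>add_update</code> and <code>duration</code> methods are hypothetical. The status type, impact level, affected services, and calendar date used in the example are also assumptions, since the timeline gives only times and messages.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

# Lifecycle states in the order described in "Incident Lifecycle".
LIFECYCLE = ("investigating", "identified", "monitoring", "resolved")


@dataclass
class IncidentUpdate:
    """One timeline entry (hypothetical shape)."""
    at: datetime
    state: str
    message: str


@dataclass
class Incident:
    """Hypothetical in-memory model; fields mirror "Incident Properties"."""
    title: str
    status_type: str               # e.g. "degraded_performance"
    impact: str                    # "minor", "major", or "critical"
    affected_services: List[str]
    description: str
    created_at: datetime
    state: str = "investigating"
    resolved_at: Optional[datetime] = None
    updates: List[IncidentUpdate] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The initial update is created together with the incident.
        self.updates.append(
            IncidentUpdate(self.created_at, self.state, self.description)
        )

    def add_update(self, at: datetime, message: str,
                   state: Optional[str] = None) -> None:
        """Append a timeline entry; pass `state` only when the state changes."""
        new_state = state if state is not None else self.state
        if new_state not in LIFECYCLE:
            raise ValueError(f"unknown state: {new_state!r}")
        self.state = new_state
        if new_state == "resolved":
            self.resolved_at = at
        self.updates.append(IncidentUpdate(at, new_state, message))

    def duration(self) -> Optional[timedelta]:
        """Time from creation to resolution, or None while still open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.created_at


def build_example() -> Incident:
    """Replay the example timeline; the date itself is arbitrary."""
    day = datetime(2024, 1, 1)

    def t(hour: int, minute: int) -> datetime:
        return day.replace(hour=hour, minute=minute)

    incident = Incident(
        title="API latency affecting mobile app",
        status_type="degraded_performance",     # assumed for illustration
        impact="major",                         # assumed for illustration
        affected_services=["API", "Mobile app"],  # assumed for illustration
        description="API latency affecting mobile app",
        created_at=t(10, 0),
    )
    incident.add_update(t(10, 15), "We're investigating increased response "
                                   "times on our API servers")
    incident.add_update(t(10, 30), "Database load caused by recent deployment",
                        state="identified")
    incident.add_update(t(10, 45), "Rolling back problematic changes")
    incident.add_update(t(11, 0), "Rollback complete. Monitoring API "
                                  "performance", state="monitoring")
    incident.add_update(t(11, 15), "API performance back to normal",
                        state="resolved")
    return incident


if __name__ == "__main__":
    incident = build_example()
    for u in incident.updates:
        print(f"{u.at:%I:%M %p}  [{u.state}]  {u.message}")
    print("Duration:", incident.duration())
```

<p>In this sketch, progress updates leave the state unchanged, state changes add a timeline entry with the new state, and the duration shown in the incident history is the gap between creation and resolution (here, 10:00 AM to 11:15 AM).</p>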