What's New

New updates and improvements to College Aid Index


Production Incident - site unavailable for approximately 52 hours.

Fix
Incident Summary
On Wed 11/26/2025 I made a classic mistake: I rushed a code change before leaving for Thanksgiving break, and it broke the site. Monitoring wasn’t in place, so I only learned of the outage from user reports. I’m sorry for the downtime and frustration.

Impact
  • Site unavailable for approximately 52 hours (11/26–11/28).
  • Users who contacted me directly couldn’t access reports.
Timeline
  • Wed 11/26: Last-minute change deployed; no alerts raised.
  • Thu 11/27: Outage ongoing; discovered via user reports.
  • Fri 11/28: Returned, found the bad change, deployed fix; site restored.
Root Cause
  • Rushed, unreviewed code change introduced a breaking error.
  • No monitoring to surface the outage quickly.
Resolution
  • Fixed the faulty change and redeployed; site is serving traffic now.
Action Items
  1. No more last-minute deploys without review or a rollback plan.
  2. Add basic monitoring/alerts so outages are caught immediately.
  3. Improve test coverage to catch issues before deployment.
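For action item 2, a minimal sketch of what basic monitoring could look like is shown here. It is only an illustration under stated assumptions: the health URL, the failure threshold, the check interval, and all function names are hypothetical and not taken from the site's actual setup, and `notify` defaults to `print` where a real deployment would send an email or a chat message.

```python
import time
import urllib.request

# Hypothetical values for illustration; not the site's real configuration.
HEALTH_URL = "https://example.com/health"
FAILURE_THRESHOLD = 3  # consecutive failed checks before alerting


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx), and socket timeouts are all OSError subclasses.
        return False


def next_failure_count(previous: int, healthy: bool) -> int:
    """Reset the counter on success, otherwise count one more failure."""
    return 0 if healthy else previous + 1


def should_alert(failures: int, threshold: int = FAILURE_THRESHOLD) -> bool:
    """Alert once, on the check that reaches the threshold, to avoid repeat alerts."""
    return failures == threshold


def monitor(url, interval_seconds=60.0, check=is_healthy, notify=print, max_checks=None):
    """Poll the URL and call notify() after FAILURE_THRESHOLD failures in a row.

    max_checks=None runs until interrupted; a number stops after that many checks.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        failures = next_failure_count(failures, check(url))
        if should_alert(failures):
            notify(f"ALERT: {url} failed {failures} checks in a row")
        checks += 1
        time.sleep(interval_seconds)
```

Calling `monitor(HEALTH_URL)` from a machine other than the web server would poll once a minute. Requiring several consecutive failures before alerting is a design choice that trades a few minutes of detection delay for fewer false alarms from a single slow response.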
Contact
If you run into an outage or any other issue, email me at [email protected].

Context
This is an early, part-time project, but availability comes first. I’ll prioritize stability over new features when time is limited.