Incident Summary
From the afternoon of March 14 through the early morning of March 15, 2025 (roughly 15 hours from first report to full restoration), The Luc Cochran Foundation experienced a widespread partial outage affecting its internal and external VOIP phone systems. Departments reported failed call connections, dropped internal transfers, unavailable voicemail, and an unstable inbound helpline.
Though live web chat and staff email channels remained functional for most users, the disruption significantly impacted the Foundation’s ability to conduct scheduled tutoring support, field donor communications, and assist with volunteer coordination via phone.
The core issue was a routing loop in our VOIP infrastructure that affected the communication bridge connecting office endpoints across Metro Atlanta. The loop caused severe delays in SIP signaling and ultimately blocked call completion on the majority of extensions.
Timeline of Events
March 14, 2025
3:01 PM EDT – Initial incident reports received from Academic Allies and HR departments regarding call drops.
3:22 PM – Internal traffic monitors detect latency spikes and intermittent signaling failures.
3:36 PM – Incident formally declared and logged by IT Support.
4:10 PM – Failover procedures begin on secondary SIP routes; partial call traffic is restored.
5:50 PM – System-wide notice issued on the official Status Page. Investigation continues overnight.
March 15, 2025
2:30 AM – Engineering traces confirm persistent loop conditions between Bridge ATL-3 and Routing Pool 1B.
3:15 AM – Routing table scrub initiated to isolate and resolve the affected bridge loop.
4:05 AM – Voicemail systems restored after SIP stabilization on redundant trunks.
5:48 AM – All inbound/outbound call channels validated across all primary locations.
6:20 AM – VOIP system considered fully restored and stable. Final status updated.
Root Cause
A recently deployed VOIP routing policy, intended to unify call flow between offices, unintentionally introduced a loop between Bridge ATL-3 and Routing Pool 1B. Call handoff requests were passed back and forth between the two without ever resolving, which congested the bridge queues, slowed SIP responses, and eventually caused packet timeouts.
Automated failover did not trigger within the intended threshold because of TTL misconfigurations and inaccurate bridge health reporting.
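To illustrate the kind of loop described above, the following is a minimal sketch of a check that could be run against a routing policy before it is deployed. It assumes the policy can be expressed as a simple mapping from each node to its permitted next hops, which is an illustrative model and not the Foundation's actual configuration format. Only ATL-3 and POOL-1B correspond to names in this report; "HQ", "PSTN", and the function name find_routing_loop are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical policy model: node name -> list of next hops it may hand a call to.
RoutingPolicy = Dict[str, List[str]]


def find_routing_loop(policy: RoutingPolicy) -> Optional[List[str]]:
    """Return one loop as a node list (first == last), or None if loop-free.

    Uses depth-first search with three node states: unvisited, on the
    current path, and fully explored. Reaching a node that is already
    on the current path means the policy contains a loop.
    """
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state: Dict[str, int] = {node: UNVISITED for node in policy}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = ON_PATH
        path.append(node)
        for next_hop in policy.get(node, []):
            hop_state = state.get(next_hop, UNVISITED)
            if hop_state == ON_PATH:
                # Slice the current path from the repeated node to show the loop.
                return path[path.index(next_hop):] + [next_hop]
            if hop_state == UNVISITED:
                found = visit(next_hop)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for start in list(policy):
        if state[start] == UNVISITED:
            loop = visit(start)
            if loop:
                return loop
    return None


# Illustrative policy that hands calls back from the pool to the bridge.
looped_policy = {
    "HQ": ["ATL-3"],
    "ATL-3": ["POOL-1B"],
    "POOL-1B": ["ATL-3", "PSTN"],
}

# Same policy with the back-reference removed.
fixed_policy = {
    "HQ": ["ATL-3"],
    "ATL-3": ["POOL-1B"],
    "POOL-1B": ["PSTN"],
}

print(find_routing_loop(looped_policy))  # ['ATL-3', 'POOL-1B', 'ATL-3']
print(find_routing_loop(fixed_policy))   # None
```

A check of this shape only catches loops that are visible in the static policy. Loops that arise from runtime state, such as a bridge misreporting its health, would still need live monitoring.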
Impact Overview
Affected Services:
Extension-to-extension calling
External inbound/outbound communication
Automated voicemail routing
Volunteer & donor phone-based support
Impacted Departments:
Tutoring Services
Donor Relations
Community Outreach
Human Resources
Administration
Geographic Reach:
Henry County Headquarters
North Atlanta Regional Office
Mobile softphone users in South Fulton & Fayette areas
Resolution Overview
Routing updates were rolled back and restructured to remove the loop and reset call flow balance across all bridges. Temporary overrides were put in place to allow for manual rerouting during the transition. SIP traffic gradually normalized after queues were flushed and DNS propagation completed. Final confirmation of restoration occurred at 6:20 AM on March 15.
Conclusion
The incident exposed a weakness in how multi-bridge VOIP configurations are rolled out to office locations without first being simulated in a sandbox. Despite the outage’s duration, our fallback channels (live web chat and staff email) helped maintain core services during the affected hours. Additional monitoring thresholds and pre-deployment validation layers are under internal review to reduce the risk of similar disruptions in future updates.