Intermittent Phone Connectivity Issues

Incident Report for The Luc Cochran Foundation

Postmortem

Incident Summary

Between the afternoon of March 14 and the early morning of March 15, The Luc Cochran Foundation experienced a widespread partial outage affecting its internal and external VOIP phone systems. Departments reported failed call connections, dropped internal transfers, unavailable voicemail, and unstable inbound helpline service.

Though live web chat and staff email channels remained functional for most users, the disruption significantly impacted the Foundation’s ability to conduct scheduled tutoring support, field donor communications, and assist with volunteer coordination via phone.

The core issue stemmed from a routing loop across our VOIP infrastructure, impacting the communication bridge connecting multiple office endpoints across Metro Atlanta. This triggered severe delays in SIP signaling and ultimately blocked call completion across the majority of extensions.

Timeline of Events

March 14, 2025

  • 3:01 PM EDT – Initial incident reports received from Academic Allies and HR departments regarding call drops.

  • 3:22 PM – Internal traffic monitors detect latency spikes and intermittent signaling failures.

  • 3:36 PM – Incident formally declared and logged by IT Support.

  • 4:10 PM – Failover procedures begin on secondary SIP routes; partial call traffic is restored.

  • 5:50 PM – System-wide notice issued on the official Status Page. Investigation continues overnight.

March 15, 2025

  • 2:30 AM – Engineering traces confirm persistent loop conditions between Bridge ATL-3 and Routing Pool 1B.

  • 3:15 AM – Routing table scrub initiated to isolate and resolve affected bridge loop.

  • 4:05 AM – Voicemail systems restored after SIP stabilization on redundant trunks.

  • 5:48 AM – All inbound/outbound call channels validated across all primary locations.

  • 6:20 AM – VOIP system considered fully restored and stable. Final status updated.

Root Cause

A recently deployed VOIP routing policy, intended to unify call flow between offices, unintentionally introduced a loop between Bridge ATL-3 and Routing Pool 1B. Call handoff requests were passed back and forth between the two without ever resolving, which congested bridge queues, slowed SIP responses, and eventually caused packet timeouts.
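To illustrate the failure mode, the sketch below follows a call through a next-hop table in which the two components named above point at each other. The table, the function name route_call, and the hop budget are hypothetical and do not reflect our production configuration. The default budget of 70 mirrors the initial Max-Forwards value recommended by the SIP specification (RFC 3261), under which a request that exhausts its hops is rejected rather than forwarded indefinitely.

```python
from typing import Dict, List, Tuple

# Hypothetical next-hop table. Only the two node names come from this report;
# the structure is an assumption made for illustration.
NEXT_HOP: Dict[str, str] = {
    "Bridge ATL-3": "Routing Pool 1B",
    "Routing Pool 1B": "Bridge ATL-3",
}


def route_call(start: str, next_hop: Dict[str, str],
               max_hops: int = 70) -> Tuple[str, List[str]]:
    """Follow next-hop entries until the call leaves the table or the
    hop budget is exhausted."""
    path = [start]
    node = start
    hops = 0
    while node in next_hop:
        if hops == max_hops:
            # Budget spent while still being forwarded: treat as a loop.
            return "too_many_hops", path
        node = next_hop[node]
        path.append(node)
        hops += 1
    # The current node has no onward route, so the call terminates here.
    return "delivered", path


if __name__ == "__main__":
    status, path = route_call("Bridge ATL-3", NEXT_HOP, max_hops=6)
    print(status, "after", len(path) - 1, "hops")
```

With a hop budget in place, a looping handoff ends in a bounded number of forwards instead of occupying bridge queues until it times out.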

Automated failover did not trigger within the intended threshold because of TTL misconfigurations and inaccurate bridge health reporting.
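This report does not detail which TTL values were misconfigured, so the following is only one possible reading, sketched under stated assumptions: a failover check that trusts a bridge's last health report until that report is older than a TTL. The names HealthReport and should_fail_over and the numeric values are hypothetical. The sketch shows how an overly long TTL lets an outdated "healthy" report suppress failover.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    healthy: bool       # status the bridge last reported about itself
    reported_at: float  # time of that report, in seconds


def should_fail_over(report: HealthReport, now: float,
                     ttl_seconds: float) -> bool:
    """Fail over if the bridge reports unhealthy, or if its last report
    is older than the TTL and can no longer be trusted."""
    is_stale = (now - report.reported_at) > ttl_seconds
    return (not report.healthy) or is_stale


if __name__ == "__main__":
    # Hypothetical scenario: the bridge last reported "healthy" at t=0
    # and has sent nothing since. The check runs at t=120 seconds.
    last_report = HealthReport(healthy=True, reported_at=0.0)
    print(should_fail_over(last_report, now=120.0, ttl_seconds=30.0))
    print(should_fail_over(last_report, now=120.0, ttl_seconds=3600.0))
```

In this scenario the 30-second TTL triggers failover because the report is stale, while the 3600-second TTL continues to trust the old "healthy" status and leaves traffic on the affected bridge.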

Impact Overview

Affected Services:

  • Extension-to-extension calling

  • External inbound/outbound communication

  • Automated voicemail routing

  • Volunteer & donor phone-based support

Impacted Departments:

  • Tutoring Services

  • Donor Relations

  • Community Outreach

  • Human Resources

  • Administration

Geographic Reach:

  • Henry County Headquarters

  • North Atlanta Regional Office

  • Mobile softphone users in South Fulton & Fayette areas

Resolution Overview

Routing updates were rolled back and restructured to remove the loop and reset call flow balance across all bridges. Temporary overrides were put in place to allow for manual rerouting during the transition. SIP traffic gradually normalized after queues were flushed and DNS propagation completed. Final confirmation of restoration occurred at 6:20 AM on March 15.

Conclusion

The incident exposed a weakness in our rollout process: multi-bridge VOIP configurations were deployed across office locations without first being simulated in a sandbox. Despite the outage’s duration, our fallback communication channels helped maintain core services during the affected hours. Additional monitoring thresholds and pre-deployment validation layers are being reviewed internally to reduce the risk of similar disruptions in future updates.
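As an example of what a pre-deployment validation layer could look like, the sketch below checks a proposed routing policy for loops before it is rolled out. It is a minimal illustration, not the validation tooling under review: the function find_routing_loop and the node names "Inbound Trunk" and "Extensions" are hypothetical, and the policy is modeled as a simple mapping from each node to the nodes it may hand calls to. The check is a standard depth-first search for a cycle in a directed graph.

```python
from typing import Dict, List, Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished


def find_routing_loop(routes: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one loop as a list of nodes (first node repeated at the end),
    or None if the policy contains no loop."""
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for nxt in routes.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path: a loop is closed.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in routes:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    # Hypothetical proposed policy containing the loop described in this report.
    proposed = {
        "Inbound Trunk": ["Bridge ATL-3"],
        "Bridge ATL-3": ["Routing Pool 1B"],
        "Routing Pool 1B": ["Bridge ATL-3", "Extensions"],
    }
    loop = find_routing_loop(proposed)
    print("loop found:" if loop else "no loop", loop)
```

A check of this kind could run against each proposed policy in a sandbox and block the rollout whenever a loop is reported.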

Posted Mar 15, 2025 - 06:27 EDT

Resolved

The recent issue affecting phone connectivity across various departments has been resolved. All voice services have returned to normal operation and are functioning as expected.

Our team has implemented a fix and will continue to monitor performance to ensure continued stability.

Thank you for your patience and understanding.

— The Luc Cochran Foundation IT & Admin Services Team
Posted Mar 15, 2025 - 06:18 EDT

Monitoring

Our team has identified the cause of the recent phone service disruptions affecting certain departments and call queues across The Luc Cochran Foundation.

The issue stemmed from a VOIP bridge synchronization failure across our multi-office communications network. This bridge is responsible for routing and managing voice traffic between our administrative offices, tutoring sites, and regional field locations. During a recent rebalancing operation designed to optimize call load distribution, a misalignment occurred between regional handoff nodes, resulting in dropped connections, delayed inbound routing, and intermittent call quality degradation.

We’ve since applied a routing correction and reset the affected bridge endpoints. As of this update, all systems have stabilized, and call functionality has returned to near-normal performance. However, we are continuing to monitor all voice traffic closely, particularly across remote locations where routing sensitivity is higher.

The VOIP infrastructure team is working alongside our carrier to ensure long-term remediation and enhanced call path redundancy to prevent future occurrences.

We appreciate your patience and understanding as we work to maintain seamless communication throughout our organization.

For immediate support, please contact hi@luccares.org or use the secure contact form on our website.

— The Luc Cochran Foundation IT & Admin Services Team
Posted Mar 15, 2025 - 06:11 EDT

Identified

We are currently experiencing partial outages with our phone system, which may affect inbound or outbound calls to certain departments. Our team is actively working with our service provider to restore full functionality as quickly as possible.

In the meantime, we recommend contacting us via email at hi@luccares.org or by using the contact form on our website if you’re unable to reach us by phone.

We appreciate your patience and understanding as we work to resolve this issue and ensure a smooth communication experience for all.

Thank you for continuing to support The Luc Cochran Foundation.
Posted Mar 14, 2025 - 03:01 EDT
This incident affected: Communications (Field Phone Systems).