What Happened:
Our internal monitoring tool raised an unusually high number of alerts, which drove heavy traffic to an internal Astronomer API. An inefficiency in the API caused it to scan the entire database instead of retrieving only the information it needed, which led to memory issues. This in turn caused an outage for components that rely on this API, including one element of the startup process for Airflow workers.
Immediate Actions:
To address the issue promptly, we disabled processing of non-critical alerts in the monitoring tool.
Preventive Measures:
We apologize for any inconvenience this may have caused and are committed to ensuring a more reliable experience. If you have questions or need further information, please reach out to our support team at support@astronomer.io.