Companies make configuration changes. Mistakes happen. Sometimes they cascade. In this case, it seems like a simple syntax error or similar cascaded. I’ve seen far, far stupider causes of outages in my career, like idiots running tests against the prod DB and forgetting that the test does DROP SCHEMA public; before it runs.
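For the curious: the fix for that particular flavor of stupid is a guard in the test harness that refuses to wipe anything that isn't a known test database. A minimal sketch, assuming a psycopg2-style connection object; the host allowlist is invented for illustration:

    # Hosts the destructive test setup is allowed to touch (made up for illustration).
    TEST_DB_HOSTS = {"localhost", "127.0.0.1", "db.test.internal"}

    def reset_test_schema(conn, db_host):
        """Wipe and recreate the public schema, refusing on non-test hosts."""
        if db_host not in TEST_DB_HOSTS:
            raise RuntimeError(f"refusing to run DROP SCHEMA against {db_host!r}")
        with conn.cursor() as cur:
            cur.execute("DROP SCHEMA public CASCADE;")
            cur.execute("CREATE SCHEMA public;")
        conn.commit()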
So, anyone wanna take a guess at what it is this time?
My money is on an expired self-signed SSL certificate somewhere in the back end. Some SAML setup or something that everyone forgot about.
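Easy enough to rule out from the outside, at least. A quick sketch, assuming you can reach the suspect backend and have the third-party "cryptography" package installed; host and port are placeholders:

    import ssl
    from datetime import datetime
    from cryptography import x509  # pip install cryptography

    def days_until_expiry(host, port=443):
        # Grabs the cert without validating it, so self-signed ones work too.
        pem = ssl.get_server_certificate((host, port))
        cert = x509.load_pem_x509_certificate(pem.encode())
        # not_valid_after is a naive UTC datetime in older "cryptography"
        # releases; negative result means it already expired.
        return (cert.not_valid_after - datetime.utcnow()).days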
I’m gonna guess it’s related to leap day. Someone rolled their own date/time module and a weekly cron job.
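If it is that, the bug is usually the leap-year rule itself. A sketch of the classic hand-rolled mistake next to the real rule:

    import calendar

    def is_leap_naive(year):
        # The classic shortcut: works for 2000 and 2024,
        # wrong for century years like 1900 or 2100.
        return year % 4 == 0

    def is_leap(year):
        # Full Gregorian rule: every 4th year, except centuries,
        # unless the century is divisible by 400.
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    assert is_leap(2024) and is_leap(2000) and not is_leap(1900)
    assert all(is_leap(y) == calendar.isleap(y) for y in range(1800, 2200))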
AT&T engineers
I’ll wager BGP. Or DNS.
Someone on the NANOG list said DB troubles.
edit: the replies on this will make you cry, but here it is: https://m.facebook.com/nt/screen/?params={"note_id"%3A10158791436142200}&path=%2Fnotes%2Fnote%2F
They screwed around with a critical service that is necessary for every single one of their sites?
Why is it such an enormous monolith anyway?