• 0 Posts
  • 9 Comments
Joined 2 years ago
Cake day: June 15th, 2023

  • This doesn’t really answer my question, but CrowdStrike does explain a bit here: https://www.crowdstrike.com/blog/technical-details-on-todays-outage/

    These channel files are configuration for the driver and are pushed several times a day. It seems the driver can take a page fault if certain conditions are met. A mistake in a config file triggered this condition and put a lot of machines into a BSOD bootloop.

    I think it makes sense that this was a preexisting bug in the driver which was triggered by an erroneous config. What I still don’t know is whether these channel updates have a staged deployment (presumably driver updates do), and what fraction of the machines that got the bad update actually had a BSOD.

    Anyway, they should rewrite it in Rust.
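
For anyone unsure what "staged deployment" means in the question above, here is a minimal sketch of one common way to do it: hash each machine into a stable percentage bucket and only deliver the update to buckets below the current rollout percentage. This is purely illustrative and says nothing about how CrowdStrike actually ships channel files; `rollout_bucket`, `should_receive`, `machine_id`, and `update_id` are hypothetical names made up for the example.

```python
import hashlib


def rollout_bucket(machine_id: str, update_id: str) -> int:
    """Map a machine to a stable bucket in [0, 100) for a given update.

    Hypothetical helper: mixing in update_id means a different set of
    machines goes first for each update.
    """
    digest = hashlib.sha256(f"{update_id}:{machine_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_receive(machine_id: str, update_id: str, rollout_percent: int) -> bool:
    """Return True if this machine falls inside the current rollout stage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(machine_id, update_id) < rollout_percent
```

The idea would be to raise `rollout_percent` in steps (say 1, 10, 50, 100) and stop if the early buckets start crashing. Because the bucket is stable for a given update, a machine that received the update at an early stage stays included at every later stage.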



  • I would encourage you not to split things up too finely. A single repo for your environment would allow you to see all related changes with git. E.g. if you set up a new VM, it might need a playbook to set something up, a script to automate a task, and a DNS entry. With a well-written commit message explaining why you’re making those changes, there’s not much need for external documentation.

    If you want some more info organised in a wiki, maybe point to the initial commit where you introduced a piece of setup. That way you can see how something was structured. Or if you have an issue tracker, you can comment with your research on something and then close the issue when you commit a resolution.

    Try not to have info spread out too much or maintaining all the pieces will become a chore. Make it simple and easy to keep up.