Quoting Steve Litt (slitt@???):
> Sometimes a good, prophylactic fresh install is just what's needed.
There's something to that.
At $FIRM, a big shop where I was Senior Sysadmin for six years in the
Operations department, we tried to make every machine auto-buildable
using configuration management (CM) software [1]. If there was even a
suspicion of something wonky in a host's software, we would disable it
in the hardware load-balancer and re-kickstart it, which installed a
minimal OS load, looked up its assigned IP in the IP/MAC database, set
that IP on the host, and installed/started the CM agent. The agent
checked in
with the CM master, determined the host's intended role based on its IP,
and installed/configured additional software to suit the machine's role.
Total downtime for such a rebuild was maybe half an hour. Then we'd
re-enable it in the load balancer, and done.
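
To make the lookup chain concrete (MAC to assigned IP, then IP to role),
here is a rough Python sketch. Everything in it is hypothetical: the
table contents, the subnets, the role names, and the function names are
invented for illustration and are not the actual $FIRM tooling, which
would have kept this data in a real database and in CM rulesets.

```python
import ipaddress

# Hypothetical IP/MAC database: MAC address -> assigned IP.
IP_BY_MAC = {
    "52:54:00:aa:bb:01": "10.1.10.21",
    "52:54:00:aa:bb:02": "10.1.20.7",
}

# Hypothetical role map: each subnet is dedicated to one role.
ROLE_BY_SUBNET = {
    ipaddress.ip_network("10.1.10.0/24"): "webserver",
    ipaddress.ip_network("10.1.20.0/24"): "database",
}


def assigned_ip(mac):
    """Return the IP assigned to a MAC address (case-insensitive lookup)."""
    return ipaddress.ip_address(IP_BY_MAC[mac.lower()])


def role_for(ip):
    """Return the role for an IP, based on which subnet contains it."""
    for net, role in ROLE_BY_SUBNET.items():
        if ip in net:
            return role
    raise LookupError(f"no role defined for {ip}")


if __name__ == "__main__":
    ip = assigned_ip("52:54:00:AA:BB:01")
    print(ip, role_for(ip))
```

The point of the design is that the host itself carries no state worth
keeping: given only its MAC, everything else can be re-derived.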
If the hardware appeared wonky, same thing except with a swapout for a
new host and updating of the IP/MAC records.
Fortunately, the presence of the CM agent keeping an eye on things meant
_most_ unauthorised changes (e.g., by a coder deciding to go cowboy)
would be corrected automatically, but sometimes there's nothing quite
like a full rebuild.
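
The "corrected automatically" part comes down to the agent repeatedly
comparing actual state against recorded state and fixing any drift. A
minimal Python sketch of that idea for a single config file follows;
the function name is made up, and real CM agents (cfengine, puppet,
Ansible) do far more than this, e.g., ownership, permissions, packages,
and service restarts.

```python
import os
import tempfile


def converge_file(path, desired):
    """Make `path` contain exactly `desired`.

    Returns True if the file had to be changed, False if it was already
    correct, so running it repeatedly is harmless.
    """
    try:
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() == desired:
                return False  # already in the desired state
    except FileNotFoundError:
        pass  # missing file counts as drift

    # Write to a temporary file in the same directory, then rename it
    # over the target so readers never see a half-written file.
    # Note: mkstemp creates the file with mode 0600; a real agent would
    # also enforce the intended owner and mode.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(desired)
    os.replace(tmp, path)
    return True
```

Run on a schedule, the first call after a cowboy edit returns True and
restores the file; every later call returns False until something
drifts again.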
I really do think making hosts auto-buildable, with all package and
conffile state recorded in CM rulesets, is the _right_ way to go for any
host that needs to be reliable. I'm aiming to do that in the near
future even with machines on my home network. (For a relatively simple
CM system suitable for small setups, Ansible is good.
https://www.ansible.com/ )
For purposes of my home network, I don't need to make the machines
_totally_ automatically buildable, which is a good thing, as I'd rather
not deal with d-i pre-seeding, Kickstart, FAI, or that sort of thing if
I don't have to (on a small network). The gain from CM, by contrast, is
_huge_ and worth the trouble IMO.
[1] We started out using CFEngine 2.x and, like many other shops,
migrated to Puppet.