Setting up the control plane
Installing Ansible
On FreeBSD (or whatever you choose as the control host), you’ll need to install a configuration management system. I personally prefer OpenVox (née Puppet) for large-scale infrastructure, but Ansible is completely adequate for our purposes. Installing it on FreeBSD is simple:
pkg update
pkg install py311-ansible
Alternatively, you could install Ansible in a virtual environment or via Anaconda, whichever Python package management approach you prefer. If you don’t have an opinion, then I think just using the system package manager is best, even if you might be a bit behind on feature releases.
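If you do go the virtual environment route, a minimal sketch looks like the following; the paths and the Python version are just illustrative choices, not anything the rest of this setup depends on:
# create an isolated environment for Ansible and install it with pip
python3.11 -m venv ~/venvs/ansible
~/venvs/ansible/bin/pip install ansible
# invoke it from the venv's bin directory, or add that directory to your PATH
~/venvs/ansible/bin/ansible --version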
Creating some Debian VMs
From Debian VM Installation on FreeBSD, you should already have a base Debian VM that you can clone to create the rest of your infrastructure.
I’ve since upgraded to Debian Sid, but the cloning process is straightforward either way. For our tiny cluster, I’ll have one Slurm control node running slurmctld and one worker running slurmd. So let’s clone the base Debian Sid VM into two new VMs:
vm clone deb-sid slurmctl
vm clone deb-sid slurm-c1
There’s some additional preparation we need to do before we can kick off the automation. The cloned VMs will have the same SSH host keys as the original parent VM, so we should start the VMs and regenerate them. Start and access the VM:
vm start slurmctl
vm console slurmctl
Log in as root, or log in as your unprivileged user and su to root. Then remove the SSH host keys and regenerate them:
rm -f /etc/ssh/ssh_host_*
dpkg-reconfigure openssh-server
It also seems to be a good idea to regenerate the systemd machine id (/etc/machine-id):
rm -f /etc/machine-id
dbus-uuidgen --ensure=/etc/machine-id
At this point, I suggest recording the IP address for the VM via ip addr. You’ll need this later.
Repeat the process for any other VMs you’ve created.
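One related caveat: if your FreeBSD host has already connected to the parent VM, or to a clone before its keys were regenerated, SSH (and therefore Ansible) will complain about a changed host key. Clearing the stale entries is enough; the hostname and IP below are simply the ones from my own setup:
# drop any cached host keys for the clone from ~/.ssh/known_hosts
ssh-keygen -R slurmctl
ssh-keygen -R 192.168.68.50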
Preparing the cluster network
You can of course prepare your cluster network in arbitrarily complex ways, but for my purposes I simply insert my VMs’ IPs into /etc/hosts on the FreeBSD host. For instance:
$ tail -n2 /etc/hosts
192.168.68.50 slurmctl slurmctl.heisenburgers.com
192.168.58.53 slurm-c1 slurm-c1.heisenburgers.com
Creating the Ansible playbook
Inventory
First we’ll need to create an inventory of all of our hosts. As I have lazily put the appropriate DNS entries into /etc/hosts, the inventory is quite simple:
cluster:
  hosts:
    slurmctl:
    slurm-c1:
slurm-mgmt:
  hosts:
    slurmctl:
slurm-worker:
  hosts:
    slurm-c1:
I’ve defined three host groups here: cluster, which encompasses all nodes in the pool; slurm-mgmt, which is supposed to carry the Slurm management software and such; and slurm-worker, which contains all of the worker nodes.
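Before going further, it’s worth sanity-checking that Ansible can actually reach everything. Assuming the inventory above is saved as inventory.yaml (my naming, not anything required) and SSH access to the VMs is sorted out, the built-in ping module does the job:
ansible -i inventory.yaml cluster -m ping
Each host should come back with a "pong" response.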
The global configuration
My first playbook, which operates only over the cluster group, looks essentially like this:
- name: base config
  hosts: cluster
  handlers:
    - name: restart munge
      ansible.builtin.service:
        name: munge
        state: restarted
  tasks:
    - name: set hostname
      ansible.builtin.hostname:
        name: "{{ inventory_hostname }}"
    - name: setup /etc/hosts
      ansible.builtin.copy:
        src: files/hosts
        dest: /etc/hosts
    - name: universal packages
      apt:
        name:
          - tmux
          - htop
          - btop
          - neovim
          - sysstat
          - pv
          - munge
          - slurm-client
          - strace
          - nfs-client
    - name: munge setup
      ansible.builtin.copy:
        src: files/munge/munge.key
        dest: /etc/munge/munge.key
      notify:
        - restart munge
I am not an Ansible expert, so this organization may be a bit crude or unidiomatic, but bear with me. Reading this from top to bottom, we create a base config for all hosts in the cluster, set up some handlers (I’ll come back to these momentarily), and then run some tasks. The tasks are as follows:
* Set up the hostname, using a special built-in variable, such that each node has a name other than `debian` at boot.
* Copy my static `/etc/hosts` to all machines, so everyone has the same view of DNS.
* Install a number of packages that I find to be useful.
* Set up the `munge` software, Slurm's authentication daemon.
The munge setup step has something a bit unusual: the notify element. This tells Ansible to run the restart munge handler, so that if the munge key is ever updated on disk, all nodes will restart their munge daemon to pick up the new key. In Puppet world, this is a fairly common pattern. I assume the same is true for Ansible-based environments.
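Once the referenced files exist (files/hosts, and files/munge/munge.key as described below), applying the play looks roughly like this. The playbook filename is my own choice, and the privilege-escalation flags assume you connect as an unprivileged user with sudo rights; drop them if you connect as root:
ansible-playbook -i inventory.yaml cluster.yaml --become --ask-become-pass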
Grabbing the Munge key
I store the munge key in files/munge/munge.key, which I simply copied off of
the slurmctl VM. It is effectively a binary file, so one way you can copy-paste
this sort of thing is to do something like the following:
base64 /etc/munge/munge.key
which will produce a safely encoded version of the key that you can copy and paste. Then you can decode it via:
echo -n "<paste the key here>" | base64 -d > files/munge/munge.key
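Once the playbook has distributed the key, a quick way to confirm that MUNGE authentication works across the cluster is to encode a credential on one node and decode it on another; the hostname here is just the worker from my inventory:
# locally: should report STATUS: Success
munge -n | unmunge
# across nodes: encode on slurmctl, decode on slurm-c1
munge -n | ssh slurm-c1 unmunge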
Slurm-specific config
For the Slurm-specific configuration, I used the Slurm configuration tool. It worked okay, no complaints. For a more complex, production configuration I’d probably write it by hand or otherwise templatize it.
That file was then stored into files/slurm/slurm.conf:
ClusterName=homelab
SlurmctldHost=slurmctl
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
NodeName=slurm-c1 CPUs=4 RealMemory=3072 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
Adding the Slurm-specific parts to the playbook
Here’s another dump of YAML representing the slurm configuration:
- name: slurm management
  hosts: slurm-mgmt
  handlers:
    - name: restart slurmctld
      ansible.builtin.service:
        name: slurmctld
        state: restarted
  tasks:
    - name: packages
      apt:
        name:
          - slurmctld
    - name: slurmctld setup
      ansible.builtin.copy:
        src: files/slurm/slurm.conf
        dest: /etc/slurm/slurm.conf
      notify:
        - restart slurmctld
    - name: slurm spool
      ansible.builtin.file:
        path: /var/spool/slurm
        state: directory
        owner: slurm
        group: slurm
        mode: '0755'
- name: slurm worker
  hosts: slurm-worker
  handlers:
    - name: restart slurmd
      ansible.builtin.service:
        name: slurmd
        state: restarted
  tasks:
    - name: packages
      apt:
        name:
          - slurmd
    - name: slurmd setup
      ansible.builtin.copy:
        src: files/slurm/slurm.conf
        dest: /etc/slurm/slurm.conf
      notify:
        - restart slurmd
Nothing particularly special here; it’s basically the same tasks-and-handlers approach as the global configuration. At this point it may be prudent to start breaking this up into multiple files. The configuration is getting a bit long, although not yet unwieldy, and I’ll have to study Ansible a bit more to figure out the best approach to organizing more complex configuration.
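After running the full playbook against the cluster, you can check that the controller sees the worker and that jobs actually run. These are standard Slurm commands rather than anything specific to this setup:
# list partitions and node states as seen by slurmctld
sinfo
# smoke test: run a trivial job on one node
srun -N1 hostname
# detailed view of the worker node, including any reason it is down
scontrol show node slurm-c1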
Notes, bugs, other annoyances
I’ll note here that the initial configuration of my node did not work, with the node stuck in the inval (invalid) state. Apparently I needed to leave my node with a bit of free memory: instead of the full 4GB, I’ve reserved 1GB here for ‘system’ things and given the remaining 3GB to Slurm (hence RealMemory=3072 above).
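If you hit the same thing, my understanding is that fixing RealMemory in slurm.conf (and letting the handler restart the daemons) may not be enough on its own; a node that has gone down or invalid can usually be returned to service explicitly, along the lines of:
# on the control node: clear the node's error state and return it to service
scontrol update NodeName=slurm-c1 State=RESUME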
I found that I needed to create the /var/spool/slurm dir on my control node.
Perhaps this is often on a dedicated, high-performance device. The slurmctld
wouldn’t start without it and couldn’t create it on its own. Presumably the
package should do this at install time. TBD.
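If you’d rather not re-run the playbook just for this, the manual equivalent on the control node is small; the path, ownership, and mode here simply mirror the Ansible task above:
# create the spool directory owned by the slurm user, then restart the controller
install -d -o slurm -g slurm -m 0755 /var/spool/slurm
systemctl restart slurmctld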
I also note that running slurmd -C on the node, which is supposed to print the node’s hardware configuration, hung indefinitely. Inspecting it with strace yielded an ever-increasing number of attempts to open non-existent files:
newfstatat(AT_FDCWD, "/sys/class/drm/card6730/device/vendor", 0x7ffc478f02c0, 0) = -1 ENOENT (No such file or directory)
Is this a problem with bhyve on FreeBSD, the Slurm version, Debian, or some interplay among all three? In any case, it’s not critical.