Author: wirelessduck
Date:
To: dng
Subject: Re: [DNG] Running libvirt fails under runit-init
Hi,

On Thu, 21 Sept 2023 at 03:56, Bob Proulx via Dng <dng@???> wrote:
>
> wirelessduck--- wrote:
> > I'm having trouble running virsh under runit-init.
> >
> > It was working previously under sysvinit-core, but after I install
> > runit-init and reboot I can no longer access the hypervisor.
>
> I don't know much detail but I know how I would start to debug this.
> On a sysvinit booted system I would look for running processes related
> to libvirt.
>
>     root@calamity:~# ps -efH | grep libvirt
>     root      1657     1  0 Sep09 ?        00:05:18   /usr/sbin/libvirtd -d
>     libvirt+  3651     1 38 Sep09 ?        4-01:18:14   qemu-system-x86_64 -enable-kvm -name guest=...

>
> On a working system "libvirtd -d" is running in daemon mode.


On my working system with sysvinit-core installed, I get slightly
different output, with no qemu process, but it still works: I can
create and run VMs in virt-manager and run virsh without error.
I also get the same output when I have runit-init installed.

# ps -efH | grep libvirt
root      1823     1  0 Sep25 ?        00:00:01   /usr/sbin/libvirtd -d
nobody    2049     1  0 Sep25 ?        00:00:00   /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
root      2050  2049  0 Sep25 ?        00:00:00     /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper



>     root@calamity:~# lsof -p 1657 | grep -i -e sock -e stream
>     libvirtd 1657 root  mem       REG               8,18    97464  135073016 /usr/lib/x86_64-linux-gnu/libboost_iostreams.so.1.62.0
>     libvirtd 1657 root   11u     unix 0xffff922018c85400      0t0      18513 /var/run/libvirt/libvirt-sock type=STREAM
>     libvirtd 1657 root   12u     unix 0xffff922017582400      0t0      18515 /var/run/libvirt/libvirt-sock-ro type=STREAM
>     libvirtd 1657 root   13u     unix 0xffff922015ee6400      0t0      18517 /var/run/libvirt/libvirt-admin-sock type=STREAM
>     libvirtd 1657 root   15u     unix 0xffff922018e0f000      0t0      18550 type=STREAM
>     libvirtd 1657 root   16u     sock                0,8      0t0      16620 protocol: TCP
>     libvirtd 1657 root   22u     unix 0xffff921fcfd63c00      0t0      29737 /var/run/libvirt/libvirt-sock type=STREAM
>     libvirtd 1657 root   26u     unix 0xffff922017582800      0t0      30981 type=STREAM


Once again, slightly different under sysvinit-core for me:

# lsof -p 1823 | grep -i -e sock -e stream
lsof: WARNING: can't stat() fuse.portal file system /run/user/1001/doc
      Output information may be incomplete.
libvirtd 1823 root    9u     sock                0,8      0t0      23356 protocol: NETLINK
libvirtd 1823 root   10u     unix 0x00000000430f3988      0t0      23357 /run/libvirt/libvirt-sock type=STREAM (LISTEN)
libvirtd 1823 root   11u     unix 0x0000000028d785f9      0t0      23358 /run/libvirt/libvirt-sock-ro type=STREAM (LISTEN)
libvirtd 1823 root   12u     unix 0x000000000d0f7fb5      0t0      23359 /run/libvirt/libvirt-admin-sock type=STREAM (LISTEN)
libvirtd 1823 root   15u     unix 0x000000008f5b86d7      0t0      23368 type=STREAM (CONNECTED)


When I check again under runit-init, I am missing the last output line:

# lsof -p 1856 | grep -i -e sock -e stream
lsof: WARNING: can't stat() fuse.portal file system /run/user/1001/doc
      Output information may be incomplete.
libvirtd 1856 root    9u     sock                0,8      0t0      14760 protocol: NETLINK
libvirtd 1856 root   10u     unix 0x0000000021f3f3ac      0t0      14761 /run/libvirt/libvirt-sock type=STREAM (LISTEN)
libvirtd 1856 root   11u     unix 0x00000000460ca424      0t0      14762 /run/libvirt/libvirt-sock-ro type=STREAM (LISTEN)
libvirtd 1856 root   12u     unix 0x000000004055b5e6      0t0      14763 /run/libvirt/libvirt-admin-sock type=STREAM (LISTEN)


Perhaps this is telling me that something socket-related didn't
connect/activate properly under runit-init? But I have no problems
running `virsh list` as root under runit-init. The hypervisor connect
error is only appearing when running as my local user. Hmmm....

>
>     root@calamity:~# strace -v virsh list 2>&1 | grep sock
>     socket(AF_UNIX, SOCK_STREAM, 0)         = 5
>     connect(5, {sa_family=AF_UNIX, sun_path="/var/run/libvirt/libvirt-sock"}, 110) = 0
>     getsockname(5, {sa_family=AF_UNIX}, [128->2]) = 0


Running as root under runit-init:

# strace -v virsh list 2>&1 | grep sock
access("/var/run/libvirt/libvirt-sock", F_OK) = 0
socket(AF_UNIX, SOCK_STREAM, 0)         = 5
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/libvirt/libvirt-sock"}, 110) = 0
getsockname(5, {sa_family=AF_UNIX}, [128 => 2]) = 0


Running as my local user, it first checks for a different socket
(virtqemud-sock) that doesn't exist, then connects to libvirt-sock:

$ strace -v virsh list 2>&1 | grep sock
access("/var/run/libvirt/virtqemud-sock", F_OK) = -1 ENOENT (No such file or directory)
access("/var/run/libvirt/libvirt-sock", F_OK) = 0
socket(AF_UNIX, SOCK_STREAM, 0)         = 5
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/libvirt/libvirt-sock"}, 110) = 0
getsockname(5, {sa_family=AF_UNIX}, [128 => 2]) = 0



>
>     root@calamity:~# ll /var/run/libvirt/libvirt-sock
>     srwxrwxrwx 1 root root 0 Sep  9 19:48 /var/run/libvirt/libvirt-sock


# ls -l /var/run/libvirt/libvirt*
srwx------ 1 root root 0 Sep 27 10:16 /var/run/libvirt/libvirt-admin-sock
srwxrwxrwx 1 root root 0 Sep 27 10:16 /var/run/libvirt/libvirt-sock
srwxrwxrwx 1 root root 0 Sep 27 10:16 /var/run/libvirt/libvirt-sock-ro

>
> It uses /var/run/libvirt/libvirt-sock for communications. That's the
> area where things are failing for you. The client virsh can't connect
> to the running libvirtd daemon.


>
>     root@calamity:~# dpkg -S /etc/init.d/libvirtd
>     libvirt-daemon-system: /etc/init.d/libvirtd

>
>     root@calamity:~# dpkg -L libvirt-daemon-system | grep /etc/ | grep /etc/init.d/
>     /etc/init.d/libvirt-guests
>     /etc/init.d/libvirtd
>     /etc/init.d/virtlogd

>
>     root@calamity:~# dpkg -L libvirt-daemon-system | grep /etc/ | less

>
> In the case of libvirtd AFAIK there are only sysvinit scripts. Which
> means runit will be running them in legacy compatibility mode. That
> seems to be the likely part where things are not working.


On daedalus, the init script was split into
libvirt-daemon-system-sysv. I also have libvirt-daemon-driver-qemu
installed.
Given that it works under root but not a regular user account, I'm
guessing there is some sort of permission problem somewhere.
The /etc/libvirt/*.conf files mention polkit so I'm not sure if that
is related here.
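
One way to test the permission guess is sketched below. It rests on
assumptions: `libvirt` is the group name I believe the Debian-derived
packages use for system access, and `user_in_group` is a helper name
made up for this example. It only reports group membership and socket
permissions; it says nothing about polkit.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the current user belongs to group $1.
user_in_group() {
    # id -nG prints the user's group names on one line, space-separated
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

group=libvirt   # assumption: group used by the packaged libvirt for system access

if user_in_group "$group"; then
    echo "current user is in group $group"
else
    echo "current user is NOT in group $group"
fi

# Show the permissions on the sockets that virsh connects to, if present.
for sock in /var/run/libvirt/libvirt-sock /var/run/libvirt/libvirt-sock-ro; do
    if [ -S "$sock" ]; then
        ls -l "$sock"
    fi
done
```

Run it as the regular user, not as root, since root bypasses the
checks being examined.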

> I would boot runit and then look to see how things are different. Is
> the libvirtd running? Are there any error messages at boot time
> around the running of those scripts? Then to further debug run those
> scripts manually. Trace them.
>
>     root@calamity:~# ll /etc/init.d/libvirtd
>     -rwxr-xr-x 1 root root 5600 Oct  1  2020 /etc/init.d/libvirtd

>
>     root@calamity:~# less /etc/init.d/libvirtd

>
>     root@calamity:~# service libvirtd status
>     Checking status of libvirt management daemon: libvirtd not running.

>
>     root@calamity:~# service libvirtd start

>
> One should normally use "service" to start and stop init scripts
> because it cleans the environment, changes directory, and so forth.
> All needed things. Of course runit also has a native interface. For
> more debugging I would run the script directly tracing it.
>
>     root@calamity:~# sh -x /etc/init.d/libvirtd

>
> That will produce a lot of shell tracing output. I can't really
> speculate more without more detail. Hopefully by looking at what it
> is doing the reason for it not doing it under runit will become
> apparent. It might be a problem with the required cgroups setup. It
> might be a different problem. Check the syslog to see what errors are
> logged there.


Thanks for the suggestion to run the init script with `sh -x`. It
appears to show an error mounting cgroups in the init script, but
perhaps that is OK because I already have the cgroupfs-mount package
installed for docker, and that is already mounting the cgroups.
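
Since `sh -x` produces a lot of output, a small filter helps to find
the relevant steps. This is only a sketch: `trace_grep` is a name
invented here, and the `restart` argument in the commented example is
an assumption about which action to trace.

```shell
#!/bin/sh
# Hypothetical helper: run a script under "sh -x" and keep only the
# output lines (trace and normal output) that match a pattern.
trace_grep() {
    # $1 = script path, $2 = argument passed to the script, $3 = grep pattern
    # sh -x writes the trace to stderr, so merge it into stdout first
    sh -x "$1" "$2" 2>&1 | grep -i -- "$3"
}

# Example (as root): show only the cgroup-related steps of the init script.
# trace_grep /etc/init.d/libvirtd restart cgroup
```

The exit status is that of grep, so it is non-zero when nothing in
the trace matches the pattern.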

Sorry for the jumbled responses, but I'm testing as I write the reply here...

Hmmmm.. I noticed the output line of:

+ check_mount_cgroup_options
+ [ ! = yes ]
+ return 1

which looked a bit suspicious. So I checked the init script and it is
reading $mount_cgroups from /etc/default/libvirtd. The value is
commented out by default so I uncommented and set:

mount_cgroups=no

because I am already mounting cgroups via the cgroupfs-mount package.
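
To confirm that cgroups really are mounted before telling libvirtd
not to mount them, something like the following can be used. It is a
minimal sketch: `cgroup_mounts` is a made-up helper name, and it only
lists what /proc/mounts reports without knowing which package did the
mounting.

```shell
#!/bin/sh
# Hypothetical helper: print mount point and type of every cgroup mount.
cgroup_mounts() {
    # $1 = mounts file to read (defaults to /proc/mounts)
    # Field 2 is the mount point, field 3 the filesystem type.
    awk '$3 == "cgroup" || $3 == "cgroup2" { print $2, $3 }' "${1:-/proc/mounts}"
}

cgroup_mounts
```

No output means nothing of type cgroup or cgroup2 is currently
mounted.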

I then stopped/restarted the libvirtd service and got some better output.

+ check_mount_cgroup_options
+ [ ! no = yes ]
+ return 1
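
For reference, here is how I read the two traces. Both end in
`return 1`, which fits a test where the variable is quoted, so an
unset value is compared as an empty string. The function body below is
a guess reconstructed from the trace, not the actual init script, and
the meaning of its return values is likewise assumed.

```shell
#!/bin/sh
# Hypothetical reconstruction of the check seen in the trace.
check_mount_cgroup_options() {
    # Assumption: the variable is quoted in the real script, so an
    # unset or commented-out value is tested as the empty string.
    if [ ! "$mount_cgroups" = yes ]; then
        return 1
    fi
    return 0
}

# Try the commented-out default (empty), "no", and "yes".
for value in "" no yes; do
    mount_cgroups=$value
    if check_mount_cgroup_options; then
        echo "mount_cgroups='$value': check returns 0"
    else
        echo "mount_cgroups='$value': check returns 1"
    fi
done
```

Under that assumption the empty value and `no` both take the
`return 1` path, and only `yes` does not.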

> Hope that helps!


Yes! That last round of shell debugging of the init script definitely
helped. After setting mount_cgroups=no in /etc/default/libvirtd and
restarting the libvirtd service, I can now successfully run
`virsh list` and `virt-manager` from my regular user account!

>
> Bob


Thanks for your assistance on this weird problem!

Tom