Author: Noel Maersk
Date:  
To: libbitcoin
Subject: [Libbitcoin] obelisk: runaway file descriptors/sockets
I'm trying to run an Obelisk node using Arch Linux (64-bit) and its AUR
packages. Versions:

gcc-multilib           4.8.1-3
glibc                  2.18-5
boost                  1.54.0-4
boost-libs             1.54.0-4
zeromq                 4.0.1-5
libbitcoin-leveldb-git v1.4.15.g1deb4ab-1
obelisk-git            v0.2.13.g18b69c7-1
sx-git                 v0.2.3.g33fdb4b-1


Everything builds fine. I can initialize the blockchain with `sx
initchain blockchain` and start obbalancer and obworker.

obworker starts downloading the blockchain after the first connection.
While this is happening, the number of connections keeps growing
indefinitely (as shown in the "INFO [protocol]: Connected to" messages),
and so does the number of open file descriptors: /proc/`pidof obworker`/fd
shows a lot of sockets. Their number eventually reaches the maximum
allowed by the system, at which point obworker segfaults with "Too many
open files" (from libc via zeromq, I think).
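
For anyone who wants to watch the descriptor growth described above, here is a minimal sketch in Python. It assumes a Linux /proc filesystem and enough permissions to read the target process's fd directory; the function name `count_fd_kinds` is made up for this illustration and is not part of obelisk or libbitcoin.

```python
import os
from collections import Counter

def count_fd_kinds(pid):
    """Count the open descriptors of `pid` by kind, via /proc/<pid>/fd (Linux only)."""
    kinds = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink such as "socket:[12345]" or "/path/to/file".
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            kinds["file"] += 1
    return kinds
```

Calling `count_fd_kinds(pid)` every few seconds with the PID reported by `pidof obworker` should show whether the "socket" count rises in step with the "Connected to" messages.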

At some point, which I am unable to pin down, block downloading
stops, and only new "Connected to" messages show up. Also, it is clear
that request_worker::update() starts missing heartbeats, since
request_worker::create_new_socket() is called, producing "worker ready"
messages. These two do not happen simultaneously.

I'm lost here. I tried ZeroMQ 3 and 2, and a stable version of obelisk
(0.2); neither helps, the issues are the same. Increasing the maximum
number of file descriptors avoids the segfault, but obworker still
doesn't download blocks and eventually starts producing errors like this:

INFO [protocol]: Connected to A.A.A.A:18333 (43 connections)
INFO [protocol]: Connected to B.B.B.B:18333 (44 connections)
INFO [protocol]: Connected to C.C.C.C:18333 (45 connections)
INFO [protocol]: Connected to D.D.D.D:18333 (46 connections)
INFO [worker]: worker ready
INFO [worker]: worker ready
INFO [worker]: worker ready
INFO [worker]: worker ready
INFO [worker]: worker ready
ERROR [protocol]: Problem receiving addresses: Service stopped
ERROR [session]: inventory: Service stopped
ERROR [session]: get_data: Service stopped
ERROR [session]: get_blocks: Service stopped
ERROR: recv_transaction: Service stopped
ERROR [poller]: Received bad block: Service stopped
ERROR [protocol]: Channel stopped internal error: Service stopped
ERROR [poller]: Received bad inventory: Service stopped
INFO [protocol]: Connected to E.E.E.E:18333 (47 connections)
INFO [worker]: worker ready
INFO [protocol]: Connected to X.X.X.X:18333 (48 connections)
INFO [protocol]: Connected to X.X.X.X:18333 (49 connections)
INFO [protocol]: Connected to X.X.X.X:18333 (50 connections)
^C
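
A log like the one above can be summarized with a short Python sketch, which may help in pinning down when block downloading stops relative to the "worker ready" messages. It assumes the log lines have exactly the format shown in the excerpt; the `summarize` function and the keys it returns are made up for this illustration.

```python
import re

# Matches lines such as:
#   INFO [protocol]: Connected to A.A.A.A:18333 (43 connections)
CONNECTED = re.compile(
    r"^INFO \[protocol\]: Connected to \S+ \((\d+) connections\)$"
)

def summarize(lines):
    """Return the peak connection count and the number of worker restarts and errors."""
    peak = 0
    worker_ready = 0
    errors = 0
    for line in lines:
        line = line.strip()
        match = CONNECTED.match(line)
        if match:
            peak = max(peak, int(match.group(1)))
        elif line == "INFO [worker]: worker ready":
            worker_ready += 1
        elif line.startswith("ERROR"):
            errors += 1
    return {
        "peak_connections": peak,
        "worker_ready": worker_ready,
        "errors": errors,
    }
```

Feeding it the captured obworker output line by line (for example from a file the output was redirected to) gives one summary per run, which makes it easier to compare runs across library versions.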

I could start collecting debug traces, but my feeling is that this
could be a simple system/library incompatibility.

What are the recommended library versions? Could someone with a working
obelisk post a list? Is anyone using Arch Linux?