Author: Eric Voskuil
Date:  
To: Kobi Gurkan, mlmikael
CC: libbitcoin
Old-Topics: Re: [Libbitcoin] Syncing a version 2 server
Subject: [Libbitcoin] libbitcoin server update [was: Syncing a version 2 server]
TL;DR - version 2.2.0 is close and the sync problems are gone.

================================================================

The open issues affecting server are always listed here:

https://github.com/libbitcoin/libbitcoin/issues
https://github.com/libbitcoin/libbitcoin-consensus/issues
https://github.com/libbitcoin/libbitcoin-blockchain/issues
https://github.com/libbitcoin/libbitcoin-node/issues
https://github.com/libbitcoin/libbitcoin-server/issues

Apart from missing features there is only one material issue that
affects version2:

https://github.com/libbitcoin/libbitcoin-server/issues/100

The master branch and sync feature sub branch have been focused on
permanently resolving this issue. This has required a significant rework
of the networking stack, which has been completed in the libbitcoin repo
of those branches. The bulk of the other work is in libbitcoin-node,
which is where I was working until Hong Kong. This work in master is not
considered stable, although it should always build.

By the time I returned from Hong Kong the BIP65 enforcement threshold
had arrived. I decided to take the time to implement both BIP66 (which
activated over the summer) and BIP65 in version2. Note that these are
both soft forks, so libbitcoin continues to work despite their
activation, though it does not enforce the new rules. This work is
complete, but it took me down a rabbit hole over the past couple of weeks.

First, it was necessary to adopt the latest libsecp256k1 in order to
update libbitcoin-consensus, which in turn was needed to track core
0.12.0 consensus updates. This forced an update to all version2 repos to
change the dependency on libsecp256k1, an update to the wrapper code in
libbitcoin, an update to the MSVC libsecp256k1 build, and a new
libbitcoin/libsecp256k1/version4 branch to build from. In doing this I
forgot to branch and re-version libbitcoin-consensus, which caused
issues with master and version2.1 builds. pmienk fixed that recently.

Second, it was necessary to implement the additional consensus rules in
our native script implementation. This work exposed a problem with our
softfork activation code. Formerly the only soft forks that we
implemented were BIP16, BIP30 and BIP34. BIP30 is applied retroactively
to all blocks except two well-known blocks and BIP16 was activated as of
a calendar date. But BIP34 defined the method that is used by BIP66 and
BIP65. That method was not implemented in libbitcoin; instead fixed block
ids were used as thresholds (similar to BIP16 and BIP30). This is not
ideal, especially in the case where we intend to cross thresholds with
running code. So I implemented the BIP34 activation technique, applied
it retroactively to BIP34 and implemented and tested BIP66 and BIP65.
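
For readers unfamiliar with the BIP34 activation technique, here is a
minimal sketch of the rule as the BIPs describe it for mainnet: look at
the versions of the most recent 1,000 blocks; once 750 of them are at or
above the new version the new rule is enforced, and once 950 are, blocks
with an older version are rejected. BIP34 uses block version 2, BIP66
version 3 and BIP65 version 4. The function names are hypothetical and
this is not the libbitcoin implementation; it assumes the window covers
the blocks preceding the one being validated.

```python
# Illustrative only: names are hypothetical, not libbitcoin's API.
WINDOW = 1000   # number of most recent blocks examined (mainnet)
ENFORCE = 750   # count at which the new rule is enforced
REJECT = 950    # count at which older-version blocks are rejected


def count_at_least(preceding_versions, minimum_version):
    """Count block versions >= minimum_version in the last WINDOW blocks."""
    recent = preceding_versions[-WINDOW:]
    return sum(1 for version in recent if version >= minimum_version)


def rule_enforced(preceding_versions, minimum_version):
    """True once enough recent blocks signal the new version."""
    count = count_at_least(preceding_versions, minimum_version)
    return count >= ENFORCE


def old_versions_rejected(preceding_versions, minimum_version):
    """True once blocks below minimum_version must be rejected."""
    count = count_at_least(preceding_versions, minimum_version)
    return count >= REJECT
```

Because the decision depends only on the versions of recent blocks, the
same check can be applied at any height, including retroactively, which
is what makes it preferable to a hard-coded block id when a threshold
is crossed by running code.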

Testing required that I sync and validate the full chain. This has been
pretty much impossible in version2. But in my master|version3 work I had
recently discovered the primary reason for the documented libbitcoin
sync issue. There is a long-standing bug in the expected use of the
subscriber template, where resubscription causes loss of messages. This
dropping of incoming blocks leads to excessive "reorg" processing by the
node, since it can't connect subsequent blocks to the chain. This causes
both stalls and the re-requesting of missing blocks by new connections,
significantly compounding the backlog. The larger the backlog, the worse
the resubscription issue becomes, resulting in more dropped blocks. This
spirals out of control, bringing down the server.
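
The following toy model shows one way a resubscription pattern can lose
messages. It is an assumption made for illustration, not the actual
libbitcoin subscriber code: the class and function names are
hypothetical, and the real mechanism may differ. Each handler is
one-shot, so it must resubscribe to see the next message. If the
resubscription is queued behind other work, any message relayed in the
gap has no handler and is dropped.

```python
# Illustrative only: a toy one-shot subscriber, not libbitcoin's template.
class OneShotSubscriber:
    """Each subscribed handler is invoked for at most one message."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def relay(self, message):
        # Take the current handlers and clear the list. Anyone who wants
        # the next message has to subscribe again.
        handlers, self._handlers = self._handlers, []
        for handler in handlers:
            handler(message)


def receive_blocks(blocks, resubscribe_promptly):
    """Return the blocks a handler actually sees."""
    subscriber = OneShotSubscriber()
    received = []
    pending = []  # deferred work, standing in for an asynchronous queue

    def on_block(block):
        received.append(block)
        if resubscribe_promptly:
            subscriber.subscribe(on_block)
        else:
            # Resubscription is queued behind other work.
            pending.append(lambda: subscriber.subscribe(on_block))

    subscriber.subscribe(on_block)
    for index, block in enumerate(blocks):
        subscriber.relay(block)
        # The deferred queue is only drained after every second message,
        # modeling a backlog between notification and resubscription.
        if index % 2 == 1:
            while pending:
                pending.pop()()
    return received
```

With six blocks, `receive_blocks(range(1, 7), True)` sees all six, while
`receive_blocks(range(1, 7), False)` sees only blocks 1, 3 and 5. In a
node, each dropped block leaves its successors unconnectable, which is
the feedback loop described above.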

So I decided to patch this issue in version2 so that I could validate
the new softfork activation code. In patching this I ended up patching a
couple of race conditions that I had already resolved in
master|version3. Once this was resolved I got to the point where we
never missed a block from peers. This is a bit more of a challenge than
it may seem, because of the asynchronous architecture of libbitcoin.
There are still issues that I'm aware of in version2, but they are
pushed to the margins, generally startup and shutdown. These may still
cause failures but should be rare. The next major release should resolve
these issues entirely.

The benefit of the asynchronous architecture, and memory-mapped hash
table blockchain implementation, is speed. We should be extremely fast
at sync, but these problems in bc::network and libbitcoin-node have
prevented that from being realized. Yesterday, as I tested the first
runs with no dropped blocks, I witnessed outstanding sync performance in
version2 for the first time:

I was able to sync the first 200,000 blocks in 90 minutes.

One of the problems with version2 is that it does not make a significant
distinction between sync and post-sync. This is defined by the last
checkpoint. The major distinction is the level of validation. But there
are networking considerations as well. When syncing there should be no
orphan pool accumulation, but because we cannot detect block
announcements without having first performed header downloads, we do
start to accumulate orphans. This slows down sync a lot, since the orphan
pool is tested against each incoming block. Also, we want to disable transaction
relay by peers during sync. This can only be done during the handshake,
which again requires a distinction between sync and post-sync.
Connections need to be dropped and reconfigured once sync is complete.
Furthermore you want only 1-3 peers during sync, but 8 or so post-sync.
1 fast peer would be ideal, but 2 peers guard against 1 peer
under-performing, and 3 peers start to create excessive work.
Version2 has no mechanism to make these changes automatically, so you
need to make them manually.

So I've defaulted the configuration file for sync... 1 orphan, no relay,
and ~2 outgoing peers, 0 incoming peers. You will need to set a high
checkpoint yourself. Once the last sync is complete you will need to
switch the config... ~20 orphans, enable relay, ~8/~8 peers. I'll
probably provide a second post-sync config file that can be manually
swapped.
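
To summarize the two profiles side by side, here is a small sketch. The
field names are hypothetical and are not the actual libbitcoin-server
configuration keys; only the values come from the text above, with "~2"
and "~8" taken as 2 and 8. Version2 does not switch automatically, so
the helper only models the decision an operator makes by hand, assuming
sync is considered complete once the chain reaches the last configured
checkpoint.

```python
# Illustrative only: field names are hypothetical, not real config keys.
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    orphan_capacity: int
    relay_transactions: bool
    outbound_peers: int
    inbound_peers: int


# Values as described in the message: sync defaults, then post-sync.
SYNC = Profile(orphan_capacity=1, relay_transactions=False,
               outbound_peers=2, inbound_peers=0)
POST_SYNC = Profile(orphan_capacity=20, relay_transactions=True,
                    outbound_peers=8, inbound_peers=8)


def profile_for(chain_height, last_checkpoint_height):
    """Pick the profile an operator would want at the given height."""
    if chain_height < last_checkpoint_height:
        return SYNC
    return POST_SYNC
```

Because transaction relay can only be negotiated during the handshake,
moving from one profile to the other implies a restart or dropping and
re-establishing connections, not just changing the numbers in place.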

Also, there is a new configuration setting to enable mempool consistency
enforcement. This was a new feature that was always enabled in version2
post v2.1.0. This may cause significant mempool delays, so it's now
configurable and disabled by default.

I have a couple of things to finish up on version2, which will be tagged
as v2.2.0. But I encourage people to start testing on the head of
version2 presently. I hope to release in the next couple of days, then
back to version3 work :).

e

On 12/19/2015 10:52 AM, Kobi Gurkan wrote:
> Mimikael,
>
> Thanks - I eventually gave up on version2 and currently I'm syncing with
> master with checkpoints in the config which sped it up enormously.
> Apart from Merkle Root Mismatch on block 322670, on which I just started
> version2 again to get through, and then master again, it works pretty great!
>
> Kobi
>
> On 19/12/15 20:33, mlmikael wrote:
>> Kobi,
>>
>> Short answer: Wait until January
>>
>> Long answer: Version2 has an "off-by-1" bug that will eat all your
>> memory and crash the backend. Therefore the current best practice is
>> to put an "ulimit" on total RAM (e.g. to 800MB) as well as do restarts
>> continuously - I don't know - every 1 minute?? 10 seconds?
>>
>> For the interim, would you mind trying the workaround behavior
>> mentioned here and let us know your results?
>>
>> Cheers,
>> Mlmikael
>>
>> On 2015-12-19 04:36, Kobi Gurkan wrote:
>>> Hi,
>>>
>>> I'm using the latest stable release (version 2) of libbitcoin-server,
>>> and I find that it takes about 30-60 seconds per block since around
>>> block 273038.
>>> Is it realistic to sync version 2 right now? Maybe I should improve
>>> the specs of my machine?
>>>
>>> I know that Eric has been working on headers-first sync, but I would
>>> like to use a stable release.
>>>
>>> Kobi