Author: Eric Voskuil
Date:
To: mlmikael, libbitcoin
Subject: Re: [Libbitcoin] Proper handling after unexpected shutdown?
It is not presently possible to know whether there is corruption after a
hard shutdown has occurred. On the other hand, I'm not aware of any case
where corruption has occurred apart from a hard shutdown.
Validating the data would require hash validation against each
transaction and block. As for the indexes, it would probably be faster
to rebuild them than to validate them. In version 2 it will probably be
as fast to rebuild as it would be to validate, assuming bandwidth is not
constrained and you have a checkpoint near the top. This is because the
cost of reading the entire store is basically the same as the cost of
writing the entire store.
For this reason I haven't been planning to implement store
validation/repair. On the other hand, it is very fast and easy to detect
at startup that a previous shutdown occurred during a write. I have
been planning to implement this detection in version 2. The fix would be
to rebuild the store, which again shouldn't be slower than a full
validation.
The store is very reliable if it is shut down properly. So I would
recommend the following precautions in a production environment:
1) As your chain grows, periodically add checkpoints to your
configuration settings file. Don't pick points too close to the top or
they could get reorganized out. If your block pool is 50, then 51 blocks
deep is entirely safe, since you can't reorganize deeper than that
anyway. These additions will significantly speed a rebuild from the
network. You could also rely on public checkpoint sources, but this
creates a centralization risk.
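Checkpoint entries in the settings file look roughly like the sketch
below. The first entry is the mainnet genesis block; the commented
template is where your own entries would go (substitute a real block
hash and height from your chain, at least your pool depth plus one
below the top):

```ini
[blockchain]
# height 0 (mainnet genesis), typically present by default
checkpoint = 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f:0
# add entries like this as your chain grows:
# checkpoint = <block hash>:<height>
```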
2) Periodically shut down a server and copy the store files to another
directory on the same drive (or elsewhere). If you have a hard shutdown,
change settings to use the saved location. The updated checkpoints from
#1 will get you back to the top pretty quickly.
3) Maintain a second server on an independent device, using the same
procedures. Configure each to exclude the other as a peer so that any
corruption on one cannot affect the other. Having a second server will
allow you to keep running while performing #2.
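The cold-copy step in #2 can be sketched as a small shell helper. The
paths and the stop/start commands are placeholders for your own
deployment; the only requirement is that the server is not running while
the copy is made:

```shell
# Copy the store directory while the server is down (a deployment sketch;
# store paths and service management commands are assumptions).
backup_store() {
  local store="$1" backup="$2"
  cp -a "$store" "$backup"   # plain recursive copy; server must be stopped
  echo "copied $store to $backup"
}

# Typical sequence:
#   <stop libbitcoin server>
#   backup_store /var/bs/mainnet /var/bs/mainnet.bak
#   <start libbitcoin server>
# After a hard shutdown, point the server's settings at the saved copy.
```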
Step three is recommended in a production environment apart from
recovery purposes. When you post a tx from a client to a server, you
will not know for sure whether the network has "accepted" the
transaction until it's mined (and sufficiently deep in the chain).
However, if you want some confidence that it is being distributed to
miners, you should query for the tx using the other server. Since the
two are mutually excluded as peers, the presence of the tx there will
prove that it has moved through at least one external node.
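The propagation check above can be sketched as follows. The query
command is parameterized because the exact invocation depends on your
setup; one plausible choice is libbitcoin-explorer's `bx fetch-tx`
pointed (via its config) at the second server, but that endpoint wiring
is an assumption:

```shell
# Check that a tx is visible on the *other* server, implying it crossed
# at least one external node. The query command is supplied by the caller.
confirm_propagation() {
  local txhash="$1" query="$2"
  if $query "$txhash" >/dev/null 2>&1; then
    echo "tx $txhash reached an external node"
  else
    echo "tx $txhash not yet visible" >&2
    return 1
  fi
}

# Hypothetical usage, querying the second server:
#   confirm_propagation <txhash> "bx fetch-tx --config /etc/bx-server2.cfg"
```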
Using the above technique requires two servers that are always up, plus
the ability to shut one down periodically. So maintaining a robust
production environment requires at least three servers (and the ability
to shift traffic away from the down server). I recommend four servers,
with clients configured to send transactions to either of two and to
retrieve from the mempool of either of the other two (single-redundant
failover). Other queries can be balanced across all four. This allows
you to bring down one server in either of the two pools. The pools must
of course be configured to exclude each other's members as peers.
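As a sketch of the peer exclusion, libbitcoin's network settings support
blacklisting peer addresses; the key name below and the addresses
(documentation-range IPs standing in for the other pool's servers) are
assumptions to check against the example config shipped with your
version:

```ini
[network]
# exclude the other pool's servers as peers (assumed key name)
blacklist = 203.0.113.10:8333
blacklist = 203.0.113.11:8333
```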
e
On 05/15/2016 07:20 AM, mlmikael wrote:
> Hi Eric,
>
> Say that my machine shuts down unexpectedly. Perhaps at startup I won't
> even know that it did shut down unexpectedly so the LibBitcoin database
> could be in an inconsistent state.
>
> To mitigate that it would be great to do some kind of read operation for
> the whole database, that provides a verification deep enough to prove
> that the probability of an inconsistency is smaller than 1 in
> ~10^10-10^20 .
>
> I.e., is there any cheaper way than doing a full local sync to a new
> directory?
>
> What's in the box now and what do you suggest?
>
> Mlmikael
>