Author: mlmikael
Date:
To: Eric Voskuil
CC: libbitcoin
Subject: Re: [Libbitcoin] Proper handling after unexpected shutdown?
A thought -
Additional robustness could be achieved by storing checksums of the
data, *together with the location each checksum covers*, in the database
files at even intervals, plus some kind of overarching checksum that is
written at "checkpoints" where the database is known to be in a
consistent state.
That way it would be possible to guarantee that the storage medium
has integrity with respect to LibBitcoin's logic (presuming ECC RAM
and a watertight CPU), simply by reading the whole database file and
checking that the overarching checksum is correct (and, to be safe, the
individual checksums too).
Would that be of value?
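A minimal sketch of that checksum scheme, working on an in-memory byte string rather than LibBitcoin's real store files. The segment size, the choice of CRC-32 per segment and SHA-256 for the overarching digest, and all function names are assumptions for illustration; none of this is an existing LibBitcoin API. The offset is folded into each segment's CRC so that a checksum record copied to the wrong place also fails.

```python
import hashlib
import zlib

# Hypothetical segment size; the proposal only says "even intervals".
SEGMENT_SIZE = 4096


def segment_checksums(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Return (offset, crc) per segment, with the offset mixed into the CRC."""
    entries = []
    for offset in range(0, len(data), segment_size):
        seed = zlib.crc32(offset.to_bytes(8, "little"))  # bind location
        crc = zlib.crc32(data[offset:offset + segment_size], seed)
        entries.append((offset, crc))
    return entries


def overarching_checksum(entries) -> str:
    """Digest of the ordered (offset, crc) list, written at a checkpoint."""
    h = hashlib.sha256()
    for offset, crc in entries:
        h.update(offset.to_bytes(8, "little"))
        h.update(crc.to_bytes(4, "little"))
    return h.hexdigest()


def verify(data: bytes, entries, expected_digest: str,
           segment_size: int = SEGMENT_SIZE) -> bool:
    """One full read pass: check the overarching digest, then every segment."""
    if overarching_checksum(entries) != expected_digest:
        return False
    return segment_checksums(data, segment_size) == list(entries)
```

The cost of `verify` is one sequential read of the whole file, which matches the "just read everything" check described above.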
On 2016-05-16 03:11, Eric Voskuil wrote:
> It is not presently possible to know whether there is corruption when a
> hard shutdown has occurred. On the other hand I'm not aware of any case
> where corruption has occurred apart from a hard shutdown.
>
> Validating the data would require hash validation against each
> transaction and block. As for the indexes, it would probably be faster
> to rebuild them than to validate them. In version2 it will probably be
> as fast to rebuild as it would be to validate, assuming bandwidth is not
> constrained and you have a checkpoint near the top. This is because the
> cost of reading the entire store is basically the same as the cost of
> writing the entire store.
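To illustrate what "hash validation against each transaction and block" involves, here is a sketch of Bitcoin's standard merkle-root check: re-hash every stored transaction and compare the resulting root with the one in the block header. This is generic Bitcoin consensus logic, not LibBitcoin's store code; reading transactions out of the store is left out, and `raw_txs` is assumed to be the legacy (non-witness) serialization.

```python
from __future__ import annotations

import hashlib


def double_sha256(data: bytes) -> bytes:
    """Bitcoin's hash function: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes: list[bytes]) -> bytes:
    """Compute a merkle root from transaction hashes (internal byte order).

    When a level has an odd number of entries, the last one is paired
    with itself, as in Bitcoin.
    """
    if not tx_hashes:
        raise ValueError("a block always contains at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def block_is_consistent(raw_txs: list[bytes], header_merkle_root: bytes) -> bool:
    """Re-hash every stored transaction and compare against the header."""
    return merkle_root([double_sha256(tx) for tx in raw_txs]) == header_merkle_root
```

Every transaction byte has to be read and hashed, which is why a full validation costs about as much as reading (and hence rebuilding) the whole store.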
>
> For this reason I haven't been planning to implement store
> validation/repair. On the other hand, it is very fast and easy to detect
> at startup that a shutdown previously occurred during a write. I have
> been planning to implement this detection in version2. The fix would be
> to rebuild the store, which again shouldn't be slower than a full
> validation.
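A common way to get that cheap startup detection is a marker file that exists only while a write is in progress. The sketch below shows that general technique; the marker name and functions are hypothetical and not necessarily how version2 implements it. A production version would also fsync the data files before leaving the block and, on POSIX, fsync the directory so the marker's creation is itself durable.

```python
import os
from contextlib import contextmanager

# Hypothetical marker name; LibBitcoin's own mechanism may differ.
MARKER_NAME = "write.lock"


def interrupted_write_detected(store_dir: str) -> bool:
    """At startup: a leftover marker means the last write never finished."""
    return os.path.exists(os.path.join(store_dir, MARKER_NAME))


@contextmanager
def guarded_write(store_dir: str):
    """Create the marker before writing; remove it only on a clean finish."""
    marker = os.path.join(store_dir, MARKER_NAME)
    fd = os.open(marker, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.fsync(fd)  # push the marker to disk before touching the store
    finally:
        os.close(fd)
    yield
    # Not reached if the body raised (or the process died): marker stays.
    os.remove(marker)
```

If `interrupted_write_detected` returns true at startup, the remedy described above applies: rebuild the store.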
>
> The store is very reliable if it is shut down properly. So I would
> recommend the following precautions in a production environment:
>
> 1) As your chain grows, periodically add checkpoints to your
> configuration settings file. Don't pick points too close to the top or
> they could get reorganized out. If your block pool is 50, then 51 blocks
> deep is entirely safe, since you can't reorganize deeper than that
> anyway. These additions will significantly speed up a rebuild from the
> network. You could also rely on public sources, but this creates a
> centralization risk.
>
> 2) Periodically shut down a server and copy the store files to another
> directory on the same drive (or elsewhere). If you have a hard shutdown,
> change settings to use the saved location. The updated checkpoints from
> #1 will get you back to the top pretty quickly.
>
> 3) Maintain a second server on an independent device, using the same
> procedures. Configure each to exclude the other as a peer so that any
> corruption on one cannot affect the other. Having a second server will
> allow you to keep running while performing #2.
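The checkpoint rule in #1 is simple arithmetic: a block with (pool size + 1) blocks above it cannot be reorganized out. A sketch under that rule, with hypothetical function names; the `checkpoint = hash:height` line format is an assumption about the settings file and should be checked against your own configuration.

```python
def safe_checkpoint_height(top_height: int, block_pool_size: int) -> int:
    """Height with block_pool_size + 1 blocks above it (pool 50 -> 51 deep)."""
    depth = block_pool_size + 1
    if top_height < depth:
        raise ValueError("chain is not yet deep enough for a safe checkpoint")
    return top_height - depth


def periodic_checkpoint_heights(top_height: int, block_pool_size: int,
                                interval: int) -> list:
    """Every `interval`-th height up to the safe limit."""
    limit = safe_checkpoint_height(top_height, block_pool_size)
    return list(range(interval, limit + 1, interval))


def checkpoint_entry(block_hash: str, height: int) -> str:
    """Format a settings line; the 'hash:height' form is an assumption."""
    return f"checkpoint = {block_hash}:{height}"
```

For example, with the top at height 400000 and a pool of 50, the deepest safe choice is height 399949; the block hashes themselves should come from your own synced server rather than a public source, to avoid the centralization risk mentioned above.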
>
> Step three is recommended in a production environment even apart from
> recovery purposes. When you post a tx from a client to a server you will
> not know for sure whether the network has "accepted" the transaction
> until it's mined (and sufficiently deep in the chain). However, if you
> want some confidence that it is being distributed to miners, you should
> query for the tx using the other server. Given they are mutually
> excluded as peers, the presence of the tx will prove that it has moved
> through at least one external node.
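The cross-server check amounts to polling the other server for the txid. In this sketch `has_tx` is a hypothetical callable you would wrap around whatever transaction/mempool query your client uses; the attempt count and delay are arbitrary. `sleep` is injectable only so the function can be exercised without waiting.

```python
import time
from typing import Callable


def seen_by_independent_server(
    txid: str,
    has_tx: Callable[[str], bool],
    attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the *other* server until it reports the transaction.

    The servers exclude each other as peers, so a positive answer means the
    tx reached it through at least one external node. A negative answer is
    inconclusive, not proof that the network rejected the tx.
    """
    for attempt in range(attempts):
        if has_tx(txid):
            return True
        if attempt + 1 < attempts:
            sleep(delay_seconds)
    return False
```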
>
> Using the above technique requires two servers always up and the ability
> to shut one down periodically. So maintaining a robust production
> environment requires at least three servers (and the ability to shift
> traffic away from the down server). I recommend four servers, with
> clients configured to send transactions to either of two and to retrieve
> from the mempool of either of the other two (single redundant failover).
> Other queries can be balanced across all four. This allows you to bring
> down one server in either of the two pools. The pools of course must be
> configured to exclude each other's members as peers.
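The four-server layout can be summarized as two pools, cross-pool peer exclusion, failover within a pool, and round-robin for everything else. The host names and functions below are hypothetical; how the exclusions map onto actual server settings is not shown.

```python
from itertools import cycle

# Hypothetical host names for the two pools described above.
SEND_POOL = ["server-a", "server-b"]
MEMPOOL_POOL = ["server-c", "server-d"]


def peer_exclusions(pool_a, pool_b):
    """Each server excludes every member of the *other* pool as a peer."""
    exclusions = {s: sorted(pool_b) for s in pool_a}
    exclusions.update({s: sorted(pool_a) for s in pool_b})
    return exclusions


def pick(pool, is_up):
    """Single redundant failover: first server in the pool that is up."""
    for server in pool:
        if is_up(server):
            return server
    raise RuntimeError("both servers in the pool are down")


def make_query_balancer(servers, is_up):
    """Round-robin other queries across all servers that are up."""
    ring = cycle(servers)

    def next_server():
        for _ in range(len(servers)):
            server = next(ring)
            if is_up(server):
                return server
        raise RuntimeError("no server is up")

    return next_server
```

With one server down in each pool, transactions still go out through the surviving send server and are confirmed through the surviving mempool server.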
>
> e
>
>
> On 05/15/2016 07:20 AM, mlmikael wrote:
>> Hi Eric,
>>
>> Say that my machine shuts down unexpectedly. Perhaps at startup I won't
>> even know that it shut down unexpectedly, so the LibBitcoin database
>> could be in an inconsistent state.
>>
>> To mitigate that, it would be great to do some kind of read operation
>> over the whole database that provides a verification deep enough to
>> prove that the probability of an inconsistency is smaller than 1 in
>> roughly 10^10 to 10^20.
>>
>> I.e., is there any cheaper way than doing a full local sync to a new
>> directory?
>>
>> What's in the box now and what do you suggest?
>>
>> Mlmikael
>>
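The "1 in 10^10 to 10^20" target translates directly into a checksum width, assuming corruption behaves like random data against an ideal k-bit checksum (undetected with probability about 2^-k; this does not hold for deliberate tampering). Under that assumption a single CRC-32 (about 1 in 4.3e9) falls short even of 10^10, a 64-bit checksum reaches 10^10 but not 10^20, and 67 bits or more (for example SHA-256) covers the whole range.

```python
import math


def undetected_probability(checksum_bits: int) -> float:
    """Chance a random corruption still matches an ideal k-bit checksum."""
    return 2.0 ** -checksum_bits


def bits_needed(target_probability: float) -> int:
    """Smallest k such that 2**-k <= target_probability."""
    return math.ceil(math.log2(1.0 / target_probability))
```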