Re: [Libbitcoin] Proper handling after unexpected shutdown?

Author: Eric Voskuil
Date:
To: mlmikael
CC: libbitcoin
Subject: Re: [Libbitcoin] Proper handling after unexpected shutdown?

This would not be sufficient. It requires atomicity of the write of the
checksum and the data that has been summed. There is no facility to
guarantee that atomicity, which is the original problem. Furthermore
there can be inconsistency between two tables, so the atomicity needs to
span files at the same time as protecting writes to a single file.
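
A minimal sketch of the two failure modes described above, using a toy
in-memory model. The `Table` class, its two-step write, and the truncated
SHA-256 checksum are assumptions made for illustration; none of this is
libbitcoin's storage code.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Short digest standing in for whatever checksum a store might use."""
    return hashlib.sha256(data).digest()[:4]


class Table:
    """Toy stand-in for one store file: a payload plus its stored checksum."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.check = checksum(data)

    def write(self, new_data: bytes, crash_before_checksum: bool = False) -> None:
        self.data = new_data                 # step 1: payload reaches the medium
        if crash_before_checksum:
            return                           # simulated hard shutdown
        self.check = checksum(new_data)      # step 2: checksum reaches the medium

    def verifies(self) -> bool:
        return checksum(self.data) == self.check


# Case 1: a crash between the two steps of a single-table write.
blocks = Table(b"block 1")
blocks.write(b"block 1, block 2", crash_before_checksum=True)
torn_single_table = not blocks.verifies()

# Case 2: each table verifies on its own, yet the pair disagrees.
blocks = Table(b"block 1")
index = Table(b"top=1")
blocks.write(b"block 1, block 2")            # this write completes
# ... simulated hard shutdown here, before the index is updated to top=2 ...
pair_verifies = blocks.verifies() and index.verifies()
pair_agrees = index.data == b"top=2"
```

In case 1 the payload is complete but the stale checksum rejects it. In
case 2 both per-file checksums pass even though the index no longer
matches the block table.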

Implementing this sort of guarantee requires a significant amount of
overhead:

https://en.wikipedia.org/wiki/Atomicity_(database_systems)#Implementation

Given that the blockchain is merely a cache of public data, there is no
reason to suffer this overhead. Corruption can be detected and the cache
rebuilt. Optimization consists in preventing and reliably detecting the
corruption, deploying with redundancy, and optimizing the cache rebuild.
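
One way such detection might work is a sentinel file that exists only
while a write is in flight. The sketch below is an assumption for
illustration, not the version2 mechanism; the file name
`write_in_progress` and the function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

SENTINEL = "write_in_progress"  # hypothetical file name


def begin_write(store_dir: Path) -> None:
    """Create the sentinel and flush it before touching the store files."""
    fd = os.open(store_dir / SENTINEL, os.O_CREAT | os.O_WRONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def end_write(store_dir: Path) -> None:
    """Remove the sentinel once the store write has completed."""
    (store_dir / SENTINEL).unlink()


def needs_rebuild(store_dir: Path) -> bool:
    """At startup: a leftover sentinel means a write was interrupted."""
    return (store_dir / SENTINEL).exists()


# Usage: simulate a clean write followed by an interrupted one.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    begin_write(store)
    end_write(store)
    clean_start = needs_rebuild(store)       # no sentinel left behind

    begin_write(store)                       # hard shutdown before end_write
    dirty_start = needs_rebuild(store)       # sentinel still present
```

A leftover sentinel only says that a write was interrupted, not which
data is damaged, which fits the approach of rebuilding the cache rather
than repairing it.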

Along with hash table indexing, this design decision is fundamental to
the version2 blockchain and material to its performance benefits.

e

On 05/16/2016 07:51 AM, mlmikael wrote:
> A thought -
>
> Additional robustness could be achieved by storing checksums of the
> involved data in the database files *and their location*, at even
> intervals, together with some kind of overarching checksum information
> that is written at "checkpoints" where the database is known to be in a
> consistent state.
>
> That way it would be possible to get a guarantee that the storage media
> has integrity with respect to LibBitcoin's logic (presuming ECC RAM
> and a watertight CPU), by just reading the whole database file to check
> that the overarching checksum is correct (and to be safe, individual
> checksums too).
>
> Would that be of value?
>
> On 2016-05-16 03:11, Eric Voskuil wrote:
>> It is not presently possible to know whether there is corruption when a
>> hard shutdown has occurred. On the other hand I'm not aware of any case
>> where a corruption has occurred apart from a hard shutdown.
>>
>> Validating the data would require hash validation against each
>> transaction and block. As for the indexes, it would probably be faster
>> to rebuild them than to validate them. In version2 it will probably be
>> as fast to rebuild as it would be to validate, assuming bandwidth is not
>> constrained and you have a checkpoint near the top. This is because the
>> cost for reading the entire store is basically the same as for writing
>> the entire store.
>>
>> For this reason I haven't been planning to implement store
>> validation/repair. On the other hand, it is very fast and easy to detect
>> at startup that a shutdown previously occurred during a write. I have
>> been planning to implement this detection in version2. The fix would be
>> to rebuild the store, which again shouldn't be slower than a full
>> validation.
>>
>> The store is very reliable if it is shut down properly. So I would
>> recommend the following precautions in a production environment:
>>
>> 1) As your chain grows, periodically add checkpoints to your
>> configuration settings file. Don't pick points too close to the top or
>> they could get reorganized out. If your block pool is 50 then 51 blocks
>> deep is entirely safe, since you can't reorganize deeper than that
>> anyway. These additions will significantly speed a rebuild from the
>> network. You could also rely on public sources, but this creates a
>> centralization risk.
>>
>> 2) Periodically shut down a server and copy the store files to another
>> directory on the same drive (or elsewhere). If you have a hard shutdown,
>> change settings to use the saved location. The updated checkpoints from
>> #1 will get you back to the top pretty quickly.
>>
>> 3) Maintain a second server on an independent device, using the same
>> procedures. Configure each to exclude the other as a peer so that any
>> corruption on one cannot affect the other. Having a second server will
>> allow you to keep running while performing #2.
>>
>> Step three is recommended in a production environment apart from
>> recovery purposes. When you post a tx from a client to a server you will
>> not know for sure if the network has "accepted" the transaction until
>> it's mined (and sufficiently deep in the chain). However if you want
>> some confidence that it is being distributed to miners you should
>> query for the tx using the other server. Given they are mutually
>> excluded as peers the presence of the tx will prove that it has moved
>> through at least one external node.
>>
>> Using the above technique requires two servers always up and the ability
>> to shut one down periodically. So maintaining a robust production
>> environment requires at least three servers (and the ability to shift
>> traffic away from the down server). I recommend four servers, with
>> clients configured to send transactions to either of two and to retrieve
>> from the mempool of either of the other two (single redundant failover).
>> Other queries can be balanced across all four. This allows you to bring
>> down one server in either of the two pools. The pools of course must be
>> configured to exclude each other's members as peers.
>>
>> e
>>
>>
>> On 05/15/2016 07:20 AM, mlmikael wrote:
>>> Hi Eric,
>>>
>>> Say that my machine shuts down unexpectedly. Perhaps at startup I won't
>>> even know that it did shut down unexpectedly so the LibBitcoin database
>>> could be in an inconsistent state.
>>>
>>> To mitigate that it would be great to do some kind of read operation for
>>> the whole database, that provides a verification deep enough to prove
>>> that the probability of an inconsistency is smaller than 1 in
>>> ~10^10-10^20 .
>>>
>>> I.e., is there any cheaper way than doing a full local sync to a new
>>> directory?
>>>
>>> What's in the box now and what do you suggest?
>>>
>>> Mlmikael
>>>
>
>
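
The checkpoint rule in precaution 1 of the quoted message (with a block
pool of 50, a checkpoint 51 blocks deep cannot be reorganized out) can
be sketched as simple arithmetic. The function name is hypothetical, and
counting the top block as depth 0 is an assumption chosen to err on the
conservative side.

```python
def safe_checkpoint_height(top_height: int, block_pool_size: int = 50) -> int:
    """Highest height that is at least block_pool_size + 1 blocks deep.

    Assumes the top block has depth 0, so the result sits one block
    deeper than the largest reorganization the pool allows.
    """
    depth = block_pool_size + 1
    return max(top_height - depth, 0)


# Usage with an example top height (the value is illustrative only).
example = safe_checkpoint_height(412000)
```

A chain shorter than the required depth yields height 0, meaning no
checkpoint above the genesis block is safe yet under this rule.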
