Re: [Libbitcoin] Why the new protocol is cool

Author: Eric Voskuil
Date:
To: Amir Taaki, libbitcoin
Subject: Re: [Libbitcoin] Why the new protocol is cool

The goals of the blockchain implementation are different from the goals
of the wire protocol. This is the beginning of a discussion on bringing
together the layers.

We should first be clear about the difference between protocol and
blockchain. The protocol is the contract between client and server.
Design goals of the protocol are to consistently expose call semantics,
optimizing for developer efficiency, extensibility, server scalability,
bandwidth, privacy and statelessness. Design goals for the blockchain
relate to speed, store scalability, concurrency and fault tolerance.

We should be careful about letting the optimizations of one layer bleed
into another. The current blockchain optimizations pervade the
implementation and are closely coupled to the obelisk protocol. This is
of course necessary. At the same time I think we may find it difficult
to adapt this implementation to the new protocol.

The two major issues are pagination and prefixing:

Pagination
----------

The blockchain optimizations around pagination rely partly on exposure
of surrogate keys <http://en.wikipedia.org/wiki/Surrogate_key>.

This means that a query to one server will return results that differ
from those of another server for the same semantic data. This creates
a problem with RESTfulness
<http://en.wikipedia.org/wiki/Representational_state_transfer>.

Additionally, a caller must return to the same server on each request in
a page set, since the page keys are server-specific. This complicates
load-balancing on both ends.

Also, exposing server-specific IDs leaks information (however benign
that may seem in this case).

Finally, since surrogate keys cannot be relied upon to be sequential, the
caller has no means to identify gaps or even overlaps in paginated results.
This will certainly happen in the case of a chain fork. This is a common
data paging problem that is usually left to eventual consistency
<http://en.wikipedia.org/wiki/Eventual_consistency>. However, for a
caching, server-trusting wallet this presents an insurmountable obstacle.

The semantic "page" in bitcoin is the block and each block has an
immutable natural key (the block hash). The protocol relies on this
natural key to resolve the above issues while also presenting a
consistent semantic model.
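To make the natural-key argument concrete, here is a minimal C++ sketch. It assumes a hypothetical page format in which each returned block carries its own hash and its parent's hash; `block_entry`, `page_status` and `check_continuity` are illustrative names, not part of libbitcoin or the protocol, and hashes are plain strings for brevity. Because block hashes are identical on every server, a client can stitch together pages obtained from different servers and still detect a gap, an overlap, or a fork.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry in a result page, identified by natural keys only.
struct block_entry
{
    std::string hash;      // natural key of this block
    std::string previous;  // natural key of its parent block
};

enum class page_status { contiguous, gap_or_fork, overlap };

// Checks that page `next` continues the chain ending at `last_hash`.
// Unlike server-specific surrogate keys, hashes mean the same thing on
// every server, so `next` may come from a different server.
page_status check_continuity(const std::string& last_hash,
    const std::vector<block_entry>& next)
{
    if (next.empty())
        return page_status::contiguous;

    // Overlap: this page repeats the last block the caller already holds.
    for (const auto& entry: next)
        if (entry.hash == last_hash)
            return page_status::overlap;

    // The first block must build on the last block the caller holds.
    if (next.front().previous != last_hash)
        return page_status::gap_or_fork;

    // Each block within the page must link to its predecessor.
    for (std::size_t i = 1; i < next.size(); ++i)
        if (next[i].previous != next[i - 1].hash)
            return page_status::gap_or_fork;

    return page_status::contiguous;
}
```

A sequential surrogate key could at best reveal a gap on one server; the hash linkage reveals it regardless of which server answered.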

Prefixing
---------

Prefix searches are the only means of achieving query privacy unless
client, server and channel are contained within a trust boundary. That
scenario does exist, and the protocol fully supports it by allowing for
unprefixed searches (i.e. by supplying all bits in the prefix). It is
also possible that some clients will trade security for performance.

The difference between prefix and full-value query is one of
implementation as opposed to protocol. Certainly, if it is more efficient,
the implementation can be entirely different when all bits in the prefix
are supplied than when only a partial prefix is supplied. The point is
that such a difference wouldn't justify a protocol distinction.
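As an illustration of why this is an implementation concern, here is a small C++ sketch of bitwise prefix matching over a 160-bit address hash. The names `short_hash` and `matches_prefix` are hypothetical, and the sketch assumes bits are taken most-significant first within each byte; the protocol's actual bit ordering is not specified here. A query that supplies all 160 bits is simply the exact-match case of the same function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 160-bit address hash as raw bytes (hypothetical alias).
using short_hash = std::array<std::uint8_t, 20>;

// True if the first `bit_count` bits of `hash` equal those of `prefix`,
// assuming most-significant-bit-first ordering within each byte.
// bit_count == 160 is an exact match; bit_count == 0 matches everything.
bool matches_prefix(const short_hash& hash, const short_hash& prefix,
    std::size_t bit_count)
{
    if (bit_count > hash.size() * 8)
        bit_count = hash.size() * 8;

    // Compare the whole bytes covered by the prefix.
    const std::size_t full_bytes = bit_count / 8;
    for (std::size_t i = 0; i < full_bytes; ++i)
        if (hash[i] != prefix[i])
            return false;

    const std::size_t remaining = bit_count % 8;
    if (remaining == 0)
        return true;

    // Compare only the high-order `remaining` bits of the next byte.
    const auto mask = static_cast<std::uint8_t>(0xff << (8 - remaining));
    return (hash[full_bytes] & mask) == (prefix[full_bytes] & mask);
}
```

An implementation could detect `bit_count == 160` and route it to a hash-table lookup instead, without the caller ever seeing a different call.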

Additionally, it is problematic from a privacy standpoint if not all
queries can achieve a comparable level of privacy. This is because of
the weak-link problem. If one query that must be performed has zero
privacy while the others are private, the net result is still zero privacy.
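The weak-link argument can be put in numbers with a deliberately crude model. This sketch assumes addresses are uniformly distributed over the hash space and that an observer can link every query from one wallet; `anonymity_set` and `session_privacy` are hypothetical names. Under those assumptions a b-bit prefix filter is shared by roughly N / 2^b of N indexed addresses, and the wallet is only as private as its weakest query.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Expected number of indexed addresses sharing a `prefix_bits`-bit prefix,
// assuming `address_count` addresses spread uniformly over the hash space.
// A full-value (unprefixed) query collapses to an anonymity set of one.
double anonymity_set(double address_count, unsigned prefix_bits)
{
    const double buckets = std::ldexp(1.0, static_cast<int>(prefix_bits));
    return std::max(1.0, address_count / buckets);
}

// Weak-link rule: if an observer can link all queries from one wallet,
// the session's privacy is the minimum over its individual queries.
double session_privacy(double address_count,
    const std::vector<unsigned>& prefix_bits_per_query)
{
    double weakest = address_count;
    for (const auto bits: prefix_bits_per_query)
        weakest = std::min(weakest, anonymity_set(address_count, bits));
    return weakest;
}
```

Adding a single 160-bit query to an otherwise private session drops the result to one, which is the "net zero privacy" case described above.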

Finally, it should be noted that the only means to privacy when sending
a transaction is onion routing (effectively Tor or I2P), although this
is insufficient for queries unless each filter is sent over an independent Tor
channel. So prefix filters for address searches remain essential even
with onion routing.

--

So while speed is important, it must serve the scenarios to be useful. I
think that a fully-optimized blockchain around the new protocol will
require lazy suffix tree indexes optimized for memory mapped I/O. As
such I'm not suggesting that we attempt to align the existing
implementation with the new protocol right away.
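The proposed index design isn't spelled out here. As a much simpler stand-in, and explicitly not the lazy suffix tree mentioned above, the sketch below shows one property a prefix index can exploit: in a sorted key array, all keys sharing a prefix form one contiguous run, located with a binary search followed by a sequential scan. That access pattern suits memory-mapped files. The hex-string keys, `index_entry` and `prefix_range` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index entry: hex-encoded address hash -> row number.
using index_entry = std::pair<std::string, unsigned>;

// Returns the [first, last) positions in a sorted index whose keys start
// with `prefix`: a binary search finds the start, then a sequential scan
// walks the contiguous run of matching keys.
std::pair<std::size_t, std::size_t> prefix_range(
    const std::vector<index_entry>& sorted_index, const std::string& prefix)
{
    const auto key_less = [](const index_entry& entry, const std::string& value)
    {
        return entry.first < value;
    };

    const auto first = std::lower_bound(sorted_index.begin(),
        sorted_index.end(), prefix, key_less);

    // Advance past every key that still begins with the prefix.
    auto last = first;
    while (last != sorted_index.end() &&
        last->first.compare(0, prefix.size(), prefix) == 0)
        ++last;

    return { static_cast<std::size_t>(first - sorted_index.begin()),
        static_cast<std::size_t>(last - sorted_index.begin()) };
}
```

A longer prefix narrows the run and a full key shrinks it to at most one entry, so the same code path serves both query styles.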

e

On 09/26/2014 04:00 AM, Amir Taaki wrote:
> I agree with you totally; I got very excited when Eric was walking me
> through the design. It's definitely very well thought out.
>
> The only thing is that you have address prefix search paginated by block
> heights.
>
> The current database scheme has 2 databases:
>
> * history_scan_database
> Scanning from a certain block height (to get updates) by prefix
> is fast.
> Fetching the entire history for an address from 0 is very slow.
> Paginating on block height is constant speed.
>
> * history_database
> The more classic scheme of fetching a history for a fixed address.
> Does not support prefix.
> Has a different pagination scheme which is constant speed.
> Using block heights to paginate would mean linear-time indexing.
>
> And it has its uses (e.g. restoring a new wallet). I wouldn't expect you to
> use the scan database in all scenarios.
>
> http://blog.coinkite.com/post/97397052686/public-obelisk-server-for-the-community
>
> http://www.reddit.com/r/Bitcoin/comments/2go7qm/devs_be_sure_to_test_your_bitcoin_apidata/
>
> https://wiki.unsystem.net/en/index.php/Libbitcoin/Blockchain/htdb_slab_Performance
>
> We've recently gotten some good promotion based on the speed of fetching
> address history. So it's still something I feel we should support, for
> example if you use the server on the backend as a private API with
> encryption/signing support. Some people don't need the extra security and
> privacy, and don't want to do the SPV processing. For Darkwallet that's
> great, and we want to move to this new protocol.
>
> Since fetch_history has its own semantics (query based on full
> address only, not prefix) and returns results different from what's
> expected, I thought it'd be apt to separate it into its own call. But let
> me know what the best option is here.
>
> On 09/26/2014 06:39 AM, William Swanson wrote:
>> I started writing this as part of the other thread, but it got
>> super-long, so I decided to post it separately...
>>
>> The raw blockchain has two data structures that can exist
>> independently: the transaction and the block. Everything else is
>> contained within these two data structures, and never appears
>> separately (inputs, outputs, signatures, scripts, etc.). Transactions
>> can be free-floating in the mempool, for instance, but there is no
>> such thing as a loose input script; it's always part of a transaction.
>>
>> Our protocol mirrors this fact by providing two query functions: "get
>> transactions" and "get blocks". Aside from details about result
>> encoding (full transactions vs hashes vs UTXOs), this is all we could
>> ever need.
>>
>> On the push side, if we just provide "push transaction" and "push
>> block," we would actually have enough power to run the full network
>> with an alternative to the satoshi protocol.
>>
>> Of course, these four messages (get/push transaction and get/push
>> block) don't include housekeeping things like "get peer IP list". If
>> we include those, we get a nice three-layer protocol:
>>
>> // Read-only blockchain access:
>> interface read_blockchain
>> {
>>     get_transactions(...);
>>     get_blocks(...);
>> }
>>
>> // Write-only blockchain access:
>> interface write_blockchain
>> {
>>     push_transaction(...);
>>     push_block(...);
>> }
>>
>> // Full-node p2p interface:
>> interface blockchain_node
>>     : public write_blockchain,
>>       public read_blockchain
>> {
>>     get_version(...);
>>     get_peer_list(...);
>>     validate_transaction(...);
>>     ...
>> }
>>
>> Notice that I have put validate_transaction with the housekeeping
>> stuff, since it's mainly a debugging thing.
>>
>> This protocol is truly universal. It doesn't care whether the
>> connection uses JSON over websockets, protobuf over zeromq, or even
>> if there is a connection at all.
>>
>> This last option is really interesting. Imagine what would happen if
>> we create a nice C++ interface that mirrors this protocol, and start
>> using it internally. Suddenly, all the different parts of our system
>> (libbitcoin-blockchain, libbitcoin-server, libbitcoin-client, etc.)
>> share a common language for talking about the blockchain.
>>
>> The libbitcoin-blockchain library would start out by implementing this
>> C++ interface as a way of querying its blockchain (at least the
>> read_blockchain part). The results wouldn't include the mempool, of
>> course, but that's not a problem. The libbitcoin-server node would
>> take the results from libbitcoin-blockchain, augment them with results
>> from its mempool, and expose them again under the exact same C++
>> read_blockchain interface. Then, libbitcoin-protocol would consume the
>> interface, marshal it over zeromq, and reconstitute it on the other
>> side. A hypothetical libbitcoin-spv library would consume the
>> interface again, validate it, cache it, and present it to the client.
>>
>> Under this model, any trusting wallet can be turned into a
>> non-trusting wallet by slipping an SPV module into the data pipeline.
>> It's just like snapping Legos together - the common bump-and-socket
>> interface makes everything interchangeable. For example, if we create
>> a three-layer sandwich with libbitcoin-protocol, libbitcoin-spv, and
>> another libbitcoin-protocol, we suddenly have a lightweight cache that
>> we could put in front of a full-node server for load-balancing. The
>> possibilities are endless.
>>
>> I think a design like this would represent a really powerful evolution
>> of the libbitcoin architecture. It will be interesting to see how much
>> work it takes, or even if it's really possible.
>>
>> -William
>> _______________________________________________
>> Libbitcoin mailing list
>> Libbitcoin@???
>> https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/libbitcoin
>>
>
>
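
William's interface listing in the quoted message is pseudocode. The sketch below is one hypothetical way to render it as C++ abstract classes, plus a pass-through layer illustrating the "Lego" composition he describes: a layer consumes a `read_blockchain` and re-exposes the same interface. The `hash_list` result type, the synchronous signatures, and the classes `memory_store` and `mempool_layer` are assumptions for illustration only; they are not libbitcoin's API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Placeholder result type (assumption): real calls would take filters
// and return transaction/block objects.
using hash_list = std::vector<std::string>;

// Read-only blockchain access.
class read_blockchain
{
public:
    virtual ~read_blockchain() = default;
    virtual hash_list get_transactions() const = 0;
    virtual hash_list get_blocks() const = 0;
};

// Write-only blockchain access.
class write_blockchain
{
public:
    virtual ~write_blockchain() = default;
    virtual void push_transaction(const std::string& tx) = 0;
    virtual void push_block(const std::string& block) = 0;
};

// Full-node interface: both halves plus housekeeping (one call shown).
class blockchain_node : public write_blockchain, public read_blockchain
{
public:
    virtual std::string get_version() const = 0;
};

// Hypothetical confirmed-chain store, standing in for the blockchain library.
class memory_store : public read_blockchain
{
public:
    hash_list transactions{"tx1"};
    hash_list blocks{"block1"};
    hash_list get_transactions() const override { return transactions; }
    hash_list get_blocks() const override { return blocks; }
};

// Hypothetical layer that wraps any read_blockchain and augments its
// transaction results with unconfirmed (mempool) transactions.
class mempool_layer : public read_blockchain
{
public:
    mempool_layer(const read_blockchain& inner, hash_list mempool)
      : inner_(inner), mempool_(std::move(mempool))
    {
    }

    hash_list get_transactions() const override
    {
        auto result = inner_.get_transactions();
        result.insert(result.end(), mempool_.begin(), mempool_.end());
        return result;
    }

    hash_list get_blocks() const override { return inner_.get_blocks(); }

private:
    const read_blockchain& inner_;
    hash_list mempool_;
};
```

Because every layer accepts and exposes the same `read_blockchain`, a marshaling layer or a validating SPV layer could be stacked the same way without the consumer noticing.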
