Re: [Libbitcoin] Why the new protocol is cool

Author: Eric Voskuil
Date:
To: Amir Taaki, libbitcoin
Subject: Re: [Libbitcoin] Why the new protocol is cool

The first point is, I think, a question of how much data the caller
desires in response to the query. If the entire tx is not required, the
protocol allows for less. William was just pointing out that the
protocol can answer the history question using tx results.

I don't follow you with respect to privacy. Privacy isn't a function of
what is returned but of what is requested.

There is a third point raised relating trust to privacy. The protocol is
agnostic on trust, explicitly supporting server-trusting wallets. But
I'm not sure how that ties in to privacy.

e

On 09/28/2014 05:05 PM, Amir Taaki wrote:
> For example say we have 100k rows for an address, and querying a tx
> takes 0.0001 secs, total = 10 secs to get the balance or history for an
> address.
>
> vs mere milliseconds with the current get_history call.
>
> The history is also useful for a wallet that works instantly while it
> does SPV verification in the background.
>
> For privacy, you can find similar addresses and mix in a few false
> addresses. There could be a special database for this purpose.
> I think the privacy risk is greater from a passive observer than from an
> active attacker. An active attacker quickly makes themselves visible. So
> to some level we could trust that this database was returning accurate
> results.
>
> On 09/29/2014 12:53 AM, Amir Taaki wrote:
>> About pagination of keys, as long as 2 servers have the same blockchain,
>> the pagination will be consistent since it is entirely deterministic.
>> More pages will be added as the blockchain grows without modifying
>> earlier page indexes.
>>
>> So they are not server specific. They are specific to the blockchain
>> model in use with backwards compatibility when blocks are added (no reorgs).
>>
>> The block height is in the returned dataset itself, so continuing the
>> query to get the next page based on some arbitrary data index shouldn't
>> pose a problem for any use case I imagine... except where you want to
>> fetch the data from a specific height onwards (which a fixed address
>> lookup table is not ideal for; a scanning db is better).
>>
>> The fetch_history is something that people really liked in those recent
>> posts, so I think it's good to keep that. Since it doesn't fit the
>> semantics of the current API, it makes sense that we just put it in an
>> entirely separate call, get_history, without clobbering the API to fit a
>> square peg in a round hole.
>>
>> The paginated keys are not sequential but they only increase (never get
>> smaller). Since we have the history with the block heights, and we can
>> see when the blockchain reorgs (and back to which height), we can jump
>> back to the page where the history changed and re-fetch it.
>> Or just fetch specific pages (or even rows) that interest us rather than
>> the whole data set. This can be way more efficient.
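The re-fetch step described above can be sketched as a lookup over cached page heights. This is only an illustration under stated assumptions: the client is assumed to keep, for each cached page, the highest block height seen on it, and the names `page_heights` and `first_stale_page` are hypothetical, not part of libbitcoin or obelisk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical client-side record: last_heights[i] is the highest block
// height seen on cached page i. Heights never decrease from page to page.
using page_heights = std::vector<std::uint32_t>;

// Return the position of the first cached page holding a row at or above
// fork_height, or last_heights.size() if no cached page is affected.
std::size_t first_stale_page(const page_heights& last_heights,
    std::uint32_t fork_height)
{
    // The binary search below is only valid on non-decreasing heights.
    assert(std::is_sorted(last_heights.begin(), last_heights.end()));

    const auto it = std::lower_bound(last_heights.begin(),
        last_heights.end(), fork_height);
    return static_cast<std::size_t>(it - last_heights.begin());
}
```

Pages before the returned position can stay cached; pages from that position onward would be fetched again after the reorg.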
>>
>> About the prefix stuff: it's more a matter of time and implementation.
>> As we go on, I will definitely improve this code and expand it. I just
>> need to properly develop the conceptual models and data structures
>> needed first.
>>
>> You can see this:
>>
>> https://wiki.unsystem.net/en/index.php/Libbitcoin/Blockchain/history
>>
>> I'm still unsure if a tree, which seems conceptually the most general, is
>> the fastest for the majority of queries. It seems to have its own problems.
>>
>> On 09/27/2014 10:13 AM, Eric Voskuil wrote:
>>> The goals of the blockchain implementation are different from the goals
>>> of the wire protocol. This is the beginning of a discussion on bringing
>>> together the layers.
>>>
>>> We should first be clear about the difference between protocol and
>>> blockchain. The protocol is the contract between client and server.
>>> Design goals of the protocol are to consistently expose call semantics,
>>> optimizing for developer efficiency, extensibility, server scalability,
>>> bandwidth, privacy and statelessness. Design goals for the blockchain
>>> relate to speed, store scalability, concurrency and fault tolerance.
>>>
>>> We should be careful about letting the optimizations of one layer bleed
>>> into another. The current blockchain optimizations pervade the
>>> implementation and are closely coupled to the obelisk protocol. This is
>>> of course necessary. At the same time I think we may find it difficult
>>> to adapt this implementation to the new protocol.
>>>
>>> The two major issues are pagination and prefixing:
>>>
>>> Pagination
>>> ----------
>>>
>>> The blockchain optimizations around pagination rely partly on exposure
>>> of surrogate keys <http://en.wikipedia.org/wiki/Surrogate_key>.
>>>
>>> This means that a query from one server will return differing results
>>> from that of another, for the same semantic data retrieved. This creates
>>> a problem with RESTfulness
>>> <http://en.wikipedia.org/wiki/Representational_state_transfer>.
>>>
>>> Additionally, a caller must return to the same server on each request in
>>> a page set, since the page keys are server-specific. This complicates
>>> load-balancing on both ends.
>>>
>>> Also, the exposure of server-specific IDs leaks information (however
>>> benign that may seem in this case).
>>>
>>> Finally, since surrogate keys cannot be sequential, the caller has no
>>> means to identify gaps or even overlap in paginated results. This will
>>> certainly happen in the case of a chain fork. This is a common data
>>> paging problem that is usually left to eventual consistency
>>> <http://en.wikipedia.org/wiki/Eventual_consistency>. However, for a
>>> caching server-trusting wallet this presents an insurmountable obstacle.
>>>
>>> The semantic "page" in bitcoin is the block and each block has an
>>> immutable natural key (the block hash). The protocol relies on this
>>> natural key to resolve the above issues while also presenting a
>>> consistent semantic model.
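As a rough illustration of why a natural key helps a caching client, the sketch below compares a cached list of block hashes with a freshly returned one and reports where they stop agreeing. The types and the function name `first_divergence` are assumptions made for this example; the protocol's actual message layout is not shown in this thread.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// A block hash is 32 bytes; it identifies the block on any server.
using block_hash = std::array<std::uint8_t, 32>;
using hash_list = std::vector<block_hash>;

// Return how many leading blocks the two lists share. This is also the
// index of the first block that differs, or the shorter list's length
// if one list is simply an extension of the other.
std::size_t first_divergence(const hash_list& cached, const hash_list& fresh)
{
    std::size_t index = 0;
    while (index < cached.size() && index < fresh.size() &&
        cached[index] == fresh[index])
    {
        ++index;
    }
    return index;
}
```

Because the comparison uses block hashes rather than server-assigned keys, the cached list and the fresh list may come from different servers. Anything cached at or after the returned index would be discarded and requested again.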
>>>
>>> Prefixing
>>> ---------
>>>
>>> Prefix searches are the only means of achieving query privacy unless
>>> client, server and channel are contained within a trust boundary. That
>>> scenario does exist, and the protocol fully supports it by allowing for
>>> unprefixed searches (i.e. by supplying all bits in the prefix). It is
>>> also possible that some clients will trade security for performance.
>>>
>>> The difference between prefix and full value query is one of
>>> implementation as opposed to protocol. Certainly if it is more optimal
>>> the implementation can be entirely different if all bits in the prefix
>>> are supplied vs. when only a partial prefix is supplied. The point being
>>> that such a difference wouldn't justify a protocol distinction.
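To make the point concrete, here is a minimal sketch of a bit-prefix match in which a full-value query is just the case where every bit is supplied. The function name `has_prefix` and the byte-vector representation are assumptions for illustration, not the protocol's actual encoding.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Return true if the first `bits` bits of `value` equal the first `bits`
// bits of `prefix`. Supplying bits == value.size() * 8 is an exact match.
bool has_prefix(const bytes& value, const bytes& prefix, std::size_t bits)
{
    // Reject prefixes longer than the value or than the bytes supplied.
    if (bits > value.size() * 8 || bits > prefix.size() * 8)
        return false;

    // Compare the whole bytes covered by the prefix.
    const std::size_t whole = bits / 8;
    if (!std::equal(prefix.begin(), prefix.begin() + whole, value.begin()))
        return false;

    const std::size_t rest = bits % 8;
    if (rest == 0)
        return true;

    // Compare only the top `rest` bits of the next byte.
    const auto mask = static_cast<std::uint8_t>(0xff << (8 - rest));
    return (value[whole] & mask) == (prefix[whole] & mask);
}
```

A shorter prefix matches more values, which is what hides the caller's real target; the same call with all bits supplied degenerates into an ordinary lookup.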
>>>
>>> Additionally, it is problematic from a privacy standpoint if not all
>>> queries can achieve a comparable level of privacy. This is because of
>>> the weak-link problem. If one query that must be performed has zero
>>> privacy but the others are private, the net result is still zero privacy.
>>>
>>> Finally, it should be noted that the only means to privacy when sending
>>> a transaction is onion routing (effectively Tor or I2P), although this
>>> is insufficient for query unless each filter is sent via independent Tor
>>> channels. So prefix filters for address searches remain essential even
>>> with onion routing.
>>>
>>> --
>>>
>>> So while speed is important it must serve the scenarios to be useful. I
>>> think that a fully-optimized blockchain around the new protocol will
>>> require lazy suffix tree indexes optimized for memory mapped I/O. As
>>> such I'm not suggesting that we attempt to align the existing
>>> implementation with the new protocol right away.
>>>
>>> e
>>>
>>> On 09/26/2014 04:00 AM, Amir Taaki wrote:
>>>> I agree with you totally, I got very excited when Eric was showing me
>>>> through the design. It's definitely very well thought out.
>>>>
>>>> The only thing is that you have address prefix search paginated by block
>>>> heights.
>>>>
>>>> The current database scheme has 2 databases:
>>>>
>>>> * history_scan_database
>>>>   Scanning from a certain block height (to get updates) by prefix
>>>>   is fast.
>>>>   Fetching the entire history for an address from 0 is very slow.
>>>>   Paginating on block height is constant speed.
>>>>
>>>> * history_database
>>>>   The more classic scheme of fetching a history for a fixed address.
>>>>   Does not support prefix.
>>>>   Has a different pagination scheme which is constant speed.
>>>>   Using block heights to paginate would mean linear speed indexing.
>>>>
>>>> And it has its uses (e.g. restoring a new wallet). I wouldn't expect you to
>>>> use the scan database in all scenarios.
>>>>
>>>> http://blog.coinkite.com/post/97397052686/public-obelisk-server-for-the-community
>>>>
>>>> http://www.reddit.com/r/Bitcoin/comments/2go7qm/devs_be_sure_to_test_your_bitcoin_apidata/
>>>>
>>>> https://wiki.unsystem.net/en/index.php/Libbitcoin/Blockchain/htdb_slab_Performance
>>>>
>>>> We've had some good promotion recently based on the speed of fetching
>>>> address history. So it's still something I feel we should support, for
>>>> example if you use the server on the backend as a private API with
>>>> encryption/signing support. Some people don't need the extra security
>>>> and privacy, and don't want to do the SPV processing. For Darkwallet that's
>>>> great, and we want to move to this new protocol.
>>>>
>>>> Since the fetch_history has its own semantics (query based on full
>>>> address only, not prefix) and returns results different to what's
>>>> expected, I thought it'd be apt to separate into its own call. But let
>>>> me know what's the best option here.
>>>>
>>>> On 09/26/2014 06:39 AM, William Swanson wrote:
>>>>> I started writing this as part of the other thread, but it got
>>>>> super-long, so I decided to post it separately...
>>>>>
>>>>> The raw blockchain has two data-structures which can exist
>>>>> independently: the transaction and the block. Everything else is
>>>>> contained within these two data structures, and never appears
>>>>> separately (inputs, outputs, signatures, scripts, etc.). Transactions
>>>>> can be free-floating in the mempool, for instance, but there is no
>>>>> such thing as a loose input script; it's always part of a transaction.
>>>>>
>>>>> Our protocol mirrors this fact by providing two query functions: "get
>>>>> transactions" and "get blocks". Aside from details about result
>>>>> encoding (full transactions vs hashes vs UTXOs), this is all we could
>>>>> ever need.
>>>>>
>>>>> On the push side, if we just provide "push transaction" and "push
>>>>> block," we would actually have enough power to run the full network
>>>>> with an alternative to the satoshi protocol.
>>>>>
>>>>> Of course, these four messages (get/push transaction and get/push
>>>>> block) don't include housekeeping things like "get peer IP list". If
>>>>> we include those, we get a nice three-layer protocol:
>>>>>
>>>>> // Read-only blockchain access:
>>>>> interface read_blockchain
>>>>> {
>>>>> get_transactions(...);
>>>>> get_blocks(...);
>>>>> }
>>>>>
>>>>> // Write-only blockchain access:
>>>>> interface write_blockchain
>>>>> {
>>>>> push_transaction(...);
>>>>> push_block(...);
>>>>> }
>>>>>
>>>>> // Full-node p2p interface:
>>>>> interface blockchain_node
>>>>> : public write_blockchain,
>>>>> public read_blockchain
>>>>> {
>>>>> get_version(...);
>>>>> get_peer_list(...);
>>>>> validate_transaction(...);
>>>>> ...
>>>>> }
>>>>>
>>>>> Notice that I have put validate_transaction with the housekeeping
>>>>> stuff, since it's mainly a debugging thing.
>>>>>
>>>>> This protocol is truly universal. It doesn't care whether the
>>>>> connection uses JSON over websockets, protobuf over zeromq, or even
>>>>> if there is a connection at all.
>>>>>
>>>>> This last option is really interesting. Imagine what would happen if
>>>>> we create a nice C++ interface that mirrors this protocol, and start
>>>>> using it internally. Suddenly, all the different parts of our system
>>>>> (libbitcoin-blockchain, libbitcoin-server, libbitcoin-client, etc.)
>>>>> all share a common language for talking about the blockchain.
>>>>>
>>>>> The libbitcoin-blockchain library would start out by implementing this
>>>>> C++ interface as a way of querying its blockchain (at least the
>>>>> read_blockchain part). The results wouldn't include the mempool, of
>>>>> course, but that's not a problem. The libbitcoin-server node would
>>>>> take the results from libbitcoin-blockchain, augment them with results
>>>>> from its mempool, and expose them again under the exact same C++
>>>>> read_blockchain interface. Then, libbitcoin-protocol would consume the
>>>>> interface, marshal it over zeromq, and reconstitute it on the other
>>>>> side. A hypothetical libbitcoin-spv library would consume the
>>>>> interface again, validate it, cache it, and present it to the client.
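The layering described above might look roughly like the following in C++. This is a sketch under assumptions: the message leaves the parameters as "(...)", so the `get_transactions` signature here is invented for illustration, and `confirmed_store` and `mempool_layer` are hypothetical stand-ins for libbitcoin-blockchain and libbitcoin-server that ignore the address argument.

```cpp
#include <string>
#include <utility>
#include <vector>

using tx_list = std::vector<std::string>;

// Hypothetical signature; the original interface elides its parameters.
class read_blockchain
{
public:
    virtual ~read_blockchain() = default;
    virtual tx_list get_transactions(const std::string& address) const = 0;
};

// Stand-in for the blockchain library: confirmed transactions only.
class confirmed_store : public read_blockchain
{
public:
    explicit confirmed_store(tx_list confirmed)
      : confirmed_(std::move(confirmed)) {}

    tx_list get_transactions(const std::string&) const override
    {
        return confirmed_;
    }

private:
    tx_list confirmed_;
};

// Stand-in for the server: wraps any read_blockchain, appends pending
// (mempool) results, and exposes the same interface again.
class mempool_layer : public read_blockchain
{
public:
    mempool_layer(const read_blockchain& inner, tx_list pending)
      : inner_(inner), pending_(std::move(pending)) {}

    tx_list get_transactions(const std::string& address) const override
    {
        tx_list result = inner_.get_transactions(address);
        result.insert(result.end(), pending_.begin(), pending_.end());
        return result;
    }

private:
    const read_blockchain& inner_;  // must outlive this layer
    tx_list pending_;
};
```

Since each layer both consumes and exposes `read_blockchain`, further layers (a marshalling layer, an SPV validator, a cache) could be stacked in the same way without the caller noticing.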
>>>>>
>>>>> Under this model, any trusting wallet can be turned into a
>>>>> non-trusting wallet by slipping an SPV module into the data pipeline.
>>>>> It's just like clicking in Legos - the common bump-and-socket
>>>>> interface makes everything interchangeable. For example, if we create
>>>>> a three-layer sandwich with libbitcoin-protocol, libbitcoin-spv, and
>>>>> another libbitcoin-protocol, we suddenly have a lightweight cache that
>>>>> we could put in front of a full-node server for load-balancing. The
>>>>> possibilities are endless.
>>>>>
>>>>> I think a design like this would represent a really powerful evolution
>>>>> of the libbitcoin architecture. It will be interesting to see how much
>>>>> work it takes, or even if it's really possible.
>>>>>
>>>>> -William
>>>>> _______________________________________________
>>>>> Libbitcoin mailing list
>>>>> Libbitcoin@???
>>>>> https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/libbitcoin
