Bringing Decentralized Search to Decentralized Services

29/11/2022
5-minute read

There was an interesting topic presented at Usenix OSDi 21 conference, which I summarized in one page 😉 . The main reference is indicated at the end of the summary. If you have the opportunity, I would definitely recommend reading the reference’s complete version. Let’s get started 😍:

Online services such as search, social networks, and e-commerce are centralized due to cost savings, suitable revenue techniques, network externalities, regulatory restrictions, and technological constraints. Decentralization advocates claim that centralized services engage in anti-user behavior due to a discrepancy among users’ demands and providers’ objectives. Centralized systems are prone to self-inflicted or coerced censorship, and the gathering of consumers’ data. Without a decentralized search, which is a fundamental feature of any service, it is difficult to achieve its alleged anti-censorship objectives: the search engine may relatively easily follow users’ requests and covertly restrict certain results. Creating a decentralized search engine that overcomes these flaws is difficult. Initially, the search engine should validate the data source to ensure the dataset is complete and accurate. Next, the user’s data including search phrases and responses, must be confidential. Thirdly, the search engine should be able to demonstrate to consumers how the results were produced and justifies why the search results are valid. Finally, the search engine should be affordable and scalable. To accomplish these objectives, the first decentralized search engine called “DESEARCH”, which protects the confidentiality of customer queries while enabling consumers to confirm the accuracy of their search results, is introduced.

DESEARCH uses trusted execution environments (TEEs) operating on unreliable volunteer executors to distribute portions of search operations including crawling, indexing, aggregation, ranking, and query processing throughout a decentralized network. With the help of TEEs, it is possible to develop mechanisms to prevent tampering or eavesdropping on crucial operations. DESEARCH breaks down a search into a series of quick jobs, where every other executor (basic components called verifiable lambda) just handles one job at a time and works with a limited subset of the data. Executors are divided into the following four functions: crawlers, indexers, queriers, and masters. Crawlers collect information from online databases. Queriers evaluate and prioritize search requests while indexers create inverted indexes, which making up the first three functions a complete search pipeline. The executors who deliver essential enrollment functions are the masters. Masters present services that confirm the authenticity of a TEE node seeking to join DESEARCH and a job assignment service that enables the individual executors to coordinate the search workflow with the least amount of repetitive work. Furthermore, they provide a key management service (KMS) that enables anyone to determine whether an executor is an authorized DESEARCH participant. DESEARCH employs witnesses that are cryptographic representations, which record the proper transmission of data across executors to monitor the execution of the dynamically performing jobs inside the search workflow. Consumers may validate their search by using witnesses as verification of an executor’s operations. Executors have no state. They access a storage system called Kanban and bring in inputs and publish outcomes to it.

Kanban technology is a storage solution that offers high availability and data integrity. Using Kanban, pipeline executors interact and transmit preliminary data, and their generated data (items, indexes, and witnesses) are stored by Kanban, which also offers integrity of data by routinely taking snapshots of every state and publishing the calculated hashes of those snapshots to a public append-only log. The interval between two successive snapshots is referred to as an epoch, and the data that marks the start of an epoch is referred to as an epoch snapshot. To initialize a search, the consumer must first acquire a list of currently active queriers. The masters and queriers manage the list together. Queriers sign proofs indicating that they have viewed the latest epoch and updated their status on the list, and masters confirm the authenticity of the status proofs. Users may verify this by looking for masters’ signatures on the list. The consumer chooses one of the current queriers arbitrary to be the leader and then transmits the encrypted search queries to the leader. Next, the leader recruits additional peer queriers to address the demand and jointly fulfil a single user search request as a group. The leader then combines the findings by rating them according to their significance and provides the consumer with a list of the items that are the most pertinent. The consumer also obtains testimonies from queriers in addition to the search engine results. Consumers may validate the witnesses generated by the search pipeline by beginning with the witnesses provided from the leader querier, navigating the tree, examining each witness node, and verifying that the search has been properly carried out.

The DESEARCH was implemented on top of Intel SGX machines and simulated with two databases on two decentralized systems: Steemit, a social networking platform and decentralized weblog platform based on the Steem blockchain and OpenBazaar, a decentralized e-commerce system that allows people to exchange products directly with each other. The Redis was used as Kanban and a large number of nodes on the AWS were used to simulate a large-scale decentralized environment. The implementation evaluated the performance and reliability of DESEARCH with the consideration of scalability, throughput, overhead, latency, cost and fault tolerance. Based on the obtained results, DESEARCH provides high throughput with excellent scalability and reduces fault disruptions. Furthermore, it can accomplish the strict latency needed for a search engine to be broadly useable and can grow horizontally with the number of executors.

Reference: Bringing Decentralized Search to Decentralized Services

Decentralization