What are Identity Oracles?

A high-level overview

Problem Statement

In the off-chain world, a much more robust set of user identity data is available compared to its web3 counterpart. For example, information about a user’s authenticated identity and social media behaviors allows for more sophisticated forms of user engagement, user quality control, and advertising.
In contrast, blockchains inherently lack a credible user reputation and identity system, both because of their built-in pseudonymity and because native user behavior is often purely financial and speculative. This leaves these systems prone to Sybil attacks and other forms of identity manipulation.
As a result, the on-chain application design space is highly limited. A reliable on-chain reputation system would allow protocols to conduct higher-quality user selection, distribute directional incentives for a wider range of value-generating activities, and establish more sophisticated solutions in DeFi. For example, a GameFi project could release NFTs limited only to first-person shooter (FPS) players with over a thousand hours clocked in CS:GO. Or projects could incentivize long-tail influencers to create organic marketing content through the use of recurrent airdrops. On-chain peer-to-peer lending can also be conducted in a permissionless manner if users are issued credits based on off-chain FICO scores and KYC information.


To solve these problems, Clique proposes a new type of primitive: identity oracles. In general, an oracle refers to a piece of software that channels off-chain data on-chain. A good example is Chainlink’s price oracles. In contrast, an identity oracle specializes in bringing user-specific data such as their identity information (social media influence, gaming skills, credit scores, etc.) and behavioral data (social media engagement, e-commerce consumption, etc.) on-chain.

A High-level Overview of Clique Identity Oracles

The design problem for an identity oracle can be broken down into four parts: identity authentication, data retrieval, off-chain computation, and feeding the data into some on-chain vehicle. Authentication is usually done with an OAuth token or a private credential (such as a signing key in a PKI), through which a user proves that they actually own the identity data held on another platform. Preserving user anonymity becomes a key challenge in this context. Ideally, no one, not even the middleware provider, should be able to link the user’s off-chain and on-chain identities together.
Data retrieval and computation, on the other hand, require that the entire process has both provenance and integrity. Provenance means that the relevant data can be correctly attributed to its original source, which is usually established by verifying TLS certificates and the corresponding chain of trust. Integrity means that the computation is executed correctly, without any form of adversarial tampering. Both can be achieved with centralized servers and no privacy preservation, but this would limit the use cases to public data (e.g. Twitter interactions), leaving private user information like KYC status, credit scores, and e-commerce transaction histories out of the picture. Decentralizing the oracle nodes, in turn, is important for fault tolerance and custom access control. It is also in line with designing a decentralized identity (DID) system, or a network of nodes that issues verifiable credentials hosting identity information. User identity information cannot be exposed to an arbitrary node runner, as doing so would likely violate data protection regulations like GDPR and CCPA. Privacy preservation is therefore a requirement here as well.
After the completion of the above three steps, the data can be fed into any decentralized data vehicle. It can be minted to an SBT (Soulbound token) or a non-transferable token on-chain, issued as a verifiable credential within a DID system, used to create upgradable NFTs, and also used to trigger arbitrary smart contract calls. Note that each of these vehicles requires a different application interface for signing the data, storing the data, and verifying the data.
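As a rough illustration of the last step, the sketch below shows one way the output of the off-chain computation could be handed to interchangeable vehicles. All names here (OracleResult, DataVehicle, SoulboundTokenVehicle, CredentialVehicle, run_oracle) are hypothetical and not part of any Clique API. The two vehicles are in-memory stand-ins, and the signing and verification interfaces that a real vehicle needs are omitted.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OracleResult:
    subject: str  # on-chain address the result is bound to
    claim: str    # human-readable statement that was evaluated
    value: bool   # outcome of the off-chain computation


class DataVehicle(Protocol):
    """Hypothetical interface: anything that can accept an oracle result."""

    def deliver(self, result: OracleResult) -> str: ...


class SoulboundTokenVehicle:
    """Toy stand-in for minting a non-transferable token (one per subject)."""

    def __init__(self):
        self.tokens = {}

    def deliver(self, result: OracleResult) -> str:
        self.tokens[result.subject] = result
        return f"sbt:{result.subject}"


class CredentialVehicle:
    """Toy stand-in for issuing a verifiable credential in a DID system."""

    def __init__(self):
        self.issued = []

    def deliver(self, result: OracleResult) -> str:
        self.issued.append(result)
        return f"vc:{len(self.issued) - 1}"


def run_oracle(hours_played: int, subject: str, vehicle: DataVehicle) -> str:
    # Authentication and retrieval are assumed to have produced hours_played.
    # Off-chain computation: reduce the raw data to a single boolean claim.
    result = OracleResult(subject, "csgo_hours>=1000", hours_played >= 1000)
    # Final step: feed the result into whichever vehicle the caller chose.
    return vehicle.deliver(result)
```

The same computed result can be routed to either vehicle without changing the earlier steps, which is the separation the paragraph above describes.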

Modular Privacy Layer

To solve the above problems, Clique uses cryptographic tools like zero-knowledge proofs (ZKPs), trusted execution environments (TEEs), and multi-party computation (MPC) to design a modular privacy-preserving layer and supply our identity oracles with custom trust assumptions.
Zero-knowledge proofs enable users to prove certain attributes about their identity without actually revealing them. Clique uses two types of ZKPs: membership proofs and query proofs. Membership proofs allow users to prove anonymously that they belong to a group or set. The user proves that a valid Merkle path exists from the corresponding leaf (usually an identity commitment that hides the actual value) to the Merkle root that represents the set. The underlying idea is similar to that of on-chain mixers like Tornado Cash, which provide transactional anonymity.
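The Merkle path check at the heart of a membership proof can be sketched in plain Python. This is only the relation that a ZK circuit would enforce, not a zero-knowledge proof: in a real system the leaf and the path stay private and only the root is public. SHA-256, the padding rule, and all function names are assumptions made for illustration.

```python
import hashlib

EMPTY = hashlib.sha256(b"").digest()  # assumed padding value for odd levels


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _padded(level):
    # Pad to an even number of nodes so every node has a sibling.
    return level + [EMPTY] if len(level) % 2 else level


def build_tree(leaves):
    """Return all levels of the tree, leaves first and the root level last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = _padded(levels[-1])
        levels.append([h(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def merkle_path(levels, index):
    """Collect (sibling, node_is_left) pairs from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        level = _padded(level)
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path


def verify_path(leaf, path, root):
    """Recompute the root from a leaf and its path, then compare."""
    node = leaf
    for sibling, node_is_left in path:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root
```

Here each leaf would be an identity commitment, for example a hash of the user's identifier and a secret, so that publishing the tree does not reveal who its members are.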
ZK query proofs, on the other hand, maintain confidentiality when answering general identity queries. For example, if a project wants to require that a social media account is of a certain age, or that a user has a FICO score above some threshold, it can use these proofs as a proxy for the actual data. They serve as a de-sensitization tool that can be verified in a highly interoperable manner.
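To make the shape of such a query concrete, the sketch below writes out the relation a threshold query might encode: the public inputs are a commitment and a threshold, and the private witness is the score and a salt. This is not a proof system. A real verifier would check a succinct proof of this relation and never see the witness. The hash-based commitment and the function names are assumptions made for illustration.

```python
import hashlib


def commit(score: int, salt: bytes) -> bytes:
    """Hash-based commitment to a score (assumed construction)."""
    return hashlib.sha256(score.to_bytes(4, "big") + salt).digest()


def query_relation(commitment: bytes, threshold: int, score: int, salt: bytes) -> bool:
    """Relation a circuit would enforce for 'committed score >= threshold'.

    Public inputs: commitment, threshold. Private witness: score, salt.
    """
    opens_commitment = commit(score, salt) == commitment
    return opens_commitment and score >= threshold
```

Note that the relation binds the claim to the commitment only. Whether the committed score reflects the real off-chain value depends on who produced the commitment.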
But ZKPs alone are not enough for identity oracles to function properly. Whenever a ZKP is generated, someone needs to hold a valid witness that satisfies the circuit constraints. In this case the witness is the user’s off-chain data, which likely originates from some third-party web server, so we need to make sure that the data has provenance and has not been tampered with at any point during proof generation. If the user generates the ZKP directly in their frontend, they can easily replace the witness with something different from what the original source returned, and they will do so whenever the incentive to attack is large enough. A simple solution is to have a centralized party execute all the relevant computation, but as described above this limits both user privacy and network decentralization.
Right now, Clique mainly uses TEEs, or secure enclaves, to validate, encrypt, and process the data for ZK query proofs so that both data integrity and confidentiality are guaranteed. Execution runs in hardware-encrypted memory, which prevents adversaries from altering the execution logic or reading the memory. TEEs also provide attestations that allow end users to verify that an execution result was actually produced by an authentic enclave that is operating as expected. Currently, we use Intel SGX to create on-chain verifiable ECDSA signatures that can be directly verified against Intel’s Root CA. We continuously track potential attack vectors against SGX based on newly disclosed vulnerabilities such as ÆPIC Leak and the MMIO stale data vulnerabilities. The mitigation measures we have taken to reduce the attack surface include keeping the SGX trusted computing base (TCB) up to date, implementing various ORAM techniques, and limiting client diversity by allowing only proven hardware that enforces additional DCAP rules to join the network.
A similar solution can be constructed with MPC at the TLS level: an external verifier engages in a two-party computation protocol with the user, with the session's MAC key split into two shares, under a sequentially enforced commitment scheme that ensures the data packet has not been tampered with. This enables end-user proof generation in the browser within a WASM-based computation environment. Clique is collaborating with Chainlink on DECO and is also exploring integrations with TLSNotary (building extensions with emp-zk/Mystique) for MPC-based solutions. Although their security rests on cryptography rather than trusted hardware, MPC-based mechanisms are known to incur heavy communication overhead, which currently keeps their performance short of production readiness.
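The role of the split MAC key and the commit-before-reveal ordering can be illustrated with a heavily simplified sketch. It assumes XOR key shares, HMAC-SHA256 as the MAC, and a hash commitment. Protocols such as DECO and TLSNotary instead compute over the shares inside a two-party computation and never hand either party the full key during the session. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def split_key(key: bytes):
    """Split a MAC key into two XOR shares; either share alone looks random."""
    user_share = secrets.token_bytes(len(key))
    verifier_share = bytes(a ^ b for a, b in zip(key, user_share))
    return user_share, verifier_share


def combine(share_a: bytes, share_b: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(share_a, share_b))


def commit(data: bytes, nonce: bytes) -> bytes:
    """User commits to the server response BEFORE learning the other share."""
    return hashlib.sha256(nonce + data).digest()


def verifier_accepts(commitment, data, nonce, tag, user_share, verifier_share) -> bool:
    # 1. The opened data must match what was committed before the key reveal.
    if not hmac.compare_digest(commit(data, nonce), commitment):
        return False
    # 2. The server's MAC tag must verify under the reconstructed key.
    key = combine(user_share, verifier_share)
    expected = hmac.new(key, data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

Because the user holds only one share while the session is live, they cannot forge a tag for altered data at that point. Once the full key is known, a forged tag no longer helps because the commitment already fixes the data.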