Skip to content

Zero knowledge Provable DB

The database constraints come from the zk circuits due to the fact that WeaveDB is a zero knowledge provable DB (zkDB).

zkJSON

zkJSON is a technique to convert any JSON within defined constraints into digits and pack them into the smallest possible number of uint256 values in a format that zero-knowledge circuits can efficiently prove and verify. This makes verifying arbitrary JSON in zk circuits orders of magnitude smaller and faster, solving a long-standing impracticality. As a result, proofs can be generated in just a few seconds on consumer devices.

Uint Packing

Zero-knowledge circuits like Circom often operate over a prime field with a modulus very close to uint256. Similarly, the EVM storage model is structured around 256-bit slots. This makes uint256-optimized encoding ideal for both ZK circuits and on-chain interoperability.

However, the packing technique used in zkJSON is not limited to uint256—it can be adapted to any word size. The core idea is to minimize wasted bits by tightly packing decimal digits into the available field width.

For example, a uint256 can represent up to 77 decimal digits. But if you're only storing a 3-digit number like 123, you're wasting 74 digits worth of space. Now imagine you have 25 such numbers—instead of storing them in 25 separate uint256 values (using 256 × 25 bits), you can pack all 75 digits into a single uint256, saving 24 entire slots.

This technique is extremely efficient for ZK circuits, because you cannot arbitrarily choose the bit-width of signals or variables inside most circuit languages (e.g., Circom). Every variable typically occupies a full field element.

By packing multiple digits into a single uint, zkJSON drastically reduces the number of signals and constraints needed to represent and verify the same logic. The circuit can then traverse the packed digits efficiently, resulting in smaller proof sizes and faster generation times.

For example, a complex JSON structure like the following will be converted into just uint256[3], and the zkJSON cuicuit can prove and verify values at any path without revealing the rest of the JSON.

encode

zkJSON is also verifiable directly on-chain via EVM smart contracts optimized with inline assembly. These contracts can traverse the packed digits without decoding them back into JSON, enabling gas-efficient verification of arbitrary data fields.

Combined with the zkDB technique described next, zkJSON enables cross-chain querying of WeaveDB state from external blockchains such as Ethereum, Solana, and others—bridging decentralized storage with on-chain logic in a verifiable, cost-efficient way.

zkDB

zkDB is a technique for representing an entire database instance using nested Sparse Merkle Trees (SMTs) combined with zkJSON. This structure enables efficient proof and verification of any specific data within the database—without revealing or exposing the rest of the data.

A typical NoSQL database including WeaveDB has a structure like the following. A database instance has collections (or dirs), and each collection has documents (or docs), which is typically JSONs.

structure

A sparse merkle tree (SMT) is a fixed size merkle tree for efficient hash computing of a large data set. If we apply this structure to a collection, each leaf can be a poseidon hash of the zkJSON encoded JSON document stored at that path, and document ids can be represented by leaf index numbers. For example, a SMT of 160 levels has 2 ** 20 leaves, which allows any document ids within 20 bytes.

For example, we can allow base64url encoded document ids up to 20 bytes, and convert each id to a decimal number and assign to the leaf with that index.

Now if you have the merkle root hash and the zk circuit, you can generate a proof to verify a membership of the poceidon hash, which in turn, with zkJSON, verifies a membership of any field within the JSON associated with the poceidon hash.

collection

We can have a parent sparse merkle tree for the entire database structure, whose leaves are the root hash of the collection SMT whose id is the leaf index. In this way, if you have the top most merkle root hash, you can zi-prove and verify the membership of any collection within the database, which zk-prove the membership of any zkJSON poseidon hash of the document in the collection, which verify any data value stored in the JSON.

db

Because of the efficiency of uint packing, anyone can generate a zk proof of any data within a few second, and use it to query WeaveDB from other blockchain. The top-most merkle root proof should be periodically stored in the target blockchain to bridge the correct data from WeaveDB to the blockchain.

zkDB is an optimistic zk rollup, in a sense that it could genrate a proof for state transition, but it's unnecessary because the data finalization process involves validators (possibly TEE) to validate the correct zk root hash by replaying the database state transitions.

However, in extreme use cases, zkDB can still generate state transition proof which requires huge computation.

zkQuery

This is the interface of EVM smartcontract to query WeaveDB. You can specify a path and the data type to use any data with a zk proof.

interface IZKQuery {
  function qNull (uint[] memory path, uint[] memory zkp) external pure returns (bool);
  function qBool (uint[] memory path, uint[] memory zkp) external pure returns (bool);
  function qInt (uint[] memory path, uint[] memory zkp) external pure returns (int);
  function qFloat (uint[] memory path, uint[] memory zkp) external pure returns (uint[3] memory);
  function qString (uint[] memory path, uint[] memory zkp) external pure returns (string memory);
  function qRaw (uint[] memory path, uint[] memory zkp) external pure returns (uint[] memory);
  function qCond (uint[] memory path, uint[] memory cond, uint[] memory zkp) external pure returns (bool);
}

You can also do conditional queries with qCond, which verifies a specific value with a condition without revealing the actual value.

For instance, you can verify the age of a person to be greater than 20 without revealing the actual age.

const { encodeQuery } = require("zkjson")
 
const json = { num: 5, arr: [ 1, 2, 3 ]}
 
// for num field
const num_gt = encodeQuery([ "$gt", 4 ])
const num_gte = encodeQuery([ "$gte", 5 ])
const num_lt = encodeQuery([ "$lt", 6 ])
const num_lte = encodeQuery([ "$lte", 5 ])
const num_eq = encodeQuery([ "$eq", 5 ])
const num_ne = encodeQuery([ "$ne", 7 ])
const num_in = encodeQuery([ "$in", [ 4, 5, 6 ]])
const num_nin = encodeQuery([ "$nin", [ 1, 2, 3 ]])
 
// for arr field
const arr_contains = encodeQuery([ "$contains", 3 ])
const arr_contains_any = encodeQuery([ "$contains_any", [ 3, 4, 5 ]])
const arr_contains_all = encodeQuery([ "$contains_all", [ 2, 3 ]])
const arr_contains_none = encodeQuery([ "$contains_none", [ 4, 5, 6 ]])

Special Bonus

zkJSON is not only for zkDB, but could be used with other technology.

For example, you can combine zkJSON with IPFS, and as long as you have the IPFS address (CID) stored on-chain (Ethereum NFT Metadata URL), you can magically use the metadata stored offchain without storing it onchain. This opens up wide varieties of novel NFT use cases.