Perpetual Datasets

About

Each Perpetual Dataset (or simply Dataset) corresponds to a specific domain: labelled anime videos, images of pets, Japanese audio, and so on. For all datasets, only the hash of each data sample goes on-chain, while the actual data (content and labels) is stored off-chain. The benefits of this are two-fold:

  1. Keeping data such as images, text, and annotations off-chain lets datasets meet high storage and throughput requirements.

  2. Keeping all data contribution and verification records on-chain, along with the hash of each data sample, ensures complete transparency.

Keeping the hash of each data sample on-chain means that users can always compute the hash on their end and verify it against the on-chain value, much like a checksum.
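The checksum-style check described above can be sketched as follows. This is a minimal illustration, not the protocol's actual scheme: the use of SHA-256, the canonical JSON encoding of labels, and the function names `sample_hash` and `verify_sample` are all assumptions made for the example.

```python
import hashlib
import json


def sample_hash(content: bytes, labels: dict) -> str:
    """Hash a data sample (content + labels) into a hex digest.

    Assumption: SHA-256 over length-prefixed content followed by
    canonical JSON labels. The real on-chain scheme may differ.
    """
    # Sort keys and strip whitespace so equal labels always serialize identically.
    label_bytes = json.dumps(
        labels, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")

    digest = hashlib.sha256()
    # Length prefix keeps the boundary between content and labels unambiguous.
    digest.update(len(content).to_bytes(8, "big"))
    digest.update(content)
    digest.update(label_bytes)
    return digest.hexdigest()


def verify_sample(content: bytes, labels: dict, on_chain_hash: str) -> bool:
    """Recompute the hash locally and compare it with the on-chain value."""
    return sample_hash(content, labels) == on_chain_hash.lower()


if __name__ == "__main__":
    content = b"raw image bytes"          # stand-in for off-chain content
    labels = {"species": "cat"}           # stand-in for off-chain labels
    recorded = sample_hash(content, labels)  # stand-in for the on-chain value

    print(verify_sample(content, labels, recorded))
    print(verify_sample(content, {"species": "dog"}, recorded))
```

Any change to either the content or the labels produces a different digest, so a mismatch with the on-chain value indicates that the off-chain copy is not the sample that was originally recorded.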

Contribution and Verification

Contributors and Verifiers must stake FRAC to participate in the data augmentation process. Depending on the quality of their submissions, as decided by the consensus protocol, they either earn rewards in FRAC or have their stake slashed.

Reward Pool

Each dataset has a reward pool, funded by the Dataset Creator at genesis, to compensate contributors and verifiers. At the end of each epoch, $k\%$ of the current reward pool is distributed to that epoch's participants, where $k$ is a hyperparameter defined by the Dataset Creator.
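To make the per-epoch payout concrete, here is a small sketch of how the pool shrinks over time. It assumes the pool is not topped up after genesis and does not model how an epoch's payout is split among individual participants, since neither is specified here; the function names `epoch_payout` and `simulate_pool` are hypothetical.

```python
def epoch_payout(pool: float, k: float) -> tuple[float, float]:
    """Return (payout, remaining_pool) for one epoch.

    k is the percentage of the *current* pool paid out each epoch.
    """
    if not 0 <= k <= 100:
        raise ValueError("k must be a percentage between 0 and 100")
    payout = pool * k / 100
    return payout, pool - payout


def simulate_pool(initial_pool: float, k: float, epochs: int) -> tuple[list[float], float]:
    """Run several epochs; return the list of payouts and the final pool."""
    pool = initial_pool
    payouts = []
    for _ in range(epochs):
        payout, pool = epoch_payout(pool, k)
        payouts.append(payout)
    return payouts, pool


if __name__ == "__main__":
    # Illustrative numbers only: a pool of 1000 FRAC with k = 10.
    payouts, remaining = simulate_pool(1000.0, 10.0, 3)
    print(payouts)    # roughly 100, 90, 81
    print(remaining)  # roughly 729
```

Because each payout is a fixed percentage of the current pool rather than of the original funding, payouts decay geometrically: under these assumptions the pool after $n$ epochs is $P_0 (1 - k/100)^n$, so it shrinks every epoch but never fully empties.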
