Tensorplex Dojo Subnet: Democratizing AI Dataset Creation
The development of open-source AI models is often hindered by a lack of
high-quality human-generated datasets. In their efforts to reduce data
collection costs, closed-source AI developers have created significant social
and economic equity challenges, with workers paid less than $2 per hour for
mentally and emotionally taxing tasks. Meanwhile, the benefits of these models
have been concentrated among a select few, exacerbating inequalities among
contributors.
Tensorplex Dojo Subnet is an open platform designed to crowdsource
high-quality human-generated datasets. Powered by Bittensor, the Dojo Subnet
lets anyone earn TAO by labeling data or providing human preference data.
This approach democratizes the collection of human preference data, addresses
the equity issues described above, and paves the way for more inclusive and
ethical AI development.
Key Features
To ensure the quality and integrity of the data collected, Dojo introduces
several novel features:
- Synthetic Task Generation: Unique tasks are generated by
  state-of-the-art Large Language Models (LLMs) to collect human feedback
  data, which can be used to improve open-source models.
- Synthetic Ground Truth Validation Mechanism: Validators can
  synthetically generate partial ground truths, allowing them to determine the
  quality of responses provided by individual participants.
- Obfuscation: Techniques to prevent sybil attacks and ensure
  contributions are genuinely human.
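
To make the ground truth validation idea above more concrete, here is a minimal
sketch in Python. It is not Dojo's actual scoring code. It assumes a validator
knows the correct ranking for only some of the outputs in a task (the partial
ground truth) and measures how often a participant's scores reproduce that
known ordering. The function name, the data layout, and the pairwise-agreement
metric are all hypothetical choices made for illustration.

```python
from itertools import combinations


def partial_ground_truth_agreement(participant_scores, ground_truth_ranks):
    """Return the fraction of known pairs the participant orders correctly.

    participant_scores: dict of output id -> score (higher means better).
    ground_truth_ranks: dict of output id -> rank (1 is best), covering only
        the subset of outputs for which the validator has a ground truth.
    Returns None when fewer than two known outputs can be compared.
    """
    # Only outputs that have both a ground-truth rank and a participant score
    # can be checked.
    known = [k for k in ground_truth_ranks if k in participant_scores]

    agree = 0
    total = 0
    for a, b in combinations(known, 2):
        if ground_truth_ranks[a] == ground_truth_ranks[b]:
            continue  # the ground truth expresses no preference for this pair
        total += 1
        if participant_scores[a] == participant_scores[b]:
            continue  # a tie from the participant counts as a miss
        truth_prefers_a = ground_truth_ranks[a] < ground_truth_ranks[b]
        participant_prefers_a = participant_scores[a] > participant_scores[b]
        if truth_prefers_a == participant_prefers_a:
            agree += 1

    return agree / total if total else None


# Hypothetical task: four outputs, ground truth known for three of them.
scores = {"out_a": 9, "out_b": 4, "out_c": 7, "out_d": 2}
truth = {"out_a": 1, "out_b": 3, "out_d": 4}
agreement = partial_ground_truth_agreement(scores, truth)
```

Because the participant cannot tell which outputs carry a ground truth, a
consistently high agreement on the known subset is some evidence that the
remaining, unverified answers were also given with care.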
Use Cases
The Dojo Subnet offers multiple use cases:
- Synthetically Generated Tasks: These tasks can bootstrap
  the human participant pool and can be used for model training or fine-tuning
  from the outset.
- Cross-subnet Validation: Validators can use responses to
  rate the quality of outputs across other Bittensor subnets, thereby
  incentivizing miners to improve their performance.
- External Data Acquisition: Entities outside the Bittensor
  ecosystem can tap into the subnet to acquire high-quality human-generated
  data.
By creating an open platform for gathering human-generated datasets,
Tensorplex Dojo Subnet aims to solve the challenges of quality control, human
verification, and sybil attack prevention while promoting a more equitable
distribution of benefits in AI development.
Benefits to participants contributing through Dojo
- Open platform: Anyone capable can contribute, ensuring
  broad participation and diverse data collection.
- Flexible work environment: Participants enjoy the freedom
  to work on tasks at their convenience from any location.
- Quick payment: Rewards are streamed consistently to
  participants, as long as they complete sufficient tasks within a stipulated
  deadline and have them accepted by the subnet.
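
The payment condition in the last point can be read as a simple eligibility
check. The sketch below is an illustration only: the `TaskResult` type, the
`is_eligible_for_reward` function, and the `min_accepted_tasks` threshold are
hypothetical names, and the actual task counts and deadlines are set by the
subnet and are not specified here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskResult:
    submitted_at: datetime  # when the participant submitted the task
    accepted: bool          # whether the subnet accepted the submission


def is_eligible_for_reward(results, deadline, min_accepted_tasks):
    """Return True if enough tasks were accepted and submitted by the deadline."""
    on_time_accepted = sum(
        1 for r in results if r.accepted and r.submitted_at <= deadline
    )
    return on_time_accepted >= min_accepted_tasks


# Hypothetical example: three submissions, one rejected, one late.
deadline = datetime(2024, 1, 1, 12, 0)
results = [
    TaskResult(datetime(2024, 1, 1, 9, 0), accepted=True),
    TaskResult(datetime(2024, 1, 1, 10, 0), accepted=False),
    TaskResult(datetime(2024, 1, 1, 13, 0), accepted=True),
]
eligible = is_eligible_for_reward(results, deadline, min_accepted_tasks=2)
```

Here only one submission is both accepted and on time, so the participant
would not meet a threshold of two tasks under these assumed rules.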