SN 58: Dippy Speech

1 SN-58 = 0.0026 τ ≈ $0.83
1h %: +0.13%
24h %: -6.11%
7d %: +3.65%
Team: Impel
FDV (USD): $17.44m
Market cap (USD): $670.40k
Liquidity (α): 454.95k α
Liquidity (τ): 1.18k τ
Circulating supply: 807.10k α


Introduction

Dippy is one of the world's leading AI companion apps, with over 1M users. The app has ranked #3 on the App Store in countries such as Germany, has been covered by publications including Wired, and the average Dippy user spends more than an hour in the app.

The Dippy team is also behind Bittensor's Subnet 11, which exists to create the world's best open-source roleplay LLM. Open-source miner models created on Subnet 11 power the Dippy app, and we plan to integrate the models created on this speech subnet into the app as well.

The Dippy Empathetic Speech Subnet on Bittensor is dedicated to developing the world's most advanced open-source Speech model for immersive, lifelike interactions. By leveraging the collaborative strength of the open-source community, this subnet meets the growing demand for genuine companionship through a speech-first approach. Our objective is to create a model that delivers personalized, empathetic speech interactions beyond the capabilities of traditional assistants and closed-source models.

Unlike existing models, which depend on reference speech recordings that limit creative flexibility, we use natural language prompting to control speaker identity and style. This intuitive approach enables more dynamic and personalized roleplay experiences, fostering deeper and more engaging interactions.
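
Because speaker identity and style are controlled purely through natural-language prompts rather than a reference recording, a request to such a model might look like the sketch below. This is a hypothetical illustration only: the field names and the synthesize function are assumptions for exposition, not the subnet's actual API.

```python
from dataclasses import dataclass

@dataclass
class SpeechRequest:
    """Prompt-conditioned TTS request: no reference audio, only text prompts."""
    text: str            # the line to be spoken
    speaker_prompt: str  # natural-language description of who is speaking
    style_prompt: str    # natural-language description of how it should be delivered

def synthesize(request: SpeechRequest) -> bytes:
    """Placeholder for a prompt-conditioned TTS model call (assumed interface)."""
    raise NotImplementedError("model inference goes here")

request = SpeechRequest(
    text="I can't believe you actually made it!",
    speaker_prompt="a warm, middle-aged woman with a light Irish accent",
    style_prompt="excited, slightly out of breath, speaking quickly",
)
# audio = synthesize(request)  # would return raw audio bytes from the model
```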

Roadmap

Given the complexity of creating a state-of-the-art speech model, we plan to divide the work into three distinct phases.

Phase 1:

  • Launch a subnet with a robust pipeline for roleplay-specific TTS models, capable of interpreting prompts for speaker identity and stylistic speech description.
  • Launch an infinitely scaling synthetic speech data pipeline.
  • Implement a public model leaderboard ranked on the core evaluation metric.
  • Introduce Human Likeness Score and Word Error Rate as live evaluation criteria for ongoing model assessment (a minimal WER sketch follows this list).
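
One of the Phase 1 criteria is Word Error Rate (WER). The subnet's exact scoring pipeline is not described here, so the sketch below only illustrates the standard definition: transcribe the generated audio with an ASR model, then divide the word-level edit distance between the reference text and the transcript by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: reference text vs. an ASR transcript of the generated audio.
print(word_error_rate("hello there how are you", "hello there how you"))  # 0.2
```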

Phase 2:

  • Refine TTS models toward producing more creatively expressive, highly human-like speech outputs.
  • Showcase the highest-scoring models and make them accessible to the public through the front-end interface.

Phase 3:

  • Advance toward an end-to-end Speech model that seamlessly generates and processes high-quality roleplay audio.
  • Establish a comprehensive pipeline for evaluating new Speech model submissions against real-time performance benchmarks.
  • Integrate the Speech model into the Dippy app.
  • Drive the state of the art in Speech roleplay through iterative enhancements and ongoing data collection.