At VectorChat, our mission is to create the most immersive conversational AI experience. Our upcoming platform, Toffee, leverages Retrieval-Augmented Generation (RAG) to offer users seemingly endless memory, conversation length, and domain-specific knowledge.
During the development of Toffee, we found that while many parts of RAG had improved, chunking had been severely neglected. Existing solutions were either too rudimentary or too resource-intensive, making the RAG pipeline both cost-prohibitive and less accurate. Traditional chunking methods (e.g., chunking every X tokens with Y tokens of overlap) were cheap to compute but drove up runtime costs: irrelevant information was included as context in every LLM query, which is unaffordable when the user base of an entertainment app consists largely of free users and low-cost subscribers. Conversely, existing semantic chunking solutions, such as Unstructured.io, were prohibitively expensive and slow, which would have severely limited the number of files users could upload.
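To make the tradeoff concrete, here is a minimal sketch of the traditional fixed-size approach described above. The function name and the whitespace "tokenizer" are illustrative assumptions on our part; a real pipeline would use a model tokenizer, and the point is precisely that this method ignores semantic boundaries.

```python
def fixed_size_chunks(text: str, chunk_size: int = 128, overlap: int = 32) -> list[str]:
    """Chunk `text` every `chunk_size` tokens with `overlap` tokens repeated
    between consecutive chunks. Illustrative sketch only: splits on
    whitespace rather than using a real tokenizer."""
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each chunk
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), step)]
```

Because the window advances blindly, a chunk can cut a sentence or topic in half, and the overlap duplicates tokens in every retrieved chunk, both of which inflate the context sent to the LLM at query time.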
In response, to realize the vision of Toffee, our team had to design an algorithm that significantly outperformed the current offerings of industry leaders. We achieved this not by training proprietary models, but by exploiting how underdeveloped the field remains. This documentation provides the information necessary to develop solutions that match or exceed our current model.
Our goal is to continue driving down costs, increasing accuracy, and enabling new possibilities. As LLMs begin to use larger and more diverse datasets (e.g., audio, image, video), the importance of intelligent chunking will only grow.
Thus, we designed this subnet to have a straightforward, transparent, and fair incentive mechanism to surpass our own achievements. Explore the subnet architecture below to learn how responses are evaluated fairly.
We believe the best solutions are yet to come, and we are excited to see how miners can push the boundaries of this technology!
The following phases are not necessarily sequential and may occur concurrently.
The initial phase establishes the subnet's basic functionality, verifies that the incentive mechanism works as intended, and creates resources for miners, validators, and potential consumers to monitor the subnet.
The next phase of the subnet aims to make the intelligence incentivized by this subnet viable for commercial, enterprise, and personal use. Ensuring end-user data privacy becomes paramount.
The incentive mechanism must change so that validators and miners never have access to the documents sent to be chunked, and so that no other parties ever gain access to the models created by miners.
But how might that work? See our (very tentative) approach.
In the third phase, the subnet expands to subsume the full Retrieval-Augmented Generation (RAG) pipeline, forming a complete connection between Bittensor and viable, real-world demand.
Since every component of the subnet is meant to be used in RAG, evaluation will shift to a standardized RAG benchmark, with each contest varying only the component that is the subject of that contest.