Artificial intelligence is rapidly gaining attention, yet a significant challenge lies beneath the excitement surrounding decentralized AI (DeAI): the lack of diverse, secure, and verifiable data. Limited on-chain datasets hinder the training of truly powerful AI models, potentially leaving the future of AI dominated by centralized entities with access to extensive web data.
The potential of DeAI—offering democratized, transparent, and robust AI—depends on closing this data gap. Innovative cryptography may provide a solution.
The strength of traditional AI is rooted in its data consumption: in general, the more data a model is trained on, the more capable it becomes. However, this reliance on large datasets also presents a challenge, as centralized AI often uses data harvested without explicit consent, raising privacy concerns.
In contrast, DeAI, grounded in the blockchain principles of decentralization and transparency, promises a compelling alternative. However, most available on-chain data comes from financial transactions or decentralized finance (DeFi), and even small language models need more specific data than this to train effectively. As a result, DeAI models lack the rich datasets required to compete with contemporary AI advancements.
Vast datasets exist outside the blockchain ecosystem, such as The Pile and Common Crawl, aggregating information from billions of unique sources. The abundance and quality of data from these web2 sources have allowed centralized AI providers to refine their models rapidly.
Reproducing a comparable dataset on the blockchain within a competitive timeline is challenging. Some AI companies already face criticism from data creators over how their data is used; stronger security measures offer a more ethical approach to data acquisition.
Building Bridges with Cryptography
This is where cryptographic methods come into play. Zero-knowledge proofs, already used to improve blockchain scalability and privacy, offer a promising solution. In particular, zero-knowledge fully homomorphic encryption (zkFHE) and zero-knowledge TLS (zkTLS) are critical for integrating web2 data with DeAI.
zkFHE enables computations on encrypted data without exposing the sensitive underlying information, so AI models can be trained on confidential datasets, such as medical records, without compromising privacy.
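To make "computing on encrypted data" concrete, here is a minimal sketch in Python. It is not zkFHE: it swaps in the much simpler Paillier scheme, which is only additively homomorphic, and it uses tiny hard-coded primes that offer no real security. The function names and the sample values are assumptions chosen for illustration. What it shows is the core idea: a party holding only ciphertexts can combine them, and the key holder decrypts the result of the computation without the inputs ever being exposed.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two primes (illustrative sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                  # standard simple choice of generator
    mu = pow(lam, -1, n)       # modular inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random blinding factor in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext from a ciphertext using the private key."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# Hypothetical example: two confidential readings are summed while encrypted.
pub, priv = keygen(1009, 1013)
c_a = encrypt(pub, 120)
c_b = encrypt(pub, 95)
c_sum = add_encrypted(pub, c_a, c_b)   # done without seeing 120 or 95
print(decrypt(pub, priv, c_sum))
```

A fully homomorphic scheme extends this from addition to arbitrary computation, which is what model training would require, and the zero-knowledge layer additionally proves that the computation was carried out correctly.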
zkTLS extends these capabilities to internet communications, enabling users to prove facts about data they hold, such as a credit score, without disclosing the data itself. This innovation is essential for merging the wealth of web2 data into DeAI systems, facilitating secure access to authenticated financial data from traditional institutions while maintaining confidentiality.
Empowering DeAI
The potential ramifications are significant. By leveraging zkFHE and zkTLS, DeAI can tap into the extensive web2 data already available while upholding the principles of privacy and decentralization. This could create a more level playing field, allowing DeAI solutions to rival, and potentially exceed, those of centralized AI.
Take the development of large language models, which require immense amounts of text data for training and are currently led by well-funded tech giants. With zkTLS, DeAI developers may use publicly available data from the web without compromising privacy, fostering the growth of more equitable AI models.
However, challenges remain. Implementing zkFHE and zkTLS demands significant computational resources, along with advancements in both hardware and software. Additionally, standardization and interoperability are crucial for widespread adoption. Despite these obstacles, the potential benefits are monumental.
In the quest for AI dominance, data serves as the key resource. By embracing cryptographic innovations like zkFHE and zkTLS, DeAI can access the essential data needed to thrive. This endeavor goes beyond enhancing AI capabilities; it’s about forging a more democratic and fair AI landscape.