Open data is emerging as a significant driver of a global technology economy projected to be worth over $350 billion. Yet much of that data still relies on centralized infrastructure, which contradicts the autonomy and censorship resistance at the heart of decentralization.
To unlock its full potential, open data must move to decentralized infrastructure. Decentralized systems address multiple vulnerabilities faced by user applications and create a more secure platform for innovation.
Open infrastructure presents a myriad of use cases. From hosting decentralized applications (DApps) and trading bots to sharing and analyzing research data and training large language models (LLMs), each application of decentralized systems showcases their advantages over centralized counterparts.
Affordable LLM Training and Inference
The launch of the open-source AI model DeepSeek, which significantly impacted the U.S. tech markets, highlights the transformative power of open-source protocols. It serves as a reminder to pivot our focus toward the burgeoning economy of open data.
Training centralized AI models and keeping their data accurate costs enormous sums. In stark contrast, DeepSeek's final training run cost approximately $5.5 million, a fraction of the more than $100 million reportedly required for OpenAI's GPT-4. Despite this cost gap, the emerging AI landscape remains heavily reliant on centralized infrastructure, including LLM API providers that are misaligned with the ethos of open-source innovation.
Hosting open-source LLMs such as Llama 2 and DeepSeek R1 is straightforward and cost-effective. Unlike stateful blockchains, which demand constant synchronization, LLMs need only periodic updates. Inference on these models is still computationally expensive because it requires GPUs, but since the models do not need real-time synchronization, they are well suited to distributed hosting.
Moreover, decentralized networks foster the development of open-source LLMs by functioning as AI endpoints that serve reliable data to clients while lowering barriers to entry for operators.
For instance, the Akash protocol allows users to train LLMs using decentralized computing resources at costs that can be up to 85% lower than those of centralized cloud providers. The AI training and inference market holds immense upside potential, with AI companies spending approximately $1 million daily on infrastructure maintenance for LLM inference. This points to a serviceable market size of roughly $365 million annually.
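As a sanity check on that figure, the annualization is simple arithmetic (the $1 million daily spend is the estimate cited above):

```python
# Annualize the cited ~$1M/day infrastructure spend on LLM inference.
daily_spend_usd = 1_000_000
annual_market_usd = daily_spend_usd * 365

print(f"Serviceable annual market: ${annual_market_usd / 1e6:.0f}M")
# → Serviceable annual market: $365M
```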
Accessible Research Data Sharing
In research and scientific domains, the intersection of data sharing, machine learning, and LLMs can significantly accelerate progress and enhance quality of life. Currently, access to valuable data is restricted by costly journal systems that impose subscriptions on users.
The advent of blockchain-enabled zero-knowledge machine learning models allows for trustless data sharing and computation while ensuring privacy and confidentiality. This evolution permits researchers to share and obtain data without compromising sensitive information.
Creating sustainable channels for open research data necessitates a decentralized infrastructure that incentivizes researchers to share their findings, eliminating the need for intermediaries. An incentivized open data network could democratize access to scientific data, liberating it from the confines of expensive journals and proprietary corporate storage.
Unstoppable DApp Hosting
While centralized data hosting platforms like Amazon Web Services, Google Cloud, and Microsoft Azure offer convenience, they harbor a single point of failure that can impact reliability and availability during outages.
There have been notable incidents where centralized infrastructure-as-a-service platforms failed to deliver uninterrupted service. In 2022, for example, MetaMask users in certain regions were temporarily blocked after Infura, its default RPC provider, misapplied geographic access restrictions.
Given the many niche requirements of developers in a dynamic open-source ecosystem, relying on a single company's infrastructure becomes increasingly untenable. The broader trend toward decentralization in social networking applications such as Bluesky confirms the growing demand for non-centralized alternatives.
A decentralized finance protocol can source on-chain price data without relying on centralized APIs, illustrating a forward-thinking approach to DApp development. The Web3 RPC market is vast: at an estimated 100 billion serviceable RPC requests per day, priced between $3 and $6 per million requests, it suggests an annual market size of $100 million-$200 million.
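The math behind that estimate can be checked directly. A minimal sketch, assuming the 100 billion request figure is a daily volume (the stated annual range only works out on that assumption):

```python
# Back-of-the-envelope Web3 RPC market estimate.
# Assumption: ~100 billion serviceable RPC requests per day.
daily_requests = 100e9
price_per_million_usd = (3.0, 6.0)  # low and high price per million requests

annual_request_millions = daily_requests / 1e6 * 365
low, high = (annual_request_millions * p for p in price_per_million_usd)

print(f"Estimated annual market: ${low / 1e6:.0f}M-${high / 1e6:.0f}M")
```

This yields roughly $110 million-$219 million per year, consistent with the $100 million-$200 million range cited.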
Open Data Requires Decentralized Infrastructure
In the long run, we will see a shift toward generalized blockchain clients that offload storage and networking to specialized middleware protocols. Solana, for example, took an early step in this direction by leveraging platforms like Arweave for archival data storage.
In the future, we anticipate increased data flow through infrastructure protocols, further strengthening middleware dependencies. As decentralized channels become more modular and scalable, they will facilitate the integration of open-source, decentralized middleware at the protocol level.
Centralized companies cannot effectively serve as intermediaries for light-client headers: a light client exists precisely to verify block headers without trusting a third party, so routing them through a single entity defeats the purpose. The future landscape of app development will likely favor decentralized infrastructure that is trustless, distributed, cost-effective and censorship-resistant.
Ultimately, decentralized infrastructure will emerge as the default for app developers and companies, fostering growth and innovation in the open data economy.
Opinion by: Michael O’Rourke, founder of Pocket Network and CEO of Grove.
This article is for general information purposes and should not be construed as legal or investment advice. The views expressed herein are those of the author and do not necessarily reflect the views of Cointelegraph.