This article is part of VentureBeat’s special issue, “AI at Scale: From Vision to Viability.” Read more from this special issue here.
As we wrap up 2024, we can look back and acknowledge that artificial intelligence has made impressive and groundbreaking advances. At the current pace, predicting what kind of surprises 2025 has in store for AI is virtually impossible. But several trends paint a compelling picture of what enterprises can expect in the coming year and how they can prepare themselves to take full advantage.
The plummeting costs of inference
Over the past year, the cost of using frontier models has fallen steadily. The price per million tokens of OpenAI’s top-performing large language model (LLM) has dropped more than 200-fold in the past two years.
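To put that drop in yearly terms: assuming the decline happened at a constant rate, a factor of more than 200 over two years works out to an average annual factor of more than

```latex
r = 200^{1/2} = \sqrt{200} \approx 14.1
```

In other words, prices fell roughly 14-fold per year on average.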
One key factor driving down the price of inference is growing competition. Most frontier models are good enough for many enterprise applications, which makes it easy to switch from one to another and shifts the competition to pricing. Improvements in accelerator chips and specialized inference hardware are also letting AI labs serve their models at lower cost.
To take advantage of this trend, enterprises should start experimenting with the most advanced LLMs and build application prototypes around them even if the costs are currently high. The continued reduction in model prices means that many of these applications will soon be scalable. At the same time, the models’ capabilities continue to improve, which means you can do a lot more with the same budget than you could in the past year.
The rise of large reasoning models
The release of OpenAI o1 has triggered a new wave of innovation in the LLM space. Letting models “think” for longer and review their answers makes it possible for them to solve reasoning problems that were out of reach with a single inference call. Even though OpenAI has not released o1’s details, its impressive capabilities have set off a new race in the AI space. Many open-source models now replicate o1’s reasoning abilities and extend the paradigm to new fields, such as answering open-ended questions.
Advances in o1-like models, which are sometimes referred to as large reasoning models (LRMs), can have two important implications for the future. First, given the immense number of tokens that LRMs must generate for their answers, we can expect hardware companies to be more incentivized to create specialized AI accelerators with higher token throughput.
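The throughput argument is simple arithmetic: if decoding is limited by how many tokens per second the hardware can emit, response time grows with every hidden reasoning token. The sketch below uses hypothetical token counts and speeds chosen for illustration, not measured figures for o1 or any other real model.

```python
# Hypothetical numbers for illustration only; not measurements of any real model.
def generation_seconds(reasoning_tokens: int, answer_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a full response when decoding is throughput-bound."""
    return (reasoning_tokens + answer_tokens) / tokens_per_second

# A conventional model answering directly.
direct = generation_seconds(reasoning_tokens=0, answer_tokens=500, tokens_per_second=50)
# A hypothetical reasoning model that first emits 10,000 reasoning tokens.
lrm = generation_seconds(reasoning_tokens=10_000, answer_tokens=500, tokens_per_second=50)
# The same reasoning model on hardware with twice the decode throughput.
lrm_fast = generation_seconds(reasoning_tokens=10_000, answer_tokens=500, tokens_per_second=100)

print(f"direct: {direct:.0f}s, LRM: {lrm:.0f}s, LRM on 2x hardware: {lrm_fast:.0f}s")
```

Going from 500 to 10,500 generated tokens makes the response 21 times slower at the same speed; only faster hardware (or fewer reasoning tokens) brings it back down.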
Second, LRMs can help address one of the important bottlenecks of the next generation of language models: high-quality training data. There are already reports that OpenAI is using o1 to generate training examples for its next generation of models. We can also expect LRMs to help spawn a new generation of small specialized models that have been trained on synthetic data for very specific tasks.
To take advantage of these developments, enterprises should allocate time and budget to experimenting with the possible applications of frontier LRMs. They should continually test the limits of frontier models and think about what kinds of applications would become possible if the next generation of models overcomes those limitations. Combined with the ongoing reduction in inference costs, LRMs can unlock many new applications in the coming year.
Transformer alternatives are picking up steam
Transformers, the main deep learning architecture used in LLMs, have a memory and compute bottleneck: the cost of their attention mechanism grows quadratically with the length of the input sequence. This has given rise to a field of alternative models whose cost grows only linearly. The most popular of these architectures, the state-space model (SSM), has seen many advances in the past year. Other promising models include liquid neural networks (LNNs), which use new mathematical equations to do a lot more with far fewer artificial neurons and compute cycles.
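To make the complexity difference concrete, here is a minimal sketch of a scalar linear state-space recurrence, assumed for illustration only; production SSMs use learned, multi-dimensional parameters (and, in selective models such as Mamba, input-dependent ones). The scan touches each token once and carries a fixed-size state, while full self-attention computes a score for every pair of positions.

```python
from typing import List

def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Toy scalar linear state-space model, run as a recurrence:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    Time grows linearly with len(x); the state h is a single number."""
    h = 0.0
    ys: List[float] = []
    for x_t in x:
        h = a * h + b * x_t   # fold the new input into the running state
        ys.append(c * h)      # read the output from the state
    return ys

def attention_score_count(seq_len: int) -> int:
    """Query-key scores computed per head by full self-attention: one per pair of positions."""
    return seq_len * seq_len

# An impulse decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))  # [2.0, 1.0, 0.5]

seq_len = 4096
print(seq_len, "state updates vs", attention_score_count(seq_len), "attention scores")
```

Doubling the sequence length doubles the work in the scan but quadruples the number of attention scores, which is the gap that linear-complexity architectures aim to exploit.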
In the past year, researchers and AI labs have released pure SSM models as well as hybrid models that combine the strengths of transformers and linear models. Although these models have yet to perform at the level of the cutting-edge transformer-based models, they are catching up fast and are already orders of magnitude faster and more efficient. If progress in the field continues, many simpler LLM applications can be offloaded to these models and run on edge devices or local servers, where enterprises can use bespoke data without sending it to third parties.
Changes to scaling laws
The scaling laws of LLMs are constantly evolving. The release of GPT-3 in 2020 showed that scaling up model size continued to deliver impressive results and enabled models to perform tasks for which they were not explicitly trained. In 2022, DeepMind released the Chinchilla paper, which set a new direction in data scaling laws. Chinchilla showed that training a model on a dataset far larger than its parameter count, on the order of 20 tokens per parameter, continues to yield improvements. This finding enabled smaller models to compete with frontier models with hundreds of billions of parameters.
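A common reading of the Chinchilla result is a rule of thumb of about 20 training tokens per parameter, combined with the standard approximation that training costs about 6 FLOPs per parameter per token. The sketch below assumes both approximations and uses them to split a compute budget into model size and dataset size. Plugging in the budget of DeepMind’s 70-billion-parameter Chinchilla model, trained on about 1.4 trillion tokens, recovers those two numbers.

```python
import math
from typing import Tuple

# Assumed approximations, not exact values from any one paper:
TOKENS_PER_PARAM = 20       # Chinchilla rule of thumb: ~20 training tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6   # training compute C ~ 6 * N * D

def chinchilla_optimal(compute_flops: float) -> Tuple[float, float]:
    """Split a training compute budget into (parameters N, tokens D),
    assuming C = 6 * N * D and D = 20 * N, so C = 120 * N**2."""
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Budget of a 70B-parameter model trained on 1.4T tokens: 6 * 70e9 * 1.4e12 = 5.88e23 FLOPs.
n, d = chinchilla_optimal(6 * 70e9 * 1.4e12)
print(f"parameters ~ {n:.2e}, tokens ~ {d:.2e}")
```

Because N grows only with the square root of compute, a 4x larger budget buys a model just 2x larger, with the remaining compute going into 2x more data.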
Today, there are concerns that both of those scaling laws are nearing their limits. Reports indicate that frontier labs are seeing diminishing returns from training larger models. At the same time, training datasets have already grown to tens of trillions of tokens, and obtaining quality data is becoming increasingly difficult and costly.
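One way to see why returns diminish: the Chinchilla paper models training loss with a parametric form along these lines, where N is the parameter count, D the number of training tokens, E an irreducible loss term, and A, B, α, β positive constants fitted to training runs (left symbolic here rather than quoting fitted values). Doubling N at fixed D shrinks only the middle term, and always by the same fraction:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

L(N, D) - L(2N, D) = \frac{A}{N^{\alpha}} - \frac{A}{(2N)^{\alpha}} = \left(1 - 2^{-\alpha}\right)\frac{A}{N^{\alpha}}
```

Because A/N^α gets smaller as N grows, each doubling buys a smaller absolute improvement, even though, under the common approximation that training compute is proportional to N·D, each doubling also doubles the compute bill. The same argument applies to the data term and D.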
Meanwhile, LRMs are promising a new vector: inference-time scaling. Where model and dataset size fail, we might be able to break new ground by letting the models run more inference cycles and fix their own mistakes.
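OpenAI has not published how o1 spends its extra inference, so the sketch below uses a different, well-known technique as a stand-in: sample several answers and take a majority vote (often called self-consistency). Assuming a yes/no question and independent samples that are each correct with probability p, it computes the chance that the majority is right, showing how extra inference cycles can buy accuracy without a larger model.

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Assumes a yes/no question and that each sample is independently correct
    with probability p. n must be odd so that a tie is impossible.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    needed = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))

# More samples per question means more inference compute and, when p > 0.5, higher accuracy.
for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Real model samples are correlated, which weakens the effect, and if p is below 0.5 voting makes things worse, so treat this as an optimistic intuition rather than a prediction of how any specific LRM behaves.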
As we enter 2025, the AI landscape continues to evolve in unexpected ways, with new architectures, reasoning capabilities, and economic models reshaping what’s possible. For enterprises willing to experiment and adapt, these trends represent not just technological advancement, but a fundamental shift in how we can harness AI to solve real-world problems.