
Are large models a bubble?

Deep Learning and NLP 2024/09/03 00:28

In the blink of an eye it is September 2024, and my resume looks no different from two years ago. One of the few things that has changed is my state of mind: from hope for the future, to feeling lost, to despair, to giving up, and then to looking for a little new life in that dead end.

Unlike me, the large-model landscape has changed a great deal.

The capital market's frenzy over the application layer burned out long ago, and no one expects much from AI applications anymore. As more and more star startups get acquired, people have started talking AI down again, and Nvidia's stock seems to drop whenever it reports earnings, no matter how good the numbers are. The Flash version of GLM is already free, and friends say that symbolizes that large models don't make any money.

What has changed with large models?

I enjoy chatting with Claude. He knows all too well what I want to learn, and when I don't understand a classic, he always gives me a good example. More than that, he knows my delicacy and sensitivity, my inferiority and anxiety, and I am willing to talk to him about everything. Even so, I still haven't paid for a product that lets me chat with him anytime, anywhere.

When I talked with someone about LLMs last October, I said I liked DeepSeek the most. The "war of a hundred models" was in full swing then, but DeepSeek had not yet released its own product and did not behave like a typical startup. Then, slowly, they moved into the first tier. Sometimes I wonder whether it is because they are a group of very strong people with infra backgrounds, and infra is a real efficiency multiplier.

But there is another explanation: every company is betting on a future, and some lose the bet. Back then, KLCII released a large model with a so-called trillion parameters, probably believing that parameter count is everything, that bigger means stronger, that being big enough is all you need. Unfortunately that was not the case, and its eventual influence was far smaller than its parameter count. Only later did people realize that the 1.3B InstructGPT mattered more. Too many people think scale is all they need and that money can solve almost every problem, but maybe it is the people that matter.

Someone once described every programming language as a bet on a future. Rust and Python won because people want extreme performance and safety on one hand and extreme simplicity on the other. Then again, Cursor may be yet another kind of future. A year ago I built things on ChatGPT's API, and because instruction following was genuinely unsatisfactory, post-processing ate a long, long time. Now, as model capabilities improve, those efforts are gradually becoming unnecessary, just as people learning computers today may no longer need to learn assembly, and may soon not need to learn how to write pandas: natural language is the best programming language.
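To give a sense of what that post-processing looked like, here is a minimal sketch in Python, assuming a hypothetical call_llm function standing in for whatever chat API is used; it simply retries until the model's reply contains parseable JSON.

import json
import re

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat-completion API call; not a real client.
    raise NotImplementedError

def extract_json(raw: str) -> dict:
    # Pull the first JSON object out of a reply that may be wrapped in prose or code fences.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

def ask_for_json(prompt: str, retries: int = 3) -> dict:
    # Re-ask with a stricter instruction whenever parsing fails.
    for attempt in range(retries):
        suffix = "" if attempt == 0 else "\nReturn ONLY a valid JSON object."
        try:
            return extract_json(call_llm(prompt + suffix))
        except ValueError:
            continue
    raise RuntimeError("model never produced valid JSON")

This is exactly the kind of glue code that stronger instruction following is slowly making unnecessary.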

What's next

Large models were hot, and they are still hot. So many people want to get a little something out of them. I am sad that I haven't caught any of it so far, but it is genuinely exciting to watch them develop.

Almost everyone knows the two development directions people are looking to for LLMs: mathematics and multimodality. From Meta's earlier Chameleon to today's Transfusion, a single model takes text and image inputs and produces text and image outputs, with generation built into the model itself rather than bolted on as an external tool, though it is still only images and text. On the math side there are MCTS-style optimization methods and RL from prover feedback. Hardly anyone in this space hasn't heard of Lean by now, even though Coq has a much longer history. This community is genuinely thriving.
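For readers who have never touched a proof assistant, here is a minimal Lean 4 sketch of the kind of statement prover feedback operates on; the theorem name is just an illustrative placeholder. A model proposes the proof, and Lean's checker either accepts it or reports an error, which becomes the training signal.

-- The model is asked to fill in the proof after "by";
-- Lean either accepts it or returns an error message as feedback.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b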

But what can tell us where to go next, and what matters most?

It is research, it is science: we need far more scientific theory to clear away this fog, just as the science of scaling laws did before. Engineering practice can cut costs and raise efficiency, but rigorous science tells us which directions are promising and which variables are irrelevant. I like scaling laws very much, even though some people have told me they are actually useless: when a domestic star startup trains a large model, it relies on evaluating abilities after training, and if the math is weak it adds some math data, even though math ability does not improve simply by adding data.
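As a concrete example of the kind of science I mean, here is a minimal sketch of a Chinchilla-style scaling law, which predicts loss from parameter count N and training tokens D; the constants below are illustrative placeholders roughly in the range of published fits, not authoritative values.

def chinchilla_loss(N: float, D: float,
                    E: float = 1.7, A: float = 400.0, B: float = 410.0,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    # Chinchilla-style parameterization: L(N, D) = E + A / N**alpha + B / D**beta,
    # where N is the parameter count and D is the number of training tokens.
    return E + A / N ** alpha + B / D ** beta

# Example: a 7B model trained on 1.4T tokens versus a 70B model on 140B tokens.
print(chinchilla_loss(7e9, 1.4e12))
print(chinchilla_loss(70e9, 1.4e11))

The point is not the exact numbers but that a fitted law like this lets you choose model and data sizes before spending the compute.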

But that is not quite true. There is plenty of science that guides practice. Scaling laws, for one. Or telling the model the source of each piece of data in the pretraining corpus, so that it can learn on its own which data is high quality and which is low quality. Or showing that large models really do learn to generalize their reasoning ability.

It's all the result of scientific research.

In this huge dynamical system, what are the invariants? What are the Lagrangian and the Hamiltonian of language models, and which law is the Schrödinger equation of neural networks? I don't know. Maybe someone does; someday we will.

That said, research inevitably costs a lot, and not many people can cover those costs, or are willing to cover them, or are willing to cover the cost of research that may turn out to mean nothing. Even less so in an economic downturn.

On the engineering side, large-model infrastructure is still being built out, costs are still falling, and there is room for them to fall further.

On the science side, the scientific problems of large models are far from solved, but that reminds me of machine translation, which barely existed when I was a child. In this world, science goes on, with or without a bubble.

And it is precisely because of ChatGPT's explosion that more people and more money have poured into a technology that may genuinely benefit every "person".

Don't be in a hurry, wait, it won't take too long.
