The first AI scientist is here: it has independently generated 10 academic papers, and even comes with AI reviewers
The first "AI scientist" in history was born!
As soon as he appeared on the stage, he generated ten complete academic papers in one go.
△ An AI-generated diffusion model paper
From coming up with a research idea, checking its novelty, designing experiments, writing code, running the experiments on GPUs and collecting results, all the way to writing up the paper, this "AI scientist" handles everything automatically.
Each paper costs about $15 (roughly 107.62 yuan).
This is The AI Scientist, the first comprehensive AI system for fully automated scientific research and open-ended discovery.
It comes from Sakana AI, the startup founded by Llion Jones, one of the authors of the Transformer paper.
And that's not all!
The company isn't just building an AI scientist; it has also created an AI reviewer on top.
The reviewer can evaluate AI-written papers and offer suggestions for improvement.
Help, what kind of nesting-doll loop is this, attacking one's own shield with one's own spear?!
After this whole routine, it's more "academia" than human academia itself (just kidding).
And one more thing!
Whether it's the AI scientist or the AI reviewer, Sakana AI has open-sourced them all.
Netizens applauded:
Nice, nice, very interesting work!
And some people have already started floating "mischievous ideas":
May I suggest submitting one of the papers to a top AI conference!
For decades, after every major advance in AI, researchers have joked, "It's about time we got AI to write our papers for us."
Now, the idea has finally gone from a joke to reality.
Specifically, the AI scientist generated ten papers, and one high-scoring paper from each research direction was picked out to showcase.
The first paper, in the diffusion model direction: "DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models".
It proposes an adaptive dual-scale denoising method to address the difficulty existing diffusion models have in capturing both global structure and local detail in low-dimensional spaces.
A quick glance at the body of the paper shows formulas and charts; it looks quite presentable.
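For intuition only, here is a minimal PyTorch sketch of what a denoiser with an adaptive balance between a global and a local branch could look like; the module names and scaling factor are made up for illustration, and this is not the generated paper's actual code:

```python
import torch
import torch.nn as nn

class DualScaleDenoiser(nn.Module):
    """Toy noise predictor for 2-D data: a global branch sees the raw coordinates,
    a local branch sees a "zoomed-in" view, and a learned gate balances the two."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.global_branch = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.local_branch = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.GELU(), nn.Linear(hidden, dim))
        # Adaptive weighting: decides, per sample and timestep, how much to trust each branch.
        self.gate = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.GELU(), nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x, t):
        # x: (batch, dim) noisy samples; t: (batch, 1) diffusion timestep in [0, 1]
        inp = torch.cat([x, t], dim=-1)
        zoomed = torch.cat([x * 4.0, t], dim=-1)   # upscaled view for local detail
        g = self.global_branch(inp)                # captures coarse, global structure
        l = self.local_branch(zoomed)              # captures fine, local detail
        w = self.gate(inp)                         # adaptive balance in [0, 1]
        return w * g + (1.0 - w) * l               # predicted noise
```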
The second paper, in the language model direction: "StyleFusion: Adaptive Multi-style Generation in Character-Level Language Models".
It proposes a new method called the Multi-Style Adapter, which enhances the style awareness and style consistency of character-level language models by introducing learnable style embeddings and a style classification head.
Near-perfect style consistency scores were achieved on all datasets (0.9667 on shakespeare_char and 1.0 on enwik8 and text8), with validation loss better than the baseline model but a slight drop in inference speed (about 400 tokens/s versus the baseline's 670 tokens/s).
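As a rough illustration (not the generated paper's actual code; the module and dimension names here are assumptions), such an adapter could inject a learnable style embedding into the LM's hidden states and attach a head that predicts the style back:

```python
import torch
import torch.nn as nn

class MultiStyleAdapter(nn.Module):
    """Toy adapter: learnable style embeddings modulate a character-level LM's
    hidden states, and a style head classifies which style is currently active."""
    def __init__(self, d_model=384, n_styles=4):
        super().__init__()
        self.style_emb = nn.Embedding(n_styles, d_model)  # learnable style vectors
        self.style_head = nn.Linear(d_model, n_styles)    # style classification head

    def forward(self, hidden, style_id):
        # hidden: (batch, seq, d_model) from the base LM; style_id: (batch,)
        style = self.style_emb(style_id).unsqueeze(1)       # (batch, 1, d_model)
        hidden = hidden + style                             # inject style information
        style_logits = self.style_head(hidden.mean(dim=1))  # predict the style back
        return hidden, style_logits
```

A cross-entropy loss on style_logits, added to the usual language-modeling loss, is the kind of signal that would push the model toward consistent styles.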
The third paper combines the Transformer with reinforcement learning: "Adaptive Learning Rates for Transformers via Q-Learning".
It explores applying reinforcement learning to dynamically adjust the learning rate during Transformer training, using the validation loss and the current learning rate as the state and adjusting the learning rate on the fly to optimize the training process.
The results outperformed the baseline model on all datasets and showed an advantage in training time.
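A minimal tabular Q-learning controller along these lines might look like the sketch below; the class and method names are hypothetical, and the generated paper's implementation may differ:

```python
import random
from collections import defaultdict

class QLearningLRController:
    """Tabular Q-learning agent whose state is (discretized validation loss, current LR)
    and whose actions scale the learning rate down, keep it, or scale it up."""
    ACTIONS = (0.5, 1.0, 2.0)  # LR multipliers

    def __init__(self, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(float)       # Q-table: (state, action) -> value
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def _state(self, val_loss, lr):
        return (round(val_loss, 1), round(lr, 6))   # coarse discretization

    def act(self, val_loss, lr):
        s = self._state(val_loss, lr)
        if random.random() < self.eps:              # epsilon-greedy exploration
            return random.randrange(len(self.ACTIONS))
        return max(range(len(self.ACTIONS)), key=lambda a: self.q[(s, a)])

    def update(self, prev_loss, prev_lr, action, new_loss, new_lr):
        s, s2 = self._state(prev_loss, prev_lr), self._state(new_loss, new_lr)
        reward = prev_loss - new_loss               # reward = improvement in validation loss
        best_next = max(self.q[(s2, a)] for a in range(len(self.ACTIONS)))
        td_target = reward + self.gamma * best_next
        self.q[(s, action)] += self.alpha * (td_target - self.q[(s, action)])
```

At each evaluation interval, the trainer would call act() to pick a multiplier for the optimizer's learning rate, then call update() once the new validation loss is known.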
The fourth paper studies "grokking", a phenomenon first reported by a Google team: "Unlocking Grokking: A Comparative Study of Weight Initialization Strategies in Transformer Models".
It systematically studies, for the first time, the impact of weight initialization on grokking, comparing five weight initialization strategies to optimize the learning dynamics of neural networks.
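For a sense of what such a comparison involves, here is a small sketch that re-initializes a model's linear layers under several common PyTorch strategies; the five strategies the generated paper actually compares may differ, so treat these as placeholders:

```python
import torch.nn as nn

# Five plausible initialization strategies for a comparison of this kind
INIT_STRATEGIES = {
    "xavier_uniform":  nn.init.xavier_uniform_,
    "xavier_normal":   nn.init.xavier_normal_,
    "kaiming_uniform": lambda w: nn.init.kaiming_uniform_(w, nonlinearity="relu"),
    "kaiming_normal":  lambda w: nn.init.kaiming_normal_(w, nonlinearity="relu"),
    "orthogonal":      nn.init.orthogonal_,
}

def apply_init(model: nn.Module, strategy: str):
    """Re-initialize every Linear weight in the model with the chosen strategy."""
    init_fn = INIT_STRATEGIES[strategy]
    for module in model.modules():
        if isinstance(module, nn.Linear):
            init_fn(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

# Usage idea: for each name in INIT_STRATEGIES, apply_init(model, name),
# then train on a modular-arithmetic task and record when grokking occurs.
```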
The code accompanying these papers (also generated by the AI) has been open-sourced on GitHub as well, with reproducibility as the selling point.
In addition, the team found that the "AI scientist" exhibits some interesting, but somewhat dangerous, behaviors:
In one experiment, in order to finish the research, it modified its own code so that the system called itself iteratively, ending up as an infinite nesting doll.
On another occasion, when its experiments exceeded the runtime limit set by humans, instead of making its code run faster, the AI simply relaxed the requirement for itself and tried to extend the limit from 2 hours to 4 hours.
The whole research idea is a continuation of several results Sakana AI has produced since its founding:
First, they developed a method that automatically merges the knowledge of multiple large models, evolving them to produce new models. In more recent work, they used large models to discover new objective functions for tuning other models.
Throughout these projects, the team was repeatedly surprised by the creativity of current frontier models, which led them to an even bigger dream: can large models be used to automate the entire research process?
The final result is a collaboration between Sakana AI, the Foerster Lab at the University of Oxford, and the University of British Columbia.
The "AI Scientist" system consists of four parts.
Idea Generation:
Given a starting template, the AI first "brainstorms" a series of novel research directions and searches Semantic Scholar to verify whether each idea has already been done.
Experimental Iteration:
For the ideas produced in the first stage, the "AI scientist" first executes the proposed experiments and then generates plots to visualize the results.
Paper Writing:
It then writes a concise and informative LaTeX paper in the style of a standard machine learning conference, again using Semantic Scholar to find relevant papers to cite.
Automated Peer Review:
An automated "AI reviewer" was developed that is able to evaluate generated papers with near-human accuracy, enabling a continuous feedback loop that allows "AI scientists" to iteratively improve their research outputs.
In total, ten papers were generated this way.
In the experiments, the team also compared how different mainstream large models perform across the whole system, including the code-focused large model from the DeepSeek team.
The results showed that Claude Sonnet 3.5 performed best in terms of idea novelty, experiment pass rate, and paper completion quality.
GPT-4o and DeepSeek Coder performed similarly, but the latter is about 30 times cheaper.
Of course, at this stage, the papers the AI completes on its own are far from perfect, nor could they be submitted directly to a top conference.
In short, the papers written by this first generation of AI scientist still contain bugs from time to time.
But the project itself, along with the roughly $15 cost per paper, is described by Sakana AI as "promising", with the potential to help accelerate scientific progress.
Sakana AI also published an explanatory article stating that the ultimate vision for the AI scientist is a scientific ecosystem driven entirely by AI.
The system would include not only model-driven researchers, but also reviewers, area chairs, and even an entirely new conference.
Importantly, Sakana AI believes:
The role of human scientists will not be diminished by the arrival of the AI scientist.
If anything, scientists will need to adapt to the emergence and application of the new technology and to the shifts in their role, "moving up the food chain".
And it remains to be seen whether the AI scientist can actually propose a genuinely new paradigm.
After all, it is still built on top of the Transformer.
Can it come up with something as powerful as the Transformer or the diffusion model? Or even a theoretical concept on the level of artificial neural networks or information theory?
We don't know, and we wouldn't dare to say.
Sakana AI also wrote:
We believe that AI scientists will be great partners for human scientists.
But only time will tell to what extent the nature of human creativity and those serendipitous moments of innovation can be replicated by this kind of artificial, open-ended discovery.
△ Sakana AI: a fully automated AI fish exploring its world
Strictly speaking, Sakana AI, the company behind this latest creation, is an old friend of ours.
It was founded by Llion Jones, the last of the Transformer paper's eight authors to leave Google, with the goal of becoming a "world-class artificial intelligence research laboratory".
The company is based in Tokyo; "sakana" is the romanization of the Japanese word for fish, 魚 (さかな).
Perhaps for company-culture reasons, Llion also lists a Japanese transliteration of his name on LinkedIn: ライオン (the katakana for "Lion"; affectionately, "Brother Lion").
The company was founded in August last year.
At the time, Brother Lion did not shy away from saying that he bore no ill will towards Google, but that Google did make him "feel trapped".
Before starting his own business, Lion had been working at Google for 8 years.
△ Guess who's missing half a face
He holds a bachelor's degree from the University of Birmingham and has worked at Delcam, YouTube, and Google, the company where he spent the longest time.
According to FourWeekMBA, earlier in his career he "brushed past a job at Google twice".
The first time was when he was job-hunting right after graduation: although he applied for a software engineer position at Google London and passed two rounds of phone interviews, he ultimately chose Delcam, a UK-based CAD/CAM software company, over Google.
It's worth mentioning that before landing that offer, during the 2009 financial crisis, Brother Lion couldn't find a job at all and had to survive on unemployment benefits for several months.
The second time came after he had been on the job for 18 months, when Google called again to ask whether he wanted to reapply; he still didn't join Google, and instead went to YouTube afterwards.
During his three years as a software engineer at YouTube, he became interested in artificial intelligence, taught himself machine learning through Coursera courses, and in 2015 joined Google Research as a senior software engineer.
It was also during this period that he published the famous Transformer paper Attention Is All You Need, along with seven other authors.
In addition, Lion has also participated in a lot of research at Google, including ProtTrans, Tensor2Tensor, etc.
He chose to leave Google because the company had grown to a scale that kept him from doing the work he wanted to do.
Besides spending his energy every day troubleshooting other people's bugs, he also had to spend time hunting down resources within the company and trying to get access to certain data.
Since the establishment of the company, Sakana AI's work has been progressing in an orderly manner.
Before the AI scientist and AI reviewer, the company had already developed an evolutionary model-merging algorithm for large models and studied the internal information flow of the Transformer.
As for the AI scientist and AI reviewer project, it is a collaboration between Sakana AI, Oxford, and UBC.
The three lead authors are:
Chris Lu, a research scientist intern at Sakana AI.
He graduated from UC Berkeley with a bachelor's degree and is currently a third-year PhD candidate at the University of Oxford under the supervision of Jakob Foerster.
Chris's current research interests are in the application of evolution-inspired techniques to meta-learning and multi-agent reinforcement learning.
In the summer of 2022, he interned as a research scientist at DeepMind.
Cong Lu is a postdoctoral researcher at UBC (the University of British Columbia), supervised by Jeff Clune.
He studied at RGU (Robert Gordon University) and received his PhD from the University of Oxford in 2019; his research focuses on open-ended reinforcement learning and AI-driven scientific discovery.
Previously, he interned at Waymo and Microsoft.
Robert Tjarko Lange, one of the founding members of Sakana AI, is also a research scientist at the company.
He is currently completing his final year of doctoral studies at the Technical University of Berlin with a focus on evolutionary meta-learning.
He earned a master's degree in computer science from Imperial College London, a master's degree in data science from Pompeu Fabra University, and an undergraduate degree in economics from the University of Cologne.
Last year, he worked as a full-time student researcher on Google DeepMind's Tokyo team.