
Tsinghua Tang Jie's team's new work: 20,000 words generated in one go, unlocking long output for large models

Qubits 2024/08/15 13:12
Ming Min, reporting from Aofeisi
Qubits | Official account QbitAI

20,000 words generated in one go: large model output length is heating up too!

The latest research of Tsinghua & Zhipu AI has successfully increased the output length of GLM-4 and Llama-3.1.

For the same prompt, output grew from 1,800 words to 7,800 words, more than a fourfold increase.


Bear in mind that large models' generation length is generally under 2K words. This limits content creation, question answering, and other tasks, leading to truncated answers and reduced creative range.

The research was co-led by Li Juanzi and Tang Jie, founders of Zhipu AI and professors at Tsinghua University.


The paper and code have been open-sourced on GitHub.

Some netizens have already tried it out: LongWriter-llama3.1-8b generated a 10,000-word essay, "History of the Decline and Fall of the Roman Empire", running on a 2018 MacBook Pro (32GB).

The output is quite accurate; it could score an A++.


A 9B model that can output 10,000 words

This study mainly includes three aspects.

First, the researchers built a testing tool, LongWrite-Ruler. By testing multiple large models, they found that all of them had difficulty generating text of more than 2,000 words.
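As a rough illustration of what a LongWrite-Ruler-style probe does, the sketch below asks a model for increasingly long outputs and records how many words actually come back; `generate`, `probe_max_output`, and the prompt wording are placeholders of ours, not the paper's actual harness.

```python
import re

def probe_max_output(generate, targets=(500, 1000, 2000, 4000, 8000)):
    """Ask for each target length and record the actual word count;
    the plateau approximates the model's effective output ceiling."""
    results = {}
    for n in targets:
        prompt = f"Write an article about the history of Rome in about {n} words."
        results[n] = len(generate(prompt).split())
    return results

# Toy stand-in for a real model call: it caps output near 1,800 words,
# mimicking the roughly 2K-word ceiling the researchers observed.
def fake_generate(prompt):
    n = int(re.search(r"(\d+) words", prompt).group(1))
    return "word " * min(n, 1800)

print(probe_max_output(fake_generate))
# -> {500: 500, 1000: 1000, 2000: 1800, 4000: 1800, 8000: 1800}
```

With a real model plugged in for `fake_generate`, the point where the measured count stops tracking the requested count is the model's effective limit.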

Further analyzing interaction logs between users and large models, the researchers found that over 1% of user requests explicitly asked for more than 2,000 words of output.

To investigate, they varied the maximum output length of the dataset the model used during the supervised fine-tuning (SFT) phase.

The results show that the maximum output length of the model is significantly positively correlated with the maximum output length in the SFT dataset.

From this they conclude that existing models' output length is limited mainly by the lack of long-output samples in the SFT dataset.

Even if a model has seen longer sequences during pre-training, the shortage of long text samples in the SFT phase still caps its output length.


To overcome this limitation, the researchers came up with AgentWrite, an agent-based pipeline.


It allows the extra-long text generation task to be broken down into multiple subtasks, with each subtask processing a segment of it.

Specifically, AgentWrite first drafts a detailed writing plan from the user's instruction, including the main content points and a target word count for each paragraph. Following this plan, AgentWrite then prompts the model to generate each paragraph's content in turn.
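The plan-then-write loop described above can be sketched roughly as follows; `generate` stands in for any LLM call, and the prompt and plan formats are our assumptions, not the paper's exact implementation.

```python
def agent_write(generate, instruction, num_sections=4, words_per_section=500):
    # Step 1: ask the model for a writing plan, one line per section,
    # each with a topic and a word budget.
    plan_prompt = (f"Break '{instruction}' into {num_sections} sections; "
                   f"give each a topic and a ~{words_per_section}-word budget.")
    plan = generate(plan_prompt).splitlines()
    # Step 2: generate each section in turn, conditioned on its plan item
    # and a tail of what has been written so far, then stitch together.
    written = []
    for item in plan:
        section_prompt = (f"Instruction: {instruction}\nPlan item: {item}\n"
                          f"Written so far (tail): {' '.join(written)[-2000:]}\n"
                          f"Now write this section.")
        written.append(generate(section_prompt))
    return "\n\n".join(written)

# Toy model call: returns a 4-line plan, then ~500 words per section,
# so the combined output exceeds what a single call would produce.
def fake_generate(prompt):
    if prompt.startswith("Break"):
        return "\n".join(f"Section {i + 1}" for i in range(4))
    return "lorem " * 500

print(len(agent_write(fake_generate, "History of Rome").split()))  # -> 2000
```

The key design point is that each subtask stays within the model's comfortable output range, while the concatenated result can be far longer.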


Building on AgentWrite, the team used GPT-4o to generate 6,000 long-output SFT examples with output lengths ranging from 2k to 32k words, forming the LongWriter-6k dataset, and added this data to the training process.

To verify the method's effectiveness, the team also proposed LongBench-Write, a benchmark containing a variety of user writing instructions with required output lengths of 0-500 words, 500-2,000 words, 2,000-4,000 words, and more than 4,000 words.
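For illustration, here is a minimal helper that sorts an output into those four length buckets and scores how closely it matches the requested length; the scoring formula is a simplification of ours, not LongBench-Write's actual metric.

```python
def length_bucket(n_words):
    """Map a word count to one of the benchmark's four length ranges."""
    if n_words < 500:
        return "0-500"
    if n_words < 2000:
        return "500-2000"
    if n_words < 4000:
        return "2000-4000"
    return "4000+"

def length_score(required, actual):
    """Score in [0, 100]: 100 when the output hits the requested length,
    decaying linearly with relative deviation (our simplification)."""
    return max(0.0, 100.0 * (1 - abs(actual - required) / required))

print(length_bucket(1800))       # -> 500-2000
print(length_score(2000, 1800))  # -> 90.0
```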

The evaluation results show that the output length of the model increases significantly after using AgentWrite.


With Direct Preference Optimization (DPO), GLM-4-9B achieves the best performance among the models tested.


Quick-fingered netizens have already put it to the test.

A Reddit user asked LongWriter-llama3.1-8b to generate a history of the decline of the Roman Empire; it took 22 minutes (hardware-dependent), averaging 3.34 tokens per second.


The generated content is rather formulaic, though: the structure and pacing of answers to different questions are similar.

Still, it's a good start, and the improvement is obvious.


The research team also said that, going forward, they will further extend the models' output length and output quality, and will begin studying how to improve efficiency without sacrificing generation quality.
