REverse-Engineered Reasoning for Open-Ended Generation

Haozhe Wang1,2,3, Haoran Que1,6, Qixin Xu5, Minghao Liu3,4,
Wangchunshu Zhou3, Jiazhan Feng1, Wanjun Zhong1, Wei Ye6, Tong Yang6, Wenhao Huang1,
Ge Zhang1,3, Fangzhen Lin2


ByteDance Seed1, HKUST2, M-A-P3,
2077AI4, Tsinghua University5, Peking University6


Corresponding to: zhangge.eli@bytedance.com, jasper.whz@outlook.com

🔔 News

🔥[2025-09-09] The paper is out 🚀. We are now working on releasing code and data.

Introduction

While the "deep reasoning" paradigm has spurred significant advances in verifiable domains like mathematics, its application to open-ended, creative generation remains a critical challenge. The two dominant methods for instilling reasoning, reinforcement learning (RL) and instruction distillation, falter in this area: RL struggles with the absence of clear reward signals and high-quality reward models, while distillation is prohibitively expensive and capped by the teacher model's capabilities. To overcome these limitations, we introduce REverse-Engineered Reasoning (REER), a new paradigm that fundamentally shifts the approach. Instead of building a reasoning process "forwards" through trial-and-error or imitation, REER works "backwards" from known good solutions to computationally discover the latent, step-by-step deep reasoning process that could have produced them. Using this scalable, gradient-free approach, we curate and open-source DeepWriting-20K, a large-scale dataset of 20,000 deep reasoning trajectories for open-ended tasks. Our model, DeepWriter-8B, trained on this data, not only surpasses strong open-source baselines but also achieves performance competitive with, and at times superior to, leading proprietary models such as GPT-4o and Claude 3.5.

REverse-Engineered Reasoning (REER)

Our central goal is to instill deep reasoning in LLMs for open-ended tasks without relying on costly distillation or reinforcement learning. To achieve this, we introduce **REverse-Engineered Reasoning (REER)**, a novel paradigm that shifts the objective from generating a solution to discovering the latent reasoning process behind an existing high-quality one. Instead of building a reasoning process "forwards" via trial-and-error, REER works "backwards" from a known good output to computationally synthesize the step-by-step thinking that could have produced it. Below is an example of the structured reasoning we aim to cultivate, where the model demonstrates deliberate planning, exploration of alternatives ("Hmm... Alternatively"), and self-correction ("Wait, that's a bit too straightforward").

[Figure: an example deep reasoning trajectory, showing deliberate planning, exploration of alternatives, and self-correction]

REER as a Search Problem

This approach is operationalized as a search problem where we iteratively refine an initial thinking process to discover a trajectory that best explains a high-quality, human-written output.

We evaluate the quality of a *thinking process* based on how well it explains a known-good output. We use the **perplexity** (i.e., the model's surprise) of the output $y$ as a proxy for the quality of a given reasoning trajectory $z$. A lower perplexity score for $y$, conditioned on both $x$ and $z$, indicates that the trajectory provides a more coherent and effective plan. In essence, REER posits that a good thinking process $z$ is one that makes a high-quality answer $y$ seem maximally probable and logical to the model.

Formally, the objective of the search problem is

$$z^{*} = \arg\min_{z} \ \mathrm{PPL}(y \mid x, z),$$

where $\mathrm{PPL}(y \mid x, z)$ denotes the perplexity of the known-good output $y$ conditioned on the prompt $x$ and the candidate trajectory $z$.
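
As a concrete illustration, here is a minimal sketch of how this perplexity signal could be computed with Hugging Face `transformers`. The model name, the prompt format, and the trajectory/output concatenation are placeholder assumptions for illustration, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B"  # placeholder: any open-weight causal LM works as a scorer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def conditional_ppl(x: str, z: str, y: str) -> float:
    """Perplexity of the output y given the prompt x and reasoning trajectory z.

    Lower PPL(y | x, z) means the trajectory z better "explains" the
    known-good output y, which is the signal the REER search optimizes.
    """
    ctx_ids = tokenizer(x + "\n" + z + "\n", return_tensors="pt").input_ids
    y_ids = tokenizer(y, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, y_ids], dim=1)

    # Score only y's tokens: positions covered by the context are masked
    # with -100 so the cross-entropy loss ignores them.
    labels = input_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100

    loss = model(input_ids, labels=labels).loss  # mean NLL over y's tokens
    return torch.exp(loss).item()
```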

Iterative Local Search

Solving for the optimal trajectory $z^*$ directly is intractable due to the vast search space. Therefore, we propose an iterative refinement algorithm that employs a guided local search to discover a high-quality deep reasoning trajectory. The algorithm starts with an initial trajectory and progressively improves it by refining its constituent segments, guided by the perplexity signal.
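
To make the loop concrete, below is a minimal sketch of such a guided local search, reusing the `conditional_ppl` scorer sketched above. The segmentation granularity, the hypothetical `propose_revision` helper (e.g., an LLM asked to rewrite one segment), and the greedy acceptance rule are our assumptions, not the paper's exact procedure.

```python
import random

def local_search(x: str, y: str, z_init: list[str],
                 propose_revision, n_iters: int = 50) -> list[str]:
    """Greedy local search over a segmented trajectory z, guided by PPL(y | x, z).

    propose_revision(x, z, i) is a hypothetical helper that returns a
    candidate rewrite of segment i of the current trajectory z.
    """
    z = list(z_init)
    best = conditional_ppl(x, "\n".join(z), y)
    for _ in range(n_iters):
        i = random.randrange(len(z))                     # pick a segment to refine
        candidate = z[:i] + [propose_revision(x, z, i)] + z[i + 1:]
        score = conditional_ppl(x, "\n".join(candidate), y)
        if score < best:                                 # keep only strict improvements
            z, best = candidate, score
    return z
```

One design choice in this sketch is greedy acceptance; an annealing-style rule that occasionally accepts non-improving moves would fit the same perplexity-guided framework.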

Data Curation

To curate the data, we collect 16K publicly available QA pairs and run the proposed iterative search for deep reasoning synthesis, employing techniques such as context engineering and end-of-thinking filtering. This process yields a final dataset of 20,000 high-quality deep reasoning trajectories. The dataset's distribution, shown in the data-distribution figure of our paper, highlights its diversity, with a significant focus on **Artistic** (Literature and Arts) writing, which is further broken down into sub-genres such as Creative Writing and Essay Writing.
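
As a rough illustration of the filtering step, here is a minimal sketch of what an end-of-thinking filter might check, assuming trajectories delimit their reasoning with a closing `</think>` tag. Both the tag and the criteria are our assumptions; the paper's exact filtering rules may differ.

```python
THINK_CLOSE = "</think>"  # assumed delimiter; the actual trajectory format may differ

def end_of_thinking_filter(trajectory: str) -> bool:
    """Keep a trajectory only if its thinking span terminates exactly once."""
    head, sep, tail = trajectory.partition(THINK_CLOSE)
    return bool(sep) and THINK_CLOSE not in tail
```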

Qualitative Results

To understand how well DeepWriter internalizes the *qualities* of deep thinking, we conducted a qualitative analysis, scoring model outputs on five dimensions intrinsically linked to advanced reasoning and planning.

For more details and analysis of our approach, please refer to our paper.

Reference

If you find our work useful, please cite:

```bibtex
@article{reer,
      title={REverse-Engineered Reasoning for Open-Ended Generation},
      author={Wang, Haozhe and Que, Haoran and Xu, Qixin and Liu, Minghao and Zhou, Wangchunshu and Feng, Jiazhan and Zhong, Wanjun and Ye, Wei and Yang, Tong and Huang, Wenhao and Zhang, Ge and Lin, Fangzhen},
      journal={arXiv preprint arXiv:},
      year={2025}
}
```