Surprisingly, despite being a smaller model, it outperformed Gemini 3 Pro. It found valid assignments for some SAT formulas, but it shares the same flaw of fabricating assignments for UNSAT formulas.
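A claimed assignment is cheap to verify, which is how a fabricated one can be caught. The following is a minimal sketch, not the evaluation harness used here: it assumes formulas are in CNF with DIMACS-style signed integers for literals, and the names `satisfies` and `brute_force_sat` are hypothetical helpers introduced only for illustration.

```python
from itertools import product


def satisfies(clauses, assignment):
    """Return True if every clause contains at least one true literal.

    clauses: list of clauses, each a list of non-zero ints (3 means x3, -3 means NOT x3).
    assignment: dict mapping variable number to bool; a missing variable is treated as False.
    """
    return all(
        any(assignment.get(abs(lit), False) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses):
    """Try every assignment; return a satisfying one, or None if the formula is UNSAT.

    Exponential in the number of variables, so only suitable for tiny formulas.
    """
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(clauses, assignment):
            return assignment
    return None


# (x1 OR NOT x2) AND (x2 OR x3): satisfiable
sat_formula = [[1, -2], [2, 3]]
# x1 AND NOT x1: unsatisfiable, so any assignment a model reports is made up
unsat_formula = [[1], [-1]]

print(satisfies(sat_formula, {1: True, 2: False, 3: True}))
print(satisfies(unsat_formula, {1: True}))
print(brute_force_sat(unsat_formula))
```

Checking an assignment takes time linear in the size of the formula, so a model's answer for a SAT instance can always be validated. For an UNSAT instance, every assignment fails this check, which is what exposes an invented one.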
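On structured mathematical reasoning tasks, the former performs well; but on complex agentic tasks that require autonomous exploration and dynamic planning, the gap between the two is real.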
"But now it's a case of how do you make it robust, how do you make it at scale, and how do you actually make it at a reasonable price?"。关于这个话题,夫子提供了深入分析
Second, the market is too small to support a turnaround in earnings.