Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning

UNC Chapel Hill
*Equal Contribution

Symbolic Mixture-of-Experts (Symbolic-MoE) is an adaptive framework that recruits experts at the instance level (i.e., each problem may be solved by a different set of experts), based on the skills needed for each problem. This adaptiveness not only delivers better performance across tasks, but also makes Symbolic-MoE more efficient than existing multi-agent frameworks, even though its model pool is much larger.

Abstract

Combining existing pre-trained expert LLMs is a promising avenue for scalably tackling large-scale and diverse tasks. However, selecting experts at the task level is often too coarse-grained, as heterogeneous tasks may require different expertise for each instance. To enable adaptive instance-level mixing of pre-trained LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. Symbolic-MoE takes a fine-grained approach to selection by emphasizing skills, i.e., specialized subcategories or subtopics such as algebra in mathematics or molecular biology in biomedical reasoning. We propose a skill-based recruiting strategy that dynamically selects the most relevant set of expert LLMs for diverse reasoning tasks based on their strengths. Each selected expert then generates its own reasoning, resulting in k outputs from k experts, which are then synthesized into a final high-quality response by an aggregator. The aggregator is chosen based on its ability to integrate diverse reasoning outputs. We show that instance-level expert selection improves performance by a large margin but – when implemented naively – can introduce a high computational overhead due to the need for constant model loading and offloading. To address this, we implement a batch inference strategy that groups instances based on their assigned experts, ensuring each model will only be loaded once. This allows us to integrate 16 models on a single GPU with a time cost comparable to prior multi-agent baselines using 4 GPUs. Through extensive evaluations on diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), we demonstrate that Symbolic-MoE outperforms strong LLMs like GPT4o-mini, as well as multi-agent approaches, with an absolute average improvement of 8.15% over the best multi-agent baseline. Moreover, Symbolic-MoE removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.
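
To make the recruiting step concrete, here is a minimal Python sketch of instance-level, skill-based expert selection, assuming each model already has a profile of per-skill scores (measured on a small validation set) and each problem is annotated with the skills it requires. The model names, skill labels, and numbers below are illustrative placeholders, not the profiles used in the paper.

from collections import defaultdict

# Hypothetical model profiles: per-skill scores measured on a validation set
# (higher is stronger). Model names, skills, and numbers are placeholders.
MODEL_PROFILES = {
    "math-expert-7b": {"algebra": 0.82, "geometry": 0.71, "molecular biology": 0.34},
    "general-8b":     {"algebra": 0.66, "geometry": 0.74, "molecular biology": 0.55},
    "bio-expert-7b":  {"algebra": 0.31, "geometry": 0.28, "molecular biology": 0.79},
}

def recruit_experts(required_skills, profiles=MODEL_PROFILES, k=2):
    """Score every model by summing its profile scores over the skills this
    instance requires, then recruit the k highest-scoring models."""
    scores = defaultdict(float)
    for model, profile in profiles.items():
        for skill in required_skills:
            scores[model] += profile.get(skill, 0.0)
    return sorted(scores, key=scores.get, reverse=True)[:k]

if __name__ == "__main__":
    # In Symbolic-MoE the required skills are inferred per question;
    # here they are supplied directly for illustration.
    print(recruit_experts(["algebra", "geometry"]))   # math-leaning models first
    print(recruit_experts(["molecular biology"]))     # biomedical model first

Each recruited expert then produces a chain-of-thought response, and the chosen aggregator synthesizes the k responses into the final answer.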

Comparison of Symbolic-MoE with Prior Multi-Agent Work


Figure 1: (a) In prior work, a fixed set of task-level experts (e.g., Phi, Mistral, and Llama) is recruited to solve mathematical problems. Expert models then engage in multiple rounds of discussion, making this approach resource-intensive. (b) In contrast, Symbolic-MoE adaptively recruits instance-level experts via a skill-based router. By generating only a single round of responses and using an aggregator to synthesize the final output, our approach is both more performant and more efficient.

Overview of Symbolic-MoE


Figure 2: (a) Preprocessing: Given a validation set and a pool of agents, we create model profiles and select an aggregator. (b) Inference: For each test example, Symbolic-MoE activates the most relevant experts based on skill-based routing. These models generate CoT responses, which the aggregator (chosen based on its ability to aggregate answers) synthesizes into a final answer.
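
The preprocessing stage in Figure 2(a) can be sketched as below. This is not the paper's implementation: it assumes a labeled validation set whose items carry pre-annotated skills, plus caller-supplied hooks generate_fn(model, prompt) and is_correct_fn(prediction, gold) standing in for whatever inference and grading code is actually used. A model profile is its per-skill accuracy, and the aggregator is the model that most often produces a correct final answer when given the pool's candidate responses.

from collections import defaultdict

def build_model_profile(model, validation_set, generate_fn, is_correct_fn):
    """Tally one model's per-skill accuracy on a labeled validation set.
    Each item is assumed to provide "question", "answer", and "skills"."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in validation_set:
        prediction = generate_fn(model, item["question"])
        for skill in item["skills"]:
            total[skill] += 1
            correct[skill] += int(is_correct_fn(prediction, item["answer"]))
    return {skill: correct[skill] / total[skill] for skill in total}

def select_aggregator(models, validation_set, generate_fn, is_correct_fn):
    """Choose the model that best synthesizes a correct final answer from the
    pool's candidate responses on the validation set."""
    def aggregation_accuracy(aggregator):
        hits = 0
        for item in validation_set:
            candidates = [generate_fn(m, item["question"]) for m in models]
            prompt = ("Synthesize one final answer from these responses:\n"
                      + "\n---\n".join(candidates))
            hits += int(is_correct_fn(generate_fn(aggregator, prompt), item["answer"]))
        return hits / len(validation_set)
    return max(models, key=aggregation_accuracy)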

Main Results


Comparison of Symbolic-MoE with single-model and multi-model baselines. Symbolic-MoE outperforms all multi-agent baselines and achieves performance comparable to that of strong proprietary models like GPT4o-mini and of larger 70B models, while primarily operating with 7-8B models.

Symbolic-MoE Shows Efficiency Gains via (1) Batch Inference and (2) Skipping Multi-round Discussion

Comparison of Different Inference Strategies


Figure 3: Different approaches to achieving adaptiveness in Symbolic-MoE, which uses different models for each instance. In a naive setup (I), k models must be hosted on k GPUs simultaneously, allowing immediate access to each model's outputs. Another naive setup (II) requires only a single GPU but involves constantly loading and offloading models to obtain their outputs. Our scalable batch inference process (III) strikes a balance between (I) and (II): once models are assigned to problems, we group samples by model and sequentially load each corresponding LLM onto a single GPU to generate outputs efficiently. Moreover, this approach still allows us to parallelize across GPUs when they are available.
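
A rough Python sketch of strategy (III) follows, with hypothetical load_model_fn and generate_batch_fn hooks standing in for a real serving stack: instances are first grouped by their recruited experts, so each model is loaded onto the GPU exactly once and processes its whole batch before the next model is loaded.

from collections import defaultdict

def group_by_expert(assignments):
    """Invert per-instance expert assignments into per-model batches.
    `assignments` maps an instance id to the list of experts recruited for it."""
    batches = defaultdict(list)
    for instance_id, experts in assignments.items():
        for expert in experts:
            batches[expert].append(instance_id)
    return batches

def run_batched_inference(assignments, questions, load_model_fn, generate_batch_fn):
    """Load each recruited model once, run its entire batch of assigned
    questions, then move on to the next model (strategy III in Figure 3)."""
    outputs = defaultdict(dict)  # instance id -> {model name: response}
    for model_name, instance_ids in group_by_expert(assignments).items():
        model = load_model_fn(model_name)  # one load per model
        prompts = [questions[i] for i in instance_ids]
        for instance_id, response in zip(instance_ids, generate_batch_fn(model, prompts)):
            outputs[instance_id][model_name] = response
    return outputs

Grouping by model rather than by instance removes the repeated load/offload cost of setup (II) while still requiring only a single GPU; with multiple GPUs available, different models' batches can simply be dispatched to different devices.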

Analysis of the Efficiency Improvements


Left: Using our batch inference strategy, Symbolic-MoE can run on a single GPU with a run time comparable to that of the Mixture-of-Agents baseline on 4 GPUs; when our method runs on 4 GPUs, the run time improves further. Right: We find that, given an optimal aggregator, the final performance without any discussion is often similar to that of a round of multi-agent discussion followed by aggregation.

BibTeX

@article{chen2025symbolic,
  title={Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Scalable Heterogeneous Reasoning},
  author={Chen, Justin Chih-Yao and Yun, Sukwon and Stengel-Eskin, Elias and Chen, Tianlong and Bansal, Mohit},
  journal={arXiv preprint arXiv:2503.05641},
  year={2025}
}