Generating publication-ready illustrations is a labor-intensive bottleneck in the research workflow. While AI scientists can now handle literature reviews and code, they struggle to visually communicate complex discoveries. A research team from Google and Peking University has introduced a new framework called "PaperBanana" that aims to change this by using a multi-agent system to automate high-quality academic diagrams and plots.

5 Specialized Agents: The Architecture
PaperBanana does not rely on a single prompt. It orchestrates a collaborative team of 5 agents to transform raw text into professional visuals.


Phase 1: Linear Planning
- Retriever Agent: Identifies the 10 most relevant reference examples from a database to guide the style and structure.
- Planner Agent: Translates technical methodology text into a detailed textual description of the target figure.
- Stylist Agent: Acts as a design consultant to ensure the output matches the "NeurIPS Look" using specific color palettes and layouts.
Phase 2: Iterative Refinement
- Visualizer Agent: Transforms the description into a visual output. For diagrams, it uses image models like Nano-Banana-Pro. For statistical plots, it writes executable Python Matplotlib code.
- Critic Agent: Inspects the generated image against the source text to find factual errors or visual glitches. It provides feedback for 3 rounds of refinement.
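The two phases above can be sketched as a small orchestration loop. This is only an illustrative sketch, not the authors' implementation: the `FigureState` container, the `run_pipeline` function, and the five callables are hypothetical stand-ins for the agents, and stopping early when the critic returns empty feedback is an assumption. Only the agent order, the 10 retrieved references, and the 3 refinement rounds come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FigureState:
    """Hypothetical container for the artifacts passed between agents."""
    source_text: str
    references: List[str] = field(default_factory=list)
    description: str = ""
    output: str = ""
    feedback_log: List[str] = field(default_factory=list)


def run_pipeline(
    source_text: str,
    retrieve: Callable[[str, int], List[str]],
    plan: Callable[[str, List[str]], str],
    style: Callable[[str], str],
    visualize: Callable[[str], str],
    critique: Callable[[str, str], str],
    top_k: int = 10,
    rounds: int = 3,
) -> FigureState:
    """Run the linear planning phase once, then refine for up to `rounds` rounds."""
    state = FigureState(source_text=source_text)

    # Phase 1: linear planning. Each agent runs once, in order.
    state.references = retrieve(source_text, top_k)
    state.description = plan(source_text, state.references)
    state.description = style(state.description)

    # Phase 2: iterative refinement. Render once, then let the critic
    # request up to `rounds` regenerations.
    state.output = visualize(state.description)
    for _ in range(rounds):
        feedback = critique(state.output, source_text)
        if not feedback:
            # Assumption: empty feedback means nothing left to fix.
            break
        state.feedback_log.append(feedback)
        state.description = f"{state.description}\nRevision note: {feedback}"
        state.output = visualize(state.description)
    return state
```

Passing the agents in as plain callables keeps the control flow visible: any of them could wrap a language model, an image model, or a code generator without changing the loop.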
Beating the NeurIPS 2025 Benchmark


The research team introduced PaperBananaBench, a dataset of 292 test cases curated from actual NeurIPS 2025 publications. Using a VLM-as-a-Judge approach, they compared PaperBanana against leading baselines.
| Metric | Improvement over Baseline |
|---|---|
| Overall Score | +17.0% |
| Conciseness | +37.2% |
| Readability | +12.9% |
| Aesthetics | +6.6% |
| Faithfulness | +2.8% |
The system excels in "Agent & Reasoning" diagrams, achieving a 69.9% overall score. It also provides an automated "Aesthetic Guideline" that favors "Soft Tech Pastels" over harsh primary colors.
Statistical Plots: Code vs. Image
Statistical plots require numerical precision that standard image models often lack. PaperBanana solves this by having the Visualizer Agent write code instead of drawing pixels.
- Image Generation: Excels in aesthetics but often suffers from "numerical hallucinations" or repeated elements.
- Code-Based Generation: Ensures 100% data fidelity by using the Matplotlib library to render the final plot.
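To make the code-based route concrete, here is a minimal sketch of a generator that emits a Matplotlib script rather than pixels. The helper name `build_plot_script`, the bar-chart layout, and the output path are assumptions for illustration; the article does not show the code PaperBanana actually produces. The data values are the per-metric improvements reported above.

```python
from typing import Sequence


def build_plot_script(
    labels: Sequence[str],
    values: Sequence[float],
    title: str,
    out_path: str = "plot.png",
) -> str:
    """Return Matplotlib source code that draws the given data as a bar chart.

    Hypothetical helper: the numbers are embedded verbatim with repr(), so the
    rendered bars can only reflect the values that were passed in.
    """
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    lines = [
        "import matplotlib",
        'matplotlib.use("Agg")  # headless backend, no display needed',
        "import matplotlib.pyplot as plt",
        "",
        f"labels = {list(labels)!r}",
        f"values = {list(values)!r}",
        "",
        "fig, ax = plt.subplots(figsize=(5, 3))",
        "ax.bar(labels, values)",
        f"ax.set_title({title!r})",
        f"fig.savefig({out_path!r}, dpi=300, bbox_inches='tight')",
    ]
    return "\n".join(lines) + "\n"


script = build_plot_script(
    ["Faithfulness", "Aesthetics", "Readability", "Conciseness"],
    [2.8, 6.6, 12.9, 37.2],
    title="Improvement over baseline (%)",
)
# Syntax check only; this does not execute the generated script.
compile(script, "<generated>", "exec")
```

Running the generated script requires Matplotlib to be installed. Because the values travel as literals in the script, a reviewer can check them against the source data before any image is rendered.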
Domain-Specific Aesthetic Preferences in AI Research
According to the PaperBanana style guide, aesthetic choices often shift based on the research domain to match the expectations of different scholarly communities.
| Research Domain | Visual "Vibe" | Key Design Elements |
|---|---|---|
| Agent & Reasoning | Illustrative, Narrative, "Friendly" | 2D vector robots, human avatars, emojis, and "User Interface" aesthetics (chat bubbles, document icons) |
| Computer Vision & 3D | Spatial, Dense, Geometric | Camera cones (frustums), ray lines, point clouds, and RGB color coding for axis correspondence |
| Generative & Learning | Modular, Flow-oriented | 3D cuboids for tensors, matrix grids, and "Zone" strategies using light pastel fills to group logic |
| Theory & Optimization | Minimalist, Abstract, "Textbook" | Graph nodes (circles), manifolds (planes), and a restrained grayscale palette with single highlight colors |
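One way such a guide could feed into a prompt is as a simple lookup table. The sketch below encodes the rows above as data; the dictionary keys, the `DomainStyle` class, and the `style_hint` function are hypothetical names chosen for illustration, and the article does not describe how PaperBanana stores its guideline.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DomainStyle:
    vibe: str
    elements: Tuple[str, ...]


# Contents transcribed from the table above; the keys are made-up identifiers.
STYLE_GUIDE: Dict[str, DomainStyle] = {
    "agent_reasoning": DomainStyle(
        vibe="illustrative, narrative, friendly",
        elements=("2D vector robots", "human avatars", "emojis",
                  "chat bubbles", "document icons"),
    ),
    "vision_3d": DomainStyle(
        vibe="spatial, dense, geometric",
        elements=("camera frustums", "ray lines", "point clouds",
                  "RGB color coding for axes"),
    ),
    "generative_learning": DomainStyle(
        vibe="modular, flow-oriented",
        elements=("3D cuboids for tensors", "matrix grids",
                  "light pastel zones to group logic"),
    ),
    "theory_optimization": DomainStyle(
        vibe="minimalist, abstract, textbook",
        elements=("graph nodes (circles)", "manifolds (planes)",
                  "grayscale palette with a single highlight color"),
    ),
}


def style_hint(domain: str) -> str:
    """Build a one-line style instruction for the given domain key."""
    style = STYLE_GUIDE[domain]
    return f"Style: {style.vibe}. Prefer: {', '.join(style.elements)}."
```

Calling `style_hint("vision_3d")` returns a single sentence that could be appended to a figure description before it is handed to a visualizer.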
Comparison of Visualization Paradigms
For statistical plots, the framework highlights a clear trade-off between using an image generation model (IMG) versus executable code (Coding).
| Feature | Plots via Image Generation (IMG) | Plots via Coding (Matplotlib) |
|---|---|---|
| Aesthetics | Generally higher; plots look more "visually appealing" | Professional and standard academic look |
| Fidelity | Lower; prone to "numerical hallucinations" or element repetition | 100% accurate; strictly represents the raw data provided |
| Readability | High for sparse data but struggles with complex datasets | Consistently high; handles dense or multi-series data without error |
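The trade-off in this table can be expressed as a small dispatch rule. The following is a sketch under stated assumptions: the enum names and the `prefer_aesthetics` flag are hypothetical and do not appear in the article. The defaults mirror what the article describes, with diagrams going to an image model and statistical plots going to Matplotlib code.

```python
from enum import Enum


class FigureKind(Enum):
    DIAGRAM = "diagram"
    STATISTICAL_PLOT = "statistical_plot"


class Backend(Enum):
    IMAGE_MODEL = "image_model"
    MATPLOTLIB_CODE = "matplotlib_code"


def choose_backend(kind: FigureKind, prefer_aesthetics: bool = False) -> Backend:
    """Pick a rendering backend for a figure.

    Diagrams always go to the image model. Statistical plots default to code
    for fidelity; the hypothetical `prefer_aesthetics` flag opts into image
    generation and accepts the risk of numerical hallucinations.
    """
    if kind is FigureKind.DIAGRAM:
        return Backend.IMAGE_MODEL
    if prefer_aesthetics:
        return Backend.IMAGE_MODEL
    return Backend.MATPLOTLIB_CODE
```

Defaulting plots to the code backend reflects the fidelity row of the table: a plot that looks better but misstates the data is rarely acceptable in a paper.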
Key Takeaways
- Multi-Agent Collaborative Framework: PaperBanana is a reference-driven system that orchestrates 5 specialized agents (Retriever, Planner, Stylist, Visualizer, and Critic) to transform raw technical text and captions into publication-quality methodology diagrams and statistical plots.
- Dual-Phase Generation Process: The workflow consists of a Linear Planning Phase to retrieve reference examples and set aesthetic guidelines, followed by a 3-round Iterative Refinement Loop where the Critic agent identifies errors and the Visualizer agent regenerates the image for higher accuracy.
- Superior Performance on PaperBananaBench: Evaluated against 292 test cases from NeurIPS 2025, the framework outperformed vanilla baselines in Overall Score (+17.0%), Conciseness (+37.2%), Readability (+12.9%), and Aesthetics (+6.6%).
- Precision-Focused Statistical Plots: For statistical data, the system switches from direct image generation to executable Python Matplotlib code; this hybrid approach ensures numerical precision and eliminates "hallucinations" common in standard AI image generators.

