
paper-ppt-agent

ai-tools

Spell Rating: 🔮🔮🔮○○
Pricing: freemium
Difficulty: intermediate

Best For

Researchers who regularly turn academic papers into presentation slides. If your workflow involves reading a paper (yours or someone else's), extracting the key contributions, methods, and experimental results, then building a slide deck for a defense, lab meeting, or conference talk, this tool automates the middle steps. It is particularly useful for LaTeX-heavy researchers, who can upload .zip/.tar.gz source archives directly. It is not a general-purpose document-to-slides tool: if you need to convert DOCX proposals or Markdown outlines, look at PPT Master instead.

How I Actually Use It

Upload a PDF paper or a LaTeX source archive. The pipeline runs in stages. First, a parser (PyMuPDF for PDFs, pylatexenc for LaTeX sources) extracts sections, figures, tables, and equations into structured data. Then a 4-pass deep reading process kicks in: Pass 1 analyzes the research background and contributions, Pass 2 plans the narrative arc, Pass 3 generates slide-structured Markdown (each page separated by --- with metadata tags), and Pass 4 reviews and refines the resulting manuscript.
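To make the Pass 3 output format concrete, here is a minimal sketch of how a slide-structured manuscript could be split into pages. The review only tells us that pages are separated by --- and carry metadata tags; the "@key: value" tag syntax, the SlideDraft class, and the split_slides function are all hypothetical names chosen for illustration, not the project's actual format or code.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tag syntax: a line such as "@layout: two-column".
# The real metadata format used by the tool is not documented here.
TAG_RE = re.compile(r"^@(\w+):\s*(.+)$")
# A page separator is a line that contains only '---'.
SEPARATOR_RE = re.compile(r"^[ \t]*---[ \t]*$", flags=re.MULTILINE)


@dataclass
class SlideDraft:
    meta: dict = field(default_factory=dict)
    body: str = ""


def split_slides(manuscript: str) -> list[SlideDraft]:
    """Split a manuscript into slides and pull tag lines into metadata."""
    slides = []
    for chunk in SEPARATOR_RE.split(manuscript):
        meta, body_lines = {}, []
        for line in chunk.strip().splitlines():
            match = TAG_RE.match(line.strip())
            if match:
                meta[match.group(1)] = match.group(2).strip()
            else:
                body_lines.append(line)
        body = "\n".join(body_lines).strip()
        if meta or body:  # skip empty chunks, e.g. after a leading separator
            slides.append(SlideDraft(meta=meta, body=body))
    return slides
```

Keeping the page boundary as a plain-text separator means each later stage can work on one slide at a time without re-parsing the whole manuscript.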

The Strategist agent takes the refined manuscript from Pass 4 and produces a design spec: color palette, typography, layout contracts, page rhythm. The Executor generates each slide as SVG. The Static Critic runs XML-based quality checks in milliseconds (text overflow, element overlap, low contrast), and the optional Visual Critic renders the SVG to an image for review by a vision-language model (VLM). After all pages pass the quality gates, the SVG stack converts to a downloadable .pptx.
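The kind of rule-based check a static critic can run without any model call might look like the sketch below. It is not the project's implementation: the function names, the 0.6 average glyph-width heuristic, the fixed background color, and the 4.5 contrast threshold are assumptions for illustration. The contrast formula itself is the standard WCAG relative-luminance ratio.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
AVG_CHAR_WIDTH = 0.6  # assumed average glyph width as a fraction of font size


def _luminance(hex_color: str) -> float:
    """WCAG relative luminance of a '#rrggbb' color."""
    channels = []
    for i in (1, 3, 5):
        c = int(hex_color[i:i + 2], 16) / 255
        channels.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    hi, lo = sorted((_luminance(fg), _luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def static_checks(svg_text: str, background: str = "#ffffff",
                  min_contrast: float = 4.5) -> list[str]:
    """Flag text that likely overflows the canvas or has low contrast."""
    root = ET.fromstring(svg_text)
    canvas_width = float(root.get("width"))
    issues = []
    for node in root.iter(f"{SVG_NS}text"):
        text = "".join(node.itertext())
        x = float(node.get("x", "0"))
        size = float(node.get("font-size", "16"))
        # Rough width estimate; a real checker would use font metrics.
        if x + len(text) * size * AVG_CHAR_WIDTH > canvas_width:
            issues.append(f"overflow: {text[:20]!r}")
        if contrast_ratio(node.get("fill", "#000000"), background) < min_contrast:
            issues.append(f"low contrast: {text[:20]!r}")
    return issues
```

Checks like these only need the SVG source, which is why they can run in milliseconds before any rendering or VLM call is attempted.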

The frontend is a React 19 app with a Konva canvas editor where you can preview slides, adjust individual elements, request page regeneration, and export. Version history is automatic: every iteration gets a snapshot.

A key detail: figures from the paper travel through the entire pipeline via a [[FIG:id]] token contract. The parser tags each figure with a stable ID, the manuscript preserves these tokens, and the Executor resolves them to actual image paths with correct aspect ratios. The LLM never has to guess where figures live.
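The token contract described above can be sketched as a lookup against a figure registry built by the parser. Only the [[FIG:id]] token shape comes from the tool; the Figure class, the resolve_figures function, and the set of characters allowed in an ID are assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Assumed ID alphabet: letters, digits, underscore, hyphen.
FIG_TOKEN = re.compile(r"\[\[FIG:([A-Za-z0-9_\-]+)\]\]")


@dataclass(frozen=True)
class Figure:
    path: str
    width: int
    height: int

    @property
    def aspect_ratio(self) -> float:
        return self.width / self.height


def resolve_figures(slide_text: str, registry: dict[str, Figure]) -> list[Figure]:
    """Return the figures referenced in slide_text, in order of appearance.

    Unknown IDs raise instead of being silently dropped, so a token the
    model invented cannot reach the rendering stage.
    """
    figures = []
    for fig_id in FIG_TOKEN.findall(slide_text):
        if fig_id not in registry:
            raise KeyError(f"unknown figure id: {fig_id}")
        figures.append(registry[fig_id])
    return figures
```

Failing loudly on an unknown ID is the design choice that matters here: the model only ever copies opaque tokens, and anything it makes up is rejected before layout.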

Where It Is Strong

  • Academic-specific 4-pass deep reading extracts paper structure (contributions, methods, experiments, related work) far more faithfully than generic summarization
  • The [[FIG:id]] token contract eliminates figure hallucination. Every figure reference traces back to an actual extracted image with known dimensions
  • Two-tier quality gate: Static Critic catches structural violations for free and instantly; Visual Critic (resvg rendering plus VLM) catches visual issues that rules cannot express. The layered design keeps API costs down by filtering cheap errors first
  • Hybrid Icon RAG combines Gemini Embedding 2 semantic search with lexical boosting (up to +0.24) for icon name matches. Solves the problem of pure vector search ranking visually unrelated icons higher than exact name matches
  • Optional external research enrichment via arXiv, Semantic Scholar, and web search injects related work context before analysis
  • LaTeX source archive support (.zip/.tar.gz) — most competing tools only handle PDF
  • Multi-model: OpenAI, Anthropic, Gemini, DeepSeek, plus any OpenAI-compatible endpoint
  • Prompt engineering quality is exceptional. The Executor prompt (10KB) includes CJK line-break character count tables, three-zone layout formulas, and aspect-ratio-based figure placement rules
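The hybrid icon ranking mentioned in the list above can be illustrated with a small sketch. The +0.24 cap comes from the tool; how the boost is actually computed is not stated, so the token-overlap formula below is an assumption, as are the function names. Semantic scores are passed in precomputed, standing in for embedding similarities.

```python
MAX_LEXICAL_BOOST = 0.24  # cap reported for the tool; the formula below is assumed


def _tokens(text: str) -> set[str]:
    """Lowercase a name or query and split it on spaces and hyphens."""
    return set(text.lower().replace("-", " ").split())


def lexical_boost(query: str, icon_name: str) -> float:
    """Boost proportional to the share of icon-name tokens found in the query."""
    query_tokens, name_tokens = _tokens(query), _tokens(icon_name)
    if not query_tokens or not name_tokens:
        return 0.0
    overlap = len(query_tokens & name_tokens) / len(name_tokens)
    return MAX_LEXICAL_BOOST * overlap


def rank_icons(query: str, semantic_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Combine semantic similarity with the lexical boost and sort descending."""
    scored = [
        (name, score + lexical_boost(query, name))
        for name, score in semantic_scores.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a query of "database", an icon named "database" that scores slightly lower on embeddings than "server-rack" still ends up first, which is the failure mode of pure vector search that the boost is meant to correct.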

Where It Fails

  • Model IDs in the registry are placeholders (GPT-5.5, Claude Opus 4.6, Gemini 3.1 Pro Preview). None of these exist. You must manually edit registry.py to use real model IDs before running anything. This signals the tool has not been shipped to real users yet
  • Icon RAG requires a Gemini API key regardless of which LLM provider you choose for generation. If you only have an OpenAI key, the icon matching feature will not work
  • No Docker deployment. You need to set up both a Python (uv) and Node (npm) environment locally
  • CORS is fully open (allow_origins=["*"]). Fine for localhost, a security hole on any network
  • All state is filesystem-based. No database, no multi-user isolation
  • Frontend has zero tests. Backend has 25 test files with solid coverage, but the React app is untested
  • Sequential generation only in practice. The codebase has chapter_parallel and page_parallel mode stubs, but the default and only documented mode is sequential. A 20-slide deck means 20+ LLM calls plus Critic repairs, potentially 5-30 minutes
  • The SVG Executor is a single 107KB file. Difficult to navigate, extend, or debug
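Of the issues above, the open CORS policy is the quickest to patch. One possible approach, sketched under assumptions: read an explicit origin allowlist from the environment and refuse wildcards. The environment variable name, the default dev-server origin, and the allowed_origins function are hypothetical and not part of the project.

```python
import os

DEFAULT_ORIGINS = ["http://localhost:5173"]  # assumed frontend dev-server origin


def allowed_origins(env_var: str = "PPT_AGENT_CORS_ORIGINS") -> list[str]:
    """Build a CORS allowlist from a comma-separated environment variable.

    Falls back to the local dev origin when the variable is unset, and
    rejects '*' so the policy cannot be reopened by accident.
    """
    raw = os.environ.get(env_var, "")
    origins = [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]
    if "*" in origins:
        raise ValueError("wildcard origins are not allowed")
    return origins or list(DEFAULT_ORIGINS)
```

The returned list would then replace the ["*"] value wherever the backend sets allow_origins. The parameter name suggests a Starlette or FastAPI CORSMiddleware, but I have not confirmed which framework the backend uses.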

Pricing, Difficulty, and Risk

Free and MIT-licensed. The real cost is LLM API tokens, plus an additional Gemini API key if you want icon matching. Expect token consumption comparable to PPT Master (~100K+ tokens for a full deck), since the 4-pass reading adds significant overhead before generation even starts. Setup difficulty is moderate: uv sync and npm install handle most dependencies, but you may need pandoc and pdflatex for LaTeX input, and resvg-py for the Visual Critic. Privacy risk is low; everything runs locally except the LLM API calls.

Stability risk is notable. With a single author (CRui5in), placeholder model IDs, no release tags, and no Docker, this is an early-stage research project, not a production tool. The architecture is sound, but expect to read and patch source code.

Verdict

Architecturally impressive, practically premature. The three-role multi-agent design, 4-pass deep reading, figure token contract, and two-tier Critic are genuinely novel approaches to the academic-paper-to-slides problem. The prompt engineering is among the best I have seen in open-source AI tools. But the placeholder model IDs, missing deployment infrastructure, and lack of frontend testing put it firmly in "watch" territory. Worth tracking for when it stabilizes. If you need a working solution today for academic slides, PPT Master is more mature; for quick drafts, baoyu-slide-deck is faster.
