    The Tech Stack Behind InfiniteGrammar.de

    Alex

    Building InfiniteGrammar.de

    This is a factual overview of the technologies used in InfiniteGrammar.de, organised by workload. The system has several distinct parts that run in different environments and were built with different tools.

    Frontend

    The learner-facing app and the admin dashboard are both built with React 18, TypeScript, and Vite (with the SWC plugin for compilation speed). Styling uses Tailwind CSS with shadcn/ui components built on Radix UI primitives.

    Other frontend libraries in use:

    • React Router v6 for routing.
    • TanStack Query (React Query) for server-state fetching and caching.
    • Recharts for charts in the admin dashboard.
    • i18next for German/English internationalisation.
    • react-helmet-async for per-route SEO meta tags.
    • zod + react-hook-form for form validation.
    • lucide-react for icons.

    The admin dashboard also uses custom SVG components for dendrograms and similarity heatmaps that are not covered by Recharts.

    Backend

    The API layer runs as Netlify Functions (serverless) written in TypeScript. These handle exercise fetching, progress tracking, statistics, campaign logic, and admin endpoints.

    The backend uses raw SQL queries against the database rather than an ORM. This was a deliberate choice — many admin queries use PostgreSQL-specific features (CTEs, conditional aggregations, window functions) that are easier to write and debug as explicit SQL.

    Database

    Neon PostgreSQL (serverless Postgres) stores everything: exercises, gaps, distractors, user progress, completions, email campaign state, similarity runs, pairwise scores, clustering output, and ordering snapshots. The @neondatabase/serverless driver is used for connections from Netlify Functions.

    Email

    Transactional email is sent via the Resend API. The package.json also includes Nodemailer as a dependency. Email templates are plain string interpolation — no template engine.

    SEO and prerendering

    The app is a React SPA. For grammar topic pages and other SEO-relevant routes, Puppeteer runs at build time to prerender static HTML. The build script (vite build && node scripts/prerender.js) produces crawlable HTML for ~120 routes. A static sitemap.xml is maintained manually.

    Content generation pipeline

    Exercise generation runs as a separate Python system, not inside the Node.js/Netlify runtime. It calls the OpenAI API (including the Batch API for cost efficiency) to generate gap-fill exercises with distractors and explanations. The pipeline uses JSON-based run state for resumability and tracks each exercise through generation, assessment, and finalisation stages before writing to the database.
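
    The actual run-state schema is not shown here, but the resumability idea can be sketched in a few lines. The stage names, file layout, and function names below are hypothetical illustrations, not the pipeline's real format:

```python
import json
from pathlib import Path

# Hypothetical stage names; the real pipeline's stages are described only
# loosely as generation, assessment, and finalisation.
STAGES = ["generated", "assessed", "finalised"]

def load_state(path: Path) -> dict:
    """Load run state, or start fresh if no state file exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"exercises": {}}

def save_state(path: Path, state: dict) -> None:
    """Persist state after each step so an interrupted run can resume."""
    path.write_text(json.dumps(state, indent=2))

def advance(state: dict, exercise_id: str) -> str:
    """Move one exercise to its next stage; idempotent at the final stage."""
    current = state["exercises"].get(exercise_id)
    if current is None:
        nxt = STAGES[0]
    else:
        i = STAGES.index(current)
        nxt = STAGES[min(i + 1, len(STAGES) - 1)]
    state["exercises"][exercise_id] = nxt
    return nxt
```

    The point of the pattern is that every expensive API call is bracketed by a state write, so a crashed or cancelled run restarts from the last completed stage instead of re-billing the whole batch.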

    Similarity analysis pipeline

    Like the generation pipeline, this is a separate Python system. It computes pairwise similarity between exercises within a grammar section using:

    • spaCy for POS-based linguistic features,
    • scikit-learn for TF-IDF vectorisation and similarity utilities,
    • NumPy for vector operations,
    • SciPy for hierarchical clustering (linkage output for dendrograms).
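
    The production pipeline relies on scikit-learn for TF-IDF and spaCy for POS features. As a stdlib-only illustration of the core pairwise step, here is cosine similarity over raw term-frequency vectors — a deliberate simplification (no IDF weighting, no linguistic features), with all names my own:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (no IDF weighting)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def pairwise(exercises: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of exercises within one section."""
    ids = sorted(exercises)
    return {
        (i, j): cosine_similarity(exercises[i], exercises[j])
        for n, i in enumerate(ids)
        for j in ids[n + 1:]
    }
```

    The resulting pairwise matrix is what hierarchical clustering (SciPy's linkage, in the real system) consumes to produce the dendrograms shown in the admin dashboard.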

    There is also an experimental semantic-embedding path that can run on Vast.ai GPU instances using SGLang or vLLM with models like BAAI/bge-m3. Remote instances process a state file and never access the database or environment variables directly — results are collected back locally and written to the database in a separate finalisation step.
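
    That isolation contract — remote workers see only a state file, never credentials or the database — can be sketched roughly as below. The file names, field names, and the injected embed callable are hypothetical, chosen so the sketch runs without a GPU:

```python
import json
from pathlib import Path

def run_remote_batch(in_path: Path, out_path: Path, embed) -> int:
    """Process a self-contained state file on a remote instance.

    The worker reads only the input file: no database connection and no
    environment secrets ever reach the GPU host. `embed` stands in for
    whatever model server is in use (SGLang/vLLM serving e.g. BAAI/bge-m3);
    it is injected here so the sketch stays runnable locally.
    """
    items = json.loads(in_path.read_text())["items"]
    results = [{"id": it["id"], "embedding": embed(it["text"])} for it in items]
    out_path.write_text(json.dumps({"results": results}))
    return len(results)
```

    A separate finalisation step back on the local machine is then the only code path that holds database credentials, which keeps rented instances fully untrusted.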

    Hosting and deployment

    Netlify hosts the frontend and serverless functions. Deployment is triggered by git push. The Python pipelines run locally or on remote GPU instances, not on Netlify.

    What I would probably do differently

    Puppeteer prerendering is fragile. It works, but it is slow (~120 routes at build time), breaks silently when components change, and produces HTML that can drift from what React hydrates on the client. A framework with built-in SSR or static generation — Next.js, Astro, or even Vite SSG — would handle this more reliably. The prerender script was a pragmatic shortcut that avoided a framework migration, but it has become the most maintenance-prone part of the build.

    No ORM is fine until it is not. Raw SQL works well for the admin dashboard’s complex queries. But for the simpler CRUD operations (user progress, completions, campaign state), a lightweight query builder like Kysely or Drizzle would reduce boilerplate and catch schema drift at compile time without sacrificing query control where it matters.

    The frontend carries unused weight. The package.json includes the full set of Radix UI primitives via shadcn/ui, but many of them are never used (menubar, context menu, hover card, navigation menu, etc.). Tree-shaking handles some of this, but the dependency list is larger than it needs to be. A cleanup pass or a more selective shadcn/ui installation would reduce surface area.

    Nodemailer and Resend are both in the dependency list. This is a leftover from migrating between email providers; only one should remain, and keeping both obscures which sending path is actually active.

    The admin dashboard probably should not live in the same React app. It shares a router and build with the learner-facing product, which means admin-only code (heavy chart libraries, heatmap components, similarity visualisations) is part of the same bundle. Splitting the admin into a separate app or using more aggressive code splitting would reduce the main app’s bundle size and make the two concerns independently deployable.

    Static sitemap maintenance does not scale. The sitemap is a manually edited XML file. With ~120 routes and growing, generating it from the route definitions or the database at build time would be less error-prone.
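
    Generating the file from route definitions is a small amount of code. A sketch of the idea, in Python for illustration (the actual build is Node-based, so the function name and route list here are assumptions):

```python
from xml.sax.saxutils import escape

def build_sitemap(base_url: str, routes: list[str]) -> str:
    """Render sitemap.xml from route paths instead of hand-editing XML."""
    urls = "\n".join(
        f"  <url><loc>{escape(base_url.rstrip('/') + r)}</loc></url>"
        for r in routes
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{urls}\n</urlset>"
    )
```

    Hooked into the existing build step alongside the prerender script, this would keep the sitemap in lockstep with the ~120 prerendered routes automatically.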

    Python pipelines have no formal task runner. The generation and similarity pipelines use CLI scripts with JSON state files. This works at the current scale, but there is no retry logic, no scheduling, and no visibility into failed runs beyond log files. A lightweight task runner (even just a Makefile with targets, or something like Prefect for the Python side) would make pipeline operations more predictable as the number of grammar sections grows.
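
    Short of adopting a full task runner, the missing retry logic alone is a small decorator. A sketch — the parameters and names are illustrative, not from the codebase:

```python
import time
from functools import wraps

def retry(attempts: int = 3, delay: float = 1.0, backoff: float = 2.0):
    """Retry a flaky pipeline step (e.g. an API call) with exponential backoff."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the real error
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator
```

    Scheduling and run visibility are harder to bolt on, which is where a Makefile with explicit targets or a tool like Prefect starts to pay for itself.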
