BIG — flamegame

Branded Illustration Generation: Brand-safe illustrations by everyone.

01 - The Challenge

A global financial services company's branded illustration library was a cost problem disguised as a design problem. Commissioning bespoke assets across a global network of offices, many without in-house design capability, was slow and expensive. And a centralised library of generic assets couldn't serve markets that needed local landmarks, local contexts, local relevance.

The opportunity was a generation tool in the hands of marketers: brand-consistent output that could be directed toward any market's specific needs, without a design team as intermediary.

The question was whether generative tools had reached the point where they could meet that bar. They hadn't — not out of the box. We proved it, and then built the alternative.

Generated images using the branded asset generator in three different brand illustration styles.

02 - The System

The pipeline runs from a simple text input to a curated set of brand-accurate illustrations. A user prompt is combined with an injected quality prompt and passed to a fine-tuned SDXL model, which generates a candidate batch of eight images. An aesthetic scoring model — PickScore — evaluates the batch and surfaces the top four for display.
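The prompt-combination and selection steps can be sketched in a few lines. This is a minimal illustration, not the project's code: `QUALITY_PROMPT`, `build_prompt` and `select_top` are hypothetical names, the quality prompt text is invented, and the scores are placeholder numbers standing in for PickScore output.

```python
from typing import List, Sequence

# Hypothetical quality prompt; the real injected text is not given in this case study.
QUALITY_PROMPT = "flat vector illustration, clean shapes, brand palette"


def build_prompt(user_prompt: str, quality_prompt: str = QUALITY_PROMPT) -> str:
    """Combine the user's text with the injected quality prompt."""
    return f"{user_prompt.strip()}, {quality_prompt}"


def select_top(images: Sequence[str], scores: Sequence[float], k: int = 4) -> List[str]:
    """Return the k highest-scoring images, best first."""
    if len(images) != len(scores):
        raise ValueError("each image needs exactly one score")
    ranked = sorted(zip(scores, images), key=lambda pair: pair[0], reverse=True)
    return [image for _, image in ranked[:k]]


# Eight candidates with placeholder scores (stand-ins for an aesthetic scorer).
batch = [f"candidate_{i}.png" for i in range(8)]
scores = [0.21, 0.87, 0.45, 0.66, 0.12, 0.93, 0.58, 0.30]

full_prompt = build_prompt("  a tram passing a harbour skyline ")
top_four = select_top(batch, scores)
```

Generating eight and showing four gives the scorer room to discard weak candidates before a marketer ever sees them.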

The frontend is a React application that serialises user input to JSON and calls a local ComfyUI API running on a dedicated 4070 Ti. Post-generation, output quality was evaluated manually: a structured internal QA process collated feedback across the candidate images and presented findings to the client, giving them a transparent view of the model's current state and iteration trajectory.
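For readers unfamiliar with ComfyUI's API format, the request body is a JSON graph of nodes keyed by id, each with a `class_type` and its `inputs`. The sketch below builds such a body in Python (the real frontend is React, so this only mirrors the shape of the JSON). The checkpoint and LoRA file names, strengths, sampler settings and the `build_workflow` helper are all assumptions for illustration, and the IP Adapter and SVG steps are left out.

```python
import json
import uuid


def build_workflow(positive: str, negative: str, seed: int, batch_size: int = 8) -> dict:
    """Build a ComfyUI API-format graph. Links are [source_node_id, output_index]."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},  # assumed file name
        "2": {"class_type": "LoraLoader",
              "inputs": {"model": ["1", 0], "clip": ["1", 1],
                         "lora_name": "brand_style.safetensors",  # hypothetical LoRA
                         "strength_model": 0.8, "strength_clip": 0.8}},  # placeholder values
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["2", 1]}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["2", 1]}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": batch_size}},
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["2", 0], "positive": ["3", 0], "negative": ["4", 0],
                         "latent_image": ["5", 0], "seed": seed, "steps": 30, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "big"}},
    }


payload = {
    "prompt": build_workflow("a tram passing a harbour skyline", "text, watermark", seed=42),
    "client_id": uuid.uuid4().hex,
}
body = json.dumps(payload)
# A client would POST `body` to the local server's /prompt endpoint
# (http://127.0.0.1:8188/prompt on a default install). Nothing is sent here.
```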

The infrastructure was scoped as a proof of concept. Scaling was discussed after delivery but not part of the original brief — the goal was to demonstrate the pipeline's validity, not to productionise it.

03 - The Decisions

The central problem was fidelity: reaching the point where generated outputs were indistinguishable in style from the client company's existing illustration library. Getting there meant navigating two separate but entangled variables: training parameters that shaped what the model learned, and inference parameters that shaped how it applied that learning at generation time.

The primary evaluation method was XY plots: systematically comparing outputs across different parameter values to isolate the effect of each variable and lock down the settings that produced consistent, brand-accurate results. It's a manual process by necessity — illustration fidelity is a judgment call, not a metric.
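An XY plot is a grid of generations in which two parameters vary and everything else, including the seed, is held fixed, so any visible difference between cells comes from the two axes. A minimal sketch of how such a grid of settings might be enumerated follows; the `xy_grid` helper, the axis names and every value are illustrative assumptions rather than the settings used on the project.

```python
from itertools import product
from typing import Any, Dict, List, Sequence


def xy_grid(x_name: str, x_values: Sequence[Any],
            y_name: str, y_values: Sequence[Any],
            fixed: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one settings dict per grid cell, row by row (y outer, x inner)."""
    cells = []
    for y, x in product(y_values, x_values):
        cells.append({**fixed, x_name: x, y_name: y})
    return cells


# Holding the seed and sampler settings constant keeps the cells comparable.
fixed = {"seed": 42, "steps": 30, "cfg": 7.0}
grid = xy_grid("strength_model", [0.6, 0.8, 1.0],
               "strength_clip", [0.6, 0.8, 1.0], fixed)

for cell in grid:
    print(cell)  # each dict would drive one generation in the plot
```

The judgment itself stays human: the grid only guarantees that reviewers are comparing like with like.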

ELLA with SD 1.5 was evaluated as a candidate for improving prompt adherence. It worked — outputs were richer and more detailed — but it injected details the user hadn't asked for and couldn't control. Style fidelity held; user control didn't. The decision was to reject it: a tool that produces beautiful outputs the user can't direct is not a tool for marketers. The fine-tuned SDXL model, with carefully locked inference values, gave controllable results within the brand style.

The harder prior question was whether fine-tuning was necessary at all. Adobe Firefly — including its fine-tuning tools — was tested and invalidated. The main MetLife client had reached the same conclusion independently. That alignment mattered: it meant the case for a custom pipeline wasn't just our recommendation, it was a shared finding.

One sample image from each illustration style used in the fine-tuning dataset.

04 - The Complexity

The main client was an innovation lead in Singapore, with APAC remit. The real decision-maker was her boss in Hong Kong, the regional authority on whether outputs met brand standards. Getting the project approved required passing through the client company's internal AI council, then a presentation to the regional CEO. The pipeline wasn't just a technical deliverable; it was a governance case.

We worked across markets — Singapore, Japan, Korea — each with different local use cases and different relationships to the brand. The brand custodian in Hong Kong held the final call on acceptability, which meant the evaluation criteria weren't just internal — they were proxies for how a non-technical senior stakeholder would read the outputs.

Upon approval, the client rolled the tool out at a regional internal event. I ran a prompt workshop for their teams — the moment where the system moved from something we demonstrated to something their marketers actually used.

05 - The Evidence

Three phases of process documentation: model training, model optimisation, and prompt optimisation — each structured as an iterative loop between ComfyUI, internal QA scoring, and client review. The experiments log tracks multiple LoRA versions across variables including training set size, repeat count, epoch count, and total steps, with hypotheses and results noted per iteration.
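The logged variables are linked: in Kohya-style LoRA training, total steps are commonly computed as images × repeats ÷ batch size, rounded up, then multiplied by epochs (ignoring gradient accumulation). The sketch below assumes that relationship; the `LoraRun` record, the batch size field and all numbers are hypothetical, not entries from the actual log.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRun:
    """One row of a hypothetical experiments log."""
    version: str
    images: int       # training set size
    repeats: int      # times each image is seen per epoch
    epochs: int
    batch_size: int = 1
    hypothesis: str = ""

    @property
    def total_steps(self) -> int:
        # Assumed formula: ceil(images * repeats / batch_size) * epochs
        steps_per_epoch = math.ceil(self.images * self.repeats / self.batch_size)
        return steps_per_epoch * self.epochs


runs = [
    LoraRun("v1", images=40, repeats=10, epochs=5, batch_size=2, hypothesis="baseline"),
    LoraRun("v2", images=80, repeats=5, epochs=5, batch_size=2,
            hypothesis="larger set, fewer repeats, same step budget"),
]

for run in runs:
    print(run.version, run.total_steps, run.hypothesis)
```

Keeping the step budget equal across two runs, as in this made-up pair, is one way to attribute a difference in output to set size rather than to training length.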

QA evaluation used a structured batch scoring sheet: each prompt tested across generated candidates, scored against three explicit criteria — semantic relevance (prompt adherence), human realism (no deformities), and brand coherence (colours, shapes). Pass/fail tracked per image, per market, per scenario.
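A scoring sheet of this shape maps naturally onto one record per image. The sketch below is an assumption about structure, not the actual sheet: the `ImageScore` fields are named after the three criteria, the rule that an image passes only when all three criteria pass is assumed, and the sample rows are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ImageScore:
    market: str
    scenario: str
    image_id: str
    semantic_relevance: bool  # prompt adherence
    human_realism: bool       # no deformities
    brand_coherence: bool     # colours, shapes

    @property
    def passed(self) -> bool:
        # Assumed rule: an image passes only if every criterion passes.
        return self.semantic_relevance and self.human_realism and self.brand_coherence


def pass_rate_by_market(rows: Iterable[ImageScore]) -> Dict[str, float]:
    """Aggregate pass/fail per market as a fraction between 0 and 1."""
    totals = defaultdict(lambda: [0, 0])  # market -> [passed, total]
    for row in rows:
        totals[row.market][0] += int(row.passed)
        totals[row.market][1] += 1
    return {market: passed / total for market, (passed, total) in totals.items()}


# Invented sample rows for illustration only.
sheet = [
    ImageScore("Singapore", "retirement", "img_001", True, True, True),
    ImageScore("Singapore", "retirement", "img_002", True, False, True),
    ImageScore("Japan", "family", "img_003", True, True, True),
]
rates = pass_rate_by_market(sheet)
```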

XY plots documented parameter optimisation visually: LoRA Master and CLIP strength tested across a grid of values, with the optimal range identified and marked. The data flow diagram shows the full ComfyUI pipeline — base SDXL model, LoRA, IP Adapter, KSampler, CLIP encode, latent space, PNG decode, SVG vectorisation — through to the frontend interface.

Infrastructure: local 4090/4070 Ti instances accessed remotely via Tailscale and RustDesk, with Runpod for training runs on Kohya SS.

06 - My Contribution

Technical lead and client-facing lead for all technical matters. Led a rotating team of producers, developers, and QA across three phases: model training, optimisation, and prompt engineering. Designed and iterated on the pipeline architecture, developed the process for evaluating LoRA versions, and built the ComfyUI inference pipeline. Ran the XY plot evaluation methodology to lock down inference parameters and presented findings at each client review cycle. Delivered a prompt workshop to the client company's regional marketing teams at their internal rollout event.

On the client side: regular conversations with the innovation lead and her team across Singapore, Hong Kong, Japan, and Korea — showing what was possible at each stage, explaining process and constraints, running quick feasibility tests against evolving asks. The client team valued the transparency; being equipped to explain the work to their own stakeholders was as important to them as the outputs. That relationship was the direct reason a phase two was requested.

Phase two was ultimately rejected, not by the client team but by the client company's AI council. Productionising the system would have required handing over the pipeline architecture and ways of working, which wasn't something the agency could agree to. The project ended at the boundary between a successful proof of concept and a transfer of capability the client couldn't have on those terms.

Flame Game - Joe Chung, 2021 - 2026