Androidify: Prototyping selfies into Android mascots

01 - The Challenge


Google wanted to give people an Android mascot that looked like them. The question was whether a generative system could produce personalised output at scale without losing the precise visual grammar that makes the Android character recognisable.

They'd seen what fine-tuning could do on a previous project. The prototype had to demonstrate that brand fidelity and personalisation weren't in conflict — and that the answer was worth resourcing.




02 - The System


The pipeline runs in two phases. The first converts a selfie into structured data: a vision-language model reads the image and extracts clothing, colour, accessories, and hairstyle as a typed JSON schema. The second phase consumes that schema to generate an Androidify mascot via a fine-tuned image model trained on the Android character corpus.
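The handoff between the two phases can be pictured as a small typed structure with a parser on one side and a prompt renderer on the other. What follows is a minimal sketch using Python dataclasses. The field names, the `parse_extraction` and `to_generation_prompt` helpers, and the trigger phrase `androidify bot` are hypothetical illustrations, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SelfieAttributes:
    """Hypothetical phase-one schema; field names are illustrative only."""
    clothing: list[str]
    colours: list[str]
    accessories: list[str] = field(default_factory=list)
    hairstyle: Optional[str] = None


def parse_extraction(raw_json: str) -> SelfieAttributes:
    """Phase one boundary: reject VLM output that breaks the schema."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("extraction must be a JSON object")
    missing = {"clothing", "colours"} - set(data)
    if missing:
        raise ValueError(f"extraction missing required fields: {sorted(missing)}")
    return SelfieAttributes(
        clothing=list(data["clothing"]),
        colours=list(data["colours"]),
        accessories=list(data.get("accessories", [])),
        hairstyle=data.get("hairstyle"),
    )


def to_generation_prompt(attrs: SelfieAttributes, trigger: str = "androidify bot") -> str:
    """Phase two input: render the schema as a prompt for the fine-tuned model.

    `trigger` stands in for whatever token the fine-tuned model associates
    with the Android character (hypothetical default).
    """
    parts = [trigger, "wearing " + ", ".join(attrs.clothing)]
    if attrs.colours:
        parts.append("in " + " and ".join(attrs.colours))
    if attrs.accessories:
        parts.append("with " + ", ".join(attrs.accessories))
    if attrs.hairstyle:
        parts.append(f"{attrs.hairstyle} hair")
    return ", ".join(parts)
```

For example, the extraction `{"clothing": ["denim jacket"], "colours": ["blue"], "hairstyle": "curly"}` renders as `androidify bot, wearing denim jacket, in blue, curly hair`. Validating at the boundary means a malformed extraction fails loudly before any GPU time is spent on generation.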

The schema is the critical handoff. It isn't just an output format — it's an interface contract between two models that were developed separately. The extractor and the generator had to speak the same language, which meant co-designing the extraction vocabulary alongside the training dataset captions.
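One way to make that contract checkable is to test the extractor's vocabulary directly against the caption corpus: any term the extractor can emit but the generator never saw during fine-tuning is a break in the interface. A minimal sketch, assuming both sides are available as plain strings; `uncovered_terms` is a hypothetical helper, and whole-word matching is a simplification of how a caption vocabulary would really be audited.

```python
import re


def uncovered_terms(extractor_terms: list[str], training_captions: list[str]) -> list[str]:
    """Return extractor terms that appear in no training caption.

    Such a term is a contract break: the extractor can name it, but the
    generator has no learned association for it.
    """
    captions = [c.lower() for c in training_captions]
    gaps = []
    for term in extractor_terms:
        # Whole-word match, so "cap" is not counted as covered by "capri pants".
        pattern = re.compile(rf"\b{re.escape(term.lower())}\b")
        if not any(pattern.search(c) for c in captions):
            gaps.append(term)
    return gaps
```

Running the check in both directions while the vocabulary and captions are still being written is the practical form of co-design: a gap either becomes a new caption or a term the extractor stops emitting.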

A parallel training pipeline ran independently: dataset preparation, LoRA fine-tuning, and iterative testing fed the generation model. A proposed evaluator stage — output validation against brand references — was scoped but not implemented in the prototype.
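Dataset preparation for LoRA training commonly pairs each image with a same-named `.txt` caption file in one folder. The sketch below assumes that layout; the exact convention a given trainer expects, including AI Toolkit, should be checked against its own documentation. `write_sidecar_captions` and its inputs are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def write_sidecar_captions(image_dir: Path, captions: dict[str, str]) -> list[Path]:
    """Write <stem>.txt beside each image and return images with no caption.

    `captions` maps an image's file stem to its caption text (hypothetical input).
    """
    uncaptioned = []
    # sorted() materialises the listing before any .txt files are written.
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption = captions.get(image.stem)
        if caption is None:
            uncaptioned.append(image)
            continue
        image.with_suffix(".txt").write_text(caption.strip() + "\n", encoding="utf-8")
    return uncaptioned
```

Returning the uncaptioned images rather than silently skipping them keeps gaps in the corpus visible between fine-tuning iterations.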


03 - The Decisions


The hardest technical problem wasn't generation quality in isolation. It was how to apply personalised clothing to the mascot without destabilising the base character — the proportions, posture, and visual grammar that make the Android recognisable.

The approach was masked regional generation: establishing a base Bot in a neutral colour, then constraining generation to specific regions so clothing could be applied without the model collapsing the character's underlying structure. The alternative — generating the full mascot from scratch on every run — produced outputs where clothing attributes bled into the character's form.

The decision to separate the base character from the clothing layer was an architectural choice with aesthetic consequences. It preserved brand fidelity by treating the mascot as a stable substrate rather than a fully generative output.
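The invariant that makes masked regional generation protect the base character is a blend: outside the mask, the base Bot passes through untouched. The sketch below shows that blend on small grayscale images stored as nested lists. It is a simplified pixel-space illustration; many inpainting implementations apply the same blend to latents at each denoising step instead. `composite` is a hypothetical name.

```python
Image = list[list[float]]  # grayscale, values in [0, 1]


def composite(base: Image, generated: Image, mask: Image) -> Image:
    """Keep the base Bot where mask is 0 and take generated pixels where it is 1.

    Fractional mask values blend the two, which softens seams at region edges.
    """
    if not (len(base) == len(generated) == len(mask)):
        raise ValueError("images must share a height")
    out = []
    for b_row, g_row, m_row in zip(base, generated, mask):
        if not (len(b_row) == len(g_row) == len(m_row)):
            raise ValueError("images must share a width")
        out.append([m * g + (1.0 - m) * b for b, g, m in zip(b_row, g_row, m_row)])
    return out
```

Because pixels with a mask value of 0 are copied from the base image, the proportions and posture of the neutral Bot cannot drift there, whatever the clothing prompt asks for.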


04 - The Complexity


The prototype was a pitch as much as a proof of concept. The audience was R/GA’s global CTO and the VP Creative Technology who acted as key technical leader for the client: people who could read both the technical architecture and its commercial implications.

Imagen 3 was under strict confidentiality at the time. Direct evaluation against the model wasn't possible, so the submission had to make the case for fine-tuning capability without being able to demonstrate it on the actual target model. What was submitted was convincing enough for the team to negotiate access — the prototype unlocked the resource, which was the point.

Setting up the GCP compute infrastructure, briefly and together with a senior technical director, surfaced a scalability question that stayed open: the cost model for production-scale generation was never resolved within the prototype phase.
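Although the cost model was never resolved, the shape of the question can be written down. A rough sketch, assuming GPU time dominates cost and idle capacity is paid for; the function name and every parameter are hypothetical placeholders, not figures from the prototype.

```python
def monthly_generation_cost(
    images_per_month: int,
    gpu_seconds_per_image: float,
    gpu_price_per_hour: float,
    utilisation: float,
) -> float:
    """Rough monthly cost: busy GPU hours, inflated by idle capacity, times price.

    All inputs are placeholders to be measured; none come from the project.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    busy_hours = images_per_month * gpu_seconds_per_image / 3600
    return busy_hours / utilisation * gpu_price_per_hour
```

The utilisation term is where most of the uncertainty sits: bursty consumer traffic leaves provisioned GPUs idle, so the same per-image cost can yield very different monthly totals.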




05 - The Evidence


Over one week: a preliminary system architecture covering the full project pipeline, and a structured model evaluation framework across four candidate models — SDXL, Flux.1 Schnell, SD 1.5 + ELLA, and Imagen 3.

Evaluation was organised around two explicit priorities: prompt adherence first — does the model follow the clothing description accurately — and style fidelity second — does the output hold to the visual grammar of the Android brand assets. Both criteria were assessed manually against reference Droids, since brand fit at this level of specificity doesn't reduce to a metric.
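The two priorities amount to a lexicographic ordering: style fidelity only decides between models that tie on prompt adherence. A minimal sketch, assuming reviewers record their manual judgments as ordinal ratings; the `Review` type, the rating scale, and the model names in the example are hypothetical and are not results from the evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    model: str
    prompt_adherence: int  # reviewer's ordinal rating, e.g. 1 (poor) to 5 (strong)
    style_fidelity: int    # same scale, judged against reference Droids


def rank(reviews: list[Review]) -> list[str]:
    """Order models by adherence first; style fidelity only breaks ties."""
    ordered = sorted(
        reviews,
        key=lambda r: (r.prompt_adherence, r.style_fidelity),
        reverse=True,
    )
    return [r.model for r in ordered]
```

With placeholder ratings `model-a` (4, 2), `model-b` (4, 5) and `model-c` (5, 1), the order is `model-c`, `model-b`, `model-a`: a strong style score cannot compensate for weaker adherence.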

ComfyUI was the testing environment for four generation approaches: pure prompting on models fine-tuned with LoRA and with DoRA, masked regional generation to apply clothing without destabilising the base character, and ControlNet for structural guidance. Avatar outputs across these approaches document how each technique handled the clothing-transfer problem differently.
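For reference, the standard formulations behind the two adapter types are given below, as stated in the LoRA and DoRA papers. The project's actual training hyperparameters, such as the rank $r$ and scale $\alpha$, are not recorded here.

```latex
% LoRA: freeze W_0 and learn a low-rank update \Delta W
W' = W_0 + \Delta W, \qquad \Delta W = \frac{\alpha}{r} B A,
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

% DoRA: the same low-rank update sets the direction; magnitude m is learned separately
W' = m \, \frac{W_0 + \Delta W}{\lVert W_0 + \Delta W \rVert_c},
\qquad m \in \mathbb{R}^{1 \times k}
```

Here $\lVert \cdot \rVert_c$ is the column-wise norm and $m$ is initialised to $\lVert W_0 \rVert_c$. DoRA decomposes each weight into a magnitude and a direction and updates them separately, which its authors argue brings it closer to full fine-tuning.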


06 - My Contribution


Sole technical lead on the prototype. Designed the two-phase pipeline architecture, the extraction schema, and the masked generation approach for clothing application. Ran the model evaluation across the SDXL, Flux.1 Schnell, SD 1.5 + ELLA, and Imagen 3 candidates. Collaborated with the 3D artists who built the corpus for fine-tuning SDXL with AI Toolkit. Built and presented the feasibility case to R/GA’s global CTO and the VP Creative Technology on the Google account. Set up GCP infrastructure for the ComfyUI instances in collaboration with a senior technical director, for a short period during the prototype phase.

Flame Game - Joe Chung, 2021 - 2026

© All Rights Reserved, 2021 - 2026