Building an Automated Image-to-Live2D Pipeline: A Developer's Guide
For Developers: This guide provides a technical blueprint for building an automated image-to-Live2D conversion pipeline. We'll cover the AI models, file formats, and integration strategies needed to turn a single character image into a fully rigged, interactive Live2D model.
Pipeline Architecture Overview
The complete pipeline consists of six stages, each requiring specific AI models or algorithms (a high-level orchestration sketch follows the table):
| Stage | Input | Output | AI/Algorithm |
|---|---|---|---|
| 1. Segmentation | Single image | Layer masks | Semantic segmentation (UNet, ISNet) |
| 2. Inpainting | Layer masks + image | Complete layers (RGBA) | Diffusion inpainting, amodal completion |
| 3. Mesh Generation | RGBA layers | Triangulated meshes | Delaunay triangulation, edge detection |
| 4. Deformer Setup | Meshes | Warp/rotation deformers | Template matching, pose estimation |
| 5. Parameter Binding | Deformers | Keyform animations | Predefined templates or AI prediction |
| 6. Export | All above | .model3.json + .moc3 | Live2D Cubism SDK |
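Seen end to end, the stages chain together as ordinary function calls. The sketch below is only an orchestration outline; every function name in it (segment_character, inpaint_occluded_regions, build_deformer_hierarchy, bind_parameters, export_model) is a hypothetical placeholder for the stage implementations described in the rest of this guide.

```python
# Hypothetical end-to-end driver; each stage function is a placeholder
# for the corresponding stage described below.
from PIL import Image

def image_to_live2d(image_path: str, output_dir: str) -> None:
    image = Image.open(image_path).convert("RGBA")

    masks = segment_character(image)                  # Stage 1: layer masks
    layers = inpaint_occluded_regions(image, masks)   # Stage 2: complete RGBA layers
    meshes = {name: generate_mesh(layer)              # Stage 3: triangulated meshes
              for name, layer in layers.items()}
    deformers = build_deformer_hierarchy(meshes)      # Stage 4: warp/rotation deformers
    keyforms = bind_parameters(deformers)             # Stage 5: parameter keyforms
    export_model(output_dir, layers, meshes, deformers, keyforms)  # Stage 6: export
```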
Stage 1: Semantic Segmentation
The first step is splitting the character image into semantic parts: face, hair, eyes, eyebrows, mouth, body, clothes, accessories.
Step 1: Choose a Segmentation Model
SkyTNT/anime-segmentation
High-accuracy anime character segmentation using ISNet. Trained on combined datasets for robust performance.
siyeong0/Anime-Face-Segmentation
UNet-based model specifically for anime faces. Classes: background, hair, eye, mouth, face, skin, clothes.
Hugging Face: skytnt/anime-segmentation
Pre-trained models and datasets ready for inference or fine-tuning.
```python
# Example: Anime segmentation with ISNet
from anime_segmentation import get_model

model = get_model("isnet_anime")
masks = model.predict(image)  # Returns dict of layer masks

# Expected output layers:
# - "face": mask for face area
# - "hair": mask for hair (may be multiple: front_hair, back_hair)
# - "left_eye", "right_eye": individual eye masks
# - "mouth": mouth/lips area
# - "body": torso and limbs
# - "clothes": outfit layers
```
Stage 2: Amodal Inpainting
Each layer mask defines a visible region, but Live2D layers need the COMPLETE part—including areas hidden behind other layers. This is called "amodal completion."
Step 2: Fill in Occluded Regions
Stable Diffusion Inpainting
Use SD inpainting to fill masked regions. Works well for extending hair behind the face, completing eyes behind bangs, etc.
pix2gestalt (Amodal Completion)
Latent diffusion model trained specifically for amodal completion—synthesizing full objects from partially visible ones.
LaMa (Large Mask Inpainting)
Fast Fourier convolution-based inpainting. Great for large masked regions.
```python
# Example: Inpainting workflow
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
)

layers = {}
for layer_name, mask in masks.items():
    # Expand mask to include occluded regions that need filling
    occlusion_mask = compute_occlusion_mask(layer_name, all_masks)

    # Inpaint the hidden parts
    completed_layer = pipe(
        prompt=f"anime character {layer_name}, consistent style",
        image=original_image,
        mask_image=occlusion_mask
    ).images[0]

    # Extract layer with alpha channel
    rgba_layer = apply_alpha_mask(completed_layer, mask)
    layers[layer_name] = rgba_layer
```
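The compute_occlusion_mask helper above is not a library function. One simple way to build it, assuming you know (or guess) a back-to-front draw order for the layers, is to take the union of everything drawn in front of the layer and keep only the band of that union adjacent to the layer's visible pixels. The LAYER_ORDER list and the dilation size are assumptions for this sketch.

```python
import cv2
import numpy as np

# Assumed back-to-front draw order; later layers occlude earlier ones.
LAYER_ORDER = ["back_hair", "body", "clothes", "face",
               "mouth", "left_eye", "right_eye", "front_hair"]

def compute_occlusion_mask(layer_name, all_masks, layer_order=LAYER_ORDER):
    """Approximate the region of `layer_name` hidden behind layers drawn in front of it."""
    visible = all_masks[layer_name] > 0

    # Union of every layer drawn in front of this one
    occluders = np.zeros_like(visible)
    for front_name in layer_order[layer_order.index(layer_name) + 1:]:
        if front_name in all_masks:
            occluders |= all_masks[front_name] > 0

    # Restrict to a band near the visible region so inpainting stays local to the part
    nearby = cv2.dilate(visible.astype(np.uint8), np.ones((25, 25), np.uint8)) > 0

    # uint8 mask (0/255); convert to a PIL image before passing to the pipeline if needed
    return (occluders & nearby & ~visible).astype(np.uint8) * 255
```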
Stage 3: Mesh Generation
Each RGBA layer needs a deformable mesh. The mesh defines how the layer can be warped and animated.
Step 3: Create Triangle Meshes
Live2D uses triangulated meshes where vertices can be moved to deform the texture.
```python
# Example: Mesh generation for a layer
import cv2
import numpy as np
from scipy.spatial import Delaunay

def generate_mesh(rgba_layer, density="medium"):
    # Extract alpha channel for shape and binarize it for contour detection
    alpha = rgba_layer[:, :, 3]
    binary = (alpha > 0).astype(np.uint8)

    # Find contours (outline of the layer)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)

    # Sample points along contours
    edge_points = sample_contour_points(contours, spacing=10)

    # Add interior points for better deformation
    interior_points = generate_interior_grid(alpha, spacing=20)

    # Combine and triangulate
    all_points = np.vstack([edge_points, interior_points])
    triangles = Delaunay(all_points)

    return {
        "vertices": all_points,              # [[x, y], ...]
        "triangles": triangles.simplices,    # [[v0, v1, v2], ...]
        "uvs": normalize_to_uv(all_points, rgba_layer.shape)  # Texture coords
    }
```
Mesh density matters:
- Too sparse: Deformations look blocky and unnatural
- Too dense: Performance issues, harder to animate
- Sweet spot: ~50-200 vertices per major part (eye, mouth), ~200-500 for large parts (hair, body)
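The three helpers used by generate_mesh (sample_contour_points, generate_interior_grid, normalize_to_uv) are not library functions; they would need to be written for the pipeline. A minimal sketch of each, under the same point layout assumptions as above:

```python
import numpy as np

def sample_contour_points(contours, spacing=10):
    """Take every `spacing`-th point along each OpenCV contour."""
    points = []
    for contour in contours:
        pts = contour.reshape(-1, 2)      # (N, 1, 2) -> (N, 2)
        points.append(pts[::spacing])
    return np.vstack(points).astype(np.float64)

def generate_interior_grid(alpha, spacing=20):
    """Regular grid of points restricted to opaque pixels of the layer."""
    h, w = alpha.shape
    ys, xs = np.mgrid[0:h:spacing, 0:w:spacing]
    grid = np.column_stack([xs.ravel(), ys.ravel()])
    inside = alpha[grid[:, 1], grid[:, 0]] > 0
    return grid[inside].astype(np.float64)

def normalize_to_uv(points, shape):
    """Map pixel coordinates to [0, 1] texture coordinates."""
    h, w = shape[:2]
    return np.column_stack([points[:, 0] / w, points[:, 1] / h])
```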
Stage 4: Deformer Setup
Deformers control HOW the mesh moves. Live2D has two types:
| Deformer Type | Best For | How It Works |
|---|---|---|
| Warp Deformer | Organic shapes (hair, clothes) | Grid of control points that bend the mesh |
| Rotation Deformer | Rigid parts (head, limbs) | Rotates around a pivot point |
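Neither deformer is exotic mathematically: a rotation deformer rotates its vertices around a pivot, while a warp deformer moves a coarse grid of control points and carries vertices along by bilinear interpolation. The sketch below shows both operations in isolation; it is a simplification and not the Cubism SDK's actual API (real deformers also inherit transforms from their parents).

```python
import numpy as np

def apply_rotation_deformer(vertices, pivot, angle_deg):
    """Rotate mesh vertices around a pivot point (rigid parts: head, limbs)."""
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (vertices - pivot) @ rot.T + pivot

def apply_warp_deformer(vertices, grid_rest, grid_deformed):
    """Bend mesh vertices by bilinearly interpolating a deformed control grid.

    grid_rest / grid_deformed: (rows, cols, 2) control point positions;
    the rest grid is assumed axis-aligned and regular.
    """
    rows, cols, _ = grid_rest.shape
    x0, y0 = grid_rest[0, 0]
    x1, y1 = grid_rest[-1, -1]

    # Normalized cell position of each vertex inside the rest grid
    u = np.clip((vertices[:, 0] - x0) / (x1 - x0) * (cols - 1), 0, cols - 1 - 1e-6)
    v = np.clip((vertices[:, 1] - y0) / (y1 - y0) * (rows - 1), 0, rows - 1 - 1e-6)
    i, j = v.astype(int), u.astype(int)
    fu, fv = u - j, v - i

    # Bilinear blend of the four surrounding deformed control points
    p00, p01 = grid_deformed[i, j], grid_deformed[i, j + 1]
    p10, p11 = grid_deformed[i + 1, j], grid_deformed[i + 1, j + 1]
    return (p00 * ((1 - fu) * (1 - fv))[:, None] + p01 * (fu * (1 - fv))[:, None] +
            p10 * ((1 - fu) * fv)[:, None] + p11 * (fu * fv)[:, None])
```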
Step 4: Auto-Generate Deformer Hierarchy
```python
# Example: Deformer structure template
DEFORMER_TEMPLATE = {
    "root": {
        "type": "rotation",
        "children": {
            "head": {
                "type": "rotation",
                "params": ["ParamAngleX", "ParamAngleY", "ParamAngleZ"],
                "children": {
                    "face": {"type": "warp", "grid": [3, 3]},
                    "front_hair": {"type": "warp", "grid": [4, 4], "physics": True},
                    "back_hair": {"type": "warp", "grid": [4, 4], "physics": True},
                    "left_eye": {
                        "type": "warp",
                        "params": ["ParamEyeLOpen", "ParamEyeBallX", "ParamEyeBallY"]
                    },
                    "right_eye": {  # mirror of left_eye
                        "type": "warp",
                        "params": ["ParamEyeROpen", "ParamEyeBallX", "ParamEyeBallY"]
                    },
                    "mouth": {
                        "type": "warp",
                        "params": ["ParamMouthOpenY", "ParamMouthForm"]
                    }
                }
            },
            "body": {
                "type": "rotation",
                "params": ["ParamBodyAngleX", "ParamBodyAngleY"]
            }
        }
    }
}
```
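One way to consume such a template is to walk it recursively and flatten it into a list of deformer records (name, parent, type, parameters) that the export stage can iterate over. The function below is a sketch of that traversal, not part of any SDK:

```python
def flatten_deformer_template(template, parent=None):
    """Flatten the nested template into parent-before-child deformer records."""
    records = []
    for name, spec in template.items():
        records.append({
            "name": name,
            "parent": parent,
            "type": spec.get("type", "warp"),
            "params": spec.get("params", []),
            "grid": spec.get("grid"),
            "physics": spec.get("physics", False),
        })
        records.extend(flatten_deformer_template(spec.get("children", {}), parent=name))
    return records

# flatten_deformer_template(DEFORMER_TEMPLATE) yields records for
# root, head, face, front_hair, ..., body in parent-before-child order.
```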
Stage 5: Parameter & Keyform Binding
Parameters define controllable variables (e.g., "ParamMouthOpenY" = 0 to 1). Keyforms define what the mesh looks like at specific parameter values.
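Before worrying about the Cubism binary format, it helps to hold parameters and keyforms in a small in-memory structure the pipeline can pass between stages. The field names below are this guide's own assumptions, not the SDK's:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Parameter:
    id: str                     # e.g. "ParamMouthOpenY"
    minimum: float = 0.0
    maximum: float = 1.0
    default: float = 0.0

@dataclass
class Keyform:
    param_value: float          # parameter value this keyform is sampled at
    vertex_offsets: np.ndarray  # (num_vertices, 2) deltas from the rest mesh

@dataclass
class ParameterBinding:
    parameter: Parameter
    mesh_name: str              # which layer mesh these keyforms deform
    keyforms: list = field(default_factory=list)  # sorted by param_value
```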
Step 5: Generate Keyforms for Each Parameter
This is the hardest part to automate. Options:
Option A: Template-Based (Recommended for MVP)
```python
# Use predefined deformation patterns
KEYFORM_TEMPLATES = {
    "ParamMouthOpenY": {
        0.0: "mouth_closed.json",   # Mouth vertices at rest
        1.0: "mouth_open.json"      # Mouth vertices stretched down
    },
    "ParamEyeLOpen": {
        0.0: "eye_closed.json",     # Eyelid vertices covering eye
        1.0: "eye_open.json"        # Eyelid vertices raised
    }
}

# Apply template deformations scaled to the actual mesh
def apply_template(mesh, param_name, param_value, templates):
    template = templates[param_name][param_value]
    scaled_offsets = scale_template_to_mesh(template, mesh)
    return mesh.vertices + scaled_offsets
```
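Keyforms are only defined at a few parameter values (here 0.0 and 1.0); intermediate values are produced by blending between the nearest keyforms. A minimal linear blend, as a sketch over (value, offsets) pairs:

```python
def interpolate_keyforms(keyforms, param_value):
    """Linearly blend vertex offsets between the two keyforms bracketing param_value.

    `keyforms`: list of (value, offsets) pairs sorted by value.
    """
    values = [v for v, _ in keyforms]
    if param_value <= values[0]:
        return keyforms[0][1]
    if param_value >= values[-1]:
        return keyforms[-1][1]
    hi = next(i for i, v in enumerate(values) if v >= param_value)
    lo = hi - 1
    t = (param_value - values[lo]) / (values[hi] - values[lo])
    return (1 - t) * keyforms[lo][1] + t * keyforms[hi][1]

# e.g. a half-open mouth:
# offsets = interpolate_keyforms([(0.0, closed_offsets), (1.0, open_offsets)], 0.5)
# deformed_vertices = mesh.vertices + offsets
```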
Option B: AI-Predicted Deformations (Advanced)
```python
# Train a model to predict vertex offsets
# Input: mesh vertices, parameter name, parameter value
# Output: vertex offset deltas
import torch.nn as nn

class DeformationPredictor(nn.Module):
    def forward(self, vertices, param_embedding, param_value):
        # Encoder-decoder architecture (layers omitted in this sketch):
        # predicts how each vertex should move for the given parameter value
        vertex_offsets = ...  # (num_vertices, 2) offset deltas
        return vertex_offsets
```
Standard parameters to support:
- ParamMouthOpenY - Lip sync (REQUIRED)
- ParamMouthForm - Smile/frown
- ParamEyeLOpen / ParamEyeROpen - Blinking
- ParamAngleX/Y/Z - Head tilt
At minimum, implement ParamMouthOpenY for basic lip sync functionality (a minimal driving sketch follows).
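Basic lip sync at runtime usually just maps audio loudness onto ParamMouthOpenY. A rough sketch of that mapping; the gain and smoothing constants are arbitrary choices, not Live2D values:

```python
import numpy as np

def mouth_open_from_audio(frame, prev_value, gain=4.0, smoothing=0.5):
    """Map an audio frame's RMS loudness to a ParamMouthOpenY value in [0, 1]."""
    rms = float(np.sqrt(np.mean(np.square(frame))))  # frame: float samples in [-1, 1]
    target = min(1.0, rms * gain)
    # Exponential smoothing so the mouth doesn't flicker frame to frame
    return prev_value + (target - prev_value) * smoothing
```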
Stage 6: Export to Live2D Format
Live2D models consist of multiple files:
```
├── model.model3.json        ← Entry point, links everything
├── model.moc3               ← Compiled binary (mesh, deformers, params)
├── textures/
│   └── texture_00.png       ← Packed texture atlas
├── model.physics3.json      ← Physics simulation config
└── motions/
    └── idle.motion3.json    ← Optional animations
```
Step 6: Generate model3.json and .moc3
The model3.json structure:

```json
{
  "Version": 3,
  "FileReferences": {
    "Moc": "model.moc3",
    "Textures": ["textures/texture_00.png"],
    "Physics": "model.physics3.json",
    "Motions": {
      "Idle": [{ "File": "motions/idle.motion3.json" }]
    }
  },
  "Groups": [
    { "Target": "Parameter", "Name": "LipSync", "Ids": ["ParamMouthOpenY"] },
    { "Target": "Parameter", "Name": "EyeBlink", "Ids": ["ParamEyeLOpen", "ParamEyeROpen"] }
  ]
}
```
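Writing model3.json itself is the easy part, since it is plain JSON; a small writer like the sketch below covers it (the .moc3 it points at is the hard dependency discussed next). Function and argument names here are this guide's own:

```python
import json
from pathlib import Path

def write_model3_json(output_dir, texture_files, has_physics=True):
    """Emit a minimal model.model3.json pointing at the pipeline's other outputs."""
    model3 = {
        "Version": 3,
        "FileReferences": {
            "Moc": "model.moc3",
            "Textures": texture_files,
            "Motions": {"Idle": [{"File": "motions/idle.motion3.json"}]},
        },
        "Groups": [
            {"Target": "Parameter", "Name": "LipSync", "Ids": ["ParamMouthOpenY"]},
            {"Target": "Parameter", "Name": "EyeBlink",
             "Ids": ["ParamEyeLOpen", "ParamEyeROpen"]},
        ],
    }
    if has_physics:
        model3["FileReferences"]["Physics"] = "model.physics3.json"

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "model.model3.json"
    path.write_text(json.dumps(model3, indent=2))
    return path

# write_model3_json("export/", ["textures/texture_00.png"])
```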
The .moc3 file is a proprietary binary format. You have three options:
- Use Live2D Cubism SDK: Export from Cubism Editor (requires their software)
- Reverse-engineer format: Community efforts exist but are incomplete and legally gray
- Alternative: Export to an intermediate format and convert using Cubism Editor's batch export
Implementation Approaches
Approach A: Python Pipeline + Cubism Export
- Run segmentation + inpainting in Python
- Export layered PSD with layer names matching Cubism conventions
- Use Cubism Editor's auto-mesh and template features
- Script Cubism Editor automation if possible
Approach B: WebGL Runtime (No .moc3)
- Build your own simplified Live2D-like renderer
- Use JSON for mesh/deformer/parameter definitions (sketched after this list)
- Render with WebGL/Three.js
- Less compatible but fully open
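For Approach B, the intermediate JSON your renderer consumes is entirely up to you. One possible shape, as a sketch; every field name here is an assumption rather than a standard, and the inputs are the outputs of the earlier stages:

```python
import json

def export_runtime_json(path, layers, meshes, deformers, parameters):
    """Dump pipeline outputs into a self-defined format for a custom WebGL renderer."""
    doc = {
        "version": 1,
        "textures": [f"textures/{name}.png" for name in layers],
        "meshes": {
            name: {
                "vertices": mesh["vertices"].tolist(),
                "triangles": mesh["triangles"].tolist(),
                "uvs": mesh["uvs"].tolist(),
            }
            for name, mesh in meshes.items()
        },
        "deformers": deformers,     # e.g. flattened records from Stage 4 (JSON-serializable)
        "parameters": parameters,   # ids, ranges, defaults and keyform data
    }
    with open(path, "w") as f:
        json.dump(doc, f)
```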
Approach C: Hybrid with ComfyUI
- Use ComfyUI workflows for segmentation + inpainting
- Export intermediate format
- Post-process into Live2D structure
Resources & Libraries
| Component | Library/Tool | Link |
|---|---|---|
| Anime Segmentation | anime-segmentation | GitHub |
| Face Segmentation | Anime-Face-Segmentation | GitHub |
| Inpainting | Stable Diffusion Inpainting | HuggingFace |
| Amodal Completion | pix2gestalt | GitHub |
| Layer Decomposition | Qwen-Image-Layered | HuggingFace |
| Live2D SDK | Cubism SDK for Web | live2d.com |
| Mesh Generation | scipy.spatial.Delaunay | SciPy |
Minimum Viable Pipeline
- Segment face + hair + eyes + mouth (4 layers minimum)
- Inpaint hair behind face
- Generate simple quad meshes (not full triangulation)
- Implement only ParamMouthOpenY keyforms
- Export as custom JSON + use a WebGL renderer
The full automated pipeline is challenging but achievable. Start with the MVP, iterate on quality, and expand parameter support over time.