SPRITETOMESH: Automatic Mesh Generation for 2D Skeletal Animation Using Learned Segmentation and Contour-Aware Vertex Placement

2026-02-24

Computer Vision and Pattern Recognition
AI summary

The authors created SPRITETOMESH, a tool that automatically turns 2D game sprites into triangle meshes for animation, which normally takes artists a long time to do by hand. Their method uses a neural network to accurately find the sprite's shape and then applies algorithms to place mesh points along the edges and important lines inside the sprite. They tried a different neural network method to place points directly but found it doesn't work well because there are many valid ways to do it. Their combined approach is much faster, processing each sprite in a few seconds, and they shared their model with the game development community.

2D game sprite, triangle mesh, skeletal animation, segmentation network, EfficientNet-B0, U-Net, Douglas-Peucker simplification, Canny edge detection, Delaunay triangulation, heatmap regression
Authors
Bastien Gimbert
Abstract
We present SPRITETOMESH, a fully automatic pipeline for converting 2D game sprite images into triangle meshes compatible with skeletal animation frameworks such as Spine2D. Creating animation-ready meshes is traditionally a tedious manual process requiring artists to carefully place vertices along visual boundaries, a task that typically takes 15-60 minutes per sprite. Our method addresses this through a hybrid learned-algorithmic approach. A segmentation network (EfficientNet-B0 encoder with U-Net decoder) trained on over 100,000 sprite-mask pairs from 172 games achieves an IoU of 0.87, providing accurate binary masks from arbitrary input images. From these masks, we extract exterior contour vertices using Douglas-Peucker simplification with adaptive arc subdivision, and interior vertices along visual boundaries detected via bilateral-filtered multi-channel Canny edge detection with contour-following placement. Delaunay triangulation with mask-based centroid filtering produces the final mesh. Through controlled experiments, we demonstrate that direct vertex position prediction via neural network heatmap regression is fundamentally not viable for this task: the heatmap decoder consistently fails to converge (loss plateau at 0.061) while the segmentation decoder trains normally under identical conditions. We attribute this to the inherently artistic nature of vertex placement: the same sprite can be meshed validly in many different ways. This negative result validates our hybrid design: learned segmentation where ground truth is unambiguous, algorithmic placement where domain heuristics are appropriate. The complete pipeline processes a sprite in under 3 seconds, representing a speedup of 300x-1200x over manual creation. We release our trained model to the game development community.
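The mask-based centroid filtering mentioned in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' released code; it assumes NumPy and SciPy, and the function name `triangulate_with_mask_filter` is invented. The idea: Delaunay triangulation always fills the convex hull of the input vertices, so triangles that bridge concavities in the sprite silhouette must be culled by checking whether each triangle's centroid lands on the binary mask.

```python
# Hedged sketch of Delaunay triangulation with mask-based centroid
# filtering (names and data are illustrative, not the paper's code).
import numpy as np
from scipy.spatial import Delaunay

def triangulate_with_mask_filter(vertices, mask):
    """vertices: (N, 2) array of (x, y) points; mask: 2D bool array,
    True where the sprite is opaque. Returns kept triangle index triples."""
    tri = Delaunay(vertices)
    h, w = mask.shape
    kept = []
    for simplex in tri.simplices:
        # Centroid of the triangle in image coordinates.
        cx, cy = vertices[simplex].mean(axis=0)
        ix, iy = int(round(cx)), int(round(cy))
        # Drop triangles whose centroid falls outside the sprite:
        # Delaunay covers the convex hull, including concave notches.
        if 0 <= ix < w and 0 <= iy < h and mask[iy, ix]:
            kept.append(simplex)
    return np.array(kept)

# Toy example: an L-shaped sprite. Naive Delaunay bridges the notch;
# centroid filtering removes the bridging triangle.
mask = np.zeros((10, 10), dtype=bool)
mask[:, :5] = True   # vertical arm of the L
mask[6:, :] = True   # horizontal arm of the L
verts = np.array([[0, 0], [4, 0], [4, 5], [9, 6], [9, 9], [0, 9]], float)
tris = triangulate_with_mask_filter(verts, mask)
```

In a full pipeline the vertices would come from the contour-simplification and edge-detection stages rather than being hand-listed, but the filtering step itself is independent of how they were placed.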