AI Driven Soccer Analysis Using Computer Vision
2026-04-09 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary
The authors developed a computer vision system to analyze soccer games by detecting and tracking players on the field. Their method evaluates object detection models such as YOLO and Faster R-CNN and pairs the best one with SAM2, a segmentation model, to segment and track players. A separate model finds key points on the soccer field, and a geometric transformation called homography converts the players' positions from the camera view to real-world field coordinates. This gives coaches detailed information about player movement and positioning that is not easily visible in regular game videos. Overall, the system turns video footage into statistics that can help improve team performance.
Object detection, Tracking, YOLO, Faster R-CNN, SAM2 (Segment Anything Model 2), Key point detection, Convolutional Neural Network (CNN), Homography, Player positioning, Tactical analysis
Authors
Adrian Manchado, Tanner Cellio, Jonathan Keane, Yiyang Wang
Abstract
Sports analysis is crucial for team performance because it provides actionable data that can inform coaching decisions, improve player performance, and enhance team strategies. To extract more complex features from game footage, a computer vision model can be used to identify and track key entities on the field. We propose an object detection and tracking system to predict player positions throughout the game. To relate these positions to the field dimensions, we use a point prediction model to identify key points on the field and combine them with known field dimensions to extract actual distances. For the player-identification model, object detection models such as YOLO and Faster R-CNN are evaluated on our custom video footage using multiple evaluation metrics. The goal is to identify the detection model that yields the most accurate results when paired with SAM2 (Segment Anything Model 2) for segmentation and tracking. For the key point detection model, we use a CNN to find consistent locations on the soccer field. Through homography, the positions of points and objects, including the segmented player masks from SAM2, are transformed from the camera perspective to real-world field coordinates, regardless of camera angle or movement. The transformed coordinates can be used to calculate valuable tactical insights, including player speed, distance covered, positioning heatmaps, and more complex team statistics, providing coaches and players with actionable performance data previously unavailable from standard video analysis.
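The homography step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes four detected field key points with known field coordinates, estimates the homography with a plain Direct Linear Transform in NumPy, and reduces each player mask to a single ground-contact pixel (the bottom-center of the mask). The function names, the pixel coordinates, and the choice of penalty-area corners as key points are all hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_homography(image_pts, field_pts):
    """Direct Linear Transform: find H such that field ~ H @ image.

    Needs at least four correspondences, no three of them collinear.
    """
    image_pts = np.asarray(image_pts, dtype=float)
    field_pts = np.asarray(field_pts, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(image_pts, field_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def to_field(H, pts):
    """Map (N, 2) pixel coordinates to (N, 2) field coordinates in meters."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def mask_ground_point(mask):
    """Assumed reduction of a binary player mask to one pixel (x, y):
    the horizontal center of the mask's lowest row."""
    ys, xs = np.nonzero(mask)
    y_bottom = ys.max()
    return float(xs[ys == y_bottom].mean()), float(y_bottom)


def distance_and_speed(track_m, fps):
    """Total distance (m) and per-frame speed (m/s) for an (N, 2) track."""
    steps = np.linalg.norm(np.diff(np.asarray(track_m, dtype=float), axis=0), axis=1)
    return float(steps.sum()), steps * fps


if __name__ == "__main__":
    # Hypothetical pixel locations of the four penalty-area corners.
    image_pts = [(412, 310), (905, 298), (1130, 540), (180, 565)]
    # Their field coordinates in meters (penalty area: 40.32 m x 16.5 m).
    field_pts = [(0.0, 0.0), (40.32, 0.0), (40.32, 16.5), (0.0, 16.5)]
    H = estimate_homography(image_pts, field_pts)

    # Hypothetical per-frame ground points of one tracked player.
    pixel_track = [(600, 400), (606, 402), (613, 405)]
    field_track = to_field(H, pixel_track)
    total_m, speeds = distance_and_speed(field_track, fps=25)
    print(field_track, total_m, speeds)
```

With a moving camera, the homography would need to be re-estimated from the key points detected in each frame (or each short window of frames) before mapping that frame's player positions. Per-frame speeds computed this way are sensitive to detection jitter, so some temporal smoothing of the track is usually applied before reporting them.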