DeepForestSound: a multi-species automatic detector for passive acoustic monitoring in African tropical forests, a case study in Kibale National Park

2026-04-09

Sound, Machine Learning
AI summary

The authors created DeepForestSound (DFS), a model that listens to recordings from African tropical forests and detects the calls of different animals, including birds, primates, and elephants. They trained DFS with a mix of unsupervised and supervised learning on recordings from Uganda, then tested it on new data collected two years later at other sites in the same forest. DFS outperformed existing detection tools for 8 of 12 taxa, especially for animals other than birds, showing that it generalizes across time and recording sites within the forest. This suggests that training models for a specific region and task can improve the detection of animal sounds in acoustically complex environments.

Passive Acoustic Monitoring, Ecoacoustics, Audio Spectrogram Transformer, Semi-supervised learning, Low-rank adaptation (LoRA), Biodiversity monitoring, African tropical forests, Species detection, Generalization, Kibale National Park
Authors
Gabriel Dubus, Théau d'Audiffret, Claire Auger, Raphaël Cornette, Sylvain Haupert, Innocent Kasekendi, Raymond Katumba, Hugo Magaldi, Lise Pernel, Harold Rugonge, Jérôme Sueur, John Justice Tibesigwa, Sabrina Krief
Abstract
Passive Acoustic Monitoring (PAM) is widely used for biodiversity assessment. Its application in African tropical forests is limited by scarce annotated data, reducing the performance of general-purpose ecoacoustic models on underrepresented taxa. In this study, we introduce DeepForestSound (DFS), a multi-species automatic detection model designed for PAM in African tropical forests. DFS relies on a semi-supervised pipeline combining clustering of unannotated recordings with manual validation, followed by supervised fine-tuning of an Audio Spectrogram Transformer (AST) using low-rank adaptation, which is compared to a frozen-backbone linear baseline (DFS-Linear). The framework supports the detection of multiple taxonomic groups, including birds, primates, and elephants, from long-term acoustic recordings. DFS was trained on acoustic data collected in the Sebitoli area, in Kibale National Park, Uganda, and evaluated on an independent dataset recorded two years later at different locations within the same forest. This evaluation therefore assesses generalization across time and recording sites within a single tropical forest ecosystem. For 8 of 12 taxa, DFS outperforms existing automatic detection tools, particularly for non-avian taxa, achieving average AP values of 0.964 for primates and 0.961 for elephants. Results further show that LoRA-based fine-tuning substantially outperforms linear probing across taxa. Overall, these results demonstrate that task-oriented, region-specific training substantially improves detection performance in acoustically complex tropical environments, and highlight the potential of DFS as a practical tool for biodiversity monitoring and conservation in African rainforests.
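The abstract reports results as AP, which in detection work usually denotes average precision. It does not say how the authors computed or averaged it, so the following is only a minimal sketch of the standard ranking-based definition: sort clips by detector score and average the precision obtained at the rank of each true positive. The function name `average_precision` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
import math


def average_precision(labels, scores):
    """Ranking-based average precision for one taxon.

    labels: 1 if the target taxon is present in the clip, else 0.
    scores: detector confidence for the same clips (higher = more confident).
    Tied scores keep their input order, which is an arbitrary choice
    made for this sketch.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Rank clips from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    n_positive = sum(1 for _, label in ranked if label)
    if n_positive == 0:
        return math.nan  # AP is undefined when the taxon never occurs

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / n_positive


# Toy example (made-up values): two positives, ranked 1st and 3rd.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2
```

A group-level figure such as the one quoted for primates would then be a mean of per-taxon values of this kind, assuming the authors average over the taxa in each group.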