This table presents publicly available bioacoustic datasets that can be used with machine learning.
Want to add a dataset? Follow the instructions in this README file.
Want to help improve this list? Open a pull request on this GitHub repo!
Title | Taxonomic Class | Total Duration (hours) | Locality | |
---|---|---|---|---|
Data from: An archive of longitudinal recordings of the vocalizations of adult Gombe chimpanzees | Primates (Gombe Chimpanzees) | 10 | Gombe National Park | |
AnuraSet: A dataset for benchmarking neotropical anuran calls identification in passive acoustic monitoring | Anuran | 27 | 4 sites in the Cerrado and Atlantic forest biomes, Brazil | |
ArcticBirdSounds: an open-access, multi-year, and detailed annotated dataset of bird songs and calls | Birds | 20 | six locations across the Arctic | |
Audio-Based identification of Beehive states: The dataset | Insects | 96 | Not specified | |
Audio tagging of avian dawn chorus recordings in California, Oregon, and Washington | Birds | 131.25 | 525 sites | |
BIRDeep Audio Annotations | Birds | 8.7 | 9 sites across Doñana National Park | |
To bee or not to bee: An annotated dataset for beehive sound recognition | Insects | 12 | | |
Bengalese Finch song repository | Birds | 14.75 | Sober Lab at Emory University in Atlanta | |
Data from: A simple explanation for the evolution of complex song syntax in Bengalese finches | Birds | 5 | RIKEN Brain Science Institute in Saitama | |
BirdsongRecognition | Birds | 9.8 | Not specified | |
BirdVox-14SD: a dataset of flight calls with species annotation | Birds | | Ithaca | |
BirdVox-296h: a large-scale dataset for detection and classification of flight calls | Birds | 296 | Ithaca, across nine locations | |
BirdVox-70k: a dataset for species-agnostic flight call detection in half-second clips | Birds | 3 | Ithaca, in six different locations | |
BirdVox-ANAFCC: A dataset for American Northeast Avian Flight Call Classification | Birds | | North-East USA | |
BirdVox-DCASE-20k: a dataset for bird audio detection in 10-second clips | Birds | 55.5 | Ithaca, in six different locations | |
BirdVox-full-night: a dataset for avian flight call detection in continuous recordings | Birds | 62 | Ithaca, in six different locations | |
BirdVox-full-season: 6672 hours of audio from migratory birds | Birds | 6651 | Ithaca, across nine locations | |
Black-and-white ruffed lemur (Varecia variegata) calls for passive acoustic monitoring | Primates (Lemurs) | 60 | sub-humid rainforest site (Mangevo) in the southeast of Ranomafana National Park | |
AcousticTrends_BlueFinLibrary | Marine Mammals (Blue and fin whales) | 1880.25 | | |
Data from: Towards the automatic classification of avian flight calls for bioacoustic monitoring | Birds | 0.22 | | |
Data from: Towards the automatic classification of avian flight calls for bioacoustic monitoring | Birds | 67 | Ithaca and New York City | |
Data from: Towards the automatic classification of avian flight calls for bioacoustic monitoring | Birds | 5.62 | Ithaca and New York City | |
Data used in PLoS One article 'Complexity, Predictability and Time Homogeneity of Syntax in the Songs of Cassins Vireo (Vireo cassini)' by Hedley (2016) | Birds | | Sierra Nevada Mountains | |
Datasets for automatic acoustic identification of individual birds | Birds | | outer boundary of České Budějovice town (48°59.5′ N, 14°26.5′ E) | |
A collection of fully-annotated soundscape recordings from neotropical coffee farms in Colombia and Costa Rica | Birds | 34 | Jardín, Colombia and San Ramon | |
Close range vocal interaction through trill calls in the common marmoset (Callithrix jacchus) | Primates (Marmosets) | | Laboratory | |
Congo Soundscapes, Public Database | Mixed | 2400 | 50 sites in a 25 km² grid in the tropical rainforest of the northern Republic of Congo (Nouabalé-Ndoki National Park) | |
Cornell Birdcall Identification | Birds | | World (focused on USA?) | |
DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition | Birds | 25.49 | | |
DCLDE 2022 Raw Passive Acoustic Data | Cetaceans and seabirds | | US waters around the islands | |
Labeled songs of domestic canary M1-2016-spring (Serinus canaria) | Birds | 3 | sound-isolation chamber | |
EDANSA-2019: The Ecoacoustic Dataset from Arctic North Slope Alaska | | 27 | North Slope of Alaska | |
An annotated dataset of Egyptian fruit bat vocalizations across varying contexts and during vocal ontogeny | Bats | | Herzliya | |
An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information | Birds | 6.4 | Powdermill Nature Reserve | |
An acoustically isolated European starling song library | Birds | | Acoustically isolated chambers | |
GardenFiles23 | | 700 | A large back garden of approximately 80 m² with fruit trees and other vegetation | |
The Vocal Repertoire of Adult and Neonate Giant Otters (Pteronura brasiliensis) | Mammals (Giant Otter) | | Five lakes in Peru (wild groups) and three German zoos (captive groups) | |
Data from: Superregular grammars do not provide additional explanatory power but allow for a compact analysis of animal song | Primates (Gibbons) | | Primate Research Institute, Kyoto University | |
DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection Development Set | Mammals (Hyenas) | 5 | | |
A collection of fully-annotated soundscape recordings from the Island of Hawai'i | Birds | 51 | Four locations | |
HumBugDB: a large-scale acoustic mosquito dataset | Insects (Mosquitoes) | 35 | 8 experimenting sites | |
ZooniverseData | Insects (Mosquitoes) | 22 | 4 sites in the world | |
The iNaturalist Sounds Dataset | Birds, Mammals, Insects, Reptiles, Amphibians | 1200 | | |
Data from: Longitudinal recordings of the vocalizations of immature Gombe chimpanzees for developmental studies | Primates (Gombe Chimpanzees) | 10 | Gombe National Park | |
InfantMarmosetsVox | Primates (Marmosets) | 58.33 | Two separate sound-proofed recording rooms | |
InsectSet32: Dataset for automatic acoustic identification of insects (Orthoptera and Cicadidae) | Insects (Orthoptera and Cicadidae) | 1 | | |
InsectSet47 & InsectSet66: Expanded datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae) | Insects (Orthoptera and Cicadidae) | 22 | | |
InsectSet47 & InsectSet66: Expanded datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae) | Insects (Orthoptera and Cicadidae) | 24 | | |
Dataset: InsectSound1000 | Insects | 115 | Julius Kühn-Institute | |
DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection Development Set | Birds (Jackdaw) | 0.17 | | |
A collection of annotated soundscape recordings from western Kenya | Birds | 32 | Various sites west and southwest of Lake Baringo | |
Datasets for automatic acoustic identification of individual birds | Birds | 1.33 | northern Bohemia, Czech Republic (50°23′ N, 13°40′ E), and eastern Hungary (47°33′ N, 20°54′ E) | |
DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection Development Set | Mammals (Meerkats) | 1.17 | Kuruman River Reserve | |
Data from: Distributed acoustic cues for caller identity in macaque vocalization | Primates (Macaques) | 0.7 | Laboratory | |
Marmoset vocalizations | Primates (Common Marmoset) | | A laboratory in Natal, Rio Grande do Norte | |
MeerKAT: Meerkat Kalahari Audio Transcripts | Mammals (Meerkats) | 1068 | Kalahari Research Centre | |
NIPS4Bplus: Transcriptions of NIPS4B 2013 Bird Challenge Training Dataset | Birds | 1 | Provence region, Andalusia | |
Nocturnal flight calls dataset: long-term acoustic monitoring of birds migrating at night | Birds | 56 | Baltic Sea coast (Dąbkowice, near Darłowo) | |
North American bird species | Birds | 0.17 | | |
A collection of fully-annotated soundscape recordings from the Northeastern United States | Birds | 285 | Sapsucker Woods bird sanctuary in Ithaca | |
Neotropical forest soundscapes with call identifications for katydids | Insects (Katydids) | 4.5 | Two sites in the forest canopy of Barro Colorado Island | |
Sounds of neotropical katydids from Barro Colorado Island, Panama | Insects (Katydids) | | Barro Colorado Island | |
An Annotated and Segmented Acoustic Dataset of 7 Picidae Species | Birds (Picidae) | 1.4 | | |
Pin-tailed whydah (Vidua macroura) calls for passive acoustic monitoring | Birds | 6 | Intaka Island Nature Reserve in Cape Town | |
Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics | Mixed | 29.82 | | |
Dataset for USVSEG performance test | Mammals (Rodents) | | Laboratory | |
RookID: an annotated dataset of vocalisations produced by individually-identified rooks housed together in an outdoors aviary in France | Birds | 17.4 | Strasbourg | |
Dataset for 'Individual, but not nest cluster or colonial, signature in the loud nest call of captive and wild female rooks (Corvus frugilegus)': annotated audio of brooding female rooks in five colonies with individual-level identification | Birds | | Strasbourg and Cambridge | |
Silent·Cities | | 45145 | 317 sites in the world | |
The Soundwel Database: a labeled pig vocalization repository | Mammals (Pigs) | | | |
A collection of fully-annotated soundscape recordings from the southern Sierra Nevada mountain range | Birds | 16.67 | 10 lakes in Sequoia and Kings Canyon National Parks, above 3000m in altitude, in the Sierra Nevada Mountains | |
A collection of fully-annotated soundscape recordings from the Southwestern Amazon Basin | Birds | 21 | Inkaterra Reserva Amazonica, Madre de Dios | |
Thyolo alethe (Chamaetylas choloensis) calls for passive acoustic monitoring | Birds | 10 | Mount Mulanje Biosphere Reserve | |
Training dataset for NABat Machine Learning V1.0 | Bats | | | |
Datasets for automatic acoustic identification of individual birds | Birds | 2.3 | Brdská vrchovina, the Czech Republic (49°84′ N, 14°10′ E) | |
Multimodal Birds Song Dataset: TwoRadioBirds | Birds | 11 | Laboratory, in their home cage | |
Ugandan Bird Vocalizations | Birds | | 7 locations | |
WABAD: A World Annotated Bird Acoustic Dataset for Passive Acoustic Monitoring | Birds | 84 | 70 recording sites distributed across 13 biomes | |
Western Mediterranean Wetlands Bird Dataset | Birds | 3.4 | Europe mainly | |
DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection Development Set | Birds | 4.6 | Europe mainly | |
Watkins Library | Marine mammals | | | |
A collection of fully-annotated soundscape recordings from the Western United States | Birds | 33 | Lassen and Plumas National Forests, Sierra Nevada Mountains in California, USA | |
Data from: A simple explanation for the evolution of complex song syntax in Bengalese finches | Birds | 4.5 | Birds captured in Huben, Mataian and Taipei (China), then recorded in an indoor space (either RIKEN Brain Science Institute in Japan, or locally) | |
Wytham Great Tit Song Dataset | Birds | | 703 nest sites in Wytham Woods, Oxfordshire, UK (51°46′ N, 1°20′ W) | |
Dataset for 'Benchmarking for the automated detection of southern yellow-cheeked crested gibbon calls from passive acoustic monitoring data' | Primates (yellow-cheeked crested gibbon) | 36 | Andoung Kraleung Village | |
Vocal repertoires from adult and chick, male and female zebra finches (Taeniopygia guttata) | Birds | | Theunissen Lab, UC Berkeley | |
The Rockefeller University Field Research Center Song Library | Birds | | Sound attenuation chamber, in Rockefeller University Field Center Colony | |
Zebra Finch Syllable Detector | Birds | | Sound isolated chamber | |
Ff1010bird | Birds | 21.4 | | |
Automated detection of Hainan gibbon calls for passive acoustic monitoring | Primates (Hainan Gibbons) | 6000 | Bawangling National Nature Reserve in Hainan | |
Rainforest Connection Species Audio Detection | Frogs and birds | | | |
Warblrb10k_public | Birds | 22 | | |