Datasets for bioacoustics

This table presents publicly available bioacoustic datasets that can be used with machine learning.

Want to add a dataset? Follow the instructions in this README file.

Want to help improve this list? Open a pull request on this GitHub repo!

| Title | Taxonomic class | Total duration (hours) | Locality |
| --- | --- | --- | --- |
| Data from: An archive of longitudinal recordings of the vocalizations of adult Gombe chimpanzees | Primates (Gombe Chimpanzees) | 10 | Gombe National Park |
| AnuraSet: A dataset for benchmarking neotropical anuran calls identification in passive acoustic monitoring | Anuran | 27 | 4 sites in the Cerrado and Atlantic forest biomes, Brazil |
| ArcticBirdSounds: an open-access, multi-year, and detailed annotated dataset of bird songs and calls | Birds | 20 | Six locations across the Arctic |
| Audio-Based identification of Beehive states: The dataset | Insects | 96 | Not specified |
| Audio tagging of avian dawn chorus recordings in California, Oregon, and Washington | Birds | 131.25 | 525 sites |
| BIRDeep Audio Annotations | Birds | 8.7 | 9 sites across Doñana National Park |
| To bee or not to bee: An annotated dataset for beehive sound recognition | Insects | 12 |  |
| Bengalese Finch song repository | Birds | 14.75 | Sober Lab at Emory University in Atlanta |
| Data from: A simple explanation for the evolution of complex song syntax in Bengalese finches | Birds | 5 | RIKEN Brain Science Institute in Saitama |
| BirdsongRecognition | Birds | 9.8 | Not specified |
| BirdVox-14SD: a dataset of flight calls with species annotation | Birds |  | Ithaca |
| BirdVox-296h: a large-scale dataset for detection and classification of flight calls | Birds | 296 | Ithaca, across nine locations |
| BirdVox-70k: a dataset for species-agnostic flight call detection in half-second clips | Birds | 3 | Ithaca, in six different locations |
| BirdVox-ANAFCC: A dataset for American Northeast Avian Flight Call Classification | Birds |  | North-East USA |
| BirdVox-DCASE-20k: a dataset for bird audio detection in 10-second clips | Birds | 55.5 | Ithaca, in six different locations |
| BirdVox-full-night: a dataset for avian flight call detection in continuous recordings | Birds | 62 | Ithaca, in six different locations |
| BirdVox-full-season: 6672 hours of audio from migratory birds | Birds | 6651 | Ithaca, across nine locations |
| Black-and-white ruffed lemur (Varecia variegata) calls for passive acoustic monitoring | Primates (Lemurs) | 60 | Sub-humid rainforest site (Mangevo) in the southeast of Ranomafana National Park |
| AcousticTrends_BlueFinLibrary | Marine Mammals (Blue and fin whales) | 1880.25 |  |
| Data from: Towards the automatic classification of avian flight calls for bioacoustic monitoring | Birds | 0.2166666667 |  |
| Data from: Towards the automatic classification of avian flight calls for bioacoustic monitoring | Birds | 67 | Ithaca and New York City |
| Data from: Towards the automatic classification of avian flight calls for bioacoustic monitoring | Birds | 5.616666667 | Ithaca and New York City |
| Data used in PLoS One article 'Complexity, Predictability and Time Homogeneity of Syntax in the Songs of Cassins Vireo (Vireo cassini)' by Hedley (2016) | Birds |  | Sierra Nevada Mountains |
| Datasets for automatic acoustic identification of individual birds | Birds |  | Outer boundary of České Budějovice town (48°59.5′ N, 14°26.5′ E) |
| A collection of fully-annotated soundscape recordings from neotropical coffee farms in Colombia and Costa Rica | Birds | 34 | Jardín, Colombia and San Ramon |
| Close range vocal interaction through trill calls in the common marmoset (Callithrix jacchus) | Primates (Marmosets) |  | Laboratory |
| Congo Soundscapes, Public Database | Mixed | 2400 | 50 sites in a 25 km² grid in the tropical rain forest in northern Republic of Congo (the Nouabalé-Ndoki National Park) |
| Cornell Birdcall Identification | Birds |  | World (focused on USA?) |
| DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition | Birds | 25.48666667 |  |
| DCLDE 2022 Raw Passive Acoustic Data | Cetaceans and seabirds |  | US waters around the islands |
| Labeled songs of domestic canary M1-2016-spring (Serinus canaria) | Birds | 3 | Sound-isolation chamber |
| EDANSA-2019: The Ecoacoustic Dataset from Arctic North Slope Alaska |  | 27 | North Slope of Alaska |
| An annotated dataset of Egyptian fruit bat vocalizations across varying contexts and during vocal ontogeny | Bats |  | Herzliya |
| An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information | Birds | 6.4 | Powdermill Nature Reserve |
| An acoustically isolated European starling song library | Birds |  | Acoustically isolated chambers |
| GardenFiles23 |  | 700 | A large back garden of approximately 80 m² with fruit trees and other vegetation |
| The Vocal Repertoire of Adult and Neonate Giant Otters (Pteronura brasiliensis) | Mammals (Giant Otter) |  | Five lakes in Peru (wild groups), and three German zoos (captive groups) |
| Data from: Superregular grammars do not provide additional explanatory power but allow for a compact analysis of animal song | Primates (Gibbons) |  | Primate Research Institute, Kyoto University |
| DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection Development Set | Mammals (Hyenas) | 5 |  |
| A collection of fully-annotated soundscape recordings from the Island of Hawai'i | Birds | 51 | Four locations |
| HumBugDB: a large-scale acoustic mosquito dataset | Insects (Mosquitoes) | 35 | 8 experimental sites |
| ZooniverseData | Insects (Mosquitoes) | 22 | 4 sites worldwide |
| The iNaturalist Sounds Dataset | Birds, Mammals, Insects, Reptiles, Amphibians | 1200 |  |
| Data from: Longitudinal recordings of the vocalizations of immature Gombe chimpanzees for developmental studies | Primates (Gombe Chimpanzees) | 10 | Gombe National Park |
| InfantMarmosetsVox | Primates (Marmosets) | 58.33333333 | Two separate sound-proofed recording rooms |
| InsectSet32: Dataset for automatic acoustic identification of insects (Orthoptera and Cicadidae) | Insects (Orthoptera and Cicadidae) | 1 |  |
| InsectSet47 & InsectSet66: Expanded datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae) | Insects (Orthoptera and Cicadidae) | 22 |  |
| InsectSet47 & InsectSet66: Expanded datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae) | Insects (Orthoptera and Cicadidae) | 24 |  |
| Dataset: InsectSound1000 | Insects | 115 | Julius Kühn-Institute |
| DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection Development Set | Birds (Jackdaw) | 0.17 |  |
| A collection of annotated soundscape recordings from western Kenya | Birds | 32 | Various sites west and southwest of Lake Baringo |
| Datasets for automatic acoustic identification of individual birds | Birds | 1.333333333 | Northern Bohemia, Czech Republic (50°23′ N, 13°40′ E), and eastern Hungary (47°33′ N, 20°54′ E) |
| DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection Development Set | Mammals (Meerkats) | 1.17 | Kuruman River Reserve |
| Data from: Distributed acoustic cues for caller identity in macaque vocalization | Primates (Macaques) | 0.7 | Laboratory |
| Marmoset vocalizations | Primates (Common Marmoset) |  | A laboratory in Natal, Rio Grande do Norte |
| MeerKAT: Meerkat Kalahari Audio Transcripts | Mammals (Meerkats) | 1068 | Kalahari Research Centre |
| NIPS4Bplus: Transcriptions of NIPS4B 2013 Bird Challenge Training Dataset | Birds | 1 | Provence region, Andalusia |
| Nocturnal flight calls dataset: long-term acoustic monitoring of birds migrating at night | Birds | 56 | Baltic Sea coast (Dąbkowice, near Darłowo) |
| North American bird species | Birds | 0.1666666667 |  |
| A collection of fully-annotated soundscape recordings from the Northeastern United States | Birds | 285 | Sapsucker Woods bird sanctuary in Ithaca |
| Neotropical forest soundscapes with call identifications for katydids | Insects (Katydids) | 4.5 | Two sites in the forest canopy of Barro Colorado Island |
| Sounds of neotropical katydids from Barro Colorado Island, Panama | Insects (Katydids) |  | Barro Colorado Island |
| An Annotated and Segmented Acoustic Dataset of 7 Picidae Species | Birds (Picidae) | 1.4 |  |
| Pin-tailed whydah (Vidua macroura) calls for passive acoustic monitoring | Birds | 6 | Intaka Island Nature Reserve in Cape Town |
| Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics | Mixed | 29.82 |  |
| Dataset for USVSEG performance test | Mammals (Rodents) |  | Laboratory |
| RookID: an annotated dataset of vocalisations produced by individually-identified rooks housed together in an outdoors aviary in France | Birds | 17.4 | Strasbourg |
| Dataset for 'Individual, but not nest cluster or colonial, signature in the loud nest call of captive and wild female rooks (Corvus frugilegus)': annotated audio of brooding female rooks in five colonies with individual-level identification | Birds |  | Strasbourg and Cambridge |
| Silent·Cities |  | 45145 | 317 sites worldwide |
| The Soundwel Database: a labeled pig vocalization repository | Mammals (Pigs) |  |  |
| A collection of fully-annotated soundscape recordings from the southern Sierra Nevada mountain range | Birds | 16.67 | 10 lakes in Sequoia and Kings Canyon National Parks, above 3000 m in altitude, in the Sierra Nevada Mountains |
| A collection of fully-annotated soundscape recordings from the Southwestern Amazon Basin | Birds | 21 | Inkaterra Reserva Amazonica, Madre de Dios |
| Thyolo alethe (Chamaetylas choloensis) calls for passive acoustic monitoring | Birds | 10 | Mount Mulanje Biosphere Reserve |
| Training dataset for NABat Machine Learning V1.0 | Bats |  |  |
| Datasets for automatic acoustic identification of individual birds | Birds | 2.3 | Brdská vrchovina, the Czech Republic (49°84′ N, 14°10′ E) |
| Multimodal Birds Song Dataset: TwoRadioBirds | Birds | 11 | Laboratory, in their home cage |
| Ugandan Bird Vocalizations | Birds |  | 7 locations |
| WABAD: A World Annotated Bird Acoustic Dataset for Passive Acoustic Monitoring | Birds | 84 | 70 recording sites distributed across 13 biomes |
| Western Mediterranean Wetlands Bird Dataset | Birds | 3.4 | Europe mainly |
| DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection Development Set | Birds | 4.6 | Europe mainly |
| Watkins Library | Marine mammals |  |  |
| A collection of fully-annotated soundscape recordings from the Western United States | Birds | 33 | Lassen and Plumas National Forests, Sierra Nevada Mountains in California, USA |
| Data from: A simple explanation for the evolution of complex song syntax in Bengalese finches | Birds | 4.5 | Birds captured in Huben, Mataian and Taipei (China), then recorded in an indoor space (either RIKEN Brain Science Institute in Japan, or locally) |
| Wytham Great Tit Song Dataset | Birds |  | 703 nest sites in Wytham Woods, Oxfordshire, UK (51°46′ N, 1°20′ W) |
| Dataset for 'Benchmarking for the automated detection of southern yellow-cheeked crested gibbon calls from passive acoustic monitoring data' | Primates (yellow-cheeked crested gibbon) | 36 | Andoung Kraleung Village |
| Vocal repertoires from adult and chick, male and female zebra finches (Taeniopygia guttata) | Birds |  | Theunissen Lab, UC Berkeley |
| The Rockefeller University Field Research Center Song Library | Birds |  | Sound attenuation chamber, in Rockefeller University Field Center Colony |
| Zebra Finch Syllable Detector | Birds |  | Sound isolated chamber |
| Ff1010bird | Birds | 21.4 |  |
| Automated detection of Hainan gibbon calls for passive acoustic monitoring | Primates (Hainan Gibbons) | 6000 | Bawangling National Nature Reserve in Hainan |
| Rainforest Connection Species Audio Detection | Frogs and birds |  |  |
| Warblrb10k_public | Birds | 22 |  |
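
As a minimal sketch of how the table above could be used to shortlist datasets for a machine learning project, the snippet below filters a handful of rows by taxonomic class and minimum duration. It assumes the rows were copied by hand into Python; the `Dataset` type and the `shortlist` and `total_hours` helpers are hypothetical names for illustration and are not part of this repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    """One row of the table (hypothetical container, for illustration only)."""
    title: str
    taxon: str
    hours: Optional[float]  # None when the table lists no duration
    locality: str           # empty string when the table lists no locality


# Illustrative subset of rows, copied by hand from the table above.
ROWS = [
    Dataset("AnuraSet", "Anuran", 27.0,
            "4 sites in the Cerrado and Atlantic forest biomes, Brazil"),
    Dataset("BirdVox-14SD", "Birds", None, "Ithaca"),
    Dataset("BirdVox-full-night", "Birds", 62.0,
            "Ithaca, in six different locations"),
    Dataset("HumBugDB", "Insects (Mosquitoes)", 35.0, "8 experimenting sites"),
    Dataset("MeerKAT", "Mammals (Meerkats)", 1068.0, "Kalahari Research Centre"),
    Dataset("WABAD", "Birds", 84.0,
            "70 recording sites distributed across 13 biomes"),
]


def shortlist(rows: List[Dataset], taxon: str, min_hours: float = 0.0) -> List[Dataset]:
    """Keep rows whose class mentions `taxon` and that list at least `min_hours`.

    Rows without a listed duration are skipped, since they cannot be compared.
    """
    return [
        r for r in rows
        if taxon.lower() in r.taxon.lower()
        and r.hours is not None
        and r.hours >= min_hours
    ]


def total_hours(rows: List[Dataset]) -> float:
    """Sum the listed durations, ignoring rows with no duration."""
    return sum(r.hours for r in rows if r.hours is not None)


if __name__ == "__main__":
    birds = shortlist(ROWS, "birds", min_hours=50)
    for r in birds:
        print(f"{r.title}: {r.hours} h ({r.locality})")
    print("Total hours:", total_hours(birds))
```

Matching on a substring of the class keeps entries such as "Insects (Mosquitoes)" reachable with a plain "insects" query; a stricter match may be preferable if the class labels are normalised first.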