Automated Sound Detection Identified a Blue Grosbeak, but Was It Accurate?
Dr. Jim Kellam, Associate Professor of Biology, Saint Vincent College
It happened while I was on vacation and had no internet service. Bill Powers, President of PixCams, Inc., texted me to say that our recently installed bird feeder camera with microphone had picked up the song of a Blue Grosbeak at Winnie Palmer Nature Reserve (WPNR). Blue Grosbeaks are rare birds here in western Pennsylvania. There are some that breed in neighboring Allegheny County at the Imperial Grasslands. However, the last time one was seen in Westmoreland County was in 1976. If you’ve never seen one, picture a Northern Cardinal, but instead of red, it is deep blue. Elsewhere in the southern and central U.S., the Blue Grosbeak can be found nesting in shrubby roadside and agricultural habitats. WPNR is full of habitat like that, and it’s true that the species’ breeding range is slowly expanding northward. In winter, all individuals migrate to the Caribbean and Central America.
The Blue Grosbeak’s song was detected at WPNR on July 8, 2022, at five separate times between 2 p.m. and 6 p.m. The bird never appeared on camera, but the camera’s audio is fed through the BirdNET-Pi system, which processes the sound and attempts to identify the bird species that made the vocalizations. BirdNET is one of several artificial neural networks that have been developed for this purpose. Readers may already be familiar with BirdNET because the popular bird-identification app called Merlin uses it to analyze songs. BirdNET can be configured to give an estimate of confidence for each detection. The songs on July 8 had an average confidence of 83%, which I consider high. The problem is, 83% isn’t the same as 100%, and since these computer models are so new, I’m not sure that a computer being 83% confident means what we think it does. The system can only estimate confidence based on the “training” recordings that investigators used to build and test their algorithm, and if conditions at WPNR (or anywhere else) are different from what the investigators assumed, then the estimate of confidence the system gives is probably too high.
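One way to see why a high score for a rare species deserves extra caution is the base-rate effect from basic probability. The sketch below is only an illustration: the function name and all three input numbers are made up for the example, and it does not reproduce how BirdNET computes its confidence score. It asks how likely a bird is to be truly present given a detection, when the only thing that changes is how common the species is at the site.

```python
def probability_present(prior, hit_rate, false_alarm_rate):
    """Bayes' rule: chance the species is really there, given a detection.

    prior            -- assumed chance the species is present before listening
    hit_rate         -- assumed chance of a detection when the bird is present
    false_alarm_rate -- assumed chance of a detection when the bird is absent
    """
    true_detections = prior * hit_rate
    false_detections = (1 - prior) * false_alarm_rate
    return true_detections / (true_detections + false_detections)


# Hypothetical numbers, chosen only to show the effect of rarity.
common_species = probability_present(prior=0.5, hit_rate=0.83, false_alarm_rate=0.05)
rare_species = probability_present(prior=0.001, hit_rate=0.83, false_alarm_rate=0.05)

print(f"Common species: {common_species:.2f}")  # about 0.94
print(f"Rare species:   {rare_species:.3f}")    # about 0.016
```

With these assumed inputs, the same detection is very convincing for a common bird and quite weak for a rare one, because false alarms outnumber true detections when the bird is almost never there.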
Let me stop and explain that I have a great deal of training in statistical testing, but not in computer models like this. I’ve read a few academic papers about BirdNET, but I’m not an expert on computing or song analysis. I’m going to try to summarize some things and in so doing, I may not get the details all right, but hopefully my overall assessments will be mostly right—let’s assume something like 83%!
Getting back to the Blue Grosbeak detections: BirdNET thinks it saw a pattern in the sound that matched the grosbeak. Sound can be easily converted into a visualization called a spectrogram. There are three dimensions to a spectrogram. First, the pitch (aka frequency) is shown. The pitch changes over time, which is the second dimension. A third dimension is the intensity (aka volume) of sound at each pitch. All these values can be input into the computer and a new vocalization can be compared to a vast collection of sounds with known bird species identity. But no system is perfect, and errors will be made. False-positives occur when the computer says the bird is there when in fact it is not. False-negatives occur when the bird is there but it is not detected by the computer.
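For readers curious about how sound becomes a spectrogram, here is a minimal sketch in plain Python. It is not BirdNET's code; the function name, the frame sizes, and the synthetic two-tone "song" are all assumptions made for illustration. The sound is cut into short overlapping frames (time), each frame is broken into its component pitches with a Fourier transform (frequency), and the strength of each pitch is recorded (intensity).

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_size=64, hop=32):
    """Return (times, freqs, magnitudes) from a short-time Fourier transform."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1
    freqs = [k * sample_rate / frame_size for k in range(n_bins)]  # pitch axis, Hz
    times, magnitudes = [], []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        for k in range(n_bins):
            # Naive discrete Fourier transform for one frequency bin.
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            row.append(abs(total))  # intensity at this pitch and time
        times.append(start / sample_rate)  # time axis, seconds
        magnitudes.append(row)
    return times, freqs, magnitudes


# Synthetic "song" (hypothetical): a 2000 Hz tone that jumps to 3000 Hz.
sample_rate = 8000
samples = [math.sin(2 * math.pi * (2000 if i < 512 else 3000) * i / sample_rate)
           for i in range(1024)]

times, freqs, mags = spectrogram(samples, sample_rate)
loudest = [freqs[row.index(max(row))] for row in mags]
print(loudest[0], loudest[-1])  # loudest pitch in the first and last frames
```

Real tools use a fast Fourier transform and much longer recordings, but the three dimensions are the same ones described above.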
People are starting to use BirdNET through the Merlin app and via other applications such as BirdNET-Pi and BirdWeather. They wonder how accurate these tools really are. The answer is not simple. One frequently cited but unpublished study suggests a 15-20% error rate. My own unpublished trials fall within this range at 16%. The researchers who developed BirdNET explained in their own publication that BirdNET is most accurate when the bird vocalizations are from species that were common to the listening stations at Sapsucker Woods in Ithaca, NY, where most of the sound recordings used to teach the system were made. The listening stations were near bird feeders, so birds that visit bird feeders have the highest probability of being identified accurately. The researchers also designed BirdNET to return a higher confidence estimate when a species is judged common at the location of the listening device. BirdNET works best when a higher-quality microphone is used, there is less background noise, and individual songs do not overlap (5).
The accuracy of identifications also varies depending on the species in question. BirdNET appears to do poorly with sounds from the European Starling (1), Northern Mockingbird (1), Barred Owl (2), and Common Loon (2). It’s exceptionally good at identifying Brown Thrasher (3) and Red Crossbill (4). No word on the Blue Grosbeak! The aforementioned results do not have a straightforward explanation, but it is important to note that the collection of known bird sounds that BirdNET compares to the new sound does not include the same number of recordings for each species. On average, the scientists used 184 bird songs to teach the system how to identify each species (5). However, there were some bird species for which far fewer sound recordings were available. Maybe 184 recordings sounds like plenty to you, but do they include the sounds made by young of the year, who may not sound like the adults? Male birds sing the most, but females of some species sing, and they sound a little different. Does the BirdNET database include enough female songs to prevent false-negatives? Birds have also been shown to have many regional dialects (like humans have accents). BirdNET could mistake those for another species or it might fail to recognize them. Finally, each species has a variety of songs and calls that it makes in different contexts. The model likely doesn’t include the full repertoire.
The BirdNET researchers say the system’s accuracy varies from 97% to 60%, depending on background noise (5; the percentages here refer to a statistical measure called “area under the curve”).
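For the statistically inclined, “area under the curve” can be made concrete with a short sketch. It assumes the measure is the area under the ROC curve, which is the most common meaning; the function name and the six scores and labels are hypothetical and are not BirdNET results. Under that assumption, the value equals the probability that a randomly chosen correct detection receives a higher confidence score than a randomly chosen incorrect one.

```python
def area_under_curve(scores, labels):
    """ROC AUC by pairwise comparison (ties count as half a win)."""
    correct = [s for s, ok in zip(scores, labels) if ok]
    incorrect = [s for s, ok in zip(scores, labels) if not ok]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect detection")
    wins = 0.0
    for c in correct:
        for i in incorrect:
            if c > i:
                wins += 1.0
            elif c == i:
                wins += 0.5
    return wins / (len(correct) * len(incorrect))


# Hypothetical confidence scores, with True where a human confirmed the bird.
scores = [0.95, 0.83, 0.70, 0.60, 0.40, 0.30]
labels = [True, True, False, True, False, False]

print(round(area_under_curve(scores, labels), 3))  # 8 of 9 pairs ranked correctly
```

A value of 1.0 would mean every correct detection outscored every incorrect one, while 0.5 would mean the scores are no better than a coin flip at telling them apart.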
I wonder if our Blue Grosbeak is a false-positive. After all, if it was real, why did it show up on just one day in the middle of summer? If it were breeding here, then surely we would have heard or seen the bird in person before now. If it was just passing through, then why would it bother to sing at all? Song is used for territorial defense or mate attraction. I want to believe, but I have doubts.
This reminds me of the automated sound recording devices that picked up sounds matched to the Ivory-billed Woodpecker, a bird recently declared extinct by the U.S. Fish & Wildlife Service. I don’t want to get into an argument about whether the bird still exists or not, but I do have doubts about the sound-based evidence gathered and analyzed by these computer systems. If false-positives are common enough for species that we know exist and have a lot of data for, then surely false-positives are all the more likely for a bird that is so rare that it is now considered extinct, and for which we have very limited numbers of sound recordings. Critics of the Ivory-bill’s existence say that Blue Jays could make the same kinds of sounds picked up by the automated sensors. I agree.
For full disclosure, I should say that a birder better than I has told me I’m too conservative with my identifications; that is, it takes me more evidence than the average birder to be comfortable with an identification.
I listened to the recordings BirdNET identified as Blue Grosbeak. The songs do sound like a Blue Grosbeak, but not exactly. In fact, I think some of them sound more like a Common Yellowthroat, a small yellow-and-black warbler that is in fact common to WPNR and all of North America. This got me curious about whether BirdNET was properly detecting the Common Yellowthroat, so I used the BirdNET-Pi interface to visit multiple sites using the system in Pennsylvania, Tennessee, and Virginia. None of them listed the Common Yellowthroat as being detected this week. I am certain that this is an example of a false-negative because the species is so common.
A false-negative can also occur if a species is nearby but doesn’t vocalize. Or if it does vocalize, maybe it is too far away to be picked up by the microphone, and/or maybe there is background noise that prevents the computer from identifying it. The scientific implication of a false-negative is that a rare species would go undetected, and we would wrongly assume that it is not at a particular site. It would therefore not be protected from harm or aided by management practices that could be put in place to sustain it.
The scientific implications of a false-positive are just as harmful as those of a false-negative. This is due to how the sound data can be used to study bird populations. Let’s say I would like to study the habitat features at sites where Blue Grosbeaks occur. To do that, I’d look at the vegetation around WPNR, at the Imperial Grasslands in Allegheny County, and in southeastern Pennsylvania where the species is also found. I could then come up with a model to predict what other locations in the state have a similar habitat structure and propose to the Game Commission that those locations be protected. But if the Blue Grosbeak does not live at WPNR, then the data I’ve used to create my model are faulty, and the Game Commission might then be wasting resources on protecting the wrong habitat. Models are only as good as the data that are used to develop them.
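The two kinds of error discussed above can be tallied by comparing what the system reported with what a human observer confirmed. The sketch below is a minimal illustration: the function name and both species lists are hypothetical examples, not actual data from WPNR.

```python
def tally_errors(detected, verified):
    """Sort species into true positives, false positives, and false negatives."""
    detected, verified = set(detected), set(verified)
    return {
        # Reported by the system and confirmed by a person.
        "true_positives": sorted(detected & verified),
        # Reported by the system but not confirmed: the bird may not be there.
        "false_positives": sorted(detected - verified),
        # Confirmed by a person but missed by the system.
        "false_negatives": sorted(verified - detected),
    }


# Hypothetical lists for one day at one listening station.
detected = ["Blue Grosbeak", "Northern Cardinal", "Blue Jay"]
verified = ["Northern Cardinal", "Blue Jay", "Common Yellowthroat"]

result = tally_errors(detected, verified)
for kind, species in result.items():
    print(kind, species)
```

Only the species in the true-positive group would be safe to feed into a habitat model without further checking.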
The scientific community is just getting started on assessing BirdNET. It’s not fair to say “the jury is still out” because that implies we will have a final decision and it will either accept or reject the technology. Instead, technology and science as a whole are a series of steps that steadily progress in the search for truth. The BirdNET algorithms could be made better than they are today, and they will be. More data are added to the system over time, and more refinements are made to the program. For now, I will use it to study patterns among relatively common birds and keep track of the rarities. Each rarity will have to be evaluated on a case-by-case basis and verified in person when possible. As for the Blue Grosbeak, as of today I am fairly sure that it does not exist at WPNR! I went to the bird feeder camera, played songs of both the grosbeak and a Common Yellowthroat, and the system only recognized the grosbeak songs. I’m glad human ornithologists are still needed to figure these things out!