Comparing Speech-to-Text APIs for Bubble: Whisper AI & AssemblyAI
There are some incredibly powerful speech-to-text APIs available that you can link right into your Bubble app, allowing your users to upload audio and video files and create a transcript from them. We've already got a video showing you how to use the Whisper API to convert speech to text, but I want this video to be a quick comparison with another service, AssemblyAI.
Price Comparison
The first key point, I would say, is price. Measured per minute of audio, the Whisper API costs 0.006 dollars, whereas the AssemblyAI API costs 0.015 dollars. So AssemblyAI is roughly two and a half times the price of the Whisper API.
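To put those per-minute figures side by side, here is a minimal Python sketch of the arithmetic. The two rates are the ones quoted above; the function names are just illustrative, and the rates may well have changed since recording.

```python
# Per-minute rates as quoted in this video (they may have changed since).
WHISPER_USD_PER_MIN = 0.006
ASSEMBLYAI_USD_PER_MIN = 0.015


def transcription_cost(minutes: float, usd_per_minute: float) -> float:
    """Cost in dollars of transcribing `minutes` of audio at a given rate."""
    return round(minutes * usd_per_minute, 4)


def price_ratio() -> float:
    """How many times more AssemblyAI costs than Whisper per minute."""
    return round(ASSEMBLYAI_USD_PER_MIN / WHISPER_USD_PER_MIN, 2)


if __name__ == "__main__":
    hour = 60
    print("Whisper, one hour:", transcription_cost(hour, WHISPER_USD_PER_MIN))
    print("AssemblyAI, one hour:", transcription_cost(hour, ASSEMBLYAI_USD_PER_MIN))
    print("Ratio:", price_ratio())
```

So for an hour of audio, the gap is 0.36 dollars against 0.90 dollars.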
Limitations of Whisper API
But there are some limitations of Whisper that, in a side project I'm building, have led me to use AssemblyAI. One of the things that is going to restrict you is that Whisper currently does not accept files larger than 25 megabytes. That's going to be particularly difficult if you want to transcribe videos, as an HD video is easily going to exceed 25 megabytes.
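If you did have a little code to hand, the size check itself is simple. This is only a sketch: the constant treats "25 megabytes" as 25,000,000 bytes, which is my conservative assumption rather than a documented figure, and the function names are hypothetical.

```python
import os

# Assumption: "25 megabytes" read conservatively as 25,000,000 bytes.
WHISPER_MAX_BYTES = 25 * 1000 * 1000


def fits_whisper_limit(size_bytes: int) -> bool:
    """True if a file of this size is within the assumed Whisper upload limit."""
    return size_bytes <= WHISPER_MAX_BYTES


def file_fits_whisper_limit(path: str) -> bool:
    """Same check, but reading the size of a file on disk."""
    return fits_whisper_limit(os.path.getsize(path))
```

A short voice memo would pass this check, whereas an HD video of any real length would not.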
Challenges for No-Code Builders
And also, if you're a no-code builder like me, you're probably not going to have the technical skill to incorporate a library that does compression. You'd have to use another external service to compress your audio or video files, and that increases your overall cost. That said, the Whisper API accepts a good range of common formats, and I have to say it has the edge on speed.
Bubble API Connector and Response Time
When you send a request from the Bubble API connector to Whisper, you get a response back very quickly. But Bubble is actually sitting there waiting for that response. Let's park that for a moment, because it's actually one of the shortfalls of Whisper compared to AssemblyAI.
AssemblyAI Benefits
Now, if we look at AssemblyAI's pricing, they price it per second, but as I say, that works out at 0.015 dollars a minute. So yes, about two and a half times the price of the OpenAI Whisper API. But here is where you get the benefits of AssemblyAI. One of them is that although it takes slightly longer to process, you can have your response sent to you, or at least be notified that it's ready, using a webhook.
Advantages of AssemblyAI for Bubble Apps
This means that the Bubble API connector is not waiting for a response to come back from AssemblyAI. So even if it takes five minutes, because you've uploaded some huge audio file, your Bubble app can be notified that the transcript is ready, and then pull it in and process the data. You're not left hanging, with your users watching the loading bar go across the top, and you're not restricted by the fact that, as of the time of recording, the Bubble API connector times out somewhere between 50 and 60 seconds.
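To make the webhook idea concrete, here is a rough Python sketch of the two halves: the request body you send, and what you do when the notification arrives. In Bubble you'd set this up with the API connector and a backend workflow rather than code. The field names `audio_url`, `webhook_url`, `transcript_id` and `status` are my understanding of AssemblyAI's API and should be checked against their documentation; the function names are hypothetical.

```python
from typing import Optional


def build_transcript_request(audio_url: str, webhook_url: str) -> dict:
    """Body for the 'start transcription' request.

    Assumption: AssemblyAI accepts `audio_url` and `webhook_url` fields.
    """
    return {"audio_url": audio_url, "webhook_url": webhook_url}


def handle_webhook(payload: dict) -> Optional[str]:
    """Return the transcript id if the notification says it is ready.

    Assumption: the notification carries `status` and `transcript_id`.
    Returns None for any other status, so the caller can ignore or log it.
    """
    if payload.get("status") == "completed":
        return payload.get("transcript_id")
    return None
```

The point of the design is that nothing sits waiting: the first request returns straight away, and the transcript is fetched only once the notification says it's complete.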
AssemblyAI Extra Features
Finally, AssemblyAI comes with loads of extra features baked in, if we look at the audio intelligence section. These are things where, if you've got a transcript back from Whisper, you could then pass it into ChatGPT, so GPT-3.5 Turbo or GPT-4, and ask OpenAI's text generation service to create a summary or a sentiment analysis. But with AssemblyAI it's baked in.
Additional Capabilities of AssemblyAI
You can make an API request with an audio file and get back a text summary, et cetera. There are other features shown on this page here, such as chapter detection, redaction of personal information, and topic detection. They've also got the ability to label speakers. There are a number of transcription services that we use at Planet No Code that will label different speakers, and I believe that's possible with AssemblyAI too.
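As a sketch of what switching these extras on might look like, here is a small Python helper that builds the request body. The flag names `summarization`, `sentiment_analysis` and `speaker_labels` are my understanding of AssemblyAI's request parameters, so treat them as assumptions to verify in their docs; the helper itself is hypothetical, and some features may need extra settings or may not combine with each other.

```python
def build_audio_intelligence_request(
    audio_url: str,
    summarize: bool = False,
    sentiment: bool = False,
    label_speakers: bool = False,
) -> dict:
    """Build a transcription request body with optional extras enabled.

    Assumption: the flag names below match AssemblyAI's API.
    Only the features that were asked for are added to the body.
    """
    body = {"audio_url": audio_url}
    if summarize:
        body["summarization"] = True
    if sentiment:
        body["sentiment_analysis"] = True
    if label_speakers:
        body["speaker_labels"] = True
    return body
```

Compare that with the Whisper route, where each of these would be a second request to a text generation model after the transcript comes back.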
Conclusion
So there you have it. I just wanted to do a quick summary of a process we've been through: being incredibly excited and amazed by how accurate the Whisper API is, then coming across these issues that were restricting what we were trying to build, and then finding AssemblyAI. And I can just say I'm immensely impressed with it. For the project we're building, it's well worth the extra cost to be able to work, I suppose, more leanly and with fewer errors in Bubble.