Introduction to Extracting Data from Text in Bubble.io
This Bubble tutorial video is for anyone who isn't a Regex wizard and has been struggling to find a way to extract data from a large block of text. In this case, we're imagining that we've passed in an email, and all of the email's metadata and content has come through in a single text field. So how do we go about extracting it? Well, this is a little technique that I developed when I was faced with a similar problem a few weeks ago.
Using the Split By Function in Bubble.io
I can refer to my multiline text input here so that we can see the data, but you could use the same approach if the data was coming in through an API. So if I do that and then click preview, you'll see that everything is carried across, but I can use split by to target specific parts of our large text data. So I can say "from:" and put a space in, and that's going to split the text around "from: ". The first bit of data is going to be blank, but the second bit is going to be everything from that point onwards.
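To make the behaviour described above concrete, here is a minimal sketch in Python, whose `str.split` is a rough analogue of Bubble's split by operator. The sample email text is hypothetical (the text used in the video is not reproduced here), and the capitalisation of the "From: " label is an assumption. Note that Bubble counts list items from 1, while Python counts from 0.

```python
# Hypothetical sample text; the real email from the video is not shown here.
email_text = "From: Acme Store <orders@example.com>\nSubject: Order ID #48213"

# Split around the fixed label, including the trailing space.
parts = email_text.split("From: ")

first_item = parts[0]   # Bubble's "first item": blank, nothing precedes the label
second_item = parts[1]  # Bubble's "item #2": everything after the label

print(repr(first_item))
print(second_item)
```

The separator has to match the text exactly, including the space after the colon, otherwise the split leaves the text in one piece.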
Extracting Specific Data from Text
So I can go for item 2. Let's just test that everything's working. Okay, that provides me with the rest of the text. I can then do split by again, and this time I can do a space and the arrow, meaning the opening angle bracket. Basically, what I'm looking for is something that isn't going to change, something that's going to be fixed every time this piece of text comes through. So "from" is a label that is going to be fixed every time, and the formatting of the email address is also going to be fixed every time.
Extracting the Sender's Name
If I want to extract the name of the sender, I can do this, which is to have the space and then the arrow. This time it's splitting the text at that point, so everything before it is item one, and everything after it, at least until we get to another angle bracket, is item two, then item three, and so on in our list. So I can just choose first item, and there you go: it has reliably extracted the business name, or the sender name, from the from field.
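The two chained splits can be sketched in Python as follows. This is an illustration under assumptions rather than Bubble's own syntax: the function name `extract_sender_name` and the sample email are hypothetical, and the sender is assumed to be formatted as a name, a space, and then the address in angle brackets.

```python
def extract_sender_name(email_text: str) -> str:
    """Return the sender's display name from a 'From: Name <address>' line."""
    # First split: take everything after the fixed "From: " label (Bubble's item #2).
    after_label = email_text.split("From: ")[1]
    # Second split: take everything before the space and opening angle bracket
    # (Bubble's first item).
    return after_label.split(" <")[0]


# Hypothetical sample text for illustration only.
sample = "From: Acme Store <orders@example.com>\nSubject: Order ID #48213"
print(extract_sender_name(sample))
```

Both separators are chosen because they are fixed parts of the format, so the name between them can vary freely.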
Extracting Order Numbers from Text
Let's do another example. Let's say I want to extract the order number. So again, I look for structure that's going to be consistent. I can do split by, referring to the multiline text input. The hypothetical scenario here is that I want to take details from an order email that's been passed into my Bubble app, and I can be fairly confident that the order email's subject line is unlikely to change apart from the number, and that's the bit I want to extract.
Refining the Extraction Process
If I wanted to be really thorough, I'd think about whether there are any scenarios where the text I'm splitting by is going to appear elsewhere. By making the split by text separator longer, I reduce the chance that it's not going to work. So I can go "order ID", and I can even add the space and the hash, and then in this instance I can just go for item 2, because there's nothing left after the number at all. Hit preview. Okay, so the order number is what I've entered, and it extracts the order number reliably.
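A short Python sketch shows why a longer separator is safer. The subject line below is hypothetical and deliberately contains the word "Order" twice, so a short separator splits in the wrong place while the longer "Order ID #" separator does not.

```python
# Hypothetical subject line in which "Order" appears more than once.
subject = "Subject: Your Order has shipped - Order ID #48213"

# Short separator: matches twice, so item 2 is not the order number.
short_pieces = subject.split("Order")

# Longer separator: matches only the fixed label right before the number.
long_pieces = subject.split("Order ID #")
order_number = long_pieces[1]  # Bubble's "item #2"

print(short_pieces)
print(order_number)
```

The longer the fixed text in the separator, the less likely it is to appear anywhere else in the message.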
Handling Multiple Lines in Text Extraction
Let's make it a little more interesting as a final example. Let's put some space in and then add another piece of text. Okay, so how would I extract the order number now? Because if I go to preview, it's going to return everything after the separator, and that's why I end up with the two lines. Well, you can split by a new line. I'm still going to target item 2, because that's everything after my split by text separator, and then I can split by again, and this time I can split by two line breaks.
Conclusion: Reliable Text Extraction Techniques
Oh, and I need to choose the first item. So I'm saying: when you find two line breaks, as we've got here, go with the first item. That's everything before the two line breaks, and there we have it. So those are just a couple of techniques that I've found to let you extract data from a larger volume of text and take out exactly what you need, as long as you've got consistent markers. They need to be 100% consistent labels, or basically locations, in the text you're working with in order to reliably target what you're trying to extract.
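The multi-line case can be sketched in Python like this. The function name `extract_order_number` and the sample text are hypothetical, and the blank line is assumed to be two `\n` characters; text arriving from an email or API may use `\r\n` instead, in which case the separator would need to change.

```python
def extract_order_number(text: str) -> str:
    """Return the text between 'Order ID #' and the next blank line."""
    # Everything after the fixed label (Bubble's item #2).
    after_label = text.split("Order ID #")[1]
    # Everything before the first pair of line breaks (Bubble's first item).
    return after_label.split("\n\n")[0]


# Hypothetical sample: order number, a blank line, then more text.
sample_text = "Order ID #48213\n\nThanks for shopping with us."
print(extract_order_number(sample_text))
```

If no blank line follows the number, the second split leaves the text in one piece, so the first item is simply everything after the label.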