Breaking Down Al Web Scraping Techniques
Explore how to generate structured JSON content with AI to intelligently scrape content from a job advert.
Unlock AI web scraping: Extract structured JSON from any URL with Claude!
Master AI-powered data extraction using Claude's tools for clean, formatted results.
Transform messy web data into structured JSON - no regex required!
Introduction to AI Web Scraping Techniques
In this Bubble tutorial video, I'm going to break down my favorite AI web scraping technique so that you can take a URL just like this. It's a job listing, and you can extract out of it core bits of data by getting a structured JSON response back from an AI. In this case we're using Anthropic's Claude, but so we're getting closing date, contract, term and salary.
Examining the Original Job Posting
Let's have a look back at the original post. Here we go. So we're extracting the contract term from here. Closing date. We're asking the AI to restructure that date and we're taking the salary. Now the salary. This is the perfect example of why using an AI in combination with a web scraper is really handy because we can't just say take the number after salary. We're using the AI's intelligence to judge what bit of text to scrape and we're using it to reformat the date.
Reviewing the Extracted Data
So let's go back to our example. We've reformatted the date and we've also got the salary and it happens to have picked the lower end. But we could of course improve our prompt, to make that better.
Introducing Planet No Code Resources
But before I dive into that, have you checked out our website? There is a link down in the description because as of this point of recording, we've got 351 Bubble tutorial videos accessible through our website, some of which are exclusive to our paid members. What do our members get? They get access to all of our videos that we've ever recorded. They get access to a no code community. And our courses. We've got more courses on the way currently. We've got build your own chat GPT clone course. And I'm working on a Bubble essentials course, which is expected to be published in just a few weeks. And you can get access to all of that by becoming a planet no code member and clicking that link down in the description to get started.
Setting Up the Web Scraper
So we're going to start with our web scraper. Now, there are a number of web scrapers that I've used over the years. Page to API is one that I've done previous videos on. But in this case I'm using a new one that I found only a few weeks ago called use scraper. And so what we need to do as a no code app developer is take their API documentation, which looks like this, and translate it into our Bubble app.
Understanding API Documentation
So we need to have an understanding of what's going on here and, and take this. What is code? Yeah, it's no code but it's code and put it into our Bubble API connector so we can see that the destination of where we're going to send data to, is going to be this URL here, this endpoint, it needs to be a post request. We need authorization, our API token, our API key in the header, in the data section we need to send across the URL we wish to scrape and the format.
Configuring the API Connector in Bubble
And this is a really common layout for most API documentation because if I scroll down it tells us exactly what we can swap in and out. So we can say I want the results in markdown. That's what I'm going to go for. And if I go back into our Bubble editor, I'm in plugins. I've added in the API connector. You can get that by just clicking add plugins if you're completely new to Bubble. And then I've added in an API.
Setting Up API Authentication
And so I've labeled it use scraper and I'm saying private key in header authorization. And then in that box which I've blurred out, I've got my bearer. Well have I got bearer? Well, let's just have a look. I've got to put the word bearer in front of my token. Okay.
Creating the API Call in Bubble
And then I've added an API call and I've labeled this scrape webpage in Markdown. I've set it as an action because I want to run this in a workflow. And then I say, well it's a post and it goes to that endpoint which I highlighted in the documentation just a moment ago. And then in body all I've done is paste, everything within the opening speech marks there.
Inserting Dynamic Values in the API Call
Let's copy that, paste it in there. And then Bubble allows me to insert dynamic values into the code. So I've opened up triangle brackets and I've created like a merge field or a merge tag, called it URL that then opens up this box down here and I've put in the URL of the job posting just down here and I've unchecked it from private because this is data that my users can have access to.
Securing API Keys in Bubble
The flip side of that is my API key. I need to ensure that that is marked as a private key in header because I don't want my Bubble app users to have access to my private key.
Testing the Web Scraper
I've then initialized it and in fact let's run this and let's see what happens. So it's scraped the web page and we then get back this text section here and so this is the markdown of the webpage containing all of those bits of vital details.
Introducing the LLM for Data Processing
Now it's unstructured here and it's a mess and that's where we're going to use the LLM, the large language model and in this case we're using Claude that's so we're going to use Claude to tidy it up.
Setting Up the Anthropic Claude API
So I've scrolled up because we need to make another API connection, and this time we're connecting to Anthropic's Claude, which if you haven't checked it out, is really worthwhile. You may have heard of OpenAI, GPT 4.0, very popular at the moment. Well, Claude is the hottest LLM of the week. Changes every week, doesn't it?
Advantages of Using Claude for Structured Data
And the reason I'm using Claude is because Claude makes it very easy to get structured data back. Let me show you what I mean. I'm going to explain, Oh, let's get rid of that. I'm going to explain every step of this as I go. But right now if I click initialize, we get back structured content. And it isn't just your regular chat message from an LLM, it is particular fields.
Examining the Structured JSON Response
So we get back, these fields here, it's probably easier if I go down to raw data. Here we go. So we get back our closing date, our contract term, and our salary. Again, it's not a chat message, it's structured JSON. And this is what makes it so easy now in the Bubble app to say display closing date or display contract term.
Advantages of Structured JSON Data
I'm not having to, I'm not getting back. Here is your job advert here, are the fields and having to then manually extract out, just not as reliable as it could be. I'm getting back structured JSON data. I'm going to click save there.
Explaining the Claude API Setup
So let me explain what's going on here. Well, firstly, this is a fairly standard, the top half of here set up for making an API call to Claude. Now we've got many videos on OpenAI at Claude and indeed the Bubble API connector that you can check out, but I will explain some of it briefly.
Configuring API Headers for Claude
So label it Claude private key and header. And I've done the same thing with the Claude API documentation. I've translated it into my Bubble app. So instead of authorization they want x API key, they want this additional shared header of the Anthropic version.
Setting Up the API Request
We are then making a request, the post request to this endpoint It's set as action. Now let me break this bit down for you, and I think it's easier to show you if I go over to this view.
Configuring the Claude API Parameters
So if I collapse tools, and then tool choices. This looks fairly familiar if you've ever worked with Claude. I'm saying use this model, maximum tokens. And then all I'm saying is what have I got down there? Why have I said messages test? Oh, that, yeah, we've just got a single message in place there.
Understanding Tool Functions in Claude API
Now it's the tool function or function calling. If you're coming from OpenAI, that makes it possible to get back a structured JSON response. Let's break it down.
Defining the Tool Function for Job Details Extraction
So I have tools and then I open up this JSON object and I give it a name, extract job details, I'm just giving it a sensible name for myself and then I describe it. And it is important what words you choose here, because you are trying to nudge the AI in the right direction using kind of human readable, understandable terms.
Specifying Input Schema for Claude API
So I'm saying extract job details from a job advert using well structured JSON. I then have another part, another parameter in the JSON and I say input schema. And I then say it's an object and properties. Now this is where it really matters.
Defining Properties for Job Details Extraction
So if I close that within properties, here is where I'm defining the different fields parameters that I want back from Claude. And so I'm saying get the closing date, it's type is string, its description, I'm saying the closing date of the job application. And then I'm dictating the format that I want the reply. And then basically you do the same thing for the others. I say contract term, and then I give a description again. Prompt engineering is at work here. I'm trying to make it as easy as possible for the LLM, for Claude to understand what I mean and what to extract.
Specifying Required Fields in the API Request
I then mark them all down here as required. And I say tool choice type tool, name. And then this name has to match what I put up here. And what we've got here between line 32 and 35. Is described in the Claude documentation as our way of saying we don't want a chat message or a mix of chat and JSON back, we just want the output of the tool as we've described higher up in the prompt.
Referencing Claude API Documentation
Now I'm going to just go into the Claude API documentation and show you how I've built this. And so you can kind of start from the right foundation and add your own parameters in the. I think this page here is the most helpful breakdown of how to insert a tool into your Claude API request.
Comparing API Documentation to Bubble Implementation
So you'll notice that this part here is very similar to what I've got in the API connector. And so all I've done is expanded on the property fields here. So at the moment it's saying location and then I've just added in, if I go back to the editor, in fact, let me go back to that beautified, JSON.
Adding Custom Properties to the API Request
So yeah, we've got properties and I've just added in the closing date, contract, term and salary. But you can see that the lines eleven through to 20, 23, are effectively what you have here.
Tips for Building API Requests in Bubble
So I'd advise that you start with here, you start with this bit of code, you copy this into your Bubble API connector, and then you start adding in your additional properties. Now I found in kind of debugging my own errors with using JSON that JSON validates was very helpful because I would miss a comma or a speech mark there and I wouldn't close some curly braces.
Handling JSON Syntax in Bubble
When you're working with JSON. It's so, it's a very sensitive syntax. If you make a mistake, you're going to get an error. So I'm I made my mistake was just adding in six different properties all in one go and then running it and realizing I've made a mistake in one. So I would really build this in Bubble in a manner of add something in, test it, add in a little bit more, test it again.
Setting Up the Bubble Page Elements
So how does this all plug into Bubble? Well, if I go back into my page, I've got a input here, I've got a button and I've got a text label. Let me break that down for. So my only workflow on the page is on the button and I'm going to say edit workflow to view it.
Creating the Workflow for Web Scraping
So what I've got here is I've got use web scraper and this is that first API call we looked at. And the way I get to that, once I've initialized it all through the Bubble API connector, is I can either search for it here or it's going to be under
Get the Complete Bundle for Just $99
Access 3 courses, 390+ tutorials, and a vibrant community to support every step of your app-building journey.
Start building with total confidence
No more delays. With 30+ hours of expert content, you’ll have the insights needed to build effectively.
Find every solution in one place
No more searching across platforms for tutorials. Our bundle has everything you need, with 390+ videos covering every feature and technique.
Dive deep into every detail
Get beyond the basics with comprehensive, in-depth courses & no code tutorials that empower you to create a feature-rich, professional app.
Save over 70%!
Valued at $80
Valued at $85
Valued at $30
Valued at $110
Valued at $45
Can't find what you're looking for?
Search our 300+ Bubble tutorial videos. Start learning no code today!
Have questions?
We have answers!
Find answers to common questions about our membership plans, programs, and more.
We're here to help you launch your no code SaaS. Reach out to the team and we'll double check our vast library for useful content. We'll advise you on how we'd tackle the same problem and there's a good chance we'll record the video to help the wider community.
As a Planet No Code member, you'll receive a discount on our Bubble coaching sessions. Monthly members receive a 10% discount, while Annual members receive a 17.5% discount. To redeem your discount, simply log into your account and book a coaching session through our platform.
Our 8-week intensive mentorship program is designed to provide personalized guidance and support to help you accelerate your startup journey. You'll be matched with a startup expert who will work with you one-on-one to set goals, overcome challenges, and make rapid progress.
To apply for the Mastery Program, simply click the "Request Invitation" button on our pricing page and fill out the application form. Our team will review your application and schedule a call with you to discuss your goals and determine if the program is a good fit for your needs.
We accept all major credit cards, including Visa, Mastercard, American Express, and Discover.
While we don't offer a free trial, we do provide a 14-day money-back guarantee. If you're not completely satisfied with your membership within the first 14 days, simply contact our support team, and we'll issue a full refund.