Buildship Scraping Help
I need to scrape websites by cycling through different variations of the same URL (I provide the URL extension), but I can't figure out how to get the crawler to visit a root website (e.g. www.google.com) and then cycle through the URL extensions (www.google.com/1, /2, /3, /4, /5), extracting the data attached to the selector.
Thank you in advance
Hi @opaidios, currently there is no option to directly cycle through different pages. Maybe we can implement something like this: https://github.com/puppeteer/puppeteer/blob/1e66d332b8faf6a15803c0ad36178e56d4dadf7b/docs/api.md#pagewaitforselectorselector-options. @Deepanshu, is it feasible to support this in our current Puppeteer runner?
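For illustration, here is a minimal sketch of that pattern as a standalone script using plain Puppeteer. The base URL, the list of extensions, and the `.result` selector are placeholder assumptions, not part of Buildship's Crawler node:

```ts
import puppeteer from "puppeteer";

// Placeholder assumptions: swap in the real root URL, extensions, and selector.
const BASE_URL = "https://www.example.com/";
const extensions = ["1", "2", "3", "4", "5"];
const SELECTOR = ".result"; // hypothetical selector

async function scrapeVariants(): Promise<string[]> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const results: string[] = [];

  // Visit each URL variant in turn and extract text from the selector.
  for (const ext of extensions) {
    await page.goto(`${BASE_URL}${ext}`, { waitUntil: "networkidle2" });
    // Block until the target element exists before reading it.
    await page.waitForSelector(SELECTOR);
    const text = await page.$eval(SELECTOR, (el) => el.textContent ?? "");
    results.push(text);
  }

  await browser.close();
  return results;
}

scrapeVariants().then(console.log).catch(console.error);
```

The `page.waitForSelector` call is the API linked above: it waits until the target element appears, which avoids extracting from a half-loaded page.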
@opaidios, can you tell us which node you are using to crawl the website?
@Deepanshu I haven't gotten a node working yet; the one I've attempted is 'crawler'. I will try other things today.
thank you guys for looking into this
I had the AI write a node to send data via webhook to Make.com, so hopefully there is a workaround. I tried running crawler in parallel, but it can't select a URL extension from a list of items.
Try adding the list of URLs as items and looping through them: give the Loop node a try, and add the Crawler node inside it.
here's the catch:
the base URL is "https://www.google.com/maps/search/?api=1&query=gym&query_place_id="
and the data that I've extracted so far looks like this:
[
  {
    "lat": 6.2476376,
    "lng": -75.56581530000001,
    "place_id": "ChIJBa0PuN8oRI4RVju1x_x8E0I"
  },
  {
    "lat": 6.2073151,
    "lng": -75.57068579999999,
    "place_id": "ChIJA-ZKeCooRI4RsT_eKlHovT8"
  },
  {
    "lat": 6.211902599999998,
    "lng": -75.5652279,
    "place_id": "ChIJOx9p1igoRI4RzNB2nThQF0M"
  }
]
I need to append each place_id to the end of the URL and extract the data that way (see the sketch below).
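A minimal sketch of that URL construction, assuming the extracted array is held in a variable (here called `places`, a made-up name; the Loop/Crawler wiring in Buildship is not shown):

```ts
// Build one URL per place by appending place_id to the base URL.
const BASE_URL =
  "https://www.google.com/maps/search/?api=1&query=gym&query_place_id=";

interface Place {
  lat: number;
  lng: number;
  place_id: string;
}

// "places" stands in for the extracted array shown above.
const places: Place[] = [
  { lat: 6.2476376, lng: -75.56581530000001, place_id: "ChIJBa0PuN8oRI4RVju1x_x8E0I" },
  // ...remaining places from the extracted data
];

const urls = places.map((p) => `${BASE_URL}${p.place_id}`);
// e.g. https://www.google.com/maps/search/?api=1&query=gym&query_place_id=ChIJBa0PuN8oRI4RVju1x_x8E0I
```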
I sent the data to a Replit script that returns a list of URLs with the place_id appended instead; I'll try crawler again with the new data.