opaidios
BuildShip
•Created by opaidios on 1/26/2024 in #❓・buildship-help
Puppeteer
Right now Puppeteer is POSTing to the provided URLs and is supposed to extract the entire HTML source, but it's only returning text.
Is there a function included with Puppeteer in BuildShip that lets me visit the URLs with a headless browser instead of making an API call?
I've attached BuildShip's POST response data and the HTML source that I need.
Thank you
3 replies
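For readers with the same question, here is a minimal sketch of fetching the fully rendered HTML with a headless browser rather than an API call. It assumes the standard `puppeteer` npm package is available in the environment; whether a BuildShip node can launch it directly is not confirmed in this thread. The names `fetchRenderedHtml` and `main` are hypothetical helpers used only for illustration.

```javascript
// Sketch only: load a page in a headless browser and return its rendered HTML.
// `browser` is expected to be a Puppeteer Browser instance (assumption).
async function fetchRenderedHtml(browser, url) {
  const page = await browser.newPage();
  try {
    // Wait until network activity settles so client-rendered markup is present.
    await page.goto(url, { waitUntil: "networkidle2" });
    // page.content() returns the full serialized HTML, not just the text.
    return await page.content();
  } finally {
    await page.close();
  }
}

// Hypothetical entry point: assumes the `puppeteer` package is installed.
async function main(urls) {
  const { default: puppeteer } = await import("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const htmlByUrl = {};
    for (const url of urls) {
      htmlByUrl[url] = await fetchRenderedHtml(browser, url);
    }
    return htmlByUrl;
  } finally {
    await browser.close();
  }
}
```

Calling `main(["https://example.com/"])` would return an object mapping each URL to its HTML source. Passing the browser in as a parameter keeps one browser instance shared across all URLs.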
BuildShip
•Created by opaidios on 1/26/2024 in #❓・buildship-help
Puppeteer
For future low-code scraping users, it'd be great to have an option to pass a list of URLs to the crawler.
I had to change url to urls and input an array instead of a string.
I used GPT to edit the node logic, and the change was simply adding this:
// Send one crawl request per URL in the urls array.
for (const url of urls) {
  const response = await fetch("https://puppeteer.buildship.run/v1/crawl", {
    method: "POST",
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      url, selector, maxRequestsPerCrawl, maxConcurrency, proxyUrls
    })
  });
}
3 replies
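The loop above sends the requests but does not show how the responses are gathered. A minimal sketch of collecting one result per URL follows. It assumes the endpoint returns a JSON body, which this thread does not confirm; `crawlAll` and the `fetchImpl` parameter are hypothetical names added for illustration.

```javascript
// Endpoint taken from the snippet in the post.
const CRAWL_ENDPOINT = "https://puppeteer.buildship.run/v1/crawl";

// Sketch only: POST each URL to the crawl endpoint and collect the results.
// `options` carries selector, maxRequestsPerCrawl, maxConcurrency, proxyUrls.
// `fetchImpl` defaults to the global fetch (Node 18+) and can be swapped out.
async function crawlAll(urls, options = {}, fetchImpl = fetch) {
  const results = [];
  for (const url of urls) {
    try {
      const response = await fetchImpl(CRAWL_ENDPOINT, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ url, ...options }),
      });
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
      // Assumption: the response body is JSON.
      results.push({ url, data: await response.json() });
    } catch (error) {
      // Record the failure and continue with the remaining URLs.
      results.push({ url, error: String(error.message || error) });
    }
  }
  return results;
}
```

Catching errors per URL means one failed page does not abort the whole list, which is usually what you want when scraping many pages.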
BuildShip
•Created by opaidios on 1/23/2024 in #❓・buildship-help
Buildship Scraping Help
I need to scrape websites by cycling through variations of the same URL (I provide the URL extensions), but I can't figure out how to get the crawler to visit a root website (e.g. www.google.com) and then cycle through the URL extensions (www.google.com/1, /2, /3, /4, /5), extracting the data matched by the selector.
Thank you in advance
9 replies
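One way to approach this is to build the full list of URLs up front and then hand that list to the crawler. The sketch below assumes the root URL includes a scheme such as https://, since the built-in URL constructor rejects a bare www.google.com. `buildUrls` is a hypothetical helper, not part of BuildShip.

```javascript
// Sketch only: combine a root URL with a list of path extensions.
// Assumption: `root` includes a scheme, e.g. "https://www.google.com".
function buildUrls(root, extensions) {
  // A trailing slash makes relative resolution append to the root path.
  const base = root.endsWith("/") ? root : root + "/";
  return extensions.map((ext) => {
    // Strip leading slashes so "/2" and "2" resolve the same way.
    const relative = String(ext).replace(/^\/+/, "");
    return new URL(relative, base).href;
  });
}

const urls = buildUrls("https://www.google.com", ["1", "/2", 3]);
console.log(urls);
```

The resulting array can then be passed as the urls input to a crawler node that accepts a list.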