LLM extraction method (Anthropic) - field naming
Hey guys, I have one question regarding the LLM extraction process (Anthropic).
In this video (https://youtu.be/OJZzwcgjKW8?si=TnVDDGXjOuomUzSN), at 11:20: should I specify the exact names of the fields to parse?
For example, I want to crawl and scrape 100 sites about marketing agencies (collected by scraping Google SERPs) to get a list of all agencies (name + link). The sites differ in sitemaps, structure, complexity, JS selectors and field naming. Since the same field is named differently on each site, can I give less precise instructions like "marketing agency name" + "marketing agency link" instead of "title" + "link"?
Thanks in advance!
Solution
Hey @fisa, you don't have to specify the exact names, because the LLM will try its best to deduce the data. For more accurate results, though, you should provide field names you know are on the page. But as you're asking, you can also pass those descriptive field names as a comma-separated list in a slightly different format, e.g.:
marketing_agency_name, marketing_agency_link
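A minimal Python sketch of the idea, assuming you build the prompt yourself rather than through BuildShip's extractor node: the comma-separated field names are split into keys, described to the model as hints rather than literal page labels, and the model's JSON reply is normalized to exactly those keys. The helper names (`parse_field_list`, `build_extraction_prompt`, `parse_records`), the prompt wording and the JSON-array reply format are all hypothetical; the actual LLM call (e.g. to Anthropic) is left out.

```python
import json


def parse_field_list(spec: str) -> list[str]:
    """Split a comma-separated spec like 'a, b' into clean field names."""
    return [name.strip() for name in spec.split(",") if name.strip()]


def build_extraction_prompt(fields: list[str], page_text: str) -> str:
    # Hypothetical prompt template (not BuildShip's real prompt): the field
    # names are presented as descriptive hints, since each site labels them differently.
    field_lines = "\n".join(f"- {f}" for f in fields)
    return (
        "Extract every marketing agency listed on the page below.\n"
        "The field names are descriptive hints; the page may label them differently.\n"
        f"Return a JSON array of objects with exactly these keys:\n{field_lines}\n\n"
        f"PAGE:\n{page_text}"
    )


def parse_records(raw: str, fields: list[str]) -> list[dict]:
    """Parse the model's JSON reply and keep only the requested keys.

    Missing keys become None; extra keys the model invented are dropped.
    """
    data = json.loads(raw)
    if isinstance(data, dict):  # tolerate a single object instead of an array
        data = [data]
    return [{f: item.get(f) for f in fields} for item in data]


if __name__ == "__main__":
    fields = parse_field_list("marketing_agency_name, marketing_agency_link")
    prompt = build_extraction_prompt(fields, "<html>...</html>")
    # Send `prompt` to the LLM here; below is a stand-in reply for illustration.
    reply = '[{"marketing_agency_name": "Acme", "marketing_agency_link": "https://acme.example"}]'
    print(parse_records(reply, fields))
    # prints [{'marketing_agency_name': 'Acme', 'marketing_agency_link': 'https://acme.example'}]
```

Normalizing the reply to the requested keys keeps the output uniform across all 100 sites, even when the model returns slightly different key sets per page.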