Almendra10
Hey everyone,
I'm currently working on a project where I need to extract content from a variety of document formats. Specifically, I'm looking for an API (maybe Buildship's endpoint) that can handle the following file types:
- Text and Markdown Files: .TXT, .MD
- Text Documents: .DOC, .DOCX, .ODT, .OTT, .RTF
- Presentations: .PPTX, .POTX, .ODP, .OTP
- Spreadsheets: .XLS, .XLSX, .XLSB, .XLSM, .XLTX, .CSV, .ODS, .OTS
- HTML and XML Files: .HTML, .HTM, .ATOM, .RSS, .XML
- PDF: .PDF
- Diagrams and Graphics: .ODG, .OTG
The goal is to send any of these documents to the API and receive the extracted content. Does anyone have experience with this on Buildship?
3 Replies
Hey @Almendra10! I also saw your post on the Bubble forum. I can help you with this if you want.
How do you want the extracted data to be structured? As text or in JSON?
Hey @Seb95,
That would be great. Regarding text or JSON, I need plain text.
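For anyone landing on this thread later, here is a rough sketch of the "send a document, get text back" idea in Python, using only the standard library. It is not BuildShip's API and not necessarily what was built here: `extract_text` is a hypothetical helper name, and it only covers a subset of the list above (.TXT, .MD, .CSV, .HTML, .HTM, .XML, .RSS, .ATOM, .DOCX). PDF, legacy .DOC, the OpenDocument and spreadsheet formats, and anything image-based would need third-party libraries or OCR, so they are deliberately left out.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from pathlib import Path

# WordprocessingML namespace used inside .docx files.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


class _TextCollector(HTMLParser):
    """Collects visible text and skips <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def _xml_text(data: bytes) -> str:
    """Return all non-empty text nodes of an XML document, one per line."""
    root = ET.fromstring(data)
    return "\n".join(t.strip() for t in root.itertext() if t.strip())


def _docx_text(data: bytes) -> str:
    """Return the paragraphs of a .docx file, one per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def extract_text(filename: str, data: bytes) -> str:
    """Hypothetical entry point: pick an extractor from the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in (".txt", ".md"):
        return data.decode("utf-8", errors="replace")
    if ext == ".csv":
        rows = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
        return "\n".join(" ".join(row) for row in rows)
    if ext in (".html", ".htm"):
        parser = _TextCollector()
        parser.feed(data.decode("utf-8", errors="replace"))
        parser.close()
        return "\n".join(parser.parts)
    if ext in (".xml", ".rss", ".atom"):
        return _xml_text(data)
    if ext == ".docx":
        return _docx_text(data)
    raise ValueError(f"unsupported file type: {ext}")
```

Usage would be along the lines of `extract_text("report.docx", open("report.docx", "rb").read())`. Dispatching on the extension keeps each format's logic separate, so more formats can be added one branch at a time, and the function could sit behind whatever endpoint receives the upload.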
Did you guys figure out how to do it? I might need help extracting content from images and PDFs.