Almendra10 (3mo ago)

Hey everyone,

I'm currently working on a project where I need to extract content from a variety of document formats. Specifically, I'm looking for an API (maybe Buildship's endpoint) that can handle the following file types:

- Text and Markdown Files: .TXT, .MD
- Text Documents: .DOC, .DOCX, .ODT, .OTT, .RTF
- Presentations: .PPTX, .POTX, .ODP, .OTP
- Spreadsheets: .XLS, .XLSX, .XLSB, .XLSM, .XLTX, .CSV, .ODS, .OTS
- HTML and XML Files: .HTML, .HTM, .ATOM, .RSS, .XML
- PDF: .PDF
- Diagrams and Graphics: .ODG, .OTG

The goal is to send any of these documents to the API and receive the extracted content. Does anyone have experience with this on Buildship?
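For reference, the "send a file, get text back" idea can be sketched as a dispatch on file extension. This is only a rough illustration using the Python standard library, not Buildship's API: the `extract_text` helper and its behaviour are hypothetical, and it covers only a handful of the formats listed (.TXT, .MD, .CSV, .HTML/.HTM, .XML/.ATOM/.RSS, .DOCX). PDF, legacy .DOC, the other spreadsheet types, and the OpenDocument formats would need additional tooling that is not shown here.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from pathlib import Path

# WordprocessingML namespace used inside .docx files
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


class _TextCollector(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(filename: str, data: bytes) -> str:
    """Hypothetical helper: return plain text for a few of the listed formats."""
    ext = Path(filename).suffix.lower()
    if ext in (".txt", ".md"):
        return data.decode("utf-8", errors="replace")
    if ext == ".csv":
        rows = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
        return "\n".join(" ".join(row) for row in rows)
    if ext in (".html", ".htm"):
        parser = _TextCollector()
        parser.feed(data.decode("utf-8", errors="replace"))
        parser.close()
        return "\n".join(parser.parts)
    if ext in (".xml", ".atom", ".rss"):
        root = ET.fromstring(data)
        return "\n".join(t.strip() for t in root.itertext() if t.strip())
    if ext == ".docx":
        # A .docx file is a ZIP archive; the body text lives in word/document.xml
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            root = ET.fromstring(archive.read("word/document.xml"))
        paragraphs = []
        for p in root.iter(W_NS + "p"):
            text = "".join(t.text or "" for t in p.iter(W_NS + "t"))
            if text:
                paragraphs.append(text)
        return "\n".join(paragraphs)
    # Everything else (PDF, .DOC, .XLSX, .ODT, ...) is out of scope for this sketch
    raise ValueError(f"unsupported file type: {ext or filename}")
```

A call would look like `extract_text("report.docx", file_bytes)`, which returns a single string with one paragraph per line. Unsupported extensions raise `ValueError`, so a caller can decide whether to fall back to another service.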
3 Replies
Sebastian AE (3mo ago)
Hey @Almendra10! I also saw your post on the Bubble forum. I can help you with this if you want. How do you want the extracted data to be structured: as text or as JSON?
Almendra10 (3mo ago)
Hey @Seb95, that would be great. As for text or JSON, I need text.
johnciech (2w ago)
Did you guys figure out how to do it? I might need help extracting things from images and PDFs.