Sleetza · 6mo ago

Node escaping backslashes

Hi guys, I have the PDF-to-text extraction node. I pass its text as a string to the next node (a custom node), where I want to split the text using LangChain JS. The problem is that the method I want to use splits first on "\n\n", then on "\n", then on " ", but the node seems to escape the backslashes in the provided string. Is there a solution to this?
8 Replies
Sleetza · 6mo ago
Anyone? This is a really big deal for building LLM applications with BuildShip. @Gaurav Chadha any clue?
Gaurav Chadha · 6mo ago
@Sleetza, can you please share an example of the node logic?
Sleetza · 6mo ago
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import axios from 'axios';
import pdf from '@cyber2024/pdf-parse-fixed';

export default async function pdfToTextAndChunk({ pdfUrl, chunkSize, chunkOverlap }) {
  // Step 1: Extract text from the PDF
  let dataBuffer;
  if (pdfUrl) {
    const response = await axios.get(pdfUrl, { responseType: 'arraybuffer' });
    dataBuffer = response.data;
  } else {
    throw new Error("You must specify a PDF URL.");
  }
  const data = await pdf(dataBuffer);
  const text = data.text;

  // Step 2: Chunk the extracted text
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize,
    chunkOverlap,
    separators: ["\n\n", "\n", " ", ""],
  });

  // Wrap the input text in backticks to handle it as a template literal
  const templateLiteralText = `${text}`;

  // Pass the template literal text to createDocuments
  const output = await splitter.createDocuments([templateLiteralText]);

  // Extract the pageContent from each document to get the chunks
  const chunks = output.map(doc => doc.pageContent);
  return chunks;
}

If I paste a string that contains line breaks into a text field of any node, and later copy that same text out of that text field, the line breaks are gone. I need them for processing.
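To illustrate why the line breaks matter for this kind of splitting, here is a minimal sketch of separator-by-separator splitting in plain JavaScript. It is a simplified stand-in, not LangChain's implementation: the name recursiveSplit is hypothetical, and it ignores chunkOverlap and does not merge small pieces back together.

```javascript
// Simplified illustration only; NOT LangChain's RecursiveCharacterTextSplitter.
// recursiveSplit is a hypothetical helper name used for this sketch.
function recursiveSplit(text, chunkSize, separators = ["\n\n", "\n", " ", ""]) {
  // Small enough already: keep the piece as one chunk (drop empty pieces)
  if (text.length <= chunkSize) return text.length > 0 ? [text] : [];

  const [separator, ...rest] = separators;
  // No separators left: return the oversized piece unchanged
  if (separator === undefined) return [text];

  // An empty separator means "cut every chunkSize characters"
  if (separator === "") {
    const pieces = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      pieces.push(text.slice(i, i + chunkSize));
    }
    return pieces;
  }

  // Split on the current separator, then try the finer ones on each part
  return text
    .split(separator)
    .flatMap(part => recursiveSplit(part, chunkSize, rest));
}

// With line breaks intact, chunks follow the paragraph and line structure
const withBreaks = recursiveSplit("aaa bbb\nccc\n\nddd", 7);

// With line breaks lost, the splitter falls back to spaces and hard cuts
const withoutBreaks = recursiveSplit("aaa bbbcccddd", 7);

console.log(withBreaks, withoutBreaks);
```

With the breaks intact the text splits along its paragraphs and lines; once they are gone, only the space and the hard character cut remain, so chunks end mid-word.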
Solution
Gaurav Chadha · 6mo ago
Hi @Sleetza, the HTML text input removes new lines by default, so you'll have to paste the text into the expanded text area to preserve them. It is not directly possible to make <input type="text" /> support new lines.
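To tell which case applies inside a custom node, a small diagnostic can help. The sketch below counts real line breaks versus literal backslash-n sequences, and converts the literal ones back. Both helper names are hypothetical, and the conversion only helps if the text arrives with escaped sequences; it cannot recover line breaks that a single-line input has already removed.

```javascript
// Hypothetical helper: report how line breaks appear in the incoming text
function describeLineBreaks(text) {
  return {
    real: (text.match(/\n/g) || []).length,      // actual newline characters
    escaped: (text.match(/\\n/g) || []).length,  // backslash followed by "n"
  };
}

// Hypothetical helper: turn literal "\r\n" and "\n" sequences into real newlines
function restoreLineBreaks(text) {
  return text
    .replace(/\\r\\n/g, "\n")
    .replace(/\\n/g, "\n");
}

// Example: a string that carries escaped sequences instead of real breaks
const incoming = "first paragraph\\n\\nsecond paragraph\\r\\nlast line";
const before = describeLineBreaks(incoming);
const restored = restoreLineBreaks(incoming);
const after = describeLineBreaks(restored);

console.log(before, after);
```

If both counts are zero, the line breaks were dropped before the text reached the node, and pasting into the expanded text area is the fix.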
Sleetza · 6mo ago
thx
Sleetza · 6mo ago
I still get an error related to an input not being a string. In test mode I open the expanded text area for input2 (set to text) and paste in a string (the result of another node). Then I run the node, but it fails because input2 is not a string. Here is the node:
Sleetza · 6mo ago
It doesn't work in the full workflow either when selecting a variable that is an output string of an earlier node in the workflow. I altered the code to log the type of input2:

// Node Logic
export default async function enrichJSON(input1, input2, jsonString) {
  // Log the type of input2
  console.log(`Type of input2: ${typeof input2}`);

  // Check if input2 is a string
  if (typeof input2 !== 'string') {
    // Create a custom error message including the type of input2
    const errorMessage = `Error: input2 must be a string. Received type: ${typeof input2}`;
    console.error(errorMessage);
    return errorMessage;
  }

  // Step 1: Convert input2 to an array of lines without line numbers
  const linesArray = input2.split('\n').map(line => line.replace(/^\sLINE\d{4}\s/, ''));

  // Step 2: Process the JSON object
  const jsonObject = JSON.parse(jsonString);
  jsonObject.werkervaring.forEach(workExperience => {
    const startLineIndex = linesArray.findIndex(line => line.startsWith(workExperience.omschrijvingStartLine));
    const endLineIndex = linesArray.findIndex(line => line.startsWith(workExperience.omschrijvingEndLine));
    if (startLineIndex !== -1 && endLineIndex !== -1) {
      const omschrijvingText = linesArray.slice(startLineIndex, endLineIndex + 1).join('\n');
      workExperience.omschrijving = omschrijvingText;
    }
  });

  // Step 3: Return the updated JSON object
  return JSON.stringify(jsonObject, null, 2);
}

It says the provided string is an object.
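One possible cause, offered as an assumption rather than a confirmed diagnosis: the first node in this thread receives its inputs as a single destructured object ({ pdfUrl, chunkSize, chunkOverlap }), while enrichJSON declares three positional parameters. If the runtime calls node logic with one inputs object followed by a second object, then the positional input2 would be that second object, not the string. The sketch below simulates that calling convention; the context object and its contents are hypothetical.

```javascript
// ASSUMPTION: the runtime calls node logic as fn(inputsObject, contextObject).
// This is inferred from the destructured signature of the first node in the
// thread; the context object below is a hypothetical stand-in.
const inputs = { input1: "a", input2: "line one\nline two", jsonString: "{}" };
const context = { logging: console };

// Positional parameters: input1 receives the whole inputs object,
// and input2 receives the second argument (an object)
function positional(input1, input2, jsonString) {
  return typeof input2;
}

// Destructured parameter: input2 is read from the inputs object
function destructured({ input1, input2, jsonString }) {
  return typeof input2;
}

const positionalType = positional(inputs, context);
const destructuredType = destructured(inputs, context);

console.log(positionalType, destructuredType);
```

If that assumption holds, changing the signature to enrichJSON({ input1, input2, jsonString }) would make the typeof check pass without touching the rest of the node.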
Gaurav Chadha · 6mo ago
Hi @Sleetza, a new LangChain Text Splitter node has now been added; you can use that instead. https://discord.com/channels/853498675484819476/1237151066630525020/1237737611305418793