Node escaping backslashes
Hi guys,
I have the PDF-to-text extraction node. I pass that text as a string to the next node (a custom node), where I want to split the text using LangChain JS. The problem is that the method I want to use splits first on "\n\n", then on "\n", then on " ", but the node seems to escape the backslashes in the provided string.
Any solution to this?
Anyone? This is a really big deal for building LLM applications with Buildship.
@Gaurav Chadha any clue?
@Sleetza, can you please share an example of the node logic?
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import axios from 'axios';
import pdf from '@cyber2024/pdf-parse-fixed';

export default async function pdfToTextAndChunk({
  pdfUrl,
  chunkSize,
  chunkOverlap
}) {
  // Step 1: Extract text from the PDF
  let dataBuffer;
  if (pdfUrl) {
    const response = await axios.get(pdfUrl, {
      responseType: 'arraybuffer'
    });
    dataBuffer = response.data;
  } else {
    throw new Error("You must specify a PDF URL.");
  }
  const data = await pdf(dataBuffer);
  const text = data.text;

  // Step 2: Chunk the extracted text
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize,
    chunkOverlap,
    separators: ["\n\n", "\n", " ", ""],
  });
  // Wrap the input text in backticks to handle it as a template literal
  const templateLiteralText = `${text}`;
  // Pass the template literal text to createDocuments
  const output = await splitter.createDocuments([templateLiteralText]);
  // Extract the pageContent from each document to get the chunks
  const chunks = output.map(doc => doc.pageContent);
  return chunks;
}
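To illustrate why the separators only work on real line breaks, here is a minimal sketch of priority-ordered splitting in plain JavaScript. `splitRecursively` is a hypothetical helper written for this thread, not LangChain's implementation: it skips the chunk merging and overlap handling that `RecursiveCharacterTextSplitter` performs, and it omits the final `""` separator.

```javascript
// Hypothetical helper (not LangChain's code): split on the first separator,
// and fall back to the next separator only for pieces that are still too long.
function splitRecursively(text, separators, chunkSize) {
  if (text.length <= chunkSize || separators.length === 0) {
    return text.length > 0 ? [text] : [];
  }
  const [separator, ...rest] = separators;
  return text
    .split(separator)
    .flatMap(piece => splitRecursively(piece, rest, chunkSize));
}

const separators = ["\n\n", "\n", " "];

// Real line breaks: the text splits cleanly at the paragraph boundary.
const realNewlines = "first paragraph\n\nsecond paragraph";
console.log(splitRecursively(realNewlines, separators, 20));

// Escaped line breaks (a backslash followed by "n"): neither "\n\n" nor "\n"
// matches, so the splitter falls through to splitting on spaces.
const escapedNewlines = "first paragraph\\n\\nsecond paragraph";
console.log(splitRecursively(escapedNewlines, separators, 20));
```

With real line breaks the first call returns the two paragraphs. With escaped ones the second call returns three pieces cut at the spaces, and the literal backslash sequences stay inside the middle piece.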
If I paste a string that contains line breaks into a text field of any node, and later copy that same text out of that text field, the line breaks are gone. I need them for processing.
Solution:
Hi @Sleetza, a single-line HTML text input strips new lines by default, so you'll have to paste the text into the expanded text area to preserve them.
It is not directly possible to make <input type="text" /> support new lines.
thx
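If the text reaches the custom node with its line breaks escaped (a literal backslash followed by "n") rather than removed, one possible workaround is to convert those sequences back before splitting. This is only a sketch: `restoreLineBreaks` is a hypothetical helper, and it assumes the escaped form really is what arrives.

```javascript
// Hypothetical helper: turn literal "\r\n" and "\n" escape sequences
// (backslash + letter) back into real line breaks.
function restoreLineBreaks(text) {
  return text
    .replace(/\\r\\n/g, "\n") // escaped Windows line endings first
    .replace(/\\n/g, "\n");   // then escaped Unix line endings
}

const escaped = "Title\\n\\nFirst line\\nSecond line";
const restored = restoreLineBreaks(escaped);

console.log(restored.split("\n\n").length); // 2 paragraphs
console.log(restored.split("\n").length);   // 4 lines, one of them empty
```

This does not help when the line breaks were dropped by a single-line input, because nothing is left in the string to restore. It also rewrites any backslash-n that was meant literally, such as in a Windows path.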
I still get an error about an input not being a string. In test mode I open the expanded text area for input2 (set to text) and paste in a string (the result of another node). When I run the node, it fails because input2 is not a string. Here is the node:
It doesn't work in the full workflow either, when I select a variable that is the output string of an earlier node in the workflow.
I altered the code to log the type of input2:
// Node Logic
export default async function enrichJSON(input1, input2, jsonString) {
  // Log the type of input2
  console.log(`Type of input2: ${typeof input2}`);

  // Check if input2 is a string
  if (typeof input2 !== 'string') {
    // Create a custom error message including the type of input2
    const errorMessage = `Error: input2 must be a string. Received type: ${typeof input2}`;
    console.error(errorMessage);
    return errorMessage;
  }

  // Step 1: Convert input2 to an array of lines without line numbers
  const linesArray = input2.split('\n').map(line => line.replace(/^\sLINE\d{4}\s/, ''));

  // Step 2: Process the JSON object
  const jsonObject = JSON.parse(jsonString);
  jsonObject.werkervaring.forEach(workExperience => {
    const startLineIndex = linesArray.findIndex(line => line.startsWith(workExperience.omschrijvingStartLine));
    const endLineIndex = linesArray.findIndex(line => line.startsWith(workExperience.omschrijvingEndLine));
    if (startLineIndex !== -1 && endLineIndex !== -1) {
      const omschrijvingText = linesArray.slice(startLineIndex, endLineIndex + 1).join('\n');
      workExperience.omschrijving = omschrijvingText;
    }
  });

  // Step 3: Return the updated JSON object
  return JSON.stringify(jsonObject, null, 2);
}
It says the provided string is an object.
Hi @Sleetza, a new LangChain Text Splitter has now been added; you can use that instead. https://discord.com/channels/853498675484819476/1237151066630525020/1237737611305418793
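A possible explanation for the "object" type, offered as a guess and not a confirmed diagnosis: the first snippet in this thread receives its inputs as one destructured object, while `enrichJSON` declares three positional parameters. If the platform calls every node with a single inputs object, a positional `input2` would receive whatever is passed second. The sketch below reproduces that effect; the function names and the `context` argument are hypothetical.

```javascript
// Assumption: the node is called with ONE object holding all inputs,
// like the destructured signature of pdfToTextAndChunk earlier in the thread.
function typeWithDestructuring({ input1, input2, jsonString }) {
  return typeof input2;
}

// Positional parameters: input2 is simply the second argument of the call.
function typeWithPositional(input1, input2, jsonString) {
  return typeof input2;
}

const inputs = {
  input1: "unused",
  input2: "LINE0001 first\nLINE0002 second",
  jsonString: "{}",
};
const context = {}; // hypothetical second argument supplied by the platform

console.log(typeWithDestructuring(inputs, context)); // "string"
console.log(typeWithPositional(inputs, context));    // "object"
```

If that assumption holds, changing the signature to `enrichJSON({ input1, input2, jsonString })` would be the first thing to try.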