In this article, I'll show you how to generate a PDF document from a heavily styled React page using Node.js, Puppeteer, headless Chrome, and Docker.
Background: A few months ago, one of RisingStack's clients asked us to develop a feature that would let the user request a React page in PDF format. The page is basically a report for patients, with data visualizations that contain a lot of SVGs. There were also some special requests to manipulate the layout and rearrange some of the HTML elements, so the PDF needed different styling and additions compared to the original React page.

Since the task was a bit more complex than could be solved with simple CSS rules, we first looked at possible implementations. Essentially, we found 3 main solutions. This blog post will walk you through these possibilities and the final implementations.
A personal note before we begin: it's a lot of work, so buckle up!
Table of contents:
- Client-side or server-side PDF generation?
- Option 1: Take a screenshot of the DOM
- Option 2: Use only a PDF library
- Final option 3: Puppeteer, Headless Chrome with Node.js
- Style manipulation
- Send the file to the client and save
- Using Puppeteer with Docker
- Option 3 + 1: CSS print rules
- Summary
Client-side or server-side PDF generation?
A PDF file can be generated both client side and server side. However, it probably makes more sense to leave this on the backend, since you don't want to consume all the resources that the user's browser provides.
However, I will still show solutions for both methods.
Option 1: Take a screenshot of the DOM
At first glance, this solution seemed to be the simplest, and that turned out to be true, but it has its own limitations. If you don't have special requirements, such as selectable or searchable text in the PDF, it is a good and simple way to generate one.
This method is simple: take a screenshot of the page and paste it into a PDF file. Very easy. We use two packages for this approach:
- html2canvas, to take a screenshot of the DOM
- jsPDF, a library to generate PDFs
Let's start coding.
npm install html2canvas jspdf
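    // jsPDF 2.x uses a named export; older versions use a default import
    import html2canvas from 'html2canvas'
    import { jsPDF } from 'jspdf'

    function printPDF () {
      const domElement = document.getElementById('your-id')
      html2canvas(domElement, {
        // Hide the print button in the cloned DOM before the screenshot is taken
        onclone: (clonedDocument) => {
          clonedDocument.getElementById('print-button').style.visibility = 'hidden'
        }
      }).then((canvas) => {
        const imgData = canvas.toDataURL('image/png')
        const pdf = new jsPDF()
        // Scale the image to the page width, keeping its aspect ratio
        const width = pdf.internal.pageSize.getWidth()
        const height = (canvas.height * width) / canvas.width
        pdf.addImage(imgData, 'PNG', 0, 0, width, height)
        pdf.save('your-filename.pdf')
      })
    }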
And that's it!
Make sure you take a look at the html2canvas onclone option. It can come in handy when you need to take a quick snapshot and manipulate the DOM (for example, hide the print button) before the image is captured. I can see quite a few use cases for this package. Unfortunately, ours wasn't one of them, because we had to handle the PDF creation on the backend.
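The addImage call in this approach needs a width and a height for the screenshot. Here is a minimal sketch of one way to compute them, assuming you want the image scaled to fit inside a single page without distortion. fitToPage is a hypothetical helper, not part of html2canvas or jsPDF, and the A4 size of 210 x 297 mm is only example input.

```javascript
// Hypothetical helper: scales an image so it fits inside a page
// while keeping its aspect ratio.
function fitToPage(imageWidth, imageHeight, pageWidth, pageHeight) {
  // Use the smaller ratio so neither dimension overflows the page
  const scale = Math.min(pageWidth / imageWidth, pageHeight / imageHeight)
  return { width: imageWidth * scale, height: imageHeight * scale }
}

// Example: an 840 x 560 px canvas on an A4 page (210 x 297 mm)
const size = fitToPage(840, 560, 210, 297)
console.log(size) // { width: 210, height: 140 }
```

With jsPDF, the page size could be read from pdf.internal.pageSize.getWidth() and getHeight() instead of being hardcoded.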
Option 2: Use only a PDF library
There are several libraries on npm for this purpose, like jsPDF (mentioned above) or PDFKit. The problem with them is that you have to recreate the page structure if you want to use these libraries. This definitely hurts maintainability, since every subsequent change has to be applied to both the PDF template and the React page.
Take a look at the code below. You need to create the PDF document by hand. You could traverse the DOM and figure out how to translate each element into its PDF equivalent, but that is tedious work. There must be an easier way.
    const fs = require('fs')
    const PDFDocument = require('pdfkit')

    const doc = new PDFDocument()
    doc.pipe(fs.createWriteStream('output.pdf'))

    // Embed a font and write some text
    doc.font('fonts/PalatinoBold.ttf')
      .fontSize(25)
      .text('Some text with an embedded font!', 100, 100)

    // Add an image, constrained to the given size
    doc.image('path/to/image.png', {
      fit: [250, 300],
      align: 'center',
      valign: 'center'
    })

    // Add another page
    doc.addPage()
      .fontSize(25)
      .text('Here are some vector graphics...', 100, 100)

    doc.end()

This snippet is adapted from the PDFKit documentation. The library can still be useful if your target is a PDF file from the start, rather than the conversion of an existing (and constantly changing) HTML page.
Final option 3: Puppeteer, Headless Chrome with Node.js
What is Puppeteer? The documentation says:
Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.
It's basically a browser that you can run from Node.js. If you read the documentation, the first thing it says about Puppeteer is that you can use it to generate screenshots and PDFs of pages. Great! That's what we were looking for.
Let's install Puppeteer with npm i puppeteer and implement our use case.
    const puppeteer = require('puppeteer')

    async function printPDF() {
      const browser = await puppeteer.launch({ headless: true })
      const page = await browser.newPage()
      // Wait until the network has been idle before printing
      await page.goto('https://blog.risingstack.com', { waitUntil: 'networkidle0' })
      const pdf = await page.pdf({ format: 'A4' })

      await browser.close()
      return pdf
    }
This is a simple function that navigates to a URL and outputs a PDF of the site.
First, we launch the browser (at the time of writing, PDF generation is only supported in headless mode), then we open a new page and navigate to the provided URL.
Setting the waitUntil: 'networkidle0' option means that Puppeteer considers navigation to be finished when there have been no network connections for at least 500 ms. (Check the API documentation for more information.)
We then save the PDF to a variable, close the browser, and return the PDF.
Note that the page.pdf method receives an options object, where you can also save the file to disk using the path option. If no path is provided, the PDF won't be saved to disk and you'll get a buffer instead. I'll discuss how to handle it later.
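To make the buffer-versus-file behaviour more concrete, here is a minimal sketch that wraps the steps described here in one function. renderPdf and its outputPath parameter are names I made up for this example. The browser argument is assumed to be whatever puppeteer.launch() resolves to; it is passed in so the function can also be tried with a stand-in object.

```javascript
// Sketch: render a URL to PDF with an already launched browser.
// `renderPdf` and `outputPath` are hypothetical names for this example.
async function renderPdf(browser, url, outputPath) {
  const page = await browser.newPage()
  try {
    // Wait until the network has been idle before printing
    await page.goto(url, { waitUntil: 'networkidle0' })
    const options = { format: 'A4' }
    // Only set `path` when the caller wants the file written to disk
    if (outputPath) options.path = outputPath
    // page.pdf() resolves with the PDF data either way
    return await page.pdf(options)
  } finally {
    // Close the page even if navigation or printing fails
    await page.close()
  }
}
```

Called as renderPdf(browser, url), it only returns the PDF data. Called with a third argument such as 'report.pdf', it also asks Puppeteer to write the file to disk.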
If you need to log in first to generate a PDF from a protected page, you have to navigate to the login page, inspect the form elements for their ID or name, fill them in, and then submit the form:

    await page.type('#email', process.env.PDF_USER)
    await page.type('#password', process.env.PDF_PASSWORD)
    await page.click('#submit')

Always store login credentials in environment variables; do not hardcode them!
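A slightly fuller sketch of that login flow could look like this. The login function, the loginUrl parameter, and the selectors are assumptions for illustration, and I'm assuming that submitting the form triggers a full page navigation; a single-page app that logs in via XHR would need a different wait condition.

```javascript
// Sketch: log in before printing. `login`, `loginUrl` and the selectors
// are assumptions; inspect your own login form for the real ones.
async function login(page, loginUrl, credentials) {
  await page.goto(loginUrl, { waitUntil: 'networkidle0' })
  await page.type('#email', credentials.user)
  await page.type('#password', credentials.password)
  // Start waiting for the navigation before clicking, so it is not missed
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    page.click('#submit')
  ])
}
```

The credentials would still come from the environment, for example login(page, url, { user: process.env.PDF_USER, password: process.env.PDF_PASSWORD }).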
Style manipulation
Puppeteer has a solution for style manipulation, too. You can insert style tags before generating the PDF, and Puppeteer will generate a file with the modified styles.

    await page.addStyleTag({ content: '.nav { display: none } .navbar { border: 0px } #print-button { display: none }' })
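If the list of elements to hide grows, it may be easier to build the CSS string from an array of selectors. This is only a sketch: buildHideCss is a hypothetical helper, and adding !important is my assumption about how to win over the page's own rules.

```javascript
// Hypothetical helper: builds a CSS string that hides the given selectors.
function buildHideCss(selectors) {
  return selectors
    .map((selector) => `${selector} { display: none !important; }`)
    .join('\n')
}

const css = buildHideCss(['.nav', '#print-button'])
console.log(css)
// In a Puppeteer script, before calling page.pdf():
//   await page.addStyleTag({ content: css })
```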
Send the file to the client and save
OK, now you have generated a PDF file on the backend. What next?
As I mentioned above, if you don't save the file to disk, you'll get a buffer. You just need to send this buffer to the frontend with the proper content type.

    printPDF().then(pdf => {
      res.set({ 'Content-Type': 'application/pdf', 'Content-Length': pdf.length })
      res.send(pdf)
    })
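For context, this is roughly how that snippet could sit inside a request handler with basic error handling. It is a sketch, not our production code: createPdfHandler is a hypothetical name, res is assumed to be an Express-style response object, and printPDF is assumed to resolve with the PDF data.

```javascript
// Sketch: a request handler factory. `createPdfHandler` is a hypothetical
// name; `printPDF` is assumed to resolve with the PDF data.
function createPdfHandler(printPDF) {
  return async function pdfHandler(req, res) {
    try {
      // Buffer.from also covers Puppeteer versions that resolve with a Uint8Array
      const pdf = Buffer.from(await printPDF())
      res.set({ 'Content-Type': 'application/pdf', 'Content-Length': pdf.length })
      res.send(pdf)
    } catch (err) {
      // Report failures (for example, the browser could not start) as a server error
      res.status(500).send('PDF generation failed')
    }
  }
}
```

In an Express app, this could be registered with app.get('/your-pdf-endpoint', createPdfHandler(printPDF)).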
Now you can send a request to the server to get the generated PDF.
    function getPDF() {
      return axios.get(`${API_URL}/your-pdf-endpoint`, {
        responseType: 'arraybuffer',
        headers: { 'Accept': 'application/pdf' }
      })
    }

Once you've sent the request, the buffer should start downloading. The last step is to convert the buffer into a PDF file.

    savePDF = () => {
      this.openModal('Loading…') // open the modal
      return getPDF() // API call
        .then((response) => {
          const blob = new Blob([response.data], { type: 'application/pdf' })
          const link = document.createElement('a')
          link.href = window.URL.createObjectURL(blob)
          link.download = 'your-file-name.pdf'
          link.click()
          this.closeModal() // close the modal
        })
        .catch((err) => { /* error handling */ })
    }

    <button onClick={this.savePDF}>Save as PDF</button>

That's it! If you click the save button, the browser will save the PDF.
Using Puppeteer with Docker
I think this is the trickiest part of the implementation, so I'll save you a few hours of googling.
The official documentation states that "getting headless Chrome up and running in Docker can be tricky". The official docs have a Troubleshooting section where, at the time of writing, you can find all the information you need to install Puppeteer with Docker.
If you install Puppeteer on an Alpine image, make sure you scroll down a bit to the Alpine-specific part of that page. Otherwise, you might miss the fact that you cannot run the latest version of Puppeteer, and that you also need to disable shm usage with a flag:

    const browser = await puppeteer.launch({
      headless: true,
      args: ['--disable-dev-shm-usage']
    })

Otherwise, the Puppeteer subprocess may run out of memory before it even starts properly. For more information, see the troubleshooting section mentioned above.
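If the image ships its own Chromium (as the Alpine setup does), the launch options usually need to point at that binary as well. Below is a minimal sketch of building the options from the environment. buildLaunchOptions is a hypothetical helper, and I'm assuming PUPPETEER_EXECUTABLE_PATH is set in the Dockerfile to the path of the installed browser.

```javascript
// Hypothetical helper: builds launch options for running inside a container.
// Assumes PUPPETEER_EXECUTABLE_PATH (if set) points at a Chromium in the image.
function buildLaunchOptions(env) {
  const options = {
    headless: true,
    // Avoid the small /dev/shm partition that containers get by default
    args: ['--disable-dev-shm-usage']
  }
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH
  }
  return options
}

console.log(buildLaunchOptions({ PUPPETEER_EXECUTABLE_PATH: '/usr/bin/chromium-browser' }))
```

The result would then be passed to puppeteer.launch(buildLaunchOptions(process.env)).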
Option 3 + 1: CSS print rules
One might think that simply using CSS print rules is easy from a developer's point of view. No npm or Node modules, just plain CSS. But how do they fare when it comes to cross-browser compatibility?
When you choose CSS print rules, you have to test the output in every browser to make sure it produces the same layout, and there is no guarantee that it will.
For example, inserting a page break after a given element can hardly be considered an esoteric use case, yet you might be surprised to find that you need workarounds to get it working in Firefox.
Unless you are a seasoned CSS wizard with a lot of experience creating printable pages, this can be time consuming.
Print rules are great when you can keep your print stylesheets simple.
Let's see an example.
    @media print {
      .print-button { display: none; }
      .content div { break-after: always; }
    }

The CSS above hides the print button and inserts a page break after every div inside the element with the class content.
There is a great article that summarizes what you can do with print rules and what the difficulties with them are, including browser compatibility.
All things considered, CSS print rules are great and effective when you want to create a PDF from a not-so-complex page.
Summary: Puppeteer PDF from HTML using Node.js
So, let's quickly review the options we've covered here for generating PDF files from HTML pages:
- Screenshot from the DOM: This can be useful when you need to create snapshots of a page (for example, to create a thumbnail), but it falls short when you have a lot of data to handle.
- Use only a PDF library: If you need to create PDF files programmatically from scratch, this is a perfect solution. Otherwise, you need to maintain both the HTML and the PDF templates, which is definitely a no-go.
- Puppeteer: Although it was relatively difficult to get working on Docker, it provided the best result for our use case, and it was also the easiest to write code for.
- CSS print rules: If your users are savvy enough to know how to print to a file, and your pages are relatively simple, this may be the most painless solution. As you saw, that wasn't true in our case.
Make sure you reach out to RisingStack if you need help with Node, React, or just JS in general.
Have fun with your HTML-to-PDF conversions!