Web content to PDF converter

Web content to PDF converter

Bob Nemec
There are a number of tools for converting web content to a PDF.
Has anyone used one from within Seaside?
I'd like to generate a PDF from a div, not the entire page.

We already use PDF4Smalltalk and Report4PDF with a rudimentary HTML parser.
It can create PDF content for HTML that has bold, underline and italic markup.
Users use this to add content to reports. Report4PDF can render the user content either as HTML or generate a PDF. 
Complex content, like tables and images, is not supported. And I have no interest in adding support.

For code-generated content Report4PDF works fine. But for user-entered content it is not optimal.
I'd rather use something that can represent complex user content. 

Anyone have experience with this?

Thanks,
Bob Nemec


_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: Web content to PDF converter

Esteban A. Maringolo
Hi Bob,

I have a report generator and I've been using wkhtmltopdf [1] to
generate the PDF files. It supports CSS... and JavaScript! And it has a
lot of print-related tweaks.
The good thing is that this enables the user to "view" the content
generated in the browser and then download it as PDF with minimal
modifications.
For that I have a stylesheet that is optimized for printing.

There is a commercial alternative called PrinceXML [2] that is the
best option I know for doing the same as the above, but with many more
supported options (like keeping some content together when printing,
etc.). I remember contacting them years ago, but the pricing was too
high for my customer and the project I was working on. AFAIR, one of
the people behind it was also one of the creators of the Opera browser
and of CSS itself.

Another option, which I'm considering for a new project, is to use
Headless Chromium [3], which I expect would produce output similar to
what a Chromium-based browser renders.
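
For reference, a minimal sketch of what a headless Chromium conversion
could look like, written in Python only to keep it self-contained. It
relies on the --headless and --print-to-pdf switches described in [3].
The binary name "chromium", the helper names and the file names are
assumptions; the executable is named differently on different platforms.

```python
import subprocess
from pathlib import Path


def headless_chromium_command(html_path, pdf_path, binary="chromium"):
    """Build the argument list that prints one local HTML file to a PDF.

    `binary` is an assumption: depending on the platform the executable
    may be called chromium, chromium-browser, google-chrome, etc.
    """
    # Chromium expects a URL, so turn the local path into a file:// URI.
    url = Path(html_path).resolve().as_uri()
    return [
        binary,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path}",
        url,
    ]


def print_to_pdf(html_path, pdf_path, binary="chromium"):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(headless_chromium_command(html_path, pdf_path, binary), check=True)
```

Calling print_to_pdf('report.html', 'report.pdf') requires a Chromium
build on the PATH; building the command separately makes it easy to log
or inspect before running it.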

To generate the input to be passed to wkhtmltopdf I simply instantiate
a WAHtmlCanvas builder and render components the usual way, but the
stream is a temporary file which I then pass as an argument to
wkhtmltopdf.

    (Seaside.WAHtmlCanvas builder)
        fullDocument: true;
        rootBlock: [:html |
            html meta charset: 'utf-8'.
            "html stylesheets..."
            html title: self reportElement title , ' - ' , self applicationName];
        render: [:html | self renderReportContentOn: html].

The good thing is I can reuse existing components such as WAReport or my own.
I was thinking of creating my own visitor that does something other
than `renderContentOn:` (like `#renderPrintContentOn:`), but then this
worked and other requirements came up, and I never got back to it.
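
A minimal sketch of the conversion step itself, which the snippet above
leaves out. It is written in Python only to keep it self-contained and
is not the code I actually run. wkhtmltopdf takes the input HTML file
followed by the output PDF file, and --print-media-type asks it to apply
the print stylesheet. The helper names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def wkhtmltopdf_command(html_path, pdf_path, print_media=True):
    """Build the argument list for converting one HTML file to one PDF."""
    command = ["wkhtmltopdf"]
    if print_media:
        # Use the print media type so a print-optimized stylesheet applies.
        command.append("--print-media-type")
    command += [str(html_path), str(pdf_path)]
    return command


def convert(html_path, pdf_path):
    """Run wkhtmltopdf; raises if the tool is missing or exits with an error."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not on PATH")
    subprocess.run(wkhtmltopdf_command(html_path, pdf_path), check=True)
    return Path(pdf_path)
```

With the rendered HTML written to a temporary file, something like
convert('report.html', 'report.pdf') would produce the PDF, assuming
wkhtmltopdf is installed.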

Regards,

[1] https://wkhtmltopdf.org/
[2] https://www.princexml.com/
[3] https://developers.google.com/web/updates/2017/04/headless-chrome

Esteban A. Maringolo

Re: Web content to PDF converter

Bob Nemec
Thanks, Maringolo.
That is exactly what I was looking for.

Bob 


Re: Web content to PDF converter

dassi
Hi Bob

Usually I render the HTML to a file using Seaside and then convert it with the open-source command-line tool "wkhtmltopdf". To generate the HTML I use something like this:

**************************
component := ...

builder := WAHtmlCanvas builder
    codec: (GRCodec forEncoding: 'utf-8');
    fullDocument: true;
    rootBlock: [:root |
        root title: 'blabla'.
        root beHtml5.
        root stylesheet resourceUrl: 'css/style.css'.
        component updateRoot: root];
    yourself.

xhtmlString := builder render: component.

GRPlatform current write: xhtmlString toFile: 'blabla.html' inFolder: 'xyz'.
**************************

Cheers, Andreas

