Usage

Using with Python standalone

from xhtml2pdf import pisa             # import python module

# Define your data
sourceHtml = "<html><body><p>To PDF or not to PDF</p></body></html>"
outputFilename = "test.pdf"

# Utility function
def convertHtmlToPdf(sourceHtml, outputFilename):
    # open output file for writing (truncated binary)
    resultFile = open(outputFilename, "w+b")

    # convert HTML to PDF
    pisaStatus = pisa.CreatePDF(
            sourceHtml,                # the HTML to convert
            dest=resultFile)           # file handle to receive result

    # close output file
    resultFile.close()

    # return a falsy value (0) on success and a truthy error count on errors
    return pisaStatus.err

# Main program
if __name__ == "__main__":
    pisa.showLogging()
    convertHtmlToPdf(sourceHtml, outputFilename)

This basic Python example generates a test.pdf file with the text ‘To PDF or not to PDF’ in the top left of the page. To generate the PDF in memory, pass a file-like buffer as dest instead of an open file: StringIO or cStringIO on Python 2, io.BytesIO on Python 3. Advanced options are discussed later in this document.
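The in-memory variant mentioned above might look like the following minimal sketch for Python 3. The helper name html_to_pdf_bytes and its create_pdf parameter are assumptions made for illustration and are not part of xhtml2pdf; by default the helper falls back to pisa.CreatePDF, as used in the example above.

```python
import io


def html_to_pdf_bytes(source_html, create_pdf=None):
    """Render HTML to a PDF held in memory and return it as bytes.

    create_pdf is a hypothetical hook: any callable with the same
    (source, dest=...) shape as pisa.CreatePDF can be passed in.
    """
    if create_pdf is None:
        # Imported lazily so the helper can be exercised with a stand-in.
        from xhtml2pdf import pisa
        create_pdf = pisa.CreatePDF

    buffer = io.BytesIO()              # replaces open(outputFilename, "w+b")
    status = create_pdf(source_html, dest=buffer)
    if status.err:
        raise RuntimeError("PDF conversion reported %d error(s)" % status.err)
    return buffer.getvalue()
```

Calling html_to_pdf_bytes(sourceHtml) should return the PDF as a bytes object that can be sent over a network or stored without touching the file system.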

Using xhtml2pdf in Django

To allow URL references to be resolved using Django’s STATIC_URL and MEDIA_URL settings, xhtml2pdf accepts a link_callback parameter: a function that converts relative URLs to absolute system paths.

import datetime
import os

from django.conf import settings
from django.http import HttpResponse
from django.template import Context
from django.template.loader import get_template

from xhtml2pdf import pisa


def link_callback(uri, rel):
    """
    Convert HTML URIs to absolute system paths so xhtml2pdf can access those
    resources
    """
    # use short variable names
    sUrl = settings.STATIC_URL      # Typically /static/
    sRoot = settings.STATIC_ROOT    # Typically /home/userX/project_static/
    mUrl = settings.MEDIA_URL       # Typically /static/media/
    mRoot = settings.MEDIA_ROOT     # Typically /home/userX/project_static/media/

    # convert URIs to absolute system paths
    if uri.startswith(mUrl):
        path = os.path.join(mRoot, uri.replace(mUrl, ""))
    elif uri.startswith(sUrl):
        path = os.path.join(sRoot, uri.replace(sUrl, ""))
    else:
        return uri  # handle absolute uri (ie: http://some.tld/foo.png)

    # make sure that file exists
    if not os.path.isfile(path):
        raise Exception(
            'media URI must start with %s or %s' % (sUrl, mUrl)
        )
    return path

def generate_pdf(request):
    """
    A typical Django view
    """
    # Prepare context
    data = {}
    data['today'] = datetime.date.today()
    data['farmer'] = 'Old MacDonald'
    data['animals'] = [('Cow', 'Moo'), ('Goat', 'Baa'), ('Pig', 'Oink')]

    # Render html content through html template with context
    template = get_template('lyrics/oldmacdonald.html')
    html = template.render(data)    # recent Django versions expect a plain dict here

    # Write PDF to file
    f = open(os.path.join(settings.MEDIA_ROOT, 'test.pdf'), "w+b")
    pisaStatus = pisa.CreatePDF(html, dest=f, link_callback=link_callback)

    # Return PDF document through a Django HTTP response
    f.seek(0)
    pdf = f.read()
    f.close()               # Don't forget to close the file handle
    return HttpResponse(pdf, content_type='application/pdf')
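To make the URL-to-path mapping in link_callback easier to follow and to try outside a Django project, here is a minimal sketch of the same idea with the four settings passed in as plain arguments. The function name uri_to_path is hypothetical, and unlike link_callback it does not check that the file exists. It strips only the leading prefix instead of calling replace, which would also remove later occurrences of the prefix inside the URI.

```python
import os


def uri_to_path(uri, static_url, static_root, media_url, media_root):
    """Map a URI under media_url or static_url to a path below the matching root.

    URIs that match neither prefix (e.g. absolute http:// URLs) are
    returned unchanged.
    """
    # Check the media prefix first: it is often nested under the static
    # prefix (e.g. /static/media/ under /static/).
    for prefix, root in ((media_url, media_root), (static_url, static_root)):
        if uri.startswith(prefix):
            # Strip only the leading prefix, then join onto the root directory.
            return os.path.join(root, uri[len(prefix):])
    return uri
```

With static_url="/static/" and media_url="/static/media/", a URI such as /static/media/img/cow.png resolves below media_root, while /static/css/site.css resolves below static_root.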

Using the command line

xhtml2pdf also provides a command line tool for converting HTML files to PDF documents.

$ xhtml2pdf test.html

This basic command will convert the content of test.html to PDF and save it to test.pdf. Advanced options will be described later in this document.

The -s option can be used to start the default PDF viewer after the conversion:

$ xhtml2pdf -s test.html

Advanced Command line tool options

Use xhtml2pdf --help to get started.

Converting HTML data

To generate a PDF from an HTML file called test.html, run:

$ xhtml2pdf -s test.html

The resulting PDF will be called test.pdf. If that file is locked, e.g. by Adobe Reader, it will be called test-0.pdf, and so on. The -s option opens the PDF directly in the operating system’s default viewer.

To convert more than one file you may use wildcard patterns like * and ?:

$ xhtml2pdf "test/test-*.html"
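A similar batch run can be prepared from Python with the standard glob module. The sketch below only works out which input files a pattern matches and which PDF name each would map to, assuming the test.html to test.pdf naming described above with the PDF placed next to its source; pdf_targets is a hypothetical helper and not part of xhtml2pdf.

```python
import glob
import os


def pdf_targets(pattern):
    """Return (html_path, pdf_path) pairs for every file matching pattern."""
    pairs = []
    for html_path in sorted(glob.glob(pattern)):
        # Swap the extension: test/test-1.html becomes test/test-1.pdf
        base, _ext = os.path.splitext(html_path)
        pairs.append((html_path, base + ".pdf"))
    return pairs
```

Each pair can then be handed to a conversion function, for example by reading the HTML file and passing its content and the PDF name to convertHtmlToPdf from the standalone example.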

You may also convert pages directly from the internet:

$ xhtml2pdf -s http://www.xhtml2pdf.com/

Using special properties

If the conversion doesn’t work as expected, more information may be useful. You can turn on warnings with -w, or full debugging output with -d.

Another possible cause is that parsing failed. Consider trying the --xhtml and --html options. xhtml2pdf uses the html5lib parser, which offers two internal parsing modes: one for HTML and one for XHTML.

When rendering HTML, xhtml2pdf applies an internal default CSS definition (otherwise all tags would appear with no differences). To see what this default looks like, run xhtml2pdf like this:

$ xhtml2pdf --css-dump > xhtml2pdf-default.css

The CSS will be dumped into the file xhtml2pdf-default.css. You may modify this file, or write an entirely custom one, and pass it in with the --css option, e.g.:

$ xhtml2pdf --css=xhtml2pdf-default.css test.html