Size does matter

05 Jan, 2024

Background

Many years ago, I worked on an ERP system. One of its key features was the ability to plan delivery routes for products ordered by clients. It collected all the orders, divided them into delivery routes, created invoices, and created any other documents needed for the logistics operations. In total, there were about 100 different routes, with more than 80 clients per route every day.

All those documents had to be printed in the correct order, in multiple copies.

Printing system

The ERP system was built on Oracle Forms 9i. There were dedicated Windows servers running four instances of Oracle Forms services and another four running Oracle Reports services. All the printers in the multiple locations were installed on the servers as network printers.

To print an invoice, the user selected the correct document, the number of copies, and the target printer. In the background, Oracle Reports took one of the printout layouts, created a PDF file on the server, and sent it to the printer selected by the user. Because the user's local printer was installed on the server as a network printer, the printed invoice ended up waiting for the user on site.

The above solution worked fine for a single document. Unfortunately, printing all the documents for a delivery route this way, while keeping the correct number of copies and the document order, would take too much time.

The client decided to hire an external company to design and build a new printing mechanism for delivery routes. The requirements included, among others, support for duplex printers.

Solution

The point of entry for the whole printing operation was the route definition. Here is how it was done.

First, it started a loop over all the clients within the route, in reverse delivery order. The documents of the first client, including the invoice, demand for payment, and additional delivery documents, were generated as separate PDF files called 0001-01.pdf, 0001-02.pdf, 0001-03.pdf: the 0001 was the client's number in the sequence, and 01, 02, and 03 indicated the different types of documents. The second iteration produced documents with the prefix 0002, the third with 0003, and so on. One route could have 240 PDF files (sometimes more).

Once the PDFs were ready, they had to be printed. Luckily, all the documents from a given route had to be printed on one printer. However, printers like to fail, paper can jam, and other issues can happen. To avoid having to verify manually that every document had been printed, they decided to merge all the documents into one PDF file: this is why the naming of the separate files was so important. They sorted the files by name and merged them in the correct order. The result could be printed as one document in a single print job, and if something failed, they could just print it all again and throw the partial printouts into the trash.

There was just one catch. The printouts had to be done in duplex mode. Normally, this is managed by the printer. However, if you merge many documents into one PDF, duplex printing can put pages belonging to two different clients on the same sheet of paper. To prevent this, they came up with the idea of adding an empty single-page PDF after every document with an odd number of pages.
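The naming, sorting, and padding steps described above can be sketched in a few lines of Python. This is only an illustration, not the original PL/SQL and server scripts: the function names `document_name` and `build_merge_plan` are hypothetical, the page counts are made up, and the sketch only computes the order in which files would be merged rather than touching any real PDF.

```python
from typing import Dict, List

BLANK = "blank.pdf"  # file name taken from the post


def document_name(client_seq: int, doc_type: int) -> str:
    """Build a name such as 0001-02.pdf (client sequence, then document type)."""
    return f"{client_seq:04d}-{doc_type:02d}.pdf"


def build_merge_plan(page_counts: Dict[str, int], blank: str = BLANK) -> List[str]:
    """Return the files in merge order, padding odd-length documents.

    Sorting by name works because the numbers are zero-padded. A blank page
    follows every document with an odd page count, so the next document
    always starts on the front side of a new sheet.
    """
    plan: List[str] = []
    for name in sorted(page_counts):
        plan.append(name)
        if page_counts[name] % 2 == 1:
            plan.append(blank)
    return plan


# Hypothetical page counts, deliberately listed out of order.
page_counts = {
    document_name(2, 1): 1,
    document_name(1, 2): 2,
    document_name(1, 1): 3,
}
plan = build_merge_plan(page_counts)
print(plan)
```

With this approach every odd-length document costs one extra copy of the blank file in the merged output, which is why the size of that file matters so much.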

Issues

When this solution went into production, we started to get complaints from users. Most of them were related not to the printing features but to the process being very slow. In some cases, it took 10 minutes from starting the print job until the first page was printed.

I started to analyse how this was done. There was no documentation from the company that created it, but I had access to the servers and to the code written in PL/SQL, so I could check all the steps involved in their solution. The code was nothing sophisticated; it was a simple routine, as described above. I turned my attention to another part of the solution: the scripts stored on the servers that were responsible for merging multiple PDF files into one.

I found a file named "blank.pdf". This was the empty PDF document that was added after documents with an odd number of pages, so the duplex printer could print the documents correctly and in order. There was nothing special about it, aside from its size. This single empty page weighed almost 120KB. I checked it, and it had been created using OpenOffice Writer; someone had created a new document and saved it in PDF format. A single-page invoice with many lines created by Oracle Reports was about 60-90KB!

This was the root cause of the slow operations. For a route with 250 documents, the documents themselves could be about 25MB in size, while the empty pages added to them would be another 29MB. This was in the early 2000s, so network speeds were not as fast as they are now. Sending that amount of data over the network took some time, and then the printer had to process a PDF document of roughly 53MB.

Fix

As the root cause was the size of the "blank.pdf" file, I had to find a way to make it smaller. The obvious choice was LaTeX. I did a quick search and found a simple solution that allowed me to create an empty PDF document that was only 20KB. I uploaded it to all of the servers. This changed the final size of the merged document from about 53MB to about 30MB, and the time to print it dropped from 10 minutes to 5 minutes.
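The effect of the smaller file can be checked with a rough back-of-the-envelope calculation. The sketch below assumes that every one of the 250 documents needed a blank page and that 1MB is 1024KB; both are my assumptions, and `blank_overhead_mb` is a hypothetical helper name. The 25MB, 120KB, and 20KB figures come from the post.

```python
KB_PER_MB = 1024  # assumption: binary megabytes

DOCUMENTS = 250        # documents on a large route
DOCUMENTS_MB = 25.0    # combined size of the real documents


def blank_overhead_mb(blank_pages: int, blank_kb: float) -> float:
    """Size added to the merged PDF by the blank pages alone, in MB."""
    return blank_pages * blank_kb / KB_PER_MB


# Assumption: every document has an odd page count, so each one gets a blank.
before = DOCUMENTS_MB + blank_overhead_mb(DOCUMENTS, 120)
after = DOCUMENTS_MB + blank_overhead_mb(DOCUMENTS, 20)

print(f"blank pages before: {blank_overhead_mb(DOCUMENTS, 120):.1f} MB")
print(f"blank pages after:  {blank_overhead_mb(DOCUMENTS, 20):.1f} MB")
print(f"merged before: {before:.1f} MB, merged after: {after:.1f} MB")
```

The estimate lands within a megabyte or two of the sizes quoted above, which is as close as these rounded inputs allow, and it shows that the blank pages went from being more than half of the merged file to a small fraction of it.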

#work