Pack200 and Compression


Introduction

To increase server and network availability and bandwidth, two new compression formats are available for the deployment of Java applications and applets: gzip and Pack200.

With both techniques, the compressed JAR files are transmitted over the network, and the receiving application decompresses and restores them.

Theory

The HTTP 1.1 protocol (RFC 2616) defines HTTP compression. HTTP compression allows an application's JAR files to be deployed as compressed JAR files. The supported compression techniques are gzip, compress, and deflate.

As of SDK/JRE version 5.0, HTTP compression is implemented in Java Web Start and Java Plug-in in compliance with RFC 2616. The supported techniques are gzip and pack200-gzip.

The requesting application sends an HTTP request to the server. An HTTP request has multiple fields. The Accept-Encoding (AE) field is set to pack200-gzip or gzip, indicating to the server that the application can handle pack200-gzip or gzip format.

The server implementation searches for the requested JAR file with a .pack.gz or .gz file extension and responds with the located file. The server sets the Content-Encoding (CE) response header field to pack200-gzip, gzip, or NULL, depending on the type of file being sent, and may optionally set the Content-Type (CT) field to application/java-archive. By inspecting the CE field, the requesting application can therefore apply the corresponding transformation to restore the original JAR file.



The exchange described above can be implemented with a simple servlet or server module on any HTTP 1.1-compliant web server.

Compressing files on the fly degrades server performance, especially with Pack200, and is therefore not recommended.

Sample Tomcat Servlet:

/**
 *  A simple HTTP Compression Servlet
 */

import java.util.*;
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.util.zip.*;
import java.net.*;

/**
 * The servlet class.
 */
public class ContentType extends HttpServlet {

    private static final String JNLP_MIME_TYPE          = "application/x-java-jnlp-file";
    private static final String JAR_MIME_TYPE           = "application/x-java-archive";
    private static final String PACK200_MIME_TYPE       = "application/x-java-pack200";

    // HTTP Compression RFC 2616 : Standard headers
    public static final String ACCEPT_ENCODING          = "accept-encoding";
    public static final String CONTENT_TYPE             = "content-type";
    public static final String CONTENT_ENCODING         = "content-encoding";

    // HTTP Compression RFC 2616 : Standard header for HTTP/Pack200 Compression
    public static final String GZIP_ENCODING            = "gzip";
    public static final String PACK200_GZIP_ENCODING    = "pack200-gzip";
       
    private void sendHtml(HttpServletResponse response, String s) 
                 throws IOException {
         PrintWriter out = response.getWriter();
         
         out.println("<html>");
         out.println("<head>");
         out.println("<title>ContentType</title>");
         out.println("</head>");
         out.println("<body>");
         out.println(s);
         out.println("</body>");
         out.println("</html>");
    }

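    /*
     * Copy the input stream to the output stream, then close both.
     */
    private void sendOut(InputStream in, OutputStream ostream)
                 throws IOException {
        byte[] buf = new byte[8192];
        try {
            int n;
            // read() returns -1 at end of stream
            while ((n = in.read(buf)) != -1) {
                ostream.write(buf, 0, n);
            }
        } finally {
            // Close both streams even if the copy fails part-way.
            try {
                ostream.close();
            } finally {
                in.close();
            }
        }
    }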
    
    boolean doFile(String name, HttpServletResponse response) {
        File f = new File(name);
        if (f.exists()) {
            getServletContext().log("Found file " + name);

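            // Send the length as a header value so that files larger than
            // Integer.MAX_VALUE bytes do not fail an int conversion.
            response.setHeader("Content-Length", Long.toString(f.length()));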

            response.setDateHeader("Last-Modified",f.lastModified());
            return true;  
        }
        getServletContext().log("File not found " + name);
        return false;
    }
    
    
    /** Called when someone accesses the servlet. */
    public void doGet(HttpServletRequest request, 
                HttpServletResponse response) 
                throws IOException, ServletException {
        
        String encoding = request.getHeader(ACCEPT_ENCODING);
        String pathInfo = request.getPathInfo();
        String pathInfoEx = request.getPathTranslated();
        String contentType = request.getContentType();
        StringBuffer requestURL  = request.getRequestURL();
        String requestName = pathInfo; 
        
        ServletContext sc = getServletContext();
        sc.log("----------------------------");
        sc.log("pathInfo="+pathInfo);
        sc.log("pathInfoEx="+pathInfoEx);
        sc.log("Accept-Encoding="+encoding);
        sc.log("Content-Type="+contentType);
        sc.log("requestURL="+requestURL);
        
        if (pathInfoEx == null) {
            response.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        String outFile = pathInfo;
        boolean found = false;
        String contentEncoding = null;
        

        // Pack200 Compression
        if (encoding != null && contentType != null &&
                contentType.compareTo(JAR_MIME_TYPE) == 0 &&
                encoding.toLowerCase().indexOf(PACK200_GZIP_ENCODING) > -1){

            contentEncoding = PACK200_GZIP_ENCODING;
            
            
            if (doFile(pathInfoEx.concat(".pack.gz"),response)) {
                outFile = pathInfo.concat(".pack.gz") ;
                found = true;
            } else {
                // Pack/Compress and transmit, not very efficient.
                found = false;
            }
        }

        // HTTP Compression
        if (found == false && encoding != null &&
                contentType != null &&
                contentType.compareTo(JAR_MIME_TYPE) == 0 && 
                encoding.toLowerCase().indexOf("gzip") > -1) {
                
            contentEncoding = GZIP_ENCODING;

            if (doFile(pathInfoEx.concat(".gz"),response)) {
                outFile = pathInfo.concat(".gz");
                found = true;
            }             
        }

        // No Compression
        if (found == false) { // just send the file
            contentEncoding = null;
            sc.log(CONTENT_ENCODING + "=" + "null");
            doFile(pathInfoEx,response);
            outFile = pathInfo;
        }

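        // Only advertise an encoding when a compressed variant is being sent.
        if (contentEncoding != null) {
            response.setHeader(CONTENT_ENCODING, contentEncoding);
        }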
        sc.log(CONTENT_ENCODING + "=" + contentEncoding + 
                " : outFile="+outFile);


        if (sc.getMimeType(pathInfo) != null) {
            response.setContentType(sc.getMimeType(pathInfo));
        }
        
        InputStream in = sc.getResourceAsStream(outFile);
        OutputStream out = response.getOutputStream();

        if (in != null) {
            try {
                sendOut(in,out);
            } catch (IOException ioe) {
                // getMessage() may be null, so compare from the literal side.
                if ("Broken pipe".equals(ioe.getMessage())) {
                    sc.log("Broken Pipe while writing");
                    return;
                } else  throw ioe;
            }
        } else response.sendError(HttpServletResponse.SC_NOT_FOUND);
        
    }

}
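
On the receiving side, the Content-Encoding handling described in the Theory section might look like the following minimal sketch. The class and method names are hypothetical (they are not part of Java Web Start or Java Plug-in), and the pack200-gzip branch is deliberately left as a stub, because restoring such a body needs a Pack200 unpacker such as the unpack200 tool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical client-side helper; an illustration only. */
final class ContentEncodingDecoder {

    /** Restores a response body according to its Content-Encoding value. */
    static byte[] restore(String contentEncoding, byte[] body) {
        if (contentEncoding == null || contentEncoding.trim().isEmpty()) {
            return body;                      // no CE field: the JAR was sent as-is
        }
        String ce = contentEncoding.trim().toLowerCase(Locale.ROOT);
        if (ce.equals("gzip")) {
            return gunzip(body);              // plain gzip: inflating yields the JAR
        }
        if (ce.equals("pack200-gzip")) {
            // A real client would gunzip and then run a Pack200 unpacker here.
            throw new UnsupportedOperationException("Pack200 unpacking is not shown in this sketch");
        }
        throw new IllegalArgumentException("Unexpected Content-Encoding: " + contentEncoding);
    }

    static byte[] gunzip(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper used only to produce test input for the demo below. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data, 0, data.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "stand-in for JAR bytes".getBytes();
        byte[] restored = restore("gzip", gzip(original));
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```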


GZIP Compression

GZIP is a freely available compressor, accessible within the JRE and the SDK as java.util.zip.GZIPInputStream and java.util.zip.GZIPOutputStream.
Command-line versions are available with most Unix operating systems and Windows Unix toolkits (Cygwin and MKS), or they are downloadable for a plethora of operating systems at http://www.gzip.org/.

One gets the highest degree of compression by using gzip on an uncompressed jar file (one whose entries are "stored") rather than on a jar file whose entries are already compressed. The downside is that the file may be stored uncompressed on the target systems.

Here is an example:

Compressing using gzip on a jar file containing individual deflated entries:
Notepad.jar       46.25 KB
Notepad.jar.gz    43.00 KB

Compressing using gzip on a jar file containing "stored" entries:
Notepad.jar      987.47 KB
Notepad.jar.gz    32.47 KB

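As you can see, relative to the 46.25 KB jar, the download shrinks by about 14 KB (roughly 30%) when gzip is applied to the jar with stored entries, versus about 3 KB (roughly 7%) when it is applied to the jar with deflated entries.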
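
The effect can be reproduced with a small experiment. The sketch below is an illustration under stated assumptions, not a measurement of Notepad.jar: the class name, entry names, and entry contents are all synthetic, and a plain ZIP archive stands in for a jar. It builds the same archive twice, once with deflated entries and once with stored entries, and prints the size of each before and after gzip. On inputs like these, the gzipped stored archive typically comes out smallest, because gzip can exploit redundancy across entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo class; all data here is synthetic. */
final class StoredVersusDeflated {

    /** Builds an in-memory archive whose entries are all stored or all deflated. */
    static byte[] buildArchive(String[] names, byte[][] contents, boolean stored) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i < names.length; i++) {
                ZipEntry entry = new ZipEntry(names[i]);
                if (stored) {
                    // A stored entry must declare its size and CRC before it is written.
                    CRC32 crc = new CRC32();
                    crc.update(contents[i]);
                    entry.setMethod(ZipEntry.STORED);
                    entry.setSize(contents[i].length);
                    entry.setCrc(crc.getValue());
                }
                zip.putNextEntry(entry);
                zip.write(contents[i], 0, contents[i].length);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data, 0, data.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Text entries that resemble one another, as files in one jar often do. */
    static byte[][] sampleContents(int entries) {
        byte[][] contents = new byte[entries][];
        for (int i = 0; i < entries; i++) {
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < 20; j++) {
                text.append("method").append(j).append(" of class Sample").append(i)
                    .append(" returns value ").append(i * j).append('\n');
            }
            contents[i] = text.toString().getBytes();
        }
        return contents;
    }

    static String[] sampleNames(int entries) {
        String[] names = new String[entries];
        for (int i = 0; i < entries; i++) {
            names[i] = "demo/Sample" + i + ".txt";
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = sampleNames(40);
        byte[][] contents = sampleContents(40);
        byte[] deflated = buildArchive(names, contents, false);
        byte[] stored = buildArchive(names, contents, true);
        System.out.println("deflated entries: " + deflated.length
                + " bytes, after gzip " + gzip(deflated).length);
        System.out.println("stored entries:   " + stored.length
                + " bytes, after gzip " + gzip(stored).length);
    }
}
```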

Pack200 Compression

Pack200 compresses large files very efficiently, depending on the density and size of the class files in the JAR file. One can expect compression to about 1/9 the size of the JAR file if it contains only class files and is on the order of several MB.

Using the same jar as in the previous example:
Notepad.jar          46.25 KB
Notepad.jar.pack.gz  22.58 KB

In this case, the same jar is reduced by about 50%.

Please note: when signing large jars, step 5 of the example below (verifying the unpacked jar) may fail with a Security Error. A likely cause is bug 5078608. Please use one of the workarounds detailed in the release notes.

Pack200 works most efficiently on Java class files. It uses several techniques to efficiently reduce the size of JAR files:

  1. It merges and sorts the constant-pool data in the class files and co-locates them in the archive.
  2. It removes redundant class attributes.
  3. It stores internal data structures.
  4. It uses delta and variable length encoding.
  5. It chooses optimum coding types for secondary compression.

Pack200 can be run from the command line using the pack200(1) and unpack200(1) tools in the bin directory of your SDK or JRE.
Pack200 can also be invoked programmatically from Java; please refer to the API and JavaDoc references for java.util.jar.Pack200.
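
To give a feel for technique 4 in the list above, the following sketch shows why delta plus variable-length encoding helps on sorted data. It is a generic illustration, not the coding that Pack200 uses: JSR 200 defines its own family of codings, and the class name and the base-128 byte layout here are assumptions made for the example. Five nearby values that would take 20 bytes as raw 32-bit integers encode to 7 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Generic delta + variable-length sketch; not the actual Pack200 coding. */
final class DeltaVarintSketch {

    /** Writes each value as the difference from its predecessor, 7 bits per byte. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int value : values) {
            int delta = value - previous;       // small when neighbours are close
            previous = value;
            while ((delta & ~0x7F) != 0) {      // more than 7 significant bits remain
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);                   // final byte has the high bit clear
        }
        return out.toByteArray();
    }

    /** Reverses encode(); count is the number of values that were encoded. */
    static int[] decode(byte[] data, int count) {
        int[] values = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            values[i] = previous;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] offsets = {1000, 1003, 1004, 1010, 1200};
        byte[] encoded = encode(offsets);
        System.out.println(offsets.length * 4 + " raw bytes -> " + encoded.length + " encoded bytes");
        System.out.println(Arrays.toString(decode(encoded, offsets.length)));
    }
}
```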

Steps to Pack a file

1. Consider the size of the JAR file, the contents of the JAR file, and the bandwidth of your target audience.

All these factors play into choosing a compression technique. The unpack200 tool is designed to be as efficient as possible, and it takes little time to restore the original file. If you have large JAR files (2 MB or more) composed mostly of class files, Pack200 is the preferred compression technique. If you have large JAR files composed mostly of resource files (JPEG, GIF, data, and so on), then gzip is the preferred compression technique.
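
The rule of thumb above can be turned into a quick check. The sketch below is one possible reading of it, not an official tool: the class name is hypothetical, the 2 MB threshold comes from the text, and treating "mostly" as "more than half of the uncompressed bytes" is an assumption you may want to tune.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper that applies the size and content rule of thumb. */
final class CompressionAdvisor {

    static final long LARGE_JAR_BYTES = 2L * 1024 * 1024; // "2 MB or more" from the text
    static final double MOSTLY = 0.5;                      // assumption: more than half

    /** Fraction of uncompressed entry bytes that belong to .class entries. */
    static double classShare(byte[] jarBytes) {
        long classBytes = 0;
        long allBytes = 0;
        byte[] buf = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                long size = 0;
                int n;
                while ((n = zip.read(buf)) != -1) {
                    size += n;              // count bytes actually read for this entry
                }
                allBytes += size;
                if (entry.getName().endsWith(".class")) {
                    classBytes += size;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return allBytes == 0 ? 0.0 : (double) classBytes / allBytes;
    }

    /** True when the JAR is large and mostly class files. */
    static boolean prefersPack200(long jarSizeBytes, double classShare) {
        return jarSizeBytes >= LARGE_JAR_BYTES && classShare > MOSTLY;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: CompressionAdvisor <path-to-jar>");
            return;
        }
        byte[] jar = Files.readAllBytes(Paths.get(args[0]));
        double share = classShare(jar);
        System.out.println("class share: " + share);
        System.out.println("Pack200 preferred by this rule: " + prefersPack200(jar.length, share));
    }
}
```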

2.  Pack200 segmenting.

Pack200 loads the entire packed file into memory. When target systems are memory and resource constrained, setting Pack200.Packer.SEGMENT_LIMIT to a lower value reduces the memory requirements during packing and unpacking. Setting Pack200.Packer.SEGMENT_LIMIT=-1 forces a single segment to be generated, which is effective for size reduction but requires a much larger Java heap on the packing and unpacking systems. Note that several of these packed segments may be concatenated to produce a single packed file.

3. Signing the JAR files.

Pack200 rearranges the contents of the resultant JAR file. The jarsigner tool hashes the contents of each class file and stores the hash in an encrypted digest in the manifest. When the unpacker runs on a packed file, the contents of the classes are rearranged, which invalidates the signature. Therefore, the JAR file must first be normalized using pack200 and unpack200, and only then signed.

(Here's why this works: Any reordering the packer does of any classfile structures is idempotent, so the second packing does not change the orderings produced by the first packing. Also, the unpacker is guaranteed by the JSR 200 specification to produce a specific bytewise image for any given transmission ordering of archive elements.)

An Example

Suppose you wish to use HelloWorld.jar.


Step 1: Repack the file to normalize the jar. Repacking calls the packer and then unpacks the file in one step.

% pack200 --repack HelloWorld.jar

Step 2: Sign the jar after we normalize using repack.

% jarsigner -keystore myKeystore HelloWorld.jar ksrini

Verify the just signed jar to ensure the signing worked.

% jarsigner -verify HelloWorld.jar
jar verified.


Ensure the jar still works.

% java -jar HelloWorld.jar
HelloWorld


Step 3: Now we pack the file

% pack200 HelloWorld.jar.pack.gz HelloWorld.jar

Step 4: Unpack the file

% unpack200 HelloWorld.jar.pack.gz HelloT1.jar

Step 5:  Verify the jar

% jarsigner -verify HelloT1.jar
jar verified.


// Test the jar ...
% java -jar HelloT1.jar
HelloWorld


After verification, the compressed pack file HelloWorld.jar.pack.gz can be deployed.

4. Reduction techniques: 

 Pack200 by default behaves in a High Fidelity (Hi-Fi) mode, meaning that all the original attributes present in the classes, as well as the attributes of each individual entry in a JAR file, are retained. These typically tend to add to the packed file size. Here are some of the techniques one can use to further reduce the size of the download:
  1. Modification times:  If modification time of the individual entries in a JAR file is not a concern, you can specify the option   Pack200.Packer.MODIFICATION_TIME="LATEST". This will allow one modification time to be transmitted in the pack file for each segment. The latest time will be the latest time of any entry within that segment. 

  2. Deflation hint: Similar to the above, if the compression state of the individual entries in the archive is not required, set Pack200.Packer.DEFLATE_HINT="false". This will fractionally reduce the download size, as individual compression hints will not be transmitted. However, the jar when recomposed will contain "stored" entries and hence may consume more disk space on the target system.

    For example (this command sets the hint to "true", which also suppresses the individual hints but recomposes the jar with deflated entries):

    pack200 --modification-time=latest --deflate-hint="true" tools-md.jar.pack.gz tools.jar

    Note: the above optimizations will yield better results with a JAR file containing thousands of entries.

  3. Attributes: Several class attributes are not required when deploying JAR files. These attributes can be stripped out of class files, significantly reducing download size. However, care must be taken to ensure that required runtime attributes are maintained.

    1. Debugging attributes: If debugging information, such as Line Numbers and Source File, is not required (it is typically used in application stack traces), then these attributes can be discarded by specifying Pack200.Packer.STRIP_DEBUG=true. This typically reduces the packed file by about 10%.

      Example:
      pack200 --strip-debug tools-stripped.jar.pack.gz tools.jar

    2. Other attributes: Advanced users may use some of the other strip-related properties to strip out additional attributes. However, extreme caution should be used when doing so: the resultant JAR file must be tested on all possible Java runtime systems to ensure that the runtime does not depend on the stripped attributes.
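
The modification-time and deflation-hint settings above can also be collected in a properties file and handed to the pack200 tool through its --config-file option. The sketch below writes such a file's contents. The class name is hypothetical, and the key strings are assumed to be the values of Pack200.Packer.MODIFICATION_TIME and Pack200.Packer.DEFLATE_HINT as given in the java.util.jar.Pack200.Packer API documentation; check them against your JDK before relying on them.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper that prepares packer options as a properties file. */
final class PackOptionsSketch {

    /** Options that trade per-entry fidelity for a smaller download. */
    static Properties smallerDownloadOptions() {
        Properties options = new Properties();
        // Assumed value of Pack200.Packer.MODIFICATION_TIME; "latest" sends one time per segment.
        options.setProperty("pack.modification.time", "latest");
        // Assumed value of Pack200.Packer.DEFLATE_HINT; "false" recomposes stored entries.
        options.setProperty("pack.deflate.hint", "false");
        return options;
    }

    /** Renders the options in java.util.Properties file format. */
    static String toConfigFile(Properties options) {
        StringWriter text = new StringWriter();
        try {
            options.store(text, "pack200 options");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.print(toConfigFile(smallerDownloadOptions()));
    }
}
```

Saved as, say, pack.properties (a file name chosen for this example), the output could be used as pack200 --config-file=pack.properties tools-md.jar.pack.gz tools.jar.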

5. Handling unknown attributes:

Pack200 deals with standard attributes defined by the Java Virtual Machine Specification; however, compilers are free to introduce custom attributes. When such attributes are present, by default, Pack200 passes the class through, emitting a warning message. These "passed-through" class files may contribute to bloating of packed files. If the unknown attributes are prevalent in the classes of a JAR file, this may lead to a very large bloat of the packed output. In such cases, consider the following strategies:

Strip the attribute if it is deemed to be redundant at runtime. This can be achieved by setting the property Pack200.Packer.UNKNOWN_ATTRIBUTE=STRIP or by running:

pack200 --unknown-attribute=strip foo.pack.gz foo.jar

If the attributes are required at runtime and they do contribute to an inflation, identify the attribute from the warning message and apply a suitable layout for it, as described in the Pack200 JSR 200 specification and the Java API reference section for Pack200.Packer.

It is possible that a compiler could define an attribute not implemented in the layout specification of Pack200, which may cause the packer to malfunction. In such cases, an entire class file (or several) can be "passed through" as if it were a resource, by virtue of its name, and can be specified as follows:

pack200 --pass-file="com/acme/foo/bar/baz.class" foo.pack.gz foo.jar

or an entire directory and its contents,

pack200 --pass-file="com/acme/foo/bar/" foo.pack.gz foo.jar

6. Installers:

You may wish to take advantage of the Pack200 technology in your installation program, whereby a product's jars are compressed using Pack200 and decompressed during the installation. If the JRE or SDK is bundled in the installation, you are free to use unpack200 (Unix) or unpack200.exe (Windows) from the distribution's 'bin' directory. This implementation is a pure C++ application that requires no Java runtime to be present for it to run.

Windows: Installers may use a better algorithm than the one in gzip to compress entries. In such cases, one gets better compression from the installer's intrinsic compression by invoking pack200 as follows:

pack200 --no-gzip foo.jar.pack foo.jar

This will prevent the output file from being gzip compressed.

unpack200 is a Windows console application, that is, it displays an MS-DOS window during the install. To suppress this window, you can use a launcher with a WinMain, as shown below.


Sample Code:
#include <windows.h>
#include <stdio.h>
#include <string.h>  /* memset */

int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR     lpCmdLine,
                     int       nCmdShow) {
  STARTUPINFO si;
  memset(&si, 0, sizeof(si));
  si.cb = sizeof(si);

  PROCESS_INFORMATION pi;
  memset(&pi, 0, sizeof(pi));

  //Test
  //lpCmdLine = "c:/build/windows-i586/bin/unpack200 -l c:/Temp/log c:/Temp/rt.pack c:/Temp/rt.jar";
  int ret = CreateProcess(NULL,			/* Exec. name */
			  lpCmdLine,		/* cmd line */
			  NULL,			/* proc. sec. attr. */
			  NULL,			/* thread sec. attr */
			  TRUE,			/* inherit file handle */
			  CREATE_NO_WINDOW | DETACHED_PROCESS, /* detach the process/suppress console */
			  NULL,			/* env block */
			  NULL,			/* inherit cwd */
			  &si,			/* startup info */
			  &pi);			/* process info */
  if ( ret == 0) ExitProcess(255);

  // Wait until child process exits.
  WaitForSingleObject( pi.hProcess, INFINITE );

  DWORD exit_val;

  // Be conservative and return
  if (GetExitCodeProcess(pi.hProcess, &exit_val) == 0) ExitProcess(255);

  ExitProcess(exit_val); // Return the error code of the child process

  return -1;
}

Testing

All JAR files, packed and unpacked, must be tested for correctness with your application's test qualifiers. When using the command line interface pack200, the output file is compressed using gzip with default values. A user may instead create a simple pack file and compress it using gzip with user-specified options, or using some other compressor.
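
One check that can be automated is comparing the JAR restored by unpack200 with the normalized (repacked) JAR it came from. The sketch below is a minimal illustration with hypothetical class and method names. It assumes the source JAR was normalized first, since otherwise class bytes may legitimately differ after a pack and unpack cycle, and it does not replace running your application's own tests.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical entry-by-entry comparison of two archives. */
final class JarContentCheck {

    /** Reads every non-directory entry into a name-to-bytes map. */
    static Map<String, byte[]> readEntries(byte[] jarBytes) {
        Map<String, byte[]> entries = new TreeMap<>();
        byte[] buf = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buf)) != -1) {
                    data.write(buf, 0, n);
                }
                entries.put(entry.getName(), data.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    /** True if both archives hold the same entry names with identical bytes, in any order. */
    static boolean sameContents(byte[] jarA, byte[] jarB) {
        Map<String, byte[]> a = readEntries(jarA);
        Map<String, byte[]> b = readEntries(jarB);
        if (!a.keySet().equals(b.keySet())) {
            return false;
        }
        for (Map.Entry<String, byte[]> e : a.entrySet()) {
            if (!Arrays.equals(e.getValue(), b.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Builds a small in-memory archive from name/text pairs; used for demos and tests. */
    static byte[] zipOf(String... nameThenText) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < nameThenText.length; i += 2) {
                byte[] data = nameThenText[i + 1].getBytes();
                zip.putNextEntry(new ZipEntry(nameThenText[i]));
                zip.write(data, 0, data.length);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: JarContentCheck <normalized.jar> <restored.jar>");
            return;
        }
        byte[] first = Files.readAllBytes(Paths.get(args[0]));
        byte[] second = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("same contents: " + sameContents(first, second));
    }
}
```

With the file names from the example earlier in this chapter, the two arguments would be HelloWorld.jar (after the repack step) and HelloT1.jar.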

More Information

For more information see pack200 and unpack200 in Java Deployment Tools.