09 February 2012

How to extract data from a Java heap dump

I will show how to extract data from the memory of a running Java application. Let's assume that this is the class with the interesting data:
package se.lesc.blog.heap_extract;

import java.util.*;

/** Class that contains the data to extract */
public class ArrayDataContainer {
    
    List<byte[]> arrays = new ArrayList<byte[]>();
    
    public String toString() {
        String result = "";
        for (int i = 0; i < arrays.size(); i++) {
            result += "Array " + i + ": " + new String(arrays.get(i)) + "\n";
        }
        return result;
    }
}
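A side note on toString(): the += in the loop copies the whole string on every iteration, and new String(byte[]) decodes with the platform default charset, which may differ between the machine that produced the data and the one reading it back. A sketch of the same formatting with a StringBuilder and an explicit charset could look like this (ArrayDataToString is a hypothetical name, not part of the original code):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical variant of ArrayDataContainer.toString(): StringBuilder + explicit charset. */
public class ArrayDataToString {

    /** Formats each array as "Array i: <text>" on its own line, decoding with the given charset. */
    public static String describe(List<byte[]> arrays, Charset charset) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < arrays.size(); i++) {
            result.append("Array ").append(i).append(": ")
                  .append(new String(arrays.get(i), charset))
                  .append('\n');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        List<byte[]> arrays = new ArrayList<byte[]>();
        for (int i = 0; i < 3; i++) {
            // Encode with the same charset that describe() decodes with
            arrays.add(("This is my data " + i).getBytes(StandardCharsets.UTF_8));
        }
        System.out.print(describe(arrays, StandardCharsets.UTF_8));
    }
}
```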

An example main method that populates the class with some data and then keeps the Java process alive long enough to take a heap dump:
    public static void main(String[] args) throws Exception {
        ArrayDataContainer arrayDataContainer = new ArrayDataContainer();
        
        for (int i = 0; i < 20; i++) {
            String dataString = "This is my data " + i;
            arrayDataContainer.arrays.add(dataString.getBytes());
        }
        
        System.out.println(arrayDataContainer.toString());
        Thread.sleep(1000*300); // keep the process alive for 5 minutes so a heap dump can be taken
    }
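Heap dump tools such as jmap need the process ID of the running JVM. As a minimal sketch, assuming Java 9 or later (for ProcessHandle), the example could print its own PID before it goes to sleep; PidPrinter is a hypothetical class name:

```java
/** Hypothetical helper: prints the PID of this JVM so a heap dump tool can target it. */
public class PidPrinter {

    /** Returns the operating-system process ID of the current JVM (Java 9+). */
    public static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        System.out.println("PID: " + currentPid());
    }
}
```

The same PID is what tools like jps and VisualVM's process list show for the application.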


Use your favorite tool to take a heap dump, for example jmap: jmap -dump:format=b,file=heap.hprof <pid>
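Alternatively, the application can dump its own heap. This is a sketch assuming a HotSpot-based JVM such as Oracle's or OpenJDK (com.sun.management.HotSpotDiagnosticMXBean is JDK-specific, not part of the Java SE standard); HeapDumper is a hypothetical class name:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: writes a heap dump of the current JVM (HotSpot-based JVMs only). */
public class HeapDumper {

    /**
     * Dumps the heap to outputFile. Throws IOException if the file already exists;
     * newer JDKs may also reject names that do not end in ".hprof".
     */
    public static void dumpHeap(String outputFile, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(outputFile, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "heap.hprof";
        dumpHeap(file, true);
        System.out.println("Heap dump written to " + file);
    }
}
```

Passing true for liveOnly dumps only reachable objects (a full GC runs first); pass false to include unreachable objects as well.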

Open Java VisualVM (typically found in C:\Program Files\Java\jdk1.6.0_24\bin\jvisualvm.exe). It has an Object Query Language (OQL) feature that is very useful in this case. Go to File -> Load..., select "Heap Dumps" as the file type in the open dialog, and open the heap dump. Then click the "OQL Console" button.

Enter this query and press the Execute button:
map(heap.objects("se.lesc.blog.heap_extract.ArrayDataContainer"), 
  function (it, index, array, result) {
    var res = '';
    for each (var element in it.arrays.elementData) {
      res += '<p/>';
      for each (var i in element) {
        res += i + ', ';
      }
    }
    return res;
  })
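To try the importer without a real heap dump, it helps to generate input of the same shape. The sketch below assumes the copied query result has one array per line, with each signed byte value followed by ", " (as the query writes it, including after the last value); QueryResultFormatter is a hypothetical helper, not part of the original code:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: renders a List<byte[]> in the plain-text shape assumed for the
 * copied OQL result, for testing the importer without a heap dump.
 */
public class QueryResultFormatter {

    public static String format(List<byte[]> arrays) {
        StringBuilder sb = new StringBuilder();
        for (byte[] array : arrays) {
            // Like the query's '<p/>': start a new line before each array
            sb.append('\n');
            for (byte b : array) {
                // Like the query's "i + ', '": every value gets a trailing separator
                sb.append(b).append(", ");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<byte[]> arrays = new ArrayList<byte[]>();
        arrays.add("Hi".getBytes(StandardCharsets.US_ASCII));
        arrays.add(new byte[] {-1, 0, 127});
        System.out.print(format(arrays));
    }
}
```

Feeding this text to ArrayDataImporter.importArrayList (for example through a ByteArrayInputStream) should give back the original arrays; the leading newline becomes a blank line, which the importer skips.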

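Copy the data from the Query Results pane and save it to a text file named query_results.txt in the same package as the importer (the program loads it with getResourceAsStream). Then use this program to parse the file and recreate the List of byte[]: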
package se.lesc.blog.heap_extract;

import java.io.*;
import java.util.*;

public class ArrayDataImporter {

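    /**
     * Parses the text copied from the OQL query result: one byte array per line,
     * values separated by commas. Blank lines are skipped.
     */
    public static List<byte[]> importArrayList(InputStream in) throws Exception {
        List<byte[]> arrays = new ArrayList<byte[]>();
        
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(in));
        try {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }

                List<Byte> array = new ArrayList<Byte>();
                
                // The trailing ", " left by the query yields no extra token:
                // String.split drops trailing empty strings.
                String[] dataPoints = line.split(",");
                for (String dataPointString : dataPoints) {
                    dataPointString = dataPointString.trim();
                    
                    // Java bytes are signed, so values are expected in -128..127;
                    // the cast keeps the low 8 bits either way
                    int dataPoint = Integer.parseInt(dataPointString);
                    array.add((byte) dataPoint);
                }
                
                // Unbox the List<Byte> into a primitive byte[]
                byte[] byteArray = new byte[array.size()];
                for (int i = 0; i < array.size(); i++) {
                    byteArray[i] = array.get(i);
                }
                
                arrays.add(byteArray);
            }
        } finally {
            bufferedReader.close();
        }
        
        return arrays;
    }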
    

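    public static void main(String[] args) throws Exception {
        // query_results.txt must be on the classpath, in the same package as this class
        InputStream in = ArrayDataImporter.class.getResourceAsStream("query_results.txt");
        if (in == null) {
            throw new FileNotFoundException("query_results.txt not found on the classpath");
        }
        ArrayDataContainer arrayDataContainer = new ArrayDataContainer();
        arrayDataContainer.arrays = ArrayDataImporter.importArrayList(in);
        System.out.println(arrayDataContainer);
    }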
}
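The importer boxes every value in a List<Byte> and then copies it into a byte[]. A variant of the per-line parsing that writes straight into a ByteArrayOutputStream is sketched below (LineParser is a hypothetical name, not part of the original code):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical alternative to the per-line parsing in ArrayDataImporter. */
public class LineParser {

    /** Parses a line such as "84, 104, -1, " into a byte array; blank tokens are ignored. */
    public static byte[] parseLine(String line) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String token : line.split(",")) {
            token = token.trim();
            if (token.isEmpty()) {
                continue; // e.g. whitespace after the trailing comma
            }
            // Byte.parseByte rejects values outside -128..127 with a NumberFormatException
            out.write(Byte.parseByte(token));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = parseLine("72, 105, -1, ");
        System.out.println(Arrays.toString(bytes)); // [72, 105, -1]
    }
}
```

One design difference: Byte.parseByte fails on out-of-range values, whereas the original (byte) Integer.parseInt(...) cast silently wraps them (200 becomes -56). The stricter version catches corrupted input early; the cast is more forgiving if the copied values could ever be unsigned.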

Things to note:
  1. The OQL help can be found at http://visualvm.java.net/oqlhelp.html.
  2. OQL is mostly JavaScript, so most of the "normal" JavaScript functions will work.
  3. The heap.objects function returns every instance of the class (there was only one instance in this application).
  4. it.arrays.elementData needs some explaining: "it" is the current element passed to the JavaScript map function, "arrays" is the name of the field in my class ArrayDataContainer, and "elementData" is the internal array field inside ArrayList. That backing array can be longer than the list itself; its unused slots are null.
  5. The result of the query is rendered as an HTML page, which explains the <p/> in the middle of the query.
