02 March 2009

Compress database data with Mysql

Problem: the database is too large; it requires too much hard disc space. What can be done to solve this? Well of course more hardware can be bought. But what can be done in software?

To narrow the problem: I have very special table construction. I have tables that contain 5-15 ordinary columns with some integers and some VARCHAR:s. Typically some of the cells are empty for each row. Then I have a special column with raw data of some kind. This data is typically from 100 bytes to 3000 bytes in size. It is defined as a BLOB in Mysql table definition.

I tried different compression methods:
• No compression (for reference)
• Use Mysql COMPRESS function to compress only the raw data column
• Used Myisampack to compress the entire table
• Used both Mysql COMPRESS and Myisampack
• Used compression in my Java client with ZLIB


For pure text data in the raw column
                       TEXT_BLOB: 57,8 MiB (100,0 % of original size)
TEXT_BLOB_MYSQL_COMPRESS: 6,0 MiB ( 10,4 % of original size)
TEXT_BLOB_MYISAMPACK: 44,6 MiB ( 77,2 % of original size)
TEXT_BLOB_MYSQL_COMPRESS_AND_MYISAMPACK: 5,8 MiB ( 10,1 % of original size)
TEXT_BLOB_JAVA_ZLIB: 6,0 MiB ( 10,4 % of original size)


For data that is more binary then text-alike in the raw column and where the raw data is a bit smaller then the above example.
                              BINARY_BLOB:  5,7 MiB (100,0 % of original size)
BINARY_BLOB_MYSQL_COMPRESS: 4,3 MiB ( 75,7 % of original size)
BINARY_BLOB_MYISAMPACK: 3,6 MiB ( 63,3 % of original size)
BINARY_BLOB_MYSQL_COMPRESS_AND_MYISAMPACK: 3,3 MiB ( 58,4 % of original size)
BINARY_BLOB_JAVA_ZLIB: 4,2 MiB ( 74,3 % of original size)


From the results it is clear that Myisampack does a poor job of compressing larger BLOB-data. It performs rather ok compressiong the other column, but why doesn’t it compress the BLOB as good as the alternatives? If I’m guessing speed is probably a factor here.

For my specific type of tables I’m probably going for the Java Zlib compression, since it also has the benefit of reducing network traffic between my java client and mysql server.

No comments:

Post a Comment