Wednesday, April 09, 2003

Never Enough Compression Fun Department

New Scientist recently reported that scientists using the Bzip file compression program have been able to sort music automatically, not just by genre but by composer as well. Since this strategy is purely algorithmic and doesn't rely on abstractions such as harmony or rhythm, it might help identify the composer of a work whose origin is largely unknown. A similar strategy has also been used to find out whether two sound files containing speech are spoken in the same language.

Interestingly enough, it was also recently reported that a similar compression program, Gzip, may be used to identify whether microscopic structures within rocks are stromatolites--structures created by microbes--or stromatolite-like structures created through a chemical process. To wit, the smaller the compressed file, the more orderly the structure, and thus the more likely the structure is a stromatolite. If the process stands up to further study, we may have a ready means to analyze Martian rocks for signs that life once existed on Mars.
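
For the curious, the "smaller compressed file means more orderly structure" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the researchers' actual procedure: the function name compressed_ratio is made up, and the two byte strings are hypothetical stand-ins for real image data.

```python
import gzip
import random


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size.

    A lower ratio means gzip found more repetition, which is the rough
    proxy for "orderliness" described above.
    """
    if not data:
        raise ValueError("data must be non-empty")
    return len(gzip.compress(data)) / len(data)


# Hypothetical stand-ins for two rock images (assumption: real input would
# be raw pixel data from microscope images, not synthetic bytes like these).
ordered = b"ABCD" * 2500                      # highly regular pattern
rng = random.Random(0)                        # seeded so runs are repeatable
noisy = bytes(rng.randrange(256) for _ in range(10000))  # no structure

print("ordered:", compressed_ratio(ordered))
print("noisy:  ", compressed_ratio(noisy))
```

The regular pattern should shrink to a small fraction of its size, while the noise barely shrinks at all. Real images would fall somewhere in between, and where to draw the line between "biological" and "chemical" is exactly the part that needs further study.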

I have a feeling that compression programs--originally created back when drive space was much more expensive--will be used more and more often to analyze elements of structure in complex data sets. Indeed, I am already wondering if, for example, compression programs can be used to determine the authors of texts. Imagine being able to compare the works of Shakespeare and Bacon with an off-the-shelf compression program. Perhaps compression software can also be used to compare images--say, to help differentiate between authentic Van Goghs and forgeries. I'm sure there are dozens, maybe hundreds, of possible applications.
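
Here is a minimal sketch of how such an authorship comparison might look in Python. It uses a measure commonly called the normalized compression distance: compress two texts separately and together, and see how much the combined file saves. I am assuming this resembles what the music researchers did; the names csize, ncd, and closest are my own, and a real test would need far longer samples than a toy example.

```python
import bz2
from typing import Mapping


def csize(data: bytes) -> int:
    """Size in bytes of the bzip2-compressed data."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between x and y.

    Values near 0 suggest the two inputs share a lot of structure;
    values near 1 suggest they share very little.
    """
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def closest(unknown: bytes, candidates: Mapping[str, bytes]) -> str:
    """Return the name of the candidate text with the smallest distance."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=lambda name: ncd(unknown, candidates[name]))
```

To try it, read a few known works into a dictionary keyed by author, for example {"Shakespeare": ..., "Bacon": ...}, and pass the disputed text as unknown. Whether the answer means anything is another matter: the compressor only sees repeated byte patterns, so spelling, edition, and sample length could easily swamp any trace of an author's style.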

Maybe this is the time to invest in Aladdin Systems, maker of the Stuffit compression suite. Does WinZip have public stock? Or will the use of gzip continue, encouraging the Open Source movement to create new scientific analysis tools with basic compression algorithms at their core? I don't know, but I have seen the future, and it is zipped.
