One of the features of Exchange versions up to 2007 is Single Instance Storage (SIS). SIS keeps a single copy of a message sent to multiple recipients within the same database: simply put, the first recipient gets the actual message in his or her mailbox, while the others get a referral to it.
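To make the idea concrete, here is a minimal sketch of how a single-instance store works: one stored copy per unique message, with mailboxes holding only references. The class and method names are illustrative, not actual Exchange internals.

```python
import hashlib

class MessageStore:
    """Toy single-instance store: deduplicates by message digest."""

    def __init__(self):
        # digest -> (message bytes, reference count)
        self.blobs = {}

    def deliver(self, mailboxes, message):
        digest = hashlib.sha256(message).hexdigest()
        blob, refs = self.blobs.get(digest, (message, 0))
        # Store the message body at most once, bump the refcount.
        self.blobs[digest] = (blob, refs + len(mailboxes))
        for box in mailboxes:
            box.append(digest)  # each mailbox stores only a referral

# A message sent to two recipients occupies the store once.
store = MessageStore()
alice, bob = [], []
store.deliver([alice, bob], b"Quarterly numbers attached.")
print(len(store.blobs))  # 1
```

Note that the deduplication only works within one store object, which mirrors the limitation discussed below: SIS only helps when recipients share a database.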
With the arrival of Exchange 2010, Microsoft again made changes to the Extensible Storage Engine (ESE). Many of these changes are beneficial to the performance of the system (read: fewer IOPS). For instance, Exchange 2010 uses larger pages and orders database pages in the background. In an optimal situation this results in an IOPS reduction of 70% versus Exchange 2007 (a whopping 90% versus Exchange 2003).
However, one of the victims of the new ESE is Single Instance Storage. This sounds worse than it is: large (1 GB+) mailboxes are common nowadays, the maximum number of databases has increased, and to keep recovery times down you are more likely to spread mailboxes over multiple databases.
These developments also reduce the effectiveness of Single Instance Storage, which only works within a single database. In addition, the price of storage has dropped enormously, and the focus these days is more on performance than on disk space.
Now before you think about getting additional storage space to counter the absence of SIS, I have good news: the new Exchange 2010 ESE engine has built-in storage compression. Certain parts of e-mail messages (headers, body) are stored in compressed format. Attachments are not compressed, because (de)compressing them would be too CPU intensive (goodbye to the performance improvements gained by the I/O reduction). Another reason I can think of is that compressing attachments would be pointless anyway, as 90% of attachments (zip, docx, jpg) are probably already compressed.
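You can see both effects for yourself with a quick experiment. The sketch below uses Python's zlib (DEFLATE, another LZ77-family compressor) rather than XPRESS, so the exact ratios differ from what Exchange achieves, but the pattern is the same: text-like message bodies shrink a lot, while already-compressed (near-random) data does not shrink at all.

```python
import os
import zlib

# Text-like payload standing in for a message body: redundant,
# so an LZ77-family compressor shrinks it considerably.
body = b"Meeting moved to 3pm. Please review the attached report.\n" * 50

# Stand-in for an already-compressed attachment (zip, docx, jpg):
# near-random bytes leave a second compressor nothing to remove.
attachment = os.urandom(4096)

body_ratio = len(zlib.compress(body)) / len(body)
attachment_ratio = len(zlib.compress(attachment)) / len(attachment)

print(f"body:       {body_ratio:.2f}")        # far below 1.0
print(f"attachment: {attachment_ratio:.2f}")  # about 1.0 or slightly above
```

In fact, recompressing already-compressed data typically makes it slightly *larger*, because the compressed stream carries its own headers, which is one more argument for leaving attachments alone.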
First measurements show that storage compression neutralizes the effects of losing SIS. In fact, results are expected to be a lot better, because storage compression doesn't rely on messages being stored in the same database. Measurements also indicate that the overhead caused by (de)compression is equal to or less than the overhead of the extra IOPS needed to fetch and store uncompressed information.
For those interested, the algorithm used here is the same one used for MAPI and DAG network traffic, i.e. XPRESS, a Microsoft implementation of the LZ77 algorithm. For more background information – or if you can't sleep – check out the underlying RTF Extensions Specification.
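If you are curious how LZ77 works underneath XPRESS, the core idea is a sliding window: each position in the data is encoded either as a literal byte or as a back-reference (offset, length) to an earlier occurrence. The toy implementation below emits (offset, length, next byte) triples; it is a teaching sketch, not the XPRESS wire format, which adds Huffman coding and other refinements.

```python
def lz77_compress(data, window=4096, max_len=18):
    """Toy LZ77: encode data as (offset, length, next_byte) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        nxt = data[i + best_len] if i + best_len < len(data) else None
        out.append((best_off, best_len, nxt))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    out = bytearray()
    for off, length, nxt in triples:
        for _ in range(length):
            out.append(out[-off])  # copy from the window, byte by byte
        if nxt is not None:
            out.append(nxt)
    return bytes(out)

sample = b"the quick brown fox jumps over the lazy dog. " * 4
assert lz77_decompress(lz77_compress(sample)) == sample
```

Repetitive input such as e-mail headers turns into long back-references, which is exactly why headers and bodies are worth compressing while random-looking attachment bytes are not.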