Currently I'm trying to store a big amount of e-mails (100M+) in mysql in Amazon RDS. I've made a seperate emails_bodies table but it's getting way to big.
With around 40k e-mails the table size just got over 1GB, using Amazon RDS. The original (e-mail) files are saved on the Amazon S3 and the bodies (text-only) are just in the DB for searching. With higher user-numbers (which easily counts over 100M emails) I would use TB's of mysql storage.
CREATE TABLE `emails` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`accounts_id` int(10) unsigned NOT NULL,
`ehash` varchar(32) NOT NULL,
`subject` text NOT NULL,
`body` longtext NOT NULL,
`html` tinyint(1) unsigned NOT NULL,
`size` int(10) unsigned NOT NULL,
`datetime` datetime NOT NULL,
`created` datetime NOT NULL,
`last_updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `ehash` (`ehash`),
KEY `accounts_id` (`accounts_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
CREATE TABLE `bodies` (
`bodies_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`bodies_emails_id` int(10) unsigned NOT NULL,
`bodies_body` longtext NOT NULL,
PRIMARY KEY (`bodies_id`),
UNIQUE KEY `bodies_emails_id` (`bodies_emails_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
According to my calculations, each body consumes 25K in average. That's pretty fair amount for the email body. Though you can reduce that amount if extract only text part out of multipart body, if your only intention is search. I am sure that average size will be reduced to mere 1k or less.