Distributed Checksum Clearinghouse
Distributed Checksum Clearinghouse (also referred to as DCC) is a hash-sharing method of spam email detection.
The basic logic in DCC is that most spam is sent to many recipients, so a message body that appears many times is bulk email. DCC identifies bulk email by taking a checksum of the message and sending that checksum to a clearinghouse (server). The server responds with the number of times it has received that checksum. Each time a message is processed, the count for its checksum increases by 1, so bulk mail can be identified by a high response count. The content itself is not examined. DCC works over the UDP protocol and uses little bandwidth.
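The counting logic described above can be illustrated with a minimal Python sketch. It is not the DCC protocol or its checksum algorithm: the in-memory `Clearinghouse` class, the use of SHA-256, and the threshold value are all assumptions made for illustration, and a real client would send the checksum to a remote server over UDP.

```python
import hashlib
from collections import Counter


class Clearinghouse:
    """Toy in-memory stand-in for a clearinghouse server (hypothetical)."""

    def __init__(self):
        self.counts = Counter()

    def report(self, checksum: str) -> int:
        """Record one sighting of a checksum and return the running total."""
        self.counts[checksum] += 1
        return self.counts[checksum]


def body_checksum(body: str) -> str:
    # SHA-256 is an assumption for this sketch; DCC defines its own checksums.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def is_bulk(server: Clearinghouse, body: str, threshold: int = 50) -> bool:
    """Report the body's checksum; flag it once the count reaches the threshold."""
    return server.report(body_checksum(body)) >= threshold
```

Only the checksum leaves the client in this sketch, which mirrors the point that the server counts messages without examining their content.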
DCC is resistant to hashbusters (random content added to each copy of a message to change its checksum) because "the main DCC checksums are fuzzy and ignore aspects of messages. The fuzzy checksums are changed as spam evolves". However, DCC is likely to identify mailing lists as bulk email unless they are whitelisted. Likewise, repeatedly sending the same email to a server increases its count on the server and, therefore, the likelihood of it being treated as spam by others.
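As a rough illustration of why a fuzzy checksum resists hashbusters, the sketch below normalizes a message before hashing it, so that copies differing only in case, punctuation, or appended numbers produce the same checksum. The normalization rules, the function names, and the sender-based whitelist are assumptions chosen for this example; they are not the fuzzy checksums DCC actually uses.

```python
import hashlib
import re


def fuzzy_checksum(body: str) -> str:
    """Hash a normalized body so trivial variations map to one checksum."""
    text = body.lower()
    # Replace every run of non-letters (digits, punctuation, whitespace)
    # with a single space; this rule is an assumption for the sketch.
    text = re.sub(r"[^a-z]+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_unwanted_bulk(count: int, sender: str, whitelist: set,
                     threshold: int = 50) -> bool:
    """Treat a high count as spam unless the sender is whitelisted."""
    return count >= threshold and sender not in whitelist
```

For example, `fuzzy_checksum("Buy now!!! id=48213")` and `fuzzy_checksum("BUY  now. id=7")` return the same value, while a whitelisted mailing list address is never flagged however high its count.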
History
According to the official DCC website:
The DCC is based on an idea of Paul Vixie and on fuzzy body matching to reject spam on a corporate firewall operated by Vernon Schryver starting in 1997. The DCC was designed and written at Rhyolite Software starting in 2000. It has been used in production since the winter of 2000/2001.