Address munging
Address munging is the practice of disguising, or munging, an e-mail address to prevent it from being automatically collected and used as a target by people and organizations who send unsolicited bulk e-mail. Address munging is intended to disguise an e-mail address in a way that prevents computer software from seeing the real address, or even any address at all, but still allows a human reader to reconstruct the original and contact the author: an e-mail address such as "no-one@example.com" becomes "no-one at example dot com", for instance.
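
As a minimal sketch of the substitution described above, the following Python function spells out the two separators. The function name `munge` and the choice to replace every dot, including any in the part before the "@", are illustrative assumptions.

```python
def munge(address: str) -> str:
    """Spell out '@' and '.' so the text no longer looks like an address."""
    # Assumption: every '.' is spelled out, including dots before the '@'.
    return address.replace("@", " at ").replace(".", " dot ")


print(munge("no-one@example.com"))  # no-one at example dot com
```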

Any e-mail address posted in public is likely to be automatically collected by computer software used by bulk emailers (a process known as e-mail address harvesting), and addresses posted on webpages, Usenet or chat rooms are particularly vulnerable to this. Private e-mail sent between individuals is highly unlikely to be collected, but e-mail sent to a mailing list that is archived and made available via the web, or passed on to a Usenet news server and made public, may eventually be scanned and collected.

Disadvantages

Disguising addresses makes it more difficult for people to send e-mail to each other. Many see it as an attempt to fix a symptom rather than solving the real problem of e-mail spam, at the expense of causing problems for innocent users.

The use of address munging on Usenet is contrary to the recommendations of RFC 1036 governing the format of Usenet posts, which requires that a valid e-mail address be supplied in the From: field of the post. In practice, few people follow this requirement strictly.

Disguising e-mail addresses in a systematic manner (for example, user[at]domain[dot]com) offers little protection, because a predictable pattern is easy to search for and to reverse. For example, such addresses can be revealed through a simple Google search.
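
To illustrate why a systematic disguise is weak, the sketch below reverses a few common spellings of "at" and "dot" with two regular expressions. The patterns and the name `unmunge` are hypothetical; they are not taken from any real harvesting tool, and they are meant to be applied to a candidate string rather than to running prose.

```python
import re

# Hypothetical patterns: bracketed or space-delimited "at" and "dot".
AT = re.compile(r"\s*[\[\(\{]\s*at\s*[\]\)\}]\s*|\s+at\s+", re.IGNORECASE)
DOT = re.compile(r"\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*|\s+dot\s+", re.IGNORECASE)


def unmunge(candidate: str) -> str:
    """Turn a systematically munged address back into its usual form."""
    return DOT.sub(".", AT.sub("@", candidate))


print(unmunge("user[at]domain[dot]com"))
print(unmunge("no-one at example (dot) com"))
```

Two short patterns are enough to undo the most common conventions, which is the weakness the paragraph above describes.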

Any impediment reduces the number of people willing to take the extra trouble to e-mail the user. In contrast, well-maintained e-mail filtering on the user's end does not drive away potential correspondents. Then again, no spam filter is 100% immune to false positives, and the same potential correspondent who would have been deterred by address munging may instead end up wasting time on long letters that merely disappear into junk mail folders.

For commercial entities, maintaining contact forms on web pages rather than publicizing e-mail addresses may be one way to ensure that incoming messages are relatively spam-free yet do not get lost. In conjunction with CAPTCHA fields, spam on such forms can be reduced to effectively zero, except that the inaccessibility of CAPTCHAs brings exactly the same deterrent problems as address munging itself.

Alternatives

As an alternative to address munging, there are several "transparent" techniques that allow people to post a valid e-mail address but still make automated recognition and collection of the address difficult:
  • "Transparent name mangling" involves replacing characters in the address with equivalent HTML references from the list of XML and HTML character entity references.
  • Posting all or part of the e-mail address as an image.
  • Posting an e-mail address as an ASCII-art text logo and shrinking it to normal size using inline CSS.
  • Posting an e-mail address with the order of characters jumbled and restoring the order using CSS.
  • Building the link by client-side scripting.
  • Using server-side scripting to run a contact form.
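
Two of the techniques in the list above can be sketched in a few lines of Python that generate the markup. The function names are illustrative, and reversing the whole string is only one way of "jumbling" the characters; it relies on the CSS properties `unicode-bidi: bidi-override` and `direction: rtl` to display the text in its original order.

```python
import html


def entity_encode(address: str) -> str:
    """Replace every character with its decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def css_reversed(address: str) -> str:
    """Store the address backwards; the inline CSS displays it right-to-left."""
    return (
        '<span style="unicode-bidi: bidi-override; direction: rtl;">'
        + html.escape(address[::-1])
        + "</span>"
    )


encoded = entity_encode("user@example.com")
print(encoded)                 # markup containing no literal "@"
print(html.unescape(encoded))  # decoding restores user@example.com
print(css_reversed("user@example.com"))
```

In both cases the page source contains no literal address, while a browser renders one. Text copied from the CSS-reversed version may come out backwards, which is a usability cost of that approach.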


An example of munging "user@example.com" via client-side scripting would be:
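
As a rough sketch rather than a canonical example, the Python function below generates an HTML fragment whose inline script assembles the address in the browser, so the complete address never appears in the page source. The function name `script_fragment`, the variable names inside the script, and the use of `document.currentScript` are all assumptions made for illustration.

```python
import json


def script_fragment(address: str) -> str:
    """Return an HTML fragment whose inline script builds a mailto link."""
    user, domain = address.split("@", 1)
    lines = [
        '<script type="text/javascript">',
        # json.dumps yields a quoted string literal that is also valid JavaScript
        # for plain ASCII input (an assumption of this sketch).
        f"  var user = {json.dumps(user)};",
        f"  var domain = {json.dumps(domain)};",
        "  var addr = user + String.fromCharCode(64) + domain;  // 64 is '@'",
        "  var link = document.createElement('a');",
        "  link.href = 'mai' + 'lto:' + addr;",
        "  link.appendChild(document.createTextNode(addr));",
        "  var here = document.currentScript;",
        "  here.parentNode.insertBefore(link, here);",
        "</script>",
    ]
    return "\n".join(lines)


print(script_fragment("user@example.com"))
```

A harvester that only scans the raw HTML sees the two halves as separate strings, while a browser that runs the script shows a normal clickable link.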



The use of images and scripts for address obfuscation can cause problems for people using screen readers and for users with disabilities, and ignores users of text browsers such as Lynx and w3m. On the other hand, because these techniques are transparent, they do not disadvantage non-English speakers, who may be unable to understand the language-specific plain text in non-transparent munged addresses or the instructions that accompany them.

According to a 2003 study by the Center for Democracy and Technology, even the simplest "transparent name mangling" of e-mail addresses can be effective.

Examples

Common methods of disguising addresses include:

Disguised address                           Recovering the original address
no-one at example (dot) com                 Replace " at " with "@", and " (dot) " with ".".
no-one@elpmaxe.com.invalid                  Reverse the domain name (elpmaxe to example) and remove .invalid.
moc.elpmaxe@eno-on                          Reverse the entire address.
no-one@exampleREMOVEME.com                  Instructions are in the address itself; remove REMOVEME.
no-one@exampleNOSPAM.com.invalid            Remove NOSPAM and .invalid from the address.
n o - o n e @ e x a m p l e . c o m         This is still readable, but the spaces between letters can stop automatic spambots.
no-one<i>@</i>example<i>.</i>com (as HTML)  This is still readable and can be copied directly from webpages, but stops many e-mail harvesters.
по-опе@ехатрlе.сот                          Cannot be copied directly from webpages; it must be retyped manually. All letters except l are
                                            Cyrillic homoglyphs that look like their Latin equivalents to the human eye but are different
                                            characters to a computer. (See also IDN homograph attack for a more malicious use of this strategy.)
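
The recovery steps in the table are simple string operations. The sketch below, with hypothetical helper names, undoes two of the reversal-based disguises and flags letters that are not Latin, which is one way a program might notice homoglyph substitution.

```python
import unicodedata


def reverse_all(munged: str) -> str:
    """Undo a fully reversed address such as 'moc.elpmaxe@eno-on'."""
    return munged[::-1]


def reverse_domain_label(munged: str) -> str:
    """Undo 'no-one@elpmaxe.com.invalid': drop .invalid, reverse the first label."""
    local, domain = munged.split("@", 1)
    if domain.endswith(".invalid"):
        domain = domain[: -len(".invalid")]
    first, _, rest = domain.partition(".")
    return f"{local}@{first[::-1]}.{rest}"


def non_latin_letters(text: str) -> list[str]:
    """Return the letters whose Unicode names do not start with 'LATIN'."""
    return [
        ch for ch in text
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
    ]


print(reverse_all("moc.elpmaxe@eno-on"))
print(reverse_domain_label("no-one@elpmaxe.com.invalid"))
print(non_latin_letters("ex\u0430mple"))  # U+0430 is CYRILLIC SMALL LETTER A
```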


The reserved top-level domain .invalid is appended to ensure that a real e-mail address is not inadvertently generated. One problem is that some spammers will now remove obvious munges and send spam to the cleaned-up address. For this reason many people recommend using a totally invalid address (especially in the From line) and perhaps a disposable e-mail address in the Reply-To.
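
How little effort such cleaning takes can be sketched as follows. The list of filler tokens and the function name are assumptions for illustration and do not describe any particular spammer's software.

```python
import re

# Hypothetical filler tokens that a cleaner might strip.
FILLERS = re.compile(r"NOSPAM|REMOVEME|REMOVETHIS")


def clean_obvious_munge(address: str) -> str:
    """Strip well-known filler tokens and a trailing .invalid suffix."""
    cleaned = FILLERS.sub("", address)
    if cleaned.lower().endswith(".invalid"):
        cleaned = cleaned[: -len(".invalid")]
    return cleaned


print(clean_obvious_munge("no-one@exampleNOSPAM.com.invalid"))
print(clean_obvious_munge("no-one@exampleREMOVEME.com"))
print(clean_obvious_munge("nobody@nowhere.invalid"))
```

The first two addresses are recovered intact, while the totally invalid address in the last line yields nothing deliverable, which is the reasoning behind the recommendation above.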

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 