Email Address Harvest
How do spammers get my email address? Let me count the ways...
Certain types of web sites are more vulnerable to email spam than others. They include newsgroups, chat rooms, message boards, and online directories for web pages, instant message users, domain names, resumes, and dating services.
Read about how to stop email spam.
Here are several ways that spammers can harvest your email address.
From posts to UseNet with your email address
Spammers regularly scan UseNet for email addresses with ready-made programs designed to do so. Some programs look only at article headers that contain email addresses (From:, Reply-To:, etc.), while others check the articles' bodies. These range from programs that look at signatures to programs that take everything containing an '@' character and attempt to decipher email addresses from it.
Common ways of obscuring email addresses, such as adding 'nospam' to an address (like joe@nospamdomain.com), are easily figured out by spammers.
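To see why the 'nospam' trick offers so little protection, here is a minimal sketch in Python that you could run against your own posts. The function name demunge and the simplified address pattern are assumptions made for illustration; they are not taken from any real harvesting program, and real address syntax is broader than this pattern.

```python
import re

# Simplified address pattern for illustration only (an assumption);
# real email address syntax allows more than this.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def demunge(text):
    """Return the addresses found in text with a literal 'nospam' marker removed."""
    found = []
    for match in ADDRESS.finditer(text):
        address = match.group(0)
        # Remove the marker wherever it appears, ignoring case.
        found.append(re.sub(r"nospam", "", address, flags=re.IGNORECASE))
    return found

print(demunge("Reply to joe@nospamdomain.com for details."))
```

One pattern and one substitution are enough to undo the disguise, so a fixed, well-known marker should not be relied on. The sketch does not tidy up leftovers such as a stray dot when the marker was a separate label (joe@nospam.domain.com).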
People who were spammed frequently report that the amount of spam reaching their mailbox dropped sharply after a period in which they did not post to UseNet. Together with the evidence that spammers chase 'fresh' and 'live' addresses, this suggests that scanning UseNet is the primary source of email addresses for spammers.
From mailing lists
Spammers regularly attempt to get the lists of subscribers to mailing lists (some mail servers will give those upon request), knowing that only a few of the addresses are invalid.
When mail servers are configured to refuse such requests, another trick might be used - spammers might send an email to the mailing list with the headers Return-Receipt-To: <email address> or X-Confirm-Reading-To: <email address>. Those headers would cause some mail transfer agents and reading programs to send email back to the <email address> saying that the email was delivered to / read at a given email address, divulging it to spammers.
A different technique is to ask a mailing list server for the list of all mailing lists it carries (an option implemented by some mailing list servers for the convenience of legitimate users), and then send the spam to each mailing list's address, leaving the server to do the hard work of forwarding a copy to each subscribed email address.
From web pages
Spammers have programs that spider through web pages looking for email addresses, e.g. those contained in mailto: links (the ones you can click on to open a mail window).
A widely used countermeasure is the 'poison' CGI script. The script creates a page with several bogus email addresses and a link to itself. Spammers' software visiting the page harvests the bogus addresses and follows the link, entering an infinite loop that pollutes their lists with bogus email addresses.
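The idea behind a 'poison' page can be sketched in a few lines of Python. This is only an illustration of the technique described above, not any particular script: the function names, the page parameter, and the link text are all hypothetical. The bogus addresses use the reserved .invalid top-level domain so that no real mailbox can ever be hit.

```python
import random
import string

def bogus_address(rng):
    """Build one made-up address under the reserved .invalid top-level domain."""
    user = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    host = "".join(rng.choice(string.ascii_lowercase) for _ in range(10))
    return f"{user}@{host}.invalid"

def poison_page(self_url, count=5, seed=None):
    """Return HTML with bogus mailto: links plus a link back to the same script."""
    rng = random.Random(seed)
    links = []
    for _ in range(count):
        address = bogus_address(rng)
        links.append(f'<a href="mailto:{address}">{address}</a><br>')
    # A changing query string makes every visit look like a new page to a crawler.
    next_url = f"{self_url}?page={rng.randrange(10**9)}"
    body = "\n".join(links)
    return f'<html><body>\n{body}\n<a href="{next_url}">More contacts</a>\n</body></html>'

print(poison_page("/poison.cgi", count=3))
```

A harvester that checks whether a domain can exist would discard .invalid addresses, so this sketch trades effectiveness for safety. It is also wise to keep legitimate search engine crawlers away from such a page, for example with robots.txt.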
From various web and paper forms
Some sites request various details via forms, e.g. guest books and registration forms. Spammers can get email addresses from those either because the submitted details become available on the world wide web, or because the site sells or gives the email list to others.
Some companies would sell or give email lists filled in on paper forms, e.g. organizers of conventions would make a list of participants' email addresses, and sell it when it's no longer needed.
Some spammers would actually use email addresses from printed material, e.g. professional directories and conference proceedings.
Domain name registration forms are a favorite as well - the addresses are usually correct and up to date, and people read the emails sent to them expecting important messages.
Error pages
Many error pages have an email address on them. It is easy for spammers to request a nonexistent page and then scan the resulting error page for email addresses.
Via an Ident daemon
Many Unix computers run a daemon (a program which runs in the background, initiated by the system administrator), intended to allow other computers to identify people who connect to them.
When a person surfing from such a computer connects to a web site or news server, the site or server can connect back to the person's computer and ask that daemon for the person's email address.
Some chat clients on PCs behave similarly, so using IRC can cause an email address to be given out to spammers.
From a web browser
Some sites use various tricks to extract a surfer's email address from the web browser, sometimes without the surfer noticing it. These techniques include:
- Making the browser fetch one of the page's images through an anonymous FTP connection to the site.
  Some browsers would give the email address the user has configured into the browser as the password for the anonymous FTP account. A surfer not aware of this technique will not notice that the email address has leaked.
- Using JavaScript to make the browser send an email to a chosen email address with the email address configured into the browser.
  Some browsers would allow email to be sent when the mouse passes over some part of a page. Unless the browser is properly configured, no warning will be issued. Most browsers do not allow this any more.
- Using the From header (seen by CGI scripts as HTTP_FROM) that browsers send to the server.
  Some browsers pass a header with your email address to every web server you visit.
It's worth noting here that when one reads email with a browser (or any mail reader that understands HTML), the reader should be aware of active content (Java applets, JavaScript, VBScript, etc.) as well as web bugs.
An email containing HTML may contain a script that, upon the email being read (or even the subject being highlighted), automatically sends email to any email address. The Melissa virus, which mailed itself to entries in the victim's address book, is a good example of what such code can do. Such a script could send the spammer not only the reader's email address but all the addresses in the reader's address book.
From IRC
Some IRC clients will give a user's email address to anyone who cares to ask for it. Many spammers harvest email addresses from IRC, knowing that those are 'live' addresses, and send spam to them.
This method is used in addition to the annoying IRC bots that send messages interactively to IRC channels and chat rooms without attempting to recognize who is participating in the first place.
This is another major source of email addresses for spammers, especially as this is one of the first public activities newbies join, making it easy for spammers to harvest 'fresh' addresses of people who might have very little experience dealing with spam.
From chat rooms
Chat rooms are notorious for email spam. AOL chat rooms are the most popular target - according to reports there's a utility that can get the screen names of participants in AOL chat rooms. The utility is reported to be specialized for AOL for two main reasons: AOL makes the list of the actively participating users' screen names available, and AOL users are considered prime targets by spammers due to AOL's reputation as the ISP of choice for newbies.
Microsoft's instant messenger uses an email address rather than a screen name - not good.
From finger daemons
Some finger daemons are set to be very friendly - a finger query asking for jake@mydomain will produce list info including login names for all people named Jake on that host. A query for @mydomain will produce a list of all currently logged-on users.
Spammers use this information to get extensive user lists from hosts, and lists of active accounts - ones which are 'live' and will read their mail soon enough to be really attractive spam targets.
From AOL profiles
Spammers harvest AOL names from user profile lists, as this allows them to 'target' their mailing lists. Also, AOL has a reputation as the ISP of choice for newbies, who might not know how to recognize scams or how to handle spam.
From catchall accounts
Many hosts allow catchall accounts: an email address to which mail for invalid addresses at a domain is redirected. For example, if abc@mydomain.com is the catchall address, then mail sent to *any* address at mydomain.com will be delivered (zzbottom@mydomain.com will work even though it is not a valid email address).
Some web hosts are disallowing catchall accounts for this reason.
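The effect of a catchall account on guessed addresses can be illustrated with a minimal Python sketch. The function resolve_recipient and the mailbox set are hypothetical and only model the delivery decision described above; a real mail server is configured, not coded, this way.

```python
def resolve_recipient(address, mailboxes, catchall=None):
    """Return the mailbox that receives mail for address, or None if it is rejected."""
    address = address.lower()
    if address in mailboxes:
        return address
    # With a catchall configured, mail for unknown addresses is still delivered.
    return catchall

mailboxes = {"abc@mydomain.com", "sales@mydomain.com"}

# Without a catchall, a guessed address is rejected.
print(resolve_recipient("zzbottom@mydomain.com", mailboxes))
# With a catchall, the same guess lands in abc@mydomain.com.
print(resolve_recipient("zzbottom@mydomain.com", mailboxes, catchall="abc@mydomain.com"))
```

With the catchall in place, every address a spammer invents for the domain is accepted and reaches a real mailbox.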
From domain contact points
Every domain has one to three contact points - administration, technical, and billing. The contact point includes the email address of the contact person.
As the contact points are freely available, e.g. using the 'whois' command, spammers harvest the email addresses from the contact points for lists of domains (the list of domains is usually made available to the public by the domain registries). This is a tempting method for spammers, as those email addresses are usually valid and mail sent to them is read regularly.
By guessing & cleaning
Some spammers guess email addresses and send a test message (or a real spam) to a list which includes the guessed addresses (such as test, info, webmaster). Then they wait for either an error message to return by email, indicating that the email address is incorrect, or for a confirmation. A confirmation could be solicited by inserting non-standard but commonly used mail headers requesting that the delivery system and/or mail client send a confirmation of delivery or reading. No news is, of course, good news for the spammer.
Specifically, the headers are -
Return-Receipt-To: <email-address> which causes a delivery confirmation to be sent, and
X-Confirm-Reading-To: <email-address> which causes a reading confirmation to be sent.
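On the receiving side, a mail filter can refuse to honor these requests by dropping the two headers before a message reaches the mail client. The following is a minimal sketch using Python's standard email package; the function name strip_receipt_requests and the sample message are assumptions made for illustration.

```python
from email import message_from_string

# The two receipt-request headers named above.
RECEIPT_HEADERS = ("Return-Receipt-To", "X-Confirm-Reading-To")

def strip_receipt_requests(raw_message):
    """Parse a raw message, delete receipt-request headers, return (message, removed)."""
    message = message_from_string(raw_message)
    removed = []
    for name in RECEIPT_HEADERS:
        if message[name] is not None:  # header lookup is case-insensitive
            removed.append(name)
            del message[name]  # deletes every occurrence of the header
    return message, removed

raw = (
    "From: sender@example.com\n"
    "X-Confirm-Reading-To: sender@example.com\n"
    "Subject: test\n"
    "\n"
    "Hello\n"
)
cleaned, removed = strip_receipt_requests(raw)
print(removed)
```

Whether a given mail client acts on these headers at all depends on the client and its settings, so stripping them is a precaution rather than a guarantee.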
Another method of confirming valid email addresses is sending HTML in the email's body (that is, sending a web page as the email's content) and embedding an image in the HTML. Mail clients that render HTML, as Outlook and Eudora do in the preview pane, will attempt to fetch the image. Some spammers put the recipient's email address in the image's URL and check the web server's log for the email addresses of recipients who viewed the spam.
So it's good advice to set the mail client to *not* preview rich media emails, which protects the recipient both from accidentally confirming their email address to spammers and from viruses.
Guessing could be done based on the fact that email addresses are based on people's names, usually in commonly used ways (first.last@domain, or an initial of one name followed or preceded by the other name @domain).
Also, some email addresses are standard - postmaster is mandated by the RFCs for internet mail. Other common email addresses are hostmaster, webmaster, root (for Unix hosts), etc.
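To check whether a particular HTML message carries such a tagged image, one can list its remote images and look for the recipient's address in the URLs. This is a minimal sketch using Python's standard library; the class and function names are hypothetical. It only catches an address written in plain or percent-encoded form, so a spammer who uses an opaque ID instead will go unnoticed.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> that points at a remote http(s) URL."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if src.lower().startswith(("http://", "https://")):
            self.remote_images.append(src)

def tagged_images(html_body, recipient):
    """Return remote image URLs that embed the recipient's address."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    needle = recipient.lower()
    # unquote() turns %40 back into '@' before comparing.
    return [url for url in finder.remote_images if needle in unquote(url).lower()]

body = '<p>Hi</p><img src="http://spam.example/pixel.gif?id=joe%40domain.com">'
print(tagged_images(body, "joe@domain.com"))
```

Any remote image can confirm that a message was opened, tagged or not, which is why not loading remote content at all is the safer setting.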
From online white & yellow pages
There are various sites that serve as white pages, sometimes named people finders web sites. Yellow pages now have an email directory on the web.
Those white/yellow pages contain addresses from various sources, e.g. from UseNet, but sometimes your email address will be registered for you. Example - HotMail will add email addresses to BigFoot by default, making new addresses available to the public.
Spammers go through those directories in order to get email addresses. Most directories prohibit email address harvesting by spammers, but as they hold large databases of email addresses and names, they are a tempting target for spammers.
By having access to the same computer
If a spammer has access to a computer, he can usually get a list of valid usernames (and therefore email addresses) on that computer.
On Unix computers the users file (/etc/passwd) is commonly world readable, and the currently logged-in users are listed by the 'who' command.
From a previous owner of the email address
An email address might have been owned by someone else, who disposed of it. This might happen with dialup usernames at ISPs - somebody signs up with an ISP, has his/her email address harvested by spammers, and cancels the account. When somebody else signs up with the same ISP with the same username, spammers already know of it.
Similar things can happen with AOL screen names - somebody uses a screen name, gets tired of it, releases it. Later on somebody else might take the same screen name.
Using social engineering
This method means the spammer uses a hoax to trick people into giving him valid email addresses.
From the address book and emails on other people's computers
Some viruses and worms spread by emailing themselves to all the email addresses they can find in the email address book. As some people forward jokes and other material by email to their friends, putting their friends' email addresses in the To: or Cc: fields rather than the Bcc: field, some viruses and worms scan the mail folders for email addresses that are not in the address book, in the hope of hitting the addresses of the computer owner's friends' friends, friends' friends' friends, etc.
If it hasn't already been done, it's just a matter of time before such malware will not only spam copies of itself, but also send the extracted list of email addresses to its creator.
As invisible email addresses can't be harvested, it's good advice to put the email addresses of recipients of jokes and the like in the Bcc: field, and, when forwarding something received from somebody else, to remove from the email's body all the email addresses inserted by the previous sender.
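Removing addresses from a forwarded body by hand is tedious, so here is a minimal sketch of doing it with a regular expression in Python. The function name redact_addresses, the placeholder text, and the simplified address pattern are all assumptions made for illustration; the pattern will miss unusual but valid addresses.

```python
import re

# Simplified address pattern for illustration only (an assumption);
# real email address syntax allows more than this.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def redact_addresses(body, placeholder="[address removed]"):
    """Replace every email address found in body with a placeholder."""
    return ADDRESS.sub(placeholder, body)

forwarded = (
    "----- Forwarded message -----\n"
    "From: Bob <bob@example.org>\n"
    "To: ann@example.com, carl@example.net\n"
    "\n"
    "You have to read this one!\n"
)
print(redact_addresses(forwarded))
```

This cleans only the quoted text. The addresses of the new recipients still need to go in the Bcc: field when the message is sent.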
Buying lists from others
This one covers three types of trades. The first type consists of buying a list of email addresses (often on CD) that were harvested via other methods, e.g. someone harvests email addresses from UseNet and sells the list either to a company that wishes to advertise via email (sometimes passing off the list as that of people who opted in for emailed advertisements) or to others who resell the list.
The second type consists of a company that got the email addresses legitimately (e.g. a magazine that asks subscribers for their email in order to keep in touch over the Internet) and sells the list for the extra income. This extends to the selling of email addresses a company got via other means, e.g. from people who just emailed the company with inquiries in any context.
The third type consists of technical staff selling email addresses to spammers for money.
By hacking into sites
I've heard rumors that sites that supply free email addresses were hacked in order to get the list of email addresses, somewhat like e-commerce sites being hacked to get a list of credit cards.