Lead image: Izaokas Sapiro, 123RF.com

Warding off the rise of VoIP spam

Line Lint

Spam over Internet Telephony (SPIT) is regarded as the next generation of spam. In this article, we investigate SPIT and point out countermeasures for admins with VoIP servers. By Michael Hirschbichler, Christoph Egger

Although unsolicited automated calling has been around for years, it hasn't really caught on in the way that spam email has. In the first place, automated cold calling is actually illegal in many places. Another factor limiting the use of automated calling is the cost: A primary rate interface with 30 bidirectional channels is not cheap. If a call center operates 24/7 and an unsuccessful call takes 10 seconds to complete, each call will cost the marketer 0.06 cents in line rental and 20 cents per charge unit. Although this might not seem like a large expense, spam is relatively ineffective: The advertiser must place a huge number of calls to make a single sale, and that limits the appeal of conventional spam by phone.

The rise of IP-based Internet Telephony, however, has changed the terms of the spam equation. Using the G.723.1 codec, a VoIP call requires 16Kbps bandwidth (including signaling overhead) in each direction. For 30 bidirectional calls, you would need a maximum bandwidth of 480Kbps, which calls for a 512Kbps line.

Assuming a line rental of EUR 30 (US$ 35-40), the basic rate for the marketer would be 0.004 cents per call. In other words, 5,000 VoIP calls cost the same as one conventional POTS (Plain Old Telephone System) call.
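The figures above can be checked with a little shell arithmetic. The following is only a sketch that reuses the numbers quoted in the text; the variable and function names are invented for illustration, and prices are held in thousandths of a cent so that Bash's integer arithmetic is sufficient.

```bash
# Figures quoted in the article; all names here are illustrative.
CHANNELS=30            # simultaneous calls, as on a primary rate interface
KBPS_PER_CALL=16       # G.723.1 including signaling overhead, per direction

# Prices in thousandths of a cent so that integer arithmetic is enough.
POTS_MILLICENTS=20000  # 20 cents per charge unit
VOIP_MILLICENTS=4      # 0.004 cents per call

# Bandwidth needed for all channels at once, in Kbps.
required_kbps() { echo $(( CHANNELS * KBPS_PER_CALL )); }

# Number of VoIP calls that cost as much as one POTS charge unit.
calls_per_pots_call() { echo $(( POTS_MILLICENTS / VOIP_MILLICENTS )); }

echo "Bandwidth for $CHANNELS calls: $(required_kbps)Kbps"
echo "VoIP calls for the price of one POTS call: $(calls_per_pots_call)"
```

Run as written, the script reports 480Kbps and 5,000 calls, which matches the values given in the text.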

Equally alarming is that the costs remain constant no matter where the call center is located, which means a VoIP call center anywhere in the world could theoretically spam your phone system. VoIP spam is relatively new, but many experts believe it will be a significant problem in the near future. In this article, we take a close look at Spam over Internet Telephony (SPIT) and highlight some techniques that will someday be the first line of defense.

Spam on the Line

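RFC 5039 [1] defines no fewer than three different flavors of spam affecting the phone lines: call spam, better known as Spam over Internet Telephony (SPIT), in which bulk unsolicited INVITE requests attempt to set up voice or video sessions; instant messaging spam (SPIM), which delivers its payload in unsolicited MESSAGE requests; and presence spam (SPPP), which abuses SUBSCRIBE requests for presence information.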

Although these techniques are relatively new, the surge of interest in Internet telephony and instant messaging ensures that they will pose an ever-greater challenge to networks.

Session Initiation Protocol

Besides Skype, the Session Initiation Protocol (SIP [2], Listing 1) is the most widespread VoIP protocol today. Its pervasiveness in business environments, along with its status as an open standard, makes SIP a lucrative target for spamming. Like the SMTP mail protocol, SIP relies on plaintext signaling, where any SIP proxy involved in the exchange can add or remove headers.

Listing 1: A SIP Request

INVITE sip:bob@tuwien.ac.at SIP/2.0
Via: SIP/2.0/UDP myhost.myprovider.de;branch=z9hG4bK776asdhds
...
To: Bob <sip:bob@tuwien.ac.at>
...
From: Alice <sip:alice@somesipprovider.de>;tag=42
Call-ID: a84b4c76e66710de5f90ae275
Contact: Alice <sip:alice@myhost.myprovider.de>
...

The media data transferred in the course of a voice or video call are typically transported one of two ways: by the Real-time Transport Protocol (RTP) or by Secure RTP (SRTP). Text-based messages are sent as a payload with the SIP requests.

Listing 1 shows an example of an INVITE request with the relevant headers. The first line defines the request and the user to call; Via records the path taken by the SIP signaling and lists all the proxies traversed. To also contains the user to call; From shows the name and URI of the caller. Modern hard phones and softphones show this information on their displays for incoming calls. Call-ID uniquely identifies a call, and Contact contains the hostname of the SIP client that initiated the request.
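Because the signaling is plain text, these headers are easy to pick apart with ordinary shell tools. The following sketch pulls individual header values out of the request from Listing 1. The helper names sip_header and sip_uri are hypothetical, and the parsing is deliberately simplified: It matches header names case-sensitively, ignores compact header forms and folded lines, and assumes the URI is enclosed in angle brackets.

```bash
# Print the value of the first header named $1 in the SIP message on stdin.
sip_header() {
    local name="$1" line value
    while IFS= read -r line; do
        line="${line%$'\r'}"                # SIP lines end in CRLF on the wire
        [ -z "$line" ] && break             # a blank line ends the header block
        case "$line" in
            "$name":*)
                value="${line#*:}"                  # drop "Name:"
                value="${value#"${value%%[! ]*}"}"  # trim leading spaces
                printf '%s\n' "$value"
                return 0 ;;
        esac
    done
    return 1
}

# Extract the URI from a value such as: Alice <sip:alice@host>;tag=42
sip_uri() {
    local value="$1"
    value="${value#*<}"     # drop the display name and the opening bracket
    value="${value%%>*}"    # drop the closing bracket and any parameters
    printf '%s\n' "$value"
}

# The request from Listing 1, without the elided lines.
invite='INVITE sip:bob@tuwien.ac.at SIP/2.0
Via: SIP/2.0/UDP myhost.myprovider.de;branch=z9hG4bK776asdhds
To: Bob <sip:bob@tuwien.ac.at>
From: Alice <sip:alice@somesipprovider.de>;tag=42
Call-ID: a84b4c76e66710de5f90ae275
Contact: Alice <sip:alice@myhost.myprovider.de>'

from_value=$(printf '%s\n' "$invite" | sip_header From)
echo "Caller: $(sip_uri "$from_value")"
```

Note that the value extracted here is only what the sender claims; nothing in the request proves that the From URI is genuine.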

When a client such as a hard phone is activated, it registers with a proxy server, which stores the client's user ID and IP address in the location database.

When a call is established, the signaling is routed from the caller (Alice), optionally via a separate SIP proxy (Proxy Alice), to the call receiver's SIP proxy (Proxy Bob), and finally to the call receiver (Bob). If Alice uses her own proxy, she first has to authenticate against it.

In contrast, communication between the proxies occurs without any identity checks, just as in SMTP. The SIP standard doesn't envisage a centralized instance for mutual authentication.

Although RFC 4474 [3] defines how to sign SIP messages to allow Alice to authenticate against Bob, there are no widespread implementations of this standard. Similarly, no authentication takes place if a client bypasses its own proxy server and sets up a connection directly to Bob's proxy.

From the point of view of Bob's proxy, it is difficult to tell whether a proxy or a client is calling.

The lack of any authentication between Alice and Bob's proxy, and thus Bob's client, makes it easy to send spam using SIP. A spammer can successively work through a list of VoIP addresses (SIP URIs) and try to establish a connection to the client by spoofing a From URI (Figure 1).

Figure 1: In the case of calls with spoofed From URIs, the call receiver doesn't even need to accept the call to become a SPIT victim. The figure shows a VoIP phone display with the unambiguous text Buy_viagra_at_viagra.com as the caller's number.

Some tools, such as SIPp, are capable of automatically setting up parallel calls and playing an audio stream to the call receivers. The script in our example (Figure 2) plays the message and then terminates the call automatically after a timeout of 20 seconds.

Figure 2: The script automatically terminates a call.

It is just as difficult to identify and protect against VoIP spam as it is to identify and protect against email spam. A mix of various methods is probably the best approach. Some techniques are adapted from the strategies used against email spam, and other techniques target problems specifically associated with VoIP.

Content Filtering

The most widespread email spam defense method is to filter on the incoming content. Spam filters analyze the message payload and qualify mail as spam or ham based on the results. This method is not directly applicable to SPIT.

Because the content is not transmitted as a separate RTP stream until the INVITE request is successfully processed, the phone or VoIP software has already signaled the call, and Bob has had to check the phone and pick up the receiver before there is any content to filter. In contrast, content filtering is useful for SPIM and SPPP because the initial MESSAGE (for SPIM) or SUBSCRIBE (for SPPP) request will include the ad the spammer is trying to deliver.

Black, White, Gray

Blacklisting leaves the administrator of a SPIT defense system with the same problems email spam administrators have faced for years. The blacklisting option will not work reliably until VoIP has devised a reliable identity management method.

Whitelisting, on the other hand, offers some possibilities for VoIP environments. A whitelist is a list of trustworthy SIP URIs. When a call comes in, the caller's URI is checked against the list. To work around the restrictions imposed by the whitelist, a spammer would need to know the URIs in the whitelist database.

A method for adding SIP URIs to a whitelist is also important. For instance, one criterion could be retrospective evaluation of the call time: if a call exceeds a specific time, you can assume that it was productive rather than an annoying advertising call.
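As a minimal sketch of this idea, the following functions keep a whitelist in a text file, one URI per line, and add a caller once a call has lasted long enough. The file path, the 60-second threshold, and the function names are assumptions made for illustration; a production system would more likely use a database and would have to verify the caller's identity first.

```bash
WHITELIST_FILE="${WHITELIST_FILE:-/tmp/sip-whitelist.txt}"  # hypothetical path
MIN_CALL_SECONDS=60                                         # hypothetical threshold

# Succeeds if the URI in $1 is already on the whitelist.
is_whitelisted() {
    grep -qxF -- "$1" "$WHITELIST_FILE" 2>/dev/null
}

# Call after hang-up: $1 = caller URI, $2 = call duration in seconds.
# Long calls are assumed to have been productive, so the caller is added.
record_call() {
    local uri="$1" seconds="$2"
    if [ "$seconds" -ge "$MIN_CALL_SECONDS" ] && ! is_whitelisted "$uri"; then
        printf '%s\n' "$uri" >> "$WHITELIST_FILE"
    fi
}
```

A 15-second call from an unknown URI leaves the list unchanged, whereas a four-minute call adds the URI, so that is_whitelisted succeeds on the next incoming call.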

Graylisting is another option. In a graylisting scenario, the VoIP server initially drops a call from an unknown caller by replying with a busy signal. At the same time, the server keeps the URI in its list for a predefined period of time and puts the call through on the next attempt. It then adds the From URI to a whitelist. This technique is fairly easy to work around, and it is really annoying for an unsuspecting first-time caller.
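The graylisting decision can be sketched in a few lines of shell. This is an illustration under stated assumptions, not a description of any particular product: The file paths, the 300-second window, and the function name graylist_check are invented, URIs are assumed to contain no spaces, and the current time is passed in as a parameter so that the logic is easy to test.

```bash
GRAYLIST_FILE="${GRAYLIST_FILE:-/tmp/sip-graylist.txt}"     # hypothetical paths
WHITELIST_FILE="${WHITELIST_FILE:-/tmp/sip-whitelist.txt}"
GRAY_WINDOW=300   # seconds a rejected attempt is remembered (assumed value)

# $1 = From URI, $2 = current time in epoch seconds (defaults to now).
# Prints "accept" or "busy".
graylist_check() {
    local uri="$1" now="${2:-$(date +%s)}" last
    touch "$GRAYLIST_FILE" "$WHITELIST_FILE"

    # Known callers are put through immediately.
    if grep -qxF -- "$uri" "$WHITELIST_FILE"; then
        echo accept
        return 0
    fi

    # Timestamp of the most recent rejected attempt from this URI, if any.
    last=$(awk -v u="$uri" '$1 == u { t = $2 } END { print t }' "$GRAYLIST_FILE")

    if [ -n "$last" ] && [ $(( now - last )) -le "$GRAY_WINDOW" ]; then
        printf '%s\n' "$uri" >> "$WHITELIST_FILE"   # retry within the window
        echo accept
    else
        printf '%s %s\n' "$uri" "$now" >> "$GRAYLIST_FILE"
        echo busy
    fi
}
```

A first call from an unknown URI yields busy, a retry inside the window yields accept and whitelists the caller, and a retry after the window has expired is treated as a new first attempt.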


Consent-based communication is already the norm for numerous instant messaging applications such as ICQ. Before Alice can contact Bob, he has to accept Alice's request.

SIP Presence uses a similar technique: To view the online or offline status of a person in the contact list, the person first has to authorize the viewer. Consent-based communication has a number of drawbacks: First and foremost, initial contacts are quite complicated. In some areas, such as call centers or emergency call centers, this technology is completely ruled out, and it is questionable in sales and marketing departments, too.

Second, the system is easy to work around if the spammer delivers the message in the approval request, rather than in the call itself (Figure 1).


Some admins use a reputation system in combination with a whitelist. In a reputation system, the call receiver simply relies on the caller's reputation. This reputation is derived from the number of entries in whitelists shared by other users of the same system and by reference to a central list. This social network of trust only works if the evaluation is reliable and trustworthy. However, if a node mutates into a zombie and starts sending spam, it loses its reputation, which, in turn, affects all callers that use it as a reference.
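One simple way to derive such a reputation is to count how many shared whitelists contain the caller. The sketch below assumes a hypothetical directory with one whitelist file per user (one URI per line) and an arbitrary threshold of two entries; the central list and any weighting of the individual users are left out.

```bash
SHARED_DIR="${SHARED_DIR:-/tmp/shared-whitelists}"  # hypothetical: one file per user
MIN_REPUTATION=2                                    # hypothetical threshold

# Print the number of shared whitelists that contain the URI in $1.
reputation() {
    local uri="$1" file count=0
    for file in "$SHARED_DIR"/*.txt; do
        [ -f "$file" ] || continue          # skip if the glob matched nothing
        if grep -qxF -- "$uri" "$file"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Succeeds if enough users vouch for the caller.
is_reputable() {
    [ "$(reputation "$1")" -ge "$MIN_REPUTATION" ]
}
```

The weakness described above is visible in the sketch: The score is only as trustworthy as the files it counts, so a single compromised user can vouch for any number of spammers.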

Cash Up Front and Turing

Introducing cost is the most effective method of making spam unattractive for its distributors. If an attacker has to intervene because human interaction is required, or if additional computer time is required, the economic benefits of spam are soon lost.

The Turing test, an approach developed in 1950 by mathematician Alan Turing [4], serves to identify a dialog partner as a human or a machine. To do this, the dialog partner is set a task that only a human can solve.

Web users are familiar with the Turing test in the form of Captcha images (Completely Automated Public Turing test to tell Computers and Humans Apart) on guest book pages. The VoIP equivalent is an automated announcement to which the caller has to respond. If the caller passes the Turing test, the server adds them to the whitelist; if they fail, they are put through to the switchboard just in case.

One obvious problem of the Turing test is its lack of "Granny friendliness": What happens to a caller who can't comply with a request because of language or other comprehension problems? How will a potential customer react to a test designed to find out if they are a human or a machine? It only seems to be a question of time until spammers find a way of working around these tests. But it takes CPU power to perform voice detection, and CPU power costs money. Until spammers find an effective weapon against Turing tests, their only option is to use expensive call center agents to solve them.

Puzzles and Venture Capital

Computational puzzles are another cost-related test. The caller's computer is set a task that consumes some CPU capacity but is small enough that a single call can still be set up without noticeable delay. However, setting up a large number of calls, or sending a large number of messages, consumes a large amount of CPU capacity.

Sizing the puzzle is difficult: Simple hard phones have to be capable of passing the test, and these devices use 50MHz CPUs. In contrast, spammers have up to 5GHz of CPU power at their fingertips, making it one hundred times easier for the spammer to solve the puzzle.
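The article does not say what kind of puzzle is used. One common construction, borrowed here purely as an illustration, is a hashcash-style search: The callee issues a challenge string, and the caller has to find a nonce so that the hash of challenge and nonce starts with a given number of zero hex digits. Solving takes many hash operations; checking takes one. The function names are hypothetical, and the sketch assumes that sha256sum from GNU coreutils is installed.

```bash
# Hex SHA-256 digest of the string in $1 (assumes GNU coreutils sha256sum).
puzzle_hash() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f1
}

# Caller side: $1 = challenge, $2 = required number of leading zero hex digits.
# Prints the first nonce that satisfies the puzzle.
solve_puzzle() {
    local challenge="$1" digits="$2" nonce=0 prefix
    prefix=$(printf '%*s' "$digits" '' | tr ' ' 0)
    while :; do
        case "$(puzzle_hash "$challenge:$nonce")" in
            "$prefix"*) echo "$nonce"; return 0 ;;
        esac
        nonce=$((nonce + 1))
    done
}

# Callee side: $1 = challenge, $2 = digits, $3 = nonce offered by the caller.
# A single hash is enough to check the answer.
verify_puzzle() {
    local prefix
    prefix=$(printf '%*s' "$2" '' | tr ' ' 0)
    case "$(puzzle_hash "$1:$3")" in
        "$prefix"*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

Each additional zero digit multiplies the expected work by 16. This makes the sizing problem concrete: a difficulty that a 50MHz phone can solve in acceptable time costs a fast server only a fraction of that time.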

Payment at Risk is another really unattractive option for spammers. The caller (Alice) deposits a small amount in Bob's account. If Bob tags the call as spam, the deposit is forfeit; if not, Alice gets her money back. This approach can be used whenever Alice calls Bob, or just for the initial call, although this is less practical. After a successful call, Alice is automatically placed on Bob's whitelist, and she starts to build up a positive reputation that will allow her to call free of charge at a later time. The major preconditions for this are a reliable centralized payment infrastructure and a reliable authentication system.
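The bookkeeping behind Payment at Risk is simple, as the following sketch shows. The amounts, the variable names, and the use of an escrow balance between the two parties are assumptions for illustration only; the hard parts, namely the payment infrastructure and the authentication of both parties, are exactly what the sketch leaves out.

```bash
DEPOSIT=5           # deposit per call in cents (hypothetical amount)
alice_balance=100   # illustrative starting balances in cents
bob_balance=0
escrow=0

# Alice puts the deposit into escrow before the call is connected.
place_call() {
    alice_balance=$((alice_balance - DEPOSIT))
    escrow=$((escrow + DEPOSIT))
}

# $1 = "spam" or "ok", as tagged by Bob after the call.
settle_call() {
    if [ "$1" = "spam" ]; then
        bob_balance=$((bob_balance + escrow))      # the deposit is forfeit
    else
        alice_balance=$((alice_balance + escrow))  # the deposit is refunded
    fi
    escrow=0
}
```

A legitimate call leaves both balances unchanged, whereas every call tagged as spam costs the caller the full deposit, which quickly adds up for bulk senders.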

Related Techniques

Many techniques deployed in the fight against email spam are useful in combating VoIP spam (Figure 3). Content analysis for SPIM and SPPP (as mentioned earlier), illegible SIP URIs on websites and in directories, markers at obvious locations in the SIP address, or URIs hidden in images: All are useful to combat SPIT harvesters crawling the web. The use of SIP URIs that are only available for a short time, or only to a specific group of callers (colleagues, club members, friends), is also useful for VoIP. But if spam calls do start to trickle in through this kind of URI, the administrator has to replace it and inform the group of the new address.

Figure 3: Your SPIT strategy might include several components.

Recommended Strategies

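Although there is no secure and reliable method of fighting off email and VoIP spam, you can define framework conditions that at least mitigate the threat. RFC 5039 [1] recommends four measures: establish strong identity for callers, use whitelists, solve the introduction problem (that is, how a legitimate first-time caller gets onto a whitelist), and don't wait until it's too late to deploy defenses.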

Out-of-the-box solutions against SPIT are relatively rare. The reason, for one thing, is that SPIT is a fairly new topic, but also that VoIP spam is not particularly prevalent. Many users and administrators are still blissfully unaware of the tangible threat scenario.

NEC Corporation is actively researching the topic of telephony spam and presented a solution at the 3GSM World Congress 2008 in Barcelona – VoIP SEAL (Voice over IP Secure Application Layer Firewall) [5]. This multiple-level carrier-grade solution is designed to provide protection against distributed denial of service attacks, to identify hacks, and to provide protection against VoIP spam by various means.

Other initiatives include SpITAssassin [6], an extension of SpamAssassin functionality, and the SPIT defense solution SPIT AL. SPIT AL is a Java-based web application created by Kiel, Germany-based Internet provider The Net Generation AG (TNG) with support from the independent state center for data protection in Schleswig-Holstein (ULD). Because no specific attacks are currently known, these activities are a low priority for the developers working on them.

SPIT AL was in beta, with plans to release it in the first six months of 2008 under an open source license. However, no recent information is available on the project.


The Asterisk Internet telephony tool provides a feature that lets administrators add a simple audio Captcha with a whitelist. This feature uses the following approach: When a call arrives (SIP INVITE), Asterisk first checks the whitelist to see if the user has entered the Captcha previously. If not, it generates a random number and reads it to the caller ("To dial this number, press …").

The server then prompts the user to enter the numbers via the phone's keypad. If the numeric input matches the random number generated previously, the server puts the caller through to the contact and adds the caller to the whitelist. If the same user calls again, the user is put through immediately because of the whitelist entry (Figure 4).

Figure 4: Asterisk implementation workflow with audio Captchas.

Asterisk accepts the incoming call (SIP INVITE) and evaluates the SIP From header to ascertain the (purported) identity of the caller. It passes this identity to the checkwhitelist.sh script (Listing 2), which checks the /tmp/whitelist.log file, or a database, to see if the user has solved a Captcha previously.

Listing 2: checkwhitelist.sh

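#!/bin/bash
# AGI script: sets wl to 1 if the caller ($1) is on the whitelist, otherwise 0
WHITELIST=/tmp/whitelist.log
touch "$WHITELIST"    # make sure grep has a file to read on first use
hash=$(echo "$1" | /usr/bin/md5sum | cut -d ' ' -f1)
inlist=$(/bin/grep -m 1 -c "$hash" "$WHITELIST")
echo "SET VARIABLE wl $inlist"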

The script returns 0 in the wl variable if the user isn't on the whitelist, or 1 if the user is on the whitelist. The GotoIf function then switches the Asterisk configuration execution to the correct position to reflect the found or notfound label (Listing 5, line 4). If the user is not on the whitelist, Asterisk executes a second shell script, createcaptcha.sh (Listing 3), which generates a random number and returns the number as the captcha variable.

Listing 3: createcaptcha.sh

#!/bin/bash
# AGI script: returns a random three-digit number as the captcha variable
MAX=999
FLOOR=100
number=$(( RANDOM % (MAX - FLOOR) + FLOOR ))
echo "SET VARIABLE captcha $number"

Listing 4: addwhitelist.sh

#!/bin/bash
# AGI script: appends the hash and the identity of the caller ($1) to the whitelist
WHITELIST=/tmp/whitelist.log
hash=$(echo "$1" | /usr/bin/md5sum | cut -d ' ' -f1)
echo "$hash $1" >> "$WHITELIST"

Listing 5: extensions.conf

01 [captcha]
02 exten => _.,1,Answer()
03 exten => _.,n,agi,checkwhitelist.sh|${CALLERID(all)}
04 exten => _.,n,GotoIf($["${wl}" = "1"]?found:notfound)
05 exten => _.,n(notfound),agi,createcaptcha.sh
06 exten => _.,n,Playback(to-call-num-press)
07 exten => _.,n,SayDigits(${captcha})
08 exten => _.,n,Read(usercaptcha||3|1|)
09 exten => _.,n,GotoIf($[${usercaptcha} == ${captcha}]?correct:wrong)
10 exten => _.,n(wrong),Playback(privacy-incorrect)
11 exten => _.,n,HangUp()
12 exten => _.,n(correct),agi,addwhitelist.sh|${CALLERID(all)}
13 exten => _.,n,Playback(auth-thankyou)
14 exten => _.,n(found),Dial(sip/${EXTEN})
15 exten => _.,n,HangUp()

The dialplan in extensions.conf (Listing 5) takes the number returned by createcaptcha.sh, and the Asterisk SayDigits function reads it out aloud (line 7). Then the Read function accepts the user input. The GotoIf function next compares the original random number with the number input by the user and switches the dialplan flow to the appropriate position (with the correct or wrong label).

If the user enters the wrong number, Asterisk hangs up; if the user enters the correct number, addwhitelist.sh (Listing 4) adds the user to the whitelist. Dial in line 14 then puts the call through to the desired contact.


Because this scenario ignores authentication, any caller could spoof another person's identity and work around the Captcha. However, it is difficult to guess which entries the whitelist contains. Language is another problem. A system that implements our Captcha example and expects US callers, for example, won't give a caller from, say, Ukraine a chance to pass the test. This is a major challenge for global corporations. You can't simply expect every single caller to speak one particular language. The SIP Accept-Language header can be used to handle this, but according to RFC 3261 [2], it is only intended for reason phrases, session descriptions, or status responses.
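If an administrator decides to use the header for this purpose anyway, choosing a prompt language might look like the following sketch. The list of recorded prompt languages, the default, and the function name are assumptions. The sketch also simplifies the header semantics: It takes the languages in the order listed and ignores q-values and case differences.

```bash
SUPPORTED_LANGS="en de uk"   # hypothetical set of recorded prompt languages
DEFAULT_LANG="en"

# $1 = value of the Accept-Language header, e.g. "uk, de;q=0.8, en;q=0.5".
# Prints the first listed language for which prompts exist, or the default.
pick_prompt_lang() {
    local header="$1" entry lang
    local -a entries
    IFS=',' read -ra entries <<< "$header"
    for entry in "${entries[@]}"; do
        lang="${entry%%;*}"             # drop parameters such as ";q=0.8"
        lang="${lang//[[:space:]]/}"    # drop surrounding whitespace
        lang="${lang%%-*}"              # reduce "en-US" to "en"
        [ -n "$lang" ] || continue
        case " $SUPPORTED_LANGS " in
            *" $lang "*) echo "$lang"; return 0 ;;
        esac
    done
    echo "$DEFAULT_LANG"
}
```

The result could then be used to select the directory from which the audio prompts are played. Callers whose client sends no header, or only unsupported languages, still get the default prompts.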

Asterisk also has another option. For PSTN calls, it launches the "Privacy Manager" that prompts callers to enter their numbers if they have suppressed the number for outgoing calls. Of course, you can't rely on the caller to enter their "real" number, but this process poses complications for spitters.

SPIT: Soon To Be a Problem?

As telephone spam increases, admins need more defensive options. SPIT is not widespread today, but it could be some time in the future. The boom in the numbers of VoIP phones, free VoIP services, and broadband connections opens up a field of activity that spammers can't afford to ignore. VoIP is currently where email was 15 years ago, but VoIP admins have the advantage of many years of experience gained in the email sector. Now is the time to face this oncoming challenge by standardizing, developing, and deploying effective SPIT prevention methods.