IETF standards & drafts

RFC 3261, The Session Initiation Protocol, was published in 2002, six years after the initial work on SIP. Wikipedia writes:

“SIP was originally designed by Henning Schulzrinne and Mark Handley in 1996. In November 2000, SIP was accepted as a 3GPP signaling protocol and permanent element of the IP Multimedia Subsystem (IMS) architecture for IP-based streaming multimedia services in cellular systems. The latest version of the specification is RFC 3261 from the IETF Network Working Group published in June 2002.”

The SIP 2012 Realtime Communication Framework - a specification that we need!

The problem here is that SIP 2.0 has changed a lot. It’s been formally updated by a number of RFCs: 3265, 3853, 4320, 4916, 5393, 5621, 5626, 5630, 5922, 5954, 6026 and 6141 (according to …). And those are only the updates that change the core protocol. In addition, there are a lot of extensions and additions to what could now be called The SIP 2012 Realtime Communication Framework. The core protocol is now just one piece of the puzzle.

Customers need to change requirements

I’ve seen too many public tenders and requirements that refer to “SIP according to RFC 3261”. This is like buying a PC and referring to Windows 95 as the base requirement for compatibility. So what should be referred to? Here’s the core of the problem: there’s no simple document or reference profile to refer to. And there is a huge gap between the IETF specifications and current implementations, something that we see every year at SIPit events. Only customer requirements can push vendors to move forward.

An example: The SIPconnect specification from the SIP Forum

The SIP Forum SIPconnect 1.1 specification is a good example of a reference profile that customers can use. It focuses on the SIP trunk – the connection between a PSTN gateway provider (ITSP) and an enterprise PBX. SIPconnect makes it possible for customers to refer not only to RFC 3261, but to a more complete specification that builds a framework on top of the RFCs. During the work on SIPconnect, it was revealed that many PBXs used SIP in a way that was actually not supported by the current set of RFCs. The SIP Forum took this issue back to the IETF, and the result is GIN – a way to register many E.164 phone numbers at once (RFC 6140). Building a reference framework with a customer focus actually added to the standard framework, based on how current implementations use SIP.
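To make GIN a bit more concrete, here is a minimal Python sketch of the REGISTER a PBX might send. It assumes my reading of RFC 6140: a single Contact without a user part carrying the `bnc` URI parameter, plus the `gin` option tag in Require and Proxy-Require. The helper name `build_gin_register` and all host names are hypothetical; check the RFC before building on this.

```python
def build_gin_register(registrar: str, trunk_aor: str, pbx_host: str,
                       call_id: str, from_tag: str, branch: str,
                       cseq: int = 1, expires: int = 3600) -> str:
    """Build a GIN-style REGISTER for a SIP trunk (hypothetical helper).

    Assumption (RFC 6140 as I read it): one Contact with the "bnc" URI
    parameter and no user part, plus the "gin" option tag, so the ITSP
    can route all of the trunk's E.164 numbers to that Contact.
    """
    lines = [
        f"REGISTER sip:{registrar} SIP/2.0",
        f"Via: SIP/2.0/TLS {pbx_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{trunk_aor}>;tag={from_tag}",
        f"To: <{trunk_aor}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{pbx_host};bnc>",  # no user part: covers all numbers
        "Require: gin",
        "Proxy-Require: gin",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # The header section ends with an empty line; this REGISTER has no body.
    return "\r\n".join(lines) + "\r\n\r\n"
```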

There are more frameworks built on SIP – the IP Multimedia Subsystem (IMS) from 3GPP is another example, but it is not aimed at business users or at vendors selling SIP solutions to consumers. 3GPP is active in the IETF, trying to make sure that changes and additions are documented in RFCs and made compatible with the rest of the SIP framework.

Wanted: A reference profile for SIP phones and servers

If you look at the original set of SIP RFCs, there was a lack of solutions for NAT handling. In the current SIP framework, the IETF has added new standards like ICE/TURN/STUN and SIP Outbound. There are also standard frameworks for configuration update notifications, registration management and TLS certificate management – areas that most vendors implement in a proprietary way. We see more implementations of these additions at every SIPit test event, but most phones on the market (and most servers) still don’t support these new standards. We need a reference profile (maybe from the SIP Forum?) that customers can refer to in order to put pressure on vendors to update their implementations. SIP implementations based on more current work will lead to better products, more NAT- and IPv6-friendly solutions and improved security. Standardization in configuration and security management will lead to lower costs and less vendor lock-in.

The question is: who will start the work on specifying the new reference profile?


In a SIP network, you often have multiple servers communicating with each other. As soon as you add TCP and TLS to the mix, you will want to reuse connections. Why? Setting up a TLS connection involves a lot of messages going back and forth in the process of validating certificates and agreeing on keying material for the encrypted session. If you have a re-INVITE that wants to put a call on hold, you don’t want to lose several round-trip times while this happens. A better solution is to keep connections open where possible and allow communication in both directions.
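Some rough arithmetic shows why those extra round trips hurt. This Python sketch uses a simplified model (an assumption, not a measurement): one round trip for the TCP handshake, two more for a full TLS 1.2 handshake (one for TLS 1.3), and it ignores DNS, certificate revocation checks and session resumption. The function names are hypothetical.

```python
def setup_round_trips(transport: str, tls_version: str = "1.2") -> int:
    """Round trips before the first SIP byte can go out on a NEW connection.

    Simplified model: TCP handshake = 1 RTT, full TLS 1.2 handshake = 2 RTT,
    full TLS 1.3 handshake = 1 RTT.
    """
    transport = transport.lower()
    if transport not in ("udp", "tcp", "tls"):
        raise ValueError(f"transport not covered by this model: {transport}")
    if transport == "udp":
        return 0  # no connection to set up
    rtts = 1  # TCP three-way handshake (TLS runs on top of TCP)
    if transport == "tls":
        rtts += 1 if tls_version == "1.3" else 2
    return rtts


def added_delay_ms(transport: str, rtt_ms: float, reuse: bool,
                   tls_version: str = "1.2") -> float:
    """Extra signalling delay for one request, e.g. a hold re-INVITE."""
    if reuse:
        return 0  # an already open connection costs no extra round trips
    return setup_round_trips(transport, tls_version) * rtt_ms
```

With an 80 ms round-trip time, `added_delay_ms("tls", 80, reuse=False)` gives 240 ms of extra delay before the hold re-INVITE can even be sent, and zero when an existing connection is reused.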

RFC 3261 states that if you open a connection with a connection-oriented protocol, like TCP or SCTP, the connection should stay open to cover the whole transaction. This means that if the other end sends a new request in the dialog, it needs to open a connection in the other direction. This is of course a problem with NAT between a device and a server, something that the SIP Outbound standard (RFC 5626) handles. Between servers, like B2BUAs and proxies, the problem still exists. This is handled by the Connection Reuse RFC, RFC 5923.

Mutual TLS authentication opens up for two-way communication

RFC 5923 – SIP Connection Reuse – explains how this can work. One requirement is that the TLS connection uses mutual authentication, which means that the server asks the client for a certificate. The client indicates in the request (with the “alias” parameter in the Via header) that it is prepared to receive inbound requests on the same connection, not only the responses to its own requests. When that happens, the server and client set up a connection table where the content of the certificates – the domains and host names – is stored. Now, if one of them has a request targeted to the same domain and the same IP and port (after DNS SRV lookups), the connection can be reused.
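A server-side connection table along these lines might look like the following Python sketch. The class and field names are hypothetical; the reuse rule it encodes is the one described above: same resolved IP and port, the peer has offered the connection for reuse, and the target domain is among the names taken from the peer’s certificate. It keeps a list per address because one server may host several domains behind different certificates.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TlsConnection:
    """One open TLS connection, as a SIP server might track it."""
    peer_ip: str
    peer_port: int
    cert_names: frozenset  # domains/host names from the peer's certificate
    alias: bool            # peer offered reuse, e.g. with ";alias" in Via


class ConnectionTable:
    """Hypothetical reuse table keyed by the resolved peer address."""

    def __init__(self) -> None:
        self._by_addr: Dict[Tuple[str, int], List[TlsConnection]] = {}

    def add(self, conn: TlsConnection) -> None:
        key = (conn.peer_ip, conn.peer_port)
        self._by_addr.setdefault(key, []).append(conn)

    def find(self, target_domain: str, ip: str,
             port: int) -> Optional[TlsConnection]:
        """Return a connection usable for a new request, or None.

        ip/port are the result of resolving target_domain (DNS SRV, then
        A/AAAA). None means: open a new connection to that server.
        """
        wanted = target_domain.lower()
        for conn in self._by_addr.get((ip, port), []):
            names = {name.lower() for name in conn.cert_names}
            if conn.alias and wanted in names:
                return conn
        return None
```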

Checking and caching the certificate content

When the connection is initiated, both ends provide TLS certificates that contain one or more host names or SIP URIs. This list is cached and associated with the connection. Now, if the same server hosts multiple domains, you can’t use the connection if the domain that is the target of the request doesn’t match the names in the certificate. In that case, you need to open a new connection to the same server.

Use DNS host names instead of IP addresses

On a related topic, notice that the RFC always uses host names and not IP addresses in the Via: headers. This is of course a requirement if you want to match certificates. For TLS to work in all directions, host names should be used in Via: and Record-Route/Route headers. With a GRUU, you can also have a domain in the Contact. This also helps IPv4/IPv6 dual stack handling, letting each hop select the optimal connection.
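A small helper can enforce the host-name rule when building Via headers. This Python sketch is hypothetical (the name `build_via` is mine): it rejects IPv4 and IPv6 literals so that the name in Via can be matched against certificate names, and it can append the `alias` parameter that RFC 5923 uses to offer a connection for reuse.

```python
import ipaddress


def build_via(transport: str, host: str, port: int, branch: str,
              alias: bool = False) -> str:
    """Build a Via header that names the host, not its IP (hypothetical helper)."""
    try:
        ipaddress.ip_address(host.strip("[]"))  # IPv6 literals come in brackets
    except ValueError:
        pass  # not an IP literal, so treat it as a DNS host name
    else:
        raise ValueError(f"use a DNS host name in Via, not the IP literal {host!r}")
    via = f"Via: SIP/2.0/{transport.upper()} {host}:{port};branch={branch}"
    if alias:
        via += ";alias"  # RFC 5923: offer this connection for reuse
    return via
```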

Combined with SIP outbound we have open connections all the way

Connection reuse is an important feature for all SIP servers, whether B2BUAs like Asterisk or SIP proxies like Kamailio. Without it, TLS will be hard to use and will cause delays that affect calls. In combination with SIP Outbound, where the UA manages the connections to the first-hop servers, it is a working solution for TLS over NAT as well. Keeping TCP/TLS connections open like this is not new; Jabber/XMPP has done it from the start. It’s just new to SIP.
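SIP Outbound (RFC 5626) keeps those first-hop connections alive with a very simple CRLF keep-alive on stream transports: the client sends a double CRLF “ping” and the server answers with a single CRLF “pong”. The Python sketch below shows both sides; the function names are hypothetical, ping fragments split across reads are not handled, and the 10-second pong timeout is my reading of the RFC, so treat it as an assumption.

```python
from typing import Tuple

PING = b"\r\n\r\n"   # client -> server on an idle flow
PONG = b"\r\n"       # server -> client answer
PONG_TIMEOUT = 10.0  # seconds (assumption, see lead-in)


def answer_pings(data: bytes) -> Tuple[bytes, bytes]:
    """Server side: consume leading pings, return (bytes_to_send, rest)."""
    reply = b""
    while data.startswith(PING):
        reply += PONG
        data = data[len(PING):]
    return reply, data


def flow_failed(ping_sent_at: float, pong_received: bool, now: float) -> bool:
    """Client side: a flow counts as dead if no pong arrived in time."""
    return not pong_received and now - ping_sent_at > PONG_TIMEOUT
```

When `flow_failed()` returns True, the UA would set up a new flow and register again, which is what lets the proxy keep reaching a phone behind NAT.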

I think SIP Connection Reuse support should be on the list of requirements when you select your next SIP application server for your Open Unified Communication platform.

Lately, I’ve been going through a lot of SIP RFCs and drafts, trying to get an overview of the security suggested in all of these documents. The quality of this work, seen from a developer’s perspective, is quite poor. Sometimes it seems like authors think, “oh, we need to add that security stuff, so let’s add a few keywords like TLS and S/MIME here and there”. We need to get better at reviewing drafts from a security perspective. Here are some thoughts on instructions to RFC authors:

  • S/MIME: If you refer to S/MIME, make it very clear which certificates are going to be used and how the certificate verification process should happen – which part of the SIP message should match which part of the certificate? And which certificate should be used to encrypt?
  • TLS: If you refer to TLS, you need to be very clear on why – is this to provide authentication, confidentiality or something else? Does the solution require mutual authentication or just server authentication? If authentication is part of your solution, make it very clear how you verify the certificate against the message, down to SIP header fields and X.509v3/PKIX fields.
  • SIPS: If you suggest using SIPS, make it very clear what this adds and what the message flow is supposed to look like. Is SIPS used in the Request-URI, the Contact or somewhere else? What is the effect? Make sure you really understand SIPS before adding it. Or even better, just avoid SIPS and let it fade away.
  • Certificate matching: If you refer to a certificate SubjAltName, make it very clear whether a URI or a dNSName field is required, and which takes preference if there are multiple SubjAltNames in addition to the certificate subject.
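As an illustration of how precise such a rule needs to be, here is one possible precedence written as a Python sketch. It is loosely modelled on my reading of RFC 5922 (sip: URIs without a user part first, then dNSName entries, and the subject CN only when there is no subjectAltName at all). The function name and the (kind, value) input format are assumptions, and ports and wildcards are not handled.

```python
from typing import List, Optional, Set, Tuple


def sip_domain_identities(san: List[Tuple[str, str]],
                          subject_cn: Optional[str] = None) -> Set[str]:
    """SIP domain names a certificate vouches for (one possible precedence).

    san: subjectAltName entries as (kind, value) pairs, e.g.
         ("URI", "sip:example.com") or ("DNS", "example.com").
    """
    if not san:
        # Only with no subjectAltName at all do we fall back to the CN.
        return {subject_cn.lower()} if subject_cn else set()
    sip_hosts = set()
    for kind, value in san:
        if kind == "URI" and value.lower().startswith("sip:"):
            host = value[4:].split(";", 1)[0]  # drop URI parameters
            if "@" not in host:  # a user part names a user, not a domain
                sip_hosts.add(host.lower())
    if sip_hosts:
        return sip_hosts
    return {value.lower() for kind, value in san if kind == "DNS"}
```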
The worst documents so far are the RFCs related to SIP subscriptions. They suggest using S/MIME for encryption, but do not explain how. Now, if I subscribe to the presence status of …, my SUBSCRIBE request will end up at the presence server for the domain …. Should the user agent somehow find the certificate for … to encrypt the message? Should we use the certificate of … – which would require the presence server to have the private key belonging to Bob? The RFCs don’t help at all.
RFC 3857 states the following on the topic of eavesdropping on SUBSCRIBE/NOTIFY requests:

“To prevent that, watchers MAY use the sips URI scheme when subscribing to a watcherinfo resource.  Notifiers for watcherinfo MUST support TLS and sips as if they were a proxy (see Section 26.3.1 of RFC 3261).”

This means that a UA should be able to SUBSCRIBE over a TLS connection and get NOTIFY requests over – what? Remember that this was written before SIP Outbound was standardized. For a developer, this means that the subscriber is required to have a TLS certificate and accept incoming connections on the TLS port if the Contact in the SUBSCRIBE is a SIPS URI. The RFC should discuss this in more detail.

Nine years after RFC 3261, we have a larger toolbox, including GRUUs, SIP Outbound, SIP domain certificates, DNSSEC and much more. It’s time we restart the work on a SIP security architecture and provide something that developers can implement and that users will clearly feel is a better and more trustworthy solution. The IETF mantra is “rough consensus and running code”. RFCs should make it easy to produce running code. The SIP RFCs fail to do this on the topic of SIP security.





SIP over dual stacks - IPv4 and IPv6


Yesterday I found an Internet Draft called Testing Eyeball Happiness that gives examples of how to test dual stack deployments. There is a known issue with applications that retrieve multiple IP addresses for the same host name from DNS and, following current RFCs, try them sequentially with a preference for IPv6 addresses. The timeouts when things go wrong with one flow are far longer than what users accept. Let’s say that Bob (you know him) uses his SIP phone to place a call to Alice. Bob’s phone sends the call to an outbound proxy, which wants to forward it to another domain. This domain announces both IPv4 and IPv6 addresses in DNS for its proxy. Now, Bob’s proxy actually has an IPv6 address, but is not connected to the Internet with IPv6. The proxy will try connecting to Alice’s domain SIP proxy over IPv6 for quite a long time before it recognizes that there’s no connectivity. Hopefully it will then try another address, but the question is whether the user is still waiting when that happens. In telephony, losing seconds is a catastrophe, especially between requesting a call and getting the first ringing signal. Remember – this is not about media, only about signaling. Without signaling, we’ll never even get to any media issues.
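A simple timing model shows how bad the sequential approach gets. The Python sketch below compares trying addresses one by one with starting the next attempt after a short delay, or as soon as the previous one fails. All numbers are hypothetical: the blackholed IPv6 attempt gives up after 32 seconds (which happens to equal RFC 3261’s default Timer B, 64*T1) and IPv4 connects in 50 ms. The function names are mine.

```python
from typing import List, Optional, Tuple

# Each attempt: (succeeds?, milliseconds until it connects or gives up),
# listed in preference order (e.g. IPv6 first).
Attempt = Tuple[bool, int]


def sequential_ms(attempts: List[Attempt]) -> Optional[int]:
    """Try addresses one after another, like a naive getaddrinfo() loop."""
    elapsed = 0
    for ok, ms in attempts:
        elapsed += ms
        if ok:
            return elapsed
    return None  # every address failed


def staggered_ms(attempts: List[Attempt], delay_ms: int) -> Optional[int]:
    """Happy-Eyeballs-style: start the next attempt after delay_ms, or at
    once if the current attempt fails sooner; keep the first success."""
    start, best = 0, None
    for ok, ms in attempts:
        if ok:
            done = start + ms
            best = done if best is None else min(best, done)
        start += delay_ms if ok else min(delay_ms, ms)
    return best
```

For `[(False, 32000), (True, 50)]`, the sequential loop needs 32050 ms before the INVITE can leave, while starting IPv4 after a 300 ms delay gets there in 350 ms.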

HTTP and Happy Eyeballs

We’ve seen this problem on the web. Browsers suddenly told us that large sites were not available. It turned out that the new home router enabled IPv6 tunnels and announced IPv6 prefixes on the LAN, something that the firewall blocked. By disabling IPv6 on the laptop, we could reach the web site again. This caused web sites to stop announcing IPv6 and computer owners to disable IPv6. This was not good for the IPv6 migration, so the browser developers started looking for solutions. The Happy Eyeballs discussion in the IETF is about finding algorithms where the browser connects to all addresses in parallel and selects a candidate that answers quickly. In SIP, we need to implement the same fix, over UDP, TCP and SCTP. I’ll try to set up some tests at SIPit to see what the current state is.
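For stream transports, the parallel approach can be sketched with Python’s asyncio. This is a hypothetical helper, not code from any SIP stack: each factory starts one connection attempt, in preference order, and the next attempt starts after `delay` seconds or as soon as the current one fails. UDP has no handshake to race, so a SIP stack would need a different signal there; that is outside this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def happy_eyeballs(factories: Iterable[Callable[[], Awaitable[T]]],
                         delay: float) -> T:
    """Race connection attempts in preference order (e.g. IPv6 first).

    Attempt n+1 starts when attempt n fails or after `delay` seconds,
    whichever comes first. The first success wins; the rest are cancelled.
    """
    remaining = list(factories)
    pending = set()
    last_error = None
    try:
        while remaining or pending:
            if remaining:
                pending.add(asyncio.ensure_future(remaining.pop(0)()))
            done, pending = await asyncio.wait(
                pending,
                timeout=delay if remaining else None,  # no more to start: just wait
                return_when=asyncio.FIRST_COMPLETED,
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_error = task.exception()  # failed: loop starts the next one
        raise ConnectionError("all connection attempts failed") from last_error
    finally:
        for task in pending:
            task.cancel()  # drop the slower attempts
```

Against real hosts, a factory could be `lambda: asyncio.open_connection("2001:db8::5", 5060)`. Python’s own `loop.create_connection()` has also accepted a `happy_eyeballs_delay` argument since Python 3.8, which may be simpler than rolling your own.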

A quote from the abstract section of the IETF draft:

In a dual stack network (i.e., one that contains both IPv4 [RFC0791] and IPv6 [RFC2460] prefixes and routes), or in an IPv6-only network that uses multiple prefixes allocated by upstream providers that implement BCP 38 Ingress Filtering [RFC2827], the fact that two hosts that need to communicate have addresses using the same architecture does not imply that the network has usable routes connecting them, or that those addresses are useful to the applications in question. In addition, the process of establishing a session using the Sockets API [RFC3493] is generally described in terms of obtaining a list of possible addresses for a peer (which will normally include both IPv4 and IPv6 addresses) using getaddrinfo() and trying them in sequence until one succeeds or all have failed. This naive algorithm, if implemented as described, has the side-effect of making the worst case delay in establishing a session far longer than human patience normally allows. This has the effect of discouraging users from enabling IPv6 in their equipment, or content providers from offering AAAA records for their services.


I’m currently swimming through the deep waters of SIP RFCs in order to get an overview of TLS implementation requirements. Reading RFC 3428 – the SIP MESSAGE extension – I found something I did not know. In section 11, Security Considerations, the RFC states:

In normal usage, most SIP requests are used to setup and modify communication sessions. The actual communication between participants happens in the media sessions, not in the SIP requests themselves. The MESSAGE method changes this assumption; MESSAGE requests normally carry the actual communication between participants as payload. This implies that MESSAGE requests have a greater need for security than most other SIP requests. In particular, UAs that support the MESSAGE request MUST implement end-to-end authentication, body integrity, and body confidentiality mechanisms.

I have seen quite a few implementations of MESSAGE, but none has been compliant with RFC 3428.

The SIP MESSAGE method provides a way to send short messages over SIP, within a dialog or outside of one. MESSAGE requests do not create a dialog, so there’s no “session”. For session-based chat, MSRP – the Message Session Relay Protocol – was developed. I’ll try to write more about that protocol in another blog post.
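For reference, this is roughly what a MESSAGE request outside a dialog looks like, built by a hypothetical Python helper. Header values are placeholders. Note that the body goes out in clear text: a real RFC 3428 implementation would also need the end-to-end protection quoted above (for example S/MIME), which this sketch does not attempt.

```python
def build_message(target: str, sender: str, via_host: str, text: str,
                  call_id: str, from_tag: str, branch: str,
                  cseq: int = 1) -> bytes:
    """Build an out-of-dialog MESSAGE request (hypothetical helper).

    The body is plain text/plain; the end-to-end authentication, integrity
    and confidentiality that RFC 3428 requires UAs to implement is NOT here.
    """
    body = text.encode("utf-8")
    headers = [
        f"MESSAGE {target} SIP/2.0",
        f"Via: SIP/2.0/TLS {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{sender}>;tag={from_tag}",
        f"To: <{target}>",  # no To tag: MESSAGE does not create a dialog
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} MESSAGE",
        "Content-Type: text/plain;charset=UTF-8",
        f"Content-Length: {len(body)}",  # counts bytes, not characters
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + body
```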

Last week I talked at the Voip2Day conference in Madrid, organized by Avanzada7. The talk, named “Watch out!”, covers new areas developed in SIP but not implemented in many devices or servers out there. Solutions for NAT traversal, PSTN trunk registration and new work on the real-time web are covered, along with a small update to the list of 10 bullets to remember when implementing a new SIP platform.

Some topics covered:

  • ICE, Interactive Connectivity Establishment, a complex but working solution for finding a working media path between two SIP phones, either directly or through a media relay (a TURN server). Used both for NAT traversal and for IPv4/IPv6 dual stack deployments.
  • SIP Outbound, the way to handle NAT traversal for SIP signaling. With SIP Outbound, the client sets up multiple IP connections, called flows, to servers while indicating that it’s actually the same device that registers on all these connections. The proxy can then fail over if one connection fails. It’s up to the SIP phone, the user agent, to maintain the connections and re-open them when they fail.
  • GIN – the way SIPconnect sends a registration for a SIP trunk with multiple phone numbers. Before GIN, every vendor used its very own hack, which raised the cost for service providers that wanted to support multiple vendors.
  • GRUU – Globally Routable User Agent URI – a domain-based address for every device that registers for an account. It makes more complex operations across domain boundaries possible. Without a GRUU, many URIs are unusable since they refer to an IP address hidden behind a NAT device.
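ICE decides which candidates to try first using a priority formula from RFC 5245 (retained in RFC 8445). This Python sketch uses the RFC’s recommended type preferences; the local preference is where a dual-stack agent can rank its own IPv6 and IPv4 addresses, and the 65535/65534 values in the test are just an assumed policy. The function name is hypothetical.

```python
# Recommended type preferences from RFC 5245 (kept in RFC 8445).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def ice_priority(candidate_type: str, local_pref: int, component_id: int) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)"""
    if not 0 <= local_pref <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be in 1..256")
    return ((2 ** 24) * TYPE_PREFERENCE[candidate_type]
            + (2 ** 8) * local_pref
            + (256 - component_id))
```

`ice_priority("host", 65535, 1)` gives 2130706431, the familiar value for a top-ranked host candidate on component 1 (RTP).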
I feel that ICE and SIP Outbound are good candidates for solving the NAT puzzle as well as the IPv6 transition. We need more open source implementations as a reference!
The presentation also covers RTCweb briefly. At the conference, Iñaki Baz Castillo and a colleague gave a live demonstration of a SIP implementation in JavaScript connecting over WebSockets to a SIP proxy. They lacked RTCweb support, so there was no media in the calls, but it showed that it’s possible to implement SIP in the browser!

The talk is now published on Slideshare and can be viewed online. Enjoy!


Realtime communication in web browsers

Video sessions in the browser open the door to a lot of new applications. While this has been working with various plugins, new opportunities will open up when it becomes part of your standard web browser. HTML5 introduced native video in the browser. New standardization efforts are looking into peer-to-peer multimedia sessions in the browser, not just broadcast. This opens up a large set of applications, from “click2call” without plugins to video chats like in Facebook and Google+. This is exciting. And all the old SIP architects seem to have jumped onto this project, trying to get things right from the start in this second-generation effort.

Alec Saunders has written a new manifesto called “Voice 3.0: The emergence of the Voice Web”. It’s very good reading and I agree with most of it. Please read it! What I am missing, as you will see in my comment far down the page, are two things:

  • IPv6 – it’s the glue that will make Voice 3.0 work
  • Security – we need to learn from our mistakes and make Voice 3.0 secure by default

I have written a lot about SIP and IPv6 on Edvina’s web site, Twitter and Facebook. Dan York has joined me in this campaign and has a lot of good podcasts, presentations and information on Voxeo’s IPv6 resources page. There’s quite a lot of work we still need to do, but we’re heading in the right direction in the SIP community.

I’m embarrassed

Now is the time to start working on security. I find it really embarrassing that we have almost no experience of security in VoIP. The protocol has been around for ten years or more and we’re still confused. The hardware vendors claim that the CPUs can’t handle it. Or the DSPs. Or something else. As we move towards more video, we need more CPU power to encrypt. On the other side, we have users who don’t require any security. They run stuff over enterprise networks, home networks and the Internet without bothering about who can listen in, access the log files or access their accounts.

The dark side

This last year, the number of attacks on SIP servers has grown. I don’t just hear rumours; I see attacks in progress. I have met a number of people who have lost huge amounts of money on international calls to foreign countries, placed by unknown hackers who figured out that their account (typically “300”) had a very complicated password (typically “300”). The next thing we’re going to see, during the summer holidays or over Christmas, is someone showing a journalist how easy it is to see that the boss calls the union several times a day, probably to negotiate layoffs. Or calls a competitor, maybe to sell the company.

Let’s agree on one thing: ALL networks are insecure

I think it’s time to stop ignoring security. I think we have to agree that it makes our architecture much easier if we assume that ALL networks are insecure. We cannot judge, and our users surely cannot determine, whether a network is secure or not. They have to be able to connect to any open WiFi network and use the services we produce, without thinking about anything other than reaching their office, customer, family or the kid back home. Every communication needs to be secure. Period.

If we start there, we’re on the right track. Now we need to work together and figure out where to go. How do we exchange encryption keys? How do we handle identities? Where do we start? Let’s not make the mistake of trying to hook on to a global PKI that doesn’t scale (and has proven not to be secure) like the one used by the web. PGP is interesting, but only scales at a nerd level. It’s too complicated the way we use it today; I would not even try to explain it and make my mother use it. We need to work on this.

Distributed webs of trust

Maybe it’s Facebook, Google+ or LinkedIn that will be the trust platform. Or a combination of them, using systems like OpenID, OAuth and SAML 2.0. We need something simple, scalable and secure. That’s not easy to achieve, especially in a world filled with engineers with a lot of opinions, experiences and ideas. It’s our responsibility to start working on this. The IETF and other parties are already working hard on other components of Voice 3.0. Let’s get our act together on the security part of the vision for Voice 3.0 too. We need to build, step by step, a trustworthy platform for realtime communication for everyone.

Imagine working at your desk and getting a phone call from your friend Randy. You answer on your IP phone. Being Randy, he suddenly wants to play a new jingle he created while in the mood the day before. The phone speaker is not a good device for a cool guitar riff, is it? On the same desk you have your PC with a softphone that supports HD voice and really cool loudspeakers. Why not transfer the audio to the PC while still using the phone’s microphone? After that, you want to play a video for Randy, so you add media from your laptop to the call, while the call is still managed by the IP phone.

This is a true multimedia session, and it’s not that far away. But it’s too complex to manage in the SIP protocol today. I just found out about the SPLICES working group in the IETF, which is working on this issue. I also know that the Asterisk development team is working on a new media architecture – something we’ve needed for many years. My presentation about the new media architecture from Astridevcon three years ago talks about being able to add streams, have multiple streams of the same kind and remove streams dynamically during a call. SPLICES, when finished, will help us implement this in a really cool way in SIP.

Another mailing list to follow.

Do you know another thing? Randy and I wouldn’t be able to have this call as easily if we did not use SIP over IPv6. Doing a call like the one above over two NATs would be awfully complex…

SIPit 28 was hosted by Digium in Huntsville, Alabama, USA, the week of April 11-15, 2011. There were 54 attendees from 19 companies visiting from 10 countries, using 40 distinct implementations in the interoperability tests.

SIPit, organized by the SIP Forum, is one of the foundations that make SIP work across vendors and implementations. Twice each year, developers from all around the world meet to test, discuss, learn and fix issues in both implementations and standards. During SIPit events, many bugs in the RFCs – or just missing explanations – have been found. Under the leadership of Robert Sparks, SIPit has become the primary event for all SIP developers. Edvina proudly organized SIPit #26 in Stockholm in May 2010.

See the report for SIPit 28 at

I really missed attending this SIPit. I do hope I can attend SIPit #29 in the fall. If you are developing SIP software – clients, servers, devices – make sure you attend the next SIPit with your team! We have a lot to test, from SIP Outbound to SRTP, MSRP, IPv6 and TLS.
