Voice 3.0

RFC 3261, The Session Initiation Protocol, was published in 2002, six years after the initial work on SIP. Wikipedia writes:

“SIP was originally designed by Henning Schulzrinne and Mark Handley in 1996. In November 2000, SIP was accepted as a 3GPP signaling protocol and permanent element of the IP Multimedia Subsystem (IMS) architecture for IP-based streaming multimedia services in cellular systems. The latest version of the specification is RFC 3261 from the IETF Network Working Group published in June 2002.”

The SIP 2012 Realtime Communication Framework - a specification that we need!

The problem here is that SIP 2.0 has changed a lot. It has been formally updated by a number of RFCs: 3265, 3853, 4320, 4916, 5393, 5621, 5626, 5630, 5922, 5954, 6026 and 6141 (according to tools.ietf.org). And those are just the updates that change the core protocol. In addition, there are a lot of extensions and additions to what could now be called The SIP 2012 Realtime Communication Framework. The core protocol is now just one piece of the puzzle.
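To make the gap concrete, here is a minimal sketch of how a customer could compare a vendor's claimed RFC support against the list of updates above. The function name and the example vendor claim are made up for illustration; only the RFC numbers come from the text.

```python
# RFCs that formally update RFC 3261, as listed in the text above.
RFC3261_UPDATES = [3265, 3853, 4320, 4916, 5393, 5621,
                   5626, 5630, 5922, 5954, 6026, 6141]


def missing_updates(vendor_supported):
    """Return the RFC 3261 updates that the vendor does not claim to support."""
    supported = set(vendor_supported)
    # Keep the order of the reference list so reports are easy to compare.
    return [rfc for rfc in RFC3261_UPDATES if rfc not in supported]


# Hypothetical vendor claim, invented for this example only.
example_vendor = [3261, 3265, 4320, 6026]
print(missing_updates(example_vendor))
```

A check like this only covers the formal updates to the core protocol. The extensions mentioned above would need a reference profile of their own.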

Customers need to change requirements

I’ve seen too many public tenders and requirements that refer to “SIP according to RFC 3261”. This is like buying a PC and naming Windows 95 as the base requirement for compatibility. So what should be referred to? Here’s the core of the problem: there’s no simple document or reference profile to refer to. And there is a huge gap between the IETF and the current implementations, something that we see every year at SIPit events. Only customer requirements can push vendors to move forward.

An example: The SIPconnect specification from the SIP Forum

The SIP Forum SIPconnect 1.1 specification is a good example of a reference profile that customers can use. It focuses on the SIP trunk: the connection between a PSTN gateway provider (ITSP) and an enterprise PBX. SIPconnect makes it possible for customers to refer not to RFC 3261 but to a more complete specification that builds a framework on top of the RFCs. During the work on SIPconnect, it was revealed that many PBXs used SIP in a way that was not actually supported by the current set of RFCs. The SIP Forum took this issue back to the IETF and the result is GIN, a way to register for many E.164 phone numbers (RFC 6140). Building a reference framework with a customer focus actually added to the standard framework, based on how SIP is used in current implementations.

There are more frameworks built on SIP. The IP Multimedia Subsystem (IMS) from 3GPP is another example, but it is not aimed at business users or at vendors selling SIP solutions to consumers. 3GPP is active in the IETF, trying to make sure that changes and additions are documented in RFCs and made compatible with the rest of the SIP framework.

Wanted: A reference profile for SIP phones and servers

If you look at the original set of SIP RFCs, there was a lack of solutions for NAT handling. In the current SIP framework the IETF has added new standards like ICE/TURN/STUN and SIP outbound. There are also standard frameworks for configuration update notifications, registration management and TLS certificate management, things that most vendors implement in a proprietary way. We see more implementations of these additions at every SIPit test event, but most phones on the market (and most servers) still do not support these new standards. We need a reference profile (maybe from the SIP Forum?) that customers can refer to in order to put pressure on vendors to update their implementations. SIP implementations that are based on more current work will lead to better products, more NAT- and IPv6-friendly solutions and improved security. Standardization in configuration and security management will lead to lower costs and less vendor lock-in.

The question is: who starts the work of specifying the new reference profile?


Realtime communication in web browsers

Video sessions in the browser open up a lot of new applications. While this has been working with various plugins, new opportunities will open up when it becomes part of your standard web browser. HTML5 introduced native video in the browser. New standardization efforts are looking into peer-to-peer multimedia sessions in the browser, not just broadcast. This opens up a large set of applications, from “click2call” without plugins to video chats like those in Facebook and Google+. This is exciting. And all the old SIP architects seem to have jumped onto this project, trying to get things right from the start in this second-generation effort.

Alec Saunders has written a new manifesto called “Voice 3.0: The emergence of the Voice Web”. It’s very good reading and I agree with most of it. Please read it! What I am missing, as you will see in my comment far down the page, are two things:

  • IPv6 – it’s the glue that will make Voice 3.0 work
  • Security – we need to learn from our mistakes and make Voice 3.0 secure by default

I have written a lot about SIP and IPv6 on Edvina’s web site, Twitter and Facebook. Dan York has joined me in this campaign and has a lot of good podcasts, presentations and information on Voxeo’s IPv6 resources page. There’s quite a lot of work we still need to do, but we’re heading in the right direction in the SIP community.

I’m embarrassed

Now is the time to start working on security. I find it really embarrassing that we have almost no experience of security in VoIP. The protocol has been around for ten years or more and we’re still confused. The hardware vendors claim that the CPUs can’t handle it. Or the DSPs. Or something else. As we move towards more video we need more CPU power to encrypt. On the other side we have users who don’t require any security. They run stuff over enterprise networks, home networks and the Internet without bothering about who can listen in, access the log files or access their accounts.

The dark side

This last year, the number of attacks on SIP servers has grown. I don’t just hear rumours; I see attacks in progress. I have met a number of people who have lost huge amounts of money on international calls, placed by unknown hackers who figured out that their account (typically “300”) had a very complicated password (typically “300”). The next thing that we’re going to see, during the summer holidays or Christmas, is someone showing a journalist how easy it is to see that the boss calls the union several times a day, probably to negotiate layoffs. Or calls the competition, maybe to sell the company.
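The “300”/“300” problem is easy to look for before an attacker does. Here is a minimal sketch of such an audit, assuming the accounts can be exported as a plain username-to-password mapping. The function name, the length threshold and the sample accounts are all hypothetical; a real audit would read the server’s own user store.

```python
def weak_accounts(accounts, min_length=12):
    """Return usernames whose password equals the username or is too short.

    `accounts` maps username -> password. `min_length` is an arbitrary
    threshold chosen for this example, not a recommendation from any standard.
    """
    flagged = []
    for username, password in accounts.items():
        if password == username or len(password) < min_length:
            flagged.append(username)
    return sorted(flagged)


# Hypothetical account table, invented for illustration only.
accounts = {"300": "300", "301": "kT9#vQ2m!xLp7zRw", "302": "secret"}
print(weak_accounts(accounts))  # flags "300" and "302"
```

This only catches the most obvious mistakes. It says nothing about passwords that are long but guessable, or about accounts reachable without authentication.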

Let’s agree on one thing: ALL networks are insecure

I think it’s time to stop ignoring security. I think we have to agree that our architecture becomes much simpler if we assume that ALL networks are insecure. We cannot judge whether a network is secure, and our users surely cannot either. They have to be able to connect to any open WiFi network and use the services we produce, without thinking about anything other than reaching their office, customer, family or the kid back home. Every communication needs to be secure. Period.

If we start there, we’re on the right track. Now we need to work together and figure out where to go. How do we exchange encryption keys? How do we handle identities? Where do we start? Let’s not make the mistake of trying to hook on to a global PKI that doesn’t scale (and has proven not to be secure) like the one used by the web. PGP is interesting, but only scales on a nerd level. It’s too complicated the way we use it today. I would not even try to explain it to my mother and make her use it. We need to work on this.

Distributed webs of trust

Maybe it’s Facebook, Google+ or LinkedIn that will be the trust platform. Or a combination of them using systems like OpenID, OAuth and SAML 2.0 in various combinations. We need something simple, scalable and secure. That’s not easy to achieve, especially in a world filled with engineers with a lot of opinions, experiences and ideas. It’s our responsibility to start working on this. The IETF and other parties are already working hard on other components of Voice 3.0. Let’s get our act together on the security part of the vision for Voice 3.0 too. We need to, step by step, build a trustworthy platform for realtime communication for everyone.