Thursday, October 28, 2010

Asterisk High Availability Solutions

Ways to increase system availability and load balancing:
HAAST from GenerationD (www.GenerationD.com) offers high availability and clustering for your existing Asterisk servers. A software-based solution, High Availability ASTerisk (HAAST) offers rapid automatic failover, manual promote/demote for maintenance, a command-line interface, a telnet interface, and a web-based interface. It supports master-slave failover, load-balanced, hot-standby, and warm-standby modes. Installation is straightforward, with no external hardware required. Real-time failover takes 10 seconds to complete. HAAST is a commercial solution, in use at call centers and other high-uptime environments.
DNS SRV records on the CPE side, although not all phones handle them.
SARK-HA from Aelintra Telecom offers High Availability Asterisk out of the box. It runs Aelintra's SARK UCS MVP Asterisk implementation on a pair of servers.... Real-time failover takes less than 20 seconds to complete. Setup requires only 4 additional data fields to be filled out in the SARK globals panel. Illustrated set-up guide HERE.
Ranch Networks offers a High Availability solution for Asterisk (white paper: White_Paper_one_one_HA.pdf). This is a hardware-based solution, for two Asterisk boxes only.
Flip1405 manages a virtual IP between two Asterisk servers and queries UDP port 5060 for state changes. Downtime is less than 30 seconds. It has only 2 dependencies (nmap and arping) and is incredibly easy to set up.
SERVERware is a next-generation communication technology solution that delivers a wide range of IP services and applications. It is redundant, highly available and fault tolerant. The SERVER edition of SERVERware is the most economical way to start delivering IP services from a single server. This edition includes all the components that service providers need to offer any of the supported VPS templates, with a clear upgrade path to the NETWORK edition. The NETWORK edition of SERVERware supports up to 256 host servers, creating a farm of virtual private servers from which the IP service delivery platform is served. This allows service providers to offer redundant, flexible and scalable IP services like mail, web, hosted PBXes, etc. Commercial.
sysMONIT is a high-availability host failure detection module. It implements a small, simple daemon that runs on each host and sends signaling UDP packets, so that host failures are detected efficiently and spare hosts can take over services.
Failover switches automatically switch connections (T1, Ethernet, etc.) to a backup system. Vovida has a SIP load balancer. This allows several Asterisk servers to be set up and appear to be a single server to users. Other load-balancing approaches involve the SER SIP proxy, UltraMonkey (see below) or simple DNS round-robin. And then there's also app_distributor as a third-party application, or app_random. There are a lot of bugs, and the last version was released in 2002.

Use the Linux-HA software to provide high-availability (HA) failover on programmed conditions - by default node hang or crash. Linux-HA also has many telephony-oriented HA APIs as defined by the Service Availability Forum (SAF). It also provides sub-second failover, and works well with shared disk or without. It is commonly used with the DRBD package to provide HA with no single point of failure, and no special hardware requirements.
Stratus, which has been making high-end continuous processing systems for 20 years, has just added an under-$10,000 Linux-based continuous processing solution: Stratus ftServer T Series Systems


QueueMetrics is able to monitor clustered call centers with the load distributed over a number of Asterisk servers as if they were one big single box.
OrderlyStats - Dedicated Real Time Call Centre Management and Statistics Package, can monitor single or clustered Asterisk servers from a single page.
Overview
The following is a brief HOWTO for installing High-Availability Asterisk using Open Source tools combined with fail-over capable & intelligent hardware (the fonebridge).
The heartbeat utility is used in a 'Passive-Active' scenario but could easily be modified to do 'Active-Active'.

Background
Some of our more demanding customers in the Call Center and Banking Industry are loath to accept an implementation with no mechanism for fail-over and high availability, so this is the hardware/software combination we are using to meet their demands.

Client Background
The following scenario was used for a medium-sized call center operation with about 60 analog stations and a single T1 PRI.

Hardware
2 x 1U Supermicro Servers (P4, 512 MB, Dual Gig Eth, Dual SATA with RAID 0)
1 x Redfone Quad T1 fonebridge to terminate PRI connectivity, power channel banks and provide fail-over capability between the two Supermicros
1 x T1 PRI
3 x Adtran 750 FXS channel banks to drive analog phones
2 x UPS/Surge Protectors

Software
Fedora Core 4
Asterisk, zaptel, libpri from CVS head
Linux HA software suite from Ultramonkey. They have RPMs for RHEL 3 that install fine on Fedora Core 4.
Each server is a mirror image of the other in terms of Asterisk configs and software.

Software Install
After a standard install of FC4, Asterisk, zaptel and libpri, we installed all of the packages from Ultramonkey, pretty much following their guidelines: http://www.ultramonkey.org/3/installation-rh.el.3.html
You may have a few dependency issues, mainly perl libs, but we were able to satisfy all of them by using Yum. If you are running Apt you should be able to accomplish the same thing.

Configuring Heartbeat
After installing heartbeat there are only three files that need to be modified for your environment. They are ha.cf, haresources and authkeys. They should all be placed in the /etc/ha.d/ directory. The files should be absolutely identical on all machines that are part of your Asterisk high-availability cluster. We only have two servers running but you could easily scale to more using the exact same configurations. These are our config files. All comment lines have been removed but as you can see they are short and simple.

ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 200ms
deadtime 2
warntime 1
initdead 120
udpport 694
bcast eth0
node asterisk1
node asterisk2

haresources
asterisk1 10.10.10.110 fonulator asterisk

authkeys
auth 1
1 sha1 SuPerS&cretP@$$werd
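
Heartbeat expects authkeys to be readable only by its owner (mode 600), and the secret should not be something guessable. The following is a minimal sketch of how the file could be generated; write_authkeys is a hypothetical helper of ours, not something shipped with Heartbeat, and it assumes od and /dev/urandom are available.

```sh
#!/bin/sh
# Sketch only: write_authkeys is a hypothetical helper, not part of Heartbeat.
# Usage: write_authkeys /etc/ha.d   (then copy the same file to the other node)
write_authkeys() {
    dir="$1"
    # 16 random bytes rendered as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # Create the file inside a subshell so the tight umask does not leak out
    (
        umask 077
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$dir/authkeys"
    )
    # Enforce owner-only access even if the file already existed
    chmod 600 "$dir/authkeys"
}
```

Generate the file once and copy it to every node, since the key has to be identical across the cluster.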

Operation
Each Asterisk server has a unique IP address which is part of the LAN segment. This could be a NATed network or Internet facing with public IP addresses. Heartbeat monitors the hardware state of each machine over Ethernet, a serial port, or a combination of both (recommended), and assigns the Virtual IP to the Asterisk server which is currently in an active state. Example:

Asterisk1= 10.10.10.100
Asterisk2= 10.10.10.120
Virtual IP= 10.10.10.110 (see haresources)
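
To see which of the two servers currently owns the Virtual IP, you can look for it in the interface address list. A small sketch, assuming the iproute2 tools are installed; holds_vip is a hypothetical helper name, and it reads the output of 'ip -o addr show' on stdin.

```sh
#!/bin/sh
# Sketch only: holds_vip is a hypothetical helper.
# Reads `ip -o addr show` output on stdin and succeeds if the given
# address is configured on any interface of this machine.
holds_vip() {
    vip="$1"
    awk -v vip="$vip" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "inet") {
                    # The field after "inet" looks like 10.10.10.110/24
                    split($(i + 1), a, "/")
                    if (a[1] == vip) found = 1
                }
            }
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a live node:
#   ip -o addr show | holds_vip 10.10.10.110 && echo "this node is active"
```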

With Heartbeat it is important that your node names are identical to the host names reported by uname -n. You may also need to manually add IP/host entries to your /etc/hosts file so each machine knows how to reach the other via IP.
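
A quick way to catch a node name mismatch before starting Heartbeat is to compare uname -n against the node lines in ha.cf. This is only a sketch; check_node_name is a hypothetical helper, and the optional second argument exists just so the check can be tried with a name other than the local one.

```sh
#!/bin/sh
# Sketch only: check_node_name is a hypothetical helper.
# Usage: check_node_name /etc/ha.d/ha.cf [name]
check_node_name() {
    cf="$1"
    # Default to this machine's own node name
    name="${2:-$(uname -n)}"
    # Compare every word that follows "node" against the name
    if awk -v n="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$cf"; then
        echo "ok: $name is listed in $cf"
    else
        echo "error: $name is not listed as a node in $cf" >&2
        return 1
    fi
}
```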

Following the rules in haresources, Heartbeat will assign machine name asterisk1 as the primary server when both systems start up. It will then start the following scripts: fonulator (the little script that configures the fonebridge) and asterisk, which starts the Asterisk server. These are both standard startup scripts placed in /etc/init.d/ .
If the primary server suffers a hardware fault or simply stops responding to the heartbeats going between the two nodes, asterisk2 will execute /etc/init.d/fonulator start to reconfigure the fonebridge on the fly and begin redirecting traffic to asterisk2, followed by /etc/init.d/asterisk start to start the Asterisk server.
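
The order of the entries on the haresources line matters. As far as we know, Heartbeat brings resources up from left to right on takeover and shuts them down from right to left on release. The sketch below only prints that order for a given line; takeover_order and release_order are hypothetical helpers and do not talk to Heartbeat at all.

```sh
#!/bin/sh
# Sketch only: both helpers are hypothetical and just print the order.

# Print resources in the order they would be started (left to right).
takeover_order() {
    # Intentional word splitting: break the line into fields
    set -- $1
    shift   # drop the preferred node name
    for res in "$@"; do
        echo "start $res"
    done
}

# Print resources in the order they would be stopped (right to left).
release_order() {
    set -- $1
    shift
    out=""
    for res in "$@"; do
        # Prepend each entry so the final list is reversed
        out="stop $res
$out"
    done
    printf '%s' "$out"
}

# Usage:
#   takeover_order "asterisk1 10.10.10.110 fonulator asterisk"
```

Read this way, the Virtual IP comes up first, then the fonebridge is reconfigured, and Asterisk is started last.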

Results
With heartbeat, IP takeover occurs in under a second. The fonulator utility re-configures the fonebridge in just about the same amount of time and then depending on your hardware platform and the complexity of apps running in Asterisk it can take between 5-15 seconds for Asterisk to start up on your secondary server, load all config files, clear alarms and be ready to process calls. Total fail-over time about 15-20 seconds.

Resources
Ultramonkey http://www.ultramonkey.org (High Avail software packages)
Linux HA http://www.linux-ha.org (The High Availability Linux Project)
Redfone http://www.red-fone.com (Maker of the Quad T1/E1 fonebridge)


This tutorial, presented by Open Innovation, presents a robust cluster architecture that achieves reliability using Open Source software (Postgres, Pgpool, Csync2, ....) and native IP phone features, avoiding complex and expensive common cluster approaches. This solution is currently up and running, serving all business VoIP traffic (800 IP phones) of one of the most used credit cards in Italy.
The tutorial is in PDF format and can be downloaded here: A reliable architecture for Asterisk Cluster
Overview
Use standard Ubuntu/Debian packages to create an Active/Passive high-availability solution for Asterisk 1.4 using heartbeat 1.0 (and FreePBX) and using SIP (not Redfone/PRI/analog/etc). Note: Use Debian server, do not use Ubuntu server until RAID-1 issues are solved (perhaps Ubuntu Intrepid?).

Background
Many ISPs are now providing "Dynamic T1" instead of (or in addition to) standard T1-PRI service. "Dynamic T1" just means that they provide highly prioritized VOIP/SIP between your customer site and them across a T1 (or other high-speed connection). So it is more and more possible to get cheaper service using VOIP only, without T1-PRI, and get very similar call quality. This solution deals with Debian/Ubuntu, but also with the special issues that heartbeat raises when connecting to the upstream provider via SIP. Many clients want failover support to "seal the deal".

Issues
Heartbeat "takes over" an IP address by adding an "alias" to an interface IN ADDITION to an IP that must always be there so that heartbeat can communicate. For a PBX type install that is not behind a NAT, with no upstream SIP proxy (OpenSer), an alias will be added to BOTH the WAN interface and the LAN interface. Asterisk will need to bind to both the LAN and WAN to operate. Unless you do some routing/proxy magic outlined in this solution, you will run into trouble because asterisk will put the wrong SRC/VIA address in IP/SIP packets. This will cause problems upstream, because your ISP/SIP provider may authenticate based on IP and you will be appearing to send packets from the wrong IP. This will cause problems in the LAN for similar reasons.

Software Install
apt-get install asterisk
apt-get install heartbeat

Heartbeat Config Generally
For the general configuration, see the "Redfone" HOWTO above this one. I'm using the 10.10.10.0 addresses from above and 77.77.77.0 as a WAN address in my examples. I'm assuming that the shared LAN address is 10.10.10.110 and the shared WAN address is 77.77.77.110. The Asterisk1 server's "other" WAN IP is 77.77.77.100. For the sake of example, the Asterisk2 machine has 77.77.77.120.

haresources
asterisk1 10.10.10.110 77.77.77.110 fixrouting asterisk

Routing fixes
For each interface to which Asterisk binds, it gets the IP address by doing a routing lookup. If you look at 'ip route show' and then look after the word 'src', you will see which IP will be used for that interface (also look at 'ip route get'). Asterisk will put this IP into VIA headers and send all IP/UDP/SIP packets from this IP. When this server is primary, we need to fix the routing so that all packets on the LAN look like they are coming from the 'shared' LAN IP of the two servers, and (for multi-homed servers) we need to fix the routing for the WAN interface as well.
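
To make the 'src' lookup a little less manual, the routing table can be filtered down to just the prefix and its source address. A minimal sketch, assuming iproute2 output in its usual form; route_src is a hypothetical helper that reads 'ip route show' output on stdin.

```sh
#!/bin/sh
# Sketch only: route_src is a hypothetical helper.
# Reads `ip route show` output on stdin and prints "<prefix> <src address>"
# for every route that carries a src hint.
route_src() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "src") print $1, $(i + 1)
    }'
}

# Usage on a live node:
#   ip route show | route_src
```

On the active server the LAN line should show the shared address (10.10.10.110 in my examples) once the routing fix has run, and the server's own address otherwise.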

The 'fixrouting' script detailed below needs to be /etc/init.d/fixrouting
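#! /bin/sh -e
# Exit on the first failing command
set -e

case "$1" in
  start)
    # Becoming active: send LAN and WAN traffic from the shared IPs
    ip route change 10.10.10.0/24 src 10.10.10.110 dev eth0
    ip route change 77.77.77.0/24 src 77.77.77.110 dev eth1
    ;;
  stop)
    # Becoming passive: go back to this server's own IPs
    # (Asterisk1 values shown; use the .120 addresses on Asterisk2)
    ip route change 10.10.10.0/24 src 10.10.10.100 dev eth0
    ip route change 77.77.77.0/24 src 77.77.77.100 dev eth1
    ;;
  force-reload|restart)
    $0 stop
    $0 start
    ;;
  *)
    echo "Usage: /etc/init.d/fixrouting {start|stop|restart|force-reload}"
    exit 1
    ;;
esac

exit 0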

Results
When a failover makes this server primary, the "shared" IPs will be taken over and the routing fix will make sure that all packets Asterisk sends look like they are coming from those IPs. When this server fails or becomes secondary, the IPs will be released and the routing fix will set things back to the Passive state so that the Active machine might still be able to communicate with it (and avoid IP conflicts).

The current solution I have uses UltraMonkey ( http://www.ultramonkey.org ) for load-balancing and failover, and it works like a champ. There are obviously a lot of details there, and I'd be happy to detail them if people are interested. There is also a site that has two clusters with uniform reachability for all phones and PRIs. None of this requires a lot of dialplan tuning on a day-to-day basis.