High-Availability Cloud Servers on a Budget – A Cheap Way to Be Redundant

Over the past 18 months, cloud computing has become an increasingly important component of business for large companies and startups alike. More and more companies are turning to the cloud for their hosting, storage, and application needs. As companies move their infrastructure over, the high availability of these services becomes an ever more important, but often overlooked, piece of the critical infrastructure puzzle.

In this article we will discuss how to achieve high availability on a tight budget, which is surprisingly easy and inexpensive for most organizations. Using just a few of the open source utilities available today, companies can build redundancy and high availability across GigeNET’s Los Angeles (LA) Cloud Servers and Chicago (CHI) Cloud Servers. We will look at one way businesses can leverage their cloud to provide highly available services, giving their clientèle the top-notch service they expect and the company the fail-safe infrastructure it deserves.

The four technologies we will discuss for building high-availability services into the infrastructure are as follows:

HAProxy (for load balancing)
BIND + GeoDNS (for lower latency and high availability)
MySQL (for the database back end and replication)
Apache / Nginx (for serving the content)

Making all of these open source technologies work together is straightforward, as the step-by-step guides in future articles will show. For now, let’s take a glimpse at each of the aforementioned technologies.

HAProxy – The Reliable, High Performance TCP/HTTP Load Balancer
http://haproxy.1wt.eu/

“HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for web sites crawling under very high loads while needing persistence or Layer7 processing. Supporting tens of thousands of connections is clearly realistic with todays hardware. Its mode of operation makes its integration into existing architectures very easy and risk-less, while still offering the possibility not to expose fragile web servers to the Net.”

ISC BIND DNS + MaxMind GeoDNS patch
http://www.isc.org/software/bind
http://www.maxmind.com/app/geoip_resources

“BIND is by far the most widely used DNS software on the Internet. It provides a robust and stable platform on top of which organizations can build distributed computing systems with the knowledge that those systems are fully compliant with published DNS standards.”

In addition, BIND is in use on 10 of the 13 root name servers, the top-level DNS servers that directly answer requests for records in the root zone. These 13 servers form a crucial part of the Internet because they are the first step in resolving domain and host names into IP addresses, which Internet hosts use to communicate.

The MaxMind GeoDNS patch is a free, complementary software patch for BIND that adds geographical filter support to the existing views in BIND.

One very typical use of this is directing visitor traffic to the geographically nearest available web server. There is no need to route this critical traffic in a haphazard way when a better route is clearly available.

Combine BIND and GeoDNS with the power of the GigeNET Cloud and you’ve got a rock-solid infrastructure that answers your clientèle’s requests with the lowest latency possible!
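
To make the idea of geographic routing concrete, here is a minimal Python sketch of the decision a GeoDNS setup effectively makes: given a visitor's approximate location, answer with the closer of the two sites. This is only an illustration, not how BIND or the GeoDNS patch works internally. The coordinates are approximate, the IP addresses are placeholders from the reserved documentation ranges, and the function names are hypothetical.

```python
import math

# Approximate (latitude, longitude) of the two cloud locations.
# The IP addresses are placeholders from documentation ranges, not real servers.
DATACENTERS = {
    "LA": {"coords": (34.05, -118.24), "ip": "192.0.2.10"},
    "CHI": {"coords": (41.88, -87.63), "ip": "198.51.100.10"},
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_datacenter(visitor_coords):
    """Return the name of the datacenter closest to the visitor."""
    return min(
        DATACENTERS,
        key=lambda name: haversine_km(visitor_coords, DATACENTERS[name]["coords"]),
    )
```

With these assumed coordinates, a visitor near Seattle (47.61, -122.33) would be answered with the LA address and a visitor near New York (40.71, -74.01) with the CHI address. A real deployment maps the visitor's IP address to a country or region first, which is the job of the GeoIP data.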

MySQL
http://www.mysql.com/

“MySQL is the world’s most popular open source database software, with over 100 million copies of its software downloaded or distributed throughout its history. With its superior speed, reliability, and ease of use, MySQL has become the preferred choice for Web, Web 2.0, SaaS, ISV, Telecom companies and forward-thinking corporate IT Managers because it eliminates the major problems associated with downtime, maintenance and administration for modern, online applications.

Many of the world’s largest and fastest-growing organizations use MySQL to save time and money powering their high-volume Web sites, critical business systems, and packaged software — including industry leaders such as Yahoo!, Alcatel-Lucent, Google, Nokia, YouTube, Wikipedia, and Booking.com.”

MySQL includes not only the features that expert DBAs expect (such as stored procedures, triggers, cursors, distributed transaction support, multiple storage engines, SSL support, query caching, and hot backup) but also ACID compliance, multiple asynchronous replication modes, and even synchronous clustering.

Apache and Nginx – High Performance Web Servers
http://httpd.apache.org/
http://nginx.org/

Nginx is generally regarded as performing better than Apache, particularly under many concurrent connections, but either of these two free technologies will work. It’s really about choosing the one you are more comfortable with.

“Nginx is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004. Nginx now hosts nearly 7.65% (22.8M) of all domains worldwide.

Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.

Nginx is one of a handful of servers written to address the C10K problem. Unlike traditional servers, Nginx doesn’t rely on threads to handle requests. Instead it uses a much more scalable event-driven (asynchronous) architecture. This architecture uses small, but more importantly, predictable amounts of memory under load.
Even if you don’t expect to handle thousands of simultaneous requests, you can still benefit from Nginx’s high-performance and small memory footprint. Nginx scales in all directions: from the smallest VPS all the way up to clusters of servers.

Nginx powers several high-visibility sites, such as WordPress, Hulu, Github, Ohloh, SourceForge, WhitePages and TorrentReactor.”

Apache is known for its ease of use and is essentially the world’s most popular web server. If you run into a problem with Apache, you are likely to find a solution more quickly than you would with, say, Nginx.

The Apache HTTP Server (httpd) is free, open source software developed by the Apache Software Foundation and distributed under the Apache License 2.0. Its modular design lets you enable only the features you need, such as SSL/TLS support or URL rewriting, and its configuration can be overridden per directory with .htaccess files.

Tie the Technologies Together

To really achieve high availability as promised in this article, you need to know how to fit the pieces of the puzzle together. While this article does not go into step-by-step technical detail, we have included references next to each item to get you started.

Essentially, you will need to separate the dynamic files (files and directories that are subject to change) from the static files (files and directories that remain the same in both locations). Once this is done, you can keep both machines up to date with rsync or a similar tool. For the files, you will have two directories on each machine: one for staging and one for production. You NFS-mount the production directory from one server onto the staging area of the other server, and vice versa, so that data is always stored locally. For more information, see pages like these:

http://serverfault.com/questions/119961/is-there-a-way-to-mirror-two-severs-on-ubuntu
http://www.howtomonster.com/2007/08/08/how-to-sync-data-between-2-servers-automatically/
http://superuser.com/questions/126884/best-way-to-automatically-synchronize-files-between-linux-and-windows
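
As a rough illustration of the rsync step, the Python sketch below builds and runs a mirror command. It is a minimal sketch under assumptions: the directory paths in the usage note are hypothetical, the destination may be either a local path (for example an NFS-mounted directory) or a host:path target reached over SSH, and rsync must already be installed on the machine.

```python
import subprocess


def build_rsync_command(source_dir, dest, delete=True):
    """Build an rsync command that mirrors source_dir into dest.

    dest may be a local path or a remote target such as "host:/path".
    """
    cmd = ["rsync", "-az"]      # -a: archive mode, -z: compress data in transit
    if delete:
        cmd.append("--delete")  # remove files from dest that are gone from the source
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(dest)
    return cmd


def sync(source_dir, dest):
    """Run the mirror; raises CalledProcessError if rsync exits with an error."""
    subprocess.run(build_rsync_command(source_dir, dest), check=True)
```

For example, sync("/var/www/production", "/mnt/staging") would mirror a hypothetical production directory into a hypothetical staging mount. Running something like this from cron on each server is one simple way to keep the two locations in step. Be careful with --delete, since it removes anything in the destination that is missing from the source.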

To sync MySQL, refer to articles such as these:

http://www.neocodesoftware.com/replication/
http://www.howtoforge.com/mysql_master_master_replication
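
One detail that master-master guides usually cover is keeping AUTO_INCREMENT keys from colliding when both servers accept writes. The Python sketch below generates the relevant [mysqld] settings for each node. It is only a starting point under assumptions: the helper names are hypothetical, the binary log name "mysql-bin" is just a common choice, and a working setup also needs a replication user and each server pointed at the other, as the linked articles describe.

```python
def replication_settings(server_id, total_masters=2):
    """Return the [mysqld] settings for one node in a multi-master setup."""
    if not 1 <= server_id <= total_masters:
        raise ValueError("server_id must be between 1 and total_masters")
    return {
        "server_id": server_id,                     # must be unique per server
        "log_bin": "mysql-bin",                     # binary log the other node replicates from
        "auto_increment_increment": total_masters,  # step between generated IDs
        "auto_increment_offset": server_id,         # first ID this node generates
    }


def render_my_cnf(settings):
    """Render the settings as a [mysqld] section for my.cnf."""
    lines = ["[mysqld]"] + [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines)
```

With two masters, node 1 generates IDs 1, 3, 5 and so on while node 2 generates 2, 4, 6, so rows inserted in LA and CHI at the same moment never claim the same key. Calling print(render_my_cnf(replication_settings(1))) produces the fragment for the first server.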

For GeoDNS, I would recommend going to a provider that specializes in it, so that if one server is down the DNS service knows not to send traffic to it at all. This is a form of load balancing, but in this article it serves mainly as a high-availability feature: it prevents traffic from being sent to a server or load balancer that is not operational or online. Some providers to look at for this service (GigeNET will soon offer it):

http://www.autofailover.com
http://www.ultradns.com/Services/DNS-Traffic-Services/Directional-DNS
http://dyn.com/enterprise-dns/dynect-platform/dynect-traffic-management-gslb
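
The failover behavior described above boils down to a health check followed by a filter. The Python sketch below shows the idea with a plain TCP connection test. It is an assumption-laden illustration, not how any of the listed providers implement their monitoring, and the function names are hypothetical.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def healthy_endpoints(endpoints, timeout=2.0):
    """Keep only the (host, port) endpoints that currently accept connections."""
    return [(host, port) for host, port in endpoints
            if is_reachable(host, port, timeout)]
```

A failover-aware DNS service would run a check like this on a schedule and hand out only the addresses that pass, so visitors are never pointed at the location that is down. Real services typically check from several vantage points and use an HTTP request rather than a bare TCP connect.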

For HAProxy, you will need at least one HAProxy server per physical location. This can be the same server that hosts the MySQL database, the dynamic and static files, Apache/Nginx, and BIND. The reason you need HAProxy is so that you can easily add servers in each physical location (as many as needed). Expanding horizontally is a very important feature in any cluster solution. GigeNET’s Cloud Servers enable you to scale both vertically (by growing the Cloud Server’s resources) and horizontally (by building new Cloud Servers from a template or from a backup).

http://www.howtoforge.com/setting-up-a-high-availability-load-balancer-with-haproxy-wackamole-spread-on-debian-etch
http://linuxadminzone.com/how-to-install-setup-and-config-haproxy-loadbalancer-for-content-switching/
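
To show why HAProxy makes horizontal growth easy, here is a small Python sketch that renders a minimal HAProxy configuration from a list of web servers. Adding a server means adding one entry to the list. Treat it as a sketch under assumptions: the names www and web_pool, the private IP addresses, and the timeout values are all made up for illustration, and a production configuration would need a global section, logging, and tuning.

```python
def render_haproxy_config(servers, bind_port=80):
    """Render a minimal HAProxy HTTP configuration for the given servers.

    servers is a list of (name, address) tuples, e.g. ("web1", "10.0.0.11:80").
    """
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 5s",
        "    timeout client 30s",
        "    timeout server 30s",
        "",
        "frontend www",
        f"    bind *:{bind_port}",
        "    default_backend web_pool",
        "",
        "backend web_pool",
        "    balance roundrobin",
    ]
    # "check" enables health checks so HAProxy skips servers that stop responding.
    lines += [f"    server {name} {address} check" for name, address in servers]
    return "\n".join(lines) + "\n"
```

Calling render_haproxy_config([("web1", "10.0.0.11:80"), ("web2", "10.0.0.12:80")]) yields a configuration that spreads requests across both servers in turn. When you build a third Cloud Server from your template, you append it to the list, regenerate the file, and reload HAProxy.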