July 24, 2008
XXX-Rated


By BIO-IT World



No, it's not what you might be thinking. The Xs stand for Xserve G5, Xserve RAID, and Xsan, which are server, storage, and software products from Apple Computer.

The BioTeam has had many opportunities to experiment with and deploy Apple's Xserve G5 (a 1U server with dual 2GHz G5 processors) and Xserve RAID (a 3U, 5.6TB Fibre Channel RAID storage device). However, while deploying a 125-node Xserve G5 cluster at the University of Pittsburgh Department of Human Genetics, we had our first opportunity to experiment with a pre-release version of Xsan, Apple's new cluster storage area network (SAN) file system. What we found may surprise and delight you.

Informatics clusters generally have a shared file system, providing all cluster nodes with a common place to read input from and write output to. However, when a cluster grows beyond even a modest eight to 16 nodes, and particularly when it runs I/O-bound applications, this shared file system quickly becomes the bottleneck, throttling overall cluster performance.

The challenge becomes providing all cluster nodes with fast, reliable, affordable, and concurrent read/write access to common data.

Although compute and network hardware for cluster building has been commoditized and become relatively inexpensive, large-scale shared storage systems have not. Physical disks and RAID devices have become dramatically cheaper, but sharing this storage among tens or hundreds of machines simultaneously requires a completely different class of storage device. The standard solution to this shared storage problem has been to purchase a six-figure file server, rivaling if not surpassing the total cost of the compute and network hardware that makes up the cluster.

Xsan is Apple's solution to this problem. Xsan lets you combine one or more Xserve RAIDs ($2 per GB) with one or more Xserve G5s (64 is the practical limit) into a shared file system that starts at as little as $9,000 and scales to a practical maximum storage capacity in the petabyte range.


Remedy for Sluggish I/O
The cluster at UPitt consists of 121 Xserve G5 cluster nodes and four specialized Xserve G5 "head nodes." Each head node is connected by Fibre Channel to a common Xserve RAID that appears to the node as a local disk. However, as with other SAN devices, each head node can access only its own dedicated portion of the RAID. The head nodes provide the many integrated, shared network services needed to transform this collection of discrete computers into a single virtual compute resource. One such service is the Network File System (NFS), which lets each head node make its local file system (including its portion of the RAID) available for read/write access across the cluster. Given a cluster of this size, however, the resulting NFS client/server ratio would be 120:1, producing unacceptably poor I/O performance. Alternative solutions might be:

  • Locate different NFS shares on separate NFS servers (however, the NFS client/server ratio would still be 120:1 for each share).

  • Replicate the data within NFS shares on each of the NFS servers (this works for read-only data but results in a concurrency problem for data that are modified over time).

  • Break out the big bucks and buy a beefy file server.


STEELY SAN: University of Pittsburgh's cluster consists of 121 Xserve G5 nodes and four G5 "head nodes" linked to RAID storage.

With Xsan, the options improve significantly. Xsan manages concurrent disk access by multiple machines connected to a common set of Fibre Channel storage devices. This means that all four head nodes "see" all of the disks on the Xserve RAID as local disks, rather than only their dedicated portions. Because Xsan manages concurrency at the file level rather than the volume level (as many SANs do), all four head nodes can export the RAID to the cluster over NFS simultaneously. Each head node can then serve its own share of the compute nodes, all of which see the same common file system; in this case, that reduces the NFS client/server ratio by a factor of four.
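
The effect on the client/server ratio is simple arithmetic. The Python sketch below is illustrative only and not part of the UPitt configuration: it assigns compute nodes to head nodes round-robin and reports the largest number of clients any one NFS server must handle. The host names (node001 through node120, head1 through head4) are hypothetical placeholders, and round-robin assignment is an assumption; any even split gives the same ratio.

```python
"""Sketch: spread NFS clients across several head nodes that all export
the same Xsan volume. Host names are hypothetical placeholders."""

from collections import defaultdict


def assign_clients(clients: list[str], servers: list[str]) -> dict[str, list[str]]:
    """Round-robin each compute node onto one NFS server (head node)."""
    if not servers:
        raise ValueError("need at least one NFS server")
    groups: dict[str, list[str]] = defaultdict(list)
    for i, client in enumerate(clients):
        groups[servers[i % len(servers)]].append(client)
    return dict(groups)


def worst_ratio(groups: dict[str, list[str]]) -> int:
    """Largest number of clients that any single NFS server has to serve."""
    return max(len(members) for members in groups.values())


compute_nodes = [f"node{n:03d}" for n in range(1, 121)]  # 120 NFS clients
head_nodes = ["head1", "head2", "head3", "head4"]        # hypothetical names

print(worst_ratio(assign_clients(compute_nodes, head_nodes[:1])))  # 120
print(worst_ratio(assign_clients(compute_nodes, head_nodes)))      # 30
```

With one head node every client lands on the same server (120:1); spreading the same 120 clients over four head nodes that all export the Xsan volume drops that to 30:1. Each compute node would then mount the shared volume from its assigned head node.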


Putting Xsan to the Test
In our experimental tests of this pre-release version of Xsan, we ran I/O benchmarks using "bonnie" (a common I/O benchmarking tool) on the cluster compute nodes against a common NFS share point. Executions of bonnie were launched simultaneously on all 120 cluster nodes using the multi-threaded version of dsh (distributed secure shell). This measured the total I/O throughput of one or more NFS servers with 120 simultaneous clients competing for read/write access to the same physical disk. (That is, we tested 120 machines reading/writing from one disk through one NFS server, 120 machines reading/writing from one disk through two NFS servers, and so on.)
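
The job dsh does here is to start the same benchmark on every node at once. A rough Python equivalent, assuming passwordless ssh to each node, is sketched below: one thread per host, plus a barrier so that no host starts until every thread is ready. This is not dsh itself; the function names and host names are hypothetical, the runner is pluggable so the sketch can be tried without a cluster, and the actual bonnie command line used in these tests is not given here.

```python
"""Sketch of a dsh-style parallel launch: start one command on many hosts
at (nearly) the same moment and collect each exit status. Assumes
passwordless ssh; host names are hypothetical placeholders."""

import subprocess
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Runner = Callable[[str, str], int]


def run_over_ssh(host: str, command: str) -> int:
    """Run `command` on `host` via ssh and return the remote exit status."""
    return subprocess.run(["ssh", host, command]).returncode


def launch_everywhere(hosts: list[str], command: str,
                      runner: Runner = run_over_ssh) -> dict[str, int]:
    """Start `command` on every host at once; map each host to its exit status."""
    if not hosts:
        return {}
    # Each worker blocks here until every host's thread has arrived, so the
    # command starts at (nearly) the same moment everywhere.
    start_line = threading.Barrier(len(hosts))

    def worker(host: str) -> int:
        start_line.wait()
        return runner(host, command)

    # One thread per host; a smaller pool could never fill the barrier.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        statuses = list(pool.map(worker, hosts))
    return dict(zip(hosts, statuses))


# Dry run without a cluster: a stand-in runner that always reports success.
nodes = [f"node{n:03d}" for n in range(1, 121)]  # hypothetical host names
statuses = launch_everywhere(nodes, "hostname", runner=lambda host, cmd: 0)
print(sum(s == 0 for s in statuses.values()), "of", len(nodes), "hosts succeeded")
```

In a real run, `runner` would stay as `run_over_ssh` and `command` would be the bonnie invocation pointed at the NFS mount; a nonzero status in the result flags a host whose benchmark failed.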

As a baseline, we measured the I/O performance of all 120 cluster nodes accessing data from a single NFS server without Xsan and defined that throughput as 1X. Performing the same benchmark tests with Xsan and the NFS server load distributed over two Xserve G5s, we observed NFS I/O performance a little better than 2X; over three servers, a little better than 3X; and over four servers, a little better than 4X. This is big news: for the price of one or more additional Xserve G5s (not much compared to a six-figure file server), you can distribute NFS server load over as many servers as you want (assuming performance continues to scale linearly).
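
To check whether the "a little better than 2X, 3X, 4X" pattern holds as more servers are added, it helps to express each measurement as a speedup over the single-server baseline and as an efficiency (speedup divided by server count); efficiency at or above 1.0 means scaling is still linear. The sketch below also estimates how many NFS servers would keep clients per server under a target, which is only meaningful under the linearity assumption stated above. The function names are hypothetical, and the example input uses the article's nominal 1X to 4X multiples, which are lower bounds rather than the actual measured values.

```python
"""Sketch: speedup and efficiency from throughput measurements, plus a simple
server-count estimate that assumes NFS throughput keeps scaling linearly."""


def scaling_report(throughput: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map server count -> (speedup over 1 server, efficiency = speedup / count).

    `throughput` maps NFS server count to aggregate I/O throughput in any
    consistent unit and must include the single-server baseline (key 1).
    """
    baseline = throughput[1]
    report = {}
    for servers, value in sorted(throughput.items()):
        speedup = value / baseline
        report[servers] = (speedup, speedup / servers)
    return report


def servers_needed(clients: int, max_clients_per_server: int) -> int:
    """Smallest server count that keeps every server at or under the target load."""
    # Integer ceiling division: 120 clients at <= 25 per server -> 5 servers.
    return -(-clients // max_clients_per_server)


# Nominal multiples from the article (measured values were "a little better").
nominal = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
for n, (speedup, efficiency) in scaling_report(nominal).items():
    print(f"{n} server(s): {speedup:.1f}x speedup, {efficiency:.2f} efficiency")

print(servers_needed(120, 30))  # 4
```

Tracking efficiency rather than raw throughput makes the point where linear scaling breaks down easy to spot: it shows up as efficiency dropping below 1.0 as servers are added.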

This is great, but you might be asking yourself, "Am I locked into using Apple's triple-X product offering to deploy this sort of technology on my cluster?" Although Apple might prefer it if you did, the answer is no. Apple has taken an aggressive open-technology stance within this architecture. The underlying Fibre Channel technology is standards-based: you can mix and match Fibre Channel cards and switches from Brocade, QLogic, Emulex, and others, and you can use Fibre Channel storage devices other than the Xserve RAID. Xsan's underlying cluster file system is compatible with ADIC's StorNext file system, so you can even mix in Linux, Solaris, Windows, and other clients using the corresponding client software from ADIC.

We're pleased with our first experience with this pre-release version of Apple's Xsan software and look forward to the finished product.

Bill Van Etten is a consultant for The BioTeam. E-mail: bill@bioteam.net.


PHOTO BY PATRICIA NAGLE/UNIVERSITY OF PITTSBURGH
