WorkHabit Blogs

WORKHABIT’S BUSINESS BLOG

How do you support 500,000 users with Drupal?

by Jonathan Lambert Published: June 12th, 2007

How do you find out what kind of hosting you need for your Drupal-based website?

Perhaps because we run so much Drupal-based infrastructure (we have more than one data center full of Drupal boxes), a question I get asked almost daily is: what kind of box do I buy for my Drupal-based website?

Usually, the person asking is just getting started and is really struggling with the question of how much money and time they should invest up front, and how much they should invest to handle their ongoing growth.

There are a number of rules of thumb, but this question has a lot of facets. I'll do my best to pass on in this post some of the same advice that I give to clients and friends every day.

I have 500,000 page views per month, but I'm expecting 10 million

The vast majority of customers make the mistake of building their infrastructure for what they think is possible rather than for their immediate needs. This is a mistake most of the time, but not always.

The rule of thumb for existing sites is that you want to build your infrastructure for six months to a year's worth of traffic growth based on existing trends, though those trends are generally unreliable due to the geometric nature of traffic growth once you're, quote unquote, plugged in. Generally you want to make a longer-term investment only when you have to, or when the complexity of implementation is very high, such as when implementing a SAN or NAS like NetApp or similar.
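To make the six-to-twelve-month rule of thumb concrete, here is a minimal sketch that compounds today's traffic forward. The 10% month-over-month growth rate is purely a hypothetical input, not a figure from our own sites; plug in whatever your existing trend suggests.

```python
def project_monthly_page_views(current: float, monthly_growth: float, months: int) -> float:
    """Compound the current monthly page views forward at a fixed growth rate."""
    return current * (1.0 + monthly_growth) ** months


if __name__ == "__main__":
    current = 500_000   # page views per month today
    growth = 0.10       # hypothetical 10% month-over-month growth
    for horizon in (6, 12):
        projected = project_monthly_page_views(current, growth, horizon)
        print(f"{horizon:>2} months out: about {projected:,.0f} page views per month")
```

Because growth compounds, the twelve-month figure is far more than double the six-month one, which is part of why trend-based estimates get unreliable so quickly.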

The importance of horizontal scalability

I cannot emphasize this enough: the specific box you buy or the virtual server that you lease from a hosting company is not an important purchasing decision. It isn't. What's important is that you have a strategy to scale that virtual server or box as your traffic increases. Many of our largest customers started on a 256 MB Xen virtual server, which is easily scaled by adding more memory as your traffic increases. So you can start with a very small commitment and avoid spending a lot of money while your traffic doesn't justify it.

In fact, we've had great success building infrastructure with virtual servers and load balancers, especially with Xen and VMware.

It's entirely possible to build a robust, fairly scalable hosting infrastructure based entirely on these commodity tools.

Most people are surprised to learn that Xen virtual servers are capable of running in multiple tiers, so you can have a Web server, a DB server, and a load balancer running something like haproxy or pen. In fact, not only can you run these tools inside Xen environments, but you can run them in a high-availability configuration by putting redundant components across two physical machines. So, with minimal commitment, you're able to get started with a high-availability architecture with dedicated components that is ideal for running Drupal, or in fact most web applications.

Strategies for scalability are as different as the tools that you put in the virtual servers. If you're scaling a Web server, it's fairly trivial to add more of them. With Drupal, the most important thing you can do is make sure that the files directory is a shared mount point, or can be rsynced between multiple systems to keep the data synchronized.
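For the rsync route, a rough sketch might look like the following. It assumes uploads land on one primary Web server and are pushed one way to the others; the host names and the /var/www/drupal/files path are placeholders, not a real layout. The script only prints the commands so you can review them first.

```python
import shlex
from typing import List


def build_rsync_command(source_dir: str, remote_host: str, remote_dir: str) -> List[str]:
    """Build a one-way rsync push of the Drupal files directory to another web server."""
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    source = source_dir.rstrip("/") + "/"
    destination = f"{remote_host}:{remote_dir.rstrip('/')}/"
    # -a preserves permissions and timestamps, -z compresses in transit,
    # --delete removes files on the destination that no longer exist on the source.
    return ["rsync", "-az", "--delete", source, destination]


if __name__ == "__main__":
    files_dir = "/var/www/drupal/files"                   # placeholder path
    for host in ("web2.example.com", "web3.example.com"):  # placeholder hosts
        print(shlex.join(build_rsync_command(files_dir, host, files_dir)))
```

To actually run a sync, pass the returned list to `subprocess.run(cmd, check=True)` from cron or a similar scheduler. A shared mount point avoids the delay between runs, which is why it is usually the better option when it's available.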

For load balancers, it's really simple. All you have to do is let the load balancers fail over to one another, just like you would with physical equipment.

For MySQL servers, the best setup for scaling is a master/slave setup, and in fact you can run things like memcached inside your virtual server setup as well. Your only limitation is your ability to administer MySQL.
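The reason a master/slave setup scales is that reads can be spread across the slaves while writes all go to the master. The sketch below illustrates only that routing idea. It is not Drupal's database API, the server names are made up, and a real setup also has to cope with replication lag.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send SELECTs to the slaves in round-robin order and everything else to the master."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # cycle() hands out the slaves in turn; with no slaves, all traffic goes to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def server_for(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])  # hypothetical names
    for query in ("SELECT * FROM node", "SELECT * FROM users", "UPDATE node SET status = 1"):
        print(f"{router.server_for(query):<12} <- {query}")
```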

We have a cluster running a physical database server with Xen-based virtual servers for the Web tier and load balancers. It runs more than 1,700 Drupal domains, with three front-end Web servers using 2 GB of RAM each on three physical machines. This cluster handles a great deal of traffic, probably more than your average site.

So as you can see, it's more important to have a strategy for scalability than it is to make a particular technology choice. The most important thing you can do is pick a provider that can scale with you as your site grows, or make a decision early on to go with a provider that is inexpensive, with the expectation that you'll have to move the site at a later time. Don't worry, moving a Drupal-based website usually isn't that difficult.

So, you didn't answer my question. How much hardware am I going to need to hit 500,000 users?

You're right. And I can't actually answer that question with technical accuracy. All I can hope to do is provide you with an estimate based on how I think your site will perform, in terms of the number of requests per second and how much memory each part of the system is going to require. But the only way to know that is to load test your application, so that we can find out how it actually behaves under particular circumstances. This requires a great deal of technical know-how, but it's commonly done among more senior engineers in order to determine how much hardware is going to be required and whether further performance optimization is needed before launching a particular release.
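To give a feel for what a load test measures, here is a bare-bones sketch using only the Python standard library. It fires a fixed number of requests at a fixed concurrency and reports throughput and mean latency. The function names are mine, and in practice a purpose-built tool such as ApacheBench or JMeter is the usual choice; this only shows the idea.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fetch_url(url: str) -> None:
    """Fetch one URL and read the whole body; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()


def run_load_test(fetch: Callable[[], None], total_requests: int, concurrency: int) -> Dict[str, float]:
    """Call fetch() total_requests times across `concurrency` threads and summarize the run."""

    def timed_call(_: int):
        started = time.perf_counter()
        try:
            fetch()
            succeeded = True
        except Exception:
            succeeded = False
        return succeeded, time.perf_counter() - started

    wall_started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    wall_seconds = time.perf_counter() - wall_started

    latencies = [seconds for succeeded, seconds in results if succeeded]
    completed = len(latencies)
    return {
        "completed": completed,
        "failed": total_requests - completed,
        "requests_per_second": completed / wall_seconds if wall_seconds > 0 else 0.0,
        "mean_latency_seconds": sum(latencies) / completed if completed else 0.0,
    }
```

A run against a staging copy of the site (the URL here is a placeholder) would look like `run_load_test(lambda: fetch_url("http://staging.example.com/"), total_requests=500, concurrency=10)`. Note that this fetches only the page itself, not the CSS and images a browser would also request.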

There are two general approaches to this kind of load testing.

The first is to do load testing against a release, which I like to call release load testing. I know it's entirely too logical. This philosophy is about risk mitigation, trying to find a way to make sure that a new software release doesn't break your production environment.

What you do is pick a time before release, generally a week or two before you actually release a product, and test the application on your production hardware. This lets you find out how well the application can perform, and is probably one of the most accurate ways you can test an application. One drawback is that you generally don't have the capacity to do this after launch. Another is that you usually don't have any production data to work with, and it's always a good idea to test your application with logs of what actual users do on the site (with a larger data set being preferable).

The second type of load testing is what I like to call continuous or continuity load testing. This type of load testing is about benchmarking your application to be able to watch it over time. You can use continuous or continuity load testing to accurately measure for changes in your code and application performance over time.

The idea behind continuous or continuity load testing is to provide a baseline measurement of the application for every step of the development process. So you would be testing the application continually, perhaps once every two weeks or a month depending on your application development cycle and whether you were able to support a complete test based on your development methodology and the state of your code.

Being able to determine an accurate baseline is very important, because once you know how the application is going to perform you can see whether performance degrades or improves. If a particular test phase has a particularly bad performance number, you know that the change was introduced in the period between your last load test and the load test you are doing now. That vastly reduces the possibility that major performance bottlenecks will be introduced without your knowledge during the development cycle. Many times a code merge can cause an application to become many times less scalable than it originally was, and this is a way of watching out for that. If you baseline your application with release load testing, and then continually watch it after release with continuous or continuity load testing, you will be able, with a little common sense and a little luck, to determine whether or not your production infrastructure is in danger of failure or overloading.
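The comparison between consecutive load tests can be sketched in a few lines. The build labels, throughput numbers, and the 10% tolerance below are all made up for illustration; the point is only that a drop shows up between two specific runs, which narrows down where to look.

```python
from typing import List, Tuple


def find_regressions(
    runs: List[Tuple[str, float]], tolerance: float = 0.10
) -> List[Tuple[str, str, float]]:
    """Compare each load test with the one before it and report drops beyond the tolerance.

    `runs` is a chronological list of (label, requests_per_second) pairs.
    Returns (previous_label, label, fractional_change) for each flagged interval.
    """
    regressions = []
    for (previous_label, previous_rps), (label, rps) in zip(runs, runs[1:]):
        change = (rps - previous_rps) / previous_rps
        if change < -tolerance:
            regressions.append((previous_label, label, change))
    return regressions


if __name__ == "__main__":
    # Hypothetical results from three load tests, two weeks apart.
    history = [("build-1", 40.0), ("build-2", 39.0), ("build-3", 20.0)]
    for previous_label, label, change in find_regressions(history):
        print(f"Throughput fell {abs(change):.0%} between {previous_label} and {label}")
```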

You still didn't answer my question, how do I add support for 500,000 users?

You're right again. Once you determine your baseline with load testing it's possible to actually calculate the number of users that your infrastructure will support.

The formula for determining a baseline of 500,000 users, treating them as 500,000 page views spread evenly over a single day, is the following:

500,000 / 24 hours / 60 minutes in an hour / 60 seconds in a minute

This leaves us with the magic number:

5.79
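The same arithmetic as a short sketch, in case you want to plug in your own numbers. It assumes, as the formula does, that the page views arrive evenly over one day.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_page_views_per_second(page_views_per_day: float) -> float:
    """Average rate if the day's page views were spread perfectly evenly."""
    return page_views_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_page_views_per_second(500_000)
    print(round(rate, 2))   # 5.79
    print(math.ceil(rate))  # 6, since you can't serve a fraction of a page view
```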

This means that your application will need to serve at least six page views per second. That means completing six entire page requests, not just serving six elements such as six images. As an example, at six page views per second, if you had 50 assets per page (CSS, images, etc.), you would actually be required to complete 300 individual Web server requests per second.

So, now I've introduced you to the problem with your question. The number of requests required to serve your site is entirely dependent on the number of files that need to be served.

So, on this basis every site is different, because every site has a different number of included files unless it is completely baseline Drupal with no modules. Almost nobody runs that.

So the result of this formula is specific to the individual profile of each website, and unless you do the load testing you won't know what it is for yours.

Also, it's important to bear in mind that 500,000 page requests don't actually come in at an even six requests per second. Each site has a completely different traffic profile, so if you're looking at a site that has 500,000 page requests, it's likely that 300,000 or 400,000 of those come in during just a few hours of the day.

Well, actually, that entirely depends on the website. Some sites have activity all day long, while others only have activity while their users are home from school or work or otherwise able to be on the website. Again, it's up to the individual website to determine what the individual needs are.

There is no way to provide a blanket diagnosis for how to do it, because every traffic profile is different, based on the users that actually access the site, the timeframe in which they do it, and how many of them are logged on simultaneously. Oh, and one more thing: video and audio, such as streaming Flash or podcasting, can drastically reduce or increase the ability of your site to function under load, based on how they're implemented.

So it's likely your site will actually peak at some 30 page views per second, which at 50 assets per page would mean supporting roughly 1,500 individual Web server requests per second rather than the figure the daily average suggests. That is a consideration that could require a substantial hardware upgrade or more sophisticated planning.
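Here is a rough sketch of that peak calculation. It assumes 400,000 of the 500,000 page views land in a four-hour window (the four hours is my own stand-in for "a few hours") and keeps the 50-assets-per-page example from earlier.

```python
def peak_page_views_per_second(peak_page_views: float, peak_hours: float) -> float:
    """Average rate during the busy window, which is the rate you have to plan for."""
    return peak_page_views / (peak_hours * 3600)


def web_server_requests_per_second(page_views_per_second: float, assets_per_page: int) -> float:
    """Each page view fans out into one Web server request per asset on the page."""
    return page_views_per_second * assets_per_page


if __name__ == "__main__":
    peak = peak_page_views_per_second(400_000, peak_hours=4)  # assumed busy window
    requests = web_server_requests_per_second(peak, assets_per_page=50)
    print(f"Peak: about {peak:.1f} page views/s, or about {requests:.0f} Web server requests/s")
```

Under those assumptions the peak comes out just under 28 page views per second, in the same ballpark as the 30 used above and nearly five times the daily average.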

If there's interest, I would be more than happy to explain some of the techniques for doing load testing on an applied basis, and for determining the overall characteristics and requirements of your Drupal-based application. It's not an insignificant task and usually requires that you have some experience with Linux. But baseline load testing can be done with something like VMware and a laptop, though it's difficult to translate those numbers onto production infrastructure.

So what advice can you actually give me?

So much of Drupal scalability has nothing to do with the hardware you put it on, but with the optimization of the components that support the Drupal installation. MySQL is particularly difficult, and is often a source of trouble for Drupal installations.

If your company requires you to make an immediate purchasing decision, I would suggest you start off with as much horsepower as you can afford, built around a horizontally scalable architecture.

But for the vast majority of users, I would suggest you start off with a single virtual server. Do something based on Xen, or if you have to, start out on a cheap shared host, but make sure you move before you get significant traffic (or you'll end up like the unfortunate folks who launched Skirt on bluehost - it died - great for PR, crappy for launching a site).

So many Drupal projects come to us after they have suffered some kind of dramatic failure or have been shut down by their hosting company due to excess resource usage, database problems, or something similar.

So as you can see designing an infrastructure to support 500,000 users requires careful planning as well as, you guessed it, load testing.

Plan your work, work your plan

For more complicated releases, setting a standard for how much traffic you need to support as a baseline is a really good idea. This helps the developers to focus on a goal, which can really help them understand how much time and energy they should focus on scalability (it's easy to get lost).

This brings up the idea of a test plan. A test plan is essentially a set of requirements that lays out an approach. Most, at a minimum:

  • Describe the traffic they would like to achieve, for example: 500,000 users
  • Explain what the customer experience should be like (not "it would be fast", but "it would work in all browsers, and be quick to work with like a desktop application even when under high latency" - more of a design goal test)
  • Explain who's involved, and who can approve specific components of the system (essentially, who signs off - this is really important on big projects)
  • Explain HOW to test, so you can coordinate testing
  • Cover the backend scalability, and often other testing at the same time (as in a QA round). Many times you're making changes to the code, content, CSS, etc. to support the site, not just its scalability. And you should, because it matters. That should be laid out in the test plan so people know what they're authorized to do.

Generally, test plans are only necessary for larger projects or projects with distributed teams (aka, most software projects), but you get to choose when to use them.

Awesome, what's next?

In future articles we'll be publishing the results of some of our ongoing load testing. We've been doing work with memcached, the new advanced Drupal cache, and code acceleration tools like APC, eAccelerator, and of course our long-time favorite, Zend, and I've been sitting on the results of all our tests for a very long time.

We are going to go back and revisit some of those tests and publish the results, as well as how to reproduce them, over the course of the next couple of months. We are of course extremely interested in what people would like to see, so if you have specific requests for things you'd like to see tested, please post them in the comments below.

Bursts

The formula to compute your “magic number” assumes that the requests have a uniform distribution. This is not likely to be an accurate assumption. I know you know, but I figured I’d point it out nonetheless.

Yes, you're absolutely right

Yea Dries, you are of course correct. It’s “fake numbers” time, to try to drive home the example.

The truth is, there is no substitute for experience on this stuff - the more you know about your existing traffic patterns and your audience, the better off you’ll be scaling your gear.

Jonathan

Question

I know this is totally out of place.. but how do I make the comment fields look like yours? With the name.. email.. homepage and blank subject.. if you get a chance to answer I really appreciate it.. thanks

Just one minor correction

Hi there and thanks for sharing your experience.

Note that at: “As an example at six requests per second if you had 50 assets per page……be required to complete 600 individual Web server requests per second.” instead of 600 i think it should be 300 there. Cheers.

Hmm

You could probably accomplish a lot of this with 1 dedicated web server for running scripts, 1 for hosting content such as images…etc…1-2 database servers, and some memcached daemons, that would probably work pretty well.

Number of modules called makes a huge difference

I setup www.amazinggroups.com a few months ago. It gets a little traffic here and there, but not like what I am aiming for.

The site is incredibly slow. Sometimes the front page will take over 20 seconds to load.

MySQL for the site is on hardware (soon to be moved to a separate machine with dual Opterons).

Drupal is running in a VMware host under Ubuntu, with all the bells and whistles for optimization that I could find. I tweaked the VM and Apache. Set up persistent connections to MySQL.

I even put a profence caching load balancer in front… Still slow as snot…

Until I can get page loads in the 5-6 second range (or less), I cannot realistically engage in a promotion campaign.

I am still working on this issue. It appears that some of the slow load is external site pulls, would be nice to have the page load and then load the external sites pulls after the main page loads.

Moving external resources..

Netzarim,

If the external loads are causing issues, is it possible to move those to the bottom of the page? I’m assuming it’s more than just javascript that’s being pulled from remote. Is this correct?

There are a couple options you have in this case.. If you’ve got images that are being pulled remotely, make sure they at least have widths/heights defined; this will allow the page to render more quickly (definitely an old-school technique, but I’m noting it in here for posterity).

Have you noticed any issues with any of the tiers? Is there a bottleneck in apache or mysql that you’re noticing?
