Standing in line for the return ferry tonight, along with hundreds of others who had been pushed onto the ferry by the BART transbay outage, I had a chance to ask one of the SF Bay Ferry employees walking the line why they hadn't separated the Oakland & Alameda traffic across the extra boats they had added. That was something they had done when the free Spare the Air day rides increased traffic. The answer was that they hadn't had time (or it simply hadn't occurred to anybody to do it).
It reminded me, though, of the work we did on our web services in the early days of scaling up to deal with rapid load increases. Sure, the first thing you do is add extra capacity: for the web service, more servers; for the ferry, more boats. That helps, but you may still find that some requests hog more of the resources than others, delaying otherwise quick requests.
Waiting in Alameda felt a bit like that yesterday morning as all the extra boats raced to Oakland & filled up, either arriving at Alameda with very little space for more people, or bypassing us completely because they were already full.
In our web world we faced a similar situation, with more complex requests holding up lightweight requests that needed a fast response. The solution was simple: split the servers into two pools & manage the capacity for each pool separately. We actually ended up with more than two pools, but this simple approach has let us manage resources much more efficiently and also implement automatic scaling for each pool independently. Sharding is often used for scaling databases, but it can be just as effective for managing traffic.
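To make the idea concrete, here is a minimal sketch of pool-based routing. The endpoint names, pool names, and server names are all hypothetical, and a real setup would do this at the load balancer rather than in application code; the point is only that heavy requests get their own capacity so they can't starve the fast ones.

```python
# Hypothetical example: route slow, heavy requests to a dedicated pool so
# they can't hold up fast, lightweight requests. All names are illustrative.

HEAVY_ENDPOINTS = {"/report", "/export"}  # assumed slow, resource-hungry paths

# Each pool's capacity is sized (and auto-scaled) independently.
pools = {
    "heavy": ["heavy-1", "heavy-2"],
    "light": ["light-1", "light-2", "light-3"],
}

_counters = {"heavy": 0, "light": 0}

def route(path: str) -> str:
    """Pick a backend server: heavy endpoints go to the heavy pool,
    everything else to the light pool, round-robin within each pool."""
    pool = "heavy" if path in HEAVY_ENDPOINTS else "light"
    servers = pools[pool]
    server = servers[_counters[pool] % len(servers)]
    _counters[pool] += 1
    return server
```

With this in place, a burst of `/report` traffic saturates only the heavy pool, and you can grow each pool on its own schedule.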
The SF Bay Ferry service experienced a big spike in traffic today, and while they responded by adding capacity quickly (five boats this morning instead of one), they could have used that additional capacity more wisely and kept the lines at both Oakland and Alameda flowing more evenly.