I spent a little too much time in the 90’s studying queuing theory. Why, you ask? After all, how exciting is it to be able to mathematically model the dynamics of…people lining up?
Queuing theory is more and less than that. It models the dynamics of anything that is waiting in line for just about any purpose. However the typical queuing models are usually either too simple to be accurate or too complex to be tractable.
For many years, queuing theory was used to model telecommunications system capacity. This worked reasonably well until the amount of data traffic in these systems began to grow. Packet data exhibits dynamics far different from circuit-switched telephony. After some noble, but ultimately futile attempts to “save” queuing theory, most of the academic community reluctantly admitted that other modeling techniques might prove more accurate.
Over the years I’ve found three basic queuing principles to be the most applicable and useful in practical settings. Below are these principles and (hopefully) intuitive explanations of why they are true. Of course, if you don’t believe me then you can look up proofs in various textbooks.
Little’s Result: If the arrival rate of customers into a queue is lambda (e.g., 5 customers per second), N denotes the average number of of customers in the queue, and T denotes the average amount of time that any given customer waits in the queue. The following relationship always holds:
N = lambda T
The reason for this can be explained with an example. Suppose a customer, Bob, arrives in the queue. On average there will be N customers ahead of Bob, and Bob will wait T seconds in the queue. During these T seconds, on average lambda T customers will arrive in the queue behind Bob. Thus when Bob is finally served, there is once again N customers in the queue.
Another means for understanding Little’s Result is to consider that if you double the arrival rate to the queue per unit time (lambda) or the waiting time in the queue (T) then you will also double the number of customers in the queue.
The remarkable thing about Little’s Result is that it universally applies to any queue, regardless of how customers arrive or how they are serviced. Thus, it applies equally to a steady stream of customers arriving to a queue that is served in a first-in-first-out basis, and to a highly-variable stream of customer arrivals to a priority queue.
The more variance in the service, the longer the average waiting time: Another relatively simple concept that is fairly intuitive. The more variability in the time it takes to serve customers, the more likely you will run into a pathological situation where a bunch of customers arrive behind a given customer who requires an abnormally long period of time to be served. You might feel like you run into this situation every time you get in line. In fact, the average waiting time in the queue grows linearly with the variance of the service time. Thus, it behooves us all to reduce the variability of service time. If that can’t happen then the next section describes one way of mitigating the damage.
When you have multiple servers, always maintain a single queue: Suppose that, at the airport, there are two agents (servers) checking in customers. The airline has two reasonable options for arranging queues. The first approach is to allow a dedicated queue for each agent. The second approach is to have a single, common queue in which all customers wait, and the agent calls the customer at the head of the queue when the agent finishes serving a previous customer.
The surprising result is that, all other things the same, the average waiting time for the two-queue system is twice that of the single queue system. Thus, simply arranging customers to line up in a single queue is twice as efficient as using two queues. As the number of servers increases, so does this efficiency. A single queue with four servers is four times as efficient as the same system with four queues, a single queue with eight servers is eight times as efficient as the same system with eight queues, and so on.
Luckily, most airports already use this practice, as do some motor vehicle departments and movie theaters. It is a simple way of getting a large number of people through the servers as quickly as possible. Why does it work? Again, it is all about avoiding the situation where a customer who requires a long service time ties up everyone behind him. When there are multiple servers and a single queue, the system approximates a fair first-come-first-served discipline.
Bonus: Is it queuing or queueing? Now that you know all you need to about queuing theory (and pretty much all I know), here’s a question I don’t have an answer for… The modern usage of the word seems to be “queuing,” but traditional texts tend to use “queueing.” I believe the latter is the British spelling and the former is the Americanized version. Anyone know for sure?