Bin Packing Problems: The SQL

By: Joe Celko
To read more DBA articles, visit http://dba.fyicenter.com/article/

The bin packing problem is an NP-complete problem. It is a great way to make computer science students do some work and it is also useful in the real world. The basic problem statement is that you are given a set of (n) items. The items all have different volumes. We then have a supply of bins or boxes of the same size. The goal is to put the items into the bins and to minimizes the number of bins used.

As a real world application for this algorithm, look at television advertisements of arbitrary durations. Given time slots of, say, five minutes how do we pack the the most commercials into each time slot and maximize our daily profits. Think of free public service announcements or promo spots as unfilled bins.

Another example is scheduling tasks with known execution times on a set of identical machines. Minimize the number of machines needed for the whole project. This is actually the simplest form of a more complex problem. The task usually have an order of execution. The wire has to be on the wheel before the lug nuts, etc. But is another article for later.

To make the math easier, we assume that the items are sized in integer values from 1 to (k) where (k <= bin size). And to keep thing simple, the bins are size (v). I happen to like (v =12) because 12 has a lot of divisors and because I can use egg cartons and colored plastic eggs when I teach this problem.

It is obvious that we can not have an item of size zero. In spite of what people believe about suitcases, you cannot cram an oversized load into a bin. It is also obvious that if an item is the size of the bin then you have filled that bin and need to get to the next one. We know that the number of bins is between ceiling(sum(item_size)/ bin size) and (k), the number of items.

If the items are all of size one, then we can simply fill bins and use that lower limit of ceiling(sum(item_size) / bin_size). There are sophisticated algorithms for particular item sets, too.

But in the general case, we have to fall back on heuristics. Frankly, the good ones are done with procedural code and not SQL. But there are some quick heuristics that work very well in the real world.

The First Fit Algorithm provides ie fast, but often gives a non-optimal solution. It is just what you chink and the way that people pack a suitcase. This assumes that the items are in a queue and you have enough bins. Grab an item and put it into the first bin in which it will fit.

The First Fit Algorithm requires ?(n log n) execution time, where (n) is the number of items to be packed. This is the same as a non-stable sorting algorithm. You can boost this algorithm by first sorting the list of items in decreasing order (First-Fit Decreasing Algorithm).

This does not guarantee an optimal solution, and for longer queues, it may increase the running time because of the extra sorting. It is known, however, that there always exists at least one ordering of the items queue that produces an optimal solution. You just cannot find it easily. In 1973, Jeffrey Ullman (a very important name in computer science) proved that this algorithm can differ from an optimal packing by as much at 70%.That same year, S. S. Johnson showed that this strategy is never suboptimal by more than 22%, and furthermore that no efficient bin-packing algorithm can be guaranteed to do better than 22%.

Another version is if this algorithm is the First-Fit Increasing Algorithm, which is the same queue sorted from smallest to largest item size. It has all the same problems and limits.

The Best Fit Algorithm needs some smarts. We you get the next item in the queue, you look over the partially filled bins for one which will be completely full if the item is added to it. This sounds like it ought to be optimal. Sorry, but it can fail, too. As an exercise, try writing a query to pick subsets of (n) items that total to the bin target bin size. If you cannot fill a bin, insert itnto the gin that will have the smallest space left; the best fit.

The (n = 1) case is trivial and it is one configuration that you have to have in the final solution. The cases for (n > 1) have to be done with self-joins.

The Worst Fit Algorithm has an unfortunate name. We look at the available partially filled bins and put the item into the bin with the most empty space.

There is a website with animations of of various algorithms from the University of Arizona's computer science department. It is a demonstration of the ICON programming language. It is as high level language that started off as a string manipulation language to replace SNOBOL and then it evolved.