Combinatorial optimization has proved time and again, to be an extremely useful method for deriving better decisions, whenever scarce and constrained resources have to be allocated. Typical areas of industrial use-cases include transportation, logistics, healthcare, finance, technology, defense, and more. A common problem is combinatorial explosion at growing problem-sizes, which limits the scalability of algorithms. Given the usefulness of the combinatorial optimization approach, the need for large-scale solvers becomes evident.
This page shows our design and implementation of our a flow-based solver for large-scale and real-world combinatorial optimization.
Our main design goals and decisions have been:
Our algorithm solves a combinatorial optimization problem called SET CONSTRAINED FLOW directly, by employing the column generation method in a modular software architecture. In this section, we summarize our approach.
2.1 Problem Formulation
The problem SET CONSTRAINED FLOW is defined as follows: We are given a directed graph \( G = (V, E) \) on n vertices and m edges, and a family \( S = \{S_1, \ldots, S_k \} \) of k set constraints, where the element \( S_i = (E_i, \ell_i, u_i) \) defines some set \( E_i \subseteq E \) and two numbers \( \ell_i \leq u_i \). Each vertex \( v \in V \) is given by a pair \( v = (i_v, b_v) \), where \( i_v \in \mathbb{N} \) is a unique identifier, and \( b_v \in \mathbb{R} \), the balance of v. A vertex v is called a source if \( b_v > 0 \) and a sink if \( b_v < 0 \). Each edge \( e \in E \) is given by a tuple \( e = (i_e, s_e, t_e, c_e, w_e, d_e) \), where \( i_e \in \mathbb{N} \) is a unique identifier, \( s_e \in V \) is the source, \( t_e \in V \) is the target, \( c_e \in \mathbb{R} \) is the cost, \( w_e \) is the weight, and finally \( d_e \in \{\mathbb{B}, \mathbb{Z}, \mathbb{R} \} \) is the domain of e. For each edge e, there is a variable \( x_e \in d_e \), i.e., in the domain given by the edge. Define the vector \( \mathbf{x} = (x_e : e \in E) \) and its cost by \( \textnormal{cost}(\mathbf{x}) = \sum_{e \in E} c_e x_e \). The optimization problem is:
Here, (1) is the objective function, (2) are the balance constraints, (3) are the upper constraints, and (4) are the lower constraints. \( E^-_v \) and \( E^+_v \) denote the incoming and outgoing edges of vertex v. SET CONSTRAINED FLOW is said to be in canonical form, if there is a singleton constraint \( S_i \in S \) with \( E_i = \{e\} \), \( l_i = 0 \), and \( w_e > 0 \) for each \( e \in E \). In that case, we identify i and e, and the edge e has a capacity \( k_e = u_e / w_e \). Observe that MINIMUM COST FLOW and hence also SHORTEST PATH are direct special cases of SET CONSTRAINED FLOW in canonical form.
2.2 Modular Software Architecture
The initial module is the instance generator, which is a framework that help to transform the input of some given use-case into an instance of SET CONSTRAINED FLOW. The user simply has to define the inter-dependencies between the objects of the use-case. The actual graph generation is taken care of by the module, by making use of parallelization to speed up the process. Finally, the user needs to specify the set constraints to complete the definition of the instance. Given that, the instance classifier detects properties that will be exploited in the subsequent modules and hence affects the overall configuration of the optimizer.
The main module of the solver is the coordinator: It instructs the behaviour of the remaining modules, observes if a terminating solution has been found or if time-limits have been exceeded. Following the column generation approach, the constraints of the instance are split into local- and global constraints, respectively. The master solver hosts the global constraints and always solves some given linear program. The corresponding dual solution yields reduced cost coefficients on the edges, which are reported to the pricing solver by the coordinator. The pricing solver solves the subproblem, which is given by the graph, the local singleton constraints, and the vertex-balances. In many cases, these problems are simply SHORTEST PATH problems of some variety (and determined by the instance classifier). The pricing solver is responsible for most of the speed and scalability of our approach: SHORTEST PATH can be solved in low order polynomial time and a large number of paths can be generated in parallel. The paths are reported to the column pool, deciding which ones to keep and which ones to discard. The remaining columns are finally added to the linear program to be solved by the master solver. Finally, the fixing solver decides which variables or constraints in the linear program are rounded, frozen, or released.
The purpose of modeling is to generate an input that represents an instance of the SET CONSTRAINED FLOW problem. We have defined the following JSON structure:
{
"instance": {
"name": < string: name of the instance >,
"description": < string: description of the instance >,
"defaults": {
"edge": {
"source": < integer: default edge-source id >,
"target": < integer: default edge-source id >,
"weight": < double: default edge-weight >,
"capacity": < double: default edge-capacity >,
"cost": < double: default edge-cost >,
"integral": < boolean: default edge-integrality >
},
"vertex": {
"balance": < double: default vertex-balance >
}
}
},
"graph": {
"vertices": [
{
"vertexId": < integer: vertex id >,
"balance": < double: vertex-balance >
},
...
{
"vertexId": < integer: vertex id >,
"balance": < double: vertex-balance >
}
],
"edges": [
{
"edgeId": < integer: edge id>,
"source": < integer: edge-source id >,
"target": < integer: edge-target id >,
"weight": < double: edge-weight >,
"capacity": < double: edge-capacity >,
"cost": < double: edge-cost >,
"integral": < boolean: edge-integrality >
},
...
{
"edgeId": < integer: edge id>,
"source": < integer: edge-source id >,
"target": < integer: edge-target id >,
"weight": < double: edge-weight >,
"capacity": < double: edge-capacity >,
"cost": < double: edge-cost >,
"integral": < boolean: edge-integrality >
}
]
},
"constraints": [
{
"constraintId": < integer: constraint id>,
"edgeIds": < list: edge ids of constraint >,
"lower": < double: constraint lower bound >,
"upper": < double: constraint upper bound >
},
...
{
"constraintId": < integer: constraint id>,
"edgeIds": < list: edge ids of constraint >,
"lower": < double: constraint lower bound >,
"upper": < double: constraint upper bound >
}
]
}
Consider an instance for fractional KNAPSACK, with knapsack capacity 4 and five items with (profit, weight) pairs given by (1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (4.0, 2.0), (5.0, 4.0). The instance definition translates into the following JSON input:
{
"instance": {
"name": "Knapsack 1",
"description": "A small fractional KNAPSACK instance",
"defaults": {
"edge": {
"source": 0,
"target": 1,
"weight": 1.0,
"capacity": 1.0,
"integral": false
},
"vertex": {
"balance": 0.0
}
}
},
"graph": {
"vertices": [
{
"vertexId": 0,
"balance": 5.0
},
{
"vertexId": 1,
"balance": -5.0
}
],
"edges": [
{
"edgeId": 0,
"cost": -1.0
},
{
"edgeId": 1,
"cost": -2.0
},
{
"edgeId": 2,
"weight": 2.0,
"cost": -3.0
},
{
"edgeId": 3,
"weight": 2.0,
"cost": -4.0
},
{
"edgeId": 4,
"weight": 4.0,
"cost": -5.0
},
{
"edgeId": 5,
"capacity": 5.0,
"cost": 0.0
}
]
},
"constraints": [
{
"constraintId": 0,
"edgeIds": [ 0, 1, 2, 3, 4 ],
"lower": 0.0,
"upper": 4.0
}
]
}