This problem involves finding two min-max criteria filters to generate the highest score sum in a dataset.
I have a dataset, with 3 columns. x, y, score, with over 1 million of rows.
x | y | score |
---|---|---|
3.6 | 1.2 | -5 |
4.2 | 1.2 | -4 |
1.2 | 30.2 | 1 |
2.9 | 6.8 | 6 |
3.1 | 5.8 | 7 |
0.1 | 15.8 | 7 |
The data may or may not have a correlation.
I want to find a criteria filter of min/max on x and y that gives me the highest possible sum of scores.
This is how the query would look like in SQL.
SELECT SUM(score) FROM mytable WHERE x > xmin AND x < xmax AND y > ymin AND y < ymax
What I am looking for is the optimal values for xmin, xmax, ymin och ymax
What kind of optimization method is needed to solve this problem? And exactly how would the implementation look like?
(Preferrably it would be implemented using Java or postgres sql.)
Advertisement
Answer
Not so easy. Here is my MIP model:
minimize sum(i, selected(i)*score(i)) x(i) <= xmin + M*(xBetween(i)+xAbove(i)) x(i) >= xmax - M*(xBetween(i)+xBelow(i)) x(i) >= xmin - M*xBelow(i) x(i) <= xmax + M*xAbove(i) y(i) <= ymin + M*(yBetween(i)+yAbove(i)) y(i) >= ymax - M*(yBetween(i)+yBelow(i)) y(i) >= ymin - M*yBelow(i) y(i) <= ymax + M*yAbove(i) xBetween(i)+xAbove(i)+xBelow(i)=1 yBetween(i)+yAbove(i)+yBelow(i)=1 selected(i) <= xBetween(i) selected(i) <= yBetween(i) selected(i) >= xBetween(i)+yBetween(i)-1 xmin <= xmax ymin <= ymax xBetween(i),xAbove(i),xBelow(i) ∈ {0,1} yBetween(i),yAbove(i),yBelow(i) ∈ {0,1} selected(i) ∈ {0,1}
The constants M are large enough numbers (I used the range of x or y data).
With some random numbers, I got:
---- 37 VARIABLE z.L = 30.940 objective ---- 44 PARAMETER data data + results x y score x.betw y.betw selected i1 1.717 6.611 -8.972 i2 8.433 7.558 -9.880 i3 5.504 6.274 -1.975 i4 3.011 2.839 0.398 1.000 1.000 1.000 i5 2.922 0.864 2.578 1.000 1.000 1.000 i6 2.241 1.025 -5.485 1.000 i7 3.498 6.413 -2.078 1.000 i8 8.563 5.453 -4.480 1.000 i9 0.671 0.315 -6.953 i10 5.002 7.924 8.726 i11 9.981 0.728 -1.547 1.000 i12 5.787 1.757 -7.307 1.000 i13 9.911 5.256 -2.279 1.000 i14 7.623 7.502 -2.507 i15 1.307 1.781 -4.630 1.000 i16 6.397 0.341 8.967 i17 1.595 5.851 -6.221 1.000 i18 2.501 6.212 -4.050 1.000 i19 6.689 3.894 -8.509 1.000 i20 4.354 3.587 -1.973 1.000 i21 3.597 2.430 -7.966 1.000 i22 3.514 2.464 -2.322 1.000 i23 1.315 1.305 -3.518 1.000 i24 1.501 9.334 -6.157 i25 5.891 3.799 -7.753 1.000 i26 8.309 7.834 1.931 i27 2.308 3.000 0.229 1.000 1.000 1.000 i28 6.657 1.255 -9.099 1.000 i29 7.759 7.489 5.662 i30 3.037 0.692 8.915 1.000 1.000 1.000 i31 1.105 2.020 1.929 1.000 i32 5.024 0.051 2.147 i33 1.602 2.696 -2.750 1.000 i34 8.725 4.999 1.881 1.000 i35 2.651 1.513 3.597 1.000 1.000 1.000 i36 2.858 1.742 0.132 1.000 1.000 1.000 i37 5.940 3.306 -6.815 1.000 i38 7.227 3.169 3.138 1.000 i39 6.282 3.221 0.478 1.000 i40 4.638 9.640 -7.512 i41 4.133 9.936 9.734 i42 1.177 3.699 -5.438 1.000 i43 3.142 3.729 3.513 1.000 1.000 1.000 i44 0.466 7.720 5.536 i45 3.386 3.967 8.649 1.000 1.000 1.000 i46 1.821 9.131 -5.975 i47 6.457 1.196 -4.057 1.000 i48 5.607 7.355 -6.055 i49 7.700 0.554 -5.073 i50 2.978 5.763 2.930 1.000 1.000 1.000 range(data) 9.516 9.885 19.614 min(solution) 2.241 0.554 max(solution) 3.498 5.851