How you should do this depends on a number of important things: how many numbers will you have, and how big will they be? Also, as far as I understand, your data can change (numbers get added, removed, etc.), right? How often do you need to make these queries?
I'll present two solutions. I suggest you use the second, as I suspect it's better for what you need and it's a lot easier to understand.
Solution 1 - dynamic programming
    Let S[i] = true if we can make sum i and false otherwise.
    S[0] = true // we can always make sum 0: just don't choose any number
    S[i] = false for all i != 0
    for each number i in your input
        for s = MaxSum downto i
            if ( S[s - i] == true )
                S[s] = true; // if we can make the sum s - i, we can also make the sum s by adding i to the sum s - i.
To get the actual numbers that make up your sum, keep another vector P[i] = the last number that was used to make sum i. You would update this accordingly inside the if condition above.
The time complexity of this is O(numberOfNumbers * maxSumOfAllNumbers), which is pretty bad, especially since you have to rerun this algorithm whenever your data changes. It's also slow for even one run when your numbers can be very big and you can have a lot of them. In fact, "a lot" is misleading. If you have 100 numbers and each number can be as big as 10 000, the sum of all numbers can reach 100 * 10 000 = 1 000 000, so you will do roughly 100 * 1 000 000 = 100 000 000 operations each time your data changes.
It's a good solution to know, but not really useful in practice, or at least not in your case I think.
Here's some C# for this approach:
    using System;
    using System.Collections.Generic;

    class Program
    {
        static void Main(string[] args)
        {
            // Test input: 1000 ones, so the target 1000 needs every element.
            List<int> testList = new List<int>();
            for (int i = 0; i < 1000; ++i)
            {
                testList.Add(1);
            }

            Console.WriteLine(SubsetSum.Find(testList, 1000));
            // Print the indices of the chosen elements.
            foreach (int index in SubsetSum.GetLastResult(1000))
            {
                Console.WriteLine(index);
            }
        }
    }

    static class SubsetSum
    {
        // memo[s] == true means sum s can be made.
        private static Dictionary<int, bool> memo;
        // prev[s] = (index, value) of the number that first produced sum s.
        private static Dictionary<int, KeyValuePair<int, int>> prev;

        static SubsetSum()
        {
            memo = new Dictionary<int, bool>();
            prev = new Dictionary<int, KeyValuePair<int, int>>();
        }

        public static bool Find(List<int> inputArray, int sum)
        {
            memo.Clear();
            prev.Clear();

            memo[0] = true;
            prev[0] = new KeyValuePair<int, int>(-1, 0); // -1 marks the empty sum

            for (int i = 0; i < inputArray.Count; ++i)
            {
                int num = inputArray[i];
                // Go downwards so each number is used at most once.
                for (int s = sum; s >= num; --s)
                {
                    if (memo.ContainsKey(s - num) && memo[s - num])
                    {
                        memo[s] = true;
                        if (!prev.ContainsKey(s))
                        {
                            prev[s] = new KeyValuePair<int, int>(i, num);
                        }
                    }
                }
            }

            return memo.ContainsKey(sum) && memo[sum];
        }

        // Walks prev back from sum to 0, yielding the indices that were used.
        public static IEnumerable<int> GetLastResult(int sum)
        {
            while (prev[sum].Key != -1)
            {
                yield return prev[sum].Key;
                sum -= prev[sum].Value;
            }
        }
    }
You should probably do some error checking, and maybe store the last sum in the class so that GetLastResult cannot be called with a different sum than the one Find was last called with. Anyway, this is the idea.
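The guard suggested above could be sketched roughly like this. It is only a minimal sketch: SubsetSumSolver, its fields and the exceptions it throws are names and choices I made up for illustration, and it assumes the input contains no negative numbers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical instance-based variant that remembers the last requested sum.
sealed class SubsetSumSolver
{
    // prev[s] = (index, value) of the number that first produced sum s.
    // A sum is reachable exactly when it has an entry here.
    private readonly Dictionary<int, KeyValuePair<int, int>> prev =
        new Dictionary<int, KeyValuePair<int, int>>();
    private int? lastSum;   // null until Find has been called
    private bool lastFound; // result of the last Find call

    public bool Find(IList<int> input, int sum)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (sum < 0) throw new ArgumentOutOfRangeException(nameof(sum));

        prev.Clear();
        prev[0] = new KeyValuePair<int, int>(-1, 0); // -1 marks the empty sum
        for (int i = 0; i < input.Count; ++i)
        {
            int num = input[i];
            // Assumption of this sketch: negative numbers are rejected.
            if (num < 0) throw new ArgumentException("Negative numbers are not supported.");
            for (int s = sum; s >= num; --s)
            {
                if (prev.ContainsKey(s - num) && !prev.ContainsKey(s))
                    prev[s] = new KeyValuePair<int, int>(i, num);
            }
        }
        lastSum = sum;
        lastFound = prev.ContainsKey(sum);
        return lastFound;
    }

    // Returns the indices used for the sum passed to the last Find call.
    public List<int> GetLastResult()
    {
        if (lastSum == null || !lastFound)
            throw new InvalidOperationException("No successful Find call to report on.");
        var indices = new List<int>();
        int sum = lastSum.Value;
        while (prev[sum].Key != -1)
        {
            indices.Add(prev[sum].Key);
            sum -= prev[sum].Value;
        }
        return indices;
    }
}
```

Because GetLastResult takes no argument here, it cannot be called with a sum other than the one Find was given.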
Solution 2 - randomized algorithm
Now, this is easier. Keep two lists: usedNums and unusedNums. Also keep a variable usedSum that, at any point in time, contains the sum of all the numbers in the usedNums list.
Whenever you need to insert a number into your set, also add it to one of the two lists (it doesn't matter which, but pick randomly so there's a relatively even distribution). Update usedSum accordingly, that is, only if the number went into usedNums.
Whenever you need to remove a number from your set, find out which of the two lists it's in. A linear search is fine as long as you don't have a lot of numbers (this time "a lot" means over 10 000, maybe even 100 000 on a fast computer, assuming you don't do this operation often and in quick succession). The linear search can be optimized if you need it to be. Once you have found the number, remove it from the list and update usedSum accordingly.
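The bookkeeping for inserts and removals might look something like the sketch below. NumberPool and its members are hypothetical names of mine, and the seed parameter is only there to make runs repeatable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container that keeps the usedNums / unusedNums split and usedSum.
sealed class NumberPool
{
    private readonly List<int> usedNums = new List<int>();
    private readonly List<int> unusedNums = new List<int>();
    private readonly Random rng;

    // Sum of everything currently in usedNums.
    public long UsedSum { get; private set; }
    public int Count => usedNums.Count + unusedNums.Count;

    public NumberPool(int seed)
    {
        rng = new Random(seed);
    }

    // Puts the number into one of the two lists, chosen at random.
    public void Add(int number)
    {
        if (rng.Next(2) == 0)
        {
            usedNums.Add(number);
            UsedSum += number; // only usedNums contributes to usedSum
        }
        else
        {
            unusedNums.Add(number);
        }
    }

    // Removes one occurrence of the number; List<T>.Remove does a linear search.
    public bool Remove(int number)
    {
        if (usedNums.Remove(number))
        {
            UsedSum -= number;
            return true;
        }
        return unusedNums.Remove(number);
    }
}
```

If the linear search ever becomes a bottleneck, a dictionary from value to list position would be one way to speed it up, but I haven't needed that for the sizes mentioned above.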
Whenever you need to find out whether there are numbers in your set that sum to a number S, use this algorithm:
    while S != usedSum
        if S > usedSum // our current usedSum is too small
            move a random number from unusedNums to usedNums and update usedSum
        else // our current usedSum is too big
            move a random number from usedNums to unusedNums and update usedSum
At the end of the algorithm, the list usedNums will give you the numbers whose sum is S.
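Here is a rough sketch of that loop in C#. RandomSubsetSum.TryFind and the maxSteps parameter are my own additions, not part of the description above: the cap is there because the plain loop never stops when no subset sums to S. The sketch also assumes non-negative numbers and starts with everything in unusedNums.

```csharp
using System;
using System.Collections.Generic;

static class RandomSubsetSum
{
    // Returns a list of numbers summing to target, or null if none was
    // found within maxSteps moves (which does not prove that none exists).
    public static List<int> TryFind(IEnumerable<int> numbers, long target,
                                    int maxSteps, Random rng)
    {
        var usedNums = new List<int>();
        var unusedNums = new List<int>(numbers);
        long usedSum = 0;

        for (int step = 0; step < maxSteps && usedSum != target; ++step)
        {
            bool tooSmall = usedSum < target;
            List<int> from = tooSmall ? unusedNums : usedNums;
            List<int> to = tooSmall ? usedNums : unusedNums;
            if (from.Count == 0) break; // nothing left to move in that direction

            // Pick a random element and remove it in O(1) by swapping with the last.
            int k = rng.Next(from.Count);
            int moved = from[k];
            from[k] = from[from.Count - 1];
            from.RemoveAt(from.Count - 1);

            to.Add(moved);
            usedSum += tooSmall ? moved : -moved;
        }

        return usedSum == target ? usedNums : null;
    }
}
```

A call such as RandomSubsetSum.TryFind(numbers, 9, 100000, new Random()) returns either a list that sums to 9 or null. A null result only means the search gave up, so pick maxSteps according to how long you are willing to wait.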
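This algorithm should be good for what you need, I think. It handles changes to the dataset very well and works well for a high number count. It also doesn't depend on how big the numbers are, which is very useful if you have big numbers. One caveat: the loop only stops when usedSum reaches S, so if no subset sums to S it will run forever. If that can happen with your data, cap the number of iterations and treat hitting the cap as "probably no solution".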
Please post if you have any questions.