I'm trying to optimize a target function that has multiple input variables (between 24 and 30). These variables are samples of three different statistical variables, and target function values are t-test probability values. An error function represents the error (sum of squares of differences) between the desired and the actual t-test probabilities. I can only accept solutions where the error is less than 1e-8, for all of the three t-tests.
I was using scipy.optimize.fmin
and it worked great. There are many solutions where the target function became zero.
The problem is that I need to find a solution where the variables are between 0 and 10.0, and are whole numbers or don't have more than one digit fractional part. Examples of valid values are 0 10 3 5.5 6.8
. Examples of invalid values: -3 2.23 30
or 0.16666667
.
I happen to know that there is at least one solution, because the target values are coming from actual measured data. The original data was lost, and my task is to find them. But I don't know how. Using trial/error is not an option, because there are about 100 possible values for each variable, and given the number of variables, the number of possible cases would be 100**30 which is too much. Using fmin is great, however it does not work with discreet values.
Is there a way to solve this? It is not a problem if I need to run a program for many hours to find a solution. But I need to find solutions for about 10 target values within a few days, and I'm out of new ideas.
Here is an example MWE:
import math
import numpy
import scipy.optimize
import scipy.stats
import sys
def log(s):
sys.stdout.write(str(s))
sys.stdout.flush()
# List of target T values: TAB, TCA, TCB
TARGETS = numpy.array([
[0.05456834, 0.01510358, 0.15223353 ], # task 1 to solve
[0.15891875, 0.0083665, 0.00040262 ], # task 2 to solve
])
MAX_ERR = 1e-10 # Maximum error in T values
NMIN,NMAX = 8,10 # Number of samples for T probes. Inclusive.
def fsq(x, t, n):
"""Returns the differences between the target and the actual values."""
a,b,c = x[0:n],x[n:2*n],x[2*n:3*n]
results = numpy.array([
scipy.stats.ttest_rel(a,b)[1], # ab
scipy.stats.ttest_rel(c,a)[1], # ca
scipy.stats.ttest_rel(c,b)[1] # cb
])
# Sum of squares of diffs
return (results - t)
def f(x, t, n):
"""This is the target function that needs to be minimized."""
return (fsq(x,t,n)**2).sum()
def main():
for tidx,t in enumerate(TARGETS):
print "============================================="
print "Target %d/%d"%(tidx+1,len(TARGETS))
for n in range(NMIN,NMAX+1):
log(" => n=%s "%n)
successful = False
tries = 0
factor = 0.1
while not successful:
x0 = numpy.random.random(3*n) * factor
x = scipy.optimize.fmin(f,x0, [t,n], xtol=MAX_ERR, ftol=MAX_ERR )
diffs = fsq(x,t,n)
successful = (numpy.abs(diffs)<MAX_ERR).all()
if successful:
log(" OK, error=[%s,%s,%s]
"%(diffs[0],diffs[1],diffs[2]))
print " SOLUTION FOUND "
print x
else:
tries += 1
log(" FAILED, tries=%d
"%tries)
print diffs
factor += 0.1
if tries>5:
print "!!!!!!!!!!!! GIVING UP !!!!!!!!!!!"
break
if __name__ == "__main__":
main()
See Question&Answers more detail:
os