
New Optimizer code #125

Merged
pavelkomarov merged 33 commits into master from hack-the-optimizer-more on Jul 15, 2025

Conversation

@pavelkomarov (Collaborator) commented Jul 7, 2025

Big changes! Basically fully rewrote the optimizer code. Getting the same answers for spectraldiff; going to follow on with moves of some of the other modules, deleting now-unnecessary code as I go.

pavelkomarov changed the title from "big changes! Basically fully rewrote the optimizer code. Getting same…" to "New Optimizer code" on Jul 7, 2025
@pavelkomarov (Collaborator, Author) commented Jul 8, 2025

The Jupyter notebooks 2a and 2b should be moved over to use this syntax, maybe in this PR, maybe in a follow-on. Essentially, the shadow methods no longer exist; you instead call optimize with the original method.

Follow-up: done and folded into this (now giant) PR.
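For reference, the new calling pattern looks like the test code further down in this PR. A minimal sketch, with import paths assumed from the `__init__` changes in this diff and `polydiff` as an arbitrary example method:

```python
import numpy as np
from pynumdiff.linear_model import polydiff   # call the original method, not a shadow copy
from pynumdiff.optimize import optimize       # the single optimize() entry point

np.random.seed(0)
dt = 0.01
x = np.sin(np.linspace(0, 4, 400)) + 0.01*np.random.randn(400)

# No ground truth here, so the tvgamma regularizer is what gets used
# (value is illustrative; see the log_gamma heuristic in the tests below).
opt_params, opt_val = optimize(polydiff, x, dt, tvgamma=1e-2, dxdt_truth=None)

# Assuming the method accepts its tuned parameters as keywords, like savgoldiff above:
x_hat, dxdt_hat = polydiff(x, dt, **opt_params)
```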

# Savitzky-Golay filter #
#########################
def savgoldiff(x, dt, params=None, options=None, polynomial_order=None, window_size=None, smoothing_win=None):
def savgoldiff(x, dt, params=None, options=None, poly_order=None, window_size=None, smoothing_win=None):
@pavelkomarov (Collaborator, Author):

Decided to rename this parameter, because typing out the full word can take up a bunch of space and feel unnecessary.

if window_size < poly_order*3:
    window_size = poly_order*3+1
if window_size % 2 == 0:
    window_size += 1
@pavelkomarov (Collaborator, Author):

This thing was failing because the optimizer wanted to give even-length windows, so I've added back this +1.

"CVXPY to be installed. You can still pynumdiff.optimize for other functions.")

from . import finite_difference, smooth_finite_difference, linear_model, kalman_smooth
from ._optimize import optimize
@pavelkomarov (Collaborator, Author) Jul 8, 2025:

Only need to import a single thing now: one function to rule them all.

None for k,v in search_space_types.items()]

# wrap the objective and scipy.optimize.minimize because the objective and options are always the same
_obj_fun = partial(_objective_function, func=func, x=x, dt=dt, singleton_params=singleton_params,
@pavelkomarov (Collaborator, Author):

I'm trying to keep down the amount of information flying around. The use of partial and the comment make clear that essentially a bunch of stuff that flows into the optimize function stays constant throughout the process, so we only need to focus on the point (and translation to and from the full parameter dictionaries).
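A toy version of the pattern, with a made-up objective standing in for `_objective_function`: everything constant gets frozen once with `functools.partial`, so the rest of the optimizer only ever passes a point around.

```python
from functools import partial
import numpy as np
import scipy.optimize

def objective(point, data, weight):            # stand-in for _objective_function
    return weight * np.sum((data - point)**2)  # the constant extras ride along via partial

data = np.array([1.0, 2.0, 3.0])
_obj = partial(objective, data=data, weight=0.5)                     # freeze the constants
_min = partial(scipy.optimize.minimize, _obj, method='Nelder-Mead')  # freeze the solver config

result = _min(np.zeros(3))   # only the starting point varies from call to call
print(result.x)              # ~ [1, 2, 3]
```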

padding=padding)
_minimize = partial(scipy.optimize.minimize, _obj_fun, method=opt_method, bounds=bounds, options={'maxiter':maxiter})

with Pool(initializer=filterwarnings, initargs=["ignore", '', UserWarning]) as pool: # The heavy lifting
@pavelkomarov (Collaborator, Author) Jul 8, 2025:

Some of the subprocesses can raise warnings, because I print warnings throughout the code, for example when using a kernel with even width. There is no way to keep the optimization from querying functions with parameters that may throw warnings, so this keeps the procedure from making a ruckus.
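For anyone puzzling over the `initializer` trick: `Pool` calls `initializer(*initargs)` once in each worker process, so `warnings.filterwarnings("ignore", '', UserWarning)` runs inside the subprocesses, where the parent's warning filters wouldn't otherwise apply. A standalone demonstration with a made-up worker function:

```python
import warnings
from multiprocessing import Pool

def noisy_eval(n):
    if n % 2 == 0:   # e.g. a kernel with even width
        warnings.warn("even width, bumping by one", UserWarning)
    return n * n

if __name__ == '__main__':
    # Each worker starts by running filterwarnings("ignore", '', UserWarning),
    # so the UserWarnings raised inside noisy_eval never clutter the output.
    with Pool(initializer=warnings.filterwarnings,
              initargs=["ignore", '', UserWarning]) as pool:
        print(pool.map(noisy_eval, range(6)))   # [0, 1, 4, 9, 16, 25]
```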

# results are going to be floats, but that may not be allowed, so convert back to a dict
opt_params = {k:(v if search_space_types[k] == float else
                 int(np.round(v)) if search_space_types[k] == int else
                 v > 0.5) for k,v in zip(search_space_types, opt_point)}
@pavelkomarov (Collaborator, Author) Jul 8, 2025:

This translation-back-to-dict code is awfully compressed. It may need to become a full-on encode/decode function pair if we extend this to handle strings. But I spent some time trying to figure out how to do that by adaptively mapping newly-seen strings to ordinal numbers, and it's actually mildly nontrivial to remember such a mapping. We could map all the strings in the codebase to arbitrary numbers, but then some are closer than others; we could map them to simplex vertices or something instead, but then you start adding several more dimensions to the search space. I'm not sure it's worth taking on that added complexity, honestly, given how few parameter settings are strings. Ideally we generalize all the way, especially because categorical differences also describe the case of comparing methods themselves. It may be more expeditious to take a different approach entirely, like Bayesian optimization with a library that handles this sort of thing a bit. We should think about it at a high level.
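If this compression ever does need to grow into an explicit pair, a sketch of what the decode half might look like for the currently supported types (purely illustrative; the parameter names in the usage comment are invented, and strings stay unsupported, per the discussion above):

```python
import numpy as np

def decode(point, types):
    """Map the optimizer's flat float vector back to typed parameters.
    `types` is an ordered mapping of parameter name -> float, int, or bool."""
    out = {}
    for (name, t), v in zip(types.items(), point):
        if t is float:
            out[name] = v
        elif t is int:
            out[name] = int(np.round(v))
        else:                      # bool: threshold the unit interval
            out[name] = v > 0.5
    return out

# decode([3.7, 0.2, 0.9], {'window_size': int, 'alpha': float, 'iterate': bool})
# -> {'window_size': 4, 'alpha': 0.2, 'iterate': True}
```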

from ..kalman_smooth import constant_velocity, constant_acceleration, constant_jerk


# Map from method -> (search_space, bounds_low_hi)
@pavelkomarov (Collaborator, Author):

All the wrapper classes basically just encoded the information in this big dictionary, but much more spread out. Here it's all in one place, and a massive amount of code can simply go away.
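To make the shape concrete, a hypothetical entry: the key is the differentiation function object itself, the values sketch a name-to-type search space and low/high bounds. The numbers and the exact value structure are invented for illustration, not the real defaults.

```python
from pynumdiff.linear_model import savgoldiff

# Hypothetical illustration of the method -> (search_space, bounds_low_hi) shape.
_METHOD_INFO = {
    savgoldiff: (
        {'poly_order': int, 'window_size': int, 'smoothing_win': int},              # search space: name -> type
        {'poly_order': (2, 8), 'window_size': (3, 999), 'smoothing_win': (3, 999)},  # low/high bounds per parameter
    ),
}
```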

from pynumdiff.optimize.kalman_smooth import constant_velocity, constant_acceleration, \
constant_jerk
from pynumdiff.utils.simulate import pi_control
from ..finite_difference import first_order as iterated_finite_difference # actually second order
@pavelkomarov (Collaborator, Author):

We now import the functions themselves, not identically-named shadows with different parameters. If we don't give the optimizer extra search space information, it can fish that information out of the defaults it already knows (the big dictionary above). Bounds are enforced.


# simulation
noise_type = 'normal'
noise_parameters = [0, 0.01]
@pavelkomarov (Collaborator, Author):

Some of these were defined but not used. I've passed them to the simulation function explicitly.

log_gamma = -1.6 * np.log(cutoff_frequency) - 0.71 * np.log(dt) - 5.1
tvgamma = np.exp(log_gamma)

def get_err_msg(actual_params, desired_params):
@pavelkomarov (Collaborator, Author):

This was only used in one test function, and it's not really necessary.

return err_msg


def test_first_order():
@pavelkomarov (Collaborator, Author) Jul 8, 2025:

See #104 and #120. This thing isn't actually first order, so I renamed it a bit internally.

params1, val1 = optimize(meandiff, x, dt, search_space={'num_iterations':1}, tvgamma=tvgamma, dxdt_truth=dxdt_truth)
params2, val2 = optimize(meandiff, x, dt, search_space={'num_iterations':1}, tvgamma=0, dxdt_truth=None)
assert params1['window_size'] == 5
assert params2['window_size'] == 1
@pavelkomarov (Collaborator, Author):

I'm happy to report that in almost all cases the answers that come out here are identical, so for all my complicated changes to the optimizer, nothing broke.

return r.rvalue**2


def total_variation(x, padding=0):
@pavelkomarov (Collaborator, Author) Jul 8, 2025:

This felt like it belonged here instead of in the utilities, since it's only used as part of calculating a metric in the absence of ground truth. Putting it here also lets it take a padding argument more naturally, which lets us skip calculating padding in the optimizer itself.

padding = int(0.025*len(x))
padding = max(padding, 1)

return np.sum(np.abs(x[1:]-x[:-1]))/(len(x)-1) # mostly equivalent to cvxpy.tv(x2-x1).value
@pavelkomarov (Collaborator, Author) Jul 8, 2025:

"mostly". https://www.cvxpy.org/_modules/cvxpy/atoms/total_variation.html#tv. Our version just normalizes for length, and theirs doesn't.

params1, val1 = optimize(polydiff, x, dt, tvgamma=tvgamma, dxdt_truth=dxdt_truth)
params2, val2 = optimize(polydiff, x, dt, tvgamma=0, dxdt_truth=None)
assert (params1['poly_order'], params1['window_size']) == (6, 50)
assert (params2['poly_order'], params2['window_size']) == (4, 10)
@pavelkomarov (Collaborator, Author):

Here's an example where I got different answers, but these actually agree with the ones that used to be here before I touched the codebase. I had to change them to [2, 10] to get this to pass after my slide_function changes, I think. Why that changed things, I don't know. Why these changes recover what the optimizer used to be doing, I don't know. My changes have complemented each other in the right way, I guess.

removed a couple unnecessary imports, reran basic_tutorial notebook (RMS numbers worsened because no longer excluding the beginning and ends of the vectors in calculation), and changed the evaluation file because I realized padding=None makes it sound like no padding, not 2.5% padding, and padding wasn't getting used properly in the error_correlation and total_variation functions
"noise_type = 'normal'\n",
"noise_parameters = [0, 0.1] # mean and std\n",
"\n",
"# time step size and time series length in TIME\n",
"dt = 0.01\n",
"timeseries_length = 4"
"duration = 4"
@pavelkomarov (Collaborator, Author) Jul 8, 2025:

When I went to update the 2a and 2b notebooks, I realized I hadn't rerun this one, so now I have updated some things to agree with the updated evaluate and simulate files. Note that the default behavior is now to not pad when calculating the metrics, which means we're including the 2.5% at the edges, which tends to perform more poorly; thus the RMS numbers worsen. Without this change, they're exactly what they were before.

padding = max(padding, 1)
x = x[padding:len(x)-padding]

return np.linalg.norm(x[1:]-x[:-1], 1)/len(x) # normalized version of what cvxpy.tv does
@pavelkomarov (Collaborator, Author):

This used to be normalized by len(np.ravel(x)[0:-1]), which is $m-1$ rather than $m$. The paper says normalize by $m$ in the math, so I've made the code agree.
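In other words, for a length-$m$ signal the metric is now

$$\mathrm{TV}(x) \;=\; \frac{1}{m}\sum_{i=1}^{m-1}\lvert x_{i+1}-x_i\rvert,$$

which is cvxpy's `tv` atom (the bare sum) divided by $m$ rather than by $m-1$.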

…=0 while optimization notebooks use the optimizer's default, which was padding='auto'. Changed all default paddings to be 0 so the optimization optimizes the number shown on the plots by default in the 2a and 2b notebooks.
params_1, val_1 = first_order(x, dt, params=None, options={'iterate': True},
                              tvgamma=tvgamma, dxdt_truth=dxdt_truth)
params_2, val_2 = first_order(x, dt, params=None, options={'iterate': True},
                              tvgamma=0, dxdt_truth=None)
@pavelkomarov (Collaborator, Author) Jul 9, 2025:

I realized the way these were being called doesn't make any sense, because if dxdt_truth is given, tvgamma isn't used, and if dxdt_truth isn't given, tvgamma is used. So why was the carefully-calculated tvgamma above only passed in when dxdt_truth was, and 0 passed in when we would need to use tvgamma? Correcting this changes some of the solutions to the optimizations.
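Presumably the corrected pairing then looks something like this sketch (intent only, not the exact final test code, and the iterate/search_space handling is omitted): one call exercises the ground-truth path, the other exercises the tvgamma-regularized path.

```python
# Ground truth available: the error against dxdt_truth drives the search; tvgamma is irrelevant.
params_1, val_1 = optimize(first_order, x, dt, dxdt_truth=dxdt_truth)

# No ground truth: the carefully-computed tvgamma actually gets used.
params_2, val_2 = optimize(first_order, x, dt, tvgamma=tvgamma, dxdt_truth=None)
```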

pavelkomarov merged commit 79f3eb9 into master on Jul 15, 2025 (1 check passed)
pavelkomarov deleted the hack-the-optimizer-more branch on July 15, 2025 at 21:35