Add main functions for the computation of Kr maps #866
carhc wants to merge 25 commits into next-exp:master from
Conversation
Comments & Corrections:
Tests:
This is a large commit; it would be better to do small commits with specific changes, otherwise it is difficult to follow the flow of changes. I understand that:
Am I missing other changes? There is a many-line comment related to time series; I understand this is the time-evolution part, which will be re-adapted in a new PR. @carhc Are you preparing to commit more tests of the functions used in the code?
The branch was force-pushed: from cb58021 to f1c6ab6, from 063b20c to c7606ed, and from c7606ed to aedc73e.
gonzaponte left a comment
This is being done with peras
| fittype : KrFitFunction
| Chosen fit function for map computation
This is no longer a parameter. Remove blank line before.
| fittype : KrFitFunction
| Chosen fit function for map computation
| bins : Tuple[np.array, np.array]
| Tuple containing bins in both axis
Suggested change:
| Tuple containing bins in both axes
| geom_comb = list(itertools.product(b_center[0], b_center[1]))
| r_values  = np.array([np.sqrt(x**2+y**2) for x, y in geom_comb])
Suggested change:
| geom_comb = np.array(list(itertools.product(*b_center)))
| r_values  = np.sum(geom_comb**2, axis=1)**0.5
| X = list(zip(*geom_comb))[0],
| Y = list(zip(*geom_comb))[1],
Following the suggestion above...
Suggested change:
| X = geom_comb.T[0],
| Y = geom_comb.T[1],
| @given(integers(min_value = 0, max_value = 1e10),
|        integers(min_value = 0, max_value = 1e10),
|        arrays  (dtype = np.int64, shape = (2,),
|                 elements = integers(min_value = 1,
|                                     max_value = 1e4)))
| def test_get_number_of_bins_returns_type(nevents, thr, n_bins):
|
|     assert type(krf.get_number_of_bins(nevents, thr) ) == np.ndarray
|     assert type(krf.get_number_of_bins(n_bins=n_bins)) == np.ndarray
Remove hypothesis and use simple fixed cases. Why do we want to test the output type?
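A fixed-case version could look roughly like the sketch below; the expected values are assumptions about get_number_of_bins and would need to match its actual behaviour:

    def test_get_number_of_bins_fixed_case():
        # Assumption: an explicitly given n_bins is returned unchanged.
        n_bins = np.array([10, 20])
        np.testing.assert_array_equal(krf.get_number_of_bins(n_bins = n_bins), n_bins)

        # Assumption: when computed from nevents and thr, the result is one
        # number of bins per axis.
        assert krf.get_number_of_bins(1000, 500).shape == (2,)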
| @given(n_bins=arrays(dtype = int, shape = (2,),
|                      elements = integers(min_value = 2,
|                                          max_value = 100)),
|        n_min=integers(min_value=1, max_value=100),
|        r_max=floats (min_value=50, max_value=450))
again, a single case is enough.
| 'pval', 'in_active', 'has_min_counts', 'fit_success', 'valid', 'R', 'X', 'Y']
|
| assert all(element in columns for element in df.columns.values)
Check also the number of rows in the output and rename the test to something like test_create_df_kr_map_shape.
Also, add another test checking that the non-dummy values set in the function fall in the expected range.
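With the row-count check added, the renamed test could look roughly like this sketch (bin numbers and ranges are arbitrary; the create_df_kr_map arguments and the `columns` list follow the existing tests in this file):

    def test_create_df_kr_map_shape():
        n_bins_x, n_bins_y = 3, 4
        df = krf.create_df_kr_map(bins_x = np.linspace(-200, 200, n_bins_x + 1),
                                  bins_y = np.linspace(-200, 200, n_bins_y + 1),
                                  counts = np.zeros(n_bins_x * n_bins_y),
                                  n_min  = 0,
                                  r_max  = 300)
        # One row per (x, y) bin, with the expected columns.
        assert len(df) == n_bins_x * n_bins_y
        assert all(element in columns for element in df.columns.values)

The second test (non-dummy values in the expected range) could reuse the same small map and assert on the relevant columns.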
| def valid_bin_counter(map_df : pd.DataFrame,
|                       validity_parameter : Optional[float] = 0.9):
Also, need tests for this function
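A minimal sketch of such a test, assuming valid_bin_counter returns the fraction of valid bins (the valid_per returned further down; if it is a percentage the expected value would be 50) and that create_df_kr_map accepts the arguments used in the warning test below:

    def test_valid_bin_counter_simple_case():
        krmap = krf.create_df_kr_map(bins_x = np.linspace(-100, 100, 3),
                                     bins_y = np.linspace(-100, 100, 3),
                                     counts = np.ones(4),
                                     n_min  = 0,
                                     r_max  = 200)
        krmap["valid"] = [True, True, False, False]        # 2 of 4 bins valid
        valid_per = krf.valid_bin_counter(map_df = krmap, validity_parameter = 0.0)
        assert np.isclose(valid_per, 0.5)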
| def regularize_map(maps : pd.DataFrame,
|                    x2range : Tuple[float, float]):
Tests. At least (a sketch of the first check follows this list):
- check it doesn't modify the input
- create a silly map with ones and a few "invalid entries", then check the invalid entries are restored to 1.
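A sketch of the first check, assuming only the signature shown above and the create_df_kr_map arguments used in the tests further down:

    def test_regularize_map_does_not_modify_input():
        krmap = krf.create_df_kr_map(bins_x = np.linspace(-100, 100, 3),
                                     bins_y = np.linspace(-100, 100, 3),
                                     counts = np.ones(4),
                                     n_min  = 0,
                                     r_max  = 200)
        original = krmap.copy()
        krf.regularize_map(maps = krmap, x2range = (0, 2))
        # The input map must be left untouched.
        pd.testing.assert_frame_equal(krmap, original)

The second check could follow the same pattern: fill the map with ones, flag a few bins invalid by hand, and assert those entries come back as 1 after regularization.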
gonzaponte left a comment
a few more comments.
| counts = counts.flatten()
| bin_indexes -= 1
| bin_labels = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
|                                   mode='clip', order = 'F')
|
| return counts, bin_labels
Suggested change:
| # indexes 0 and len+1 represent underflow and overflow, thus we subtract 1
| bin_labels = np.ravel_multi_index(bin_indexes - 1, dims=(n_xbins, n_ybins),
|                                   mode='clip', order = 'F')
| return counts.flatten(), bin_labels
If I understand correctly, mode="clip" means that bins out of range will be assigned the closest bin. Is this what we used to do, @bpalmeiro?
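For reference, that is indeed what numpy's mode='clip' does: out-of-range indices are clipped to the nearest valid bin rather than raising (small illustration with the default C order, unlike the Fortran order used above):

    import numpy as np

    # Index 5 is outside dims=(3, 3); with mode='clip' it is clipped to 2,
    # so the event is silently assigned to a border bin.
    np.ravel_multi_index(([5], [0]), dims=(3, 3), mode='clip')   # array([6])
    np.ravel_multi_index(([5], [0]), dims=(3, 3))                # raises ValueError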
In our case, we had three values for "fiducial":
- The R max for the selection (just a quality cut to ensure everything was inside).
- The R max for the map, which defines its maximum extension. In this case, if the event did not fall inside the bins, it was disregarded.
- Also (not sure this is the most appropriate place to mention it, though), we had a posterior step where the peripheral bins (those further than a fiducial r, but within said R max for map production) were set to NaN.
The latter was used to ensure that, in the posterior analysis, all the hits reconstructed outside the volume were automatically flagged as NaN.
That said, the best approach would be to disregard events outside the boundaries instead of artificially merging them into the border bins, even if we rely on a previous selection. It's not a very strong opinion, but I think it adds redundancy (and, therefore, a bit of robustness) to the process.
Yes, my bad... I assumed that there would not be any events outside the boundaries here, since I was expecting a "clean" dst in this part of the code, so when I defined bin_labels like that I didn't think of the consequences in case some event was actually out of range.
I am thinking of a solution like this one:
Suggested change:
| counts = counts.flatten()
| bin_indexes -= 1
| valid_mask = ( in_range(bin_indexes[0], 0, n_xbins, right_closed=True) &
|                in_range(bin_indexes[1], 0, n_ybins, right_closed=True) )
| bin_labels = np.full(bin_indexes.shape[1], fill_value=np.nan)
| bin_labels[valid_mask] = np.ravel_multi_index((bin_indexes[0, valid_mask],
|                                                bin_indexes[1, valid_mask]),
|                                               dims=(n_xbins, n_ybins),
|                                               order='F')
In case some events are outside the desired range, their label would be a NaN instead of a numerical index, so when grouping the events bin by bin those events would simply not be assigned to any bin...
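For what it's worth, pandas does drop NaN group keys by default, so with the sketch above those events would indeed be excluded from the per-bin fits (illustrative example, column names are placeholders, not code from the PR):

    import numpy as np
    import pandas as pd

    dst = pd.DataFrame({"bin_label": [0.0, 1.0, np.nan], "energy": [1.0, 2.0, 3.0]})
    # The row with a NaN label is silently dropped (groupby dropna=True by default).
    dst.groupby("bin_label").energy.mean()
    # bin_label
    # 0.0    1.0
    # 1.0    2.0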
| return valid_per
|
|
| def fit_and_fill_map(map_bin : pd.DataFrame,
If I understand correctly, this is a pd.Series, not a pd.DataFrame
| outliers = new_map.in_active & ~new_map.valid
|
| if isinstance(x2range, Tuple):
|     outliers &= ~in_range(new_map.chi2, *x2range)
I think the logic is wrong here. The outliers should be those that are in the active volume and (not valid OR not in range). Truth table:
in_active  valid  x2_in_range  outlier  expected
------------------------------------------------
True       True   True         False    False     <--- satisfied
True       True   False        False    True      <--- not satisfied
True       False  True         False    True      <--- not satisfied
True       False  False        True     True      <--- satisfied
False      x      x            False    False     <--- satisfied
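In other words, something along these lines would satisfy every row (just a sketch reusing the names from the snippet; x2range may also be something other than a tuple, as the isinstance check suggests):

    # Outliers: bins in the active volume that are either invalid or whose
    # chi2 falls outside the accepted range (when a range is given).
    x2_ok    = in_range(new_map.chi2, *x2range) if isinstance(x2range, tuple) else True
    outliers = new_map.in_active & ~(new_map.valid & x2_ok)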
| 'pval', 'in_active', 'has_min_counts', 'fit_success', 'valid', 'R', 'X', 'Y']
|
| assert all(element in columns for element in df.columns.values)
| assert df.bin.nunique() == n_bins_x*n_bins_y
Should the length of the dataframe also be n_bins_x*n_bins_y?
| def test_valid_bin_counter_warning(n_bins, rmax, validity_parameter):
|     counts = np.array(range(n_bins[0]*n_bins[1]))
|     krmap  = krf.create_df_kr_map(bins_x = np.linspace(-rmax, +rmax, n_bins[0]+1),
|                                   bins_y = np.linspace(-rmax, +rmax, n_bins[1]+1),
|                                   counts = counts,
|                                   n_min  = 0,
|                                   r_max  = np.nextafter(np.sqrt(2)*rmax, np.inf))
|
|     krmap.valid.iloc[0 : 9] = True
as discussed offline, simplify the data in this test
| if validity_parameter == 1:
|     with warns(UserWarning, match = "inner bins are not valid."):
|         krf.valid_bin_counter(map_df = krmap, validity_parameter = validity_parameter)
for validity_parameter = 0.2, you are not checking anything. Check this: https://stackoverflow.com/a/45671804
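One standard way to make the low validity_parameter branch actually check something is to assert that no warning is raised, by turning warnings into errors inside the block (sketch along the lines of the linked answer, reusing krf and krmap from the test above):

    import warnings

    # For validity_parameter = 0.2, assert that *no* warning is raised:
    # any warning inside this block becomes an exception and fails the test.
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        krf.valid_bin_counter(map_df = krmap, validity_parameter = 0.2)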
This PR addresses issue #863 (ICAROS: Calibration map creation) and lies on top of PR #865 (Data preparation functions).
It contains the main and auxiliary functions needed for the core of the Icaro city. Many different functions have been defined to prepare, fit, and produce the Kr calibration maps.
WIP: