Conversation
…tor_GenerateViolinPlots
rernst
left a comment
There was a problem hiding this comment.
First of all lots of work done, good job! I feel like there is room for improvement, some general thoughts:
-
Many parameters have names like metab_interest_sorted. In the context of a function, it’s not relevant whether the input is “of interest” or “sorted.” Use neutral, descriptive names that reflect the data type or role.
-
Several functions are named after their use case rather than their functionality. name functions based on what they do, not where they are used.
-
When breaking function calls across lines, maintain a consistent style. Preferred format:
Rfunction1(
function_2(param_a),
param_b,
param_c,
)-
There is no error catching for missing files or invalid paths. Currently, the code will crash, making debugging difficult.
-
There seems to be a lot of ad-hoc data transformations. It feels like the DIMS application is missing a standardized data format for saving and reusing data between steps.
| # Remove columns, move HMDB_code & HMDB_name column to the front, change intensity columns to numeric | ||
| intensities_zscore_df <- intensities_zscore_df %>% | ||
| select(-c(plots, HMDB_name_all, HMDB_ID_all, sec_HMDB_ID, HMDB_key, sec_HMBD_ID_rlvnc, name, | ||
| relevance, descr, origin, fluids, tissue, disease, pathway, nr_ctrls)) %>% | ||
| relocate(c(HMDB_code, HMDB_name)) %>% | ||
| rename(mean_controls = avg_ctrls, sd_controls = sd_ctrls) %>% | ||
| mutate(across(!c(HMDB_name, HMDB_code), as.numeric)) | ||
|
|
||
| # Get the controls and patient IDs, select the intensity columns | ||
| controls <- colnames(intensities_zscore_df)[grepl("^C", colnames(intensities_zscore_df)) & | ||
| !grepl("_Zscore$", colnames(intensities_zscore_df))] | ||
| control_intensities_cols_index <- which(colnames(intensities_zscore_df) %in% controls) | ||
| nr_of_controls <- length(controls) | ||
|
|
||
| patients <- colnames(intensities_zscore_df)[grepl("^P", colnames(intensities_zscore_df)) & | ||
| !grepl("_Zscore$", colnames(intensities_zscore_df))] | ||
| patient_intensities_cols_index <- which(colnames(intensities_zscore_df) %in% patients) | ||
| nr_of_patients <- length(patients) | ||
|
|
||
| intensity_cols_index <- c(control_intensities_cols_index, patient_intensities_cols_index) | ||
| intensity_cols <- colnames(intensities_zscore_df)[intensity_cols_index] |
There was a problem hiding this comment.
This could be one (or more) 'prepare_data' functions.
| intensity_cols_index <- c(control_intensities_cols_index, patient_intensities_cols_index) | ||
| intensity_cols <- colnames(intensities_zscore_df)[intensity_cols_index] | ||
|
|
||
| #### Calculate ratios of intensities for metabolites #### |
There was a problem hiding this comment.
Parts of this block can be 'calculate' functions.
| zscore_patients_df <- intensities_zscore_ratios_df %>% select(HMDB_code, HMDB_name, any_of(paste0(patients, "_Zscore"))) | ||
| zscore_controls_df <- intensities_zscore_ratios_df %>% select(HMDB_code, HMDB_name, any_of(paste0(controls, "_Zscore"))) | ||
|
|
||
| #### Make violin plots ##### |
There was a problem hiding this comment.
And this a make create violoin plot pdf function
| save_prob_scores_to_excel(diem_probability_score, output_dir, run_name) | ||
|
|
||
|
|
||
| #### Generate dIEM plots ######### |
| # metabs_iems <- lapply(top_iems, function(iem) { | ||
| # iem_probablity <- patient_top_iems_probs %>% filter(Disease == iem) %>% pull(!!sym(patient_id)) | ||
| # metabs_iems_names <- c(metabs_iems_names, paste0(iem, ", probability score ", iem_probablity)) | ||
| # metab_iem <- expected_biomarkers_df %>% filter(Disease == iem) %>% select(HMDB_code, HMDB_name) | ||
| # return(metab_iem) | ||
| # }) | ||
| # names(metabs_iems) <- metabs_iems_names |
The refactor of GenerateViolinPlots: