mcmc_sampler

Header file for mcmc_sampler

Functions

void mcmc_step_threaded(int j)
void mcmc_swap_threaded(int i, int j)
void full_random_swap(sampler *s)
void RJPTMCMC_MH_dynamic_PT_alloc_comprehensive_internal_driver(mcmc_sampler_output *sampler_output, double **output, int **status, int *model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)

Driver routine for the uncorrelated sampler &#8212; trying not to repeat code.

Assumed that the checkpoint_file has already been populated &#8212;

It will overwrite all the file paths

Parameters:
  • max_dimension – dimension of the parameter space being explored

  • min_dimension – dimension of the parameter space being explored

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • chain_N – Maximum number of chains to use

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • max_chunk_size – Maximum number of steps to take in a single sampler run

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int *param_status, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id

  • RJ_proposal – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void RJPTMCMC_MH_dynamic_PT_alloc_comprehensive_internal(mcmc_sampler_output *sampler_output, double **output, int **parameter_status, int *model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, int *initial_status, int initial_model_status, double *seeding_var, double **ensemble_initial_pos, int **ensemble_initial_status, int *ensemble_initial_model_status, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)

Parallel tempered, dynamic chain allocation MCMC. Reversible Jump version, which doesn’t currently have any autocorrelation calculations currently.

Runs dynamic chain allocation, then it will run the RJPTMCMC routine repeatedly

Parameters:
  • output – [out] Output shape is double[N_steps,dimension]

  • parameter_status – [out] Output shape is int[N_steps,dimension]

  • max_dimension – maximum dimension of the parameter space being explored

  • min_dimension – minimum dimension of the parameter space being explored

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • chain_N – Maximum number of chains to use

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • initial_pos – Initial position in parameter space - shape double[max_dimension]

  • initial_status – Initial status in parameter space - shape int[max_dimension]

  • initial_model_status – Initial model status in parameter space - shape int[nested_model_number]

  • seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]

  • chain_temps – [out] Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • max_chunk_size – Maximum number of steps to take in a single sampler run

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int *param_status, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id

  • RJ_proposal – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void RJPTMCMC_MH_dynamic_PT_alloc_comprehensive(mcmc_sampler_output *sampler_output, double **output, int **parameter_status, int *model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, int *initial_status, int initial_model_status, double *seeding_var, double **ensemble_initial_pos, int **ensemble_initial_status, int *ensemble_initial_model_status, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, double (*log_prior)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, int *status, int model_status, double **fisher, mcmc_data_interface *interface, void *parameters), void (*RJ_proposal)(double *current_param, double *proposed_param, int *current_status, int *proposed_status, int *current_model_status, int *proposed_model_status, double *MH_corrections, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
void continue_RJPTMCMC_MH_dynamic_PT_alloc_comprehensive_internal(std::string checkpoint_file_start, mcmc_sampler_output *sampler_output, double **output, int **parameter_status, int *model_status, int nested_model_number, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)

Parallel tempered, dynamic chain allocation MCMC. Reversible Jump version, which doesn’t currently have any autocorrelation calculations currently.

Runs dynamic chain allocation, then it will run the RJPTMCMC routine repeatedly

Parameters:
  • output – [out] Output shape is double[N_steps,dimension]

  • parameter_status – [out] Output shape is int[N_steps,dimension]

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • chain_temps – [out] Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • max_chunk_size – Maximum number of steps to take in a single sampler run

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int *param_status, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id

  • RJ_proposal – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void continue_RJPTMCMC_MH_dynamic_PT_alloc_comprehensive(std::string checkpoint_file_start, mcmc_sampler_output *sampler_output, double **output, int **parameter_status, int *model_status, int nested_model_number, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, double (*log_prior)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, int *status, int model_status, double **fisher, mcmc_data_interface *interface, void *parameters), void (*RJ_proposal)(double *current_param, double *proposed_param, int *current_status, int *proposed_status, int *current_model_status, int *proposed_model_status, double *MH_corrections, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
void dynamic_temperature_full_ensemble_internal(sampler *samplerptr, int N_steps, double nu, int t0, int swp_freq, bool show_prog)
void continue_PTMCMC_MH_dynamic_PT_alloc_full_ensemble_internal(std::string checkpoint_file_start, double ***output, int N_steps, double *chain_temps, int swp_freq, int t0, int nu, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file, bool burn_phase)

Continue dyanmically tunes an MCMC for optimal spacing. step width, and chain number.

NOTE: nu, and t0 parameters determine the dynamics, so these are important quantities. nu is related to how many swap attempts it takes to substantially change the temperature ladder, why t0 determines the length of the total dyanimcally period. Moderate initial choices would be 10 and 1000, respectively.

Based on arXiv:1501.05823v3

Currently, Chain number is fixed

max_chain_N_thermo_ensemble sets the maximium number of chains to use to in successively hotter chains to cover the likelihood surface while targeting an optimal swap acceptance target_swp_acc.

max_chain_N determines the total number of chains to run once thermodynamic equilibrium has been reached. This results in chains being added after the initial PT dynamics have finished according to chain_distribution_scheme.

If no preference, set max_chain_N_thermo_ensemble = max_chain_N = numThreads = (number of cores (number of threads if hyperthreaded))&#8212; this will most likely be the most optimal configuration. If the number of cores on the system is low, you may want to use n*numThreads for some integer n instead, depending on the system.

chain_distribution_scheme:

“cold”: All chains are added at T=1 (untempered)

“refine”: Chains are added between the optimal temps geometrically &#8212; this may be a good option as it will be a good approximation of the ideal distribution of chains, while keeping the initial dynamical time low

“double”: Chains are added in order of rising temperature that mimic the distribution achieved by the earier PT dynamics

“half_ensemble”: For every cold chain added, half of the ensemble is added again. Effectively, two cold chains for every ensemble

Parameters:
  • output – [out] Output chains, shape is double[max_chain_N, N_steps,dimension]

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • chain_temps – [out] Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int dimension, double **output_fisher, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void continue_RJPTMCMC_MH_dynamic_PT_alloc_full_ensemble_internal(std::string checkpoint_file_start, double ***output, int ***status, int **model_status, int nested_model_number, int N_steps, double *chain_temps, int swp_freq, int t0, int nu, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file, bool burn_phase)

Continue dyanmically tunes an MCMC for optimal spacing. step width, and chain number.

NOTE: nu, and t0 parameters determine the dynamics, so these are important quantities. nu is related to how many swap attempts it takes to substantially change the temperature ladder, why t0 determines the length of the total dyanimcally period. Moderate initial choices would be 10 and 1000, respectively.

Based on arXiv:1501.05823v3

Currently, Chain number is fixed

max_chain_N_thermo_ensemble sets the maximium number of chains to use to in successively hotter chains to cover the likelihood surface while targeting an optimal swap acceptance target_swp_acc.

max_chain_N determines the total number of chains to run once thermodynamic equilibrium has been reached. This results in chains being added after the initial PT dynamics have finished according to chain_distribution_scheme.

If no preference, set max_chain_N_thermo_ensemble = max_chain_N = numThreads = (number of cores (number of threads if hyperthreaded))&#8212; this will most likely be the most optimal configuration. If the number of cores on the system is low, you may want to use n*numThreads for some integer n instead, depending on the system.

chain_distribution_scheme:

“cold”: All chains are added at T=1 (untempered)

“refine”: Chains are added between the optimal temps geometrically &#8212; this may be a good option as it will be a good approximation of the ideal distribution of chains, while keeping the initial dynamical time low

“double”: Chains are added in order of rising temperature that mimic the distribution achieved by the earier PT dynamics

“half_ensemble”: For every cold chain added, half of the ensemble is added again. Effectively, two cold chains for every ensemble

Parameters:
  • output – [out] Output chains, shape is double[max_chain_N, N_steps,dimension]

  • status – [out] output parameter status array, dimensions: status[chain_N][N_steps][dimension]

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • chain_temps – [out] Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int *param_status, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id

  • RJ_proposal – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void continue_RJPTMCMC_MH(std::string start_checkpoint_file, double ***output, int ***status, int **model_status, int nested_model_number, int N_steps, int swp_freq, double (*log_prior)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, int *status, int model_status, double **fisher, mcmc_data_interface *interface, void *parameters), void (*RJ_proposal)(double *current_param, double *proposed_param, int *current_status, int *proposed_status, int *current_model_status, int *proposed_model_status, double *MH_corrections, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string auto_corr_filename, std::string likelihood_log_filename, std::string end_checkpoint_file)
Parameters:
  • start_checkpoint_file – File for starting checkpoint

  • output – [out] output array, dimensions: output[chain_N][N_steps][dimension]

  • status – [out] output parameter status array, dimensions: status[chain_N][N_steps][dimension]

  • N_steps – Number of new steps to take

  • swp_freq – frequency of swap attempts between temperatures

  • RJ_proposal – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use

  • pool – Boolean for whether to use deterministic'' vs stochastic’’ sampling

  • show_prog – Boolean for whether to show progress or not (turn off for cluster runs

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • auto_corr_filename – Filename to output auto correlation in some interval, if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • end_checkpoint_file – Filename to output data for checkpoint at the end of the continued run, if empty string, not saved

void PTMCMC_MH_dynamic_PT_alloc_uncorrelated_internal_driver(mcmc_sampler_output *sampler_output, double **output, int dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file, bool continue_burn)

Driver routine for the uncorrelated sampler &#8212; trying not to repeat code.

Assumed that the checkpoint_file has already been populated &#8212;

It will overwrite all the file paths

Parameters:
  • dimension – dimension of the parameter space being explored

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • chain_N – Maximum number of chains to use

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • max_chunk_size – Maximum number of steps to take in a single sampler run

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int dimension, double **output_fisher, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void continue_RJPTMCMC_MH_internal(sampler *samplerptr, std::string start_checkpoint_file, double ***output, int ***status, int **model_status, int nested_model_number, int N_steps, int swp_freq, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string auto_corr_filename, std::string likelihood_log_filename, std::string end_checkpoint_file, bool tune, bool burn_phase)

Routine to take a checkpoint file and begin a new chain at said checkpoint.

See MCMC_MH_internal for more details of parameters (pretty much all the same)

Parameters:
  • start_checkpoint_file – File for starting checkpoint

  • output – [out] output array, dimensions: output[chain_N][N_steps][dimension]

  • status – [out] output parameter status array, dimensions: status[chain_N][N_steps][dimension]

  • N_steps – Number of new steps to take

  • swp_freq – frequency of swap attempts between temperatures

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int *param_status, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id

  • RJ_proposal – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use

  • pool – Boolean for whether to use deterministic'' vs stochastic’’ sampling

  • show_prog – Boolean for whether to show progress or not (turn off for cluster runs

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output &#8212; if multiple cold chains, it will append each output to the other, and write out the total

  • auto_corr_filename – Filename to output auto correlation in some interval, if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • end_checkpoint_file – Filename to output data for checkpoint at the end of the continued run, if empty string, not saved

void RJPTMCMC_MH_internal(double ***output, int ***parameter_status, int **model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, double *initial_pos, int *initial_status, int initial_model_status, double *seeding_var, double **ensemble_initial_position, int **ensemble_initial_status, int *ensemble_initial_model_status, double *chain_temps, int swp_freq, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string auto_corr_filename, std::string likelihood_log_filename, std::string checkpoint_file)

Generic reversable jump sampler, where the likelihood, prior, and reversable jump proposal are parameters supplied by the user.

Note: Using a min_dimension tells the sampler that there is a base model'', and that the dimensions from min_dim to max_dim are small’’ corrections to that model. This helps inform some of the proposal algorithms and speeds up computation. If using discrete models with no overlap of variables (ie model A or model B), set min_dim to 0. Even if reusing certain parameters, if the extra dimensions don’t describe small'' deviations, it's probably best to set min_dim to 0.Since the RJ proposal is user specified, even if there are parameters that should never be removed, it's up to the user to dictate that. Using min_dim will not affect that aspect of the sampler. If there's a base-model’’, the fisher function should produce a fisher matrix for the base model only. The modifications are then normally distributed around the last parameter value. Then the fisher function should take the minimum dimension instead of the maximum, like the other functions.

Currently, no dynamic PT option, as it would be too many free parameters for the sampler to converge to a reasonable temperature distribution in a reasonable amount of time. Best use case, use the PTMCMC_MH_dynamic for the `base’ dimension space, and use that temperature ladder.

Base of the sampler, generic, with user supplied quantities for most of the samplers properties

Uses the Metropolis-Hastings method, with the option for Fisher/MALA steps if the Fisher routine is supplied.

3 modes to use -

single threaded (numThreads = 1) runs single threaded

multi-threaded `deterministic’ (numThreads>1 ; pool = false) progresses each chain in parallel for swp_freq steps, then waits for all threads to complete before swapping temperatures in sequenctial order (j, j+1) then (j+1, j+2) etc (sequenctially)

multi-threaded `stochastic’ (numThreads>2 ; pool = true) progresses each chain in parallel by queueing each temperature and evaluating them in the order they were submitted. Once finished, the threads are queued to swap, where they swapped in the order they are submitted. This means the chains are swapped randomly, and the chains do NOT finish at the same time. The sampler runs until the the 0th chain reaches the step number

Note on limits: In the prior function, if a set of parameters should be disallowed, return -std::numeric_limits<double>::infinity() &#8212; (this is in the <limits> file in std)

The parameter array uses the dimensions [0,min_dim] always, and [min_dim, max_dim] in RJPTMCMC fashion

Format for the auto_corr file (compatable with csv, dat, txt extensions): each row is a dimension of the cold chain, with the first row being the lengths used for the auto-corr calculation:

lengths: length1 , length2 …

dim1: length1 , length2 …

Format for the chain file (compatable with csv, dat, txt extensions): each row is a step, each column a dimension:

Step1: dim1 , dim2 , …, max_dim, param_status1, param_status2, …

Step2: dim1 , dim2 , …, max_dim, param_status1, param_status2, …

Statistics_filename : should be txt extension

checkpoint_file : This file saves the final position of all the chains, as well as other metadata, and can be loaded by the function <FUNCTION> to continue the chain from the point it left off. Not meant to be read by humans, the data order is custom to this software library. An empty string (“”) means no checkpoint will be saved. For developers, the contents are:

dimension, # of chains

temps of chains

Stepping widths of all chains

Final position of all chains

Parameters:
  • output – [out] Output chains, shape is double[chain_N, N_steps,dimension]

  • parameter_status – [out] Parameter status for each step corresponding to the output chains, shape is double[chain_N, N_steps,dimension]

  • max_dimension – maximum dimension of the parameter space being explored &#8212; only consideration is memory, as memory scales with dimension. Keep this reasonable, unless memory is REALLY not an issue

  • min_dimension – minimum dimension of the parameter space being explored >=1

  • N_steps – Number of total steps to be taken, per chain

  • chain_N – Number of chains

  • initial_pos – Initial position in parameter space - shape double[dimension]

  • initial_status – Initial status of the parameters in the initial position in parameter space - shape int[max_dim]

  • initial_model_status – Initial status of the parameters in the initial position in parameter space - shape int[max_dim]

  • seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[max_dimension] &#8212; initial seeding of zero corresponds to the dimension turned off initially

  • chain_temps – Double array of temperatures for the chains

  • swp_freq – the frequency with which chains are swapped

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int *param_status, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id

  • RJ_proposal – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • update_RJ_width – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output &#8212; if multiple cold chains, it will append each output to the other, and write out the total

  • auto_corr_filename – Filename to output auto correlation in some interval, if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void RJPTMCMC_MH(double ***output, int ***parameter_status, int **model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, double *initial_pos, int *initial_status, int initial_model_status, double *seeding_var, double **ensemble_initial_position, int **ensemble_initial_status, int *ensemble_initial_model_status, double *chain_temps, int swp_freq, double (*log_prior)(double *param, int *status, int model_status, mcmc_data_interface*, void *parameters), double (*log_likelihood)(double *param, int *status, int model_status, mcmc_data_interface*, void *parameters), void (*fisher)(double *param, int *status, int model_status, double **fisher, mcmc_data_interface*, void *parameters), void (*RJ_proposal)(double *current_param, double *proposed_param, int *current_status, int *proposed_status, int *current_model_status, int *proposed_model_status, double *MH_corrections, mcmc_data_interface*, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string auto_corr_filename, std::string likelihood_log_filename, std::string checkpoint_file)
Parameters:
  • output – [out] Output chains, shape is double[chain_N, N_steps,dimension]

  • parameter_status – [out] Parameter status for each step corresponding to the output chains, shape is double[chain_N, N_steps,dimension]

  • max_dimension – maximum dimension of the parameter space being explored &#8212; only consideration is memory, as memory scales with dimension. Keep this reasonable, unless memory is REALLY not an issue

  • min_dimension – minimum dimension of the parameter space being explored >=1

  • N_steps – Number of total steps to be taken, per chain

  • chain_N – Number of chains

  • initial_pos – Initial position in parameter space - shape double[dimension]

  • initial_status – Initial status of the parameters in the initial position in parameter space - shape int[max_dim]

  • initial_model_status – Initial status of the parameters in the initial position in parameter space - shape int[max_dim]

  • seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[max_dimension] &#8212; initial seeding of zero corresponds to the dimension turned off initially

  • chain_temps – Double array of temperatures for the chains

  • swp_freq – the frequency with which chains are swapped

  • RJ_proposal – std::function for the log_likelihood function &#8212; takes double *position, int *param_status,int dimension, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • auto_corr_filename – Filename to output auto correlation in some interval, if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void continue_PTMCMC_MH_dynamic_PT_alloc_uncorrelated(std::string checkpoint_file_start, mcmc_sampler_output *sampler_output, double **output, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
Parameters:
  • output – [out] Output, shape is double[N_steps,dimension]

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • chain_temps – Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • max_chunk_size – Maximum number of steps to take in a single sampler run

  • log_prior – Funcion pointer for the log_prior

  • log_likelihood – Function pointer for the log_likelihood

  • fisher – Function pointer for the fisher - if NULL, fisher steps are not used

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void continue_PTMCMC_MH_dynamic_PT_alloc_uncorrelated_internal(std::string checkpoint_file_start, mcmc_sampler_output *sampler_output, double **output, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)

Parallel tempered, dynamic chain allocation MCMC with output samples with specified maximum autocorrelation.

Runs dynamic chain allocation until the autocorrelation lengths stabilize, then it will run the PTMCMC routine repeatedly, periodically thinning the chain to ensure the auto-correlation length is below the threshold

Note: smallest batch size is .5*N_steps, so the target correlation length should be << less than this number

Note: This routine only works if the requested number of samples is larger than the typical correlation length of the unthinned chains. This is because the chains are run and tested in batches the size of the requested sample number.

Note: This method does NOT guarantee the final autocorrelation length of the chains will the be the target. It merely uses the requested autocorrelation length as a guide to thin the chains as samples are accrued and to estimate the total number of effective samples. Its best to request extra samples and thin the chains out at the end one final time, or to run multiple runs and combine the results at the end.

Parameters:
  • output – [out] Output shape is double[N_steps,dimension]

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • chain_temps – [out] Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • max_chunk_size – Maximum number of steps to take in a single sampler run

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int dimension, double **output_fisher, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void PTMCMC_MH_dynamic_PT_alloc_uncorrelated(mcmc_sampler_output *sampler_output, double **output, int dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
Parameters:
  • output – [out] Output, shape is double[N_steps,dimension]

  • dimension – dimension of the parameter space being explored

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • chain_N – Maximum number of chains to use

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • initial_pos – Initial position in parameter space - shape double[dimension]

  • seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]

  • chain_temps – Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • max_chunk_size – Maximum number of steps to take in a single sampler run

  • log_prior – Funcion pointer for the log_prior

  • log_likelihood – Function pointer for the log_likelihood

  • fisher – Function pointer for the fisher - if NULL, fisher steps are not used

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step &#8212; use empty string to skip

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void PTMCMC_MH_dynamic_PT_alloc_uncorrelated_internal(mcmc_sampler_output *sampler_output, double **output, int dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface *interface, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
void dynamic_temperature_internal(sampler *samplerptr, int N_steps, double nu, int t0, int swp_freq, int max_chain_N_thermo_ensemble, bool dynamic_chain_number, bool show_prog)
void continue_PTMCMC_MH_dynamic_PT_alloc_internal(std::string checkpoint_file_start, double ***output, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface *interface, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, bool dynamic_chain_number, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file, bool burn_phase)
void continue_PTMCMC_MH_dynamic_PT_alloc(std::string checkpoint_file_start, double ***output, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, std::string chain_distribution_scheme, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file)
Parameters:
  • output – [out] Output chains, shape is double[max_chain_N, N_steps,dimension]

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • chain_temps – Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • log_prior – Funcion pointer for the log_prior

  • log_likelihood – Function pointer for the log_likelihood

  • fisher – Function pointer for the fisher - if NULL, fisher steps are not used

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void PTMCMC_MH_dynamic_PT_alloc_internal(double ***output, int dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, int t0, int nu, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface *interface, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, bool dynamic_chain_number, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file, bool burn_phase)
void PTMCMC_MH_dynamic_PT_alloc(double ***output, int dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, int t0, int nu, std::string chain_distribution_scheme, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file)
Parameters:
  • output – [out] Output chains, shape is double[max_chain_N, N_steps,dimension]

  • dimension – dimension of the parameter space being explored

  • N_steps – Number of total steps to be taken, per chain AFTER chain allocation

  • chain_N – Maximum number of chains to use

  • max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)

  • initial_pos – Initial position in parameter space - shape double[dimension]

  • seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]

  • chain_temps – Final chain temperatures used &#8212; should be shape double[chain_N]

  • swp_freq – the frequency with which chains are swapped

  • t0 – Time constant of the decay of the chain dynamics (~1000)

  • nu – Initial amplitude of the dynamics (~100)

  • log_prior – Funcion pointer for the log_prior

  • log_likelihood – Function pointer for the log_likelihood

  • fisher – Function pointer for the fisher - if NULL, fisher steps are not used

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • checkpoint_file – Filename to output data for checkpoint, if empty string, not saved

void continue_PTMCMC_MH(std::string start_checkpoint_file, double ***output, int N_steps, int swp_freq, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string end_checkpoint_file, bool tune)
Parameters:
  • start_checkpoint_file – File for starting checkpoint

  • output – [out] output array, dimensions: output[chain_N][N_steps][dimension]

  • N_steps – Number of new steps to take

  • swp_freq – frequency of swap attempts between temperatures

  • log_prior – Funcion pointer for the log_prior

  • log_likelihood – Function pointer for the log_likelihood

  • fisher – Function pointer for the fisher - if NULL, fisher steps are not used

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use

  • pool – Boolean for whether to use deterministic'' vs stochastic’’ sampling

  • show_prog – Boolean for whether to show progress or not (turn off for cluster runs

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • end_checkpoint_file – Filename to output data for checkpoint at the end of the continued run, if empty string, not saved

void PTMCMC_MH_loop(sampler *samplerptr)

Internal function that runs the actual loop for the sampler.

void PTMCMC_MH_step_incremental(sampler *samplerptr, int increment)

Internal function that runs the actual loop for the sampler &#8212; increment version.

The regular loop function runs for the entire range, this increment version will only step `increment’ steps &#8212; asynchronous: steps are measured by the cold chains

void PTMCMC_MH(double ***output, int dimension, int N_steps, int chain_N, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_filename)
Parameters:
  • output – [out] Output chains, shape is double[chain_N, N_steps,dimension]

  • dimension – dimension of the parameter space being explored

  • N_steps – Number of total steps to be taken, per chain

  • chain_N – Number of chains

  • initial_pos – Initial position in parameter space - shape double[dimension]

  • seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]

  • chain_temps – Double array of temperatures for the chains

  • swp_freq – the frequency with which chains are swapped

  • log_prior – Funcion pointer for the log_prior

  • log_likelihood – Function pointer for the log_likelihood

  • fisher – Function pointer for the fisher - if NULL, fisher steps are not used

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • checkpoint_filename – Filename to output data for checkpoint, if empty string, not saved

void PTMCMC_MH_internal(double ***output, int dimension, int N_steps, int chain_N, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_filename, bool tune, bool burn_phase)

Generic sampler, where the likelihood, prior are parameters supplied by the user.

Base of the sampler, generic, with user supplied quantities for most of the samplers properties

Uses the Metropolis-Hastings method, with the option for Fisher/MALA steps if the Fisher routine is supplied.

3 modes to use -

single threaded (numThreads = 1) runs single threaded

multi-threaded `deterministic’ (numThreads>1 ; pool = false) progresses each chain in parallel for swp_freq steps, then waits for all threads to complete before swapping temperatures in sequenctial order (j, j+1) then (j+1, j+2) etc (sequenctially)

multi-threaded `stochastic’ (numThreads>2 ; pool = true) progresses each chain in parallel by queueing each temperature and evaluating them in the order they were submitted. Once finished, the threads are queued to swap, where they swapped in the order they are submitted. This means the chains are swapped randomly, and the chains do NOT finish at the same time. The sampler runs until the the 0th chain reaches the step number

Note on limits: In the prior function, if a set of parameters should be disallowed, return -std::numeric_limits<double>::infinity() &#8212; (this is in the <limits> file in std)

Format for the auto_corr file (compatable with csv, dat, txt extensions): each row is a dimension of the cold chain, with the first row being the lengths used for the auto-corr calculation:

lengths: length1 , length2 …

dim1: length1 , length2 …

Format for the chain file (compatable with csv, dat, txt extensions): each row is a step, each column a dimension:

Step1: dim1 , dim2 , …

Step2: dim1 , dim2 , …

Statistics_filename : should be txt extension

checkpoint_file : This file saves the final position of all the chains, as well as other metadata, and can be loaded by the function <FUNCTION> to continue the chain from the point it left off. Not meant to be read by humans, the data order is custom to this software library. An empty string (“”) means no checkpoint will be saved. For developers, the contents are:

dimension, # of chains

temps of chains

Stepping widths of all chains

Final position of all chains

Parameters:
  • output – [out] Output chains, shape is double[chain_N, N_steps,dimension]

  • dimension – dimension of the parameter space being explored

  • N_steps – Number of total steps to be taken, per chain

  • chain_N – Number of chains

  • initial_pos – Initial position in parameter space - shape double[dimension]

  • seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]

  • chain_temps – Double array of temperatures for the chains

  • swp_freq – the frequency with which chains are swapped

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int dimension, double **output_fisher, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use (=1 is single threaded)

  • pool – boolean to use stochastic chain swapping (MUST have >2 threads)

  • show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • checkpoint_filename – Filename to output data for checkpoint, if empty string, not saved

void continue_PTMCMC_MH_internal(sampler *samplerptr, std::string start_checkpoint_file, double ***output, int N_steps, int swp_freq, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string end_checkpoint_file, bool tune, bool burn_phase)

Routine to take a checkpoint file and begin a new chain at said checkpoint.

See MCMC_MH_internal for more details of parameters (pretty much all the same)

Parameters:
  • start_checkpoint_file – File for starting checkpoint

  • output – [out] output array, dimensions: output[chain_N][N_steps][dimension]

  • N_steps – Number of new steps to take

  • swp_freq – frequency of swap attempts between temperatures

  • log_prior – std::function for the log_prior function &#8212; takes double *position, int dimension, int chain_id

  • log_likelihood – std::function for the log_likelihood function &#8212; takes double *position, int dimension, int chain_id

  • fisher – std::function for the fisher function &#8212; takes double *position, int dimension, double **output_fisher, int chain_id

  • user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*

  • numThreads – Number of threads to use

  • pool – Boolean for whether to use deterministic'' vs stochastic’’ sampling

  • show_prog – Boolean for whether to show progress or not (turn off for cluster runs

  • statistics_filename – Filename to output sampling statistics, if empty string, not output

  • chain_filename – Filename to output data (chain 0 only), if empty string, not output

  • end_checkpoint_file – Filename to output data for checkpoint at the end of the continued run, if empty string, not saved

  • tune – Allow for dynamic tuning &#8212; technically should be turned off when drawing final samples

  • burn_phase – Passed to LL function for user information