mcmc_sampler
Header file for mcmc_sampler
Functions
-
void mcmc_step_threaded(int j)
-
void mcmc_swap_threaded(int i, int j)
-
void RJPTMCMC_MH_dynamic_PT_alloc_comprehensive_internal_driver(mcmc_sampler_output *sampler_output, double **output, int **status, int *model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
Driver routine for the uncorrelated sampler — trying not to repeat code.
Assumed that the checkpoint_file has already been populated —
It will overwrite all the file paths
- Parameters:
max_dimension – dimension of the parameter space being explored
min_dimension – dimension of the parameter space being explored
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
chain_N – Maximum number of chains to use
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
max_chunk_size – Maximum number of steps to take in a single sampler run
log_prior – std::function for the log_prior function — takes double *position, int *param_status, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id
RJ_proposal – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void RJPTMCMC_MH_dynamic_PT_alloc_comprehensive_internal(mcmc_sampler_output *sampler_output, double **output, int **parameter_status, int *model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, int *initial_status, int initial_model_status, double *seeding_var, double **ensemble_initial_pos, int **ensemble_initial_status, int *ensemble_initial_model_status, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
Parallel tempered, dynamic chain allocation MCMC. Reversible Jump version, which doesn’t currently have any autocorrelation calculations currently.
Runs dynamic chain allocation, then it will run the RJPTMCMC routine repeatedly
- Parameters:
output – [out] Output shape is double[N_steps,dimension]
parameter_status – [out] Output shape is int[N_steps,dimension]
max_dimension – maximum dimension of the parameter space being explored
min_dimension – minimum dimension of the parameter space being explored
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
chain_N – Maximum number of chains to use
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
initial_pos – Initial position in parameter space - shape double[max_dimension]
initial_status – Initial status in parameter space - shape int[max_dimension]
initial_model_status – Initial model status in parameter space - shape int[nested_model_number]
seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]
chain_temps – [out] Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
max_chunk_size – Maximum number of steps to take in a single sampler run
log_prior – std::function for the log_prior function — takes double *position, int *param_status, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id
RJ_proposal – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void RJPTMCMC_MH_dynamic_PT_alloc_comprehensive(mcmc_sampler_output *sampler_output, double **output, int **parameter_status, int *model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, int *initial_status, int initial_model_status, double *seeding_var, double **ensemble_initial_pos, int **ensemble_initial_status, int *ensemble_initial_model_status, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, double (*log_prior)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, int *status, int model_status, double **fisher, mcmc_data_interface *interface, void *parameters), void (*RJ_proposal)(double *current_param, double *proposed_param, int *current_status, int *proposed_status, int *current_model_status, int *proposed_model_status, double *MH_corrections, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
-
void continue_RJPTMCMC_MH_dynamic_PT_alloc_comprehensive_internal(std::string checkpoint_file_start, mcmc_sampler_output *sampler_output, double **output, int **parameter_status, int *model_status, int nested_model_number, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
Parallel tempered, dynamic chain allocation MCMC. Reversible Jump version, which doesn’t currently have any autocorrelation calculations currently.
Runs dynamic chain allocation, then it will run the RJPTMCMC routine repeatedly
- Parameters:
output – [out] Output shape is double[N_steps,dimension]
parameter_status – [out] Output shape is int[N_steps,dimension]
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
chain_temps – [out] Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
max_chunk_size – Maximum number of steps to take in a single sampler run
log_prior – std::function for the log_prior function — takes double *position, int *param_status, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id
RJ_proposal – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void continue_RJPTMCMC_MH_dynamic_PT_alloc_comprehensive(std::string checkpoint_file_start, mcmc_sampler_output *sampler_output, double **output, int **parameter_status, int *model_status, int nested_model_number, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, int max_chunk_size, std::string chain_distribution_scheme, double (*log_prior)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, int *status, int model_status, double **fisher, mcmc_data_interface *interface, void *parameters), void (*RJ_proposal)(double *current_param, double *proposed_param, int *current_status, int *proposed_status, int *current_model_status, int *proposed_model_status, double *MH_corrections, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string likelihood_log_filename, std::string checkpoint_file)
-
void dynamic_temperature_full_ensemble_internal(sampler *samplerptr, int N_steps, double nu, int t0, int swp_freq, bool show_prog)
-
void continue_PTMCMC_MH_dynamic_PT_alloc_full_ensemble_internal(std::string checkpoint_file_start, double ***output, int N_steps, double *chain_temps, int swp_freq, int t0, int nu, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file, bool burn_phase)
Continue dyanmically tunes an MCMC for optimal spacing. step width, and chain number.
NOTE: nu, and t0 parameters determine the dynamics, so these are important quantities. nu is related to how many swap attempts it takes to substantially change the temperature ladder, why t0 determines the length of the total dyanimcally period. Moderate initial choices would be 10 and 1000, respectively.
Based on arXiv:1501.05823v3
Currently, Chain number is fixed
max_chain_N_thermo_ensemble sets the maximium number of chains to use to in successively hotter chains to cover the likelihood surface while targeting an optimal swap acceptance target_swp_acc.
max_chain_N determines the total number of chains to run once thermodynamic equilibrium has been reached. This results in chains being added after the initial PT dynamics have finished according to chain_distribution_scheme.
If no preference, set max_chain_N_thermo_ensemble = max_chain_N = numThreads = (number of cores (number of threads if hyperthreaded))— this will most likely be the most optimal configuration. If the number of cores on the system is low, you may want to use n*numThreads for some integer n instead, depending on the system.
chain_distribution_scheme:
“cold”: All chains are added at T=1 (untempered)
“refine”: Chains are added between the optimal temps geometrically — this may be a good option as it will be a good approximation of the ideal distribution of chains, while keeping the initial dynamical time low
“double”: Chains are added in order of rising temperature that mimic the distribution achieved by the earier PT dynamics
“half_ensemble”: For every cold chain added, half of the ensemble is added again. Effectively, two cold chains for every ensemble
- Parameters:
output – [out] Output chains, shape is double[max_chain_N, N_steps,dimension]
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
chain_temps – [out] Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
log_prior – std::function for the log_prior function — takes double *position, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int dimension, double **output_fisher, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void continue_RJPTMCMC_MH_dynamic_PT_alloc_full_ensemble_internal(std::string checkpoint_file_start, double ***output, int ***status, int **model_status, int nested_model_number, int N_steps, double *chain_temps, int swp_freq, int t0, int nu, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file, bool burn_phase)
Continue dyanmically tunes an MCMC for optimal spacing. step width, and chain number.
NOTE: nu, and t0 parameters determine the dynamics, so these are important quantities. nu is related to how many swap attempts it takes to substantially change the temperature ladder, why t0 determines the length of the total dyanimcally period. Moderate initial choices would be 10 and 1000, respectively.
Based on arXiv:1501.05823v3
Currently, Chain number is fixed
max_chain_N_thermo_ensemble sets the maximium number of chains to use to in successively hotter chains to cover the likelihood surface while targeting an optimal swap acceptance target_swp_acc.
max_chain_N determines the total number of chains to run once thermodynamic equilibrium has been reached. This results in chains being added after the initial PT dynamics have finished according to chain_distribution_scheme.
If no preference, set max_chain_N_thermo_ensemble = max_chain_N = numThreads = (number of cores (number of threads if hyperthreaded))— this will most likely be the most optimal configuration. If the number of cores on the system is low, you may want to use n*numThreads for some integer n instead, depending on the system.
chain_distribution_scheme:
“cold”: All chains are added at T=1 (untempered)
“refine”: Chains are added between the optimal temps geometrically — this may be a good option as it will be a good approximation of the ideal distribution of chains, while keeping the initial dynamical time low
“double”: Chains are added in order of rising temperature that mimic the distribution achieved by the earier PT dynamics
“half_ensemble”: For every cold chain added, half of the ensemble is added again. Effectively, two cold chains for every ensemble
- Parameters:
output – [out] Output chains, shape is double[max_chain_N, N_steps,dimension]
status – [out] output parameter status array, dimensions: status[chain_N][N_steps][dimension]
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
chain_temps – [out] Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
log_prior – std::function for the log_prior function — takes double *position, int *param_status, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id
RJ_proposal – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void continue_RJPTMCMC_MH(std::string start_checkpoint_file, double ***output, int ***status, int **model_status, int nested_model_number, int N_steps, int swp_freq, double (*log_prior)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, int *status, int model_status, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, int *status, int model_status, double **fisher, mcmc_data_interface *interface, void *parameters), void (*RJ_proposal)(double *current_param, double *proposed_param, int *current_status, int *proposed_status, int *current_model_status, int *proposed_model_status, double *MH_corrections, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string auto_corr_filename, std::string likelihood_log_filename, std::string end_checkpoint_file)
- Parameters:
start_checkpoint_file – File for starting checkpoint
output – [out] output array, dimensions: output[chain_N][N_steps][dimension]
status – [out] output parameter status array, dimensions: status[chain_N][N_steps][dimension]
N_steps – Number of new steps to take
swp_freq – frequency of swap attempts between temperatures
RJ_proposal – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use
pool – Boolean for whether to use
deterministic'' vs
stochastic’’ samplingshow_prog – Boolean for whether to show progress or not (turn off for cluster runs
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
auto_corr_filename – Filename to output auto correlation in some interval, if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
end_checkpoint_file – Filename to output data for checkpoint at the end of the continued run, if empty string, not saved
Driver routine for the uncorrelated sampler — trying not to repeat code.
Assumed that the checkpoint_file has already been populated —
It will overwrite all the file paths
- Parameters:
dimension – dimension of the parameter space being explored
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
chain_N – Maximum number of chains to use
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
max_chunk_size – Maximum number of steps to take in a single sampler run
log_prior – std::function for the log_prior function — takes double *position, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int dimension, double **output_fisher, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void continue_RJPTMCMC_MH_internal(sampler *samplerptr, std::string start_checkpoint_file, double ***output, int ***status, int **model_status, int nested_model_number, int N_steps, int swp_freq, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string auto_corr_filename, std::string likelihood_log_filename, std::string end_checkpoint_file, bool tune, bool burn_phase)
Routine to take a checkpoint file and begin a new chain at said checkpoint.
See MCMC_MH_internal for more details of parameters (pretty much all the same)
- Parameters:
start_checkpoint_file – File for starting checkpoint
output – [out] output array, dimensions: output[chain_N][N_steps][dimension]
status – [out] output parameter status array, dimensions: status[chain_N][N_steps][dimension]
N_steps – Number of new steps to take
swp_freq – frequency of swap attempts between temperatures
log_prior – std::function for the log_prior function — takes double *position, int *param_status, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id
RJ_proposal – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use
pool – Boolean for whether to use
deterministic'' vs
stochastic’’ samplingshow_prog – Boolean for whether to show progress or not (turn off for cluster runs
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output — if multiple cold chains, it will append each output to the other, and write out the total
auto_corr_filename – Filename to output auto correlation in some interval, if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
end_checkpoint_file – Filename to output data for checkpoint at the end of the continued run, if empty string, not saved
-
void RJPTMCMC_MH_internal(double ***output, int ***parameter_status, int **model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, double *initial_pos, int *initial_status, int initial_model_status, double *seeding_var, double **ensemble_initial_position, int **ensemble_initial_status, int *ensemble_initial_model_status, double *chain_temps, int swp_freq, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, std::function<void(double*, double*, int*, int*, int*, int*, double*, mcmc_data_interface*, void*)> RJ_proposal, void **user_parameters, int numThreads, bool pool, bool show_prog, bool update_RJ_width, std::string statistics_filename, std::string chain_filename, std::string auto_corr_filename, std::string likelihood_log_filename, std::string checkpoint_file)
Generic reversable jump sampler, where the likelihood, prior, and reversable jump proposal are parameters supplied by the user.
Note: Using a min_dimension tells the sampler that there is a
base model'', and that the dimensions from min_dim to max_dim are
small’’ corrections to that model. This helps inform some of the proposal algorithms and speeds up computation. If using discrete models with no overlap of variables (ie model A or model B), set min_dim to 0. Even if reusing certain parameters, if the extra dimensions don’t describesmall'' deviations, it's probably best to set min_dim to 0.Since the RJ proposal is user specified, even if there are parameters that should never be removed, it's up to the user to dictate that. Using min_dim will not affect that aspect of the sampler. If there's a
base-model’’, the fisher function should produce a fisher matrix for the base model only. The modifications are then normally distributed around the last parameter value. Then the fisher function should take the minimum dimension instead of the maximum, like the other functions.Currently, no dynamic PT option, as it would be too many free parameters for the sampler to converge to a reasonable temperature distribution in a reasonable amount of time. Best use case, use the PTMCMC_MH_dynamic for the `base’ dimension space, and use that temperature ladder.
Base of the sampler, generic, with user supplied quantities for most of the samplers properties
Uses the Metropolis-Hastings method, with the option for Fisher/MALA steps if the Fisher routine is supplied.
3 modes to use -
single threaded (numThreads = 1) runs single threaded
multi-threaded `deterministic’ (numThreads>1 ; pool = false) progresses each chain in parallel for swp_freq steps, then waits for all threads to complete before swapping temperatures in sequenctial order (j, j+1) then (j+1, j+2) etc (sequenctially)
multi-threaded `stochastic’ (numThreads>2 ; pool = true) progresses each chain in parallel by queueing each temperature and evaluating them in the order they were submitted. Once finished, the threads are queued to swap, where they swapped in the order they are submitted. This means the chains are swapped randomly, and the chains do NOT finish at the same time. The sampler runs until the the 0th chain reaches the step number
Note on limits: In the prior function, if a set of parameters should be disallowed, return -std::numeric_limits<double>::infinity() — (this is in the <limits> file in std)
The parameter array uses the dimensions [0,min_dim] always, and [min_dim, max_dim] in RJPTMCMC fashion
Format for the auto_corr file (compatable with csv, dat, txt extensions): each row is a dimension of the cold chain, with the first row being the lengths used for the auto-corr calculation:
lengths: length1 , length2 …
dim1: length1 , length2 …
Format for the chain file (compatable with csv, dat, txt extensions): each row is a step, each column a dimension:
Step1: dim1 , dim2 , …, max_dim, param_status1, param_status2, …
Step2: dim1 , dim2 , …, max_dim, param_status1, param_status2, …
Statistics_filename : should be txt extension
checkpoint_file : This file saves the final position of all the chains, as well as other metadata, and can be loaded by the function <FUNCTION> to continue the chain from the point it left off. Not meant to be read by humans, the data order is custom to this software library. An empty string (“”) means no checkpoint will be saved. For developers, the contents are:
dimension, # of chains
temps of chains
Stepping widths of all chains
Final position of all chains
- Parameters:
output – [out] Output chains, shape is double[chain_N, N_steps,dimension]
parameter_status – [out] Parameter status for each step corresponding to the output chains, shape is double[chain_N, N_steps,dimension]
max_dimension – maximum dimension of the parameter space being explored — only consideration is memory, as memory scales with dimension. Keep this reasonable, unless memory is REALLY not an issue
min_dimension – minimum dimension of the parameter space being explored >=1
N_steps – Number of total steps to be taken, per chain
chain_N – Number of chains
initial_pos – Initial position in parameter space - shape double[dimension]
initial_status – Initial status of the parameters in the initial position in parameter space - shape int[max_dim]
initial_model_status – Initial status of the parameters in the initial position in parameter space - shape int[max_dim]
seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[max_dimension] — initial seeding of zero corresponds to the dimension turned off initially
chain_temps – Double array of temperatures for the chains
swp_freq – the frequency with which chains are swapped
log_prior – std::function for the log_prior function — takes double *position, int *param_status, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int *param_status,int dimension, double **output_fisher, int chain_id
RJ_proposal – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
update_RJ_width – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output — if multiple cold chains, it will append each output to the other, and write out the total
auto_corr_filename – Filename to output auto correlation in some interval, if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void RJPTMCMC_MH(double ***output, int ***parameter_status, int **model_status, int nested_model_number, int max_dimension, int min_dimension, int N_steps, int chain_N, double *initial_pos, int *initial_status, int initial_model_status, double *seeding_var, double **ensemble_initial_position, int **ensemble_initial_status, int *ensemble_initial_model_status, double *chain_temps, int swp_freq, double (*log_prior)(double *param, int *status, int model_status, mcmc_data_interface*, void *parameters), double (*log_likelihood)(double *param, int *status, int model_status, mcmc_data_interface*, void *parameters), void (*fisher)(double *param, int *status, int model_status, double **fisher, mcmc_data_interface*, void *parameters), void (*RJ_proposal)(double *current_param, double *proposed_param, int *current_status, int *proposed_status, int *current_model_status, int *proposed_model_status, double *MH_corrections, mcmc_data_interface*, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string auto_corr_filename, std::string likelihood_log_filename, std::string checkpoint_file)
- Parameters:
output – [out] Output chains, shape is double[chain_N, N_steps,dimension]
parameter_status – [out] Parameter status for each step corresponding to the output chains, shape is double[chain_N, N_steps,dimension]
max_dimension – maximum dimension of the parameter space being explored — only consideration is memory, as memory scales with dimension. Keep this reasonable, unless memory is REALLY not an issue
min_dimension – minimum dimension of the parameter space being explored >=1
N_steps – Number of total steps to be taken, per chain
chain_N – Number of chains
initial_pos – Initial position in parameter space - shape double[dimension]
initial_status – Initial status of the parameters in the initial position in parameter space - shape int[max_dim]
initial_model_status – Initial status of the parameters in the initial position in parameter space - shape int[max_dim]
seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[max_dimension] — initial seeding of zero corresponds to the dimension turned off initially
chain_temps – Double array of temperatures for the chains
swp_freq – the frequency with which chains are swapped
RJ_proposal – std::function for the log_likelihood function — takes double *position, int *param_status,int dimension, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
auto_corr_filename – Filename to output auto correlation in some interval, if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
- Parameters:
output – [out] Output, shape is double[N_steps,dimension]
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
chain_temps – Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
max_chunk_size – Maximum number of steps to take in a single sampler run
log_prior – Funcion pointer for the log_prior
log_likelihood – Function pointer for the log_likelihood
fisher – Function pointer for the fisher - if NULL, fisher steps are not used
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
Parallel tempered, dynamic chain allocation MCMC with output samples with specified maximum autocorrelation.
Runs dynamic chain allocation until the autocorrelation lengths stabilize, then it will run the PTMCMC routine repeatedly, periodically thinning the chain to ensure the auto-correlation length is below the threshold
Note: smallest batch size is .5*N_steps, so the target correlation length should be << less than this number
Note: This routine only works if the requested number of samples is larger than the typical correlation length of the unthinned chains. This is because the chains are run and tested in batches the size of the requested sample number.
Note: This method does NOT guarantee the final autocorrelation length of the chains will the be the target. It merely uses the requested autocorrelation length as a guide to thin the chains as samples are accrued and to estimate the total number of effective samples. Its best to request extra samples and thin the chains out at the end one final time, or to run multiple runs and combine the results at the end.
- Parameters:
output – [out] Output shape is double[N_steps,dimension]
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
chain_temps – [out] Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
max_chunk_size – Maximum number of steps to take in a single sampler run
log_prior – std::function for the log_prior function — takes double *position, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int dimension, double **output_fisher, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
- Parameters:
output – [out] Output, shape is double[N_steps,dimension]
dimension – dimension of the parameter space being explored
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
chain_N – Maximum number of chains to use
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
initial_pos – Initial position in parameter space - shape double[dimension]
seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]
chain_temps – Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
max_chunk_size – Maximum number of steps to take in a single sampler run
log_prior – Funcion pointer for the log_prior
log_likelihood – Function pointer for the log_likelihood
fisher – Function pointer for the fisher - if NULL, fisher steps are not used
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
likelihood_log_filename – Filename to write the log_likelihood and log_prior at each step — use empty string to skip
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void dynamic_temperature_internal(sampler *samplerptr, int N_steps, double nu, int t0, int swp_freq, int max_chain_N_thermo_ensemble, bool dynamic_chain_number, bool show_prog)
-
void continue_PTMCMC_MH_dynamic_PT_alloc_internal(std::string checkpoint_file_start, double ***output, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface *interface, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, bool dynamic_chain_number, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file, bool burn_phase)
-
void continue_PTMCMC_MH_dynamic_PT_alloc(std::string checkpoint_file_start, double ***output, int N_steps, int max_chain_N_thermo_ensemble, double *chain_temps, int swp_freq, int t0, int nu, std::string chain_distribution_scheme, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file)
- Parameters:
output – [out] Output chains, shape is double[max_chain_N, N_steps,dimension]
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
chain_temps – Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
log_prior – Funcion pointer for the log_prior
log_likelihood – Function pointer for the log_likelihood
fisher – Function pointer for the fisher - if NULL, fisher steps are not used
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void PTMCMC_MH_dynamic_PT_alloc_internal(double ***output, int dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, int t0, int nu, std::string chain_distribution_scheme, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface *interface, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface *interface, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, bool dynamic_chain_number, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file, bool burn_phase)
-
void PTMCMC_MH_dynamic_PT_alloc(double ***output, int dimension, int N_steps, int chain_N, int max_chain_N_thermo_ensemble, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, int t0, int nu, std::string chain_distribution_scheme, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_file)
- Parameters:
output – [out] Output chains, shape is double[max_chain_N, N_steps,dimension]
dimension – dimension of the parameter space being explored
N_steps – Number of total steps to be taken, per chain AFTER chain allocation
chain_N – Maximum number of chains to use
max_chain_N_thermo_ensemble – Maximum number of chains to use in the thermodynamic ensemble (may use less)
initial_pos – Initial position in parameter space - shape double[dimension]
seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]
chain_temps – Final chain temperatures used — should be shape double[chain_N]
swp_freq – the frequency with which chains are swapped
t0 – Time constant of the decay of the chain dynamics (~1000)
nu – Initial amplitude of the dynamics (~100)
log_prior – Funcion pointer for the log_prior
log_likelihood – Function pointer for the log_likelihood
fisher – Function pointer for the fisher - if NULL, fisher steps are not used
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
checkpoint_file – Filename to output data for checkpoint, if empty string, not saved
-
void continue_PTMCMC_MH(std::string start_checkpoint_file, double ***output, int N_steps, int swp_freq, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string end_checkpoint_file, bool tune)
- Parameters:
start_checkpoint_file – File for starting checkpoint
output – [out] output array, dimensions: output[chain_N][N_steps][dimension]
N_steps – Number of new steps to take
swp_freq – frequency of swap attempts between temperatures
log_prior – Funcion pointer for the log_prior
log_likelihood – Function pointer for the log_likelihood
fisher – Function pointer for the fisher - if NULL, fisher steps are not used
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use
pool – Boolean for whether to use
deterministic'' vs
stochastic’’ samplingshow_prog – Boolean for whether to show progress or not (turn off for cluster runs
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
end_checkpoint_file – Filename to output data for checkpoint at the end of the continued run, if empty string, not saved
-
void PTMCMC_MH_loop(sampler *samplerptr)
Internal function that runs the actual loop for the sampler.
-
void PTMCMC_MH_step_incremental(sampler *samplerptr, int increment)
Internal function that runs the actual loop for the sampler — increment version.
The regular loop function runs for the entire range, this increment version will only step `increment’ steps — asynchronous: steps are measured by the cold chains
-
void PTMCMC_MH(double ***output, int dimension, int N_steps, int chain_N, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, double (*log_prior)(double *param, mcmc_data_interface *interface, void *parameters), double (*log_likelihood)(double *param, mcmc_data_interface *interface, void *parameters), void (*fisher)(double *param, double **fisher, mcmc_data_interface *interface, void *parameters), void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_filename)
- Parameters:
output – [out] Output chains, shape is double[chain_N, N_steps,dimension]
dimension – dimension of the parameter space being explored
N_steps – Number of total steps to be taken, per chain
chain_N – Number of chains
initial_pos – Initial position in parameter space - shape double[dimension]
seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]
chain_temps – Double array of temperatures for the chains
swp_freq – the frequency with which chains are swapped
log_prior – Funcion pointer for the log_prior
log_likelihood – Function pointer for the log_likelihood
fisher – Function pointer for the fisher - if NULL, fisher steps are not used
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
checkpoint_filename – Filename to output data for checkpoint, if empty string, not saved
-
void PTMCMC_MH_internal(double ***output, int dimension, int N_steps, int chain_N, double *initial_pos, double *seeding_var, double **ensemble_initial_position, double *chain_temps, int swp_freq, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string checkpoint_filename, bool tune, bool burn_phase)
Generic sampler, where the likelihood, prior are parameters supplied by the user.
Base of the sampler, generic, with user supplied quantities for most of the samplers properties
Uses the Metropolis-Hastings method, with the option for Fisher/MALA steps if the Fisher routine is supplied.
3 modes to use -
single threaded (numThreads = 1) runs single threaded
multi-threaded `deterministic’ (numThreads>1 ; pool = false) progresses each chain in parallel for swp_freq steps, then waits for all threads to complete before swapping temperatures in sequenctial order (j, j+1) then (j+1, j+2) etc (sequenctially)
multi-threaded `stochastic’ (numThreads>2 ; pool = true) progresses each chain in parallel by queueing each temperature and evaluating them in the order they were submitted. Once finished, the threads are queued to swap, where they swapped in the order they are submitted. This means the chains are swapped randomly, and the chains do NOT finish at the same time. The sampler runs until the the 0th chain reaches the step number
Note on limits: In the prior function, if a set of parameters should be disallowed, return -std::numeric_limits<double>::infinity() — (this is in the <limits> file in std)
Format for the auto_corr file (compatable with csv, dat, txt extensions): each row is a dimension of the cold chain, with the first row being the lengths used for the auto-corr calculation:
lengths: length1 , length2 …
dim1: length1 , length2 …
Format for the chain file (compatable with csv, dat, txt extensions): each row is a step, each column a dimension:
Step1: dim1 , dim2 , …
Step2: dim1 , dim2 , …
Statistics_filename : should be txt extension
checkpoint_file : This file saves the final position of all the chains, as well as other metadata, and can be loaded by the function <FUNCTION> to continue the chain from the point it left off. Not meant to be read by humans, the data order is custom to this software library. An empty string (“”) means no checkpoint will be saved. For developers, the contents are:
dimension, # of chains
temps of chains
Stepping widths of all chains
Final position of all chains
- Parameters:
output – [out] Output chains, shape is double[chain_N, N_steps,dimension]
dimension – dimension of the parameter space being explored
N_steps – Number of total steps to be taken, per chain
chain_N – Number of chains
initial_pos – Initial position in parameter space - shape double[dimension]
seeding_var – Variance of the normal distribution used to seed each chain higher than 0 - shape double[dimension]
chain_temps – Double array of temperatures for the chains
swp_freq – the frequency with which chains are swapped
log_prior – std::function for the log_prior function — takes double *position, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int dimension, double **output_fisher, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use (=1 is single threaded)
pool – boolean to use stochastic chain swapping (MUST have >2 threads)
show_prog – boolean whether to print out progress (for example, should be set to `false’ if submitting to a cluster)
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
checkpoint_filename – Filename to output data for checkpoint, if empty string, not saved
-
void continue_PTMCMC_MH_internal(sampler *samplerptr, std::string start_checkpoint_file, double ***output, int N_steps, int swp_freq, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_prior, std::function<double(double*, int*, int, mcmc_data_interface*, void*)> log_likelihood, std::function<void(double*, int*, int, double**, mcmc_data_interface*, void*)> fisher, void **user_parameters, int numThreads, bool pool, bool show_prog, std::string statistics_filename, std::string chain_filename, std::string end_checkpoint_file, bool tune, bool burn_phase)
Routine to take a checkpoint file and begin a new chain at said checkpoint.
See MCMC_MH_internal for more details of parameters (pretty much all the same)
- Parameters:
start_checkpoint_file – File for starting checkpoint
output – [out] output array, dimensions: output[chain_N][N_steps][dimension]
N_steps – Number of new steps to take
swp_freq – frequency of swap attempts between temperatures
log_prior – std::function for the log_prior function — takes double *position, int dimension, int chain_id
log_likelihood – std::function for the log_likelihood function — takes double *position, int dimension, int chain_id
fisher – std::function for the fisher function — takes double *position, int dimension, double **output_fisher, int chain_id
user_parameters – Void pointer to any parameters the user may need inside log_prior, log_likelihood, or fisher. Should have one pointer for each chain. If this isn’t needed, use (void**) NULL*
numThreads – Number of threads to use
pool – Boolean for whether to use
deterministic'' vs
stochastic’’ samplingshow_prog – Boolean for whether to show progress or not (turn off for cluster runs
statistics_filename – Filename to output sampling statistics, if empty string, not output
chain_filename – Filename to output data (chain 0 only), if empty string, not output
end_checkpoint_file – Filename to output data for checkpoint at the end of the continued run, if empty string, not saved
tune – Allow for dynamic tuning — technically should be turned off when drawing final samples
burn_phase – Passed to LL function for user information