Crash course on Bayesian statistics and MCMC algorithms
Question Answer(s)
Sorry, what was L in the integral part of the formula, again? Just pasting this here again: L refers to the likelihood of the data given the model
NA Another way of formulating this would be: what’s the probability of observing these data if my hypothesis is true?
way* Thanks! My brain is starting to warm up :wink:
why choose dbeta(1,1) instead of dunif(0,1)? They are identical, so you can choose either. And are there computational advantages to using either?
NA I think there is an advantage if you want to specify an informative prior, i.e. one with higher probability (more weight) around certain values: you can choose the parameters of the Beta to match a given mean and SD. We will go through this in class 5
NA I guess it is also due to the conjugate properties of combining a binomial likelihood with a Beta prior, giving you a known Beta posterior distribution. With MCMC we don't need this anymore of course, and we are much freer in choosing priors.
NA dbeta(1,1) = dunif(0,1), but the Beta distribution offers more flexibility for further use (informative priors, conjugacy), as said above.
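To make the equivalence concrete, here is a quick numeric check (sketched in Python with SciPy for illustration, since the course scripts are in R; in R the same check would compare dbeta(x, 1, 1) with dunif(x)):

```python
import numpy as np
from scipy.stats import beta, uniform

# Beta(1, 1) and Uniform(0, 1) have the same density everywhere on (0, 1)
x = np.linspace(0.01, 0.99, 50)
same = np.allclose(beta.pdf(x, a=1, b=1), uniform.pdf(x))
print(same)  # True

# Other Beta parameters encode an informative prior instead:
# Beta(2, 8) has mean a / (a + b) = 0.2, putting more weight near 0.2
print(beta.mean(a=2, b=8))  # 0.2
```

So computationally the two priors behave identically; the Beta only pays off once you want to move away from the flat (1, 1) case.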
ok, so there are no computational advantages? I'm just wondering. See thread under your previous question :slightly_smiling_face:
NA
NA Thank you !
A little late notice, but I needed to load the wesanderson package to make script 1 run. Thanks for pointing that out - it is a color palette package used for making some of the plots in the script ;-)
NA :+1:
NA For fans of Wes Anderson's (and Bill Murray's) movies :wink:
How do you know which algorithm to use? The "best" choice of algorithm depends on many factors, such as parameter properties, model structure, etc.
NA When using software like JAGS, the algorithm is "fixed", so you cannot change it. With nimble, this is different. We'll talk about sampler choice a bit in Lecture #8 tomorrow :smile:
So just to confirm we are looking for little correlation on the mixing and autocorrelation? Yes, we want little autocorrelation WITHIN each chain and little correlation AMONG all chains.
NA Great, thanks! And between the chains is tested with the ANOVA?
NA Sorry the R-hat
NA R-hat is a measure for both, not just between-chain correlation. Specifically, it measures the ratio of the total variability combining multiple chains (between-chain plus within-chain) to the within-chain variability.
NA Some more details about how it is calculated, and additional explanations, here:
NA perfect thank you!
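To make that variance ratio concrete, here is a minimal sketch of the classic Gelman-Rubin statistic (in Python/NumPy for illustration only; in practice you would use an established implementation such as coda::gelman.diag in R rather than rolling your own):

```python
import numpy as np

def gelman_rubin(chains):
    """Classic R-hat: sqrt of the ratio of the pooled variance estimate
    (within-chain plus between-chain) to the mean within-chain variance.
    chains: array of shape (n_chains, n_iterations)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_pooled = (n - 1) / n * W + B / n       # pooled estimate
    return np.sqrt(var_pooled / W)

rng = np.random.default_rng(42)
# 4 chains all sampling the same target: R-hat should be close to 1
mixed = rng.normal(0.0, 1.0, size=(4, 1000))
# Shift one chain far away (a "stray chain"): R-hat blows up
stuck = mixed + np.array([0.0, 0.0, 0.0, 5.0])[:, None]

print(round(gelman_rubin(mixed), 2))   # close to 1
print(gelman_rubin(stuck) > 1.1)       # True: flags non-convergence
```

When all chains explore the same distribution, the between-chain part adds almost nothing and R-hat is near 1; a chain stuck elsewhere inflates the pooled variance and pushes R-hat above the usual 1.1 rule of thumb.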
How many chains should be used to calculate the Gelman-Rubin R-hat? You can calculate R-hat with as few as 2 chains. Usually, we use 3-4 chains when developing/working with models.
A perhaps naïve question, just in case there's something more to it. Presumably new starting values do not really provide much in terms of chain convergence, unless the initial parameter(s) mean you get stuck in some area of the parameter space? If you have starting values that allow you to travel through the whole parameter space, I assume your starting point is a bit irrelevant? That's right. If there are no issues with convergence, then changing initial values should not affect the behaviour. However, if there are some problems (such as low information content in the data, weakly identifiable parameters, etc.), then the starting values can be influential.
NA As you say, in those cases, the starting values can "set the trajectory" towards one of several possible "solutions" and can thus affect the outcome.
NA That being said: if you have "good" starting values (i.e. starting values closer to truth) you will reach convergence faster :wink:
NA Running the model with several sets of random initial values is very useful to identify cases where we have local minima too
NA happening quite often in models with ‘complex’ structure, typically CMR models :wink:
NA It can also be a non-identifiability issue; in this case we also expect a small n.eff when looking at the model summary, and probably high overlap between the prior and posterior distributions. We will talk about this in class 5
NA Probably a naive question, but could this last case happen if you set priors that are too strong (for example with smaller data sets)?
Can there be more than one equilibrium distribution for the same parameter (e.g. one centred around 0.2 and the other around 0.8)? That can indeed happen. Common causes are: unaccounted-for variation (i.e. bi-/multi-modal posteriors), little information content in the data, and non-/weakly identifiable parameters.
NA In those cases, we often get poor mixing, and sometimes "stray chains", i.e. chains converging to alternative solutions.
NA So I assume there is a way to explore multi-modal posterior distributions using MCMC, nonetheless?
NA Assuming that the results "make sense", and are not due to poor parameters, lack of data etc
NA This can signal that there are local minima; in this case, running the model with several sets of random initial values can help make sure you reach convergence. If that does not help, maybe rethink the model structure/complexity with regard to the amount of data available, and/or use prior information if available
NA Another idea - if you have biological reasons to expect a bimodal distribution - is to use a mixture of two distributions (for example if you expect a sex effect but could not measure sex on your individuals during monitoring); then each mixture component should converge on a different parameter, and the traceplot should look fine for each mixture component
NA I would think that there are ways to explore multi-modal posteriors as well. This may - in fact - be useful as part of model optimization (see Sarah's response above). Maybe others can add something to this later.
NA I can see how traceplot would work. Nice to keep in mind that this might be a possibility! Thanks for the answers :slightly_smiling_face:
NA More about multiple modes in the posterior distribution in lecture 6 :wink:
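As a toy illustration of why several sets of random initial values matter with multimodal posteriors, here is a small random-walk Metropolis sketch (plain Python/NumPy, not the course's JAGS/nimble setup; the bimodal target and all numbers are made up for the example). With a small step size, each chain stays trapped near the mode it started from, which is exactly the situation that dispersed starting values help you detect:

```python
import numpy as np

def log_target(x):
    # Toy bimodal "posterior" with modes near 0.2 and 0.8
    s = 0.05
    return np.log(np.exp(-(x - 0.2)**2 / (2 * s**2)) +
                  np.exp(-(x - 0.8)**2 / (2 * s**2)))

def metropolis(start, n_iter, step, rng):
    """Random-walk Metropolis: propose x + Normal(0, step), accept with
    probability min(1, target(proposal) / target(current))."""
    x, samples = start, []
    for _ in range(n_iter):
        proposal = x + rng.normal(0, step)
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

rng = np.random.default_rng(1)
# Two chains, started near different modes, with steps too small to
# cross the low-density valley between them:
chain_a = metropolis(start=0.2, n_iter=5000, step=0.02, rng=rng)
chain_b = metropolis(start=0.8, n_iter=5000, step=0.02, rng=rng)
# Each chain looks well-behaved on its own, yet they disagree:
print(round(chain_a.mean(), 1), round(chain_b.mean(), 1))  # ~0.2 vs ~0.8
```

Each chain individually would pass a casual traceplot inspection; only comparing chains launched from dispersed starting points reveals the second mode.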
My graphs don't converge using the same code; am I doing something wrong? NA
Sorry, where are the scripts?
NA It's under the "Live demos" tab :)
it just so happens the posterior has values below 0.1 I was suspecting this... Thank you :smile:
I changed line 149 to have 10000 steps, and got this. I guess this is really poor mixing then :slightly_smiling_face: Doesn't look great! ;p
NA Did you change anything else?
NA That's my graph after changing to 5,000 iterations
I get something ok
NA
NA Can you send me the code?
NA That's really weird, have not changed the code! Will attach it here. Only change was line 149 and 195, as discussed.
NA
NA Don't lose your time on it; I just reran the code and got well-mixed results... hmm
NA and now I can't reproduce the weird plot I got above. Let's file it under mysteRies
NA Haha, the joy of reproducible research ;-)
NA Tiago, how did you solve it? I keep running it and getting the same problem!
NA Feel free to send me the code that reproduces the problem, I'll have a look.
NA I didn't change the code in your 1_demo... Do you still want me to send it over?
NA Yes, please. I'd really like to figure out what happens as several of you got the same issue.
Yeah Katharine, I still get the same model from the html code... Oh bother :disappointed: (I think I'm just replying to you here). I used ctrl+shift+F10 between the two, if that might help?
NA The two aren't different for me?
It can be solved when the section from set.seed() until the end of the code section (for example L151:L224 or L233:L266) is selected and run all together. For me at least (R 4.0.5). I think that's the same for me too!
Yes, it runs perfectly with the whole code, but not when running it step by step... What's the problem?
NA running all together...
One quick question about correlations when assessing convergence. The bottom plots show correlation between successive values of the chain. So the bottom-left graph shows high correlation. Is it because each bar shows a correlation, so the more bars you have, the more values were correlated? Is this the right way of reading this graph? The y-axis shows the amount of correlation and the x-axis the number of steps away in the chain (the lag). We would expect a value to be more correlated with values closer to it in the chain, and we want to see this correlation reduce relatively quickly, like in the middle plot. Hope that makes sense :simple_smile: And someone please correct me if that's not quite right
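For intuition about what each bar in those autocorrelation plots measures, here is a small sketch (Python/NumPy for illustration; in R this is what acf() computes and plots). The lag-k autocorrelation of a well-mixed chain drops to near zero quickly, while a sticky chain stays correlated many steps away:

```python
import numpy as np

def autocorr(chain, lag):
    """Sample autocorrelation at a given lag: one bar of the ACF plot."""
    c = chain - chain.mean()
    return np.dot(c[:-lag], c[lag:]) / np.dot(c, c)

rng = np.random.default_rng(0)

# A well-mixed chain: independent draws, autocorrelation ~0 at any lag > 0
good = rng.normal(size=5000)

# A poorly mixing chain: each value is 0.95 times the previous one plus
# noise (an AR(1) process), so values stay correlated many steps apart
poor = np.empty(5000)
poor[0] = 0.0
for t in range(1, 5000):
    poor[t] = 0.95 * poor[t - 1] + rng.normal()

print(round(autocorr(good, 10), 2))   # near 0
print(autocorr(poor, 10) > 0.4)       # still clearly correlated at lag 10
```

So a plot full of tall bars at large lags (like the bottom-left panel described above) means each draw carries little new information, which is why high autocorrelation goes hand in hand with a small effective sample size.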