Statistics - Michael J Kerrison

# Unsorted

- "Hypothesis tests under separation": https://osf.io/preprints/socarxiv/bmvnu
- http://freerangestats.info/blog/2023/07/30/log-transforms
- Really good: implications of repeated hypothesis tests as you go through an experiment, in a frequentist framework - https://vgherard.github.io/posts/2023-07-24-ab-tests-and-repeated-checks/
- Practical example of bringing "machine learning" to prediction/inference problems: https://datageeek.com/2023/07/26/understanding-the-effect-of-subsidies-on-agriculture-with-bagged-neural-networks/
- Ordinal models for paired data: https://www.fharrell.com/post/pair/index.html
- Interesting but maybe needs some vetting for interpretations...? https://win-vector.com/2023/08/18/omitted-variable-effects-in-logistic-regression/
- Good classification write-up: https://tylerburleigh.com/blog/tidy-tuesday-spam-mail/
- Interesting article on WW2 air warfare: https://scweiss.blogspot.com/2023/08/blood-then-oil-techies-perspective-on.html
- https://www.linkedin.com/posts/towards-data-science_understanding-kolmogorov-smirnov-ks-tests-activity-7040314275927650304-J722
- On the reliability of published findings using the regression discontinuity design in political science - https://journals.sagepub.com/doi/full/10.1177/20531680231166457
  - Abstract: The regression discontinuity (RD) design offers identification of causal effects under weak assumptions, earning it a position as a standard method in modern political science research. But identification does not necessarily imply that causal effects can be estimated accurately with limited data. In this paper, we highlight that estimation under the RD design involves serious statistical challenges and investigate how these challenges manifest themselves in the empirical literature in political science. We collect all RD-based findings published in top political science journals in the period 2009–2018. The distribution of published results exhibits pathological features; estimates tend to bunch just above the conventional level of statistical significance. A reanalysis of all studies with available data suggests that researcher discretion is not a major driver of these features. However, researchers tend to use inappropriate methods for inference, rendering standard errors artificially small. A retrospective power analysis reveals that most of these studies were underpowered to detect all but large effects. The issues we uncover, combined with well-documented selection pressures in academic publishing, cause concern that many published findings using the RD design may be exaggerated.
- Contra matching:
  - https://blogs.worldbank.org/impactevaluations/what-do-you-need-do-make-matching-estimator-convincing-rhetorical-vs-statistical
  - References Doleac's commentary as well
- Election modelling / Nate Silver / Nassim Taleb: https://quant.stackexchange.com/questions/47054/can-someone-explain-rigorously-talebs-criticism-of-nate-silvers-election-forec#59293
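
Side note on the repeated-hypothesis-tests link above: a rough simulation sketch of why peeking matters. This is my own toy setup, not taken from the linked post. It assumes a stream of unit-variance observations with a true effect of zero, a two-sided z-test at the 5% level, and a hypothetical helper name `peeking_false_positive_rate`. The experiment stops the first time any interim look crosses the threshold.

```python
import math
import random


def peeking_false_positive_rate(n_sims=2000, n_max=500, check_every=50,
                                z_crit=1.96, seed=0):
    """Share of null experiments declared significant at any interim look."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            # Each observation is pure noise: the true effect is zero.
            total += rng.gauss(0.0, 1.0)
            if n % check_every == 0:
                # z-statistic for the running mean with known unit variance.
                z = total / math.sqrt(n)
                if abs(z) > z_crit:
                    rejections += 1
                    break  # stop at the first "significant" look
    return rejections / n_sims


single_look = peeking_false_positive_rate(check_every=500)  # test once, at the end
ten_looks = peeking_false_positive_rate(check_every=50)     # test after every 50 obs
print(f"one look: {single_look:.3f}, ten looks: {ten_looks:.3f}")
```

With a single look at the end the rejection rate should sit near the nominal 5%; with ten looks it should come out noticeably higher, since each extra check is another chance to cross the threshold by luck.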