This blog is intended to document my day-to-day experiences as a statistical consultant. Basically, if I have to spend more than a few minutes thinking through a statistical argument, I'd like to leave myself a record of my thought process.
As such, the subject matter here could really be anything: theory, applications, modeling, computation, visualization, data mining, Bayesian analysis, experimentation, big data, machine learning.
Probability Integral Transform, A Proof
The probability integral transform is a fundamental concept in statistics that connects the cumulative distribution function, the quantile function, and the uniform distribution. We motivate the need for a generalized inverse of the CDF and prove the result in this context.
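A minimal Python sketch of both directions of the result. It assumes an Exponential(rate) distribution for the continuous case and a Bernoulli(0.3) CDF for the discrete case. The helper names `exp_quantile`, `exp_cdf`, and `generalized_inverse` are hypothetical, not taken from the post. Feeding uniforms through the quantile function should give exponential draws, and pushing those draws back through the CDF should give (approximately) uniform values. The discrete case shows why the generalized inverse F^-(u) = inf{x : F(x) >= u} is needed once F has jumps.

```python
import bisect
import math
import random

def exp_quantile(u, rate):
    """Quantile function of Exponential(rate): F^{-1}(u) = -log(1 - u) / rate."""
    return -math.log(1.0 - u) / rate

def exp_cdf(x, rate):
    """CDF of Exponential(rate): F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x)

def generalized_inverse(support, cdf_values, u):
    """F^-(u) = inf{x : F(x) >= u} for a discrete distribution.

    `support` is sorted and cdf_values[i] is F(support[i]); the last
    cumulative value must be 1.0 so every u in (0, 1] finds a point.
    """
    # bisect_left finds the first index i with cdf_values[i] >= u
    return support[bisect.bisect_left(cdf_values, u)]

rng = random.Random(0)  # fixed seed so the run is reproducible
rate = 2.0

# Inverse transform: U ~ Uniform(0, 1)  =>  F^{-1}(U) ~ Exponential(rate)
draws = [exp_quantile(rng.random(), rate) for _ in range(100_000)]

# Probability integral transform: X ~ F continuous  =>  F(X) ~ Uniform(0, 1)
pit = [exp_cdf(x, rate) for x in draws]

print(sum(draws) / len(draws))  # close to 1 / rate = 0.5
print(sum(pit) / len(pit))      # close to 0.5, the mean of Uniform(0, 1)

# Discrete case: Bernoulli(0.3) has F(0) = 0.7 and F(1) = 1.0
print(generalized_inverse([0, 1], [0.7, 1.0], 0.7))   # 0
print(generalized_inverse([0, 1], [0.7, 1.0], 0.71))  # 1
```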
[Python] Sphinx-Compatible Forwarding Patterns in Python
We develop a Python example that showcases the forwarding pattern while handling docstrings in a Sphinx-compatible way. We maintain this compatibility in two ways: first, using a metaclass, and second, using decorators. Along the way, we discover a few things about binding instance and class attributes.
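A rough sketch of the decorator route only (the metaclass route is not shown). The names `forward`, `_make_forwarder`, `Engine`, and `Car` are hypothetical illustrations, not the post's code. Each forwarded method is wrapped with `functools.wraps`, which copies the target's `__doc__`, `__name__`, and `__module__` and sets `__wrapped__` (which `inspect.signature` follows), so documentation tools that read `__doc__` see the target's docstring. The qualified name is then pointed at the new owning class.

```python
import functools

def _make_forwarder(attr, name, target):
    """Build a method that calls getattr(self, attr).<name>(...)."""
    @functools.wraps(target)  # copies __doc__, __name__, __module__; sets __wrapped__
    def method(self, *args, **kwargs):
        return getattr(getattr(self, attr), name)(*args, **kwargs)
    return method

def forward(target_cls, attr, *names):
    """Class decorator: forward each method in `names` to the `attr` attribute."""
    def decorate(cls):
        for name in names:
            method = _make_forwarder(attr, name, getattr(target_cls, name))
            # Report the method as belonging to cls rather than target_cls
            method.__qualname__ = f"{cls.__qualname__}.{name}"
            setattr(cls, name, method)
        return cls
    return decorate

class Engine:
    def start(self, speed=1):
        """Start the engine at the given speed."""
        return f"started at {speed}"

    def stop(self):
        """Stop the engine."""
        return "stopped"

@forward(Engine, "engine", "start", "stop")
class Car:
    """A car that forwards start() and stop() to its engine."""

    def __init__(self):
        self.engine = Engine()

car = Car()
print(car.start(3))       # started at 3
print(Car.start.__doc__)  # Start the engine at the given speed.
```

A helper function builds each forwarder so that `name` and `target` are bound per method, avoiding the usual late-binding pitfall of defining closures directly inside the loop.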
[Python] Pandas.DataFrame, PostgreSQL, and Autoincrementing Columns
Pandas.DataFrame has a to_sql() convenience method for pumping dataframes to SQL tables. However, no parameterization of to_sql() will create an autoincrementing index column. This post details a simple workaround.
[C] Dynamic Programming 101: Change for a Dollar
We describe an efficient dynamic programming algorithm to compute the number of ways that one might make change for a dollar. The answer, assuming pennies, nickels, dimes, quarters, and half-dollars? Two hundred ninety-two.
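The post's implementation is in C; what follows is only a short Python sketch of the standard bottom-up recurrence, with `count_change` as a hypothetical name. `ways[a]` holds the number of coin combinations summing to `a` cents using the denominations processed so far. Looping over coins in the outer loop counts combinations rather than ordered sequences, and the whole computation takes O(amount × number of coins) time.

```python
def count_change(amount, coins):
    """Number of ways to make `amount` cents from unlimited `coins`, ignoring order."""
    ways = [1] + [0] * amount  # exactly one way to make 0 cents: use no coins
    for coin in coins:
        # Allow this coin: each total a can now also end with one more `coin`
        for a in range(coin, amount + 1):
            ways[a] += ways[a - coin]
    return ways[amount]

print(count_change(100, [1, 5, 10, 25, 50]))  # 292
print(count_change(100, [1, 5, 10, 25]))      # 242 (no half-dollars)
```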
[Python] Random Access Priority Queue
We describe a priority queue data structure that allows item removal via key or removal via (lowest) priority. As a bonus, there is code that shows how to implement a wrapper for a decorator class that enables per-method parameters and has access to class variables.
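One standard way to support removal by key on top of a binary heap is the lazy-deletion recipe from the Python `heapq` documentation: keep a dict from key to heap entry, mark removed entries with a sentinel, and skip them when popping. The sketch below uses that technique, which is not necessarily how the post implements its queue; the class name `RandomAccessPQ` is hypothetical.

```python
import heapq
import itertools

class RandomAccessPQ:
    """Min-priority queue supporting removal by key and by lowest priority."""

    _REMOVED = object()  # sentinel marking a lazily deleted heap entry

    def __init__(self):
        self._heap = []
        self._entries = {}                  # key -> [priority, count, key]
        self._counter = itertools.count()   # tie-breaker so keys are never compared

    def push(self, key, priority):
        """Add `key`, or change its priority if it is already queued."""
        if key in self._entries:
            self.remove(key)
        entry = [priority, next(self._counter), key]
        self._entries[key] = entry
        heapq.heappush(self._heap, entry)

    def remove(self, key):
        """Remove `key` and return its priority; raises KeyError if absent."""
        entry = self._entries.pop(key)
        entry[-1] = self._REMOVED  # leave it in the heap; pop() will skip it
        return entry[0]

    def pop(self):
        """Remove and return (key, priority) for the lowest priority item."""
        while self._heap:
            priority, _, key = heapq.heappop(self._heap)
            if key is not self._REMOVED:
                del self._entries[key]
                return key, priority
        raise KeyError("pop from an empty priority queue")

    def __len__(self):
        return len(self._entries)

pq = RandomAccessPQ()
pq.push("a", 5)
pq.push("b", 1)
pq.push("c", 3)
print(pq.remove("b"))  # 1
print(pq.pop())        # ('c', 3)
```

Removal by key is O(1) apart from the dict lookup, while stale entries are discarded only when they reach the top of the heap, so `pop` stays O(log n) amortized.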
[R] Adaptive Rejection Sampling
Adaptive rejection sampling is a statistical algorithm for generating samples from a univariate, log-concave density. Because of the adaptive nature of the algorithm, rejection rates are often very low.
Multivariate Normal: Conditional Density Derivation
We derive the classical result: what is the density of a multivariate normal conditioned on some proper subset of its components?
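For reference, here is the classical result stated under an assumed block partition: split x into components x_1 and x_2, partition the mean and covariance conformably, and assume Σ_22 is invertible. The conditional distribution is again normal; its covariance is the Schur complement of Σ_22 and does not depend on the conditioning value a.

```latex
\[
\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\right)
\]
\[
\mathbf{x}_1 \mid \mathbf{x}_2 = \mathbf{a} \;\sim\; \mathcal{N}\!\left(
\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{a} - \boldsymbol{\mu}_2),\;
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
\right)
\]
```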
Deploying a New Rails App on a Subdomain
This post is a step-by-step recipe for deploying a new Rails app on a subdomain. Our Ubuntu server runs Apache / Passenger / Rails / Capistrano; our domain registrar is Namecheap.
[R] Uber Interview Challenge
An analysis done as part of a recent Uber interview, this post showcases a regularized logistic regression model used to assess customer retention.
Rmarkdown, A Simple Example
This post is an example showing how to write a post in R Markdown. In particular, there is one Rmd chunk that generates a figure and a second Rmd chunk that generates a nicely formatted table.
Knitr & Jekyll: A Stats Blog Pipeline
Combining Knitr and Jekyll took some effort. This post describes how to get it all working as part of a statistical blogging pipeline.
Namecheap, Dynamic IPs, ddclient, and Hosting Multiple Sites on a Single Server
This post shows how to host several different sites on a single Ubuntu server with a dynamic IP address, using the Namecheap domain name registrar.