You can also view the full archives of micro-posts. Longer blog posts are available in the Articles section.
Handouts with exercises on scientific computing using Python, feat. some introduction to BioPython. #python
TIL about https://dotfiles.github.io, the unofficial guide to dotfiles on GitHub.
The good things in a community site come from people more than technology; it’s mainly in the prevention of bad things that technology comes into play. Technology certainly can enhance discussion. Nested comments do, for example. But I’d rather use a site with primitive features and smart, nice users than a more advanced one whose users were idiots or trolls. — Paul Graham, What I’ve learned from Hacker News
Lovely. https://leon-kim.com/
eBay’s TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Population genetics notes, from the Coop Lab. #bioinformatics
Ezhil: Clean and minimal personal blog theme for Hugo.
Random Forests, Decision Trees, and Categorical Predictors: The “Absent Levels” Problem (PDF).
This problem occurs whenever there is an indeterminacy over how to handle an observation that has reached a categorical split which was determined when the observation in question’s level was absent during training.
TL;DR: No feature engineering heuristic seems to really help mitigate this kind of problem.
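To make the indeterminacy concrete, here is a minimal toy sketch in plain Python. It is not taken from the paper or from any random forest library: the level names, the leaf predictions, and the `absent_policy` argument are all hypothetical. It only shows that a level unseen at a categorical split has no defined side, so the prediction depends on an arbitrary routing choice.

```python
# Hypothetical categorical split learned at one tree node.
# During training, only these levels reached the node.
LEFT_LEVELS = {"red"}              # levels routed to the left child
RIGHT_LEVELS = {"green", "blue"}   # levels routed to the right child

# Illustrative leaf predictions (made-up values).
LEFT_PRED = 0.9
RIGHT_PRED = 0.2


def predict(level, absent_policy):
    """Route one observation through the split and return a prediction.

    `absent_policy` is an assumed knob ("left" or "right") that decides
    where a level absent during training is sent.
    """
    if level in LEFT_LEVELS:
        return LEFT_PRED
    if level in RIGHT_LEVELS:
        return RIGHT_PRED
    # Absent level: the split itself says nothing about this case.
    if absent_policy == "left":
        return LEFT_PRED
    if absent_policy == "right":
        return RIGHT_PRED
    raise ValueError(f"unknown absent_policy: {absent_policy!r}")


# "purple" never reached this node during training.
predictions = {policy: predict("purple", policy) for policy in ("left", "right")}
print(predictions)
```

The same observation gets a different prediction under each policy, while levels seen during training are unaffected by the choice.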