What’s up on the internet in January?
Learning with Privacy at Scale This one comes from the Apple ML Journal. The article deals with local differential privacy, which refers to anonymizing personal data directly on the device, rather than on the server (i.e., after the data have been uploaded). I was pleased to learn that care is taken to ensure this does not impact device bandwidth (and users can opt in or out). Apple collects anonymized, targeted events (e.g., the user typing an emoji or playing an audio file), which are transferred daily; IP information is removed once the data reach Apple's restricted-access server, where aggregated statistics are computed and random samples of individual records are post-processed using dedicated algorithms.
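To get an intuition for local differential privacy, here is a minimal sketch of the classic randomized-response mechanism — each device flips its own answer with a known probability before reporting, and the server can still debias the aggregate. This is a textbook illustration, not Apple's actual algorithm; the emoji scenario, the 30% rate, and the epsilon value are all made up.

```python
import math
import random

def randomized_response(truth, epsilon=1.0):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it.
    Each user's report is individually deniable: seeing a 1 does not prove
    the true value was 1."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return truth if random.random() < p else 1 - truth

def estimate_rate(reports, epsilon=1.0):
    """Server side: invert the known flip probability to debias the mean."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# 10,000 hypothetical users, 30% of whom truly typed the emoji
random.seed(42)
truths = [1 if random.random() < 0.3 else 0 for _ in range(10_000)]
reports = [randomized_response(t) for t in truths]
print(round(estimate_rate(reports), 2))  # close to the true rate, 0.30
```

The key property: no individual report is trustworthy on its own, yet the population-level estimate converges to the true rate as the number of users grows.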
Google Chat is dead, Google Hangouts is not that fun, so let’s get back to good old IRC! In the meantime, I discovered the Matrix project.
Machine learners, beware: you probably need a much larger sample size than you think (see also this BMC article by Steyerberg and colleagues). Meanwhile, there are “seven tasks which are beyond reach of current machine learning systems and which have been accomplished using the tools of causal modeling”, according to Judea Pearl in his latest arXived paper.
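A quick way to see the sample-size point is to simulate how noisy an accuracy estimate is on a small held-out set. The sketch below assumes a hypothetical classifier with true accuracy 0.75 and just compares the spread of estimates from 30 versus 3,000 test examples; all the numbers are made up for illustration.

```python
import random

def simulate_accuracy_estimate(true_acc, n, trials=1000, seed=0):
    """Repeatedly 'evaluate' a model with known true accuracy on n examples
    and return the (min, max) of the observed accuracy estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        hits = sum(1 for _ in range(n) if rng.random() < true_acc)
        estimates.append(hits / n)
    return min(estimates), max(estimates)

# Same model, true accuracy 0.75, evaluated on small vs. large test sets:
print(simulate_accuracy_estimate(0.75, 30))    # wide spread of estimates
print(simulate_accuracy_estimate(0.75, 3000))  # much tighter around 0.75
```

With only 30 examples, the estimate can easily be off by ten points or more in either direction — exactly the kind of optimism (or pessimism) the sample-size warnings are about.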
The authors of the great “Mining of Massive Datasets” are working on the third edition of their book. So far, they have added a discussion of Spark and TensorFlow, as well as decision trees, to their chapter on large-scale machine learning (alongside neural networks, support vector machines, k-nearest neighbors, and kernel regression).
Fossil is like Git or Mercurial, but it also incorporates bug tracking (managed via tickets) and technical notes, event-like entries that can appear anywhere in the timeline. Compared to Git, all these features are contained in a single standalone executable, with an SQLite backend storing all data and revision history. It is possible to use a free hosting service or to set up a standalone server.
TablePlus is a modern, native tool for relational databases. It is compatible with the major SQL systems (MySQL, PostgreSQL, SQLite), and it features a query editor and a table viewer. Also note that a new version is released every week. It looks like a great alternative to existing solutions on the Mac (which are often limited to a single backend unless they are paid apps).
For Bayesian modeling, in addition to the existing RStan and rjags packages, there are now brms and greta. The brms package relies on Stan, and its approach is nicely summarized in the Journal of Statistical Software (vol. 80, 2017), while greta builds on Google’s TensorFlow and uses a “more conventional” R syntax.
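None of these packages is needed to see the core idea they all implement: posterior ∝ likelihood × prior. Here is a language-agnostic sketch (plain Python rather than brms or greta syntax) computing a beta-binomial posterior by grid approximation — 7 successes out of 10 trials and a uniform prior are made-up inputs.

```python
def grid_posterior(successes, trials, grid_size=1000):
    """Posterior over a success probability theta, under a uniform prior
    and a binomial likelihood, computed on a grid (no MCMC needed here)."""
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    unnorm = [t**successes * (1 - t)**(trials - successes) for t in thetas]
    z = sum(unnorm)  # normalizing constant
    post = [u / z for u in unnorm]
    return thetas, post

thetas, post = grid_posterior(successes=7, trials=10)
posterior_mean = sum(t * p for t, p in zip(thetas, post))
print(round(posterior_mean, 3))  # near the analytic Beta(8, 4) mean, 8/12
```

Tools like Stan replace the grid with MCMC so the same idea scales to models with many parameters, where a grid is hopeless.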
A Course in Machine Learning is a nicely illustrated textbook on ML in which the author discusses various techniques (nearest neighbors, naive Bayes, linear and logistic regression, neural networks) and their underlying algorithms. Of note, the book was written in LaTeX using a highly customized Tufte class. It’s a pity the margin illustrations were not done the XKCD way :-)
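The first of those techniques, nearest neighbors, fits in a dozen lines — which is part of why such books open with it. A minimal 1-nearest-neighbor sketch, on toy 2-D points with invented labels:

```python
def nearest_neighbor_predict(train, query):
    """Classify `query` with the label of the closest training point
    (1-nearest-neighbor, squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda pair: dist2(pair[0], query))
    return label

# Toy data: two small clusters with made-up labels
train = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.2), "b")]
print(nearest_neighbor_predict(train, (0.1, 0.0)))  # prints a
print(nearest_neighbor_predict(train, (1.1, 0.9)))  # prints b
```

Everything else in the book — k > 1 neighbors, distance weighting, the curse of dimensionality — is refinement of this one-liner idea.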
Client-Side Web Development An online course that looks really nice.
Programming Design Systems A free digital book offering a practical introduction to the new foundations of graphic design, by Rune Madsen. The chapter on geometric composition is particularly interesting for those versed in data visualization and Trellis displays.
After Meltdown and Spectre, it is probably time to revisit the gold standards. Here is the story of qmail: Some thoughts on security after ten years of qmail 1.0. And for interested readers, go check Dan Bernstein’s website directly; there is seriously good computer science work there, e.g., Fast multiplication and its applications.
Luna is a WYSIWYG data processing engine. It looks like an interesting app, and I should probably take a closer look at some point. There is a book (a work in progress, with source on GitHub), a programming language based on Haskell, and the user app.