What’s up on the internet in January?
Learning with Privacy at Scale
This one is taken from the Apple ML Journal. The article deals with local differential privacy, which refers to anonymizing personal data directly on the local device rather than on the server (i.e., after the data have been uploaded). I was pleased to learn that care is taken to ensure that this does not impact the device bandwidth (I already knew that we can opt in or not).
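To make the idea of privatizing data on the device concrete, here is a minimal sketch of classic randomized response, the textbook local differential privacy mechanism for a single yes/no attribute. This is not the algorithm described in the Apple article; the function names and the choice of epsilon are illustrative assumptions.

```python
import math
import random


def randomize_bit(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Privatize one bit on the device (hypothetical helper).

    The true value is reported with probability e^eps / (1 + e^eps)
    and flipped otherwise, which satisfies epsilon-local differential privacy.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else not true_bit


def estimate_proportion(reports, epsilon: float) -> float:
    """Debias the noisy reports on the server (hypothetical helper).

    If pi is the true proportion and p the probability of telling the truth,
    the expected observed proportion is pi * (2p - 1) + (1 - p).
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


# Usage: 20,000 simulated users, 30% of whom hold the sensitive attribute.
rng = random.Random(2018)
true_bits = [rng.random() < 0.3 for _ in range(20_000)]
noisy_reports = [randomize_bit(b, epsilon=1.0, rng=rng) for b in true_bits]
print(round(estimate_proportion(noisy_reports, epsilon=1.0), 3))
```

The server only ever sees the noisy reports, yet the aggregate proportion can still be estimated; smaller values of epsilon give stronger privacy at the cost of a noisier estimate.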
What’s going on on arXiv these days?
Here is my reading list for the past couple of weeks.
arXiv:1801.00631 Deep Learning: A Critical Appraisal, Gary Marcus
This is a brief review of deep learning in the light of the “renewed interest” in artificial intelligence that emerged during the last two years. I believe it does not reflect the opinion of all researchers or practitioners, but anyway there are some interesting references in there, and as the author says, this is deliberately oriented toward AI research, not machine learning or data science.
Here are some interesting links I kept open in my web browser for a while in December.
How to Write a Git Commit Message
This is a well-known article about how to annotate your current work or contribution in a version control system. As I tend to work mostly alone, I don’t have a strong need for Git except that it helps me keep track of my workflow, and it is quite useful for archiving different versions or deliverables of my work.
Some quick notes from my recent activities and reading list. Although I am far less active than I used to be in the past (and than I would like to be currently), I still archive, when time allows, interesting things that happen on the web or that I found potentially useful for my work.
The famous “Structure and Interpretation of Computer Programs”, by Abelson, Sussman, and Sussman, is now live on GitHub.
Here are twenty canonical questions when using “learning machines”, according to Malley and co-authors: Malley, JD, Malley, KG, and Pajevic, S, Statistical Learning for Biomedical Data, Cambridge University Press (2011).
Are there any over-arching insights as to why (or when) one learning machine or method might work, or when it might not? How do we conduct robust selection of the most predictive features, and how do we compare different lists of important features?
Here are some notes on cross-over trials and within-patient titration in indirect assays.
Most of my literature review started with Senn’s textbook, Statistical Issues in Drug Development (Wiley, 2007, pp. 317-336). Contrary to direct assays, where we know what we want to achieve, say a given response to treatment, and we adjust the dose until this goal is reached, in indirect assays individual response is studied as a function of the dose and a ‘useful dose’ is decided upon afterwards.
I just got my copy of Exploratory Data Mining and Data Cleaning, by Dasu and Johnson (Wiley, 2003).
This is quite an old book but it offers a nice overview of common techniques to gauge and enhance data quality with exploratory data analysis. I learned about DataSphere partition,(1,2) for instance. This book is, however, not about tools to perform data cleaning or data analysis, unlike Janert’s Data Analysis with Open Source Tools (O’Reilly, 2010), which presents gnuplot, Sage, R, or Python and offers a small Appendix on Working with Data.
While browsing questions related to psychometrics posted late in 2012 on Cross Validated, I noticed two questions dealing with hierarchical ωh.
These questions come from the use of William Revelle’s psych package, which offers a very nice toolkit for serious psychometrics, especially work related to factor analysis. Just take a look at some of his Psychology 454 syllabus to get an idea of what’s available in psych.
The ωh measure was popularized by Zinbarg, Revelle, and colleagues a few years ago.
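For reference, here is the usual definition of ωh, assuming an orthogonal bifactor (or Schmid-Leiman) solution with one general factor and K group factors. The notation is my own choice for illustration: λ_gi is the loading of item i on the general factor, λ_ki its loading on group factor k, and ψ_i its unique variance.

```latex
% Variance of the total score X (sum of the p items) under the assumed model
V_X = \Bigl(\sum_{i=1}^{p} \lambda_{gi}\Bigr)^{2}
    + \sum_{k=1}^{K} \Bigl(\sum_{i=1}^{p} \lambda_{ki}\Bigr)^{2}
    + \sum_{i=1}^{p} \psi_{i}

% Hierarchical omega: share of total-score variance due to the general factor
\omega_h = \frac{\Bigl(\sum_{i=1}^{p} \lambda_{gi}\Bigr)^{2}}{V_X}
```

In words, ωh is the proportion of variance in the total score that can be attributed to the general factor alone, which is why it is lower than coefficients that also count the group factors as reliable variance.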
I have been quite busy during the last two months. This was in part due to writing teaching material, including an 11-week statistical computing course with R and Stata for a French university. As a matter of fact, it took me something like 70 hours to write 300 pages of exercises with solutions, and I am currently writing a 110-page textbook which aims at introducing students to R and Stata for medical statistics.
As its name suggests, a cognitive diagnosis model aims at “diagnosing” which skills examinees have or do not have. It has become very popular in recent years because it overcomes standard limitations of summated scale scores derived from classical test or item response theory.
There is a detailed overview of CDM by DiBello and colleagues in the Handbook of Statistics, vol. 26: DiBello, L.V., Roussos, L.A., and Stout, W.F. (2007).
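To illustrate what “diagnosing” skills means in practice, here is a minimal sketch of the item response function of the DINA model, one common cognitive diagnosis model. Choosing DINA is my assumption for the sake of the example, and the function name, Q-matrix row, and slip/guess values below are hypothetical.

```python
def dina_prob_correct(alpha, q_row, slip, guess):
    """Probability of a correct answer under the DINA model (illustrative).

    alpha  : 0/1 list, the skills the examinee has mastered
    q_row  : 0/1 list, the skills the item requires (one row of the Q-matrix)
    slip   : probability of a wrong answer despite having all required skills
    guess  : probability of a correct answer while lacking a required skill
    """
    # eta is True only when every skill required by the item is mastered.
    eta = all(a == 1 or q == 0 for a, q in zip(alpha, q_row))
    return 1.0 - slip if eta else guess


# Usage: an item requiring skills 1 and 3 out of three skills.
q_row = [1, 0, 1]
print(dina_prob_correct([1, 1, 1], q_row, slip=0.1, guess=0.2))  # masters both
print(dina_prob_correct([1, 1, 0], q_row, slip=0.1, guess=0.2))  # lacks skill 3
```

Unlike a summated scale score, the model works with a whole skill profile per examinee, so two people with the same total score can receive different diagnoses.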