Quick recap of March on the Micro blog.
2019-03-05: Beirut, No No No.
2019-03-05: As I am using PostgreSQL a lot these days, I thought I would import a large CSV file (1 GB) to see if I can play with in-database tools from dplyr & Co. I will probably need this for work so it’s worth the effort. I started with a Stata file that I read using haven, and I converted it to a CSV using data.table::fwrite. This already ate up all my RAM. Now, I’m using csvkit to import the CSV file into a local PostgreSQL database. Well, it says a lot about the process:
2019-03-05: I guess I just found another org-powered user!
2019-03-05: I haven’t written a single line of LaTeX in a long time, but it looks like we now get Font Awesome for free in our TeX distribution. (via @kaz_yos)
2019-03-05: It’s astonishing how much work has been done on working with databases using R. We now have dbplot and modeldb (not to be confused with this one). (via @theotheredgar)
2019-03-05: Just added to my Papers list: Mean and median bias reduction in generalized linear models. See also the brglm2 R package.
2019-03-05: syn uses OS X’s natural language processing tools to tokenize and highlight text. Nice utility to add to my writing stack. It is used by Emacs wordsmith-mode.
2019-03-06: Better than time? gnomon is a command line utility to prepend timestamp information to the standard output of another command.
2019-03-06: Doing work as it shows up.
2019-03-06: Natural Gradient Descent. Be sure to check the rest of the site. I just added it to my RSS reader.
2019-03-06: Vim within Emacs: A very good read even if you’re not versed in Spacemacs.
2019-03-07: Clearly, I’m not that active in the early afternoon. Either because of the lunch break or the half-life of my medication… Anyway, Timing is the best time tracking app I’ve seen in a while.
2019-03-07: I’m in my third year with the 12-inch MacBook (generously offered by SB). It is certainly the best laptop I have had in 13 years. Sometimes I feel like I miss the tiny pulsing light (aka sleep indicator) that we used to have on older metallic aluminum body ones. Well, we have a backlit keyboard now, even if it is a butterfly keyboard ;-)
2019-03-07: Nice. I spent some time checking Dimitri Fontaine’s GitHub repo, in particular his advent of code in Common Lisp. I am currently reading his book on PostgreSQL, but I couldn’t resist reading some Lisp code after lunch.
2019-03-07: So it seems that we will be done with The Expanse, Season 1, tonight.
2019-03-07: 7 Unix Commands Every Data Scientist Should Know. I lost track of the number of blog posts I read where the title includes “un*x commands that (data) scientists should know.” I expect that sooner or later mastering deep learning techniques will be a mandatory skill as well. Anyway, this gentle tutorial is well put together, so go read it if you want to refresh your memory.
2019-03-07: Exterminate Magit buffers: Quite a useful tip if, like me, you happen to kill your Magit buffers by hand.
2019-03-07: Rash - The Reckless Racket Shell. (via @NlightNFotis)
2019-03-07: Viewing Matrices & Probability as Graphs. With great illustrations. For those interested in category theory, the other posts are worth a look too. See, e.g., this booklet on arXiv (PDF, 50 pp.).
2019-03-09: Factory Records.
2019-03-10: 📖 Rezvani, Le magicien (Actes Sud, 2006)
2019-03-10: Robyn, Indestructible.
2019-03-10: I forgot about OSF. Here is a nice read: A chill intro to causal inference via propensity scores. Not only do we have a 16-page PDF, but also the accompanying source files! (via @george_berry)
2019-03-10: I wish I had read this nice post on Travis-CI, by Julia Silge, before I struggled with Travis and R. Unrelated but also interesting post: TensorFlow, Jane Austen, and Text Generation. #rstats
> Understanding how text generation works with deep learning and TensorFlow has been very helpful for me as I wrap my brain around these techniques more broadly. And that’s good, because exactly how practical of a skill is this, right?! I mean, who needs to generate new text from an existing corpus in their day job?
2019-03-10: Lovely work by @aschinchon! There’s more to see on his blog, e.g. Mandalaxies.
2019-03-10: Today’s lunch:
2019-03-10: Generating Uniformly Random points on a d-sphere and d-ball. (via @AtabeyKaygun)
2019-03-11: Tindersticks, Tindersticks.
2019-03-11: Sadly, there’s not such a steady flow on Pragmatic Emacs.
2019-03-11: TIL. There’s a nice option when you edit Python code under Emacs which consists in automagically sorting all import statements. In most cases, it works great; however, there are some edge cases. E.g., it is common in Flask applications to have imports defined after initializing the app itself, because of circular imports. Fortunately, it is possible to override the default settings and to add a directory-local variable, as recommended on the Spacemacs website (SPC f v d).
2019-03-11: The value of owning more books than you can read. I have thousands of books in my home, many of which are more than 20 years old. From time to time it seems to me that’s all I have left. I’ve read them all except the last ones I bought. However, I can understand what it’s like to contemplate all that we still have to learn.
2019-03-12: Peter Erskine, Palle Danielsson & John Taylor, As It Was.
2019-03-12: Here is the best take I found on imperative vs. functional approaches using Lisp.
2019-03-12: TIL Better to use partition rather than split when you want to convert a ‘string’ to a ‘dict’ based on the first occurrence of a specific delimiter (as in .split(..., 1)). Note that unlike split, the delimiter is kept in the result and you probably don’t want to keep it.
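A minimal sketch of the tip above, using only built-in string methods. The helper name parse_pairs, the "=" delimiter and the sample lines are illustrative assumptions, not something taken from the original post.

```python
def parse_pairs(lines, sep="="):
    """Build a dict from 'key<sep>value' strings, cutting at the first separator only."""
    result = {}
    for line in lines:
        # partition always returns a 3-tuple: (before, separator, after).
        # The middle item is the delimiter itself, which we simply discard.
        key, found, value = line.partition(sep)
        if not found:
            continue  # no delimiter in this line: skipped here (an arbitrary choice)
        result[key.strip()] = value.strip()
    return result


settings = parse_pairs(["name=demo", "url=http://example.org/?q=1", "no delimiter"])
print(settings)
```

Because partition always yields three items, the unpacking never fails, whereas the two-item unpacking of .split(sep, 1) raises a ValueError on a line without the delimiter.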
2019-03-12: Mathematical Recreations and Essays, by W. W. Rouse Ball. (Note that the PDF is nicely hyper-linked!)
> Another common trick is to throw twenty cards on to a table in ten couples, and ask someone to select one couple. The cards are then taken up, and dealt out in a certain manner into four rows each containing five cards. If the rows which contain the given cards are indicated, the cards selected are known at once.
2019-03-13: Exactly, just like in a grocery store. And you should see what it looks like with the livestock inside…
2019-03-13: HN on the spotlight: Spotify to Apple and Google and DuckDuckGo.
2019-03-13: MacJournal 7 is now free. I will stick with Org for managing my text files, but it’s good to know anyway. (via Jack Baty)
2019-03-14: Timber Timbre, Creep On Creepin’ On.
2019-03-14: Timber Timbre, Sincerely, Future Pollution.
The lsp backend for the Python layer in Spacemacs has improved so much over time, and it is much more fully featured than the default anaconda one. Apart from minor issues with mypy, which complains about missing imports (this can be resolved using a config file, as described here), everything works perfectly. Things are going too fast for me with the develop branch of Spacemacs.
2019-03-15: Time to watch The Expanse, Season 2, now.
2019-03-15: A (very) short intro to Constraints: Nice visual explanation à la idyll. (via @JohnSelstad)
2019-03-15: Learning Statistics with R. Looks like a nice intro to statistics with R. I personally started with Peter Dalgaard’s Introductory Statistics with R, but no doubt this should be a good start too (beware this tutorial relies on external packages).
2019-03-16: How about generating figure names using an MD5 hash? I’ve long been wondering how to store unique file names for all documents that I happen to write from day to day. For the last few years, I have prefixed all such file names with a tag such as img-, depending on the context (i.e., whether the file has been generated by a computer program or is just an illustration grabbed from the internet), followed by a short but meaningful description, e.g. img-emacs-screenshot.png. When it is a series of figures, I usually append an index (“a”, “b”, …; or zero-padded numbers). Still, I have a lot of duplicate file names on my HD. One way to circumvent this issue is to generate a random hash, or so I believe, since we all have the md5 utility on Un*x systems.
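The idea can be sketched in Python with hashlib from the standard library. The function names, the default img- prefix and the 12-character truncation are illustrative assumptions rather than the workflow actually used here. Note that the digest is computed from the file contents, so it is deterministic rather than random: two identical files end up with the same name.

```python
import hashlib
from pathlib import Path


def hashed_name(data, suffix=".png", prefix="img-", length=12):
    """Build a file name from the MD5 digest of the given bytes."""
    digest = hashlib.md5(data).hexdigest()  # 32 hexadecimal characters
    # Truncating the digest keeps names short, at the cost of a higher
    # (still tiny for a personal archive) risk of collisions.
    return f"{prefix}{digest[:length]}{suffix}"


def hashed_name_for(path, prefix="img-", length=12):
    """Same idea for a file on disk, keeping its original extension."""
    path = Path(path)
    return hashed_name(path.read_bytes(), path.suffix, prefix, length)


print(hashed_name(b"hello"))
```

Hashing the contents has a side benefit: exact duplicates become easy to spot, since they map to the same name.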
2019-03-16: Commit Often, Perfect Later, Publish Once. This reminds me of the Stack Overflow motto circa 2010 (“Vote early, vote often”). Anyway, these recommended best practices for Git are very well presented.
> Don’t let tomorrow’s beauty stop you from performing continuous commits today.
2019-03-16: Small Sharp Software Tools. Together with Vince Buffalo’s Bioinformatics Data Skills, I believe this combo should provide the very best technical exposition to practical Unix. You may want to add Learning Unix for OS X if you’re interested in Mac-specific tools. (Disclaimer: I haven’t read Hogan’s book yet).
2019-03-17: ECM: Keith Jarrett.
2019-03-17: According to BSAG, Doom Emacs has been polished a little in recent months. I’m still on Spacemacs, probably for a long time to come, but I remember how pleasant the experience with Doom Emacs was.
2019-03-17: It looks like Statistical Rethinking will have a profound impact on Bayesian statistical computing. There’s now a Julia package to complement the R one. (via @zerology)
2019-03-17: RMS is now taking care of Apple. Now, I can’t help but smile at the idea of this picture where we see RMS carrying his laptop on his shoulder. Surely he wasn’t listening to music on iTunes. Note too that the list of criticisms made of Microsoft is much shorter (fair enough), but the same is true for Google who only gets two dozen bad marks!
2019-03-17: Explorable multiverse analyses. What a talent this guy has! (via @mjskay)
2019-03-17: Mathematics for Machine Learning is finally out. (via @ChengSoonOng)
2019-03-18: Jazz Chill.
2019-03-18: TIL DuckDuckGo, which has been my default search engine since 2018, features a built-in URL shortener. So nice! (via Brett Terpstra)
2019-03-18: The number of projects hosted under the Apache Software Foundation never ceases to fascinate me. Today, I discovered Jena for the semantic web!
2019-03-18: Flux ML and differentiable programming. Nice to see how new packages are continuously arriving in the Julia ecosystem, after so many years.
2019-03-18: The Definitive Guide To Syntax Highlighting. Nice to see some good old posts about Emacs. It makes me want to activate the paren-face mode for a change.
2019-03-19: 📖 Delphine de Vigan, Les gratitudes (JC Lattès, 2019)
2019-03-19: Suede, Dog Man Star.
2019-03-19: A bit late (3pm), but delightful:
2019-03-19: I have been seriously thinking of subscribing to NordVPN during the last few months. On further inspection, there was a good deal for the 3-year subscription plan. Now, it’s done.
2019-03-19: Little flowers to go with today’s sunshine:
2019-03-19: So I only have three episodes left before I finish my last TV series, Occupied.
2019-03-19: Essential Statistics with Python and R. Although this textbook does not cover advanced material (and the figures are terrible), it comes with a lot of exercises that one can solve using either R or Python.
2019-03-19: Performance of Error Estimators for Classification (PDF). Always good to be reminded of how important error estimation is in statistical modeling, especially with small samples. Remember Frank Harrell’s post?
2019-03-19: Slate “helps you create beautiful, intelligent, responsive API documentation.” It reminds me of the whole stack of racco (probably dead), docco (still alive) & Co. Slate is used in Clojure by Example, a site that offers an original and very instructive approach to learning the basics of the Clojure language.
2019-03-20: 📖 Alberto Moravia, L’amour conjugal (Denoël, 1948)
2019-03-21: Lorde, Pure Heroine.
2019-03-21: I was just reading some of Rackhim’s posts. He’s the author of the recent EmacsCast. The one on backups is quite interesting. I have used Arq (Thx @fonnesbeck!) daily for 5 years or so, and I have been happy with that single solution to back up my personal and work-related data. I no longer use cloud fronts like Dropbox, except for already anonymised stuff whose privacy I don’t care enough about to bother with.
2019-03-21: On the simplicity of working with a Terminal: processing 44K of mails in less than 2 seconds.
2019-03-21: TIL There are several flavours of awk lurking around on the internet. Here is bioawk, a bioinformatics-aware one.
2019-03-21: Too late to start re-reading Don Knuth’s excellent book on Mathematical Writing (PDF), but I will definitely do it in a few days.
2019-03-21: What is Data Science after all? I never liked this term, and I consider myself a statistician, or better a data craftsman, because I mostly spend my time dealing with data after all. Stephanie C. Hicks & Roger D. Peng wrote a nice article, Elements and Principles of Data Analysis, which I believe provides quite an honest account of DS-related stuff:
> Data science is the science and design of (1) actively creating a question to investigate a hypothesis with data, (2) connecting that question with the collection of appropriate data and the application of appropriate methods, algorithms, computational tools or languages in a data analysis, and (3) communicating and making decisions based on new or already established knowledge derived from the data and data analysis.
2019-03-21: Foundations of Machine Learning. Never heard of it before I spotted @gappy3000’s tweet.
2019-03-21: Moving to a World Beyond “p < 0.05”. Or maybe the earth isn’t just round. (via @kaz_yos)
2019-03-21: Sudoku solver written in more or less 30 lines of Racket code.
2019-03-22: Interesting to know: The wakefield R package makes it possible to quickly generate random data sets. I learned about that while reading David Gohel’s Using R as a BI tool.
2019-03-22: An Introduction to Applied Bioinformatics: An interesting online textbook that I found while browsing the scikit-bio Python package on GitHub.
2019-03-22: Scientists rise up against statistical significance. Together with Moving to a World Beyond “p < 0.05”, it is probably time to rethink statistical significance and embrace the world of uncertainty instead. As Stephen Senn once said:
> We can predict nothing with certainty but we can predict how uncertain our predictions will be, on average that is. Statistics is the science that tells us how.
2019-03-25: Nick Cave & The Bad Seeds, Nocturama.
2019-03-25: Jazz Chill.
2019-03-25: Even if I have only increased the length of my working days by 2 hours in 1 month (currently 9am to 3pm), I definitely stay out of work for a good 2 or 3 hours once I get home. I guess I just have to live with that for the moment. It’s probably time to finish Occupied before the beer finishes me off.
2019-03-25: Here is the fourth edition of Algorithms, by Sedgewick & Wayne, a definitive book to have after Knuth’s monumental work and the Cormen et al. (via @TechSparx)
2019-03-25: What a beautiful artistic work at the crossroads between dataviz and infographics, by @janezhgw.
2019-03-25: Functional programming explained for the pragmatic programmer. Nice take. Maybe it would have been easier to focus on C versus Common Lisp before addressing the case of hybrid languages. (via HN)
2019-03-26: Owen Pallett, In Conflict.
2019-03-26: Got a little upgrade under the hood in the morning: Nothing really new, though, except perhaps the “more editorial highlights on a single page in the Browse tab” in iTunes.
2019-03-26: I like minimal themes, hence the Hugo theme I chose last year. However, I just found an even more minimalist theme: slim.
2019-03-26: Well, I’m done for good with Occupied, my list of TV series is out of stock, and I have no idea what to look for. I guess I’m good at reading books and watching Minecraft gamers on Twitch.
2019-03-26: Scott’s World*, and more animations to see on Complexity Explorables.
2019-03-27: Bill Evans, You Must Believe in Spring.
2019-03-27: Data leads to social cooling.
2019-03-27: Never heard of Qwant before, but it looks like a good alternative to Google or Bing.
2019-03-27: Still no idea which TV series to watch, nor what to look for this evening. So I’ll keep posting (not so) random links that have been hanging around on my iPhone for weeks:
- Time Series Analysis, by Kevin Kotzé
- Feather, CSV, or Rdata, by Vince Buffalo
- pandas 2.0 Design Documents, by the Pandas team
- Intervention Analysis, from the PennState Eberly College of Science
- If not SICP, then what? Maybe HTDP?, by Steven Rosenberg
- Matrix Algebra proGrams In Common Lisp, by Rigetti Computing
- Now You C Me, by Davis Vaughan
- Codex Seraphinianus, by John Borwick
- Reading large CSV files in R, on SO
- Advent of Code in Lisp, by Dimitri Fontaine
- Marijn Haverbeke’s homepage
- Modern Regression Analysis (PDF), by James C. Slaughter
- Variance reduction in randomised trials by inverse probability weighting using the propensity score, by Williamson et al.
- David Gohel’s Github account
- Should we ignore covariate imbalance and stop presenting a stratified ‘table one’ for randomized trials?, on Frank Harrell’s Discourse
- A Python package for exploring and analysing genetic variation data
- Replacing Disqus with Github Comments, by Don Williamson
2019-03-28: Belle and Sebastian, If You’re Feeling Sinister.
2019-03-28: I remember the time when I was using PLINK to perform genome-wide analysis, before I switched to David Clayton’s excellent snpMatrix R package. Now, it looks like some folks are interested in using Julia for this stuff.
2019-03-28: A successful Git branching model. See also What is wrong with this. Personally, I found that the Atom team has a pretty nice setup for working with stable and beta versions.
2019-03-29: I’m quite happy actually with how Spacemacs handles LSP for the various modes I regularly make use of (Python, JS, C), thanks to the wonderful lsp-mode. Today, I discovered that there’s another “universal” package, eglot, for dealing with all available servers. (via @hillelogram)
2019-03-29: I have been trying to use Eshell more consistently for a few days. Mastering Emacs is (as always) quite useful in this respect. After having tried some custom settings, including those found on Modern Emacs, I finally chose the full-featured eshell-git-prompt.
2019-03-29: Nice find today! Just when I thought I would need to write a full macOS native app for viewing Fasta files or MAFFT-aligned sequences, I found it already exists, and it’s so much faster and prettier than Jalview. Thank you so much Mathieu Fourment!
2019-03-29: Grav – a modern flat-file CMS. If you’re looking for an alternative to Jekyll or Hugo, there’s probably some good stuff behind this open-source project. (via Brett Terpstra)
2019-03-29: Numerical Tours of Data Sciences, feat. Python, Julia and R.
2019-03-29: The Computer Language Benchmarks Game. (via Daniel Lemire)
2019-03-29: Typing is not the problem. Nice take! This came just after reading Tom’s last post, where I also learned about conventional commits. The latter reminds me that at some point I was using some ideas from Modern Emacs to highlight commit leaders.
2019-03-30: 🎥 Matrix. Long time no see… wait, it’s still as topical as ever. Remember those who spoke about AI 20 years ago? Or wrote Black Mirror more recently? Anyway, my son and I had a good time watching this “viral” movie.
2019-03-30: Remote pbcopy on OS X systems. Nice tip, as always.
2019-03-31: Les Ogres de Barback, Amours grises & colères rouges.