2019

2019-02-17 20:35

  Nick Cave & The Bad Seeds, Push the Sky Away.

2019-02-17 20:29

I am about to exceed 150 micro-posts in my Org file. (Other posts are published directly from the terminal.) I added a little cookie to keep track of the number of entries, although a slightly harder path would be to write some Elisp code. #org
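
For the record, counting entries programmatically could look like the sketch below. It is written in Python rather than Elisp, and it assumes that each micro-post is a top-level Org headline, which may not match the real layout of the file. The function name and the sample text are made up for illustration.

```python
import re


def count_org_entries(org_text: str, level: int = 1) -> int:
    """Count headlines with exactly `level` leading stars in Org text."""
    # An Org headline starts at the beginning of a line with one or more
    # stars followed by a space; anchoring at ^ skips inline *bold* markup.
    pattern = re.compile(r"^\*{%d} " % level, re.MULTILINE)
    return len(pattern.findall(org_text))


# Hypothetical sample: two top-level entries and one sub-heading.
sample = """\
* 2019-02-17 20:35
Some text.
** A sub-heading that is not counted at level 1
* 2019-02-17 20:29
More text with *bold* markup.
"""

print(count_org_entries(sample))
```

The cookie remains the simpler option since Org updates it in place; a script like this only helps when the count is needed outside Emacs.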

2019-02-17 20:19

I don’t have any big needs in terms of image processing, and I am generally happy with ImageMagick. However, Acorn and Retrobatch (h/t Brett Terpstra) look pretty nice.

2019-02-17 18:39

  Nick Cave & The Bad Seeds, The Boatman’s Call.

2019-02-17 18:34

Just cleaned up my Dropbox a little bit more (6 GB of data, reports and papers accumulated over 8 years!).

2019-02-17 18:15

Machine learning in Clojure with XGBoost. Note that there are bindings for the awesome xgboost in various other languages (Python, Julia, R), not just the JVM. #clojure

Python didn’t become the leader in the field because it’s inherently better or more performant, but because of scikit-learn, pandas and so on. While as Clojurists we don’t really need pandas (dataframes) or similar stuff (everything is just a map, or if you care more about memory and performance a record) we don’t have something like scikit-learn that makes really easy to train many kind of machine learning models and somewhat easier to deploy them.

2019-02-17 18:05

merlin - a unified framework for data analysis, and many other interesting packages by the same author or coworkers. #stata

2019-02-17 17:56

Python for Epidemiologists, feat. zEpid, which I just discovered. #python

A few highlights: basic epidemiology calculations, easily create functional form assessment plots, easily create effect measure plots, generate and conduct diagnostic tests. Implemented estimators include: inverse probability of treatment weights, inverse probability of censoring weights, inverse probability of missing weights, augmented inverse probability weights, time-fixed g-formula, Monte Carlo g-formula, Iterative conditional g-formula, and targeted maximum likelihood (TMLE).

Note that lifelines requires Matplotlib 2.2.3, but installing zepid upgrades it to the latest release, 3.0.2. How nice!

2019-02-17 12:07

Again, I’m slowly updating stata-sk. It took me a while to reset the publishing system to use Stata 13 MP instead of Stata 15 since I no longer get a free license for it. This will probably be my last textbook on Stata. #stata

2019-02-17 08:51

Look. Even Racket has some support for statistical data structures like data frames. In addition, here is an essential read if you want to get started with common data structures: An Overview of Common Racket Data Structures. #scheme

2019-02-16 14:14

An analysis of lossless data compression programs: Large Text Compression Benchmark. (via SO; it looks like it is the very first question on the beta site)

The amount of genomic sequence data being generated and made available through public databases continues to increase at an ever-expanding rate. Downloading, copying, sharing and manipulating these large datasets are becoming difficult and time consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows for more efficient lossless compression than can be obtained through the use of generic compression programs. We apply a series of techniques to James Watson’s genome that in combination reduce it to a mere 4MB, small enough to be sent as an email attachment. – Human genomes as email attachments
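
To see why the structure of sequence data helps, here is a minimal sketch of the simplest trick: with a four-letter alphabet, each base fits in 2 bits instead of an 8-bit character. This is not the set of techniques used in the quoted paper, only a toy illustration, and the function names are made up. It also assumes the sequence contains only A, C, G and T.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking stays uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the original string; `length` drops the padding bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


seq = "ACGTTGCAAC"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")
```

The original length has to be stored alongside the packed bytes, since the last byte may contain padding.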

2019-02-16 13:51

I haven’t yet embraced the full power of Julia for data munging, but this article is surely a gem for understanding the language at a deeper level. #julia

2019-02-16 09:04

Useful tips to build and manage R packages: rOpenSci Packages: Development, Maintenance, and Peer Review. #rstats

2019-02-16 09:02

Probability and Statistics: a simulation-based introduction, by Bob Carpenter. I like it when there are instructions for those like me who do not want to install RStudio to build the book. #rstats

2019-02-15 09:41

Causal Inference Book, Python code hosted on GitHub (by the author of the Stata kernel). (via @kaz_yos)

2019-02-14 21:27

I’m halfway through my new TV show (Occupied), but I’m struggling to motivate myself to move forward right now, even to watch TV. Besides that, I’m finally getting a job back. Let’s just hope I don’t go back to the hospital too soon. #self

2019-02-14 21:22

  Joy Division, Closer.

2019-02-13 21:34

Why the 3? Earlier in the morning I was reading one of the latest posts published by John D. Cook about dose finding studies. I am well aware of the 3+3 design. Incidentally, I attended a meeting yesterday where a PhD student was presenting his work in microbiology, and they used triplicates. It is interesting that the same 3 seems like a magic number here, but it is not the same. Maybe I should drop a note in a few days.
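
For reference, the commonly described 3+3 escalation rule can be sketched as a small decision function. This is the textbook version only (variants exist), and the function name and return labels are made up for illustration.

```python
def three_plus_three(dlts: int, treated: int) -> str:
    """Decide the next step at the current dose level.

    dlts    -- number of patients with a dose-limiting toxicity (DLT)
    treated -- number of patients treated at this dose (3 or 6)
    """
    if treated == 3:
        if dlts == 0:
            return "escalate"   # 0/3: move to the next dose level
        if dlts == 1:
            return "expand"     # 1/3: treat 3 more at the same dose
        return "stop"           # 2/3 or more: dose is too toxic
    if treated == 6:
        # After expansion: at most 1/6 allows escalation.
        return "escalate" if dlts <= 1 else "stop"
    raise ValueError("cohort size must be 3 or 6")


print(three_plus_three(dlts=1, treated=3))
```

When the rule says "stop", the maximum tolerated dose is usually taken to be the previous dose level.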

2019-02-13 21:27

Not sure how we can think of GTD when we spend about one hour cleaning up defunct stuff on our HD, but sure we are close…

2019-02-13 13:48

One of the first hits when looking for “Lisp and bioinformatics” on the internet: How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications. #lisp

2019-02-12 08:45

I’ve been following Greg Stein on Caches to caches for a long time now, because the site has such a beautiful design and useful material on Emacs and Org mode. Recently they published a series of posts on AI and ML.

2019-02-12 08:37

disk.frame is a new (dplyr-compliant) R package to manipulate structured tabular data that doesn’t fit into RAM, in the spirit of Dask for Python. #rstats

2019-02-11 21:34

Another nice article about GTD by BSAG. I enjoy reading her blog posts, and I really love her website design. Funny thing: I was just reading some old posts written by Bastien Guerry on Org mode.

2019-02-11 21:19

Overnight…

2019-02-11 21:11

  Gary Peacock, Jack DeJohnette & Keith Jarrett, My Foolish Heart (Live at Montreux).

2019-02-11 18:48

  Jack DeJohnette, Ravi Coltrane & Matt Garrison, In Movement.

2019-02-11 14:31

Portacle is a complete IDE for Common Lisp that you can take with you on a USB stick.

If you are looking for a quick solution, here it is. Otherwise, learn Emacs for good. #emacs

2019-02-11 14:24

Staying with Common Lisp. A safe non-move, perhaps? On a related note, here is an enlightening discussion about Racket vs. Lisp: Why I haven’t jumped ship from Common Lisp to Racket (just yet). #lisp #scheme