I am about to pass 150 micro-posts in my Org file. (Other posts are
published from the terminal directly.) I added a little cookie to keep track of
the number of entries, although a little harder path would be to write some
Just cleaned up my Dropbox a little more (6 GB of data, reports and papers accumulated over 8 years!).
Python didn’t become the leader in the field because it’s inherently better or more performant, but because of scikit-learn, pandas and so on. While as Clojurists we don’t really need pandas (dataframes) or similar stuff (everything is just a map or, if you care more about memory and performance, a record), we don’t have something like scikit-learn that makes it really easy to train many kinds of machine learning models and somewhat easier to deploy them.
A few highlights: basic epidemiology calculations, easily create functional form assessment plots, easily create effect measure plots, generate and conduct diagnostic tests. Implemented estimators include: inverse probability of treatment weights, inverse probability of censoring weights, inverse probability of missing weights, augmented inverse probability weights, time-fixed g-formula, Monte Carlo g-formula, iterative conditional g-formula, and targeted maximum likelihood (TMLE).
lifelines requires Matplotlib 2.2.3 but the latest release, as upgraded when
installing zepid, is 3.0.2. How nice!
Again, I’m slowly updating stata-sk. It took me a while to reset the publishing
system to use Stata 13 MP instead of Stata 15 since I no longer get a free
license for it. This will probably be my last textbook on Stata.
Look. Even Racket has some support for statistical data structures like data
frames. In addition, here is an essential read if you want to get started with
common data structures: An Overview of Common Racket Data Structures.
The amount of genomic sequence data being generated and made available through public databases continues to increase at an ever-expanding rate. Downloading, copying, sharing and manipulating these large datasets are becoming difficult and time consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows for more efficient lossless compression than can be obtained through the use of generic compression programs. We apply a series of techniques to James Watson’s genome that in combination reduce it to a mere 4MB, small enough to be sent as an email attachment. – Human genomes as email attachments
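As a toy illustration of why sequence-aware encodings can beat storing plain text, here is a minimal Python sketch that packs a DNA string into 2 bits per base instead of one byte per character. This is only an assumption-laden example, not the series of techniques applied in the paper; the names `pack` and `unpack` are hypothetical, and the sketch handles only the four letters A, C, G and T.

```python
# Toy 2-bit encoding of DNA; hypothetical helpers, not the paper's method.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a string over A/C/G/T into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so decoding reads from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from the packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    # The original length is needed to drop padding in the last byte.
    return "".join(bases[:length])
```

With this scheme a sequence of n bases takes about n/4 bytes, a fourfold saving over ASCII before any smarter modelling is applied; the original length has to be stored alongside the bytes so the padding can be discarded.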
I haven’t yet embraced the full power of Julia for data munging, but this
article is surely a gem for understanding the language at a deeper level.
Useful tips to build and manage R packages: rOpenSci Packages: Development,
Maintenance, and Peer Review.
Probability and Statistics: a simulation-based introduction, by Bob Carpenter. I
like it when there are instructions for those like me who do not want to install
RStudio to build the book.
I’m halfway through my new TV show (Occupied), but I’m struggling to motivate
myself to move forward, even to watch TV right now. Besides that, I’m
finally getting a job back. Let’s just hope I don’t go back to the hospital too
Why the 3? Earlier in the morning I was reading one of the latest posts published by John D. Cook about dose-finding studies. I am well aware of the 3+3 design. Incidentally, I attended a meeting yesterday where a PhD student was presenting his work in microbiology, and they used triplicates. It is interesting that 3 looks like a magic number in both cases, even though it does not play the same role in each. Maybe I should drop a note in a few days.
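For reference, the decision rule behind the 3+3 design can be sketched in a few lines of Python. This follows one common textbook variant (protocols differ in the details), and the function name `three_plus_three` and its return labels are hypothetical choices made for this example.

```python
def three_plus_three(dlts: int, treated: int) -> str:
    """Decision at the current dose level in a textbook 3+3 design.

    dlts: number of dose-limiting toxicities (DLTs) seen at this dose.
    treated: number of patients treated at this dose so far (3 or 6).
    """
    if not 0 <= dlts <= treated:
        raise ValueError("dlts must be between 0 and treated")
    if treated == 3:
        if dlts == 0:
            return "escalate"  # move to the next dose level
        if dlts == 1:
            return "expand"    # enrol 3 more patients at the same dose
        return "stop"          # 2 or 3 DLTs: this dose is too toxic
    if treated == 6:
        # After expansion, at most 1 DLT in 6 patients allows escalation.
        return "escalate" if dlts <= 1 else "stop"
    raise ValueError("a 3+3 cohort has 3 or 6 patients")
```

When the rule returns "stop", the maximum tolerated dose is usually taken to be the dose level just below the current one.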
Not sure how we can think of GTD when we spend about one hour cleaning up defunct stuff on our HD, but sure we are close…
One of the first hits when looking for “Lisp and bioinformatics” on the internet:
How the strengths of Lisp-family languages facilitate building complex and
flexible bioinformatics applications.
I’ve been following Greg Stein on Caches to caches for a long time now, because the site has such a beautiful design and useful material on Emacs and Org mode. Recently they published a series of posts on AI and ML.
Portacle is a complete IDE for Common Lisp that you can take with you on a USB stick.
If you are looking for a quick solution, here it is. Otherwise, learn Emacs for
Staying with Common Lisp. A safe non-move, perhaps? On a related note, here is an
enlightening discussion about Racket vs. Lisp: Why I haven’t jumped ship from
Common Lisp to Racket (just yet).