I fucked up.
Short version: If you’re reading mailing lists with an NNTP news reader via news.gmane.org, you should update your news reader to point to news.gmane.io instead. — Whatever Happened To news.gmane.org?
2020-04-02: And so we are done with Black List, Season 4, as of yesterday evening. We need to take a break, my son and I, and we decided to watch Season 10 of The Walking Dead. It’s a perfect fit, isn’t it?
2020-04-02: I can relate.
I am a medical statistician. I've studied this stuff at university, done data analysis for decades, written several NHS guidelines (including one for an infectious disease), and taught it to health professionals. That's why you don't see me making any coronavirus forecasts.— Robert Grant (@robertstats) April 2, 2020
When Gmail users send mail from mechanisms other than Google’s web interface (i.e.: their phone or laptop’s email program), Gmail includes the user’s IP address in message headers. — Client IP Address Disclosure in smtp.gmail.com
2020-04-03: Amazing. (via jlelse’s blog)
2020-04-03: For those interested in Lisp and related family of FP languages, the ACM is actually open access to proceedings from several conferences from the 90s.
2020-04-03: No good news for data workers, right? > So Numbers took 9.4 times as long as Excel, which isn’t great. But it’s an improvement over my test of Numbers ’13 and Excel 2011, where Numbers took 102 times as long as Excel. — Opening Large CSV Files in Numbers 10.0
2020-04-03: Embeddings in Natural Language Processing: Theory and Advances in Vector Representation of Meaning (PDF, 163 pp.).
2020-04-03: Linear Algebra Done Right, by Sheldon Axler, is available for free as well. Idem for Bayesian Data Analysis (PDF, 3rd ed.), but see the GH repo. Happy readings and stay safe!
2020-04-03: Using JavaPlex with Clojure.
2020-04-03: Version 4.0 of the survey package is on its way to CRAN.
2020-04-05: Banks, Goddess.
Considering everything we know about China — human rights violations, untrustworthy track record, unaccountable totalitarian leadership, vast resources, and their technical expertise to act, at scale, on access to potentially sensitive poorly-encrypted video calls — China is quite literally and obviously the last country on the face of the earth where you’d want video calls routed. — TechCrunch: ‘Zoom admits some calls were routed through China by mistake’
One of my productivity challenges is starting projects. When writing, I struggle with the blank page. When faced with a big project, I struggle to decide where to start. My suspicion is that REPL-driven development facilitates tinkering that gets me over that initial hump. — Programming horizons revisited
I would add that the REPL-driven approach to data analysis (R or Stata compared to, say, SAS or SPSS) lets you ‘feel’ the data in a particular way. Think of exploratory data analysis, for instance.
2020-04-05: Culinary memories of the last days.
Brian D. Ripley, Spatial Statistics. On the one hand, this is from 1981, so all the detailed computational advice is laughably obsolete. (At one point, Ripley discusses strategies for not having to keep all of a 128 kb image in main memory at once.) There has also been a lot of advances in some aspects of the theory, notably point processes. On the other hand, Ripley’s basic advice — visualize; do less testing for “randomness” and more model-building; simulate your models, visualize the simulations, and test modeling assumptions with simulations and visualizations; smooth, and remember that “kriging” is just the Wiener filter — remains eminently sound. — I have been reading bits and pieces of this book, off and on, since around 2000, but I have a rule about not recommending something until I’ve finished it completely. Having finally now read it all, including the chapter on tomography (!), I can safely say: anyone seriously interested in spatial statistics probably ought to read this, but you can skip the tomography chapter as obsolete. — Books to Read While the Algae Grow in Your Fur, March 2020
2020-04-06: Yet another nice post by Travis Hinkelman on statistical data structures in Scheme: Split, bind, and append dataframes in Chez Scheme.
2020-04-07: Nice post on Backtracking, by Martin Thoma.
2020-04-07: Functions Explained Through Patterns.
2020-04-07: cljfx: Declarative, functional and extensible wrapper of JavaFX inspired by better parts of react and re-frame.
2020-04-07: emacs-vega-view: a small library meant to facilitate exploratory data visualization using Vega.
But Emacs and Vim have been shaped, balanced, sharpened and smoothed over by decades of usage by hundreds of thousands of programmers, each trying to get through their day as efficiently and fuss-free as possible. In the right hands, they move lines, shift paragraphs and fling code better and faster than anything out there. — A well-honed tool (via Irreal)
Google, the world’s largest ad-tech company, has direct access to user data and browsing information from a large part of the web traffic. Their data collection can track an individual from multiple angles to create the best possible behavioral profile. Google has nine different products with more than one billion users each. — Why you should stop using Google Analytics on your website
2020-04-08: I was cleaning an old 500 GB hard drive that I used for backups some years ago. That feeling when you find some good old R code…
2020-04-08: This also is ten years old!
2020-04-08: We are done with season 10 of The Walking Dead. Something’s obviously missing, and it was way too short.
2020-04-08: Printing from the command line. Because why not?
2020-04-09: Still reading, and cooking…
The two libraries [Scalaz and Cats] have different styles, and both remain heavily used by portions of the community today. The evangelism has died down, to some extent, but usage remains strong and everyone recognizes functional programming as one possible style in which to write your Scala applications. — The Death of Hype: What’s Next for Scala
2020-04-10: Happy to take another fresh look at Kristoffer Magnusson’s nice visualization projects.
2020-04-10: I finally finished transferring 200 GB of data over home wifi to an 8-year-old Time Machine. It took a night, and part of a day, btw.
2020-04-10: Lots of interesting Stata programs for epidemiology and econometrics.
2020-04-10: Time for Fira Code v3!
2020-04-10: Break on NaN in gdb, or how to detect “not a number” edge cases in C.
2020-04-10: Bringing GNU Emacs to Native Code.
2020-04-10: Modern Data Analysis for Economics.
2020-04-11: Joan As Police Woman, To Survive.
2020-04-11: John Cale, Fragments of a Rainy Season.
Paul Graham describes LISP as the convergence point for all programming languages. His observation is that as languages mature, the average language continues to slide towards LISP. Therefore understanding LISP is to understand the fundamental model of modern programming. — Understanding the Power of LISP
September 1993. Before then, the internet was primarily a university thing, and in university settings, fall semester starts in September, so you had a whole freshman class getting their first networked computer access and often doing rude things with it, especially in the eyes of the Sys Admins who had to deal with it. The problem was, in 1993, you started having the WorldWideWeb, and thus there would always be an influx of new users who continue to behave in ways Sys Admins consider rude. And that influx never stopped. — TheSeptemberThatNeverEnded
2020-04-11: How nice! Frank Harrell is about to release blrm, an extension of the rms package for Bayesian binary and ordinal proportional odds logistic regression. Together with brms, the Bayesian toolbox on CRAN has grown fast over the last few years.
2020-04-11: Interesting study: Looking back at findings from a series of eyetracking studies over 13 years, we see that fundamental scanning behaviors remain constant, even as designs change. > The more things change, the more they stay the same.
2020-04-11: It looks like Rogue Amoeba is the definitive way to go for controlling audio IO on a Mac these days. See also Podcasting Microphones Mega-Review.
2020-04-11: It’s always interesting, if not enlightening, to re-read Terence Tao’s review on probability theory ten years later.
2020-04-11: Let’s be happy, and here’s what you probably need for tonight:
2020-04-11: This was the day it was.
(puff pastry with spinach, feta, prunes, onions and spices)
2020-04-11: Pointless: a scripting language for learning and fun. > Expressions in Pointless are normally evaluated eagerly. There are exceptions to this rule, like the branches of conditional statements, as described previously. There are two other important instances where the language introduces laziness: lists and definitions.
2020-04-14: A nice discussion about UTF-8 encoding. > We see no particular reason to favor Unicode code points over Unicode grapheme clusters, code units or perhaps even words in a language for that. On the other hand, seeing UTF-8 code units (bytes) as a basic unit of text seems particularly useful for many tasks, such as parsing commonly used textual data formats. This is due to a particular feature of this encoding. Graphemes, code units, code points and other relevant Unicode terms.
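A small Python sketch of the difference between code points and code units (the sample string is arbitrary). The property used at the end, that an ASCII byte never appears inside a multi-byte UTF-8 sequence, is only my guess at the “particular feature” the quote alludes to; the quote itself does not name it.

```python
# Arbitrary sample text mixing ASCII, accented Latin and CJK characters.
text = "na\u00efve caf\u00e9, \u65e5\u672c\u8a9e"

code_points = len(text)             # number of Unicode code points
code_units = text.encode("utf-8")   # UTF-8 code units (bytes)

print(code_points, len(code_units))

# Splitting on an ASCII delimiter at the byte level is safe: every byte of a
# multi-byte UTF-8 sequence has its high bit set, so it can never equal b",".
fields = [chunk.decode("utf-8") for chunk in code_units.split(b",")]
print(fields)
```

Each field decodes cleanly because the split can never land in the middle of a character.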
2020-04-14: Lovely: The Webpage, an online RSS reader and news aggregator, styled like a newspaper. (via HN)
2020-04-14: Some handy shortcuts to rename files on Un*x systems.
Emacs (like Smalltalk) has no barriers except at its low-level foundations. Emacs users can change, and break, anything they like. Emacs packages made available for download can potentially contain very malicious code. The Emacs philosophy, going back to the 1970s when there were neither cybercriminals nor completely tech-naive users, is that Emacs users are fully responsible for managing their Emacs environment. This actually works very well in practice, even today, because Emacs is neither attractive for completely tech-naive users, nor sufficiently popular to be an interesting target for cybercriminals. — The most successful malleable system in history
We’re happy to announce we’re making private repositories with unlimited collaborators available to all GitHub accounts. All of the core GitHub features are now free for everyone. — GitHub is now free for teams
2020-04-15: Interesting discussion regarding Markdown for serious typesetting (using shell scripting).
2020-04-15: This was also the day it was.
(roast chicken, fried potatoes and carrots, and celery root.)
2020-04-16: Falcon: Free, open-source SQL client for Windows and Mac. Looks like a nice successor to Induction app (now defunct).
2020-04-16: Harp: The static web server with built-in preprocessing.
Given that many of Common Lisp’s defining features and advantages are also available in other languages, who would choose Lisp over any of the more mainstream options? Someone who needs to write code that is portable across operating systems and competing implementations in a high-level, compiled language that generates standalone executable binaries with execution speed comparable to C. — Pragmatic reasons for choosing Common Lisp
2020-04-17: > People who have never really lived in a world without mobile phones […] might think that daily life at that time was unnecessarily complicated and ‘harder’. Organising meetings, finding people, finding places around you, having to use paper maps instead of having a portable device with GPS functionalities built in, not being able to look things up in Google or Wikipedia at any time. The truth is, people knew how to organise themselves with the tools they had available. Daily life had a completely different pace and style, built around the tools available at that time. It really isn’t a matter of ‘worse’ or ‘better’ — life was just different. — How I’d live this quarantine if it was 1990
2020-04-17: The book Programming Algorithms (A comprehensive guide to writing efficient programs with examples in Lisp) is now completed, and it is available (eventually for free) on Leanpub.
2020-04-17: A List of Ways to Confirm the Earth is Round.
2020-04-17: Motzkin paths and source code silhouettes. See also Motzkin numbers on the main blog.
2020-04-17: Web development starter pack.
2020-04-17: org-noter: Emacs document annotator, using Org-mode, with some differences from Interleave.
Ranking is a farce. Apparent performance is actually attributable mostly to the system that the individual works in, not to the individual himself. — Statistical process control after W. Edwards Deming
Refinement types give us the ability to define validation rules, or more commonly called predicates, at the type level. This means we get compile-time validation whenever the values are known at compile-time. — Parallel typeclass for Haskell
Aside from the compilation aspect, I think that assert-like statements (in Stata or Python) are very close to the above statement in statistical programming.
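As a minimal sketch of that idea in Python (the variable names and plausibility bounds are made up for illustration), a few assert statements check at run time what a refinement type would check at compile time:

```python
# Hypothetical data: ages in years and a binary outcome.
ages = [34, 51, 27, 68]
outcome = [0, 1, 0, 1]

def check_data(ages, outcome):
    """Fail loudly if the data violate the stated assumptions."""
    assert len(ages) == len(outcome), "variables must have the same length"
    assert all(0 <= a <= 120 for a in ages), "age outside plausible range"
    assert set(outcome) <= {0, 1}, "outcome must be coded 0/1"
    return True

check_data(ages, outcome)
```

The predicates are the same as they would be at the type level; the difference is that a violation only shows up when the script runs on the offending data.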
Whenever I hear someone emphasize the speed of their just-released scientific software, my strong Bayesian prior is that they are really telling me their code is not only full of bugs (all software is!) but that it’ll be really hard to find and fix them… — Software and workflow development practices
2020-04-21: After a few days of use, I found that emacs-jupyter really is an amazing package.
2020-04-21: I have nothing to delete (I already did this a lot), and I missed the deadline. See you next year for the deletion day!
2020-04-21: I’ve lately been retrying Atom. Atom IDE and the rest of Facebook’s stuff are apparently in a sad state (Rust is OK, but Haskell has been a second-class citizen all along, and Nuclide’s terminal is not working). Hydrogen still works great, but you need to update the kernel specs for Python 3. I guess I will just return to my usual stuff in Emacs. BTW, this is a great read if you’re looking for an alternative to Hydrogen and/or Nteract: Cheap polyglot notebooks.
2020-04-21: Computational Category Theory (PDF, 263 pp.).
2020-04-21: Jupyter Notebooks as PDF. Surely an interesting option for those using Jupyter notebooks extensively. Of note, it will install Chromium since the NB -> HTML conversion is performed without LaTeX.
2020-04-21: Ricing up Org Mode.
2020-04-21: Switching to Doom Emacs. Nothing really fancy about Doom capabilities and philosophy, but in case you’re interested in switching too, here are the instructions to install it!
There are an untold number of analyses of panel data affected by an issue that is almost impossible to identify because R and Stata obscure the problem. Thanks to multi-collinearity checks that automatically drop predictors in regression models, a two-way fixed effects model can produce sensible-looking results that are not just irrelevant to the question at hand, but practically nonsense. — What Panel Data Is Really All About
2020-04-24: And so it happened: We now have dplyr clones here and there, with base R only. I will call it the #rstats clone war from now on.
2020-04-24: So many changes to expect in existing code base! R 4.0.0 is released.
2020-04-24: TIL about Generalized Random Forests, which currently provides non-parametric methods for least-squares regression, quantile regression, and treatment effect estimation (optionally using instrumental variables).
2020-04-24: Welcome to the Kattis Problem Archive. Yes, I need more ways to waste time these days.
2020-04-24: CUSTOM GAME ENGINES: A Small Study.
2020-04-24: Incremental Regular Expressions.
2020-04-24: Obfuscated C Christmas programs, with some tricks.
2020-04-28: Nick Cave, The Story – And the Ass Saw the Angel.
2020-04-28: I didn’t upgrade my R stuff yet. I should note that besides reference counting, the new color palette looks really great!
2020-04-28: If you’re missing some good books while staying at home, don’t forget Springer offers free PDF + EPUB versions of many textbooks in computer science and statistics.
2020-04-28: SICP in Python.
2020-04-28: Zawinski’s Law “Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.” Law of Software Envelopment, Jamie Zawinski.
Iterated random functions are used to draw pictures or simulate large Ising models, among other applications. They offer a method for studying the steady state distribution of a Markov chain, and give useful bounds on rates of convergence in a variety of examples. The present paper surveys the field and presents some new examples. There is a simple unifying idea: the iterates of random Lipschitz functions converge if the functions are contracting on the average. — Iterated Random Functions, by Diaconis and Freedman
Sounds good to me already.
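The unifying idea can be illustrated with a toy example, sketched here in Python with my own choice of seed, starting point and sample sizes. At each step, apply x -> x/2 or x -> x/2 + 1/2 with equal probability. Both maps are contractions with Lipschitz constant 1/2, and the stationary distribution of the resulting chain is uniform on [0, 1], so the empirical mean and variance should come out close to 1/2 and 1/12 wherever the chain starts.

```python
import random

def iterate(x0, n_steps, rng):
    """Apply n_steps randomly chosen contractions, starting from x0."""
    x = x0
    for _ in range(n_steps):
        # Pick f(x) = x/2 or f(x) = x/2 + 1/2 with probability 1/2 each.
        x = x / 2 + (0.5 if rng.random() < 0.5 else 0.0)
    return x

rng = random.Random(42)  # arbitrary seed, for reproducibility
draws = [iterate(x0=10.0, n_steps=60, rng=rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)
```

The starting value 10.0 lies well outside [0, 1] on purpose: after 60 halvings its contribution is negligible, which is the contraction at work.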