Rstats

Multi-Group comparison in Partial Least Squares Path Models

This post is about multi-group partial least squares path modeling (PLS-PM). There is already a useful list of references on this blog post. As the author noted, ensuring measurement invariance is often thought of as a prerequisite before delving into multi-group comparison, at least in the psychometric literature that I am familiar with. From a measurement perspective, this is easily understandable since we need to ensure that we are indeed measuring the exact same construct, in a similar way, in each specific subpopulation.

lost+found 2015

Here are some draft notes written in 2015, unfiled but not lost forever, with slight edits to accommodate a proper archive blog post. R and psychometrics (February 2015): I have been using R for most of my statistical projects for 10 years or so. In the beginning it really was awesome software for psychometric modeling because there were some nice packages for multidimensional and optimal scaling, IRT modeling, and factor analysis, which were otherwise not available, at least on OS X.

Yet another gray theme for R base graphics

One of the things I like about R is that if you are not happy with the default settings, e.g. for graphics, you can usually update some parameters or write your own plotting function. For instance, Karl Broman proposed his own theme for base R graphics, with a grey background for the main plotting region (à la ggplot2). He even uploaded a full package to CRAN; see the grayplot() function. Here is an example of a beautiful R graphical display:
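To make the "write your own plotting function" idea concrete, here is a minimal sketch in plain base R. It is not Karl Broman's implementation and does not use his package; the helper name `gray_scatter()` and its arguments are made up for illustration, and it simply assumes that painting a grey rectangle over the plotting region and adding white grid lines is close enough to the look described above.

```r
# Hypothetical helper (NOT the grayplot() function mentioned above):
# scatterplot on a grey panel with white grid lines, base graphics only.
gray_scatter <- function(x, y, bg = "grey90", grid_col = "white", ...) {
  plot(x, y, type = "n", ...)          # set up the axes without drawing points
  usr <- par("usr")                    # limits of the plotting region: x1, x2, y1, y2
  rect(usr[1], usr[3], usr[2], usr[4], col = bg, border = NA)
  abline(v = axTicks(1), h = axTicks(2), col = grid_col)  # grid at the tick marks
  points(x, y, pch = 19)               # draw the data on top of the panel
  box()
  invisible(usr)
}

set.seed(101)
x <- rnorm(50)
y <- x + rnorm(50)
gray_scatter(x, y, xlab = "x", ylab = "y")
```

The points are drawn last so that neither the background rectangle nor the grid lines hide them.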

Writing a Book

I spent the last month working hard to finish my book on biomedical statistics using R. It has been a pleasant experience since I wrote this book using Emacs, R, and git. In fact, I spent several days and nights with a screen looking like the one shown below (this is Homebrew Emacs in fullscreen mode). Of course, there are many things to fix and I’m not totally happy with the book as it stands.

R Pipes and Co

The R language is rapidly changing. I am afraid I’m still teaching R the way I learned and liked it 10 years ago (but I was already aware of replicate() long ago 😄), although I try to keep myself regularly informed of what’s new on CRAN. One question has been stuck in my head for two or three years now: should I just stop teaching how to use lattice graphics and switch to ggplot2? If you are wondering why this causes me some problems, it is just because once students understand the advantage of using R formulae and the split-apply-combine strategy with aggregate() (and not plyr) for statistical modeling and data aggregation, you are almost done.
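As a small illustration of the formula-based approach mentioned above, the following sketch uses the ToothGrowth data set that ships with R. The choice of data set and variables is mine, not taken from any course material; it only shows that the same formula notation serves both aggregate() and a model fit, with a replicate() call thrown in for good measure.

```r
# Split-apply-combine with a formula: mean tooth length by supplement and dose
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(means)

# The same formula notation carries over to statistical modeling
fit <- lm(len ~ supp + dose, data = ToothGrowth)
print(coef(fit))

# replicate() repeats an expression, here to simulate 1000 sample means
set.seed(42)
sim <- replicate(1000, mean(rnorm(20)))
summary(sim)
```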

R graphs cookbook

I just finished reading the R Graphs Cookbook (2nd ed.), by Jaynal Abedin and Hrishi V. Mittal, published by Packt Publishing. Not to be confused with the R Graphics Cookbook, or its companion website, Cookbook for R. This is a basic introductory text on R graphics. Beyond building basic graphics, such as scatterplots, bar charts, or histograms, the authors show how to customise various elements of a statistical graphic using R base graphics (axis limits, axis labels, legend, etc.

Emacs Org-mode and literate programming

I’ve been using Emacs for editing and evaluating R code with ESS for a long time now. I also like Emacs for editing statistical reports and compiling them using knitr (and before that, Sweave), using plain $\LaTeX$ or just RMarkdown. Now, I’m getting interested in org-mode as an alternative to noweb, which I previously used when looking for a way to integrate different programming languages (e.g., sh, sed, and R) into the same document.

UseR 2014

Here are some notes on user2014 (no, it’s not one of the anonymous posters on Stack Exchange!). The GitHub homepage can be found at https://github.com/user2014. I wish I could attend the conference, but I am a bit short of time at the moment, so I’m just following its progress on Twitter. I just started reading the material available on the conference site, especially the tutorials that were made available online.

Reproducible research with R

I just finished reading two recent books in the R Series from Chapman & Hall: Reproducible research with R and RStudio (Christopher Gandrud), and Dynamic documents with R and knitr (Yihui Xie). Following my post on a good Workflow for statistical data analysis, I decided to take a look at the state of the art regarding the R statistical software. In fact, I’ve been using Sweave and knitr for a while now, and I tend to use knitr for everything but simple R scripts that can be self-contained.

Audit trails and statistical project management

In the context of a statistical project, “sanity checking” refers to the verification of raw data: whether they make sense, if there are any coding errors that are apparent from the range of data values, or if some data should be recoded or set as missing (Baum, CF, An introduction to Stata programming, Stata Press, 2009, p. 79). This should be recorded in an audit trail. For a general overview in the IT domain, see Rajeev Gopalakrishna’s tutorial.
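In R, the range checks described above might be sketched as follows. Everything here is assumed for illustration: the data are made up, the plausibility rules (age between 0 and 120, sex coded "M" or "F") are arbitrary, and the layout of the `audit` data frame is just one possible way of recording what was changed and why.

```r
# Hypothetical raw data with suspicious entries (all values are made up)
raw <- data.frame(
  id  = 1:6,
  age = c(34, 51, -1, 47, 230, 29),
  sex = c("M", "F", "F", "X", "M", "F"),
  stringsAsFactors = FALSE
)

# Assumed plausibility rules: age within [0, 120], sex coded "M" or "F"
bad_age <- !is.na(raw$age) & (raw$age < 0 | raw$age > 120)
bad_sex <- !is.na(raw$sex) & !(raw$sex %in% c("M", "F"))

# Audit trail: one row per flagged value, recorded before anything is changed
audit <- rbind(
  data.frame(id = raw$id[bad_age], variable = "age",
             value = as.character(raw$age[bad_age]), action = "set to NA",
             stringsAsFactors = FALSE),
  data.frame(id = raw$id[bad_sex], variable = "sex",
             value = raw$sex[bad_sex], action = "set to NA",
             stringsAsFactors = FALSE)
)

# Apply the corrections to a copy so the raw data stay untouched
clean <- raw
clean$age[bad_age] <- NA
clean$sex[bad_sex] <- NA
print(audit)
```

Working on a copy and logging each flagged value first means the raw file is never modified and every recoding decision can be traced back.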