aliquote.org

lost+found 2011

January 1, 2012

Unfinished and very drafty posts from 2011.

Switching font in $\LaTeX$

(January 2011)

The default fonts used when writing a $\LaTeX$ document are known as Computer Modern (CM) or Extended Computer Modern (EC). The lmodern package provides vector (Type 1) versions of these fonts, so that you can use diacritical marks (e.g., when typesetting in French). For this to work, you just have to add

\usepackage[T1]{fontenc}
\usepackage{lmodern}

in the preamble of your $\LaTeX$ file. There is also the cm-super package, whose description on CTAN should be self-explanatory:

The CM-Super package contains Type 1 fonts converted from METAFONT fonts and covers entire EC/TC, EC Concrete, EC Bright and LH fonts (Computer Modern font families). All European and Cyrillic writings are covered. Each Type 1 font program contains ALL glyphs from the following standard LaTeX font encodings: T1, TS1, T2A, T2B, T2C, X2, and also Adobe StandardEncoding (585 glyphs per non-SC font and 468 glyphs per SC font), and could be reencoded to any of these encodings using standard dvips or pdftex facilities (the corresponding support files are also included).

Now, XeTeX has revolutionized the way we can change the default fonts for the Serif, Sans Serif, and Monospace families. See XeTeX on the Web.

It is not only easy to switch default fonts in a $\LaTeX$ document, but also within a $\TeX$ or Context document. In what follows, I shall describe some simple commands to change fonts in each case.

$\LaTeX$

The most straightforward way to handle other font families in a $\LaTeX$ document processed through $\XeLaTeX$ is the fontspec package, which allows users of either xetex or luatex to load OpenType fonts in a $\LaTeX$ document.

So, just put

\usepackage{fontspec, xltxtra}
\setmainfont{Apple Garamond}
\setsansfont[Scale=0.86]{Fontin Sans}
\setmonofont[Scale=0.8]{Menlo}

in your preamble, and you’ll get a nice document with Garamond for Serif text. Note that xltxtra loads fontspec by default. Here is how it looks:

[Figure: ex1]

Optional arguments can be passed to \set*font, or globally with fontspec: Mapping=tex-text, Ligatures={Common, Historical}, Numbers={OldStyle, Lining}.

More detailed explanations can be found in the documentation ($ texdoc fontspec) for the fontspec package.

Font size can also be changed anywhere in the document using

\fontsize{12pt}{18pt}\selectfont

which sets the text at 12pt with an 18pt baseline skip. There is also a Scale= parameter (e.g., Scale=1.4 would display the given font at 140%); usually we use it to ensure that characters are well matched between the different font families, as in:

\defaultfontfeatures{Scale=MatchLowercase}

So, for instance,

A lot of fonts come with a TeXLive installation, as can be seen below:

$ ls /usr/local/texlive/2010/texmf-dist/doc/fonts/ | wc -l
    161

Don’t forget to cache all your fonts:

$ fc-cache -fv

Here, I check whether I have Garamond somewhere:

$ fc-list | grep -i gara

$\TeX$

For example, the following instructions contained in a file tex_ttf.tex

\font\1="Inconsolata"\1
\input knuth.tex
\bye

will typeset a short fragment using the freely available Monospace font Inconsolata. The knuth.tex file comes with Context. On my TeXLive installation, it is found in the sample directory which is /usr/local/texlive/2010/texmf-dist/tex/context/sample/. The compilation is done with

$ xetex tex_ttf.tex

and the result is shown below.

[Figure: knuth]

Context

The default font used when typesetting in Context is Latin Modern. However, it is possible to use OTF or Type 1 fonts (with both mkii and mkiv).

Font handling usually proceeds from so-called Typescripts.

The font database can be generated using

$ mtxrun --script fonts --reload

See Fonts in ConTeXt and the manual section on fonts, available from the SVN repository. See also Fonts in LuaTeX.

What is the best combination of fonts to use?

Did you ever wonder which combination of Serif, Sans and Mono gives the best rendering (on screen, and for printing)? I’m going to ask this on tex.stackexchange.com because I didn’t find anything reliable on Google.

Kieran Healy uses a nice combination of Minion Pro, Myriad Pro, and Pragmata (template available at kjh-vita).

According to the American Association of University Presses, Minion, ITC New Baskerville, and FF Scala are the top three fonts (FF Scala Sans comes fourth), but see Top Typefaces Used by Book Design Winners.

[2022-04-01]
Apparently, the FontFeed website is dead. See What Happened to Fontfeed.com & Top Alternatives to learn more about it, and thanks June for the followup notice!

Outlier detection

(February 2011)

This is a very small note on outlier detection with methods based on PCA, Random Forests and Mahalanobis distances.

The motivation for this “benchmark” originates from a question at stats.stackexchange.com.

The dataset we will use is the iris data. It is composed of four measures collected on three species (each with 50 individuals): sepal length and width, and petal length and width.

Brushing the data

Before starting to look at the automatic detection of possible outliers, let’s go through the data itself. A rough picture is provided below, with smooth estimates of the within-group tendencies superimposed onto the scatterplot matrix. The code used to generate this graphic is:

library(lattice)
trellis.par.set(list(fontsize = list(text=9, points=6)))
splom(iris[,1:4], groups=iris[,5], type=c("p","g","smooth"),
      lwd=2, pch=19, alpha=.4, auto.key=list(space="top", columns=3))

[Figure: xyplot]

A convenient way to look interactively at the data is to rely on some kind of brushing. Actually, without resorting to GGobi, it is available through iPlots, or its next-gen version Acinonyx, or the tkBrush() function from the TeachingDemos package. (The latter makes use of Tk so make sure it is installed on your machine.)

data(iris)
library(TeachingDemos)
tkBrush(iris)

Well, we spotted three individuals that seem to exhibit some extreme pattern of measures when considering sepal width x sepal or petal length. These are individuals 42, 118, and 132; see .Last.value$pch. It should be emphasized that this approach is plainly wrong if we are looking for multivariate outliers: here, we are only looking at bivariate relationships, so we will likely fail to identify good candidates in the 4D space.

[Figure: brush]

PCA outlier detection

Let’s plot individual scores on the last two components, which are supposed to account for the lowest amount of variance:

iris.pca <- princomp(iris[,-5])
plot(iris.pca$scores[,3:4])

Then, we can highlight the most extreme points in this point cloud using, for example, the convex hull. Here is how it would look:

chpts <- chull(iris.pca$scores[,3], iris.pca$scores[,4])
polygon(iris.pca$scores[chpts,3], iris.pca$scores[chpts,4], lty=3)

[Figure: xyplot]

The most extreme points are

> chpts
  [1]  85 135 123  63  88  69  42 142 115 137 101
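To make the convex hull idea concrete, here is a minimal sketch in Python of Andrew's monotone chain algorithm, which returns the indices of the points lying on the hull. It is only an illustration: it is not necessarily the algorithm that chull() itself uses, the function names are mine, and the toy coordinates stand in for the PCA scores (indices are 0-based here, unlike R).

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points):
    """Return the indices of the hull vertices, in counter-clockwise order."""
    # Visit the points in lexicographic (x, then y) order.
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) <= 2:
        return order

    def half_hull(indices):
        hull = []
        for i in indices:
            # Drop the last vertex while it does not make a strict left turn.
            while len(hull) >= 2 and cross(points[hull[-2]], points[hull[-1]], points[i]) <= 0:
                hull.pop()
            hull.append(i)
        return hull

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    # The last vertex of each half is the first vertex of the other one.
    return lower[:-1] + upper[:-1]


# Made-up 2D scores: four corners of a square plus two interior points.
toy_scores = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.0), (2.0, 2.0), (1.0, 1.0), (0.0, 2.0)]
hull = convex_hull_indices(toy_scores)
print(hull)
```

Only the four corners end up on the hull; the two interior points are discarded, which is exactly why hull membership can serve as a crude flag for extreme observations.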

RF outlier detection

The idea is to use the proximity measure that is derived from an RF model. It is readily available in the outlier function of the randomForest package.

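library(randomForest)
set.seed(1)  # make the forest reproducible
iris.rf <- randomForest(Species ~ ., data=iris, proximity=TRUE)
outl <- outlier(iris.rf)  # outlying measure derived from the proximity matrix
head(sort(outl, decreasing=TRUE), n=10)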

In the next figure, I have highlighted those 10 individuals in red:

pairs(iris[,-5], col=ifelse(outl>68.78, "red", "black"))

[Figure: rfoutlier]

Mahalanobis outlier detection

This time, we use the mvoutlier package and its chisq.plot() function, which plots the ordered robust Mahalanobis distances of the data against the quantiles of the $\chi^2$ distribution.

library(mvoutlier)
chisq.plot(iris[,1:4])
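As a rough illustration of the underlying idea, the following Python sketch computes squared Mahalanobis distances and compares them to a $\chi^2$ quantile. It rests on several simplifying assumptions that are mine, not mvoutlier's: it uses the classical mean and covariance rather than robust estimates, it handles two variables only (so that the covariance matrix can be inverted by hand and the $\chi^2$ quantile with 2 degrees of freedom has a closed form), and the data are made up.

```python
import math


def mahalanobis_sq_2d(data):
    """Squared Mahalanobis distance of each (x, y) row from the sample mean."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    # d^2 = (v - m)' S^{-1} (v - m), using the closed-form inverse of a 2x2 matrix.
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in data
    ]


def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 df (CDF: 1 - exp(-q / 2))."""
    return -2.0 * math.log(1.0 - p)


# Made-up data: a 3x3 grid of points plus one far-away observation.
toy = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)] + [(10, 10)]
d2 = mahalanobis_sq_2d(toy)
cutoff = chi2_quantile_2df(0.975)
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print(flagged)
```

With classical estimates, a gross outlier inflates the covariance matrix and can partly hide itself (the so-called masking effect), which is one reason to prefer robust distances as chisq.plot() does.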

Distributed computing

(February 2011)

Xgrid

PVM

An alternative way to go with distributed computing or virtualization on OS X is to use pvm. There are detailed instructions in a README file but something went wrong for me (and I guess, it applies to everyone with a Mac). So, here is how to compile pvm3:

  1. Set an environment variable for the place you want pvm to live in; I chose export PVM_ROOT=/usr/local/share/pvm3.
  2. If you decide to put pvm3 in a non-writable directory (as above), change permissions accordingly (e.g., sudo chown username /usr/local/share/pvm3).
  3. Edit the Makefile in the top-level directory and change AIMKSTR to reflect the correct def file; in my case, -here -f ./conf/MACOSX.def -f ./Makefile.aimk.
  4. Edit the file src/pmsg.c and comment out the static redefinition of the xdr_float() and xdr_double() functions (or find a way to alter the -DFAKEXDRFLOAT flag that is being passed to cc when compiling).

A short implementation of the shuf program for Mac OS X.

(August 2011)

Based on a solution posted on Stack Overflow.

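use strict;
use warnings;
use List::Util qw(shuffle min);

# Read all input lines and shuffle them.
my @shuffled = shuffle(<STDIN>);
# Optional first argument: number of lines to print (default: all of them).
my $n = @ARGV == 1 ? min($ARGV[0], scalar @shuffled) - 1 : $#shuffled;
print @shuffled[0 .. $n];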
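For comparison, the same logic as the Perl script above can be sketched in Python. The shuf() helper and its rng parameter are hypothetical names chosen for illustration; passing a seeded random.Random instance just makes the example reproducible.

```python
import random


def shuf(lines, n=None, rng=random):
    """Return the lines in random order, keeping only the first n if n is given."""
    shuffled = list(lines)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled if n is None else shuffled[:max(n, 0)]


sample = ["a\n", "b\n", "c\n", "d\n"]
picked = shuf(sample, n=2, rng=random.Random(1))
print("".join(picked), end="")
```

To use it as a command-line filter, one would pass sys.stdin as lines and read the optional count from sys.argv, as the Perl version does with STDIN and @ARGV.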

Testing Mahout

(August 2011)

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Mahout ..................................... SUCCESS [18.464s]
[INFO] Mahout Build Tools ................................ SUCCESS [11.130s]
[INFO] Mahout Math ....................................... SUCCESS [34.161s]
[INFO] Mahout Core ....................................... SUCCESS [1:06:19.256s]
[INFO] Mahout Integration ................................ SUCCESS [2:45.826s]
[INFO] Mahout Examples ................................... SUCCESS [1:29.287s]
[INFO] Mahout Release Package ............................ SUCCESS [0.026s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:11:39.255s
[INFO] Finished at: Sat Aug 06 01:59:43 CEST 2011
[INFO] Final Memory: 25M/97M
[INFO] ------------------------------------------------------------------------
