Here is the latest bag of tweets*, which covers February 2011.
(*) These are interesting news items that I found on Twitter and archive periodically.
- Encyclopedia of Machine Learning: This comprehensive reference covers many key topics in machine lea… http://bit.ly/gE4T81 #datamining (h/t kdnuggets, 25 Feb)
- Liked “Copy Number Variation and Genetic Disease | Learn Science at Scitable” http://ff.im/yPe32 (h/t yokofakun, 25 Feb)
- Apple refreshes MacBook Pros, Drops NVidia for AMD/Intel http://bit.ly/dECM8i (h/t VizWorld, 24 Feb)
- This looks awesome: Pattern, a Python module for mining web data http://is.gd/mv8kpl (h/t drewconway, 24 Feb)
- The Future of #R is #clojure? http://bit.ly/hdvtgX (h/t algoriffic, 24 Feb)
- The convex optimization sequence of S.Boyd is online, including videos. Great teacher, 2nd course is recommended http://bit.ly/djfSxR (h/t gappy3000, 24 Feb)
- Some good notes on Classification and Regression Trees (CART). I prefer CART over some of the literal decision trees http://t.co/V1ls9QU (h/t DataJunkie, 24 Feb)
- Re-implementing simple system utilities in Go, contrasted with the original C versions: http://j.mp/eLXhSw (h/t cortesi, 23 Feb)
- Emacs users consider using the mouse a “cache miss”: http://bit.ly/hlGS2y (h/t vsbuffalo, 23 Feb)
- the next Java is Java′ (not Scala, Clojure, Groovy, et al.) http://bit.ly/g5Oycr (h/t cemerick, 23 Feb)
- Only now learned of “Scala levels”. Seems like a sad (inevitable?) turn. Feeling re-vindicated in my bet on Clojure. http://bit.ly/igS8Qx (h/t cemerick, 23 Feb)
- Just read nice introduction to coding Metropolis-Hastings algorithm in R by Darren Wilkinson: http://tinyurl.com/45sppxd. #MCMC #rstats (h/t emble64, 23 Feb)
- RT @far_hat: The Importance of Reproducibility in High-Throughput Biology http://t.co/qv5Qt7W (@brent_p) Presentation for the great paper. (h/t vsbuffalo, 23 Feb)
- Does Incanter have support for sparse matrices? http://goo.gl/fb/UPM7h #clojure #SO (h/t planetclojure, 23 Feb)
- View/edit files on remote servers natively: with MacFUSE+MacFusion. No need for 10 terminals or X! http://t.co/aHt1xH4 http://t.co/XJae (h/t DataJunkie, 22 Feb)
- GPU-enabled PyMC for big data sets http://bit.ly/gm28dh (h/t fonnesbeck, 22 Feb)
- Feature: The neuroscience of addiction http://sns.ly/qNcey2 (h/t Neuro_science, 22 Feb)
- Very interesting ACM #datamining talk and Q&A by Mahout dev @ted_dunning about Log Likelihood Ratio tests and #recsys http://bit.ly/f6x1uw (h/t ogrisel, 22 Feb)
- Heng Li calls Lua the replacement for Perl he’s been looking for for several years. http://bit.ly/i2K3Uh (h/t vsbuffalo, 21 Feb)
- Radix sort revisited - a nice brush-up of an under-used algorithm: http://j.mp/dLkqTg (h/t cortesi, 21 Feb)
- New Post: #Rstats versus Matlab in Mathematical Psychology http://bit.ly/gkhtqD (h/t JeromyAnglim, 21 Feb)
- “A cool and practical alternative to traditional hash tables” by authors from Microsoft Research http://bit.ly/hx0iAl (h/t CompSciFact, 21 Feb)
- RT: Very informative & timely NIH 2010 Genome analysis course with videos & handouts (http://bit.ly/gRKrFf). (via @iGenomics) (h/t vsbuffalo, 21 Feb)
- At Facebook last spring, we hosted a talk about combining R + Hadoop, with RHIPE. The video is now up! http://bit.ly/e5VOYI (h/t dataspora, 21 Feb)
- Visualizing dynamic programming http://goo.gl/gfUGx , #bioinformatics (h/t abhishektiwari, 21 Feb)
- Highly Recommended on Random Matrices: these notes by R. Vershynin http://bit.ly/i1OKBU (h/t gappy3000, 20 Feb)
- DSPL: Dataset Publishing Language - Google Code http://t.co/m6WlpyN (h/t berdote, 20 Feb)
- My Lisp Experiences and the Development of GNU Emacs http://post.ly/1e6sG (h/t irr, 20 Feb)
- I’m working with PowerHouse - Data Mining based on Information Theory. Testing some financial and medical datasets. http://bit.ly/fJtCqg (h/t i_314, 20 Feb)
- @hmason social science data! http://bit.ly/eOIACu (h/t drewconway, 20 Feb)
- I’m putting together a bundle of public research-quality datasets. What am I missing? http://bit.ly/f2cX4h (h/t hmason, 20 Feb)
- Love this! Dijkstra’s handwritten notes on Why numbering should start at zero (PDF): http://bit.ly/g51B7d (via @seanjtaylor) (h/t hmason, 19 Feb)
- This nice review makes me eager to read “#Mahout in Action”: http://bit.ly/e5Z4gY #Hadoop (h/t nicolastorzec, 18 Feb)
- Gene Discovery Provides Insight into Brain Formation http://sns.ly/qbbzy7 (h/t Neuro_science, 18 Feb)
- Lynch’s “Introduction to Applied Bayesian Statistics and Estimation for Social …” is available for download http://bit.ly/dFPI2o (h/t berndweiss, 18 Feb)
- New blog post: Protovis-GWT 0.3 released (full API implementation for basic Mark types) http://bit.ly/i1BOZP #protovis #gwt #infovis (h/t lgrammel, 18 Feb)
- Is Scheme Faster than C? http://post.ly/1ddmK (h/t irr, 18 Feb)
- HIV as you’ve never seen it before http://bit.ly/gIrgF6 (h/t newscientist, 18 Feb)
- Need #ARPACK, anyone? Compiling Fortran code to Java, http://icl.cs.utk.edu/f2j #epicwin epic win! (h/t akuhn, 18 Feb)
- PEGASUS is an open source Peta-Scale Graph Mining system. It runs in parallel, distributed manner on top of Hadoop. http://bit.ly/fithxv (h/t communicating, 18 Feb)
- Article: Working with Data in Protovis http://datavis.ch/go2SzZ (h/t datavis, 17 Feb)
- The many faces of operator new in C++ http://post.ly/1dKP2 (h/t irr, 17 Feb)
- Are there any good open source examples of JRuby + Clojure integration? http://goo.gl/fb/lfQdf #clojure #SO (h/t planetclojure, 17 Feb)
- #mapsforge: free and open toolbox to create new #OpenStreetMap based applications http://code.google.com/p/mapsforge/ (h/t iricelino, 16 Feb)
- Is there a good way to make Ruby talk to Clojure and vice versa, across some… http://goo.gl/fb/0AEw2 #clojure #SO (h/t planetclojure, 16 Feb)
- Liked “Very cool chart: the Top 200 drugs from 2009 and their structures:…” http://ff.im/xLh4k (h/t kshameer, 16 Feb)
- Nice tutorial on #NumPy arrays (#python) http://su.pr/2AoUjr (h/t boris_gorelik, 16 Feb)
- Efficient algorithms for computing pi from Ramanujan http://ow.ly/3Xep2 (h/t AnalysisFact, 16 Feb)
- Twitter Dots: Mapping all Tweets for a specific Keyword http://ldata.in/fYcOvv #analytics (h/t dHolowack, 16 Feb)
- Liked “Hadoop implementation of fast genome indexing with BWT http://code.google.com/p/genome-indexing/ #ngs” http://ff.im/y98NC (h/t yokofakun, 16 Feb)
- @moritz_stefaner you could consider checking http://oreil.ly/ekeU6h if you’re into #python. you can basically follow the recipes (h/t jcukier, 16 Feb)
- @vsbuffalo http://bit.ly/el72Ms ← Survey piece about #rstats #emacs #orgmode #git, et al. http://bit.ly/gpEMkh ← The .org source for it. (h/t kjhealy, 16 Feb)
- I love @quora. Asked question about Random Forests. Within minutes got this nice explanation http://www.quora.com/How-do-random-forests-work (h/t genetics_blog, 16 Feb)
- How to find the list of all coding snps / exomic variants in a given gene ? http://ff.im/y76OA (h/t kshameer, 16 Feb)
- Patrick Stein: Calculating the mean and variance with one pass http://tinyurl.com/5vmasny (h/t planet_lisp, 16 Feb)
- Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers by Boyd et al. http://bit.ly/gMRe01 (h/t ogrisel, 16 Feb)
- Very, very good article on C++: http://bit.ly/dIwZTK I’m new to C++, and this captures all of my early thoughts. Also why I love C. (h/t vsbuffalo, 15 Feb)
- RT @socialwebmining: If you’re adapting MTSW examples to work w @infochimps data, check out infochimpy, a Python client http://rww.to/gMFYPs (h/t infochimps, 15 Feb)
- Recently reading this (http://bit.ly/gel2iB) puts this (http://bit.ly/dKLx4j) in a new perspective. (Via @medriscoll, @josephreisinger) (h/t johnmyleswhite, 15 Feb)
- Double brace initialization, covariant return types, and other “hidden features” of Java - http://t.co/ICnFyc0 (h/t fbahr, 15 Feb)
- Review of Strata tutorial and conference http://bit.ly/f2KhTZ #strataconf (h/t nbrgraphs, 15 Feb)
- This paper gives the first true probabilistic generative model for PCA… nearly 100 years after PCA http://bit.ly/ge60yY (h/t gappy3000, 15 Feb)
- @vsbuffalo http://bit.ly/r-pack - is how Prof. Charles Berry (UCSD) maintains his R-package solely using #org-mode in #emacs, #rstats (h/t suncoolsu, 15 Feb)
- A decade’s perspective on DNA sequencing technology http://ff.im/y28iP (h/t kshameer, 15 Feb)
- The Illustrated Guide to Epigenetics http://bit.ly/hLt5yQ (h/t delahar, 14 Feb)
- Is Clojure or Haskell better for making command line tools? http://goo.gl/fb/WGXqE #clojure #SO (h/t planet_clojure, 14 Feb)
- “Two of the most famous products of Berkeley are LSD and Unix. I don’t think that is a coincidence.” - Unix Haters Handbook. (h/t vsbuffalo, 14 Feb)
- A great synthesis paper by the much missed Sam Roweis: A Unifying Review of Linear Gaussian Models http://bit.ly/hAgeHD (h/t gappy3000, 14 Feb)
- New on eagereyes: Anscombe’s Quartet http://t.co/0GvudDN #infovis #statistics (h/t eagereyes, 14 Feb)
- Data Mining map: http://bit.ly/h8xumD (h/t algoriffic, 14 Feb)
- City traffic visualized as blood vessels http://datafl.ws/17n (h/t flowingdata, 14 Feb)
- After watching this I’m convinced I’ve never seen real literate programming before. http://bit.ly/dSMRyh Wow. (h/t vsbuffalo, 14 Feb)
- Good piece by Peter Norvig (another Googler) on the state of artificial intelligence (AI): http://goo.gl/hLwzq (h/t mattcutts, 14 Feb)
- Learning with large datasets, by L. Bottou http://j.mp/gLUJbr (h/t gappy3000, 13 Feb)
- Online LaTeX Equation Editor - http://goo.gl/69lz (h/t kinggary, 13 Feb)
- Pros and cons of the term “data science”: http://j.mp/hJLWtY (h/t algoriffic, 13 Feb)
- GPU-accelerated version of LIBSVM available (C-SVC with RBF kernel only) using CUDA Framework. http://bit.ly/er5epG #machinelearning #gpu (h/t atlamp, 13 Feb)
- Polymaps - interactive map/tile #JavaScript library (from the primary developer of #Protovis) - http://bit.ly/ejNb3j (h/t MetaThis, 13 Feb)
- 99 Prolog problems (Lisp and Haskell too) http://post.ly/1c6bS (h/t irr, 13 Feb)
- Installing Trait-o-matic: http://snp.med.harvard.edu/ Find and classify phenotypic correlations for variations in whole genomes (h/t kshameer, 13 Feb)
- First of 10 lessons in using awk http://bit.ly/huR1KO #unixforpoets (h/t willf, 12 Feb)
- googleVis is an R package providing an interface between R and Google Visualisation API. Google Motion Charts with R http://bit.ly/gzYJuB (h/t i_314, 12 Feb)
- Riding the Elephant: bioinformatics and hadoop http://bit.ly/h3Llar (h/t jandot, 12 Feb)
- New substring search algorithm http://bit.ly/fdlsJU (h/t CompSciFact, 12 Feb)
- RT @cunyp001 #Epistatic Interaction Maps and their relation to Metabolic Phenotypes http://bit.ly/fxvo9I #epistasis (h/t moorejh, 12 Feb)
- A map of the world by alphabet. Truly fantastic. Can you name all of the scripts? http://bit.ly/gOtoHI (h/t azaaza, 12 Feb)
- Million song dataset, http://labrosa.ee.columbia.edu/millionsong/ (h/t zaxtax, 12 Feb)
- Which is a shame, because there are some great articles in there, e.g., this piece of scientific data viz http://bit.ly/g1GU1i (h/t drewconway, 11 Feb)
- Science magazine goes for a full-page word cloud for the cover of their special issue on “Dealing with Data” http://bit.ly/h8GnbF (h/t drewconway, 11 Feb)
- I’ve posted part 5/5 of my #protovis tutorial on working with data http://bit.ly/gBb7as, on layouts (treemaps, network graphs etc.) (h/t jcukier, 11 Feb)
- Call for Papers for BioVis 2011 http://bit.ly/flVLWj (h/t VizWorld, 11 Feb)
- Link for NHGRI event “A Decade with the Human Genome Sequence” (Excellent quality video) http://bit.ly/gTqnYy #genomics #bioinformatics (h/t kshameer, 11 Feb)
- RT @madmongol: RT @mikeloukides: Really cool. The #strataconf topic graph, based on tweets during the conference. http://bit.ly/eitGeR (h/t infochimps, 11 Feb)
- RT @tdhopper: “Testing the procedure on the data that gave it birth is almost certain to overestimate performance…” Mosteller & Tukey 1977 (h/t StatFact, 11 Feb)
- Want an intense, short course in machine learning? Join Hastie & Tibshirani in Palo Alto on 3/14-15: http://bit.ly/fktgeF (h/t medriscoll, 11 Feb)
- Short intro to “Dynamic documents with #rstats and #latex as an important part of reproducible research” http://bit.ly/eoW1kL (h/t berndweiss, 11 Feb)
- Deft is an open source web server (licensed under Apache version 2). Deft was initially inspired by facebook/tornado. http://post.ly/1bcVq (h/t irr, 11 Feb)
- “From Machine Learning to Machine Reasoning”: interesting proposal and research direction from Leon Bottou: http://arxiv.org/abs/1102.1808 (h/t mdreid, 11 Feb)
- The elegance of protovis really shines in @jcukier’s fine tutorial series on data transformations in protovis http://t.co/7bD7rfu (h/t moritz_stefaner, 11 Feb)
- Less known than the GFS paper, “The Hadoop Distributed File System” published by Yahoo! in 2010 is worth reading. #HDFS http://goo.gl/O5J06 (h/t mfiguiere, 11 Feb)
- Thinking of using some data from @DataDryad to teach lmer to psychology students http://www.datadryad.org/handle/10255/dryad.1619 (h/t mja, 10 Feb)
- two factor auth for gmail as an option, nice! http://goo.gl/npeAr (h/t codinghorror, 10 Feb)
- Clinical diagnostics using next-generation sequencing - http://bit.ly/hzET6c (h/t GenomeQuest, 10 Feb)
- Nice review by B. Stranger et al MT @pvanbaarlen: Progress & Promise of #GWAS for Human Complex Trait Genetics http://bit.ly/dL5uos (h/t genetics_blog, 10 Feb)
- FOSDEM 2011 - Here is my late FOSDEM report. The nice thing is that all the guys have already written their… http://tumblr.com/x1g1h9029s (h/t zenogantner, 10 Feb)
- I found a technical report by the guy who scored 3rd place in the Kaggle R package recommendation engine competition: http://j.mp/hnp4xJ (h/t n0mad_0, 10 Feb)
- Enormous book on random number generation available for download http://bit.ly/fWTyvx (h/t CompSciFact, 10 Feb)
- nice! RT @onertipaday: Hothorn - Leisch “Case Studies in reproducibility” using #rstats BiB paper 2010: http://bit.ly/hUKtqf (h/t berndweiss, 10 Feb)
- Looking forward to reading the WikiGenes paper when it’s done. Draft is clunky, but looks like lots of good tools/ideas http://bit.ly/fGb7zI (h/t genetics_blog, 10 Feb)
- Access World Bank data directly from Stata! WBOPENDATA draws from the main World Bank collections. To use: ssc install wbopendata #stata (h/t duke_data, 9 Feb)
- wrangler is exceptional: http://vis.stanford.edu/papers/wrangler. more databases people should aim to publish at chi. (h/t hackingdata, 9 Feb)
- RT @gwardis: The Human Genome, 10 Years Later - MIT Technology Review http://bit.ly/hG0Cvs #genomics #bioinformatics (h/t kshameer, 8 Feb)
- Another list of free public data sets http://j.mp/eU4PxI (h/t ilikedata, 8 Feb)
- Research Paper on #NumPy Array http://t.co/F9qTBdK (h/t gramfort, 8 Feb)
- Partitioning integers http://bit.ly/fZJy7o (complete with obligatory pretty pictures of the Mandelbrot set) (h/t algoriffic, 8 Feb)
- Reading up on color spaces http://bit.ly/ezqtlE (h/t lisaczhang, 8 Feb)
- Matplotlib pulling a ggplot2. Imitation is the sincerest form of flattery #rstats http://j.mp/hC51D0 (h/t gappy3000, 8 Feb)
- Could Fisher, Jeffreys and Neyman have agreed on testing? http://bit.ly/hRugxE (h/t StatFact, 8 Feb)
- Genetic variations are nowhere near independent http://ow.ly/3PxOb (h/t ProbFact, 8 Feb)
- Interesting Tutorial - Lectures on Computational Neuroscience - http://www.genesis-sim.org/GENESIS/cnslecs/cnslecs.html #Neuroscience (h/t i_314, 8 Feb)
- A Hidden Markov Model package for Weka http://bit.ly/e99nMb (h/t mxlearn, 8 Feb)
- interesting dataset to analyse: toronto.ca | Open - Dinesafe - http://t.co/smiIyrU Public Health - Healthy Environments Program (h/t attilacsordas, 8 Feb)
- Reading the Green Book from DoH. Very interesting, but very long! http://www.dh.gov.uk/en/Publichealth/Immunisation/Greenbook/index.htm (h/t emble64, 7 Feb)
- call for BMC Research Notes contributions promoting best practice in data standardization, sharing & publication http://goo.gl/ZLYvT (h/t yokofakun, 7 Feb)
- “Graphical elegance is often found in simplicity of design and complexity of data.” -Edward R. Tufte (h/t ffunction, 8 Feb)
- Block-adaptive randomization http://bit.ly/f72ajP (h/t StatFact, 6 Feb)
- Added 14 courses from Carnegie Mellon to collection of Free Online Courses. Now 350 courses in total: http://cultr.me/acBpsj (h/t openculture, 6 Feb)
- Redis: One Page Command References http://post.ly/1aHCg (h/t irr, 6 Feb)
- RT @mat_kelcey “Probabilistic and Unsupervised Learning” UCL 2009 http://bit.ly/bT4bmQ #machinelearning (h/t kshameer, 6 Feb)
- #rstats tip: tell R when you’re working with integers. http://bit.ly/g53qwH (h/t vsbuffalo, 6 Feb)
- Paris joins the open data movement http://bit.ly/fA5sV3 (h/t drewconway, 6 Feb)
- “I would be surprised if [Stack Overflow] turns out to provide better answers than…reading mailing list archives” http://bit.ly/g9PcDu (h/t spolsky, 5 Feb)
- An easy way to extract useful text from web content: http://bit.ly/14y5en (thanks, @mmelliott103!) #strataconf (h/t hmason, 5 Feb)
- Clojure: all the way to lisp http://goo.gl/fb/CyPbH #clojure (h/t planetclojure, 5 Feb)
- Shalizi’s Data Analysis Course http://bit.ly/eEeDj6 Great lecture notes and R code. Regression, prediction, bias/variance, bootstrap etc. (h/t seanjtaylor, 5 Feb)
- General Purpose Computer-Assisted Clustering and Conceptualization (unprotected article pdf) http://ow.ly/3QRKV (h/t kinggary, 5 Feb)
- Machine learning in Python with scikits.learn, today at #FOSDEM: http://fseoane.net/talks/fosdem-skl/ (h/t fpedregosa, 5 Feb)
- RT @OnlineBiotech: Linked electronic medical records for genomic research « ScienceRoll http://bit.ly/ftifU4 (h/t genomicslawyer, 5 Feb)
- Introduction to distributed file systems http://bit.ly/gRro4E (h/t CompSciFact, 5 Feb)
- The “named” field (covered in R Internals) is pretty awesome. #rstats http://bit.ly/i65J1p (h/t vsbuffalo, 5 Feb)
- There is a programming language called Merd. Not surprisingly, it is “on hold” http://bit.ly/fuCN7q (h/t gappy3000, 5 Feb)
- Clean your Data with Stanford’s Data Wrangler http://bit.ly/gr2G46 (h/t VizWorld, 5 Feb)
- Repeating cross-validation vs. adding more folds http://bit.ly/fmEhj3 (h/t metaoptimizeqa, 4 Feb)
- Bancroft’s rule (rule of thumb for estimating linear regression) http://bit.ly/eNN78 (h/t StatFact, 4 Feb)
- Data Wrangler, from the fantastic Stanford Vis Group looks amazingly useful. http://vis.stanford.edu/wrangler/ (h/t finklabs, 3 Feb)
- Probability Theory Review for Machine Learning http://post.ly/1ZUxA (h/t irr, 3 Feb)
- This looks like an awesome way to learn git: http://gitimmersion.com/ (h/t hmason, 3 Feb)
- MAchine Learning for LanguagE Toolkit http://post.ly/1ZTE4 (h/t irr, 3 Feb)
- Goog Refine competitor? RT DataWrangler: Visual data cleaning, reshaping, xform tool from Stanford/Berkeley. http://bit.ly/datawrangler (h/t arnabdotorg, 3 Feb)
- New at Byte Mining: Web Mining Pitfalls http://dlvr.it/Fkrym (working link now!) (h/t DataJunkie, 3 Feb)
- Updated readme for Scythe, with info on how it compares to the very nice program TagDust: http://bit.ly/dQZexa #bioinformatics #genomics (h/t vsbuffalo, 3 Feb)
- The most obvious way to compute correlation and covariance may have severe numerical problems. http://bit.ly/6zsNuh (h/t ProbFact, 3 Feb)
- 2007 article,“Statistical Analyses and Reproducible Research,” Now available for free at http://tinyurl.com/4wz8jh6 (h/t AmstatNews, 3 Feb)
- my take on functional programming and #rstats: http://bit.ly/hnQ6EE (h/t hadleywickham, 3 Feb)
- Strata 2011: The Keynotes http://bit.ly/fRG2y2 (h/t infosthetics, 3 Feb)
- Strata2011 Conference Videos on YouTube http://bit.ly/ifZT3H (h/t mxlearn, 3 Feb)
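Two items above deal with the same numerical pitfall: Patrick Stein’s one-pass mean and variance (16 Feb) and the warning that the obvious way to compute correlation and covariance can have severe numerical problems (3 Feb). As an illustration, here is a minimal Python sketch of Welford’s online update, one standard one-pass method. I’m not assuming this is the method the linked posts use, and the function name `welford` is my own.

```python
from typing import Iterable, Tuple


def welford(xs: Iterable[float]) -> Tuple[int, float, float]:
    """One-pass count, mean and sample variance via Welford's update.

    Instead of accumulating sum(x) and sum(x*x) (which cancel badly when
    the mean is large relative to the spread), keep a running mean and a
    running sum of squared deviations from it.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean         # deviation from the old mean
        mean += delta / n        # update the mean
        m2 += delta * (x - mean)  # old deviation times new deviation
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(welford(data))  # (4, 1000000010.0, 30.0)
```

On shifted data like this, the textbook formula sum(x²)/n - mean² subtracts two numbers around 1e18 and can lose most of its significant digits, while the running update only ever handles deviations of a few units.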
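The radix sort item (21 Feb) is easy to try for yourself. Below is a minimal least-significant-digit (LSD) sketch for non-negative integers, assuming byte-sized digits (base 256); it is an illustration, not the code from the linked post.

```python
from typing import List


def radix_sort(nums: List[int], base: int = 256) -> List[int]:
    """LSD radix sort for non-negative integers (sketch).

    Each pass distributes the numbers into buckets by one base-`base`
    digit, starting from the least significant one. Because every pass
    is stable, the order established by earlier (lower) digits is kept
    among numbers that tie on the current digit.
    """
    if any(n < 0 for n in nums):
        raise ValueError("this sketch handles non-negative integers only")
    out = list(nums)
    if not out:
        return out
    largest = max(out)
    shift = 1  # base**pass_number, selects the current digit
    while shift <= largest:
        buckets: List[List[int]] = [[] for _ in range(base)]
        for n in out:
            buckets[(n // shift) % base].append(n)
        out = [n for bucket in buckets for n in bucket]  # stable concatenation
        shift *= base
    return out


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66, 70000]))
# [2, 24, 45, 66, 75, 90, 170, 802, 70000]
```

The number of passes depends on the number of digits in the largest key, not on how the keys compare, which is why radix sort can beat comparison sorts on fixed-width integers.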
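Darren Wilkinson’s introduction to coding Metropolis-Hastings (23 Feb) uses R. For readers who prefer Python, here is a rough sketch of random-walk Metropolis, the special case of Metropolis-Hastings with a symmetric proposal. The Gaussian proposal, the step size, the seed and the standard-normal target are my own assumptions, not taken from his post.

```python
import math
import random
from typing import Callable, List


def metropolis(log_target: Callable[[float], float], n: int, x0: float = 0.0,
               step: float = 1.0, seed: int = 42) -> List[float]:
    """Random-walk Metropolis: Metropolis-Hastings with a symmetric proposal.

    Because the Gaussian proposal is symmetric, the Hastings ratio
    q(x | y) / q(y | x) cancels and a move x -> y is accepted with
    probability min(1, target(y) / target(x)), evaluated on the log scale.
    `log_target` may be unnormalised.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_target(x0)
    samples: List[float] = []
    for _ in range(n):
        y = x + rng.gauss(0.0, step)  # propose a move around the current state
        log_py = log_target(y)
        # min(0, ...) keeps exp() from overflowing when y is much more likely
        if rng.random() < math.exp(min(0.0, log_py - log_px)):
            x, log_px = y, log_py  # accept
        samples.append(x)  # on rejection the current state is repeated
    return samples


# Usage: sample a standard normal, whose log density is -x^2/2 up to a constant.
chain = metropolis(lambda x: -0.5 * x * x, n=50_000, step=2.4)
kept = chain[1_000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / (len(kept) - 1)
print(round(mean, 2), round(var, 2))  # should come out near 0 and 1
```

Recording the current state again on rejection matters: dropping rejected steps would bias the chain away from high-density regions.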
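The Mosteller & Tukey quote (11 Feb) and the question on repeating cross-validation vs. adding more folds (4 Feb) both come down to never scoring a model on the rows it was fitted on. Here is a minimal sketch of repeated k-fold index generation in plain Python; the function name and the strided-slice fold assignment are illustrative choices of mine, not a recommendation from the linked thread.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n: int, k: int, repeats: int, seed: int = 0
                   ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` rounds of k-fold CV.

    Each round reshuffles the indices and deals them into k nearly equal
    folds, so every index appears in exactly one test fold per round and
    never in the training set it is scored against.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    rng = random.Random(seed)
    idx = list(range(n))
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # slices are copies
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, list(test)


for train, test in repeated_kfold(n=6, k=3, repeats=1):
    print(sorted(test))
```

Adding folds shrinks each test set and makes the training sets more alike; repeating with fresh shuffles instead averages over different partitions, which mainly reduces the variance that comes from one particular split.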