Here is the latest bag of tweets*, which covers May 2011.
(*) These are interesting news items that I found on Twitter and that I archive periodically.
- Learning of Sparse Invariant Representations http://arxiv.org/abs/1105.5307 2-layer sparse learning: 1st is usual, 2nd learns dependencies (h/t sergecell, 29 May)
- Very helpful link for finding solutions to common R related probs and tasks - R cookbook - http://bit.ly/r-cookbook #rstats (h/t suncoolsu, 29 May)
- You can now submit runs from the command-line using http://mlcomp.org/mlcomp-tool (h/t mlcomp_news, 29 May)
- Matrix Factorization: A Simple Tutorial and Implementation in Python – http://bit.ly/doaUi (h/t smarttypes, 29 May)
- Machine Learning Module (Tutorials) http://post.ly/26yKy (h/t irr, 28 May)
- Why is machine learning not more widely used for medical diagnosis? http://bit.ly/lWAsFc (h/t mxlearn, 28 May)
- Article: Understanding CART and Related Methods http://ow.ly/54R3E (h/t salfordsystems, 28 May)
- An “almost exhaustive” search-based sequential permutation method for detecting epistasis in disease association studies http://goo.gl/BF8OU (h/t genetics_blog, 26 May)
- How to Use GNU Screen http://post.ly/25bha (h/t irr, 24 May)
- Phenome-wide association studies (PheWAS) for exploration of novel geno-pheno relationships & pleiotropy discovery http://bit.ly/j2wsIv (h/t genetics_blog, 23 May)
- A ton of interesting ICML papers http://www.icml-2011.org/papers.php (h/t benm, 23 May)
- “[People] are sensitive to patterns, and are quick to spot them where they exist, and even when they don’t exist.” - http://t.co/8zFvFGf (h/t fbahr, 23 May)
- “Anchors: Software for Anchoring Vignettes Data” in press, Journal of Stat’l Software, http://ow.ly/50jXY (h/t kinggary, 23 May)
- Spark - A Sinatra inspired micro web framework for quickly creating web applications in Java with minimal effort http://post.ly/256Ka (h/t irr, 22 May)
- New data and statistics search engine http://www.zanran.com/q/ (h/t StatFact, 21 May)
- Data Mining with R: Learning with Case Studies | free programming … http://bit.ly/m93phi (h/t DataMiningTips, 21 May)
- Another python package seems to be extremely useful: http://www.scipy.org/SciPyPackages/Sparse - never used it before (h/t n0mad_0, 21 May)
- The pursuit of genome-wide association studies: where are we now? http://ff.im/DPvLp (h/t kshameer, 21 May)
- Woo, an Introduction to Machine Learning with Web Data is up for sale! http://oreil.ly/jYDMR8 (h/t hmason, 20 May)
- Publicly available expression profiles+MRI+DTI for 1000s of regions in human brain http://www.brain-map.org (eg SORT1: http://bit.ly/iHRoMn) (h/t genetics_blog, 20 May)
- MT @NCBI BioProject (formerly Genome Project) a collection of genomics, functional genomics & genetics studies (http://1.usa.gov/lYGKmW) (h/t yokofakun, 19 May)
- Good comments and discussion on the role of chart choice in decision-making, including @Jon_Peltier and @hadleywickham. http://t.co/SxppKOc (h/t eagereyes, 19 May)
- Stigler’s History of Statistics now available as an ebook http://ow.ly/4YbEs via @simonbriscoe (h/t StatFact, 19 May)
- Started using @bufferapp this morning. It’s liberating… and addictive! You gotta try this http://bit.ly/m1R9OW (free) (h/t dHolowack, 19 May)
- TermKit looks very promising: http://t.co/puZdb2F /via @fjossinet (h/t mdreid, 19 May)
- Some beautiful mathematical art projects here - http://danielwalsh.tumblr.com/ (h/t benm, 19 May)
- Eureka moment under the shower this morning after seeing this http://bit.ly/mOk0XX : scalable hierarchical unsupervised feature extraction (h/t ogrisel, 19 May)
- Good articles by Vivek Haldar: http://bit.ly/mqVCYu (Emacs), http://bit.ly/f6Mw6F (Minimalism), http://bit.ly/ebNrdG (Unix) (h/t JohnDCook, 19 May)
- Mapping locations in R with the Data Science Toolkit: Pete Warden’s Data Science… http://goo.gl/fb/HAYtA #rstats (h/t Rbloggers, 19 May)
- Exploring the Human Disease Network http://hudine.neu.edu/ (h/t genetics_blog, 19 May)
- The Redis documentation is a thing of beauty. https://github.com/antirez/redis-doc (cc @mwbrooks && @balmer) (h/t brianleroux, 18 May)
- If you still subscribe to Beadle & Tatum’s 1 gene, 1 enzyme, 1 function hypothesis, please read http://bit.ly/mMJ6WI & http://bit.ly/jSgHc4 (h/t genetics_blog, 17 May)
- VEGAS (VErsatile Gene-based Association Study) for gene-based association tests http://bit.ly/ktHKzM (paper: http://bit.ly/k6Jul4) #GWAS (h/t genetics_blog, 17 May)
- Mapping Health, a great #dataviz by Damien Leri, built in #D3: http://j.mp/jLCCdN (h/t JanWillemTulp, 17 May)
- In case you’ve missed it: check out my latest #dataviz: http://j.mp/ktmwsZ (blog post: http://j.mp/kDO312, flickr set: http://j.mp/ml5ZVx) (h/t JanWillemTulp, 17 May)
- John Langford of Yahoo Research on Research Directions for Machine Learning & Algorithms: http://goo.gl/xHc90 (h/t bigdata, 17 May)
- @DataJunkie have you discovered @psychemedia’s blog? It’s great to get up and running with gephi. http://blog.ouseful.info (h/t neilkod, 17 May)
- “When I see a paper with lots of significance tests, I think the researchers are p-ing all over the research” Herman Friedman (h/t PeterFlomStat, 16 May)
- Latest Phil Trans B is all about placebos. Interesting that none of the articles is a control. http://bit.ly/iLCw1jc /via @StuartJRitchie (h/t mja, 16 May)
- Before computing a large SVD, see if you can update an existing one for cheap http://t.co/FSxpTN2 #algebra #machinelearning (h/t gappy3000, 16 May)
- Effective statistics : http://bit.ly/lLoAWf (h/t jrideout, 16 May)
- A Not Very Short Introduction To Node.js http://post.ly/22sJ5 (h/t irr, 16 May)
- Scientific Data Mining: A Practical Perspective Buy Discount eBook http://bit.ly/lzgaaM (h/t DataMiningTips, 15 May)
- A Python module for extracting relevant tags from text documents: http://bit.ly/lmcjEc (h/t jedisct1, 15 May)
- Is there an algorithm to check positiveness or negativeness in a sentence? http://bit.ly/jwj5ox (h/t metaoptimizeqa, 15 May)
- M-x occur, or M-s o is very handy. M-x multi-occur-in-matching-buffers is awesome too. #Emacs (h/t vsbuffalo, 15 May)
- Chris Lattner on “What Every C Programmer Should Know About Undefined Behavior” http://bit.ly/j50VLx (h/t vsbuffalo, 15 May)
- Here is a presentation by @atveit introducing Map Reduce in the context of Web Search, with many examples… http://bit.ly/lqUSI2 #mapreduce (h/t nicolastorzec, 15 May)
- Does http://www.opensource.apple.com/ mark a new era in Apple Open Source? How long has this much code been open source by Apple? (h/t vsbuffalo, 14 May)
- Anybody read this? “Building Bioinformatics Solutions: with Perl, R and MySQL” http://amzn.to/js206i #Perl #Rstats #GWAS #bioinformatics (h/t genetics_blog, 14 May)
- RT @TerryHeaton: Not only is this a great story of finding a stolen laptop; it’s also a beautiful use of Storify. http://bit.ly/jI3dCO (h/t mathewi, 13 May)
- Slides from NYC R/#predictive #analytics meetup on Caret Package. http://bit.ly/kHlYr1 #rstats (h/t NPHard, 13 May)
- The impact of next-generation sequencing technology on genetics http://ff.im/Di8vh (h/t kshameer, 13 May)
- From Our Blog: Announcing Leaflet: a Modern Open Source JavaScript Library for Interactive Maps http://bit.ly/iM7n87 (h/t cloudmade, 13 May)
- Working with the Google Chart Tools Python library to produce some nicely-formatted tables, soon charts. http://bit.ly/m6qqjF (h/t neilkod, 13 May)
- The most detailed explanation I’ve ever seen on how #CouchDB is implemented (even if a bit dated): http://bit.ly/iyBbE7 (h/t MetaThis, 13 May)
- “The best theory is inspired by practice. The best practice is inspired by theory.” – Donald Knuth (h/t CompSciFact, 13 May)
- Trying the switch to iTerm2 http://bit.ly/iOi7N7. I may even use regular Emacs in it… (h/t vsbuffalo, 13 May)
- Free ebook “Programming Languages: Application & Interpretation” by Shriram Krishnamurthi http://tinyurl.com/46zm2 // via @alexott_en (h/t CompSciFact, 12 May)
- via R-pkgs: mvmeta contains functions to run fixed/random effects meta-analysis/meta-regression on multiple outcomes http://bit.ly/kli9w5 (h/t berndweiss, 12 May)
- @genetics_blog no problem also check out his cubit for R package. http://bit.ly/l1Vebn (h/t NPHard, 12 May)
- New github repo for pylearn the GPU-powered python machine learning library http://bit.ly/l3rIEg /by @dwf & friends (h/t ogrisel, 12 May)
- #Scala 2.9.0 final is now out! If you want to learn a bit about Scala, here’s a nice, concise intro: http://bit.ly/jaI0Dl (h/t jasonbaldridge, 12 May)
- If you don’t have ready access to Bishop’s book, this Latent Variable Models survey will do: http://bit.ly/mrOnzk (h/t gappy3000)
- Twitter sparklines http://kottke.org/x/4poz (h/t kottke, 11 May)
- 402 Citations Questioning the Indiscriminate Use of Null Hypothesis Sig. Tests in Observational Studies by B. Thompson http://t.co/EIFpNn8 (h/t perfectalgo, 11 May)
- analyzing tabular data : #Knime vs shell script : http://openwetware.org/wiki/User:Lindenb/Notebook/UMR915/20110511 (h/t yokofakun, 11 May)
- How to map connections with great circles http://datafl.ws/1bx (h/t flowingdata, 11 May)
- Tip of the day: learn about regularized / shrunk covariance estimation with @scikit_learn: http://bit.ly/keMj2L (h/t ogrisel, 11 May)
- easy version control http://t.co/KfiyYWE (h/t hughfrench, 11 May)
- dimensionality reduction using random projections #blog http://bit.ly/j1gUDC (h/t mat_kelcey, 11 May)
- RT @hadleywickham my essential #rstats vocabulary: http://bit.ly/j9kBvC (h/t abmathewks, 11 May)
- Data Mining : an Overview | Biomedical Research http://bit.ly/ikeUDd (h/t DataMiningTips, 10 May)
- Nice slides: [Machine Learning and IR: Recent Successes and New Opportunities] , ECIR2010, http://j.mp/mC69oT (h/t n0mad_0, 10 May)
- Nikodemus Siivola: SBCL 1.0.48 released http://tinyurl.com/6h2yykb (h/t planet_lisp, 10 May)
- A little collection of cool unix terminal/console/curses tools http://post.ly/20ye (h/t irr, 9 May)
- ado-mode-1.11.2.0 ready: http://homepage.mac.com/brising/Stata. Fixed numerous bugs for sending code in Windows, updated for Stata 11.2. (h/t adomode, 9 May)
- Common SNPs in/near genes explain more variance, and h2 explained proportional to chromosome length http://bit.ly/ktDvxy #GWAS (h/t genetics_blog, 9 May)
- dbNSFP: a lightweight database of human non-synonymous SNPs and their functional predictions http://ff.im/D0zAb (h/t kshameer, 9 May)
- Never knew this - PLINK (--flip-scan) can use LD to find incorrect strand designation http://bit.ly/koqdvI #GWAS (h/t genetics_blog, 9 May)
- 1 in 38 children may have an autism spectrum disorder: http://bit.ly/jvAlUN (h/t newscientist, 9 May)
- SciDB integrates with R . http://bit.ly/h1zx0r #rstats (h/t NPHard, 9 May)
- COLT 2011 Call for Participation: The 24th Annual Conference on Learning Theory (COLT 2011) will take place in B… http://bit.ly/mbJP16 (h/t PASCALNetwork, 9 May)
- Generic Programming with C++ 0x http://post.ly/20i9P (h/t irr, 9 May)
- Practical NoSQL - Solving a Real Problem with MongoDB and Redis http://post.ly/20hzk (h/t irr, 9 May)
- Looks interesting: Nature Genetics: Genome partitioning of genetic variation for complex traits using common SNPs http://bit.ly/ktDvxy (h/t genetics_blog, 8 May)
- visualizing Mahout’s output with Clojure and Incanter http://goo.gl/fb/RwUH2 #clojure (h/t planetclojure, 8 May)
- Find lines in file1 that do not appear in file2: fgrep -vxf file2 file1 (h/t DataJunkie, 8 May)
- The brain performs visual search near optimally http://sns.mx/q4d3y5 (h/t TheNeuroScience, 8 May)
- LevelDB is a library that implements a fast key-value store http://post.ly/20beq (h/t irr, 8 May)
- just released couchapp 0.8 :) Lot of fixes, autopush feature, standalone on macosx and windows… http://t.co/Oq6DMSr enjoy :) #couchdb (h/t benoitc, 8 May)
- spark: scala parallel processing tool from berkeley http://bit.ly/4pDQgl barrier-less iterative processing for distributed data. (h/t lalasusu999, 8 May)
- Slides: “Accessing Databases from R” #rstats: For the past few meetings of the… http://goo.gl/fb/qAz43 #rstats (h/t Rbloggers, 8 May)
- @eddelbuettel @HarlanH the boost vocabulary is a little different though: point, duration and interval. It’s certainly much refined in JODA (h/t hadleywickham, 7 May)
- Hyperlearning & Schizophrenia: DISCERN neural network can learn natural language & simulate neurological dysfunction http://goo.gl/pKPPV (h/t bigdata, 7 May)
- Before the new iPhone update starts deleting your location data, you can use https://openpaths.cc/ to keep a record of it for your own use. (h/t johnmyleswhite, 6 May)
- Fast implementations of CCA are now in python [http://bit.ly/mosvo4] cc: @cmastication @gappy3000 @DataJunkie (h/t siah, 6 May)
- icml 2011 abstracts http://t.co/OlS2KBd “Topic Modeling with Nonparametric Markov Tree”, “Bayesian CCA via Group Sparsity” #icml2011 (h/t josephreisinger, 6 May)
- Randomized algorithms for matrices and data http://arxiv.org/PS_cache/arxiv/pdf/1104/1104.5557v1.pdf (h/t gappy3000, 5 May)
- Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data http://ff.im/CI2kC (h/t kshameer, 5 May)
- Validation in Genetic Association Studies http://ff.im/CI2kB (h/t kshameer, 5 May)
- pathClass: an R-package for integration of pathway knowledge into support vector machines for biomarker discovery http://ff.im/CI2kA (h/t kshameer, 5 May)
- The meta-analysis of genome-wide association studies http://ff.im/CI2ky (h/t kshameer, 5 May)
- Free ebook: Matters Computational http://www.jjj.de/fxt/ via @nealrichter (h/t CompSciFact, 5 May)
- Updates to GMM Python package for modeling network evolution using graph motifs; new paper at arXiv http://bit.ly/luYBTw (h/t drewconway, 5 May)
- Using R for Map-Reduce applications in Hadoop http://bit.ly/ivaY4I #greader (h/t neilfws, 5 May)
- “Cascalog is an amazing, life changing tool for anyone who needs to analyze data sets using Hadoop.” http://bit.ly/kNDtO1 (h/t nathanmarz, 4 May)
- ooh, brain: a library for javascript neural networks and classifiers http://bit.ly/mdKJzw (via @littlecalculist) (h/t hmason, 4 May)
- A collection of cool terminal tools you may not know (with screenshots) http://j.mp/k3FWRJ #dev #linux #tools #programming (h/t kkovacs, 4 May)
- bashreduce: mapreduce in bash: http://bit.ly/m0VeTT (h/t vsbuffalo, 4 May)
- just discovered Samuel Huckins’ blog http://dancingpenguinsoflight.com/ lots of ruby, python, bash, ubuntu, and vim tips (h/t JeromyAnglim, 4 May)
- Visualize multi-dimensional data on a 2-dim map? http://bit.ly/mTkLn1 (h/t metaoptimize, 4 May)
- Running R on an iPhone/iPad with RStudio: This thread has been widely discussed on a… http://goo.gl/fb/U1BFC #rstats (h/t Rbloggers, 3 May)
- Mathematica developers answer your questions in our latest Q&A series post. http://bit.ly/mx8Ipr (h/t WolframResearch, 3 May)
- combining multiple classifiers http://bit.ly/jdxrpl (h/t metaoptimizeqa, 3 May)
- writing #rstats code with #rstats some very rough draft notes: http://bit.ly/iZbUbz. What do you want to know how to do? (h/t hadleywickham, 3 May)
- New book - Scaling Up Machine Learning http://www.cs.umass.edu/~ronb/scaling_up_machine_learning.htm (h/t JeffD, 3 May)
- Very well-behaved and efficient force-directed layout by @mbostock in #D3 http://t.co/jRkYGx1 (via @JanWillemTulp) (h/t moritz_stefaner, 3 May)
- An #rstats wrapper for the Data Science Toolkit by @rtelmore http://bit.ly/kyvjdL (h/t drewconway, 3 May)
- More xargs love: ls | grep ".*[ATCG]\.bam" | xargs -I{} scp {} markov:~/Desktop/ (h/t vsbuffalo, 3 May)
- Efficient storage of high throughput DNA sequencing data using reference-based compression http://bit.ly/l1OOTn #citeulike (h/t neilfws, 3 May)
- Using the R Package crlmm for Genotyping and Copy Number Estimation http://bit.ly/mAqvVy #greader (h/t neilfws, 3 May)
- COLT 2011 accepted papers are up: http://colt2011.sztaki.hu/accepted_papers.html (Tip: Googling for titles typically gets you preprints). (h/t mdreid, 3 May)
- https://github.com/micha/jsawk “Jsawk is like awk, but for JSON” (h/t mikedewar, 3 May)
- The Limitations of Decision Trees and Automatic Learning in Real World Medical Decision Making http://bit.ly/lsyC1s #datamining (h/t zyxo, 2 May)
- #FDA issues draft guidance on processing/reprocessing medical devices - FDA.gov http://ow.ly/4LqId #medicaldevice #regulatory (h/t RAPSorg, 2 May)
- How to get a random line from a text file in bash: http://bit.ly/mNjWxH (h/t hmason, 2 May)
- @moritz_stefaner no experience with it yet, but perhaps #Needlebase ? http://needlebase.com/ (h/t JanWillemTulp, 2 May)
- Intro to Mahout – DC Hadoop http://slidesha.re/loWR1h (h/t gsingers, 2 May)
- The R Inferno revised http://bit.ly/k03OOw hell is new and improved #rstats (h/t portfolioprobe, 2 May)
- COLT 2011 reviews are back and my paper on mixability (with @tverven and Bob Williamson) was accepted! Preprint here: http://t.co/su8JgKA (h/t mdreid, 2 May)
- Model selection in Weka through cross validation for regression problems http://bit.ly/iRFZu6 (h/t metaoptimizeqa, 2 May)
- Rolling a random walk on a sphere http://ow.ly/4KCsK (h/t ProbFact, 1 May)
- ‘Algorithms for Estimating Relative Importance in Networks’ – [pdf] http://bit.ly/lrcxvM – via @twarko ’s http://bit.ly/e06hFT (h/t smarttypes, 1 May)
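
As a side note on the 8 May tip about finding the lines of one file that do not appear in another: the same whole-line comparison can be sketched in Python. This is only a minimal illustration, not part of the original tweet. The function name `lines_only_in_first` and the sample data are made up for the example, and it assumes both inputs fit in memory and that lines are compared exactly, with no trimming of whitespace.

```python
def lines_only_in_first(first_lines, second_lines):
    """Return the lines of first_lines that match no whole line of second_lines.

    Hypothetical helper for illustration: order and duplicates of
    first_lines are preserved, and comparison is exact.
    """
    seen = set(second_lines)  # set membership keeps each lookup cheap
    return [line for line in first_lines if line not in seen]


# Sample data standing in for the contents of two text files (made up).
file1 = ["alpha", "beta", "gamma", "beta"]
file2 = ["beta", "delta"]

only_in_file1 = lines_only_in_first(file1, file2)
print(only_in_file1)
```

With real files, the two lists would typically come from something like `open(path).read().splitlines()`.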