Here is the latest bag of tweets*, which covers April 2011.
(*) These are interesting news items that I found on Twitter and archive periodically.
- Chris Manning’s NLP lectures at Stanford are now available online [http://bit.ly/lEq9qm] (h/t siah, 29 Apr)
- Liked “In defense of ‘Omics” http://ff.im/CchcJ (h/t yokofakun, 28 Apr)
- the Rubicon Project gets down with Machine Learning. Brush up on your ML knowledge here: http://bit.ly/efsAI5 (h/t CChap3, 28 Apr)
- New blog post: Lessons learned from manually classifying the CIFAR-10 dataset http://karpathy.ca/myblog/?p=160 [with MATLAB code] (h/t karpathy, 27 Apr)
- IQ tests measure motivation - not just intelligence. http://ow.ly/4I5qc #SitN (h/t AmSciMag, 27 Apr)
- Object shell http://geophile.com/osh/ (h/t SciPyTip, 27 Apr)
- Tutorial on Twitter Sentiment analysis with Python and OpenNLP: http://t.co/4T1JH6O #nlproc (h/t ogrisel, 27 Apr)
- 3 resources for #chart / #visualization classification: http://j.mp/cE5q27 http://j.mp/e4imS8 http://j.mp/dQMUX2 (h/t JanWillemTulp, 27 Apr)
- download the app to create the cover of Creative Review: http://j.mp/gBvaL6 (Mac only) (via @creativeapps) (h/t JanWillemTulp, 27 Apr)
- these #typography #maps look really cool: http://j.mp/ibpvOD (h/t JanWillemTulp, 27 Apr)
- Best Java/Ruby/JRuby Machine Learning Libraries? http://bit.ly/fIoDA3 (h/t mxlearn, 27 Apr)
- Free tools for data cleaning, visualization and analysis http://bit.ly/dY9rzK #kdnuggets (h/t DataMiningTips, 26 Apr)
- PheGenI: Phenotype-Genotype Integrator: http://www.ncbi.nlm.nih.gov/gap/PheGenI #genomics #bioinformatics (h/t kshameer, 26 Apr)
- Strengthening the reporting of genetic risk prediction studies: the GRIPS statement http://ff.im/C0nit (h/t kshameer, 26 Apr)
- Favorite #data #mining books, by Vincent Granville: http://linkd.in/e0goTm. #in (h/t neslonforte, 26 Apr)
- A New Perspective in the Cognitive Science of Attention and Action http://sns.mx/q6d3y3 (h/t TheNeuroScience, 26 Apr)
- Medical research should consider social networking instead of relying on traditional trial design: http://j.mp/gDEBrb /via @NatureMedicine (h/t moahWG, 26 Apr)
- great presentations / resources / slides / tips and tricks for neuroimaging made available by the #Martinos Center http://t.co/43E94hB (h/t gramfort, 26 Apr)
- radial gradient fills in development branch of #protovis: http://j.mp/hzPTv8 examples: http://j.mp/gK3oX6 (by @brendansterne) (h/t JanWillemTulp, 26 Apr)
- Snapshots for HDFS: http://www.cs.berkeley.edu/~sameerag/hdfs-snapshots.pdf (h/t hackingdata, 26 Apr)
- Paper on real meaning of British English, e.g. “Not bad” -> “good”; “With the greatest respect” -> “You must be a fool” http://see.sc/1mevH4 (h/t almostandy, 26 Apr)
- 4 lines of R to get you started using the Rook web server interface http://bit.ly/e7LWHU #greader (h/t neilfws, 26 Apr)
- Annotated Manhattan plots and QQ plots for GWAS using R, Revisited http://bit.ly/gKENvN #greader (h/t neilfws, 26 Apr)
- 22 free data viz & analysis tools: http://ow.ly/4GKxD (ht @bigdata) (h/t infochimps, 26 Apr)
- JASA article PDF: Modeling 3-D Chromosome Structures Using Gene Expression Data http://pubs.amstat.org/doi/pdfplus/10.1198/jasa.2010.ap09504 (h/t AmstatNews, 25 Apr)
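- Sometimes, getting more data is better than trying to improve complicated models by adding more features: http://t.co/HL5Gx5n #ML (h/t nicolastorzec, 25 Apr)
- An open letter to #RMS: http://bit.ly/fkAl3w #freesoftware (h/t onertipaday, 25 Apr)
- CRAN #rstats Package “anchors: Statistical analysis of surveys with anchoring vignettes” version 3.0-6 http://ow.ly/4G3wr (h/t kinggary, 25 Apr)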
- CRAN #rstats Package “anchors: Statistical analysis of surveys with anchoring vignettes” version 3.0-6 http://ow.ly/4G3wr (h/t kinggary , 25 Apr)
- “… social scientists need not abandon SEM… only the notion that SEM is capable of testing causal models.” (Pearl 2009) (h/t almostandy, 24 Apr)
- Finally, a new blog post. TF-IDF With Apache Pig http://bit.ly/hf4Dov #hadoop #pig (h/t TheDataChef, 24 Apr)
- Seminal paper on feature selection. Plus, André is a friend and coauthor. http://ow.ly/4FN7Q (h/t gappy3000, 24 Apr)
- many people ask “#hive or #pig?” @alanfgates has an awesome @ydn post about this: http://yhoo.it/fq2WEP #hadoop (h/t esammer, 23 Apr)
- Basics of Compiler Design (free ebook) http://ow.ly/4FCUQ (h/t CompSciFact, 23 Apr)
- scaling up machine learning: http://www.cs.umass.edu/~ronb/scaling_up_machine_learning.htm (h/t hackingdata, 22 Apr)
- Large datasets with #rstats http://bit.ly/aLhNwT (h/t freakonometrics, 22 Apr)
- Interesting semi-supervised approach to predicting demographics from twitter profiles: http://bit.ly/evJQqC (h/t hmason, 22 Apr)
- Day #28 ggplot2 in knime: If you haven’t read yesterday’s post, I advise you to do so… http://goo.gl/fb/W5ixT #rstats (h/t Rbloggers, 22 Apr)
- stalkR: R functions for exploring iPhone and iPad (OS X only): Yesterday Alasdair… http://goo.gl/fb/tHtgd #rstats (h/t Rbloggers, 22 Apr)
- Learning about creating efficient Python web apps (@ Betaworks) http://4sq.com/eiK4Zh (h/t drewconway, 22 Apr)
- Rwui: Create a user-friendly web interface for an #RStats script http://bit.ly/g8Gngi, paper: http://1.usa.gov/ekaMGX (h/t genetics_blog, 20 Apr)
- Jonathan Hartley: Python port of Modern 3D Graphics using OpenGL tutorial http://bit.ly/e29Jur (h/t planetpython, 20 Apr)
- Top Five Articles in Data Mining | Data Mining Research - www … http://bit.ly/f6KVtA (h/t DataMiningTips, 20 Apr)
- Survey of Pythonic tools for RDF and Linked Data programming (Feb 2011) @lambdaman http://bit.ly/h2agcL (h/t DublinCore, 20 Apr)
- Recommended Readings in AI - a list by Russell and Norvig http://j.mp/1YWt4x (h/t newsycombinator, 20 Apr)
- On Nuit Blanche now: CS: L1 Minimization in Python, A Rosetta table between Statistics/Machine Learning and Comp… http://bit.ly/eeaYvF (h/t IgorCarron, 20 Apr)
- A few draft chapters from Mike Jordan’s Intro to Graphical Models. http://bit.ly/eAvtNj (h/t mxlearn, 20 Apr)
- Advice vs. experience: Genes predict learning style http://sns.mx/q4dty3 (h/t TheNeuroScience, 20 Apr)
- This is an excellent, excellent paper: http://bit.ly/ecXNYl “Microarray data analysis: from disarray to consolidation and consensus” (h/t vsbuffalo, 20 Apr)
- Improving D3’s force-directed layout for large, dynamic and disconnected graphs: http://bl.ocks.org/929623 (h/t mbostock, 19 Apr)
- Clojure Atlas (Preview): An experiment in visualizing a programming language & standard library http://bit.ly/ed7J5Z http://clojureatlas.com (h/t ClojureAtlas, 19 Apr)
- Medical NLP challenge: co-reference resolution and sentiment classification - http://bit.ly/ghDAWD #nlproc #TextMining (h/t marin_dimitrov, 19 Apr)
- Data Visualization Survival Kit: Creating Visualizations in the Wild http://bit.ly/euFMQD (h/t infosthetics, 19 Apr)
- New version of the #Data Science Toolkit released: http://j.mp/ecmCR3 (h/t JanWillemTulp, 19 Apr)
- Introducing Rack http://bit.ly/hV5bBF #greader (h/t neilfws, 19 Apr)
- Detailed notes from a 3 hour tutorial course on Redis: http://t.co/KDoJr9H #NoSQL (h/t DataJunkie, 19 Apr)
- Using R, Sweave and Latex to integrate animations into PDFs http://bit.ly/dSrMhf (h/t mxlearn, 19 Apr)
- A short introduction to Sparse Coding and Dictionary Learning http://bit.ly/gtjlUc (h/t mxlearn, 19 Apr)
- This morning’s coffee-shop read: Symbolic regression by means of arbitrary evolutionary algorithm: http://t.co/GtYr3iO (h/t cortesi, 19 Apr)
- Luc’s random forest: http://bit.ly/ePoWYu (h/t kshameer, 18 Apr)
- I just released Data Science Toolkit 0.35 - http://bit.ly/hKmz3E - UK geocoding, date/time extraction and more! (h/t petewarden, 18 Apr)
- All presentations from PSC’s Data Intensive Analysis Workshop available: http://bit.ly/gYklhh (VIA @mike_schatz) #ngs #mapreduce #hadoop (h/t suncoolsu, 18 Apr)
- “cluster forests” extension of random forests to spectral clustering; yan+chen+jordan http://t.co/QHjpBAM (h/t josephreisinger, 18 Apr)
- Going over the speed limit: In an earlier post [Speeding tickets for R and Stata] I… http://goo.gl/fb/eWE1w #rstats (h/t Rbloggers, 17 Apr)
- “Analyzing Massive Astrophysical Datasets: Can Pig/Hadoop or a Relational DBMS Help?” http://bit.ly/edR3w1 (h/t abmathewks, 17 Apr)
- RT @bachinsky: Evidence: The Weak Link Of Evidence-Based Medicine: http://wp.me/ppMHd-27B (h/t Chris_Evelo, 17 Apr)
- UNIX for poets http://j.mp/dSYB1s essential intro for bash data hacking (h/t drewconway, 17 Apr)
- Playing with DotCloud (http://www.dotcloud.com). This is really cool. (h/t hmason, 17 Apr)
- Empirical study of bagging and boosting #datamining http://bit.ly/h6djCn (h/t zyxo, 16 Apr)
- RT @dr_dobbs: olap4j - A New Open Standard for Analytics? http://twb.io/if8Jt9 /via @julianhyde (h/t fbahr, 16 Apr)
- The cognitive style of Unix http://bit.ly/ebNrdG (h/t JohnDCook, 16 Apr)
- Python Patterns - Implementing Graphs - @gvanrossum 1998 - http://www.python.org/doc/essays/graphs/ (h/t smarttypes, 16 Apr)
- PADS, A library of Python Algorithms and Data Structures – http://www.ics.uci.edu/~eppstein/PADS/ (h/t smarttypes, 16 Apr)
- If you use the gbm package in R, you’ll probably like https://sites.google.com/site/rtranking/home. (h/t hackingdata, 16 Apr)
- Very cool application. RT @dataists Accentuate.us: Machine Learning for Complex Language Entry http://bit.ly/fde11p (h/t hmason, 15 Apr)
- What I like about org-mode http://bit.ly/faoDcc #emacs (h/t JohnDCook, 15 Apr)
- @eagereyes Great post on Pie Charts. I also wrote one called ‘In Defense of Pie Charts’ back in 2007 http://bit.ly/JzPO (h/t JeffClark, 15 Apr)
- RT @StatFact: Data hand tools http://ow.ly/4AqOs (h/t SciPyTip, 14 Apr)
- My editorial on ‘The spatial dimension in biological data mining’ has been published in BioData Mining. http://is.gd/O3tlHW (h/t moorejh, 14 Apr)
- How do variants outside genes influence disease risk? http://bit.ly/dVjDmI #greader (h/t neilfws, 14 Apr)
- Just posted: Significance testing and Congress http://goo.gl/a719q (h/t StatFact, 14 Apr)
- Maps in R http://bit.ly/e9ZEaB #diigo (h/t neilfws, 14 Apr)
- 10 concepts an Emacs newbie should master: http://benjisimon.blogspot.com/2011/04/10-concepts-emacs-newbie-should-master.html (h/t learnemacs, 14 Apr)
- Rcpp Introduction published in Journal of Statistical Software, fresh Rcpp 0.9.4 released, http://goo.gl/dDQ0H #rstats (via @eddelbuettel) (h/t onertipaday, 14 Apr)
- “Linear regression is then employed for no better reason than that users know how to type lm but not gam” C Shalizi http://j.mp/hHYVIu (pdf) (h/t mja, 14 Apr)
- Review of Data Analysis with Open Source Tools http://bit.ly/hLk7vb #greader (h/t neilfws, 14 Apr)
- Hunting for Simpson’s paradox http://ow.ly/4zZtY (h/t StatFact, 14 Apr)
- bullet-chart and node-link hierarchy examples added to #D3: http://j.mp/eGfXiH (h/t JanWillemTulp, 14 Apr)
- sofia-ml: Suite of Fast Incremental Algorithms for Machine Learning (incl. methods for learning classification a… http://bit.ly/eLThEQ (h/t mxlearn, 14 Apr)
- Amazon EC2 configuration for scientific computing in Python and R http://bit.ly/ihvsAS #kdnuggets (h/t DataMiningTips, 14 Apr)
- Nice new OMIM interface RT @genomicslawyer, @humangenomeorg: OMIM avail through new & improved site: http://www.omim.org/ (h/t genetics_blog, 14 Apr)
- Great thread on #Quora: what are some time-saving tips that every Linux user should know http://b.qr.ae/eID7H4 (h/t genetics_blog, 13 Apr)
- Programming in Scala, 1st Edition, for free (not pirated!): http://www.artima.com/pins1ed/ (h/t dcsobral, 13 Apr)
- Caffeine withdrawal symptoms are heritable, h2=0.35 http://1.usa.gov/dVSUKx (h/t genetics_blog, 13 Apr)
- Best practices in LaTeX http://ow.ly/4zCVJ (h/t TeXtip, 13 Apr)
- Even if you’ve never used pointers, understanding things like http://stackoverflow.com/questions/4484289 requires the same aptitude (h/t spolsky, 13 Apr)
- Julien Palard: Python: Consulting PEPs from command line, while being offline http://bit.ly/e8z4YZ (h/t planetpython, 13 Apr)
- RT @eagereyes Interesting new visualization criticism site, looks promising: http://thewhyaxis.info/ (also @thewhyaxis) (h/t Biff_Bruise, 13 Apr)
- I wrote a #python function that sends an email (via GMail) once a script has completed. You may find it useful http://bit.ly/gB3Dhz (h/t drewconway, 12 Apr)
- Google releases open source code for hash functions http://bit.ly/e73jQC (h/t aria42, 12 Apr)
- 6 Free E-Books and Tutorials for Learning and Mastering Node.js - http://t.co/jQMPCL2 via @RWW (h/t fbahr, 12 Apr)
- Lots of resources RT @visualisingdata: From NYT Learning Network “Data visualized: more on teaching with infographics” http://nyti.ms/hMcttC (h/t JanWillemTulp, 11 Apr)
- Using Python, multiprocessing and NumPy/SciPy for parallel numerical computing by Sturla Molden: http://bit.ly/ezvetQ [PDF] /via @pprett (h/t ogrisel, 11 Apr)
- The Natural Language Toolkit Book is a great (free) resource to learn about both #NLP and #Python #NLTK http://j.mp/3d7hJI http://j.mp/4zZvWs (h/t JanWillemTulp, 11 Apr)
- Stack Exchange releasing custom Redis client as community project: http://bit.ly/gcWLwZ (http://bit.ly/dRJuEh) (h/t marcgravell, 11 Apr)
- Optimization #Algorithm Toolkit: includes reference algorithm implementations, #graphing, #visualization, and much more: http://j.mp/fW27Jh (h/t JanWillemTulp, 11 Apr)
- Could a formerly blind person see the difference between a cube and a sphere? The Molyneux problem has been solved http://bit.ly/eNqGmk (h/t iamreddave, 11 Apr)
- Community Structure in Graphs – http://arxiv.org/abs/0712.2716 (h/t smarttypes, 11 Apr)
- Learning from matrix valued data (Stanford stat 315C taught by Art Owen). Course notes, readings etc. http://ow.ly/4wYeP (h/t gappy3000, 10 Apr)
- Visual.ly is launching soon. The beta is here: http://ldata.in/fDLOqD #visualization (h/t dHolowack, 9 Apr)
- Views from John Storey (one of my favorite statisticians) on the importance of statistics in next-gen data analysis http://bit.ly/nature-ngs (h/t suncoolsu, 9 Apr)
- For those with extra time: Bob, an educational implementation of Scheme in Python http://bit.ly/fFxgtk (h/t statalgo, 9 Apr)
- some node.js tutorials: http://nodetuts.com/ (h/t alexablag, 9 Apr)
- A simple frequency plot: I’m currently working on a paper that uses Polish survey… http://goo.gl/fb/nXt8F #rstats (h/t Rbloggers, 8 Apr)
- Text Data Mining with Twitter and R: Twitter is a favorite source of text data for… http://goo.gl/fb/8toAH #rstats (h/t Rbloggers, 8 Apr)
- Ars always has very, very respectable articles on science. http://bit.ly/iaA5IM “RNA duplicating RNA, a step closer to the origin of life” (h/t vsbuffalo, 8 Apr)
- Coming soon: Robert Laganière. OpenCV 2 Computer Vision Application Programming Cookbook. http://bit.ly/hM3hZI (h/t ilysenkov, 8 Apr)
- Perl vs Python speed test: http://bit.ly/gDKtFI http://bit.ly/ebeHZg and the winner is Perl #ftw! (h/t kshameer, 8 Apr)
- The trouble with discretizing a continuous variable http://xkcd.com/883/ (h/t drewconway, 8 Apr)
- How to diff extremely large files http://bit.ly/fGgvEC (h/t JohnDCook, 8 Apr)
- Mac OS X hidden features and nice tips & tricks http://t.co/tgcTPqP (h/t rubayeet, 8 Apr)
- Limits and challenges of parallelizing statistical software (e.g., Stata/MP) http://is.gd/oXWQeM #stata (h/t mmanti, 8 Apr)
- Buffon versus Bertrand in R: Following my earlier post on Buffon’s needle and… http://goo.gl/fb/M8vnK #rstats (h/t Rbloggers, 8 Apr)
- Data Science on the Command-Line http://oreil.ly/eflxDX (h/t medriscoll, 8 Apr)
- Extra! Extra! Get Your gridExtra!: The more I use it, the deeper I fall in love with… http://goo.gl/fb/eULcW #rstats (h/t Rbloggers, 7 Apr)
- Journal of Statistical Software v40 is out – with three articles by @hadleywickham and my ‘Nutshell’ review http://goo.gl/1iyVs #rstats (h/t eddelbuettel, 7 Apr)
- @kshameer FASTX? http://hannonlab.cshl.edu/fastx_toolkit/ (h/t yokofakun, 7 Apr)
- Installed bcftools, bedTools, bwa, DINDEL, FastQC, GATK, JCVI, Picard, samtools. Am I missing any must-have tools for WES/#ngs analysis? (h/t kshameer, 7 Apr)
- Common Lisp - Myths and Legends http://post.ly/1rdBY (h/t irr, 7 Apr)
- Machine learning in bioinformatics: This article reviews machine learning methods for b… http://tinyurl.com/3ujt98w (h/t yokofakun, 7 Apr)
- the importance of controlling for multiple comparisons: http://xkcd.com/882/ #rstats (via @hadleywickham) (h/t onertipaday, 7 Apr)
- Atlas of Global Development is available in traditional format (print & online) & as a robust online visualization tool http://ow.ly/4v7PM (h/t WBPubs, 7 Apr)
- Wolfram|Alpha Blog : Wolfram|Alpha Becomes Scannable with QR Codes: http://bit.ly/giGUq9 via @resourceshelf @notinmy (h/t SteveAkinsSEO, 7 Apr)
- Compiler in R 2.13? Whoa! http://stackoverflow.com/questions/1452235/does-an-r-compiler-exist/5570354#5570354 (h/t statalgo, 7 Apr)
- Introducing FlumeBase: Continuous streaming SQL queries over Flume: http://blog.flumebase.org/ (h/t gremblor, 7 Apr)
- Speeding up R computations: The past few days I’ve been going through some R code… http://goo.gl/fb/XMJsN #rstats (h/t Rbloggers, 6 Apr)
- John Cook: A knight’s tour magic square http://bit.ly/fxiWi2 (h/t planetpython, 6 Apr)
- WebSweave?: A recent R-help post asks for examples of Sweave use for web applications… http://goo.gl/fb/nEoN3 #rstats (h/t Rbloggers, 6 Apr)
- Daniel Brown: Installing Python, MatPlotLib & iPython on Snow Leopard http://bit.ly/ePXVAA (h/t planetpython, 6 Apr)
- R Reference Card for Data Mining http://bit.ly/dKj0P8 #kdnuggets (h/t DataMiningTips, 6 Apr)
- Learn MongoDB… http://post.ly/1rAnT (h/t irr, 6 Apr)
- Meta-analysis of Observational Studies in Epidemiology http://ff.im/Av0WW (h/t kshameer, 6 Apr)
- Heritage Health Prize: The goal of the prize is to develop a predictive algorithm that can identify patients who will… http://ht.ly/4tPHK (h/t StatsInTheWild, 5 Apr)
- First European Workshop on Integral Biomathics: iBioMath 2011, Paris, France, 12 August 2011 http://goo.gl/fCem8 (h/t yokofakun, 5 Apr)
- @drewconway think you’ll enjoy the rest of @5harad’s related post, btw: http://bit.ly/egWRsP :) (h/t jakehofman, 5 Apr)
- My JMLR paper on “Information, Divergence, and Risk” has finally been published! http://jmlr.csail.mit.edu/papers/v12/reid11a.html (h/t mdreid, 5 Apr)
- 45 algorithms from the field of Artificial Intelligence http://bit.ly/f1KnhA (h/t radar, 4 Apr)
- CytoscapeRPC 1.4 released. You can use #cytoscape from R, Perl, etc. http://bit.ly/ibFqVo (h/t cytoscape, 4 Apr)
- Bioinformatics special issue- compilation of all Bioinformatics papers on Next-Generation Sequencing http://bit.ly/DcfAm (h/t genetics_blog, 4 Apr)
- “Lectures on Computational Economics” in #python. http://bit.ly/hthQAP (h/t statalgo, 4 Apr)
- Stack Exchange Unix and Linux Q: What are your favorite command line features or tricks? http://t.co/98hmt4o (h/t alexablag, 4 Apr)
- Found something on #mpi & #python: http://mail.python.org/pipermail/chicago/2008-February/003607.html (h/t n0mad_0, 3 Apr)
- blogged about ‘Travelling the world of gene-gene interactions’ http://is.gd/tJGxY5 #epistasis #bioinformatics (h/t moorejh, 2 Apr)
- BioPython News: Biopython 1.57 released http://bit.ly/fuMqgf (h/t planetpython, 2 Apr)
- Creating web applications for spatial epidemiological analysis and mapping in R using Rwui http://bit.ly/gQqNDg #citeulike (h/t neilfws, 2 Apr)
- MetScape Plugin 2.1 Released http://bit.ly/i5Y2qQ (h/t cytoscape, 1 Apr)
- Great article on categorical association coefs http://goo.gl/J3aWL (h/t alexablag, 1 Apr)