Using Emacs, Org-mode and R for Research Writing

This guide presents a toolkit for writing research papers and monographs using Emacs, Org-mode and R.

Org-mode allows you to embed statistical code in the document to generate results that can be revised and reproduced, to integrate bibliographic references with a database, and to get consistent formatting without any manual tweaking, thanks to excellent support for creation of PDF, […]

Sustainability of water use in agriculture

AQUASTAT, developed and maintained by the Food and Agriculture Organization of the UN, is the global database that gives quantitative information on water resources and withdrawal of water for different uses.

Agricultural water withdrawal is defined as follows:

Annual quantity of self-supplied water withdrawn for irrigation, livestock and aquaculture purposes. It includes water from primary […]

Hindi/Devanagari presentations using Org-mode, R, LaTeX and Beamer

I recently had to prepare a Beamer presentation in Hindi/Devanagari. I usually use Emacs Org-mode, with a lot of R source code embedded in it, to prepare my Beamer presentations. To adapt the entire setup to work with Devanagari, this is what I needed to do.


Make orgmode export to latex using xetex rather than […]


I discovered a new, very useful, R function yesterday: ave.

This is what it does: “Subsets of ‘x[]’ are averaged, where each subset consist of those observations with the same factor levels.”

Interestingly, you can supply any function in place of the mean. The output of that function is set against each observation in the group.
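
As a small sketch of this (the vectors x and g are made up for illustration): the function has to be passed by name as FUN, because unnamed arguments after x are treated as grouping variables.

```r
# Made-up data: six observations in two groups
x <- c(1, 2, 3, 4, 5, 6)
g <- c("a", "a", "b", "b", "b", "a")

# Default behaviour: each observation gets the mean of its group
group_mean <- ave(x, g)

# Any other function can be supplied through FUN
group_max <- ave(x, g, FUN = max)

print(data.frame(x, g, group_mean, group_max))
```

The result always has the same length as x, which makes it convenient for adding group-level summaries as a new column of a data frame.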

I wanted […]


If a model is estimated using the following code: lm(y~x1+x2)->p

1. bptest(p) runs the Breusch-Pagan test to formally check for the presence of heteroscedasticity. To use bptest, you first have to load the lmtest package with library(lmtest).

2. If the test is significant (low p-value), you should see if any transformation of the dependent variable helps you eliminate […]
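
A minimal sketch of step 1 on simulated data. The data-generating process is my own assumption, chosen so that the error variance grows with x1. The sketch assumes the lmtest package is installed and skips the test otherwise.

```r
# Simulated data (illustrative only): the error spread increases with x1
set.seed(1)
n  <- 200
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n, sd = x1)

# Estimate the model as in the text
p <- lm(y ~ x1 + x2)

# Run the Breusch-Pagan test only if lmtest is available
if (requireNamespace("lmtest", quietly = TRUE)) {
  bp <- lmtest::bptest(p)
  print(bp)  # a low p-value points to heteroscedasticity
} else {
  message("Package 'lmtest' is not installed; skipping bptest().")
}
```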

Find type of variables in a data frame

sapply(a, class) gives the class (character, numeric, factor, and so on) of each variable in the data frame a.
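
A quick illustration with a made-up data frame (the column names are hypothetical):

```r
# Made-up data frame with one column of each common type
a <- data.frame(
  id    = 1:3,
  value = c(1.5, 2.5, 3.5),
  name  = c("x", "y", "z"),
  grp   = factor(c("u", "v", "u")),
  stringsAsFactors = FALSE
)

# One class per column, returned as a named character vector
types <- sapply(a, class)
print(types)
```

If some column has more than one class (a date-time column, for example), sapply returns a list rather than a character vector.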

Moving average/median

See ?rollmean (package zoo), ?rollmedian (package zoo) and ?runmed (package stats).
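
A minimal sketch on a made-up vector. runmed comes with base R. For the moving average I use stats::filter as a base R stand-in, and call rollmean only if zoo happens to be installed.

```r
# Made-up series
x <- c(1, 5, 2, 8, 3, 9, 4)

# Running median over a window of 3; endrule = "keep" leaves the two end values as they are
med3 <- runmed(x, k = 3, endrule = "keep")

# Centred moving average over a window of 3 (NA at both ends)
avg3 <- stats::filter(x, rep(1 / 3, 3), sides = 2)

print(data.frame(x, med3 = as.numeric(med3), avg3 = as.numeric(avg3)))

# The zoo equivalents, if the package is available
if (requireNamespace("zoo", quietly = TRUE)) {
  print(zoo::rollmean(x, k = 3))    # returns only the complete windows by default
  print(zoo::rollmedian(x, k = 3))  # the window width k must be odd
}
```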

“relevel” factors when using them to create dummy variables


relevel: The levels of a factor are re-ordered so that the level specified by ‘ref’ is first and the others are moved down. This is useful for ‘contr.treatment’ contrasts which take the first level as the reference.
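
A small sketch with a made-up factor, showing how the reference category of the dummy variables changes:

```r
# Made-up factor; the default level order is alphabetical: high, low, mid
f <- factor(c("low", "mid", "high", "mid", "low"))
print(levels(f))

# Make "low" the first level, and hence the reference category
f2 <- relevel(f, ref = "low")
print(levels(f2))

# The dummy variables now cover "high" and "mid"; "low" is absorbed in the intercept
X <- model.matrix(~ f2)
print(colnames(X))
```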

Working with ggplot, version 2

In my previous post on this issue, I presented code that made weighted boxplots and annotated them with boxplot statistics and the mean values. The problem with that code was that it printed these annotations right on the vertical axes of the boxplots. Also, a relatively minor problem was that, when the values […]

Reading large tables into R

Here are some useful tips on the issue.


Reading data from Microsoft Access files in Linux

Some of the Census 2001 data are in Microsoft Access files (with the filename extension .mdb). An Access file can hold several tables, each of which contains data. A set of tools called mdbtools can be used to read Access files.

The command mdb-tables can be used to see the names of tables […]

simple.scatterplot: Two way distributions

John Verzani’s book has a title page that shows a scatterplot with histograms of x and y variables along the two axes. It is a very powerful way of looking at two distributions. The plot was generated through a function simple.scatterplot. The function is made available as part of the UsingR package, which can be […]

Graphics (base, grid and lattice) in R News

R News 2(2) has papers on grid and lattice packages. R News 3(2) has papers on base, grid and gridBase.

Essential reading for anybody trying to master R graphics.


Working with ggplot

Hadley Wickham’s ggplot is a very interesting package. It makes beautiful graphics, and integrates well with some of the other packages so that you can superimpose plots of various types of estimates on plots of data. In particular, it uses colours very well. The default colour schemes are aesthetically pleasing. It allows a flexible use […]

Dropping columns in subset command

Use “select=c(var1,var2)” in the subset command to select var1 and var2.

Use “select=-c(var1,var2)” in the subset command to drop var1 and var2.
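
For example, with a made-up data frame that has columns var1, var2 and var3:

```r
# Made-up data frame
df <- data.frame(var1 = 1:3, var2 = 4:6, var3 = 7:9)

# Keep only var1 and var2
kept <- subset(df, select = c(var1, var2))
print(names(kept))

# Drop var1 and var2, keeping everything else
dropped <- subset(df, select = -c(var1, var2))
print(names(dropped))
```

Both calls return a data frame, even when only one column is left.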

