My preferred tool for statistical analysis was always Microsoft Excel. From the very start you can dump your data into a spreadsheet, graph it and then work out what to do… where are the gaps, the extremes, the clustering? Whilst this is a quick approach, there always seems to be something missing, and later the analysis gets clunky because it hasn't been thought through.
I then heard about R… a grown-up statistical analysis environment. Right from the start you are forced to consider the format of the data you have and how you are going to handle it. This makes you think about what variables and dataframes will be required to hold and manipulate that data… I think this forms a much better start to any analysis. Further down the process it is easier to shape the data into what you require. In Excel there are always lots of 'filler' columns holding intermediate calculations, because combining them into a single cell formula is too difficult. This naturally prompts some of us to explore VBA – that can produce very good results, but it is time consuming and the first versions are likely to contain many bugs.
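As a minimal sketch of what I mean (the data and column names here are made up for illustration), a derived value that might need one or more filler columns in Excel can be computed in a single expression on a dataframe:

```r
# Hypothetical sales data held in a dataframe
sales <- data.frame(
  price    = c(10.0, 12.5, 9.0),
  quantity = c(3, 2, 5)
)

# One vectorised expression replaces what might be a helper column in Excel
sales$revenue <- sales$price * sales$quantity

print(sales)
```

Because R operations are vectorised over whole columns, there is no need to drag a formula down a range or park intermediate results anywhere.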
I highly recommend you try out R – see the R Homesite.
I took a 4-week course through Coursera.org and think of it as a good introduction.
The environment is command-line and interpreted, so you can either type small snippets of commands in directly or opt for writing a larger R script file.
Playing with stats in R can build up a very good understanding of the basics – make a vector of random numbers (you choose the distribution) and plot them out. Play with the calling parameters and watch the graph change.
Try a vector of 1000 values, with a mean of 0 and a standard deviation of 1:
> a <- rnorm(1000, 0, 1)
> hist(a)
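To watch the graph change as the parameters change, the same call can be repeated with a different standard deviation and the two histograms plotted side by side (a small sketch; the par() layout call is just one common way to arrange the plots):

```r
# Two normal samples with the same mean but different spreads
a <- rnorm(1000, mean = 0, sd = 1)
b <- rnorm(1000, mean = 0, sd = 3)

par(mfrow = c(1, 2))       # arrange two plots side by side
hist(a, main = "sd = 1")   # narrow bell shape
hist(b, main = "sd = 3")   # same shape, roughly three times as wide
```

Swapping rnorm for runif or rpois is an easy next experiment to see how the distribution choice shows up in the histogram.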