Hi,

I was wondering whether you should use R or Python for statistical 📊 analysis? What are the pros and cons of each?

Thanks

No-brainer IMO: R.

10,000 statistics-oriented packages on CRAN alone, written by statisticians. There is little to choose between the languages (indeed many languages) in general usage; it's the community and contributions that distinguish them. R is hands-down leading in both regards for statistics.

There *is* overlap for sure, so simple stuff will be served by both. But if you want more specialist stats, R for sure, e.g. GAMs: R; mixed models: R; ...

Python for sure for some things, but stats is R's game and I'll arm-wrestle anyone who says otherwise.

BTW "powerful" gets bandied about a lot, but needs qualifying :)

… BTW, Significance magazine has a nice retrospective on R (25-year anniversary).

https://www.r-craft.org/r-news/r-generation-25-years-of-r/

I'll admit to massive bias here - I had an office next to Ross Ihaka and Robert Gentleman when they were writing it.

Thank you, some really great points. So cool that you had an office next to R's creators! Did you contribute anything to it?

jordan_w
Did I contribute anything?

Just some moaning about how horrible it was. I was a PhD student and we were kind of forced to teach undergrads in it (it was

There isn't much to choose between them.

R has built-in packages which allow you to do analysis faster, e.g., you can use commands like `lm`, `glm` and `plot` straight off the bat, whereas in Python you'll probably have to import `statsmodels`, `numpy`, `pandas`, and `matplotlib` before doing any analysis.
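For comparison, here is a minimal sketch of what R's `lm(y ~ x)` looks like in Python, assuming `pandas` and `statsmodels` are installed. The toy data is made up for illustration (y is exactly 1 + 2x, so the fit is exact); the formula interface in `statsmodels.formula.api` deliberately mirrors R's formula syntax.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up toy data for illustration only: y = 1 + 2x exactly.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 5, 7, 9, 11]})

# Roughly the Python counterpart of R's lm(y ~ x, data = df):
# an ordinary least squares fit specified with an R-style formula.
model = smf.ols("y ~ x", data=df).fit()

# Coefficients are labelled "Intercept" and "x", much like R's coef() output.
print(model.params)
```

In R the same fit needs no imports at all, which is the point being made above; in Python the extra imports are the price of entry.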

That being said, I prefer Python because (in my opinion) it is a more intuitive language, and the staple packages (e.g., `numpy`, `pandas`, etc.) are a little more powerful 🔥🔥.

Hope this helps!

The other commenters who say that R is more powerful are correct. I still prefer Python, but probably because that's what I know and am fast in.

You can call R from Python, and use R in things like Jupyter Notebooks with your Python data types, so there's almost no need to be exclusive any more.

Out of interest, how do you call R from Python? Is the rpy2 package your best option?

jdry1729
That's what I've used in the past. Or the R magic in Jupyter Notebooks.

James
An interesting question. There are people doing statistical analysis using R and people using Python. Each group shouts about how wonderful their system is. (I'm using the word "system" to mean Python-with-its-statistical-ecosystem or R-with-its-ecosystem.)

So this suggests that the factors worth considering are broader than which system is "better". Questions you might think about include:

- Are you already familiar with one of the two systems?
- Do you have colleagues who are familiar with one of the systems who could help you when you need it?
- Do you have access to courses to learn either of the systems?
- Can you find examples online of code for doing the sort of analysis you are trying to do in either system which you can adapt to your needs?

I hope this helps your thinking!

I would say yes! The answer is not either/or, but both. I have used both R and Python in the past. Python is better at cleaning up data and organizing it; R is not so good at that. Both R and Python have a lot of "packages" to help you with your statistical analysis.
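As an illustration of the kind of data cleaning meant here, a minimal pandas sketch, assuming `pandas` is installed. The messy table and its column names are hypothetical, invented just to show a few typical steps: tidying column names, dropping rows with a missing key, normalising text, and coercing a text column to numbers.

```python
import pandas as pd

# Hypothetical messy data: stray whitespace, inconsistent case,
# a missing name, and a numeric column stored as text with a bad entry.
raw = pd.DataFrame({
    " Name ": ["  alice", "BOB ", "carol", None],
    "Score": ["12", "15", "n/a", "9"],
})

clean = (
    raw.rename(columns=lambda c: c.strip().lower())  # " Name " -> "name", "Score" -> "score"
       .dropna(subset=["name"])                      # drop the row with no name
       .assign(
           # strip whitespace and normalise capitalisation
           name=lambda d: d["name"].str.strip().str.title(),
           # non-numeric entries such as "n/a" become NaN
           score=lambda d: pd.to_numeric(d["score"], errors="coerce"),
       )
       .reset_index(drop=True)
)
print(clean)
```

The same steps are possible in R too; the chained style is just one common way of writing them in pandas.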

Some of the good Python development platforms, such as PyCharm, have paid editions, whereas RStudio is free.

R is easier to use if you do not have any programming background.

I would go with Python. R and Python are very similar, but Python is much simpler to use.

Also, if you then want to delve into using big data clusters you can use PySpark, or if you are interested in CUDA you have PyTorch, etc.