R introduction


What is R?

R is a programming language and free software developed by Ross Ihaka and Robert Gentleman in 1993. R possesses an extensive catalog of statistical and graphical methods. It includes machine learning algorithms, linear regression, time series, statistical inference to name a few. Most of the R libraries are written in R, but for heavy computational tasks, C, C++ and Fortran codes are preferred.

R is not only entrusted by academic, but many large companies also use R programming language, including Uber, Google, Airbnb, Facebook and so on.

Data analysis with R is done in a series of steps; programming, transforming, discovering, modeling and communicate the results

  • Program: R is a clear and accessible programming tool
  • Transform: R is made up of a collection of libraries designed specifically for data science
  • Discover: Investigate the data, refine your hypothesis and analyze them
  • Model: R provides a wide array of tools to capture the right model for your data
  • Communicate: Integrate codes, graphs, and outputs to a report with R Markdown or build Shiny apps to share with the world

In this introduction tutorial you will learn R

  • Statistical inference
  • Data analysis
  • Machine learning algorithm

R studio accepts markdown to write a document. You can export the documents in different formats:

  • Document :
    • HTML
    • PDF/Latex
    • Word
  • Presentation
    • HTML
    • PDF beamer

Why use R?

Data science is shaping the way companies run their businesses. Without a doubt, staying away from Artificial Intelligence and Machine will lead the company to fail. The big question is which tool/language should you use?

They are plenty of tools available in the market to perform data analysis. Learning a new language requires some time investment. The picture below depicts the learning curve compared to the business capability a language offers. The negative relationship implies that there is no free lunch. If you want to give the best insight from the data, then you need to spend some time learning the appropriate tool, which is R.

On the top left of the graph, you can see Excel and PowerBI. These two tools are simple to learn but don't offer outstanding business capability, especially in term of modeling. In the middle, you can see Python and SAS. SAS is a dedicated tool to run a statistical analysis for business, but it is not free. SAS is a click and run software. Python, however, is a language with a monotonous learning curve. Python is a fantastic tool to deploy Machine Learning and AI but lacks communication features. With an identical learning curve, R is a good trade-off between implementation and data analysis.

When it comes to data visualization (DataViz), you'd probably heard about Tableau. Tableau is, without a doubt, a great tool to discover patterns through graphs and charts. Besides, learning Tableau is not time-consuming. One big problem with data visualization is you might end up never finding a pattern or just create plenty of useless charts. Tableau is a good tool for quick visualization of the data or Business Intelligence. When it comes to statistics and decision-making tool, R is more appropriate.

Stack Overflow is a big community for programming languages. If you have a coding issue or need to understand a model, Stack Overflow is here to help. Over the year, the percentage of question-views has increased sharply for R compared to the other languages. This trend is of course highly correlated with the booming age of data science but, it reflects the demand of R language for data science.

Is R difficult?

Years ago, R was a difficult language to master. The language was confusing and not as structured as the other programming tools. To overcome this major issue, Hadley Wickham developed a collection of packages called tidyverse. The rule of the game changed for the best. Data manipulation become trivial and intuitive. Creating a graph was not so difficult anymore.

The best algorithms for machine learning can be implemented with R. Packages like Keras and TensorFlow allow to create high-end machine learning technique. R also has a package to perform Xgboost, one the best algorithm for Kaggle competition.

R can communicate with the other language. It is possible to call Python, Java, C++ in R. The world of big data is also accessible to R. You can connect R with different databases like Spark or Hadoop.

Finally, R has evolved and allowed parallelizing operation to speed up the computation. In fact, R was criticized for using only one CPU at a time. The parallel package lets you to perform tasks in different cores of the machine.

Five advantages of R

Five disadvantadges of R

1. It is Open Source

2. Exemplary Support for Data Wrangling

3. Quality Plotting and Graphin

4. Highly Compatible

5. Platform Independent

1. Weak Origin

2. Data Handling

3. Basic Security

4. Complicated Language

5. Lesser speed

R in biomedical engineering

Like many other languages, R can be pretty interesting in bioengineering, taken into consideration it advantages. Particularly, R is interesting in data management and modelling due to the quality in plotting and graphin. It allows indeed to have an excellent visualization of big amounts of data. 

It can be particularly interesting in clinical engineering or in telematics. 

Personal experience with R

However, it is true it was for me the most complicated language to use. I mainly used it during my Master and I can say that despite having being programming for a few years, it took me and my collegue hard work to understand the subtilities of this language and use it for the task we had been asked. 

Therefore, I would recommend it in very precise situations it suits the most to use R. In those cases, I can say the visualization and results presentations with R is just great. 

Some interesting courses