
Thousands of users have been running R and Spark code in Databricks R notebooks. When we released R notebooks in 2015, we integrated SparkR into the notebook: the SparkR package was imported into the namespace by default, and both the Spark and SQL context objects were initialized and configured. Over time, we learned that many of these users rely on our notebooks as a convenient environment for single-node R data analysis. For them, the pre-loaded SparkR functions masked several functions from other popular packages, most notably dplyr. To improve the experience of users who want to use R notebooks for single-node analysis, as well as of new sparklyr users starting with Spark 2.2, we are no longer importing SparkR by default.
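Since SparkR is no longer attached by default, the masking only occurs if you load it yourself. Here is a minimal sketch of the problem; the package names are real, but the exact masking messages and error text are paraphrased:

```r
library(dplyr)
library(SparkR)   # prints: "The following objects are masked from 'package:dplyr': ..."

# With SparkR attached, filter() resolves to SparkR's S4 generic and no longer
# works on a local data frame:
# filter(mtcars, mpg > 25)        # dispatches to SparkR and errors

# Workarounds: call through the namespace, or detach SparkR for local work.
dplyr::filter(mtcars, mpg > 25)
detach("package:SparkR")
```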

In September 2016, RStudio announced sparklyr, a new R interface to Apache Spark whose interface follows the popular dplyr syntax. Today, we are happy to announce that sparklyr can be seamlessly used in Databricks clusters running Apache Spark 2.2 or higher with Scala 2.11. sparklyr's addition to the Spark ecosystem not only complements SparkR but also extends Spark's reach to new users and communities. At Databricks, we provide the best place to run Apache Spark and all applications and packages powered by it, in all the languages that Spark supports. In this blog post, we show how you can install and configure sparklyr in Databricks, and we introduce some of the latest improvements in Databricks R Notebooks.
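As a quick sketch of what that looks like in practice, assuming a Databricks cluster that meets the requirements above (the iris example here is illustrative, not from the original post):

```r
library(sparklyr)
library(dplyr)

# On Databricks, sparklyr attaches to the cluster's existing Spark session
# via the dedicated "databricks" connection method:
sc <- spark_connect(method = "databricks")

# Copy a small local data set into Spark; sparklyr replaces the dots in
# column names (e.g. Petal.Length becomes Petal_Length).
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Ordinary dplyr verbs are translated to Spark SQL and run on the cluster:
iris_tbl %>%
  group_by(Species) %>%
  summarise(avg_petal_length = mean(Petal_Length)) %>%
  collect()
```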

To follow along, try this notebook on Databricks, which contains all the instructions explained in this post.
