A few days ago we published an article about this new technology, which seems to be taking the lead in Big Data.

An article on KDnuggets describes how Spark, with its in-memory approach and functional API, is revolutionizing the sector. Companies such as Alpine Data Labs have recognized its full potential and have already gotten to work.

“Have you heard of Spark?

This is going to change everything, again.” (Dr. Will Ford)

Spark has three key features that make it the most interesting up-and-coming technology to rock the big data world since Apache Hadoop in 2005:

  1. For iterative analysis such as logistic regression, Random Forests, or other advanced algorithms, Spark has demonstrated a 100x increase in speed that scales to hundreds of millions of rows.
  2. Spark has native support for the latest and greatest programming languages: Java, Scala, and, of course, Python.
  3. Spark offers generality, or platform compatibility in both directions: it integrates nicely with SQL engines (Shark), machine learning (MLlib), and streaming (Spark Streaming), and, by using Hadoop’s new YARN cluster manager, it requires no new software to be installed on the cluster.
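To see why the first point matters, here is a minimal sketch in plain Python (not Spark code, and not taken from the quoted article) of an iterative algorithm. Logistic regression trained by gradient descent makes one full pass over the data per iteration, so an engine that re-reads its input on every pass pays the loading cost each time, while one that keeps the data in memory pays it once. The names `CountingSource`, `make_dataset`, and `train` are hypothetical helpers invented for this illustration, and the toy data is synthetic.

```python
import math
import random


def make_dataset(n, seed=0):
    """Hypothetical toy data: the label is 1 when x1 + x2 > 0, else 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        # The first feature is a constant 1.0 that acts as the bias term.
        rows.append(((1.0, x1, x2), 1.0 if x1 + x2 > 0 else 0.0))
    return rows


class CountingSource:
    """Stands in for slow storage and counts how often the data is read in full."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._rows)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(source, iterations, cache, lr=0.5):
    """Batch gradient descent for logistic regression.

    With cache=True the data is loaded once and reused on every iteration;
    with cache=False it is re-read from the source on every iteration.
    """
    weights = [0.0, 0.0, 0.0]
    cached = source.load() if cache else None
    for _ in range(iterations):
        rows = cached if cache else source.load()
        grad = [0.0, 0.0, 0.0]
        for features, label in rows:
            prediction = sigmoid(sum(w * x for w, x in zip(weights, features)))
            error = prediction - label
            for j, x in enumerate(features):
                grad[j] += error * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


uncached_source = CountingSource(make_dataset(200))
cached_source = CountingSource(make_dataset(200))
uncached_weights = train(uncached_source, iterations=20, cache=False)
cached_weights = train(cached_source, iterations=20, cache=True)
print("full reads without caching:", uncached_source.reads)
print("full reads with caching:", cached_source.reads)
```

Both runs perform the same arithmetic and end with the same weights; only the number of full reads differs, and it grows with the iteration count when nothing is cached. The sketch says nothing about how large the speedup is on a real cluster, which depends on the data, the storage, and the engine.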


At Alpine, we have made it dead simple to get started with Spark by including the technology in our latest build out of the box. We require no additional software or hardware to leverage our extensive list of operators for data transformation, exploration, and building advanced analytic models. We leverage Hadoop YARN (Hadoop NextGen) to launch Spark jobs without any pre-installation of Spark or modification of the cluster configuration. This gives our customers seamless integration between our Spark implementation and their Hadoop stack. For example, at last month's GigaOM conference we analyzed 50 million rows of account data in 50 seconds on a 20-node cluster.