Google and Cloudera are bringing Dataflow to Spark

Google Cloud Dataflow allows developers to create and monitor their data-processing pipelines without the hassle of managing the underlying clusters. Now the company is bringing this technology to the Apache Spark data-processing engine with help from the people at Cloudera.

Google today announced that it has teamed up with the Hadoop specialists at Cloudera to bring its Cloud Dataflow programming model to the Apache Spark data-processing engine. With Google Cloud Dataflow, developers can create and monitor data-processing pipelines without having to worry about the underlying cluster. As Google likes to stress, the service evolved out of the company's internal tools for processing large datasets at Internet scale. Not all data-processing tasks are the same, though: sometimes you may want to run a task in the cloud, sometimes on premises, and sometimes on a different processing engine. With Cloud Dataflow, in its ideal state, data analysts will be able to use the same system for creating their pipelines, no matter which underlying architecture they want to run them on.
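To make the "same pipeline, different engine" idea concrete, here is a minimal Python sketch. It is not the Cloud Dataflow SDK or the Spark integration Google and Cloudera announced; `Pipeline`, `LocalRunner` and `ThreadedRunner` are hypothetical names invented for this illustration, and the thread pool merely stands in for a real cluster. The point is only that the pipeline is described once and the choice of where it runs is made separately.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


class Pipeline:
    """Hypothetical pipeline: records element-wise steps without running them."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Any], Any]] = []

    def apply(self, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append(fn)
        return self  # return self so steps can be chained


class LocalRunner:
    """Hypothetical runner: executes every step sequentially in this process."""

    def run(self, pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
        out = list(data)
        for step in pipeline.steps:
            out = [step(item) for item in out]
        return out


class ThreadedRunner:
    """Hypothetical runner: a thread pool standing in for a remote engine."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
        out = list(data)
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            for step in pipeline.steps:
                # pool.map returns results in input order
                out = list(pool.map(step, out))
        return out


# The pipeline is defined once, with no reference to any engine.
pipeline = Pipeline().apply(str.strip).apply(str.lower).apply(len)
lines = ["  Google ", "Cloudera", " Spark  "]

# The same definition is handed to two different runners.
local_result = LocalRunner().run(pipeline, lines)
threaded_result = ThreadedRunner().run(pipeline, lines)
print(local_result, threaded_result)
```

Both runners receive the identical pipeline object, so swapping the execution backend changes one line at the call site rather than the pipeline itself. That separation is what the announcement describes in principle, though the real systems also have to handle distribution, failures and scale.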
