Google and Cloudera are bringing Dataflow to Spark

Google Cloud Dataflow allows developers to create and monitor their data-processing pipelines without the hassle of managing the underlying clusters. Now the company is bringing this technology to the Apache Spark data-processing engine with help from the people at Cloudera.

Google today announced that it has teamed up with the Hadoop specialists at Cloudera to bring its Cloud Dataflow programming model to the Apache Spark data-processing engine. With Google Cloud Dataflow, developers can create and monitor data-processing pipelines without having to worry about the underlying data-processing cluster. As Google likes to stress, the service evolved out of the company’s internal tools for processing large datasets at Internet scale. Not all data-processing tasks are the same, though, and sometimes you may want to run a task in the cloud, on premises, or on a different processing engine. With Cloud Dataflow, at least in its ideal state, data analysts will be able to use the same system for creating their pipelines, no matter which underlying architecture they want to run them on.
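To make the "same pipeline, different engine" idea concrete, here is a minimal sketch in plain Python. It does not use the Cloud Dataflow SDK or Spark; `Pipeline`, `InProcessRunner`, and `ChunkedRunner` are hypothetical names invented for illustration, and the chunked runner is only a stand-in for a distributed engine. The point is that the pipeline is defined once and then handed to whichever runner is available.

```python
from typing import Callable, Iterable, List

# Hypothetical types and classes for illustration only; not a real SDK.
Step = Callable[[Iterable[int]], Iterable[int]]


class Pipeline:
    """An ordered list of transforms, defined with no engine details."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def apply(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self  # allow chaining


class InProcessRunner:
    """Runs every step over the whole dataset in a single pass."""

    def run(self, pipeline: Pipeline, data: Iterable[int]) -> List[int]:
        result: Iterable[int] = data
        for step in pipeline.steps:
            result = step(result)
        return list(result)


class ChunkedRunner:
    """Stand-in for a cluster engine: each chunk is processed independently."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def run(self, pipeline: Pipeline, data: Iterable[int]) -> List[int]:
        items = list(data)
        output: List[int] = []
        for start in range(0, len(items), self.chunk_size):
            chunk: Iterable[int] = items[start:start + self.chunk_size]
            for step in pipeline.steps:
                chunk = step(chunk)
            output.extend(chunk)
        return output


def keep_even(values: Iterable[int]) -> Iterable[int]:
    # Element-wise filter, so it is safe to apply per chunk.
    return (v for v in values if v % 2 == 0)


def square(values: Iterable[int]) -> Iterable[int]:
    # Element-wise map, so it is safe to apply per chunk.
    return (v * v for v in values)


# The pipeline is written once...
pipeline = Pipeline().apply(keep_even).apply(square)

# ...and executed by two different runners.
local_result = InProcessRunner().run(pipeline, range(1, 7))
chunked_result = ChunkedRunner(chunk_size=2).run(pipeline, range(1, 7))
print(local_result, chunked_result)
```

Both runners produce the same result here only because every step is element-wise; a real engine also has to handle grouping, shuffling, and failures, which this sketch leaves out.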

NOTE: TECHi Two-Takes are stories we have chosen from the web, along with a little bit of our own opinion in a paragraph. Please check the original story via the Source button below.
