Nadzweb.com

Google Cloud Dataflow, what and how?

Posted on October 16, 2014 (updated February 13, 2015) by admin

Cloud Dataflow is a managed Google Cloud service for processing data. It lets developers build data pipelines, monitor their execution, and transform and analyse data, all in the cloud.
Cloud Dataflow is based on a highly efficient programming model used internally at Google, which evolved from MapReduce and from successor technologies such as Flume and MillWheel. The underlying service is language-agnostic.
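
For readers who haven't met the MapReduce model that Dataflow grew out of, the classic word count shows its shape: a map step emits key/value pairs, a shuffle groups the values by key, and a reduce step combines each group. The sketch below is plain Python that runs locally; the function names (`map_words`, `group_by_key`, `reduce_counts`, `word_count`) are hypothetical and this is not the Dataflow SDK API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map stage: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group stage: collect all values sharing a key (the shuffle, done in memory here)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce stage: combine each key's values into a single total."""
    return {word: sum(ones) for word, ones in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three stages into one small pipeline."""
    return reduce_counts(group_by_key(map_words(lines)))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(text))
```

Each stage only consumes the previous stage's output, which is what lets a framework run the stages in parallel across many machines.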

Cloud Dataflow represents all datasets, irrespective of size, uniformly via PCollections (“parallel collections”). A PCollection might be an in-memory collection, read from files on Cloud Storage, queried from a BigQuery table, read as a stream from a Pub/Sub topic, or calculated on demand by your custom code.
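
The value of one uniform abstraction is that a transform does not care where its elements came from. The toy, in-memory model below illustrates that idea in plain Python; the `PCollection` class, its `map`/`filter` methods and the `from_memory`/`from_text` source helpers are hypothetical stand-ins, not the SDK's classes, and reading from a local file-like object stands in for sources such as Cloud Storage, BigQuery or Pub/Sub.

```python
import io
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TextIO, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class PCollection(Generic[T]):
    """Toy immutable collection (hypothetical; only mimics the PCollection idea)."""
    elements: Tuple[T, ...]

    def map(self, fn: Callable[[T], U]) -> "PCollection[U]":
        # Transforms never mutate their input; they return a new collection.
        return PCollection(tuple(fn(e) for e in self.elements))

    def filter(self, pred: Callable[[T], bool]) -> "PCollection[T]":
        return PCollection(tuple(e for e in self.elements if pred(e)))


def from_memory(items: Iterable[T]) -> PCollection[T]:
    """Source 1: an in-memory collection."""
    return PCollection(tuple(items))


def from_text(file_obj: TextIO) -> PCollection[str]:
    """Source 2: lines from a file-like object (a local stand-in for Cloud Storage)."""
    return PCollection(tuple(line.rstrip("\n") for line in file_obj))


def lengths_of_long_words(words: PCollection[str]) -> PCollection[int]:
    """The same transform works on either source, since both yield a PCollection."""
    return words.filter(lambda w: len(w) > 3).map(len)


if __name__ == "__main__":
    in_memory = from_memory(["dataflow", "is", "neat"])
    from_file = from_text(io.StringIO("dataflow\nis\nneat\n"))
    print(lengths_of_long_words(in_memory) == lengths_of_long_words(from_file))  # True
```

Keeping the collection immutable mirrors the way each step in a pipeline produces a new dataset instead of editing the old one in place.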

Dataflow is designed to complement the rest of Google’s existing cloud portfolio. If you’re already using Google BigQuery, Dataflow lets you clean, prepare and filter your data before it is written to BigQuery. Dataflow can also read from BigQuery when you want to join your BigQuery data with other sources, and the joined results can then be written back to BigQuery.
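
A plain-Python sketch of that clean-then-join flow is shown below. The row schema (`user_id`, `country`, `plan`) and the function names are hypothetical, and no BigQuery or Dataflow calls are made; in a real pipeline the rows would be read from and written to BigQuery through the SDK's I/O connectors. The join groups both inputs by a shared key while keeping the two sides apart, which is roughly the idea behind the SDK's `CoGroupByKey` transform.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def clean(rows: Iterable[Row]) -> List[Row]:
    """Drop rows without a user_id and normalise the country code (hypothetical schema)."""
    cleaned: List[Row] = []
    for row in rows:
        if not row.get("user_id"):
            continue  # filter: skip malformed rows
        country = str(row.get("country") or "").strip().upper()
        cleaned.append({**row, "country": country or None})
    return cleaned


def co_group(left: Iterable[Row], right: Iterable[Row],
             key: str) -> Dict[object, Tuple[List[Row], List[Row]]]:
    """Group both inputs by `key`, keeping the left and right sides separate."""
    groups: Dict[object, Tuple[List[Row], List[Row]]] = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[key]][0].append(row)
    for row in right:
        groups[row[key]][1].append(row)
    return dict(groups)


def inner_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Emit one merged row per matching (left, right) pair; unmatched keys are dropped."""
    joined: List[Row] = []
    for lefts, rights in co_group(left, right, key).values():
        for lrow in lefts:
            for rrow in rights:
                joined.append({**lrow, **rrow})
    return joined


if __name__ == "__main__":
    events = [{"user_id": "u1", "country": " gb "},
              {"user_id": "", "country": "us"},
              {"user_id": "u2", "country": None}]
    profiles = [{"user_id": "u1", "plan": "pro"}, {"user_id": "u3", "plan": "free"}]
    print(inner_join(clean(events), profiles, "user_id"))
```

Swapping `inner_join` for a variant that also emits rows whose other side is empty would give an outer join from the same grouped data.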

Because the service runs on Google’s infrastructure, Google takes care of provisioning and scaling, which removes much of the operational overhead and the need to plan for scalability yourself. As developers, all we need to focus on is the application layer and its logic. “Eyeball this space”: this is quite interesting, and it may be a game-changer in the Big Data space.

For more information, refer to Google Cloud Data Processing Service and Google Cloud Dataflow.

  • bigdata
  • gap
  • google app engine
  • google cloud
  • hadoop
  • tags
    ©2021 Nadzweb.com | Powered by WordPress & Superb Themes