Big Data . Cloud Computing . Realtime Analytics

Cloudscale



Cloudscale Authors: Bill McColl, Pat Romanski, Liz McMillan, Jeremy Geelan


Blog Post

Cloud Analytics: Dataflow vs Databases

Realtime analytics drives a migration away from databases to more scalable parallel dataflow architectures.

For twenty years, analytics has been viewed as just one specific area within the broader relational database industry. So, analytics has meant databases. Today that view is changing. Over the past year or so, a new movement, the "NoSQL" movement, has emerged, promoting the advantages of doing many kinds of analytics without using any relational database technology at all.

Whatever one thinks of the capabilities and limitations of distributed key-value stores relative to relational databases, one thing is clear: the stranglehold that SQL has held over all aspects of data analytics since 1990 is coming to an end. Other non-SQL approaches to analytics, such as MapReduce/Hadoop, a very simple dataflow architecture for batch computing, are now gaining ground. As the need for realtime analytics grows, we will continue to see a migration away from databases and towards more scalable parallel dataflow architectures.



The main differences between databases and dataflow can be summarized as follows:

Database          Dataflow
--------          --------
Historical        Realtime
Offline           Online
Pull Model        Push Model
High latency      Low latency
Demand-driven     Data-driven
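
To make the demand-driven versus data-driven distinction above concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular database or dataflow product works; the class names `PullStore` and `PushMean` and the function `demo` are hypothetical. The first class stores events and scans them when asked, while the second updates its result as each event arrives and pushes it downstream.

```python
from typing import Callable, List, Tuple


class PullStore:
    """Database-style (hypothetical): store events, answer only when queried."""

    def __init__(self) -> None:
        self.events: List[float] = []

    def insert(self, value: float) -> None:
        self.events.append(value)

    def query_mean(self) -> float:
        # Demand-driven: the stored history is scanned on every query.
        return sum(self.events) / len(self.events)


class PushMean:
    """Dataflow-style (hypothetical): update the result on every arrival."""

    def __init__(self, on_update: Callable[[float], None]) -> None:
        self.count = 0
        self.total = 0.0
        self.on_update = on_update

    def on_event(self, value: float) -> None:
        # Data-driven: each new event updates the running state and the
        # fresh result is pushed to the downstream consumer immediately.
        self.count += 1
        self.total += value
        self.on_update(self.total / self.count)


def demo() -> Tuple[float, List[float]]:
    readings = [10.0, 20.0, 30.0]

    store = PullStore()
    for r in readings:
        store.insert(r)
    pulled = store.query_mean()  # one answer, available only when someone asks

    pushed: List[float] = []
    node = PushMean(pushed.append)
    for r in readings:
        node.on_event(r)  # one answer per event, with no query needed

    return pulled, pushed
```

Calling `demo()` returns `(20.0, [10.0, 15.0, 20.0])`: the pull side yields a single result at query time, while the push side has an up-to-date result after every event.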


The shift from databases to dataflow for enterprise cloud analytics mirrors what we have recently seen in another area, the "realtime web". The old demand-driven web model of polling/querying/pulling RSS feeds has proved unable to deliver the low latency required by the many new realtime web services being created by Twitter and others. New data-driven, realtime, push models such as PubSubHubbub and RSSCloud are now replacing the old approaches.
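
The latency gap between polling and push delivery can be sketched with a small in-process simulation. This is not an implementation of PubSubHubbub or RSSCloud; `Feed`, `Hub`, and `simulate` are hypothetical names, time is counted in abstract ticks, and each item simply carries its publish tick so that delivery delay can be measured.

```python
from typing import Callable, Dict, List


class Feed:
    """Pull model (hypothetical): readers ask for items they have not seen."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def publish(self, item: str) -> None:
        self.items.append(item)

    def fetch_since(self, index: int) -> List[str]:
        return self.items[index:]


class Hub:
    """Push model (hypothetical): subscribers are called back on publish."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, item: str) -> None:
        for callback in self.subscribers:
            callback(item)


def simulate(publish_ticks: List[int], poll_interval: int,
             total_ticks: int) -> Dict[str, List[int]]:
    """Return delivery delays (in ticks) for push and for polling."""
    feed, hub = Feed(), Hub()
    clock = {"now": 0}
    push_delays: List[int] = []
    poll_delays: List[int] = []
    hub.subscribe(lambda item: push_delays.append(clock["now"] - int(item)))

    seen = 0
    for tick in range(total_ticks):
        clock["now"] = tick
        if tick in publish_ticks:
            item = str(tick)  # the item records when it was published
            feed.publish(item)
            hub.publish(item)  # subscribers hear about it in the same tick
        if tick % poll_interval == 0:
            new_items = feed.fetch_since(seen)  # poller only learns on its schedule
            seen += len(new_items)
            poll_delays.extend(tick - int(i) for i in new_items)
    return {"push": push_delays, "poll": poll_delays}
```

For example, `simulate([1, 4, 7], 5, 11)` returns `{"push": [0, 0, 0], "poll": [4, 1, 3]}`. Under these assumptions the pushed items arrive in the tick they are published, while the polled items wait up to one polling interval, and the poll at tick 0 finds nothing at all.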

More Stories By Bill McColl

Bill McColl is Founder & CEO, Cloudscale Inc. In order to found Cloudscale he left Oxford University, where for over twenty years he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. He has led research, product and business teams in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization and resource management, realtime stream processing, and cloud computing. Cloudscale is his second Silicon Valley software company. He was also founder and CEO of Sychron Inc., a Silicon Valley VC-backed software company developing scalable software systems for datacenter and desktop virtualization. McColl lives in Palo Alto, CA.