inside BIGDATA | 3 Considerations for Working with Data Gravity
“Data gravity” is a term coined nearly a decade ago by Dave McCrory, a systems engineer and current VP of Software Engineering for the Wise.io team at GE Digital. The idea is based on Newtonian gravity: greater mass, greater attraction. In the classical Newtonian sense, the force of gravity increases with mass and falls off with the square of the distance between objects. Similarly, the more data you have, the more you attract applications and other data.
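For reference, the law the analogy draws on is Newton's law of universal gravitation:

$$F = G \frac{m_1 m_2}{r^2}$$

where m₁ and m₂ are the two masses, r is the distance between them, and G is the gravitational constant. Read loosely for data gravity, mass corresponds to the volume of data and distance to the network separation between that data and the applications (or other data sets) it attracts; this is a paraphrase of the idea, not McCrory's exact formulation.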
As data shifts to cloud and cloud-adjacent colocation environments, these gravitational forces become harder to ignore. The growing distance between data nexuses and the places where data is stored (or where applications run) also becomes increasingly difficult to manage.
As enterprises share more data and derive increasing value from deeply enriched data sets, evaluation of data gravity becomes a critical part of data use best practices. Answer these three questions to help refine how you store and leverage data:
1. How efficiently can you access your data?
When you access your data, you want – and, in most cases, need – the process to be fast. That means minimizing data latency: the amount of time it takes to retrieve your data from its storage location.
The locality of your data plays an outsized role here. Let’s say you want to run an application in a data center in Portland, Oregon, and that application needs access to data in an Atlanta data center. You’ll need to accommodate the time it takes for data to flow back and forth. If the application and the data are both in Portland, you can access the data in roughly 5% of the time; the result is an application that can work up to 20 times faster. Leaving data on-premises has pronounced downsides, particularly at a time when the majority of valuable data is moving to the cloud or to colocation facilities. Last year, when only 10% of enterprises had closed their traditional data centers, Gartner estimated that the figure would grow to 80% by 2025.
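As a rough illustration of where a figure like that can come from, here is a minimal Python sketch comparing how long a request-heavy application spends waiting on the network when its data is in the same facility versus across the country. The round-trip times and request count are assumptions chosen for illustration, not measurements from the article.

```python
# Illustrative comparison of network wait time when data is local versus remote.
# All numbers below are assumptions for the sake of example.

LOCAL_RTT_MS = 3.0    # same-facility / same-metro round trip (assumed)
REMOTE_RTT_MS = 60.0  # Portland <-> Atlanta round trip over the WAN (assumed)
REQUESTS = 10_000     # sequential data fetches the application makes (assumed)

def wait_seconds(requests: int, rtt_ms: float) -> float:
    """Total time spent waiting on the network for sequential data requests."""
    return requests * rtt_ms / 1000.0

local = wait_seconds(REQUESTS, LOCAL_RTT_MS)
remote = wait_seconds(REQUESTS, REMOTE_RTT_MS)

print(f"Local data:  {local:6.1f} s of network wait")
print(f"Remote data: {remote:6.1f} s of network wait")
print(f"Local access takes {local / remote:.0%} of the remote wait time")
```

With these assumed numbers, local access takes about 5% of the remote wait time, which is how an application can end up working as much as 20 times faster.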
Data gravity—and the ensuing need to move significant amounts of data between data centers, clouds, and on-premises infrastructure—becomes a deterrent to long-distance access to (and use of) data. It can also dampen enterprises’ motivation to share data with one another.
2. How can you monetize your data?
Understanding specifically why you use your data is another critical consideration. Think of big data usage as a form of storytelling: your narrative, and the algorithms you use to reach that narrative, matter. Are you using enough of the right data?
Hotel chain Red Roof Inn knows that its customers care about the weather; it is often a significant part of the travel experience. The chain turned to data to learn how to “outsmart” its competitors. By analyzing open data on past winters’ storms, Red Roof Inn determined that when 2-3% of flights were cancelled, roughly 90,000 passengers were left looking for accommodations each evening. It then marketed to those travelers, notifying them of locations near airports and calling attention to the warm beds available at its hotels on nights when the airlines stranded them.
In this case, Red Roof Inn was able to access public data sets, including government statistics and weather data. But not all organizations will share data freely, and big data success stories frequently show that projects require data from more than a single organization.
Efforts to unearth unstructured data (data that isn’t readily machine-readable) are growing, and there are wide-ranging efforts to share data safely among organizations. Those with valuable data will be looking for ways to monetize it. How that data is made available, and at what price, will become important factors in how well you can convey your narrative.
Consider healthcare. McKinsey estimates that big data can yield $100 billion in savings for U.S. healthcare alone, in part through sharing data and collaborating externally. Can healthcare organizations increase profit and lower costs through greater external collaboration? It’s worth exploring.
3. How is the locality of your data impacting big data results?
Data gravity is driven by the need for low latency and high throughput, and large data sets are difficult to move. The AWS Snowmobile data transfer service was developed to help customers move “massive volumes of data.” It is, quite literally, a storage data center in a box delivered by a semi-truck, with each truck able to move up to 100 petabytes of data. Transferring that much data straight to the cloud over a fully utilized 10 Gbps connection would take nearly three years. That’s the throughput problem in a nutshell: copying the huge data sets that big data requires isn’t practical over long-distance networks. To be prepared for new uses of your data in the future, your enterprise must strongly consider the locality of its big data.
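The arithmetic behind that estimate is simple to sketch. The snippet below assumes decimal petabytes and a link running at full rate with no protocol overhead, which is optimistic; real-world transfers run slower, which pushes the total toward the three-year mark.

```python
# Back-of-the-envelope transfer time for a Snowmobile-scale data set over a WAN
# link, assuming decimal petabytes and a perfectly utilized link with no overhead.

BITS_PER_PETABYTE = 1e15 * 8
SECONDS_PER_DAY = 86_400

def transfer_days(petabytes: float, link_gbps: float) -> float:
    """Days needed to push the data set over a link of the given speed."""
    seconds = petabytes * BITS_PER_PETABYTE / (link_gbps * 1e9)
    return seconds / SECONDS_PER_DAY

days = transfer_days(100, 10)  # 100 PB over a 10 Gbps connection
print(f"{days:,.0f} days, or roughly {days / 365:.1f} years")  # ~926 days
```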
Strength of Data Gravity
As enterprises try to share more data and derive more value from deeply enriched data sets, data gravity’s effects will become more dramatic. To ensure quality big data results, enterprises must plan now for how they will use that data in the future. While startups talk about their ability to overcome data gravity with higher throughput, they won’t be able to overcome the speed of light. Ingesting and processing data present enough challenges; you shouldn’t also have to worry about wide-area network distances.
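To make the speed-of-light point concrete, here is a short Python sketch of the physical floor on round-trip latency between two distant data centers. The Portland-to-Atlanta distance is an approximate great-circle figure and the fiber path is idealized, so observed latency will always be higher.

```python
# Lower bound on round-trip latency imposed by the speed of light in optical
# fiber (signals travel at roughly 2/3 of c). Distance is approximate.

SPEED_OF_LIGHT_KM_S = 299_792
FIBER_FRACTION = 2 / 3           # propagation speed in fiber relative to c
PORTLAND_ATLANTA_KM = 3_500      # approximate great-circle distance (assumed)

one_way_s = PORTLAND_ATLANTA_KM / (SPEED_OF_LIGHT_KM_S * FIBER_FRACTION)
print(f"Best-case round trip: {2 * one_way_s * 1000:.0f} ms")  # ~35 ms, before routing or processing
```

No amount of added bandwidth removes that floor; only moving data and applications closer together does.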
For the original article, visit insidebigdata.com