If your organization is like most, you might be overwhelmed by the volume of information available at your fingertips. To make matters worse, the growth rate of this data continues to increase; data, and lots of it, is being produced more rapidly than ever before. How can you capture this data and use it to help you and your organization make better decisions, and ideally to predict future behavior and events?
Historically, you might approach this challenge in a very structured way. First, you might commission a team to examine the raw data, build a data dictionary, and construct a database, and then have a team of analysts write a set of queries and views to give you and your team access to slices of the data. Ideally, this approach would be formalized into an ongoing operational process that provided valuable insights to your organization.
This is a tried and tested approach, which, with the luxury of time and resources, has the potential to produce useful results from the captured data. However, with information growing at a rapid pace, what happens when your data changes? Frequently, in the middle of this analysis, the structure or format of a data source may change, or you might discover a new, potentially insightful data source. The new information could be very helpful, but it might also set back your timeline as you adjust to it and to the new data stream.
Chances are good that you would need something quicker and more flexible to adapt to changes in the data or in the data sources. You can address these needs by adopting some of the techniques frequently associated with the popular term "Big Data," which has characteristics that you may recognize and solutions that you can leverage. Some of these characteristics include:
- An abundance of data sources, with both structured and unstructured data.
- A flexible data schema that seamlessly, and in real time, incorporates new fields and data types (see the sketch following this list).
- A tremendous number of variables with underutilized (or unknown) correlations and interactions.
- Real-time input streams that continually update and could influence your conclusions.
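As a concrete illustration of the flexible-schema point above, here is a minimal Python sketch (the feed and field names are hypothetical) in which the working schema is derived from the records themselves, so a newly appearing field simply widens the table instead of breaking the ingest:

```python
import json

# Hypothetical sample feed: records arrive with differing fields over time,
# mixing structured values with free-form text.
raw_records = [
    '{"order_id": 1, "amount": 19.99}',
    '{"order_id": 2, "amount": 5.00, "coupon": "SPRING"}',              # new field appears
    '{"order_id": 3, "amount": 12.50, "notes": "gift wrap, fragile"}',  # unstructured text
]

records = [json.loads(r) for r in raw_records]

# Derive the schema from the data itself rather than fixing it up front:
# the union of all observed fields becomes the working column set.
columns = sorted({field for rec in records for field in rec})

# Project every record onto the discovered schema, leaving gaps as None.
table = [{col: rec.get(col) for col in columns} for rec in records]

for row in table:
    print(row)
```

The union-of-fields approach trades up-front rigor for adaptability, which is exactly the trade-off this list of characteristics describes.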
Fortunately, the cloud can provide resources and tools to meet these requirements quickly and flexibly. At some level, the above challenges exist for any organization; it is all relative, and the techniques and approaches can often be scaled down. Whether you truly have "Big Data" or not, cloud services, and the data analysis techniques available with them, provide elements and approaches that you can apply to your data.
- The key is to have a step-at-a-time approach that puts some governance around the data and builds repeatable processes that make the information readily accessible to many users. It's best to start by defining the goals, understanding the sources of data, and setting the appropriate data boundaries (time, resources, questions, tolerance). This sounds simple and obvious, but when many data points and variables are involved, it is not always easy to know which goals to focus on. I like the analogy of poking a hole in a dike to release the water backed up behind it. The knowledge may start out as a trickle, but as you widen that hole, the force of the water starts working on your side and the outflow of water (or information) accelerates.
- Analyzing your data can be an iterative process, which can lead to a constant back-and-forth of testing approaches, tweaking variables or algorithms, and comparing results. It is important to evaluate the cost and benefit of your changes, as this loop can easily burn time and resources (an iterate-and-compare sketch appears after this list).
- Examine cloud services as a place to capture and store your data, and use their associated tools to aggregate and analyze it. Cloud-based storage techniques can provide schema-on-read capabilities that flexibly absorb new fields and feeds (a schema-on-read sketch appears after this list). The trick for efficient cloud implementations is to find the balance between computing and data resources, and this balance often depends on the types of data you are collecting and the insights you are seeking.
- Finally, when you have found a way through your data to identify root problems, or even to predict future actions, you need to operationalize the process and tools. This requires solidifying the data collection and transformation process, hardening the analysis process, and then disseminating the results in a way that is easy to understand and act upon (a pipeline sketch appears after this list). It may take some pre-work to prepare your colleagues to expect and accept the new information.
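To illustrate the iterate-and-compare loop from the second bullet, here is a minimal sketch using scikit-learn; the models, parameters, and synthetic dataset are placeholders, and the point is simply recording each variant's benefit (score) next to its cost (time) so changes are evaluated rather than guessed at:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for your own dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Each entry is one "tweak" in the iterative loop.
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest_small": RandomForestClassifier(n_estimators=50, random_state=0),
    "forest_large": RandomForestClassifier(n_estimators=300, random_state=0),
}

for name, model in candidates.items():
    start = time.perf_counter()
    score = cross_val_score(model, X, y, cv=5).mean()
    elapsed = time.perf_counter() - start
    # Benefit (accuracy) and cost (wall-clock time) side by side.
    print(f"{name:>13}: accuracy={score:.3f}  fit_time={elapsed:.1f}s")
```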
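For the schema-on-read point in the third bullet, a minimal PySpark sketch follows; it assumes a running Spark environment, and the bucket path and `event_type` field are hypothetical placeholders for your own raw feed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

# Point at raw JSON landed in object storage (the s3a:// path is a placeholder).
# No schema was declared at write time; Spark infers one when the files are
# read, so feeds that grow new fields keep loading without migrations.
events = spark.read.json("s3a://example-bucket/raw/events/")

events.printSchema()                          # the schema was discovered, not pre-defined
events.groupBy("event_type").count().show()   # assumes an event_type field exists

spark.stop()
```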
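And for the final bullet, here is a minimal sketch of what operationalizing might look like at the smallest scale: each stage is a named, repeatable step that can be scheduled and logged, and the stage functions are hypothetical stand-ins for your own collection, transformation, and analysis logic:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("pipeline")

def extract():
    # Stand-in for the solidified data collection step.
    log.info("collecting raw data")
    return [{"amount": 19.99}, {"amount": 5.00}]

def transform(rows):
    # Stand-in for the transformation step.
    log.info("applying transformations")
    return [r["amount"] for r in rows]

def analyze(amounts):
    # Stand-in for the hardened analysis step.
    log.info("running analysis")
    return {"total": sum(amounts), "orders": len(amounts)}

def publish(result):
    # In practice this might write to a dashboard or send a report.
    log.info("publishing results: %s", result)

def run_pipeline():
    # One named entry point makes the whole process schedulable and auditable.
    publish(analyze(transform(extract())))

if __name__ == "__main__":
    run_pipeline()
```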
The journey from start to finish is not a straight line, even less so when you are just beginning. However, as you build experience and learn which data, transformations, algorithms, and outputs drive the most benefit, you will be on your way to leveraging the cloud and the data analytics techniques that can grow your organization's capabilities and responsiveness.