Introduction\n\nThe concept of Big Data is a very interesting one when thinking about solutions to very real problems with our information. By implementing an effective digital ecosystem, we can manage our VAST amounts of information in a manner where it is easy to query and find EXACTLY what we are looking for. This then enables us to determine the answers to our most difficult questions with little effort. On top of this, by having a huge and ordered index of our data, we can potentially solve problems we never knew we had and develop insights we never knew would be useful to us. All of this glorious intelligence based on information we were always going to store anyway... fantastic 😎!\n\nMuch like other fantastic ideas (like Agile, Blockchain, Lean etc), the concept of Big Data has unfortunately been hijacked and corrupted into a buzzword which can get in the way of its true power and usefulness. Whilst buzzwords can be great for building up general interest around a topic, a lack of substance due to a lack of knowledge and/or understanding of core concepts can serve to alienate the very people who would find such a concept useful. I have seen Big Data become a victim of this, which is such a shame, because when you strip away all of the NONSENSE and demonstrate its capability using working examples pertinent to your demo audience, pushback is minimal. Think more captivated eyes, rather than eye rolls 🙄!\n\nBig Data Bingo\n\nGartner Hype Cycle\n\nIt's important to establish what Big Data in fact is and how it can be used in various situations. It's also vital that we manage expectations, because there is NO magic drag-and-drop solution for your project and/or organisation. I advise that you dispel any such delusions before embarking on such a journey to avoid the huge disappointment known as the "Trough of Disillusionment", shown in the Gartner Hype Cycle link above. 
Getting the most out of your data and harnessing it the way that YOU specifically want requires buy-in, effort, time and ongoing development. This might seem daunting, but once you get started and get past the initial hump, the numerous benefits will start to show. The more you continue, the more incredible your solution(s) become!\n\nLady who is harnessing data for the insights she needs for her specific use cases\n\nData is the Key\n\nEVERY organisation has data, but not everyone is a data organisation. Just by carrying out day-to-day tasks, you will likely be creating and storing lots of information that will be useful to at least someone. This includes the documents that you write, the Excel sheets that you prepare, the audits that you carry out, the emails that you send, the sensor values that you capture, factory machine diagnostics etc. Whether you take the time to think about it or not, this information that you manage is actually integral to you, and a lot of your time, resources and overheads will be spent manipulating it into outcomes useful for your organisation. It is this (mostly) manual interaction with your data where we can make significant improvements.\n\nIf you've ever worked with a computer, I'm sure you've had to carry out mind-numbing busywork to get things done. This includes spending huge amounts of time reading through many documents to get that little nugget of information that you need, trawling your email inbox aggregating information from many correspondents and copy-pasting rows and/or columns from many spreadsheets into one super-sheet with many columns which is an absolute pain to read (hiding columns is your friend on speed dial). It is this necessary, often repeated busywork which eats up the most time and has employees wishing for better, more efficient ways of working. 
The ideal for me would be to live in a world where all of this is automated, allowing us to spend 100% of our time interpreting this information and 0% manually processing it.\n\nSpreadsheet busywork is all the rage in many companies\n\nWhat's in a Name...\n\nSo what is Big Data? Put simply, it's a concept that deals with ways to analyse and gain insights from data sets which are too large or complex to be managed by traditional data-processing techniques. Think huge amounts of your data manipulated to get answers to the questions that you ask in a reasonable amount of time with little to no manual work. When implemented correctly, it can give you knowledge and understanding that wasn't possible before.\n\nWhen you have a lot of information that you need to manually trawl through, then you have a "Big Data Amount" and are NOT harnessing Big Data. Furthermore, the more manual steps that you have to take in order for your information to actually be usable, the further away from the concept of Big Data you actually get. Don't let someone try to convince you that a spreadsheet of 1,048,576 rows by 16,384 columns (the Excel maximum) is Big Data! Unless you can gain useful insights, it's just an unwieldy behemoth! Do you really want to go through all of those cells? Also, I've used big sheets on normal-spec computers before and the speed and overall performance are HORRENDOUS!\n\nMicrosoft Support: Excel Specification and Limits\n\nWhat we want is to understand our position based on the information that we are analysing, in a reasonable amount of time. If we ran an analysis that would take 30 days to complete for insights that we need tomorrow, then we will likely not bother. 
There has to be a better way...\n\nGuy channeling his strength to unleash his inner BEAST\n\nHarnessing Your Power and Unleashing the BEAST!\n\nBig Data to me is more about the insights you get from the processing and analysis of your data, not necessarily about the overall size of the information that you have. As I've mentioned before, there is NO magic system that easily turns all the data you hold into a form that can easily be analysed and manipulated. Instead you need to take the time to consider the information you have (and subsequently collect), and treat each data point as important. I've gone into this at length in a previous article and I recommend that you have a read of the concepts discussed. The idea is to structure your data environment such that it can scale from 1 record to the currently unfathomable numbers of records that will potentially be possible with the increasing computing power available in the future.\n\nPutting Data First: Real-time Information as a First Class Citizen\n\nWhen you've properly considered your information along with your schema, structure and format(s), you can configure your data ecosystem to do ALL the heavy lifting for you whilst you reap the rewards. The trick is to count the clicks required to go from your (large) dataset to the insights that you asked for, where the fewer clicks you need, the better your solution. To demonstrate this concept, I'm going to use a slice of data from one of my systems.\n\nAlmost 3 million records captured over a 7 day period where data input peaks at over 107,000 records every three hours\n\nIn the previous image, 2.7m data records have been captured, stored and indexed over the period of a week (this isn't all the data, just data from the last 7 days). The graph shows the total records every 3 hours and the table below shows a subset of the datapoints generated in a manner similar to a spreadsheet. 
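To give a flavour of what sits behind a view like this, here's a minimal sketch of the kind of Elasticsearch query that could drive a 3-hour histogram over the last 7 days. The index name, the timestamp field and the client call in the comment are illustrative assumptions, not taken from my actual system.

```python
# Sketch: count records in 3-hour buckets over the last `days` days,
# expressed as a plain Python dict in Elasticsearch's query DSL.
# "@timestamp" is an assumed field name -- substitute your own.

def three_hour_histogram_query(days=7):
    """Build a query body for a 3-hour date histogram over recent data."""
    return {
        "size": 0,  # we only want the aggregation, not the raw documents
        "query": {
            "range": {"@timestamp": {"gte": f"now-{days}d/d", "lte": "now"}}
        },
        "aggs": {
            "records_over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "3h"}
            }
        },
    }

body = three_hour_histogram_query()
# With the official Python client this would be sent along the lines of:
#   es.search(index="system-metrics", body=body)   # "system-metrics" is hypothetical
```

Because `size` is 0, the cluster returns only the bucket counts, which is why charts like this render in milliseconds even over millions of records.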
If we were in spreadsheet territory, we would likely make a new sheet with some graphs, but we unfortunately wouldn't be able to store and read all records due to total row limitations (as previously mentioned, Excel's limit is just over 1m rows). We would also struggle to get the latest data as it is produced and ingested due to the open-and-lock-the-workbook limitations of many traditional applications.\n\nSo with a selection of tools (in my case the excellent Open Source Elasticsearch visualised with Kibana), we are able to quickly and easily view millions of records within a period of our choice, as well as update in real time with newly produced data! It's important to note that Big Data tools are built to handle data orders of magnitude more than this, so don't worry about limits when dealing with a few million datapoints! Now, viewing records in such a tabular form doesn't really tell us very much, so next we need to analyse the underlying trends in the millions of records.\n\nThe trends uncovered when millions of records are analysed, displayed and updated in real-time\n\nIn the above image, different performance elements of my system are extracted from the millions of datapoints and displayed on interactive charts in milliseconds. Doing this via traditional means would take much longer and wouldn't provide me with the same speed, frequency of updates and customisability. The interactivity allows me to pinpoint periods of time, as well as only show specific values such as particular applications and/or successes/failures etc. When making these selections, ALL of the charts update! Just a click or two away, power at your fingertips 😎.\n\nNow let's take this example further. In a previous article, I spoke about monitoring your logs and provided examples of the text-based access logfiles that webservers manage. 
Whilst useful, these files can become difficult to analyse, especially when your site has been accessed many times and these files are thousands/millions of lines long. In order to understand how your site has been used, a bit of postprocessing would be required, and fortunately our tools can do this for us 🥳!\n\nMonitor Your Things, Check Your Logs!\n\nOver 5m records generated over a 7 day period\n\nHere, over 5m records have been generated over the 7 day period, which would DEFINITELY test spreadsheet software limitations! It's important to note that not all of this is site access data, which is what we are interested in with this example. In the traditional sense, we would have to go through each record, determine if it was relevant, then look at the contents and postprocess them into a form useful to us. This would take ages with 5m records, but with the power of Big Data tools, we can do this on the fly as data is collected! From this I was able, for example, to quickly and easily determine the operating systems used without breaking a sweat! From a development point of view, this can be useful to determine optimisations required, as well as to suggest to those who are using older systems that they should really update!\n\nDistribution of operating systems used to access services\n\nContinuing on, I tweaked the filter with only a few clicks and was able to see only the browsers used by Apple products, and split these up into the various versions. This is really useful to see how my services are consumed (desktop vs mobile) to decide how content should be displayed, as well as for any browser-based optimisations that might be required to give users the best experience. 
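For a flavour of the postprocessing involved, here's a minimal local sketch in Python of mapping raw User-Agent strings from access logs onto coarse OS families, the sort of enrichment a Big Data pipeline performs on the fly at ingest time. The sample user agents and the marker list are illustrative assumptions, not my real dataset.

```python
# Sketch: tally operating systems from User-Agent strings.
from collections import Counter

# Checked in order: iPhone UAs also contain "Mac OS X", and Android UAs
# also contain "Linux", so the more specific markers must come first.
OS_MARKERS = [
    ("Windows NT", "Windows"),
    ("iPhone OS", "iOS"),
    ("Mac OS X", "macOS"),
    ("Android", "Android"),
    ("Linux", "Linux"),
]

def os_family(user_agent):
    """Map a raw User-Agent string onto a coarse OS family."""
    for marker, family in OS_MARKERS:
        if marker in user_agent:
            return family
    return "Other"

# Hypothetical sample user agents standing in for real log data.
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36",
]

distribution = Counter(os_family(ua) for ua in user_agents)
```

In practice you wouldn't loop over 5m lines yourself; the point is that logic like this runs inside the ingest pipeline as each record arrives, so the distribution chart is always up to date.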
This isn't limited to Apple; I could easily look at browsers for Windows, Android etc, as well as a combination of devices if I wanted to.\n\nDistribution of browsers used on Apple products\n\nThese are but a few simple examples using relatively smaller amounts of data to demonstrate the power you can wield by pointing Big Data tools at your datasets. For me, the best aspect of these tools is how customisable they are! You can take this further and run very complex computations on your stored data, as well as join different sets together to achieve the results that you need. You can even feed the data and your calculations into systems that improve their results over time using Machine Learning. The idea is that by using such tools, you can gain insights never before possible and raise your baseline expectations of your data.\n\nOnce I got comfortable using such tools, I couldn't imagine going back to how things were before! My examples have only included data from one machine, but these tools can ingest data from tonnes of sources and still pluck out insights in real time, so you're in for a treat if you decide to make the investment. Finally, understand that it is a process that will evolve the more you use it and understand its capabilities. You'll want to add new datapoints, gather information from other areas etc to build up that overall picture specific to your use case(s). I've used Elasticsearch for this article because it's my bread and butter, but there are many other tools out there to try, so be adventurous!\n\nLady perched on top of a cliff happy with the possibilities that Big Data brings\n\nTreating Our Data with Respect\n\nBig Data is a concept that can bring about HUGE benefits when it comes to the information we store, as long as we treat it with respect and don't reduce it to a cheap buzzword. The trick is to focus on what insights you can get from your information rather than the amount of information that you have. 
It is also important that your environment is flexible and allows for your approaches to evolve over time in response to what you learn. When given time and implemented correctly, it will mark a big change in your working practices and will significantly improve your understanding of the data that you manage.\n\nTake care and all the best, Si.