One day, Mickey was watching TV at home when he heard Minnie calling out to him.
Minnie – Hey Mickey
Mickey – Hi Minnie, how are you?
Minnie – I need to talk to you
Mickey – Okay, give me 2 minutes and we will go for a drive.
Mickey – Yes Minnie, tell me now
Minnie – I keep hearing the term “Big Data” everywhere, and I need to know what exactly it is.
Mickey – Okay, don’t worry, let me explain.
Minnie – Yeah
Mickey – See, we have data everywhere, online and offline, and more of it is generated every second. People post images, videos, and text, and they chat and communicate; all of this generates lots of data.
Minnie – Right
Mickey – Traditionally, we used databases and tables to store data.
Minnie – Yes
Mickey – Big Data refers to a volume of data so large that it cannot be stored and processed within a given time frame using the traditional computing approach alone.
Minnie – Okay, how big is Big Data?
Mickey – There are a lot of misconceptions about what amount of data can be termed Big Data.
Minnie – What misconceptions?
Mickey – Usually, data measured in gigabytes, terabytes, petabytes, exabytes, or anything larger is considered Big Data. This is where the misconception arises.
Even a small amount of data can be referred to as Big Data, depending on the context in which it is used.
Minnie – I did not get that
Mickey – For example, if we try to attach a 100-megabyte document to an email, we would not be able to do so, because the email system would not support an attachment of this size.
Therefore, with respect to email, this 100-megabyte attachment can be referred to as Big Data.
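Mickey – If it helps, here is a minimal Python sketch of that idea. The system names and limits are made-up numbers, assumed only for illustration; the point is that "big" is judged against what a given system can handle, not against a fixed size.

```python
# Hypothetical capacity limits in megabytes, chosen only for illustration.
SYSTEM_LIMITS_MB = {
    "email_attachment": 25,
    "usb_stick": 16_000,
}


def is_big_for(system: str, size_mb: float) -> bool:
    """Return True when the data exceeds what the named system can handle."""
    return size_mb > SYSTEM_LIMITS_MB[system]


# The same 100 MB document is "big" for email but not for a USB stick.
print(is_big_for("email_attachment", 100))
print(is_big_for("usb_stick", 100))
```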
Minnie – Okay. Why do we need Big Data? What do we do with it?
Mickey – As I said, any data that is huge and cannot be processed by traditional computing approaches alone in a given time is Big Data. We need data to do calculations, run analyses, make decisions, etc.
Minnie – Like what?
Mickey – When you go to your YouTube page, you see video recommendations based on the videos you watched previously. When you go to a shopping site like Amazon, you see products based on your interests.
Minnie – Yes, I see
Mickey – Also, when you search for a holiday package to a specific destination, you keep seeing related ads on other websites you visit, even after closing that website.
Minnie – Yeah, I have seen that
Mickey – So here, the data generated from all these actions is recorded, processed, analysed, and then used as needed.
Minnie – Got it
Mickey – This is one of many examples of using Big Data. As you can see, data can be in the form of images, videos, text, audio, etc.
Minnie – So do we have any categorisation for Big Data?
Mickey – Very good question, Minnie. Big Data is divided into 3 categories:
- Structured Data
- Semi-Structured Data
- Unstructured Data
Structured Data refers to data that has a proper structure associated with it. For example, the data in databases, CSV files, and Excel spreadsheets can be referred to as Structured Data.
Semi-Structured Data refers to data that has some structure but does not follow a fixed, proper structure throughout. For example, the data in emails, log files, and Word documents can be referred to as Semi-Structured Data.
Unstructured Data refers to data that does not have any structure associated with it at all. For example, image files, audio files, and video files can be referred to as Unstructured Data.
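Mickey – To make the three categories concrete, here is a rough Python sketch that sorts files by their extension. The extension table and the `categorise` helper are my own assumptions, built from the examples I just gave; real data is not always this easy to label.

```python
from pathlib import Path

# Hypothetical mapping from file extension to category, based on the
# examples in the conversation. It is illustrative, not exhaustive.
CATEGORY_BY_EXTENSION = {
    ".db": "structured",
    ".csv": "structured",
    ".xlsx": "structured",
    ".eml": "semi-structured",
    ".log": "semi-structured",
    ".docx": "semi-structured",
    ".jpg": "unstructured",
    ".mp3": "unstructured",
    ".mp4": "unstructured",
}


def categorise(filename: str) -> str:
    """Look up the category for a file name; unknown extensions return 'unknown'."""
    suffix = Path(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")


for name in ["sales.csv", "server.log", "holiday.mp4"]:
    print(name, "->", categorise(name))
```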
Minnie – I am getting it
Mickey – There are 3 important characteristics of Big Data:
- Volume
- Velocity
- Variety
Volume refers to the amount of data that is being generated.
Velocity refers to the speed at which the data is being generated.
Variety refers to the different types of data that are being generated.
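Mickey – Here is a small Python sketch that measures the three characteristics on a tiny, made-up sample of events. The sample data and the `three_vs` helper are assumptions for illustration only: volume is the total number of bytes, velocity is bytes per second over the observed time span, and variety is a count per data type.

```python
from collections import Counter

# Hypothetical sample: (timestamp in seconds, data type, size in bytes).
events = [
    (0, "text", 200),
    (1, "image", 2_000_000),
    (1, "video", 50_000_000),
    (3, "text", 150),
    (4, "audio", 3_000_000),
]


def three_vs(events):
    """Return (volume in bytes, velocity in bytes per second, variety counts)."""
    volume = sum(size for _, _, size in events)
    timestamps = [t for t, _, _ in events]
    duration = max(timestamps) - min(timestamps)
    # Avoid dividing by zero when all events share one timestamp.
    velocity = volume / max(duration, 1)
    variety = Counter(kind for _, kind, _ in events)
    return volume, velocity, variety


volume, velocity, variety = three_vs(events)
print("Volume (bytes):", volume)
print("Velocity (bytes/second):", velocity)
print("Variety (events per type):", dict(variety))
```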
Minnie – Okay Mickey, but then how do we store and process this data?
Mickey – Very good question, Minnie. Storing and processing are the 2 main challenges of Big Data, and they led to the creation of the Hadoop framework.
Minnie – Okay, so Hadoop is a framework to store and process Big Data.
Mickey – Yes, but more on that some other day. I hope Big Data is clear to you now.
Minnie – Yes, Mickey
Mickey – Shall we go back home now?
Minnie – No Mickey, let’s go to the beach and spend a nice evening.
Mickey – Sure, Ma’am
Minnie – You are my best friend
Mickey – I am always there for you 🙂