A Story of Big Data

One day Mickey was watching TV at home when he heard Minnie calling out to him.

Minnie – Hey Mickey

Mickey – Hi Minnie, how are you?

Minnie – I need to talk to you

Mickey – Okay, give me 2 minutes and we will go for a drive.

Mickey – Yes Minnie, tell me now

Minnie – I keep hearing the term “Big Data” everywhere, and I need to know what exactly it is.

Mickey – Okay, don’t worry, let me explain.

Minnie – Yeah

Mickey – See, we have data everywhere, online and offline, and more of it is generated every second. People post images, videos, and text, and they chat and communicate; all of this generates lots of data.

Minnie – Right

Mickey – Traditionally, we used databases and tables to store the data.

Minnie – Yes

Mickey – Big Data refers to a huge volume of data that cannot be stored and processed within a given time frame using the traditional computing approach alone.

Minnie – Okay, how big is Big Data?

Mickey – There are a lot of misconceptions about what amount of data can be termed Big Data.

Minnie – What misconceptions?

Mickey – Usually, data measured in gigabytes, terabytes, petabytes, exabytes, or anything larger is considered Big Data. This is where the misconception arises.

Even a small amount of data can be referred to as Big Data, depending on the context in which it is used.

Minnie – I did not get that

Mickey – For example, if we try to attach a document that is 100 megabytes in size to an email, we would not be able to do so, because the email system would not support an attachment of this size.

Therefore, with respect to email, this 100-megabyte attachment can be referred to as Big Data.
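
Mickey's point, that "big" depends on the system handling the data, can be sketched in a few lines of Python. This is only an illustration: the function name `is_big_for` and both limits (a 25 MB attachment cap and a 500 GB disk) are assumptions made up for the example, not figures from any particular email provider or machine.

```python
MB = 1024 * 1024  # bytes in one megabyte (binary convention)


def is_big_for(size_bytes: int, limit_bytes: int) -> bool:
    """Return True when the data exceeds what the given system can handle."""
    return size_bytes > limit_bytes


# Hypothetical limits, chosen only for illustration.
EMAIL_ATTACHMENT_LIMIT = 25 * MB
LAPTOP_DISK_LIMIT = 500 * 1024 * MB

document = 100 * MB  # the 100-megabyte document from the example

print(is_big_for(document, EMAIL_ATTACHMENT_LIMIT))  # True
print(is_big_for(document, LAPTOP_DISK_LIMIT))       # False
```

The same 100 megabytes is "big" for the email system and small for the disk, which is exactly the context dependence described above.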

Minnie – Okay, why do we need Big Data? What do we do with it?

Mickey – As I said, any data that is huge and cannot be processed by traditional computing approaches alone in a given time is Big Data. We need data to do calculations, run analyses, make decisions, and so on.

Minnie – Like what?

Mickey – When you go to your YouTube page, you see video recommendations based on the videos you watched previously. When you go to a shopping site like Amazon, you see products based on your interests.

Minnie – Yes, I see

Mickey – Also, when you search for a holiday package to a specific destination, you keep seeing related ads on other websites you visit, even after closing that website.

Minnie – Yeah, I have seen that

Mickey – So here, the data generated from all these actions is recorded, processed, analysed, and then used as needed.

Minnie – Got it

Mickey – This is one of many examples of using Big Data. As you can see, data can be in the form of images, videos, text, audio, and so on.

Minnie – So do we have any categorisation for Big Data?

Mickey – Very good question, Minnie. Big Data is classified into 3 different categories.

  • Structured Data
  • Semi-Structured Data
  • Unstructured Data

Structured Data refers to data that has a proper structure associated with it. For example, the data present in databases, CSV files, and Excel spreadsheets can be referred to as Structured Data.

Semi-Structured Data refers to data that does not have a rigid structure associated with it, although it carries some organising markers. For example, the data present in emails, log files, and Word documents can be referred to as Semi-Structured Data.

Unstructured Data refers to data that does not have any structure associated with it at all. For example, image files, audio files, and video files can be referred to as Unstructured Data.
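
The three categories can be illustrated with a small Python sketch that guesses a file's category from its extension. The mapping simply mirrors the examples above, and the function name `categorise` and the specific extensions are assumptions chosen for the illustration. Real data does not always fit a category this neatly.

```python
from pathlib import Path

# Extension-to-category mapping, based only on the examples in the text.
# The specific extensions are illustrative assumptions.
CATEGORY_BY_EXTENSION = {
    ".csv": "structured",
    ".xlsx": "structured",
    ".db": "structured",
    ".eml": "semi-structured",
    ".log": "semi-structured",
    ".docx": "semi-structured",
    ".jpg": "unstructured",
    ".mp3": "unstructured",
    ".mp4": "unstructured",
}


def categorise(filename: str) -> str:
    """Guess the Big Data category of a file from its extension."""
    extension = Path(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(extension, "unknown")


for name in ["sales.csv", "server.log", "holiday.MP4", "notes.txt"]:
    print(name, "->", categorise(name))
```

Files whose extension is not in the mapping, such as `notes.txt` here, are reported as "unknown" rather than forced into a category.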

Minnie – I am getting it

Mickey – There are 3 important characteristics of Big Data

  • Volume
  • Velocity
  • Variety

Volume refers to the amount of data that is being generated
Velocity refers to the speed at which the data is being generated
Variety refers to the different types of data that are being generated
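
To make the three characteristics concrete, here is a minimal Python sketch that measures them for a tiny batch of events. The event list, its sizes, and the function name `three_vs` are all made up for illustration. Velocity is measured here as events per second, which is one possible reading of "speed", and the events are assumed to be sorted by time.

```python
from collections import Counter

# Hypothetical events: (seconds since start, kind of data, size in bytes).
# Assumed to be sorted by time.
events = [
    (0, "text", 200),
    (1, "image", 2_000_000),
    (1, "text", 150),
    (3, "video", 50_000_000),
    (4, "image", 1_500_000),
]


def three_vs(events):
    """Summarise a batch of events by volume, velocity, and variety."""
    volume = sum(size for _, _, size in events)        # total bytes
    duration = (events[-1][0] - events[0][0]) or 1     # seconds, never zero
    velocity = len(events) / duration                  # events per second
    variety = Counter(kind for _, kind, _ in events)   # count per data type
    return volume, velocity, variety


volume, velocity, variety = three_vs(events)
print(volume)         # 53500350
print(velocity)       # 1.25
print(dict(variety))  # {'text': 2, 'image': 2, 'video': 1}
```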


Minnie – Okay Mickey, but then how do we store and process this data?

Mickey – Very good question, Minnie. These are the 2 main challenges of Big Data, i.e. storing and processing, and they led to the creation of the Hadoop framework.

Minnie – Okay, so Hadoop is a framework to store and process Big Data?

Mickey – Yes, but more on it some other day. I hope Big Data is clear to you now.

Minnie – Yes, Mickey

Mickey – Shall we go back home now?

Minnie – No Mickey, let’s go to the beach and spend a nice evening.

Mickey – Sure, Ma’am

Minnie – You are my best friend

Mickey – I am always there for you 🙂


