Hadoop – 22 Analytics tools for big data

Traditional Approach Before Hadoop

In this approach, a company stores all of its data on its own computers and then processes that data with its analytics tools.

Before Hadoop, large data sets were stored in relational database management systems (RDBMS) such as Oracle Database, Microsoft SQL Server or IBM DB2. Later, Google developed an algorithm that laid the groundwork for Hadoop.

What were the limitations of this traditional approach?

  1. The traditional approach worked well when the volume of data to be processed was small.
  2. With huge amounts of data, however, processing it through a traditional database server became a tedious task.

Google's Solution to the Limitations of the Traditional Approach

  1. To solve this problem, Google developed an algorithm called MapReduce.
  2. The MapReduce algorithm divides a task into small parts, assigns those parts to many computers connected over a network, and then collects their results to form the final result dataset. This made working with big data far more manageable.
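
The divide, distribute and collect steps above can be illustrated with the classic word-count example. This is a minimal single-machine sketch, not Google's or Hadoop's implementation: a thread pool stands in for the networked computers, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are illustrative choices.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(split):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    # Shuffle: group every emitted value under its key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(splits, workers=4):
    # Worker threads stand in for the machines that each receive one part.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(chain.from_iterable(mapped))
    # Collect the per-key results into the final result dataset.
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For example, `map_reduce(["big data is big", "data needs MapReduce"])` returns `{'big': 2, 'data': 2, 'is': 1, 'needs': 1, 'mapreduce': 1}`. On a real cluster the map and reduce steps run on separate machines and the shuffle moves data over the network.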

Hadoop

Building on Google's MapReduce, Doug Cutting, Mike Cafarella and their team took the solution described by Google and started an open source project called Hadoop in 2005. Doug named it after his son's toy elephant. Apache Hadoop is now a registered trademark of the Apache Software Foundation.

  • Hadoop runs applications using the MapReduce algorithm, in which data is processed in parallel on different nodes of a cluster.
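
One way to run MapReduce jobs on Hadoop without writing Java is Hadoop Streaming, which lets the mapper and reducer be any programs that read lines from standard input and write tab-separated key/value lines to standard output; Hadoop sorts the mapper output by key before it reaches the reducer. The sketch below assumes that convention. The `simulate_locally` helper is hypothetical and only imitates the cluster's sort step on one machine.

```python
from itertools import groupby


def mapper(lines):
    # Streaming mapper: read raw text lines, emit "key<TAB>value" lines.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    # Streaming reducer: input arrives sorted by key, so equal keys are adjacent.
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate_locally(lines):
    # Hypothetical stand-in for the cluster: map, sort (the shuffle), reduce.
    return list(reducer(sorted(mapper(lines))))
```

In an actual streaming job the mapper and reducer would usually be two separate scripts, each looping over `sys.stdin` and printing its output lines, while Hadoop distributes the input splits across nodes.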

Analytics tools for big data

  • OpenRefine

OpenRefine (formerly Google Refine) is an open source tool dedicated to cleaning messy data.
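
A common cleaning task in OpenRefine is clustering near-duplicate spellings of the same value. The sketch below is a simplified, hypothetical reimplementation of the "fingerprint" key-collision idea in plain Python, not OpenRefine's own code: values that reduce to the same normalized key are grouped as merge candidates. The names `fingerprint` and `cluster` are illustrative.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    # Normalize case and surrounding whitespace, strip accents to plain ASCII.
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = value.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort unique tokens so word order no longer matters.
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def cluster(values):
    # Group the original values by their fingerprint key.
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    # Only keys shared by more than one spelling are worth reviewing.
    return [g for g in groups.values() if len(g) > 1]
```

For example, `cluster(["New York", "new york.", "York, New", "Boston"])` groups the three New York spellings together and leaves "Boston" out.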

  • DataCleaner

DataCleaner recognises that data manipulation is a long and drawn-out task, and provides tools for profiling and cleaning data.

  • RapidMiner

RapidMiner is a fantastic tool for predictive analysis. It is used by big brands such as PayPal, Deloitte, eBay and Cisco.

  • IBM SPSS Modeler

IBM SPSS Modeler helps with text analysis, entity analytics, decision management and optimization.

  • Oracle Data Mining

Oracle Data Mining allows its users to discover insights, make predictions and leverage their Oracle data.

  • Teradata

Teradata provides end-to-end solutions and services in data warehousing, big data and analytics, and marketing applications.

  • FramedData

FramedData is a startup that analyzes your analytics data and tells you which customers are about to abandon your product.

  • Kaggle

Kaggle is the world’s largest data science community, where companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best models.

  • Qubole

Qubole simplifies, speeds up and scales big data analytics workloads against data stored on AWS and Google Cloud.

  • Tableau

Tableau is a data visualization tool with a primary focus on business intelligence.

  • Silk

Silk is a much simpler data visualization and analytics tool. It lets you bring your data to life by building interactive maps and charts.

  • CartoDB

CartoDB can manage a wide variety of data files and types. If you have location data, you should definitely consider CartoDB.

  • Chartio

Chartio allows you to combine data sources and execute queries in-browser.

  • allows you to create stunning 2d and 3d charts.

  • Datawrapper

Datawrapper is an open source tool for creating embeddable charts.

  • Blockspring

Blockspring is unique in that it brings the power of services such as IFTTT and Zapier into familiar platforms such as Excel and Google Sheets. By writing a simple spreadsheet formula, you can connect to a whole host of third-party programs.

  • Pentaho

Pentaho offers big data integration with zero coding required.

Data Languages for Big Data Analytics

  • R

R is a language for statistical computing and graphics. If the data mining and statistical software listed above doesn’t quite meet your needs, you can use R.

  • Python

Python can be used to write custom scrapers when data collection tools fail to get the data you need.
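
As a minimal sketch of such a scraper, the code below uses only the standard library: `html.parser` to collect every link on a page and `urllib.request` to download it. The function names are illustrative, and any URL you pass to `scrape` is your own; real sites may also require respecting robots.txt, rate limits and their terms of use.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def scrape(url):
    # Download the page, decode it using the server's charset, then parse it.
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_links(response.read().decode(charset))
```

Separating `extract_links` from `scrape` keeps the parsing logic testable on saved HTML without any network access.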

  • RegEx

A regular expression (RegEx) is a sequence of characters that defines a search pattern, used to find, extract and replace matching text within data.
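
To make this concrete, here is a small sketch using Python's built-in `re` module: one pattern with named groups both extracts ISO-style dates (YYYY-MM-DD) from free text and rewrites them into a day-first format. The pattern and function names are illustrative assumptions, not a standard.

```python
import re

# Matches dates such as 2015-03-07; named groups capture each part.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def find_dates(text):
    # With groups in the pattern, findall returns (year, month, day) tuples.
    return DATE.findall(text)


def to_day_first(text):
    # Rewrite every match in place using the named groups.
    return DATE.sub(r"\g<day>/\g<month>/\g<year>", text)
```

For example, `to_day_first("Due 2015-03-07.")` returns `"Due 07/03/2015."`. Note that the pattern only checks the shape of a date, not whether the month or day is valid.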

  • XPath

XPath is a query language used for selecting certain nodes from an XML document.
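
As a brief illustration, Python's standard `xml.etree.ElementTree` module supports a limited subset of XPath through `findall`. The sketch below selects the titles of books whose `lang` attribute is `en`; the sample catalog is invented for the example. Full XPath 1.0 (functions, numeric comparisons) needs a dedicated library such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this example.
CATALOG = """<catalog>
  <book lang="en"><title>Hadoop Basics</title><price>30</price></book>
  <book lang="de"><title>Big Data</title><price>25</price></book>
  <book lang="en"><title>MapReduce</title><price>40</price></book>
</catalog>"""


def english_titles(xml_text):
    root = ET.fromstring(xml_text)
    # .//book finds <book> at any depth; [@lang='en'] filters on an attribute.
    return [t.text for t in root.findall(".//book[@lang='en']/title")]
```

Calling `english_titles(CATALOG)` returns `['Hadoop Basics', 'MapReduce']`.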

Data Collection Tools

  • is a popular tool for data extraction. With just a few clicks, you can take a webpage and transform it into an easy-to-use spreadsheet that you can then analyze, visualize and use to make data-driven decisions.
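
The core idea behind turning a webpage into a spreadsheet can be sketched with the standard library alone. This is not how any particular extraction product works; it is a minimal, hypothetical example that reads the rows of an HTML `<table>` and writes them out as CSV, which any spreadsheet program can open. It ignores nested tables and `colspan`/`rowspan`.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    # csv.writer handles quoting of cells that contain commas or quotes.
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```

Using the `csv` module rather than joining cells with commas by hand means values such as "Pentaho, Inc." are quoted correctly.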

Hadoop dossier

  • Dedicated Hadoop practice—part of a focused Cloud Computing CoE.
  • Dedicated Hadoop Sandbox cluster: More than 70-node cluster.
  • Comprehensive expertise: data aggregation, storage, parallel processing, analytics, data visualization, and machine learning.
  • Hadoop-focused QA: comprehensive big data verification, cluster benchmarking, and performance-tuning expertise—methodology, tooling, and practices.
  • Hadoop RIM: trained and certified Hadoop administration staff.
  • Partnership with industry leaders: AWS, Cloudera, and VMware.