Tuesday, April 28, 2015

What is Machine Learning?

Machine learning represents the logical extension of simple data retrieval and storage. It is about developing building blocks that make computers learn and behave more intelligently. Machine learning makes it possible to mine historical data and make predictions about future trends. Search engine results, online recommendations, ad targeting, fraud detection, and spam filtering are all examples of what is possible with machine learning. Machine learning is about making data-driven decisions. While instinct might be important, it is difficult to beat empirical data.

What is the use of Machine Learning?

Machine Learning is found in things we use every day such as Internet search engines, email and online music and book recommendation systems. Credit card companies use machine learning to protect against fraud.
Using adaptive technology, computers recognize patterns and anticipate actions. Machine Learning is also used in more complex applications such as:
  • Self-parking cars
  • Guiding robots
  • Airplane navigation systems (manned and unmanned)
  • Space exploration
  • Medicine

What is Machine Learning best suited for?

Machine Learning is good at replacing labor-intensive decision-making systems that are predicated on hand-coded decision rules or manual analysis. Types of analysis that Machine Learning is well suited for include:
  • classification (predicting the class/group membership of items)
  • regression (predicting real-valued attributes)
  • clustering (finding natural groupings in data)
  • multi-label classification (tagging items with labels)
  • recommendation engines (connecting users to items)
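
As a minimal sketch of the first two analysis types, the following pure-Python example classifies a point by its nearest labeled neighbor and fits a straight line by least squares. The toy data, the function names, and the choice of these two simple algorithms are assumptions made for illustration; real systems would typically use a dedicated library.

```python
import math


def nearest_neighbor_classify(train, point):
    """Classification: return the label of the closest training example.

    `train` is a list of (features, label) pairs (hypothetical toy data).
    """
    closest = min(train, key=lambda item: math.dist(item[0], point))
    return closest[1]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data, invented for illustration only.
emails = [((0.9, 0.8), "spam"), ((0.1, 0.2), "not spam"), ((0.8, 0.9), "spam")]
label = nearest_neighbor_classify(emails, (0.2, 0.1))  # predicts a class

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept  # predicts a real value for x = 5
```

The classifier returns a group label, while the regression returns a number, which is the practical difference between the two kinds of prediction.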

Sunday, April 12, 2015

Key Skills for a Successful Analytics Career

Companies worldwide are dealing with huge volumes of data, and as companies get more adept at data acquisition they are relying increasingly on analytics professionals to help them mine the data for business insights and to drive strategic growth. Qualified analytics professionals are in great demand, and can command high salaries for specialized skills.

There are some fundamental behaviors that are critical to those looking to build a successful analytics career, including:


  • A high sense of intellectual curiosity:
    People who do well in analytics careers typically have a high sense of curiosity and inquisitiveness. They want to know the whys and hows of any situation, and that is very useful in a professional environment dealing with business challenges. There has to be an interest in understanding the business issue and working out the specifics of the solution, and especially the curiosity to challenge any assumptions.
  • Mathematically oriented:
    To do well in analytics, you need to be comfortable with mathematical concepts, and not be afraid to use mathematical tools. This is not the career for you if the word Mathematics strikes fear in your heart!
  • Big picture vision:
    It is important to always remember the larger business issue that is being addressed while working with data and dealing with minutiae.
  • Detail oriented:
    While it is important to remember the big picture, it is critical to pay attention to the details. While working with large volumes of data it is very easy to lose sight of the specifics that add insight and understanding to solving business issues.
  • Ability to differentiate between tools and methods:
    This is a common issue – confusing a tool with a solution. SAS and Excel are “dumb” tools in the sense that the output produced is meaningless unless thought has been applied to the methodology and techniques applied to get at results. Analytics is not SAS; it is using SAS to arrive at results applying analytical thinking and methodology.
  • Interpretation skills:
    Ultimately, every hoop that an analyst jumps through is to enable solving a business problem. Numbers by themselves mean nothing. Experience and domain understanding give one the ability to interpret the results in the business context, assess usefulness of results, and allow the building of strategies based on the outcomes.

Certified Big Data Professional

Big Data success requires professionals who can prove their mastery with the tools and techniques of the Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years.
The Cloudera Certified Professional (CCP) program delivers the most rigorous and recognized Big Data credential.

Cloudera Certified Professional: Data Scientist (CCP:DS)

CCP: Data Scientists have demonstrated the skills of an elite group of specialists working with Big Data. Candidates must prove their abilities under real-world conditions, designing and developing a production-ready data science solution that is peer-evaluated for its accuracy, scalability, and robustness.


Cloudera Certified Developer for Apache Hadoop (CCDH)

Individuals who achieve CCDH have demonstrated their technical knowledge, skill, and ability to write, maintain, and optimize Apache Hadoop development projects.


Cloudera Certified Administrator for Apache Hadoop (CCAH)

Individuals who earn CCAH have demonstrated the core systems administrator skills sought by companies and organizations deploying Apache Hadoop.


Cloudera Certified Specialist in Apache HBase (CCSHB)

Individuals who pass CCSHB have demonstrated a comprehensive knowledge of the technology and skills required by companies using Apache HBase.

For more information, see http://cloudera.com/content/cloudera/en/training/certification.html

What is Big Data?

Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Big data is not merely data; it has become a complete subject that involves various tools, techniques, and frameworks.

Due to the advent of new technologies, devices, and means of communication such as social networking sites, the amount of data produced by mankind is growing rapidly every year. The amount of data produced from the beginning of time until 2003 was 5 billion gigabytes. Piled up in the form of disks, it could fill an entire football field. The same amount was created every two days in 2011, and every ten minutes in 2013. This rate is still growing enormously.
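
Taking the figures quoted above at face value, a short calculation shows how much faster data was being produced in 2013 than in 2011. This is only a back-of-the-envelope sketch; the variable names are illustrative and the inputs are the numbers stated in the text, not independent measurements.

```python
BASELINE_GB = 5e9  # 5 billion gigabytes, the figure quoted in the text

MINUTES_PER_DAY = 24 * 60
minutes_2011 = 2 * MINUTES_PER_DAY  # one baseline amount every two days
minutes_2013 = 10                   # one baseline amount every ten minutes

rate_2011 = BASELINE_GB / minutes_2011  # gigabytes per minute in 2011
rate_2013 = BASELINE_GB / minutes_2013  # gigabytes per minute in 2013

# The same amount of data in less time means a proportionally higher rate.
speedup = minutes_2011 / minutes_2013

print(f"2011: {rate_2011:,.0f} GB/min")
print(f"2013: {rate_2013:,.0f} GB/min")
print(f"2013 rate is {speedup:.0f}x the 2011 rate")
```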

Big Data Technologies


Big data technologies are important in providing more accurate analysis, which may lead to more concrete decision-making resulting in greater operational efficiencies, cost reductions, and reduced risks for the business.

To harness the power of big data, you need an infrastructure that can manage and process huge volumes of structured and unstructured data in real time and can protect data privacy and security.

There are various technologies in the market from different vendors including Amazon, IBM, Microsoft, etc., to handle big data. While looking into the technologies that handle big data, we examine the following two classes of technology:

Operational Big Data

This includes systems like MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored.

NoSQL Big Data systems are designed to take advantage of new cloud computing architectures that have emerged over the past decade to allow massive computations to be run inexpensively and efficiently. This makes operational big data workloads much easier to manage, cheaper, and faster to implement.

Some NoSQL systems can provide insights into patterns and trends based on real-time data with minimal coding and without the need for data scientists and additional infrastructure.

Analytical Big Data

This includes systems like Massively Parallel Processing (MPP) database systems and MapReduce that provide analytical capabilities for retrospective and complex analysis that may touch most or all of the data.

MapReduce provides a new method of analyzing data that is complementary to the capabilities provided by SQL. A system based on MapReduce can be scaled up from single servers to thousands of high-end and low-end machines.
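
To make the MapReduce idea concrete, here is a minimal single-process sketch of the classic word-count job in plain Python. It only imitates the map, shuffle, and reduce steps in memory; the function names and sample documents are assumptions for illustration, and this is not the Hadoop API. In a real cluster the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Sample input, invented for illustration only.
documents = ["big data is big", "data is everywhere"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines, which is what lets this style of processing scale out.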