It is an often-repeated claim that we have generated more data in the past two years than in the entire prior history of the human race! Sounds fascinating? Well, this is just the tip of the iceberg. Data is growing so fast that, by one widely cited estimate, about 1.7 megabytes of new information is created every second for every human being on this planet. This humongous amount of data has left companies all around the world scrambling to make sense of it and apply it to their business.
Why the boom all of a sudden?
Data Science, and Machine Learning in particular, has been around for several decades. The idea that a program can deliver accurate results even when it is only lightly bound by explicit rules has intrigued programmers for years. Cheap hardware and breakthroughs in the semiconductor industry then provided the impetus for the rapid growth of data science.
Whenever a revolution has taken place in the IT sector, IBM has never been far behind. IBM's new service called Watson, a next-generation supercomputer powered by 90 servers daisy-chained with each other, has helped IBM achieve processing rates of over 80 teraflops!
IBM Watson under the hood:
Computation speed: 80 teraflops
Concurrent servers: 90
Pages of information: 200 million
Number of logic rules: 6 million
Number of processor cores: 2,880
Data Scientists and Machine Learning Practitioners have been working on complex algorithms to solve a range of real-life problems. Even though there has been a boom in the data generated, hardware available to ordinary users has yet to see a comparable leap. So although we can solve these problems with Machine Learning, it takes a significant amount of time to get an accurate solution. Here comes IBM with their solution.
IBM Bluemix, already used for IoT and various other technologies, has finally released a portal for Data Scientists to offload their computational tasks to the trusted and reliable Bluemix servers. Even the toughest and meanest problems can be solved far faster than on the traditional laptop or desktop route. IBM has also made it possible to connect all the services within Bluemix with one another. This allows us to connect our IoT devices via IBM Watson, feed the data to our models on IBM Data Science Experience (IBM DSX), and then run predictive analysis on it. This flexibility helps developers get things done faster and more securely, with low turnaround time.
IBM Data Science Experience is a one-stop solution for Data Scientists, Machine Learning Practitioners, Data Engineers, and others, with the latest cutting-edge tools in the market.
DSX currently includes the following tools and technologies:
Jupyter Notebook with Python, R, and Scala
Why IBM DSX?
You can use either your own data or the datasets available from the community to create, train and deploy self-learning models. Leveraging an automated, collaborative workflow helps you drive intelligence into day-to-day business applications easily.
Simple: DSX learning guides and community tutorials walk you through the steps to create, train, evaluate and deploy models based on the desired outcome.
Powerful: Harnessing the power of DSX, you can run multiple algorithms to find the one that works best for your data and your needs.
Flexible: Testing and deploying your models as APIs for development, testing or pushing to production has never been easier.
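To make the "run multiple algorithms and keep the best one" idea concrete, here is a minimal sketch in plain Python, the kind of thing you could paste into a Jupyter Notebook. It is not DSX's own API: the function names (`fit_mean`, `fit_linear`, `pick_best`) and the toy data are made up for illustration. Each candidate is trained on the same data and scored on the same held-out points, and the one with the lowest error wins.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_linear(xs, ys):
    """Second candidate: ordinary least squares on a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def pick_best(candidates, train, test):
    """Fit every candidate on train, score it on test, return the winner."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *test)
    best = min(scores, key=scores.get)
    return best, scores


# Toy data, invented for this example: y is roughly 2x + 1.
train = ([1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 9.0, 10.8, 13.1])
test = ([7, 8], [15.0, 17.1])

CANDIDATES = {"mean_baseline": fit_mean, "linear_regression": fit_linear}
best_name, scores = pick_best(CANDIDATES, train, test)
print(best_name, scores)
```

Scoring on held-out data rather than on the training set is the important design choice here: it keeps a model from winning simply because it memorised the data it was trained on.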
Interact with a wide variety of data sources:
DSX can pull data from, or connect directly to, the data sources listed below. Streaming data from Kafka topics is also supported.
- Amazon Redshift
- Apache Hive
- Cloudera Impala
- IBM DB2
- IBM Informix
- IBM Netezza
- IBM dashDB
- IBM Watson Analytics
- Microsoft Azure
- Microsoft SQL Server
- Pivotal Greenplum
- Sybase IQ
A recent test by a GitHub user to study the difference between local development and development on IBM's Bluemix servers gave astonishing results. The test used the IMDB 5000 Movie Dataset hosted on Kaggle to build a scalable movie recommendation engine. A recommendation engine recommends things that users might like based on the things they have liked in the past, just like YouTube's video recommendations or any e-commerce portal's "you might also like to buy this" section. On a dual-core i7 laptop with GPU acceleration, the program took over 38 minutes to produce credible output; the same program on IBM's Data Science system based on Bluemix took just over 30 seconds to complete.
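For readers who have not built one, the idea behind a recommendation engine can be sketched in a few lines of Python. This is not the code from the test described above, which is not shown here. It is a minimal content-based sketch that assumes each movie is described by a set of genres and ranks unseen movies by Jaccard similarity to what the user already liked. The catalogue, the titles and the `recommend` helper are all invented for illustration.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical mini catalogue: title -> set of genres.
MOVIES = {
    "Space Saga": {"sci-fi", "adventure"},
    "Galaxy Trip": {"sci-fi", "adventure", "comedy"},
    "Robot Dawn": {"sci-fi", "thriller"},
    "Love in the Rain": {"romance", "drama"},
    "Laugh Riot": {"comedy"},
}


def recommend(liked, catalog, top_n=2):
    """Suggest up to top_n unseen titles most similar to the liked ones."""
    # The user's taste profile is the union of genres they have liked so far.
    profile = set().union(*(catalog[title] for title in liked))
    scored = [
        (jaccard(profile, genres), title)
        for title, genres in catalog.items()
        if title not in liked
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:top_n] if score > 0]


print(recommend(["Space Saga"], MOVIES))
```

A real engine working on thousands of movies would use richer features (cast, director, keywords, ratings) and a much heavier similarity computation, which is exactly the kind of workload where offloading to a bigger machine starts to pay off.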
Harishkandan, a budding data scientist based out of Mumbai, made the switch to IBM's Data Science platform from AWS-based systems. A rule of thumb in the world of IT is that developers should spend most of their time on development instead of setting up systems every now and then. To work on Machine Learning on AWS, one needs to create new virtual systems and then install various software and dependencies every time a new project begins, which leads to a lot of man-hours wasted on grunt work. IBM solves this very problem by grouping everything together under a single umbrella, namely Bluemix, and presents a stacked solution to all our worries. When things run out of the box, the chances of errors caused by environment differences between systems are also eliminated.
Though IBM DSX is still in its beta phase, it is well capable of solving day-to-day problems that would otherwise be limited by hardware constraints. As DSX is cloud based, you no longer need a high-end system, whether for simple machine learning tasks or for complex deep learning projects.
You can sign up for a 30 day free trial of IBM Data Science Experience at http://datascience.ibm.com