Niketan Pansare, Scalable and Fast Machine Learning

Slides

In this age of big data, every company is looking for ways to apply machine learning algorithms to its huge datasets. The problem is that this takes a very long time, due both to the volume of data and to the complexity of the algorithms. To address the first problem, most companies use MapReduce, a software framework that parallelizes computations across clusters. To address the second, most companies implement these machine learning algorithms with a simpler, iterative, and more parallelizable algorithm called "gradient descent". However, these techniques are still not fast enough to meet the needs of industry. My work focuses on improving the performance of gradient descent on MapReduce, which by extension improves the performance of the machine learning algorithms built on it.
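
To make the pairing of gradient descent with MapReduce concrete, here is a minimal single-process sketch, not the implementation discussed in the talk. It assumes a one-variable least-squares model and simulates the cluster with plain Python lists; the names `partial_gradient`, `combine`, and `gradient_descent`, the learning rate, and the toy data are all hypothetical choices made for illustration. Each iteration is one map step (per-partition gradient sums) followed by one reduce step (adding those sums).

```python
from functools import reduce


def partial_gradient(partition, w, b):
    """Map step: sum the squared-error gradients over one data partition."""
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(partition)


def combine(left, right):
    """Reduce step: add two partial results element-wise."""
    return left[0] + right[0], left[1] + right[1], left[2] + right[2]


def gradient_descent(partitions, lr=0.05, iterations=2000):
    """Batch gradient descent; every iteration is one map plus one reduce."""
    w, b = 0.0, 0.0
    for _ in range(iterations):
        mapped = [partial_gradient(p, w, b) for p in partitions]  # map
        gw, gb, n = reduce(combine, mapped)                       # reduce
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


# Toy data drawn from y = 2x + 1, split into two "partitions" (assumed).
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
w, b = gradient_descent(partitions)
```

In this sketch every iteration scans all of the data before the parameters are updated, which is why the cost of a single iteration dominates the running time on large datasets.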

For the next problem in this proposal, I will use my previous work on Online Aggregation to implement the gradient descent algorithm, and I will focus on the key question of when to stop a given iteration of gradient descent. The model proposed for this problem improves the performance of gradient descent on MapReduce and hence takes us one step closer to the grand goal of scalable data analytics.