This project is maintained by rahulptel


Algorithms and Optimization for Big Data

School of Engineering and Applied Science, Ahmedabad University


Prof. Ratnik Gandhi | Profile |

Teaching Assistant

Rahul Patel | Profile |

Course Brief

This is a PG course (open as a technical elective to senior UG students). The course will revolve around reading and implementing state-of-the-art literature on streaming algorithms for big data problems. The objective of the course is to expose students to state-of-the-art literature on algorithm design for big data, focusing on streaming algorithms and related optimization problems. Students taking this course will develop the ability to independently take up a problem related to big data, model it, and design a relevant solution.

This will be a laboratory-class based course. Every week we will meet for a 3-hour session. During this session we will discuss one or two ideas from the reference research papers. Further, in the same session, you (students) will implement these ideas in relevant software systems.
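To give a flavour of what "streaming" means in the lab sessions, here is a minimal sketch of one-pass simple linear regression. It is not taken from any of the reference papers or student solutions; the class name `StreamingLinearRegression` and its methods are hypothetical names chosen for illustration. The idea is that each data point is seen once, folded into a few running sums, and then discarded, so memory use stays constant however long the stream runs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StreamingLinearRegression:
    """One-pass least-squares fit of y = slope * x + intercept.

    Hypothetical illustration: only five running sums are stored,
    so memory does not grow with the number of points seen.
    """
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, x: float, y: float) -> None:
        # Fold one new observation into the running sums.
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def coefficients(self) -> Tuple[float, float]:
        # Closed-form least-squares solution from the sufficient statistics.
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two points with distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


if __name__ == "__main__":
    model = StreamingLinearRegression()
    # Simulated stream: points on the line y = 2x + 1 arrive one at a time.
    for x in range(5):
        model.update(float(x), 2.0 * x + 1.0)
    print(model.coefficients())
```

The fit can be queried at any point in the stream by calling `coefficients()`. Plain running sums can lose precision on long streams with large values; the sketch ignores this for brevity.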


Type | Weightage | Description
Midterm Project | 30% | A 3-week individual project: implementation of an existing research paper.
Endterm Project | 30% | A 7-week group project: implementation and extension of an existing research paper.
Endterm Take-home | 40% | A one-week individual assignment: propose solution(s) for the open-ended problem provided.

Endterm exam (take-home)

Consider a scenario in which a company like LinkedIn wants to build a module for suggesting career progression paths to its registered users. When a user logs onto the platform, the platform reads the user's profile and, based on various parameters of this profile, suggests which skills the user should acquire next. Your aim, through this exam, is to design the following two modules:

  1. A module that reads the user's profile and suggests a career path, expressed as a skill set to be acquired. [10]
  2. A module in which the user enters a career goal and, based on this goal and other related information, the platform suggests a career path. [10]

Relevant user profile data is available here. You are supposed to design modules for

Significantly unique solutions will be appreciated. [10]

Submission will be a report (maximum 3 pages for diagrams, algorithms, results and other discussion) and GitHub code. A neat and clean algorithm with relevant proofs of correctness and an analysis of its efficiency (measured as computational complexity) will earn more marks. [10]

Submit your solutions to Rahul on April 28, 2017 between 10am and 11am.


List of assignments with solutions.

Sr. No | Submission | Solution
1 | Online Regression | Parth Satodiya: TensorFlow
2 | Online Singular Value Decomposition | Kishan Raval
3 | Robust Principal Component Analysis (Midterm Project) | Riddhesh Sanghavi
4 | Probabilistic Principal Component Analysis (PPCA) using Expectation Maximization | Parth Satodiya
5 | Incremental Principal Component Analysis | Maunil Vyas Sol.
6 | Generative Adversarial Network using PPCA (Endterm Project) | Maunil Vyas, Deep Patel and Shreyas Patel Sol.
7 | Online K-medians Clustering | Shreyas Patel Sol.
8 | Incremental Linear Discriminant Analysis | Shreyas Patel Sol.
9 | Endterm Take-home | Maunil Vyas and Deep Patel Sol.; Ashutosh Kakadiya Sol.

List of repositories

Link to the Excel file containing the list of student repositories.

Useful Resources

Blogs & Tutorials

Software & Packages


  1. Multi-Dimensional Regression Analysis of Time-Series Data Streams, Chen et al., Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.
  2. Linear Programming in the Semi-Streaming Model with Applications to the Maximum Matching Problem, Ahn and Guha, Arxiv 2011.
  3. Fast Low-Rank Modifications of the Thin Singular Value Decomposition, M. Brand, Elsevier 2006.
  4. Parallel Collaborative Filtering for Streaming Data, Ali, Johnson, Tang, 2011.
  5. Streaming Algorithm for the SVD, Strumpen, Hoffmann, Agarwal, MIT LCS Technical Memo 2003.
  6. Matrix Factorization for Collaborative Prediction, Kleeman, Hendersen, Denuit.
  7. Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis, Gorrell and Webb.
  8. Analytic challenges in Social Sensing, Abdelzaher and Wang.
  9. Detecting anomaly in data streams by fractal model, Zhang et al., WWW 2014.
  10. Eigenspace Method for Spatiotemporal Hotspot Detection, Fanaee-T and Gama, Arxiv 2014.
  11. Chandima Hewa Nadungodage, Yuni Xia, Fang Li, Jaehwan John Lee, and Jiaqi Ge. Streamfitter: a real time linear regression analysis system for continuous data streams. In Database Systems for Advanced Applications, pages 458-461. Springer, 2011.
  12. Haitao Zhao, Pong Chi Yuen, and James T Kwok. A novel incremental principal component analysis and its application for face recognition. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 36(4):873-886, 2006.
  13. Pang, Shaoning, Seiichi Ozawa, and Nikola Kasabov. “Incremental linear discriminant analysis for classification of data streams.” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 35, no. 5 (2005): 905-914.
  14. Aggarwal, Charu C. “Outlier analysis.” In Data Mining, pp. 237-263. Springer International Publishing, 2015.
  15. Aggarwal, Charu C. “A Survey of Stream Clustering Algorithms.” (2013): 231-258.
  16. Candès, Emmanuel J., et al. “Robust principal component analysis?.” Journal of the ACM (JACM) 58.3 (2011): 11.
  17. Tipping, Michael E., and Christopher M. Bishop. “Probabilistic principal component analysis.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61.3 (1999): 611-622.