I am learning Machine Learning (Linear Regression) from Prof. Andrew Ng's lectures. When discussing whether to use the normal equation or gradient descent, he says that when the number of features is very high (around 10^6), we should use gradient descent. Everything is clear to me, but I wonder: can someone give me real-life examples where we use such a huge number of features?

## 2 Answers

For example, in text classification (e.g., email spam filtering), we can use unigrams (bag of words), bigrams, and trigrams as features. Each distinct n-gram in the corpus becomes its own feature dimension, so depending on the size of the dataset and vocabulary, the number of features can easily reach hundreds of thousands or millions.
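A minimal sketch of why this blows up (toy corpus and token counts of my own choosing; a real spam corpus would have millions of documents and a far larger vocabulary):

```python
from itertools import chain

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus: three short "emails".
corpus = [
    "win a free prize now",
    "meeting rescheduled to monday",
    "free prize claim your prize now",
]

docs = [text.split() for text in corpus]

# Each distinct unigram, bigram, and trigram becomes one feature dimension.
vocab = set(chain.from_iterable(
    ngrams(doc, n) for doc in docs for n in (1, 2, 3)
))

print(len(vocab))  # already 30 features from just 15 words of text
```

The feature count grows roughly linearly with vocabulary for unigrams, but bigrams and trigrams multiply combinations, which is how real corpora reach the 10^6-feature regime.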

Some real-life datasets with a large number of attributes:

1. Daily and Sports Activities Data Set

2. Farm Ads Data Set

3. Arcene Data Set

4. Bag of Words Data Set

The above are real-life examples of datasets with a large number of attributes.
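To connect this back to the lecture's trade-off: a minimal NumPy sketch (toy sizes, and learning rate and iteration count of my own choosing) comparing the two solvers on the same least-squares problem. With only 5 features the normal equation is trivial, but its cost grows as O(n^3) in the number of features, while each gradient descent step costs only O(m·n):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny illustrative problem; in practice n (features) could be ~10^6,
# making the O(n^3) normal-equation solve below infeasible.
m, n = 100, 5
X = rng.normal(size=(m, n))
true_theta = rng.normal(size=n)
y = X @ true_theta + 0.01 * rng.normal(size=m)

# Normal equation: theta = (X^T X)^{-1} X^T y.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the same squared-error objective.
theta_gd = np.zeros(n)
lr = 0.01
for _ in range(5000):
    grad = (X.T @ (X @ theta_gd - y)) / m
    theta_gd -= lr * grad

print(np.allclose(theta_ne, theta_gd, atol=1e-3))
```

Both methods minimize the same convex objective, so gradient descent converges to the same solution the normal equation computes directly; the choice between them is purely about computational cost at scale.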