from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

Using the fit method, our CountVectorizer will learn which tokens are being used in our messages. To speed up training it is recommended to ... You need a newer scikit-learn version.
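As a minimal sketch of what the fit step does, the snippet below fits a CountVectorizer with default settings on a few made-up messages and inspects the learned tokens. The `messages` list and the variable names `vocabulary` and `dtm` are assumptions for illustration only; they are not part of the original text.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical example messages (assumed for illustration).
messages = [
    "call me tonight",
    "free prize call now",
    "see you tonight",
]

vect = CountVectorizer()

# fit() scans the messages and learns the vocabulary:
# a mapping from each distinct token to a column index.
vect.fit(messages)

# vocabulary_ is a dict {token: column index}; sorting its keys
# lists the learned tokens in alphabetical order.
vocabulary = sorted(vect.vocabulary_)
print(vocabulary)

# transform() turns the messages into a sparse document-term matrix
# with one row per message and one column per learned token.
dtm = vect.transform(messages)
print(dtm.shape)
```

With the default settings, tokens are lowercased and only tokens of two or more characters are kept, so the vocabulary here holds eight tokens and the matrix has three rows. The sketch reads `vocabulary_` rather than a feature-name helper because that attribute is available across scikit-learn versions.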