[Help Needed] How to apply NLP techniques on real-time data streams

Hello guys,

Hope everyone is having a great time.

In my project (a web application), I have a continuous stream of reviews coming in, and I need to perform sentiment analysis and clustering on that data in real time.

I am familiar with various clustering techniques (LSA, DBSCAN, etc.) and sentiment analysis classifiers. (I would love to hear about more algorithms from you.)

From my experience participating in hackathons, I can handle this on a fixed dataset.

But with a continuous stream of data, I am facing a few questions:

i. New data might contain words that were not seen before. How do I incorporate them into the training process? Do I need to retrain the classifier on all the reviews? (I would like to know how this is tackled in industry.)
ii. Similarly, for clustering: should I rerun the clustering over all the reviews whenever I receive a new review? And how can I do this efficiently as my dataset grows larger and larger? (Again, I would like to know how this is tackled in industry.)
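
To make the two questions concrete, here is a rough sketch of the kind of incremental setup I have in mind. It is not a known industry recipe, just one possible approach under a few assumptions: features come from the hashing trick (so there is no vocabulary to rebuild when new words appear), the classifier is a logistic regression updated one review at a time with SGD, and the clustering is sequential (online) k-means, where each new review only moves its nearest centroid. All names here (`hash_features`, `OnlineSentimentClassifier`, `OnlineKMeans`, `N_FEATURES`) are hypothetical and written with the standard library only. I believe scikit-learn offers a similar pattern through `HashingVectorizer` and the `partial_fit` methods of `SGDClassifier` and `MiniBatchKMeans`.

```python
import math
import re
import zlib

N_FEATURES = 2 ** 12  # hypothetical size of the hashed feature space


def hash_features(text, n_features=N_FEATURES):
    """Map a review to a sparse, L2-normalized {index: value} vector.

    Uses the hashing trick: no vocabulary is stored, so a previously
    unseen word simply lands in some bucket.
    """
    vec = {}
    for token in re.findall(r"[a-z']+", text.lower()):
        idx = zlib.crc32(token.encode("utf-8")) % n_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {i: v / norm for i, v in vec.items()} if norm else vec


class OnlineSentimentClassifier:
    """Logistic regression trained one labeled review at a time with SGD."""

    def __init__(self, n_features=N_FEATURES, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, vec):
        """Return the estimated probability that the review is positive."""
        z = self.b + sum(self.w[i] * v for i, v in vec.items())
        z = max(-35.0, min(35.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, vec, label):
        """Update on a single review (label: 1 = positive, 0 = negative)."""
        error = self.predict_proba(vec) - label
        for i, v in vec.items():
            self.w[i] -= self.lr * error * v
        self.b -= self.lr * error


class OnlineKMeans:
    """Sequential k-means: each new point moves only its nearest centroid."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # sparse dict vectors
        self.counts = []     # number of points assigned to each centroid

    @staticmethod
    def _sq_dist(a, b):
        return sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in set(a) | set(b))

    def partial_fit(self, vec):
        """Assign vec to a cluster, update that centroid, return its id."""
        if len(self.centroids) < self.k:
            # The first k reviews seed the centroids.
            self.centroids.append(dict(vec))
            self.counts.append(1)
            return len(self.centroids) - 1
        j = min(range(self.k), key=lambda c: self._sq_dist(vec, self.centroids[c]))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # running-mean step size
        centroid = self.centroids[j]
        for i in set(centroid) | set(vec):
            old = centroid.get(i, 0.0)
            centroid[i] = old + eta * (vec.get(i, 0.0) - old)
        return j
```

With this shape, each incoming review costs one `hash_features` call plus one `partial_fit` per model, independent of how many reviews came before. The trade-offs I can see: hash collisions blur some words together, labeled reviews are still needed to keep updating the classifier, and sequential k-means fixes `k` up front and depends on the order of arrival, so a periodic full re-fit in the background might still be needed.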

Please point me to any resources that can help.
Please comment below if you need any more details.
Thanks in advance,