The journey from raw text to a good dataset
In this blog post I want to describe the journey that comes before all the “AI”s, “ML”s, and other buzzwords: the tedious work of collecting, analyzing, and processing data. I’ll focus on textual data and on how to build a good dataset for your NLP problems.