Senior developer/tech lead @ Zone Digital
Topic: Building decent NER for the Ukrainian language
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction. It seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, etc.
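To make "locate and classify" concrete: NER is often framed as per-token tagging, where a model labels each token and the labels are then grouped into entity spans. The sketch below assumes the common BIO scheme (B- begins an entity, I- continues it, O is outside any entity) and the tag names PER, ORG and LOC. Neither the scheme nor these names come from the corpus described here; they are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes the BIO tagging scheme; an I- tag whose type differs from the
    current entity is treated as the start of a new entity.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity begins: flush the previous one, if any
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the current entity
            current_tokens.append(token)
        else:
            # "O" tag: close any open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Flush an entity that runs to the end of the sentence
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Тарас", "Шевченко", "народився", "в", "Моринцях", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# prints [('Тарас Шевченко', 'PER'), ('Моринцях', 'LOC')]
```

A tagger therefore only has to predict one label per token. Turning those labels into named entities is a simple deterministic pass like this one.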
While state-of-the-art NER systems for English achieve near-human performance, for languages like Ukrainian there was still no way to do this automatically, so we decided to fix that.
We’ve manually annotated a corpus with named entities and made it publicly available. We also trained two models on this corpus: MITIE and a deep neural network. In my talk, I’ll describe the architecture of the latter and show how it is now possible to build a high-performance neural network for NLP using only word embeddings as features.
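As a rough illustration of "word embeddings as the only feature", one common approach represents each token by concatenating the embeddings of the token and its neighbours in a fixed window. That vector can then be fed to any per-token classifier. This is a generic sketch and not the architecture presented in the talk. The function name `window_features`, the window size, lowercasing, and the use of zero vectors for padding and out-of-vocabulary words are all assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]


def window_features(
    tokens: List[str],
    embeddings: Dict[str, Vector],
    dim: int,
    window: int = 2,
) -> List[Vector]:
    """Build one feature vector per token from word embeddings alone.

    Each vector concatenates the embeddings of the `window` tokens to the
    left, the token itself, and the `window` tokens to the right, giving
    (2 * window + 1) * dim values. Positions outside the sentence and
    out-of-vocabulary words map to a zero vector (an illustrative choice).
    """
    zero = [0.0] * dim
    features: List[Vector] = []
    for i in range(len(tokens)):
        vec: Vector = []
        for j in range(i - window, i + window + 1):
            if 0 <= j < len(tokens):
                # Lowercase lookup is an assumption about how the vocabulary is stored
                vec.extend(embeddings.get(tokens[j].lower(), zero))
            else:
                vec.extend(zero)  # padding beyond sentence boundaries
        features.append(vec)
    return features


# Toy 2-dimensional embeddings, purely for demonstration
toy = {"київ": [1.0, 0.0], "місто": [0.0, 1.0]}
print(window_features(["Київ", "місто"], toy, dim=2, window=1))
# prints [[0.0, 0.0, 1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
```

Because the feature vector depends only on pretrained embeddings, no hand-crafted features such as capitalization flags or gazetteers are required. The surrounding context is still visible to the classifier through the window.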