Today at work, I had to read some data about people from a CSV file where the names were given in a single “Name” column, and use that data to get each person’s first name, last name, title, and so on.
So I looked for a name-parsing library, and the first one I found was probablepeople, which uses “advanced NLP methods” and a “probabilistic model” to parse names into components. I put in a name that was just slightly more complex than usual, and…
…it decided that “Jeremy Corbyn MP” was the name of a corporation.
The next one I tried was nameparser, which doesn’t use any fancy NLP or probabilistic methods—it humbly
“attempts the best guess that can be made with a simple, rule-based
That one turned out to do just the right thing, even with one of the most complex names I could find in my dataset:
So that’s something mildly amusing from my work today.