house-carpenter:

Today at work, I had to read some data about people from a CSV file where the names were given in a single “Name” column, and use that data to get each person’s first name, last name, title, and so on.

So I looked for a name-parsing library, and the first one I found was probablepeople, which uses “advanced NLP methods” and a “probabilistic model” to parse names into components. I put in a name that was just slightly more complex than usual, and…

…it decided that “Jeremy Corbyn MP” was the name of a corporation.

The next one I tried was nameparser, which doesn’t use any fancy NLP or probabilistic methods—it humbly
“attempts the best guess that can be made with a simple, rule-based
approach”.

That one turned out to do just the right thing, even with one of the most complex names I could find in my dataset:

So that’s something mildly amusing from my work today.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s