Tag Archives: statistical analysis

Picking a Baby Name That’s Uniquely Popular

April 28, 2014 | Blog | baby name, baby names, statistical analysis, zipf | Jason Morrison

We’re letting the whole Internet vote on the name for our baby, so it might seem a little strange that I’m posting about picking a name that’s unique. Why bother with a poll if you don’t want a popular name? What does uniquely popular mean, anyway?

Looking at the data right now, Alexander is comfortably in the lead. But Alexander is also a very popular name in general right now – it was the 9th most popular name for boys in 2012 according to the U.S. SSA. So, I’d like to see if some names are getting more votes in our poll than you would expect just by general popularity.

I did a similar analysis when we did this for my daughter five years ago. Back then I used SPSS and a little more statistical rigor, but I haven’t had a chance to play around with R to do something similar yet. For now I’ll stick to what I can do in Google Spreadsheets.

Here’s a plot of each name, showing the number of votes in our poll vs the number of babies with that name in 2012:

So what does this tell us? First off, the names fall more or less along a line going up and to the right. This means there’s probably a correlation between our poll and popularity in the U.S. Google Spreadsheets has a function called CORREL() that gives you the correlation coefficient. Right now it’s 0.69, which is a fairly strong positive correlation.
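If you’d rather check the number outside a spreadsheet, here’s a minimal Python sketch of the Pearson correlation coefficient, which is what CORREL() computes. The vote and baby counts below are made-up placeholders, not the real poll or SSA numbers, and the function name `pearson` is just something I picked for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


# Placeholder data (NOT the real poll or SSA counts)
poll_votes = [120, 95, 60, 40, 30]
us_babies = [15000, 9000, 7000, 1200, 800]

r = pearson(poll_votes, us_babies)
print(f"correlation: {r:.2f}")
```

A value near 1 means the poll mostly mirrors general popularity, and a value near 0 means the two have little linear relationship.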

Second, if we guesstimate where we would have to put a straight line to best fit these points, we can see which names are above the line – right now, it’s Nikola, Luka, and Finn, with Soren maybe just peeking over the top. If we want to pick names that are uniquely popular in our poll, those are good choices.
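Instead of eyeballing the line, you could fit it with ordinary least squares and list the names that land above it. This is a rough sketch, not the analysis behind the chart: it assumes votes are regressed on log10 of the baby count (to match the log-scaled axis), and the names and numbers are placeholders.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def names_above_line(data):
    """data maps name -> (poll_votes, us_babies).

    Returns the names with more votes than the fitted line predicts,
    ordered from the largest surplus to the smallest.
    """
    names = list(data)
    xs = [math.log10(data[name][1]) for name in names]  # assumption: log10 of baby count
    ys = [data[name][0] for name in names]
    slope, intercept = fit_line(xs, ys)
    residuals = {
        name: y - (slope * x + intercept) for name, x, y in zip(names, xs, ys)
    }
    above = [name for name in names if residuals[name] > 0]
    return sorted(above, key=lambda name: residuals[name], reverse=True)


# Placeholder data (NOT the real poll or SSA counts)
example = {
    "Name A": (120, 15000),
    "Name B": (95, 900),
    "Name C": (60, 7000),
    "Name D": (40, 1200),
}
print(names_above_line(example))
```

The residual (actual votes minus predicted votes) is the “uniquely popular” score here: the bigger it is, the more the poll likes a name beyond what general popularity would explain.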

I’ve plotted the U.S. Babies in 2012 totals on a log scale for two reasons – first, it’s much easier to read this way, and second, it doesn’t look like baby names are distributed very evenly:

This is from U.S. SSA data again. In this chart you can see that a very small number of the most popular names (at the left side of the graph) are given to a very large number of babies. Looking toward the right of the graph, there’s a very long tail of many names given to much smaller numbers of babies.

This looks like it might be a Zipf distribution, which is a pretty common distribution for data like word counts and website popularity. If we shift that graph to a log scale, it starts to look more like a straight line.
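One quick way to test the Zipf hunch: rank the names by count, take the log of both rank and count, and fit a straight line. For Zipf-like data the points fall close to a line with a negative slope (around -1 in the classic case). Here’s a small sketch that uses synthetic counts rather than the SSA file, so the numbers are purely illustrative, and `loglog_slope` is a name I made up.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log10(count) against log10(rank).

    Counts are ranked from largest to smallest; all must be positive.
    """
    ranked = sorted(counts, reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(count) for count in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, perfectly Zipf-shaped counts: count = 20000 / rank
ideal_counts = [20000 / rank for rank in range(1, 101)]
print(f"slope: {loglog_slope(ideal_counts):.2f}")
```

Real name data won’t be this tidy, so it’s worth looking at how far the points stray from the fitted line and not just at the slope.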

By the way, if you haven’t voted on our baby name poll yet, go ahead and vote now – this baby is coming soon!

Internet, I’d Like To Introduce You to Athena Marie Morrison

November 23, 2008 | Blog | baby names, Flickr, hello world, internet, statistical analysis, Twitter | Jason Morrison

Hello World!

Thanks to the well over 10,000 people who voted in our baby name poll, we’ve chosen the perfect name for our new baby – Athena Marie Morrison.

Those of you who have been following this story might be a bit surprised at the name choice, since Olivia was leading the poll for girls’ names. But the very, very few readers who managed to make their way through my boring (but educational) statistics posts will remember that Ann and I controlled for popularity, hoping to pick a name that was loved by our voters but still reasonably unique and interesting.

We took three names that our voters liked better than could be explained by general popularity in 2007 – Cassia, Ada, and Athena – and waited to see which name would fit her best.

Since our baby was born with her eyes open, perceptive and looking very thoughtful, we thought it was appropriate to name her for the goddess of wisdom.

Thanks again to all the family, friends, Googlers, and random internet strangers who voted.

If you’d like to keep following Athena’s first days on the planet, you can follow me on Twitter.

For photos (a few now, and hundreds more as soon as we get home), feel free to friend me on Flickr.

If you’d like to read more about web design, usability, and doing crazy social experiments with the internet, please subscribe to my blog feed or subscribe via email.

Choosing a Unique Baby Name with Statistics

November 9, 2008 | Blog | baby names, chart, confidence interval, correlation, linear regression, popularity, R-Square, scatterplot, SPSS, statistical analysis | Jason Morrison

We’ve got well over 10,000 votes, and we know that the vote totals are significantly different from random, so do we have enough information to pick a name yet?

There’s another stats exercise I want to go through before we narrow down the list. We want to pick a name that people have voted for, but we’d also like to choose a name that’s not too popular. This is just a personal preference that Ann and I have; we think it’s a little more fun to have a more unique name.

Also, it would be pretty boring if the vote gave us the exact same information as a list of the most popular baby names. So, how do we choose a name that’s popular with friends and family (and in our case, random internet strangers) but still reasonably unique?

Based on the chart below, names that fit our criteria include Ada, Cassia, Athena, Erin, or Olivia for a girl and Nikolas, Levi, Isaac, Dylan or Alexander for a boy. Follow along and I’ll explain where I got the data and how it helps me pick names.

Link to the full-sized graph at Flickr.

The graph you see above is a scatterplot of the names, showing the vote total versus the number of babies given that name in the U.S. in 2007. For example, Isaac has 1,220 votes as of this writing and 10,066 babies were named Isaac in 2007.

Continue reading →

JasonMorrison.net

Usability, web development, and design
