Audience Research: Understanding Personalities Through Text Analysis


Challenge:

In 2017, our client wanted to understand its customer base in real time. Its market research team asked whether we could use Twitter data to better understand potential customers through Big 5 personality metrics, and to see how those metrics differed across demographics such as age, marital status, location, and others.
Our team conducted a proof of concept to demonstrate the feasibility of this process and to attempt to replicate results previously reported in academic studies.

Approach:

Managed a qualitative researcher and a data scientist who categorized terms based on prior secondary text-analysis research.

Twitter data was collected via Sysomos MAP, then tuned, cleaned, supplemented with age, income, and location data, and finally correlated with personality type based on keyword groupings. We collected 20,000 random Twitter records for each of 30 states, for a total of 600,000 records.

Data Tuning
•Records in the initial pull with no bio text were eliminated.
•Author names were matched against a name database to identify “real names” (individuals rather than entities), both to narrow the content to authentic authors and to enable name/age matching.

Data Supplementation
•Average income data was added to each record, matched at the county level*.
•Age was estimated for each record by matching authors' first names to the US Census Name & Age Birth Record Database, as a way to predict age from first-name naming trends.
•Location data was extracted and matched to confirm city, county, and state designations (records with no identifiable location information were eliminated).

Result:

This experimental method was surprisingly accurate: the data analysis closely matched a qualitative researcher’s tagging of the Big 5 personality types. In the end, however, we could not recommend it for the client’s original purpose. Although it was good at predicting strength of response and preferences for certain things, it was not a reliable way to decide whether a product was a good fit for someone based solely on their Big 5 personality score.
