I’ve been feeling with growing certainty that, barring extreme circumstances and mental or physical illness, whether you feel happy on a daily basis is mostly a function of habits. I’ve noticed, anecdotally, that I tend to be happier on days when I get adequate sunshine, exercise, don’t use my phone too much, actually watch TV as opposed to letting Grey’s Anatomy play while I scroll aimlessly through Twitter, etc. Conversely, I’m more likely to wake up early, go outdoors, be productive, read and engage with other human beings on days I’m feeling happier.
Lately, I’ve become curious about the Quantified Self movement; it advocates the pursuit of self-knowledge through numbers. Thanks to the fact that tech is ubiquitously (and creepily) tracking our every action, there’s plenty of data that’s already passively being collected about each of us. For data about myself, I pulled my daily steps off my iPhone using this app, my Netflix history, financial transactions (to track daily activities - everything costs $ except the lake, parks and libraries) and transcripts of WhatsApp conversations. To account for factors beyond my control, I pulled temperature, location and hours of daylight for each day of observation.
If I want to understand the predictors of my happy days, I need some measure of my daily happiness. Since I haven’t actively been tracking my moods with a mood journal or an app like Happify or Track Your Happiness, I have to wade back into this data and come up with some metric to use as a proxy.
I live and work on a continent that’s very far away from my family and several friends - as a result, some of my closest and most meaningful relationships are conducted online. Since I often talk about my day over text message with people who weren’t part of my day, I wondered if the words I use to talk about my day could be mined to compute a happiness score, for want of a better term. I’m going to walk you through the exploratory analysis I did to figure out whether this made sense.
To do this, I first exported WhatsApp transcripts of conversations with four different friends to understand what this data looked like. All four are friends I have relatively substantive conversations with, so their transcripts are unlikely to contain noise from forwards, tweets or cat pictures. I split up the text messages into sentences and used functions from the sentimentr package to compute polarity scores for each sentence, using Baccianella, Esuli and Sebastiani’s (2010) SentiWord lexicon. With sentimentr, I was able to incorporate valence shifters; these account for words that negate, amplify, de-amplify, or overrule adjacent words with high polarity scores. Scores are grouped by date and sender, and then averaged. For instance, the average sentiment of the text messages I sent my friend Pranathi on Oct 21, 2018 was 0.32.
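To make the scoring step concrete, here is a minimal sketch in plain Python rather than R. It is not the sentimentr implementation: the lexicon values, the shifter word lists, the two-word look-back window and the 1.5 amplifier weight are all illustrative assumptions, and the function names are hypothetical. It only shows the general shape of the idea: score each sentence from a dictionary, let nearby negators and amplifiers shift the score, then average by date and sender.

```python
import re
from collections import defaultdict
from statistics import mean

# Toy lexicon and shifter lists: illustrative assumptions, not SentiWord values.
LEXICON = {"great": 0.8, "happy": 0.7, "good": 0.5, "tired": -0.4, "awful": -0.8}
NEGATORS = {"not", "never", "no"}
AMPLIFIERS = {"very", "really", "so"}


def sentence_polarity(sentence):
    """Sum the shifted word polarities and divide by sqrt(word count)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        score = LEXICON.get(word)
        if score is None:
            continue
        # Look at the two preceding words for valence shifters (assumed window).
        for prev in words[max(0, i - 2):i]:
            if prev in NEGATORS:
                score = -score
            elif prev in AMPLIFIERS:
                score *= 1.5  # assumed amplifier weight
        total += score
    return total / len(words) ** 0.5


def daily_average(messages):
    """messages: iterable of (date, sender, text) -> {(date, sender): mean score}."""
    buckets = defaultdict(list)
    for date, sender, text in messages:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                buckets[(date, sender)].append(sentence_polarity(sentence))
    return {key: mean(scores) for key, scores in buckets.items()}
```

Calling `daily_average` on a list of `(date, sender, text)` tuples gives one number per person per day, which is the unit every plot that follows is built on. A sentence with no lexicon words, such as "How are you?", scores exactly 0.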
Neutral sentences like ‘How are you?’ are coded as 0 with more positive sentences having positive sentiment scores and more negative sentences having negative scores. Here’s a quick look at the overall distribution of sentiment scores. As I’d expected, it looks roughly normal - most sentences are relatively neutral with more extreme sentiment scores becoming increasingly less frequent.
I know that I’m susceptible to the Sunday blues, so I wanted to see if I tend to send more despondent messages on Sundays. Turns out, not true at all! The valence of my text messages is, in fact, most positive on Sundays! Perhaps this is because we’re all animatedly discussing the wild, exciting Saturday nights (lol) of our twenties, but silently sulking through laundry and chores when the full force of Monday-dread hits. Adding to the noise here is time-difference - this data represents at least three different time zones and so, my friends and I are often in different moods and frames of mind when we talk to each other.
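The day-of-week comparison boils down to bucketing daily averages by weekday. A minimal Python sketch, assuming the daily scores are already keyed by calendar date in the sender's own time zone (the function name and input shape are hypothetical):

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_by_weekday(daily_scores):
    """daily_scores: {datetime.date: score} -> {weekday name: mean score}."""
    buckets = defaultdict(list)
    for day, score in daily_scores.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[WEEKDAYS[day.weekday()]].append(score)
    return {name: mean(values) for name, values in buckets.items()}
```

Because the dates here carry no time zone, a message sent late on a Sunday night in one place can land in a friend's Monday, which is part of the noise described above.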
The next thing I was curious to explore was - whose moods are getting reflected in the text? There’s naturally a fair degree of interdependence here - even when I’m having a great day, I’m unlikely to sound buoyant and pumped while talking to a friend who’s feeling miserable, and vice-versa. Conversations (ideally, anyway) involve listening and responding, and we could potentially hypothesize that the person initiating the conversation is likely to set the tone and the other person takes their cue from them. I’ve had the phone I’m using since Nov 2017, so I plotted daily averages over close to two years. I don’t really know what to make of these trends other than that I appear to use more positively scored words than my friends.
To break this down further, I split these trends up by friend and I rather like what this reveals! In the words of the ultimate object of my bicep-envy (Michelle Obama), “When they go low, we go high”. I should also probably find a way to model my penchant for quoting things that are completely out of context but, look! There are several instances where a local minimum in one person’s curve is accompanied by a local maximum in the other person’s curve of somewhat similar magnitude. It appears as if when one of us is having a rough day, the other is being reassuring and responding with positivity and optimism! Before we cue sunshine and rainbows, however, does this also mean that when one person is talking about something very positive, the other is bringing them down? Without accounting for lag, I can’t say for sure, but this also makes me wonder whether highly positive (or highly negative) information is more likely to be shared over a phone conversation.
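One way to account for lag would be to correlate one person's daily score with the other person's score a few days earlier or later. The sketch below is a plain-Python illustration of that idea, not something run on the data in this post; it assumes both series are aligned by day with no gaps and that the lag is smaller than the series length.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(mine, theirs, lag):
    """Correlate my score on day t with the friend's score on day t + lag.

    lag > 0: the friend's score follows mine; lag < 0: mine follows theirs.
    Assumes both lists are aligned by day and abs(lag) < len(mine).
    """
    if lag >= 0:
        a, b = mine[:len(mine) - lag], theirs[lag:]
    else:
        a, b = mine[-lag:], theirs[:len(theirs) + lag]
    return pearson(a, b)
```

A strongly negative value at a small positive lag would be consistent with the "when they go low, we go high" pattern showing up a day later, while a value near zero at every lag would suggest the mirrored peaks are coincidental.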
These curves also appear to suggest that ‘moods’ are somewhat persistent and cyclical. I looked at the valence of my text messages alone, at two different levels of granularity, to see if they roughly correlated with my own memories of my feelings at the time and if any patterns I was seeing were artifacts of ggplot smoothing functions. I won’t go into too much detail, but they don’t consistently correlate, confirming that the words I use are not only a function of my own moods but also a function of my friends’ moods. I am, of course, an unreliable narrator of my own past and could be conflating the feelings I had while doing certain things with the feelings that surfaced when those actions bore fruit. Weekly cyclical patterns emerge when the ‘span’ parameter for geom_smooth’s loess function is set to a lower value. This controls the degree to which smoothing occurs; lower values produce more wriggly lines.
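The effect of the span is easier to see with a simpler smoother. The sketch below uses a centered moving average in plain Python as a stand-in: it is not loess (there is no local regression and no distance weighting), but its window is sized as a fraction of the series in the same spirit as span, so it shows why a small value keeps weekly wiggles and a large one flattens them. The function name is hypothetical.

```python
def moving_average(values, span):
    """Centered moving average whose window covers roughly `span` (0 to 1) of the series."""
    n = len(values)
    half = max(1, round(span * n)) // 2
    smoothed = []
    for i in range(n):
        # The window is clipped at both ends of the series.
        window = values[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def spread(values):
    """Difference between the largest and smallest value: a crude wiggliness measure."""
    return max(values) - min(values)
```

On a series with one spike every seven days, a span of 0.1 leaves the weekly bumps clearly visible, while a span of 1.0 averages over most of the series and nearly erases them. That is also a reminder that a weekly pattern seen at a low span can come from the smoothing choice as much as from the data.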
So, should I use sentiment scores on my text messages as a proxy for how good my mood was on a given day? Shouldn’t I be trying a more sophisticated, tailored approach to classifying sentiment first? I’ve used a completely unaltered off-the-shelf dictionary-based implementation using SentiWord. This doesn’t take into account the specifics of context, or the fact that a non-trivial percentage of my text messages are transliterated Hindi, slang, expletives, emojis and GIFs. For instance, ‘Haha’ and ‘Hahaha’ are both rated as -0.59 while ‘hahahahaha’ is rated 0. “What the fuck!” is rated -0.29 but “wtf!” is rated 0. Inexplicably, “This is amazing” gets a -0.23. Before I whip TensorFlow out though, I think I have enough reasons to believe that even with more accurate and specific sentiment classification, this is maybe not the best idea.
My text messages are not responses to my day alone; they are also responses to my friends’ days.
There’s a lot of missing data - I’m not texting my friends when I’m hanging out with them IRL or speaking to them on the phone. What information I choose to convey over text is not a choice I make randomly and is likely correlated with the types of emotions I’m swayed by at that point.
It would be a fairly strong assumption to make that anyone is consistently authentic and honest in communicating their feelings. Some of my most cheery Instagram posts were made on rather miserable days.
While I continue to think of alternatives to use as a proxy for my daily happiness, I’ve started to track my moods using the Daylio app. If you have other ideas, I’d love to hear them! Code for this can be found here