Navigating Data Science Careers

A couple of months ago, I talked to the graduating class of the MA in Computational Social Science program at the University of Chicago. As a recent alumna, and member of the program’s guinea pig cohort, I was excited to share my experiences from the last two years of being out in the world, and I figured a lot of what I shared with them is potentially applicable to a wide variety of data science job seekers.

There are literally hundreds of articles, listicles and essays about how to navigate the data science job market written by people with far more experience and wisdom than I have to offer. I will add my data point to the mix. A few things specific to my background have shaped my experiences: I am a foreign national living and working in the United States, I have a background in economics and public policy, I was a data scientist in a research lab at a university, and I recently made the shift to an industry position.

Find your competitive advantage.

Data science jobs have truly grown in scope and variety over the last few years, so it would serve you well to really examine where your competitive advantage lies. Allow job descriptions to help you make this determination - be wary of postings that ask for thirty-four years of experience with machine learning, natural language processing, data engineering, statistics and the ability to assemble a computer from a couple of chips and scrap metal. This likely means that the hiring team isn’t entirely clear as to why they want to hire someone. Clear, descriptive, and honest job descriptions will give you the information you need to gauge for yourself whether you are best poised to do the thing they want you to do.

Here are a couple of descriptions I like a lot - I can’t remember where they’re from, unfortunately, so I can’t shout out to the thoughtful managers that wrote them.

Examine closely for fit.

It is such a cliché but also entirely true that interviews are supposed to be a two-way street. You’re going to be spending 40-50 hours a week doing this thing, so you may as well ask for all the information you need, and really evaluate if this job is a good fit for you. Asking these questions also allows the interviewer to get to know you better, and it gives them more information to assess you on. When I interviewed right out of my masters, I was second-guessing having turned down a cushy job, in a precarious visa situation, and on the verge of running out of health insurance - this made me nervous and underconfident, and I ended up saying whatever I thought the interviewer wanted to hear. To no one’s surprise, this was not an effective strategy.

When I interviewed for my current job, here are some questions I asked my potential manager. A good time to ask these questions is when you’ve been offered the job, or are pretty sure you’ll be - these should help you make your decision.

  • What will I be working on for the first 3-6 months?
  • Who is the consumer of our team’s work?
  • What was this team’s last successful project?
  • What is the team’s engagement with open source development?
  • What resources will I have access to for continued education?

Here are some issues I would recommend discussing openly if you’re about to accept a research data science (or an econ pre-doc) position.

  • Is this gig a stepping stone to grad school?
  • What authorship ambitions do I have?
  • What is the extent of my collaboration with the PI, with students, and with research staff?
  • How much infrastructure development am I signing up to do?
  • What stage of a project will I be working on? Read between the lines closely to estimate the division of time across data collection, experiment design, data cleaning, grant-writing, and paper-writing.

It can help to have an online presence.

I hesitate to bring this up, because it certainly does favour individuals with pre-existing privilege, with the time, energy and resources to dedicate to cultivating a professional presence online, outside of work. It is substantially more difficult for primary caretakers, people working multiple jobs, older candidates, people with interests and responsibilities outside their careers, people for whom it might not be entirely safe to be forthcoming with information in public. These advantages also distribute very clearly across race, gender, and class lines. It risks creating the Red Queen’s race - where you have to run faster and faster to stay in the same place; when enough candidates have something ‘extra’ to offer, it is no longer ‘extra’, it becomes what is expected.

Having said that, an online presence can be a low-stakes way to share your work, practice writing and communication, and generally allow a glimpse into the way you approach problems. If you’re in school now, it’s very likely that with a few extra hours of effort, several of your class projects can be repurposed into simple blog posts. At minimum, it helps when it is easy to google you - even just a concise and updated LinkedIn profile is helpful. If you are able to devote capacity to it, an online presence can offer a long-form argument for why someone should hire you: it is substantially more information than a resume, and unlike an interview, you have full control over what you’re discussing, and how. Simple landing pages, elaborate blogs, well-documented GitHub READMEs, and detailed LinkedIn profiles are all valid! If you’re an R programmer, blogdown is your friend, and Alison Hill has the most comprehensive instructions and troubleshooting tips.

Practice talking about your work.

You don’t have to be a drag and talk about your work all the time, but it is useful to grow accustomed to talking about your work informally. Practice talking about your work to people who are not your advisor, manager, or collaborator - talk to folks on the fringes of your discipline, and those entirely outside it. This will help you figure out what aspects of your job are more interesting to talk about, provoke curiosity, and follow-up questions, as opposed to what felt most challenging to do. Most of us have a tendency to talk about the difficult work because we’re understandably proud of having done it, but it might not be the best reflection of our abilities, or help us have an animated and engaged conversation - which is ultimately what we want our interview to be.

Learning to talk about my work has been difficult for me - I love what I do, but I’m usually loath to talk about it when I’m off the clock. I often explain things to my mother when I want a sanity check - she’s smart, willing to listen because I’m her daughter, and from a completely different discipline, so I’m able to figure out when something that seems obvious to me, does indeed need to be explicitly stated. Learn to use context and situational cues to figure out whether results-focused, methods-focused or process-focused explanations are best suited to exhibit your work.

If you have a connection, use it.

Again, this is another thing that I wish wasn’t the case, but unfortunately, it is. Credible referrals from people who know you in a capacity that allows them to speak to your skills, knowledge, and work ethic, can go a long way. At the very least, it usually gets a pair of human eyes on your application. I’ve found that academic research positions aren’t always advertised, so if there are any professors or labs you want to work with, just reach out - your worst outcome is that someone hears that their work is admired.

If you’re asking for a referral or even just for information, ask NICELY. Remember that you’re not entitled to their time and energy. Ask specific, bounded questions (don’t say “hey, can you tell me about this job?” - they’re not obligated to guess what you want to know and write you an essay about their job). Offer multiple channels of communication, at their convenience. You don’t need to, and shouldn’t, grovel, but you definitely should be kind and polite! Send a thank-you note later.

If you’re cold-applying, really optimize your resume by matching keywords to the job description. Include a basic cover letter, even if it is optional. Don’t hesitate to prune your resume ruthlessly to really foreground your most relevant experiences.
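If you want a quick sanity check on keyword matching, here is a minimal sketch in Python. It is a rough illustration and not how any particular applicant tracking system works: the stopword list, the function names, and the sample job description and resume text are all made up for the example.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "with",
             "for", "on", "or", "we", "are", "is", "you", "will"}


def keywords(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z][a-z+#]*", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def missing_keywords(job_description, resume):
    """Return job-description keywords that never appear in the resume."""
    return sorted(keywords(job_description) - keywords(resume))


# Made-up example text, not taken from a real posting or resume.
job = "We are hiring an analyst with SQL, Python and experiment design skills."
resume = "Economist with Python and SQL experience in survey analysis."
print(missing_keywords(job, resume))
```

The output is only a list of words to consider, not a checklist: some missing terms (like "hiring") are noise, and others are worth adding only if they honestly describe your experience.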

Leverage your social science background.

I went to college in an institute of technology where as an Economics major, I was basically regarded as a Poetry major (zero shade at Poetry, but you know what I mean). Partly because ‘technology’ was seen as so distinct from social science during undergrad, I was surprised by how many things that are now packaged as ‘data science’ are decades old statistical concepts applied commonly across economics, political science, and quantitative sociology.

The term data science is relatively recent, but social scientists really have been using these ideas forever. We have long traditions of analyzing surveys, using econometrics to draw conclusions about populations from samples, and running a wide variety of experiments to make causal claims. Forecasting, time series analysis, experiment design, randomized trials (A/B testing? it’s a remix) are all completely within the realm of tools quantitative social scientists use to understand the world.
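To make the "remix" point concrete, here is a minimal sketch of how a basic A/B test comparison can be run as a plain two-proportion z-test, the kind of thing taught in an introductory statistics or econometrics course. The function name and the conversion counts are hypothetical, and real experimentation platforms typically layer more on top of this (sequential testing, variance reduction, multiple-comparison corrections).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between arms A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms share one true rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 120 of 1000 users convert in control, 150 of 1000 in treatment.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Randomly assigning users to the two arms is what lets the difference be read causally, which is exactly the logic of a randomized trial.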

Here are some podcasts/talks by very successful data scientists with social science educations that touch on this issue.

Python or R?

This image from here never fails to make me laugh, but I really think the answer to this question is “do not gatekeep, comment your code, and learn SQL”. (Highly reluctant note to self - the message to refrain from gatekeeping applies when I talk to Stata users too.)

Data science Twitter turns into a battleground every few months to hash this out (interspersed by tidy vs base R wars), and it almost never seems to yield anything useful at all. Use whatever has the most comprehensive infrastructure for the task you have at hand.

Python is more ubiquitous, probably more production friendly, has an older machine learning/ deep learning ecosystem, probably a more sophisticated set of NLP libraries, and has a solid set of tools to work with networks. I would assume that it’s easier for people with CS backgrounds to pick up Python quickly.

R was created by statisticians to do statistical computing, and is excellent at this. The tidyverse ecosystem has made data exploration, wrangling, and visualizing easy and pleasurable. CRAN has a rapidly growing collection of cutting-edge libraries implementing methods in causal inference, spatial analysis, and time series statistics. My favourite thing about R is the wonderful community that surrounds it.

The important thing though, is that these differences are closing rapidly. We’re seeing more instances of R deployed to production, more statistical packages in Python, and reticulate to help the two languages play nicely with each other. Whichever you decide is your primary language, I would highly recommend being at least conversant in the other, and being able to read and review code in it.

Talk respectfully and transparently about money.

We’ve hopefully learned by now that a culture of never talking about money with your peers has played a part in allowing dramatic pay disparity to persist in a wide variety of contexts. Even Meredith Grey didn’t know she was being lowballed until her coworkers shared their incomes with her.

If it can inform a friend or coworker’s leverage or decisions, don’t hesitate to share details of your compensation. When I switched from a university position to a tech industry position, I would have underestimated my own market value by close to 30% if it hadn’t been for the transparency and wisdom of three women from R Ladies Chicago.

As a general rule of thumb, don’t disclose your previous salary while applying to a new job - it is illegal in several US states for potential employers to ask for this information. Push back against supplying a range until the end of the hiring process, when you have all the information you need. At this point, do your homework! Go on LinkedIn, Glassdoor, BuiltIn, talk to your networks, and arrive at a realistic range that you’d be happy to accept. If you’re in school, talk to your career services team - they probably have plenty of experience with this.

I don’t have experience negotiating at a current job, but if that is your situation, I really like the idea of keeping a running document of all the work you’re accomplishing - Julia Evans calls this a brag document, and it sounds like a great way to have your work recognized.

Remember, it’s one job!

It’s important to find a role that’s a great fit for you, but remember it’s one job! Most likely, it’s 5-10% of your career. If your dream job doesn’t feel accessible to you at the moment, take the time to pick up the skills you need to have it. Making career changes is hard, and I’ve noticed that it’s easier to make changes along one axis at a time - geography / skillset / domain. For me, this looked like - econ policy job -> masters -> data science job in policy/research -> data science job in tech. If you’re international, you’re already making a significant geography change.

Data science is no longer an undersupplied market - apart from the multitude of specific graduate programs that have come up in the past few years, other quantitative disciplines have woken up to the need to incorporate computational skills into their training. Vicki Boykis wrote a brilliant blog post on this last year, and it’s probably even more relevant now. I’ll let you read her post directly (you really should!), but the following are all real data science positions - even ‘data science’ is a new, made-up word - we’re still negotiating the boundaries of what this actually means.

  • data analyst, research analyst, business analyst
  • data scientist, research scientist, ML engineer
  • experimentation, causal inference, people analytics
  • decision scientist, systems engineer, applied statistician
  • data manager, research associate, pre-doctoral research assistant

I’m still learning to do this, but I think we’d all be better served if we let go of the perceived notions of hierarchy across these jobs. You should definitely think about leveling - your responsibilities and compensation should be in line with your skills and experience - but there are several valid and real ways to do data science, and it’s a waste of time to tell ourselves otherwise.

Take risks that are calibrated to your specific situation. It is completely valid and legitimate to take a job that feels non-optimal in some ways to pay your bills, to maintain visa status, for health insurance, to be near a loved one. It is also valid to pass up on non-optimal offers if you have a few feet of runway before you must have a job. You don’t owe anyone an explanation, do what’s right for you!