Data science is one of the fastest-growing and highest-paid jobs in tech. According to Dr. Tara Sinclair, Indeed.com’s chief economist, the number of job postings for data scientist grew 57% in the first quarter of 2015 alone, and searches for "data scientist" grew 73.5% during the same period. Meanwhile, data scientist topped the charts for salary, number of job openings, and career growth opportunities in Glassdoor's recent 25 Best Jobs in America report. The jury is in—data science is hot. In the words of CEO Diane Hessan,
You’d have to be living in a cave to not know that data science will give you the potential to have a great career. The problem is that when you dig into the courses available, you’ll see that they all say, “Prerequisite: Ph.D.” Even for the ones that don’t require a PhD, people need to have a fairly in-depth background in statistics or Python. That’s all well and good, but there’s an enormous population of people who love data and want to get involved but don’t have the extensive educational background other programs require.
Tomorrow, we're launching a new Introduction to Data Science and Analytics course in Boston that is going to change this. In partnership with the sports-loving data scientists at Stattleship, the class is designed to give participants an edge in their current roles, or set them up to explore a new career path. Through an accessible and engaging, sports-themed curriculum, participants will learn the fundamentals of R and Tableau.
In anticipation of the first day of classes, I caught up with Chief Data Officer at Stattleship, Tanya Cashorali. Tanya and her team worked with us to develop the course materials, and will serve as instructors over the next eight weeks. Below, Tanya shares her insights on the emerging field of data science, who should pursue data science, and what the introductory course will entail:
Q: What is Stattleship?
A: Stattleship is a sports content business that helps brands connect with sports fans through social media. We recognize that the next generation of sports fans is obsessed with sports stats and data. This is not just a fantasy sports thing; it’s a social media thing. Before, during, and after the game, fans are on Facebook, Twitter, and Instagram sharing their pain and joy. We see this as an awesome opportunity for brands to engage fans in new ways, and as a huge education opportunity too. We’re also really excited to help people who love sports data learn new data skills and tell stories using data.
Education is a deeply rooted theme for our team at Stattleship, as I’m sure is obvious from us launching this course with the Startup Institute. We want to get people more excited about data and math, about visualizing data and telling stories with data. That is a lot easier to do when the subject matter is something people can relate to. For many people, sports data is a lot more exciting than supply chain data or finance data.
Sports data is really inaccessible to most people. There are a few major providers such as STATS Inc., Sports Direct, and Sports Radar. This data is really expensive, and it’s not easy to use.
At Stattleship, we’re making this data readily accessible and easier to consume. We’re trying to unlock this data that’s been locked up for so long, and also add an additional layer of really cool metrics and new calculations for things. We want to make it a lot more valuable for everyone.
Q: Why have you teamed up with us to offer this part-time data science class?
A: An important goal we have at Stattleship is to build an amazing community. It’s really important to us that the data remains available and affordable for people who are trying to advance their careers.
Data is critical to being at the forefront of any industry right now. If you can use data skills to tell an interesting story, you’re going to provide much more value to a company than the people who are lagging behind and not keeping up with the latest skills and technologies.
Q: Have you always been a sports enthusiast?
A: I always have. I’ve played basketball, baseball, soccer, field hockey. I shot my first basket on a regulation hoop when I was five. I played sports all through high school and I’ve always been a Boston sports fan. I’m also obsessed with Tom Brady, so you’ll likely see lots of examples in the course relating to his stat lines.
Q: It’s a great hook! What is your definition of a data scientist? What does a data scientist do?
A: That’s a great question, because you’ll find that definitions vary. There are few gold standards of what data science is. It is at the intersection of a few different skill sets.
I’ll start with big data, which most data scientists aren’t even touching. Once traditional data processing tools and hardware can no longer aggregate, manipulate, and analyze the data in a reasonable amount of time, that’s when you’ve entered the world of big data.
Now, data science does not only have to do with big data. In this course, for example, we will not be dealing with big data.
There are three major facets of data science:
- Munging: There’s data-munging, or data-cleansing. That’s the first step: being able to comfortably manipulate small and large data sets, fill in missing values, and prepare the data for analysis.
- Analysis: Now that data is ready for analysis, you can begin your analytical exploration of the data. You’re summarizing it, you’re able to quickly describe the range of numeric values, how many different types of variables there are, how many rows/columns, etc. This is also where you start to answer the important business questions at hand with the data. You may validate hypotheses in this step and also generate new ones.
- Visualization & Storytelling: Finally, the analysis segues into data visualization, which is tightly coupled with storytelling. You can conduct a really great analysis, which is clever and has actionable takeaways, but if you can’t communicate that to the business or project stakeholders, then that analysis has gone to waste. You need to be able to display data visually. If someone can look at your chart and take something away from it in less than ten seconds, then I consider that a successful chart. Otherwise, you might as well look at the tabular data.
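The three facets above can be sketched end to end on a tiny data set. This is only an illustration under stated assumptions: the course itself uses R and Tableau, while this sketch uses plain Python, and the game numbers, the function names (`munge`, `summarize`, `text_chart`), and the choice to fill a missing value with the mean are all hypothetical.

```python
from statistics import mean

# Hypothetical sample data: passing yards per game, with one missing value.
raw_games = [
    {"game": 1, "passing_yards": 288},
    {"game": 2, "passing_yards": None},
    {"game": 3, "passing_yards": 358},
    {"game": 4, "passing_yards": 226},
]


def munge(rows):
    """Munging: fill missing passing_yards with the rounded mean of known values."""
    known = [r["passing_yards"] for r in rows if r["passing_yards"] is not None]
    fill = round(mean(known))
    return [
        {**r, "passing_yards": fill if r["passing_yards"] is None else r["passing_yards"]}
        for r in rows
    ]


def summarize(rows):
    """Analysis: describe the shape of the data and the range of the numeric column."""
    yards = [r["passing_yards"] for r in rows]
    return {
        "rows": len(rows),
        "columns": len(rows[0]),
        "min": min(yards),
        "max": max(yards),
        "mean": mean(yards),
    }


def text_chart(rows, yards_per_mark=25):
    """Visualization: a bare-bones text bar chart, one '#' per yards_per_mark yards."""
    return "\n".join(
        f"Game {r['game']}: {'#' * (r['passing_yards'] // yards_per_mark)} {r['passing_yards']}"
        for r in rows
    )


clean = munge(raw_games)
summary = summarize(clean)
chart = text_chart(clean)
print(summary)
print(chart)
```

Filling with the mean is just one simple way to handle a missing value; depending on the question being asked, dropping the row or using a different estimate may be more appropriate.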
Q: You mentioned that big data is when you can’t handle your data set with traditional techniques anymore? How do you know that you’ve gotten to that point?
A: I’ve definitely broken a number of machines in the past. For example, my computer has 16 gigabytes of RAM. I won’t be able to read a 30 gigabyte file into memory in R. If the only tool that you’re comfortable using is Excel, then big data to you is anything greater than about one million rows, because that’s the row limit for a worksheet in Excel.
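As a rough illustration of that rule of thumb, the check below compares a data set against the limits of the tool at hand. It is a sketch in Python rather than R, and the function names and the `overhead_factor` parameter are hypothetical. The 1,048,576 figure is the per-worksheet row limit in current versions of Excel.

```python
GB = 1024 ** 3

# Rows per worksheet in current versions of Excel (.xlsx).
EXCEL_MAX_ROWS = 1_048_576


def fits_in_memory(file_bytes, ram_bytes, overhead_factor=1.0):
    """Rough check: can the whole file be loaded into RAM at once?

    overhead_factor is a hypothetical knob for the extra memory a tool
    needs beyond the raw file size; 1.0 means no extra overhead is assumed.
    """
    return file_bytes * overhead_factor <= ram_bytes


def fits_in_excel(row_count):
    """True if the row count fits on a single Excel worksheet."""
    return row_count <= EXCEL_MAX_ROWS


# The example from the interview: a 30 GB file on a machine with 16 GB of RAM.
print(fits_in_memory(30 * GB, 16 * GB))
print(fits_in_excel(2_000_000))
```

In practice the in-memory footprint of a data set is often larger than the file on disk, which is why the sketch leaves room for an overhead factor instead of comparing raw sizes only.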
Q: You said that this course won’t be big data training. Will students be able to use their own computers?
A: Yes, exactly. In fact, most data scientists aren’t dealing with really large data sets. There are plenty of insights, and probably smarter ones, to be gleaned from small data sets. There are exceptions: major players for which data is the core asset, like Netflix and Amazon, are pushing the boundaries by using data science and big data to build smarter recommendation engines, and some startups, like Localytics, are generating massive amounts of data. Otherwise, it’s usually about manipulating smaller data sets and joining disparate sources of data together to tell one story.
Q: Are there other job titles that are synonymous with data scientist?
A: I wouldn’t say synonymous, but there are similar titles that cover the different pieces of data science. Most data scientists should be well-versed in those three skill sets I mentioned: data-munging, analysis, and visualization. But, you could have a data visualization specialist who has studied the art of visualization and is phenomenal at telling those stories. You can have a data engineer who is focused on that first piece: the data-munging and setting up infrastructure to store data such that it will scale properly. And you could also have data analysts who focus mainly on the analysis piece, and may not be so well-versed in the data infrastructure and engineering.
Q: But, to be a true data scientist, you need to have all three?
A: At least be somewhat versed in each facet, but not necessarily an expert in all three. We will touch on each facet in this course.
Q: What will make someone successful in this class? In becoming a data scientist?
A: You have to be curious and a problem-solver at heart. If you’re curious about using data to help solve problems, then you’re already on the right track. In order to become a data scientist, you’ll then need strong motivation and ambition to learn and keep up with a constantly changing industry. A solid foundation in computer science, math, or statistics helps, but for this course and to get your foot in the door to the world of data, you don’t have to have that background.
Q: What makes this data science class different?
A: It’s different because we’ll stray very far from theory and math. We’ll be focused on hands-on, interactive, and applied learning, so that students can feel empowered to take what they learn from this course, expand upon it, and continue to learn and pursue a career in data science. They’re going to walk away with a lot of hands-on skills and the resources to be able to go back and learn more. They’re going to have the confidence to do that.
Q: Is it appropriate for learners at any level? Are there prerequisites?
A: I would honestly say that any level is appropriate. A lot of students’ success will be determined by how much work they can put into it. Each week, we’ll have homework assignments and plenty of sports data for them to analyze. What you take away from this course will be directly proportional to how much effort you’re able to put in. Other than installing a few tools and learning some basic commands, there’s not a huge technical slant to this course. I really think it’s welcoming and open to anyone.
Q: What will this course prepare a student to do?
A: This course will prepare people for a couple of things.
For example, if you’re currently a business analyst, economist, or you work in finance and you want to improve and add a tool to your skill set, one of those tools being R, this is definitely the course to take. It will prepare you to be more effective at your job and work smarter, but it will also provide an introduction for people who want to foray into the world of data science. Students will need to continue practicing their skills when the class has ended, but the goal of this class is to get people ready for a career in data science and at the very least add a few of the most in-demand data skills to their resumes.
You’ll walk away with a working knowledge of R and some exposure to Tableau, as well as publishing dashboards on the web. These are some of the most important skills to have when training to enter the world of data analysis.
Q: Do you believe there is value in this course for people who aren’t economists or in finance?
A: I do. For example, as a web designer, you may be interested in analyzing web traffic data or A/B testing a new site redesign: does this wording work better than this other wording? Of course there are analysts who can do this type of work, but to be able to do it yourself is empowering. Understanding the effort that goes into this type of work, and being able to manage it more effectively, is also hugely important.
On the marketing end, if you get into any sort of sales data or marketing leads data, this skill set would be powerful in helping you optimize campaign performance. Data is permeating every job type and industry.
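To make the A/B testing example concrete, here is one common way to compare two versions of a page: a two-proportion z-test on conversion counts. This is a minimal sketch in Python, not the course's method, and the visitor and conversion counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value), where p_value is two-sided and based on the
    normal approximation with a pooled conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: wording A converts 120 of 2,400 visitors,
# wording B converts 156 of 2,400 visitors.
z, p_value = two_proportion_z_test(120, 2400, 156, 2400)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the two wordings is unlikely to be chance alone, though the normal approximation assumes reasonably large samples in both groups.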
Q: You mentioned that students would be working with R and Tableau during this course. Can you explain what these are?
A: Tableau is a desktop data visualization tool. We’ll be working with Tableau Public, which is a free version. R is an open-source statistical programming language.
I’m a big fan of R for a number of reasons. I’ve been using it for ten years, and it’s been gaining a lot of popularity in the data science world. There are thousands of statistical packages and libraries that PhDs and smart people in the community have contributed, so there are tons of resources available at your disposal.
There’s a bit of a learning curve, but R is a friendly language for people who don’t come from a computer science background, compared with Python, which is another popular language in the data science world. Python is used a lot more in environments where you are productizing and building an entire data product, for example. R, on the other hand, allows you to quickly prototype. It’s a way for data scientists to code out a solution quickly. From there, if the business decides to productize, that’s when you might use Python or some other web development frameworks. I find R less intimidating to learn than Python, so I think it’s really suitable for this course.
Q: What inspired you to get excited about data science in the first place?
A: I originally started my career in bioinformatics, which is at the intersection of healthcare and data. It serves as the foundation of personalized medicine. I was fascinated by the potential to deliver better healthcare and tailor drugs based on an individual’s genetic makeup. That was my first exposure to data science.
In 2005, I was working at the Boston Children’s Hospital Informatics Program and I was given a gene expression data set from normal mouse brain development cells and mouse tumor brain cells. I was told, “Learn R, and apply Gene Set Enrichment to these data sets in order to figure out if there are shared mechanisms between normal brain development and tumor development for these mice.” This was with no R training whatsoever, and I had only had a few freshman-level computer science classes. There was nothing like DataCamp.com, and there were not a lot of online tutorials other than R’s own CRAN documentation.
It’s really interesting to see how many resources are available now, and how accessible it is to people. It’s exciting to see data science embraced by the larger population. I’m excited to teach people in a way that’s user-friendly, inviting, and encouraging.
Q: What’s your opinion of the online data science courses available?
A: The online courses are great because you can learn at your own pace and access them from anywhere. They do make it more challenging to hold yourself accountable to finishing the curriculum. They tend to lack the community aspect as well. In this course, when we’re not in class, we’ll have a Slack community online where we’re able to collaborate and help each other out on assignments.
I think the online courses are good for filling in technical gaps you might have, but I really think it’s useful to work in groups in person, and to be able to ask questions interactively as you’re working.
Online courses might be a good option for some students once they’ve completed this course in order to deepen skills in certain areas.
Q: What are your favorite resources for learning data science, specifically for beginners in the space?
A: I’m a big fan of DataCamp.com because their online courses are specifically targeted toward R and data science. We’ve gotten to meet the team a couple times at Stattleship and they’ve got a fantastic group of people working there. You can learn R right in an interactive console on the web. There are some premium paid courses, but they have a lot of great free content as well.
I like Nathan Yau’s FlowingData.com. He posts and curates really good visualization content. R-bloggers is also really good; I’ve found a lot of good tutorials and articles on there.
Q: Are there any last things you want people to know about this class or about the data-science industry?
A: I think it’s a great time and great industry to get into, especially because it can be really fun and rewarding to do things with data that to other people seem like magic.
This course truly is unique. It’s the first of its kind, especially because of the sports data that we have access to. It will be live data: we’ll be analyzing games from that week that some of the students may have watched. We’ve established an awesome team of instructors and teaching assistants. It’s going to be a lot of fun, and students should be prepared to be really engaged and hands-on, to work hard for two and a half hours interacting with their classmates, and to hopefully put in some extra time in between each session so that they can get the most value possible out of the course.