How to become a data scientist
How to Become Data Scientist – A Complete Roadmap
According to the Harvard Business Review, Data Scientist is “The Sexiest Job of the 21st Century”. That alone is reason enough to learn more about data science. The era of Big Data emerged when organizations began dealing with petabytes and exabytes of data. Until around 2010, storing that much data was a major challenge for industry. Now that popular frameworks like Hadoop have largely solved the storage problem, the focus has shifted to processing the data, and this is where Data Science plays a big role. Data science keeps growing in many directions, so it is worth preparing for the future by learning what data science is and how you can add value with it.
What is Data Science?
So the very first question is, “What is Data Science?” Data science means different things to different people, but at its core, data science is using data to answer questions. This definition is fairly broad, and that’s because data science is a fairly broad field!
Data science is the science of analyzing raw data using statistics and machine learning techniques with the purpose of drawing conclusions about that information.
So briefly it can be said that Data Science involves:
Everyone knows how popular Data Science has become. The questions that follow are: why Data Science (decide the goal first), how to start, where to start, and which topics to cover. Should you learn all the concepts from a book, follow online tutorials, or learn Data Science by doing projects? This article discusses all of these in detail.
Why Data Science? (Decide the Goal First?)
Before jumping into the complete roadmap, you should have a clear goal in mind: why do you want to learn Data Science? Is it for the phrase “The Sexiest Job of the 21st Century”? Is it for your college academic projects? Is it for a long-term career, or do you want to switch careers into data science? For example, if you want to learn Data Science for college academic projects, learning the beginner material is enough. If you want to build a long-term career, you should also learn the professional and advanced material and cover all the prerequisites in detail. The decision is in your hands.
How to Learn Data Science?
Data scientists come from varied educational and professional backgrounds, but most should be proficient in, or ideally masters of, four key areas.
Domain Knowledge
Many people think that domain knowledge is not important in data science, but it is very important. For example, if you want to be a data scientist in the banking sector and you already know a lot about it, such as stock trading and finance, that knowledge will be a real advantage, and the bank will prefer such applicants over others.
Math Skills
Linear Algebra, Multivariable Calculus, and Optimization Techniques are very important because they help in understanding the machine learning algorithms that play an important role in Data Science. Understanding Statistics is equally significant because it is a core part of data analysis. Probability underpins statistics and is considered a prerequisite for mastering machine learning.
Computer Science
There is much more to learn in computer science. But when it comes to the programming language one of the major questions that arise is:
Python or R for Data Science?
Either language can be a reasonable choice for Data Science, as both have a rich set of libraries for implementing complex machine learning algorithms, visualization, and data cleaning. Please refer to R vs Python in Data Science to know more about this.
My recommendation, however, is to know both programming languages to become a successful data scientist.
Apart from the programming language the other computer science skills you have to learn are:
Communication Skill
It includes both written and verbal communication. What happens in a data science project is after drawing conclusions from the analysis, the project has to be communicated to others. Sometimes this may be a report you send to your boss or team at work. Other times it may be a blog post. Often it may be a presentation to a group of colleagues. Regardless, a data science project always involves some form of communication of the projects’ findings. So it’s necessary to have communication skills for becoming a data scientist.
Learning Resources
There are plenty of resources and videos available online, and it can be confusing to know where to start learning all the concepts. If, as a beginner, you feel overwhelmed by so many concepts, don’t be afraid and don’t stop learning. Have patience, explore, and stay committed.
Some useful learning resource links available at GeeksforGeeks:
A Roadmap to Learn
Start with an overview of Data Science. Read some Data Science blogs and do some background research. For example, read blogs on Introduction to Data Science, Why to choose data science as a career, Industries That Benefits the Most From Data Science, Top 10 Data Science Skills to Learn in 2020, and so on, and make up your mind to start your Data Science journey. Motivate yourself to learn Data Science and build some awesome projects with it. Practice regularly and learn new Data Science concepts one at a time. It also helps to join some workshops or conferences on Data Science before you start. Make your goal clear and move toward it.
1) Mathematics
Math skills are very important because they help in understanding the machine learning algorithms that play an important role in Data Science.
2) Probability
Probability is also significant to statistics, and it is considered a prerequisite for mastering machine learning.
3) Statistics
Understanding of Statistics is very significant as this is a part of Data analysis.
4) Programming
One needs to have a good grasp of programming concepts such as Data structures and Algorithms. The programming languages used are Python, R, Java, Scala. C++ is also useful in some places where performance is very important.
References:
5) Machine Learning
ML is one of the most vital parts of data science and one of the hottest research subjects, so new advancements are made every year. You need to understand at least the basic algorithms of Supervised and Unsupervised Learning. There are multiple libraries available in Python and R for implementing these algorithms.
6) Deep Learning
Deep learning means building and training neural networks, for example on structured data, typically with libraries such as TensorFlow and Keras.
7) Feature Engineering
Feature engineering is where you discover the most effective ways to improve your models.
8) Natural Language Processing
With NLP, you can distinguish yourself by learning to work with text data.
9) Data Visualization Tools
Make great data visualizations. A great way to see the power of coding!
10) Deployment
The last part is deployment. Whether you are a fresher or have 5+ or 10+ years of experience, deployment is necessary, because a deployed project is concrete proof of the work you have done.
11) Other Points to Learn
12) Keep Practicing
“Practice makes a man perfect” captures the importance of continuous practice in learning any subject.
So keep practicing and improving your knowledge day by day. Below is a complete diagrammatical representation of the Data Scientist Roadmap.
How to become a Data Scientist? — A detailed step by step guide!
You couldn’t have missed the buzz.
Whether it’s the media, articles, job postings or interviews of top leaders from companies such as Google and Facebook, everyone seems to have been talking about Data Science and Artificial Intelligence. If you’re like most, you would also be thinking — How to become a Data Scientist?
It’s time to take that question seriously. In 2012, Harvard Business Review dubbed Data Scientist the Sexiest Job of the 21st Century. The demand and hype around it have made it a very lucrative career option for college students & software professionals.
Is it easy to become a Data Scientist?
As enticing as it seems, data science is not an easy field to enter, as it requires strong prerequisites in many areas. People with good programming skills, a grounding in mathematics, and a love for data have a good chance of becoming a Data Scientist.
In this guide, I have tried to cover almost every aspect of data science, to help you work out the most efficient and fastest way of becoming a data scientist.
What is Data Science?
Data science is all about uncovering meaningful insights and findings (usage, trends, consumer behaviour, retention, etc.) by using complex algorithms & tools, machine learning processes, mathematics & statistics, programming & technology.
Basically, businesses today are using data science to outperform the competition, reduce costs, increase retention and make smart business decisions.
But how exactly do they do this? How do the awesome data scientists make this random and unstructured data meaningful?
Who are Data Scientists & What do they do?
Given the wide range of stuff data scientists do, there seems to be confusion around the roles of data scientists. Are they statisticians, mathematicians or software engineers?
This statement pretty much puts things in perspective:
A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.
Here are some things data scientists are normally asked to do:
Outside of these finer tasks, the overall role of a data scientist is to advise teams and management to make data-based decisions rather than ad hoc ones.
What are the skills required to become a data scientist?
The skill set of a good data scientist consists of modular expertise in many fields, such as data mining, data analysis, programming, mathematics & statistics, machine learning, business, data hacking, data visualization, database & (big) data. The following is a brief description of all the major skills required to become a data scientist and how to acquire them:
Mathematics (Probability, Statistics, Linear Algebra):
Let’s get this straight — mathematics is the core foundation of data science.
To take an example, let’s say you work at a drone company that does crowd surveillance and you want to find out the number of male and female attendees at an event. To do that from a distance, you need a strong grip on probability & statistics (concepts such as maximum likelihood estimation). Probability helps you estimate how likely it is that a person is male or female on the basis of their face and physical appearance.
Mathematics is important for a data scientist because working on data or building data products requires the ability to view data, patterns, or textures through a mathematical mindset. After converting data into a structured form, you also need a good knowledge of statistics to analyse or visualize it. Linear algebra is one of the most important foundations of machine learning. It is also very important if you want to uncover characteristics of users in big data sets, where the data is handled as matrices.
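As a minimal sketch of the maximum likelihood idea mentioned above, assume a very simple model in which each detected attendee is independently classified as female with some unknown probability p. The counts below are made up for illustration, and the function names are hypothetical. Under that assumption, the maximum likelihood estimate of p is simply the observed fraction, which the code checks against a coarse grid of candidate values.

```python
import math

def log_likelihood(p, successes, trials):
    """Log-likelihood of a Bernoulli probability p given observed counts."""
    failures = trials - successes
    return successes * math.log(p) + failures * math.log(1 - p)

def mle_proportion(successes, trials):
    """Closed-form maximum likelihood estimate for a Bernoulli probability."""
    return successes / trials

# Hypothetical counts: 130 of 200 detected attendees classified as female.
female, total = 130, 200
p_hat = mle_proportion(female, total)

# Sanity check: no candidate on a coarse grid beats the closed-form estimate.
grid = [i / 100 for i in range(1, 100)]
best_on_grid = max(grid, key=lambda p: log_likelihood(p, female, total))
print(p_hat, best_on_grid)
```

A real surveillance system would also have to model classification errors, which this sketch ignores.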
Following are some resources which will help you in learning & improving these skills:
Programming:
Whether prototyping small, quick solutions or stitching together complex data systems, a data scientist must know how to code. Coding helps you clean and organize unstructured data. The most important programming languages & technologies to know or learn in this field are Python, R, SAS, SPSS, Perl & SQL/NoSQL.
Trust me, if you are genuinely passionate about getting into data science, you must have a good command of programming. It will be your best support in reaching your KRAs on time.
Following are some resources which will help you in learning & improving these skills:
Machine Learning (ML):
Machine learning is used to train computers to learn & improve continuously as they are fed new data. Recommendation engines, self-driving cars, recruitment companies, etc. rely heavily on ML today to improve their user experience.
To clear up the confusion: ML is a core subset of Artificial Intelligence. Machine learning helps companies automate important processes in real time, reducing the cost of operations that depend on human intervention. Data scientists must know ML because it helps them build systems that can make high-value predictions & take decisions in real time.
Following are some resources which will help you in learning & improving your ML skills:
Data skills
Knowledge of Databases:
Data scientists need to access, manipulate, and store data all the time. Knowledge of relational databases such as MySQL as well as NoSQL databases such as MongoDB & Cassandra is very important to do this effectively.
Following are some resources which will help you in learning & improving these skills:
Big data is basically a huge amount of data, generated from multiple sources at high velocity and with high variability, which can’t be handled easily by traditional database management systems such as relational databases.
Big Data is the problem, and tools like Hadoop & Spark are solutions to it. Hadoop is an open-source software framework used for distributed storage and processing of big data sets.
Following are some resources which will help you in learning & improving these skills:
Data Munging/Wrangling & Visualization:
Data munging/wrangling is the process of transforming one “raw” data form into another form making it more convenient to understand and use.
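To make the idea of wrangling concrete, here is a small sketch using only Python's standard library. The raw text, the column names, and the `wrangle` function are hypothetical examples, not taken from any particular data set. The sketch trims stray whitespace, normalizes capitalization, and converts ages to integers while marking missing values explicitly.

```python
import csv
import io

# Hypothetical messy input: stray spaces, mixed case, one missing age.
RAW = """name, age ,city
 Alice ,30,New York
Bob,,chicago
 carol,41, SEATTLE
"""

def wrangle(raw_text):
    """Turn messy CSV text into a list of clean, typed records."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [column.strip().lower() for column in next(reader)]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [cell.strip() for cell in row]
        record = dict(zip(header, values))
        record["name"] = record["name"].title()
        record["city"] = record["city"].title()
        # Missing ages become None instead of an empty string.
        record["age"] = int(record["age"]) if record["age"] else None
        records.append(record)
    return records

clean = wrangle(RAW)
for record in clean:
    print(record)
```

In practice this kind of work is often done with a library such as pandas, but the steps are the same: parse, normalize, convert types, and decide how to represent missing values.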
Data Visualization & Reporting: Data visualization is the creation & study of the visual representation of the data by using statistical graphics, plots and information graphics. Data reporting is the process of arranging data into informational reports in order to gain meaningful insights for improving & monitoring different areas within a business.
Following are some resources which will help you in learning & improving these skills:
So, how to become a Data Scientist?
With all this background, let’s just get to the steps needed to become a data scientist:
3. Get a real-world project: Now you know all the skills, you have already done a few projects, passed multiple tests, and you are well aware of the whole data science scenario. What’s next? It’s time to take the litmus test.
If you are a student, it’s easier. Everyone from startups to big companies like Amazon offers data science internships throughout the year. Getting an internship in data science is not difficult if you have spent enough time getting your basics clear and have hands-on experience.
If you are already working and want to switch to data science, don’t worry. Demand for data scientists keeps growing.
4. Connect with people hiring for data scientists: Don’t waste your time on regular job portals like Indeed, Monster or Naukri. These portals are very noisy and the output is really low. Try intelligent platforms instead (we built CutShort, “an AI-based platform to find just the right jobs in product based companies and startups”), which can really cut short your path to becoming a data scientist.
Conclusion
Data science is a great career option that is interesting as well as rewarding. The demand for data scientists is only going to grow in the next decade.
However, it’s a challenging role too. You may be able to get into it in short term, but for longer-term success, you need to really build a strong foundation in this domain. All the best!
How to Become a Data Scientist Without a Degree
Data scientists are some of the most coveted roles in any company, and their popularity — and demand — are both skyrocketing. Aside from crunching big data to enable (very cool) things like artificial intelligence and advanced medical research, data science has stretched into the private sector as companies look to data as a decision driver.
And though most data scientists have a Master’s or Doctorate degree, more than 25% of data scientists have a Bachelor’s degree or less. In other words, there are many paths to becoming a data scientist — not just the popular “get your STEM Master’s.”
To truly discover how to become a data scientist without a degree, we have to back up and look at what data science is and what data scientists do. The career of data scientists often gets confused with data analysts and data statisticians — spoiler alert: they’re slightly different. We’ll also need to look at some of the skills that data scientists need and how to learn them.
What is data science?
Data science isn’t a physical product. It’s a broad term given to studying behaviors, topics, and trends. Data science has a lot of definitions, but it’s best to think of it as a multidisciplinary approach to finding stories, insights, and patterns in large data sets, then cleaning and organizing that data in a way that makes sense so you can act on it.
And now, data science is everywhere.
If you’ve uploaded a photo on Instagram recently, if you’ve sent a Snap using Snapchat to one of your friends, or if you left a comment on a YouTube video today — those actions were collected and added to a data set that data scientists are using to make decisions. The ads you get for how to start a business course, the Hulu recommendations for what new TV show to binge-watch, and the product recommendations you get from Amazon are all a product of data science.
Data scientists are taking your actions, organizing them with your other actions and the actions of others, finding patterns, and creating an action (in this case, a recommendation) based on that data.
Data science is also a broad term used to describe many, more specific subcategories such as data engineering, data mining, mathematics, statistics, advanced computing, and model visualization. It’s also the keystone of artificial intelligence, machine learning, and deep learning.
What do data scientists do?
As we mentioned, data scientists use mathematics and programming to collect, clean, and explain data. Their main role is to adjust the statistical and mathematical models applied to acquired data. In other words, they make data discoveries.
In a world where every action is collected, data scientists are more important than ever. When applied to business decisions, data scientists have a few concentrations:
Data scientists turn formal business problems into data questions, so those business problems have data-driven answers.
They communicate data to less-technical team members and stakeholders through data visualization.
Data scientists often take other titles, like data architects, data engineers, machine learning engineers, and analytics managers.
But what separates data scientists from the rest of the pack is their ability to code. Data scientists are experts in programming languages like Python, R, SQL, and more.
Are data scientists different from data analysts and statisticians?
Data scientists often get confused with data analysts and statisticians, and though their work is similar, there are some main differences:
Data analysts are the overseers of a company’s data, helping team members understand it and use that data to make data-driven, strategic decisions
Business analysts play a strategic role, focusing on using that data housed by analysts to find problems and find solutions
Data scientists, on the other hand, use that data by sifting through it and organizing it to find weaknesses and trends, creating predictive models based on that data
One of the biggest — if not the biggest — differences between analysts and data scientists is salary. Data scientists average much higher salaries, plain and simple, because of their technical skills.
While both careers are built mainly on working with data, a data analyst acts more like a translator, and becoming a data analyst requires less technical ability. In contrast, a data scientist acts in a hybrid capacity, helping companies turn data into something practical and usable through coding. Analysts and researchers were around long before “data” existed as a field, which is partially why data scientists often get confused with data analysts.
What skills do data scientists need?
There are many complicated skills that data scientists need, and without a degree, it can become difficult to learn, but it isn’t impossible. Not even close. We’ll help you uncomplicate the skills required.
Strong math comprehension
Calculus
Calculus is vital, but what’s essential is understanding calculus principles and how those principles affect your data models. You’ll need to understand what a derivative means more than you’ll need to compute one by hand.
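To illustrate why the meaning of a derivative matters more than computing one by hand, here is a rough sketch of gradient descent, a common way models are fitted. The toy `loss` function, the learning rate, and the step count are all assumptions chosen for illustration. The derivative is approximated numerically, and the code repeatedly steps "downhill" until it settles near the minimum.

```python
def loss(w):
    """A toy loss with its minimum at w = 3 (an assumption for illustration)."""
    return (w - 3.0) ** 2

def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient_descent(f, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the slope to move toward a minimum."""
    w = start
    for _ in range(steps):
        w -= learning_rate * numerical_derivative(f, w)
    return w

w_min = gradient_descent(loss, start=0.0)
print(round(w_min, 3))
```

The key intuition is that the sign and size of the derivative tell you which way to adjust a model parameter and by how much.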
Linear algebra
Linear algebra is also an excellent subject to touch upon, but as before, you don’t need to know every detail. A working grasp of vectors, matrices, and solving for unknowns will help when it comes to coding and performing other tasks.
Statistics and probability
There is one domain you’ll need to become well versed in, though, and that’s statistics and probability. You don’t need a strong background to start: much of the learning resembles trial and error, and you’ll pick up more through experience. The advantage here is that most of the concepts within statistics aren’t hard to grasp! It just takes time.
Data science programming languages
Programming is a general skill that helps a data scientist in many ways, so familiarity with the following languages and tools is valuable.
R is a programming language created specifically for statistical computing and graphic tasks
Python is a general-purpose programming language that is used for data manipulation and repeated tasks—the language is more commonly used and has a broader application than R
SQL, Structured Query Language, is relied on by industries to extract data for analytics and reporting purposes—it’s designed for managing data in relational database management systems
Hadoop is a suite of technologies for managing data and executing programs in a cluster, a collection of networked computers running within a data center. Hadoop includes a file system designed for very large data sets
Spark is a system, rather than a language, used for writing parallel programs to run in clusters. Spark has gained popularity for its higher efficiency on many problems and applications. It also has a robust machine-learning library that can be used from R, making it popular amongst data scientists.
The list continues, but don’t let it intimidate you — knowing every single language inside out isn’t required, and many data scientists are more expert in some than others. Familiarity with each, though, is often expected.
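As a small illustration of how SQL and Python work together from the list above, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are hypothetical sample data. It extracts a per-customer total, which is the kind of aggregation used for analytics and reporting.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5), ("carol", 8.0)],
)

# Aggregate spending per customer, largest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()
print(totals)
```

The same query would run largely unchanged against other relational databases; only the connection code differs.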
Machine learning
Although machine learning sounds intimidating, it is simply how computers learn, and improve at, tasks without being explicitly and manually programmed. Data scientists use machine learning techniques to make decisions and predictions based on data, and these techniques have many applications in the field of data science.
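One of the simplest ways to see learning from data rather than from hand-written rules is a nearest-neighbor classifier, sketched below. The training examples, feature meanings, and labels are invented for illustration. The prediction for a new point is just the label of the most similar example seen so far.

```python
import math

def nearest_neighbor_predict(training_data, point):
    """Predict the label of `point` from its closest labeled example."""
    closest_features, closest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], point),
    )
    return closest_label

# Hypothetical labeled examples: (hours_active, purchases) -> customer type.
training_data = [
    ((1.0, 0.0), "casual"),
    ((2.0, 1.0), "casual"),
    ((8.0, 5.0), "loyal"),
    ((9.0, 7.0), "loyal"),
]

print(nearest_neighbor_predict(training_data, (1.5, 0.5)))
print(nearest_neighbor_predict(training_data, (7.5, 6.0)))
```

No rule such as "more than five purchases means loyal" is ever written down; the behavior comes entirely from the examples, and adding more examples changes the predictions.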
Other skills
A/B testing is generally a skill that should be learned but not necessarily mastered (unless your work requires it). In a general sense, A/B testing is the process of showing two versions of the same web page to different audiences simultaneously to see which one earns more conversions and interactions.
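Deciding which version "won" usually involves a significance test. The sketch below shows one common choice, a two-proportion z-test, using only the standard library. The visitor and conversion counts are hypothetical, and the function name is an illustrative assumption rather than part of any particular tool.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal tail probability via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: version A converts 100 of 1000, version B 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone, though the threshold to use is a judgment call for each experiment.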
Along with statistics, linear regression is a skill you should be familiar with before starting a data science career. In a simple sense, linear regression measures how closely data points line up with a “line of best fit.” From this regression line, you can see how data points compare with each other while also seeing where the trend is going or how it’s changing.
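The “line of best fit” can be computed directly with ordinary least squares. The sketch below does so in plain Python on made-up data points; the function names are illustrative. It also computes R squared, a common measure of how closely the points line up with the fitted line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
fit_quality = r_squared(xs, ys, slope, intercept)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {fit_quality:.3f}")
```

The slope shows the direction and strength of the trend, and an R squared value close to 1 means the points sit close to the line.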
So how do you learn data skills without getting a degree?
Becoming a data scientist without a Master’s degree or Doctorate degree is both possible and, frankly, not entirely rare. As we mentioned earlier, more than 25% of professional data scientists do not have a Master’s or Doctorate.
Those who don’t have those degrees probably took one of three routes on the way to their data science career. Either they taught themselves everything they needed to know, started in a more accessible role like data analytics, or attended a data science bootcamp.
These are the three paths we’d recommend as you work your way toward a data science career, and each comes with different pros and cons.
At Flatiron School, we always recommend that students think long and hard about what their learning style is before selecting an education path. Do you tend to work better in groups or alone? In person or remotely? Quickly or slowly? Hands-on or reading or both?
There are many ways to learn, and everyone is slightly different. While considering which path to take (teaching yourself, attending a data science bootcamp, or learning data analysis skills and working your way up), always keep in mind the teaching style that fits your learning style.
Path 1: Self-teaching
Educating yourself without a teacher is hard, which is why not many people do it. We liken it to regularly exercising: you can buy the best shoes and make the best workout plan, but if you can’t keep yourself disciplined enough to run every day, you aren’t going to get in shape.
Self-teaching requires extreme discipline, as well as thorough research to make sure you’re teaching yourself the right skills. In a world evolving as quickly as data and tech, we recommend making some checkpoints for yourself to reassess your curriculum often. That way, you stay on track with new and relevant skills.
Of course, this won’t be an issue if you’re learning full-time for 6 months, but if you’re learning over the course of a few years, these checkpoints are absolutely critical.
Fortunately, for anyone interested in teaching themselves to become a data scientist, there are plenty of books and online resources dedicated to getting started.
Introduction to Data Science from Alison
A free online intro course. You’ll cover machine learning and data models, as well as typical processes like the scientific method and algorithms. It’s roughly 3 hours of curriculum.
Dataquest: Learn R, Python and SQL for Data Science
Another free intro course that covers the basics of data science.
Free Data Science Bootcamp Prep
Flatiron School’s introductory data science course covers Python and machine learning. It is the beginning modules of our full-time data science course.
Here are some other resources to help round out your data science education.
If you want to explore various articles related to the history and uses of Python, check out this free Python for Dummies resource
Want to learn SQL? Start with W3Schools’ Introduction to SQL or even Udacity’s Learn SQL course
For more resources, check out Codecademy’s Data Science resources, career paths, and skill paths. They also have a few more statistics courses to familiarize yourself with some of the math topics related to data science.
My Journey to Self-Taught Data Scientist is also a great read and an incentive to keep your drive strong
Lastly, here’s a list of some of the best books and smaller courses — some free — to introduce yourself to data science:
The pros of self-teaching
Little-to-no cost to you, most of the time
Learning at your own pace and on your own schedule
Struggling or find something interesting? Go more slowly and deeply on that subject
Unbiased materials from multiple sources
Learn using the methods that you prefer (book, online, videos, slideshows, etc.)
The cons of self-teaching
It’s hard to stay disciplined
No career services support or guidance once you’ve learned what you need to know
Uncertain if the right material is learned
No experts to advise your learning
Hiring managers may not consider it a valid education
No portfolio to showcase when interviewing
Path 2: Data science bootcamps
College courses will typically provide you with a more philosophical and a more prolonged approach/education than teaching yourself, though not as philosophically well-rounded as a post-grad degree.
Unlike self-teaching, bootcamps are designed to teach you the skills necessary to succeed as a data scientist today. Some bootcamps, like Flatiron School, work with employers and hiring companies to understand what they need from data experts, and build their curriculum around those needs.
The biggest difference between self-teaching and bootcamps is, well, price.
Bootcamps also give you hands-on data science projects, so you’ll graduate with a full portfolio to showcase your skills for recruiters during the interview process.
If you decide to go the bootcamp route, we suggest considering a few things:
What teaching style do they use? Is it hands-on labs? More book-oriented? Select a bootcamp that matches your style.
What do they teach you? Python? R? SQL? All 3? Do your research into what languages are used for, then compare that with your career goals. Choose a bootcamp that teaches the skills that match your ambitions.
What are their placement rates? How many of their graduates successfully land jobs, and how much do they get paid? All bootcamps should be open and honest about these numbers. You deserve to know that you’re investing in a bootcamp that will provide a good education and has paid off for students in the past. Any bootcamp that doesn’t provide this data clearly upfront might be worth avoiding.
At bootcamps, your education isn’t a pass or a fail; you get a more hands-on approach.
We recommend the following in-person data science bootcamps for their quality education and reasonable tuition fees. It’ll be a steep price, but once you land that data science role, it’ll be well worth it.
Flatiron School’s data science course
Flatiron’s course teaches you all the skills you need to become a data scientist in as little as 6 months. Our curriculum covers Python, SQL, statistics, A/B testing, linear regressions, combinatorics, probability theory, statistical distributions, Bayes Theorem, sampling methods, hypothesis testing, model evaluation, and more. Our course is offered in person in New York, Chicago, San Francisco, Seattle, and Washington, D.C.
What are some pros of coding bootcamps?
Hands-on, applicable skills in a short time, providing a faster way to learn data science
Knowing and learning the right material
Focused on getting you a career as a data scientist
More affordable than most universities and can be done part-time
Many bootcamps have career services and job search assistance—professional, ready-made network
Connect with other data scientists
Know the latest market/employer needs
Hiring managers favor Bootcamp graduates, especially compared to self-taught scientists
What are some cons of coding bootcamps?
High-cost upfront, at once
Intense, long hours for a few months
Lack of access to FAFSA or other federal financial aid
Less philosophical background than computer science degree programs
Less in-depth than a traditional college education
Can be fast-paced
Some hiring managers still prefer computer science degrees over bootcamp training
How do you get a job after learning the data science skills you need to know?
We’ll outline simple steps for you to get a job as a data scientist after you’ve self-learned all of the skills listed above. This can be harder if you’re working on your own rather than working with a career coach or a career service team. People often think learning the skills you need is the entire story, but it’s just the beginning for most people.
Effective job searching and marketing yourself is vital and is often the most prominent reason someone does or doesn’t get a job offer.
Seriously, we cannot emphasize enough how much an effective job search matters.
For an effective job search, try these steps:
Build a robust job-search foundation
Write resumes and cover letters effectively, and include your self-learning
Create your professional brand and market yourself on LinkedIn, Twitter, GitHub, blogging, etc.
Network offline as well as online
Apply to jobs efficiently; applying to one job a week isn’t enough most of the time
Prepare for interviews, both culturally and technically, and communicate well via email, phone, and in-person
Most importantly, maintain a healthy search/work/life balance—stress is the last thing you want
How do you become a data scientist without a degree?
Identify your goals – Ask yourself, “What specialty do I want to focus on?”
Figure out how you want to learn – Self-teaching is valid, but bootcamps are generally better.
Learn, learn, learn, learn, work hard, and learn – Stay committed to learning no matter what. If you teach yourself, great! If you go to a bootcamp, also great! Obviously this is the most important step of the way, so do not take any shortcuts, and dedicate yourself to a full understanding of data science.
Build your online presence – Consider writing articles to show off your new expertise.
Reach out to your network and support systems – Just graduated? Referrals and connections are often the best way to land an interview.
Continue learning during your job search – You can never know too much; there’s always something more to learn. Bootcamps are tough, but so is self-learning, so your education and career depend on how much effort you want to put in. The payoff is life-changing. Further reading: How to Land a Tech Job: The Complete Curriculum
Don’t give up – You might not land a job right away, but don’t think that you’re not good enough to become a data scientist. If you know the skills, you’re qualified.
If you plan on going the bootcamp route, Flatiron School’s data science course teaches you all the skills you need to start a career as a data scientist or data analyst. This course is also available fully online.
Posted by Nicholas Gallinelli / February 17, 2021
How to Become a Data Scientist?
Data science is one of the leading career options of the 21st century. In today’s data-driven world, organizations across every industry store huge volumes of data, which they process to answer a wide variety of questions.
From businesses and government institutions to non-profit organizations, everyone has big data that needs to be processed and analyzed to answer difficult questions. This is where data science comes in.
Data scientists are professionals who work with big data and help their employers find the right answers to their questions, whether for creating a marketing plan or targeting the right demographics for a product.
Though data scientists come from varied educational backgrounds, most of them have some sort of technical schooling. Data science is a diverse field that demands programming knowledge along with an understanding of mathematics (statistics in particular).
As the total information available to mankind grows exponentially, so do the opportunities for data scientists. Before diving into how to become a data scientist, let’s first take a brief look at data science and the professionals who practice it, followed by the essential skills the job requires.
Data Science and Data Scientists
Data science is a diverse field that involves a plethora of requisite skills. Typically, a data scientist is someone who gathers and processes data with the intent of reaching some concrete conclusions that can benefit their employer.
Data scientists employ several different techniques. One of them is data visualization, which presents the data in a visual context.
Visualizing the data allows a user to spot distinct patterns that would not be so obvious if the information were presented as mere numbers.
Data scientists also create advanced algorithms meant to detect patterns in large volumes of data. It’s safe to say that data science is the exercise of looking for meaning in huge amounts of data.
Essential Skills for Becoming a Data Scientist
Without further ado, here is the step-by-step guide about how to become a data scientist:
Step 1 – Ensure It’s Meant for You
First things first! Before you set out on the journey to becoming a data scientist, it’s important to double-check that it is exactly what you want. Data science is a very extensive field of study. Hence, you need to be sure before taking that heavy load on your shoulders.
The Internet is full of introductory data science courses that can help you find out whether the field suits you, as well as what you stand to gain from the career path if you finally decide to go for it.
Step 2 – Get a Relevant Bachelor’s/Higher Degree
Boot camps are an excellent way of speeding things up alongside your main degree. Another beneficial activity that you can pursue alongside your main academic course is enrolling in MOOCs.
Massive Open Online Courses, or MOOCs, are online courses that allow unlimited participation and open access to learning material over the web. MOOCs are offered by Harvard, MIT, Microsoft, and several other universities and organizations from around the world.
Step 3 – Pick an Area of Interest
There are several different paths that lead to a fruitful data science career. Typically, data scientists start at the undergraduate level in Computer Science, Mathematics, Statistics, etc.
Such degrees qualify them for jobs such as data visualization specialist, management analyst, and market research analyst. However, some pursue specialized concentrations, such as data engineering and machine learning, through master’s degree programs.
Some go further still and pursue a doctorate in concentrations such as business solutions and enterprise science analytics. Therefore, it’s important to pick an area of interest and get a relevant degree for it.
Step 4 – Get Certification
Certifications are an important part of the resume of any present-day professional, especially someone belonging to the IT sector.
In addition to making the holder a marketable candidate for specific data scientist job requirements, certifications can help in developing new skills as well as improving existing ones.
Step 5 – Gain a Role
Once you have met all the academic and educational requirements, it’s time to put the skills you have learned thus far to the test and gain a role in the lucrative field of data science.
Data science is a highly varied field, so there is a multitude of specialized roles to opt for. Furthermore, it is possible to become a data analyst without any prior experience and then advance from there.
Online platforms like iCrunchData and Kaggle are excellent for finding the right kind of data science job. With constant development in the field of IT and data science, new and better options are springing up all the time.
Pros and Cons of Becoming a Data Scientist
Obviously, there are a lot of benefits of becoming a data scientist. However, like any other career path, it has its own share of disadvantages.
Pros
Cons
CAUTION: Data Science is Not Statistics!
It’s very easy to mistake data science for statistics. Although the two share several aspects, each is a distinct field.
Data science, on the other hand, is relatively new. Unlike statistics, data science relies heavily on computers and technology. Moreover, it is a continuously evolving field where information is accessed via large databases and code is then used to manipulate and process it.
Conclusion
So, that was all about how to become a data scientist. The field of data science is growing continuously, and there are no signs of that growth subsiding anytime soon.
At least, not until the world finds something better than data and information for everything that relies on them, which is, of course, very unlikely. Hence, it is a great time to make a career in data science.
How to become a data scientist?
I am pretty sure that many of us have come across the 2012 article from the Harvard Business Review that called data scientist the sexiest job of the 21st century. Also, research conducted by the McKinsey Global Institute back in 2013 projected that there would be between approximately 425,000 and 475,000 unfilled data analytics positions in North America by 2018. The take-home message here is that a constant stream of analytic talent will be required in all industries where companies collect and use data for competitive advantage.
What exactly is a data scientist?
In an over-simplified description, a data scientist is a professional who can work with large amounts of data and extract analytical insights. They communicate their findings to stakeholders (e.g., senior leadership, management, and clients). Companies can then make better-informed decisions to drive business growth and profitability, depending on the industry context.
Why is it so hard to become a data scientist?
Data science is a hybrid of many disciplines. It is composed of different subject areas such as math (e.g., statistics and calculus), database management, data visualization, programming/software engineering, and domain knowledge. In my opinion, this may be the primary reason why people interested in an entry-level data science career often feel completely lost. Most people don’t know where to start because, depending on their educational background and work experience, they may be lacking in one area or in several.
However, the good news is that you don’t need to worry too much about it. These days, we face the opposite problem: there are simply too many resources to pick from, so you don’t necessarily know which one will work best for you. In this article, I will focus on how to become a data scientist from three perspectives.
Section 1: Where to learn data science?
Let’s start with where to learn data science. There are three major pathways to a data science education: Massive Open Online Courses (MOOCs), a university degree or certificate, and boot camp training.
Here is a sample figure that shows the estimated time commitment vs. job placement success rate for each option. It suggests that a boot camp education can give you an edge in landing a data scientist job more quickly than the other two options.
Here is a summary table that provides more detailed information about each education pathway. Each option has pros and cons with regard to cost, flexibility, and program length. The best tip for making the right decision is to ask yourself what matters most to you. For example, you may have the luxury of time and want to minimize the investment cost. Or you might want to land a job as soon as possible, even if the initial investment cost is high.
Section 2: What to learn in data science?
There are certainly many things to learn as a data scientist. Let’s look at the data science education pathway in five major steps.
Step 1, catching up on the basic math related to statistics, calculus, and linear algebra is a good start. This is essential for a data scientist to understand the mechanisms behind how different algorithms work. It builds intuition about how to tweak or modify algorithms to solve unique business problems. Also, knowing statistics helps you convert your findings from experimental design tests (e.g., A/B testing) into key business metrics.
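To make the A/B testing point concrete, here is a minimal sketch of turning raw test counts into a business metric (the conversion lift) together with a significance check. It assumes a two-proportion z-test, which is only one common choice; the function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift, z statistic, two-sided p-value) for B vs. A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical counts: 5,000 visitors per variant.
lift, z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"lift={lift:.4f} z={z:.2f} p={p_value:.4f}")
```

The lift is the number a business stakeholder cares about, while the p-value indicates how easily a difference of that size could arise by chance.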
Step 2, data scientists must be familiar with a toolset for working with data in various environments. This toolset combines SQL, the command line, coding, and cloud tools. Here is a summary of how each tool is used. For data extraction and manipulation from relational databases, SQL is the fundamental language, used almost everywhere. For general programming purposes (e.g., functions, loops, and iteration), Python is a good choice since it has many libraries available (e.g., for visualization and machine learning). For an additional boost, knowing the command line provides extra benefits, especially for running jobs within cloud environments.
Step 3, this is the best time to pick up a language for building your data science foundation. For commercial software, you have a choice between SAS and SPSS. Among open-source platforms, many people choose either R or Python. From here, you can pick up the concepts of data munging/wrangling (e.g., importing data, aggregation, pivoting, and missing value treatment). After this comes the most fun part: learning about your data through data visualization (e.g., bar charts, histograms, pie charts, heat maps, and map visualizations).
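The wrangling concepts named above (import, missing value treatment, aggregation, and pivoting) can be sketched with nothing but the Python standard library. In practice many people would use a library such as pandas; the sales data here is hypothetical, and filling blanks with the mean is just one assumed treatment among several.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sales data; the blank cell stands in for a missing value.
RAW = """region,quarter,sales
East,Q1,100
East,Q2,
West,Q1,80
West,Q2,120
"""

# Import: parse the CSV text into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Missing value treatment: replace blanks with the mean of the observed values.
observed = [float(r["sales"]) for r in rows if r["sales"]]
fill_value = mean(observed)
for r in rows:
    r["sales"] = float(r["sales"]) if r["sales"] else fill_value

# Aggregation: total sales per region.
totals = defaultdict(float)
for r in rows:
    totals[r["region"]] += r["sales"]

# Pivoting: one entry per region, one column per quarter.
pivot = defaultdict(dict)
for r in rows:
    pivot[r["region"]][r["quarter"]] = r["sales"]

print(dict(totals))
print(dict(pivot))
```

Doing this by hand once makes it easier to see what a one-line group-by or pivot call in a data library is doing for you.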
Step 4, you can choose between the applied machine learning pathway and the big data ecosystem pathway. Note that you can always come back to master the other path later. In my case, I chose to learn applied machine learning first. It covers building a machine learning model end to end (i.e., from data exploration to model deployment). For big data, I will cover where to obtain that education (e.g., books and courses) later in this article.
Step 5, this is the most crucial step for showcasing your potential as a data scientist candidate. Once you are familiar with doing data science, you must have a project portfolio. A project portfolio is your best opportunity to show what you have done through learning and work experience. Start with data collection (e.g., finding or scraping data on your own), come up with your hypothesis, perform exploratory analysis (i.e., extract some interesting insights), build your machine learning model(s), and finally share your findings through write-ups or presentations. In my case, I did both a write-up and a video podcast while working on a capstone project with an assigned mentor. I cannot emphasize enough the importance of having a mentor who can work with you directly, one on one. Your mentor is the best person to guide you and to ask for help when you get stuck on project ideas, tuning your model, communicating your results, etc. In fact, some research suggests that having a mentor can boost your career five times more than not having one.
Section 3: How to learn data science?
In this section, you are going to learn how to pick the best resources for becoming a data scientist. I want to make recommendations based on my learning experience.
For SQL education, the DAT201x course offered by Microsoft on edX is one of the best choices. The course covers the following aspects of SQL: data types, filtering, joins, aggregation (GROUP BY), window functions, and advanced concepts (e.g., stored procedures). The course gives you plenty of practice using a sample data warehouse (AdventureWorks). Alternatively, you can use the Mode Analytics platform to practice and enhance your SQL skills. The best thing about Mode Analytics is that you don’t need to have a SQL server and a sample data warehouse installed on your machine. All you need is a free account and an Internet connection.
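If you want to try a join, a GROUP BY aggregation, and a window function without installing a database server, a small sketch using Python's built-in sqlite3 module may help. Note the assumptions: this uses the SQLite dialect rather than the Transact-SQL taught in DAT201x, the two tables are hypothetical stand-ins rather than AdventureWorks, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

# Join + aggregation: total order amount per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Window function: running total of each customer's orders, ordered by order id.
running = conn.execute("""
    SELECT customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY id) AS running_total
    FROM orders
    ORDER BY id
""").fetchall()

print(totals)
print(running)
conn.close()
```

The difference to notice is that GROUP BY collapses each customer's orders into one row, while the window function keeps every order row and adds the running total alongside it.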
For machine learning education, there are two options that I would like to recommend. The first is well known to data science practitioners: Andrew Ng’s machine learning course on Coursera. I used this course to understand basic concepts and tips on how to tune my machine learning models. For coding experience, I would highly recommend the book Python Machine Learning, 2nd edition, by Sebastian Raschka. I really think this is the best machine learning book. It helps you understand the basic mechanisms of each algorithm and provides a lot of coding examples and supplemental references (e.g., research articles). The best thing about this book is that the author walks through how to implement each machine learning algorithm line by line, with thorough explanations. This is very important because, as many data scientists point out, one should be able to write the code from scratch and know how to implement it. These days, there are many complex problems that you cannot solve directly with existing Python libraries.
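As an illustration of what writing an algorithm from scratch can look like, here is a minimal sketch of fitting a straight line by batch gradient descent in plain Python. It is not taken from the course or the book mentioned above; the function name, learning rate, epoch count, and toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w={w:.3f} b={b:.3f}")
```

Writing the update rule by hand like this makes it clearer what a library call is doing, and it is a starting point when a problem needs a loss function or update that existing libraries do not offer.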
Here is a full list of resources you can reference for learning each building block of a data science education.
1. Math:
· Khan Academy Math Track
· MIT Open Courseware: linear algebra and calculus
· Udacity: Intro and Inferential Statistics
2. Data Science Toolkit:
· edX: DAT201x — Querying with Transact SQL (*)
· Mode Analytics: SQL Tutorial (Intro to Advanced)
· WiseOwl: SQL Tutorial (Intro to Advanced) (*)
· Book: Data Science at the Command Line
· Udemy: Complete Python Bootcamp
· Book: Learn Python the Hard Way (3rd Edition)
· Book: Automate the Boring Stuff with Python
3. Machine Learning:
· Coursera: Machine Learning by Andrew Ng (*)
· Coursera: Applied Machine Learning (U Michigan)
· Harvard: CS109 — Intro to Data Science (*)
· Book: Python Machine Learning (2nd Edition) by Sebastian Raschka (*)
· Book: Python Machine Learning by Example
· Book: Intro to Machine Learning with Python
4. Big Data:
· Book: Hadoop: The Definitive Guide
· Udacity: Intro to Hadoop and MapReduce
· IBM: Hadoop Fundamentals Learning Badge
· edX: UC Berkeley Spark Courses (CS105, CS120)
· Datacamp: Intro to PySpark, Building Recommendation Engine in PySpark
· Book: Learning PySpark, Advanced Analytics with Spark
Bonus Section: Ask for Help and Networking
Now, I would like to wrap up this article with a few extra tips. In the beginning, as a newcomer to data science, you don’t necessarily have a mentor who can guide your learning. Thus, you need a place to ask for opinions and feedback from the data science community. The good news is that there are forums where you can ask for help with your problems. Websites like StackOverflow and Quora let you post a question and receive replies.
Another tip is related to networking. It applies to anyone who is looking for new opportunities and wants to build connections. In Toronto, there are many local meetups and big conferences related to data science. Try to attend as many events as you can and introduce yourself (i.e., your motivation, objectives, and passion). Also, if you have opportunities to connect with speakers and event organizers, work to establish meaningful connections with them. One of the most useful tactics I learned from my experience is seeking opportunities to present my project portfolio on whatever medium is available, whether that means presenting at local meetups or on a video webcast through remote data science office hours. From this experience, I was able to learn from my silly mistakes and improve from one presentation to the next. Being able to deliver an effective presentation and clearly communicate analytical insights brings a lot of value to a data scientist candidate.
Thanks for reading this article. I hope to share more enjoyable and useful information as I gain more experience in my journey of becoming a data scientist.
Sources:
- http://medium.com/cutshort/how-to-become-a-data-scientist-a-detailed-step-by-step-guide-635b079937e2
- http://flatironschool.com/blog/how-to-become-a-data-scientist-without-a-degree/
- http://hackr.io/blog/how-to-become-a-data-scientist
- http://towardsdatascience.com/how-to-become-a-data-scientist-3f8d6e75482f