How much Python is required for Data Science & ML?

Hamza Mujeeb Khan
Oct 27, 2020

We have all been there at some point in our Data Science & ML journey: overwhelmed by the number of things we feel we have to learn in order to become a data scientist. One of those things is Python. As we begin to learn it, we reach out to blogs, YouTube, books or online courses, and we are overwhelmed again when we realize that Python has no endpoint. The more you dig in, the more you feel there is still so much left to learn.

To help you end this recursive agony, I’ll share an overview of how much Python is required if you are learning it for Data Science.

Data scientists use Python to retrieve, clean and visualize data and to build models, not to develop applications. So time should be invested in learning the Python modules and libraries that perform these tasks. A Python library is a collection of already written functions built to perform specific tasks. Python is also used for the following:

  1. Web development
  2. Game development
  3. Robotics
  4. Automation

For each of these fields, Python developers have built separate libraries; likewise, libraries have been built for Data Science. As a data scientist, you only need to be familiar with the libraries associated with Data Science, plus enough basic Python to use them. The rest is out of scope. By one estimate, there are about 137,000 libraries in Python.
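As a rough illustration of the level of basic Python meant here, the sketch below walks through a tiny retrieve, clean and summarize flow using only the standard library. The data and the function names (`load_rows`, `clean_rows`, `summarize`) are made up for this example. In real work a library such as Pandas would replace most of this code, but the building blocks (lists, dictionaries, loops, functions, exceptions) are the ones you need in order to use those libraries comfortably.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would come from a file, database, or API.
RAW_CSV = """city,temperature_c
Lahore,31.5
Karachi,
Islamabad,27.0
Quetta,n/a
Peshawar,29.5
"""


def load_rows(text):
    """Retrieve: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean: keep only rows whose temperature parses as a number."""
    cleaned = []
    for row in rows:
        try:
            value = float(row["temperature_c"])
        except ValueError:
            continue  # skip missing or malformed values
        cleaned.append({"city": row["city"], "temperature_c": value})
    return cleaned


def summarize(rows):
    """Summarize: compute simple statistics over the cleaned values."""
    temps = [row["temperature_c"] for row in rows]
    return {"count": len(temps), "mean": statistics.mean(temps), "max": max(temps)}


rows = clean_rows(load_rows(RAW_CSV))
summary = summarize(rows)
print(summary)
```

If you can read and modify a script like this, you likely know enough core Python to start on the data science libraries themselves.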

According to the following article, there are 20 main Python libraries that are extensively used in Data Science. You can visit the article for further details. I’ll list those 20 libraries below:

  1. NumPy
  2. Pandas
  3. Matplotlib
  4. SciPy
  5. Statsmodels
  6. Seaborn
  7. Plotly
  8. Bokeh
  9. Pydot
  10. Scikit-learn
  11. XGBoost
  12. ELI5
  13. TensorFlow
  14. PyTorch
  15. Keras
  16. spark-deep-learning
  17. NLTK
  18. spaCy
  19. Gensim
  20. Scrapy

Each of these libraries serves a specific purpose, and each has proper documentation you can learn from. One mistake people make when studying Python for DS & ML is assuming they need to know every related library from top to bottom. It is better to get acquainted with a few starter libraries and then get hands-on experience in the field.
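If you want to see which starter libraries are already available in your environment, a small check like the one below can help. This is only a sketch: the choice of four "starter" libraries is my own assumption, not a rule, and `check_installed` is a hypothetical helper name. It relies on `importlib.util.find_spec` from the standard library, which looks a module up without importing it.

```python
from importlib.util import find_spec

# Assumed starter set; adjust to taste.
# Import names can differ from install names (scikit-learn is imported as sklearn).
STARTER_LIBRARIES = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "Scikit-learn": "sklearn",
}


def check_installed(libraries):
    """Return {display name: True/False} for each top-level module name."""
    return {name: find_spec(module) is not None for name, module in libraries.items()}


for name, installed in check_installed(STARTER_LIBRARIES).items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

Anything reported as missing can usually be installed with `pip install` followed by the package name.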

No normal human can remember all the functions in all the libraries all the time. That is why we have documentation for each library.

That is why the most essential part is how you approach a solution, i.e. what your thinking process is when you try to solve a problem. People have already contributed so much to the community that, as a learner, the best thing to do is to look for those contributions and appreciate them. Hence you should Google things that you don’t know.
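Besides searching online, you can pull up documentation from inside Python itself. The sketch below uses the standard `inspect` module to print a one-line reminder for a function. `quick_reference` is a hypothetical helper written for this post, and `statistics.median` is just a convenient standard-library function to try it on; the same idea applies to most functions in the libraries listed above.

```python
import inspect
import statistics


def quick_reference(func):
    """Return the signature and the first docstring line of a function."""
    signature = str(inspect.signature(func))
    doc = inspect.getdoc(func) or ""
    first_line = doc.splitlines()[0] if doc else "(no docstring)"
    return f"{func.__name__}{signature}: {first_line}"


print(quick_reference(statistics.median))
# For the full text, help(statistics.median) prints the complete docstring.
```

A quick lookup like this is often enough to recall the arguments a function takes, so there is no need to memorize them.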

CONCLUSION

Now you know how much Python you need to be familiar with to get started with data science. So don’t spend hours of extra time trying to learn things that might be irrelevant to you right now. Focus more on understanding the math and concepts behind the algorithms used extensively in the DS & ML industry.

Hamza Mujeeb Khan

My name is Hamza. I love to code and play around with computers. I’m always curious to learn new things in life. I also love expressing myself.