Data Science
The interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
three core disciplines
typical workflow
Python
Python has a simple syntax, a large and mature ecosystem of open-source libraries, and enough versatility to cover both data analysis and production deployment.
main Python library for data cleaning, manipulation, and analysis
pandas. It provides fast, flexible data structures, most notably the DataFrame.
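A minimal sketch of the three activities named on this card (cleaning, manipulation, analysis) using a DataFrame. The city names, temperatures, and column names are invented for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
raw = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Lima"],
    "temp_c": [4.0, None, 6.0, 19.0],
})

# Cleaning: drop rows where the temperature is missing.
clean = raw.dropna(subset=["temp_c"])

# Manipulation: derive a Fahrenheit column from the Celsius one.
clean = clean.assign(temp_f=clean["temp_c"] * 9 / 5 + 32)

# Analysis: mean Celsius temperature per city.
mean_by_city = clean.groupby("city")["temp_c"].mean()
print(mean_by_city)
```

Each step returns a new object rather than modifying `raw`, which keeps the original data available for comparison.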
main Python library for numerical operations and array handling
NumPy (Numerical Python). It provides an N-dimensional array object (ndarray) and tools for working with it, forming the foundation for most other scientific libraries.
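A short sketch of what working with NumPy arrays can look like: element-wise arithmetic, a reduction along one axis, and broadcasting. The numbers are arbitrary placeholders.

```python
import numpy as np

# A 2x3 array of arbitrary example values.
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Element-wise arithmetic applies to every entry without an explicit loop.
doubled = values * 2

# Reduction along axis 0 gives one mean per column.
col_means = values.mean(axis=0)

# Broadcasting: the 1-D column means are subtracted from every row.
centered = values - col_means

print(values.shape, col_means)
```

After centering, each column of `centered` sums to zero, which is a quick way to check that the broadcast lined up along the intended axis.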
main Python library for machine learning and predictive modeling
scikit-learn. It offers a consistent interface across models for tasks such as classification, regression, clustering, and dimensionality reduction.
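One way to see the consistent interface in practice is a small classification run. This sketch assumes the bundled iris dataset and a logistic regression model purely as examples; the split size and random seed are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: features X and class labels y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for evaluation (seed chosen arbitrarily).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# fit / predict / score is the pattern most scikit-learn estimators share.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping `LogisticRegression` for another classifier would usually leave the `fit`, `predict`, and `score` calls unchanged.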
two key Python libraries for data visualization
Matplotlib (the base plotting library) and Seaborn (a library based on Matplotlib that provides a high-level interface for drawing attractive statistical graphics).
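A minimal Matplotlib sketch of a labeled line plot, rendered to an in-memory PNG so that it runs without a display. The data points are arbitrary, and Seaborn is left out here only to keep the example to one dependency.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window required
import matplotlib.pyplot as plt

# Arbitrary example data: the first four squares.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Create a figure with one set of axes and draw a line with point markers.
fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("Squares")

# Render the figure as PNG bytes into a buffer instead of a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Passing a filename to `savefig` instead of the buffer would write the image to disk.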