Define Python.
A high-level, interpreted programming language known for its readability and versatility.
What is Pandas used for?
Pandas is a library for data manipulation and analysis, providing data structures like DataFrames.
True or false: NumPy is primarily used for numerical computing.
TRUE
NumPy provides support for large, multi-dimensional arrays and matrices.
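A minimal sketch of what a multi-dimensional array looks like in practice. The values and variable names are illustrative assumptions, not part of the original card.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns; values are illustrative.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Arithmetic applies element-wise, with no explicit Python loop.
doubled = matrix * 2

# Reductions can run over the whole array or along one axis.
column_sums = matrix.sum(axis=0)  # one sum per column

print(matrix.shape)   # (2, 3)
print(doubled)
print(column_sums)    # [5 7 9]
```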
Fill in the blank: Matplotlib is used for _______ in Python.
data visualization
What does SciPy extend?
SciPy extends NumPy by adding a collection of mathematical algorithms and convenience functions.
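As a rough illustration of the kind of algorithms SciPy layers on top of NumPy arrays, the sketch below integrates a function numerically and solves a small linear system. The function `f` and the matrices are made-up examples.

```python
import numpy as np
from scipy import integrate, linalg


def f(x):
    """Illustrative integrand: f(x) = x**2."""
    return x ** 2


# Numerical integration over [0, 1]; the exact value is 1/3.
area, error_estimate = integrate.quad(f, 0.0, 1.0)

# Solve the linear system A @ x = b, with A and b held in NumPy arrays.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

print(area)
print(solution)
```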
Define DataFrame.
A two-dimensional, size-mutable, potentially heterogeneous tabular data structure in Pandas.
What is the purpose of Jupyter Notebooks?
Jupyter Notebooks support interactive computing by combining live code, its output, visualizations, and explanatory text in a single web-based document.
True or false: Seaborn is built on top of Matplotlib.
TRUE
Seaborn provides a high-level interface for drawing attractive statistical graphics.
What is data wrangling?
Data wrangling is the process of cleaning and transforming raw data into a usable format.
Fill in the blank: Scikit-learn is a library for _______.
machine learning
Define data visualization.
The graphical representation of information and data to communicate insights clearly.
What is the main use of statsmodels?
Statsmodels is used for estimating and testing statistical models in Python.
True or false: PySpark is used for big data processing.
TRUE
PySpark is the Python API for Apache Spark, enabling large-scale data processing.
What does data cleaning involve?
Data cleaning involves detecting and then correcting or removing inaccurate, incomplete, or duplicate records in a dataset.
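One possible cleaning pass in Pandas, sketched on a tiny made-up table. The column names and the rule that a negative age is invalid are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative raw data: one duplicated row, one impossible age, one missing age.
raw = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cid", "Dee"],
    "age": [34, 45, 45, -1, np.nan],
})

# Remove exact duplicate rows.
cleaned = raw.drop_duplicates()

# Assumed rule: a negative age is inaccurate, so mark it as missing.
cleaned = cleaned.assign(age=cleaned["age"].where(cleaned["age"] >= 0))

# Drop the records that have no usable age.
cleaned = cleaned.dropna(subset=["age"]).reset_index(drop=True)

print(cleaned)
```

Whether to drop a bad record or repair it (for example by filling in a typical value) depends on the dataset; this sketch simply drops them.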
Fill in the blank: Plotly is used for _______ in Python.
interactive plotting
Define data analysis.
Data analysis is the process of inspecting, cleansing, and modeling data to discover useful information.
What is the purpose of data exploration?
Data exploration is the initial step in data analysis; its purpose is to summarize a dataset's main characteristics, often with summary statistics and plots, before any modeling.
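A first look at a dataset might resemble the sketch below, which uses a small made-up DataFrame. The column names `city` and `temp_c` are hypothetical.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "temp_c": [4.0, 19.0, 6.0, 31.0],
})

print(df.head())    # first few rows
print(df.shape)     # (rows, columns)

# Summary statistics for a numeric column: count, mean, std, min, quartiles, max.
summary = df["temp_c"].describe()

# Frequency of each category in a text column.
city_counts = df["city"].value_counts()

print(summary)
print(city_counts)
```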
True or false: TensorFlow is primarily used for data visualization.
FALSE
TensorFlow is primarily used for machine learning and deep learning applications.
What is the role of data types in Pandas?
Data types (dtypes) define the kind of data stored in each DataFrame column, which determines the operations available on that column and how much memory it uses.
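To see how a column's data type changes what an operation does, the sketch below adds a column to itself twice: once while the numbers are stored as text, and once after conversion. The data is illustrative.

```python
import pandas as pd

# Numbers that arrived as text, for example from a badly typed file.
price_as_text = pd.Series(["10", "20", "30"])

# On text, "+" concatenates the strings.
doubled_text = (price_as_text + price_as_text).tolist()

# After conversion to a numeric dtype, "+" performs arithmetic.
price_as_number = pd.to_numeric(price_as_text)
doubled_number = (price_as_number + price_as_number).tolist()

print(price_as_text.dtype, doubled_text)
print(price_as_number.dtype, doubled_number)
```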
Fill in the blank: DataFrames can be created from _______.
lists, dictionaries, or external files
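The three sources named in the answer can be sketched as follows. To keep the example self-contained, the "external file" is simulated with an in-memory CSV string, which is an assumption of this sketch; a real file path could be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# From a list of rows, with column names supplied separately.
from_lists = pd.DataFrame([["Ann", 34], ["Bob", 45]], columns=["name", "age"])

# From a dictionary that maps column names to lists of values.
from_dict = pd.DataFrame({"name": ["Ann", "Bob"], "age": [34, 45]})

# From CSV content; io.StringIO stands in for a file on disk.
csv_text = "name,age\nAnn,34\nBob,45\n"
from_file = pd.read_csv(io.StringIO(csv_text))

print(from_lists)
print(from_dict)
print(from_file)
```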
Define time series analysis.
Time series analysis involves statistical techniques to analyze time-ordered data points.
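A few common time series operations in Pandas, sketched on six days of made-up values. The series name `sales` and the window sizes are arbitrary choices for illustration.

```python
import pandas as pd

# Six consecutive daily observations (illustrative values).
dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 12, 11, 15, 14, 18], index=dates)

# Smooth short-term noise with a 3-day rolling mean.
rolling_mean = sales.rolling(window=3).mean()

# Change from one day to the next.
daily_change = sales.diff()

# Resample to 2-day bins and total each bin.
two_day_totals = sales.resample("2D").sum()

print(rolling_mean)
print(daily_change)
print(two_day_totals)
```

The first two rolling values are missing because a 3-day window is not yet full at those points.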
What is the function of groupby in Pandas?
The groupby method splits the data into groups based on one or more keys, so that a function such as an aggregation can be applied to each group separately.
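A small sketch of splitting by a key and then applying a function per group. The `team` and `points` columns are hypothetical.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue", "red"],
    "points": [3, 5, 4, 1, 2],
})

# Split: one group per distinct value in "team".
grouped = df.groupby("team")

# Each group is a sub-DataFrame holding only that team's rows.
for team, rows in grouped:
    print(team, rows["points"].tolist())

# Apply a function to each group and combine the results.
totals = grouped["points"].sum()
print(totals)
```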
True or false: DataFrames can only hold numeric data.
FALSE
DataFrames can hold various data types, including numeric, string, and datetime.
What does data aggregation mean?
Data aggregation is the process of combining multiple data points, possibly from several sources, into summary values such as counts, sums, or means.
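As one way to picture this, the sketch below stacks records from two made-up sources and reduces them to per-product summaries. The table names `store_a` and `store_b` and their columns are assumptions.

```python
import pandas as pd

# Two illustrative sources with the same columns.
store_a = pd.DataFrame({"product": ["pen", "ink"], "units": [10, 4]})
store_b = pd.DataFrame({"product": ["pen", "pad"], "units": [7, 3]})

# Combine the sources into one table.
combined = pd.concat([store_a, store_b], ignore_index=True)

# Summarize: total, average, and number of records per product.
summary = combined.groupby("product")["units"].agg(["sum", "mean", "count"])

print(summary)
```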