Comma-separated values (CSV) / Tab-separated values (TSV)
A commonly used format for storing tabular data as plain text, in which a comma (CSV) or a tab (TSV) separates each value.
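As a minimal sketch of the idea, the same small table can be read from CSV or TSV text with Python's standard `csv` module by changing only the delimiter. The helper name `parse_delimited` and the sample data are illustrative assumptions.

```python
import csv
import io


def parse_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample data: the same table written as CSV and as TSV.
csv_text = "name,score\nAda,90\nLin,85\n"
tsv_text = "name\tscore\nAda\t90\nLin\t85\n"

csv_rows = parse_delimited(csv_text)
tsv_rows = parse_delimited(tsv_text, delimiter="\t")

# Both formats yield the same rows; values are read as strings.
print(csv_rows)
print(tsv_rows)
```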
Data file types
A computer file configuration designed to store data in a specific way.
Data format
How data is encoded so it can be stored within a data file type.
Data visualization
A visual representation of data, such as a graph, that makes trends in the data easier to see.
Delimited text file
A plain text file where a specific character separates the data values.
Extensible Markup Language (XML)
A markup language designed to structure, store, and exchange data between various technologies.
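For illustration, a short XML document can be read into plain Python records with the standard `xml.etree.ElementTree` module. The element names (`students`, `student`, `name`, `score`) and the values are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tag and attribute names are assumptions.
xml_text = """<students>
  <student id="1"><name>Ada</name><score>90</score></student>
  <student id="2"><name>Lin</name><score>85</score></student>
</students>"""

root = ET.fromstring(xml_text)

# Convert each <student> element into a dictionary.
records = [
    {
        "id": student.get("id"),
        "name": student.findtext("name"),
        "score": int(student.findtext("score")),
    }
    for student in root.findall("student")
]

print(records)
```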
Hadoop
An open-source framework designed to store and process large datasets across clusters of computers.
JavaScript Object Notation (JSON)
A data format, compatible with various programming languages, that applications use to exchange structured data.
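A minimal sketch of such an exchange, using Python's standard `json` module: one side encodes a structure as JSON text, and the other side decodes that text back into a structure. The record contents are illustrative only.

```python
import json

# Hypothetical record that one application wants to send to another.
record = {"name": "Ada", "scores": [90, 85], "active": True}

# Sender: encode the structure as JSON text.
encoded = json.dumps(record)

# Receiver: decode the JSON text back into a Python structure.
decoded = json.loads(encoded)

print(encoded)
print(decoded == record)
```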
Jupyter notebooks
A computational environment that allows users to create and share documents containing code, equations, visualizations, and explanatory text. See Python notebooks.
Nearest neighbor
A machine learning algorithm that predicts the target value for a data point based on the most similar data points in the dataset.
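One simple way to sketch this is a single-nearest-neighbor classifier that measures similarity with Euclidean distance. The function name, the training points, and the labels below are assumptions made for illustration; practical implementations often consult several neighbors rather than one.

```python
import math


def nearest_neighbor_predict(training_points, query):
    """Return the label of the training point closest to the query point."""
    _, closest_label = min(
        training_points,
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    return closest_label


# Hypothetical labeled data: (features, label) pairs.
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

prediction = nearest_neighbor_predict(training, (8.5, 8.0))
print(prediction)
```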
Neural networks
A computational model used in deep learning that mimics the structure and functioning of the human brain’s neural pathways. It takes an input, processes it using previous learning, and produces an output.
Pandas
An open-source Python library that provides tools for working with structured data. It is often used for data manipulation and analysis.
Python notebooks
Also known as a “Jupyter” notebook, this computational environment allows users to create and share documents containing code, equations, visualizations, and explanatory text.
R
An open-source programming language used for statistical computing, data analysis, and data visualization.
Recommendation engine
A computer program that analyzes user input, such as behaviors or preferences, and makes personalized recommendations based on that analysis.
Regression
A statistical model that shows the relationship between one or more predictor variables and a response variable.
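As one concrete instance, simple linear regression with a single predictor can be fitted by ordinary least squares. The sketch below uses only plain Python; the function name `fit_line` and the data points are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: predictor values and the matching response values.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Here the fitted slope describes how much the response changes for each one-unit increase in the predictor.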
Tabular data
Data that is organized into rows and columns.
XLSX
The Microsoft Excel spreadsheet file format.