L1: What does a Data Scientist do? Flashcards

(18 cards)

1
Q

Comma-separated values (CSV) / Tab-separated values (TSV)

A

A commonly used format for storing tabular data as plain text, with each value separated by a comma (CSV) or a tab (TSV).
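
As a minimal sketch of the definition above, the same small table can be read from CSV and TSV text with Python's standard `csv` module. The sample data and the helper name `read_rows` are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical sample data: the same table written as CSV and as TSV.
csv_text = "name,age\nAda,36\nLinus,28\n"
tsv_text = "name\tage\nAda\t36\nLinus\t28\n"

def read_rows(text, delimiter):
    """Parse delimited plain text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

csv_rows = read_rows(csv_text, ",")
tsv_rows = read_rows(tsv_text, "\t")

print(csv_rows)
print(csv_rows == tsv_rows)  # only the separator differs, not the data
```

Every value comes back as a string; converting `age` to a number is left to the caller.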

2
Q

Data file types

A

A computer file configuration designed to store data in a specific way.

3
Q

Data format

A

How data is encoded so it can be stored within a data file type.

4
Q

Data visualization

A

A visual representation of data, such as a graph, that presents it in a readily understandable way and makes trends in the data easier to see.

5
Q

Delimited text file

A

A plain text file where a specific character separates the data values.

6
Q

Extensible Markup Language (XML)

A

A language designed to structure, store, and enable data exchange between various technologies.
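
To illustrate how XML structures data for exchange, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The `<people>` document and its element names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document describing two records.
xml_text = """
<people>
  <person id="1"><name>Ada</name><age>36</age></person>
  <person id="2"><name>Linus</name><age>28</age></person>
</people>
""".strip()

root = ET.fromstring(xml_text)

# Turn each <person> element into a plain dictionary.
people = [
    {
        "id": person.get("id"),               # attribute value (a string)
        "name": person.findtext("name"),      # text of the <name> child
        "age": int(person.findtext("age")),   # text of <age>, converted to int
    }
    for person in root.findall("person")
]

print(people)
```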

7
Q

Hadoop

A

An open-source framework designed to store and process large datasets across clusters of computers.

8
Q

JavaScript Object Notation (JSON)

A

A data format, compatible with many programming languages, that applications use to exchange structured data.
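
A rough sketch of such an exchange, using Python's standard `json` module: one side serializes a record to a JSON string and the other parses it back. The `record` contents are hypothetical.

```python
import json

# Hypothetical record that one application wants to send to another.
record = {"name": "Ada", "age": 36, "languages": ["Python", "R"]}

payload = json.dumps(record)    # sender: serialize to a JSON string
received = json.loads(payload)  # receiver: parse the string back into objects

print(payload)
print(received == record)
```

The payload is plain text, so the receiving application does not need to be written in Python.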

9
Q

Jupyter notebooks

A

A computational environment that allows users to create and share documents containing code, equations, visualizations, and explanatory text. See Python notebooks.

10
Q

Nearest neighbor

A

A machine learning algorithm that predicts the target variable for a data point based on its similarity to other data points in the dataset.
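
One simple reading of this idea is the 1-nearest-neighbor rule: give a new point the label of the closest known point. The sketch below assumes Euclidean distance as the similarity measure and uses made-up training data; practical versions usually consider several neighbors and scale the features first.

```python
import math

# Hypothetical labelled points: (height_cm, weight_kg) -> label.
training = [
    ((150.0, 50.0), "small"),
    ((160.0, 60.0), "small"),
    ((180.0, 85.0), "large"),
    ((190.0, 95.0), "large"),
]

def nearest_neighbor(query, data):
    """Return the label of the training point closest to `query`."""
    _, label = min(data, key=lambda item: math.dist(query, item[0]))
    return label

prediction = nearest_neighbor((175.0, 80.0), training)
print(prediction)
```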

11
Q

Neural networks

A

A computational model used in deep learning that mimics the structure and functioning of the human brain’s neural pathways. It takes an input, processes it using previous learning, and produces an output.

12
Q

Pandas

A

An open-source Python library that provides tools for working with structured data; it is often used for data manipulation and analysis.

13
Q

Python notebooks

A

Also known as a “Jupyter” notebook, this computational environment allows users to create and share documents containing code, equations, visualizations, and explanatory text.

14
Q

R

A

An open-source programming language used for statistical computing, data analysis, and data visualization.

15
Q

Recommendation engine

A

A computer program that analyzes user input, such as behaviors or preferences, and makes personalized recommendations based on that analysis.

16
Q

Regression

A

A statistical model that shows the relationship between one or more predictor variables and a response variable.
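
As one concrete instance, simple linear regression fits a straight line relating a single predictor to the response by ordinary least squares. The sketch below is illustrative only: the function name `fit_line` and the data are hypothetical, and the points were chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Here `xs` is the predictor variable and `ys` the response; with real, noisy data the fitted line only approximates the points.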

17
Q

Tabular data

A

Data that is organized into rows and columns.

18
Q

XLSX

A

The Microsoft Excel spreadsheet file format.