Tableau Flashcards

(201 cards)

1
Q

Data Visualization

A

the graphic representation and presentation of data

2
Q

McCandless Method, 4 Elements of Good Data Visualization

A

1) Information: The data with which you’re working
2) Story: a clear and compelling narrative or concept
3) Goal: a specific objective or function for the visual
4) Visual Form: An effective use of metaphor or visual expression

3
Q

Kaiser Fung’s Junk Chart Trifecta Checkup

A

A well-designed visual answers all three of the following questions at once (i.e., same answer):

1) What is the practical question?
2) What does the data say?
3) What does the visual say?

4
Q

Marks

A

Basic visual objects such as points, lines, and shapes. Every mark can be broken down into 4 qualities:

1) Position: Where is a specific mark in space relative to a scale or to other marks? (e.g., if you look at two trends, position allows you to compare the pattern of one element to another.)

2) Size: How big, small, long, or tall is the mark? Comparison of object size can be an easy visual interpretation, but problems arise when the human eye inadvertently interprets some objects as appearing to be the same size when they’re not. Controlling the scale of a visual is important even when comparative sizes are not intended to offer info.

3) Shape: Does the shape of a specific object communicate something about it? Rather than using dots or lines, a bit of creativity can enhance how quickly people are able to interpret a visual by using shapes that align with a given application (e.g., instead of dots use person-shaped figures).

4) Color: What color is a mark? Colors can be used either as a simple differentiator of groupings or as a way to communicate other concepts such as profitable versus unprofitable or hot versus cold.

5
Q

Channels

A

Visual aspects or variables that represent characteristics in data (i.e., specialized marks that have been used to visualize data).

1) Accuracy: Are the channels helpful in accurately estimating the values being represented? (e.g., color works well when communicating categorical differences like apples and oranges, but it is less effective when distinguishing quantitative data such as 5 from 5.5.)

2) Popout: How easy is it to distinguish certain values from others?

There are many ways to draw attention to specific parts of a visual (e.g., line length, size, line width, shape, enclosure, hue, and intensity).

3) Grouping: How effective is a channel at communicating groups that exist in data?
Consider the proximity, similarity, enclosure, connectedness, and continuity of the channel.

Remember: The more you emphasize one single thing, the more that counts. Emphasis diminishes with each item you emphasize because the items begin to compete with one another.

6
Q

Bar Graph

A

Use size contrast to compare two or more values

x-axis: horizontal line
y-axis: vertical line

7
Q

Line Graph

A

Help your audience understand shifts or changes in your data

Often used to track changes over time.

8
Q

Pie Chart

A

Shows how much each part of something makes up the whole.

9
Q

Maps

A

Help organize data geographically.

10
Q

x-axis vs y-axis

A

x-axis: Horizontal line used to represent categories, time periods, or other variables.

y-axis: Vertical line that usually has a scale of values for variables.

11
Q

Name types of data visualizations

A

(https://datavizcatalogue.com/#google_vignette)

12
Q

Histogram

A

A chart that shows how often data values fall into certain ranges.
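
The binning behind a histogram can be sketched in a few lines of Python. This is only an illustration: the `histogram_counts` helper, the sample ages, and the bin width of 10 are all assumptions made up for the example, not part of the card.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each range [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical sample: ages of survey respondents.
ages = [21, 25, 29, 31, 34, 35, 38, 42, 47, 58]
bins = histogram_counts(ages, bin_width=10)

# Draw a rough text histogram, one row per range.
for start, count in bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Each row is one range (bin), and the bar length is the number of values that fell into it.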

13
Q

Correlation charts

A

Show relationship among data.

But use these with caution as they can cause viewers to think they show causation.

14
Q

Correlation

Negative Correlation

Positive Correlation

No Correlation

A

Correlation in statistics is the measure of the degree to which two variables move in relationship to each other.

An example of correlation is the idea that “As the temperature goes up, ice cream sales also go up.”

It is important to remember that correlation doesn’t mean that one event causes another. But, it does indicate that they have a pattern with or a relationship to each other.

Negative Correlation: If one variable goes up and the other variable goes down, it is a negative or inverse correlation.

Positive Correlation: If one variable goes up and the other variable also goes up, it is a positive correlation.

No Correlation: If one variable goes up and the other variable stays about the same, there is no correlation.
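
One common way to put a number on these three cases is the Pearson correlation coefficient, which ranges from -1 (negative) through 0 (none) to +1 (positive). The sketch below is illustrative only: the `pearson_r` helper and every data value are invented for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily observations.
temperature = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [110, 135, 160, 190, 220, 260]  # rises with temperature
hot_cocoa_sales = [90, 74, 61, 45, 30, 12]        # falls as temperature rises
bus_tickets = [50, 52, 49, 49, 52, 50]            # stays about the same

positive = pearson_r(temperature, ice_cream_sales)
negative = pearson_r(temperature, hot_cocoa_sales)
none = pearson_r(temperature, bus_tickets)
print(f"positive: {positive:.2f}, negative: {negative:.2f}, none: {none:.2f}")
```

A coefficient near +1 or -1 still says nothing about which variable, if either, causes the other.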

15
Q

dangers of assuming a causal relationship

A

When you make conclusions from data analysis, you need to make sure that you don’t assume a causal relationship between elements of your data when there is only a correlation.

Examples:

Cause of disease
For example, pellagra is a disease with symptoms of dizziness, sores, vomiting, and diarrhea. In the early 1900s, people thought that the disease was caused by unsanitary living conditions. Most people who got pellagra also lived in unsanitary environments. But, a closer examination of the data showed that pellagra was the result of a lack of niacin (Vitamin B3). Unsanitary conditions were related to pellagra because most people who couldn’t afford to purchase niacin-rich foods also couldn’t afford to live in more sanitary conditions. But, dirty living conditions turned out to be a correlation only.

Distribution of aid
Here is another example. Suppose you are working for a government agency that provides SNAP benefits. You noticed from the agency’s Google Analytics that people who qualify for the benefits are browsing the official website, but they are leaving the site without signing up for benefits. You think that the people visiting the site are leaving because they aren’t finding the information they need to sign up for SNAP benefits. Google Analytics can help you find clues (correlations), like the same people coming back many times or how quickly people leave the page. One of those correlations might lead you to the actual cause, but you will need to collect additional data, like in a survey, to know exactly why people coming to the site aren’t signing up for SNAP benefits. Only then can you figure out how to increase the sign-up rate.

16
Q

To avoid attributing correlation to causation, always:

A

Critically analyze any correlations that you find

Examine the data’s context to determine if a causation makes sense (and can be supported by all of the data)

Understand the limitations of the tools that you use for analysis

17
Q

Reverse causality (error)

A

Is Y causing X rather than X causing Y?

18
Q

Sample selection error

A

Who is missing?

19
Q

Measurement error

A

How easy is it to measure X and Y?

20
Q

Omitted variables (error)

A

Are we forgetting about any variables Z that affect both X and Y?
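
A small simulation can show how an omitted variable Z produces a correlation between X and Y even though neither affects the other. Everything here is an assumption chosen for illustration: the noise levels, the sample size, and the shortcut of subtracting Z directly (possible only because the simulation knows exactly how Z feeds into X and Y).

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the run is repeatable

# Z drives both X and Y; X and Y never influence each other.
z = [random.gauss(0, 1) for _ in range(2000)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

raw = pearson_r(x, y)  # looks like a strong X-Y relationship

# Remove Z's contribution from each variable and check again.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
controlled = pearson_r(x_without_z, y_without_z)

print(f"ignoring Z: {raw:.2f}, accounting for Z: {controlled:.2f}")
```

Once Z is accounted for, the apparent relationship between X and Y largely disappears, which is the pattern described in the pellagra example.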

21
Q

Static visualizations vs Dynamic visualizations

A

static visualizations: do not change over time unless they’re edited.
Useful when you want to control your data and the data story.

dynamic visualizations: interactive or change over time.
Helpful if stakeholders want to be able to adjust what they’re able to view.

22
Q

Line Chart

A

used to track changes over short and long periods of time. When smaller changes exist, line charts are better to use than bar graphs. Line charts can also be used to compare changes over the same period of time for more than one group.

23
Q

Column charts

A

use size to contrast and compare two or more values, using height or lengths to represent the specific values.

24
Q

Heatmap

A

use color to compare categories in a data set. They are mainly used to show relationships between two variables and use a system of color-coding to represent different values.

25
pie chart
is a circular graph that is divided into segments representing proportions corresponding to the quantity it represents, especially when dealing with parts of a whole.
26
Scatterplot
show relationships between different variables. Scatterplots are typically used for two variables for a set of data, although additional variables can be displayed.
27
distribution graph
displays the spread of various outcomes in a dataset.
28
Change
This is a trend or instance of observations that become different over time. A great way to measure change in data is through a line or column chart.
29
Clustering
A collection of data points with similar or different values. This is best represented through a distribution graph.
30
Relativity
These are observations considered in relation or in proportion to something else. You have probably seen examples of relativity data in a pie chart.
31
Ranking
This is a position in a scale of achievement or status. Data that requires ranking is best represented by a column chart.
32
Correlation
This shows a mutual relationship or connection between two or more things. A scatterplot is an excellent way to represent this type of data pattern.
33
What type of data visual should you use if... your data only has one numeric variable?
Histogram or Density Plot
34
What type of data visual should you use if... there are multiple datasets?
Line chart or pie chart
35
What type of data visual should you use if... you are measuring changes over time?
Bar chart
36
What type of data visual should you use if... relationships between the data need to be shown?
Scatterplot or heatmap
37
Data Composition
Combining the individual parts in a visualization and displaying them together as a whole.
38
Design Thinking
A process used to solve complex problems in a user-centric way.
39
5 Phases of Design Thinking for Data Visualization
Empathize: Thinking about the emotions and needs of the target audience for the data visualization
Define: Figuring out exactly what your audience needs from the data
Ideate: Generating ideas for data visualization
Prototype: Putting visualizations together for testing and feedback
Test: Showing prototype visualizations to people before stakeholders see them
40
Alternative Text
Text that provides an alternative to non-text content, such as images and videos
41
Distribution graph
A data visualization that displays the frequency of various outcomes in a sample
42
Dimensions
Dimensions contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal details in your data.
43
Measures
Measures contain numeric, quantitative values that you can measure. Measures can be aggregated. When you drag a measure into the view, Tableau applies an aggregation to that measure (by default).
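
The dimension/measure split can be pictured as a group-by: the dimension supplies the categories, and the measure is aggregated within each category (SUM is Tableau's usual default). The sketch below is plain Python, not Tableau; the `aggregate` helper and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: "region" is a dimension, "sales" is a measure.
rows = [
    {"region": "East", "sales": 120.0},
    {"region": "West", "sales": 80.0},
    {"region": "East", "sales": 30.0},
    {"region": "West", "sales": 45.5},
]

def aggregate(rows, dimension, measure):
    """Sum a measure for each distinct value of a dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

sales_by_region = aggregate(rows, "region", "sales")
print(sales_by_region)
```

Swapping the dimension changes how the data is segmented; swapping the aggregation (for example, average instead of sum) changes what each mark represents.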
44
Diverging Color Palette
Displays two ranges of values using color intensity to show the magnitude of the number and the actual color to show which range the number is from.
45
Optimize the data-ink ratio
Optimizing the data-ink ratio entails focusing on the part of the visual that is essential to understanding the point of the chart. Try to minimize non-data ink, like boxes around legends or shadows.
46
Common errors that should be avoided so that visualizations aren't misleading
1) Cutting off the y-axis: Changing the scale on the y-axis can make the differences between groups in your data seem more dramatic, even if the difference is actually quite small.
2) Misleading use of a dual y-axis: Using a dual y-axis without clearly labeling it in your data visualization can create extremely misleading charts.
3) Artificially limiting the scope of the data: If you only consider the part of the data that confirms your analysis, your visualizations will be misleading because they don’t take all of the data into account.
4) Problematic choices in how data is binned or grouped: It is important to make sure that the way you are grouping data isn’t misleading or misrepresenting your data and disguising important trends and insights.
5) Using part-to-whole visuals when the totals do not sum up appropriately: If you are using a part-to-whole visual like a pie chart to explain your data, the individual parts should add up to equal 100%. If they don’t, your data visualization will be misleading.
6) Hiding trends in cumulative charts: Creating a cumulative chart can disguise more insightful trends by making the scale of the visualization too large to track any changes over time.
7) Artificially smoothing trends: Adding smooth trend lines between points in a scatter plot can make it easier to read that plot, but replacing the points with just the line can make it appear that the points are more connected over time than they actually were.
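
Two of these errors are easy to quantify. The sketch below, with made-up numbers and hypothetical helper names (`apparent_ratio`, `parts_sum_to_whole`), shows how a cut-off y-axis inflates the visual difference between two bars, and how to check that part-to-whole values add up to 100%.

```python
def apparent_ratio(smaller, larger, axis_min=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at axis_min."""
    return (larger - axis_min) / (smaller - axis_min)

def parts_sum_to_whole(percentages, tolerance=0.5):
    """True if the parts add up to 100% (within a rounding tolerance)."""
    return abs(sum(percentages) - 100.0) <= tolerance

# Hypothetical values for two groups that differ by only 4%.
group_a, group_b = 50.0, 52.0

honest = apparent_ratio(group_a, group_b)                    # axis starts at 0
truncated = apparent_ratio(group_a, group_b, axis_min=48.0)  # axis starts at 48

print(f"axis from 0:  bar B looks {honest:.2f}x as tall as bar A")
print(f"axis from 48: bar B looks {truncated:.2f}x as tall as bar A")
print(parts_sum_to_whole([40, 35, 25]), parts_sum_to_whole([40, 35, 30]))
```

With the axis starting at 48, a 4% difference is drawn as a bar twice as tall, which is the distortion item 1 warns about.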
47
Top Rules to Make a Helpful Data Visualization
Five-second rule: A data visualization should be clear, effective, and convincing enough to be absorbed in five seconds or less.
Color contrast: Graphs and charts should use a diverging color palette to show contrast between elements.
Conventions and expectations: Visuals and their organization should align with audience expectations and cultural conventions. For example, if the majority of your audience associates green with a positive concept and red with a negative one, your visualization should reflect this.
Minimal labels: Titles, axes, and annotations should use as few labels as it takes to make sense. Having too many labels makes your graph or chart too busy. It takes up too much space and prevents the labels from being shown clearly.
48
Area chart
A data visualization that uses individual data points for a changing variable connected by a continuous line with a filled in area underneath
49
Box plot
A data visualization that displays the distribution of values along an x-axis
50
Bubble chart
A data visualization that displays individual data points as bubbles, comparing numeric values by their relative size
51
Bullet graph
A data visualization that displays data as a horizontal bar chart moving toward a desired value
52
Circle view
A data visualization that shows comparative strength in data
53
Column chart
A data visualization that uses individual data points for a changing variable, represented as vertical columns
54
Combo chart
A data visualization that combines more than one visualization type
55
Density map
A data visualization that represents concentrations, with color representing the number or frequency of data points in a given area on a map
56
Distribution graph
A data visualization that displays the frequency of various outcomes in a sample
57
Diverging color palette
A color theme that displays two ranges of data values using two different hues, with color intensity representing the magnitude of the values
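
As a rough sketch of the idea, the function below maps a value to one of two hues depending on which side of a midpoint it falls, and to an intensity based on its distance from that midpoint. The hue names, the midpoint of zero, and the `diverging_color` helper are assumptions for illustration; real tools use continuous color ramps.

```python
def diverging_color(value, midpoint, max_distance):
    """Return (hue, intensity) for a value on a two-hue diverging scale.

    The hue says which range the value belongs to; the intensity
    (0.0 to 1.0) says how far it is from the midpoint.
    """
    intensity = min(abs(value - midpoint) / max_distance, 1.0)
    if value < midpoint:
        hue = "orange"
    elif value > midpoint:
        hue = "blue"
    else:
        hue = "neutral"
    return hue, round(intensity, 2)

# Hypothetical profit values, with losses below a midpoint of zero.
for profit in (-500, 0, 250, 2000):
    print(profit, diverging_color(profit, midpoint=0, max_distance=1000))
```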
58
Dashboard
A tool that organizes information from multiple datasets into one central location for tracking, analysis, and simple visualization.
59
Data Storytelling
Communicating the meaning of a dataset with visuals and a narrative that are customized for each particular audience.
60
Rule of Thumb for Presentations
Keep text to less than 5 lines & 25 words per slide.
61
Best Practices for Presentations
Use an attention-grabbing opening
Start with broad ideas and later talk about specific details
Speak in short sentences
Pause for five seconds after showing a data visualization
Pause intentionally at certain points
Keep the pitch of your voice level
Stand still and move with purpose
Maintain good posture
Look at your audience (or camera) while speaking
Keep your message concise
End by explaining why the data analysis matters
62
Best Practices for Slide Decks
Include a good title and subtitle that describe what you’re about to present
Include the date of your presentation or the date when your slideshow was last updated
Use a font size that lets the audience easily read your slides
Showcase what business metrics you used
Include effective visuals (like charts and graphs)
63
Evaluate your slide deck prior to presentation. Evaluation Criteria:
Include a title, subtitle, and date: Making sure that your slide deck presentation has a title, subtitle, and date makes sure that your audience knows exactly what you are presenting and when the information was from. That way they know it’s relevant and current to them!
Use a logical sequence of slides: Organizing your slides in an order that makes sense guides your audience through your narrative, building understanding step by step.
Provide an agenda with a timeline: An agenda offers a roadmap of your presentation, allowing your audience to follow along and anticipate key topics.
Limit the amount of text on slides: Keeping text brief ensures clarity and retains the audience’s attention; aim for your audience to scan it within 5 seconds.
Start with the business task: By immediately relating the content to the business task at hand, you contextualize your information, making it relevant and actionable.
Establish the initial hypothesis: Presenting an initial hypothesis gives your audience a starting point for what to expect and frames the subsequent analysis.
Show what business metrics you used: Clarifying which metrics you're analyzing validates your arguments and helps the audience gauge your presentation's relevance to business outcomes.
Use visualizations: Visual aids can illustrate complex data more effectively than text alone, making your message more accessible.
Introduce the graphic by name: A brief introduction to each graphic aids in understanding and retaining information.
Provide a title for each graph: Titles act as signposts, helping the audience quickly grasp the meaning of each visual.
Go from the general to the specific: Starting with a broad overview before diving into details ensures that all audience members are on the same page.
Use speaker notes to help you remember talking points: Notes act as your cue cards, enabling a smoother delivery and ensuring no critical point is missed.
Include key takeaways: Summarizing the main points at the end of your presentation reinforces the message and ensures the audience leaves with the intended takeaways.
64
Kindergarten simple
Don't include too much data.
Make your presentation fun: Use games, quizzes, and videos, and ask the audience questions (don't just monologue at them).
Storytelling: If you tell a good story, you can get your audience to connect.
Get an ally in the room: Find one or two people in the room and present to them beforehand. That way, you have people nodding along, and they can have your back if people try to poke holes in your thesis.
65
5-Second Rule for Presentations
1) Wait 5 seconds after showing a data visual (to give your audience a chance to process it)
2) Ask if they understand (if not, explain it)
3) Give your audience another 5 seconds (to let it sink in)
4) Tell them the conclusion
*Remember: This will be the first time some of the people in the audience encounter your data.
66
Presentation Checklist
Do I use an attention-grabbing opening? Do I start with broad ideas and later talk about specific details? Do I speak in short sentences? Do I pause for five seconds after showing a data visualization? Do I pause intentionally at certain points? Do I keep the pitch of my sentences level? Do I stand still and move with purpose? Do I have good posture? Do I look at my audience (or camera) while speaking? Do I keep my message concise? Do I end by explaining to my audience why the data analysis matters? You can also add checklist items that help you refine your slide deck: Do I include a good title and subtitle that describes what I’m about to present? Do I include the date of my presentation or the date when my slideshow was last updated? Does my font size let the audience easily read my slides? Do I showcase what business metrics I used? Do I include effective visuals (like charts and graphs)?
67
How to speak and hold yourself during presentations
Keep your sentences short
Build in intentional pauses
Keep the pitch of your sentences level
Stay still and move with purpose
Good posture
Positive eye contact with audience
Improve with every performance! Seek out feedback and keep on going.
68
Types of Presentation Objections (& how to handle them): 1) Objections about the data
Where did you get the data?
What systems did the data come from?
What transformations happened to the data?
How fresh and accurate is the data?
*You can include all of this information at the beginning of your presentation to set up the data context. You can add a more detailed breakdown in your appendix in case there are more questions.
69
Types of Presentation Objections (& how to handle them): 2) Objections to your analysis
Is your analysis reproducible?
-Keep a change log documenting the steps you took. This way, someone else can follow along and reproduce the process.
-You can even create a slide in the appendix section of your presentation explaining these steps if you think it will be necessary.
-It can be useful to keep a clean version of your script if you're working with SQL or R.
Who did you get feedback from during this process?
-It is especially important that you're able to address this question when your analysis reveals insights that are the opposite of your audience's gut feelings about the data.
70
Types of Presentation Objections (& how to handle them): 3) Objections to your findings
Do these findings exist in previous time periods?
Did you control for the differences in your data?
-Your audience wants to be sure that you accounted for any possible inconsistencies and that your results are accurate and useful.
71
How to Respond to Objections to Your Presentation
Communicate any assumptions about the data, your analysis, or your findings that may help answer the questions posed
--e.g., note that your team cleaned and formatted the data before analysis
Explain why your analysis may be different than expected
--Walk your audience through the variables that change the outcomes to help them understand how you got there.
If an objection is valid, acknowledge that it is valid and take steps to investigate further. (You can follow up with more details afterward.)
72
business task
A question or problem you use data to solve. The presentation demonstrates how to solve it.
73
How to respond to presentation questions
-Listen to the whole question
-Repeat the question (if necessary to ensure you are understanding them correctly)
-Understand the context
-Involve the whole audience
--Others may want to hear your answer to a question as well.
--If there's someone in the audience or on your team who may have insight, you can ask them for their thoughts.
-Keep your responses short and to the point
--Start with a headline response that gives your stakeholders the basic answer. Then if they have more questions, you can go into more detail. (So: Answer the question directly with as few words as possible. From there you can expand on your answer.)
Example:
Q: How was the data gathered?
A: A survey was taken to measure an individual's happiness.
Q: Can you tell us more about the survey?
A: (Now that additional interest has been expressed about the survey, you can go into more detail.)
74
After you finish a presentation, an audience member asks for additional information. As you are providing a thorough response, the rest of the audience seems disinterested. How can you re-engage them?
Ask a question to the audience and/or Redirect to a new question
75
After a presentation, you are asked a complex question during the Q&A session. It will require a few hours of research for you to answer effectively. How should you proceed?
Ask for some time to find the answer & follow up promptly after the Q&A session.
76
Data blending
A Tableau method that combines data from multiple data sources
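
Conceptually, blending queries each source separately, aggregates the secondary source to the level of a shared linking field, and then attaches the result to the primary source's rows, much like a left join. The Python below is only a sketch of that idea, not Tableau's implementation; the `blend` helper, the field names, and the data are all hypothetical.

```python
from collections import defaultdict

# Primary source (hypothetical): sales by region, e.g. from a database.
sales = [
    {"region": "East", "sales": 150},
    {"region": "West", "sales": 125},
    {"region": "North", "sales": 60},
]

# Secondary source (hypothetical): quarterly targets, e.g. from a spreadsheet.
targets = [
    {"region": "East", "quarter": "Q1", "target": 70},
    {"region": "East", "quarter": "Q2", "target": 90},
    {"region": "West", "quarter": "Q1", "target": 100},
]

def blend(primary, secondary, link, measure):
    """Aggregate the secondary source by the linking field, then attach the
    result to each primary row. Primary rows with no match get None."""
    totals = defaultdict(int)
    for row in secondary:
        totals[row[link]] += row[measure]
    return [{**row, measure: totals.get(row[link])} for row in primary]

blended = blend(sales, targets, link="region", measure="target")
for row in blended:
    print(row)
```

Every primary row survives the blend, and regions missing from the secondary source simply show no target.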
77
Framework
The context a presentation needs to create logical connections that tie back to the business task and metrics
78
Statistics
The study of how to collect, analyze, summarize, and present data
79
The second slide of a presentation should state the analysis project’s _____, including its objective and why it is important to the business.
purpose
80
You give a presentation about technical operations management concepts to an audience that knows very little about the subject. Which McCandless Method concept helps ensure your audience does not become distracted by something they don't understand?
Answer obvious questions before they're asked
81
When introducing a chart, you should____
Ask "Are there any questions about this chart?"
82
Internet of Things (IoT)
connecting any device/anything to the internet in order to exchange data.
83
Industry 1.0 Industry 2.0 Industry 3.0 Industry 4.0
Industry 1.0: Mechanization, steam power, weaving loom
Industry 2.0: Mass production, assembly line, electrical energy
Industry 3.0: Automation, computers, and electronics
Industry 4.0: Cyber physical systems, internet of things, networks
84
Big Data
Huge volume, high speed, different types.
Goal: Efficiently store, process, and analyze data to produce significant value for the business.
85
Data Architecture
Process of creating a blueprint for how to organize, process, and store data.
86
Data Engineering
Design and build data pipelines and data storage. ETL processes: Extract, transform, and load data to the target storage.
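
A toy ETL pipeline, using only Python's standard library, might look like the sketch below. The CSV content, the `orders` table, and the cleaning rule (drop rows with a missing amount) are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace and one missing amount.
RAW_CSV = "order_id,amount\n1, 19.99\n2,\n3,5.00\n"

def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, convert types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if amount:
            cleaned.append((int(row["order_id"]), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows to the target storage."""
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for the target
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT order_id, amount FROM orders ORDER BY order_id").fetchall()
print(loaded)
```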
87
Data Modeling
Put data into entities and objects and then describe the relationship between those entities to help us and the programs understand how the data points are related to each other.
88
Data Mining
The process of analyzing massive amounts of raw data in order to discover patterns, trends, etc. to solve business problems and mitigate risk.
89
Machine Learning
Provide computers with the raw data along with the mathematical models and algorithms. Then the computer will train and process in order to perform tasks like predictions and insights.
90
data science
scientific study of data. Uses programming knowledge, mathematics, and domain specific knowledge to uncover valuable insights from raw data.
91
data visualization
Convert numbers and raw data into visuals and charts.
92
Row-Level Security (RLS)
Restricting the rows of data a specific user can see, based on defined policies.
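
The idea can be sketched as a filter applied on behalf of each user before any rows are returned. This is a plain-Python illustration, not how any particular BI tool implements RLS; the users, the region-based policy, and the `visible_rows` helper are hypothetical.

```python
# Hypothetical data and policies: each user may see only certain regions.
ROWS = [
    {"region": "East", "customer": "Acme", "sales": 100},
    {"region": "West", "customer": "Birch", "sales": 80},
    {"region": "East", "customer": "Cobalt", "sales": 40},
]
POLICIES = {
    "alice": {"East"},
    "bob": {"East", "West"},
}

def visible_rows(user, rows, policies):
    """Return only the rows the user's policy allows; unknown users see nothing."""
    allowed = policies.get(user, set())
    return [row for row in rows if row["region"] in allowed]

for user in ("alice", "bob", "carol"):
    print(user, [row["customer"] for row in visible_rows(user, ROWS, POLICIES)])
```

Defaulting to an empty set means a user without a policy sees no rows, which is the safer failure mode.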
93
Top benefits of BI Tools (e.g., Tableau or Power BI) over Excel
Automation
Security
Big Data functionality
Interactivity
Note: You can do complex calculations in Excel and then import the final results into Tableau to produce better visualizations and gain better insights regarding the results.
94
What type of tool is Tableau?
-Data Visualization Tool
-Business Intelligence (BI) Tool
-Reporting Tool
95
tableau product suite
Developer tools, data engineering:
-Prep
Developer tools, data visualization:
-Desktop
-Public
Sharing tools:
-Server
-Cloud
-Public
-Reader
-Mobile
96
tableau development process
1) Connect Data to Tableau
2) Build Visualizations
3) Share our Work via Publication
*To do the above steps, use either Tableau Desktop or Tableau Public.
But often the data is bad (not cleaned) and needs to be processed. In this case, you need to add a step:
1) Connect Data to Tableau
2) Prepare the Data
3) Build Visualizations
4) Share our Work via Publication
*To prepare the data, use Tableau Prep.
97
tableau desktop
You connect data; build views, dashboards, and stories; and publish workbooks and data sources. You can publish to tableau server, tableau cloud, tableau public, or locally.
98
tableau public desktop
Free version of Tableau Desktop. You connect data; build views, dashboards, and stories; and publish workbooks and data sources.
4 Main Limitations:
-10 data connectors (only local files)
-Limited to 15M rows
-Publishes only to Tableau Public (cloud)
-You can't save locally.
BUT all functions and tools needed to build visuals and dashboards are available.
Cons:
-Not secure, as you have to publish your work to public platforms.
-Made for everyone (vs. Desktop, which is made for data scientists and analysts, and Prep, which is made for data engineers)
-You can't connect to a server, API, DB, or cloud
99
tableau prep
Use it to prepare data before analysis. Once you connect Tableau to your data, you can build data flows. You'll then have access to tools and functions to transform your data (e.g., filter data, aggregate data, etc.) and prepare it for data visuals. You can then save the data to a local PC, publish it as a data source in Tableau Server or Cloud, or write the output to a database. When you're done with the data flow, you can publish it online to Tableau Server or Tableau Cloud.
SO: Tableau Prep is a developer tool for data engineering. You can use it to connect data, build flows (clean, combine, aggregate, etc.), and publish flows and data sources.
It requires a license.
90 data connectors
Output:
-file (stored locally)
-Tableau data source
-database table
Publish the data flow in either:
-Tableau Server or
-Tableau Cloud
100
Infrastructure as a Service (IaaS)
You may choose to outsource the hardware (i.e., you buy a service from cloud providers like Microsoft Azure, Amazon AWS, or Google Cloud.) The hardware includes servers (CPU, memory), Storage (HDD, SSD), network (internet, routers). So, they manage the hardware, & you manage all software and projects.
101
Software as a Service (SaaS)
Outsource the Hardware & Software. Example: Each time Tableau makes a new release, a new version of a Tableau Server has to be installed. If you have a small IT team, they may not have time to do that. So, you need to outsource the software. Tableau Cloud can manage the hardware & the software. This is called Software as a Service (SaaS)
102
Pros for Tableau Public
You can discover visuals and download the Tableau workbook if you want to know how a visual was made. You can follow creators. You can use it to create a Tableau portfolio. But remember: limited security features.
103
Tableau Mobile
Only use it to view visuals; you can't use it to build visuals.
Free app, but it requires a license to use.
It can connect to Tableau Server & Tableau Cloud.
Caches dashboards for offline access (so you can access visuals even if you are offline).
103
Tableau Reader
Sharing tool for data visualization. Free tool. can only be used to view visualizations. You can't build visuals, refresh data, or keep the data secure. Do not use it in an organization.
104
Tableau Live vs Tableau Extract
Tableau Live: Pulls directly from the source (DB). This takes longer--and thus affects performance--but the data is fresh. Tableau Extract: Pulls data from the "extract"--a pre-pulled source from the database--and is therefore faster (i.e., has better performance), but the data is not as fresh.
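
The trade-off can be illustrated with a toy model: a live query always reads the current source, while an extract reads a snapshot that stays unchanged until it is refreshed. The `Extract` class and the sample table below are hypothetical stand-ins, not Tableau objects.

```python
# Hypothetical source table standing in for a database.
source_table = [{"id": 1, "sales": 100}]

def live_total(table):
    """Live connection: every query goes to the current source data."""
    return sum(row["sales"] for row in table)

class Extract:
    """Extract: queries run against a snapshot taken at refresh time."""

    def __init__(self, table):
        self.refresh(table)

    def refresh(self, table):
        # Copy the rows so later changes to the source do not leak in.
        self.snapshot = [dict(row) for row in table]

    def total(self):
        return sum(row["sales"] for row in self.snapshot)

extract = Extract(source_table)
source_table.append({"id": 2, "sales": 50})  # the source changes afterwards

live = live_total(source_table)  # sees the new row
stale = extract.total()          # still the old snapshot
extract.refresh(source_table)
fresh = extract.total()          # up to date again after a refresh

print(live, stale, fresh)
```

The snapshot is why an extract is faster to query but only as fresh as its last refresh.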
105
.TDS
File Type: Tableau Data Source
Use Case: Perhaps you did a lot of work in the data source (e.g., you built a data model, renamed things, did aggregations, etc.) and want to share that with your team. But you're not allowed to share the data itself with them, so you share only the data source definition with your colleagues.
106
Tableau Workbook
Contains 3 Things: 1) Data Extract 2) Data Source 3) Data Visuals. A data extract (or Tableau Data Extract, .hyper) is a file created from the data source, storing a compressed snapshot of the data locally on disk to improve performance by reducing query times and back-end load. A data source is the actual location or origin of your data, like a database, Excel file, or web service.
107
.hyper
It stores extracted datasets (high-performance, compressed). It includes data. Typical File Size: Medium to Large. Use this file type if you want to share only your data without the data source or visualizations. Best Use Case: Speed up queries, share extracts, publish optimized data to Tableau Server/Online. Note: Can only open this file type with Tableau Desktop. (You can't open it with Tableau Reader or Tableau Public.)
108
Tableau Data Source vs Data Extract
A data source is the actual location or origin of your data, like a database, Excel file, or web service. A data extract (or Tableau Data Extract, .hyper) is a file created from that data source, storing a compressed snapshot of the data locally on disk to improve performance by reducing query times and back-end load
109
.tdsx
File Type: Tableau Packaged Data Source It stores: Data source definition (.tds) + extract (.hyper) + local files. It includes data. Typical file size: Medium. Best Use Case: Share a reusable, self-contained data source with others (connection + extract bundled). Example Use Case: Your colleagues don't have access to the source system, so they cannot use the live connection. But you are allowed to share your data. So, you can send them a package of an extract and a data source. Note: You can open this file type only with Tableau Desktop. (You can't open it with Tableau Reader or Tableau Public.)
110
.twb
File Type: Tableau Workbook It stores: workbook XML (dashboards, sheets, formatting, calculated fields) It does not include data. It just includes metadata. Typical File Size: Small. Best Use Case: Version control, lightweight sharing when recipients already have access to the underlying data. Send without the data inside. Note: You can only open this file type with Tableau Desktop. (You can't open it with Tableau Reader or Tableau Public.)
111
.twbx
File Type: Tableau Packaged Workbook. It stores: Workbook (.twb) + extracts (.hyper) + images/local assets It includes data File Size: Large (can be 10s-100s of MBs) Best Use Case: Share a fully portable workbook that works offline or with people lacking access to the original data sources. Use if you want to send the extract, the data source, and the data visuals. Note: You can open this file type with Tableau Desktop, Reader, and Public.
112
When to use .twb .twbx .hyper .tdsx
.twb: Use if everyone already has the data source. .twbx: Use when you need to hand someone everything in one file. .hyper: Use if you only want to share the data .tdsx: Use if you want to share a reusable connection and an extract combo.
113
Send workbook with data vs. Send workbook without data
Send workbook with data: .hyper: send only the data OR .tdsx: send the data source together with the data OR .twbx: send the whole package (i.e., data extract, data source, data visuals) Send workbook without data: .tds: the data source without the data .twb: the workbook without the data
114
Metadata
Data about your data. Example: The metadata about a cat photo: Filename: Sonya.jpg Author: Musya Date: 10/7/2021
115
Data Source
Data could come from: -Database (like MySQL or Oracle) -Files (like Excel or JSON) -Cloud (like AWS or Azure)
116
Tableau Story
A sequence of visuals that work together to tell a data narrative.
117
Data Modeling
The process of organizing data in a clear and understandable way. Each model has -entities like customers and products or -events like orders. Inside the entities we have -attributes (i.e., information like first name and last name). We describe in a data model how the entities are connected or related to one another.
118
Conceptual Data Model
Big picture of the data. High-level representation of the data model without going into detail about how the data model is implemented. It's like a map that shows the important entities and relationships. Use to explain the data models to business analysts and stakeholders to help them understand the big picture of the data.
119
Logical Data Model
Blueprint for implementation. Provides more detail than the conceptual data model, looking at how the data is structured and organized. Defines the attributes of each entity and includes constraints and more details about the relationships between the entities. Standardly used by database designers and developers as a blueprint for implementation.
120
Physical Data Model
Shows how the data is implemented in the databases. Represents the actual implementation of the data model. It includes all the technical details about how to store the data (e.g., the data types of the attributes, the primary and foreign keys, indexes, etc.). Used by developers to create and manage the database.
121
Events
records of actions or changes that occur over time examples: customer purchases product order is shipped user clicks on ad events capture a point-in-time action or change and are typically structured with attributes that describe the context like a timestamp, user ID, and details of the interaction.
122
Entities
Core objects or concepts we want to capture in a data model, such as a "customer", "product", or "order". Entities generally have attributes that describe their current state and they're often represented by records in databases, forming the foundation for operational data.
123
Star Schema
Central fact table surrounded by dimension tables. Fact tables contain events (i.e., records of actions or changes that occur over time like "customer purchases product" or "order is shipped" or "user clicks on ad"). Dimension tables contain descriptive information.
124
Snowflake Schema
Like the star schema, but the dimensions are broken down into subdimensions.
125
Star Schema vs Snowflake Schema
Star Schema is simple and easy to understand. It's standardly used if a dataset is small or medium. Snowflake Schema is more complex. It reduces the storage space and it's standardly used with large datasets.
126
Event or Fact table
Fact tables contain events (i.e., records of actions or changes that occur over time like "customer purchases product" or "order is shipped" or "user clicks on ad"). Examples of what could be in a fact table: -Keys to the Dimension Table: Order ID, Customer ID, and Product ID -Dates, when the event happened: Order Date, Shipping Date, etc. -Measures: numeric, quantitative values: Sales, Quantity, Profit, Unit Price
127
How to determine if a table is a fact/event table or a dimension table?
Dimension: Describes physical persons or objects (e.g., employees, customers, products, etc.) Fact/Event: Contains events or transactions (e.g., sales, orders, logs, ATM transactions).
128
Tableau Options for connecting tables
Joins (these can only be done on the physical layer) -Inner Join, Left Join, Right Join, Full Join Union (these can only be done on the physical layer) -Combine tables into one big table vertically (e.g., add 2 tables with Date & ID columns) -Union Rules: 1) Both tables must have the same number of fields & 2) The fields should have the same data types Relationships (these can only be created on the logical layer) *Note a relationship does NOT create a new table. It simply links the tables. (Data) Blending (this is done on the visual layer)
129
Data Profiling
The process of examining and investigating the data to understand the content of the tables.
130
Tableau Relationships Cardinality, The Rule
Many: When a key has duplicate values. Example: Customer ID in the below table would be many, as 1 and 2 are repeated: Customer ID 1 2 2 1. One: When a key has unique values (e.g., in the below Customers screenshot, the Customer IDs are all unique). Example: Customer ID in the below table would be one, as its values are unique: Customer ID 1 2 3. Very important to choose the correct value. If you select "One" when there are "many", you'll miss values. And if you select "Many" when there is only "one", you'll hurt performance (Tableau will look for duplicates when there are none). Best Practice: Use the default many-many cardinality if you're not sure. It may hurt performance a bit, but the results will be correct.
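The rule above can be checked by hand before setting the cardinality. The following is a minimal Python sketch, not anything Tableau runs itself; the function name `key_cardinality` is made up for illustration, and the ID lists are the two examples from this card.

```python
from collections import Counter

def key_cardinality(key_values):
    """Return "one" if every key value is unique, otherwise "many"."""
    counts = Counter(key_values)
    return "one" if all(n == 1 for n in counts.values()) else "many"

# The two Customer ID columns from the card.
repeated_ids = [1, 2, 2, 1]   # 1 and 2 each appear twice
unique_ids = [1, 2, 3]        # no value repeats

print(key_cardinality(repeated_ids))  # many
print(key_cardinality(unique_ids))    # one
```

If the check says "many" for a key you expected to be unique, that is a hint the data still needs cleaning, and leaving the default many-many setting is the safer choice.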
131
Data Blending
Method of combining data at the visualization level from two different data sources using a left join. Notes: -data blending can only be done on the visualization level on the worksheet page, not in the data source. -Tableau will use a left join (no changing that) -data blending is a unique feature of Tableau; it's not in PowerBI or any other BI tool. (You can't combine data from two different datasets using other tools.) -data blending works on a separate worksheet basis (i.e., start a new worksheet, start over)
132
How can you tell the primary vs secondary data source in Tableau?
Tableau will mark the primary data source with a blue icon
133
Joins
Combine the FIELDS of the tables Table A F1 F2 (column headers) Table B F3 F4 (column headers) Join of Tables A & B F1 F2 F3 F4 (column headers)
134
Unions
Combine the ROWS of the tables. Left table comes first in the unioned table Table A (left) F1 F2 x y Table B (right) F1 F2 a b Union of Tables A & B F1 F2 x y a b
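To make the row-stacking idea concrete, here is a small Python sketch of a union. It is only an illustration of the concept and of the two union rules listed on the "Tableau Options for connecting tables" card; the helper names `schema` and `union` are assumptions, and Tableau's own handling of mismatched fields is not modeled here.

```python
def schema(rows):
    """Describe a table as {field name: type name}, based on its first row."""
    return {name: type(value).__name__ for name, value in rows[0].items()} if rows else {}

def union(left_rows, right_rows):
    """Stack two tables vertically; the left table's rows come first."""
    if left_rows and right_rows and schema(left_rows) != schema(right_rows):
        raise ValueError("tables must have the same fields and data types")
    return left_rows + right_rows

# Table A (left) and Table B (right) from the card.
table_a = [{"F1": "x", "F2": "y"}]
table_b = [{"F1": "a", "F2": "b"}]

for row in union(table_a, table_b):
    print(row["F1"], row["F2"])
```

The loop prints the row `x y` first and `a b` second, matching the order shown in the card.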
135
Joins vs data blending
Joins: First combine and then aggregate -You can get duplicates Blending: First aggregate and then combine -You will not get duplicates Note: -Measures can be aggregated. Dimensions cannot be aggregated (e.g., a date can't be aggregated; can't add 2020 and 2023 together). -When multiple matches exist, an asterisk appears (e.g., Tableau can't add dates so will put an asterisk there in that instance)
136
Joins vs Relationships
If you use Joins, data will be static and you may lose data (e.g., if you do a left join, any data that doesn't match on the right-side will be lost.) If you use relationships, they will be more flexible and you won't lose data.
137
Tableau Data Source, Notes on Join Types
Once you decide the join type (e.g., left join), it will stay that join type for all of the worksheets. If you do a full join, it will slow down performance. You will lose data, if you do a left join or a right join.
138
Issues with merging tables
Merging tables may cause duplicates. Duplicates cause false aggregations. For example, if the customers table has a score column and some customers have more than one order, merging the customers and orders tables repeats each customer's score once per order. Then if you take the average, you will get the wrong answer.
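The false average can be reproduced with a few rows. This is a plain-Python sketch with made-up customers, orders, and scores (none of these values come from the deck); the list comprehension stands in for an inner join on the customer ID.

```python
# Hypothetical data: customer 2 has three orders, customer 1 has one.
customers = [
    {"customer_id": 1, "score": 100},
    {"customer_id": 2, "score": 400},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 2},
    {"order_id": 13, "customer_id": 2},
]

def mean(values):
    return sum(values) / len(values)

# Inner join on customer_id: each customer row is repeated once per order.
joined = [
    {**customer, **order}
    for customer in customers
    for order in orders
    if customer["customer_id"] == order["customer_id"]
]

true_average = mean([c["score"] for c in customers])      # one score per customer
joined_average = mean([row["score"] for row in joined])   # scores duplicated by the join

print(true_average)    # 250.0
print(joined_average)  # 325.0
```

Customer 2's score is counted three times after the join, which pulls the average up from 250 to 325.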
139
Join vs Union vs Relation vs Blending What's most ideal?
Baraa recommends relationships, which should be default in Tableau
140
If tables refer to different entities/things, always use ___
RELATIONSHIPS
141
Relationship>Performance Options>Cardinality If the data quality is bad (and we haven't cleaned it), then ____ If the data quality is good (we've cleaned it), then ____
Left side: fact side Right side: dimension side If the data quality is bad (and we haven't cleaned it), then leave the cardinality as is (MANY, MANY) If the data quality is good (we've cleaned it), then the fact-side is MANY and the dimension side is ONE (MANY, ONE) If unsure, you can always check that the dimension side is ONE (e.g., if the fact is the product ID, you're ensuring that all product IDs are unique; no numbers appear twice). If you're super unsure, just leave it as a MANY to MANY relationship (the default)
142
If tables refer to the same entities/things, consider using ___
joins/union (by contrast, orders and customers tables are completely different entities, so they call for relationships)
143
Signs that there is a formatting error in Tableau
If Tableau displays numerical fields as a String Data Type. Note: If you switch a column's data type from String to Number, it won't resolve the issue, because the format will still be unknown to Tableau. Solution: Go to the Data Source>physical layer of the table that needs editing>right-click on it>select Text File Properties>Make sure the field separator is correct (should be semicolon--just to prevent errors--CSV files) & Locale: change to English (United States)
144
Europe format vs Non-Europe format for data
EU: 2,5 M Non-EU (e.g., USA, Asia, South Africa, etc.): 2.5M
145
(Tableau) Data Types
Numbers # String Abc Date Boolean T/F
146
tableau roles
Role I: -Dimension: Level of Details -Measure: Aggregation Role II: -Discrete: Separate values -Continuous: Connected values
147
(tableau) metadata
Data Types: -Numbers # -String Abc -Date -Boolean T/F Role I: -Dimension: Level of Details -Measure: Aggregation Role II: -Discrete: Separate values -Continuous: Connected values
148
Best Practices
Check the metadata (data types, role 1, role 2) after connecting the data to tableau to make sure that everything is assigned correctly. Data Types: -Numbers # -String Abc -Date -Boolean T/F Role I: -Dimension: Level of Details -Measure: Aggregation Role II: -Discrete: Separate values -Continuous: Connected values
149
Data Types
Specify the kind of information stored inside the data Define what operations can be performed on the data
150
Rules
The Keys of the Relationships must have the same data type
151
3 groups of tableau data types
Basic Types: -Whole Number: no decimals or fractions. negative, positive, or zero are whole numbers. -Decimal Number: decimals or fractions (e.g., 2.4 or 30.99) -String: sequence of characters, including letters, numbers, spaces, and special characters (e.g., $ ?). So any field can be converted to a string. -Date: lots of different date formats (i.e., / - . etc.) -Date & Time (aka time stamp): 2025-08-20 18:48:53 yyyy-mm-dd hh:mm:ss -Boolean: t/f t(1)/f(0) Roles: -geographic role: 12 types (i.e., None, Airport, Area Code (US), CBSA/MSA (US), City, Congressional District (US), Country/Region, County, Latitude, Longitude, NUTS Europe, State/Province, Zip code/post code). BUT most common: City, Country/Region, County, Zip code/Post code -image role Advanced Types: -groups -cluster groups -bins -sets
152
Data Type vs Roles
Data types are a MUST for each field Roles are extra to assign
153
Current Tableau Image Requirements (may change as Tableau upgrades)
-supported image extensions (.png, .jpeg, or .jpg) -URL must begin with: http or https -Image file must be < 128 KB
154
Online Analytical Processing (OLAP)
Data model that has the shape of a cube. Think of a rubik's cube. It has dimensions (length, width, height) and cells (aka measures so data, numbers, etc.) Example: Cube of Sales 3 dimensions: 1) Location (USA, France, Germany) 2) Time (Jan, Feb, March) 3) Category Inside the measures (cells) of cube: sales (e.g., 30, 40, 50, 45, 88, etc.) We could slice the cube to only have USA and then do the total of those sales, etc. You can also do drill-up, dicing, drill-down, roll-up, slicing, and pivot (look at hardcopy flashcard for visuals)
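As a rough illustration of slicing and rolling up, the cube can be modeled as a dictionary keyed by (location, time, category). This is a sketch only: the category names and the placement of the sales numbers in specific cells are invented for the example, and `slice_cube` and `roll_up_by_location` are hypothetical helper names.

```python
# Each cell of the cube: (location, month, category) -> sales.
# Category names and cell placement are made up for illustration.
sales_cube = {
    ("USA", "Jan", "Monitors"): 30,
    ("USA", "Feb", "Monitors"): 40,
    ("France", "Jan", "Monitors"): 50,
    ("Germany", "March", "Keyboards"): 45,
    ("USA", "March", "Keyboards"): 88,
}

def slice_cube(cube, location=None, month=None, category=None):
    """Keep only the cells that match the fixed dimension values (None keeps all)."""
    wanted = (location, month, category)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }

def roll_up_by_location(cube):
    """Aggregate away month and category, leaving one total per location."""
    totals = {}
    for (location, _month, _category), sales in cube.items():
        totals[location] = totals.get(location, 0) + sales
    return totals

usa_slice = slice_cube(sales_cube, location="USA")
usa_total = sum(usa_slice.values())

print(usa_total)                        # 158
print(roll_up_by_location(sales_cube))  # {'USA': 158, 'France': 50, 'Germany': 45}
```

Fixing more than one dimension at once (for example, location and month) with the same helper corresponds to dicing.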
155
Dimensions vs Measures
Dimensions: -Dimensions contain qualitative or categorical values -e.g., product name, product category, location -use of dimensions: to categorize, filter, or show the level of detail Measures: -Measures contain quantitative and numerical values -e.g., sales, profit, quantity -measures can be aggregated (dimensions can't)
156
Dimension vs Measure Decision Flow Chart
Is the Data Type a Number? No>Dimension. Yes>Does it make sense to aggregate (i.e., sum, avg, etc.)? Yes>Measure. No>Dimension.
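The flow chart boils down to two yes/no questions, which can be written as a tiny Python function. This is just a restatement of the card, not how Tableau assigns roles internally; the function name `field_role` and the example fields are illustrative.

```python
def field_role(is_number, makes_sense_to_aggregate):
    """Follow the card's decision flow: only aggregatable numbers are measures."""
    if is_number and makes_sense_to_aggregate:
        return "Measure"
    return "Dimension"

print(field_role(is_number=True, makes_sense_to_aggregate=True))    # Sales -> Measure
print(field_role(is_number=True, makes_sense_to_aggregate=False))   # Product ID -> Dimension
print(field_role(is_number=False, makes_sense_to_aggregate=False))  # Product Name -> Dimension
```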
157
Tableau Colors (Blue vs Green) & Position (Dimension vs Measure)
Blue = Discrete Green = Continuous How to tell difference between discrete and continuous: Do you count the items (discrete) or do you measure them (continuous)? Example: Counting people is discrete, because you can't have half a person. But measuring a person's height is continuous, because you can be 5.5 feet or 5.51 feet, etc. Above the line = Dimension Below the line = Measure
158
Formula to create new views or reports
Formula: Measure by Dimension Combine any measure by any dimension Examples: -Sales by Product -Profit by Category -Quantity by Country
159
Discrete vs Continuous
How to tell difference between discrete and continuous: Do you count the items (discrete) or do you measure them (continuous)? Discrete: Disconnected and separate values Continuous: Connected and unbroken chain and values Example: Counting people is discrete, because you can't have half a person. But measuring a person's height is continuous, because you can be 5.5 feet or 5.51 feet, etc.
160
Discrete vs Continuous Colors in Tableau
continuous: green discrete: blue
161
Discrete vs Continuous Sorting Options
Discrete: Many sorting options: -Ascending -Descending -Data Source Order -Alphabetic -Field -Manual -Nested Continuous: Limited Sorting Options: -Ascending -Descending
162
Purpose/Analysis type for discrete vs continuous values
Discrete Purpose: Deep Dive Analysis/Helps analyze specific problems Standard Chart: Bar Chart Continuous Purpose: Big Picture Analysis/Helps to see the big picture Standard Chart: Line Chart
163
Naming Conventions, Most Common
snake_case: used in python, PHP, ruby camelCase: first word is lowercase, but all following words are capitalized Used in java, javascript, and typescript iPhone PascalCase: Capitalized, no separation. Used in Java and C# kebab-case: lowercase, each word separated by - Used in HTML, CSS Title Case: Capitalized, space between words Customer Name Used in Tableau *Once you connect your data to Tableau, Tableau will rename everything according to this rule.
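To see how the conventions relate, here is a small Python sketch that converts names written in the other conventions to Title Case. It is an approximation for study purposes and is not Tableau's actual renaming logic; `split_words` and `to_title_case` are made-up helper names.

```python
import re

def split_words(name):
    """Split snake_case, kebab-case, camelCase, or PascalCase into words."""
    spaced = re.sub(r"[_\-]+", " ", name)                     # underscores/hyphens -> spaces
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)   # break before an inner capital
    return spaced.split()

def to_title_case(name):
    """Capitalize the first letter of each word and join the words with spaces."""
    return " ".join(word[:1].upper() + word[1:] for word in split_words(name))

for raw in ["customer_name", "customerName", "CustomerName", "customer-name", "Product_ID"]:
    print(raw, "->", to_title_case(raw))
```

All four spellings of the customer field come out as `Customer Name`, and `Product_ID` becomes `Product ID`.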
164
Data Source>Metadata Grid>Remote Field Name vs Field Name
Remote Field Name: Comes from the original source of the data. The original dataset should have a specific naming convention. (Keep in mind that Tableau automatically converts all field names to abide by the Title Case naming convention; but, of course, the original dataset may use snake_case or camelCase or PascalCase or kebab-case.) Field Name: Comes from Tableau after it renames fields according to Title Case. Example: Field Name: Product ID Remote Field Name: Product_ID
165
can you easily rename fields in tableau and powerbi?
tableau: yes powerbi: NO. It will break your views if you rename fields. Proceed with caution.
166
How do you change the y axis title? How do you change the x axis title?
y axis: Double click on the name in the rows section and enter //Name before the current name and then hit shift enter at the same time x axis: right-click>edit axis>update the title
167
Parts of data source page in tableau
Data Model (visual) Metadata grid Data
168
Use Cases of Aliases
-Poor Data Quality (e.g., a dataset has "Germany" and "Deutschland" and "USA" and "America") -Abbreviate long words or values when you don't have enough space (e.g., "DE" as opposed to "Germany" and "US" as opposed to "America")
169
Aliases
Alternate names for the members of a discrete dimension field, so that their labels appear differently in the view. Note: tableau does not allow you to create aliases from measures or continuous dimension fields
170
Root Node Branch Leaf Nodes/Leaves Drill down Drill up
Root Node: The highest level of aggregation Branch: Connects the different levels of the hierarchy Leaf Nodes/Leaves: The most detailed level of the hierarchy; this level has no children Drill down: every time you drill down, you will see more details about your data; each drill down, you'll jump to the next level in your data. Drill up: Bottom to top. Go up each level, starting at the leaf node/leaves
171
Hierarchy Rules in Tableau
You can only create hierarchies in Tableau on the worksheet page, not on the data source page. Hierarchies can be created only using dimensions.
172
Tableau Groups
Groups combine similar or related values into a higher-level category, creating a new dimension for data analysis. Note: Groups are created using dimensions ONLY. Examples: GROUP GROUP Product ID Product Name Category Class 1 Samsung FHD Monitor Monitor Class A GROUP GROUP Group Customer ID Name Country City Postal Code
173
How to create groups in Tableau
right-click on the root node in the hierarchy>create>group>select the items to include in the group by clicking and using shift>Group button>name group & repeat as necessary, depending on how many groups wanted>click apply
174
Best Practices for Groups
Create groups from dimension with high cardinality directly from the view
175
Groups
Groups combine similar and related values into higher-level categories Groups are created using dimensions only Groups simplify data by categorizing data points into clear, relevant categories
176
centroid
the geometric center/the center of a figure
177
k-means
Tableau creates clusters for the data points and each cluster will have its own centroid. Note: centroid: the geometric center/the center of a figure
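For intuition about centroids, here is a minimal one-dimensional k-means sketch in plain Python. It assumes hand-picked starting centroids and a fixed number of iterations; Tableau's own implementation (how it initializes, scales, or chooses the number of clusters) is not reproduced here, and the data points are made up.

```python
def k_means_1d(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then move each
    centroid to the mean of the points assigned to it."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two obvious groups, and two starting guesses.
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(points, centroids=[1, 12])

print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```

Each cluster ends up with its own centroid, which is simply the average of the points in that cluster.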
178
Clusters
-Tableau's cluster group is a statistical technique that groups similar data points together in clusters. -Tableau uses the k-means algorithm for clustering (i.e., Tableau creates clusters for the data points and each cluster will have its own centroid). -Tableau can plot endless data points in visualizations. (PowerBI places limitations on the numbers of data points that you can see in the visual.) -Data clustering in visuals is a powerful tool for analysis and pattern recognition, enabling data-driven decisions.
179
Clusters vs Groups
Clusters are statistically derived, automatically generated groupings of similar data points in a view. Groups are manually created by users to combine specific members of a dimension into a single entity.
180
How to cluster in Tableau
Open worksheet>select data source>select columns, rows, and detail (i.e., data points)>Analytics tab>select "Cluster" and drag over to the view>pick the number of clusters if don't want to stick with the automatic number>change color and shape if wanted
181
Sets
-Sets divide data based on specific criteria into two subsets: ---IN: contains all members of the set ---OUT: contains members not included in the set -Sets are useful for focusing on a subset of the data and comparing it with the remaining data. -Sets add interactivity and dynamics to the view by allowing users to define which subset they want to focus on. ---- Create an in-group and an out-group Used to: -focus the analysis on specific data -compare a subset of data with the remaining data Methods of creating a set: 1) Fixed Sets: manually select which customers are IN and which are OUT 2) Dynamic Sets: --Condition (e.g., condition: if the score is greater than 400, then the customer is in the set, otherwise out.) --Rank (e.g., rank: set will include only the top 2 highest scores) 3) Combined Sets (i.e., combine two different sets together): --full: all members --inner: shared members --left: set1 except shared members --right: set2 except shared members Example: -Full/All Members: The customer is in if it's a member of at least one set -Inner/Shared Members: The customer must be a member of both sets -Left/Set1 except shared members: Customers must be a member of set1 but cannot be a member of set2 -Right/Set2 except shared members: Customers must be a member of set2 but cannot be a member of set1
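The dynamic and combined set options map closely onto ordinary set operations. The sketch below uses Python's built-in `set` type with made-up customer names and scores; it illustrates the IN/OUT logic only and is not Tableau syntax.

```python
# Hypothetical customers and scores.
scores = {"Anna": 350, "Ben": 500, "Cara": 750, "Dan": 900}

# Dynamic set by condition: score greater than 400 is IN, everyone else is OUT.
set1 = {name for name, score in scores.items() if score > 400}
out_of_set1 = set(scores) - set1

# Dynamic set by rank: only the top 2 highest scores.
set2 = set(sorted(scores, key=scores.get, reverse=True)[:2])

# Combined sets.
full = set1 | set2    # all members: in at least one set
inner = set1 & set2   # shared members: in both sets
left = set1 - set2    # set1 except shared members
right = set2 - set1   # set2 except shared members

print(sorted(set1), sorted(out_of_set1))  # ['Ben', 'Cara', 'Dan'] ['Anna']
print(sorted(set2))                       # ['Cara', 'Dan']
print(sorted(full), sorted(inner), sorted(left), sorted(right))
```

With these numbers, `left` contains only Ben and `right` is empty, because every top-2 customer also satisfies the score condition.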
182
How to create sets in tableau
Open worksheet>select data source>create rows/columns/details/etc>right-click on value that want to create a set of in the data pane>create>set> Different options: -General: manual selection -Condition: dynamic set, condition -Top: dynamic set, rank >General>Select what's in or out >Condition>set the rule >Top>define tier by field Use Cases: -Use sets to highlight data points in the view. -Focus on a specific subset. -Show set as a quick filter. -Use sets in actions to let users define in/out subsets.
183
How to make sets interactive for users
Bar at top>Worksheet (right after File and Data on the bar)>Actions>Add Action>Change Set Values Change set values (i.e., the actions of the users will change the values in the set) >Name the action>Select the worksheet it will be applied to Define the behavior of the user: Run action on: Hover/Select/Menu/Single-Select Source Select the Target Set Then Running this action will: 1) Assign values to set: Users create a completely new set OR 2) Add values to set: Users add new members to the set Changing the selection will: 1) Keep set values OR 2) Add all values to set: once the users start moving away from the selection, all the members or customers are going to be in the in-group/going to be inside the set
184
Formulas
Measure by Dimension Measure by Measure (Bins) Example: -Profit by Sales (Bins) -Quantity by Profit (Bins)
185
Bins
Divide the data into groups of equally sized containers, resulting in a systematic distribution of data. Bins are used to create charts called histograms. Formula: Measure by Measure (Bins) Example: -Profit by Sales (Bins) -Quantity by Profit (Bins)
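Equally sized bins can be sketched with integer division: every value is mapped to the lower edge of the bin it falls into, and counting values per bin gives the histogram. The scores and the bin size of 100 below are invented for illustration; Tableau suggests its own bin size based on the data.

```python
from collections import Counter

def bin_start(value, bin_size):
    """Return the lower edge of the equally sized bin that contains value."""
    return (value // bin_size) * bin_size

# Hypothetical scores, binned in steps of 100.
scores = [120, 180, 310, 350, 390, 420, 760]
bin_size = 100

histogram = Counter(bin_start(score, bin_size) for score in scores)

for start in sorted(histogram):
    print(f"{start}-{start + bin_size}: {histogram[start]}")
```

Here the 100-200 bin holds 2 scores, the 300-400 bin holds 3, and the 400-500 and 700-800 bins hold 1 each; empty bins such as 200-300 simply do not appear.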
186
Bar Graph vs Histogram
Bar Graph: Shows categorical data with distinct gaps Ex//Different types of fruit liked by people x axis: fruits people like (apple, guava, banana, mango) y axis: Number of people (1, 2, 3, 4, 5) Histogram: Illustrates continuous data without gaps. Ex1// height of trees in Lincoln Park x-axis: height in cms (100-150, 150-200, 200-250,250-300, 300-350). Note: no spaces between the bins on the histogram because represent a range of values (e.g., 100-150) as opposed to a discrete value (e.g., apple) y-axis: number of trees (1, 2, 3, 4, 5, 6) Ex2// Number of machines working in a factory for a fixed amount of time x-axis: time in minutes (20-25, 25-30, 30-35, 35-40) y-axis: number of machines (1, 2, 3, 4, 5, 6, 7, etc.)
187
How to create bins
Option 1: Worksheet>Select Data Source>Pick value (e.g., score within the customers table), right-click>Create>Bins>Field name (change if wanted)>Size of bins (note: tableau will automatically calculate the size of the bins based on the data) but you can change it>Min & max value (change if wanted)>if you check the data pane, a new value will appear (name will be whatever you chose, e.g., Score(bin)). If you want to convert the chart to a histogram (i.e., continuous values), right click the value (e.g., Score(bin)>Convert to Continuous & then go up to the column or row where that value (i.e., Score(bin) appears and convert that to continuous too. Option 2: Worksheet>Set up rows>Show Me>select histogram (only need one measure for this visual)
188
Frequency Distribution Types -Normal -Right-Skewed -Left-Skewed -Uniform -Symmetric Bimodal -Non-Symmetric Bimodal
Normal Distribution (unimodal/symmetric/the "bell curve"): When graphed, a vertical line at the middle will form mirror images, i.e., the highest vertical column is in the center. Example: Distribution of male heights in the US. The average height of a male in the US is 69.1 inches (about 5 ft 9 in), with some shorter and some taller.
Right-Skewed Distribution (positively skewed): Fewer data points are found to the right of the graph (toward the larger numeric values). The "tail" of the graph is pulled toward higher positive numbers/the right. (In practice, the left side of the graph has higher bars than the right.) Ex// x-axis: 10, 20, 30, 40, 50 y-axis: the highest bar is at 10 and then it goes down as you go right (to 50). Example: Distribution of household incomes in the U.S. Most households earn between $40k and $80k per year, but there is a long tail of households that earn much more. NOTE: Tail extends to the right.
Left-Skewed Distribution (negatively skewed): Fewer data points are found to the left of the graph (toward the smaller numeric values). The "tail" of the graph is pulled toward the lower or negative numbers, i.e., the left. Ex// x-axis: 10, 20, 30, 40, 50 y-axis: the lowest bar is at 10 and then it goes up as you go right (to 50). Example: Distribution of age of death. Most people live to between 70 and 80 years old, with fewer living less than this. NOTE: Tail extends to the left.
Uniform Distribution (equal spread, no peaks): The data is spread equally across the range. There are no clear peaks in these graphs, as each value appears about the same number of times in the set.
Symmetric Bimodal Distribution (two modes): Example: Exam scores fall into a lot of As and a lot of Fs. So Group 1 is prepared for the class and Group 2 is unprepared for the class.
Non-Symmetric Bimodal Distribution (two modes): Example: Data representing the time it takes for employees to complete a task. One peak for highly efficient employees and another for less efficient employees. If most employees are efficient, the distribution would be right-skewed, with the less efficient group creating a longer tail to the right, indicating a non-symmetric bimodal shape.
NOTES: -Unimodal Distribution: One clear peak -Bimodal Distribution: Two clear peaks. This usually indicates that you've got two different groups, e.g., exam scores fall into a lot of As and a lot of Fs. So Group 1 is prepared for the class and Group 2 is unprepared for the class. -Bell-Shaped Distribution: Single peak at the center
189
Summary of Bins & Histograms in Tableau
-Bins divide data into equally sized groups, resulting in systematic data distribution -ONLY measures are used to create bins -Bins are dimensions, and it's better to convert them to continuous. -Calculated fields cannot be used to create bins. -Histograms in statistics show the frequency of data within a certain range.
190
Filters
Remove or select specific subsets of data for different purposes and use cases. Purpose: -Reduce the size of dataset to optimize the performance of the dashboard. -Interactivity & Analysis: Offer filters to users so they can focus on subsets of the data. (Different users may be interested in different sections of the data.) -Data Security: Remove sensitive data. Use filters to restrict or hide sensitive data from the users. -Data Access Control by Applying Row-Level Security (RLS): use filters to limit access to data based on user role and permissions (e.g., employees should not see sales/employee as the managers do)
191
Filter Types
Visual: Source system>Extract Filter(optional)>Data Source>Data Source Filter(optional)>Context Filter(optional)>Worksheet>Dimension Filter>Measure Filter>Table Calculation Filter First Processed to Last Processed Filter: Extract Filter>Data Source Filter>Context Filter>Dimension Filter>Measure Filter>Table Calculation Filter Optimize Performance of a Dashboard: 1) extract filter: Used to filter data before it even enters Tableau (i.e., before it reaches the data source). (You can't use it on a live data connection, only on an extract data connection.) -Only extract connection -Only tableau desktop -purpose: optimize load performance & optimize performance in views 2) data source filter: filter data between data source and worksheets. -extract and live connection -tableau desktop and public -purpose: optimize performance in views & hide sensitive data 3) context filter: creates a temporary, filtered subset of data in worksheets -downside: lose performance to create the subset. Why use a data source filter & a context filter? A context filter enables you to have different filter criteria for each worksheet. In some scenarios, you can't use a data source filter, because you have different requirements and focuses in each worksheet. Interactivity: We offer the below filters to users so they can slice and dice data to focus on specific subsets of the data. -dimension filter -measure filter -table calculation filter
192
How to add a Data Source Filter
Data Source page>Filters (upper right-hand corner)>Add>Add...select value you want a filter on (e.g., country)>select what you want filtered (e.g., USA, exclude)>OK>OK>all of your datasets will update accordingly (no worksheets will show US data)
193
common methods to reduce amount of data to improve performance
Use the date fields to reduce the size of the data (e.g., limit the number of years included in the data source). Ask users if they need all, say, 5 years of data or if 2 years is enough. Caution: All worksheets connected to this data source will be affected by these filters.
194
How to add a Context Filter
drop value in the filters section on a worksheet>right-click>Create filter>select whatever filter wanted. Note: Keep in mind that you can't hide sensitive data with the context filter. Viewers can select the arrow in the filter on the view and have it show all values in the dataset.
195
Dimension Filters
Put a value in "Filter" on a worksheet>right-click>edit filter>General to exclude or include values OR Wildcard to set up a rule OR Condition to set up a rule whereby profit = sum of x (or some such thing) OR Top in order to see top values Tip: Use Wildcard to set up a rule if you have a dimension with high cardinality (i.e., a long list of all possible values in a dimension). You can right-click on the value in the filters section of the worksheet after you apply and then hit "show filter" so users can see and interact with the filters.
196
Dimension with high cardinality
You have a long list of all possible values in the dimension Tip: You're not going to want to select all values manually for a filter as it would take a long time. You can instead use a dimension filter and "Wildcard" to define a rule.
197
Measure Filter
If you put a measure value on the Filters section of a worksheet, a window will pop up>it will ask if you want "all values" (i.e., all values from the dataset) or if you want an aggregated value (i.e., sum, avg, median, etc.) such that the values are aggregated AND THEN filtered>select your choice>Next> Range of Values (range, at least, at most, special) (special lets you decide if you want to show only null values, only non-null values, or all values). >Apply
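The difference between filtering on "all values" and filtering on an aggregated value can be sketched in plain Python. The rows, the category names, and the threshold of 300 are made up; the two helper functions are hypothetical and only mirror the order of steps described in this card (aggregate first, then filter, versus filtering row by row).

```python
from collections import defaultdict

# Hypothetical rows: (category, sales).
rows = [("Monitors", 100), ("Monitors", 250), ("Keyboards", 40), ("Mice", 500)]

def filter_all_values(rows, at_least):
    """'All values': test every individual row against the range."""
    return [(category, sales) for category, sales in rows if sales >= at_least]

def filter_on_sum(rows, at_least):
    """Aggregated: sum sales per category first, AND THEN filter the totals."""
    totals = defaultdict(int)
    for category, sales in rows:
        totals[category] += sales
    return {category: total for category, total in totals.items() if total >= at_least}

print(filter_all_values(rows, at_least=300))  # [('Mice', 500)]
print(filter_on_sum(rows, at_least=300))      # {'Monitors': 350, 'Mice': 500}
```

Monitors passes only the aggregated filter: neither of its rows reaches 300 on its own, but their sum of 350 does.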
198
Table Calculation Filter
Place a value in the marks section of a worksheet>right-click it>Quick table calculation>percent of total (or whatever you want to select)>control and drag it to the filter section of the worksheet
199
Quick Filters
Filters that can be used by users to interact and filter worksheets
200