Tableau Flashcards

Question

pie chart

Answer 1

is a circular graph that is divided into segments representing proportions corresponding to the quantity it represents, especially when dealing with parts of a whole.

Answer 2

show relationships between different variables. Scatterplots are typically used for two variables for a set of data, although additional variables can be displayed.

Answer 3

displays the spread of various outcomes in a dataset.

Answer 4

This is a trend or instance of observations that become different over time. A great way to measure change in data is through a line or column chart.

Answer 5

This is a trend or instance of observations that become different over time. A great way to measure change in data is through a line or column chart.

Answer 6

These are observations considered in relation or in proportion to something else. You have probably seen examples of relativity data in a pie chart.

Answer 7

This is a position in a scale of achievement or status. Data that requires ranking is best represented by a column chart.

Answer 8

This shows a mutual relationship or connection between two or more things. A scatterplot is an excellent way to represent this type of data pattern.

Answer 9

Histogram or Density Plot

Answer 10

Line chart or pie chart

Answer 11

Scatterplot or heatmap

Answer 12

Combining the individual parts in a visualization and displaying them together as a whole.

Answer 13

A process used to solve complex problems in a user-centric way.

Answer 14

Empathize: Thinking about the emotions and needs of the target audience for the data visualization
Define: Figuring out exactly what your audience needs from the data
Ideate: Generating ideas for data visualization
Prototype: Putting visualizations together for testing and feedback
Test: Showing prototype visualizations to people before stakeholders see them

Answer 15

Text that provides an alternative to non-text content, such as images and videos

Answer 16

A data visualization that displays the frequency of various outcomes in a sample

Answer 17

Dimensions contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal details in your data.

Answer 18

Measures contain numeric, quantitative values that you can measure. Measures can be aggregated. When you drag a measure into the view, Tableau applies an aggregation to that measure (by default).

Answer 19

Displays two ranges of values using color intensity to show the magnitude of the number and the actual color to show which range the number is from.

Answer 20

The data-ink ratio entails focusing on the part of the visual that is essential to understanding the point of the chart. Try to minimize non-data ink like boxes around legends or shadows to optimize the data-ink ratio.

Answer 21

1) Cutting off the y-axis: Changing the scale on the y-axis can make the differences between groups in your data seem more dramatic, even if the difference is actually quite small.
2) Misleading use of a dual y-axis: Using a dual y-axis without clearly labeling it in your data visualization can create extremely misleading charts.
3) Artificially limiting the scope of the data: If you only consider the part of the data that confirms your analysis, your visualizations will be misleading because they don’t take all of the data into account.
4) Problematic choices in how data is binned or grouped: It is important to make sure that the way you are grouping data isn’t misleading or misrepresenting your data and disguising important trends and insights.
5) Using part-to-whole visuals when the totals do not sum up appropriately: If you are using a part-to-whole visual like a pie chart to explain your data, the individual parts should add up to 100%. If they don’t, your data visualization will be misleading.
6) Hiding trends in cumulative charts: Creating a cumulative chart can disguise more insightful trends by making the scale of the visualization too large to track any changes over time.
7) Artificially smoothing trends: Adding smooth trend lines between points in a scatter plot can make it easier to read that plot, but replacing the points with just the line can make it appear that the points are more connected over time than they actually were.

Answer 22

Five-second rule: A data visualization should be clear, effective, and convincing enough to be absorbed in five seconds or less.
Color contrast: Graphs and charts should use a diverging color palette to show contrast between elements.
Conventions and expectations: Visuals and their organization should align with audience expectations and cultural conventions. For example, if the majority of your audience associates green with a positive concept and red with a negative one, your visualization should reflect this.
Minimal labels: Titles, axes, and annotations should use as few labels as it takes to make sense. Having too many labels makes your graph or chart too busy. It takes up too much space and prevents the labels from being shown clearly.

Answer 23

A data visualization that uses individual data points for a changing variable connected by a continuous line with a filled in area underneath

Answer 24

A data visualization that displays the distribution of values along an x-axis

Answer 25

A data visualization that displays individual data points as bubbles, comparing numeric values by their relative size

Answer 26

A data visualization that displays data as a horizontal bar chart moving toward a desired value

Answer 27

A data visualization that shows comparative strength in data

Answer 28

A data visualization that uses individual data points for a changing variable, represented as vertical columns

Answer 29

A data visualization that combines more than one visualization type

Answer 30

A data visualization that represents concentrations, with color representing the number or frequency of data points in a given area on a map

Answer 31

A data visualization that displays the frequency of various outcomes in a sample

Answer 32

A color theme that displays two ranges of data values using two different hues, with color intensity representing the magnitude of the values

Answer 33

A tool that organizes information from multiple datasets into one central location for tracking, analysis, and simple visualization.

Answer 34

Communicating the meaning of a dataset with visuals and a narrative that are customized for each particular audience.

Answer 35

Keep text to less than 5 lines & 25 words per slide.

Answer 36

-Use an attention-grabbing opening
-Start with broad ideas and later talk about specific details
-Speak in short sentences
-Pause for five seconds after showing a data visualization
-Pause intentionally at certain points
-Keep the pitch of your voice level
-Stand still and move with purpose
-Maintain good posture
-Look at your audience (or camera) while speaking
-Keep your message concise
-End by explaining why the data analysis matters

Answer 37

-Include a good title and subtitle that describe what you’re about to present
-Include the date of your presentation or the date when your slideshow was last updated
-Use a font size that lets the audience easily read your slides
-Showcase what business metrics you used
-Include effective visuals (like charts and graphs)

Answer 38

Include a title, subtitle, and date: Making sure that your slide deck presentation has a title, subtitle, and date makes sure that your audience knows exactly what you are presenting and when the information was from. That way they know it’s relevant and current to them!
Use a logical sequence of slides: Organizing your slides in an order that makes sense guides your audience through your narrative, building understanding step by step.
Provide an agenda with a timeline: An agenda offers a roadmap of your presentation, allowing your audience to follow along and anticipate key topics.
Limit the amount of text on slides: Keeping text brief ensures clarity and retains the audience’s attention; aim for your audience to scan it within 5 seconds.
Start with the business task: By immediately relating the content to the business task at hand, you contextualize your information, making it relevant and actionable.
Establish the initial hypothesis: Presenting an initial hypothesis gives your audience a starting point for what to expect and frames the subsequent analysis.
Show what business metrics you used: Clarifying which metrics you're analyzing validates your arguments and helps the audience gauge your presentation's relevance to business outcomes.
Use visualizations: Visual aids can illustrate complex data more effectively than text alone, making your message more accessible.
Introduce the graphic by name: A brief introduction to each graphic aids in understanding and retaining information.
Provide a title for each graph: Titles act as signposts, helping the audience quickly grasp the meaning of each visual.
Go from the general to the specific: Starting with a broad overview before diving into details ensures that all audience members are on the same page.
Use speaker notes to help you remember talking points: Notes act as your cue cards, enabling a smoother delivery and ensuring no critical point is missed.
Include key takeaways: Summarizing the main points at the end of your presentation reinforces the message and ensures the audience leaves with the intended takeaways.

Answer 39

Don't include too much data.
Make your presentation fun: use games, quizzes, and videos, and ask the audience questions (don't just monologue at them).
Use storytelling: if you tell a good story, you can get your audience to connect.
Get an ally in the room: find one or two people in the room and present to them beforehand. That way, you have people nodding along, and they can have your back if people try to poke holes in your thesis.

Answer 40

1) Wait 5 seconds after showing a data visual (to give your audience a chance to process it)
2) Ask if they understand (if not, explain it)
3) Give your audience another 5 seconds (to let it sink in)
4) Tell them the conclusion
*Remember: This will be the first time some of the people in the audience encounter your data.

Answer 41

-Do I use an attention-grabbing opening?
-Do I start with broad ideas and later talk about specific details?
-Do I speak in short sentences?
-Do I pause for five seconds after showing a data visualization?
-Do I pause intentionally at certain points?
-Do I keep the pitch of my sentences level?
-Do I stand still and move with purpose?
-Do I have good posture?
-Do I look at my audience (or camera) while speaking?
-Do I keep my message concise?
-Do I end by explaining to my audience why the data analysis matters?
You can also add checklist items that help you refine your slide deck:
-Do I include a good title and subtitle that describe what I’m about to present?
-Do I include the date of my presentation or the date when my slideshow was last updated?
-Does my font size let the audience easily read my slides?
-Do I showcase what business metrics I used?
-Do I include effective visuals (like charts and graphs)?

Answer 42

-Keep your sentences short
-Build in intentional pauses
-Keep the pitch of your sentences level
-Stay still and move with purpose
-Maintain good posture
-Make positive eye contact with your audience
Improve with every performance! Seek out feedback and keep on going.

Answer 43

-Where did you get the data?
-What systems did the data come from?
-What transformations happened to the data?
-How fresh and accurate is the data?
*You can include all of this information at the beginning of your presentation to set up the data context. You can add a more detailed breakdown in your appendix in case there are more questions.

Answer 44

Is your analysis reproducible?
-Keep a change log documenting the steps you took. This way, someone else can follow along and reproduce the process.
-You can even create a slide in the appendix section of your presentation explaining these steps if you think it will be necessary.
-It can be useful to keep a clean version of your script if you're working with SQL or R.
Who did you get feedback from during this process?
-It is especially important that you're able to address this question when your analysis reveals insights that are the opposite of your audience's gut feelings about the data.

Answer 45

Do these findings exist in previous time periods? Did you control for the differences in your data? -your audience wants to be sure that you accounted for any possible inconsistencies and that your results are accurate and useful.

Answer 46

-Communicate any assumptions about the data, your analysis, or your findings that may help answer questions posed
--e.g., note that your team cleaned and formatted the data before analysis
-Explain why your analysis may be different than expected
--Walk your audience through the variables that change the outcomes to help them understand how you got there.
-If an objection is valid, acknowledge that it is valid and take steps to investigate further. (You can follow up with more details afterward.)

Answer 47

a question or problem you use data to solve—and a presentation demonstrates how to solve it.

Answer 48

-Listen to the whole question
-Repeat the question (if necessary, to ensure you are understanding it correctly)
-Understand the context
-Involve the whole audience
--Others may want to hear your answer to a question as well.
--If there's someone in the audience or on your team who may have insight, you can ask them for their thoughts.
-Keep your responses short and to the point
--Start with a headline response that gives your stakeholders the basic answer. Then if they have more questions, you can go into more detail. (So: answer the question directly with as few words as possible. From there you can expand on your answer.)
Example:
Q: How was the data gathered?
A: A survey was taken to measure an individual's happiness.
Q: Can you tell us more about the survey?
A: (Now that additional interest has been expressed about the survey, you can go into more detail.)

Answer 49

Ask a question to the audience and/or Redirect to a new question

Answer 50

Ask for some time to find the answer & follow up promptly after the Q&A session.

Answer 51

A Tableau method that combines data from multiple data sources

Answer 52

The context a presentation needs to create logical connections that tie back to the business task and metrics

Answer 53

The study of how to collect, analyze, summarize, and present data

Answer 54

Answer obvious questions before they're asked

Answer 55

Ask "Are there any questions about this chart?"

Answer 56

connecting any device/anything to the internet in order to exchange data.

Answer 57

Industry 1.0: Mechanization, steam power, weaving loom
Industry 2.0: Mass production, assembly line, electrical energy
Industry 3.0: Automation, computers, and electronics
Industry 4.0: Cyber physical systems, internet of things, networks

Answer 58

Huge volume, high speed, different types. Goal: Efficiently store, process, and analyze data to produce significant value for the business.

Answer 59

Process of creating a blueprint for how to organize, process, and store data.

Answer 60

Design and build data pipelines and data storage. ETL processes: Extract, transform, and load data to the target storage.
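The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV content, the field names `order_id` and `amount`, and the list standing in for the target storage are all hypothetical.

```python
import csv
import io

# Hypothetical raw input; in practice this would come from a source system.
RAW_CSV = "order_id,amount\n1, 10.50\n2,\n3,7.25\n"

def extract(text):
    """Extract: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and convert the types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append({"order_id": int(row["order_id"]), "amount": float(amount)})
    return cleaned

def load(rows, target):
    """Load: append the cleaned rows to the target storage (a list stands in for a table)."""
    target.extend(rows)
    return target

warehouse = load(transform(extract(RAW_CSV)), [])
```

Order 2 has no amount, so it is dropped during the transform step and only two rows reach the target.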

Answer 61

Put data into entities and objects and then describe the relationship between those entities to help us and the programs understand how the data points are related to each other.

Answer 62

The process of analyzing massive amounts of raw data in order to discover patterns, trends, etc. to solve business problems and mitigate risk.

Answer 63

Provide computers with the raw data along with the mathematical models and algorithms. Then the computer will train and process in order to perform tasks like predictions and insights.

Answer 64

scientific study of data. Uses programming knowledge, mathematics, and domain specific knowledge to uncover valuable insights from raw data.

Answer 65

Convert numbers and raw data into visuals and charts.

Answer 66

Restricting the rows of data a specific user can see, based on defined policies.
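A minimal sketch of the idea, assuming a hypothetical policy that maps each user to the regions they are allowed to see. The user names, the `region` field, and the `visible_rows` helper are made up for illustration and are not a Tableau API.

```python
# Hypothetical sales rows and a hypothetical policy (user -> allowed regions).
ROWS = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 250},
    {"region": "East", "sales": 75},
]
POLICY = {"alice": {"East"}, "bob": {"East", "West"}}

def visible_rows(user, rows=ROWS, policy=POLICY):
    """Return only the rows whose region the policy allows for this user."""
    allowed = policy.get(user, set())  # users without a policy entry see nothing
    return [row for row in rows if row["region"] in allowed]
```

With this policy, `visible_rows("alice")` returns only the two East rows, while a user with no policy entry gets an empty list.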

Answer 67

-Automation
-Security
-Big Data functionality
-Interactivity
Note: You can do complex calculations in Excel and then import the final results into Tableau to produce better visualizations and gain better insights regarding the results.

Answer 68

-Data Visualization Tool -Business Intelligence (BI) Tool -Reporting Tool

Answer 69

Developer tools, data engineering:
-Prep
Developer tools, data visualization:
-Desktop
-Public
Sharing tools:
-Server
-Cloud
-Public
-Reader
-Mobile

Answer 70

1) Connect Data to Tableau
2) Build Visualizations
3) Share our Work via Publication
*To do the above steps, use either Tableau Desktop or Tableau Public.
But often the data is bad (not cleaned) and needs to be processed. In this case, you need to add a step:
1) Connect Data to Tableau
2) Prepare the Data
3) Build Visualizations
4) Share our Work via Publication
*To prepare the data, use Tableau Prep.

Answer 71

You connect data; build views, dashboards, and stories; and publish workbooks and data sources. You can publish to Tableau Server, Tableau Cloud, or Tableau Public, or save locally.

Answer 72

Free version of Tableau Desktop.
You connect data; build views, dashboards, and stories; and publish workbooks and data sources.
4 Main Limitations:
-10 data connectors (only local files)
-Limited to 15M rows
-Publishes only to Tableau Public (cloud)
-You can't save locally.
BUT all functions and tools needed to build visuals and dashboards are available.
Cons:
-Not secure, as you have to publish your work to a public platform.
-Made for everyone (vs. Desktop, which is made for data scientists and analysts, and Prep, which is made for data engineers)
-You can't connect to a server, API, DB, or cloud

Answer 73

Use it to prepare data before analysis. Once you connect Tableau to your data, you can build data flows. You'll then have access to tools and functions to transform your data (e.g., filter data, aggregate data, etc.) and prepare it for data visuals. You can then save the data to a local PC, publish it as a data source in Tableau Server or Cloud, or write the output to a database. Then, when you're done with the data flow, you can publish it online to Tableau Server or Tableau Cloud.
SO: Tableau Prep is a developer tool for data engineering. You can use it to connect data, build flows (clean, combine, aggregate, etc.), and publish flows and data sources.
It requires a license.
90 data connectors
Output:
-file (stored locally)
-Tableau data source
-database table
Publish Data Flow either in:
-Tableau Server or
-Tableau Cloud

Answer 74

You may choose to outsource the hardware (i.e., you buy a service from cloud providers like Microsoft Azure, Amazon AWS, or Google Cloud.) The hardware includes servers (CPU, memory), Storage (HDD, SSD), network (internet, routers). So, they manage the hardware, & you manage all software and projects.

Answer 75

Outsource the Hardware & Software. Example: Each time Tableau makes a new release, a new version of a Tableau Server has to be installed. If you have a small IT team, they may not have time to do that. So, you need to outsource the software. Tableau Cloud can manage the hardware & the software. This is called Software as a Service (SaaS)

Answer 76

You can discover visuals and download the Tableau workbook if you want to know how a visual was made. you can follow creators. You can use it to create a Tableau portfolio. But remember: Limited security features.

Answer 77

-Only use it to view visuals; you can't use it to build visuals
-Free app, but it requires a license to use
-It can connect to Tableau Server & Tableau Cloud
-Caches dashboards for offline access (so you can access visuals even if you are offline)

Answer 78

Sharing tool for data visualization. Free tool. can only be used to view visualizations. You can't build visuals, refresh data, or keep the data secure. Do not use it in an organization.

Answer 79

Tableau Live: Pulls directly from the source (DB). This takes longer--and thus affects performance--but the data is fresh. Tableau Extract: Pulls data from the "extract"--a pre-pulled source from the database--and is therefore faster (i.e., has better performance), but the data is not as fresh.

Answer 80

File Type: Tableau Data Source
Use Case: Perhaps you did a lot of work in the data source (e.g., you built a data model, renamed things, did aggregations, etc.) and want to share that with your team, but you're not allowed to share the data itself. So, you share only the data source with your colleagues.

Answer 81

Contains 3 Things: 1) Data Extract 2) Data Source 3) Data Visuals
A data extract (or Tableau Data Extract, .hyper) is a file created from a data source, storing a compressed snapshot of the data locally on disk to improve performance by reducing query times and back-end load.
A data source is the actual location or origin of your data, like a database, Excel file, or web service.

Answer 82

It stores extracted datasets (high-performance, compressed).
It includes data.
Typical File Size: Medium to Large
Use this file type if you want to share only your data without the data source or visualizations.
Best Use Case: Speed up queries, share extracts, and publish optimized data to Tableau Server/Cloud.
Note: You can only open this file type with Tableau Desktop. (You can't open it with Tableau Reader or Tableau Public.)

Answer 83

A data source is the actual location or origin of your data, like a database, Excel file, or web service. A data extract (or Tableau Data Extract, .hyper) is a file created from that data source, storing a compressed snapshot of the data locally on disk to improve performance by reducing query times and back-end load

Answer 84

File Type: Tableau Packaged Data Source (.tdsx)
It stores: Data source definition (.tds) + extract (.hyper) + local files
It includes data.
Typical File Size: Medium
Best Use Case: Share a reusable, self-contained data source with others (connection + extract bundled)
Example Use Case: Your colleagues don't have access to the source system, so they cannot use the live connection. But you can share your data, so you send them a package of an extract and a data source.
Note: You can open this file type only with Tableau Desktop. (You can't open it with Tableau Reader or Tableau Public.)

Answer 85

File Type: Tableau Workbook It stores: workbook XML (dashboards, sheets, formatting, calculated fields) It does not include data. It just includes metadata. Typical File Size: Small. Best Use Case: Version control, lightweight sharing when recipients already have access to the underlying data. Send without the data inside. Note: You can only open this file type with Tableau Desktop. (You can't open it with Tableau Reader or Tableau Public.)

Answer 86

File Type: Tableau Packaged Workbook. It stores: Workbook (.twb) + extracts (.hyper) + images/local assets It includes data File Size: Large (can be 10s-100s of MBs) Best Use Case: Share a fully portable workbook that works offline or with people lacking access to the original data sources. Use if you want to send the extract, the data source, and the data visuals. Note: You can open this file type with Tableau Desktop, Reader, and Public.

Answer 87

.twb: Use if everyone already has the data source. .twbx: Use when you need to hand someone everything in one file. .hyper: Use if you only want to share the data .tdsx: Use if you want to share a reusable connection and an extract combo.

Answer 88

Send workbook with data:
-.hyper: send only the data, OR
-.tdsx: send the data source with the data, OR
-.twbx: send the whole package (i.e., data extract, data source, data visuals)
Send workbook without data:
-.tds: the data source without data
-.twb: the workbook

Answer 89

Data about your data. Example: The metadata about a cat photo: Filename: Sonya.jpg Author: Musya Date: 10/7/2021

Answer 90

Data could come from: -Database (like MySQL or Oracle) -Files (like Excel or JSON) -Cloud (like AWS or Azure)

Answer 91

A sequence of visuals that work together to tell a data narrative.

Answer 92

The process of organizing data in a clear and understandable way. Each model has -entities like customers and products or -events like orders. Inside the entities we have -attributes (i.e., information like first name and last name). We describe in a data model how the entities are connected or related to one another.

Answer 93

Big picture of the data. High-level representation of the data model without going into detail about how the data model is implemented. It's like a map that shows the important entities and relationships. Use to explain the data models to business analysts and stakeholders to help them understand the big picture of the data.

Answer 94

Blueprint for implementation. Provides more detail than the conceptual data model, looking at how the data is structured and organized. Defines the attributes of each entity and includes constraints and more details about the relationships between the entities. Typically used by database designers and developers as a blueprint for implementation.

Answer 95

Shows how the data is implemented in the databases. Represents the actual implementation of the data model. It includes all the technical details about how to store the data (e.g., the data types of the attributes, the primary and foreign keys, indexes, etc.). Used by developers to create and manage the database.

Answer 96

Records of actions or changes that occur over time.
Examples:
-customer purchases product
-order is shipped
-user clicks on ad
Events capture a point-in-time action or change and are typically structured with attributes that describe the context, like a timestamp, user ID, and details of the interaction.

Answer 97

Core objects or concepts we want to capture in a data model, such as a "customer", "product", or "order". Entities generally have attributes that describe their current state and they're often represented by records in databases, forming the foundation for operational data.

Answer 98

Central fact table surrounded by dimension tables. Fact tables contain events (i.e., records of actions or changes that occur over time like "customer purchases product" or "order is shipped" or "user clicks on ad"). Dimension tables contain descriptive information.

Answer 99

Like the star schema, but the dimensions are broken down into subdimensions.

Answer 100

Star Schema is simple and easy to understand. It's standardly used if a dataset is small or medium. Snowflake Schema is more complex. It reduces the storage space and it's standardly used with large datasets.

Answer 101

Fact tables contain events (i.e., records of actions or changes that occur over time like "customer purchases product" or "order is shipped" or "user clicks on ad").
Examples of what could be in a fact table:
-Keys to the Dimension Table: Order ID, Customer ID, and Product ID
-Dates, when the event happened: Order Date, Shipping Date, etc.
-Measures (numeric, quantitative values): Sales, Quantity, Profit, Unit Price
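To make the fact/dimension split concrete, here is a small sketch with one fact table and one dimension table. The table contents and the `sales_by_customer` helper are hypothetical and only illustrate how a key in the fact table points at a descriptive row in a dimension table.

```python
from collections import defaultdict

# Dimension table: descriptive attributes, one row per customer (hypothetical data).
CUSTOMERS = {1: {"name": "Ana"}, 2: {"name": "Ben"}}

# Fact table: one row per event, holding keys, a date, and a measure.
ORDERS = [
    {"order_id": 10, "customer_id": 1, "order_date": "2025-01-05", "sales": 40.0},
    {"order_id": 11, "customer_id": 2, "order_date": "2025-01-06", "sales": 15.0},
    {"order_id": 12, "customer_id": 1, "order_date": "2025-01-09", "sales": 25.0},
]

def sales_by_customer(facts, dimension):
    """Aggregate the sales measure, labeled by the customer dimension."""
    totals = defaultdict(float)
    for fact in facts:
        name = dimension[fact["customer_id"]]["name"]  # look up the descriptive attribute
        totals[name] += fact["sales"]
    return dict(totals)
```

The measure (`sales`) lives in the fact table and is aggregated, while the dimension only supplies the label used to group it.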

Answer 102

Dimension: Describes physical persons or objects (e.g., employees, customers, products, etc.) Fact/Event: Contains events or transactions (e.g., sales, orders, logs, ATM transactions).

Answer 103

Joins (these can only be done on the physical layer)
-Inner Join, Left Join, Right Join, Full Join
Union (these can only be done on the physical layer)
-Combine tables into one big table vertically (e.g., add 2 tables with Date & ID columns)
-Union Rules: 1) Both tables must have the same number of fields & 2) The fields should have the same data types
Relationships (these can only be created on the logical layer)
*Note: a relationship does NOT create a new table. It simply links the tables.
(Data) Blending (this is done on the visual layer)

Answer 104

The process of examining and investigating the data to understand the content of the tables.

Answer 105

Many: when a key has duplicate values.
Example: Customer ID in the table below would be "many", as 1 and 2 are repeated.
Customer ID
1
2
2
1
One: when a key has unique values.
Example: Customer ID in the table below would be "one", as every value is unique.
Customer ID
1
2
3
It is very important to choose the correct value. If you select "One" when there are "many", you'll miss values. And if you select "Many" when there is only "one", you'll hurt performance (Tableau will look for duplicates when there are none).
Best Practice: Use the default many-to-many cardinality if you're not sure. It may hurt performance a bit, but the results will be correct.
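One way to decide between "one" and "many" before setting the cardinality is to check the key column for duplicates. The `cardinality` helper below is a hypothetical sketch, not a Tableau function.

```python
def cardinality(keys):
    """Return "one" if every key value is unique, otherwise "many"."""
    keys = list(keys)
    return "one" if len(set(keys)) == len(keys) else "many"

# The two Customer ID columns from the card.
orders_customer_ids = [1, 2, 2, 1]
customers_customer_ids = [1, 2, 3]
```

Here `cardinality(orders_customer_ids)` gives "many" and `cardinality(customers_customer_ids)` gives "one".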

Answer 106

Method of combining data at the visualization level from two different data sources using a left join.
Notes:
-Data blending can only be done at the visualization level on the worksheet page, not in the data source.
-Tableau will use a left join (no changing that).
-Data blending in this form is a Tableau feature. (Other BI tools such as Power BI combine different data sources in the data model rather than per worksheet.)
-Data blending works on a per-worksheet basis (i.e., start a new worksheet, start over).

Answer 107

Tableau will mark the primary data source with a blue icon

Answer 108

Combine the FIELDS of the tables.
Table A:
F1 F2 (column headers)
Table B:
F3 F4 (column headers)
Join of Tables A & B:
F1 F2 F3 F4 (column headers)
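A minimal sketch of an inner join in plain Python. The card does not say which fields are matched, so matching F1 against F3 is an assumption, as are the sample values and the `inner_join` helper.

```python
def inner_join(left, right, left_key, right_key):
    """Combine the fields of matching rows from two tables (inner join)."""
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[left_key] == r_row[right_key]:
                joined.append({**l_row, **r_row})  # fields of A followed by fields of B
    return joined

# Hypothetical sample rows; F1 and F3 are assumed to be the matching keys.
TABLE_A = [{"F1": 1, "F2": "x"}, {"F1": 2, "F2": "y"}]
TABLE_B = [{"F3": 1, "F4": "a"}, {"F3": 3, "F4": "b"}]

joined_ab = inner_join(TABLE_A, TABLE_B, "F1", "F3")
```

Only the rows with a matching key survive, and each result row carries all four fields F1, F2, F3, and F4.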

Answer 109

Combine the ROWS of the tables. The left table comes first in the unioned table.
Table A (left):
F1 F2
x  y
Table B (right):
F1 F2
a  b
Union of Tables A & B:
F1 F2
x  y
a  b
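The same union, sketched in plain Python. The `union` helper is hypothetical, and its strict same-fields check is a simplifying assumption for illustration rather than a description of how Tableau behaves.

```python
def union(left, right):
    """Stack the rows of two tables; the left table's rows come first."""
    if left and right and set(left[0]) != set(right[0]):
        raise ValueError("both tables need the same fields to be unioned")
    return left + right

TABLE_A = [{"F1": "x", "F2": "y"}]
TABLE_B = [{"F1": "a", "F2": "b"}]

unioned = union(TABLE_A, TABLE_B)
```

The result keeps the two columns and simply has more rows, with the row from Table A ahead of the row from Table B.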

Answer 110

Joins: First combine and then aggregate
-You can get duplicates
Blending: First aggregate and then combine
-You will not get duplicates
Note:
-Measures can be aggregated. Dimensions cannot be aggregated (e.g., a date can't be aggregated; you can't add 2020 and 2023 together).
-When multiple matches exist, an asterisk appears (e.g., Tableau can't add dates, so it will put an asterisk there in that instance).
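The difference in ordering can be sketched with two tiny tables. Everything here is hypothetical (the data, the field names, and both helper functions), and it only imitates the order of operations, not Tableau's actual engine.

```python
CUSTOMERS = [{"customer_id": 1, "score": 80}, {"customer_id": 2, "score": 60}]
ORDERS = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]

def join_then_aggregate(orders, customers):
    """Join style: attach the score to every order row, then sum the score column."""
    scores = {c["customer_id"]: c["score"] for c in customers}
    joined = [{**o, "score": scores.get(o["customer_id"])} for o in orders]
    return sum(row["score"] for row in joined if row["score"] is not None)

def aggregate_then_combine(orders, customers):
    """Blend style: aggregate each table per customer first, then match the results."""
    order_counts = {}
    for o in orders:
        order_counts[o["customer_id"]] = order_counts.get(o["customer_id"], 0) + 1
    score_totals = {}
    for c in customers:
        score_totals[c["customer_id"]] = score_totals.get(c["customer_id"], 0) + c["score"]
    # One combined row per customer, so each score is counted once.
    combined = [
        {"customer_id": key, "orders": count, "score": score_totals.get(key)}
        for key, count in order_counts.items()
    ]
    return sum(row["score"] for row in combined if row["score"] is not None)
```

Customer 1 has two orders, so the join repeats that customer's score and the total comes out as 220, while aggregating first counts each score once and gives 140.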

Answer 111

If you use Joins, data will be static and you may lose data (e.g., if you do a left join, any data that doesn't match on the right-side will be lost.) If you use relationships, they will be more flexible and you won't lose data.

Answer 112

Once you decide the join type (e.g., left join), it will stay that join type for all of the worksheets. If you do a full join, it will slow down performance. You will lose data, if you do a left join or a right join.

Answer 113

Merging tables may cause duplicates, and duplicates cause false aggregations. For example, if the customers table has a score column and some customers have more than one order, merging customers and orders repeats each of those customers' scores once per order. If you then take the average score, you will get the wrong answer.

Answer 114

Baraa recommends relationships, which should be default in Tableau

Answer 115

RELATIONSHIPS

Answer 116

Left side: fact side
Right side: dimension side
-If the data quality is bad (and we haven't cleaned it), then leave the cardinality as is (MANY, MANY).
-If the data quality is good (we've cleaned it), then the fact side is MANY and the dimension side is ONE (MANY, ONE).
-If unsure, you can always check that the dimension side is ONE (e.g., if the key is the product ID, you're ensuring that all product IDs are unique; no numbers appear twice).
-If you're super unsure, just leave it as a MANY to MANY relationship (the default).

Answer 117

joins/union (e.g., orders and customers tables are completely different entities)

Answer 118

If Tableau displays numerical fields as a String data type.
Note: If you switch a column's data type from String to Number, it won't resolve the issue, because the format will still be unknown to Tableau.
Solution: Go to the Data Source > physical layer of the table that needs editing > right-click on it > select Text File Properties > make sure the field separator is correct (should be a semicolon for CSV files, just to prevent errors) and change the Locale to English (United States).

Answer 119

EU: 2,5 M Non-EU (e.g., USA, Asia, South Africa, etc.): 2.5M
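
Why the locale matters can be sketched in Python. The helper below is a hypothetical illustration of decimal-separator handling, not how Tableau parses files.

```python
def parse_number(text, decimal_separator):
    """Parse a numeric string given its decimal separator ("," or ".")."""
    if decimal_separator == ",":
        # "1.234,5" -> "1234.5": drop thousands dots, then swap the comma for a dot.
        text = text.replace(".", "").replace(",", ".")
    else:
        # "1,234.5" -> "1234.5": drop thousands commas.
        text = text.replace(",", "")
    return float(text)


eu_value = parse_number("2,5", ",")       # read with the EU convention
us_value = parse_number("2.5", ".")       # read with the US convention
misread_value = parse_number("2,5", ".")  # EU text read with the US convention
```

Reading the EU-formatted text with the wrong convention silently turns 2.5 into 25, which is the kind of mismatch the locale setting is meant to prevent.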

Answer 120

Numbers # String Abc Date Boolean T/F

Answer 121

Role I: -Dimension: Level of Details -Measure: Aggregation Role II: -Discrete: Separate values -Continuous: Connected values

Answer 122

Data Types: -Numbers # -String Abc -Date -Boolean T/F Role I: -Dimension: Level of Details -Measure: Aggregation Role II: -Discrete: Separate values -Continuous: Connected values

Answer 123

Check the metadata (data types, role 1, role 2) after connecting the data to tableau to make sure that everything is assigned correctly. Data Types: -Numbers # -String Abc -Date -Boolean T/F Role I: -Dimension: Level of Details -Measure: Aggregation Role II: -Discrete: Separate values -Continuous: Connected values

Answer 124

Specify the kind of information stored inside the data Define what operations can be performed on the data

Answer 125

The Keys of the Relationships must have the same data type

Answer 126

Basic Types: -Whole Number: no decimals or fractions. negative, positive, or zero are whole numbers. -Decimal Number: decimals or fractions (e.g., 2.4 or 30.99) -String: sequence of characters, including letters, numbers, spaces, and special characters (e.g., $ ?). So any field can be converted to a string. -Date: lots of different date formats (i.e., / - . etc.) -Date & Time (aka time stamp): 2025-08-20 18:48:53 yyyy-mm-dd hh:mm:ss -Boolean: t/f t(1)/f(0) Roles: -geographic role: 12 types (i.e., None, Airport, Area Code (US), CBSA/MSA (US), City, Congressional District (US), Country/Region, County, Latitude, Longitude, NUTS Europe, State/Province, Zip code/post code). BUT most common: City, Country/Region, County, Zip code/Post code -image role Advanced Types: -groups -cluster groups -bins -sets

Answer 127

Data types are a MUST for each field Roles are extra to assign

Answer 128

-supported image extensions (.png, .jpeg, or .jpg) -URL must begin with: http or https -Image file must be < 128 kb

Answer 129

Data model that has the shape of a cube. Think of a rubik's cube. It has dimensions (length, width, height) and cells (aka measures so data, numbers, etc.) Example: Cube of Sales 3 dimensions: 1) Location (USA, France, Germany) 2) Time (Jan, Feb, March) 3) Category Inside the measures (cells) of cube: sales (e.g., 30, 40, 50, 45, 88, etc.) We could slice the cube to only have USA and then do the total of those sales, etc. You can also do drill-up, dicing, drill-down, roll-up, slicing, and pivot (look at hardcopy flashcard for visuals)

Answer 130

Dimensions: -Dimensions contain qualitative or categorical values -e.g., product name, product category, location -use of dimensions: to categorize, filter, or show the level of detail Measures: -Measures contain quantitative and numerical values -e.g., sales, profit, quantity -measures can be aggregated

Answer 131

Is the data type a number? No>Dimension. Yes>Does it make sense to aggregate (i.e., sum, avg, etc.)? Yes>Measure. No>Dimension.
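
The same decision rule as a small Python sketch. The function name and the example fields are hypothetical; whether aggregating "makes sense" is still a judgment you supply.

```python
def classify_field(is_number, makes_sense_to_aggregate):
    """Apply the dimension-vs-measure rule of thumb from the card."""
    if is_number and makes_sense_to_aggregate:
        return "Measure"
    return "Dimension"


sales_role = classify_field(is_number=True, makes_sense_to_aggregate=True)          # Measure
product_id_role = classify_field(is_number=True, makes_sense_to_aggregate=False)    # Dimension
product_name_role = classify_field(is_number=False, makes_sense_to_aggregate=False) # Dimension
```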

Answer 132

Blue = Discrete Green = Continuous How to tell difference between discrete and continuous: Do you count the items (discrete) or do you measure them (continuous)? Example: Counting people is discrete, because you can't have half a person. But measuring a person's height is continuous, because you can be 5.5 feet or 5.51 feet, etc. Above the line = Dimension Below the line = Measure

Answer 133

Formula: Measure by Dimension Combine any measure by any dimension Examples: -Sales by Product -Profit by Category -Quantity by Country
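
"Measure by Dimension" amounts to aggregating a measure once per dimension value. A minimal sketch, assuming SUM as the aggregation and made-up rows:

```python
from collections import defaultdict


def measure_by_dimension(rows, measure, dimension):
    """Sum the measure for each distinct value of the dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


# Hypothetical rows for illustration only.
rows = [
    {"Category": "Monitor", "Sales": 30.0},
    {"Category": "Monitor", "Sales": 40.0},
    {"Category": "Keyboard", "Sales": 50.0},
]

sales_by_category = measure_by_dimension(rows, "Sales", "Category")
# {"Monitor": 70.0, "Keyboard": 50.0}
```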

Answer 134

How to tell difference between discrete and continuous: Do you count the items (discrete) or do you measure them (continuous)? Discrete: Disconnected and separate values Continuous: Connected and unbroken chain and values Example: Counting people is discrete, because you can't have half a person. But measuring a person's height is continuous, because you can be 5.5 feet or 5.51 feet, etc.

Answer 135

continuous: green discrete: blue

Answer 136

Discrete: Many sorting options: -Ascending -Descending -Data Source Order -Alphabetic -Field -Manual -Nested Continuous: Limited Sorting Options: -Ascending -Descending

Answer 137

Discrete Purpose: Deep Dive Analysis/Helps analyze specific problems Standard Chart: Bar Chart Continuous Purpose: Big Picture Analysis/Helps to see the big picture Standard Chart: Line Chart

Answer 138

snake_case: Used in Python, PHP, and Ruby. camelCase: First word is lowercase, but all following words are capitalized (e.g., iPhone). Used in Java, JavaScript, and TypeScript. PascalCase: Capitalized, no separation. Used in Java and C#. kebab-case: Lowercase, each word separated by -. Used in HTML and CSS. Title Case: Capitalized, space between words (e.g., Customer Name). Used in Tableau. *Once you connect your data to Tableau, Tableau will rename everything according to this rule.

Answer 139

Remote Field Name: Comes from the original source of the data. The original dataset should have a specific naming convention. (Keep in mind that Tableau automatically converts all field names to abide by the Title Case naming convention; but, of course, the original dataset may use snake_case, camelCase, PascalCase, or kebab-case.) Field Name: Comes from Tableau after it renames fields according to Title Case. Example: Field Name: Product ID Remote Field Name: Product_ID
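
A rough sketch of turning a remote field name into a Title Case field name. This is an approximation written for illustration; Tableau's actual renaming rules are not shown on the card and may differ in edge cases.

```python
import re


def to_title_case(remote_name):
    """Approximate a Title Case field name from snake_case, kebab-case, camelCase, or PascalCase."""
    # Split on underscores, hyphens, and whitespace first.
    parts = re.split(r"[_\-\s]+", remote_name)
    words = []
    for part in parts:
        # Then split camelCase / PascalCase at lowercase-to-uppercase boundaries.
        words.extend(re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", part).split())
    # Keep all-caps words such as "ID" as they are; capitalize everything else.
    return " ".join(w if w.isupper() else w.capitalize() for w in words)


field_name = to_title_case("Product_ID")   # "Product ID"
```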

Answer 140

tableau: yes powerbi: NO. It will break your views if you rename fields. Proceed with caution.

Answer 141

y axis: Double click on the name in the rows section and enter //Name before the current name and then hit shift enter at the same time x axis: right-click>edit axis>update the title

Answer 142

Data Model (visual) Metadata grid Data

Answer 143

-Poor Data Quality (e.g., a dataset has "Germany" and "Deutschland" and "USA" and "America") -Long words or values need abbreviating because there isn't enough space (e.g., "DE" as opposed to "Germany" and "US" as opposed to "America")

Answer 144

Alternate names for the members of a discrete dimension field, so that their labels appear differently in the view. Note: tableau does not allow you to create aliases from measures or continuous dimension fields

Answer 145

Root Node: The highest level of aggregation Branch: Connects the different levels of the hierarchy Leaf Nodes/Leaves: The most detailed level of the hierarchy; this level has no children Drill down: every time you drill down, you will see more details about your data; each drill down, you'll jump to the next level in your data. Drill up: Bottom to top. Go up each level, starting at the leaf node/leaves

Answer 146

You can only create hierarchies in Tableau on the worksheet page, not on the data source page. Hierarchies can be created only using dimensions.

Answer 147

Groups combine similar or related values into a higher-level category, creating a new dimension for data analysis. Note: Groups are created using dimensions ONLY. Examples: Product table (Product ID, Product Name, Category, Class), e.g., (1, Samsung FHD Monitor, Monitor, Class A): GROUP on Category and Class. Customer table (Customer ID, Name, Country, City, Postal Code): GROUP on Country, City, and Postal Code.

Answer 148

right-click on the root node in the hierarchy>create>group>select the items to include in the group by clicking and using shift>Group button>name group & repeat as necessary, depending on how many groups wanted>click apply

Answer 149

Create groups from dimension with high cardinality directly from the view

Answer 150

Groups combine similar and related values into higher-level categories Groups are created using dimensions only Groups simplify data by categorizing data points into clear, relevant categories

Answer 151

the geometric center/the center of a figure

Answer 152

Tableau creates clusters for the data points and each cluster will have its own centroid. Note: centroid: the geometric center/the center of a figure

Answer 153

-Tableau's cluster group is a statistical technique that groups similar data points together in clusters. -Tableau uses the k-means algorithm for clustering (i.e., Tableau creates clusters for the data points and each cluster will have its own centroid). -Tableau can plot endless data points in visualizations. (PowerBI places limitations on the numbers of data points that you can see in the visual.) -Data clustering in visuals is a powerful tool for analysis and pattern recognition, enabling data-driven decisions.
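
The k-means idea (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched on one-dimensional data. This is textbook k-means with hand-picked starting centroids and made-up scores; how Tableau initializes or scales its clustering is not covered here.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


scores = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # hypothetical data points
centroids, clusters = k_means_1d(scores, centroids=[0.0, 5.0])
# centroids -> [2.0, 11.0]; clusters -> [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```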

Answer 154

Clusters are statistically derived, automatically generated groupings of similar data points in a view. Groups are manually created by users to combine specific members of a dimension into a single entity.

Answer 155

Open worksheet>select data source>select columns, rows, and detail (i.e., data points)>Analytics tab>select "Cluster" and drag over to the view>pick the number of clusters if don't want to stick with the automatic number>change color and shape if wanted

Answer 156

-Sets divide data based on specific criteria into two subsets: ---IN: contains all members of the set ---OUT: contains members not included in the set -Sets are useful for focusing on a subset of the data and comparing it with the remaining data. -Sets add interactivity and dynamics to the view by allowing users to define which subset they want to focus on. ---- Create an in-group and an out-group Used to: -focus the analysis on specific data -compare a subset of data with the remaining data Methods of creating a set: 1) Fixed Sets: manually select which customers are IN and which are OUT 2) Dynamic Sets: --Condition (e.g., condition: if the score is greater than 400, then the customer is in the set, otherwise out.) --Rank (e.g., rank: set will include only the top 2 highest scores) 3) Combined Sets (i.e., combine two different sets together): --full: all members --inner: shared members --left: set1 except shared members --right: set2 except shared members Example: -Full/All Members: The customer is in if it's a member of at least one set -Inner/Shared Members: The customer must be a member of both sets -Left/Set1 except shared members: Customers must be a member of set1 but cannot be a member of set2 -Right/Set2 except shared members: Customers must be a member of set2 but cannot be a member of set1
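
The four combined-set options map directly onto Python's built-in set operations. The customer names below are made up for illustration.

```python
set1 = {"Anna", "Ben", "Cara"}   # hypothetical set 1
set2 = {"Cara", "Dan"}           # hypothetical set 2

all_members = set1 | set2          # full: member of at least one set
shared_members = set1 & set2       # inner: member of both sets
set1_except_shared = set1 - set2   # left: in set1 but not in set2
set2_except_shared = set2 - set1   # right: in set2 but not in set1
```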

Answer 157

Open worksheet>select data source>create rows/columns/details/etc>right-click on value that want to create a set of in the data pane>create>set> Different options: -General: manual selection -Condition: dynamic set, condition -Top: dynamic set, rank >General>Select what's in or out >Condition>set the rule >Top>define tier by field Use Cases: -Use sets to highlight data points in the view. -Focus on a specific subset. -Show set as a quick filter. -Use sets in actions to let users define in/out subsets.

Answer 158

Bar at top>Worksheet (right after File and Data on the bar)>Actions>Add Action>Change Set Values Change set values (i.e., the actions of the users will change the values in the set) >Name the action>Select the worksheet it will be applied to Define the behavior of the user: Run action on: Hover/Select/Menu/Single-Select Source Select the Target Set Then Running this action will: 1) Assign values to set: Users create a completely new set OR 2) Add values to set: Users add new members to the set Changing the selection will: 1) Keep set values OR 2) Add all values to set: once the users start moving away from the selection, all the members or customers are going to be in the in-group/going to be inside the set

Answer 159

Measure by Dimension Measure by Measure (Bins) Example: -Profit by Sales (Bins) -Quantity by Profit (Bins)

Answer 160

Divide the data into groups of equally sized containers, resulting in a systematic distribution of data. Bins are used to create charts called histograms. Formula: Measure by Measure (Bins) Example: -Profit by Sales (Bins) -Quantity by Profit (Bins)
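
Equal-width binning can be sketched as follows. The scores and the bin size of 100 are made up, and labeling each bin by its lower edge is an assumption chosen for this illustration.

```python
import math
from collections import Counter


def bin_start(value, bin_size):
    """Return the lower edge of the equal-width bin that contains value."""
    return math.floor(value / bin_size) * bin_size


scores = [120, 150, 310, 330, 390, 410]   # hypothetical measure values
histogram = Counter(bin_start(s, 100) for s in scores)
# bin 100 holds 2 values, bin 300 holds 3, bin 400 holds 1
```

Counting how many values land in each bin is exactly what a histogram plots on its y-axis.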

Answer 161

Bar Graph: Shows categorical data with distinct gaps Ex//Different types of fruit liked by people x axis: fruits people like (apple, guava, banana, mango) y axis: Number of people (1, 2, 3, 4, 5) Histogram: Illustrates continuous data without gaps. Ex1// height of trees in Lincoln Park x-axis: height in cms (100-150, 150-200, 200-250,250-300, 300-350). Note: no spaces between the bins on the histogram because represent a range of values (e.g., 100-150) as opposed to a discrete value (e.g., apple) y-axis: number of trees (1, 2, 3, 4, 5, 6) Ex2// Number of machines working in a factory for a fixed amount of time x-axis: time in minutes (20-25, 25-30, 30-35, 35-40) y-axis: number of machines (1, 2, 3, 4, 5, 6, 7, etc.)

Answer 162

Option 1: Worksheet>Select Data Source>Pick value (e.g., score within the customers table), right-click>Create>Bins>Field name (change if wanted)>Size of bins (note: Tableau will automatically calculate the size of the bins based on the data, but you can change it)>Min & max value (change if wanted)>if you check the data pane, a new value will appear (name will be whatever you chose, e.g., Score(bin)). If you want to convert the chart to a histogram (i.e., continuous values), right-click the value (e.g., Score(bin))>Convert to Continuous & then go up to the column or row where that value (i.e., Score(bin)) appears and convert that to continuous too. Option 2: Worksheet>Set up rows>Show Me>select histogram (only need one measure for this visual)

Answer 163

Normal Distribution (unimodal/symmetric/the "bell curve"): When graphed, a vertical line at the middle will form mirror images, i.e., the highest vertical column is in the center. Example: Distribution of male heights in the US. The average height of a male in the US is 69.1 inches (about 5 ft 9 in) with some shorter and some taller. Right-Skewed Distribution (positively-skewed): Fewer data points are found to the right of the graph (toward the larger numeric values). The "tail" of the graph is pulled toward higher positive numbers/the right. (In practice, the left side of the graph has higher bars than the right.) Ex// x-axis: 10, 20, 30, 40, 50 y-axis: the highest bar is at 10 and then it goes down as you go right (to 70) Example: Distribution of household incomes in the U.S. Most households earn between $40k and $80k per year, but there is a long tail of households that earn much more. NOTE: Tail extends to the right. Left-Skewed Distribution (negatively-skewed): Fewer data points are found to the left of the graph (toward the smaller numeric values). The "tail" of the graph is pulled toward the lower or negative numbers, i.e., the left. Ex// x-axis: 10, 20, 30, 40, 50 y-axis: The lowest bar is at 10 and then it goes up as you go right (to 70). Example: Distribution of age of death. Most people live to between 70 and 80 years old with fewer living less than this. NOTE: Tail extends to the left. Uniform Distribution (equal spread, no peaks): The data is spread equally across the range. There are no clear peaks in these graphs, as each value appears about the same number of times in the set. Symmetric Bimodal Distribution (two modes) Example: Exam scores fall into a lot of As and a lot of Fs. So Group 1 is prepared for the class and Group 2 is unprepared for the class. Non-Symmetric Bimodal Distribution (two modes) Example: Data representing the time it takes for employees to complete a task. One peak for highly efficient employees and another for less efficient employees. If most employees are efficient, the distribution would be right-skewed with the less efficient group creating a longer tail to the right, indicating a non-symmetric bimodal shape. NOTES: -Unimodal Distribution: One clear peak -Bimodal Distribution: Two clear peaks. This usually indicates that you've got two different groups, e.g., exam scores fall into a lot of As and a lot of Fs. So Group 1 is prepared for the class and Group 2 is unprepared for the class. -Bell-Shaped Distribution: Single peak at the center

Answer 164

-Bins divide data into equally sized groups, resulting in systematic data distribution -ONLY measures are used to create bins -Bins are dimensions, and it's better to convert them to continuous (e.g., for histograms). -Calculated fields cannot be used to create bins. -Histograms in statistics show the frequency of data within a certain range.

Answer 165

Remove or select specific subsets of data for different purposes and use cases. Purpose: -Reduce the size of dataset to optimize the performance of the dashboard. -Interactivity & Analysis: Offer filters to users so they can focus on subsets of the data. (Different users may be interested in different sections of the data.) -Data Security: Remove sensitive data. Use filters to restrict or hide sensitive data from the users. -Data Access Control by Applying Row-Level Security (RLS): use filters to limit access to data based on user role and permissions (e.g., employees should not see sales/employee as the managers do)

Answer 166

Visual: Source system>Extract Filter(optional)>Data Source>Data Source Filter(optional)>Context Filter(optional)>Worksheet>Dimension Filter>Measure Filter>Table Calculation Filter First Processed to Last Processed Filter: Extract Filter>Data Source Filter>Context Filter>Dimension Filter>Measure Filter>Table Calculation Filter Optimize Performance of a Dashboard: 1) extract filter: Used to filter data before it even enters Tableau (i.e., before it reaches the data source). (You can't use it on a live data connection, only on an extract data connection.) -Only extract connection -Only Tableau Desktop -purpose: optimize load performance & optimize performance in views 2) data source filter: filters data between data source and worksheets. -extract and live connection -Tableau Desktop and Public -purpose: optimize performance in views & hide sensitive data 3) context filter: creates a temporary, filtered subset of data in worksheets -downside: lose performance to create the subset. Why use a data source filter & a context filter? A context filter enables you to have different filter criteria for each worksheet. In some scenarios, you can't use a data source filter, because you have different requirements and focuses in each worksheet. Interactivity: We offer the below filters to users so they can slice and dice data to focus on specific subsets of the data. -dimension filter -measure filter -table calculation filter
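
The processing order can be sketched as a pipeline that always applies filters in the sequence listed on the card, whatever order they were defined in. This is a simplification for illustration: every filter is modeled as a row-level predicate, whereas measure and table calculation filters really act on aggregated or computed values. The rows and filter rules are made up.

```python
# Processing order taken from the card, first processed to last processed.
FILTER_ORDER = [
    "extract",
    "data source",
    "context",
    "dimension",
    "measure",
    "table calculation",
]


def apply_filters(rows, filters):
    """Apply the supplied predicates in the card's processing order."""
    for kind in FILTER_ORDER:
        predicate = filters.get(kind)
        if predicate is not None:
            rows = [row for row in rows if predicate(row)]
    return rows


# Hypothetical data and filter rules.
rows = [
    {"country": "USA", "year": 2023, "sales": 30},
    {"country": "France", "year": 2023, "sales": 40},
    {"country": "France", "year": 2021, "sales": 50},
]
filters = {
    "dimension": lambda r: r["country"] == "France",
    "extract": lambda r: r["year"] >= 2022,   # runs first despite being listed second
}

result = apply_filters(rows, filters)
```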

Answer 167

Data Source page>Filters (upper right-hand corner)>Add>Add...select value you want a filter on (e.g., country)>select what you want filtered (e.g., USA, exclude)>OK>OK>all of your datasets will update accordingly (no worksheets will show US data)

Answer 168

Use the date fields to reduce the size of the data (e.g., limit the number of years included in the data source). Ask users if they need all, say, 5 years of data or if 2 years is enough. Caution: All worksheets connected to this data source will be affected by these filters.

Answer 169

drop value in the filters section on a worksheet>right-click>Create filter>select whatever filter wanted. Note: Keep in mind that you can't hide sensitive data with the context filter. Viewers can select the arrow in the filter on the view and have it show all values in the dataset.

Answer 170

Put a value in "Filter" on a worksheet>right-click>edit filter>General to exclude or include values OR Wildcard to set up a rule OR Condition to set up a rule whereby profit = sum of x (or some such thing) OR Top in order to see top values. Tip: Use Wildcard to set up a rule if you have a dimension with high cardinality (i.e., a long list of all possible values in a dimension). After you apply, you can right-click on the value in the filters section of the worksheet and hit "show filter" so users can see and interact with the filters.

Answer 171

You have a long list of all possible values in the dimension Tip: You're not going to want to select all values manually for a filter as it would take a long time. You can instead use a dimension filter and "Wildcard" to define a rule.

Answer 172

If you put a measure value on the Filters section of a worksheet, a window will pop up>it will ask if you want "all values" (i.e., all values from the dataset) or if you want an aggregated value (i.e., sum, avg, median, etc.) such that the values are aggregated AND THEN filtered>select your choice>Next> Range of Values (range, at least, at most, special) (special lets you decide if you want to show only null values, only non-null values, or all values). >Apply

Answer 173

Place a value in the marks section of a worksheet>right-click it>Quick table calculation>percent of total (or whatever you want to select)>control and drag it to the filter section of the worksheet

Answer 174

Filters that can be used by users to interact and filter worksheets

Tableau Flashcards

(201 cards)