Google Sheets/Excel Flashcards

(134 cards)

1
Q

Absolute Referencing

A

Absolute referencing is marked by a dollar sign ($). For example, =$A$10 has absolute referencing for both the column and the row value

Relative references (which is what you normally do, e.g. “=A10”) will change anytime the formula is copied and pasted. They are in relation to where the referenced cell is located. For example if you copied “=A10” to the cell to the right it would become “=B10”. With absolute referencing “=$A$10” copied to the cell to the right would remain “=$A$10”. But if you copied $A10 to the cell below, it would change to $A11 because the row value isn’t an absolute reference.

Absolute references will not change when you copy and paste the formula in a different cell. The cell being referenced is always the same.

To easily switch between absolute and relative referencing in the formula bar, highlight the reference you want to change and press the F4 key; for example, if you want to change the absolute reference, $A$10, in your formula to a relative reference, A10, highlight $A$10 in the formula bar and then press the F4 key to make the change.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

COUNTIF()

A

COUNTIF() is a formula and a function. This means the function runs based on criteria set by the formula.

COUNTIF returns a value based on a condition that the data range is evaluated for. The function counts the number of cells that meet the criteria.

Syntax:
=COUNTIF(range, “criterion”)
OR
if you have more than one tab in a Google Sheets spreadsheet:
=COUNTIF(‘tab_name’!range, “criterion”)

Range: A1:A16
Criterion: can be a text string or a number.

Examples:

Use =COUNTIF(A1:A16, “7”) to count only the cells in the range of A1:A7 that contained the number 7.
Output: 3. There were 3 sevens in that range.

Use =COUNTIF(‘raw_data’!G:G, A2) to count all cells in column G of the raw data tab of the spreadsheet that match the value in cell reference A2.

Other Examples:
=COUNTIF(A2:A9, “Taxi”)
=COUNTIF(B2:B9, “>=10”)
=COUNTIF(B2:B9, “>=”&B22)
=COUNTIF(A2:A9, “G*”)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

To avoid a divide by 0 error

A

To do this, click on the cell with the error and then enter this function in the fx box: IFERROR(B4/A4, “Not applicable”)

(You’ll need to enter the correct cells; B4 and A4 are just filler cells for this example.)

Copy this formula to apply it to other cells in the column so other divide by errors are avoided.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

N/A

A

Data in a formula can’t be found by the spreadsheet. (Often this means the data doesn’t exist. This error most often occurs when you use
Functions like VLOOKUP)

VLOOKUP: searches for a certain value in a column to return a corresponding piece of information.

For example, if get an #N/A error and hover over the problematic cell, you may get a message like: “Did not find the value ‘Almond’ in VLOOKUP evaluation. In this case, ‘Almond’ needs to be changed to ‘Almonds’ and then the formula will work.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

NAME?

A

A formula or function name that isn’t understood

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

NUM!

A

A formula or function calculation can’t be performed as specified.

For example, we would get a #NUM! error if the end date came before the start date in a DATEDIFF function.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

DATEDIFF function

A

Use to calculate duration, e.g., number of months between the start and end dates of a project.

Syntax:
=DATEDIFF(start_date, end_date, “unit”)

Example:
=DATEDIFF(B6, C6, “M”)

units:
“D”: the number of days between the start & end dates
“M”: the number of whole months between the start & end dates
“Y”: the number of whole years between the start & end dates

“MD”: the number of days between the start & end dates after subtracting whole months.
Example: After subtracting whole years and whole months from my age, how many days old am I?

“YM”: the number of whole months between the start & end dates after subtracting whole years.
Example: After subtracting whole years from my age, how many months old am I?

“YD”: the number of days between the start & end dates, assuming the start & end dates were no more than one year apart.
Example: How many days has it been since my last birthday, given my birthdate and today’s date?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

VALUE!

A

A general error that could indicate a problem with a formula or referenced cells.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

REF!

A

A formula is referencing a cell that is no longer valid or has been deleted.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Spreadsheet tip

A

Create a tab with just the raw data, and a separate tab with just the data you need.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Common spreadsheet errors

A

DIV/0!

A formula is trying to divide a value in a cell by 0 (or an empty cell with no value)

=B2/B3, when the cell B3 contains the value 0

(Google Sheets only) Something can’t be interpreted as it has been input. This is also known as a parsing error.

=COUNT(B1:D1 C1:C10) is invalid because the cell ranges aren’t separated by a comma

A formula can’t find the data. The cell being referenced can’t be found.

=VLOOKUP”Apple”, A2:B5, 2, FALSE) Data not found in lookup.

The name of a formula or function used isn’t recognized. The name of a function is misspelled.

=SUMM(A1:A5)

The spreadsheet can’t perform a formula calculation because a cell has an invalid numeric value

=DATEDIF(A4, B4, “M”) is unable to calculate the number of months between two dates because the date in cell A4 falls after the date in cell B4

SQRT(-1) Data doesn’t make sense for the calculation.

A formula is referencing a cell that isn’t valid. A cell used in a formula was in a column that was deleted.

=A1+B1 Referenced cells have been deleted.

A general error indicating a problem with a formula or with referenced cells

There could be problems with spaces or text, or with referenced cells in a formula; you may have additional work to find the source of the problem.

=SUM(A1:A3) Incompatible data types.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

=ISERROR

A

Logical function that checks if a value is an error. Use it to set up conditional formatting in excel or sheets.

Conditional formatting in Microsoft Excel
For Format values where this formula is true, enter =ISERROR(A1).
Click the Format button, select the Fill tab, select yellow (or any other color), and then click OK.

Conditional formatting in Google Sheets
From the main menu, click Format and select Conditional Formatting to open the Conditional format rules pane on the right.
While in the Single Color tab, under Format rules, use the drop-down to select Custom formula is, enter =ISERROR(A1), select yellow (or any other color) for the formatting style, and then click Done.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

=AVERAGE()

A

Use it to calculate average.

Example:
=AVERAGE(B2:E2)

In above example, B2:E2 shows the sales for April, May, June, and July. Above equation tallies the total sales and then divides by 4.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How to calculate change or percent change from June monthly sales to July monthly sales

A

If E2 is July monthly sales (163330) and D2 is June monthly sales (47002),

=E2-D2/D2

& then change the result column to percentages.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Function

A

A preset command that automatically performs a specific process or task using the data.

=SUM(value1: value2) *Adds all cells in this range together.

=AVERAGE(value1:value2) *Adds all cell in this range together and then divides by the total number of cells in this range.

Example: AVERAGE(B2:D3)
1) B2+C2+D2=x
2) x/3

=MIN(value1:value2) *Finds the minimum value in the selected cell range.
Example:
=MIN(B2:E4) *Note this range covers an entire table. In this example, sales from 2017, 2018, and 2019 (rows) and each month (column) are considered.

=MAX(value1:value2) *Finds the maximum value in the selected cell range.

Again, like MIN, this range can cover an entire table, e.g., =MAX(B2:E4)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Fill Handle

A

Little box in the bottom left-hand corner of a cell that you can use to copy a function or formula. Drag it up or down.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Cell reference

A

column letter & row number used to identify a cell.

Example: A1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Formula

A

a set of instructions used to perform a calculation using the data in a spreadsheet.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Formula vs Function

A

Formula: a set of instructions used to perform a calculation using the data in a spreadsheet.

Function: A preset command that automatically performs a specific process or task using the data.

=SUM(value1: value2)
=AVERAGE(value1: value2)
=MIN(value1: value2)
=MAX(value1: value2)
=COUNTIF(value1:value2, “value_seeking”) *value can be a text string or number.

cut, save, and find are also functions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Relative Reference vs. Absolute Reference

A

Relative references (cells referenced without a dollar sign, like A2) will change when you copy and paste the function into a different cell. With relative references, the location of the cell that contains the function determines the cells used by the function.

Absolute references (cells fully referenced with a dollar sign, like $A$2) will not change when you copy and paste the function into a different cell. With absolute references, the cells referenced always remain the same.

As an example, if you copy the formula =B4C4 from cell D4 to D5, the formula in D5 adjusts to the right by one column and becomes =B5C5.

If you want to maintain the original cell reference in this example when you copy it, you make the cell reference absolute by preceding the columns (B and C) and row (2) with a dollar sign ($). T

hen, when you copy the formula =$B$4*$C$4 from D4 to D5, the formula stays exactly the same.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Mixed Reference

A

Mixed reference Excel definition: A mixed reference is made up of both an absolute reference and relative reference.
This means that part of the reference is fixed, either the row or the column, and the other part is relative.

A mixed reference comes in these two common forms:

1) =$A1 ($ in front of column, holding the column constant)
2) =A$1 ($ in front of row, holding the row constant)

Example:

The company secretary has been asked to calculate the annual income tax for several top executives in the company. She has been given their annual income between 2017 and 2019 and their respective tax rate.

To find out their respective annual income tax payable, the calculation is = annual income * tax rate.

In Jonathan’s case, it would be = 259817% for 2017
In Excel expression, it would be =B3
E3.
The return value is $441.66.

Since the tax rate is always in column E, column E is to be held constant in the calculation.
=B3*$E3

To calculate the rest (i.e., 2018 and 2019 for Jonathan), copy the formula across the rest of the remaining cells.
By doing this,
2018: =C3 (2018 Salary)*$E3 (17% tax rate; doesn’t change over time)
2019: =D3 (2019 Salary) *$E3 (17% tax rate; doesn’t change over time)

If a mixed reference was not applied in the scenario above, and only a relative reference used, the calculations would not be correct.
They would be,
2018=C3 (2018 Salary) *F3 (No value in this column)
2019: D3 (2019 salary) *G3 (No value in this column)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Paste Values Only

A

Example:

     A            B               C               D              E 1  Sales     Week1     Week2     Week3     Total/Day 2   Mon       400          300           350          =SUM(A2:D2) 3   Tues       500          200           450          =SUM(A3:D3)

If you try to copy column A & E and paste them, you’ll get a #REF! error for the E column.

     A            B               C               D              E 5  Sales     Total/Day 6 Mon        =SUM(_:A6) 7  Tues       =SUM(_:A7)

Why do you get the error?

Excel is trying to sum the cells to the left, but there aren’t cells to the left. This happens because it’s using relative reference values.

Solution: Paste Values Only
This will copy the totals from cells E2 and E3 (i.e., 182719 and 202754) and paste them in B6 and B7 without the equation. (If you double click on 182719 and 202754, you’ll just see those values; no formula will show up.)

Hot Keys:
Paste Values Only: Control+Shift+V
vs.
Paste: Control+V

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Spreadsheet Hotkeys, PC

Create new workbook
Open workbook
Save workbook
Close workbook

A

Create new workbook: ctrl+N
Open workbook: ctrl+O
Save workbook: ctrl+S
Close workbook: ctrl+W

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Spreadsheet Hotkeys, PC

Undo
Redo
Copy
Cut
Paste
Paste Values Only
Insert link
Bold
Italicize
Underline

A

Undo: Ctrl+z
Redo: Ctrl+y
Copy: Ctrl+c
Cut: Ctrl+x
Paste: Ctrl+v
Paste Values Only: Ctrl+shift+v
Insert link: ctrl+k
Bold: ctrl+b
Italicize: ctrl+I
Underline: ctrl+u

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Spreadsheet Hotkeys, PC Find Find and replace
Find: ctrl+f Find and replace: ctrl+h
26
Spreadsheet Hotkeys, PC Zoom in Zoom out
Zoom in: ctrl+ Zoom out: ctrl-
27
Spreadsheet Hotkeys, PC *Select column Select row Select all cells Edit the current cell *Comment on a cell
Select column: ctrl+(spacebar) Select row: ctrl+(spacebar) Select all cells: ctrl+A Edit the current cell:F2 Comment on a cell: Alt+I+M
28
Insert column to the left Insert column to the right Insert row above Insert row below
Insert column to the left: Alt+Shift+I, then C Insert column to the right: Alt+Shift+I, then O Insert row above: Alt+Shift_I, then R Insert row below: Alt+Shift+I, then B
29
=TEXT()
Syntax: =TEXT(cell_reference, "format") =TEXT(B2, "mmmm") *Note: you must manually enter all formulas and functions. Do NOT copy and paste them, as it will lead to an error message. date & time patterns/format options: -d for the day of the month as one or two digits. -dd for the day of the month as two digits. -ddd for the short name of the day of the week. -dddd for the full name of the day of the week. -m for the month of the year as one or two digits or the number of minutes in a time. Month will be used unless this code is provided with hours or seconds as part of a time. -mm for the month of the year as two digits or the number of minutes in a time. Month will be used unless this code is provided with hours or seconds as part of a time. -mmm for the short name of the month of the year. -mmmm for the full name of the month of the year. -mmmmm for the first letter in the month of the year. -yy for the year as two digits. -yyyy for the year as four digits. -HH for the hour on a 24-hour clock. -hh for the hour on a 12-hour clock. -ss for the seconds in a time. -ss.000 for milliseconds in a time. -AM/PM for displaying hours based on a 12-hour clock and showing -AM or PM depending on the time of day.
30
How do you copy the function or formula in the first cell of a column & paste it into all of the cells in the column? Note: There are 2k+ cells in this column.
Double-click on the fill handle in the first cell of the column.
31
How do you select an entire column when using COUNTIF?
To select the full G column, you would write !G:G Example: =COUNTIF('raw_data'!G:G, A2) on the spreadsheet tab summary data -First value: Google sheets will go to the raw_data tab and sum every value in the G column. -Second value: Google sheets will go to the summary data tab and show the value in cell reference A2.
32
Practice: Calculate the total number of applications by month. Column A of the summary_data tab of a Google Sheets spreadsheet contains a list of months (January through December) A B 1 Month Applications 2 January 3 February 4 March Column G of the raw_data tab of a Google Sheets spreadsheet lists the month that each applicant applied to the company. A B C D E F 1 ID Date Job Title Location Hired Month 2 11578773 1/1/23 Business Data Analyst NY, NY TRUE January 3 11613549 1/1/23 Clinical Data Analyst NY, NY FALSE January
Enter =COUNTIF(raw_data!F:F,A2) on the summary tab in cell ref B2 Range: raw_data!F:F Criterion: A2 Count all values in the column F on the raw_data tab that meet the criterion in cell reference A2 (i.e., January) Output: 2387 That means that in cell G on the raw_data tab, there are 2387 instances of January. So, in this context, 2387 applicants in the month of January.
33
Equation Math expression Math function
Equation/Math Expression: A calculation that involves addition, subtraction, multiplication, or division. Math function: A function that is used as part of a mathematical formula
34
Algorithm
A process or set of rules followed for a specific task.
35
Big Data
Large, complex datasets typically involving long periods of time, which enable data analysts to address far-reaching business problems.
36
Cell reference
A cell or range of cells in a worksheet typically used in formulas and functions.
37
Pivot Chart Pivot Table
Pivot chart: A chart created from the fields in a pivot table. Pivot table: A data summarization tool used to sort, reorganize, group, count, total, or average.
38
Sheets, LEN
The LEN function is useful if you have a certain piece of information in your spreadsheet that you know must contain a certain length. =LEN(A2)
39
Sheets, Conditional Formatting
Conditional formatting is a spreadsheet tool that changes how cells appear when values meet specific conditions. To highlight all of column B except for the header, select cell B. Then press CONTROL (Windows) or COMMAND (MAC) and select cell B1. Navigate to the Format menu, and choose Conditional Formatting. Set the Format rules field to Is not equal to and enter 6 as the value. Select Done. Notice cell B36 is highlighted because its value is 7.
40
Sheets, LEFT
LEFT is a function that returns a set number of characters from the left side of a text string. In cell H2, enter =LEFT(A2, 5) to extract the first five characters from cell A2. This function will show the substring 51993.
41
Sheets, RIGHT
RIGHT is a function that returns a set number of characters from the right side of a text string. In cell I2, enter =RIGHT(A2, 4) to extract the last four characters from cell A2. This function will show the substring Masc.
42
Sheets, MID
MID is a function that returns a segment from the middle of a text string. In cell J2, enter =MID(D2, 4, 2) to extract the two-letter state code that starts at character four in cell D2.
43
Sheets, CONCATENATE and use cases
CONCATENATE is a spreadsheet function that joins together two or more text strings. In cell K2, enter =CONCATENATE(H2, I2) to combine the values from columns H and I. Use case 1: Use CONCATENATE to help you format the data as it is merged. Coming up, if you enter =CONCATENATE(D2, E2) as demonstrated by the instructor, the results will appear like this: 25 Dyas RdSte. 101 You could manually clean the data later to add a space between Rd and Ste., but why not let CONCATENATE do the work for you? Because CONCATENATE merges strings, you can enter an additional string to insert a space between Rd and Ste. Enter =CONCATENATE(D2, " ", E2) instead and you will have an address that is formatted like this: 25 Dyas Rd Ste. 101
44
Sheets, TRIM
TRIM is a function that removes leading, trailing, and repeated spaces in data. In cell L2, enter =TRIM(C2) to remove any leading, trailing, or repeated spaces.
45
Function
A set of instructions that performs a specific calculation using the data in a spreadsheet.
46
Sheets, COUNTIF
A function that returns the number of cells that match a specified value. =COUNTIF(O2:O72, ">500")
47
Can it be automated? -Preparing & Cleaning Data -Data exploration -Modeling the data
Preparing and cleaning data: Partially. Some tasks in data preparation and cleaning can be automated by setting up specific processes, like using a programming script to automatically detect missing values. Data exploration: Partially. Sometimes the best way to understand data is to see it. Luckily, there are plenty of tools available that can help automate the process of visualizing data. These tools can speed up the process of visualizing and understanding the data, but the exploration itself still needs to be done by a data analyst. Modeling the Data. Yes. Data modeling is a difficult process that involves lots of different factors; luckily there are tools that can completely automate the different stages.
48
VLOOKUP
VLOOKUP is a spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information. It's rare for all of the data an analyst will need to be in the same place. Usually, you'll have to search across multiple sheets or even different databases. VLOOKUP helps bring the information together. Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]) It searches for a specific value (i.e., the lookup value) within the column of the table (i.e., the table_array) and returns a value from the same row in a column you specify (i.e., col_index_num). The optional range_lookup argument specific whether to find an approximate or exact match. It would be true for an approximate match or false for an exact match. example: =VLOOKUP("Apple", A2:B10, 2, FALSE) Apple is the name of the product you want to find. A2:B10 is the range of cells containing the product names and prices. 2 for second column, which is the column containing the price FALSE to find an exact match for the product name.
49
Pivot Table
Data summarization tool used in data processing. Sorts, reorganizes, groups, counts, totals, or averages data stored in a database.
50
Data mapping
The process of matching fields from one data source to another.
51
.csv files
plain text files with an organized table structure that includes rows and columns.
52
SPLIT function
The SPLIT function divides text around a specified character or string, and puts each fragment of text into a separate cell in the row. Syntax: SPLIT(text, delimiter, [split_by_each], [remove_empty_text]) text: the text to divide delimiter: the character(s) to use to split the text split_by_each: (optional: true by default): whether or not to divide the text around each character contained in the delimiter remove_empty_text (optional, true by default): Whether or not to remove empty text messages from the split results. Default is to treat consecutive delimiters as one (if true). If false, empty cells values are added between consecutive delimiters. Example 1: Cell A1: Ben, Mary Jo, , Jenny, Simon =SPLIT(A1, ",") Output: B1: Ben C1: Mary Jo D1: E1: Jenny F1: Simon Example 2: The optional TRUE/FALSE determines whether SPLIT considers each individual character of the delimiter (TRUE) or only the full combination as the separator to use (FALSE). In the below example, FALSE ensures that it only considers the combined comma/space string as the delimiter. =SPLIT(A1, ", ", FALSE) Output: B1: Ben C1: Mary Jo D1: Jenny E1: Simon Example 3: The 4th argument is also optional and uses a TRUE/FALSE value. It determines whether to remove blank cells or not in the output. To keep blank cells, use FALSE. A B C D E F 1 String SPLIT Formula Split Formula Output 2 Client A;; B98;20 =SPLIT(A2, ";", TRUE, TRUE) Client A B98 20 3 Client A;;B98;20 =SPLIT(A3,";",TRUE,FALSE) Client A (blank) B98 20
53
Data Cleaning Checklist
This checklist will help you you identify problems in your data efficiently and identify the scale and scope of your dataset. Think of this checklist as your default “what to search for” list. Determine the size of the dataset: Large datasets may have more data quality issues and take longer to process. This may impact your choice of data cleaning techniques and how much time to allocate to the project. Determine the number of categories or labels: By understanding the number and nature of categories and labels in a dataset, you can better understand the diversity of the dataset. This understanding also helps inform data merging and migration strategies. Identify missing data: Recognizing missing data helps you understand data quality so you can take appropriate steps to remediate the problem. Data integrity is important for accurate and unbiased analysis. Identify unformatted data: Identifying improperly or inconsistently formatted data helps analysts ensure data uniformity. This is essential for accurate analysis and visualization. Explore the different data types: Understanding the types of data in your dataset (for instance, numerical, categorical, text) helps you select appropriate cleaning methods and apply relevant data analysis techniques. There might be other data cleaning tasks you’ve been learning about that you also want to prioritize in your checklist. Your checklist is an opportunity for you to define exactly what you want to remember about cleaning your data; feel free to make it your own.
54
Relational Database
A database that contains a series of tables that can be connected to form relationships.
55
When to use a spreadsheet vs SQL database
When it comes down to it, where the data lives will decide which tool you use. If you are working with data that is already in a spreadsheet, that is most likely where you will perform your analysis. And if you are working with data stored in a database, SQL will be the best tool for you to use for your analysis.
56
COUNTA vs COUNT
COUNTA: A function that counts the total number of values within a specified range. Unlike COUNT, it counts all non-blank cells, including text, numbers, dates, times, logical values, and even error values. vs. COUNT: Only counts the numerical values within a specified range
57
Data Cleaning Verification Checklist
1) Correct the most common problems: Sources of errors: Did you use the right tools and functions to find the source of the errors in your dataset? Null data: Did you search for NULLs using conditional formatting and filters? Misspelled words: Did you locate all misspellings? Mistyped numbers: Did you double-check that your numeric data has been entered correctly? Extra spaces and characters: Did you remove any extra spaces or characters using the TRIM function? Duplicates: Did you remove duplicates in spreadsheets using the Remove Duplicates function or DISTINCT in SQL? Mismatched data types: Did you check that numeric, date, and string data are typecast correctly? Messy (inconsistent) strings: Did you make sure that all of your strings are consistent and meaningful? Messy (inconsistent) date formats: Did you format the dates consistently throughout your dataset? Misleading variable labels (columns): Did you name your columns meaningfully? Truncated data: Did you check for truncated or missing data that needs correction? Business Logic: Did you check that the data makes sense given your knowledge of the business? 2) Review the goal of the project Confirm the business problem Confirm the goal of the project Verify that data can solve the problem and is aligned to the goal
58
Sheets & Excel, QUERY
Enables pseudo SQL (SQL-like) statements or a wizard to import the data. The QUERY function is useful when you want to pull data from another spreadsheet. The QUERY function's SQL-like ability can extract specific data within a spreadsheet. For a large amount of data, using the QUERY function is faster than filtering data manually. This is especially true when repeated filtering is required. Sheets Syntax: =QUERY(Sheet and Range, "Select *") Excel Menu Options: Data>From Other Sources>From Microsoft Query Use case: Analysts can use SQL to pull a specific dataset into a spreadsheet. They can then use the QUERY function to create multiple tabs (views) of that dataset. For example, one tab could contain all the sales data for a particular month and another tab could contain all the sales data from a specific region. This solution illustrates how SQL and spreadsheets are used well together. Clauses: *NOTE: All clauses are optional. Clauses are separated by spaces. The order of clauses is as follows: 1) SELECT: Selects which columns to return and in what order. If omitted, all of the table's columns are returned in their default order. 2) WHERE: Returns only the rows that match a condition. If omitted, all rows are returned. 3) GROUP BY: Aggregates values across rows. 4) PIVOT: Transforms distinct values in columns into new columns. 5) ORDER BY: Sorts rows by values in columns. 6) LIMIT: Limits the number of returned rows. 7) OFFSET: Skips a given number of rows. 8) LABEL: Sets column labels. 9) FORMAT: Formats the values in certain columns using given formatting patterns. 10) OPTIONS: Sets additional options.
59
Sheets & Excel, FILTER
Displays only the data that meets the specified conditions Sheets Syntax: =Filter(range, codition1, [condition2, ...]) Excel menu Options: (conditions per column) The FILTER function might run faster than the QUERY function. But keep in mind, the QUERY function can be combined with other functions for more complex calculations. For example, the QUERY function can be used with other functions like SUM and COUNT to summarize data, but the FILTER function can't.
60
Sheets and Excel, IMPORTRANGE
Imports (pastes) the data from one sheet to another and keeps it automatically updated. Sheets Syntax: =IMPORTRANGE(spreadsheet_url, range_string) Example: =IMPORTRANGE("https://docs.google.com/spreadsheets/d/abcd", "sheet1!A1:C10") Excel menu options: Paste Link (copy the data first) Using these on a large amount of data is more efficient than manual copying and pasting. They also reduce the chance of errors being introduced by copying and pasting the wrong data. They are also helpful for data cleaning because you can “cherry pick” the data you want to analyze and leave behind the data that isn’t relevant to your project. Example 2: An analyst monitoring a fundraiser needs to track and ensure that matching funds are distributed. They use IMPORTRANGE to pull all the matching transactions into a spreadsheet containing all of the individual donations. This enables them to determine which donations eligible for matching funds still need to be processed. Because the total number of matching transactions increases daily, they simply need to change the range used by the function to import the most up-to-date data. On Tuesday, they use the following to import the donor names and matched amounts: =IMPORTRANGE("https://docs.google.com/spreadsheets/d/abcd123abcd123", "sheet1!A1:C10", "Matched Funds!A1:B4001") On Wednesday, another 500 transactions were processed. They increase the range used by 500 to easily include the latest transactions when importing the data to the individual donor spreadsheet: =IMPORTRANGE("https://docs.google.com/spreadsheets/d/abcd123abcd123", "Matched Funds!A1:B4501")
61
Sheets, Why do you need to be careful with "sort range"?
Use the menu to sort one column without affecting how the rest of the sheet is arranged. Each row in a table describes a single observation, so if you sort only one column you can introduce errors throughout your dataset. Use caution when using this option! Select cell A to highlight all of column A. Select Data from the menu. Select Sort range. Select Sort range by column A (A to Z). Notice that the movie data across the rows is now jumbled because sorting a single column in a sheet doesn’t keep data in a row together. Note: As you’ve learned, each row in a table describes a single observation. Here, the values in Column A (movie titles) were sorted in A-to-Z order, but the rest of the sheet wasn’t sorted. For example, before sorting by column A, The Devil Inside was listed as having a release date of 2012-01-06. After sorting only the movie titles in Column A in A-to-Z order, however, The Devil Inside is listed as having a release date of 2015-06-16. This is incorrect.
62
Sort Sheet
All of the data in a spreadsheet is sorted by the ranking of a specific sorted column; data across rows is kept together.
63
Sort Range
When you sort a range, you're selecting a specific collection of cells (or range of cells) that you want sorted. Nothing else on the spreadsheet is rearranged. This means that data across rows is NOT kept together. For example: Original Data Set A B Movie Release Date George 2010 Bender 2007 Sort Sheet by Column A, ASC A B Movie Release Date Bender 2007 George 2010 vs. Sort Range by Column B, ASC A B Movie Release Date Bender 2010 George 2007 *Note that the sort range has incorrect info now. Bender released in 2007, not 2010. (Only column A was rearranged, not column b.)
64
Sheets, SORT
Use to sort data. Note: A written SORT function should never contain a header row. (Use the sort from the spreadsheet's data tab to do that.) Syntax: SORT(cell range, col_number, TRUE or FALSE) TRUE: return results in ascending order FALSE: return results in descending order example: =SORT(A2:D6, 2, TRUE)
65
Sheets, Sort from the spreadsheet's data tab
Sort with multiple conditions Highlight all the data in the party plan spreadsheet: cells A1 to D6. In the menu, select Data > Sort range > Advanced range sorting options. This opens the Sort range from A1 to D6 dialog box. Check the Data has a header row checkbox to make sure column titles aren’t included in the sorted data. From the Sort by drop-down list, select Sent Invitation. Select the A to Z radio button to make sure the No responses are listed first and the Yes responses are listed second. Select Add another sort column to add the additional sorting condition. From the Sort by drop-down list, select Guest Names to list guests alphabetically. Select the A to Z radio button. Select SORT.
66
BigQuery, Back Ticks ` When do you use back ticks?
To capture a column name that contains a space. For example: SELECT * FROM vaulted-algebra-458421-a4.movie_data.movies ORDER BY `Release Date`;
67
Sheets, CONVERT
Syntax =CONVERT(cell reference, "original_value_type", "new_value_type") Example: Convert B2 (99 degrees F) from Fahrenheit to Celsius =CONVERT(B2, "F", "C") The following list outlines the available unit conversions by category: -Weight - u, grain, g, ozm, lbm, stone, sg, cwt, uk_cwt, ton, uk_ton -Distance - ang, Picapt, pica, in, ft, yd, m, ell, mi, survey_mi Nmi, ly, parsec -Time - sec, min, hr, day, yr -Pressure - Pa, mmHg, Torr, psi, atm -Force - dyn, pond, N, lbf -Energy - eV, e, J, flb, c, cal, BTU, Wh, HPh -Power - W, PS, HP -Magnetism - ga, T -Temperature - C, F, K, Rank, Reau -Volume - ang^3, Picapt^3, tsp, tspm, tbs, in^3, oz, cup, pt, uk_pt, qt, l, uk_qt, gal, uk_gal, ft^3, bushel, barrel, yd^3, m^3, MTON, GRT, mi^3, Nmi^3, ly^3 -Area - ang^2, Picapt^2, in^2, ft^2, yd^2, m^2,ar, Morgen,uk_acre, us_acre, ha, mi^2, Nmi^2, ly^2 -Information - bit, byte -Speed - m/hr, mph, kn, admkn, m/s
68
Sheets, Paste Special
Right-click a cell>Copy>Right-Click blank cell>Paste Special>Paste Values Only This option pastes only the values from the original selection, removing any formatting, functions, or other information.
69
Convert to Proper Data Type How to do this in Sheets & Why does it matter?
Sheets: Format>Number>(choose unit type) This is key to ensure that you're comparing apples to apples and not oranges to apples (e.g., pounds to dollars).
70
Data Validation (the function, not the process) What is it & how is it used
Allows you to control what can and can't be entered in your worksheet. It is used to: 1) Add drop-down lists with pre-determined options Function in Sheets: Select column>Data>Data Validation>Data Validation Rules>Apply to range (don't select the header cell in the column, e.g., 'Sheet1'!C2:c1000, Criteria: dropdown (or whatever is wanted), create all options wanted (e.g., Not yet started, in progress, ready) 2) Create custom checkboxes same process as above BUT choose Checkbox for criteria & enter whatever criteria is wanted. 3) Protect structured data and formulas Data validation menu has an option to "reject input" for "invalid data". So it'll work even if someone puts the wrong data in by mistake.
71
Sheets, Set a column so that incorrect values are rejected
Example: Only allow users to enter dates. Data>Data Validation>Add rule>Apply to range (e.g., C:C if whole column C), Criteria: Is valid date, Advanced Options>if data is invalid>Reject the input
72
Sheets, conditional formatting
Example: Turn Drop-down value of "Not Ready" to Red in column C. Format>Conditional Formatting>Add rule>Apply to range (enter range), Format Rules: Text is Exactly, Not Ready, Formatting style: Red fill
73
Sheets, CONCAT()
CONCAT(): Used to combine strings. Can only combine two values. Syntax: CONCAT(value_1, value_2) Example: =CONCAT(Jeff, Baker) Output: JeffBaker (no space, because a space would be a third value).
74
Sheets, CONCATENATE()
CONCATENATE: used to combine strings. Can combine many values. Syntax: =CONCATENATE(value_1, value_2, value_3) Example: =CONCATENATE(Month, " ", Day, ", " Year Output: May 30, 1955
75
Sheets, TRIM()
Removes trailing and leading spaces Syntax: =TRIM() Example: =TRIM(CONCATENATE(first_name, " ", last_name)) Output: Without Trim: Robert Baker With Trim: Robert Baker
76
Sheets, LEN()
LEN function calculates a string's length. Syntax: =LEN(cell) Example: =LEN(D2) Cell D2: 2017-09-01 00:02:01 Output: 19 So, there are 19 characters in cell C2
77
Sheets, FIND()
FIND function locates specific characters and substrings in a string. Syntax: =FIND(value_you_want_to_find, cell_you're_looking_at) Example: =FIND(" ", D3) D3: 2017-09-01 00:08:12 Output: 11 Indicates that the space " " is the 11th character in the selected cell.
78
Sheets, RIGHT()
RIGHT function selects a specific number of characters on the right side of a cell. Syntax: =RIGHT(cell, number_of_characters) Example: =RIGHT(D4, 8) D2: 2017-09-01 0:10:02 Output: 0:10:02 Returns the 8 characters starting from the right-hand side of the selected cell
79
Sheets, LEFT()
LEFT function selects a specific number of characters on the left side of a cell. Syntax =LEFT(cell, number_of_characters) Example: =LEFT(D2, 10) D2: 2017-09-01 00:02:01 Output: 2017-09-01
80
Sheets, mathematical operators
+ addition – subtraction * multiplication / division ^ exponent
81
Sheets, comparative operators
= equal to <> not equal to < less than > greater than <= less than or equal to >= greater than or equal to
82
Sheets, Logical operators
& and | or
83
Sheets, SUM()
This formula returns the sum of as many values as you wish to add. Simply type in the formula and select the range of cells with the values you want to add. The function only requires one value, but you can add many more. =SUM(value1, value2*, …)
84
Sheets, NOW()
This very simple function has no parameters and a very specific purpose: it returns the exact date and time. Simply type the formula as shown below. =NOW() NOW is a volatile function, as it will update every time it’s recalculated.
85
Sheets, DATEDIF()
This function calculates the number of days, months, or years between two dates. =DATEDIF(start_date, end_date, unit) start_date: the start date to consider. end_date: the end date to consider. unit: the unit of time to be used: years, months, days. Acceptable values include “Y”, “M”, “D”, “YM”, “YD”, “MD”
86
Sheets, TEXT()
The TEXT function allows you to convert numbers to text using your specified format. =TEXT(number, format) number: the number, date, or time to format. format: the pattern by which to format the number in quotation marks.
87
Sheets, EXACT()
This simple function tests whether two strings are identical, including case, spacing, and hidden characters. =EXACT(string1, string2)
88
Sheets, TRIM()
The TRIM function helps you clean your data by removing leading, trailing, and repeated spaces in a text. =TRIM(text)
89
Sheets, JOIN()
Using JOIN, you can concatenate multiple elements using the delimiter of your choice, =JOIN(delimiter, value_or_array1, [value_or_array2, …]) delimiter: the character or symbol you want to place between concatenated elements. value_or_array1: the first element or value you want to concatenate. value_or_array2*: only the first element is required, but you can add many more.
90
Sheets, SPLIT
This formula allows you to split text into fragments, which are placed in separate cells. The text is divided around the specified delimiter, which can be a character or a string. =SPLIT(text, delimiter, [split_by_each], [remove_empty_text]) text: the text that you want to split. delimiter: the character or string you want to use to divide the text. split_by_each*: optional. Whether to split the text around each character in the delimiter. remove_empty_text*: optional. Whether to remove empty text messages from results.
91
Sheets, SUBSTITUTE
The SUBSTITUTE function allows you to search within a text for a specific string of text and replace it with a new one. Essentially, it is the function equivalent to find and replace. =SUBSTITUTE(text_to_search, search_for, replace_with, [occurrence_number]) text_to_search: the text within to search for the string. search_for: the string you want to find in text_to_search. replace_with: the string that will replace the search_for string. occurrence_number*: optional. By default, all occurrences of search_for are placed.
92
Sheets, IF
This simple but powerful formula evaluates a logical expression to determine whether it’s ‘TRUE’ or ‘FALSE’. =IF(logical_expression, value_if_true, value_if_false) logical_expression: the logical expression or condition to be evaluated. value_if_true: the value that the function should return if logical_expression is ‘TRUE’. value_if_false: the value that the function should return if logical_expression is ‘FALSE’.
93
Sheets, IFERROR
If you’re using a formula that could result in an error, you can wrap the IFERROR function around the formula or the result. This function allows you to specify the value returned if the first argument results in an error. =IFERROR(value, [value_if_error]) value: the value that should be checked. This can be an expression or a reference to a cell. value_if_error*: the value that should be returned if an error is encountered.
94
Sheets, TRANSPOSE
The TRANSPOSE function allows you to switch the rows and columns in an array or range of cells. =TRANSPOSE(array_or_range)
95
Sheets, FILTER
This function returns a filtered version of your selected range by including only rows or columns that meet the conditions you specify. =FILTER(range, condition1, [condition2]) range: the range of data you want to filter. condition1: the condition that values in a given row or column must meet. condition2*: optional.
96
Sheets, SORT
The SORT function allows you to sort the rows in a range of cells according to the values in one or more columns. =SORT(range, sort_column, is_ascending, [sort_column2], [is_ascending2]) range: the range of data you want to sort. sort_column: the index number of the column containing the values by which you want to sort. is_ascending: ‘TRUE’ for ascending order and ‘FALSE’ for descending order. sort_column2*, is_ascending2*: optional.
97
Sheets, UNIQUE
This function returns unique rows in the selected range by discarding duplicates. =UNIQUE(range)
98
Sheets, QUERY
QUERY is a powerful lookup function that allows you to find, filter, and manipulate data in a given dataset based on multiple criteria. You can easily combine QUERY with IMPORTRANGE to access datasets in other Google Sheets. =QUERY(data, query, [headers]) data: the range of cells to be queried. query: the query written in the Google Visualization API Query language. Enclose it in quotation marks. headers*: An optional number to say how many header rows there are in your data. Applies when the headers are split over more than one row. Example: =QUERY(A1:D234, "SELECT B, D", 1) -The data range in this example is A1:D234 -The query statement is the string inside the quotes, in green. In this case, it tells the function to select columns B and D from the data. -The third argument is the number 1, which tells the function that the original data had a single header row. This argument is optional and, if omitted, will be determined automatically by Sheets.
99
Sheets, COUNTIF
This function allows you to count cells based on whether they meet the condition or criterion that you specify. *(If you want to use multiple criteria, use COUNTIFS.) Syntax: =COUNTIF(range, criterion) -range: the range of cells that you want to test using the criterion. -criterion: the test or condition that you want to apply to the range. Example: Count the number of sales of productA =COUNTIF(B:B, "productA") Output: 112 The number of transactions related to this product is 112.
100
Sheets, COUNTIFS
This function allows you to count cells based on whether they meet the condition or criterion that you specify. *Unlike COUNTIF, you can use multiple criteria Syntax: =COUNTIFS(criteria_range1, criterion1, [criteria_range2, criterion2, …]) The first range and the first criterion are required arguments. -criteria_range1: the first range of cells that you want to test using the first criterion. -criterion1: the test or condition that you want to apply to the first range. NOTE: -Additional ranges and criteria are optional, but you can add as many as you want. -However, the dimensions of subsequent ranges must be equal to those of the first range – all ranges should have the same number of rows and columns. Example: Find the number of sales made by 'Southteam'. Instead of using an exact match, you can use wildcards to search for part of the text. (* is the wildcard.) =COUNTIF(C:C,"South*") Output: 76. Southteam made a total of 76 sales.
101
Sheets, SUMIF
A function that adds numeric data based on one condition. *If you want to use multiple criteria/conditions, use SUMIFS. =SUMIF(range, criterion, [sum_range]) -range: the range of cells that you want to test using the criterion. -criterion: the test or condition that you want to apply to the range. -sum_range*: the range where you want to add the numbers if that range is different from the range being evaluated So, if you find the criterion in range, then sum the sum_range Example 1: Sum the sales amounts corresponding to productA. =SUMIF(C:C, "productA", E:E) So sheets will review the C:C range and find any values that match "productA". If any match productA, it will then sum the matching values in column E. For example: Data set: C D E productA Apple $0.75 productB Banana $0.50 productA Apple $0.75 =SUMIF(C:C, "productA", E:E) Output: $1.50 Because: 0.75+0.75 Example 2: What's the sum of all sales more than $500? Data Set: A B C D 1 Name State No. Clients Sales 2 Alex NY 1 $964.69 3 Ben NJ 4 $130.03 4 Frank NY 2 $1,110.90 =SUMIF(D2:D4, ">500") Output: $1075.59 *Because you didn't include the sum_range input, all the values in the cells D2 to D21 that match the criteria were summed by default. Example 3: What's the sum of all sales in NY? Data Set: A B C D 1 Name State No. Clients Sales 2 Alex NY 1 $964.69 3 Ben NJ 4 $130.03 4 Frank NY 2 $1,110.90 =SUMIF(B2:B4, "NY", D2:D4) =$1,075.59 Example 4: How many stationary items have already been ordered? =SUMIF(F2:F, H2: H, D2:D) F: the category column; items include stationary, instruments, and miscellaneous H: header is By Category, H2: Stationery, H3: Instruments, H4: Miscellaneous (and column I is for the totals of each) D: Amount on Order So, Apply the criterion (column H: stationery, instruments, and miscellaneous) to F2:F (category column: lists each item purchased, specifically whether it is stationery/instruments/miscellaneous; each row in the table is a different order) and sum D column (amount on order) NOTE: If you want to use one formula to get the totals for stationery, instruments, and miscellaneous, use the ARRAYFORMULA function. Solution: H I By Category Stationery 37 Instruments Miscellaneous
102
Sheets, SUMIFS
A function that adds numeric data based on multiple conditions/criteria. =SUMIFS(sum_range, criteria_range1, criterion1, [criteria_range2, criterion2, …]) -sum_range: the range you want to sum. -criteria_range1: the range of cells that you want to test using the first criterion. -criterion1: the test or condition that you want to apply to the first range. NOTES: -The SUMIFS function adds up only those cells that meet ALL conditions (i.e., all of the specified criteria are true for a cell). This is commonly referred to as AND logic. -Sum range and all criteria ranges should be equally sized, i.e. have the same number of rows and columns, otherwise your SUMIFS formula will throw a #VALUE error. -Depending on the nature of your conditions, build your criteria appropriately: --Text, wildcard characters and dates should be enclosed in quotation marks like "apples" or "*" or "4/19/2024". --Numbers and cell references are not enclosed in quotation marks when used on their own. --Logical operators with numbers, dates or text must be enclosed in double quotes like ">200", ">4/19/2024" or "<>apples" . --Logical operators used together with cell references and other functions require the quotes to begin a text string and ampersand to concatenate and finish the string off. For example: ">"&B1 or ">="&TODAY(). Example: What is the revenue for delivered apple shipments? Data Set: A B C Item Amount Status 1 Apples $250 Delivered 2 Bananas $400 In Transit 3 Oranges $225 In Transit 4 Apples $130 Delivered So, let's go ahead and define our arguments: -Column A should contain "apples". Assuming we have data in rows 2 to 13, our criteria_range1 is A2:A13, and criterion1 is "apples". -Column C should contain "delivered". Meaning, our criteria_range2 is C2:C13, and criterion2 is "delivered". -Numbers to sum are in column B, so our sum_range is B2:B13. Putting the arguments together, we get this simple formula: =SUMIFS(B2:B13, A2:A13, "apples", C2:C13, "delivered") -
103
Sheets, ARRAYFORMULA
ARRAYFORMULA is a function that outputs a range of cells instead of just a single value and can be used with non-array functions. Syntax: =ARRAYFORMULA(array_formula) array_formula is a parameter that can be: - a range - a mathematical expression using ranges of the same size - a function that returns a result greater than one cell Array_formula with different Google Sheets functions: SUM SUMIFS COUNT COUNTA CONCATENATE JOIN TEXT JOIN etc. Array_formula with non-array functions: IF SUMIF COUNTIF VLOOKUP etc. Use Case: Let’s imagine a situation where you’ve got 2 columns in a Google spreadsheet: the 1st with prices, the 2nd with the number of items, and you need to multiply them in the 3rd column. What do you usually do in this case? If you were like me in the past, you’d compose a formula in the first row and copy-paste it into the other rows. A good old-school method that works fine. But what if there are 1000 rows or even more? Annoying, right? Let alone time-consuming. It can also cause a performance issue since a bunch of similar formulas slow down the whole spreadsheet. And, if you need to add a new value and create a separate row for it, Google Sheets will not automatically copy the formula. OK, so what’s the solution here? Actually, there is a dynamic and efficient way to address the discussed issues, and this way is called ARRAYFORMULA. Explain more, how does the ARRAYFORMULA solve problems? -Since this is one single formula even for a huge dataset, you won’t end up with a lot of formulas, and your Google Sheets will run smoothly. -ARRAYFORMULA is also expandable – a change in one place will expand down the entire data range. -And it is dynamic as well. When a new row is introduced into the dataset, the formula will automatically be applied to it. Example: We have a data set showing the quantity of 4 different products sold in the summer months (see below), & we need to calculate the total number of sold products. A B C D 1 Products June July August 2 Product A 3 10 6 3 Product B 4 8 9 4 Product C 2 25 21 5 Product D 5 12 12 Solution 1: Cell E2 =B2+C2+D2 Cell E3 =B3+C3+D3 Cell E4 =B4+C4+D4 Cell E5 =B5+C5+D5 Cell E6 =SUM(E3:E5) Issue: If you add a new row (e.g., insert Product A1 with June, July, and August sales into row 3 & all subsequent rows shift down), you would need to update the above formula. OR Solution 2: You only write one formula and it automatically updates if additional information/rows are added. =ARRAYFORMULA(SUM(B2:B5+C2:C5+D2:D5)) Then when you add a new range for productA, the formula will automatically update and add a row: =ARRAYFORMULA(SUM(B2:B6+C2:C6+D2:D6)) Example 2: You have a dataset that shows item SKU, amount sold, left in stock, amount on order, price, category. You want to know if you need to order more of any item. You decide that you will order more of an item if there are less than 10 in stock. You want the formula to create a Order? column that will show "order" if there are less than 10 in stock or "no" if there are more than 10 in stock. =ARRAYFORMULA(IF(C2:C17<10, "order", "no") Example 3: How many stationary items have already been ordered? =SUMIF(F2:F, H2: H, D2:D) F: the category column; items include stationary, instruments, and miscellaneous H: header is By Category, H2: Stationery, H3: Instruments, H4: Miscellaneous (and column I is for the totals of each) D: Amount on Order So, Apply the criterion (column H: stationery, instruments, and miscellaneous) to F2:F (category column: lists each item purchased, specifically whether it is stationery/instruments/miscellaneous; each row in the table is a different order) and sum D column (amount on order) Solution: H I By Category Stationery 37 Instruments 105 Miscellaneous 26 0 0 0 More examples of using array_formula with different functions: https://blog.coupler.io/arrayformula-google-sheets/
104
Sheets, SUMIF vs SUMIFS
SUMIF is used for adding values based on one condition. Use Case: Sum the sales amounts corresponding to productA. =SUMIF(C:C, "productA", E:E) -C:C the range that we want to apply the criterion. C is the column with the products -"productA" the criterion that we want to apply the selected range (i.e., we want an exact match for "productA"). -E:E the range with the values we want to sum if it's different from the first range selected (C:C). In this example, E is the column with the sales amounts. SUMIFS is to sum the values in a range, based on multiple conditions. Use Case: Sum up the amounts for all sales of productA made on or after February 1, 2022. =SUMIFS(E:E, C:C, "productA", A:A">=2/1/22") -E:E the values we want to sum. Column E is sales_amount. -C:C the range to apply the first criterion. Column C is the product. -"productA" the criterion we want to apply to the first range. -A:A the second range. This is the range we want to apply the second criterion. A is the date. - >=2/1/22 is the second criterion. This will be applied to the second range.
105
Sheets, H:H H2: H when do you use ranges like that?
H:H when you want to capture the all values/the full range of values in column H but there's a lot of data and you don't want to scroll to the bottom or the column (e.g., H2384) or the column is volatile (i.e., additional data is constantly being added). H2: H Same as above, but don't want the header row counted. (Sheets, I believe, often discounts the header row, but it can't hurt to specify.)
106
Sheets, FV
Google Sheets also features many financial formulas. The FV function calculates the future value of an annuity investment, given a constant interest rate and periodic payments of a constant amount. =FV(rate, number_of_periods, payment_amount, [present_value], [end_or_beginning]) rate: defines the annual interest rate. number_of_periods: defines the number of payments to be made. payment_amount: the amount to be paid per period. present_value*: optional. The default value is 0. end_or_beginning*: optional. Indicates whether payments are due at the end (0) or at the beginning (1) of the period
107
Sheets, IMPORTDATA
The IMPORTDATA function allows you to import data from CSV and TSV files stored online using the URL. Example: IMPORTDATA(A2) A2: URL to a csv file result in entry cell: Name Code Afghanistan AF Albania AL Algeria DZ
108
Sheets, IMPORTRANGE
Using the IMPORTRANGE function, you can import data stored in a different Google Sheets file. You can easily combine it with other functions, like QUERY or VLOOKUP. =IMPORTRANGE(spreadsheet_url, range_string) spreadsheet_url: the link to the spreadsheet containing the data you want to import. range_string: the format is “[sheet_name!]range”, where sheet name is optional.
109
Sheets, IMPORTXML
The IMPORTXML function allows you to import data from websites using structured data types like XML, HTML, or CSV/TSV. =IMPORTXML(url, xpath_query) url: the URL of the website. xpath_query: this parameter specifies the data that should be imported.
110
Sheets, VLOOKUP *practice
Vertical Lookup A function that searches for a certain value in a column to return a corresponding piece of information. Syntax: VLOOKUP(search_key, range, index, is_sorted) -search_key: the search term -range: it will search this range. The first column in the range is searched. When the search key is found, the index from that row is returned. For example, if you search for the search_key in column B and return the data from column D, the range would need to include columns B through D, such as the range B2:D10. If you specified a range of A2:D10, the function would search for the search term in column A. -index: the position of the column that contains the data to be returned. The first column in the range is column number 1, and each column is numbered sequentially to the right. For example, if the range is B2:D10 and you want to return a value from column D, the index number would be 3. If the index is not between 1 and the number of columns in range, the error message #VALUE! will be returned. -is_sorted: TRUE or FALSE This indicates whether to return an approximate or exact match. For example, if you’re searching for Google, then google would not count as a match. To return an exact match, set is_sorted to FALSE. This is recommended. To return an approximate match, set is_sorted to TRUE. The nearest match (less than or equal to the search_key) is returned. To use this option to obtain accurate results, you must sort your data in ascending order. But, you could still find a value. If neither TRUE nor FALSE are selected, the function will default to TRUE. TRUE: find a close match (it stops searching when it finds a value that is greater than or equal to the look up value.) FALSE: find an exact match Example: =VLOOKUP(103, A2:B26, 2, FALSE) NOTES: -VLOOKUP will not recognize column names like A, B, or C. You must use a number (i.e., A=1, B=2, etc.) -It will only return the first close match/matching value. (It will not return all of them.) -The search_key must be to the left of the information you want the function to return. This may require you to move columns around before you use VLOOKUP. For example, if you plan to search for the search_key column D, but the information you want the function to return is in column A, you must rearrange your columns before using VLOOKUP. -The #N/A error: #N/A indicates that a matching value can't be returned because no matches were found. Use Cases: 1) Populate data from one spreadsheet to another. For example, we're working with data from two different spreadsheets. We need information from both in order to answer our business question. VLOOKUP can connect two sheets together on a matching column to populate one single sheet. Syntax: VLOOKUP(look-up_value, 'name_of_spreadsheet_to_search_in'!, range_to_check, col_number) Example: =VLOOKUP(A2, 'Employee Rates'!$A$2:$B$5, 2) A2: The first cell in the Employee # column $A$2:$B$5: the locked range 2: search for a match in column 2 (B) for rate of pay *locked the range with absolute cell references.
111
Sheets, VALUE
A function that converts a text string that represents a number (e.g., data/time/number) to a numerical value. For example: The numbers in column A below are saved as text strings. We confirmed this by running a SUM of these values (i.e., SUM(A1:A3). The result was 0. that's because the function doesn't work on text strings. A 1 123 2 456 3 789 To convert cell reference AI to a numeric value, do the following: =VALUE(A1) We can do this to each cell reference above and then run the SUM function again to confirm it was converted to a numeric value.
112
Common Errors using VLOOKUP
1) A value is a text string as opposed to a numeric value. (Use the VALUE() function to convert a text string to a numeric value.) 2) Leading or trailing spaces. (These are often added when copying data from one source to another.) 3) Duplicates. If there are duplicate rows in the VLOOKUP search, it will only return the first match that it finds.
113
Sheets, TRIM()
Removes any trailing or leading spaces for the cell. TRIM(cell reference)
114
Sheets, Remove Duplicates
A tool that automatically searches for and eliminates duplicate entries from a spreadsheet. Sheets>Data>Data Cleanup>Remove Duplicates
115
Sheets, VLOOKUP Common issues
-It will only return the first matching value. (It won't return subsequent matching values.) -It can only look to the right, not to the left. (Workaround: copy and paste the data (i.e., the column) to the left of the column they want to look at.) -The formula doesn't work when copied. This is likely because the table array hasn't been locked/is not absolute. -Someone edits the spreadsheet and it messes up the formulas. To avoid this: --lock the spreadsheet OR --use MATCH (it can help with version control) -Exact and Approximate matching: when using VLOOKUP, you'll likely get different results, depending on whether you enter TRUE or FALSE in your function. *Table Array: A range of cells that is used as input for functions like VLOOKUP, INDEX, and MATCH. It's a table of data that these functions search through to find a single column, row, or a multi-column, multi-row range within the sheet.
116
Absolute reference
A reference that is locked so that rows and columns won't change when copied. For example: Absolute reference: $A:$B vs. range that is not locked: A:B
117
Table Array
*Table Array: A range of cells that is used as input for functions like VLOOKUP, INDEX, and MATCH. It's a table of data that these functions search through to find a single column, row, or a multi-column, multi-row range within the sheet.
118
Sheets, MATCH
A function used to locate the position of a specific lookup value and help with version control.
119
Sheets, how does VLOOKUP work? Common use cases?
VLOOKUP, or Vertical Lookup, searches for a search term, called a search_key, in one column of a spreadsheet. When the search_key is found, the function returns the data from another column of the row from which it was located. VLOOKUP returns only the value that corresponds to the first item it matches. So, if there are multiple matching values, the spreadsheet will return only data about the first one. Common uses cases: Populating data in a spreadsheet. Perhaps a store manager is tracking incoming shipments before a busy holiday. They could use VLOOKUP to look up product ID codes in a product spreadsheet and retrieve the corresponding product information from another spreadsheet. This would help the manager know how many stock clerks they need to schedule to work when the shipments arrive. Merging data from one spreadsheet with data in another. If a teacher keeps one spreadsheet for student grades and another for attendance, they could use VLOOKUP to combine the spreadsheets. That way, they could search for a particular student in the attendance sheet, and VLOOKUP would pull the corresponding attendance record into the grades spreadsheet.
120
Sheets, PRODUCT
=PRODUCT(cell reference, cell reference) Example: =PRODUCT(A2, B3) Process: =A2*B3
121
Summary Table
A table used to summarize statistical information about data.
122
Sheets, AVERAGEIF
This function will average the values in an array based on specific criteria. Syntax: =AVERAGEIF(range, criteria, [sum_range]) Example: What are the average sales in NY? Data Set: A B C D 1 Name State No. Clients Sales 2 Alex NY 1 $964.69 3 Ben NJ 4 $130.03 4 Frank NY 2 $1,110.90 =AVERAGEIF(B2:B4, "NY", D2:D4) =1037.79 So, the average salesperson in NY sells $1,037.79 worth of merchandise.
123
Array
In programming, an array is a collection of values. In a spreadsheet, an array is a collection of values in the cells, not the cells themselves. Example: An array named my_fruits has the values 'banana', 'apple', and 'orange' stored inside of it. Each value has a position, called an index, starting at 0. Array Name: my_fruits Values: 'banana', 'apple', and 'orange Index of the banana: 0 Index of the apple: 1 Index of the orange: 2
124
Sheets, MAXIFS
This function returns the maximum value in a range based on one or more conditions. Notes: -every constraint/all conditions must be met for a cell in the max_range to be considered. Syntax: =MAXIFS(max_range, range1, criteria1, [range2], [criteria2], ...) -max_range: the array for which you want to find the max value -range1: the array you are checking -criteria1: the value that you are checking for -inputs in the square brackets are optional additional constraints Example 1: Find the maximum sales for any salesperson in New York. Data Set: A B C D 1 Name State No. Clients Sales 2 Alex NY 1 $964.69 3 Ben NJ 4 $130.03 4 Frank NY 2 $1,110.90 =MAXIFS(D2:D4, B2:B21, "NY") =$1,110.90 Example 2: Find the maximum sales in New York where the max item cost is below $400. So, the sale must have been made in New York AND the max item cost must be below $400. Data Set: A B C D E 1 Name State No. Clients Sales Max Item Value 2 Alex NY 1 $964.69 $100.00 3 Ben NJ 4 $130.03 $65.00 4 Frank NY 2 $1,110.90 $415.00 =MAXIFS(D2:D4, B2:B4, "NY", E2:E4, "<400") =$964.49
125
Sheets, SUMPRODUCT
A function that multiples arrays and returns the sum of those products. Syntax: =sumproduct(array1, [array2]....) Example 1: What is the total revenue? Dataset: A B C D 1 ID Quantity Unit Price Margin 2 789 50 $1.25 20% 3 790 25 $5.00 25% 4 791 30 $4.25 20% 5 792 80 $3.00 30% 6 793 200 $0.50 15% =sumproduct(B2:B6, C2:C6) Process: (B2*C2)+(B3*C3)+(B4*C4)+(B5*C5)+(B6*C6) Output: 655 Example 2: What is the profit margin on all sales? Dataset: A B C D 1 ID Quantity Unit Price Margin 2 789 50 $1.25 20% 3 790 25 $5.00 25% 4 791 30 $4.25 20% 5 792 80 $3.00 30% 6 793 200 $0.50 15% =sumproduct(B2:B6, C2:C6, D2:D6) Process: Revenue=Quantity*Unit Price Profit Margin on All Sales: Revenue*Margin =((B2*C2) D2) +((B3*C3) D3) + ((B4*C4)D4) + ((B5*C5)D5) +((B6*C6) D6) Output=$156.25 So, $156.25 is the total profit in dollars across all products sold.
126
Profit Margin
How many cents of profit has been generated for each dollar of sale. For example: Product ID 789 has a profit margin of 20%, which means that each product sold earns a total profit of $0.20/every dollar.
127
Calculated Field
A new field within a pivot table that carries out certain calculations based on the values of other fields.
128
Sheets, Pivot table -row -column -value -filter
Rows: Organize and group data selected horizontally. *Rows are the "spine" of the table. They define the primary grouping of your data. *Field selected always appears as a column header on the far left. All unique items will then be listed as row headings. ex// If you create a row in a pivot table for the release date: Release Date - Year 2012 2013 2014 2015 2016 In the above example, "Release Date - Year" was selected for the Row. Then Sheets finds all unique items in that column in the raw dataset (i.e., 2012, 2013, 2014, 2015, and 2016) and makes them the row headings. Columns: Organize and display values from your data vertically. *Columns are a secondary grouping layered on top of the row groupings. Values: Used to calculate and count data. This is where you input the variables you want to measure. This is also how you create calculated fields in your pivot table. Ex// Use the values editor to create columns after first selecting the row that you want to organize the values by. For instance, first pick "Release Date" as the row and then enter "Sum of Box office rev" and "Avg of Box Office rev" and "Count of Box Office Rev" (i.e., number of movies) as values. (Calculated Field: A new field within a pivot table that carries out calculations based on the values of other fields. ) Filter: Enables you to apply filters based on specific criteria. For example, you could a filter to a movie data pivot table so that it only includes movies that generated less than $10M in revenue. Rule of Thumb: -Always add at least one Row field first (even if it’s something simple like “All Data” or a single category). -Then add Columns to break that row grouping into cross-sections. -After that, you can add Values to calculate totals/averages/etc.
129
Operator
A symbol that names the type of operation or calculation to be performed in a formula + - * /
130
Modulo
An operator (%) that returns the remainder when one number is divided by another. Think of MOD() as answering: “Is there anything left over after dividing?” *Possible results for remainder include: 0…n-1 (i.e., the remainder must always be less than the divisor). syntax: MOD(dividend, divisor) Example 1: Return the remainder of 18/4. SELECT MOD(18,4) Output= 2 Business Use Case 1, Assigning Customers to Buckets: Split customers into test groups for A/B testing. SELECT customer_id, CASE WHEN MOD(customer_id, 2) = 0 THEN 'Group A' ELSE 'Group B' END AS test_group FROM `my_dataset.customers`; Note: -In the above query, you should get "0" for all even numbers (Group A) and "1" for all odd numbers (Group B). This is because the "0" means the division had no remainder (e.g., 2/2 is 1 and has a remainder of 0 & 4/2 is 2 and has a remainder or 0, etc.) Business Use Case 2, Cyclic Scheduling: Assign tasks in rotation (i.e., rotating staff shifts or delivery trucks) SELECT order_id, MOD(order_id, 7) + 1 AS delivery_day_of_week FROM `my_dataset.orders`; NOTE: -The remainder obtained from a division operation must always be less than the divisor. So, when you divide an integer by 7, the remainder can only ever be less than 7, specifically 0, 1, 2, 3, 4, 5, 6. -As such, to assign a day of a week to each remainder, we must +1 so that we never have a "0" day (i.e., Monday=1, Tuesday=2, etc.) -Integer: a number with no decimal or fractional part. Both positive and negative numbers are integers. Business Use Case 3, Cycle-Based Discounts: Issue discount ever Nth customer or order; in this case, every 10th order. SELECT order_id, customer_id, CASE WHEN MOD(order_id, 10) = 0 THEN 'Discount Applied' ELSE 'No Discount' END AS discount_flag FROM `my_dataset.orders`; NOTES: -Because the divisor is 10, only order IDs that are multiples of 10 (10, 20, 30, …) will satisfy = 0. Business Use Case 4, Partitioning Without Explicit Keys: Distribute rows into buckets for parallel processing/split a large dataset into 4 roughly equal partitions for balanced processing. SELECT user_id, MOD(ABS(FARM_FINGERPRINT(user_id)), 4) AS partition_bucket FROM `my_dataset.users`;
131
Functions vs Operators
Often you can use functions over operators in sheets to perform calculations. Example: Spreadsheet Function vs SQL Function SUM SUM AVERAGE AVG
132
Modulo tips
Recap of function & syntax: MOD(dividend, divisor) → remainder. Possible results: 0…n-1 Tips: -MOD(n, 2) → checks even vs odd. 0=even. MOD(n, 7) → cycles through 0–6, perfect for 7-day rotations. MOD(n, 10) → cycles through 0–9, perfect for “every 10th thing.” MOD(x, 0) errors → Guard rails:
133
134