Subsetting & Slicing methods with pandas
df.loc[]
Access rows or columns by labels or Boolean array; slices: start & stop included
df.loc[]:
- elements
- row with label 1 as series
- row with label 1 as DataFrame
- rows from start label (0) to end label (5), both included
- all rows & named column
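The selections listed above can be sketched as follows. The toy DataFrame and its column names (species, mass_g) are assumptions made for illustration; they do not come from the slides.

```python
import pandas as pd

# Hypothetical toy data; default integer labels 0..5
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo", "Adelie"],
    "mass_g": [3750, 5000, 3700, 3800, 5200, 3650],
})

element = df.loc[1, "species"]        # single element: row label 1, column "species"
row_series = df.loc[1]                # row with label 1 as a Series
row_frame = df.loc[[1]]               # row with label 1 as a DataFrame (note the list)
rows = df.loc[0:5]                    # labels 0 through 5; the stop label is included
column = df.loc[:, "species"]         # all rows, one named column
heavy = df.loc[df["mass_g"] > 4000]   # Boolean array: keep rows where the mask is True
```

Passing a list of labels (df.loc[[1]]) rather than a single label is what keeps the result two-dimensional.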
df.iloc[]
Subsetting rows with index 0 to 4
df[0:5]
Subset columns
Subset rows & columns
df[0:5][["col_name1", "col_name2"]]
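A minimal sketch of the row and column subsets above, next to what is probably the equivalent df.iloc[] call. The third column (col_name3) and all values are made up for the example.

```python
import pandas as pd

# Hypothetical data with 8 rows and 3 columns
df = pd.DataFrame({
    "col_name1": range(10, 18),
    "col_name2": list("abcdefgh"),
    "col_name3": [0.5] * 8,
})

first_five = df[0:5]                         # rows at positions 0..4 (stop excluded)
same_rows = df.iloc[0:5]                     # the same rows via position-based iloc
cols = df[["col_name1", "col_name2"]]        # subset columns with a list of names
both = df[0:5][["col_name1", "col_name2"]]   # rows first, then columns
both_iloc = df.iloc[0:5, 0:2]                # rows and columns in one positional call
```

Unlike df.loc[], positional slices exclude the stop value, so 0:5 returns five rows.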
Subsetting vs referencing of datasets
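One reading of this heading, assumed here, is the difference between giving an existing DataFrame a second name and creating an independent subset. The sketch uses made-up data and an explicit .copy() so the behaviour does not depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

ref = df                # referencing: a second name for the same object
sub = df[0:2].copy()    # subsetting: an independent copy of the first two rows

sub.loc[0, "a"] = 100   # changes only the copy
ref.loc[2, "b"] = 600   # changes df as well, because ref and df are one object
```

After these assignments df still has 1 in row 0 of column a, but 600 in row 2 of column b.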
Filtering data: Subset a DataFrame’s rows or columns according to specified row or column labels (not according to the values)
df.filter(like="culmen", axis=1)
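A short sketch of label-based filtering. The column names are assumed for illustration (only the substring "culmen" appears in the slides); the like, items and regex parameters are the three selection modes of DataFrame.filter.

```python
import pandas as pd

# Hypothetical columns; values are made up
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo"],
    "culmen_length_mm": [39.1, 46.1],
    "culmen_depth_mm": [18.7, 13.2],
    "body_mass_g": [3750, 4500],
})

culmen_cols = df.filter(like="culmen", axis=1)          # labels containing "culmen"
by_items = df.filter(items=["species", "body_mass_g"])  # exact column labels
by_regex = df.filter(regex="_mm$", axis=1)              # labels matching a regular expression
```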
Cleaning data - Checking & removing duplicates
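Checking and removing duplicates might look like the sketch below, using duplicated() and drop_duplicates(). The data is invented for the example.

```python
import pandas as pd

# Hypothetical data: the third row repeats the second
df = pd.DataFrame({"id": [1, 2, 2, 3], "value": ["a", "b", "b", "c"]})

dup_mask = df.duplicated()      # True for each row that repeats an earlier row
n_dups = int(dup_mask.sum())    # number of duplicate rows
deduped = df.drop_duplicates().reset_index(drop=True)  # keep the first occurrence
```

Both methods accept a subset argument to compare only selected columns, and keep to choose which occurrence survives.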
Cleaning data - Remapping values (e.g. because the data is faulty or the analysis requires a transformation)
df.replace()
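A minimal sketch of remapping with df.replace(), assuming a hypothetical sex column in which "." marks a faulty entry. The nested dictionary has the form {column: {old_value: new_value}}.

```python
import pandas as pd

# Hypothetical data with inconsistent and faulty codes
df = pd.DataFrame({
    "sex": ["MALE", "FEMALE", ".", "MALE"],
    "island": ["Dream", "Biscoe", "Dream", "Torgersen"],
})

# Remap values in one column only; replace() returns a new DataFrame
cleaned = df.replace({"sex": {"MALE": "male", "FEMALE": "female", ".": "unknown"}})
```

The original df is left unchanged; assign the result back if the remapped values should replace it.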
Cleaning data - Dealing with text
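Text columns are usually cleaned through the .str accessor. The sketch below assumes a hypothetical species column with stray whitespace and inconsistent capitalisation.

```python
import pandas as pd

# Hypothetical messy text values
df = pd.DataFrame({"species": ["  Adelie ", "GENTOO", "chinstrap  "]})

# Strip surrounding whitespace, then normalise to lower case
df["species_clean"] = df["species"].str.strip().str.lower()

# Boolean mask: which cleaned values contain the literal substring "gen"
is_gentoo = df["species_clean"].str.contains("gen", regex=False)
```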
Reshaping data - Unpivot DataFrame from wide to long format
melt()
wide_to_long()
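Both functions can be sketched on the same small wide table. The column names (id, score2020, score2021) are assumptions chosen so that wide_to_long() can split them into a stub ("score") and a numeric suffix.

```python
import pandas as pd

# Hypothetical wide table: one column per year
wide = pd.DataFrame({"id": [1, 2], "score2020": [10, 20], "score2021": [11, 21]})

# melt(): every non-id column becomes a (variable, value) pair
long_melt = wide.melt(id_vars="id", var_name="variable", value_name="value")

# wide_to_long(): splits "score2020" into the stub "score" and the suffix 2020
long_w2l = pd.wide_to_long(wide, stubnames="score", i="id", j="year").reset_index()
```

melt() keeps the full original column name as a value, whereas wide_to_long() turns the suffix into its own column (year here).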
Reshaping data - Change from long to wide format
pivot()
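A sketch of the reverse direction with pivot(), on made-up long data with the hypothetical columns id, year and score.

```python
import pandas as pd

# Hypothetical long table: one row per (id, year) pair
long = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "year": [2020, 2021, 2020, 2021],
    "score": [10, 11, 20, 21],
})

# One row per id, one column per year, cells filled from "score"
wide = long.pivot(index="id", columns="year", values="score")
```

pivot() raises a ValueError when an index/column pair occurs more than once; pivot_table() is the variant that aggregates such duplicates.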
JavaScript Object Notation (JSON)
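As a minimal sketch of how pandas exchanges data with JSON, the example below writes a made-up DataFrame to a JSON string and reads it back. The "records" orientation (a list of one object per row) is an assumption; other orientations exist.

```python
import io
import json

import pandas as pd

# Hypothetical data
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

json_text = df.to_json(orient="records")   # JSON string, one object per row
records = json.loads(json_text)            # the same data as plain Python lists and dicts

# read_json expects a path or file-like object, so wrap the string in StringIO
restored = pd.read_json(io.StringIO(json_text), orient="records")
```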
Joining data by join
Joining data by union
Joining data by intersections
merge()
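The join types above map onto the how parameter of merge(). The sketch uses two invented tables that share a hypothetical key column: "inner" keeps the intersection of the keys, "outer" keeps their union, and "left" keeps every key of the left table.

```python
import pandas as pd

# Hypothetical tables sharing the column "key"
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = left.merge(right, on="key", how="inner")      # keys in both tables: 2, 3
outer = left.merge(right, on="key", how="outer")      # keys in either table: 1, 2, 3, 4
left_join = left.merge(right, on="key", how="left")   # all keys of left: 1, 2, 3
```

Rows without a match on the other side are filled with missing values, for example right_val for key 1 in left_join.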
df[…] vs loc & iloc
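The three accessors are easiest to tell apart when the row labels are not integers. The sketch below assumes a made-up DataFrame with string labels "a" to "d".

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

col = df["x"]                 # df[name]: a column, selected by label
rows_pos = df[0:2]            # df[int slice]: rows by position, stop excluded
rows_loc = df.loc["a":"c"]    # loc: rows by label, stop label included
rows_iloc = df.iloc[0:2]      # iloc: rows by position, stop excluded
cell_loc = df.loc["b", "y"]   # loc takes row and column labels
cell_iloc = df.iloc[1, 1]     # iloc takes row and column positions
```

Plain df[] selects columns when given a name but rows when given a slice; loc and iloc make the intent explicit and accept row and column selectors together.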