Series
Series syntax
list_var = ['list']
series_var = pandas^.Series(list_var)
^ = pandas can be whatever alias you assign it when importing the dependency
Retrieve a series syntax
series_var
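A minimal sketch of the Series syntax above, assuming pandas is imported under the common alias pd. The sample list reuses school names that appear later in these notes and is only for illustration:

```python
import pandas as pd  # "pd" is just the usual alias; any alias works

# Illustrative sample data
list_var = ["Huang High School", "Figueroa High School"]

# Build the Series from the list; pandas assigns the index 0, 1, ...
series_var = pd.Series(list_var)

# In a notebook, typing series_var alone displays it; in a script use print()
print(series_var)
```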
DataFrame
- 2-dimensional labeled data structure w/ rows and columns of potentially different data types, where data is aligned in a table
DataFrame from Dictionary syntax
var_df = pandas^.DataFrame(dict_var)
^ = pandas can be whatever alias you assign when importing the dependency
Retrieve a DataFrame syntax
var_df
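A short sketch of the dictionary route, assuming the alias pd. The keys and values in dict_var are hypothetical sample data; each key becomes a column name and each list becomes that column's values:

```python
import pandas as pd

# Hypothetical dictionary: keys -> column names, lists -> column values
dict_var = {
    "ID": [0, 1],
    "School": ["Huang High School", "Figueroa High School"],
    "Type": ["District", "District"],
}

var_df = pd.DataFrame(dict_var)
print(var_df)
```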
DataFrame naming best practices
Name with “_df” at the end to distinguish DataFrames from Series and Variables
DataFrame from List(s) syntax
# Create empty _df
var_df = pd.DataFrame()
# Add List to _df
var_df['Column Header of my Choosing'] = list_var
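A small sketch of the list route, assuming the alias pd. The list names and column headers are made up for illustration. Lists added after the first one need to have the same length as the existing rows:

```python
import pandas as pd

# Hypothetical lists of equal length
school_list = ["Huang High School", "Figueroa High School"]
type_list = ["District", "District"]

# Start empty, then add one column per list
var_df = pd.DataFrame()
var_df["School"] = school_list
var_df["Type"] = type_list

print(var_df)
```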
3 Main Parts of a DataFrame & how you can access them
Can be accessed w/ the columns, index, and values attributes
Columns attribute syntax + Output
var_df.columns
Index(['Column1', 'Column2', ...], dtype='object')
'object' is the dtype pandas typically reports for string column labels; other label types (e.g. integers) report a different dtype
Index attribute syntax + output
var_df.index
RangeIndex(start=0, stop=endIndex, step=increment)
e.g. var_df has 5 entries, incremented by 1
RangeIndex(start=0, stop=5, step=1)
Values attribute syntax + output
var_df.values
Outputs the values without column names (ex. below has 3 columns ID, School, Type):
array([[0, 'Huang High School', 'District'],
       [1, 'Figueroa High School', 'District'], ...], dtype=object)
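A quick sketch that pulls all three parts from one small DataFrame, assuming the alias pd. The two rows mirror the ID / School / Type example above:

```python
import pandas as pd

var_df = pd.DataFrame({
    "ID": [0, 1],
    "School": ["Huang High School", "Figueroa High School"],
    "Type": ["District", "District"],
})

# Column labels
print(var_df.columns)

# Row labels; a default index is a RangeIndex starting at 0
print(var_df.index)

# The data only, as a NumPy array with no column names
print(var_df.values)
```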
Convert csv file into DataFrame syntax/example
import os
import pandas as pd
# Declare filename variable for csv
file_to_load = os.path.join('path', 'filename.csv')
# Create DataFrame
file_data_df = pd.read_csv(file_to_load)
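A self-contained sketch of the csv-to-DataFrame step. Since the real file path is not given here, this assumes a throwaway csv written to a temporary folder; the file name and its contents are hypothetical:

```python
import os
import tempfile

import pandas as pd

# Hypothetical csv contents standing in for the real file
csv_text = (
    "ID,School,Type\n"
    "0,Huang High School,District\n"
    "1,Figueroa High School,District\n"
)

with tempfile.TemporaryDirectory() as temp_dir:
    # Build the path the same way as in the notes
    file_to_load = os.path.join(temp_dir, "schools_sample.csv")
    with open(file_to_load, "w") as csv_file:
        csv_file.write(csv_text)

    # The first line of the csv becomes the column names
    file_data_df = pd.read_csv(file_to_load)

print(file_data_df)
```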
head( ) and tail( ) methods: syntax + what they do
var_df.head( ) - returns top 5 rows of DF
var_df.tail( ) - returns last 5 rows of DF
Inserting a number in the ( ) returns that many rows from the top/bottom, e.g. var_df.head(10) returns the top 10 rows
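A short sketch of head( ) and tail( ) on a hypothetical 12-row DataFrame, assuming the alias pd:

```python
import pandas as pd

# Hypothetical DataFrame with IDs 0 through 11
var_df = pd.DataFrame({"ID": range(12)})

top_five = var_df.head()      # first 5 rows (the default)
top_ten = var_df.head(10)     # first 10 rows
last_three = var_df.tail(3)   # last 3 rows

print(top_five)
print(last_three)
```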
count( ) method: what it does + syntax
Provides a count of the rows containing data for each column. “Null” values are not counted.
var_df.count( )
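A small sketch of count( ), assuming the aliases pd and np. The column names and scores are made-up sample data with a few values left missing on purpose:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing School, two missing Scores
var_df = pd.DataFrame({
    "School": ["Huang High School", None, "Figueroa High School"],
    "Score": [np.nan, np.nan, 81.5],
})

# One count per column; missing values are skipped
counts = var_df.count()
print(counts)
```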
isnull( ) method: what it does + syntax
Flags empty (missing) values. Returns a boolean for every cell: True if the value is empty, False if not.
var_df.isnull( )
sum( ) method w/ isnull() or notnull(): what it does + syntax + output
Gets the total number of empty values (those marked “True”) in each column
var_df.isnull( ).sum( )
Outputs all column names and sum of “True” values in each column
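A sketch of isnull( ) chained with sum( ), assuming the aliases pd and np and hypothetical sample data. True counts as 1 when summed, so the result is the number of missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing School, two missing Scores
var_df = pd.DataFrame({
    "School": ["Huang High School", None, "Figueroa High School"],
    "Score": [np.nan, np.nan, 81.5],
})

# Boolean DataFrame: True wherever a value is missing
missing_mask = var_df.isnull()

# Sum the True values column by column
missing_per_column = missing_mask.sum()
print(missing_per_column)
```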
notnull( ) method: what it does + syntax
Returns T/F for every value, w/ “True” if the value is not empty and “False” if it is empty
var_df.notnull( )
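A brief sketch of notnull( ), assuming the aliases pd and np and hypothetical sample data. It also checks that summing the True values gives the same numbers as count( ):

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values
var_df = pd.DataFrame({
    "School": ["Huang High School", None, "Figueroa High School"],
    "Score": [np.nan, np.nan, 81.5],
})

# True wherever a value is present
present_mask = var_df.notnull()
print(present_mask)

# Number of present values per column
present_per_column = present_mask.sum()
print(present_per_column)
```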
NaN in a DataFrame
Means ‘not a number’. It marks a missing value and is not equal to zero (or to any other value, including itself)
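A tiny sketch of how NaN behaves in comparisons, assuming the aliases pd and np. Because NaN does not equal anything, checking for it with == does not work; pd.isnull( ) is used here instead:

```python
import numpy as np
import pandas as pd

nan_value = np.nan

is_zero = nan_value == 0              # False: NaN is not zero
equals_itself = nan_value == nan_value  # False: NaN is not even equal to itself
detected = pd.isnull(nan_value)       # True: the reliable way to check

print(is_zero, equals_itself, detected)
```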
Options for Missing Data
Do Nothing (missing data) considerations
- If we multiply/divide using a value that is NaN, the answer will be NaN
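A sketch of what "do nothing" leads to, assuming the aliases pd and np. The Budget and Students columns are hypothetical; the point is only that the NaN carries through the division:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second Budget is missing
var_df = pd.DataFrame({
    "Budget": [1000.0, np.nan],
    "Students": [50, 40],
})

# Row 0 divides normally; row 1 stays NaN because Budget is NaN
var_df["Per Student"] = var_df["Budget"] / var_df["Students"]
print(var_df)
```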
Drop the Row (missing data) considerations
Method to drop a row with NaNs + syntax + note about indexes
dropna( )
var_df.dropna( )
- Indexes do not reset automatically: (0, 1, 2, 3) w/ row 2 dropped is now (0, 1, 3)
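A sketch of dropna( ) and the index gap it leaves, assuming the aliases pd and np and a hypothetical Score column. The reset_index(drop=True) call at the end is an addition not covered in these notes; it is one way to renumber the rows if that is wanted:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 has a NaN
var_df = pd.DataFrame({"Score": [77.0, 81.5, np.nan, 90.0]})

# dropna() returns a new DataFrame; var_df itself is unchanged
dropped_df = var_df.dropna()
print(list(dropped_df.index))  # index keeps the gap where row 2 was

# Optional addition: renumber the rows 0, 1, 2
reset_df = dropped_df.reset_index(drop=True)
print(list(reset_df.index))
```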
Fill in the Row (missing data) considerations
- Must carefully consider the values you insert for every downstream analysis performed
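A sketch of why the fill value matters, assuming the aliases pd and np. These notes do not name a method for filling; fillna( ) is the pandas method used here, and the scores are hypothetical. Filling with 0 and filling with the column mean give different averages afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score
var_df = pd.DataFrame({"Score": [70.0, np.nan, 90.0]})

# Option 1: fill the gap with 0
filled_zero_df = var_df.fillna(0)

# Option 2: fill the gap with the mean of the known scores
filled_mean_df = var_df.fillna(var_df["Score"].mean())

# The downstream average depends on which fill was chosen
print(filled_zero_df["Score"].mean())
print(filled_mean_df["Score"].mean())
```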