Data
Raw facts and figures without any meaning.
Information
Data shaped into a meaningful form through context
Knowledge
Results from experience and education. The info collected by an entity and the ability to locate additional info
DNA of a Tweet
Relational Database
Database organises data into one or more tables. Tables consist of columns called attributes or fields and rows called tuples or records.
Keys
Unique Primary Key identify each row. Keys can be used to link different tables.
SQL
Standard Query Language is the defacto standard language used to communicate with database systems. We can create, delete and manipulate databases using SQL.
Categories of commands in SQL
SQLite
SQLite stores the whole database in a single file. Once you have access to the file, you have access to the database. DCL is not part of SQLite
Creating tables
CREATE TABLE Table Name (Column1 DataType (optional length), Column2 DataType (optional length),… PRIMARY KEY (KeyColumn1, KeyColumn2)
Creates a table and declares a composite primary key across multiple columns.
You can’t have spaces in table name and column names unless you put [].
Data Types
TEXT: character string
INTEGER/INT: signed integer
BLOB: binary large object, videos?photos stored in binary language
REAL: floating point numbers
NUMERIC/NUM: anything not covered by the rest, includes BOOLEAN, DATE, DATETIME
Redundancy
Redundancy=repetition should be avoided in databases since it:
- Increases the required amount of storage
- Is a source for data inconsistencies
BUT: it can improve databases since the join operator becomes uneccessary
Data Normalization
Normalization is a technique that manages the trade-off between redundancy and ease of use/speed of database. It is a standardized technique that transforms a database iteriatively accordingly to a strict set of rules.
A database satisfies a normal form, if all tables in that database satisfy the normal form.
First Normal Form 1NF
How to solve when there is multiple players with the same name?
a) create a composite key (player, date of birth)
b) create a unique player ID
Second Normal Form
How to solve partial dependencies?
a) we can create a new column that provides an artificial key to the table such as ID (the primary key consists of a single value so 2NF is satisfied)-> doesn’t reduce redundancy
b) we split the table into 2 (with the second table containing the tournament information)-> reduces redundancy
Third Normal Form
How to solve transitive dependencies?
By splitting the table (with the second table containing player info)
Results of Normalization
Entity Relationship Models
Entity Relationship Models describe the relationship between things of interest. They are visualized as Entity Relationship Diagrams. ERD
Components of ERD
Relationships and Cardinalities
Relationships explain the link between two entities. Cardinalities refer to the number or entities in one entity set that are related to entities in another entity set.
Types of Cardinalities