What is semi-structured data?
Semi-structured data lies in between fully structured data (like relational databases) and entirely unstructured data (arbitrary data files).
What is fully structured data?
Data that fits a strong, rigid schema. The schema makes highly efficient queries possible, but the data must conform to a very specific shape/structure.
What is unstructured data?
What is semi-structured data?
Semi-structured data sits in between the two extremes.
It tries to combine the best features of both: lots of flexibility, with no fixed schema that the data must fit in advance.
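To make the flexibility concrete, here is a minimal Python sketch using the standard `json` module: two records in the same collection have different fields, and nothing declares their shape beforehand. The names, fields and values are made up for illustration.

```python
import json

# A hypothetical collection: the two records do not share the same fields,
# and no schema describes them in advance. Each record names its own fields.
raw = """
[
  {"name": "Alice", "email": "alice@example.com"},
  {"name": "Bob", "phones": ["555-0100", "555-0199"], "address": {"city": "Leeds"}}
]
"""

people = json.loads(raw)

for person in people:
    # A field may be absent, so look it up with a default instead of failing.
    email = person.get("email", "(none)")
    print(person["name"], email, sorted(person))
```

A relational table would instead force both rows into the same columns, leaving NULLs wherever a person lacks a field.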
Describe a semistructured data model
Describe each of the elements that make up a semi-structured data tree like model
What advantage do semi-structured data models have over structured data models?
What is semi-structured data useful for storing?
What are some of the forms for storing semi-structured data?
XML, JSON, key-value pairs, graphs
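The same data can be written in several of these forms. A minimal Python sketch (standard library only) with one made-up record as XML, as JSON and as key-value pairs, each read back into the same dictionary; graphs are left out to keep it short. The `person:1:` key prefix is an assumed naming convention, not a standard.

```python
import json
import xml.etree.ElementTree as ET

# One hypothetical record written in three of the forms listed above.
xml_text = "<person><name>Alice</name><city>Leeds</city></person>"
json_text = '{"name": "Alice", "city": "Leeds"}'
# Key-value form: flat keys; the "person:1:" prefix is an illustrative convention.
kv_pairs = {"person:1:name": "Alice", "person:1:city": "Leeds"}

# Read each form back into a plain Python dict.
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}
from_json = json.loads(json_text)
from_kv = {key.rsplit(":", 1)[1]: value for key, value in kv_pairs.items()}

print(from_xml == from_json == from_kv)  # True
```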
Order these types of databases from fastest to slowest for accessing data: XML, JSON, Key-value, relational database
What is the structure of an XML document?
What is not a problem in XML but is in file systems?
What can XML trees not have?
In XML, we can’t have nodes with multiple parents, because XML documents are always trees.
- We can still have references, which say "this node points to that other node"; this is basically how shortcuts work in a file system.
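The one-parent rule and the shortcut-style reference can be checked with Python's `xml.etree.ElementTree`. In this sketch the element names (`drive`, `folder`, `file`, `link`) and the `target` attribute are hypothetical; the link is just an ordinary attribute that the program has to look up itself.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<drive>
  <folder name="docs">
    <file name="notes.txt"/>
  </folder>
  <folder name="desktop">
    <link target="notes.txt"/>
  </folder>
</drive>
""")

# Map every element to its parent; in a tree each child appears exactly once.
parent_of = {}
for parent in doc.iter():
    for child in parent:
        assert child not in parent_of, "an element cannot have two parents"
        parent_of[child] = parent

note = doc.find(".//file[@name='notes.txt']")
print(parent_of[note].get("name"))  # docs

# Following the link means looking the target up by name, not by tree structure.
link = doc.find(".//link")
target = doc.find(f".//file[@name='{link.get('target')}']")
print(target is note)  # True
```

The file keeps its single parent (`docs`); the `desktop` folder only holds a pointer to it, much like a shortcut.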
What is the form for an XML element?
What do you do if you want to leave an element empty?
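Either write the opening and closing tags with nothing between them, <keyword></keyword>, or combine them into a single self-closing tag, <keyword/>.
- Element names are case-sensitive, so the opening and closing tags must match exactly: <Keyword> cannot be closed by </keyword>.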
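Both points can be checked with Python's standard `xml.etree.ElementTree` parser. The helper `is_well_formed` is a hypothetical name used only in this sketch; it reports whether the parser accepts a string.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if ElementTree can parse `text` as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Empty element: both spellings parse to the same thing and serialise identically.
long_form = ET.fromstring("<keyword></keyword>")
short_form = ET.fromstring("<keyword/>")
print(ET.tostring(long_form) == ET.tostring(short_form))  # True

# Names are case-sensitive, so <Keyword> cannot be closed by </keyword>.
print(is_well_formed("<keyword></keyword>"))  # True
print(is_well_formed("<Keyword></keyword>"))  # False
```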
How are attributes defined in elements in XML documents?
When should something be an attribute, and when should it just be another element?
What is document order?
What is a DTD?
A Document Type Definition (DTD), or alternatively an XML Schema, defines a schema for your XML files. A DTD is declared at the start of the document, before the root element.
What are entity references?
Entity references are basically the shortcuts: if you want to say that two elements are both members of the same group, you define the group once and point to it from both elements, instead of writing the group out inside each of them.
Why do we use entity references?
Because if the group were written out in both places, someone reading the file could take it to mean two different groups, rather than two places pointing to the same group.
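A minimal Python sketch of the "define once, point from both places" idea. It uses ID/IDREF-style attributes, the usual XML mechanism for one element pointing at another: an `id` on the group and a `group` attribute on each student. These names and the data are assumptions for illustration; since no DTD declares them, the parser treats them as ordinary attributes and the program resolves them itself.

```python
import xml.etree.ElementTree as ET

# The group is written once, with an id; both students point to it by that id.
doc = ET.fromstring("""
<course>
  <group id="g1"><name>Lab A</name></group>
  <student name="Ann" group="g1"/>
  <student name="Ben" group="g1"/>
</course>
""")

# Index every element that has an id, then resolve each student's reference.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}
groups = {s.get("name"): by_id[s.get("group")] for s in doc.iter("student")}

print(groups["Ann"] is groups["Ben"])  # True: one group, two pointers
print(groups["Ann"].findtext("name"))  # Lab A
```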
What is CDATA used for?
For including text that the XML processor should pass through to the application as plain character data, rather than parse as markup.
- For example, if you want to use characters such as < inside your text, you need to tell the XML processor that they are not markup (otherwise they are an error); CDATA sections do this.
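A short Python sketch with the standard `xml.etree.ElementTree` parser showing the difference; the `<code>` element and its contents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "<" and "&" are plain characters, not markup.
with_cdata = "<code><![CDATA[if (a < b && b > c) swap();]]></code>"
text = ET.fromstring(with_cdata).text
print(text)  # if (a < b && b > c) swap();

# Without CDATA, the bare "<" makes the document not well-formed.
try:
    ET.fromstring("<code>if (a < b) swap();</code>")
    plain_ok = True
except ET.ParseError:
    plain_ok = False
print(plain_ok)  # False
```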
What is a good way of defining the format of an XML file?