5.1 Enumerate the 10 difficulties of managing data.
5.1 What are the multiple sources of data (one of the difficulties of managing data)? Give examples. Hint: IPEN
Internal sources
• Corporate databases, company documents
Personal sources
• Personal thoughts, opinions, experiences
External sources
• Commercial databases, government reports, corporate websites, clickstream data
New sources
• Blogs, Tweets, videos, sensor tags
5.1 What is the main solution to the difficulties of managing data?
Solutions to these difficulties include effective data governance.
5.1 What is data governance?
Data governance is an approach to managing information across an entire organization. It involves a formal set of business processes and policies that are designed to ensure that data are handled in a certain, well-defined fashion.
5.1 What are the objectives of data governance? How do organizations accomplish these? Hint: ATU
How?
By using business processes and policies to handle data in a certain, well-defined way, and by following unambiguous rules to create, collect, handle, and protect data.
5.1 What strategy do organizations use to implement sound data governance?
Master Data Management
5.1 What is Master Data Management?
Master data management is a process that spans all of an organization’s business processes and applications.
It provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely “single version of the truth” for the company’s master data.
5.1 Why aren’t master data and transactional data the same?
Master data are a set of core data, such as customer, product, employee, vendor, geographic location, and so on, that span the enterprise’s information systems.
Transactional data, which are generated and captured by operational systems, describe the business’s activities, or transactions.
Master data are applied to multiple transactions, and they are used to categorize, aggregate, and evaluate the transactional data.
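The relationship above can be sketched in a few lines of Python. This is only an illustration: the product records, sale records, and the `revenue_by_category` helper are hypothetical and are not part of the course material. The product table plays the role of master data, and each sale is a transactional record that is categorized and aggregated through it.

```python
from collections import defaultdict

# Master data (hypothetical): core records shared across systems.
products = {
    "P1": {"name": "Laptop", "category": "Electronics"},
    "P2": {"name": "Desk", "category": "Furniture"},
    "P3": {"name": "Monitor", "category": "Electronics"},
}

# Transactional data (hypothetical): one record per business activity.
sales = [
    {"sale_id": 1, "product_id": "P1", "amount": 900.0},
    {"sale_id": 2, "product_id": "P2", "amount": 250.0},
    {"sale_id": 3, "product_id": "P3", "amount": 200.0},
    {"sale_id": 4, "product_id": "P1", "amount": 950.0},
]


def revenue_by_category(products, sales):
    """Use master data to categorize and aggregate transactions."""
    totals = defaultdict(float)
    for sale in sales:
        # Look up the master record that this transaction refers to.
        category = products[sale["product_id"]]["category"]
        totals[category] += sale["amount"]
    return dict(totals)


totals = revenue_by_category(products, sales)
print(totals)  # {'Electronics': 2050.0, 'Furniture': 250.0}
```

Note that the same master record (P1) is applied to more than one transaction, which is the point of keeping a single consistent version of it.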
5.1 What are the resulting benefits of master data management? Hint: ASF = E
Master data management leads to
5.3 What is structured data? What is unstructured data? Give examples.
Structured data fits into predefined fields and can be organized into a spreadsheet or a relational database.
Examples: names, dates, addresses, credit card numbers, etc.
Unstructured data is heterogeneous and does not fall within standard fields.
Examples: email messages, audio files, Facebook posts, ratings, recommendations.
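As a rough illustration of the difference, the Python sketch below uses a made-up customer record and a made-up email message. With structured data, a value is read straight from its predefined field. With unstructured text, the same value has to be searched for, here with a regular expression that assumes dates are written as YYYY-MM-DD.

```python
import re

# Structured (hypothetical record): every value sits in a predefined field.
structured_customer = {
    "name": "Ana Lee",
    "signup_date": "2024-03-01",
    "city": "Austin",
}

# Unstructured (hypothetical email): free text with no fields.
unstructured_email = (
    "Hi, this is Ana Lee. I signed up on 2024-03-01 and I love the new app!"
)

# Structured data: direct lookup by field name.
city = structured_customer["city"]

# Unstructured data: the value must be extracted by pattern matching.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", unstructured_email)

print(city)   # Austin
print(dates)  # ['2024-03-01']
```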
5.3 Define Big Data.
We refer to the superabundance of data available today as Big Data. Big Data is a collection of data that is so large and complex that it is difficult to manage using traditional database management systems.
Essentially, Big Data is about predictions that come from applying mathematics to huge quantities of data to infer probabilities.
5.3 Where do Big Data come from (sources)?
5.3 What are the three distinct characteristics of Big Data?
Volume + Velocity + Variety
5.3 What are the three issues with Big Data?
5.3 Name 5 functional areas of the organization where Big Data is used. Hint: HP + OMG
5.4 What are the elements of a generic warehouse environment?
5.4 What is a data warehouse? What is a data mart?
A data warehouse is a repository of historical data that are organized by subject to support decision makers within the organization. Because data warehouses are so expensive, they are used primarily by large companies.
A data mart is a low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit (SBU) or an individual department. Data marts can be implemented more quickly than data warehouses, often in less than 90 days. They support local rather than central control by conferring power on the user group.
5.4 What is metadata?
Metadata is data about the data in a repository.
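A minimal Python sketch of the idea, using a hypothetical two-row customer table and a hypothetical `describe` helper: the rows are the data, while the row count and the column names and types are metadata about those rows.

```python
# Data (hypothetical): the actual records held in the repository.
rows = [
    {"customer_id": 1, "name": "Ana", "balance": 120.5},
    {"customer_id": 2, "name": "Ben", "balance": 80.0},
]


def describe(rows):
    """Return simple metadata: row count plus each column's name and type."""
    # Assumes every row has the same columns as the first row.
    columns = {col: type(value).__name__ for col, value in rows[0].items()}
    return {"row_count": len(rows), "columns": columns}


metadata = describe(rows)
print(metadata)
# {'row_count': 2, 'columns': {'customer_id': 'int', 'name': 'str', 'balance': 'float'}}
```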
5.4 What is a data lake?
A data lake is a vast pool of raw data, the purpose of which has not yet been defined.
A data lake includes both structured and unstructured data, whereas a data warehouse is a repository for structured, filtered data that has already been processed for a specific purpose.
5.2 What is a data file?
A data file is a collection of logically related records (e.g., a shopping list).
In a file management environment, each application has a specific data file related to it.
5.2 What are the three issues with file management systems? Hint: RICK
5.2 What is a database and which problems does it minimize? Hint: RICK
Database:
Databases minimize the following problems:
5.4 What are the differences between data warehouses and databases? Hint: think about content + search time + goal
Database:
Data warehouse:
5.2 What do Database Management Systems maximize? Hint: ISIS
Data security:
Data integrity:
Data independence: