What is the difference between
Data analysts
Data scientists
Data engineers
Data analysts are primarily people who develop insights with data ….
Explain the different analytic levels
Descriptive analytics: gain insight from historical data
* plot sales results by region and product category
* correlate with advertising revenue per region
Predictive analytics: make predictions using statistical and machine learning techniques
* predict next quarter’s sales results using economic projections and advertising targets
Prescriptive analytics: recommend decisions using optimisation, simulation, etc.
* recommend which regions to advertise in given a fixed budget
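The three levels can be sketched on one toy data set. This is a minimal illustration, not a real analysis: every number is made up, the names `forecast_next` and `recommend_regions` are hypothetical, the "prediction" is a naive trend extrapolation standing in for a proper statistical model, and the "optimisation" is a brute-force search that only works for a handful of regions.

```python
from itertools import combinations
from statistics import mean

# Hypothetical quarterly sales per region (made-up numbers, illustration only).
sales = {
    "North": [100, 110, 120, 130],
    "South": [80, 82, 84, 86],
    "East": [50, 55, 60, 65],
}

# Descriptive: summarise what already happened.
total_sales = {region: sum(history) for region, history in sales.items()}

# Predictive: naive trend extrapolation (last value + mean quarterly change).
def forecast_next(history):
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    return history[-1] + mean(changes)

forecast = {region: forecast_next(history) for region, history in sales.items()}

# Prescriptive: choose the set of regions that maximises the expected sales
# uplift without exceeding a fixed advertising budget (brute-force search).
ad_cost = {"North": 30, "South": 20, "East": 25}        # hypothetical costs
expected_uplift = {"North": 12, "South": 9, "East": 8}  # hypothetical gains

def recommend_regions(cost, uplift, budget):
    best_choice, best_gain = (), 0
    for size in range(1, len(cost) + 1):
        for choice in combinations(cost, size):
            if sum(cost[r] for r in choice) <= budget:
                gain = sum(uplift[r] for r in choice)
                if gain > best_gain:
                    best_choice, best_gain = choice, gain
    return best_choice, best_gain

recommendation = recommend_regions(ad_cost, expected_uplift, budget=50)
```

The descriptive step only looks backwards, the predictive step produces a number about the future, and the prescriptive step returns a decision (which regions to advertise in).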
Which of the following is a prescriptive analytics task (as opposed to a predictive analytics task)?
A. Suggesting a traffic route based on prior data for the time of day and incident reports.
B. Predicting travel time of multiple traffic routes
C. Estimating the student enrolment number of FIT5145 in 2023 Sem 1
D. Measuring the likelihood of a student getting HD in the final exam of FIT5145
A. Suggesting a traffic route based on prior data for the time of day and incident reports.
What are influence diagrams?
method for modeling data and decision making
Influence Diagrams (a.k.a. Decision Graphs) are:
* directed graphical models with 4 types of nodes:
- chance nodes, known variable nodes, action/decision nodes and objective/utility nodes
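One way to picture the structure is as a typed directed graph. The sketch below is only an illustration: the umbrella scenario, the node names and the `parents` helper are all hypothetical, and the comments give the usual reading of each node type rather than a formal definition.

```python
from enum import Enum

class NodeType(Enum):
    CHANCE = "chance"      # uncertain variable
    KNOWN = "known"        # variable whose value is observed
    DECISION = "decision"  # action chosen by the decision maker
    UTILITY = "utility"    # objective to be optimised

# Hypothetical example: deciding whether to take an umbrella.
nodes = {
    "Weather": NodeType.CHANCE,
    "Forecast": NodeType.KNOWN,
    "Take umbrella?": NodeType.DECISION,
    "Satisfaction": NodeType.UTILITY,
}

# Directed arcs (source, target): the source influences the target.
arcs = [
    ("Weather", "Forecast"),
    ("Forecast", "Take umbrella?"),
    ("Weather", "Satisfaction"),
    ("Take umbrella?", "Satisfaction"),
]

def parents(node):
    """Return the nodes with an arc pointing into `node`, sorted by name."""
    return sorted(src for src, dst in arcs if dst == node)
```

Here `parents("Satisfaction")` lists the chance node and the decision node, which reads as "the outcome depends on both the weather and the action taken".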
Explain the node types of an influence diagram
An Influence Diagram:
A. is a model giving possible situations or outcomes.
B. consists of nodes and arcs.
C. is an alternative to a decision tree.
D. consists of nodes and arcs and is an alternative to a decision tree.
D. consists of nodes and arcs and is an alternative to a decision tree.
Name the four growth laws
Explanations about change in IT and society:
What does Moore’s Law say?
==> capability and size of IT
Number of transistors per chip doubles every 2 years (starting from 1975)
Transistor count translates to:
* more memory
* bigger CPUs
* faster memory, CPUs (smaller==faster)
Pace currently slowing
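The doubling rule can be written as count(year) = base_count * 2^((year - base_year) / 2). A minimal sketch of that arithmetic follows; the function name `projected_transistors` and the starting count of 1000 are hypothetical, and the formula assumes the doubling period stays fixed, which the note on the slowing pace says no longer holds exactly.

```python
def projected_transistors(base_count, base_year, year, doubling_years=2.0):
    """Project a transistor count assuming one doubling every `doubling_years`."""
    doublings = (year - base_year) / doubling_years
    return base_count * 2 ** doublings

# Hypothetical chip with 1000 transistors in 1975, projected 10 years ahead:
# 10 years / 2 years per doubling = 5 doublings, i.e. a factor of 2^5 = 32.
count_1985 = projected_transistors(1000, 1975, 1985)
```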
What does Koomey’s Law say?
==> capability and size of IT
What does Bell’s Law say?
==> purpose of IT
What does Zimmermann’s Law say?
==> relationship between privacy and IT
Growth, business, and business models
As information technology develops and more data is collected, businesses use that data and incorporate it into their business models (–> innovation)
Definition of a business model:
A business model describes the rationale of how an organization creates, delivers, and captures value, in economic, social, cultural or other contexts.
What kinds of businesses do we have operating in the Data Science world?
Information brokering service: buys and sells data/information for others.
Information-based differentiation: satisfies customers by providing a differentiated service built on the data/information.
Information-based delivery network: delivers data/information for others.
Information provider: business selling the data/information it collects.
The Bloomberg Terminal:
* a computer system provided by Bloomberg L.P.
* enables professionals to monitor and analyse real-time financial market data
* also lets them place trades on the electronic trading platform
* is a proprietary secure network
Amazon.com
* An assembly line for the retail industry, with support for embedded online retailers.
* Huge stock of books, DVDs, CDs, etc. easily searchable.
* extensive customer reviews
–> Information-based differentiation: satisfies customers by providing a differentiated service (superior information (reviews), range)
–> Information-based delivery network:
- they deliver information for others;
- retailers in the Amazon marketplace get customers directed to them, along with other retailer support
LexisNexis
- provides the world’s largest electronic database for legal and public-records-related information.
What is statistics?
“The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample”.
Two main statistical analytical methods:
* descriptive statistics – explaining data
* inferential statistics – finding regularities in irregular data
Mode, median, mean, variance, standard deviation
mode: the most common value
median: the middle value of the sorted data
mean: the average value
variance: the average of the squared differences from the mean
standard deviation: the square root of the variance
Example
Data: 2, 4, 4, 4, 5, 5, 7, 9
Mode: 4
Median: 4.5
Mean: 5
var = ((2-5)^2 + (4-5)^2 +(4-5)^2 + … + (9-5)^2)/8 = 4
sd = 4^0.5 = 2
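The worked example can be checked with Python's standard `statistics` module. Note that the example divides by 8 (the number of values), so it is the population variance; that corresponds to `pvariance` and `pstdev` rather than the sample versions `variance` and `stdev`, which divide by n - 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(data)           # most common value
median = statistics.median(data)       # mean of the two middle values (4 and 5)
mean = statistics.mean(data)           # 40 / 8
variance = statistics.pvariance(data)  # population variance: 32 / 8
std_dev = statistics.pstdev(data)      # square root of the population variance

print(mode, median, mean, variance, std_dev)
```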
Name the different variable types
Categorical, qualitative
* Groups or categories
* Nominal – no natural ordering e.g. blood type, eye color
* Ordinal – ordered e.g. education level
Quantitative
* Numerical
* Discrete – specific values –> counts like number of customer complaints
What are outliers?
Outliers are values outside the expected range for the data. Possible causes:
- Errors
- Exceptional circumstances
- Chance
What is a boxplot?
Combines quartiles, median and outliers in one plot
Quartiles: divide the sorted data into quarters
Interquartile range (IQR): The difference between the lower and upper quartiles
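A minimal sketch of the numbers behind a boxplot, using the standard `statistics.quantiles` function. Two assumptions are not in the notes above: the 1.5 x IQR cut-off for flagging outliers is a common convention rather than the only option, and different quartile methods give slightly different values on small data sets. The function name `boxplot_summary` and the sample data are hypothetical.

```python
import statistics

def boxplot_summary(values, k=1.5):
    """Return quartiles, IQR and the values outside the k * IQR fences."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # three cut points
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical data with one extreme value at the end.
summary = boxplot_summary([2, 4, 4, 4, 5, 5, 7, 9, 30])
```

With this data the quartiles are 4, 5 and 8, so the IQR is 4 and the fences sit at -2 and 14; only the value 30 falls outside them.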
What are the pros and cons of motion charts?