define data
Data can be a piece of information, or facts and statistics collected together for reference or analysis.
define training data
It is the primary dataset that is fed into the system for the purpose of developing and training the model.
give some examples where the machine takes in different training data
what is the validating data set
Also called the secondary dataset. This data is used to check whether the newly developed model correctly identifies the data when making predictions.
what does the validating step ensure
This step makes sure that the new model has not become too specific to the values in the primary dataset when making predictions (a problem known as overfitting).
If it has, corrections and tweaks are made to the project.
The primary and secondary datasets are then run through the model again until the desired accuracy is achieved.
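The tweak-and-rerun cycle described above can be sketched in a few lines. This is a minimal sketch, assuming a k-nearest-neighbour classifier (a technique chosen here only for illustration; the notes do not name a model) and hypothetical toy data. The "tweak" is the setting `k`: with `k=1` the model copies the noisy training point too closely and fails on the validation data, so a larger `k` is tried until the target accuracy is reached.

```python
from collections import Counter

# Hypothetical toy data: each example is (feature value, label tag).
# The point (3.9, "large") is deliberately noisy, so a model that
# memorises the training data too closely will make mistakes.
TRAINING_DATA = [
    (1.0, "small"), (1.5, "small"), (2.0, "small"),
    (3.9, "large"),
    (6.0, "large"), (6.5, "large"), (7.0, "large"),
]
VALIDATION_DATA = [(1.2, "small"), (3.5, "small"), (6.8, "large")]


def predict(train, x, k):
    """k-nearest-neighbour prediction: majority label of the k closest points."""
    nearest = sorted(train, key=lambda example: abs(example[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, dataset, k):
    """Fraction of examples in `dataset` whose label the model predicts correctly."""
    correct = sum(predict(train, x, k) == label for x, label in dataset)
    return correct / len(dataset)


def tune(train, validation, candidate_ks, target):
    """Try one setting after another until validation accuracy reaches the target."""
    for k in candidate_ks:
        score = accuracy(train, validation, k)
        print(f"k={k}: validation accuracy {score:.2f}")
        if score >= target:
            return k, score
    return None, None  # no setting met the target; further tweaks would be needed


if __name__ == "__main__":
    best_k, best_score = tune(TRAINING_DATA, VALIDATION_DATA, [1, 3, 5], target=1.0)
    print("chosen k:", best_k)
```

Running it reports about 0.67 validation accuracy for `k=1` and 1.00 for `k=3`, so `k=3` is chosen. In a real project the tweaks could be anything from changing settings to collecting more data.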
define testing data
It is the final dataset, which paves the way for the model to enter the real world and start making predictions.
how does testing data differ from training and validating data
The primary and secondary datasets both come with relevant label tags on the data.
The testing data is the final dataset, and it gives the model no help in the form of label tags: the model has to make its predictions on its own.
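One way to picture the difference is to split a labeled collection into the three datasets and then hand the model only the inputs of the testing set. This is a minimal sketch under stated assumptions: the 60/20/20 split, the helper names `split_dataset` and `strip_labels`, and the even/odd records are all hypothetical, chosen only for illustration.

```python
import random


def split_dataset(records, train_percent=60, validation_percent=20, seed=0):
    """Shuffle labeled records and split them into training, validation and testing sets."""
    shuffled = records[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_percent // 100
    n_val = len(shuffled) * validation_percent // 100
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    testing = shuffled[n_train + n_val:]
    return training, validation, testing


def strip_labels(dataset):
    """Separate inputs from label tags: the model only ever receives the inputs."""
    inputs = [features for features, _ in dataset]
    hidden_labels = [label for _, label in dataset]  # kept aside, never shown to the model
    return inputs, hidden_labels


if __name__ == "__main__":
    # Hypothetical labeled records: (number, label tag).
    records = [(n, "even" if n % 2 == 0 else "odd") for n in range(10)]
    training, validation, testing = split_dataset(records)
    test_inputs, test_answers = strip_labels(testing)
    print(len(training), len(validation), len(testing))  # 6 2 2
    print(test_inputs)  # what the model receives: numbers only, no tags
```

The training and validation sets keep their tags so the model can learn and be checked, while the testing inputs arrive untagged, as they would in the real world.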
define data warehousing
Data is usually collected in bulk from various sources and in various formats. Storing all of this data together in one place is called data warehousing.
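To make the idea concrete, the sketch below takes records from two sources in different formats (CSV and JSON) and stores them in one table with a common layout. It is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for a real data warehouse; the sources, field names and temperature values are made up for illustration.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources reporting the same kind of data in different formats.
CSV_SOURCE = "city,temperature_c\nDelhi,31\nMumbai,29\n"
JSON_SOURCE = '[{"city": "Chennai", "temp": 33}, {"city": "Kolkata", "temp": 30}]'


def build_warehouse():
    """Load records from both sources into one SQLite table with a common schema."""
    db = sqlite3.connect(":memory:")  # in-memory database, used here only for illustration
    db.execute("CREATE TABLE readings (city TEXT, temperature_c REAL, source TEXT)")

    # Source 1: CSV, whose column names already match the warehouse schema.
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        db.execute("INSERT INTO readings VALUES (?, ?, ?)",
                   (row["city"], float(row["temperature_c"]), "csv"))

    # Source 2: JSON, which uses a different field name that has to be mapped.
    for item in json.loads(JSON_SOURCE):
        db.execute("INSERT INTO readings VALUES (?, ?, ?)",
                   (item["city"], float(item["temp"]), "json"))

    db.commit()
    return db


if __name__ == "__main__":
    warehouse = build_warehouse()
    for city, temp in warehouse.execute("SELECT city, temperature_c FROM readings ORDER BY city"):
        print(city, temp)
```

Once the data sits in one table, it can be queried in one place regardless of which format it originally arrived in.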
define data features
Data features are the factors and parameters that affect the problem, directly or indirectly.
what should be the characteristics of training data
For an AI project to work efficiently, the training data needs to be relevant and authentic. Data plays an important part in an AI project, as it forms the base on which the project is built. Therefore, the data acquired should be authentic, reliable and correct.
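Whether data is authentic depends on where it came from, but some basic correctness problems can be caught automatically before training. This is a minimal sketch, assuming records stored as Python dictionaries; the helper names, the student records and the valid age range are hypothetical, chosen only to show missing values, impossible values and duplicates being filtered out.

```python
def check_record(record, required_fields, valid_ranges):
    """Return a list of problems found in one record (an empty list means it looks usable)."""
    problems = []
    for field in required_fields:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if isinstance(value, (int, float)) and not low <= value <= high:
            problems.append(f"{field}={value} outside {low}-{high}")
    return problems


def clean(records, required_fields, valid_ranges):
    """Keep only records that pass every check and are not exact duplicates."""
    seen = set()
    kept = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint used to spot duplicates
        if key in seen or check_record(record, required_fields, valid_ranges):
            continue
        seen.add(key)
        kept.append(record)
    return kept


if __name__ == "__main__":
    students = [
        {"name": "Asha", "age": 15},
        {"name": "Ravi", "age": 150},  # impossible age: likely a data-entry error
        {"name": "", "age": 14},       # missing name
        {"name": "Asha", "age": 15},   # exact duplicate
    ]
    print(clean(students, ["name", "age"], {"age": (5, 20)}))
    # [{'name': 'Asha', 'age': 15}]
```

Checks like these only catch obvious errors; they cannot prove that data is authentic, which is why choosing a reliable source still matters.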
what should be the characteristics of our data sources
It is necessary to find a reliable source of data from which authentic information can be taken. At the same time, we should make sure that the data we collect is open-sourced and not someone’s private property. Extracting private data can be an offence.
what are the most reliable and authentic data sources
Open-sourced websites hosted by the government are among the most reliable and authentic sources of information. These government portals provide general information collected in a suitable format, which can be downloaded and used wisely.
Examples: data.gov.in, india.gov.in
examples of some data sources
define system map
It is a tool used to map out and understand the relationships between the different data features of a problem.
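A system map can be stored as a directed graph. This sketch assumes a common convention for system maps: an arrow points from a cause to the feature it affects, and a + or - sign shows whether the two change in the same or opposite directions. The crop-yield features and arrows are a hypothetical example. Following the arrows and combining signs shows indirect effects as well as direct ones.

```python
from collections import defaultdict

# Hypothetical system map for a crop-yield problem: (cause, effect, sign).
ARROWS = [
    ("rainfall", "crop yield", "+"),
    ("pests", "crop yield", "-"),
    ("pesticide use", "pests", "-"),
    ("crop yield", "farmer income", "+"),
]


def build_map(arrows):
    """Adjacency list: cause -> list of (effect, sign)."""
    graph = defaultdict(list)
    for cause, effect, sign in arrows:
        graph[cause].append((effect, sign))
    return graph


def influences(graph, start, sign="+", path=None):
    """Follow arrows from `start`, combining signs along each path.

    Returns {feature: overall sign}. Two '-' arrows in a row give '+'.
    If a feature is reachable by several paths, the last path explored wins.
    """
    path = path or {start}
    result = {}
    for effect, arrow_sign in graph.get(start, []):
        if effect in path:
            continue  # skip arrows that loop back, to avoid endless recursion
        combined = "+" if sign == arrow_sign else "-"
        result[effect] = combined
        result.update(influences(graph, effect, combined, path | {effect}))
    return result


if __name__ == "__main__":
    system_map = build_map(ARROWS)
    print(influences(system_map, "pesticide use"))
    # {'pests': '-', 'crop yield': '+', 'farmer income': '+'}
```

Here more pesticide use means fewer pests (a direct, negative link), which in turn means a higher crop yield and farmer income (indirect, positive links).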