What is concept drift?
It is a change in the relationship between a model's inputs and outputs. It is not necessarily connected to data drift; it can be caused by factors that are invisible in the data, such as hidden context. For example, user behavior may change over time under the influence of the strength of the economy, which is not captured in the data itself.
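Because the drift may be invisible in the input data, a common way to catch it is to monitor the model's accuracy over a sliding window and compare it to a reference window. Below is a minimal pure-Python sketch of that idea; the window size, drop threshold, and the simulated accuracy numbers are all illustrative assumptions, not tuned values.

```python
from collections import deque

def make_drift_detector(window=50, drop_threshold=0.15):
    """Flag concept drift when windowed accuracy falls well below the
    accuracy observed on an initial reference window.
    `window` and `drop_threshold` are illustrative, not tuned."""
    recent = deque(maxlen=window)
    reference = []  # outcomes of the first `window` predictions

    def update(correct: bool) -> bool:
        if len(reference) < window:
            reference.append(correct)
            return False
        recent.append(correct)
        if len(recent) < window:
            return False
        ref_acc = sum(reference) / window
        cur_acc = sum(recent) / window
        return (ref_acc - cur_acc) > drop_threshold

    return update

# Simulated stream: the model is right ~90% of the time, then the
# input/output relationship changes and accuracy drops to ~40%.
detect = make_drift_detector()
stream = [i % 10 != 0 for i in range(100)] + [i % 10 < 4 for i in range(100)]
drift_at = next((i for i, ok in enumerate(stream) if detect(ok)), None)
print(drift_at)  # drift is flagged shortly after the change at index 100
```

The detector only needs prediction outcomes, not the inputs, which is exactly why it can catch drift caused by hidden context.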
What types of concept drift exist?
The usual taxonomy distinguishes four types: sudden (abrupt) drift, gradual drift, incremental drift, and recurring (seasonal) drift, based on how quickly and in what pattern the input-output relationship changes.
What are the two main types of distributed training architectures?
Data parallelism - the training data is split between multiple worker nodes, each of which holds a full copy of the model
Model parallelism - when the model can't fit in the memory of a single device, the model itself is split across devices while all of them work on the same data
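The data-parallel side of this split can be illustrated with a toy one-parameter model: each "worker" computes a gradient on its shard of the mini-batch, the gradients are averaged (the all-reduce step), and one synchronized update is applied. Everything here is a pure-Python sketch; the function names are illustrative.

```python
# Toy data-parallel step for a 1-parameter linear model y = w * x
# with squared-error loss. Each "worker" gets a shard of the batch.

def local_gradient(w, shard):
    # d/dw of the mean squared error over this worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, batch, num_workers, lr=0.01):
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_workers)]
    grads = [local_gradient(w, s) for s in shards]  # run in parallel in reality
    avg_grad = sum(grads) / num_workers             # the all-reduce step
    return w - lr * avg_grad

# True relationship: y = 3x. With equal-sized shards, the parallel step
# is mathematically identical to one SGD step on the full batch.
batch = [(x, 3 * x) for x in range(1, 9)]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, batch, num_workers=4)
print(round(w, 3))  # → 3.0
```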
What are 2 common data parallelism approaches? Explain in detail and when to use them.
Synchronous training - all workers train on different shards of the data and aggregate the gradients at each step (typically with an all-reduce operation), so the model replicas stay identical. Use it when workers are fast, reliable and well connected, e.g. multiple GPUs on one machine or a tightly coupled cluster.
Asynchronous training - workers train independently and read/update shared parameters through parameter servers without waiting for each other. Use it when there are many, possibly slow or unreliable workers, at the cost of training on slightly stale parameters.
What is model parallelism?
It is a distributed training architecture in which the model is split into groups of layers, each placed on a different device. All devices work on the same mini-batch, passing activations (and gradients) between each other, so they have to stay in sync. It is used when the model is too big to fit in the memory of a single device.
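The layer-splitting idea can be sketched in a few lines of plain Python: each "device" owns one layer, and the forward pass hands activations from one device to the next. The device names and toy layers are illustrative, not a real placement API.

```python
# Toy model-parallel forward pass: a 3-layer model too big for one
# "device", so each layer lives on its own device and activations are
# passed between them. Device names are illustrative.

layers = {
    "device:0": lambda xs: [v * 2 for v in xs],      # layer 1
    "device:1": lambda xs: [v + 1 for v in xs],      # layer 2
    "device:2": lambda xs: [max(v, 0) for v in xs],  # layer 3 (ReLU)
}

def forward(mini_batch):
    activations = mini_batch
    for device, layer in layers.items():
        # In a real system this hop is a cross-device transfer, and the
        # devices must stay in sync on the same mini-batch.
        activations = layer(activations)
    return activations

print(forward([-2.0, 0.5, 3.0]))  # → [0, 2.0, 7.0]
```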
What are 4 types of TensorFlow distributed training strategies?
Mirrored strategy, Multi-worker Mirrored strategy, TPU strategy, and Parameter Server strategy. The first three are described below.
How does the Mirrored strategy for distributed training work?
It is used on a single machine with multiple GPUs. The model is replicated on each GPU and the mini-batch is split across the GPUs. The model parameters have to stay in sync across GPUs after every update.
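In TensorFlow 2.x this is `tf.distribute.MirroredStrategy`; variables created inside `strategy.scope()` are mirrored, and `fit()` splits the global batch across replicas. A minimal sketch, assuming TensorFlow is installed (on a machine without GPUs it falls back to a single CPU replica):

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU of one
# machine and all-reduces gradients to keep the copies in sync.
strategy = tf.distribute.MirroredStrategy()
print("replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created here are mirrored across replicas.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

# The global batch of 32 is split across the replicas by fit().
x = tf.random.normal((64, 4))
y = tf.random.normal((64, 1))
model.fit(x, y, batch_size=32, epochs=1, verbose=0)
```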
How does the Multi-worker Mirrored strategy for distributed training work?
Almost the same as the Mirrored strategy; the only difference is that there are now multiple machines, each with one or more CPUs or GPUs, and the mini-batch is split across all of them. You need to define which machine is the chief (master) and which machines are the worker nodes.
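In TensorFlow the cluster layout and each machine's role are declared through the `TF_CONFIG` environment variable; by convention the worker with index 0 acts as the chief. A sketch of the JSON each machine would set (the hostnames and port are placeholders):

```json
{
  "cluster": {
    "worker": ["host1.example.com:12345", "host2.example.com:12345"]
  },
  "task": {"type": "worker", "index": 0}
}
```

Every machine gets the same `cluster` section but a different `task.index`, and each then creates a `tf.distribute.MultiWorkerMirroredStrategy` before building the model.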
How does the TPU strategy for distributed training work?
The same as the Mirrored strategy, except that the workload is split between TPU cores. This strategy is optimised for the largest workloads, and the main consideration is making sure enough data is fed in so that the TPU cores are not sitting idle.
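Keeping the cores fed is mostly an input-pipeline problem, which TensorFlow addresses with `tf.data` prefetching. A minimal sketch, assuming TensorFlow is installed (it runs on CPU too; the dataset contents are illustrative):

```python
import tensorflow as tf

# A TPU needs data delivered fast enough that its cores never sit idle,
# so the input pipeline overlaps preprocessing with the training step.
dataset = (
    tf.data.Dataset.range(1024)
    .map(lambda x: tf.cast(x, tf.float32) / 1024.0,
         num_parallel_calls=tf.data.AUTOTUNE)  # parallel preprocessing
    .batch(128, drop_remainder=True)  # TPUs want fixed batch shapes
    .prefetch(tf.data.AUTOTUNE)       # prepare next batch during the step
)
num_batches = sum(1 for _ in dataset)
print(num_batches)  # → 8
```

With `tf.distribute.TPUStrategy`, this dataset would be passed to `model.fit()` exactly as in the Mirrored case.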