System Design - Problem Solving Flashcards

(4 cards)

1
Q

How would you design a customer support chatbot for an online banking platform (like Capital One) to ensure it provides secure, helpful, and relevant responses?

A

Architecture: Chatbot frontend + secure backend APIs (no direct DB access).

Security: Enforce authentication (OAuth2, MFA), encrypt data, mask sensitive info, use least privilege.

NLU: Detect intents (e.g., check balance, lost card) and retrieve FAQs safely.

Responses: Use fixed templates for account data; use an LLM or retrieval for general FAQs, grounding answers in retrieved documents to avoid hallucinations.

Masking: Restrict the bot to the authenticated customer's own data; never expose other customers' information or data outside the current session's scope.

Monitoring: Log safely, detect misuse, retrain on anonymized data.

Compliance: Follow GDPR, PCI DSS, and banking privacy laws.
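The intent-detection and templated-response flow above can be sketched minimally. The intent names, templates, and masking helper below are illustrative assumptions; a production system would use a trained NLU model and a real policy engine.

```python
# Minimal sketch of the intent -> template -> masking flow.
# Intents, templates, and the regex are hypothetical examples.
import re

TEMPLATES = {
    "check_balance": "Your available balance is {balance}.",
    "lost_card": "I've flagged your card as lost. A replacement will be mailed to you.",
}

def mask_account_number(text: str) -> str:
    """Show only the last 4 digits of anything that looks like an account number."""
    return re.sub(r"\b\d{8,16}\b", lambda m: "****" + m.group()[-4:], text)

def detect_intent(message: str) -> str:
    """Toy keyword-based intent detection; a real system would use an NLU model."""
    msg = message.lower()
    if "balance" in msg:
        return "check_balance"
    if "lost" in msg and "card" in msg:
        return "lost_card"
    return "fallback"

def respond(message: str, account_data: dict) -> str:
    intent = detect_intent(message)
    if intent in TEMPLATES:
        # Account data flows only through fixed templates, never free-form LLM text.
        return mask_account_number(TEMPLATES[intent].format(**account_data))
    return "Let me connect you with a support agent."
```

Note the design choice: sensitive account fields only ever appear inside fixed templates, so the LLM path never sees raw account data.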

2
Q

As a consultant advising a fintech on using generative AI for fraud detection with a labeled dataset of 50,000 transactions, how would you decide between RAG, prompt engineering, or fine-tuning? What factors guide your choice, and how would your recommendation change if the dataset doubled?

A

Choice depends on dataset size, task specificity, and performance needs:

  • Prompt engineering: Quick, low-resource; suitable for small datasets or exploratory tasks; limited accuracy for complex fraud patterns.
  • RAG: Combines retrieval with LLM reasoning; handles rare or evolving fraud cases; useful for complex patterns and explainability.
  • Fine-tuning / PEFT: Uses labeled data to adapt the model; best for medium/large datasets (50k+); provides high accuracy on domain-specific fraud patterns; requires more compute.

Factors guiding choice:

  • Dataset size & quality: Small → Prompt or RAG; Medium/Large → Fine-tuning/PEFT
  • Task complexity & explainability: Simple → Prompt; Complex → RAG or Fine-tuning
  • Resources: Limited → Prompt or RAG; Enough compute → Fine-tuning/PEFT
  • Fraud pattern changes: Rapid → RAG; Stable → Fine-tuning/PEFT

If dataset doubles (~100k):

  • Fine-tuning/PEFT becomes more attractive due to more labeled data supporting supervised learning.
  • RAG still useful for rare or unusual fraud cases.
  • Prompt engineering alone likely insufficient for high accuracy.
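The decision factors above can be encoded as a toy heuristic. The thresholds and budget labels are illustrative assumptions, not industry standards.

```python
# Toy encoding of the decision factors above; thresholds are assumptions.
def recommend_approach(n_labeled: int, patterns_change_fast: bool,
                       compute_budget: str) -> str:
    if n_labeled < 5_000:
        return "prompt engineering (optionally with RAG)"
    if patterns_change_fast:
        return "RAG, possibly combined with a fine-tuned base model"
    if compute_budget == "limited":
        return "PEFT (e.g., LoRA) to reduce fine-tuning cost"
    return "full fine-tuning (or PEFT) on the labeled dataset"
```

With 50k labels, stable patterns, and ample compute this recommends fine-tuning; doubling to 100k only strengthens that case, matching the answer above.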
3
Q

Let’s say BCG is helping a large e-commerce client develop a generative AI tool that can automatically generate marketing copy and brand visuals by processing both product photos and text reviews (multi-modal), rather than relying on text data alone (unimodal). How would you think about the implications of using a multi-modal model versus a unimodal one in this business setting, and what steps would you take to identify and reduce potential biases in its outputs?

A

Implications of multi-modal vs unimodal:

  • Multi-modal:
    • Uses both product images and text reviews → richer, more context-aware outputs.
    • Generates visuals aligned with copy and customer sentiment.
    • More complex and compute-intensive; biases from images and text can combine.
  • Unimodal (text only):
    • Easier to train and deploy.
    • Limited context; may miss visual cues or aesthetic details.

Steps to identify and reduce biases:

  1. Audit training data: Check text and images for gaps or imbalances in product types, styles, or demographic representation.
  2. Analyze outputs: Look for stereotypical, unfair, or harmful content in copy and visuals.
  3. Bias mitigation:
    • Add diverse examples to training data
    • Filter or post-process outputs
    • Evaluate fairness and inclusivity metrics
  4. Human review: Marketing and ethics teams validate outputs before publishing.
  5. Continuous monitoring: Track outputs over time and retrain to correct emerging biases.

Summary: Multi-modal models produce richer, aligned outputs but need careful bias auditing; unimodal models are simpler but less context-aware.
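Step 1 (auditing training data for imbalance) can be sketched as a simple representation check. The attribute names, sample data, and 5% threshold are hypothetical.

```python
# Sketch of a training-data representation audit; threshold is an assumption.
from collections import Counter

def audit_representation(records: list[dict], attribute: str,
                         min_share: float = 0.05) -> list[str]:
    """Flag attribute values that are underrepresented in the training set."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return sorted(v for v, c in counts.items() if c / total < min_share)

# Hypothetical catalog sample: 96% outdoor gear, 4% formalwear.
data = [{"category": "outdoor"}] * 96 + [{"category": "formalwear"}] * 4
# audit_representation(data, "category") flags "formalwear" (4% < 5%)
```

The same idea extends to image attributes (styles, demographics) once they are labeled, feeding the mitigation steps listed above.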

4
Q

You work as a machine learning engineer at Amazon, focusing on product discovery. Your team is tasked with building a system that enables customers to enter a text description, such as “red hiking backpack with water bottle holder”, and retrieve the most relevant product images from Amazon’s vast catalog.

How would you design this system end-to-end?

A

1. Data Preparation:

  • Collect product images, titles, descriptions, and metadata.
  • Clean and normalize text (tokenization, lowercasing, remove stopwords).
  • Preprocess images (resize, normalize; optionally precompute embeddings offline).

2. Feature Representation:

  • Text: Encode using a pre-trained language model (e.g., BERT, Sentence-BERT) to get dense embeddings.
  • Images: Encode using a pre-trained vision model (e.g., ResNet, CLIP visual encoder) to get image embeddings.
  • Optional: Use a multi-modal model (like CLIP) to embed text and images in the same vector space.

3. Indexing & Retrieval:

  • Store image embeddings in a vector database (e.g., FAISS, Milvus, Pinecone).
  • At query time: Encode user text into the same embedding space.
  • Retrieve top-K nearest image embeddings using cosine similarity or dot product.
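The query-time retrieval step above can be sketched with plain cosine similarity. The toy 3-d vectors stand in for CLIP-style embeddings; a real system would use a vector database like FAISS instead of a linear scan.

```python
# Minimal sketch of top-K retrieval over precomputed image embeddings.
# The 3-d vectors are toy stand-ins for CLIP-style embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, image_embs, k=2):
    """Return indices of the k catalog images most similar to the query."""
    ranked = sorted(range(len(image_embs)),
                    key=lambda i: cosine(query_emb, image_embs[i]),
                    reverse=True)
    return ranked[:k]

catalog = [[0.9, 0.1, 0.0],   # e.g., red backpack
           [0.0, 1.0, 0.2],   # e.g., blue tent
           [0.8, 0.2, 0.1]]   # e.g., red daypack
query = [1.0, 0.0, 0.1]       # embedding of "red hiking backpack ..."
```

Here `top_k(query, catalog)` surfaces the two red packs first, which is exactly the behavior the shared embedding space is meant to deliver.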

4. Ranking & Post-processing:

  • Re-rank retrieved results using additional signals: popularity, relevance, or business rules.
  • Optional filtering by category, price, or availability.

5. System Design Considerations:

  • Use caching for popular queries.
  • Ensure low-latency retrieval for real-time user queries.
  • Monitor system performance and periodically update embeddings as catalog grows.

6. Evaluation:

  • Metrics: Precision@K (fraction of top-K retrieved items that are relevant), Recall@K (fraction of all relevant items that appear in the top-K results), and MRR (mean of the reciprocal rank of the first relevant item across queries) → together these evaluate how well the system ranks relevant results at the top.
  • Perform A/B tests to measure user satisfaction and click-through rate.
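The three metrics can be computed directly; the item IDs and relevance sets below are made-up examples.

```python
# The evaluation metrics above, on toy data; relevance labels are assumed.
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for item in retrieved[:k] if item in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top-k results."""
    return sum(1 for item in retrieved[:k] if item in relevant) / len(relevant)

def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank of the first relevant item per query."""
    total = 0.0
    for retrieved, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(retrieved, start=1):
            if item in relevant:
                total += 1 / rank
                break
    return total / len(ranked_lists)
```

For example, retrieving `["a", "b", "c"]` when `{"a", "c", "d"}` is relevant gives Precision@3 = Recall@3 = 2/3.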

Summary:
Transform both text and images into embeddings (ideally in a shared space), store image embeddings in a vector database, retrieve nearest neighbors for user queries, then rank and filter results for relevance.
