deck_edw_ses_monitoring Flashcards

(32 cards)

1
Q

EDW (Enterprise Data Warehouse)

A

Central database that collects and stores data from multiple source systems across the company. At Likewize this is where the BI team aggregates CTR data, HITS data, Zendesk data, and other operational feeds into one place for reporting.

2
Q

DCNA2-P-BI-O1

A

Database server name for the Likewize EDW. This is the server you connect to in SSMS to browse all the reporting tables. Contains HITS 3, HITS 2, and external data views including AWS Connect data.

3
Q

SSMS (SQL Server Management Studio)

A

Free Microsoft application used to connect to SQL Server databases, browse tables, and run queries. Download it from Microsoft directly. You connect by entering the server name (DCNA2-P-BI-O1), then you can explore tables and write SQL against them.

4
Q

SQL (Structured Query Language)

A

Language used to interact with relational databases. Core syntax is SELECT (which columns you want), FROM (which table to look in), and WHERE (which conditions to filter by). Straightforward to learn; non-technical people pick it up quickly.
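
A minimal sketch of the SELECT / FROM / WHERE pattern, run against an in-memory SQLite database so it works without access to the EDW. The table `calls` and its columns are hypothetical stand-ins, not real EDW objects.

```python
import sqlite3

# In-memory stand-in database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (contact_id TEXT, queue TEXT, handle_seconds INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [("c-001", "Claims", 310), ("c-002", "Sales", 95), ("c-003", "Claims", 45)],
)

query = """
    SELECT contact_id, handle_seconds   -- which columns you want
    FROM calls                          -- which table to look in
    WHERE queue = 'Claims'              -- which rows to keep
    ORDER BY contact_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same three clauses carry over to SQL Server; only the connection step differs.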

5
Q

NVARCHAR

A

SQL Server data type that stores variable-length Unicode text. The EDW uses NVARCHAR for most columns because it makes ingestion easier: every value loads as text, so you don’t get type mismatch errors on import. The downside is that dates and numbers stored this way sort and filter as text (for example, '10' sorts before '9') unless you convert them to a real date or number type in the query.
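
A small illustration of the sorting problem, using SQLite TEXT columns as a stand-in for NVARCHAR. The table and column names are hypothetical. In SQL Server the equivalent fix would be a CAST or TRY_CAST in the ORDER BY or WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The column is TEXT, mimicking a number stored as NVARCHAR (hypothetical names).
conn.execute("CREATE TABLE feed (handle_seconds TEXT)")
conn.executemany("INSERT INTO feed VALUES (?)", [("9",), ("10",), ("200",)])

# Sorting the raw text compares character by character.
as_text = [r[0] for r in conn.execute(
    "SELECT handle_seconds FROM feed ORDER BY handle_seconds")]

# Casting to a number first gives the order a person would expect.
as_number = [r[0] for r in conn.execute(
    "SELECT handle_seconds FROM feed ORDER BY CAST(handle_seconds AS INTEGER)")]

print(as_text)
print(as_number)
```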

6
Q

Indexed Column

A

A database column that has been optimized for fast lookups and sorting. Think of it like the index at the back of a book: you look up the term and jump straight to the page. Without an index the database has to scan every single row to find what you’re looking for, which gets slow on large tables. In the EDW, initiation_timestamp is indexed so you can sort and filter by it quickly.
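
A sketch of the scan-versus-index difference using SQLite's EXPLAIN QUERY PLAN. SQLite is only a stand-in here; SQL Server exposes the same idea through its execution plans. The table `contacts` and the index name `idx_initiation` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (contact_id TEXT, initiation_timestamp TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [(f"c-{i}", f"2024-01-{i % 28 + 1:02d}T09:00:00") for i in range(1000)],
)

def plan(sql):
    # Column 3 of each EXPLAIN QUERY PLAN row is the human-readable step.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

lookup = "SELECT * FROM contacts WHERE initiation_timestamp = '2024-01-15T09:00:00'"

before = plan(lookup)   # no index yet: the planner has to scan the table
conn.execute("CREATE INDEX idx_initiation ON contacts (initiation_timestamp)")
after = plan(lookup)    # with the index: the planner searches via idx_initiation

print(before)
print(after)
```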

7
Q

AWS Connected Table

A

Table inside the Likewize EDW that is built from CTR data coming out of the Connect instance. The BI team already populates it with attributes like initial contact ID, timestamps, and call handling data. Also has columns for CSAT call survey questions. Does not have a row_start_date column which is unusual compared to other EDW tables.

8
Q

CTR (Contact Trace Record)

A

Record generated for every interaction in Amazon Connect containing full metadata about the contact. Includes agent name, queue, handle time, contact ID, timestamps, survey responses, and other attributes. CTR files flow to S3 and are also ingested into the EDW by the BI team.
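
A hedged sketch of pulling a few reporting fields out of one CTR. The sample record is made up, and the field names (`ContactId`, `Agent.Username`, `Queue.Name`, `Attributes`) follow the Amazon Connect contact record layout as generally documented; check them against a real CTR file. The attribute key `csat_q1` is hypothetical.

```python
import json

# Made-up sample record; verify field names against a real CTR file.
sample_ctr = json.loads("""
{
  "ContactId": "11111111-2222-3333-4444-555555555555",
  "InitiationTimestamp": "2024-01-15T09:00:00Z",
  "DisconnectTimestamp": "2024-01-15T09:06:30Z",
  "Agent": {"Username": "agent.one", "AgentInteractionDuration": 350},
  "Queue": {"Name": "Claims"},
  "Attributes": {"csat_q1": "5"}
}
""")

def flatten_ctr(ctr):
    """Pull a few reporting fields out of one CTR; missing parts become None."""
    agent = ctr.get("Agent") or {}       # may be absent if no agent connected
    queue = ctr.get("Queue") or {}
    attrs = ctr.get("Attributes") or {}
    return {
        "contact_id": ctr.get("ContactId"),
        "initiation_timestamp": ctr.get("InitiationTimestamp"),
        "agent": agent.get("Username"),
        "queue": queue.get("Name"),
        "interaction_seconds": agent.get("AgentInteractionDuration"),
        "csat_q1": attrs.get("csat_q1"),  # hypothetical survey attribute
    }

row = flatten_ctr(sample_ctr)
print(row)
```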

9
Q

Row Start Date

A

A timestamp column that normally exists on every EDW table indicating when the row was inserted. The AWS Connected table is missing this column, which makes it harder to find recent records. You have to sort by another column like initiation_timestamp instead.
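
A minimal sketch of finding the most recent records by sorting on initiation_timestamp, again using SQLite as a stand-in. The table name `aws_connect` and the sample rows are hypothetical. SQL Server would use SELECT TOP (2) rather than LIMIT 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for a table with no row_start_date column (hypothetical names).
conn.execute("CREATE TABLE aws_connect (contact_id TEXT, initiation_timestamp TEXT)")
conn.executemany(
    "INSERT INTO aws_connect VALUES (?, ?)",
    [
        ("c-001", "2024-01-14T08:00:00"),
        ("c-002", "2024-01-15T09:30:00"),
        ("c-003", "2024-01-15T07:45:00"),
    ],
)

# Newest first; ISO-formatted timestamps sort correctly even as text.
latest = conn.execute(
    "SELECT contact_id FROM aws_connect ORDER BY initiation_timestamp DESC LIMIT 2"
).fetchall()
print(latest)
```

This only works as text if every timestamp is stored in the same ISO-style format; otherwise convert to a date type first.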

10
Q

BI Team

A

Business Intelligence team at Likewize that reports into Ranga. They extract data from multiple sources including CTR logs and load it into the EDW for reporting. Swapnil is the key person who understands the data. They already consume all CTR files, so they may be able to add new fields like CSAT survey results without going through a full product requirement cycle.

11
Q

Swapnil

A

BI team member who understands everything about the EDW data. Can potentially add new CTR attributes to the database without formal product or requirements cycles. If you need a new field parsed out of CTR logs and added to the AWS Connected table, he is the person to talk to.

12
Q

Interim Reporting via Excel Workbook

A

Temporary reporting approach where you embed a SQL query inside an Excel workbook. The end user (like Jo) opens the workbook, right-clicks, and hits refresh to pull in the latest data from the database. DK built the supply planner reports that are used globally using this method. Works as a bridge until a proper dashboard UI is built.

13
Q

EDW External Data Views

A

Section of the EDW database containing views built from external data sources. This is where the AWS Connected table lives alongside data from other systems like Zendesk. There are more than 500 tables and views in the EDW in total.

14
Q

Jason’s US/Canada CSAT Solution

A

A separate CSAT survey implementation that Jason built for US and Canada. It does not use the DynamoDB-based approach that the UK solution uses, so survey data from Jason’s solution may appear differently in the CTR records. Important to know when looking at CSAT columns in the EDW because the data format depends on which solution captured it.

15
Q

Amazon SES (Simple Email Service)

A

AWS service for sending and receiving email. In the Connect context, you verify a domain on SES so it appears in the dropdown inside the Connect instance to enable email as a channel. Cognito also uses SES to send its one-time passcode (OTP) emails.

16
Q

SES Domain Verification

A

Process of proving you own a domain so SES allows you to send and receive email through it. Once verified, the domain shows up as a selectable option in the Connect instance settings. Required even if you are not using SES as your primary email gateway.
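
A hedged sketch of checking which domains SES reports as verified. It assumes a client shaped like `boto3.client("ses")` and uses its `get_identity_verification_attributes` call. `FakeSes` and the example.* domains are stand-ins so the sketch runs without AWS credentials.

```python
def unverified_domains(ses_client, domains):
    """Return the domains SES does not report as verified.

    ses_client is expected to behave like boto3.client("ses").
    """
    response = ses_client.get_identity_verification_attributes(Identities=list(domains))
    attributes = response.get("VerificationAttributes", {})
    return [
        d for d in domains
        if attributes.get(d, {}).get("VerificationStatus") != "Success"
    ]


class FakeSes:
    """Offline stand-in for the SES client (illustration only)."""

    def get_identity_verification_attributes(self, Identities):
        known = {
            "example.com": {"VerificationStatus": "Success"},
            "example.org": {"VerificationStatus": "Pending"},
        }
        return {"VerificationAttributes": {d: known[d] for d in Identities if d in known}}


pending = unverified_domains(FakeSes(), ["example.com", "example.org", "example.net"])
print(pending)
```

With real credentials you would pass an actual SES client for the region the Connect instance runs in.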

17
Q

Cognito

A

AWS identity management service that handles user registration, email verification, and one-time passcode delivery. At Likewize it is used on HITS portals and GTP portals for user authentication. Cognito sends all its OTP communications through SES, so if SES is misconfigured, OTP delivery can break.

18
Q

OTP (One-Time Passcode)

A

A temporary code sent to a user to verify their identity during login. Cognito generates the code, the user enters it, and Cognito checks whether it matches. At Likewize, OTP is used on HITS and GTP portals. Cognito delivers OTPs through SES, so any SES misconfiguration directly affects whether users can log in.
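
A conceptual sketch of the generate / enter / check cycle. This is not how Cognito implements it internally; the function names, the 6-digit length, and the 5-minute expiry are assumptions for illustration.

```python
import hmac
import secrets
import time

def issue_otp(ttl_seconds=300, now=None):
    """Create a 6-digit code and the time after which it is no longer valid."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"   # cryptographically random digits
    return code, now + ttl_seconds

def check_otp(submitted, expected, expires_at, now=None):
    """Accept only an exact, unexpired match; compare in constant time."""
    now = time.time() if now is None else now
    return now <= expires_at and hmac.compare_digest(submitted, expected)

code, expires_at = issue_otp(now=1000.0)
print(check_otp(code, code, expires_at, now=1100.0))   # within the window
print(check_otp(code, code, expires_at, now=1301.0))   # after expiry
```

The delivery step (email through SES) sits between issuing and checking, which is why a broken SES setup blocks login even when code generation works.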

19
Q

Cognito Identity Pool

A

A Cognito resource that provides temporary AWS credentials to users so they can access AWS services directly. During the hotfix, Sarab mistakenly blamed the IVA team for manually creating a Cognito identity pool. The actual issue turned out to be related to an SES misconfiguration in the same region.

20
Q

SES Misconfiguration (Hotfix Discovery)

A

During a hotfix, DK found that the likewize.com domain already existed in certain SES instances across multiple AWS regions. This misconfiguration was contributing to Cognito OTP delivery failures. Showed that the security team did not fully understand how SES and Cognito interact within AWS.
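
A hedged sketch of how one might audit which regions already hold a domain as an SES identity. It assumes clients shaped like `boto3.client("ses", region_name=...)` and uses the `list_identities` call; pagination is ignored for brevity. `FakeSes`, the region data, and example.com are stand-ins so the sketch runs offline.

```python
def regions_with_domain(make_ses_client, regions, domain):
    """List the regions whose SES domain identities include the given domain.

    make_ses_client(region) should return something shaped like
    boto3.client("ses", region_name=region). Pagination is not handled.
    """
    found = []
    for region in regions:
        response = make_ses_client(region).list_identities(IdentityType="Domain")
        if domain in response.get("Identities", []):
            found.append(region)
    return found


class FakeSes:
    """Offline stand-in for a regional SES client (illustration only)."""

    def __init__(self, identities):
        self._identities = identities

    def list_identities(self, IdentityType):
        return {"Identities": list(self._identities)}


fake_regions = {
    "ap-southeast-2": [],
    "eu-west-1": ["example.com"],
    "us-east-1": ["example.com", "example.org"],
}
hits = regions_with_domain(
    lambda region: FakeSes(fake_regions[region]), sorted(fake_regions), "example.com"
)
print(hits)
```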

21
Q

GTP Portal

A

One of Likewize’s web portals that uses Cognito and OTP for user authentication. Along with HITS portals, it depends on SES being correctly configured in order for one-time passcodes to be delivered to users.

22
Q

SolarWinds

A

Database monitoring tool used by Likewize to watch production database performance. Tracks query execution and resource usage, and can alert when databases are degrading. DK logged in during an incident and found that monitoring on the production primary database was offline and that no alert had been generated.

23
Q

AppDynamics

A

Application performance monitoring tool used to watch HITS application servers. Shows error rates, response times, and transaction health. During the IVA timeout incident, AppDynamics showed a big spike in errors on the EMEA HITS servers which confirmed the problem was on the HITS side, not AWS.

24
Q

DBA (Database Administrator)

A

Person responsible for managing, tuning, and monitoring databases. At Likewize, DBAs are part of a shared service team. DK told them to monitor databases post-release and tune any problematic queries, but they don’t do it proactively. They wait to be told exactly what to do.

25
Q

Query Tuning

A

Process of optimizing a database query so it runs faster and uses fewer resources. When a query is inefficient it can slow down or crash the entire database. In the EMEA incident, there was a query running that needed DBA attention and tuning, but nobody was watching for it.

26
Q

Production Database Monitoring Offline

A

DK logged into SolarWinds and found the monitoring agent on the production primary EMEA database was offline. No alert was generated to notify anyone that monitoring had stopped. This meant database issues could happen without any automated warning.

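
One way to close this gap is a "watch the watcher" check that flags any monitoring agent that has gone quiet. The sketch below assumes you can obtain a last-heartbeat time per agent from somewhere; how to get that out of SolarWinds is not covered here, and the agent names and 5-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def stale_agents(last_seen, now, max_silence=timedelta(minutes=5)):
    """Return names of agents whose last heartbeat is older than max_silence."""
    return sorted(name for name, seen in last_seen.items() if now - seen > max_silence)

# Hypothetical heartbeat data for two monitoring agents.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "emea-primary": now - timedelta(minutes=42),
    "us-primary": now - timedelta(minutes=1),
}

silent = stale_agents(last_seen, now)
print(silent)
```

Anything returned by the check would then feed an alert, so that monitoring going offline is itself reported.
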
27
Q

Shared Service Team Culture Issue

A

The COE shared services team operates on a model where they provide people, but those people need to be told explicitly what to do. They don't operate independently, monitor proactively, or take initiative. This is a cultural problem that causes production issues to go undetected until someone outside the team notices.

28
Q

US Database Crash (Dish Incident)

A

Earlier in the week, a BI job got hung up and took down the US production database server. This affected an API that Dish calls. The issue self-recovered after about 15 minutes. Dish monitors Likewize APIs externally, so they would have escalated if it hadn't recovered. SolarWinds monitoring was also offline during this event.

29
Q

Dish API Monitoring

A

Dish Networks monitors Likewize's APIs externally as part of their integration. If the API degrades or goes down, Dish sends an escalation email and expects Likewize to join an emergency bridge call. This creates external pressure to keep production databases stable.

30
Q

APAC Database Issues

A

When DK logged into SolarWinds to check all regions during the EMEA incident, APAC databases were showing active problems. US was okay. This highlights that database health varies by region and no single team is watching all of them consistently.

31
Q

Performance Testing Gap

A

Likewize does not do any performance testing before pushing releases to production. Production is effectively the performance testing environment. This means you don't know your system's limitations until something breaks in front of real customers. Makes it impossible to say whether an issue is normal load or a capacity problem.

32
Q

Distribution List Restructuring

A

The COE restructured all internal distribution lists without telling anyone. Some old distros still work, some don't. Makes it hard to contact the right team during incidents. DK had to hunt for the correct DBA distribution list to report the EMEA query issue.