Vague Purpose
The purpose of a data model needs to be clear because it determines the level of detail, scope, and completeness of the database. Without a clear purpose, designers may include irrelevant information or omit crucial data. For instance, instead of saying “create a database about students,” specify “create a student database for the registration office.” This clarifies what should be included (classification and address) versus what shouldn’t (hobbies or personal interests). This can be avoided by conducting thorough requirements gathering and confirming the specific business purpose with stakeholders
Literal modeling
Literal modeling occurs when the database designer implements exactly what the customer requests without analyzing the underlying needs. Since customers typically lack database design expertise, their requests may reflect how they currently work rather than optimal database structure. For instance, if a customer asks for “a table showing each class with its list of students,” a literal approach would create one problematic table, while a better design creates separate Student and Course entities linked through an enrollment relationship. This can be avoided by analyzing the business requirements deeply, identifying the true entities and relationships, and educating customers about better alternatives.
Large size
A large database is one with over 200 entities/tables, which typically indicates unnecessary complexity, poor design, or inadequate scope definition. Such databases become difficult to understand, maintain, and modify, and they require excessive development time and resources. This can be avoided by simplifying the model through proper normalization, eliminating redundant entities, clearly defining scope boundaries, and potentially breaking very large systems into separate, integrated databases.
Speculative content
Speculative content refers to attributes, entities, or relationships included in anticipation of future needs rather than current requirements. While some forward-thinking is reasonable, excessive speculation bloats the database, increases development time, and often proves incorrect about future needs. This can be avoided by focusing primarily on current, confirmed requirements while including only well-justified future considerations, and designing the database to be flexible and easily extensible when actual needs arise.
Lack of clarity
Lack of clarity occurs when entity names, attribute names, or data values are ambiguous, cryptic, or unclear. For example, using “N” instead of “Name,” or codes like “1” and “2” instead of descriptive values like “Full-Time” and “Part-Time” creates confusion. This makes the database difficult for users to understand and query, and can lead to errors in data entry and interpretation. This can be avoided by using descriptive, self-explanatory names; creating data dictionaries; establishing clear naming conventions; and storing meaningful values rather than cryptic codes (or documenting codes thoroughly if they must be used).
Needless redundancy
Needless redundancy occurs when the same information is stored in multiple tables without a valid reason (such as performance optimization). For instance, storing a student’s classification in both the Student table and the Enrollment table creates redundancy. This wastes storage space and, more critically, creates update anomalies where data can become inconsistent if changed in one location but not others. This can be avoided by following normalization principles (particularly 2NF and 3NF), storing each fact in only one place, and using relationships and joins to retrieve related data when needed.
Anonymous fields
Anonymous fields are attributes with generic or non-descriptive names that don’t indicate their purpose or content. Examples include “Address1/Address2” (which could mean street/city or line1/line2), or “Field1/Field2” which provides no information at all. This obscures the meaning of data and makes the model difficult to use and maintain. This can be avoided by using specific, meaningful names like “StreetAddress” and “City,” establishing and following clear naming conventions, and avoiding generic placeholders in the final design.
Symmetric relationships
A symmetric relationship exists when two entities relate to each other identically with no directional distinction. For instance, if Course A and Course B are co-requisites (must be taken together), neither is truly the “prerequisite” of the other, creating ambiguity. Storing this as two rows (A→B and B→A) causes redundancy and update anomalies. This can be avoided by modeling the relationship as a separate entity (e.g., a Co-Requisite Group entity that links to both courses), which eliminates ambiguity, reduces redundancy, and allows for additional relationship attributes like effective dates.
Weak entity
A weak entity is an entity that cannot be uniquely identified by its own attributes alone and depends on another (owner) entity for identification. For example, a Dependent cannot be uniquely identified without knowing which Employee they belong to, since multiple employees might have dependents with the same name.
Relationship
A relationship represents an association or connection between two or more entities. For example, an Employee “works for” a Department, or a Student “enrolls in” a Course.
Identifying relationship
An identifying relationship is a special relationship that connects a weak entity to its owner entity and helps uniquely identify the weak entity. For example, the relationship between Employee and Dependent is identifying because a Dependent’s identity depends on being associated with a specific Employee (not the other way around - Employee identifies Dependent, not Dependent identifies Employee).
Attribute
An attribute is a property that describes an entity. For example, Name, SSN, and BirthDate are attributes of the Employee entity.
Key attribute
A key attribute (or set of attributes) uniquely identifies each instance of an entity. For example, SSN uniquely identifies each Employee, and StudentID uniquely identifies each Student.
Multivalued attribute
A multivalued attribute can hold multiple values for a single entity instance. For example, an Employee might have multiple phone numbers (work, home, mobile), or a Student might have multiple email addresses (personal, school)
Composite attribute
A composite attribute is an attribute that can be divided into smaller sub-attributes with independent meaning. For example, Address can be broken down into Street, City, State, and ZipCode; or Name can be divided into FirstName, MiddleName, and LastName
Derived attribute
A derived attribute is one whose value can be calculated or derived from other attributes in the database rather than being stored directly. For example, Age can be calculated from BirthDate, or TotalPrice can be calculated from Quantity × UnitPrice.
Total participation in a relationship
Total participation (also called mandatory participation) occurs when every instance of an entity must participate in at least one relationship instance with another entity. For example, if every Employee must work for a Department, then Employee has total participation in the “works for” relationship.
Entity
An entity is a distinct object or concept that you want to store information about in a database. This can be a person (Student, Employee), place (Department), object (Project), or concept (Course).