Introduction to Data Modeling — 2024 Guide With Problems
What is Data Modeling?
Data modeling is the process of creating the conceptual representation of data and its relationship within an organisation.
Why Data Modeling is crucial?
Clarity: It helps in clearly defining the data and its relationships, reducing ambiguity.
Communication: It serves as a communication tool between business stakeholders, analysts, and IT professionals.
Consistency: It ensures consistency in how data is used and represented across different parts of an organization.
Efficiency: Well-designed data models can lead to more efficient database structures and better performance.
Level of Data Modeling?
Typically, data modeling requires several layers of abstraction, including:
- Conceptual level: The conceptual level involves establishing the high-level entities and relationships in the data model, which often happens through the use of diagrams or other visual representations.
- Logical level: At the logical level, the relationships and constraints between the data items are defined in greater detail, sometimes using data modeling languages such as SQL or ER diagrams.
- Physical level: At the physical level, the specifics of the data storage strategy — such as data types, indexes, and other technical details — are defined.
Conceptual Data Model
Definition: Organized view of database concepts and their relationships without detailing the actual database structure.
Purpose: Establish entities, attributes, and relationships.
Basic Tenets:
- Entity: Real-world thing.
- Attribute: Characteristics or properties of an entity.
- Relationship: Dependency or association between two entities.
Characteristics:
- Organisation-wide coverage of business concepts.
- Developed for a business audience.
- Independent of hardware and software specifications.
- Focus on representing data as users perceive it.
- Establishes a common vocabulary.
Logical Data Model:
Definition: Defines the structure of data elements and relationships, adding details to the conceptual model.
Purpose: Forms the basis for the Physical model.
Characteristics:
- Describes data needs for a single project, with potential integration.
- Developed independently from the DBMS.
- Specifies data attributes with datatypes, precisions, and lengths.
- Applies normalization processes (typically till 3NF).
Physical Data Model:
Definition: Describes a database-specific implementation of the data model, aiding in schema generation.
Purpose: Offers database abstraction, visualizes database structure, and guides implementation.
Characteristics:
- Describes data needs for a single project or application, with potential integration.
- Specifies relationships between tables, addressing cardinality and nullability.
- Developed for a specific version of a DBMS, location, data storage, or technology.
- Defines exact datatypes, lengths, default values for columns.
- Specifies Primary and Foreign keys, views, indexes, access profiles, authorizations, etc.
Problem Set
Problem 1
A recent MBA graduate went to work in his family’s business. As his first job, he was placed in charge of the shoe department at one of the larger stores owned by the family. He immediately found that the information about sales and inventory, which was manually recorded, was inaccurate and out of date. He decided to build a small database to improve the quality of information available within the department. Using the following description, construct an ER diagram that provides an accurate overview of the department’s data.
Shoes are identified by a unique product code. Each shoe style/size/color combination is considered unique. For example, a pair of women’s size 6 black Espirit shoes has product code 56756–6. Size 7 of the same style/color has a different product code. For each shoe (style/size/color), the department stocks several identical pairs. Thus, it must keep track of the quantity of each specific shoe in stock. Shoes are supplied by a variety of suppliers. However, a specific shoe (style/size/color) will be purchased from a single supplier. Suppliers are identified by a unique supplier code. It is not uncommon for a particular supplier to supply many different types of shoes to the store.
There is a need to keep track of customer sales regardless of the type of transaction (cash, credit, etc.). A unique customer number identifies each customer. Additional information, such as name, address, and phone number is also stored for each customer. Different customers have different buying habits. Some customers purchase only a single pair of shoes. Others purchase many different pairs while still others purchase several pair of the identical shoe. All of these types of purchases must be recorded. Additional information about a purchase includes the date of purchase and the quantity purchased.
A unique employee number identifies each salesperson. Each sale is credited to a single salesperson. This information is necessary because part of the salesperson’s compensation is based on a sales commission and the degree to which the salesperson met their sales quota.
Solution
Problem 2
Using the following description, construct an ER diagram. Information needs to be stored about books. Its ISBN (International Standard Book Number) uniquely identifies each book. Other information about a book includes its title and publication date. In addition to book information, there is also information stored about the book’s publisher. This includes a unique publisher identifier, publisher name, and publisher address. A single publisher can only publish a book. Information on the authors of a book is also stored. This information includes the author’s social security number, name, and address.
Either a single author or several authors can write any single book. When the book is printed, it is sent to a printer. Information about the printer includes a unique printer identifier, printer name, and address. A contract is written that indicates the number of books the printer will print and the printing deadline the printer needs to meet. At times, a single book might be contracted to several printers if the quantity required to be printed exceeds the printer’s production capacity.
Solution:
Feel free to leave a comment if you’re interested in additional challenges!
Follow me for more such blogs and also clap if you like it.
Please reach out via Linkedin or Github in case of any questions!
Subscribe to my free newsletter.