Decoding a Data Model - Using SchemaSpy in Snowflake

Python SQL Database Data

In following article, we will do on the implementation journey of SchemaPy, leveraging Docker for deployment, and employing a Snowflake database.

Romina Mendez https://example.com/norajones
2023-12-22

In this article, we will delve into a comprehensive exploration of the intricacies of data modeling, spanning from its conceptual inception to its logical definition and eventual physical implementation. Understanding the life cycle of a data model is crucial for efficiently designing and managing databases.

Furthermore, in real-world scenarios, existing databases often necessitate reverse engineering to unveil and comprehend their underlying structures.

In following sections, we will do on the implementation journey of SchemaPy, leveraging Docker for deployment, and employing a Snowflake database.


Data Model

Data modeling is a fundamental task that provides us with a clear understanding of data and the most efficient way to store it. This approach involves representing entities and relationships based on business processes, facilitating documentation and the efficient exploration of data. The ability to generate these documents and understand how data is stored is essential knowledge for any data team. In this context, we will delve into the significance and process of data modeling, emphasizing how this practice becomes a valuable tool for the effective management and exploitation of information.


Conceptual Data Model (CDM)

A conceptual data model (CDM) operates at a high level and offers an organization’s data needs. It defines a broad and simplified view of the data and the purpose of conceptual data modeling is to establish a shared understanding of the business by capturing the essential concepts of a business process. The focus is on abstracting and representing key entities, relationships, and their interdependencies, fostering a common understanding among stakeholders about the fundamental aspects of the business and its data requirements.

Below is an example of a CDM


Logical Data Model (LDM)

A logical data model (LDM) extends the conceptual data model by providing a complete definition of relationships, incorporating details and the structure of essential entities. In summary the LDM encompasses specific attributes for each entity and the relationships between entities. Below is an example of a LDM


Physical data model

A Physical Data Model (PDM) outlines how the model will be translated into an actual database. The PDM incorporates all necessary physical specifications essential for constructing a database, providing a comprehensive guide for database construction, including tables, columns, data types, indexes, constraints, and other implementation details tailored to the chosen database management system.


❄️Snowflake SchemaSpy 🐳 Docker Compose

In this following example, we will implement of SchemaSpy through a Docker image in a Snowflake database. It’s important to note that the implementation can be extended to other databases such as: mysql, PostgreSQL, Oracle, SQL Server, DB2, H2, HSQLDB, SQLite, Firebird, Redshift and Informix.

🚀 Benefits of SchemaSpy


Prerequisites

Before you begin, ensure that you have the following installed:


Usage

📁 Clone this repository

source:r0mymendez/schemaspy-snowflake
git clone https://github.com/r0mymendez/schemaspy-snowflake.git
cd schemaspy-snowflake

❄️ Snowflake Configuration

At this stage, you need to configure the configuration file located at the following path: config/snowflake.properties. This configuration will be used to establish the connection to Snowflake, so it is necessary to modify the account, role, warehouse, schema, and database settings.

schemaspy.t=snowflake
schemaspy.u=<user>
schemaspy.p=<password>
schemaspy.account=<account>
schemaspy.role=<role
schemaspy.warehouse=<warehouse>
schemaspy.db=<database>
schemaspy.s=<schema>

❄️ Snowflake Account

Below is an example of an account URL. However, you only need to use a portion of it in the configuration:

Image description

🐙 Build and run the Docker Compose environment

This command will build the Docker image and start the container.

docker-compose -f docker-compose-snowflake.yaml up

🔍 Schemapy

After executing Docker Compose, you’ll find a web site in the output folder. This page features multiple tabs, each offering distinct functionalities, and below, we will explain them.


Functionality Tabs


SchemaSpy UI

In the provided example, we showcase a demo utilizing a synthetic database called Synthea. Synthea is a state-of-the-art synthetic data generation tool designed for creating realistic, yet entirely fictitious, patient records and healthcare data. It enables the simulation of diverse medical scenarios, making it a valuable resource for testing and development purposes in the healthcare domain. Visit the following 👉link to access a demo👈.


📚 References

If you want to learn…

Other references:

- Image preview reference: [Imagen de vectorjuice en Freepik]

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Mendez (2023, Dec. 22). Romina Mendez: Decoding a Data Model - Using SchemaSpy in Snowflake. Retrieved from https://r0mymendez.github.io/posts_en/2023-12-22-decoding-a-data-model-using-schemaspy-in-snowflake/

BibTeX citation

@misc{mendez2023decoding,
  author = {Mendez, Romina},
  title = {Romina Mendez: Decoding a Data Model - Using SchemaSpy in Snowflake},
  url = {https://r0mymendez.github.io/posts_en/2023-12-22-decoding-a-data-model-using-schemaspy-in-snowflake/},
  year = {2023}
}