Behind the Scenes: The Phases of “The King’s Pantry” Dataset

Phase 1: The Vision

When I started The King’s Pantry, I wanted it to be more than just another themed dataset. My goal was to build something that mirrors real ERP and retail systems, while still being fun and creative. Every decision, from the product hierarchy to tax logic, came from things I’ve actually seen as an analyst in retail and distribution.

So while this project lives in a fictional Westerosi universe, it’s still rooted in business reality — complete with purchase costs, margins, vendors, customers, and sales channels that behave like a true mid-market grocery retailer.

Phase 2: Schema Design Decisions

I started with an ERP-inspired hybrid schema, structured like a star for analytics (for my PowerBI tutorial) but modeled like a snowflake behind the scenes (for SQL practice).

It’s the same foundation used in real retail and distribution systems — designed to evolve and scale just like one.

The focus for Launch 1 was on external stakeholders — the data that drives customer experience, product management, and vendor relationships.

💡 Real-world note:
In many companies, analysts rarely build the fact tables themselves, those are usually prepared by a data engineering or business intelligence team as part of the data warehouse. At least that was my experience so far in my career.

But in smaller organizations (or legacy environments), you often don’t have that luxury. You’re pulling granular transactional data straight from systems like SSMS, aggregating it manually in Excel through Power Query, or reshaping it in Power BI before analysis.

Call me old school, but I actually prefer the latter because it gives you more freedom — and it keeps you closer to the raw story in the data.

The challenge with pre-built fact tables is that they’re often fully aggregated, and the engineering team may keep the backend logic private behind RLS or security measures — so you can’t see how the metrics are actually calculated.

The biggest problem I’ve personally had with this were all the debugging moments I’d have when making DAX measures. If a KPI wasn’t calculating correctly and I was using a calculated field or column where I couldn’t see the actual formula (typically coded in SQL), I’d literally pull my hair out because it meant I’d have to confirm the backend with the data engineering team.

That’s why this dataset is designed to simulate both sides — the structured back end (for those practicing SQL joins and normalization) and the simplified reporting layer (for dashboard modeling and storytelling).

For Launch 1, I focused on the external stakeholders of the realm — the data that drives customer experience, product management, and vendor relationships.

🧺 Product

540 SKUs across 8 major categories, including 40 new “Apothecary & Household Essentials.”
Added item_taxonomy_1 and item_taxonomy_2 to model product hierarchies used in analytics tools like Power BI.
Included boolean flags (is_active_flag, is_stock_item, is_discontinued_flag) so analysts can simulate filtering and segmentation logic in SQL.
Introduced private_label_tier and the private brands dimension (The King’s Pantry Select, Smallfolk Essentials, and Crown Reserve) to mirror how real retailers manage exclusive product lines. Think of Sam’s Club ‘Member’s Mark’ items or Costco’s ‘Kirkland Signature’.
Integrated vendor_id and brand_id for relational integrity, modeling the link between purchasing and merchandising systems.
Embedded freight cost within landed_cost, reflecting real-world pricing where freight is often factored into total COGS.

🏛️ Vendor

Represents all supplier entities, including wholesale producers, guild distributors, and artisan makers.
Added preferred_vendor_flag to enable conditional join exercises and performance reporting.
Designed to link one-to-many with dimProduct for realistic vendor-to-product mapping.

🧍 Customer

Unified B2C (consumer) and B2B (business) customer logic into a single dimension (reflecting how many ERP systems track both under one master table).
Added first_name / last_name for individual consumers, while allowing nulls for business accounts.
Added business attributes such as tax_id, payment_terms, and credit_limit to represent wholesale clients.
Introduced a parent–child hierarchy (parent_customer_id) for roll-ups, allowing analysts to practice self-joins (e.g., aggregating sales by parent guild).
Localized addresses into Westerosi equivalents for immersive storytelling while maintaining analytic structure:
- manor_or_quarter (Address Line 1)
- holdfast_name (Address Line 2)
- town_or_keep (City)
- realm (State/Region)

🧾 Sales_order

Header-level table containing one record per sales transaction.
Uses channel-coded order IDs for realism and practice with pattern-based joins:
- RP-###### for Raven Prime (online)
- MS-###### for Market Stall (in-store)
- GS-###### for Guild Supply (wholesale)
- CC-###### for Court Catering (institutional)
Attributes include:
- sales_order_id, customer_id, order_date, channel_code, subtotal, tax_amount, discount_amount, total_amount.
Designed to teach time intelligence, discount calculations, and channel-based segmentation.

📦 Sales_order_details

Line-level table containing individual product sales.
Includes sales_order_id (FK), product_id, quantity, unit_price, discount, and calculated line_total.
Ties back to dimProduct for category insights and dimVendor for supplier contribution analysis.
Structured to enable exercises on joins, aggregations, and performance metrics such as margin by product, vendor, or category.

🛍️ sales_channel_lookuP

Lookup table for human-readable translation of sales channels.
Codes: CodeChannel NameReal-World EquivalentRPRaven PrimeeCommerceMSMarket StallIn-store / POSGSGuild SupplyB2B / WholesaleCCCourt CateringInstitutional / Catering
Created to model data normalization and to demonstrate how dimension lookups support clean reporting layers in BI tools.

Together, these tables form a scalable base that mirrors a real ERP data warehouse — ready for expansion into Phase 2 (internal stakeholders) with future additions such as:

purchase_orders, purchase_order_details, buyers, and buylines.

🛍️ brand

The brand table brings together both private labels and vendor-supplied brands — a key part of modeling how real grocery and retail businesses differentiate their assortments.

Private Labels:
These are the in-house lines owned by The King’s Pantry itself. Each one serves a different tier of customers, from value to luxury.
- Smallfolk Essentials → everyday affordable staples (Value Tier)
- The King’s Pantry Select → premium curated goods (Premium Tier)
- Crown Reserve → luxury imports and vintages (Luxury Tier)
Vendors:
External suppliers and trade guilds from across the realm — from House Tyrell Provisions (grains and produce from The Reach) to Iron Isles Fisheries (smoked herring and seafood). These represent the merchant partnerships that give the dataset realism and variety.

The brand dimension links directly to the product table via brand_id, which lets analysts:

Compare private label vs. national brand performance.
Analyze vendor contribution by region.
Build margin and assortment reports (e.g., private label share by category).

By separating brand logic from vendor and product attributes, this structure mirrors how real ERP and category management systems track brand hierarchy and ownership.

Fun fact: This addition came from a real-world insight when I did some research — many analysts report that over 25–40% of grocery assortment belongs to private label. Modeling this helps simulate the balance between in-house innovation and vendor partnerships.

This architecture allows The King’s Pantry to evolve naturally into a complete business system: from procurement to customer sale. All in all, as I look back at these tables and see how I created it, it subtly showcases my experience as a pricing analyst, supply chain analyst, and data analyst (which is pretty awesome to see).

Phase 3: Product Strategy & Scale

Initially, I planned for 10 items per category (similar to my Harry Potter dataset) but to make sales insights more realistic, I scaled to ~540 SKUs with weighted category distributions that reflect how grocery assortments work in the real world:

I also created three private label tiers to simulate real-life pricing and margin structures:

Smallfolk Essentials (Value) – Lower price, lower margin, everyday goods
The King’s Pantry Select (Premium) – Balanced price and margin
Crown Reserve (Luxury) – High margin, low volume

Phase 4: Pricing, Margin & Costing

To teach financial analytics concepts, each product includes:

Purchase cost
Landed cost (includes freight markup)
Base price
Margin percentage

In real-world ERP systems, freight is often embedded in landed cost, so we replicated that. This teaches analysts how to work with true cost of goods sold (COGS) data and margin analysis in SQL or Power BI.

Phase 5: Transactional Layer

With the help of my AI Wizard (shout out to you ChatGPT 🫶), we generated 10,000 sales orders and ~50,000 order lines using:

Weighted category probabilities per channel (e.g., more beverages online, more produce in-store).
Sales channels that bring the world to life and mirror real operations:
- 🦅 Raven Prime – Online/eCommerce
- 🏪 Market Stall – In-store / POS
- ⚙️ Guild Supply – B2B wholesale
- 🍽️ Court Catering – Institutional / event clients

Each order ID was given a prefix code (RP-, MS-, GS-, CC-) to simulate how multiple systems generate records differently — just like in real ERP environments where analysts must decode naming conventions.

We also introduced minimum order thresholds for B2B channels, reflecting wholesale minimums (e.g., 300 gold dragons per order).

Phase 6: Documentation & Learning Design

As a reflection tool for myself and a resource for those who may use the dataset, I included a /docs folder with explainers for:

Category weighting logic
Discount and tax rules
Customer segmentation
Channel mapping

This gives learners an inside look into how analytical datasets are actually built and maintained — the “why” behind every number.

Learning Opportunities

Even naming conventions tell a story.

“RP-001043” doesn’t just mean an order — it means an eCommerce transaction with a digital payment, possibly from a loyalty shopper.

By working with this dataset, analysts can learn:

How star schemas are modeled in analytics projects.
How to trace data lineage from customer to product to sale.
How to clean, filter, and join real-world imperfect data.
Why documentation and metadata design matter as much as visuals.

I know this might seem like an area that’s easy to overlook, but I made sure to include proper naming conventions to show analysts just how critical this is to a healthy database.

Out in the real world (believe it or not), there are fully functioning businesses with god-awful naming conventions — or no process at all.

What does that mean?
It means garbage data eventually piles up in the system, especially if no one’s maintaining it.

And you know what garbage data does?
It makes things harder for analysts — like when you have five versions of the same product, all still active, all with different names, and each with its own pricing and sales history.

So what do we analysts hate? Let’s all say it together: GARBAGE DATA.

That’s when you find yourself in full-on “WTF” mode — retracing logs, comparing timestamps, and trying to figure out which item actually belongs in your report.

Sorry, I had some traumatic flashbacks and my coffee is just kicking in 😂

Alas, tangent done.

Closing Thoughts

I started The King’s Pantry because I wanted to challenge myself to see if I could create a fully functioning dataset from scratch that feels real. Somewhere along the way, it turned into a teaching tool — not just for others, but for me too.

Building this dataset reminded me that data modeling is both art and architecture — creative, detailed, and deeply human.

I also found myself reflecting on the sheer complexity of building even a simulated dataset. Between balancing relationships, crafting believable hierarchies, and maintaining business logic across every table, it gave me a whole new level of respect for the datasets we take for granted — like AdventureWorks or Northwind.

Those projects were built by full teams; The King’s Pantry was built by one analyst — and, well, a very helpful AI wizard 🧙‍♂️. It reminded me how much thought, iteration, and testing goes into bringing a fictional world to life with data.

I may not have the resources of a Microsoft dev team, but what I do have is the curiosity and creativity to simulate the process, piece by piece. And honestly, I’m proud of that. And it’s so much fun (sorry not sorry for being a nerd)!

one final reflection thought, i promise

If I’ve learned anything from this project, it’s that data modeling can be as personal as it is technical. Every column tells a story, every join reflects a relationship — and somewhere between the tables and constraints, I found pieces of my own journey stitched quietly into the schema.

Until next time,

Ardonna •ᴗ•

author’s note

Hi, I’m Ardonna Cardines — a data analyst and creator of Mercury Musings, where business meets imagination through data. I love blending analytics, design, and storytelling to make learning data modeling and visualization approachable — one creative dataset at a time.

If you’ve enjoyed this post, I’d love for you to follow along.
You can subscribe to my blog for new tips + tutorials, creative datasets, and behind-the-scenes projects — or connect with me on LinkedIn where I share updates and learning resources for analysts and data storytellers.