<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Pedram's Data Based]]></title><description><![CDATA[Data Based dives deep on the latest in data so you don't have to. It's data content written for data practitioners and those who have the pleasure of working with them.]]></description><link>https://databased.pedramnavid.com</link><image><url>https://substackcdn.com/image/fetch/$s_!Gq30!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2af60e2c-8ad1-48ec-ad43-345f51acbdb3_1280x1280.png</url><title>Pedram&apos;s Data Based</title><link>https://databased.pedramnavid.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 17 Mar 2026 05:33:37 GMT</lastBuildDate><atom:link href="https://databased.pedramnavid.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Pedram Navid]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[pedram@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[pedram@substack.com]]></itunes:email><itunes:name><![CDATA[Pedram Navid]]></itunes:name></itunes:owner><itunes:author><![CDATA[Pedram Navid]]></itunes:author><googleplay:owner><![CDATA[pedram@substack.com]]></googleplay:owner><googleplay:email><![CDATA[pedram@substack.com]]></googleplay:email><googleplay:author><![CDATA[Pedram Navid]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Reflections on 2 Years Running Developer Relations]]></title><description><![CDATA[What is DevRel and do I need one?]]></description><link>https://databased.pedramnavid.com/p/reflections-on-2-years-running-developer</link><guid 
isPermaLink="false">https://databased.pedramnavid.com/p/reflections-on-2-years-running-developer</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Sat, 04 Oct 2025 01:40:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Sjaj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sjaj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sjaj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png 424w, https://substackcdn.com/image/fetch/$s_!Sjaj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png 848w, https://substackcdn.com/image/fetch/$s_!Sjaj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!Sjaj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Sjaj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png" width="1024" height="1536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1536,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Sjaj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png 424w, https://substackcdn.com/image/fetch/$s_!Sjaj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png 848w, https://substackcdn.com/image/fetch/$s_!Sjaj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!Sjaj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F937bc7b0-dfed-4ca7-ad10-eb6f20714156_1024x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>My last day at Dagster has wrapped up, and after two years of building and running developer marketing, I thought I&#8217;d take the time to answer some questions I often get about DevRel and developer marketing. </p><p>The timing feels right to share what I&#8217;ve learned. </p><p>It seems like DevRel has had a resurgence lately. I&#8217;ve seen more discussions on it on X, and many more job postings and recruiter calls trying to fill these roles. I sense that it&#8217;s related to the challenges of breaking through the noise within the latest AI/LLM craze that has been building over the past few years. 
</p><p>Google Search/SEO is dead; the cost of producing mediocre automated content is essentially zero, and so the question on many people&#8217;s minds is: how do we reach developers? </p><p>The answer often defaults to &#8220;hire DevRel,&#8221; and I think people often have not thought about it more deeply than that. </p><p>This post is a reflection on many things I&#8217;ve learned running DevRel, and hopefully it can provide a framework and some guidance on answering questions I hear often.</p><h2>DevRel at Dagster</h2><p>When I first joined Dagster, it was to run the DevRel team, and one of the first things I realized was that no one really knew what the goal of the DevRel team was. Everyone had an opinion on what DevRel should do at Dagster, but no one agreed on a definition. </p><p>The result was a lot of little tasks being completed, but without an overall story or narrative for them to fit under. This inevitably led to people wondering what the point of DevRel even was. </p><p>This is not unique to Dagster at all. I&#8217;ve heard it from many people: both founders and developer advocates have suffered some version of this story.</p><p>Over the next two years, I helped transform the team into one of the best teams I&#8217;ve ever had the pleasure of working with. You&#8217;d be hard-pressed to find a single person at Dagster who wouldn&#8217;t sing the praises of the DevRel team, whether they were producing engaging content that brought new leads in, working directly with the community, bringing product feedback back to the R&amp;D team, helping reduce friction points through excellent documentation and educational materials, or just being hilarious online.</p><p>One of the first things I did was have an open-ended discussion at the leadership level about what everyone believed the role of DevRel was at Dagster. 
The goal was to let the room do the talking and make it apparent to everyone that we didn&#8217;t have alignment on the purpose of the team. I wrote a big list on a whiteboard of everything the team was doing and worked with the leadership team to identify the most impactful pieces of work on that list. </p><p>The items fit nicely in my framework for building DevRel teams, which I call the three pillars of DevRel: awareness, product/community, and education. </p><p>With alignment on what mattered, I set out to build a team and put in place the processes and routines to make sure we could execute at a high pace. </p><p>The transformation of the DevRel team was so visible that I was later asked to run the marketing team as well. This turned out to be a pivotal moment. <strong>The more I worked on marketing, the more I realized the fundamental truth of DevRel: that DevRel is just marketing.</strong></p><p>I was not convinced that combining the DevRel and marketing teams would work, but once we did it, there was a real (dare I say?) synergy between the teams. In hindsight, it was absolutely the right decision. </p><p>Pairing a strong DevRel team that understands the product and the consumer with domain experts in marketing who can run campaigns, events, and other marketing functions is the missing glue that can take DevRel from a nice-to-have to a core revenue-generating part of the business. </p><p>With that out of the way, let&#8217;s get into it.</p><h2>What is Developer Relations?</h2><p>DevRel, or developer relations, goes by many names: developer advocates, developer marketing, evangelists, digital prophets, and so on. </p><p>I don&#8217;t think there&#8217;s universal agreement on what it all means, but here&#8217;s my general framework:</p><p><strong>Developer Advocates:</strong> the people you hire who advocate on behalf of your product. 
Also sometimes called a DevRel Engineer.</p><p><strong>Developer Relations: </strong>the overall practice of engaging with your developers along their developer journey. This could be a team, a department, or just an ephemeral idea within the organization. Developer Relations, to me, is a subset of developer marketing. </p><p>To understand what Developer Relations actually encompasses, it helps to visualize the full developer journey. I particularly like James Parton&#8217;s map from his <a href="https://www.devrel.agency/book">book</a> that describes this journey. To me, Developer Relations covers the full range of the journey from discovery to scaling.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V96C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V96C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png 424w, https://substackcdn.com/image/fetch/$s_!V96C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png 848w, https://substackcdn.com/image/fetch/$s_!V96C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png 1272w, https://substackcdn.com/image/fetch/$s_!V96C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V96C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png" width="700" height="583" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:583,&quot;width&quot;:700,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!V96C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png 424w, https://substackcdn.com/image/fetch/$s_!V96C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png 848w, https://substackcdn.com/image/fetch/$s_!V96C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png 1272w, https://substackcdn.com/image/fetch/$s_!V96C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf4f0c3f-6a0c-4eae-8d68-9b7ebf1f9c3a_700x583.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Developer Marketing: </strong>a marketing function that is focused on marketing to developers. Some people believe this is just marketing, but I think developer marketing requires a unique set of skills that differentiates it from &#8216;traditional marketing&#8217;. While Developer Relations is a large part of developer marketing, many associated functions within marketing fall outside Developer Relations: marketing ops, event logistics, partner programs, marketing campaigns, and so on.</p><p>The controversial part of this definition is that it presupposes that DevRel is, in essence, marketing. I believe the sooner you accept that, the easier it is for everyone. 
</p><p>In some ways, the name &#8220;DevRel&#8221; is marketing a marketing role to engineers who would never consider a marketing job. </p><h2>The Three Pillars of DevRel</h2><p>When I think about building DevRel teams, I think in terms of three pillars and how to ensure that each is well served. The pillars are awareness, product/community, and education. There are other models as well: <a href="https://www.swyx.io/measuring-devrel#what-kind-of-devrel-are-you">swyx, for example, has three</a>. </p><p>The pillars are not individuals. You might find one hire does well in some but not others, and you might choose to ignore some to focus on another. I do think the pillars have a natural order of importance. If no one is aware of your product, then education isn&#8217;t going to be much of a concern. You need awareness before education can have meaningful impact.</p><h3>Awareness</h3><p>This is generally the first reason people hire developer advocates. They have a product for developers, and they want more people to know about it. Awareness, when done well, looks like engaging content created for developers about things they care about. Awareness, when done poorly, looks like astroturfed Reddit comments that get you banned from /r/somecategory.</p><p>I could write a whole book on awareness, but awareness for DevRel is not that different from general marketing awareness. The reason to hire DevRel for this is that you want people from the community who understand the product and the audience and can find engaging ways to interact with them. </p><p>The goals of awareness are fairly straightforward: increase the number of people who know what your product is, and once they know of you, help them become aware of what your product does.</p><p>Don&#8217;t become preoccupied with a single channel. You need multi-channel content that reaches developers where they are. You must first understand your audience and who they actually are. What do they do for fun? 
What problems do they have at work? Where do they look up information? Marketing and Positioning 101 stuff.</p><p>At Dagster, our audience was on Reddit, X, and LinkedIn. They were also at conferences and meetups. So that&#8217;s where we showed up. </p><p>Don&#8217;t make the mistake of thinking you can create awareness by talking about how great you are. No one cares. Show up in ways that show how helpful you are. Create content no one else is creating. Everyone is creating the easy stuff. &#8220;What is a database?&#8221; Who cares. &#8220;I spent 10 hours testing the top three database vendors, here&#8217;s my deep dive&#8221;. Hell yes. More of that, please.</p><p>Create content that goes ahead of your product. What do people care about before they learn about what you do? At Dagster, data engineers decide to buy Dagster only after they&#8217;ve decided to build a data platform. So we created an ebook about <a href="https://dagster.io/how-to-build-data-platforms-ebook">building data platforms.</a> It became one of our top-performing assets. Why was it successful? Because it wasn&#8217;t easy to do.</p><p>We printed it and brought it to conferences, and it flew off the shelves. Colton, who co-authored the book, even did a book signing at one of our conferences. You would not believe the lineup.</p><p>You should almost always start with an awareness-focused hire, and they should be spending half their time showing people how they can solve problems with your product, and the other half creating content about problems people face before using your product.</p><h3>Product &amp; Community</h3><p>I group product and community because they are so intertwined. Product is all about contributing to the development of the product, whether through deep and thoughtful feedback, or through actual code or integrations. 
Community is all about staying close to your users, whether free or paid, to maintain a pulse on what the broader community vibes are, and feeding this information regularly back to both R&amp;D and leadership.</p><p>This kind of impactful work, giving frequent and in-depth feedback on the product, means your developer advocates need to understand the product and be given enough time to use it in their day-to-day job. </p><p>A great exercise for a new hire is to have them run through onboarding and create a friction log of everywhere they got stuck. Chances are, your engineers have forgotten how difficult it is to use the product, given how intimately familiar they are with the code. Odds are they never need to pull up the docs to understand what feature X is or how to do Y. </p><p>This is why it&#8217;s so critical to hire DevRel from the community. I&#8217;ll talk more about finding DevRel later on in this post.</p><p>But product is not just about feedback; great DevRel hires go deeper. They contribute to the product itself. Especially with AI assistance now, the ability of developer advocates to work like an R&amp;D team means added capacity to make the product better. One of the highest-impact things our team did at Dagster was create integrations that were requested by the community: you can see our impact on things like the <a href="https://docs.dagster.io/api/libraries/dagster-sling">dagster-sling library</a> or <a href="https://docs.dagster.io/integrations/libraries/airlift">Airlift</a>. </p><p>Airlift is a particularly great example. I built a throwaway proof-of-concept for peering into an Airflow instance from Dagster, and the POC was so compelling that an engineer took it and made it into the powerful integration it is today. </p><p>Beyond integrations, DevRel teams can also create additional tooling that helps solve common user pain points. 
This is often work that an engineering team can&#8217;t justify, but can be a great way for a developer advocate to stay current on their technical skills, while also contributing back to the product and community. A win-win.</p><h3>Education</h3><p>Education is the final pillar, and it requires a specific mindset and persona that is hard to find but life-changing when you do. Often, this person might be a technical writer, or maybe they create online educational courses. In our case, we have both. </p><p>I cannot over-emphasize the importance of good documentation. It is how developers learn about what your product actually does. We developers are trained to skip the marketing headlines and go straight to the docs to see what this product actually looks like, and what it can do. Anytime I go to a product&#8217;s website and view the docs and see mostly unfinished content, placeholders, and boilerplate, I just know they&#8217;re bleeding traffic. </p><p>Your engineers are not good at writing docs. They don&#8217;t want to do it, and they don&#8217;t know how. They will write docs if you tell them to, but they do not think about the overall information architecture that makes a complex product easier to understand. </p><p>Hire a good technical writer. </p><p>If your product is complex, documentation may not be enough. You may also need structured online courses, cookbooks, how-to guides, code examples and snippets, tutorials, example projects, and more. Keeping all of these up-to-date and deprecating where possible is a full-time job. Hire for it.</p><h2>Should I hire a Developer Advocate?</h2><p>I&#8217;m so glad you asked. Most don&#8217;t. You should start by asking why you want to hire one. 
It&#8217;s the same argument Claire Carroll makes in her post <a href="https://clrcrl.com/2021/03/03/how-to-build-a-community-why.html">&#8220;How to build a community&#8221;</a>, so just read that and replace community with DevRel and you&#8217;ll get the same experience.</p><p>You should really know why you are hiring for this role, what you want them to accomplish, and how you will enable them to be successful. That is your job.</p><p>I&#8217;ve seen way too many cases of someone hiring a junior person to &#8216;do devrel&#8217;; they get very little guidance on what that means, and before you know it, the person is let go, and DevRel is deemed a failure.</p><p>Here are some great questions to ask yourself first:</p><ul><li><p>Is my product ready for a DevRel hire? Do I already have some product-market fit? </p></li><li><p>Will I hire a senior person or a junior person? If I hire a senior person, will I give them clear guidance on what I want them to accomplish? If I hire a junior person, am I prepared to meet with them regularly to help guide them on what they should be working on? </p></li><li><p>What part of the business needs the most help? Do we need more people to know about our product? Or do we have a lot of complaints about how complicated our product is to use?</p></li><li><p>Am I willing to invest in this for the long term? Or am I going to hire one person to do everything, burn them out, and wonder why they quit after six months?</p></li></ul><h2>Where do I hire DevRel?</h2><p>Assuming you decided you want to hire one, how do you go about finding someone to hire? </p><p>The number one piece of advice I give here is to avoid looking for career DevRel people. It is so much better to hire people in your actual developer community who have been doing the actual work of developers, but who show signs of promising DevRel talent. They may share their projects online, they might have created great blog posts, they may just be really funny on X. 
It&#8217;s so much better to pull from the community than to look for someone with 5 years of DevRel experience.</p><p>I say this as someone with several years of DevRel experience, too. Some red flags to look for:</p><ul><li><p>Their public persona is mostly about themselves, and not the product or problem.</p></li><li><p>They can&#8217;t code their way out of a cardboard box.</p></li><li><p>They don&#8217;t come from the same domain as your product. Don&#8217;t hire someone who&#8217;s been doing front-end DevRel for your security product. </p></li></ul><p>Some green flags to look for:</p><ul><li><p>A few blog posts that go deep on a topic</p></li><li><p>A history of talks, presentations, and other public speaking roles</p></li><li><p>A GitHub profile with actual projects that pass the sniff test, not just random forks</p></li><li><p>Named Pedram</p></li></ul><p>Where do you find these magical DevRel people? Don&#8217;t go looking in &#8220;DevRel&#8221; communities, or lists of Top 100 DevRel of 2025. Look in your community. Find users of your product or users of your competitor&#8217;s product. Find a blog post someone wrote that you really like. </p><p>How do you convince them to join? Reach out to them, personally. Tell them who you are and how you found them, and ask whether they&#8217;d consider working for you. My best hires came from directly reaching out to candidates with promising potential. </p><h2>What does success look like? Why does DevRel get a bad rap? Why is it so hard?</h2><p>These are all related questions, and I think they boil down to a common theme. It is really hard to measure DevRel. In fact, before combining the DevRel and Marketing teams, the only good measurement I had was vibes. </p><p>There were all sorts of metrics on productivity, and I even maintained a Notion database of everything we did, from blog posts to webinars to YouTube videos. But I try not to focus too much on engagement metrics. 
100 highly qualified readers of a post can be worth more than 10,000 low-quality engagement-bait readers.</p><p>However, once we brought marketing and DevRel together under one roof, it did become easier to measure the effectiveness of the team through marketing campaigns. </p><p>Not everything DevRel does falls under a marketing campaign, but many things do: webinars bring in leads that drive pipeline, social posts drive awareness and engagement that help us build lead lists, and new integrations bring new users who previously weren&#8217;t qualified. </p><p>It&#8217;s still not a perfect measure. I don&#8217;t think you can measure the impact of your docs on the pipeline, for instance. </p><p>I think the measurement questions are largely a red herring. I think DevRel teams fall apart not because they can&#8217;t measure their success through metrics, but because they haven&#8217;t defined the goals of the team up front. </p><p>If DevRel fits into the broader GTM strategy, and there&#8217;s clear alignment on how DevRel supports the overall company&#8217;s goals, then the only question is whether or not the team is executing well. You don&#8217;t measure engineers by lines of code written, but by how much the CEO loves the engineering org. I think the same can be said of DevRel. Does everyone at the company love working with them? Could they imagine not having them around? </p><p>A good DevRel team feels indispensable. A bad DevRel team fights over attribution of leads on a blog post. The problem is usually not the metrics, but something deeper.</p><h2>Proving the Value of the Investment</h2><p><a href="https://x.com/mehd_io/status/1974030358183088520">Mehdi</a> asks: <em>What I realized is that devrel takes time to build. Even if you have the right people. It&#8217;s hard to get short-term/mid-term results. 
How do you convince higher-ups that it&#8217;s worth the investment?</em></p><p>I think of this the same way I think of joining a new company or running a marketing team. You have to start with quick wins to build trust, and only then will you earn the right to work on things that take time to build.</p><p>The good news is that it&#8217;s not that hard to get quick wins within DevRel. Things as simple as hosting a biweekly or monthly webinar, writing a newsletter and getting a few hundred subscribers, or creating a 3-part YouTube series on the product are all short-term, achievable tasks that can demonstrate your ability to perform.</p><p>Once you&#8217;ve built that trust, it&#8217;s about selling the story of your big idea. I often talk about how, in the age of AI, anyone can produce something easily. Undifferentiated content might as well be no content. </p><p>Developers want deeply technical, well-produced, engaging content. If your goal is to bring in a new audience, you need big bets to hit that next step function of success. Incremental improvements alone won&#8217;t get you there.</p><p>My advice is to plan out a quarterly roadmap, reserve 35-65% of the time for a big-bet project, and use the rest for smaller quick wins throughout the quarter. Get alignment up front, and try to address concerns ahead of time. Three weeks in, when the exec team forgets, remind them that you&#8217;re cooking something that&#8217;s going to be great. 
Find a way to funnel new requests so that the people making them feel heard without expecting every request to be prioritized ahead of what&#8217;s in flight.</p>]]></content:encoded></item><item><title><![CDATA[The Cost of Costing Nothing]]></title><description><![CDATA[The emotional labor of reading your Chat]]></description><link>https://databased.pedramnavid.com/p/the-cost-of-costing-nothing</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/the-cost-of-costing-nothing</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Sun, 03 Aug 2025 17:19:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-4uk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you missed it, a few days ago someone shared ChatGPT logs that were made public on the internet. While OpenAI was quick to fix it by removing these from Google&#8217;s index, the chats remain archived by the Wayback Machine. 
Many were upset by this, but what really upsets me is that people are sharing ChatGPT conversations at all.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-4uk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-4uk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-4uk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-4uk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-4uk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-4uk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg" width="832" height="710" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:710,&quot;width&quot;:832,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-4uk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-4uk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-4uk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-4uk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e96fd19-ad5b-471e-b9fd-2139a210b3ce_832x710.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>When GPT-3.5 came out, I remember everyone would send anyone who would listen a poem, a rap, or some other trite garbage written by a clanker. While that novelty wore off quickly, what hasn&#8217;t changed is that people are still sharing much more mundane chats with each other.</p><p>I am sure we have all been victims of the emotional burden of being sent someone else&#8217;s research, analysis, discussion, or meandering thought. Every time I get one of these, I can immediately feel my blood boil, and I&#8217;ve narrowed down the cause.</p><p><strong>The cost of thinking has been offloaded to the consumer, not the producer.</strong> </p><p>Let&#8217;s say you had some project at work. It used to be that you would do your analysis and research, capture your thoughts, synthesize information, and then send that work out for feedback. 
But more and more, people are simply throwing the problem and data into a model, and sending the model&#8217;s results (either via a link, or just raw copy and paste) to others. The implicit idea is that it should be the recipient&#8217;s job to synthesize the information.</p><p>It&#8217;s the offloading of emotional labor to the person who is receiving information, not the person producing it. And with AI, the cost of doing more things has fallen to essentially zero. It&#8217;s my belief that things having some cost is actually a feature of a system, not a bug. It&#8217;s easy to come up with ideas; it&#8217;s doing something about them that is actually hard. </p><p><strong>Everyone wants a piece of the pie. Nobody wants to bake.</strong></p><p>People have been primed to be rewarded for accomplishing what feels like the task: getting the analysis done. But the real hard work, the work where value is actually created, was never done. </p><h2>The smartest intern</h2><p>In many ways, the very nature of how we build and train AI systems is to create the world&#8217;s smartest intern. There is no experience in these models; their training data is the entirety of the written text available to them. But when we think of why we hire experienced engineers, product managers, and staff, it&#8217;s because we know they&#8217;ve learned through experience what could not be learned in books. </p><p>What we are moving toward, though, is a new playing field where people rely not on their experience and judgement, but on the average next token, to drive decisions. I think in this environment, more than ever, the value of good taste will rise. It takes a certain annoying type of person, one with good taste and the confidence to assert it, to push back against AI-slop-dominated work. The forces are often against them, as AI-produced work feels fast, feels important, feels useful. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rBVf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rBVf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rBVf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rBVf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rBVf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rBVf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg" width="1279" height="719" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:719,&quot;width&quot;:1279,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!rBVf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rBVf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rBVf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rBVf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9474c25f-d593-4237-a3e5-b85addc6c9ee_1279x719.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is not to say AI isn&#8217;t a useful assistant in day-to-day knowledge work. I use it all the time for brainstorming, outlining, reviewing, and essentially as a much better search engine. </p><p>But what I don&#8217;t do is take AI output and present it to others as if I had something worth sharing. </p><p>Even I, the vibe-coding lover who was one-shotting with Claude, find myself frequently deleting more code and writing it myself. I created applications that almost work but are impossible to debug or understand, largely because the way they were structured doesn&#8217;t make sense to me, even if it does make sense to a clanker. 
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B-Al!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B-Al!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg 424w, https://substackcdn.com/image/fetch/$s_!B-Al!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg 848w, https://substackcdn.com/image/fetch/$s_!B-Al!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!B-Al!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B-Al!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg" width="444" height="144" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:144,&quot;width&quot;:444,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!B-Al!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg 424w, https://substackcdn.com/image/fetch/$s_!B-Al!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg 848w, https://substackcdn.com/image/fetch/$s_!B-Al!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!B-Al!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b66742-80de-4ccd-bfd9-d66119d6e051_444x144.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Who knows, I could be wrong. Maybe the future is offloading all our thinking to the robots, and I&#8217;m the one that is going to be left behind. 
</p>]]></content:encoded></item><item><title><![CDATA[All I want in AI is some context and a chat window]]></title><description><![CDATA[Why curated context is the missing piece for enterprise AI adoption.]]></description><link>https://databased.pedramnavid.com/p/all-i-want-in-ai-is-some-context</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/all-i-want-in-ai-is-some-context</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Thu, 22 May 2025 16:40:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2af60e2c-8ad1-48ec-ad43-345f51acbdb3_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My personal AI journey has mirrored the Gartner hype cycle we all know and love. While I started much of my AI work through the lens of coding and software engineering, thinking about how life has changed forever, what I&#8217;ve found more useful in my day-to-day leading marketing is other, simpler use cases for AI. </p><p>I know some of you believe that AI will come for your jobs and replace you, and while I don&#8217;t share that view, I can see how, if you extrapolate the progress we&#8217;ve made over the past twenty years, it seems inescapable. </p><p>I am not great at that type of long-term vision, but I can see what might be around the corner in the short term. I <a href="https://databased.pedramnavid.com/p/we-need-to-talk-about-dbt">begged for dbt</a>, but better, a few years back, and then dbt acquired SDF. I wrote about <a href="https://databased.pedramnavid.com/p/the-rise-of-the-data-platform-engineer">the rise of Data Platform Engineering</a> and how it requires a framework approach to enable teams, and then the team at Dagster shipped <a href="https://dagster.io/blog/accelerate-data-pipeline-development-with-dagster-components">components</a>. 
</p><p>My barometer for the future is loosely based on what I need today that I don&#8217;t see great solutions for, rather than on extrapolating to a world fundamentally different from what we have today. In that vein, I&#8217;m going to achieve progress through more complaints.</p><h3>It&#8217;s the interface, stupid</h3><p>First, I&#8217;ve seen firsthand the potential of LLMs to fill in the gaps of lower-priority work that never seems to get done. Work that often required a little technical and domain expertise, but not a lot. Work that was just annoying enough to defer until later, not important enough to prioritize strategically, but still valuable in its own right.</p><p>One of the most-used things I&#8217;ve built is exceedingly simple: a custom GPT, built with ChatGPT&#8217;s GPTs feature, that helps our sellers self-serve collateral for their sales calls. I used to think the GPTs feature was a hokey thing with no real value, but I&#8217;ve realized that its superpower is its extremely simple and easy-to-use interface. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ldqA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ldqA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png 424w, https://substackcdn.com/image/fetch/$s_!ldqA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png 848w, https://substackcdn.com/image/fetch/$s_!ldqA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png 1272w, https://substackcdn.com/image/fetch/$s_!ldqA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ldqA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png" width="1456" height="975" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:975,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:209319,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://databased.pedramnavid.com/i/164133136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ldqA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png 424w, https://substackcdn.com/image/fetch/$s_!ldqA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png 848w, https://substackcdn.com/image/fetch/$s_!ldqA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png 1272w, https://substackcdn.com/image/fetch/$s_!ldqA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fb0baf8-caf0-474b-b918-8ec11e39e923_1824x1222.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>A chat interface in a web app remains the easiest way for less technical people to interact with AI tooling. Even Slack, which seems like a great way to chat, does not make a great interface for LLMs. The responses are too long, and a side panel like ChatGPT&#8217;s Canvas or Claude&#8217;s Artifacts is a necessity. 
Of course, you can build your own chatbot, or use Slack, or a terminal, or API calls, or a sidebar in your code editor, or create agents with Aider, or a million other possibilities, but ChatGPT&#8217;s interface remains the simplest way to work with an LLM.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SD56!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SD56!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png 424w, https://substackcdn.com/image/fetch/$s_!SD56!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png 848w, https://substackcdn.com/image/fetch/$s_!SD56!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png 1272w, https://substackcdn.com/image/fetch/$s_!SD56!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SD56!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png" width="504" height="546.9371196754564" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1070,&quot;width&quot;:986,&quot;resizeWidth&quot;:504,&quot;bytes&quot;:209836,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://databased.pedramnavid.com/i/164133136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SD56!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png 424w, https://substackcdn.com/image/fetch/$s_!SD56!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png 848w, https://substackcdn.com/image/fetch/$s_!SD56!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png 1272w, https://substackcdn.com/image/fetch/$s_!SD56!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae853af0-3e71-4af5-a26c-a2a9f8058120_986x1070.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Even Slack, which is excellent for chat, doesn&#8217;t make for a great LLM interface today.</figcaption></figure></div><h3>Context is king</h3><p>The interface is half the equation. The other half is providing relevant, curated context. ChatGPT&#8217;s GPTs feature fails miserably here, and very few tools I&#8217;ve used so far provide a good way to manage this.</p><p>For LLM adoption to really take root in organizations, people like me will need to build tooling for others. I&#8217;ve found that I need control over the context I feed into these tools for there to be any chance of success.</p><p>One of the first things I built was our AskAI Slack bot, which is powered by Scout. 
It&#8217;s been a powerful tool for helping our community self-serve by feeding an otherwise useless LLM relevant context that completely changes how it operates.</p><p>Asking a plain LLM a question about Dagster inevitably leads to wrong answers, hallucinations, and deprecated APIs. Feeding that same LLM our docs, GitHub Issues, and Discussions completely transforms it into a valid and sometimes correct assistant. </p><p>In this specific case, the context problem is fairly straightforward: scrape the website daily, run a daily job that fetches issues and discussions updated in the last day, and update the document store in Scout. Feed the 10 most relevant documents as context to the LLM and call it a day.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Ou0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Ou0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png 424w, https://substackcdn.com/image/fetch/$s_!4Ou0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png 848w, https://substackcdn.com/image/fetch/$s_!4Ou0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png 1272w, 
https://substackcdn.com/image/fetch/$s_!4Ou0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Ou0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png" width="552" height="379.5602094240838" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1146,&quot;resizeWidth&quot;:552,&quot;bytes&quot;:124955,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://databased.pedramnavid.com/i/164133136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Ou0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png 424w, https://substackcdn.com/image/fetch/$s_!4Ou0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png 848w, 
https://substackcdn.com/image/fetch/$s_!4Ou0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png 1272w, https://substackcdn.com/image/fetch/$s_!4Ou0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c686d7f-8e95-47dd-93d3-39865a714397_1146x788.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Where this model doesn&#8217;t work as well is the rest of the knowledge created at companies. 
I have a positioning guide in Notion that I regularly update, we have case studies in a Google Drive folder, our website has the latest information on pricing and features, and there&#8217;s a Google Slides presentation with our public-facing roadmap that someone once shared and that I can never find. And right next to all that helpful information are hordes of information I don&#8217;t want anything to do with.</p><p>I can upload PDFs to GPTs, but this means I need to constantly remember to update them as I update our documentation. Apps like Slack and Notion and Zoom seem to promise to solve this, but most everything-apps end up being everything-sucks-apps.</p><p>Curating this knowledge meaningfully, with tooling that lets me easily create LLM interfaces that everyone at a company can interact with, seems simple but remains unsolved.</p><p>The problem remains the N integrations that need to be built. I talked about documents and emails, but there&#8217;s also a repository of knowledge in systems of truth.</p><p>Imagine a sales rep wants to craft an email to a prospect who recently attended our event. I&#8217;d want to give that rep access to not only our differentiated messaging and positioning, but also context from Salesforce and HubSpot on the account, the campaigns and events, previous conversations, recent activity, and other internal knowledge. I might also want to run an enrichment against Common Room and Clay to better understand how this person interacted with us on GitHub, Slack, and LinkedIn, or to enrich their profile with relevant details from their company&#8217;s website or public filings.</p><p>Right now, I am stitching together a half-dozen pieces of tooling to make this work, and the interfaces I have are fairly poor. ChatGPT&#8217;s GPTs don&#8217;t work well with this model, so I am forced to use Slack bots and duct tape. 
I don&#8217;t want to build this from scratch, because every use case is slightly different.</p><p>I am sure that eventually, we&#8217;ll see AI workflow cloud platforms that bridge this gap. And when that happens, I&#8217;ll be sure to take credit for their success. </p><p> </p>]]></content:encoded></item><item><title><![CDATA[It's not enough to be right]]></title><description><![CDATA[Early in my career, I was rewarded for being right.]]></description><link>https://databased.pedramnavid.com/p/its-not-enough-to-be-right</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/its-not-enough-to-be-right</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Wed, 04 Dec 2024 18:14:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Pwc0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Early in my career, I was rewarded for being right. As a data scientist, I worked hard to understand some business processes, make predictions, model some outcomes, and help forecast some future states. As a data engineer, my ability to properly forecast future usage of a system and build systems that are resilient and scalable to that future state while being easy to maintain helped me grow in my career.</p><p>The better I was at being right, the better I was at my job. 
Many people get stuck here, and too often I&#8217;ve seen people frustrated that simply being right is not sufficient to propel organizational change.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pwc0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pwc0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Pwc0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Pwc0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Pwc0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pwc0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg" width="413" height="557.0961538461538" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1964,&quot;width&quot;:1456,&quot;resizeWidth&quot;:413,&quot;bytes&quot;:1476712,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Pwc0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Pwc0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Pwc0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Pwc0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd442407b-b8cc-4a44-8aad-cb80f945fd9c_2807x3786.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Adding images to a post is an important part of being effective</figcaption></figure></div><p>As you progress in your role, you will be responsible for a greater impact on a business with ever-increasing complexity. So today, I want to explore all the other aspects of your career that are important to a business.</p><p>First, note how I speak about career progression. Often, we talk about career progression in a vacuum: &#8220;What do I have to do to get promoted?&#8221; What I find helpful, instead, is to frame your goals alongside the goals of the organization. Instead of asking how to get the next promotion, ask yourself, &#8220;Why would an organization promote me?&#8221;</p><p>For one, promotion can be a means of retaining top talent. 
More often, promotion and growth are how an organization recognizes your ability to have a greater impact on the business, so you earn a title and salary commensurate with that increased impact.</p><p>While being right does have some marginal impact on a business (it is important that forecasting models be accurate and that the systems that are built be resilient), these are not the primary goals of an organization.</p><p>As organizations grow in complexity, what the organization needs isn&#8217;t that people be right. If you hire well, many people may be right about the same thing. The hard problem of a business is effecting change, and the bigger the organization, the harder it is to effect that change.</p><p>This, being effective, is what separates a senior engineer from a lead or staff. It is the reason that being a lead, a staff, or a manager is hard. It is hard to effect change in organizations. You are fighting competing interests and priorities, varying levels of optimism and engagement, difficult personalities and entitlements. Even something as fundamental as communication is hard and only getting harder with remote and hybrid work.</p><p>Too often I&#8217;ve seen many people who all agree that a particular system needs an upgrade. Being right about that is not a fundamentally useful feature for the system. Agreement about some fact does not result in a prioritization exercise; it does not communicate the value of that fact in terms of business outcomes: are we losing customers, are we not able to sign new customers, are we losing revenue, are we losing customer goodwill?</p><p>It takes a lot to effect change. First, you must understand the business objectives and how a particular change will help drive better business outcomes. Second, you must be able to navigate resourcing and prioritization. There are a million potential projects. What makes this the right one to do? 
Third, and most importantly, you must be accountable and own the change.</p><p>Ownership means many things to many people. I have seen a naive view that ownership means deciding who gets to do what within a particular domain. This is the least interesting version of ownership.</p><p>Actual ownership is a double-edged sword. I warn everyone who asks for ownership and accountability to be careful what they wish for. Ownership means being accountable for the success and failure of what you own. It means you set goals that are measured by metrics, delivered through milestones, and regularly reviewed and reported on.</p><p>Ownership does not necessarily mean you are writing the code or changing the system. You can own a project without deploying a single line of code. But ownership means refusing to allow the &#8216;that&#8217;s not mine&#8217; line of thinking to affect your responsibility to see things through. It means committing to a change and seeing it through.</p><p>If this sounds like a lot of work, that&#8217;s because it is. This is where I&#8217;ve seen frustration, especially from engineers, who have seen their ability to be correct about something as the thing that drives their success until it isn&#8217;t. If being correct were all that mattered, the world would be ruled by pedants. Instead, their frustration boils over into arguments about a workplace being &#8220;political,&#8221; which is the naive way of admitting you lack interpersonal skills.</p><p>You build political capital by being right, but you also build trust and influence. These are harder to measure, but it helps to think of who people want to work with: kind, empathetic people are far more fun to work with, especially if they are ruthlessly effective.</p><p>Trust is slowly won and quickly lost, but it can often be recovered. Earning it means delivering on your commitments. 
It means having good taste, which is the only way (other than good vibes) that I can describe the intangible quality that makes some people&#8217;s decisions better than others&#8217;. It is not something I have ever been able to teach, but I can always find it in those who have it.</p><p>If I were to summarize all of this, I would say that it doesn&#8217;t hurt to be right, but it is critical that you be effective.</p><p>Of course, this entire essay is predicated on the assumption that you are working at a somewhat functional organization. There are many dysfunctional organizations out there where the only thing that matters is saying yes to leadership, being right be damned. If you work there, you will need therapy more than my advice.</p>]]></content:encoded></item><item><title><![CDATA[The Rise of the Data Platform Engineer]]></title><description><![CDATA[In the late 2010s, when I was first advancing my career, the rise of the Data Scientist was everywhere.]]></description><link>https://databased.pedramnavid.com/p/the-rise-of-the-data-platform-engineer</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/the-rise-of-the-data-platform-engineer</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Thu, 27 Jun 2024 16:49:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KrPy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the late 2010s, when I was first advancing my career, the rise of the Data Scientist was everywhere. 
It was once the sexiest job of the 21st century, but like all inflationary things, the bubble popped and soon it was relegated from A Status Job to Yet Another Crummy Job (YACJ).</p><p>Soon, companies realized that a team of 20 Data Scientists couldn&#8217;t be effective without access to good data, and the role of the Data Engineer was brought to the forefront. Data Engineers would be responsible for the ingestion and transformation of data and the platform that enables data scientists, while the data scientists would become consumers of that data.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://databased.pedramnavid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Pedram's Data Based is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>While on paper this seemed like a great division of labor, engineers famously <a href="https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/">do not want to write ETL pipelines</a>. As far back as 2016, Jeff Magnusson at Stitch Fix suggested that engineers build platforms, services, and frameworks and not ETL pipelines.</p><p>This largely did not happen.</p><p>Back in 2016, Hadoop clusters were still the status quo. Spark and the JVM were the best we had. Scala was cool. 
What soon changed wasn&#8217;t that data scientists and data engineers ended up listening to Jeff, but that a new breed of software was born. Cloud Data Warehouses were just becoming a natural replacement for the existing data systems. Instead of requiring a team of dedicated infrastructure engineers to scale your data requirements, you just needed a dedicated credit card.</p><p>Instead of tasking data scientists with writing ETL pipelines, we gave that task to Fivetran, Stitch, and other cloud providers. The birth of the Modern Data Stack was just around the corner with Snowflake&#8217;s IPO in 2020.</p><p>Meanwhile, Data Scientists sat unhappy that they were using their PhDs to create dashboards. Consultants were picking up the slack until a little company called Fishtown Analytics open-sourced a tool they were using for transforming data in the warehouse. dbt was born and exploded in popularity, giving rise to the Analytics Engineer role. This role supplanted the Data Scientist, and soon the Data Scientists were freed from the chains of answering Yet Another Stakeholder Question (YASQ) and were able to move on to more important work, like <a href="https://machinelearningflashcards.com/">creating flashcards</a>, <a href="https://www.generalfolders.com/">founding startups</a>, and <a href="https://x.com/ylecun/status/1797270661192155427?lang=en">getting into fights on Twitter</a>.</p><p>Data Engineers, however, kept writing ETL pipelines. Sure, you could pay Fivetran to sync your Salesforce data, and maybe Stripe had a native Snowflake connector, but there was no escaping the long tail of data needs. Cost constraints meant that more and more companies were looking to bring some of the offloaded work back in-house. 
It was harder and harder to justify spending your pennies on every row that changed in a database.</p><p>As the dust settled, and interest rates rose, and VCs got bored of data and moved on to AI, we finally moved toward some sense of normalcy in data. Instead of hot takes, the data people continued to do the work it took to help make a business operate. We came to terms with the fact that Data Work is often just Blue Collar Work.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KrPy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KrPy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KrPy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KrPy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KrPy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KrPy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg" width="1456" height="968" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:968,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Farmer behind viral 'it ain't much, but it's honest work ...&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Farmer behind viral 'it ain't much, but it's honest work ..." title="Farmer behind viral 'it ain't much, but it's honest work ..." 
srcset="https://substackcdn.com/image/fetch/$s_!KrPy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KrPy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KrPy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KrPy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69a5a08c-a212-4bf1-bd66-8599b617b7e5_3008x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Actual photo of a data engineer</figcaption></figure></div><p>Instead of hiring 20 data scientists and asking them to &#8216;find insights&#8217;, we had smaller, more focused teams that worked toward delivering actual value to different lines of business. From building data models that made it easier to self-serve using modern BI tools, to creating recommendation models or predicting churn, the bread-and-butter stuff continued.</p><p>As teams have matured and the frenzy of SaaS has died down, we&#8217;ve started to return to the dilemma posed by Magnusson back in 2016. What should Data Engineers be working on?</p><h3>The Second Coming of the Data Platform Engineer</h3><p>I believe we&#8217;ve passed the trough of disillusionment and are entering the plateau of productivity. We&#8217;ve made a lot of progress in the last ten to fifteen years in data. The tooling is better than it has ever been, and it&#8217;s possible to do so much more with much less. DuckDB on a laptop is replacing MS Access on a corporate desktop. This is a <strong>good</strong> thing.</p><p>With that rise in productivity among data professionals of all kinds, from ML Engineers to Analytics Engineers, to Data Scientists and beyond, pressure is starting to build on Data Engineers.</p><p>There are two ways to react to that pressure. 
The easiest is to hire more data engineers to support your business, but we are fortunate that we live in a high-interest rate era.</p><p>High interest rates cure all ailments.</p><p>Instead, Data Engineers are coming back to the original sin of Data Engineering, building bespoke custom pipelines for your downstream consumers, and they&#8217;re solving it the same way we were trying to solve it 10 years ago: building platforms, frameworks, and services.</p><p>Part of the problem, I think, is the title Data Engineer simply beckons you to build pipelines. The next evolution of the role is more akin to a Data Platform Engineer.</p><p>This is someone who is tasked not with building ETL pipelines, but with making it possible for their various consumers to build any pipeline they need without having to resort to a complex higher language.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J7sZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J7sZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg 424w, https://substackcdn.com/image/fetch/$s_!J7sZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg 848w, https://substackcdn.com/image/fetch/$s_!J7sZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!J7sZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J7sZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg" width="1200" height="671" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:671,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!J7sZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg 424w, https://substackcdn.com/image/fetch/$s_!J7sZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg 848w, https://substackcdn.com/image/fetch/$s_!J7sZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!J7sZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba08047-1348-4024-9afa-a980a54ef16b_1200x671.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>How to do that well is still not a solved problem: whether it&#8217;s custom bespoke yaml-to-pipeline factories, or something more purpose-built remains to be seen. But what I am seeing is more and more companies starting to move toward a framework approach to data platforms. 
It&#8217;s the only way to scale the demands of a data platform without scaling up the number of data engineers supporting your analysts.</p><p>What I like the most about this is that it finally gives Data Engineers something to look forward to. Career progression for Data Engineers often felt like it was simply bigger data and more complex pipelines, but most Data Engineers I know prefer software engineering to data analysis, and pipeline building is by its very nature closer to data analysis than building software.</p><p>While building pipelines will never go away, being able to see some light at the end of the tunnel is sometimes all we need.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://databased.pedramnavid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Pedram's Data Based is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Hello, from the void.]]></title><description><![CDATA[a man, of letters]]></description><link>https://databased.pedramnavid.com/p/hello-from-the-void</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/hello-from-the-void</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Tue, 07 Nov 2023 06:25:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!E1Wa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hi friends, </p><p>It&#8217;s been a while. I&#8217;m afraid I haven&#8217;t had many thoughts to lead with, and so I&#8217;ve stopped writing for a few months, but I have missed you all. </p><p>A lot has happened since we last talked. I quit posting on the site formerly known as Twitter, although I occasionally open it to see if I&#8217;ve missed anything. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E1Wa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E1Wa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png 424w, https://substackcdn.com/image/fetch/$s_!E1Wa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png 848w, https://substackcdn.com/image/fetch/$s_!E1Wa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png 1272w, https://substackcdn.com/image/fetch/$s_!E1Wa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E1Wa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png" width="594" height="325" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:325,&quot;width&quot;:594,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E1Wa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png 424w, https://substackcdn.com/image/fetch/$s_!E1Wa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png 848w, https://substackcdn.com/image/fetch/$s_!E1Wa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png 1272w, https://substackcdn.com/image/fetch/$s_!E1Wa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89c3ec73-e3ea-4937-bf92-7ea713c4eaac_594x325.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In some ways I have: whether it&#8217;s Charlie Marsh talking about another <a href="https://x.com/charliermarsh/status/1721631786797523116?s=20">Ruff milestone</a>, or Simon Willison posting about <a href="https://x.com/simonw/status/1721599524576322018?s=20">prompt injections</a>, there&#8217;s lots of really interesting work I&#8217;m missing, but at the same time, I just don&#8217;t have the heart for misinformation and the muskification of my internet. For what it&#8217;s worth, I&#8217;m on <a href="https://www.threads.net/@pdrmnvd">Threads</a> posting to all 3 of you who read my content, and on <a href="https://www.linkedin.com/in/pedramnavid">LinkedIn</a>, trying more and more unhinged forms of posting to see what I can get away with.</p><p>But the real reason I left the site formerly known as Twitter is that it was a site that made me feel bad. I would scroll until I got upset, and then scroll some more, and then would feel worse after I was done. 
And once I stepped away from it, I realized that something that makes you feel bad all the time is probably bad for you. It&#8217;s good to listen to your feelings.</p><p>While something is lost, I&#8217;m hopeful that something new will emerge, and while we&#8217;ve tried three forms of recreating Twitter, all without success, maybe there will be some other place where we can all get together and share what we&#8217;re working on. </p><p>Until then, I&#8217;d like to ask you to do something for me. Send me an email with what you&#8217;re working on. I&#8217;d love to hear from all of you still out there. It can be about data, a project, a deal, a hobby, or even a home renovation project. Whatever it is, I&#8217;d like to hear about it, and let me know if you&#8217;re comfortable with me sharing it, because I&#8217;d like to start writing not just about what I work on, but on what you all do too.</p><p>While I no longer have the reach of millions of potential likes, retweets, quotes, and replies, I think we have something a little more intimate. I think there&#8217;s something to be said for writing slower, engaging more thoughtfully, and chasing a good connection over a hot take, so indulge me.</p><p>In that spirit, here&#8217;s what&#8217;s been going on in my life: </p><div><hr></div><h2>Orchestrators Everywhere</h2><p>If you haven&#8217;t been following me closely, you might have missed it, but I&#8217;ve also recently joined Dagster to do data things. It&#8217;s not every day you get to join a company that is building a tool purpose-built for you, and I consider myself really lucky to be part of such a highly talented group of people. </p><p>Orchestrators are funny things. At first, they seem relatively simple: a scheduler, a task runner, a webserver, and some glue. 
And while orchestrators like Airflow and Dagster at first glance seem a natural place for data pipelines to run, when you look closer you start to see orchestrators everywhere.</p><p>GitHub Actions, CircleCI, Airbyte, Meltano, Fivetran, and dbt Cloud are all operating as orchestrators too. I&#8217;d argue not by desire, but by necessity. </p><p>Consider a simple extract-load pipeline where you fetch data from a database and load it into a data warehouse. Airbyte, Meltano and Fivetran all offer this capability. But being able to extract data from one system and load it into another alone isn&#8217;t sufficient to build a product.</p><p>You also need to schedule that task, you need to be able to monitor it for failures, retry when it doesn&#8217;t succeed, and look at logs to understand what went wrong. You may even need to create a dependency, allow for configurations, and support different environments. Quickly what seems trivial becomes complex.</p><p>Part of what I&#8217;ve been thinking about and working on is what a world might look like if we didn&#8217;t have to reinvent orchestrators for every job we wanted to accomplish. What if we had simple tools for extracting and loading data, or for all the other data concerns we seem to have: data quality, anomaly detection, cataloging? </p><p>Some of that work has resulted in a simple little idea I call <code>dagster-embedded-elt</code>. That&#8217;s what I&#8217;ve been up to over the past little while. I&#8217;d love to hear what you&#8217;ve been doing.</p><div><hr></div><h2>Coalesce 2023</h2><p>Last year at Coalesce, someone, whose identity I will protect, made the claim that 2023 would be the last good year for that conference. I think what he was getting at is that dbt Labs would be forced to grow up and start running conferences like real companies do: customer stories, product showcases, proof of enterprise-readiness. 
</p><p>While in some ways, that was always the point of Coalesce, it has definitely become more true this year than any prior year. Attendance seemed lower, probably due more to the economy than anything else, but production value remained high, as did the bar for talks. Talking to practitioners, it seemed the general consensus was pretty positive, and so while we didn&#8217;t have the marching bands and parties and free rides of yesteryear, I didn&#8217;t get the impression that this year disappointed. </p><p>Personally, I was stuck in my hotel room for 3 days fighting a brutal bout of exhaustion, and must&#8217;ve slept 15 hours each day, so for me, this was the most beautiful of all conferences ever. Never had I slept so much, so peacefully, so quietly. If you can pull it off, I highly recommend going to a conference to sleep for 3 days straight.</p><p>Next year, Coalesce goes to Vegas, completing dbt Labs&#8217; transformation from small, scrappy startup to Enterprise-Ready (TM). </p><p>Did you go to Coalesce? If you didn&#8217;t sleep for three days straight, I&#8217;d love to know your thoughts.</p><div><hr></div><h2>Everything Else</h2><p>I&#8217;ve started therapy, once a week, for the past few months. I don&#8217;t know that I like it, or enjoy it, but I have to believe that it is good for me, given how much I am spending on it. I believe they call that cognitive dissonance in the biz. </p><p>After getting jealous of Taylor Murphy&#8217;s sim racing life, I decided I had to jump in. I got a wheel, pedals, and a stand. It&#8217;s equal parts fun and embarrassing. </p><p>I&#8217;ve been playing with LLMs non-stop, trying to better understand them. I&#8217;ve trained models on my desktop, built RAG pipelines using LlamaIndex, and am working on a support bot trained on GitHub Issues, Discussions, and Docs. </p><p>I am [&#8212;&#8212;] this close to trying NixOS. 
Something about it seems really interesting, but there&#8217;s also something really compelling about NOT spending three days setting up a new Linux environment. </p><p>I desperately need a new desk chair. This one, from West Elm, is so squeaky it is driving me insane. Please give me chair recommendations.</p><p>If you do any of the above, or anything else that&#8217;s fun, please write in and tell me how it&#8217;s going for you. </p><p>All the best,</p><p>Your Friend.</p><p>pedram </p>]]></content:encoded></item><item><title><![CDATA[What the hell is going on with data?]]></title><description><![CDATA[This used to be fun.]]></description><link>https://databased.pedramnavid.com/p/what-the-hell-is-going-on-with-data</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/what-the-hell-is-going-on-with-data</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Tue, 22 Aug 2023 10:34:37 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In case you missed it, it&#8217;s been a hell of a few weeks. I barely know where to begin. First, dbt did an altogether boring and uninteresting thing, which was to switch to consumption-based pricing. 
As you might expect with any pricing changes that cause the average cost of something to go up, consumers were upset.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="6048" height="4024" 
data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4024,&quot;width&quot;:6048,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a close up of a pair of orange video game controllers&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a close up of a pair of orange video game controllers" title="a close up of a pair of orange video game controllers" srcset="https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1615402020061-337a2a5a97ab?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMXx8bm9zdGFsZ2lhfGVufDB8fHx8MTY5MjcwMDM4OHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@girlwithredhat">Girl with red hat</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>The rollout was odd: no announcement, no email. Just a blog post quietly published that someone in the community saw, and posted in dbt&#8217;s Slack channel. A community that somehow, inconceivably, has 54,000 people. 
(If a Slack rep is looking for the deal of a lifetime, see if you can get dbt to start paying for that workspace, and drop me 1% of your commission.)</p><p>Even though this was only affecting new users, and existing users would have something like a year before this hit their pocketbooks, a minor revolt started. You know things are bad when the CRO has to step into a Slack thread. Somewhere in the madness, people realized that you could avoid materializing views, and so quietly, without notice, dbt dropped the number of models included in their base package from 20,000 to 15,000. The people revolt, the CRO tries to calm down the peasants and pitchforks, time is a circle.</p><p>People are tossing around back-of-the-napkin math. Costs are going up 300% for some. One person complained that they would have to pay a whopping $200 a month. Others would have been hit far worse. Smelling an opportunity, the blog posts and Twitter threads cropped up seemingly overnight.</p><ul><li><p>Top 10 Sexiest Ways to Deploy Dbt (sic)</p></li><li><p>How One GitHub Action Made Me a Senior Analytics Engineer</p></li><li><p>PROOF SQL IS THE ACTUAL DEVIL</p></li></ul><p>And so on. The knives were barely sharpened before they came out. </p><p>This isn&#8217;t a story about a botched pricing strategy, or one about the powerful lessons of price elasticity, though. </p><p>I want to talk about something else that&#8217;s happening: Data just isn&#8217;t fun anymore.</p><h2>This Used to be Fun</h2><p>There was a time, not too long ago, when the data community used to be really fun. There were many like-minded people, and I made friendships from that community. Real friendships. </p><p>People I meet in real life, who I text with on the phone, who refer me to their therapists and show me pics of their new dogs. Real people I truly love, and all through this weird thing known as the &#8216;data community&#8217;. 
I have group chats with the boys, I check up on my friends after a California Hurricane (some light rain), I message some less regularly but still with them always in my heart and in my head. </p><p>The community used to be fun. We had the same job: &#8220;something to do with data&#8221; and we used some of the same tools and we all suffered the exact same problems: data problems that were actually people problems.</p><p>There was even an exciting open-source library released by a small consultancy shop that was solving a real problem real people had. We bonded over the shared trauma that was looking at Hadoop and Spark error logs, trying to decipher Java stack traces, looking each other in the eye and saying &#8220;No. Never again. We will not forget.&#8221;</p><p>We&#8217;d make tweets and some of them would go viral, usually the worst ones, because the plebs have horrible taste. We knew the best tweets got 12 likes, and the worst ones got 12,000. We tried to pretend we didn&#8217;t laugh at puns, but deep down we did. Because we were happy.</p><p>Because this used to be fun.</p><p>But this isn&#8217;t fun anymore. Maybe it&#8217;s being harassed, both publicly and privately. Maybe it&#8217;s just the trouble with becoming a public figure, in that people you don&#8217;t know suddenly feel entitled to the same closeness you reserve for those you do know privately. Maybe it was the sudden infusion of billions of dollars in capital without due diligence that brought in so many people looking to make a quick buck on whatever the flavour of the day is.</p><p>But this used to be fun.</p><p>Now it&#8217;s knives out for a company that can&#8217;t figure out how to monetize something, after taking round after round of money without a clear path to a valuation that makes little sense once the drunkenness of ZIRP gives way to the ruinous sobriety of revenue streams and capital expenses. 
</p><p>Maybe it&#8217;s a narcissist with a weak ego that took over a platform that was once bad, but not <em>this type of bad.</em></p><p>It&#8217;s the death of dbt this, the death of the modern stack that, it&#8217;s gotchas and I-told-yous. It&#8217;s venomous toxicity, and the slow disappearance of people I used to know from a platform I used to love.</p><p>Did I mention, this used to be fun? A familiar feeling. Another platform I once loved was also destroyed. </p><p>I used to write a lot in a little place called LiveJournal. There too I made friends, and we wrote for ourselves as much as for each other, while reading what others wrote for us and leaving comments. A social network before it had a name. A platform that brought people together, until eventually it drove us all apart. If it wasn&#8217;t the Russians buying the platform, it was the harassment from the same people who would later become neo-nazis.</p><p>This used to be fun.</p><h2>Everyone Wants a Piece of the Pie, Nobody Wants to Bake</h2><p>In many ways, we have it better than ever before. dbt still exists, and it still remains open-source, for now. There are countless free and open-source tools that we would&#8217;ve killed for a decade ago. DuckDB is truly magical. Dagster (a company I just joined) is a labor of love from someone who just wanted to build something that not only fixed pain, but did so elegantly, although don&#8217;t worry, I recognize it is a VC-backed company too. There&#8217;s a wealth of tooling out there, and yes, some of it is of questionable quality, but much of it is great, and there&#8217;s so much we can do today that was difficult to do before.</p><p>And while we continue to get more than we will ever deserve from the good-hearted Dutch, we also benefit from the capital that venture-backed companies provide in order to hire the engineers that build the products that we use. 
The problem with good software is that it requires labor to build, maintain and improve, and there&#8217;s still no substitute for a bunch of people motivated by a common goal and passion in building anything of sufficient complexity.</p><p>But there is something about the data space that feels wrong, like something is missing. There&#8217;s a spirit that we once had of building and giving away that doesn&#8217;t seem so prevalent anymore. For all the millions of dollars you raised off the backs of the jinja, how much have you donated back? For all the hot takes and blog posts, how much time have you spent maintaining open-source projects? How much have you given away for all that you have taken? </p><p>Yes, data is dumb, and VCs ruin everything, and this company sucks and that one sucks too, yes it&#8217;s all a big scam, and everyone is a marketer, and the only person who has all the right answers charges you by the hour. These things are all true, but they are not fun truths. They are sad truths, and I don&#8217;t care for sad truths.</p><p>There&#8217;s plenty of good truths too. There are people who are still giving away knowledge because they believe knowledge should be shared, and there are people building things because they enjoy the act of building, and there are people solving problems not for profit, but for fun. </p><p>If you haven&#8217;t spent some time mentoring people and watching them grow over the months and years, if you haven&#8217;t spent too many thankless hours of your life trying to maintain and contribute to an open-source tool that everyone uses and no one cares to improve, if you don&#8217;t build things to solve a pain you&#8217;ve had in the hopes that it&#8217;ll solve someone else&#8217;s, if you don&#8217;t give away your hard work for free, then kindly, please, shut the fuck up. 
</p><p>Because this used to be fun.</p>]]></content:encoded></item><item><title><![CDATA[The Future of Data]]></title><description><![CDATA[Everyone wants a piece of the pie; no one wants to bake]]></description><link>https://databased.pedramnavid.com/p/the-future-of-data</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/the-future-of-data</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Sun, 21 May 2023 20:53:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve been thinking about a few things in the back of my head for a few months now, and I think it&#8217;s time to draw them out. I&#8217;ve been spending the last year doing a lot of data consulting and advisory work, whether it&#8217;s implementations, migrations, and modeling work; or, on the advisory side, working with data companies.</p><p>With this, I&#8217;ve started to develop three theories that I will share today on what I think the future holds for data: how ops will learn from data, the multiplication of semantic layers, and the single biggest problem data teams will continue to face. </p><p>Let&#8217;s dive in.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://databased.pedramnavid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Pedram's Data Based is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Ops Teams Finally Get Some Love</h2><p>For the past five or so years, Data Teams have proudly learned lessons from software engineering, much of this kicked off by dbt&#8217;s introduction to data teams. We&#8217;ve learned about version control, CI/CD, configuration as code, automation, commit checks, and more. As a result, data teams have become better at managing the complexity of operating on data. Great job, everyone. I&#8217;m so proud.</p><p>Now, it&#8217;s time for ops teams to learn from data. Unfortunately, I often think of ops as the forgotten stepchild of data. In many ways, ops teams are just data teams with worse tools and more chaos. Yet, in the best organizations, I&#8217;ve seen ops and data teams work closely together to enable each other. </p><p>Ops teams work across various functions, including marketing, sales, revenue, finance, and facilities. What they all have in common are workflows driven by data that are primarily manual, annoying, and slow. Work occurs predominantly in spreadsheets, extracts to CSVs, in the Salesforce UI, and manual uploads and overrides. None of it is validated, verified, or version-controlled.</p><p>This is starting to change. Emilie Schario is building <a href="https://www.helloturbine.com/about">Turbine</a>, which offers a fresh take on procurement, inventory, and supply chain management. <a href="https://www.savantlabs.io/">Savant Labs</a> is creating a cloud-native automation platform for analysts. <a href="https://zamp.com/">Zamp</a> is automating sales tax compliance. 
<a href="https://withlantern.com/">Lantern</a> is automating forecasting and churn reporting to enable CS teams to understand their customers better. </p><p>What these and other companies all have in common is a focus on bringing the best of data capabilities to teams that are traditionally underserved. </p><p>What&#8217;s clear to me is that now everyone is a data person. Whether it&#8217;s someone in success, marketing, finance, or sales: everyone is working with data, and so much of the data work in companies is happening outside data teams. </p><p>While data teams are still invaluable in building data assets and products, the work does not end there. Those data products become the fundamental building blocks on which other teams understand, forecast, and grow their business. I&#8217;m excited about what the future holds for these teams.</p><h2>One Semantic Layer to Rule Them All</h2><p>Just kidding. That won&#8217;t happen. To be clear: I want one semantic layer, and conceptually I love the idea of a semantic layer living close to my transforms and only having to define revenue once, in one place.</p><p>A place where every downstream tool can ingest capital-T Truth and finally fulfill all my dreams: a single-source-of-truth and self-serve analytics. </p><p>I can sip margaritas on the beach knowing that my VP of Finance will never report a wrong number again.</p><p>While the dream is beautiful, we need to consider that the products that care the most about semantic information are not dbt but BI tools. Tools like Looker, Holistics, or Lightdash all have a semantic layer not because they want one but because they need one.</p><p>The unlock of a semantic layer is clear: you can scale out a data team by enabling self-serve analytics. The problem is that these tools, and other BI tools, depend on the semantic layer to build their product and roadmap. 
</p><p>While some tools may be happy to throw their hat in with the dbt Cloud Semantic Layer, I think there are two obvious issues: 1) it limits their customers to only those who are on dbt Cloud and 2) it puts their product roadmap in the hands of another company.</p><p>If you had a BI solution and had to rely on another company approving a PR to ship a feature, would you be okay with that? </p><p>This is, I think, the fundamental problem of semantic layers. </p><p>There are, however, some interesting alternative approaches to this space. For example, <a href="https://honeydew.ai/">Honeydew</a> is building a semantic layer middleware that can be consumed directly from the warehouse, obviating the need for integrations with another tool&#8217;s semantic layer. Any application that can read from the warehouse can read semantic information. </p><p><a href="https://www.datacouncil.ai/talks/cubing-and-metrics-in-sql?hsLang=en">Julian Hyde&#8217;s work on Apache Calcite</a> aims to bring metrics directly into SQL, with the long-term hope of standardizing how we express metrics and storing them in the warehouse outright. <a href="https://www.malloydata.dev/">Malloy</a> is an alternate take, a new open-source language for expressive data modeling. </p><p>Will any of these solutions win? If I were to bet, five years from now they&#8217;ll all still be around, along with another half-dozen new ideas, and we&#8217;ll be no closer to a single standard than we are today. Sorry.</p><h2>The Single Biggest Problem Data Teams Face Today</h2><p>It&#8217;s not knowing how to measure data teams. It&#8217;s not managing various vendor contract renewals and negotiating discounts. It&#8217;s not trying to figure out how to deploy that Python script. It&#8217;s certainly not something I&#8217;ve seen any solution or vendor address in this space, but it&#8217;s pervasive, all-encompassing, and full of toil and dread. 
</p><p>It&#8217;s business logic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pfSN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pfSN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg 424w, https://substackcdn.com/image/fetch/$s_!pfSN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg 848w, https://substackcdn.com/image/fetch/$s_!pfSN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!pfSN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pfSN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg" width="1456" height="1341" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1341,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!pfSN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg 424w, https://substackcdn.com/image/fetch/$s_!pfSN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg 848w, https://substackcdn.com/image/fetch/$s_!pfSN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!pfSN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e786ec-5f0f-4dd0-b366-31c0ccc2e228_1642x1512.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">It&#8217;s one metric, Michael; how many tables could it need? Photo courtesy of <a href="https://twitter.com/tayloramurphy/status/1308133042985332738?s=20">Taylor Murphy.</a></figcaption></figure></div><p>Maintaining and understanding business logic is the most challenging part of any data pipeline. We attempt to document it in YAML or a data catalog, but the code is the only accurate documentation of business logic. </p><p>Take a concept as simple as &#8216;lead source.&#8217; All you need to know is where a lead came from. Well, first, we need to define a lead. Marketing might have one definition, and sales might have another. Once you have a lead, you need to identify its source. Again, the rules vary depending on who is asking. </p><p>Marketing might have rules based on priority and a rolling window. Did you get the email from an event, webinar, or ebook? What if they did two activities? Which one counts? 
Sales might have their own questions: did it come from a partner? Was it part of an outbound email campaign? An SDR? A business card found on the ground that you cold-called? </p><p>The rules for how you classify this lead are constantly changing. In addition, the data you collect to classify it is never clean. You end up removing Gmail accounts, cleaning up fake data, and stitching pieces together from various source systems. All these data flow issues are the core parts of data pipelines we seldom discuss.</p><p>They&#8217;re also the core models that drive all the metrics and measures downstream. </p><p>When someone asks you to explain how we calculate churn, you could direct them to documentation that explains how churn is calculated, but the true definition of churn isn&#8217;t in the documentation; it&#8217;s in the DAG of transforms, filters, conditions, aggregations and more that occur step-by-step to turn a sequence of events, invoices, subscriptions, payments, and constants into a monthly view of reported churn.</p><p>This complexity is why the demand for column-level lineage keeps growing. Understanding how this number came to me means understanding the web of inputs that led to it. </p><p>Over time the cruft adds up. There&#8217;s data from the previous migration, the random spreadsheet from an experiment that needs to be accounted for, test accounts, wrong ids, and corrections upon corrections. </p><p>What do we have to manage this complexity? At best, some modularity, but the fundamental complexity hasn&#8217;t been managed; it&#8217;s only been transformed. </p><p>Data mesh is, in some ways, the admission of defeat in the face of complexity. The demands of teams are so complex that we must break apart the whole thing into smaller, more manageable chunks. Sales gets their metric, and Marketing gets theirs. When someone asks why the numbers don&#8217;t match, we tell them that they don&#8217;t match because they are different. 
</p><p>Reconciliation is impossible. The quantum theory of data states that you may not both understand and measure a metric with perfect accuracy. You may understand but not measure, measure but not understand, or measure and sort of understand. These are the only states we know. </p><p>The other option is forcing stakeholders to make tough decisions about simplifying their requests; that&#8217;s a dream we can all have. We can stare into the abyss and demand that the pit of despair leave us be; that the dread and anguish of staring into 500 lines of SQL that somehow define what a customer is be banished from our lives. We may reject the Sisyphean rock-pushing of model-building for now, but the rock will remain steadfast in its place. We may decide to quit data altogether and see what those software engineers are up to&#8212;they seem much happier.</p><p>But we wake from our dream. The data remains a mess. The stakeholders remain impatient. The work never ends, and we press on&#8212;one reconciliation at a time. The numbers are different because they are not the same. 
The numbers are different because they are not the same&#8212;the numbers.</p><p>The numbers.</p><p>The horror.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yVEC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yVEC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yVEC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yVEC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yVEC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yVEC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg" width="600" height="350" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Apocalypse Now | National Review&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Apocalypse Now | National Review" title="Apocalypse Now | National Review" srcset="https://substackcdn.com/image/fetch/$s_!yVEC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yVEC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yVEC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yVEC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c898466-e99a-4661-8af2-c8bbfe054295_600x350.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Until next time, happy model building.</p>]]></content:encoded></item><item><title><![CDATA[Doing Data The Hard Way Part 1: Extracting Data]]></title><description><![CDATA[It's one table Michael, how hard could it be?]]></description><link>https://databased.pedramnavid.com/p/doing-data-the-hard-way-part-1-extracting</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/doing-data-the-hard-way-part-1-extracting</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Tue, 02 May 2023 13:30:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HyzY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my last post, I promised you all a deep dive into doing data the hard 
way. In this post, I hope to deliver on that promise.</p><p>I&#8217;ll explore the first part of every data journey: getting data out of a system. The story is likely to be a familiar one to many of you. Data about something you care about exists in some system, and you want to extract that data and store it somewhere. How hard can it be?</p><p>I&#8217;m going to skip over the pleasantries of why you might want to do this and pretend that we all understand it&#8217;s something that needs to be done. There are many types of systems that have data that we might want, but for our example, we will cover a common use case: application databases.</p><p>You can&#8217;t extract data without also putting it somewhere, so we&#8217;ll also discuss the merits of two different strategies: saving the data in some structured format like a CSV or a Parquet file, or writing the data directly into a Data Warehouse.</p><p>Let&#8217;s get started.</p><h2>Querying Data</h2><p>Given some source system, we will need to query that system to retrieve some subset of data. If our source system is a SQL database, then naturally, we&#8217;ll use SQL. </p><p>In the brute force method, we could extract all data from all tables that we&#8217;re interested in, ignoring system load and storage costs. This stateless approach is often overlooked, but it is rather simple. It might look something like this in Postgres:</p><pre><code>\copy customers TO './customers.csv' CSV DELIMITER ','
</code></pre><p>We might soon realize that while this is a simple method, it comes with a cost: it can be expensive to run. Our backend database may not appreciate the load. (You are running this against a replica, right?)</p><p>One solution is to only fetch data that has changed since your last run. This is known as an incremental data load. </p><p>We have two options:</p><ol><li><p>Use Change Data Capture: Subscribe to a log of all database events and fetch all events that occurred after our last load. </p></li><li><p>Use a column that updates every time a row is updated, and only fetch records that have been updated since the last load.</p></li></ol><p>The first option tends to be more accurate, but also a bigger pain to set up. It requires configuration of the database server, possibly even a restart. To really understand it, you&#8217;ll need to understand the WAL, or write-ahead log, in Postgres.</p><h3>A quick detour on databases</h3><p>The WAL is a mysterious place within a database. </p><div class="paywall-jump" data-component-name="PaywallToDOM"></div><p>While on the surface a database appears to be a collection of tables that mirror a spreadsheet with many tabs, under the hood it is a sequence of events. The WAL is the ledger that keeps track of these events. Anytime you insert, update, or delete a row, a record of that transaction is kept just like it would be in an accounting ledger.</p><p>The balance of all these transactions, much like the bank balance you have, is the collective sum of all these events. The WAL exists to ensure that if a database goes down, a record of everything that happened since the last backup is persisted, but it can also be used to sync data to other systems, such as a backup replica database, or even your silly little ETL job.</p><p>A plugin, such as wal2json, can translate these events into something a little more manageable, as we see here. </p><pre><code>{
        "change": [
                {
                        "kind": "insert",
                        "schema": "public",
                        "table": "inventory",
                        "columnnames": ["id", "item", "qty"],
                        "columntypes": ["integer", "character varying(30)", "integer"],
                        "columnvalues": [1, "apples", 100]
                }
        ]
}
{
        "change": [
                {
                        "kind": "update",
                        "schema": "public",
                        "table": "inventory",
                        "columnnames": ["id", "item", "qty"],
                        "columntypes": ["integer", "character varying(30)", "integer"],
                        "columnvalues": [1, "apples", 96],
                        "oldkeys": {
                                "keynames": ["id"],
                                "keytypes": ["integer"],
                                "keyvalues": [1]
                        }
                }
        ]
}
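</code></pre><p>To make the ledger idea concrete, here is a minimal Python sketch (mine, not part of wal2json) that replays events shaped like the ones above into the current state of a table. The replay helper is hypothetical, and it assumes every table has a single primary key column named id that never changes; a real consumer would also have to deal with ordering, transactions, and schema changes.</p><pre><code>import json

# Two wal2json-style events, trimmed from the sample output above.
EVENTS = [
    '{"change": [{"kind": "insert", "table": "inventory", '
    '"columnnames": ["id", "item", "qty"], "columnvalues": [1, "apples", 100]}]}',
    '{"change": [{"kind": "update", "table": "inventory", '
    '"columnnames": ["id", "item", "qty"], "columnvalues": [1, "apples", 96], '
    '"oldkeys": {"keynames": ["id"], "keyvalues": [1]}}]}',
]


def replay(events, key="id"):
    """Fold change events into the current rows of each table (hypothetical helper)."""
    tables = {}
    for raw in events:
        for change in json.loads(raw)["change"]:
            rows = tables.setdefault(change["table"], {})
            if change["kind"] == "delete":
                # Deletes only carry the old key, not the full row.
                old = dict(zip(change["oldkeys"]["keynames"], change["oldkeys"]["keyvalues"]))
                rows.pop(old[key], None)
            else:
                # Inserts and updates carry the full new row (assumes the key is unchanged).
                row = dict(zip(change["columnnames"], change["columnvalues"]))
                rows[row[key]] = row
    return tables


state = replay(EVENTS)
print(state["inventory"][1]["qty"])  # 96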
</code></pre><p>Every change to a database emits an event, and nearly every database has its own way of emitting these events. There is no single standard for CDC, so it is up to the downstream implementations to handle the varying logic. Now that you know what a WAL is, let&#8217;s pick an option.</p><h3>Incremental Options</h3><p>The WAL, while appealing, comes with many complexities. We&#8217;d have to store JSON data and process each row. </p><p>Instead, we&#8217;ll opt for using an updated_at column. </p><p>These columns are typically maintained by a database trigger or by the application, and they update whenever a row changes. Be cautious: sometimes they don&#8217;t update (a bulk fix that bypasses the trigger or application, for example), and this can cause inconsistencies.</p><p>This simplifies our future queries quite a bit. Now we can filter on data updated since our last sync.</p><pre><code>select * from customers
-- if incremental run
where updated_at &gt;= {{last_sync_date}}</code></pre><h3>Data Storage and Encoding</h3><p>Once the data has been queried, it needs to be saved. The key decision here is whether to save the data in a binary or text format. You&#8217;re already familiar with text formats: CSV and JSON are the most popular. The benefit of text formats is that they are easy for humans to read, but this comes at the cost of efficiency and introduces ambiguity around data types. </p><p>If you&#8217;ve ever had to parse the text &#8220;01-03-12&#8221; into a date format, you&#8217;ll appreciate the horrors of ambiguity. The structure of JSON also makes for large files as keys are repeated for every row.</p><p>Binary formats solve these issues by encoding data in a machine-readable format. Binary formats encode the data&#8217;s schema and types and offer more efficient storage of data. Parquet, for example, offers techniques like column-level compression and bit-packing. But nothing is ever easy when it comes to data.</p><p>Parquet has a few well-defined standard types. Your typical types such as INT and FLOAT along with strings, dates, and timestamps are all well supported. The problem is your database may not have the same type system. For example, Postgres has the <a href="https://www.postgresql.org/docs/current/datatype-net-types.html">inet and cidr</a> types, which are ways of encoding network addresses. </p><p>Here you may wish to resort to trusty strings for unknown data types by default, allowing downstream consumers to cast these as needed or simply use them as is. </p><p>To extract the data into Parquet, you could use a wonderful little tool like <a href="https://github.com/exyi/pg2parquet">pg2parquet</a>, which can take either a table or a query and output a Parquet file, or even take advantage of DuckDB&#8217;s <a href="https://duckdb.org/2022/09/30/postgres-scanner.html">Postgres scanner.</a> </p><p>With pg2parquet, you can extract an entire table in a single command:</p><pre><code>pg2parquet export \
     --table businesses \
     -H $POSTGRES_HOST \
     -U $POSTGRES_USER \
     --password $POSTGRES_PASSWORD \
     -o ./businesses.parquet \
     --dbname can_i_haz_replica</code></pre><p>You&#8217;ll want to save this to a persistent storage through your cloud provider, such as S3 or GCS for safe-keeping. Don&#8217;t let your hard work go to waste!</p><h3>Skipping Storage</h3><p>Pedram, you might say, why store data twice? I could simply write each row to my data warehouse right away! </p><p>And you are correct, you could do this! In fact, if you think this is a good idea, I&#8217;d encourage you to try it. What you will eventually run into is a failure, and failures are never fun.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HyzY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HyzY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png 424w, https://substackcdn.com/image/fetch/$s_!HyzY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png 848w, https://substackcdn.com/image/fetch/$s_!HyzY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png 1272w, https://substackcdn.com/image/fetch/$s_!HyzY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HyzY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png" width="727" height="334.91011235955057" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:656,&quot;width&quot;:1424,&quot;resizeWidth&quot;:727,&quot;bytes&quot;:302455,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HyzY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png 424w, https://substackcdn.com/image/fetch/$s_!HyzY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png 848w, https://substackcdn.com/image/fetch/$s_!HyzY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png 1272w, https://substackcdn.com/image/fetch/$s_!HyzY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb2e44ee-a025-47bf-9428-fbf73d6ae288_1424x656.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you chose to write directly into the warehouse and something in the process failed, you&#8217;re left scratching your head. How do I debug this issue? What was the row that caused the issue? Can I even see what the data looked like? Why did I choose this career? </p><p>Instead, if you chose to write the data to an intermediate layer, you have some more options available to you. You can take the exact file that failed and inspect it, load it into a dev environment, scan for gremlins, and hopefully address them so that your pipeline becomes more robust to failures in the future. What a happy data engineer you&#8217;ve become!</p><h2>Loading Data</h2><p>Loading data can be fraught with difficulties. 
You will need to insert new rows, update existing rows, and delete removed rows. All of these actions require a primary key. </p><p>The MERGE DML statement allows you to perform all of the above in one go. </p><pre><code><code>MERGE dataset.Inventory T
USING dataset.NewArrivals S
ON T.product = S.product
WHEN MATCHED THEN
  UPDATE SET quantity = T.quantity + S.quantity
WHEN NOT MATCHED THEN
  INSERT (product, quantity) VALUES(product, quantity)</code></code></pre><p>Issues arise, however, whenever schemas change. New columns in your source data mean tables must be altered before an insert. Even worse, if a source data type changes, you may have a bigger problem on your hands. </p><p>It can often be helpful to add helper columns as you load data, such as a timestamp of when the current batch was loaded, in case rollbacks are needed.</p><h2>Now do it all again </h2><p>Let&#8217;s pretend this wonderful journey proceeded smoothly for you. The next question is, can you do it again tomorrow? Will you know if it ran successfully?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://databased.pedramnavid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://databased.pedramnavid.com/subscribe?"><span>Subscribe now</span></a></p><p>Doing a task once is much easier than doing a task every day. In my next post, we will look at orchestration and scheduling, and all the other bits that come with them.</p><p>Until next time!</p>]]></content:encoded></item><item><title><![CDATA[Doing Data The Hard Way: Intro]]></title><description><![CDATA[a preview of what's to come]]></description><link>https://databased.pedramnavid.com/p/doing-data-the-hard-way</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/doing-data-the-hard-way</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Tue, 04 Apr 2023 23:57:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2af60e2c-8ad1-48ec-ad43-345f51acbdb3_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;m working on a 3-part series on Doing Data the Hard Way. 
In times of plenty, we often forget what life was like in times of scarcity. This series will take us back to basics, to a time before there were data vendors ready to sell shovels to anyone in search of gold. </p><p>What we&#8217;ve seen over the past few years has been a number of companies formed to help solve real problems data teams face. The first wave of companies solved broad, large problems where the value trade-off between engineering effort and vendor spending was clear: building and maintaining ETL pipelines can require a team of data engineers to handle ingestion, changing APIs, monitoring, logging, and the rest. But as the more obvious parts of the stack became dominated by one or two vendors, smaller and smaller pieces of the data pie were left for others to build against. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://databased.pedramnavid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://databased.pedramnavid.com/subscribe?"><span>Subscribe now</span></a></p><p>There was also an eagerness to believe that a problem solved at one company was a universal one that could be easily solved at many companies. I&#8217;ve seen many attempts at bringing a solution designed at a larger tech company to market in the hope that others could soon benefit from a scaled-out version. Still, I think we are starting to see that large-company problems are often not the same as small-company problems and, even more so, that tech-company solutions don&#8217;t always work well for non-tech companies.</p><p>All this to say that it&#8217;s no surprise that we&#8217;re seeing some pushback against the complexity of the past few years. </p><p>This is compounded by the unfortunate market reality that companies need money to survive. 
Yes, I understand this can come as a surprise, but it is a new reality we are all trying to navigate. With that, there&#8217;s a minimum bar for any vendor below which a customer is simply not worth acquiring. Conversely, for every company looking to buy a data product, there&#8217;s a bar above which the solution isn&#8217;t worth the cost. </p><p>I fear the gap between these two bars is increasing, which is perhaps causing an overdue re-examination of what we need and how we get there.</p><p>At the risk of waxing nostalgic, I&#8217;d like to take a look at how we solved data problems before we had access to endless vendors eager to solve problems for us. </p><p>I&#8217;ll explore common tasks like ETL, orchestration, and moving data between systems. I&#8217;m not a purist here; some things are not worth doing yourself, and BI comes to mind. And the goal isn&#8217;t to demonstrate the cheapest possible stack. We will still use cloud vendors, such as Google Cloud or AWS, but the emphasis will be on building in-house with an assortment of custom code and open-source tooling. </p><p>We will consider what it will take to go from proof-of-concept to development to production and discuss the key concerns we want to keep in mind as we build systems: from maintainability to reliability and performance.</p><p>If this sounds interesting to you, smash that subscribe button, and leave a comment about what part of the stack you&#8217;re most interested in learning about. The next post comes out in about a week&#8217;s time. 
</p><p>See you then.</p>]]></content:encoded></item><item><title><![CDATA[dbt Reimagined]]></title><description><![CDATA[Living in a fantasy land where anything is possible.]]></description><link>https://databased.pedramnavid.com/p/dbt-reimagined</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/dbt-reimagined</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Thu, 09 Mar 2023 04:42:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It was late one night, and I couldn&#8217;t sleep, so I started falling into a fantasy where constraints weren&#8217;t real and anything was possible. I started to think to myself: what would a future dbt look like? </p><p>Let&#8217;s explore the fantasy together.</p><h2>DSLs over Templated Code</h2><p>One thing I really enjoy about using LookML is the ability to know instantly when I&#8217;ve made a mistake. Because LookML is a <a href="https://en.wikipedia.org/wiki/Domain-specific_language">domain-specific language</a>, it has many nice features, such as type safety, validation, and autocomplete. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eWCv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eWCv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png 424w, https://substackcdn.com/image/fetch/$s_!eWCv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png 848w, https://substackcdn.com/image/fetch/$s_!eWCv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png 1272w, https://substackcdn.com/image/fetch/$s_!eWCv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eWCv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png" width="1364" height="584" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:584,&quot;width&quot;:1364,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:157510,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eWCv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png 424w, https://substackcdn.com/image/fetch/$s_!eWCv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png 848w, https://substackcdn.com/image/fetch/$s_!eWCv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png 1272w, https://substackcdn.com/image/fetch/$s_!eWCv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcacce723-f439-431b-8a61-ed39f9facbf5_1364x584.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There&#8217;s a richness to LookML that you instantly start to miss when you&#8217;re working in templated languages like yaml. While you can certainly provide a certain level of validation through <a href="https://github.com/dbt-labs/dbt-jsonschema">JSON Schemas</a> which dbt does provide, there&#8217;s still something missing when you&#8217;re writing templated SQL or model configurations in yaml. </p><p>This bifurcated state of Jinja-templated-SQL on the one hand and yaml-configs on the other in two separate files becomes painful over time as model complexity grows. </p><p>In a world where we don&#8217;t care about backward compatibility, I&#8217;d like to see a whole new language that dbt uses to provide model metadata and describe the SQL model itself simultaneously. All in one file.</p><p>Maybe it would look something like this. 
It could come with its own <a href="https://microsoft.github.io/language-server-protocol/">language server protocol</a> that would have a rich context-aware understanding of your source database schema to help with autocomplete and typos. </p><p>Notice how the code that describes the model and the SQL is contained in one file? This would reduce cognitive interruptions that frequently occur today when building models. How often have you updated a column in your model and not bothered to edit the schema file because it was too far away?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2L3-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2L3-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png 424w, https://substackcdn.com/image/fetch/$s_!2L3-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png 848w, https://substackcdn.com/image/fetch/$s_!2L3-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png 1272w, https://substackcdn.com/image/fetch/$s_!2L3-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2L3-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png" width="541" height="1074.5599214145384" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2022,&quot;width&quot;:1018,&quot;resizeWidth&quot;:541,&quot;bytes&quot;:257760,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2L3-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png 424w, https://substackcdn.com/image/fetch/$s_!2L3-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png 848w, https://substackcdn.com/image/fetch/$s_!2L3-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png 1272w, https://substackcdn.com/image/fetch/$s_!2L3-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7334c300-7099-4fbf-8567-c3b71b6c0a51_1018x2022.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Language servers are very powerful tools that most programmers rely on when building software. They also work across various tools, so the same language server in VS Code would also work in Vim. </p><p>Imagine a world where you could rename a column in a base model. The language server could rename all symbols associated with that column in the model configuration and downstream models in one go. </p><p>You could also leverage code actions. For example, after explicitly naming your columns, a Code Action could generate boilerplate names and descriptions for all columns as part of your model config. You could even go so far as to have generative AI infer the column descriptions and tests. 
Why shouldn&#8217;t the computer already know that your id columns should be unique and not null?</p><h2>Debuggers</h2><p><a href="https://www.infoq.com/presentations/rust-systems-programmer/">Analytics Engineers can have nice things too</a>. In software engineering, debuggers can be essential. When something goes wrong, it can be nice to interrupt the flow of a program to understand the state of the system right before things go bad. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pdOs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pdOs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png 424w, https://substackcdn.com/image/fetch/$s_!pdOs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png 848w, https://substackcdn.com/image/fetch/$s_!pdOs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png 1272w, https://substackcdn.com/image/fetch/$s_!pdOs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pdOs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png" width="1214" height="958" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:958,&quot;width&quot;:1214,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:150019,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pdOs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png 424w, https://substackcdn.com/image/fetch/$s_!pdOs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png 848w, https://substackcdn.com/image/fetch/$s_!pdOs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png 1272w, https://substackcdn.com/image/fetch/$s_!pdOs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3001bce5-0e29-46fa-be1e-e4aa5b059c2c_1214x958.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In Python, you can drop into the Python debugger by adding a `breakpoint()` call anywhere in your program. When running tests, you can ask pytest to drop into the debugger whenever a test fails, or even at the start of a test to trace through the program.</p><p>What if the same were possible when building SQL models? Suppose you had an incremental model that just wasn&#8217;t quite working properly. 
Maybe defined like so:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rnzM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rnzM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png 424w, https://substackcdn.com/image/fetch/$s_!rnzM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png 848w, https://substackcdn.com/image/fetch/$s_!rnzM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png 1272w, https://substackcdn.com/image/fetch/$s_!rnzM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rnzM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png" width="1456" height="1835" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1835,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:383837,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rnzM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png 424w, https://substackcdn.com/image/fetch/$s_!rnzM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png 848w, https://substackcdn.com/image/fetch/$s_!rnzM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png 1272w, https://substackcdn.com/image/fetch/$s_!rnzM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc149dd6b-60d2-452b-ac99-0e982f65e239_1644x2072.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I could imagine a <em>trace mode</em>, where dbt would first preview both the compiled SQL and the results of each CTE, starting with <em>sessions</em>. Next, <em>other_sessions</em> would preview, and finally the last statement. </p><p>What if you dropped into a live debugger that allowed you to run SQL against your warehouse and displayed the results right there in the editor where you work? No more jumping from your editor to the Snowflake UI and back. </p><pre><code>(dbg) &gt; ${events}
-&gt; dim_events
&gt; ${if incremental} 
-&gt; true
&gt; ${incremental_condition}
-&gt;  event_time &gt; (select max(event_time) from dim_events)
&gt; ${sessions}
-&gt; sql: select * from events where event_time &gt; (select max(event_time) from dim_events)
-&gt; rows: [event_time, event_type, event_id]
         [2022-01-01, cart, 1234]
         [2022-01-02, cart, 2456]
... press m for &lt;More Rows&gt; or w to quit.
</code></pre><h2>Unit Tests</h2><p>Imagine if simple unit tests were easy to write and run. One could imagine a world where you provide a small sample dataset to a query and ask dbt to run it, perhaps through an alternative adapter like DuckDB, in order to validate pieces of logic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zVm3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zVm3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png 424w, https://substackcdn.com/image/fetch/$s_!zVm3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png 848w, https://substackcdn.com/image/fetch/$s_!zVm3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png 1272w, https://substackcdn.com/image/fetch/$s_!zVm3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zVm3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png" width="1456" height="859" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:859,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:195072,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zVm3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png 424w, https://substackcdn.com/image/fetch/$s_!zVm3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png 848w, https://substackcdn.com/image/fetch/$s_!zVm3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png 1272w, https://substackcdn.com/image/fetch/$s_!zVm3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba97c44-edf6-44b9-8c0b-44683c50065e_1800x1062.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Now I recognize there are a million reasons why this wouldn&#8217;t work, but again, this is me dreaming. Why can&#8217;t analytics engineers have nice things too?</p><p>As I keep going down this path, more and more things seem possible now. With a proper language in place, column-level lineage is achievable. Code generation becomes easier too. Why should I ever have to write a base staging model when the database is <em>right there?</em></p><p>dbt Packages, while decent enough, could be vastly improved if you could import specific modules from them. Today it feels cumbersome to pick and choose a few tables from a package or even to pass in variables to them. But what if you could import logic from packages instead? </p><p>Relationships can also be expressed independently of metrics.  Downstream BI tools could leverage this for self-serve analytics. Who knows? 
Anything is possible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_pkt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_pkt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png 424w, https://substackcdn.com/image/fetch/$s_!_pkt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png 848w, https://substackcdn.com/image/fetch/$s_!_pkt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png 1272w, https://substackcdn.com/image/fetch/$s_!_pkt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_pkt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png" width="688" height="368" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:368,&quot;width&quot;:688,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44392,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_pkt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png 424w, https://substackcdn.com/image/fetch/$s_!_pkt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png 848w, https://substackcdn.com/image/fetch/$s_!_pkt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png 1272w, https://substackcdn.com/image/fetch/$s_!_pkt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3719a880-8678-432e-8a9e-f4fee8ae7820_688x368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Anyway, a boy can dream. </p>]]></content:encoded></item><item><title><![CDATA[The Trouble with Growth]]></title><description><![CDATA[Pedram's Universal Theory of Bad Outcomes]]></description><link>https://databased.pedramnavid.com/p/the-trouble-with-growth</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/the-trouble-with-growth</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Thu, 23 Feb 2023 00:26:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Do you remember a time when products were good? Unblemished by all the things which make them bad? I do. It happens, from time to time. A product is released, and people enjoy it. It provides a service; users adopt it because it is good and enjoyable. 
But that is never enough. The markets demand more. Scaling out horizontally and building additional product lines that people want is hard work. What if we added new buttons instead?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tcBX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tcBX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp 424w, https://substackcdn.com/image/fetch/$s_!tcBX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp 848w, https://substackcdn.com/image/fetch/$s_!tcBX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp 1272w, https://substackcdn.com/image/fetch/$s_!tcBX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tcBX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp" width="365" height="456.25" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:950,&quot;width&quot;:760,&quot;resizeWidth&quot;:365,&quot;bytes&quot;:64498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tcBX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp 424w, https://substackcdn.com/image/fetch/$s_!tcBX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp 848w, https://substackcdn.com/image/fetch/$s_!tcBX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp 1272w, https://substackcdn.com/image/fetch/$s_!tcBX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c4ac8b-47c0-4290-a642-2887c7b362f6_760x950.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Remember Zoom? A pleasant way to run virtual meetings. A clean interface, good video and audio quality, and an easy way to create, schedule, and share meetings. But was that enough? No.</p><p>Instead, there are Limited Time Offers and Apps for things no one has ever heard of, like twine (advanced breakouts), or Sesh (a virtual agenda) or W (AI Business Cards). I have no opinion on the usefulness of advanced breakouts or AI Business Cards. 
Still, I do know that seeing these pop up in my Zoom meetings automatically and uninvited is not something I would ever want.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UM1Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UM1Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png 424w, https://substackcdn.com/image/fetch/$s_!UM1Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png 848w, https://substackcdn.com/image/fetch/$s_!UM1Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png 1272w, https://substackcdn.com/image/fetch/$s_!UM1Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UM1Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png" width="1286" height="206" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:206,&quot;width&quot;:1286,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94282,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UM1Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png 424w, https://substackcdn.com/image/fetch/$s_!UM1Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png 848w, https://substackcdn.com/image/fetch/$s_!UM1Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png 1272w, https://substackcdn.com/image/fetch/$s_!UM1Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0fb9767-ab02-4108-9ed0-02cfb503ba36_1286x206.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N24u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N24u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png 424w, https://substackcdn.com/image/fetch/$s_!N24u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png 848w, https://substackcdn.com/image/fetch/$s_!N24u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png 1272w, https://substackcdn.com/image/fetch/$s_!N24u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N24u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png" width="1052" height="97" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:97,&quot;width&quot;:1052,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59855,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!N24u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png 424w, https://substackcdn.com/image/fetch/$s_!N24u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png 848w, https://substackcdn.com/image/fetch/$s_!N24u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png 1272w, https://substackcdn.com/image/fetch/$s_!N24u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00699943-eefa-4cac-86ea-7a205a181a3f_1052x97.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>I&#8217;m not alone; there are dozens of us out there. And this isn&#8217;t strictly a Zoom phenomenon. It happens everywhere, and it&#8217;s so pervasive I would argue it&#8217;s a driving force for many new products. Old products get so bad that people leave in frustration for anything that stops treating them so poorly.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/QuinnyPig/status/1623165309522956290?s=20&quot;,&quot;full_text&quot;:&quot;I&#8217;m about to cancel <span class=\&quot;tweet-fake-link\&quot;>@Zoom</span>, save a few grand a year, and move the company to Google Meet instead. The post-meeting up sell ads have become intolerable, and there&#8217;s no way to turn them off. 
&quot;,&quot;username&quot;:&quot;QuinnyPig&quot;,&quot;name&quot;:&quot;Corey Quinn / @quinnypig@awscommunity.social&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Wed Feb 08 03:42:21 +0000 2023&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{&quot;full_text&quot;:&quot;hey, @Zoom it's REALLY annoying to see the constant upsell ads in the app when I'm a paid subscriber.\n\nJust STOP already, please.&quot;,&quot;username&quot;:&quot;briansooy&quot;,&quot;name&quot;:&quot;Brian Sooy&quot;},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:22,&quot;like_count&quot;:408,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:false}" data-component-name="Twitter2ToDOM"></div><p>Remember Gmail? This is what it looks like today. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XckX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XckX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png 424w, https://substackcdn.com/image/fetch/$s_!XckX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png 848w, https://substackcdn.com/image/fetch/$s_!XckX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XckX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XckX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png" width="1456" height="562" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:354283,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XckX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png 424w, https://substackcdn.com/image/fetch/$s_!XckX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png 848w, https://substackcdn.com/image/fetch/$s_!XckX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XckX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbaf66-1885-4f1e-b010-ec6334307b2e_2190x846.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There&#8217;s Mail, Chat, Spaces, and Meet on one sidebar. A whole different sidebar on the right for Google Calendar, Keep, Tasks, and Contacts. Not to mention Active? Whatever that is. Gmail became so annoying that people started paying for an email client to avoid having to log in to that nightmare. 
</p><p>Microsoft Windows 11 is so bad that <a href="https://www.thewindowsclub.com/remove-annoying-windows-11-features">entire websites</a> are dedicated to helping people remove advertisements from their operating system. Only Adobe could perhaps rival Microsoft in this domain. Something as simple as saving a file is now a complex feature-parity contest.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!btxE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!btxE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png 424w, https://substackcdn.com/image/fetch/$s_!btxE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png 848w, https://substackcdn.com/image/fetch/$s_!btxE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png 1272w, https://substackcdn.com/image/fetch/$s_!btxE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!btxE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png" width="571" height="405.11195054945057" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1033,&quot;width&quot;:1456,&quot;resizeWidth&quot;:571,&quot;bytes&quot;:173351,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!btxE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png 424w, https://substackcdn.com/image/fetch/$s_!btxE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png 848w, https://substackcdn.com/image/fetch/$s_!btxE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png 1272w, https://substackcdn.com/image/fetch/$s_!btxE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5f0160-5150-435d-8de5-87a8c99c7a4e_1678x1190.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It&#8217;s a clear and pervasive trend, but the question is: why is it happening, and why is Pedram writing about this in a data newsletter? Is he going to try and weave some tangential thread through these as if there&#8217;s some type of natural connection with his central thesis, and pretend that this isn&#8217;t just an excuse to complain about things that annoy him? </p><p>Absolutely not.</p><h2>Simpson&#8217;s Paradox, Local Maximums and Other Tangential Threads</h2><p>In statistics, Simpson&#8217;s paradox is a confusing phenomenon where a trend that appears in aggregated data weakens, disappears, or even reverses once the data is split into subgroups. It shows up when the aggregate model fails to account for a confounding variable. What appears to be a positive trend overall may actually be a negative trend within each subgroup. 
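</p><p>To make that concrete, here is a minimal sketch in Python using made-up numbers. The "banner" and "no_banner" variants, the "new" and "power" user segments, and every count below are hypothetical, invented purely for illustration. It computes a retention rate overall and then per segment, assuming the two segments were split unevenly between the variants.</p><pre><code># Hypothetical counts, invented for illustration only:
# (retained users, total users) for each (variant, segment) pair.
counts = {
    ("banner", "new"): (234, 270),
    ("banner", "power"): (55, 80),
    ("no_banner", "new"): (81, 87),
    ("no_banner", "power"): (192, 263),
}


def rate(variant, segment=None):
    """Retention rate for a variant, optionally restricted to one segment."""
    rows = [
        value
        for (var, seg), value in counts.items()
        if var == variant and segment in (None, seg)
    ]
    retained = sum(r for r, _ in rows)
    total = sum(t for _, t in rows)
    return retained / total


# Compare the two variants overall, then inside each segment.
for segment in (None, "new", "power"):
    label = segment or "overall"
    print(
        f"{label:8} banner={rate('banner', segment):.3f} "
        f"no_banner={rate('no_banner', segment):.3f}"
    )
</code></pre><p>With these counts, the overall row favors the banner, while both segment rows favor no banner: the aggregate trend reverses once the confounding segment is taken into account.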
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zfkF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zfkF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif 424w, https://substackcdn.com/image/fetch/$s_!zfkF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif 848w, https://substackcdn.com/image/fetch/$s_!zfkF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif 1272w, https://substackcdn.com/image/fetch/$s_!zfkF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zfkF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif" width="700" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:700,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1270639,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zfkF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif 424w, https://substackcdn.com/image/fetch/$s_!zfkF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif 848w, https://substackcdn.com/image/fetch/$s_!zfkF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif 1272w, https://substackcdn.com/image/fetch/$s_!zfkF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61698b14-c9f9-40f0-898d-3c4a22fc7fe6_700x500.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Simpson's paradox. (2023, February 9). In <em>Wikipedia</em>. https://en.wikipedia.org/wiki/Simpson%27s_paradox</figcaption></figure></div><p>Somewhat related is Anscombe&#8217;s famous quartet: four datasets with nearly identical summary statistics but obviously different distributions. On the surface, looking at a high-level summary, the data all looks the same, but when plotted or analyzed more carefully, it turns out there are four very different models at work. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4HH_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4HH_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png 424w, https://substackcdn.com/image/fetch/$s_!4HH_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png 848w, https://substackcdn.com/image/fetch/$s_!4HH_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png 1272w, https://substackcdn.com/image/fetch/$s_!4HH_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4HH_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png" width="425" height="309" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:309,&quot;width&quot;:425,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4HH_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png 424w, https://substackcdn.com/image/fetch/$s_!4HH_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png 848w, https://substackcdn.com/image/fetch/$s_!4HH_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png 1272w, https://substackcdn.com/image/fetch/$s_!4HH_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcdcf9e-a49b-4db2-970e-9c8e66f43d16_425x309.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>There&#8217;s also the oft-used (and abused) idea of local maxima, where what seems like the best decision given the current context is far behind the globally best decision given full context. 
It&#8217;s a common pushback against experimentation-culture, where the critics say that small incremental changes are unlikely to lead to the large, big-bet changes that help build lasting companies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a8Ly!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a8Ly!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png 424w, https://substackcdn.com/image/fetch/$s_!a8Ly!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png 848w, https://substackcdn.com/image/fetch/$s_!a8Ly!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png 1272w, https://substackcdn.com/image/fetch/$s_!a8Ly!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a8Ly!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png" width="782" height="466" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:466,&quot;width&quot;:782,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Local Maximum: What It Is, and How to Get Over It in A/B Testing - CXL&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Local Maximum: What It Is, and How to Get Over It in A/B Testing - CXL" title="Local Maximum: What It Is, and How to Get Over It in A/B Testing - CXL" srcset="https://substackcdn.com/image/fetch/$s_!a8Ly!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png 424w, https://substackcdn.com/image/fetch/$s_!a8Ly!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png 848w, https://substackcdn.com/image/fetch/$s_!a8Ly!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png 1272w, https://substackcdn.com/image/fetch/$s_!a8Ly!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd00b3b6-7f15-43eb-9a5d-7cfce7d78ff0_782x466.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These are well-understood, entry-level concepts within statistics. But they all point to a common trend I think we see in some of the products I described above and many others. </p><p>What starts as a good product gains adoption. The fast pace of growth eventually slows due to the laws of physics. Rather than update its models and assumptions, the company hires a growth team; they &#8216;experiment&#8217;, and each experiment in isolation seems like a good idea. They might move the metrics in the right direction. Maybe they generate marginal increases in revenue. But the overall experience suffers. And soon, all these marginal improvements end up creating a bad product, which inevitably leads to its downfall. 
New players sense this weakness and dissatisfaction, and competition enters the field to serve your market better than you did. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bc4N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bc4N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png 424w, https://substackcdn.com/image/fetch/$s_!Bc4N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png 848w, https://substackcdn.com/image/fetch/$s_!Bc4N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png 1272w, https://substackcdn.com/image/fetch/$s_!Bc4N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bc4N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png" width="503" height="503" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:503,&quot;bytes&quot;:55702,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bc4N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png 424w, https://substackcdn.com/image/fetch/$s_!Bc4N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png 848w, https://substackcdn.com/image/fetch/$s_!Bc4N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png 1272w, https://substackcdn.com/image/fetch/$s_!Bc4N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6835cbcf-9be4-4552-9ade-3da7ec98c102_800x800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This same model plays out in the data ecosystem. Vendors who build something really good at first can quickly find that they&#8217;ve tapped out their initial market. Faced with this precipice, they look for &#8216;quick wins&#8217;. But &#8216;quick wins&#8217; is code for minor optimizations. </p><p>What they should have been doing was preparing for this scenario months ago. The best founders are always paranoid and never satisfied. They&#8217;re focused less on how to win today than on what to win next. </p><p>As you&#8217;re building out products and gaining success, I&#8217;d encourage you to ask yourself that same question. You have an entire team dedicated to helping you win today. 
The question you should always be asking yourself is: what&#8217;s next?</p><p></p>]]></content:encoded></item><item><title><![CDATA[Streaming Data Pipelines with Striim + DuckDB]]></title><description><![CDATA[Big thanks to Striim for getting me a preview of their new developer experience and sponsoring this post. Last month I got a sneak preview of Striim&#8217;s new developer experience that makes it easy to get started with CDC using BigQuery or Snowflake. If you missed my]]></description><link>https://databased.pedramnavid.com/p/streaming-data-pipelines-with-striim</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/streaming-data-pipelines-with-striim</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Tue, 31 Jan 2023 18:01:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/66e257ce-5432-4eba-9c7e-509d70899b6f_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Big thanks to Striim for getting me a preview of their new developer experience and sponsoring this post.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C7bP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C7bP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!C7bP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!C7bP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!C7bP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C7bP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png" width="495" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:495,&quot;bytes&quot;:2031737,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C7bP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!C7bP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!C7bP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!C7bP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8517dd8e-b714-4973-bf5d-d66d8b0de2d2_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p>Last month I got a sneak preview of Striim&#8217;s new developer experience that 
makes it easy to get started with CDC using BigQuery or Snowflake. If you missed my <a href="https://twitter.com/pdrmnvd/status/1605620847741325312?s=20">thread</a> about that, check it out. In this post, I&#8217;ll look at how you can leverage Striim, Parquet, and DuckDB for real-time data ingestion with fast data analysis. </p><p>Data pipelines have traditionally been batch, and batch pipelines are usually easier to reason about. Data comes in once a day. I run all my transformations and load them sometime between 12:01 AM and 7 AM UTC (or was it PT? Timezones are hard.) Views and tables get updated, people look at data from yesterday, they get the answers to all their questions, and life is good. Life is simple.&nbsp;</p><p>Unfortunately, the good old days are dead. Now we run operational workflows off data constantly fed into the data warehouse. We need to reduce the lag of all the various components that ingest, digest, transform, and reform our data as much as possible. For example, we have personalized workflows that send automated emails to prospects and customers who expect us not only to understand every interaction they&#8217;ve had with us but also to anticipate every human desire they could conceivably have in the next fifteen minutes.&nbsp;</p><p>We&#8217;ve started pushing batch to the limits of streaming. Some of these batch tools can run as often as every 5 minutes, blurring the boundary between what is and isn&#8217;t streaming.</p><p>This is what got me interested in the streaming and CDC space in the first place. I wanted to know if there was a better way. After a dizzying stroll down Debezium Lane, and a confusing jaunt through Kafka Caverns, I received a nice demo from the fine folks at <a href="https://striim.com">Striim</a>.&nbsp;</p><p>Striim is an enterprise-grade CDC platform and I am but a lowly developer with toy examples, yet it works just as well for me. 
For my first attempt in the tweet above, I set up a simple Postgres instance, piped data into it, and watched as Striim fed my BigQuery tables with change capture data every few seconds.&nbsp;</p><p>But BigQuery is old news. Today, I wanted to see if I could get our lord and savior, DuckDB, to work with Striim. The setup was simple: use a GCP Writer to save streaming data to Parquet. Then, use DuckDB&#8217;s HTTPFS extension to read data from Parquet files in bulk. Write queries. Enjoy streaming.</p><p>Let&#8217;s dive in.&nbsp;</p><h2>Striim Setup</h2><p>I decided to use one of the built-in data generators to get started quickly. These data generators are great for sketching out ideas since they let you avoid the messy parts of connecting data systems, such as permissions and IP allow-lists. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pXZJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pXZJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png 424w, https://substackcdn.com/image/fetch/$s_!pXZJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png 848w, https://substackcdn.com/image/fetch/$s_!pXZJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png 1272w, 
https://substackcdn.com/image/fetch/$s_!pXZJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pXZJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png" width="318" height="517.9545454545455" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:860,&quot;width&quot;:528,&quot;resizeWidth&quot;:318,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pXZJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png 424w, https://substackcdn.com/image/fetch/$s_!pXZJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png 848w, https://substackcdn.com/image/fetch/$s_!pXZJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png 1272w, 
https://substackcdn.com/image/fetch/$s_!pXZJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd944dbe4-c57d-4c54-846b-82c9ced25086_528x860.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The <strong>ContinousGenerator</strong> can be set to various types of throughput. 
Low Throughput sends about ten messages per second, Medium sends hundreds per second, and Spike produces variable traffic with high spikes, which can be handy for testing the resiliency of your pipelines.</p><p>Next up, I used a <strong>Query</strong> cell, which operates on an incoming stream and lets you transform data as it is produced. This can save lots of expensive compute in your warehouse by shifting the transformations left, closer to the data source. You can also mask data as it arrives, which makes staying compliant easier. I wrote a simple query that takes the generated data, masks sensitive information, and outputs the results to a GDPR stream.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MuZ9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MuZ9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png 424w, https://substackcdn.com/image/fetch/$s_!MuZ9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png 848w, https://substackcdn.com/image/fetch/$s_!MuZ9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png 1272w, 
https://substackcdn.com/image/fetch/$s_!MuZ9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MuZ9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png" width="551" height="277.7217741935484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:992,&quot;resizeWidth&quot;:551,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MuZ9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png 424w, https://substackcdn.com/image/fetch/$s_!MuZ9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png 848w, https://substackcdn.com/image/fetch/$s_!MuZ9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png 1272w, 
https://substackcdn.com/image/fetch/$s_!MuZ9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36bbf5d-07be-4138-8f9e-7f833594b352_992x500.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Finally, to be able to analyze this data in DuckDB, I elected to write the data in Parquet format to Google Cloud Storage, although S3 would also work just as well. To do that, I used the GCP Writer Target. 
After creating a Service Account in Google Cloud, I set up a few basic settings, such as the path to the bucket and the format I&#8217;d like the files saved in.</p><p>One setting to be aware of is the Upload Policy, which determines how frequently files are written and, as a consequence, how large they are. Finding a good balance here is important, as too many small files or too few large ones can both hinder performance.</p><p>I set the Upload Policy to write every 100,000 events or every 1 minute, and chose the ParquetFormatter as the output format.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ceBj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ceBj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png 424w, https://substackcdn.com/image/fetch/$s_!ceBj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png 848w, https://substackcdn.com/image/fetch/$s_!ceBj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png 1272w, https://substackcdn.com/image/fetch/$s_!ceBj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ceBj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png" width="411" height="285.6286472148541" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:524,&quot;width&quot;:754,&quot;resizeWidth&quot;:411,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ceBj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png 424w, https://substackcdn.com/image/fetch/$s_!ceBj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png 848w, https://substackcdn.com/image/fetch/$s_!ceBj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png 1272w, https://substackcdn.com/image/fetch/$s_!ceBj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fdbcf61-1467-4880-9a8b-a3304d43fd1b_754x524.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With the setup complete, all that is needed is for the app to be deployed and started. You even have a preview feature to watch data as it is fed through the system. 
You can see I&#8217;m fetching about 900 messages every second, and after about a minute the data will write to GCP.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ib2S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ib2S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png 424w, https://substackcdn.com/image/fetch/$s_!ib2S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png 848w, https://substackcdn.com/image/fetch/$s_!ib2S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png 1272w, https://substackcdn.com/image/fetch/$s_!ib2S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ib2S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png" width="1456" height="884" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:884,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ib2S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png 424w, https://substackcdn.com/image/fetch/$s_!ib2S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png 848w, https://substackcdn.com/image/fetch/$s_!ib2S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png 1272w, https://substackcdn.com/image/fetch/$s_!ib2S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F592a62d6-f92a-4aa0-b747-c3290dcc851d_1600x971.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>What&#8217;s neat is that Striim even displays the total End to End Lag so you can have insight into how delayed your pipelines are. 
In my case, the lag was about 30 seconds from creation to write.</p><p>After running for a while, the Parquet files are loaded in GCP and now it&#8217;s time to analyze the results with DuckDB.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eWp2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eWp2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png 424w, https://substackcdn.com/image/fetch/$s_!eWp2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png 848w, https://substackcdn.com/image/fetch/$s_!eWp2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png 1272w, https://substackcdn.com/image/fetch/$s_!eWp2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eWp2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png" width="364" height="433.93886462882097" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:458,&quot;resizeWidth&quot;:364,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eWp2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png 424w, https://substackcdn.com/image/fetch/$s_!eWp2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png 848w, https://substackcdn.com/image/fetch/$s_!eWp2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png 1272w, https://substackcdn.com/image/fetch/$s_!eWp2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2033dfe6-1d79-4da6-9103-2ef139156f9e_458x546.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Analyze with DuckDB</h2><p>There are many ways to use DuckDB given that it&#8217;s a small portable binary. 
The CLI is a great place for simple prototyping, but I prefer using DataGrip for writing queries.</p><p>After creating a new DuckDB connection and enabling single-session mode, I added a small startup script so that my GCP credentials are set every time I connect to DuckDB.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GFTY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GFTY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png 424w, https://substackcdn.com/image/fetch/$s_!GFTY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png 848w, https://substackcdn.com/image/fetch/$s_!GFTY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png 1272w, https://substackcdn.com/image/fetch/$s_!GFTY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GFTY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png" width="507" height="336.47197106690777" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:734,&quot;width&quot;:1106,&quot;resizeWidth&quot;:507,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GFTY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png 424w, https://substackcdn.com/image/fetch/$s_!GFTY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png 848w, https://substackcdn.com/image/fetch/$s_!GFTY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png 1272w, https://substackcdn.com/image/fetch/$s_!GFTY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecbde811-07b9-4aa8-8a73-76b2593131dd_1106x734.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The <a href="https://duckdb.org/docs/guides/import/s3_export">docs on setting up S3 or GCS</a> access are pretty straightforward. A few simple SET commands and then you&#8217;re ready to query!</p><pre><code><code>INSTALL httpfs;
LOAD httpfs;
SET s3_endpoint='storage.googleapis.com';
SET s3_access_key_id='MY_ACCESS_KEY';
SET s3_secret_access_key='MY_SECRET';</code></code></pre><p>To start, I ran a simple query to see how many records we have in each file. By passing the <code>filename=TRUE</code> option to <code>parquet_scan</code>, DuckDB adds the filename as a column in the result, which I use for aggregation.</p><pre><code>SELECT
 
    filename,
    COUNT(1) AS n_records
 
FROM parquet_scan('s3://my-duckdb-bucket/striim-out.*', filename=TRUE)
GROUP BY filename
ORDER BY 1;</code></pre><p>In about 7 seconds, DuckDB scanned roughly 760,000 records across 14 files of about 55,000 records each to generate a count of records by file. Better still, there&#8217;s no Spark cluster to maintain. You can see below that grouping by filename makes it easy to see how many records were written to each file.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-twD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-twD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png 424w, https://substackcdn.com/image/fetch/$s_!-twD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png 848w, https://substackcdn.com/image/fetch/$s_!-twD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png 1272w, https://substackcdn.com/image/fetch/$s_!-twD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-twD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png" width="512" height="355.2" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:444,&quot;width&quot;:640,&quot;resizeWidth&quot;:512,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-twD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png 424w, https://substackcdn.com/image/fetch/$s_!-twD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png 848w, https://substackcdn.com/image/fetch/$s_!-twD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png 1272w, https://substackcdn.com/image/fetch/$s_!-twD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F647f3d01-7ce7-434d-ac85-8c2e5f9afa34_640x444.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We can even do fast text processing. In 6 seconds, I can categorize all products by whether they have Heavy or Lightweight in their name and aggregate across both dimensions.</p><pre><code>SELECT
    product_name LIKE '%Lightweight%' AS is_lightweight,
    product_name LIKE '%Heavy%' AS is_heavy,
    COUNT(1) AS count_products
FROM parquet_scan('s3://my-duckdb-bucket/striim-out.*')
GROUP BY 1, 2;</code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bpHn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bpHn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png 424w, https://substackcdn.com/image/fetch/$s_!bpHn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png 848w, https://substackcdn.com/image/fetch/$s_!bpHn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png 1272w, https://substackcdn.com/image/fetch/$s_!bpHn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bpHn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png" width="1122" height="202" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81c82003-2740-4bee-837d-603adfaf9277_1122x202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:202,&quot;width&quot;:1122,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bpHn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png 424w, https://substackcdn.com/image/fetch/$s_!bpHn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png 848w, https://substackcdn.com/image/fetch/$s_!bpHn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png 1272w, https://substackcdn.com/image/fetch/$s_!bpHn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c82003-2740-4bee-837d-603adfaf9277_1122x202.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The best part is this data is constantly updated by Striim. Every minute a new batch of 55,000 records arrives.&nbsp;</p><p>That was fun, but we needed to go faster. Just for fun, I cranked up the generator to see how it would handle a higher rate and set the Upload Limit to 25,000 records per file. 
I easily hit 20,000 messages per second, and the end-to-end lag was just a few seconds. Striim had no problem with the throughput. In just a few minutes, I had 60 Parquet files ready for DuckDB to process.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AiPs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AiPs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png 424w, https://substackcdn.com/image/fetch/$s_!AiPs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png 848w, https://substackcdn.com/image/fetch/$s_!AiPs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png 1272w, https://substackcdn.com/image/fetch/$s_!AiPs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AiPs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png" width="1456" height="737" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:737,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AiPs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png 424w, https://substackcdn.com/image/fetch/$s_!AiPs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png 848w, https://substackcdn.com/image/fetch/$s_!AiPs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png 1272w, https://substackcdn.com/image/fetch/$s_!AiPs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8debd398-9475-49fe-98d2-66b2699290b4_1600x810.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With now 60 files to process, DuckDB took just under 30 seconds to count every record in every file. 
The product name query now took 23 seconds on 1.45 million records.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UW9u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UW9u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png 424w, https://substackcdn.com/image/fetch/$s_!UW9u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png 848w, https://substackcdn.com/image/fetch/$s_!UW9u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png 1272w, https://substackcdn.com/image/fetch/$s_!UW9u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UW9u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png" width="1128" height="214" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:214,&quot;width&quot;:1128,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UW9u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png 424w, https://substackcdn.com/image/fetch/$s_!UW9u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png 848w, https://substackcdn.com/image/fetch/$s_!UW9u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png 1272w, https://substackcdn.com/image/fetch/$s_!UW9u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf50d02e-b2ea-4358-82cd-03fe1b25f68d_1128x214.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>As one final test, I decided to push some regex and aggregates down to see the impact on performance, and DuckDB held up well. This query took under a minute to query all 1.45 million records, and I didn&#8217;t have to store a single file locally. (And if you were wondering, the average of the last 4 digits of a phone number is 5001).</p><pre><code>SELECT
    date_trunc('minute', CAST(TIME AS TIMESTAMP)) AS DATE,
    avg(CAST(regexp_extract(Phone_Number, '\d+') AS NUMERIC)) AS avg_number
FROM parquet_scan('s3://my-duckdb-bucket/striim-out.*')
GROUP BY 1</code></pre><h2>Wrapping Up</h2><p>I hope this was a helpful exploration of how you can use Striim and DuckDB to process real-time analytic queries quickly and easily. Gone are the days of Kafka, ZooKeeper and Debezium. In less than 30 minutes you can get a CDC stream set up, write to a cloud bucket location, and query with DuckDB for blazing-fast analytics.&nbsp;</p><p>If you want to give Striim a try, <a href="https://signup-developer.striim.com/">you can sign up here</a> with my referral code <strong>tAlaDngxjQ</strong>.</p>]]></content:encoded></item><item><title><![CDATA[2022 Recap: Every Random Idea I Had]]></title><description><![CDATA[a farewell tour for unfinished thoughts]]></description><link>https://databased.pedramnavid.com/p/2022-recap</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/2022-recap</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Wed, 28 Dec 2022 21:38:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9BJv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Whenever I speak to an investor, they ask me to predict the future of data. As a trained data scientist, I know all predictions are wrong, but some are useful. Unfortunately, I haven&#8217;t figured out which ones are useful, so I usually tell them I have no idea what will happen. </p><p>But I soon learned that knowledge isn&#8217;t a requirement for predictions. 
We make all kinds of assumptions and predictions without any basis in fact all the time, and maybe that&#8217;s okay.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/pdrmnvd/status/1608181283531952128?s=20&amp;t=xxCb2wy32ycGmvLtPYnl2g&quot;,&quot;full_text&quot;:&quot;data people: upon this initial analysis, given the data we have, it does appear that there is perhaps a correlation between these two outcomes that warrants a deeper investigation into causality. \n\nsales people: BRO WE ABSOLUTELY CRUSHED IT THIS YEAR, LETS GOOOOOO &#128640;&#128293;!!!!!!!!!!!&quot;,&quot;username&quot;:&quot;pdrmnvd&quot;,&quot;name&quot;:&quot;pedram.yml&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Wed Dec 28 19:21:11 +0000 2022&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:0,&quot;like_count&quot;:3,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:false}" data-component-name="Twitter2ToDOM"></div><p>So then I started to come up with a laundry list of predictions I had for the future, and thought, why not make this into a blog post, as I&#8217;ve been horrible at keeping up with this Substack? 
Perhaps I can trick these gentle readers into thinking I have actual content when all I have are screenshots of moments where I was funny.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9BJv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9BJv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png 424w, https://substackcdn.com/image/fetch/$s_!9BJv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png 848w, https://substackcdn.com/image/fetch/$s_!9BJv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png 1272w, https://substackcdn.com/image/fetch/$s_!9BJv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9BJv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png" width="477" height="186.3774834437086" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:906,&quot;resizeWidth&quot;:477,&quot;bytes&quot;:80576,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9BJv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png 424w, https://substackcdn.com/image/fetch/$s_!9BJv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png 848w, https://substackcdn.com/image/fetch/$s_!9BJv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png 1272w, https://substackcdn.com/image/fetch/$s_!9BJv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a7d450-26bb-4b07-b113-dfc6b964db1f_906x354.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>But, no, that felt wrong too. I have more than just screenshots to share with the world; I also have unfinished drafts. 
What if I could surreptitiously sneak in a few drafts and random screenshots together, would the reader even notice? Unlikely unless it was made incredibly obvious to them by some dim-witted narrator who wasn&#8217;t too careful with his (or her) words. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F5rL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F5rL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png 424w, https://substackcdn.com/image/fetch/$s_!F5rL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png 848w, https://substackcdn.com/image/fetch/$s_!F5rL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png 1272w, https://substackcdn.com/image/fetch/$s_!F5rL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F5rL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png" 
width="186" height="262.6842105263158" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:644,&quot;width&quot;:456,&quot;resizeWidth&quot;:186,&quot;bytes&quot;:78918,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F5rL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png 424w, https://substackcdn.com/image/fetch/$s_!F5rL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png 848w, https://substackcdn.com/image/fetch/$s_!F5rL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png 1272w, https://substackcdn.com/image/fetch/$s_!F5rL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3433969-c2ff-41a1-a5ef-2793837fbb89_456x644.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And so, kind reader, here is every thought I&#8217;ve had in 2022, gathered from drafts, from notes, from to-dos, from tweets, from Slack, from the withered remains of my memory. </p><div class="paywall-jump" data-component-name="PaywallToDOM"></div><p>If there&#8217;s one thing I can say about 2022, it&#8217;s that it certainly did happen, and it is almost, without a doubt, nearly over. And I have the data to prove it. </p><div><hr></div><h2>On Building a Brand</h2><p>Here&#8217;s the ugly truth. Being public and writing publicly has done more for my career than anything else. Never mind the books I read, the courses I took, the systems I built, the systems I destroyed, the resumes I wrote, the cover letters I didn&#8217;t, none of it has had the impact of being public and writing about myself. 
</p><p>Not to say any of that stuff isn&#8217;t necessary, but (in a language you nerds will understand) it isn&#8217;t sufficient. </p><p>Marketing, to some, is a dirty word. I&#8217;ve learned to embrace it. I am still in the unfortunate position of caring about job security, and being seen as someone who knows things is more important than knowing things. </p><p>Networking, to some, is a dirty word. I&#8217;ve learned to rephrase it. Networking, to me, is nothing more than being friends with people who work in the same industry as you. I&#8217;ve made a lot of friends this way. I&#8217;ve also made a few enemies.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> </p><p>There are no magic tricks, secret cabals, or mysterious meetings. I talk to people I&#8217;m genuinely interested in, meet them in person when I can, and treat them as well as I can. If we like each other, we become friends, and if we&#8217;re friends, we&#8217;ll help each other. That&#8217;s all networking is. 
</p><h5><em>Source: Substack post called &#8220;How I Learned to Stop Worrying and Love My Job&#8221;.<br>Status: Draft has now been deleted.</em></h5><h2>On Asking for Help</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IEA8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IEA8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png 424w, https://substackcdn.com/image/fetch/$s_!IEA8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png 848w, https://substackcdn.com/image/fetch/$s_!IEA8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png 1272w, https://substackcdn.com/image/fetch/$s_!IEA8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IEA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png" width="1262" height="186" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/c31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:186,&quot;width&quot;:1262,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50104,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IEA8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png 424w, https://substackcdn.com/image/fetch/$s_!IEA8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png 848w, https://substackcdn.com/image/fetch/$s_!IEA8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png 1272w, https://substackcdn.com/image/fetch/$s_!IEA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc31ab3de-26a1-4880-ab28-8260433f1180_1262x186.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">wonder why this one never got published</figcaption></figure></div><p>Learn to ask for help. This is somewhat related to the previous topic. 
Strangers don&#8217;t owe you anything, so if you&#8217;re asking them for help, please, learn to ask for help. </p><p>Here are some simple steps to follow:</p><ol><li><p>Acknowledge that you&#8217;re asking for someone&#8217;s time and mental energy. You may not get anything back, and that is okay. It is not a reflection on you or them. </p></li><li><p>Keep it succinct. Do not send a four-page essay to someone. If they have follow-up questions, they will ask you. Make it easy for them to say yes.</p></li><li><p>Show you&#8217;ve done the work. Have you thought carefully about why you&#8217;re reaching out to this person? Have you tried answering it on your own? Here&#8217;s a great example I made up just now: &#8220;Hey, Pedram, I&#8217;ve been thinking about switching from data science to data engineering, and you&#8217;ve talked publicly about having made that career switch before. I was wondering if you wouldn&#8217;t mind answering a few questions I had. I can send them here or by e-mail if that&#8217;s easier for you. Thanks for your time!&#8221;</p></li></ol><p>I honestly don&#8217;t mind helping people. I&#8217;ve been very fortunate in life and my career, and I want others to have that too. But please, make it easy for me to help you. Help me help you.</p><h5><em>Source: Substack post called &#8220;How to Ask for Help and How to Give It: A guide for everyone who's ever messaged me out of the blue&#8221;.<br>Status: Draft has now been deleted. </em></h5><h2>On Hiring Your First Data Role at a Startup</h2><p>I don&#8217;t think we have a good answer for this yet. Data is one of the loneliest positions at startups. Up there with Finance. A team of 1 for far too long, no one to talk to, no one to bounce ideas off of. </p><p>Whatever you do, don&#8217;t hire someone too junior. Data roles aren&#8217;t about data. They&#8217;re about negotiating between various teams about who gets credit for success and who gets blamed for failures. 
They&#8217;re about trying to get an organization aligned on what matters. How you measure something is more about the process than it is about a data pipeline. Finding holes in the process will be all you ever do.</p><p>Data roles can be extremely isolating, even in the best of times. Apart from a stack under constant evolution, data teams are often a single-person show, while their peers in engineering typically have multiple people they can rely on for everything from code reviews to mentorship to just ranting at each other about the state of JavaScript.</p><p>Not so with data, which is perhaps why communities like dbt and Locally Optimistic are so large and vibrant. When you don&#8217;t have anyone inside the company who can truly feel the pain of incompatible schema changes that weren&#8217;t communicated, a community of others who can understand your pains becomes very valuable. But having a community is no replacement for a leader, and that&#8217;s where I think most of the frustration data practitioners feel in their role comes from, especially for the first data hire at an early-stage company.</p><h5><em>Source: Substack post called &#8220;Hiring Your First Data Hire&#8221;.<br>Status: Draft has now been deleted. 
</em></h5><h2>On Talking About The Work</h2><p>When we talk about data, we talk about <a href="https://roundup.getdbt.com/p/four-frameworks-for-self-service">frameworks</a>, <a href="https://roundup.getdbt.com/p/a-re-examination-of-the-data-consumer">mental models</a>, <a href="https://locallyoptimistic.com/post/run-your-data-team-like-a-product-team/">designing</a> <a href="https://www.castordoc.com/blog/how-to-build-your-data-team">data</a> <a href="https://online.hbs.edu/blog/post/analytics-team-structure">teams</a>, <a href="https://www.google.com/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=&amp;cad=rja&amp;uact=8&amp;ved=2ahUKEwjIkbqK2OD4AhWkIkQIHaPtDHcQFnoECAUQAQ&amp;url=https%3A%2F%2Ftowardsdatascience.com%2Fthe-great-data-debate-unbundling-or-bundling-7d7721ee8514&amp;usg=AOvVaw3-TX_-Wvv6HZ7O5eoEzLRd">bundling</a>, <a href="https://www.google.com/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=&amp;cad=rja&amp;uact=8&amp;ved=2ahUKEwjIkbqK2OD4AhWkIkQIHaPtDHcQFnoECAcQAQ&amp;url=https%3A%2F%2Froundup.getdbt.com%2Fp%2Fbundled-or-unbundled-data-stack&amp;usg=AOvVaw27mGTgg1BsCKTS-M2Y86Iz">unbundling</a>, <a href="https://dagster.io/blog/rebundling-the-data-platform">rebundling</a>, <a href="https://duckdb.org/">databases</a>, <a href="https://roundup.getdbt.com/p/htap-databases">databases</a>, and more <a href="https://benn.substack.com/p/all-your-database-are-belong-to-us">databases</a>.</p><p>What we don&#8217;t talk about is the work because we can&#8217;t. Data work is private, secret, and sometimes legally-protected. Engineers get to write blog posts and release open-source software for the work they build. Data people get to talk about a tool they used, maybe a method without context, if you&#8217;re lucky. But never the journey that got us there. Just imagine this talk at a conference:</p><p>&#8220;We&#8217;re a mortgage company, and our processing rate was down 15% this quarter. 
We ran an analysis to identify the causes and found that it was because we were chronically understaffed during the summer months; however, after doing a cost-benefit analysis, we found that it was cheaper to have longer processing times than to hire additional staff to cover peak hours, so we decided to settle for reduced service levels. </p><p>In this talk, I will discuss how we discovered our findings and how I negotiated to present these results to our CFO without upsetting our partners in product and processing.&#8221;</p><p>I would love to hear that talk, but it will never happen. So instead, you get a talk on how we enabled self-serve analytics by buying a tool.</p><h5><em>Source: Substack post called &#8220;We Need to Talk about Data (but can't)&#8221;.<br>Status: Draft has now been deleted. </em></h5><h2>On Smelly Code</h2><p>Data Modeling Code Smells is my term for the stuff you write; as you write it, you tell yourself&#8230;this stinks. </p><p>Here&#8217;s a non-exhaustive list of smelly code when data modeling:</p><ol><li><p>Duplicated code</p></li><li><p>Too many lines</p></li><li><p>Too many columns (very wide models)</p></li><li><p>Excessive comments</p></li><li><p>Clean-up code in marts</p></li><li><p>Too much jinja</p></li><li><p>Inconsistent naming</p></li><li><p>Casts and Coalesces</p></li><li><p>Right Joins</p></li><li><p>Functions in Joins</p></li><li><p>Magic Variables</p></li></ol><h5><em>Source: Substack post called &#8220;Data Modeling Code Smells&#8221;.<br>Status: Draft has now been deleted. </em></h5><h2>On dbt Cloud&#8217;s Pricing Changes</h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/pdrmnvd/status/1604588431799046144?s=20&amp;t=xxCb2wy32ycGmvLtPYnl2g&quot;,&quot;full_text&quot;:&quot;I'm going to do it anyway. The only good pricing model is the one everyone is equally unhappy with. 
\n\nSeat-based models are notoriously difficult, and I imagine this is a stop-gap until dbt Cloud has more utility that warrants a consumption-based model.&quot;,&quot;username&quot;:&quot;pdrmnvd&quot;,&quot;name&quot;:&quot;pedram.yml&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Sun Dec 18 21:24:28 +0000 2022&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:0,&quot;like_count&quot;:9,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>I used to work at a bank, where I did compensation modeling. Once you see the world through the lens of incentives, your brain breaks, and you cannot see anything but incentives everywhere you look.</p><p>Pricing and incentives are among the most interesting parts of running any business. The pricing model you choose can make or break you. dbt Cloud&#8217;s decision to go with a seat-based pricing model for their cloud text editor/scheduler made sense at the time but locked them into a corner.</p><p>The best thing you can do is align your pricing structure with the value you create for your customers. If you can&#8217;t do that, then you might as well just make up numbers and call it enterprise pricing. This is a perfectly valid way of pricing your product; call it a platform fee.</p><p>Seat-based pricing is rarely a high-growth strategy. Unfortunately, when you raise VC Capital, you&#8217;re expected to have high growth until you exit (at which point you can cease growing altogether).</p><p>Consumption-based pricing is better, when it can work, and when it can be understood. Fivetran is one of the few MDS companies that makes money because it&#8217;s easy to get started and rack up a $100k annual spend without blinking. 
Snowflake isn&#8217;t hitting 166% NRR on seat-based pricing.</p><p>When your SaaS product costs the average company less than they spend on toilet paper, you&#8217;ve got a Shit Pricing Model&#8482;. When money is free, pricing doesn&#8217;t matter, but when money costs something, the Shit Pricing Model&#8482; needs a redo. It&#8217;s no surprise dbt decided to increase their prices; what is surprising is:</p><ol><li><p>They did it with very little notice at the end of the year.</p></li><li><p>They claimed that they were doing it because we asked them to. Just own up that you&#8217;re a business trying to make money; it&#8217;s not a crime. It&#8217;s an easier story to believe.</p></li><li><p>It came with no real increase in value to the customer. </p></li><li><p>It is relatively easy to trade their subpar experience for a home-grown subpar experience.</p></li></ol><p>Again, it all comes down to incentives. People who had no incentive to roll their own mini-scheduler now had a major incentive to switch off dbt Cloud. Going from $50 to $100 a month doesn&#8217;t really impact most teams (and doesn&#8217;t really generate any real revenue for dbt). But going from $5k a year to an enterprise contract because you have 9 analysts, well, now we have the incentive to try and build it in-house. Yikes!</p><p>I know I am not a CEO (wait, I am now), and it&#8217;s easy to criticize from the sidelines (but when has that ever stopped anyone?), so in the end, what I say doesn&#8217;t matter or mean much. But using what limited info I have (here comes that data-talk again), I would have used levers like metrics or dbt Server to push companies into enterprise rather than just seats, especially since those are much harder to build internally. Oh well, what do I know! </p><h5><em>Source: Twitter threads and Substack post called &#8220;Let&#8217;s Talk about Incentives Baby&#8221;.<br>Status: Draft has now been deleted. 
</em></h5><h2>100 Ways to Align with Business Outcomes</h2><p>Now that money isn&#8217;t free, it&#8217;s time to prove your data teams are worth something. Here are 100 ways to align your data team with business outcomes.</p><p>Actually, I only came up with 16 before running out of ideas. Sorry.</p><ol><li><p>Optimize marketing spend to increase conversions</p></li><li><p>Identify marketing channels to drop due to inefficient spend</p></li><li><p>Run A/B analysis on email campaigns to identify better messaging&nbsp;</p></li><li><p>Forecast end-of-quarter pipeline to identify sales targets&nbsp;</p></li><li><p> Calculate the true customer acquisition cost by channel and recommend ways to reduce expensive channels&nbsp;</p></li><li><p>Analyze inventory orders to identify wasted opportunities&nbsp;</p></li><li><p>Identify factors that lead to backlogs during peak demands season at processing centers&nbsp;</p></li><li><p>Create an LTV model for customers and build a process for continually updating it. 
Identify the best sources of high-LTV customers</p></li><li><p>Analyze and forecast infrastructure cost as a function of customer growth and recommend ways to prevent linear or higher growth.</p></li><li><p>Analyze funnel data to identify drop-off points and make recommendations on how to improve the experience to increase retention&nbsp;</p></li><li><p>Analyze support tickets for common themes and recommend product improvements to reduce tickets&nbsp;</p></li><li><p>Figure out how long it takes for leads to get to a first meeting and identify ways to highlight quality leads earlier to reduce that length&nbsp;</p></li><li><p>Create a lead scoring model and identify commonalities in top-scoring leads and recommend a nurture campaign to get them into conversations</p></li><li><p>Analyze cloud costs and identify underutilized resources to reduce cloud spend&nbsp;</p></li><li><p>Identify sales pipeline drop-off and break it down by factors to identify leaks in the sales process and how to plug them.</p></li><li><p>Analyze experimentation results to identify perverse incentives that may have arisen from gamed metrics and recommend guardrail metrics to protect against them&nbsp;</p></li></ol><h5><em>Source: LinkedIn and Notes<br>Status: Notes draft has now been deleted. </em></h5><h2>And last but not least, from my Notes Draft</h2><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text">Here comes the dag again
Failing all my jobs like a memory
Head in my hands like a new emotion
I want to work in the open source
I want to talk like lovers do
I want to dive into your data 
Is it activating with you
So baby talk to me
Like data do
Talk to me
Like lovers do
Talk to me
Like data do</pre></div><h5><em>Source: Notes<br>Status: Notes remains in case I ever produce this song </em></h5><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Feel free to ask me about this if you want to know the ugly side of being publicly known. I&#8217;ve had real repeated threats against me, people upset that they weren&#8217;t included in private conversations I&#8217;ve had with my friends, and more. There&#8217;s a real ugly side out there that I hope none of you ever experience.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Deep Dive: What the Heck is Entity Resolution]]></title><description><![CDATA[or record linkage, or identity mapping, or data matching.]]></description><link>https://databased.pedramnavid.com/p/entity-resolution</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/entity-resolution</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Fri, 11 Nov 2022 00:35:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Entity Resolution, Identity Mapping, Record Linkage, Data Matching, and Record Matching</em>. The names are many, but the concept is deceptively simple. In this Deep Dive, we'll look at Entity Resolution and some of its core components.&nbsp;</p><p>If you prefer video, I gave a talk on&nbsp;<a href="https://www.youtube.com/watch?v=cL2dBMuY2lw&amp;t=533s">Entity Resolution at dbt's office hours</a>&nbsp;2 years ago. 
If you prefer books, there's no other book I'd recommend more than&nbsp;<a href="https://www.amazon.com/Data-Matching-Techniques-Data-Centric-Applications/dp/3642430015/ref=sr_1_4?crid=CCBB9W3V844A&amp;keywords=entity+resolution&amp;qid=1668092009&amp;sprefix=entity+resolution%2Caps%2C166&amp;sr=8-4&amp;ufe=app_do%3Aamzn1.fos.006c50ae-5d4c-4777-9bc0-4513d670b6bc">Peter Christen's Data Matching.</a> If you prefer my Substack, you&#8217;re in the right place.</p><p>Let's dive in.</p><h2>What is Entity Resolution?</h2><p>Entity resolution is all about combining multiple records of things. There are two parts to entity resolution: first is the entity, and second is the record of that entity in some database.</p><p>The entity can be anything from a person to a company to a physical product. I'll use companies as examples here, but the underlying logic applies to any entity you want to dedupe.&nbsp;</p><p>A record of that entity might exist in a spreadsheet, a database, or across multiple databases.&nbsp;</p><p>What's important is that there is no unique identifier representing that entity. If you were trying to dedupe people and had their Social Security Number or another national identification number, then the problem would be relatively easy. However, absent a single unique indicator, if we want to match or dedupe these records, then we need a way to resolve them to a single entity: hence, entity resolution.</p><h2>An Illustrative Example</h2><p>Let's say you work at a small B2B company with data in various systems of record: your production database, your Salesforce instance, and several spreadsheets of data with leads captured at various events.&nbsp;</p><p>Anyone can sign in to your product by providing their company name and email address. Your Salesforce instance has accounts with company names, locations, websites, and contact information. 
The leads spreadsheet has similar information hand-captured by a person running the event.</p><p>It might look something like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8PHU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8PHU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png 424w, https://substackcdn.com/image/fetch/$s_!8PHU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png 848w, https://substackcdn.com/image/fetch/$s_!8PHU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png 1272w, https://substackcdn.com/image/fetch/$s_!8PHU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8PHU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png" width="1456" height="384" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:384,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:155037,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8PHU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png 424w, https://substackcdn.com/image/fetch/$s_!8PHU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png 848w, https://substackcdn.com/image/fetch/$s_!8PHU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png 1272w, https://substackcdn.com/image/fetch/$s_!8PHU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd40b2f7f-f6b5-4c9b-b87d-689781f1d704_1602x422.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>How do you decide whether or not these different records are part of the same underlying entity? You might keep it simple and decide that two entities are the same if they share the same website, but even websites change over time. 
While a simple solution may be sufficient, you're entering the realm of entity resolution if you're not satisfied with that.</p><h2>The 5 Steps to Entity Resolution</h2><p>This deep dive will go through the five steps of entity resolution. There is much more nuance and depth beyond this post, but this should be enough to get us started.</p><ol><li><p>Pre-processing</p></li><li><p>Indexing</p></li><li><p>Comparing</p></li><li><p>Classifying</p></li><li><p>Merging</p></li></ol><h2>Pre-processing</h2><p>Before we embark on our journey, starting with a good foundation is essential. As much as possible, we want to clean our underlying data. Everything from trimming extra whitespace to lowercasing all the characters, removing stop-words, or even stemming and&nbsp;<a href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">lemmatizing</a>&nbsp;is possible here. The specifics are highly context-dependent, and the process is always iterative. As you perform these steps on data samples, your appreciation for what you need to do to improve the quality of matches will increase, and you will refine your pre-processing.</p><p>You'll want to abstract the pre-processing steps as much as possible. Regular expressions can be convenient here, and knowing how to use them correctly can increase the performance of your system. 
For example, in Python,&nbsp;<a href="https://docs.python.org/3/howto/regex.html#compiling-regular-expressions">compiling your regular expression</a>&nbsp;before using them will improve performance, and taking advantage of&nbsp;<a href="https://docs.python.org/3/library/functools.html#functools.cache">a cache</a>&nbsp;to avoid repetitive computations can save significant time as you process millions of rows.</p><p>If you're using dbt, macros are helpful to reduce code duplication, and as you find incremental improvements, you only have to apply them in one place.</p><p>Some common pre-processing steps I've seen are:</p><ul><li><p>Making everything lowercase and removing whitespaces</p></li><li><p>Splitting an email into user and domain</p></li><li><p>Cleaning company names to remove stop words such as ... <em>Inc</em>. ... <em>LLC</em>, <em>The</em> .., <em>A</em> ...</p></li><li><p>Converting words such as null, na, n/a to actual NULLs</p></li><li><p>Filtering out demo/test/internal users</p></li><li><p>Parsing and cleaning websites and addresses</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ogec!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ogec!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png 424w, 
https://substackcdn.com/image/fetch/$s_!ogec!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png 848w, https://substackcdn.com/image/fetch/$s_!ogec!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png 1272w, https://substackcdn.com/image/fetch/$s_!ogec!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ogec!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png" width="1456" height="355" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:355,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129232,&quot;alt&quot;:&quot;Example dbt macro for cleaning company names&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Example dbt macro for cleaning company names" title="Example dbt macro for cleaning company names" 
srcset="https://substackcdn.com/image/fetch/$s_!ogec!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png 424w, https://substackcdn.com/image/fetch/$s_!ogec!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png 848w, https://substackcdn.com/image/fetch/$s_!ogec!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png 1272w, https://substackcdn.com/image/fetch/$s_!ogec!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5dd7f0-845b-4f03-a790-781ea98f1520_1894x462.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Example Macro for cleaning company names using Regular Expressions</figcaption></figure></div></li></ul><h2>Indexing and Blocking</h2><p>Once you have cleaned your data, the next step is to index the data to improve performance. Consider this example: you have 100,000 records in Database A and 10,000 in Database B, with no common indicator. How many comparisons are you performing if you look at the website and name?</p><div class="paywall-jump" data-component-name="PaywallToDOM"></div><p>The formula for this is <code>(m * n) * p</code>, where <code>m</code> and <code>n</code> are the number of records in each table, and <code>p</code> is the number of indicators. So we get 2,000,000,000 or two billion comparisons if we do the math, and that's on relatively small tables. 
You can see how this will not scale beyond a million records.&nbsp;</p><p>Our only hope is to reduce the search space. We can't compare every single record against every other, but we can compare within a subset with a three-step process of <strong>blocking</strong>, <strong>indexing</strong> and <strong>reverse-indexing.</strong>&nbsp;</p><p><strong>Blocking</strong> is taking your entire set of records and chunking it into smaller blocks to avoid comparing every value with every other value. So the obvious question is, how do you block records together when you don't yet know which ones match?&nbsp;</p><p>In the simplest example, you might block by comparing only records from the same country, state, or zip code, or you could look at the first letter of a name.&nbsp;</p><p>There are also algorithmic options. If your entities are names of people or companies, you could use various functions to reduce the search space by compressing information.&nbsp;</p><p>Soundex is one such algorithm, developed a century ago for use on names. It gives similar-sounding names the same code, can be used as a blocking key, and is supported in many data warehouses, including Snowflake.</p><p><code>select</code></p><p><code>soundex('pedram') as pedram,</code></p><p><code>soundex('pedrum') as pedrum,</code></p><p><code>soundex('peter') as peter,</code></p><p><code>soundex('pedro') as pedro</code></p><p><code>&gt; PEDRAM PEDRUM PETER PEDRO&nbsp;</code></p><p><code>&nbsp; &nbsp;p365   p365  p360  p360</code></p><p>You can see that Peter and Pedro are blocked together, and Pedram and Pedrum are as well. </p><p>There are many different blocking techniques,&nbsp;<a href="https://arxiv.org/pdf/1905.06167.pdf">and this survey paper</a>&nbsp;reviews many of them, but the principle behind them is essentially the same.&nbsp;</p><p>You could also use multiple blocking keys to improve accuracy. 
For example, you might run Soundex on first and last names and compare similar blocks across either first or last names. But, of course, the trade-off is always between performance and accuracy.&nbsp;</p><p>Once you've defined your blocking function, you can apply it to every database record. This step is called indexing. For example, suppose we ran every name through a Soundex function. Each record now has a Soundex code associated with it.</p><p><code>Record 1 - D130<br>Record 2 - D130<br>Record 3 - F235<br>Record 4 - F235<br>Record 5 - D130</code></p><p>Next, we combine all records with the same blocking key into a subgroup for comparison purposes. To do this, we rely on a reverse index: for every blocking key, identify all the rows that belong to that block.</p><p><code>D130: {1, 2, 5}</code></p><p><code>F235: {3, 4}</code></p><p>In doing so, we can efficiently work on a block of similar records, and can even distribute this work in parallel. Once we have our reverse index, we are ready to proceed to Comparing.</p><h2>Comparing and Classifying</h2><p>I group comparing and classifying into one topic here because they are interrelated. Comparing is the act of summarizing the similarity between two records, and classifying is the act of deciding whether two records are 'similar enough.'&nbsp;</p><p>There are many ways to compare two records. First, you can look at equality for any column, which is the most straightforward comparison. </p><p><code>case when a.name = b.name then 1 else 0 end</code></p><p>On strings, you can look at how similar they are using similarity functions such as the&nbsp;<a href="https://docs.snowflake.com/en/sql-reference/functions-string.html">edit distance</a>&nbsp;or the&nbsp;<a href="https://docs.snowflake.com/en/sql-reference/functions/jarowinkler_similarity.html">Jaro-Winkler similarity score</a>. </p><p>You can compare numbers by absolute or percent differences. 
You could look at dates, ages, times, or geographies.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W5gh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W5gh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png 424w, https://substackcdn.com/image/fetch/$s_!W5gh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png 848w, https://substackcdn.com/image/fetch/$s_!W5gh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png 1272w, https://substackcdn.com/image/fetch/$s_!W5gh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W5gh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png" width="1456" height="1162" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1162,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:530916,&quot;alt&quot;:&quot;Example code of scoring functions&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Example code of scoring functions" title="Example code of scoring functions" srcset="https://substackcdn.com/image/fetch/$s_!W5gh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png 424w, https://substackcdn.com/image/fetch/$s_!W5gh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png 848w, https://substackcdn.com/image/fetch/$s_!W5gh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png 1272w, https://substackcdn.com/image/fetch/$s_!W5gh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F94a419d4-7d37-4464-b718-ef00c8fd932f_2232x1782.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example scoring functions</figcaption></figure></div><p>To classify, you can average the results from above to generate a score for each record. You can weigh individual fields, for example, giving greater weight to the last name over the first name. You might decide on a threshold that balances false positives and negatives or lean on machine learning or other more advanced techniques for classification.</p><p>There's no perfect way to compare and classify, but the goal is always the same: to create a list of tuple pairs of matched records. </p><p>One complication: records A and B might match, and records B and C might match, but records A and C might not. 
Therefore after matching, you need to process all the records to perform a merge.&nbsp;</p><p>For example, suppose you have seven records and have compared them with some arbitrary matching formula. You end up with the following tuple pairs of matched records.</p><ul><li><p><code>{1, 2} </code></p></li><li><p><code>{3, 4}</code></p></li><li><p><code>{5, 6}</code></p></li><li><p><code>{1, 7}</code></p></li><li><p><code>{2} </code></p></li><li><p><code>{6, 5}</code></p></li><li><p><code>{7, 5}</code></p></li></ul><p>If we graphed these pairs, it would look something like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fztN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fztN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png 424w, https://substackcdn.com/image/fetch/$s_!fztN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png 848w, https://substackcdn.com/image/fetch/$s_!fztN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png 1272w, 
https://substackcdn.com/image/fetch/$s_!fztN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fztN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png" width="434" height="456.42894056847547" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:774,&quot;resizeWidth&quot;:434,&quot;bytes&quot;:56086,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fztN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png 424w, https://substackcdn.com/image/fetch/$s_!fztN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png 848w, https://substackcdn.com/image/fetch/$s_!fztN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png 
1272w, https://substackcdn.com/image/fetch/$s_!fztN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b4b1117-50c7-4282-9db1-850ca890727e_774x814.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>However, since some pairs share a common record, we need to connect these pairs. So how do we do that? 
We use the aptly named&nbsp;<a href="https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.components.connected_components.html">connected components algorithm!</a></p><p>Using this algorithm, we reduce the above example to two distinct entities:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Og5W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Og5W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png 424w, https://substackcdn.com/image/fetch/$s_!Og5W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png 848w, https://substackcdn.com/image/fetch/$s_!Og5W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png 1272w, https://substackcdn.com/image/fetch/$s_!Og5W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Og5W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png" width="394" height="320.50152905198775" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:532,&quot;width&quot;:654,&quot;resizeWidth&quot;:394,&quot;bytes&quot;:37643,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Og5W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png 424w, https://substackcdn.com/image/fetch/$s_!Og5W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png 848w, https://substackcdn.com/image/fetch/$s_!Og5W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png 1272w, https://substackcdn.com/image/fetch/$s_!Og5W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F03cd8208-403f-4c17-b515-562993c8ffdd_654x532.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We identified two entities using those seven records and are tantalizingly close to the end: our last step is merging records.</p><h2>Merging Records</h2><p>Given two or more records that refer to the same underlying entity, we must decide how to merge the information. For example, if the names are different, what do we keep?</p><p>The first approach is randomly picking one as the master record and using all the information in that one. 
This approach is the easiest and least subtle, but it can often be sufficient for our needs.</p><p>A better approach is to rank sources on a per-record or per-field basis. So, for example, we might pick first and last names from Database A but use addresses from Database B.&nbsp;</p><p>Both methods result in a loss of information, so another approach is to keep the union set: essentially, we keep all distinct elements across all records. At the very least, we want to keep the union set of table primary keys for better debugging.&nbsp;</p><p>Suppose Database A has a record with primary key 123 and Database B has a record with primary key 456; we might merge these two records such that the primary key field is now <code>{A: 123, B: 456}</code>.</p><p>Another option is to use ranges. If we are merging company information and have several different sources for the number of employees, we might include them as a range. If Record A had 100 employees, Record B had 250 employees, and Record C had 175, we might merge these three records as [100, 250].&nbsp;</p><p>You can imagine many other ways to merge records, but the goal is to preserve the right level of detail for your particular use case.&nbsp;</p><div class="community-chat" data-attrs="{&quot;url&quot;:&quot;https://open.substack.com/pub/pedram/chat?utm_source=chat_embed&quot;,&quot;subdomain&quot;:&quot;pedram&quot;,&quot;pub&quot;:{&quot;id&quot;:367470,&quot;name&quot;:&quot;Pedram's Data Based&quot;,&quot;author_name&quot;:&quot;Pedram Navid&quot;,&quot;author_photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6655dbab-8253-47c8-9717-f00b12dcc3b8_400x400.jpeg&quot;}}" data-component-name="CommunityChatRenderPlaceholder"></div><h2>Wrapping Up</h2><p>Once merged, your work is largely done. You have created a set of candidates, classified them, linked records, and merged them together, but your journey has just begun. There is much more in this field to learn. 
Decisions on blocking keys, classification methods, and supervised versus unsupervised learning are just a few of the topics left to explore. </p><p>You may also want to check out some libraries and products in this space, such as the Python <a href="https://recordlinkage.readthedocs.io/en/latest/index.html">Record Linkage</a> library and the many available <a href="https://arxiv.org/pdf/2008.04443.pdf">research papers</a> on this topic.</p><p>I hope you enjoyed this deep dive. If there&#8217;s any topic you&#8217;re interested in, <a href="mailto:pedram@pedramnavid.com">reach out</a> and let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[The Eternal Suffering of Data Practitioners: Part 1]]></title><description><![CDATA[A totally scientific extrapolation from a small and biased sample]]></description><link>https://databased.pedramnavid.com/p/the-eternal-suffering-of-data-practitioners</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/the-eternal-suffering-of-data-practitioners</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Mon, 31 Oct 2022 01:35:07 GMT</pubDate><enclosure url="https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/dd907ac9-5744-4030-9938-ca5d3e2811ba_964x964.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few weeks ago, I posted an opportunity for mentorship to data practitioners who needed someone to talk to. I received far more responses than I could ever hope to fulfill, but I read through every single one. 
Since I could only select three people to work with, I decided to pull out the main themes I saw and address them as best as possible.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/pdrmnvd/status/1581723505682325505?s=20&amp;t=0UtwCTpbJxgWFu4qWdNk3g&quot;,&quot;full_text&quot;:&quot;Been thinking a lot about how lonely the one-person-data-team can be.\n\nI want to give back: if you&#8217;re the only data person at a startup and want free mentoring, DM me. I have 3 slots open for monthly checkins, just have a short form to help me sort through requests.&quot;,&quot;username&quot;:&quot;pdrmnvd&quot;,&quot;name&quot;:&quot;pedram&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Sun Oct 16 19:07:25 +0000 2022&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:14,&quot;like_count&quot;:161,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:false}" data-component-name="Twitter2ToDOM"></div><h2>The Servicing Framework Gap</h2><p><em>"I'm the sole data person and consistently feel like I'm playing defense in service of countless requests. I don't have a great framework for answering questions as they come in."</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://databased.pedramnavid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Pedram's Data Based is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>This is a common refrain I've heard, especially as data teams are small, yet the number of stakeholders never seems to stop growing. Part of the problem is that the framing of the question implies that your job is to answer requests. When you're the only data person at the company, answering questions should only be a small part of your job.</p><p>Forget everything you know about data, and pretend for a moment that you are a senior executive. What would you spend your days doing? If you're an executive worth keeping, your job is to have maximum impact on the organization, given your limited availability. You should be laser-focused on providing&nbsp;<strong>maximum leverage.</strong>&nbsp;A VP of Sales doesn't spend all day doing outbound cold calls, even though they might be quite capable of closing deals. When they are on the phone, it's to get a big deal over the finish line right before the end of the quarter.</p><p>If you were the VP of Marketing and the product team came to you and said they needed a post in two days on some feature that was about to launch, and this was the first you had heard of it, what would you say? If you were told to send an outbound email and were given the subject line and content, would you take that well?</p><p><strong>"But I'm not a VP of Data. I'm a data engineer. I'm a data analyst. I'm the manager of analytics."</strong></p><p>Get over it. A title is what you list on your resume. Your job is what you make of it. Stop taking tickets if you don't want to be seen as a ticket-taker. 
</p><p>Okay, now that I've stepped off my soapbox, here's some actionable advice.</p><h3>Train Your Stakeholders</h3><p>If you're a victim of unreasonable requests, you first need to work on the supply side. I find that stakeholders don't know how to ask data teams for help. They'll do everything except what you want them to do, whether it's a quick question for a number without context or a detailed description of how to pull data along with every column required.</p><p>Kelly Burdine, in&nbsp;<a href="https://locallyoptimistic.slack.com/archives/C01RYCKG02U/p1666625061581429?thread_ts=1666621433.774719&amp;cid=C01RYCKG02U">a thread in Locally Optimistic</a>, shares her team's Acceptance Criteria for requests. These fields in an intake form are the first step in teaching people how to ask data teams for help.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KM5c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KM5c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png 424w, https://substackcdn.com/image/fetch/$s_!KM5c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png 848w, 
https://substackcdn.com/image/fetch/$s_!KM5c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png 1272w, https://substackcdn.com/image/fetch/$s_!KM5c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KM5c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png" width="790" height="223" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b587741d-37ae-44ae-8421-885bf939d4d6_790x223.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:223,&quot;width&quot;:790,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41271,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KM5c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png 424w, https://substackcdn.com/image/fetch/$s_!KM5c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png 848w, 
https://substackcdn.com/image/fetch/$s_!KM5c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png 1272w, https://substackcdn.com/image/fetch/$s_!KM5c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb587741d-37ae-44ae-8421-885bf939d4d6_790x223.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Kelly Burdine&#8217;s Acceptance Criteria from a Locally Optimistic Thread</figcaption></figure></div><p>In many ways, you need to act like a PM. I've asked the "so what problem are you trying to solve" question many times, and I've gotten nicknames based on it. It's tiring work, but it's a continuous reminder to teams that your job isn't to take orders; it's to help make impactful decisions using limited resources.</p><p>If a team can't articulate why they want access to data or what decisions they will make with the data, or if there's no clear indication that the request supports a goal or KPI, then it's time for the requestors to step back and spend more time thinking about what they want before coming to you. I know that sounds scary, but Hallowe&#8217;en is around the corner, so get comfortable with being spooked.</p><p>As your company and team grow, some type of intake form is inevitable. Still, you might be fine with a Slack channel for free-form question-asking or even email in the early days. A dedicated Slack channel for asking for help with data is a great place to generate awareness amongst your peers of the types of questions people ask. With Slack workflows, you can create simple templated forms to help structure the requests.</p><p>I recommend starting with the fewest questions possible and adding more over time. 
Striking the right level of friction is more art than science, although you can't go wrong with this question from Caitlin Hudon.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/beeonaposy/status/1300439462535667713?s=20&amp;t=k1TfEqE7LJaEvSmGcj_Kgg&quot;,&quot;full_text&quot;:&quot;This question (from our data team's intake form) has been helpful for clarifying expectations around analysis &quot;,&quot;username&quot;:&quot;beeonaposy&quot;,&quot;name&quot;:&quot;Caitlin Boo-don &#127875;&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Mon Aug 31 14:25:03 +0000 2020&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/EgwVarHXcAU5wIX.png&quot;,&quot;link_url&quot;:&quot;https://t.co/zikvBSHJhL&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:77,&quot;like_count&quot;:644,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h3>Prioritization</h3><p>Inevitably, you will get more requests than you can answer with the time you have. You also need to realize that your job has more facets than time spent responding to questions. A prioritization framework can become complicated, often looking at multiple dimensions, such as urgency and impact. But, in the early days, prioritization can be simpler: focus on one executive you're going to support and ignore the rest.</p><p>It all comes back to the role you're in. You're here to have maximum impact. Start with the CEO and work your way down. If you can align your work with the CEO, then there's little anyone can say or do to attack your priorities. That's not to say you can't help the rest of the organization, but the more you can align your work with helping the CEO make better decisions, the better. 
In doing so, making a case for scaling your team will become easier. More on that soon.</p><h2>Failures in Thinking Strategically </h2><p><em>"How do I think strategically? I never have the time."</em></p><p>Hopefully, I've convinced you that you need to act like an executive, even if you aren't one. If so, then this next argument will be even easier for me to make.</p><p><strong>If you are not spending time at work thinking strategically, you are not doing your job.</strong></p><p>What does strategic thinking mean? Everyone has their own definition of what strategy is. For me, a data team's strategy is the approach you will take to address the company's needs over the medium to long term.</p><p>It starts with thinking. <strong>If you do not have time to think, then you will not think strategically.</strong> Block time on your calendar. Get a notebook and leave your phone behind. Sit down and think for an hour about how well or poorly you are serving the company's goals and how you will serve them as it scales and grows. What is the vision for the company? What do the founders and executives talk about at all hands? What are the headwinds, and where are the pain points? Are we concerned with growth, retention, expansion, sales, churn, hiring, revenue, or costs?</p><p>Your strategy is both what you will focus on and what you will say no to. Be explicit, write it down, and share it with your boss. Get alignment.</p><p>Here's an example scenario.</p><p><em><strong>We have been growing successfully over the last two years. Still, sales cycles have increased over the past quarter, and we missed our revenue target for the first time. There's a clear focus in the market on cost containment. We are planning on raising funding over the next 12 months. 
Our focus as a company over the next year will be to shift from logos to revenue, improving our sales efficiency and reducing spend where possible.</strong></em></p><p><em><strong>My strategy will be to align with our VP of Sales to get her everything she needs to hit our revenue goals. Our VP of Marketing will be my next closest ally. I will work with the two of them to help sharpen our understanding of our inbound leads through enrichment, optimize our pipeline by analyzing sales rep performance, and help figure out which channels drive the most revenue and which channels are underperforming.&nbsp;I may need to invest in certain tools or even outsource some help, especially when it comes to marketing channels.</strong></em></p><p><em><strong>As I won't have time to service the needs of the rest of the organization, I'll need to hire an analyst or double down on self-service capabilities.</strong></em></p><p>When your strategy is set, you've solved many downstream problems for yourself. I'll often get a question like "what do I build first," "how do I plan for the future," or "how do I divide my time between answering questions and building scalable systems," but these questions are all symptoms of a lack of strategy.</p><p>Start with strategy, and the answers to these questions become more manageable. Without it, there's no framework for answering them well.</p><h2>Loneliness is such a sad affair</h2><p><em>And I can hardly wait to write SQL again</em>.</p><p>I can't count the times I've looked longingly at engineering teams as a sole data hire. They are reviewing each other's PRs and discussing trade-offs and architectural decisions. 
Meanwhile, I'm commenting on my GitHub issues, trying to act like I have friends.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JmAr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JmAr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png 424w, https://substackcdn.com/image/fetch/$s_!JmAr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png 848w, https://substackcdn.com/image/fetch/$s_!JmAr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png 1272w, https://substackcdn.com/image/fetch/$s_!JmAr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JmAr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png" width="481" height="466" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:466,&quot;width&quot;:481,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57482,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JmAr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png 424w, https://substackcdn.com/image/fetch/$s_!JmAr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png 848w, https://substackcdn.com/image/fetch/$s_!JmAr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png 1272w, https://substackcdn.com/image/fetch/$s_!JmAr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f4914e7-d74b-412d-b1f5-6cdb907db2aa_481x466.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>Loneliness is the one thing no one prepares you for as you build a career in data. It's why I think communities like dbt and Locally Optimistic flourish. We had to expand our reach to find just one other person who feels like we do. It's what makes Data Twitter, despite its flaws, so great. It's why I started a&nbsp;<a href="https://www.linkedin.com/groups/14127002/">LinkedIn group for data practitioners.</a></p><p>My only advice here is to engage in the communities and create your own. I wouldn't be where I am today if I wasn't relentless in the questions I ask. I learn more from others than I hope to learn on the job.</p><p>There's only so much time I have available. 
I made myself available to three people for mentorship, and there were 27 others I had to say no to.</p><p>That's why my last request is this: if you've benefited from a community and can dedicate time to even one person once a month, reach out and let me know. Hopefully, we can find more mentorship for the other 27.</p><h2>Coming Up</h2><p>That's it for Part 1. In Part 2, I'll cover three more common questions: scaling a team, building a data culture, and the ever-nagging question: "am I doing this right?"</p>]]></content:encoded></item><item><title><![CDATA[Deep Dive: What The Heck Is the Metrics Layer]]></title><description><![CDATA[also known as the semantic layer, previously known as the random queries in my BI tools]]></description><link>https://databased.pedramnavid.com/p/what-is-the-metrics-layer</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/what-is-the-metrics-layer</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Wed, 14 Sep 2022 19:36:06 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!0WpE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There has been a lot of buzz about the metrics layer. As always, we start with a little trip down memory lane:</p><p>In January 2021, Base Case (an investor in Supergrain, which was then a headless BI product) explored the future of headless BI as a solution to unbundling metrics from BI. In April 2021, Benn Stancil&nbsp;<a href="https://benn.substack.com/p/metrics-layer">made a case for the metrics layer</a>. In October that year, Drew&nbsp;<a href="https://github.com/dbt-labs/dbt-core/issues/4071">opened an issue</a>&nbsp;that generated more discussion. In December 2021, the metrics layer achieved&nbsp;<a href="https://www.youtube.com/watch?v=MdSMSbQxnO0&amp;ab_channel=dbt">keynote status</a>&nbsp;at dbt Coalesce (with a long journey through the history of standardization).</p><p>Since then, Supergrain has pivoted from headless BI to a marketing tool (ok,&nbsp;<em>a warehouse-native approach to customer engagement).</em>&nbsp;Transform, which was a metrics engine, is shifting toward self-serve BI.</p><p>dbt continues development on the metrics layer, later renamed the&nbsp;<a href="https://www.getdbt.com/blog/dbt-semantic-layer/">semantic layer</a>. 
In October at Coalesce, we&#8217;re likely to hear more on the metrics layer, although we&#8217;ve&nbsp;<a href="https://docs.getdbt.com/blog/getting-started-with-the-dbt-semantic-layer">gotten the occasional update</a>.</p><p>To date, Amit Prakash has done the best job exploring&nbsp;<a href="https://www.thoughtspot.com/blog/the-metrics-layer-has-growing-up-to-do">the metrics layer on ThoughtSpot&#8217;s blog.</a>&nbsp;In it, he describes six classes of metrics and three solutions for what a proper semantic layer could look like. I won&#8217;t go into all the details since the post already does a great job, and his writing is clear and approachable.</p><p>Instead, we&#8217;ll go one step closer to code and look at three implementations of the metrics layer and what a world without it looks like.</p><h2>The Activation Metric</h2><p>For the rest of this post, we&#8217;ll look at a metric that I think shows the true power of a well-defined metrics layer.</p><p>Pretend we&#8217;re a B2B SaaS company, where users can sign up for our product and belong to one or more workspaces. Each workspace has one or more users. A workspace is active if its users perform some activation event within 24 hours of workspace creation.</p><p>We&#8217;ll call the metric of interest&nbsp;<em>activation rate</em>, and we&#8217;ll define it as follows:</p><blockquote><p>The&nbsp;<em>activation rate</em>&nbsp;is the ratio of active workspaces to all workspaces over a certain period.</p></blockquote><p>More concretely, every day, we have a list of all workspaces and a flag for whether that workspace was active on that day or not. The count of all workspaces on that day is the total number of workspaces. The count of all workspaces where the flag is&nbsp;<em>true</em>&nbsp;is the count of active workspaces.</p><p>We may want to report on the activation rate daily, weekly, or monthly. 
In addition, we&#8217;ll want to know the change in the activation rate over time.</p><h2>First, in SQL</h2><p>Let&#8217;s define everything in SQL to get a baseline. We&#8217;ll start with a basic table:</p><pre><code>select reporting_day, workspace_id, is_active from workspace_details;

####

reporting_day | workspace_id | is_active
--------------|--------------|----------
2022-07-04    | 100          | true
2022-07-04    | 101          | false
...</code></pre><p>So far, so good. Now let&#8217;s count workspaces:</p><pre><code>select

reporting_day,
count(distinct workspace_id) as n_workspaces,
sum(case when is_active then 1 else 0 end) as n_active_ws

from workspace_details
group by 1

####

reporting_day | n_workspaces | n_active_ws
--------------|--------------|------------
2022-07-04    |       2      |         1
...</code></pre><p>Now, if we want to know the activation rate, we divide the active count by the total. For simplicity, we&#8217;ll pretend we&#8217;re using Snowflake, which lets us refer to column aliases defined earlier in the same select statement.</p><pre><code>select

reporting_day,
count(distinct workspace_id) as n_workspaces,
sum(case when is_active then 1 else 0 end) as n_active_ws,
n_active_ws / n_workspaces as activation_rate

from workspace_details
group by 1

####

reporting_day | n_workspaces | n_active_ws | activation_rate
--------------|--------------|-------------|---------------
2022-07-04    |       2      |         1   | 0.5
...</code></pre><p>So far, so good. We could take this SQL, create a dbt model, and then use any reporting tool to visualize the activation rate over time. We can even start looking at change over time. But, first, let&#8217;s make the activation rate easier to read by expressing it as a percentage rounded to two decimals.</p><pre><code>select

reporting_day,
count(distinct workspace_id) as n_workspaces,
sum(case when is_active then 1 else 0 end) as n_active_ws,

round(100 * (n_active_ws / n_workspaces), 2) as activation_rate,

activation_rate - lag(activation_rate) over(order by reporting_day) as abs_change,
round(100 * abs_change / lag(activation_rate) over(order by reporting_day), 2) as pct_change


from workspace_details
group by 1
order by 1

####

reporting_day | n_workspaces | n_active_ws | activation_rate | abs_change | pct_change
--------------|--------------|-------------|-----------------|------------|-----------
2022-07-04    |       2      |         1   | 50.00           | NULL       | NULL
2022-07-05    |       3      |         2   | 66.67           | 16.67      | 33.34

...</code></pre><p>We&#8217;ve made a ton of progress and haven&#8217;t needed to touch a metrics layer, so what&#8217;s the big deal? The real pain comes when your stakeholder now asks you for these numbers as weekly, monthly, and quarterly aggregates. Pain is imminent.</p><p>What&#8217;s worse, if your end-users don&#8217;t understand how these measures are defined, they might start doing silly things like this:</p><pre><code>select

date_trunc('month', reporting_day) as reporting_month,
avg(activation_rate) as avg_activation_rate

from...
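
-- For contrast, a hedged sketch of the safer monthly roll-up: sum the
-- components first, then divide. It assumes a hypothetical daily_activation
-- table (not defined in this post) holding the daily counts from the earlier
-- query as reporting_day, n_workspaces, and n_active_ws.
select

date_trunc('month', reporting_day) as reporting_month,
sum(n_active_ws) as n_active_ws,
sum(n_workspaces) as n_workspaces,
-- 100.0 keeps the division decimal on warehouses that truncate integers
round(100.0 * sum(n_active_ws) / sum(n_workspaces), 2) as activation_rate

from daily_activation
group by 1
order by 1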
</code></pre><p>Instead of finding the average over a period by adding the individual components and calculating the rate, they might average a ratio and end up with incorrect measures. Take the two days above: averaging the daily rates gives (50% + 66.7%) / 2 = 58.3%, while pooling the counts gives 3 active workspaces out of 5, or 60%. We don&#8217;t want that.</p><h2>Enter the Metrics Layer</h2><p>A metrics layer solves these and other problems. 
Let&#8217;s look at how Looker approaches this.</p><h3>Looker</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0WpE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0WpE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png 424w, https://substackcdn.com/image/fetch/$s_!0WpE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png 848w, https://substackcdn.com/image/fetch/$s_!0WpE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png 1272w, https://substackcdn.com/image/fetch/$s_!0WpE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0WpE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png" width="822" height="660" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:822,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:343550,&quot;alt&quot;:&quot;screenshot of the looker application&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="screenshot of the looker application" title="screenshot of the looker application" srcset="https://substackcdn.com/image/fetch/$s_!0WpE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png 424w, https://substackcdn.com/image/fetch/$s_!0WpE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png 848w, https://substackcdn.com/image/fetch/$s_!0WpE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png 1272w, https://substackcdn.com/image/fetch/$s_!0WpE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8a7b5b0-2c42-4f1b-b89c-67fb86dc0092_822x660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Looker has a syntax-aware UI with a great reference built-in as you code, making the development experience smoother.</figcaption></figure></div><p>In Looker, we define metrics in LookML files. We have a view, which represents a model of data, usually built from an existing table or view in the data warehouse. Here&#8217;s what that might look like:</p><pre><code>view: workspace_activation {
  sql_table_name: "METRICS"."WORKSPACE_ACTIVATION"
    ;;

  dimension_group: date {
    type: time
    timeframes: [
      raw,
      time,
      date,
      week,
      month,
      quarter,
      year
    ]
    sql: ${TABLE}."REPORTING_DATE" ;;
  }

  dimension: is_active_workspace {
    type: yesno
    sql: ${TABLE}."IS_ACTIVE_WORKSPACE" ;;
  }

  dimension: workspace_id {
    type: string
    primary_key: yes
    sql: ${TABLE}."WORKSPACE_ID" ;;
  }

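  # Assumes the table also has a WORKSPACE_NAME column; the filter below needs it.
  dimension: workspace_name {
    type: string
    sql: ${TABLE}."WORKSPACE_NAME" ;;
  }

  measure: count_workspaces {
    type: count_distinct
    description: "# of Workspaces"
    sql:  ${workspace_id} ;;
    # In a Looker filter expression, a leading "-" means "is not equal to".
    filters: [workspace_name: "-Demo Workspace"]
  }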

  measure: count_active_workspaces {
    type: count_distinct
    description: "# of Unique Workspaces Active within 1 Day"
    sql:  ${workspace_id} ;;
    filters:  [is_active_workspace: "yes"]
  }

  measure: activation_rate {
    type:  number
    sql:  ${count_active_workspaces} / ${count_workspaces} ;;
    value_format_name: percent_1
  }


}
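
# A hedged sketch of what declaring a join could look like. The
# workspace_billing view and its workspace_id dimension are hypothetical,
# not part of this post's model.
explore: workspace_activation {
  join: workspace_billing {
    type: left_outer
    relationship: many_to_one
    sql_on: ${workspace_activation.workspace_id} = ${workspace_billing.workspace_id} ;;
  }
}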
</code></pre><p>A lot is going on here, but the thing to notice is that we are defining measures as formulas, not as fully-formed SQL tables. We also specify what formatting to use on measures, how to aggregate them, and all the ways we want to break down our reporting date. LookML can likewise describe how to drill from aggregate views into detailed views, although this example leaves that out.</p><p>From this code, Looker can dynamically generate the SQL needed without us having to worry about different granularities.</p><p>We could go further and start defining joins between this table and other tables, for example if we wanted to break down workspaces by paid vs. not-paid or attribution category.</p><p>Without a metrics layer, we&#8217;d have to anticipate and perform all these joins upfront. With a metrics layer, we can specify relationships between tables and let the BI tool join as needed, only on the columns the user requests. As a result, our users never need to consider what types of joins to use.</p><h3>dbt Metrics</h3><p>Let&#8217;s try to do the same with the dbt metrics layer to better understand what we&#8217;ve got. The big caveat is that dbt metrics are not yet complete and are undergoing active development. So things may change, and the rough edges may take time to polish.</p><p>We&#8217;ll first define our metrics in the dbt yml file:</p><pre><code>metrics:
  - name: count_workspaces
    label: '# Workspaces'
    model: ref('active_workspace')
    type: count_distinct
    sql: workspace_id

    timestamp: reporting_day
    time_grains: [day, week, month]
    filters:
      - field: workspace_name
        operator: '!='
        value: "'Demo Workspace'"


  - name: count_active_workspaces
    label: 'Active Workspaces'
    model: ref('active_workspace')
    type: count_distinct
    sql: workspace_id

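    timestamp: reporting_day
    time_grains: [day, week, month]
    # Assumption: the model exposes an is_active flag. Without this filter,
    # the metric would count the same rows as count_workspaces.
    filters:
      - field: is_active
        operator: 'is'
        value: 'true'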

  - name: activation_rate
    label: 'Activation Rate'
    type: expression
    sql: " 100.0* {{ metric('count_active_workspaces') }} / {{ metric('count_workspaces') }} "
    
    timestamp: reporting_day
    time_grains: [day, week, month]</code></pre><p>And we&#8217;ll create a model that allows us to select from these new metrics:</p><pre><code>select * from 
  {{ metrics.calculate(
   [metric('count_workspaces'), metric('count_active_workspaces'), metric('activation_rate')], 
    grain='week',
  )}}</code></pre><p>There are a few key things to note here. In the Looker model, the organizing principle was a View, which contained dimensions, measures, and time-grains all within one namespaced View object. dbt took a different approach: metrics are self-contained units. Each metric must specify which dbt model it should run against, the timestamp column, and which time grains to support.</p><p>This approach is already leading us to some duplicated code: all three metrics above repeat the same timestamp and time grains.</p><p>Another point to consider is the expression metric. We can&#8217;t refer to another metric by name there; we need to wrap it in Jinja, leading to Jinja in YAML, which can be a parsing nightmare without a good IDE. While Looker&#8217;s IDE can parse, highlight, and show errors within your LookML code and expressions, we don&#8217;t have that level of tooling for dbt.</p><p>You&#8217;ll also note that I am defining how to query my metrics within dbt using the dbt_metrics package&#8217;s <code>metrics.calculate</code> macro. For now, there&#8217;s no support for reading these metrics outside of dbt itself, although dbt has partnerships with BI tools, and I expect they&#8217;ll be announcing better ways to interact with dbt&#8217;s metrics layer soon enough.</p><p>Filtering is clunkier. In Looker, we provide an array of expressions to filter on, while in dbt we build our filters as YAML, explicitly defining what operator to use.</p><p>One final observation: there is no support for joins. In Looker, you can define relationships between different tables and explicitly define which related views should be available to a user within an Explore. 
Until support for joins arrives in dbt, it&#8217;s hard to see any value in an isolated semantic layer.</p><h3>Lightdash</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!50En!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!50En!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png 424w, https://substackcdn.com/image/fetch/$s_!50En!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png 848w, https://substackcdn.com/image/fetch/$s_!50En!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png 1272w, https://substackcdn.com/image/fetch/$s_!50En!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!50En!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png" width="1281" height="929" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:929,&quot;width&quot;:1281,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:413592,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!50En!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png 424w, https://substackcdn.com/image/fetch/$s_!50En!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png 848w, https://substackcdn.com/image/fetch/$s_!50En!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png 1272w, https://substackcdn.com/image/fetch/$s_!50En!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9fbf4f60-22b9-4821-aa70-3a884c461415_1281x929.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">In Lightdash, the presentation layer is separate from the code. The UI is intuitive and easy to navigate, but changes to it require changing your dbt model code.</figcaption></figure></div><p>Lightdash is a BI tool that is tightly integrated within the dbt ecosystem. It offers two ways of expressing metrics: the first uses the native dbt metrics layer we discussed above. But beyond that, it also has a metrics implementation that you can leverage, which has some added benefits, such as joins and formatting.</p><p>Like dbt, you add your metrics implementation directly in your dbt yml file. However, the structure is a little different. The metrics are specified in a meta tag underneath the column. 
You can define multiple metrics related to a column in-line, and there is no need to duplicate dimensions across metrics.</p><pre><code>version: 2
models:
  - name: active_workspace
    columns:
      - name: reporting_day
        description: "Day of report"
        meta:
          dimension:
            type: date
      - name: workspace_id
        description: "The Id of the workspace"
        meta:
          metrics:
            count_workspaces:
              type: count_distinct
            count_active_workspaces:
              type: count_distinct
              sql: "case when is_active then workspace_id else null end"
            activation_rate:
              type: number
              sql: (1.0 * ${count_active_workspaces} / ${count_workspaces})
              round: 2
              format: percent
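
    # Hedged sketch: Lightdash joins are declared in the model-level meta.
    # The workspace_billing model and its workspace_id column are
    # hypothetical, not part of this post's project.
    meta:
      joins:
        - join: workspace_billing
          sql_on: ${active_workspace.workspace_id} = ${workspace_billing.workspace_id}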
</code></pre><p>We also have convenient helpers for rounding and formatting numbers. The templating is simplified, and there&#8217;s no reliance on dbt ref macros. We can directly specify a metric using the <code>${metric_name}</code> format.</p><p>There are some downsides to the integration with dbt. Namely, any change to your metrics requires a full dbt refresh, which can be slow. There&#8217;s also the question of where a metric belongs: not all metrics should live under a particular dbt column definition; perhaps a separate metric definition file could be more maintainable long-term.</p><p>That said, the reporting is simplified quite a bit. It&#8217;s easy to query the metrics using the Lightdash UI, and there&#8217;s no need to write custom code to fetch a metric. The downside is that your metrics are only accessible within Lightdash, although APIs that expose them to other tools could alleviate this. Given the open nature of the product, I wouldn&#8217;t be surprised if metrics became more accessible over time.</p><h2>What it all means</h2><p>All three tools have different trade-offs, and their strengths and weaknesses tell of the challenges a metrics layer faces. Looker deeply integrates its metrics layer within the Looker ecosystem. Dimensions and measures are defined within the same application, and Looker&#8217;s semantic understanding of LookML allows for a rich parsing and developer experience. Looker can write to Git for version control, but most development occurs within the Looker ecosystem.</p><p>Despite its strengths, there are also pitfalls. Measures defined within Looker are not easily accessed from outside it. While Looker exposes an API, we haven&#8217;t seen it become a standard metrics layer across the data stack, perhaps because the high entry price is prohibitive for smaller companies.</p><p>That said, a well-configured Looker instance can reduce the burden on data teams. 
Providing access to views your end-users can query without relying on data teams whenever you need just one more column can be powerful. That power has led to increased interest in a universal metrics layer solution.</p><p>With dbt, it&#8217;s clear that they are trying to stake their place within the data ecosystem as a natural fit for a universal metrics layer. Much of the modern data stack already integrates with dbt, and dbt is widely adopted and available to nearly any data team. However, dbt is also moving toward a cloud-based and server-based model, and full adoption of the metrics layer will likely involve some subscription requirements.</p><p>Pricing aside, the real challenge with dbt is delivering an ergonomic and performant solution. The current jinja/yaml-based definition of metrics, the lack of any significant development tooling, and a gap in features that would make it broadly applicable are still outstanding questions.&nbsp;</p><p>Since it&#8217;s been announced, there has been very little news, although there&#8217;s still active development. Just last week, dbt changed the API by renaming some fields. Unfortunately, this active development also makes it difficult to recommend. Without stability, data teams will not likely want to develop against it.</p><p>Lightdash is in an exciting place as well. In some ways, they are trying to integrate with dbt and find a way to develop their own metrics definitions apart from it. Too much reliance on dbt can bring challenges, especially as there&#8217;s no clear roadmap on where the metrics layer will be going. On the other hand, saving your metrics definition next to your dbt code can have a lot of ergonomic benefits. The outstanding question is whether other apps can leverage the metric definitions. 
If not, Lightdash may approach Looker-status, another BI silo for metrics.</p><p>So the real question I have is this:&nbsp;<strong>Can a metrics layer be universal enough to gain applicability across the data stack yet still be designed in such a way to be relevant to BI tools?</strong></p><p>We are still a ways off from having an answer to that question, but I&#8217;m excited to see how we get there.</p>]]></content:encoded></item><item><title><![CDATA[Deep Dive: What the heck is Airflow]]></title><description><![CDATA[This is the first installment in the Deep Dive series, where I go deep on a particular product or category.]]></description><link>https://databased.pedramnavid.com/p/deep-dive-airflow</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/deep-dive-airflow</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Mon, 22 Aug 2022 03:20:34 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!BrZY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc8da285-6e61-4ab9-ae37-5ba139a96ea2_1021x481.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the first installment in the Deep Dive series, where I go deep on a particular product or category. Some of these will be free, and some will be paid. This one is paid and was a special request by a paid subscriber. I hope you enjoy!</p><h2>A Short History of Orchestration</h2><p>Apache Airflow belongs to a class of tools called orchestrators, but to understand what it is and why people use it, we need to travel back a little bit to its origin at Airbnb.</p><p><a href="https://airflow.apache.org/docs/apache-airflow/stable/project.html">Airflow was created in 2014</a> and released in 2015 at Airbnb. The original blog <a href="https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8">announcing the release is still up</a> and is a good resource for reminding ourselves of where Airflow came from and what the world was like then.</p><p>At Airbnb, data engineers used tools like&nbsp;<a href="https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c">Apache Hive</a>&nbsp;as a data warehouse, with much of their infrastructure built on Hadoop and Spark. 
There were many problems to be solved and jobs to be done: data extraction, cleaning, quality checks, and long-term storage.</p><p>Airbnb was also performing a lot of computation. They needed to know everything from how guests felt about their accommodations to how their hosts felt about their guests. They needed to understand how well their recommendations were doing and whether their experiments were working well. They needed to compute sessions from all the clickstream data on both their app and the web.</p>
      <p>
          <a href="https://databased.pedramnavid.com/p/deep-dive-airflow">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Counting Things: Counting Users Part 2 ]]></title><description><![CDATA[come on get a little bit closer baby, cause tonight is the night]]></description><link>https://databased.pedramnavid.com/p/count-things-counting-users-part</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/count-things-counting-users-part</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Tue, 09 Aug 2022 03:49:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!T507!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://pedram.substack.com/p/counting-users">In my last post</a>, I walked through an example of counting people who visit your site, and the complexities that come with it. Next, we&#8217;ll explore what happens when a visitor becomes a user and emits a couple innocuous events.</p><h3>When Two Become One</h3><p>Let&#8217;s say Rachel visited your site over the past few months. For simplicity, she was kind enough to persist cookies, use the same device across both sites, and generally be friendly toward your site tracking. We are using something like <a href="https://www.rudderstack.com/">Rudderstack</a>, <a href="http://segment.io">Segment</a>, <a href="http://amplitude.com">Amplitude</a>, <a href="http://mixpanel.com">Mixpanel</a>, <a href="https://jitsu.com/">Jitsu</a>, or <a href="http://snowplowanalytics.com">Snowplow</a> for event tracking.</p><p>Rachel clicks the giant, blinking, iridescent &#10024;<strong>sign-up&#10024;</strong> button your growth team so thoughtfully placed in the middle of your website. She signs up with her email address and creates a password. Somewhere, a growth marketer wakes from her dreams. 
Success.</p><p>If your engineering team was kind and generous, they also instrumented the sign-up event and the subsequent sign-in, and now you have three types of events, and they might look like this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cAGj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cAGj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png 424w, https://substackcdn.com/image/fetch/$s_!cAGj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png 848w, https://substackcdn.com/image/fetch/$s_!cAGj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png 1272w, https://substackcdn.com/image/fetch/$s_!cAGj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cAGj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png" width="523" height="253" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:253,&quot;width&quot;:523,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34172,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cAGj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png 424w, https://substackcdn.com/image/fetch/$s_!cAGj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png 848w, https://substackcdn.com/image/fetch/$s_!cAGj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png 1272w, https://substackcdn.com/image/fetch/$s_!cAGj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fff13ede1-57fe-47de-a0d4-d7064504e39b_523x253.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>Our tracked events table, (let&#8217;s call it <code>tracks</code>), looks like this:</p><pre><code>anonymous_id | event_name | event_time | user_id  | source
123          | viewed_page| 2022-06-01 | null     | web
123          | signed_up  | 2022-08-01 | aaaa-111 | nodejs
123          | signed_in  | 2022-08-01 | aaaa-111 | nodejs</code></pre><p>A few things to note:</p><ol><li><p>Your old rows don&#8217;t get updated when new information arrives. In June, we didn&#8217;t know the user id of our visitor, since they had not signed up. But in August we did have that information.</p></li><li><p>Events take place in different contexts. The first event was emitted from your marketing website on the front-end. The second and third were server-side events from the backend, directly to your event-tracker. The first event may not always fire, depending on ad-blocking,  network blips or browser behaviour.</p></li></ol><p>These nuances will make the lives of your data practitioners hard, so it&#8217;s important to have lots of sympathy and moral support for them when they inevitably start working on sessionization. <em>Help is available, and they are not alone.</em></p><h3>What Can We Do With Events?</h3><p>Given just the three events, we can ask many different types of questions:</p><h4>Attribution</h4><p><em>What are the leading sources of user sign ups?</em></p><p><em>For people who signed up, what was the first page they visited on our marketing site? Or the last?</em></p><h4>Adoption</h4><p><em>How many people signed-up for my product each day? </em></p><p><em>How long does it take for an average visitor to sign-up?</em></p><p><em>What percent of visitors end up signing up for our product, and how does that change over time?</em></p><h4>Engagement</h4><p><em>How many people who sign-in to our product every day are new users? </em></p><p><em>How many of them are existing? </em></p><p><em>How many users stopped signing in? </em></p><p><em>How many users came back after a break?</em></p><div><hr></div><h3>Stitching User Events</h3><p>Before we can start chipping away at our newly formed backlog of questions we still have to solve the fundamental problem of <em>user stitching</em>. 
We want to associate every event we have with the user id, even if the user was not known until later.</p><p>Given the simplified example above, we can create a mapping of <code>anonymous_id &#8594; user_id </code>by using a <a href="https://docs.snowflake.com/en/sql-reference/functions-analytic.html">window function</a>. </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hTXq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hTXq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png 424w, https://substackcdn.com/image/fetch/$s_!hTXq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png 848w, https://substackcdn.com/image/fetch/$s_!hTXq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png 1272w, https://substackcdn.com/image/fetch/$s_!hTXq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hTXq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png" width="802" height="162" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:162,&quot;width&quot;:802,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18763,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hTXq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png 424w, https://substackcdn.com/image/fetch/$s_!hTXq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png 848w, https://substackcdn.com/image/fetch/$s_!hTXq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png 1272w, https://substackcdn.com/image/fetch/$s_!hTXq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F145e66d3-0a42-4d74-8693-8b4a38a6aebe_802x162.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">For simplicity, I&#8217;m using Snowflake syntax, with other implementations you may need to specify: <code>ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING </code>to look past the current row for the last value. </figcaption></figure></div><p>If you&#8217;re new to window functions, this can look daunting. Think of a window as a slice of a table. We want to operate on every row that has the same <code>anonymous_id</code>. In each slice, apply a function to get a result, and add it as a new column. In this case, we&#8217;re applying the <code>last</code> function, which finds the last row in that window. </p><p>Here&#8217;s a little illustration of how a window function might work. Start by taking the partition highlighted in yellow, then within each partition order by timestamp, and then take the last value in that partition, and use that as the result that fills each row. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T507!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T507!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png 424w, https://substackcdn.com/image/fetch/$s_!T507!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png 848w, 
https://substackcdn.com/image/fetch/$s_!T507!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png 1272w, https://substackcdn.com/image/fetch/$s_!T507!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T507!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png" width="1456" height="1172" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/df55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1172,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:468676,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T507!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png 424w, https://substackcdn.com/image/fetch/$s_!T507!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png 
848w, https://substackcdn.com/image/fetch/$s_!T507!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png 1272w, https://substackcdn.com/image/fetch/$s_!T507!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf55b494-298e-4abb-a299-bec34ab98cca_1844x1484.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Once the events have been stitched, we 
have our desired output: a mapping of anonymous id to user id. </p><p>Conversely, we also know that any anonymous id that isn&#8217;t mapped to a user id has not signed up, or cannot otherwise be identified.<br></p><pre><code>anonymous_id | user_id 
123          | aaaa-111
124          | aaaa-111
125          | aaaa-111
234          | bbbb-222 
789          | NULL      &lt;- this person has never signed up</code></pre><p>With the above, we can now start to chip away at our questions from before. </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://databased.pedramnavid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Liking what you read? Data Based is only possible because of the support of subscribers. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h4>Attribution</h4><p><em>What are the leading sources of user sign ups?</em></p><p><em>For people who signed up, what was the first page they visited on our marketing site? Or the last?</em></p><p>What we know:</p><ul><li><p>all the visitors to our websites and how they got there (from <a href="https://pedram.substack.com/p/counting-users">Counting Users, Part 1</a>)</p></li><li><p>which visitors eventually became users</p></li></ul><p>So naturally,  we can look at where visitors came from before signing up to understand where our sign-ups come from.</p><p>There are many models for attributing sign-ups to visits (all of them bad,  some of them useful). The simplest ones look at either the first, or last thing someone did before they performed a conversion. Consider the following events:</p><pre><code>anonymous_id| path   | utm        | referrer | event_time
123         | /      | cpc-google | google   | 2 days ago
123         | /blog/ | NULL       | bing     | yesterday</code></pre><p>We start by joining <code>anonymous_id</code> above to the <code>stitched</code> table we created in the previous step, and let&#8217;s say we find that the anonymous user <code>123</code> is actually user <code>aaaa-111</code>.</p><p>Pretend we also have a table that tells us when the user signed up, and it was <code>today</code>. We can either give credit to a cost-per-click advertisement (first-touch attribution) or to our blog (last-touch attribution). </p><p>We can get more complex if we wish. Maybe we only want to look back a certain number of days. For example, does it make sense to give credit to a paid ad from 18 months ago if someone signs up today? </p><p>We might want to categorize different types of web traffic according to some rules by bucketing similar traffic together, such as social media sources. (There&#8217;s a great <a href="https://github.com/dbt-labs/segment/blob/main/seeds/referrer_mapping.csv">dbt seed file</a> in the segment package that helps with this.)</p><p>We may want to go beyond attributing conversion events, to better understand what brings new visitors to our site for the first time. For every user, we could look at the first page they visited and categorize that traffic to understand &#8216;landing pages&#8217;.</p><p>Hard to believe, but the answer to every single one of these questions starts with just a few events and stitching.</p><h4>Adoption</h4><p><em>How many people signed-up for my product each day? </em></p><p><em>How long does it take for an average visitor to sign-up?</em></p><p><em>What percent of visitors end up signing up for our product, and how does that change over time?</em></p><p>Understanding adoption is also made possible by the same types of events we used for attribution. If we look only at the sign-in events, we can count how many people sign in to our product each day like so:</p><pre><code>select 
date_trunc('day', timestamp) as event_day,
count(distinct user_id) 

from tracks
where event_name = 'signed_in'
group by event_day</code></pre><p>This query counts the number of distinct users within a specified time period. We use count distinct because a single user often has multiple events a day.</p><p>If we want to know how long it takes for a visitor to sign-up, we can look at the time elapsed between their first visit, and their sign-up event.</p><pre><code>with conversions as (
  /* Assume one signed-up event per user for simplicity */
    select
    user_id,
    timestamp as signup_date

    from tracks 
    where event_name = 'signed_up'

)

select distinct

stitched.user_id,
first_value(tracks.timestamp) over (
    partition by stitched.user_id
    order by tracks.timestamp
) as first_event_date,
conversions.signup_date,
datediff('days', first_event_date, conversions.signup_date) as days_to_signup

from tracks
join stitched using (anonymous_id)
/* bring in each user's sign-up date from the conversions CTE above */
join conversions on conversions.user_id = stitched.user_id</code></pre><p>We use a window function again, this time to get the first event. We count the days between the first event and the conversion to see how long it takes for someone to sign up.</p><p>We can also perform a very rudimentary funnel analysis by counting the number of new visitors and sign-ups each day. </p><p>To help us, let&#8217;s imagine a helper column called <code>blended_user_id</code>. It is the user id if it&#8217;s known, or the anonymous id if not. </p><p>We find the first event ever for a particular blended user id, and then find the first sign-up event for each user. Count the number of times each of those events happens, every day, and get a funnel count of visitors &#8594; users.</p><pre><code>with visitors as (
    /* the first event ever seen for each blended user id, counted by day */
    select

    date_trunc('day', first_event_at) as day,
    count(*) as new_visitors

    from (
        select blended_user_id, min(timestamp) as first_event_at
        from stitched_tracks
        group by blended_user_id
    ) as first_events
    group by day
),

signups as (

    /* the first sign-up event for each blended user id, counted by day */
    select

    date_trunc('day', first_signup_at) as day,
    count(*) as new_signups

    from (
        select blended_user_id, min(timestamp) as first_signup_at
        from stitched_tracks
        where event_name = 'signed_up'
        group by blended_user_id
    ) as first_signups
    group by day
)

select 

day,
new_signups,
new_visitors

from visitors
full join signups using (day)</code></pre><h4>Engagement</h4><p><em>How many people who sign in to our product every day are new users? </em></p><p><em>How many of them are existing? </em></p><p><em>How many users stopped signing in? </em></p><p><em>How many users came back after a break?</em></p><p>We can even start to get into some fun churn and retention analysis. One really simple (and not very useful) way to measure churn might be to count:</p><ul><li><p>Anyone who signed in today who also signed in yesterday (retention)</p></li><li><p>Anyone who signed in yesterday who didn&#8217;t sign in today (churn)</p></li></ul><p>We&#8217;re using some really fun SQL now, by joining a single table to itself and offsetting the day in the join condition.</p><pre><code>with daily_activity as (
  select distinct
    date_trunc('day', timestamp) as day,
    user_id
  from tracks
  where user_id is not null
),

retained as (
select
  today.day,
  count(distinct today.user_id) as retained
from daily_activity today
join daily_activity yesterday
  on today.user_id = yesterday.user_id
  and today.day = yesterday.day + interval '1 day'
group by today.day
),

churned as (
select
  yesterday.day + interval '1 day' as day,
  count(distinct yesterday.user_id) as churned
from daily_activity yesterday
left join daily_activity today
  on today.user_id = yesterday.user_id
  and today.day = yesterday.day + interval '1 day'
where today.user_id is null
group by 1
)

select 
day,
coalesce(retained, 0) as retained,
coalesce(churned, 0) as churned

from retained
full join churned using (day)
order by 1
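;

/* A possible extension, sketched here rather than taken from the Sisense
   post: count "reactivated" users, i.e. anyone active today whose previous
   active day was more than one day ago. It assumes the same tracks table
   and rebuilds the daily_activity CTE; with_previous_day and reactivated
   are illustrative names. Brand-new users have no previous day, so the
   filter below leaves them out. */
with daily_activity as (
  select distinct
    date_trunc('day', timestamp) as day,
    user_id
  from tracks
  where user_id is not null
),

with_previous_day as (
  select
    day,
    user_id,
    lag(day) over(partition by user_id order by day) as previous_day
  from daily_activity
)

select
  day,
  count(distinct user_id) as reactivated
from with_previous_day
where datediff('day', previous_day, day) &gt; 1
group by day
order by 1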
</code></pre><p>This example was heavily inspired by the <strong><a href="https://www.sisense.com/blog/use-self-joins-to-calculate-your-retention-churn-and-reactivation-metrics/">Sisense blog</a></strong>, so feel free to give it a read to really understand what&#8217;s going on. Don&#8217;t sweat it if this one makes your head hurt; the goal here is really to show you how much you can do with just a couple of events.</p><p>I hope this was a useful foray into the depths you can go to with event streams. The world only gets more complicated from here as you try to do things like tie ad spend to revenue by connecting Salesforce Accounts to Product Signups through intermediary tables. Yuck! Let&#8217;s pretend we never spoke of such things.</p><div><hr></div><p>Did you enjoy this post? Do you have ideas for future metrics to cover? Maybe you think Cohort Analysis is something you&#8217;ve always wanted to learn more about, or you think there&#8217;s nothing hotter than a well-defined activation metric. 
Well, leave a comment or <a href="mailto:pedram@pedramnavid.com">drop me an email!</a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Counting Things: Counting Users Part 1]]></title><description><![CDATA[one of the easiest things to define]]></description><link>https://databased.pedramnavid.com/p/counting-users</link><guid isPermaLink="false">https://databased.pedramnavid.com/p/counting-users</guid><dc:creator><![CDATA[Pedram Navid]]></dc:creator><pubDate>Sat, 23 Jul 2022 22:45:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3lFY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a new series as part of my promise to write more in-depth content for people who care about data. Every [irregular interval] I will cover a metric that we all know and love and go deep on it. The goal is not just to define metrics, but to show the thought process that can go into them. If you&#8217;re a data practitioner, I hope you&#8217;ll learn something new. If you work with data practitioners, I hope you learn the value that data teams can bring to an organization.</p><div><hr></div><p>It&#8217;s a question as old as time: how many users do we have? Well, that depends on what you mean by &#8216;users&#8217;, and &#8216;we&#8217;, and &#8216;have&#8217;. </p><p>If I were to press you to define them, you might have a few definitions for each. 
For example,</p><p>A user is:</p><ul><li><p>Someone who visited our website, or</p></li><li><p>Someone who has logged in to our application</p></li><li><p>Any account or customer within our CRM</p></li></ul><p>We might mean:</p><ul><li><p>All the teams at our company</p></li><li><p>Anything that the marketing team is responsible for</p></li><li><p>Anyone that sales knows about</p></li></ul><p>Have could mean:</p><ul><li><p>The number of users we have today</p></li><li><p>The number of users we&#8217;ve ever had</p></li><li><p>The number of users we&#8217;ve had on a given day, at that point in time, and subsequently into the future</p></li></ul><p>Let&#8217;s dig into the first one for now. We&#8217;ll return to the second one later.</p><h2>How many people visited our website?</h2><p>Let&#8217;s take the first one: how do you know when someone visits our website? Well, we have event tracking, so our event tracker can tell us when someone views any of our pages. But what data does an event tracker provide? Let&#8217;s take <a href="https://www.rudderstack.com/docs/destinations/warehouse-destinations/warehouse-schema/#standard-rudderstack-properties">Rudderstack&#8217;s standard schema</a> and explore it further. When you save their data to your warehouse, you get something like this:</p><ul><li><p>anonymous_id: The user&#8217;s anonymous ID</p></li><li><p>event: the name of the event</p></li><li><p>context_ip: The IP address of the device </p></li><li><p>context_&lt;props&gt;: Additional properties on the event</p></li><li><p>id: the event&#8217;s unique id</p></li><li><p>url / path: the URL and path where the event was captured</p></li><li><p>timestamps: various timestamps with slight nuances that don&#8217;t matter here</p></li></ul><p>It seems we&#8217;re in the clear. 
If we want to know how many people visited our website, we can just<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> count the distinct number of anonymous IDs. This data stuff is easier than you think!</p><p>But, let&#8217;s push our curiosity out a little more. What..is an anonymous id? Well, we don&#8217;t have to go very far to find out. Rudderstack is open-source so we can <a href="https://github.com/rudderlabs/rudder-sdk-js-autotrack/blob/0e249fc65b4f36646047dacf9462cf2fb65fd2b8/analytics.js">find out for ourselves.</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3lFY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3lFY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png 424w, https://substackcdn.com/image/fetch/$s_!3lFY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png 848w, https://substackcdn.com/image/fetch/$s_!3lFY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3lFY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3lFY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png" width="637" height="443" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/09cafd56-ce17-46db-9807-7b461bf54569_637x443.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:443,&quot;width&quot;:637,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53895,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3lFY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png 424w, https://substackcdn.com/image/fetch/$s_!3lFY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png 848w, https://substackcdn.com/image/fetch/$s_!3lFY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3lFY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09cafd56-ce17-46db-9807-7b461bf54569_637x443.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In here, <code>storage</code> refers to the device&#8217;s local storage where cookies are saved. If the ID already exists, then your anonymous id is what it was previously. 
But if not, then <a href="https://github.com/rudderlabs/rudder-sdk-js/blob/480d37d3dc1119de42c13fbbb3e836f967236fc8/utils/utils.js#L37">Rudderstack will generate one for you</a> and save it to your cookies.</p><p>Interesting, so if you clear your cookies, then you get a new anonymous id. If you use a different browser, you get a new anonymous id. If you use private mode, you get a new anonymous id. If you use a different device, like your laptop, your work laptop, or your phone, then you get a new anonymous id. And with the war on cookies from Safari and Firefox, this problem is getting worse. Turns out one person can have many different anonymous ids. </p><p>Well, what about the IP address? Couldn&#8217;t we just use that to dedupe? Let&#8217;s think a bit more about that one too. How do devices get an IP address? Without diving too deep: from a router. But multiple people connect to the same router, especially at work, or at school, or at the airport, or on public wifi. We could have many, many different people all on the same IP address. </p><p>Wow, maybe counting things isn&#8217;t so easy after all?</p><p>So what do we do? <em>Well, in the absence of the right answer, we often have to make do with a good enough answer.</em> Let&#8217;s say that when we count visitors to our website, we will count the distinct anonymous ids, knowing full well that this number overstates the true number of people visiting our website.</p><p>Our final code might look something like this:</p><pre><code>select count(distinct anonymous_id) from events;</code></pre><h3>How many net new people visited our website every week?</h3><p>Okay, so we know how many &#8216;people&#8217; visited our website. But that&#8217;s not actionable data. Is 20,000 good? Is 50,000 bad? What we almost always want with analytics is understanding change over time.</p><p>The first question to ask is which time grain we should use to break down our data. We can count the number of users every minute, hour, day, week, month, or year. </p><p>Picking the right time grain is context-dependent, but the choice revolves around having enough time for the noise to level off but not so much time that the results are not actionable. Also consider how often you will look at the data. Looking at data daily or more frequently than that is not healthy, or good for the soul, and I&#8217;m all about making sure you&#8217;re living a healthy, happy life.</p><p>Daily data is subject to fluctuations based on weekends and holidays. Monthly data is smoother, but it lacks immediacy. Do you want to wait 20 days to find out your website broke on the 10th? Let&#8217;s go with weekly: it smooths out the weekends and provides a nice balance.</p><p>The simple approach to counting people by week might look something like this:</p><pre><code>select 

date_trunc('week', timestamp) as week, 
count(distinct anonymous_id) as visitors 

from events
group by 1</code></pre><p>What we end up measuring here is the number of unique visitors to our website, every week. If Pedram and Claire both visit the website every week, but no one new shows up, we&#8217;ll have a steady rate of 2 weekly users. Fine, but not exciting enough.</p><p>What we&#8217;re interested in is how many new people we are bringing to our website. We want new people joining so we can create a healthy top-of-funnel pipeline to drive our marketing and sales motions. Without new people visiting, we&#8217;ll run out of sales, our company will die, and we will be sad forever. We&#8217;re all about happiness here.</p><p>So instead, let&#8217;s find out when we first saw a user:</p><pre><code>select

date_trunc('week', timestamp) as week,
anonymous_id,
row_number() over(partition by anonymous_id order by timestamp) as event_date_index

from events;</code></pre><p>This cute little row_number function does nothing more than count from 1 all the way down until there are no more rows. But the magic is in the partition. A partition is nothing more than a group, so we&#8217;re asking our little function to number every user&#8217;s events separately, from their 1st visit to their last. (We ordered from oldest visit to newest, but could also have done it in descending order with <code>order by timestamp desc</code>.)</p><p>Now we can do something fun. We can find the first time a user visited our website by filtering that previous query.</p><pre><code>with numbered_events as (
  select

  date_trunc('week', timestamp) as week,
  anonymous_id,
  row_number() over(partition by anonymous_id order by timestamp) as event_date_index

  from events
)

select 

week, 
count(anonymous_id) as new_visitors

from numbered_events
where event_date_index = 1
group by 1</code></pre><p>We use a CTE<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> because we can&#8217;t filter on <code>row_number</code> using WHERE or HAVING<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p><p>So we answered our first question and dug into it a little bit. We have a better sense of the limitations of the data, and why it&#8217;s hard, but that doesn&#8217;t mean it&#8217;s not useful. Every week we can keep an eye on our overall users and see how the number is trending. We might even take our models further and use things like the referrer and UTM parameters to better understand not just how many users we have, but where they come from! On to our next question.</p><h3>Next Time: How Many People Use Our Product?</h3><p>Now that we&#8217;ve solved the first question, in the next post we&#8217;ll dig into our product itself. There, our users have authenticated, so counting should be easier. But we might end up with some more interesting questions, like how many of them use our product every day? </p><p>Did you enjoy this post? Do you have ideas for future metrics to cover? Maybe you think Cohort Analysis is something you&#8217;ve always wanted to learn more about, or you think there&#8217;s nothing hotter than a well-defined activation metric. Well, leave a comment or <a href="mailto:pedram@pedramnavid.com">drop me an email!</a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Whenever I use the word <em>just</em> to show how easy something is, the thing I&#8217;m describing is actually really hard.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>A CTE, or Common Table Expression, is a way of taking a snippet of SQL, putting it in a little metaphorical box, and giving it a name so you can reuse it later. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>We can use QUALIFY in Snowflake, but not every database supports that clause.</p></div></div>]]></content:encoded></item></channel></rss>