Whenever I speak to an investor, they ask me to predict the future of data. As a trained data scientist, I know all predictions are wrong, but some are useful. Unfortunately, I haven’t figured out how to tell which ones are useful, so I usually tell them I have no idea what will happen.
But I soon learned that knowledge isn’t a requirement for predictions. We make all kinds of assumptions and predictions without any basis in fact all the time, and maybe that’s okay.
So I started drawing up a laundry list of predictions for the future and thought: why not turn this into a blog post, since I’ve been horrible at keeping up with this Substack? Perhaps I can trick these gentle readers into thinking I have actual content when all I have are screenshots of moments where I was funny.
But, no, that felt wrong too. I have more than just screenshots to share with the world; I also have unfinished drafts. If I surreptitiously snuck a few drafts in alongside some random screenshots, would the reader even notice? Unlikely, unless it were made incredibly obvious to them by some dim-witted narrator who wasn’t too careful with his (or her) words.
And so, kind reader, here is every thought I’ve had in 2022, gathered from drafts, from notes, from to-dos, from tweets, from Slack, from the withered remains of my memory.
If there’s one thing I can say about 2022, it’s that it certainly did happen, and it is almost, without a doubt, nearly over. And I have the data to prove it.
On Building a Brand
Here’s the ugly truth. Being public and writing publicly has done more for my career than anything else. Never mind the books I read, the courses I took, the systems I built, the systems I destroyed, the resumes I wrote, the cover letters I didn’t; none of it has had the impact of being public and writing about myself.
That’s not to say that stuff isn’t necessary, but (in a language you nerds will understand) it isn’t sufficient.
Marketing, to some, is a dirty word. I’ve learned to embrace it. I am still in the unfortunate position of caring about job security, and being seen as someone who knows things is more important than knowing things.
Networking, to some, is a dirty word. I’ve learned to rephrase it. Networking, to me, is nothing more than being friends with people who work in the same industry as you. I’ve made a lot of friends this way. I’ve also made a few enemies.[1]
There’s no magic trick, secret cabal, or mysterious meetings. I talk to people I’m genuinely interested in, meet them in person when I can, and treat them as well as I can. If we like each other, we become friends, and if we’re friends, we’ll help each other. That’s all networking is.
Source: Substack post called “How I Learned to Stop Worrying and Love My Job”.
Status: Draft has now been deleted.
On Asking for Help
Learn to ask for help. This is somewhat related to the previous topic. Strangers don’t owe you anything, so if you’re asking them for help, please learn how to ask.
Here are some simple steps to follow:
Acknowledge that you’re asking for someone’s time and mental energy. You may not get anything back, and that is okay. It is not a reflection on you or them.
Keep it succinct. Do not send a four-page essay to someone. If they have follow-up questions, they will ask you. Make it easy for them to say yes.
Show you’ve done the work. Have you thought carefully about why you’re reaching out to this person? Have you tried answering your question on your own? Here’s a great example I made up just now: “Hey, Pedram, I’ve been thinking about switching from data science to data engineering, and you’ve talked publicly about having made that career switch before. I was wondering if you wouldn’t mind answering a few questions I had. I can send them here or by e-mail if that’s easier for you. Thanks for your time!”
I honestly don’t mind helping people. I’ve been very fortunate in life and my career, and I want others to have that too. But please, make it easy for me to help you. Help me help you.
Source: Substack post called “How to Ask for Help and How to Give It: A guide for everyone who's ever messaged me out of the blue”.
Status: Draft has now been deleted.
On Hiring Your First Data Role at a Startup
I don’t think we have a good answer for this yet. Data is one of the loneliest positions at a startup, up there with finance: a team of one for far too long, with no one to talk to and no one to bounce ideas off of.
Whatever you do, don’t hire someone too junior. Data roles aren’t about data. They’re about negotiating between teams over who gets credit for successes and who gets blamed for failures. They’re about trying to get an organization aligned on what matters. How you measure something is more about the process than it is about a data pipeline. Finding holes in the process will be all you ever do.
Data roles can be extremely isolating, even in the best of times. On top of working with a constantly evolving stack, data teams are often a one-person show, while their peers in engineering typically have multiple people they can rely on for everything from code reviews to mentorship to just ranting at each other about the state of JavaScript.
Not so with data, which is perhaps why communities like the dbt Community and Locally Optimistic are so large and vibrant. When you don’t have anyone inside the company who can truly feel the pain of incompatible schema changes that weren’t communicated, a community of people who understand your pain becomes very valuable. But a community is no replacement for a leader, and I think that’s where most of the frustration data practitioners feel in their roles comes from, especially as the first data hire at an early-stage company.
Source: Substack post called “Hiring Your First Data Hire”.
Status: Draft has now been deleted.
On Talking About The Work
When we talk about data, we talk about frameworks, mental models, designing data teams, bundling, unbundling, rebundling, databases, databases, and more databases.
What we don’t talk about is the work, because we can’t. Data work is private, secret, and sometimes legally protected. Engineers get to write blog posts about what they build and release it as open-source software. Data people get to talk about a tool they used, or maybe a method without context, if they’re lucky. But never the journey that got us there. Just imagine this talk at a conference:
“We’re a mortgage company, and our processing rate was down 15% this quarter. We ran an analysis to identify the causes and found that it was because we were chronically understaffed during the summer months; however, after doing a cost-benefit analysis, we found that it was cheaper to have longer processing times than to hire additional staff to cover peak hours, so we decided to settle for reduced service levels.
In this talk, I will discuss how we discovered our findings and how I negotiated to present these results to our CFO without upsetting our partners in product and processing.”
I would love to hear that talk, but it will never happen. So instead, you get a talk on how we enabled self-serve analytics by buying a tool.
Source: Substack post called “We Need to Talk about Data (but can't)”.
Status: Draft has now been deleted.
On Smelly Code
Data Modeling Code Smells is my term for the stuff that, even as you write it, makes you tell yourself… this stinks.
Here’s a non-exhaustive list of smelly code when data modeling:
Duplicated code
Too many lines
Too many columns (very wide models)
Excessive comments
Clean-up code in marts
Too much Jinja
Inconsistent naming
Casts and coalesces
Right joins
Functions in joins
Magic variables
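Several of these smells are mechanical enough to flag automatically. The sketch below checks a model's SQL text for a handful of them: too many lines, right joins, functions in join conditions, piles of casts and coalesces, and too much Jinja. It is a crude regex heuristic, not a SQL parser, and not a tool from the original post. The function name `find_smells` and all of the thresholds are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical thresholds for illustration; tune them for your own project.
MAX_LINES = 200
MAX_CASTS_AND_COALESCES = 10
MAX_JINJA_TAGS = 10


def find_smells(sql: str) -> list[str]:
    """Flag a few data modeling smells in one model's SQL text.

    Crude regex heuristic, not a SQL parser: comments, string literals,
    and unusual formatting can fool it.
    """
    smells = []

    # Too many lines
    line_count = len(sql.splitlines())
    if line_count > MAX_LINES:
        smells.append(f"too many lines ({line_count})")

    # Right joins, with or without OUTER
    if re.search(r"\bright\s+(?:outer\s+)?join\b", sql, re.IGNORECASE):
        smells.append("right join")

    # Functions in joins: a call like name( inside an ON clause, where the
    # clause is assumed to run until the next JOIN, WHERE, GROUP, or end of text.
    on_clauses = re.findall(
        r"\bon\b(.*?)(?=\bjoin\b|\bwhere\b|\bgroup\b|\Z)",
        sql,
        re.IGNORECASE | re.DOTALL,
    )
    if any(re.search(r"\b\w+\(", clause) for clause in on_clauses):
        smells.append("function in join")

    # Casts and coalesces, including Postgres-style :: casts
    cast_count = len(re.findall(r"\b(?:cast|coalesce)\s*\(|::", sql, re.IGNORECASE))
    if cast_count > MAX_CASTS_AND_COALESCES:
        smells.append(f"many casts/coalesces ({cast_count})")

    # Too much Jinja: count {{ ... }} expressions and {% ... %} statements
    jinja_count = len(re.findall(r"\{\{|\{%", sql))
    if jinja_count > MAX_JINJA_TAGS:
        smells.append(f"too much jinja ({jinja_count})")

    return smells


model = """
select a.id, coalesce(b.amount, 0) as amount
from a
right join b on lower(a.key) = lower(b.key)
"""
print(find_smells(model))  # ['right join', 'function in join']
```

Smells like duplicated code, inconsistent naming, or clean-up code in marts need context across models, so a per-file check like this one can only cover the easy cases.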
Source: Substack post called “Data Modeling Code Smells”.
Status: Draft has now been deleted.
On dbt Cloud’s Pricing Changes
I used to work at a bank, where I did compensation modeling. Once you see the world through the lens of incentives, your brain breaks, and you cannot see anything but incentives everywhere you look.
Pricing and incentives are among the most interesting parts of running any business. The pricing model you choose can make or break you. dbt Cloud’s decision to go with a seat-based pricing model for its cloud text editor/scheduler made sense at the time but backed it into a corner.
The best thing you can do is align your pricing structure with the value you create for your customers. If you can’t do that, then you might as well just make up numbers and call it enterprise pricing. This is a perfectly valid way of pricing your product; call it a platform fee.
Seat-based pricing is rarely a high-growth strategy. Unfortunately, when you raise venture capital, you’re expected to grow fast until you exit (at which point you can cease growing altogether).
Consumption-based pricing is better, when it can work and when customers can understand it. Fivetran is one of the few modern data stack (MDS) companies that makes money, because it’s easy to get started and then rack up a $100k annual spend without blinking. Snowflake isn’t hitting 166% net revenue retention (NRR) on seat-based pricing.
When your SaaS product costs the average company less than it spends on toilet paper, you’ve got a Shit Pricing Model™. When money is free, pricing doesn’t matter, but when money costs something, the Shit Pricing Model™ needs a redo. It’s no surprise dbt decided to increase its prices; what is surprising is:
They did it with very little notice at the end of the year.
They claimed that they were doing it because we asked them to. Just own up that you’re a business trying to make money; it’s not a crime. It’s an easier story to believe.
It came with no real increase in value to the customer.
It is relatively easy to trade their subpar experience for a home-grown subpar experience.
Again, it all comes down to incentives. People who had no incentive to roll their own mini-scheduler now had a major incentive to switch off dbt Cloud. Going from $50 to $100 a month doesn’t really impact most teams (and doesn’t generate much real revenue for dbt). But going from $5k a year to an enterprise contract because you have 9 analysts? Well, now we have an incentive to try to build it in-house. Yikes!
I know I am not a CEO (wait, I am now), and it’s easy to criticize from the sidelines (but when has that ever stopped anyone?), so in the end, what I say doesn’t matter or mean much. But using what limited info I have (here comes that data talk again), I would have used levers like metrics or dbt Server to push companies into enterprise plans rather than relying on seats alone, especially since those are much harder to build internally. Oh well, what do I know!
Source: Twitter threads and a Substack post called “Let’s Talk about Incentives Baby”.
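To make the incentive argument concrete, here is a rough sketch of the seat-based arithmetic. It rests on several assumptions. The $50 and $100 figures above are read as per-seat monthly prices, which matches 9 analysts costing roughly “$5k a year”. The self-serve seat cap of 8 is a hypothetical value, chosen so that the 9-analyst team in the text tips into an enterprise contract; it is not a published dbt Cloud figure. The names `annual_seat_cost` and `needs_enterprise` are illustrative only.

```python
# Hypothetical illustration: prices are assumed to be per seat per month,
# and the seat cap is an assumed value, not a published dbt Cloud figure.
OLD_PRICE_PER_SEAT = 50   # USD per seat per month (assumed)
NEW_PRICE_PER_SEAT = 100  # USD per seat per month (assumed)
SELF_SERVE_SEAT_CAP = 8   # hypothetical cap before an enterprise contract is required


def annual_seat_cost(seats: int, price_per_seat_per_month: int) -> int:
    """Annual cost of a purely seat-based plan."""
    return seats * price_per_seat_per_month * 12


def needs_enterprise(seats: int, cap: int = SELF_SERVE_SEAT_CAP) -> bool:
    """True once a team outgrows the (hypothetical) self-serve seat cap."""
    return seats > cap


for seats in (1, 5, 9):
    print(
        seats,
        annual_seat_cost(seats, OLD_PRICE_PER_SEAT),
        annual_seat_cost(seats, NEW_PRICE_PER_SEAT),
        needs_enterprise(seats),
    )
```

Under these assumptions, doubling the price adds $600 per seat per year, which is small change for most teams. For the 9-analyst team, the question stops being $5,400 versus $10,800 a year. It becomes whether to negotiate an enterprise contract at all, which is exactly where building in-house starts to look attractive.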
Status: Draft has now been deleted.
100 Ways to Align with Business Outcomes
Now that money isn’t free, it’s time to prove your data teams are worth something. Here are 100 ways to align your data team with business outcomes.
Actually, I only came up with 16 before running out of ideas. Sorry.
Optimize marketing spend to increase conversions
Identify marketing channels to drop due to inefficient spend
Run A/B analysis on email campaigns to identify better messaging
Forecast end-of-quarter pipeline to identify sales targets
Calculate the true customer acquisition cost by channel and recommend ways to reduce expensive channels
Analyze inventory orders to identify wasted opportunities
Identify factors that lead to backlogs during peak-demand seasons at processing centers
Create an LTV model for customers and build a process for continually updating it. Identify the best sources of high-LTV customers
Analyze and forecast infrastructure cost as a function of customer growth and recommend ways to keep costs from growing linearly (or faster) with the customer base.
Analyze funnel data to identify drop-off points and make recommendations on how to improve the experience to increase retention
Analyze support tickets for common themes and recommend product improvements to reduce tickets
Figure out how long it takes for leads to get to a first meeting and identify ways to highlight quality leads earlier to reduce that length
Create a lead-scoring model, identify commonalities among top-scoring leads, and recommend a nurture campaign to get them into conversations
Analyze cloud costs and identify underutilized resources to reduce cloud spend
Identify sales pipeline drop-off and break it down by factor to find leaks in the sales process and figure out how to plug them.
Analyze experimentation results to identify perverse incentives that may have happened through gamed metrics and recommend guardrail metrics to protect against them
Source: LinkedIn and Notes
Status: Notes draft has now been deleted.
And last but not least, from my Notes drafts:
Here comes the dag again
Failing all my jobs like a memory
Head in my hands like a new emotion
I want to work in the open source
I want to talk like lovers do
I want to dive into your data
Is it activating with you
So baby talk to me
Like data do
Talk to me
Like lovers do
Talk to me
Like data do
Source: Notes
Status: Note remains in case I ever produce this song.
[1] Feel free to ask me about this if you want to know the ugly side of being publicly known. I’ve had real, repeated threats against me, people upset that they weren’t included in private conversations I’ve had with my friends, and more. There’s a really ugly side out there that I hope none of you ever experience.