Is Your Data Ready for the Next Level? A Simple Maturity Assessment Tool

When IT leaders talk about their data, the word “mess” usually isn’t far behind. Inconsistent formats, missing values, and information scattered across silos can be overwhelming. But here’s the thing: what if that mess isn’t just something to clean up? What if it’s an opportunity to take your data to the next level?

In this post, we’ll guide you through a simple data maturity assessment tool that will help you reframe your mindset on mess, measure where your data practices stand, identify where to focus, and take actionable steps to improve.

Embrace the Data Mess

We’ve all been there. Your data feels like a tangled web that is impossible to fix. But before you throw your hands up in frustration, remember that messes can be good.

When things are messy, people are more likely to pitch in and help. Messes tend to spark emergent behaviors and patterns. People naturally want to collaborate and figure out how to make things better.

James Clear tells a story in his book Atomic Habits about a photography class experiment. Half the class was graded on how many photos they turned in, while the other half was judged solely on their single best photo. What happened? The students who focused on quantity, embracing the messy, iterative process, ended up producing better results than those trying to create the perfect photo.

The takeaway? Trying to perfect your data before doing anything will only slow you down. Instead, embrace the mess. Start working with what you have, and you’ll learn and improve along the way.

Step One: Assess Where You’re At

Before jumping into action, it helps to know where you stand so you know where to focus and how to prioritize. That’s where a maturity assessment comes in. Here’s a simple self-assessment we’ve created to measure where your data practices fall across five key dimensions:

  • Data Centralization: Do you have a centralized data lake or lakehouse? Is your data accessible across teams, or is it scattered? The more integrated your data is, the easier it is to make use of it.
  • Data Governance and Quality: Do you know where your data is coming from? Is it good? And is your data well-described? This is critical for making sure your data is reliable.
  • Team Skills: Can your team build end-to-end data pipelines? Do they know how to work with machine learning tools? You can’t progress without a strong team, so assess their skill set early on.
  • AI/ML Capability: Have you started experimenting with AI and machine learning models? Or are you already deploying them in production? This helps you understand how mature your AI practices are and what gaps to fill.
  • Scalability and Future-Proofing: Can your infrastructure handle growth? Can it support real-time data processing? Without room to scale, your systems will struggle as your data volumes grow.
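
One way to make this self-assessment concrete is to give each dimension a score and look at the lowest ones first. The sketch below is only an illustration and not part of the original talk: the 1-to-3 scale, the example scores, and the `weakest_dimensions` helper are all assumptions made for the example.

```python
# Hypothetical scoring scale (an assumption, not from the talk):
# 1 = not started, 2 = partially in place, 3 = well established.
DIMENSIONS = [
    "Data Centralization",
    "Data Governance and Quality",
    "Team Skills",
    "AI/ML Capability",
    "Scalability and Future-Proofing",
]


def weakest_dimensions(scores: dict[str, int], top_n: int = 2) -> list[str]:
    """Return the top_n lowest-scoring dimensions, lowest first."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"Missing scores for: {missing}")
    for name in DIMENSIONS:
        if scores[name] not in (1, 2, 3):
            raise ValueError(f"Score for {name!r} must be 1, 2, or 3")
    # sorted() is stable, so tied dimensions keep the order of DIMENSIONS.
    ranked = sorted(DIMENSIONS, key=lambda name: scores[name])
    return ranked[:top_n]


# Made-up scores for an imaginary team.
example_scores = {
    "Data Centralization": 2,
    "Data Governance and Quality": 1,
    "Team Skills": 3,
    "AI/ML Capability": 1,
    "Scalability and Future-Proofing": 2,
}
focus = weakest_dimensions(example_scores)
print(focus)
```

For this imaginary team, governance and AI/ML capability come out lowest, so that is where the first conversations would start. The numbers matter less than the discussion they prompt.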

Step Two: Take Action Based on Your Maturity

Once you’ve assessed your current state, it’s time to act and keep pushing forward. We break maturity into three main stages, and each stage has specific goals and actions to help you progress. Here’s how to move your data practices to the next level:

If You’re in the “Ad Hoc” Stage

Goal: Move to structured data management.
At this stage, you’re just getting started. Focus on building a foundation that will enable more advanced data practices down the line. Here’s where to focus:

  • Centralize Raw Data: Migrate your raw data to a cloud data lake (e.g., S3, ADLS). This gives you a central repository to start building on.
  • Governance Basics: Document key datasets and their ownership, even if it’s as simple as a spreadsheet. It’s crucial to start organizing your data early.
  • Upskill Teams: Get your team trained on foundational tools like SQL, Python, and cloud basics (consider AWS/Azure courses) to build a solid skillset.
  • Pilot AI: Run a low-code ML proof-of-concept (e.g., Azure AutoML, SageMaker Canvas). This allows your team to experiment with AI/ML without diving into full-scale development.
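
The “Governance Basics” item above really can start as a spreadsheet. As a rough sketch, here is what a first dataset inventory might look like when written out as CSV, plus a check for entries that still need an owner or a description. The column names, dataset names, and owners are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical inventory columns and rows; every name here is illustrative.
FIELDS = ["dataset", "owner", "source_system", "description"]
INVENTORY = [
    {
        "dataset": "orders_raw",
        "owner": "sales-ops",
        "source_system": "ERP export",
        "description": "Daily order lines",
    },
    {
        "dataset": "web_events",
        "owner": "",
        "source_system": "Clickstream",
        "description": "",
    },
]


def to_csv(rows):
    """Render the inventory as CSV text that opens in any spreadsheet tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented(rows):
    """Return the names of datasets missing an owner or a description."""
    return [
        row["dataset"]
        for row in rows
        if not row["owner"].strip() or not row["description"].strip()
    ]


print(to_csv(INVENTORY))
print("Needs attention:", undocumented(INVENTORY))
```

Even a list this small makes the gaps visible, which is the point at this stage.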

The goal here is to get something usable into your data consumers’ hands quickly so they can provide feedback and you can demonstrate value.

If You’re in the “Structured” Stage

Goal: Enable repeatable AI/ML workflows.
At this stage, you’ve begun integrating your data, and now it’s time to make your processes repeatable and efficient. Here’s what you should be working on:

  • Automate Pipelines: Use tools like Airflow, Databricks Workflows, or AWS Glue to automate your data pipelines, improving efficiency and reliability.
  • Enforce Governance: Implement a data catalog (e.g., AWS Glue Data Catalog, Microsoft Purview, Atlan) to make your data easily discoverable, organized, and governed.
  • Build ML Pipelines: Standardize machine learning workflows with tools like MLflow or SageMaker Pipelines to ensure scalability and consistency.
  • Optimize Costs: Rightsize your cloud resources and use cheaper purchasing options where they fit (e.g., spot instances, reserved capacity) to control costs while scaling your data infrastructure.
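
Orchestration tools like the ones named under “Automate Pipelines” generally model a pipeline as a set of tasks with dependencies between them. The sketch below shows that idea using only Python’s standard library. It is not Airflow, Databricks, or Glue code, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract_orders": set(),
    "clean_orders": {"extract_orders"},
    "check_quality": {"clean_orders"},
    "load_to_lake": {"clean_orders", "check_quality"},
}


def run_pipeline(tasks, dependencies):
    """Run each task once, only after everything it depends on has run."""
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        completed.append(name)
    return completed


# Stand-in tasks that only record that they ran.
log = []
TASKS = {name: (lambda n=name: log.append(f"ran {n}")) for name in DEPENDENCIES}

order = run_pipeline(TASKS, DEPENDENCIES)
print(order)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering, which is where the gains in efficiency and reliability come from.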

At this stage, the focus is on structuring your data processes, building scalable systems, and laying the groundwork for more advanced AI/ML capabilities.

If You’re in the “AI/ML Ready” Stage

Goal: Scale AI/ML and drive innovation.
If you’ve reached this stage, you’re in a strong position to scale AI and machine learning and lead innovation within your organization. Here’s where to focus your efforts:

  • Adopt MLOps: Automate model deployment and track experiments (e.g., MLflow, SageMaker Model Registry) to streamline model management and continuous improvement.
  • Unify Analytics: Integrate your data and AI tools into a single platform (e.g., Databricks Lakehouse, Fabric, SageMaker Studio) to enable seamless analytics across your organization.
  • Real-Time Use Cases: Add streaming data processing capabilities (e.g., Kafka, Kinesis) to support real-time decision-making.
  • Generative AI: Experiment with large language models (e.g., Bedrock, Azure OpenAI) to explore new ways to leverage AI in your business.

At this point, the focus is on scaling and optimizing your AI/ML capabilities, unifying your analytics stack, and experimenting with the latest AI technologies to drive innovation.
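
To tie the three stages back to the self-assessment in Step One, a team could total its scores across the five dimensions and map the result to a stage. The sketch below assumes each dimension is scored from 1 to 3. The cut-offs and the `stage_for` helper are hypothetical, picked only to illustrate the idea, so adjust them to whatever your team agrees on.

```python
# Hypothetical cut-offs (assumptions for illustration, not from the talk).
# Five dimensions scored 1-3 give a total between 5 and 15.
STAGE_CUTOFFS = [
    (8, "Ad Hoc"),        # total of 5-8
    (12, "Structured"),   # total of 9-12
    (15, "AI/ML Ready"),  # total of 13-15
]


def stage_for(scores: list[int]) -> str:
    """Map five dimension scores (each 1-3) to a maturity stage."""
    if len(scores) != 5 or any(s not in (1, 2, 3) for s in scores):
        raise ValueError("Expected five scores, each 1, 2, or 3")
    total = sum(scores)
    for upper_bound, stage in STAGE_CUTOFFS:
        if total <= upper_bound:
            return stage
    raise AssertionError("unreachable: total is at most 15")


print(stage_for([1, 2, 1, 1, 2]))  # total 7
print(stage_for([2, 2, 3, 2, 2]))  # total 11
print(stage_for([3, 3, 3, 2, 3]))  # total 14
```

A single total hides uneven profiles, such as strong team skills alongside weak governance, so treat the stage as a starting point for discussion rather than a verdict.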

Don’t Wait for Perfection. Get the Data into Users’ Hands.

Here’s a critical point: perfect data isn’t the goal. The goal is to get usable data into the hands of your data consumers quickly. Once they start using it, they’ll provide valuable feedback to help you improve it.

Waiting for data to be pristine before sharing it means you could waste time refining things that may not even matter to the end user. Deliver early, gather feedback fast, and then use that feedback to improve. This iterative approach helps you demonstrate value right away and makes sure you’re focused on what really matters.

Keep It Moving

Data maturity is a journey. There’s no one-size-fits-all solution. You don’t have to achieve perfection right away, and you shouldn’t wait for it. Start small, embrace the mess, and lean on your team and your data product consumers to help you iterate and improve.

Because, at the end of the day, data is really about helping you gain knowledge and wisdom over time. And getting started is the best way to unlock that potential.


This post originated from a Lean BYTES talk presented by Lean TECHniques’ Chief Innovation Officer, Tim Gifford. View the original talk here.