speaker-0: Hello everyone, and welcome to RedMonk Conversations. I'm Rachel Stevens, research director at RedMonk. With me today, I am very excited to introduce Boris Bialek. Boris is the VP of Industries and a global field CTO for MongoDB. Boris, can you give us a quick introduction to who you are and what you do at MongoDB?

speaker-1: First, thanks, Rachel, for having me here. It's really nice to be on your podcast. I've been with MongoDB for seven years, and I'm responsible for industries, which sounds very exciting: everything from financial services and insurance to connected cars. My team works on use cases, integrating the bleeding, leading, and whatever edge into MongoDB and into our clients' solutions. We're all use case people. We're not really data people, though obviously we work with this data. So it's an exciting job to have.

speaker-0: Sounds exciting. And I'm sure you've seen a lot of exciting use cases in the last couple of years as AI has come onto the scene, so I'm excited to dive into that with you. Our conversation today is about what we as a technology industry are doing in an era where this technology is changing and emerging so quickly. It feels nascent, and at the same time it also feels inevitable. Things are becoming increasingly non-optional, but it's hard to understand where to make investments, and we're here to help people untangle that and figure out where they're going. Before we go further: when I say nascent, or when you say bleeding edge, people can hear untested, unproven, unsure of what it's going to be. One of the things we are seeing with AI is that nascent doesn't necessarily mean wait on the sidelines, because the market is moving and the AI narrative is building momentum, including in the skill sets people need to develop. Even immature or newer technologies can become mandatory parts of our discourse, and they can influence how we're expected to compete, behave, and build. That's what we're seeing from the outside. But what are you seeing as someone who works across all of these industries with people who are trying to build these use cases? How are people making smart investments in an environment where the technology is changing and emerging so rapidly?

speaker-1: Yeah, I think the most amazing thing is this: I've been in the software industry for the last 35 years, and I've never seen every single area affected at the same time. You normally see certain industries moving faster; retail is very aggressive, and honestly the regulated industries are normally the laggards. This time everybody is involved and engaged, from the biggest banks to automotive to insurance and everything in between. That is what I personally find so amazing. And the way things emerge is, as you point out, you cannot afford to wait on the sidelines. You don't want to sit on the wrong horse. So people literally try to bet on the whole racetrack and figure out how to do that. That leads to some very funny moments and discussions on the solutioning side, on the use case side, as we come out of this very early experimental stage. I normally compare this to the early days of the internet, when people installed the Apache HTTP server. It's the same thing you see right now: people install single modules. I dealt with one client who had 14 different vendors in one single, very narrow use case solution. There may have been good reason to do that, but I asked him, how do you ever want to productize this? He looked at me like, wow, we haven't gotten to that stage yet. Then I showed him a competitor who had just gone live with pretty much the same use case, and the dude was kind of scared. People experiment, but the move from experimentation and single use cases to a stable, enterprise-wide movement in AI is a big jump. People often tell me, we have an AI strategy, we have a chatbot. And I say, dude, you may need to transform your company a little bit more than just introducing a chatbot. That's normally the starting point of the discussion.
And that change is, as you point out, people try to bet on a horse and they try to buy the whole racetrack.

speaker-0: That's a great metaphor, I think, for how people are feeling right now, because sometimes it does feel like a gamble. We're not sure where we're supposed to be making our investments, or which technology is actually going to be foundational. One thing we have seen, though, is that data is one of those strong foundations that people need to invest in and build upon; AI or not, a data foundation is important. So let's talk more about data and how people are building this all out.

speaker-1: People start to really realize data are really theirs. They can talk about which LLM is nicer, which color do they prefer, but at the end they start to realize data is what they own. Data is what their control point is. And it's for lot of clients important to still have some control. It's a normal human, I think, or normal business kind of behavior. Do you want to have the feeling that you're somewhere in charge of your own destiny? And the people start to realize data is where destiny is. And on the other side, data is as well as how you can shape your destiny by enriching data and bringing more data together, what we call systems of action, where you bring life systems together and have these life interactions of the consumer, which could be an internal user or an external client in any form. But you want to these real-time capabilities. And if you want to have real-time capabilities, You need to get somewhere control of your data in one spot. It's not anymore good enough to have 10 data sources and say, well, somehow I get to them. Said, well, how long does this take? Well, 25 hours and then the last ETA. And then he said, well, 25 hours sounds really interesting for a client who's giving you 50 milliseconds. That's normally start point for really funky discussions. But that's where data and ownership, compliance, security, encryption. All these classical things are still not gone away and we still see those. And that is what I see every day when you see in automotive space, nobody wants to get their car started by somebody else and themselves.

speaker-0: Yeah, that's fair. I would not. One of the things you touch on here is the broad shape of data. It can move fast or in batches, it can have a lot of different modalities, it can live on the edge or in the core; it can go a lot of different ways. So figuring out a platform strategy across all of that can be really important, and it also ties into what capabilities your platform needs to have. When I was watching AI after LLMs emerged on the scene, we saw vector databases explode as a standalone category. Then almost immediately, we saw the database world start to incorporate vector capabilities into existing databases. There are still a lot of people who might need a standalone vector database, but in a lot of cases we've found that users can just use a unified approach and it will meet their needs. And that's not unique to vectors; we could have a whole side conversation about general-purpose databases. What are you seeing in terms of market demand for platforms with unified access to data versus specific capabilities? And has that changed or become more explicit in the world of AI?

speaker-1: With AI it becomes more explicit. I would say before it was more nuanced. I had discussions with people about graph and about geospatial data, and they were always like, wow, you have geospatial in your database, that's pretty cool. It actually started originally with text search and the Lucene engine we built upon. People start to realize, I actually want that integration. I want the speed, I want the automation that comes with it, but I don't want to lose the capabilities of a good product. It's the same thing we see with vector databases and vectorization. We have strong vector capabilities inside the MongoDB platform, and it's not a me-too approach where you stash something inside some blob and call it a stored vector, because that is what a lot of people do. The advantage of the original vector databases was that they actually work with vectors, and so do we with MongoDB's vector capabilities and vectorization. As you probably know, exactly one year ago we acquired Voyage AI, and it became part of the MongoDB family. That is the second part of vectorization: it doesn't help you to have a vector if you can't run your embedding models in real time. If I'm talking to Rachel right now and Rachel would like to know something right now, it doesn't help me to vectorize this next week, do an update, and ETL the data out. It's about what Rachel wants right now, and what the interaction with Rachel is. Now let's assume I'm actually not a human but an agent, an agentic system. (I am a human, trust me.) Then you get exactly to that point: you need those real-time capabilities. I need to reflect the sentences you gave me up front and be able to structure some magic answers out of them. And yes, I am a human being, just to reconfirm.
And when we talk about that, the agentic system needs to memorize what the answer was, what the nuances of the discussion were, and what the outcome is. If I want to do this in real time, I need real-time, in-memory capabilities for the decide-and-act model. That is where an embedded vector system is an absolute prerequisite. And as I said, there are many data types. I've seen a lot of systems that have six or seven different data types, and by data types I mean everything from graph to geospatial, vectorization, and text search; the list is really, really long. Honestly, this is what makes it fun to work at MongoDB right now. People ask me why I still work at my age. When you see this, it's just fascinating: bringing all these different things together, bringing them to life, and building a data foundation out of them where a client can decide, tomorrow I switch my LLM, and two weeks later I switch my framework. But the data is mine, in my format, in my understanding, under my control, with my security context and my encryption.

speaker-0: See, you already started to answer my next question. One of the things I hear a lot in my role at RedMonk, where we work with vendors up and down the stack, is this: just like you described movement across all of the industries, everyone across the vendor space is moving too, and AI is getting incorporated everywhere, from the silicon all the way up to the top of the application. Every layer is being touched by AI. One of the really intimidating things for a buyer of this technology is figuring out which part of the stack is the platform that truly matters for integrating AI. So from your perspective, what makes the data layer uniquely positioned in this market?

speaker-1: The data layer is really the only strong control point you have for ownership of your infrastructure and ownership of the outcomes. Your data layer allows you to monitor and track what answers are coming back and how they're coming back. You can also cache answers. This is a big thing right now: people don't want rote requests going to an LLM. Why should I go to an LLM for the question, my flight is late, what are my options? That's a rote request and a rote response, typical chatbot behavior. I can go to an LLM, but an LLM still takes time. If I have the information stored, I can answer directly. But that's my data. I know what the answer is, and I know how I would like to reply to these rote requests. A large airline probably has hundreds of thousands of exactly that question in multiple forms and phrasings, but in the end it's all the same question. Everything comes down to my data. I need to understand: Boris is stuck in Atlanta. There's a snowstorm. I can't get him out to Chicago, where there's a snowstorm too. I need to do something, and Boris is a frequent flyer. So what are my options? At that point you need your data and your decision points at your fingertips, so you can go back to the transaction system and look up the ticket number. With the ticket number, I can identify Boris's flyer status, and with that flyer status I may want to make a decision. If the answer is just, this is not possible, I will be very upset; I'm sitting in a snowstorm. So at that point I would like an intelligent answer: Boris, I feel really bad that you're sitting in a snowstorm. That sentence alone shows empathy from a chatbot, and shows that the chatbot understands the situation beyond just, your flight in Atlanta is delayed.
So all of these things come down to one thing: control of your data, understanding of your data, and the flexibility to apply the data to different agentic use cases. Think about what I just did. I had a chatbot involved that talks to me. It talks to the agentic system to decide what options we can offer. And I probably have a supervising agent over the whole system making sure it doesn't make funny jokes about me being stuck in a snowstorm. So when we look at this whole picture, what is the only stable component? The data.
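
The rote-question caching described above can be sketched as a small semantic cache: embed each incoming question, reuse a stored answer when an earlier question is similar enough, and call the LLM only on a miss. This is a minimal sketch under stated assumptions, not MongoDB's API: the bag-of-words `embed` function is a toy stand-in for a real embedding model (such as the Voyage AI models mentioned in the conversation), the 0.6 threshold is arbitrary, and `call_llm` is a hypothetical placeholder for whatever model client you use. In a real system the cached question vectors would live in the database and be matched with vector search rather than a Python loop.

```python
import math
import re
from collections import Counter
from collections.abc import Callable
from typing import Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a lowercase bag of words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class AnswerCache:
    """Reuse a stored answer when a new question is similar enough to an old one."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold  # hypothetical cut-off; tune on real traffic
        self.entries: list[tuple[Counter, str]] = []

    def add(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))

    def lookup(self, question: str) -> Optional[str]:
        query = embed(question)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


def answer_question(question: str, cache: AnswerCache, call_llm: Callable[[str], str]) -> str:
    """Serve rote questions from the cache; call the (hypothetical) LLM only on a miss."""
    cached = cache.lookup(question)
    if cached is not None:
        return cached
    reply = call_llm(question)
    cache.add(question, reply)
    return reply
```

With the cache seeded with "My flight is late, what are my options?", the rephrased "My flight is late. What options do I have?" scores about 0.74 and is served from the cache, while an unrelated question such as "Where is my baggage?" falls below the threshold and goes to the LLM.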

speaker-0: I think one of the things that stands out to me in your answer is that when you talk about controlling the data, it doesn't sound like you're talking about just the database, but more about data pipelines and a full platform. Is that how you're thinking about it at Mongo?

speaker-1: Correct. Correct. It is really a platform discussion, and that's why I mentioned Voyage AI. Voyage AI gives you the models to integrate. Vector search gives you the vectorization. Our MCP server is the interface into the LLMs. When you look at all of these things together, you suddenly have a data platform in the picture that lets you act with agents with ease in a very simple environment. To be honest, the best use case I've seen lately is predictive maintenance in manufacturing. Everybody has talked about predictive maintenance for the last 30 years, since the days of fuzzy logic and measurements; it goes a long way back. But by now we can make this very easy. With agentic systems we can build a complete workflow engine that interprets data and determines that, maybe, the temperature of this part is not okay. We need a replacement. Let's put in a work order. Let's order the spare part. Let's make sure the spare part and the work order arrive at the right spot at the right time. These were all human processes, to be honest, and the people who did these jobs didn't like them much. They were literally putting paper together, stapling it, putting it in a folder, and sending it off somewhere. Today, a computer can do that, so there are a lot of benefits in that regard. But as you can see, what is the core of it? It's a data platform that drives all of these pieces interacting together and delivers the results the systems are expected to deliver, without hallucination, please. Because my data is clean. I know what my data is. My data is not hallucinating. Hallucination happens somewhere else, not in my data.
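
The predictive-maintenance workflow above (interpret the readings, decide a part needs replacing, draft the work order and spare-part order) can be sketched in a few lines. This is an illustrative sketch only: the 80 C limit, the `air filter` part, and the `Reading` and `WorkOrder` types are hypothetical, and in the agentic version described in the conversation an LLM-backed agent would make the decision rather than a fixed threshold.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    machine_id: int
    temperature_c: float


@dataclass
class WorkOrder:
    machine_id: int
    reason: str
    spare_part: str


# Hypothetical values for illustration only; real limits depend on the machine.
TEMP_LIMIT_C = 80.0
SPARE_PART = "air filter"


def latest_readings(readings: list[Reading]) -> dict[int, Reading]:
    """Keep the most recent reading per machine (input assumed to be in time order)."""
    latest: dict[int, Reading] = {}
    for r in readings:
        latest[r.machine_id] = r
    return latest


def plan_maintenance(readings: list[Reading], limit: float = TEMP_LIMIT_C) -> list[WorkOrder]:
    """Interpret the data, decide which machines need attention, and draft orders."""
    orders = []
    for machine_id, r in sorted(latest_readings(readings).items()):
        if r.temperature_c > limit:
            reason = f"temperature {r.temperature_c:.1f} C exceeds {limit:.1f} C"
            orders.append(WorkOrder(machine_id, reason, SPARE_PART))
    return orders
```

For readings `[Reading(10, 65.0), Reading(12, 91.5), Reading(12, 93.0)]`, only machine 12 gets a work order, based on its latest reading of 93.0 C.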

speaker-0: I want to dive in more because you talked about your MCP server. This market has evolved so quickly in the last 18 months. First came LLMs, and everybody was amazed and impressed by their chat abilities. Then we went into RAG architecture mode. Then MCP servers came on the scene. Now everyone's all about agentic AI. All of this happened in 18 to 24 months, I would say. So what do we do when we can't predict what form of AI integration is coming next? What are the core elements of a data foundation that you as a customer need to focus on so that, no matter what comes next, you're ready to go?

speaker-1: I think it's really fair to look at how we started out, exactly as you said: first with the chatbots, then came RAG, then came MCP. There's a natural evolution from single point solutions, where people built small, simple summary functions. LLMs are great; I love using them myself to summarize emails and get information fast. But this is all a very small, very narrow point solution. Agentic systems, MCP servers, and all of the upcoming frameworks sitting on top of them (by now there are even retail-specific protocols, and so much cool stuff is happening) are all about the digital transformation of the enterprise. We're no longer talking about the point solution, the chatbot. I love chatbots, I admit it, I'm a total fan. But outside of the chatbot, people say: I want to transform my company. I want to transform how I interact with my customers. I want to transform my underwriting. I want to transform how my car drives. At that point people go, wow. But yes, that's what it is about. And when we look at those things, agentic systems are just the next wave, because they come from outside-in thinking, the way a human works. You look at a bigger problem: what am I trying to solve? Then you break it into smaller problems, solve each smaller problem, and bring the pieces together into an answer. That is what agentic is all about. But again, and this sounds so self-serving coming from someone who works for MongoDB, in the end it's all about the data. My decisions and my agents are based on what I give them to work with, and on the knowledge and memory, short-term and long-term, they're able to build up. Then I can look at the bigger picture: what am I trying to solve? Back to the predictive maintenance case: you're trying to fix a factory that has issues with its machines. There are 100 machines in a hall, and a human takes exactly that view.
He looks at the whole room. What is each machine doing? What are its parameters? An agentic system will emerge the same way. But what is the foundation for this? These hundred machines deliver data. That data needs to be interpreted. I need to feed it to an LLM to produce statements a human understands, such as: by the way, machine number 12 is running really hot compared to machine number 10. That is an interaction with an LLM, and all of it is based on the data I'm collecting on the factory floor. So everything comes down to data. And this is the slightly boring part: database people saying once again that data is the center of the world. They loved to say that in the ERP days. If you look back 30 years, ERP vendors said data is gold, then data became the new oil. But we are back to the same situation: the data is what I can control, and the data is what I base my decisions on. Everything above it, to be honest, will keep emerging. I don't think we're at the end of the agentic platforms yet.

speaker-0: I think that's fair. And as a former DBA, all of those statements just resonate with me. I can't tell you how many times in my enterprise life I embarked with somebody on a single-source-of-truth endeavor to get all of the hands aligned and get everybody into the right place. It's a hard process, and for a lot of teams it's a cultural process too.

speaker-1: And Venky just told me last week: whenever we run a project and we have 50 databases today, after the single source of truth we have 51. That sounds so mean and so disgraceful, but they didn't mean it that way. What they were trying to say is that they are looking for a way to get the data activated, because it's not that they only need a single source of truth; that's more of a statement of direction. What they want is data connected in context. And this is where the document model becomes important with MongoDB: we're able to enrich and enhance data in context. And when you say context, ha, MCP, Model Context Protocol. That's where the linkage is. You bring data from multiple sources together into MongoDB, along with the context we build into each data set, about Boris and the airplanes or Boris on the manufacturing floor. I can send this as a single data set right into my LLM to get an interpretation based on my data, and I can look at the result I get back and validate it against what I actually see, to check that it makes sense. This way I can even avoid hallucinations. And what else do I need? I need embeddings, I need to build vectors, I need to be able to compare. Why is machine number 10 better than machine number 12? What are the problems? By the way, the air filter is broken. Should I order the spare part automatically for you? Click yes. Okay, the work order is out, and the part will be replaced when it's available. That is the result you want, all driven by the data on your shop floor.
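
Bringing data from several sources into one document, sending it to the LLM as context, and then checking the answer against that same data might look like the following sketch. The field names, the flight numbers, and the regular expression used to spot flight numbers are all hypothetical; the point is only that the prompt and the validation step both draw on one combined document rather than on separate systems.

```python
import json
import re


def build_context(customer: dict, ticket: dict, flights: list[dict]) -> dict:
    """Combine records from separate source systems into one nested document."""
    return {
        "customer": {"name": customer["name"], "status": customer["status"]},
        "ticket": {"number": ticket["number"]},
        "itinerary": [
            {"flight": leg["flight"], "route": f"{leg['from']}-{leg['to']}", "state": leg["state"]}
            for leg in flights
            if leg["ticket"] == ticket["number"]
        ],
    }


def build_prompt(question: str, context: dict) -> str:
    """Serialize the document so the LLM is asked to answer from this data only."""
    return (
        "Answer using only the data below.\n"
        + json.dumps(context, indent=2, sort_keys=True)
        + "\nQuestion: "
        + question
    )


# Hypothetical flight-number format: two capital letters followed by 1-4 digits.
FLIGHT_NUMBER = re.compile(r"\b[A-Z]{2}\d{1,4}\b")


def unverified_flights(answer: str, context: dict) -> set[str]:
    """Flight numbers the LLM mentioned that do not appear in our own data."""
    known = {leg["flight"] for leg in context["itinerary"]}
    return set(FLIGHT_NUMBER.findall(answer)) - known
```

An empty set from `unverified_flights` means every flight the model mentioned exists in the combined document; anything else is a signal to reject or re-ask.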

speaker-0: Absolutely. One of the things people have struggled with for all of time, but especially in the AI era, is compliance, governance, security, and guardrails. All of those things are so important when you're talking about your own data. How does that fit into how people are building these systems in the AI world?

speaker-1: The biggest part is that your agentic systems and your agents act, and they act on your data. You need to ensure that any agent with access to data gets only the data it should see. For that, with MongoDB we have Queryable Encryption, which lets an agent see only the data it is allowed to see, in the context it's allowed to see it. This is one of the cool features we have in MongoDB: agent A sees parts of the story, but not the whole book. As soon as you have agentic systems involved, you have machines talking to machines, and you want to make sure there's not a bad actor machine on the other end of the protocol with different intentions than your agent understands, because in the end it's machine talking to machine. The second part of the discussion, obviously, when we talk about machines talking to machines, is that most systems are designed for users. We see this specifically in the insurance space, in underwriting. Underwriting was a very human activity. Now you need to rethink how you do it, because machines are doing the underwriting and the API work. And it's not about the APIs; it's about the data, the data format and its understanding, and the real-time enrichment of the data the system is working on. So a complete rethinking is necessary, from a UI-driven human approach to a human-supervision approach, where 90% of the cases run automatically, 5% get a little supervision, and 5% get a lot of supervision. Putting these kinds of pieces together is where the real art and the enterprise transformation of AI starts right now. Encryption is a key part of this. You mentioned guardrails; obviously there are all kinds of audit trails: who's touching the data, and which agent made which decision based on which input data. You see the word data, data, data, data all the way through.
That is pretty much where the story goes.
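
The access scoping, supervision tiers, and audit trail described above can be illustrated in plain application code. This sketch does not use MongoDB Queryable Encryption or database access control, which is where such enforcement would actually live; the per-agent field policy, the confidence cut-offs (0.9 and 0.7), and the agent names are hypothetical values chosen only to show the shape of the idea.

```python
from datetime import datetime, timezone

# Hypothetical policy: the fields each agent role is allowed to read.
FIELD_POLICY = {
    "underwriting_agent": {"vehicle", "claim_history"},
    "marketing_agent": {"name"},
}

audit_log: list[dict] = []


def view_for(agent: str, document: dict) -> dict:
    """Return only the fields this agent may see; unknown agents see nothing."""
    allowed = FIELD_POLICY.get(agent, set())
    return {k: v for k, v in document.items() if k in allowed}


def supervision_tier(confidence: float) -> str:
    """Route a decision by model confidence (hypothetical cut-offs)."""
    if confidence >= 0.9:
        return "automatic"
    if confidence >= 0.7:
        return "light review"
    return "full review"


def record_decision(agent: str, document: dict, decision: str, confidence: float) -> dict:
    """Log which agent decided what, from which input fields, and how it was routed."""
    inputs = view_for(agent, document)
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "input_fields": sorted(inputs),
        "decision": decision,
        "tier": supervision_tier(confidence),
    }
    audit_log.append(entry)
    return entry
```

Defaulting unknown agents to an empty view keeps the policy deny-by-default, which matches the concern about an unexpected machine on the other end of the protocol.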

speaker-0: Yeah, I think that sums up our conversation pretty well. Data, data, data in a world that is moving very fast.

speaker-1: You asked about what you hear from other vendors. Data is maybe not the new snake oil or the new oil, but data is really the foundation, whatever frameworks I'm using right now. I want to quote the CTO of a very large retailer who, after we implemented the solution, said, so Boris, we are done. It's obsolete. I was completely shocked. I was so proud of what we had achieved, and he said, Boris, it's obsolete because it's live. And I was like, huh? I felt really heartbroken. But his next sentence actually made me think. He said, but the data in Mongo, that's my foundation. I can replace whatever LLM, I can replace the frameworks on top, but the data is mine. And I don't want to change it and redo it, because that effort is too expensive. That made me think, and it's a very, very good line.

speaker-0: Yeah, I think that's a great way to think of it: what is the foundation we can build on, and what can we abstract away so that the things that need to move more quickly can change, while we keep a solid foundation.

speaker-1: And on the speed of things: I started out with the situation where people use a dozen different components to build a single point solution. One of the things I'm starting to see is people realizing they don't want 13 components to build one single solution. They need the data foundation. They need the vectorization. They need the embedding models. We deliver all of that from one source, in a reliable form and in a transparent fashion. As you know, we also have our vector search in preview for on-premises deployments, which is becoming very, very important for a lot of people, specifically in Europe. On top of that, you still need the LLM, and you need MCP, which we deliver as well. And then you're off to the races and you can transform your company.

speaker-0: Amazing. Well, Boris, thank you so much for your time today. This has been such a fun discussion. If people want to learn more about what Mongo is doing, where should we send them?

speaker-1: The best place to learn about this is our solution library, which is part of our documentation. If you go to the MongoDB Docs and look at the left side, you'll see the solution library. You'll find very practical use cases there: everything we talked about, the full code, GitHub repositories, LLM prompting, everything your heart desires. And obviously on MongoDB.com we have a lot of examples right off the homepage for our AI offerings.

speaker-0: Wonderful. Thank you so much, Boris.

speaker-1: Thank you for having me.