Ready, Test, Go. brought to you by Applause // Episode 19

The Evolution of Data Science and AI

Resource Library / Podcasts / The Evolution of Data Science and AI

Listen to this episode on:

About This Episode

Data scientists and engineers must weigh the potential advantages of AI adoption against its considerable risks. In this episode from ODSC East 2024 in Boston, Josh Poduska reflects on key insights gleaned from the show.

Special Guest

Josh Poduska

Josh Poduska, a client partner and AI strategist at Applause, focuses on ensuring the safe and effective implementation of AI technologies through rigorous testing and data creation. With a background spanning leadership roles at companies like Domino Data Lab, Hewlett Packard Enterprise and IBM, Josh offers a wealth of expertise in data science, AI, machine learning and statistics.

Transcript

(This transcript has been edited for brevity.)

DAVID CARTY: Traveling across the country isn’t always convenient or healthy or low-stress, but Josh Poduska finds a way.

JOSH PODUSKA: I don’t worry too much about getting sick. If it happens, it happens. I feel like, OK, whatever. I do carry with me DayQuil and NyQuil type of stuff just in case it happens because it does happen. So I traveled to Phoenix, Atlanta, New York, Boston in just the last three months and a couple trips to a few of those places, multiple trips. As I’ve traveled a bit, I like to work while I’m traveling. The worst thing for me is not that a flight is canceled. The worst thing is that a long cross-continent flight doesn’t have Wi-Fi. And I was like, oh my gosh, you got to be kidding me. Like, if my flight is canceled and I’m stuck at the airport working, I’ll be miffed. And I’ll take the bonus points they give me as a consolation prize. But if the Wi-Fi is out, I’m writing a nasty letter. I’m like, how could you expect us to travel from– I’m in Sacramento– from Sacramento to New York with no Wi-Fi? This is insane. Like, I just lost five hours of my workday. I can’t do it. So that’s the only time I get really riled up. I’m not watching a movie. Maybe I’m reading a book, but I’m just pouting. I’m grumpy.

CARTY: Traveling from the West Coast to the East Coast is especially difficult, but Josh rolls with it, even if it’s especially challenging when it’s time to catch some Z’s.

PODUSKA: I really try to get some exercise in while I’m traveling. I just find if I don’t that it’s really easy to stay in your hotel room and just work or just be heads down. The whole time, you’re staring at a screen, and your eyes are getting red. So I try to work in a little bit of that. I don’t always do it but try to. Then the other thing, which is great advice my wife always gives me, is don’t try to go to sleep until you’re tired. West Coast to East Coast travel is hard because it’s midnight on the East Coast. You got to get up early, and you might not be tired yet. Just don’t try. Just stay up a little bit longer. Maybe don’t do a screen. Read a book, and then wait till you start to feel your eyelids shut, but then give it a try. Otherwise you’re just going to lay in bed and be mad that you can’t fall asleep.

CARTY: This is the Ready, Test, Go. podcast brought to you by Applause. I’m David Carty. Today we are here at the Open Data Science Conference East at Hynes Convention Center in Boston. My next guest, one of two, is Josh Poduska. He is a client partner and AI advisor here at Applause, and he’s no stranger to shows like these. He’s been to quite a few over the years. So let’s just go ahead and skip with the long intro and jump right into it and talk with Josh about his impressions from the show. Let’s get to it.

Well, Josh, I know you are well accustomed to these ODSC shows, and you’ve been coming now for a few years. You have a good idea of the concepts and the themes that come up in the shows over the years. Can you talk about how it’s evolved and kind of arrived at this place where it is all AI all the time, very AI centric here?

PODUSKA: Yeah, it’s definitely AI on all the time. So I’ve been in data science for over 20 years. I started in data science before data science was a term. So data science as a term came about around 2012. Before that, it was machine learning, or data mining was kind of the phrase of the day. So I’ve been to these conferences for a while– ODSC and Strata and other conferences like it. So the evolution of them is a little bit interesting. It started out as mostly big data, like when data and volume came on the scene. And you had IBM talking about volume of data and velocity of data and variety of data. And cloud computing was just coming to be– coming into its own. And organizations were really looking at that seriously for their enterprise ops. And so along with that was analytics, a lot of it BI but also some advanced analytics like data mining back in the day. And then as data science became more in vogue. It went more to deep algorithms– not necessarily deep learning like we see that’s powering AI but more into advanced algorithms that are going to give you better predictions than just a tree model or like a simple logistic regression or linear regression model. And so that trend happened– data science trend really took off 2015 to 2020. Then COVID hit and all conferences took a back seat. ODSC actually had a nice virtual option at the time. But then today– this is my first time at ODSC in person since before COVID– everything’s AI. I heard about one talk that had to do with statistics and longitudinal studies, but everything else that I’ve seen has been AI. A lot of it has to do with LLMs, so natural language processing and the whole– all the GPT models but also some of the image creation, image augmentation, videos.

I have an opinion about where it’s going to go in the future. And Applause as a company is making some bets on where it’s going to go as the future and investing so that we can get ahead of those trends. But today, LLMs are the talk of the town. And everybody wants to know, how do you implement an LLM that’s safe and that’s helpful to my user base? And often the user base starts out as just your internal employees for a company. So that’s the number one use case. When ChatGPT came out not that long ago, I thought it was going to take off quicker. Like, I saw the evolution of the GPT technology through deep learning, neural networks, and I saw it starting to get closer and closer to something like what it is today. But myself and most people were totally caught off guard by ChatGPT. And it was all about the user interface. It was all about, what could the users do with this technology? And I didn’t see that coming, and most people didn’t see that coming.

CARTY: Let alone the adoption, right? I mean, the adoption was sky high right out of the gate.

PODUSKA: Absolutely. So I thought, OK, we’ve kind of cracked the code a little bit on how to leverage this technology to actually do things that make humans more productive and answer questions and entertain us. But it’s been slow. It’s been slower than I thought to materialize to be adopted, even for an internal chatbot for just your employees in a company. Even that– that’s usually the first use case, but that’s been slow to become adopted. You’re finally seeing some things. Like, Galaxy AI has the live translate between different languages with a Samsung app. And Apple and other companies are out in the public saying, we’re going after multimodal AI. So some of this more advanced stuff is starting to get integrated. Microsoft has AI integrated into their business software suite. Google has something similar. So it’s beginning to become a little bit more mainstream.

I think we’re probably one year away from most people saying, yeah, I use AI all the time in my work. Surveys and studies have that percentage really high. I heard today 60% of people use AI in their work. When I do my survey, I don’t see 60% of people using AI in their work. So I’m curious why what I’m seeing, when I talk in circles, doesn’t coincide with what is being shared in some of these talks. But I think we’re about a year away from it becoming– maybe two years– from it becoming really part of everybody’s work processes. I think it will be a good thing. I’m a fan of it. I’m a believer in it. There’s a lot of skeptics out there. But if you can find the right use case– like, ChatGPT found the right use case. It found the right user adoption, like you said. If you can find the right way to point AI and to put some nice little guardrails around it and to box in the use case, it’s a great technology.

CARTY: Right. So just go back to the point about virtual shows. The boxed lunches, are those better for the virtual show or for the in-person show?

PODUSKA: The boxed lunches?

CARTY: Yeah. Probably it depends on the show, I guess.

PODUSKA: Yeah, it depends on the show. It depends on the show. Yeah.

CARTY: Right.

PODUSKA: I like that idea, though. Maybe with a virtual show you give everybody a Grubhub coupon. Like, hey, get some food coming to your house.

CARTY: Sign me up. I like that idea. That’s great. Then you can choose what you’re getting, too.

There’s obviously been a palpable sort of enthusiasm around attendees of the show, the conversations that I’ve had so far. But there’s also been a lot of talk about risk assessment and guardrails and things like that, which it’s nice to see that those things are being considered. In your conversations, how prominent is that discussion, and how much of a concern is that?

PODUSKA: It’s real and it’s big. In the talk, we gave we cited a McKinsey study that said over 50% of leaders are concerned about two things– security and accuracy of their LLM models. And that’s the Achilles heel of LLMs today. They give you a lot of great information, but sometimes it’s not accurate. And they are susceptible to hijacking, prompt injections, and other attack surface weaknesses that they have that make executives nervous about especially releasing them to the public.

My co-presenter Peter Pham mentioned two really interesting examples in our talk. He talked about [an Air Canada] error. It’s a little bit well known. Someone got on to their help site, helpdesk, and was chatting with a bot, an LLM-based bot and asked about the bereavement policy and said, hey, I just traveled for a funeral. I’d like to get it reimbursed. Do you cover bereavement? And the chat bot said, yes, we do. It’ll be covered, and here’s the link to our policy to go check it out to learn more. Well, the link to the policy actually said they don’t. In the policy they did not cover it. But the chat bot said yes. The customer took them to court and won because the judge found that this chatbot is an official representative of [Air Canada].

CARTY: Sure.

PODUSKA: Yeah, it makes sense, right? Like, [Air Canada] didn’t like that answer, but it made sense. So you have to make sure that your LLMs have the guardrails, like you said, that they’re going to give answers that you support. Or you have to put them in a different use case where the risk of them hallucinating isn’t going to cost you financially or your brand reputation.

CARTY: Right. Security, accuracy. Regulatory concerns, kind of a distant third place there? I mean, we’ve talked a little bit about how I think we can expect more regulation in the future. It usually starts in the EU and kind of matriculates its way over here, or it can be a little bit industry specific sometimes too.

PODUSKA: Yeah, starts in the EU and California, and then it squeezes in on–

CARTY: Exactly.

PODUSKA: On the East Coast.

CARTY: Squeezes in the rest of the country here. But there’s been a little bit of that discussed at the conference too.

PODUSKA: There absolutely has. And it’s on everyone’s radar. So the EU AI Act, for one thing, mandates red teaming, which is a security testing protocol or technique. And then the US AI executive order also has been issued by President Biden. Coming out of that are going to be some NIST standards for safe AI and for regulating AI. That is one of the themes I saw at this conference. I saw several companies talking about AI governance.

And so my background– the first job I got after getting my master’s degree in statistics was I worked for Intel. And when I was at Intel, I was a factory statistician working in quality control, basically, on running their statistical process control system for one of their biggest factories. And during that time, I gained an appreciation for monitoring, for governance, for careful modifications of processes and to really document everything. And so I’ve always had a little bit of a bend towards that, always looked at the data science/machine learning space with an eye of quality and especially when it’s in production. Like, what can we do to monitor it in production? What can we do before it goes into production to vet it and make sure it’s safe? How do we document everything we did so we learn from it and get better in the future? This AI governance conversation– there’s some software out there, and a lot of conversation is going in that direction, is taking some of the same principles and ideas that machine learning and data science has gone through recently and transferring it over to AI.

So while there’s talk about it, I don’t see even industries like health care and financial services doing– I don’t see too many audits yet. I don’t see anything heavy. They’re preparing for it. And the companies that are trying to do the right thing are getting ahead of that and maybe doing their own internal audits. But once the stakes get higher– and they will because AI is going to become more and more advanced. And it’s going to touch more parts of our lives and part of the business life, the business world. It’ll touch more products. Those products, if AI goes wrong, could have physical or mental implications on customers like you and me. Once all that comes, you’re going to see more audits. You’re going to see more regulations. So it’s definitely coming, but today that’s not what’s stopping AI from getting into production. It’s more financial concerns and brand reputation concerns, I think, is what’s holding executives back from launching this at the speed I thought they would launch it with eventually.

CARTY: Right. And to that point about audits– we heard about this a little bit in the keynote today. It was a discussion point in the keynote. For the most part, you have organizations that are self-starters with the audits. They’re trying to bring this upon themselves to understand how they can better serve their customers in the long run. It’s not necessarily trying to meet a regulatory need. It’s trying to, through self-propulsion or ethical concerns or whatever, better serve their customers, which is a great idea.

PODUSKA: Some customers will require it. But most of the time, it’s what you said. It’s kind of self-initiated.

CARTY: Right. So you expect that at some point we’ll get to that level where audits are expected and required in some industries, that kind of thing?

PODUSKA: It’ll start with health care, health industry and finance, like it usually does with anything. Those are the two that have a lot of eyes on it. When things go wrong, it affects a lot of people in meaningful– or it affects a few people in very, very meaningful ways, like your health and your life with health care. Or in finance, it can affect a lot of people with their livelihood and their savings if AI goes awry. So that’s most likely where it’ll start. It could start in the consumer goods space with wearables and other things that have a health component to them. But yeah, it’s definitely coming. There’s a few companies and trends that I’m watching to see how they unfold. But it’s still early days for the regulatory space around AI.

CARTY: It’s going to be fascinating for you because in our few discussions we’ve had, which isn’t a lot– but you come at this from a pretty academic sort of perspective. And I feel like a lot of this is kind of evolving in front of us here. It must be fascinating every day to wake up and have the story being written, right?

PODUSKA: Yeah, it is. It is. And that’s one thing I like about working at Applause is we test and help vet this AI before it goes out into production. Also, we do some data collection for AI, so helping provide data that’s going to make the models safer and smarter and more accurate in the first place. So it’s really neat to have a little bit of a role to play behind the scenes in helping AI to build in a proper way and then to vet it before it goes into production, to validate before it goes into production.

I think eventually, and we talked about this in our talk– eventually we’re going to get to the point where we monitor it as well. And that makes sense. It’s logical. Like, you test it before it goes into production. Why not test it not every day but periodically once it’s in production? And then look at those metrics. Track them over time. See if they change. Because the thing with monitoring machine learning models in production, the world changes. So machine learning and AI is no different– is built off a set of data. And it can only be as good as that data set is built off of. If over time trends change, culture change, opinions change of users of your model, that model is going to degrade over time. And you want to have some visibility into that. So you set up certain tests, some automated, some human based, and you track the results of those tests over time. Once you see the trend line starting to go down, you kind of become aware. Alert goes up. You say, all right, let’s watch this closely. At what point do we need to throw this model out and get a new one? Or at what point do we need to just refresh the model we’ve already got, just spruce it up a little bit? So eventually AI is going to get to the monitoring phase, will be important to everyone. We’re not there yet, but people have seen it with data science. So I think enough people know that it’s something that should and will happen eventually.

CARTY: And you’ve mentioned your talk with Peter Pham, which you can read the summary of that talk on the Applause blog, by the way. I want to make sure I get the name of the talk correct. I did write it down– “Overcoming the Limitations of LLM Safety Parameters with Human Testing and Monitoring.” It rolls right off the tongue.

PODUSKA: Rolls off the tongue.

CARTY: Right, exactly. But the point is kind of having that validation throughout. I mean, it’s not all that different from your typical development lifecycle where, ideally, you’re kind of rolling in that feedback over time to make sure that you’re being nimble enough to adapt with the technology and the standards of the time, right?

PODUSKA: Right. There are three phases that we focused on to the AI development lifecycle in our talk, just to simplify it– the build phase, the test phase, and the monitor phase. And under build, we talked a lot about, where do you get your data from? And we talked about the three different kinds of AI that are out there right now. It’s very much an oversimplification, but you have the AI that is based off of what commonly is called like the RAG architecture, retrieval-augmented generation, where a company has their knowledge base. They have their corpora of data. And they want the LLM– the chatbot, the application, whatever it is– to only answer questions based off of this information. You were trained on Wikipedia and Reddit and all these other things off the internet, but forget about that for a minute and just only answer questions based off of what we tell you in a fenced-in way to talk about. So there’s a lot of applications in that category. The second category that we spoke about was, based off of that, Adobe’s got their stock photos that they built their image augmentation capabilities off of. Salesforce has a lot of internal data that they built their CRM AI integration off of. But both of those examples require something else. They require a unique way that the user interacts with AI. It’s not just a text prompt and return on a screen with words. Or it’s not just make me an image of a panda bear eating a snow cone on the cloud. So in the Adobe example and the Salesforce example, they had to gather human input and human user data on the interaction process. So they had the AI model built, and now let’s fine tune it and build maybe code and programming around the AI model so that the experience is specific for our particular use case– image creation for Adobe or CRM workflows for Salesforce. Then the last one is more like next gen, where’s AI going? And the examples we use there was Apple’s multimodal and Samsung’s Galaxy AI for live translation. In both of those cases, where do you get the data? What data do you need? It’s pretty much all got to be fresh. You can start with some of the foundation models built off of scraping the internet. But if you really want to get multimodal where you have some type of visual picture taking and video recording and you’re speaking to it, and maybe you’re doing some hand gestures about what you want it to do. And tell me about this plant. What type of plant is it? How should I care for it? I’ve got these ingredients, what kind of food can I make with these ingredients? Anything like that that has some multi-modalities, you need fresh, new data from humans that are out in the field, out in the wild. That are testing it, that are walking through scripts. And so that’s new data, data that you can’t just get by scraping the internet. And then with Samsung’s Galaxy AI and live translation, if you want this to work for all the big languages and eventually even some of the small ones, you’ve got to get different dialects. You’ve got to get different regions of a country. You need, obviously, different languages. You need different settings. Are you on a busy street? Are you in a home? Is there something loud– a windy day? There’s a lot of data you have to gather so that you can train the models in the right way. And again, you can’t just get that from scraping the internet. So our postulate was as AI continues to advance, more and more you’re going to need new data collected fresh from humans, usually in a crowdsourcing manner, so that you get a broad swipe of people.

CARTY: At the beginning of our conversation, you talked about how you had expected a little bit more growth in generative AI in large language models, and that we might see more of that in the next year. What will that look like?

PODUSKA: I don’t know. One trend, besides the AI governance, the other trend I saw at the conference was more hardware vendors and infrastructure vendors than I thought I’d see. I talked to a couple of them, and it coincided with what I’ve read in some of the news articles I follow.

There’s definitely a trend towards owning your own AI at a corporation level. So corporations want to own their own AI. They want to build it themselves, own it in-house, and have complete control over it. So that is a trend that I’m seeing. I’ve always thought that that trend would carry over to you and me. I don’t know how far it’s going to go there.

There is work being done in personalized AI. It makes sense to me that I would want an AI that knows my speaking style, knows my writing style. Knows my PowerPoint presentation style, or Google Slides, or whatever. And can build a deck or and write an email or draft a blog that is in my tone, and also incorporates more information than what I have. So I think that’d be pretty cool. And I think you could probably have that eventually on a PC so that it’s safe and secure, and your data is governed on your own. I hope it goes in that direction, because I would use that a lot if that was there. But the main thing I think of when I think about it being slow is like what Galaxy AI has put out– translation. I made an acquaintance in Brazil the other day. And we’re texting on an international text app. I was texting in English. He was going to Google Translate, looking up to copy/paste. What did he say? And he texted me back in Portuguese. I take it to Google Translate. And it’s just like what are we doing? AI’s been out for several years now.

CARTY: Yeah, take a step out of the process.

PODUSKA: How come this isn’t automatic yet? And we’re getting there, but it’s just been slower than I thought it would be. And I think it’s because, going back to the start of our conversation, it’s the risk and the inaccuracy of LLMs today. So as an industry, the industry is working on overcoming those barriers. But I did expect it to be integrated into more, especially our work lives, quicker than it has been.

CARTY: Josh, lightning round questions here for you. First one, what is your definition of digital quality?

PODUSKA: For me, digital quality is everything from the user experience, all the way to the accuracy of the output, with in-between how’s the code look. So I think the code has to be right, the user experience has to be good, and the accuracy output needs to be there. So I would hold it as those three things.

CARTY: Right. We’ve talked about this a little bit already, but how will AI evolve in the next five years?

PODUSKA: I think it’s going to be personalized, and I think it’s going to be integrated into all of our work applications — word processing, spreadsheets, presentation slides, and maybe even our webinars at some point– video.

CARTY: Yeah, definitely. The way everything’s evolving, you could certainly see it. What is your favorite app to use in your downtime?

PODUSKA: This is going to sound silly, but I actually like the Weather Channel app,

CARTY: OK.

PODUSKA: I like to go on there. I have family that’s spread out across the world, and so it’s interesting to see what weather patterns are going on there. And the geeky side of me also is very curious about the predictions that they give. So you can go on their radar and see the movement of the storm and how it’s moving across the ocean towards my brother who’s in the Middle East right now, or whoever it might be. And it’s just fascinating that we have that technology that’s both data science and prediction, but also just the raw data volume that you can send that down to one app. So not very exciting, but that’s one thing that I like to look at.

CARTY: No, it’s very empathetic of you. I like it. What is my brother experiencing on the other side of the world? Yeah, it’s a sweet sentiment I think.

PODUSKA: There you go.

CARTY: Yeah. And last, what is something that you are hopeful for?

PODUSKA: Something I’m hopeful for is that we will find the right balance between regulating and advancing AI. I’m not too much in the camp of AI is going to be the downfall of society. I think there’s a lot of good it can do, so I don’t want to overregulate it. But at the same time, it needs regulations. And it needs to have some common standards that the developed world follows in using AI. Because it’s going to come, no matter what. So if we as a society just say, well, AI is going to be bad for us from a military perspective, from a biological perspective, from a whatever perspective, we’re going to fall behind really fast. And then it will be bad for us, because we won’t be aware of what’s happening or be able to counteract it. So we need to move forward, but we need to do so in a smart way. So I hope government officials can find that right balance between the two.

CARTY: Well, Josh, I very much appreciate your perspective and hope you enjoy the rest of the show.

PODUSKA: Yeah. Thanks, David. It’s great talking with you.

The Evolution of Data Science and AI

About This Episode

Special Guest

Transcript

General

Company

Resources

Legal