Your AI Roadmap

Transforming Speech Recognition with Miguel Jetté of Rev.com

June 18, 2024 Dr. Joan Palmiter Bajorek / Miguel Jetté Season 1 Episode 7

Miguel Jetté, VP of AI at Rev.com, shares insights into the evolution and impact of AI in transcription and speech recognition. Rev started as a platform offering transcription, captions, and subtitles, heavily relying on AI to improve its tools and products. Miguel's journey began eight years ago, focusing on building speech recognition capabilities.

This technology provides a first draft for "Revvers" (transcribers) to polish, enhancing efficiency and accuracy. He highlights the importance of a large and quality dataset, derived from Rev's transcription work, as a key advantage in training their speech recognition model. Challenges in testing and improving these models include dealing with subjective interpretations of transcripts. Miguel also discusses future directions, including tackling more complex use cases, multilingual support, and expanding technology to other languages and fields like translation and generative audio.

His career path, from a mathematics background to leading AI advancements at Rev, underscores the interdisciplinary nature and rapid evolution of speech technology and AI applications.

Notable Quotes from Miguel Jetté:
🎙️ Rev started as a platform for work at home jobs, diving deep into transcription, captions, and subtitles, powered by AI innovations.
🌍 Expanding our technology to embrace multilingual support and translation is not just exciting; it's a path towards global accessibility.
💡 The interaction between machines and humans at Rev showcases the real magic in improving productivity and accuracy through AI.

Resources Mentioned
Book: Computer Speech Technology by Robert Rodman (1999)

Miguel Jetté is Vice President of Artificial Intelligence at Rev. He leads Rev’s speech research and development team with over 20 years’ experience in speech recognition and machine learning. Before joining Rev, Miguel was a speech scientist at VoiceBox and Nuance Communications where he created state-of-the-art speech models across major industries in multiple languages. Miguel has a Master’s in mathematics and statistics from McGill University.

Connect with Miguel on LinkedIn: https://www.linkedin.com/in/migueljette/
Find Miguel on Twitter: https://twitter.com/bonuelph


More from Your AI Roadmap
Watch on YouTube! @YourAIRoadmap
LinkedIn: Connect with Joan, and let her know you listened! ⁠
Joan has a BOOK with Wiley coming! AI, Careers, and Future-Proofing Your Income: Book Waitlist

Who is Joan? Ranked the #4 Voice AI Influencer, Dr. Joan Palmiter Bajorek is the CEO of Clarity AI, Founder of Women in Voice, & Host of Your AI Roadmap. With a decade in software & AI, she has worked at Nuance, VERSA Agency, & OneReach.ai in data & analysis, product, & digital transformation. She's an investor & technical advisor to startups & enterprises. A CES & VentureBeat speaker & Harvard Business Review published author, she has a PhD & is based in Seattle.

Clarity AI builds AI that makes businesses run better. Our mission is to help SMB and enterprise leverage the power of AI. Whether your budget is 5, 6, 7, or 8 figures, we can build effective AI solutions. Book a 15min

♥️ Love it? Rate, Review, Subscribe. Send it to a friend 😊


Hi, my name is Joan Palmiter Bajorek. I'm on a mission to decrease fluffy hype and talk about the people actually building in AI. Anyone can build in AI, including you. Whether you're terrified or excited, there's been no better time than today to dive in. Now is the time to be curious and future-proof your career, and ultimately, your income. This podcast isn't about white dudes patting themselves on the back. This is about you and me and all the paths into cool projects around the world. So what's next on your AI roadmap? Let's figure it out together. You ready? This is Your AI Roadmap, the podcast. Hey folks, this is Joan dropping in to say hello and give a little context about this episode. This episode is with Miguel Jetté, who I know through the speech recognition ecosystem. My friend introduced us, and he and I just really saw parallels in our careers and have stayed in touch ever since. He is the VP of AI at Rev.com. And I really love how he talks about teams, and he talks about audio files, and he talks about really an international perspective and how people are using technology and augmenting the human experience related to transcription and many languages. And it's just so cool. And I think you're gonna have a blast listening to this episode. And let's dive in. Hello, hello. Hello. Yeah, of course. My name is Miguel Jetté. I'm VP of AI at Rev.com. Cool. And what is Rev? What are you working on? Yeah, Rev. Traditionally, I guess Rev was started as a platform for work-at-home jobs. And specifically, we did mostly work around transcription, captions, and subtitles. And so over time, as that market sort of evolved and changed a bit, a lot of it has been driven by AI innovations. And Rev has been doing a lot of that to improve the tools inside Rev and to offer it as a product externally also. Nice. Okay, and I think people mostly these days on a Zoom, you know what captions are.
Like more and more these days, I feel like people are quite acquainted with this. And what are your projects in your job? Yeah, so I started, I guess I started at Rev now eight years ago. And so it feels like a long time, but I started at Rev building the speech recognition capabilities of our internal platform. And so at Rev, we have people working on the platform. Internally, well, externally too, we call them Revvers. And so when you're someone who transcribes for Rev or captions or subtitles, you log into the platform and you can choose, you know, to work on certain files and so on. We have, you know, thousands of customers that send us files. And then you start with an editor, and that editor has a ton of tools. One of them is the first draft, we call it. So the Revvers typically don't start with an empty blank page. They start with a speech recognition first draft, and then they kind of listen to the audio, fix the typos, of course, and, you know, polish it off to as close to 100% accurate as possible. And so when I started at Rev, my first project was to build this speech recognition model for Rev. And actually over time we did, you know, and that tool is used. And my job sort of changed a little bit over time, but I would say we mostly work on the speech recognition model still to this day. Yeah. Okay, wow. Well, and you mentioned files. For those of us who aren't in the speech recognition field, how does one build a speech recognition model or tool, or how does that go? What does that look like? Yeah, good question. And I think one of the fun things that attracted me to Rev, and one of our strategic advantages, I would say, is that to train a speech recognition model, you need an audio file, like an interview like this one, for example, and a transcript, which represents, I guess some people call it the reference, or you know, the transcript of what was said in the audio.
And then from those two files, you can use, you know, various techniques, but the latest one is what everybody talks about, deep neural networks, you know, where you use an algorithm to teach the mapping between that audio and that text file, basically. And so the nice thing about Rev was that because our work was always to transcribe files, we sort of built this really nice data set from which we could build our own speech recognition. So it may be difficult to start from scratch for others because you typically need a lot of audio to teach the model how to recognize voices. But yeah. That's what it means. That's what you need. Okay, yeah, so many different components. Oh gosh, I want to ask you about questions of size of data sets and models and things. I don't know that that's applicable. It's been a big week here. Can you share, you've been working at this for like eight years. Some people just realized AI existed. ChatGPT blew their minds. You've been actually working in this space. Are there things that have been surprising or challenging to you along the way that you'd want to share? Yeah, so many. Yeah, so I started my career in speech 17 years ago, it's hard to believe. And yeah, before the iPhone even existed. So some people may think the iPhone's been around forever. But no, there was a world before iPhones. And so yeah, I mean, so many things change. I think maybe one of the things that I find particularly interesting, because we're a transcription company, is maybe my naive view of how not subjective transcription might be. But actually, you know, in the field of AI, let's say, or ML or speech recognition, one of the things you have to do is test your model, you know, like to know how well it behaves. And to test your model, you have to have ground truths. You have to have like files where you say, okay, in this audio, this is what the person said.
And in my naive view, I guess I always thought like, well, you know, it's obvious. Like if I said, I like pizza, like that's the reference. But in reality, like it's quite difficult sometimes to know exactly what the person said. Sometimes they mumble or they speak over each other. Sometimes there's background noise or someone's talking in the background, do you include that in the transcript? Because maybe the speech recognition model will transcribe it. And so actually, like, there's a lot of art, I guess, that goes into assessing these models. And I think maybe that's something that surprised me the most, and probably applicable to a lot of AI fields. You think that it's, you know, our brain is pretty magical, actually, and we do a lot of weird processing and ignoring of things when we see the world, you know, but when you build a computer to do these things, you have to think about all those details. It's all those little choices, right? I think the funny thing about AI is how much manual effort and choices are a component piece of this. You know, this reminds me, okay, this is like mega nerdy of me. I don't know if you read this book. This is a book for my PhD, but they theorize, this specifically speaks to your, like, how many people in the room. They theorize that one day we'll have speech recognition systems that handle any language, any dialect, voices all at the same time, a cacophony, and the computer will still get it, you know, 100% right in these transcription things. And it's a very difficult problem. And I don't know, part of me is like, this was written, I think in the 80s or 90s, and like, will this happen in our lifetime? Like, I don't, I don't even know if people actually are working on this problem. But it's fascinating to think how complex, even one voice with my speech recognition, you know, accuracy. Yeah. I mean, it's funny. I wrote, maybe it was already two years ago, but I wrote what's coming up in the next five years or so in speech rec.
I think it was up to 2030. It was like one of those like thought pieces, you know? And that was one of the things that I thought like, maybe that, like at least improvement in that field and that like multi-speaker, multi-language kind of situation. Definitely, I think the field will go that way for sure. Will we ever fix, solve it entirely? Maybe, I mean, I think, I don't know why we wouldn't eventually, but yeah, it's certainly not like next year. No, well, and I guess the every language, every dialect part is my question mark because capitalism slows us down from the every language part, or if I could just make jokes about that. That's a painful joke, but yeah, no, it's true. It's wildly unfair which languages get prioritized and understanding the roadmap for those things and the choices companies make. Excuse me, whoever hears this and is deeply offended, it's not my intention. Roadmaps like that are prioritized for different reasons. Different surprises or challenges along the way. Maybe, what have you learned from some of your current projects that you're really excited about as you have projected things into the future with these ideas? I mean, I'm in general very excited about speech technologies overall. I mean, you know, the things specifically that we do at Rev, I'm excited about, we talked about languages other than English. I am very passionate about expanding our technology to other languages, but I'm also excited to look into translation as these other fields evolve, you know, I'm excited to look at translation and generative audio, TTS, stuff like that. But one thing maybe that, yeah, I said earlier, I learned about, you know, the nitty gritty of transcription, I guess. One thing that is sinking in even more and more now is how different every use case is, you know, in speech recognition, and how important it is to sort of pay attention to the problems you're trying to solve specifically.
So yeah, a lot of people work on generic sort of AI models that work kind of well for everybody, but they don't work perfectly for anybody kind of thing. And so at Rev, we're like, we work with a lot of different use cases and that's been a challenge and something that's maybe surprising a bit that I've learned over time is how different the approach might be for customer A versus customer B. Or even what they care about, you know, some customers care, let's say, about ums and uhs in their transcript and some don't, you know, so it's hard to have a generic model that solves for everything. For sure, for sure. And do you work on different, you know, models for medical stuff, models for this, that, and the other? Like, do you specialize into those, or do you really work on having a main kind of default out-of-the-box solution? For now, our approach has always been a main default with certain variety, not variety, but certain little differences, like the ums and uhs is an example. We have one model, but within the model, you can do something called verbatim, so it will transcribe ums and uhs or repetitions, or non-verbatim, which is more like a clean transcript where you remove that stuff. But yeah, we definitely are moving towards more customizations and stuff like that as we see like that. So yeah, some use cases, you could solve the problem better if you focus your attention basically. I have one I love as a descriptive linguist myself. You're talking about like cleaning out the ums and uhs, which is like how we speak. Descriptive linguist is like, that's how people speak; prescriptive, this is clean, this is the correct version. I love that. This reminds me so much of a project I did in graduate school. I was consulting for a company for a medical IVR solution, which is just a medical phone tree for those who don't speak the IVR land. And like, the system was performing as like a C plus. It was doing horribly, which is why they brought me in as a consultant.
And I said, like, what's our data set? Like, who are we serving? Blah, blah, blah. And they're like, well, it's a lot of like Filipino nurses who are like calling into this system. And I said, what is your, you know, what is your data set? What is your model? And they're like, ah, out of the box. Like, white guy. And I was like, and what's the, like this use case? Oh, it's medical terminology. And I was like, none of these things match. Like, is there a chance we could make tweaks so it actually is medical jargon? Like, Filipino nurses, like, it seemed to blow their mind, whereas in my PhD program, optimization was the goal, or at least how to build the correct type of solutions. And I remember just walking away being like, am I crazy? Like, what's going on here? That was very early into my PhD studies, but. They were thrilled with my consulting. I hope they improve that product a lot. I hope so. Yeah, that sounds like something that must happen in a ton of other fields, not just speech, but yeah, for sure. And the easier the tools are to use, maybe the more important those discussions are, because people will start to use tools like now, you can use an LLM for anything, but they weren't necessarily trained to solve this specific problem. I think a lot of people start to have these eureka moments also where like, oh, okay, I can't solve everything or you better fine tune it for your own problem. I think these people were like, because do you want to niche down so far? Do you want to have something that encompasses everything? I understand from a business use case, there are lots of different constraints that go on to those choices. So that is just one example. Okay. As you are deeply embedded in this field, you're working on these things, where do you see the field heading? Yeah, I think. Well, yeah, it's interesting. 
So I think there's maybe a thought that speech recognition certainly feels so much better than it used to, to the point where maybe some people think it's solved. I would say it's not really solved, it's just the problems or the difficulties are just changing. And then people come with new use cases that are kind of way harder to solve. For example, what you were mentioning in your book. But we work, for example, at Rev, we work with a customer that transcribes body cam footage. And that's so challenging. You know, like there are people walking on the street, it's police officers, you know, so it's like a super challenging environment. And so I think the world is, the speech world is evolving certainly towards those more difficult use cases, for sure. And so that's interesting. And then multilingual is another super interesting place where the field is going. So I think more and more the single language feels a little bit solved. So then people are like, oh, you know, what else can we do? And there's certainly a lot of people that speak multiple languages, many languages, like myself. I speak French and English. And so sometimes I go back and forth between the two. So it would be cool to have a speech, you know, model that can do that well. And then, you know, translation and TTS, like I think are also going through an incredible, you know, change these days. And I think that's going to power a lot of new cool technology. Certainly for me, making things more accessible. You know, I grew up as a French speaker, so I didn't really have access to the internet because I'm too old. But if I did, I wouldn't have access to English material, which limits you. And so being able to translate and maybe even dub things quickly, you know, like if it's done well and with the right intentions, I think it can really be powerful. So yeah, I like that direction with some caveats, of course. But yeah. Yeah, well, and if I like all that, it sounds good.
One of the things, and this is something we say in the field too, like speech recognition has been solved or like language has been solved. Can you break that down? Like, what would that mean for those people who don't think or can you break that down for us? I mean, well, I don't think it's solved, but when I say that, you know, for example, well, what you were describing was a good example, but, you know, I grew up, well, I used to work for Nuance before Rev, 17 years ago, and we worked a lot in the call center industry. And so we worked on these, what you were saying, they're called IVRs, interactive voice... Response? Yeah, maybe. Yeah, I think that's it. Yeah. And so, you know, those are the classic things where you call and, you know, like if you want to pay your bills, say, pay your bill and stuff like that. And even back then, you know, you would say billing and it would barely, like it would not always recognize you, and like those use cases are solved in the sense that now, you know, if I call those systems, I'm expecting them to work, you know. Back in the day, you would go and you would kind of know, like, okay, it doesn't always work. It's frustrating, but it's, c'est la vie, you know? But now those things work. You know, when you're texting and you use your voice, you know, you're kind of expecting it to work, right? So that's kind of what I mean by solved, but certainly like the use cases are expanding, like I said, and I don't know if it will ever be completely solved, you know, because we're just gonna make it harder and harder for ourselves. Harder and harder. Well, and you just did some code switching, which is some of my favorite projects when people, you know, we've got Spanglish and Hinglish and, you know, just in the middle of a sentence, I'm gonna switch to a different language for a noun phrase, then I'm gonna switch back for the verb phrase, just to mix things up.
But those are, for humans, something we're used to relatively, depending on our context. Computers are like, what did you just do? You wanna end in French? Like, wait a minute, no, no. Don't do that. Exactly. Yeah, no, that's true. And that's one of the, that's actually one of the fun, interesting things we're working on now, so yeah. Cool, cool. Okay. When you think about, okay, what's this different from understanding field? When you think about, and I don't mean to ask anything too sensitive about the Rev roadmap, but when you think about exciting challenges that you might get to tackle in the next few years at work, or is there one thing that you're like, ooh, I can't wait to, I don't know, solve, but really work deeply on that piece of the puzzle? Uh, yeah. Let me think. Well, a lot of things in our roadmap are driven internally by some of the products we're building for Revvers. And so one of them is, so we, I was saying earlier, we do transcriptions. So an example of that would be an interviewer for ESPN that's, say, interviewing someone, they need the transcript, they send it to Rev. We have captions, where maybe someone is posting a lecture online and they want captions for their video. And we also have subtitles, which for us means they're captions in a different language. So in this case, you know, the lecture might be, you might be able to have the captions in French. And one of the things we're working on right now is making the tools for subtitlers better, which will involve translation and captions. I'm pretty excited about that. So that's a cool project. And I like the way we have things set up at Rev is we do offer some of these products externally, but mostly we're trying to build these products also for internal, you know, just to make Revvers' lives better, which is cool because we get immediate feedback and, you know, it's sort of a feedback loop, if you will, between the AI scientists and the Revvers. So I'm excited about that for sure.
Yeah, that's one example, I'd say. Okay, cool, internal, external tools. How many Revvers are there these days? 75,000 people. Whoa, 75,000? Okay, that's not a number I was expecting to hear. And are they all over the world or are these are, yeah. yeah, they do. Yeah, it's really neat to get to learn to know some of them. And every now and then we have a company event. Well, we'll bring over a few of the Revvers that have been with us for a long time. And some of them have been for like 10 years working on our platform. Yeah, it's incredible. Well, and wait, I'm also just realizing 75,000, those are 75,000 humans who are working on transcription projects and stuff. See, this is the things I think I love hearing about, at least like we build these software things. When you think about, could we do these things without humans? Do humans always need to be in the loop? Because I think someone might hear this and go like 75,000 humans, like, is this actually AI? Is this just humans? Like, When you think about kind of those dynamics, how might you respond to that kind of question? Yeah, well, yeah, that's really interesting. And something that made me really interested in Rev is the interaction between machines and humans. And so to be clear, we do have pure AI offerings, which are just machine transcripts purely. We also have pure human transcripts, which are. But the real magic is like the sort of interaction between the two, because like we use AI to make Revvers more productive so that they can make money faster. So we still charge, we still give the same amount of money to Revvers for say transcribing an hour of audio, but with our tools we make it faster and so they can make money faster. And so there's this aspect of improving productivity. But then there's of course the aspect of Revvers correcting machine mistakes and so that they can make the machine a bit better, and so there's like this sort of cycle between the two, which is really cool. 
I think Revvers will always be important because the use cases that we tackled just get harder and harder, as I said earlier. So we kind of always need that feedback, at least. I think could there be a future without that? Maybe when I'm not here anymore. But for the foreseeable future, I think there's always gonna be a need for that sort of collaboration between the two. Okay, yeah. This reminds me of a project I did a few years ago where we were benchmarking the cost of building something from scratch versus mechanical Turk, you know, which is just paying people overseas to do it. And it was more expensive to build it from scratch than pay humans. And it was really wild to me of, because everyone wants this beautiful out of the box software solution that fixes everything, but like it was actually gonna be a more expensive build and maintenance phase. Mm-hmm. And so we had to talk to the customer about like, you said this was a goal, the cost was more expensive than this. How do you feel about that? And I think these days, even with, you know, ChatGPT, there are people behind OpenAI's tools, right? I think people or, some of the users maybe forget content moderation behind almost all of our, hopefully behind almost all of our social media platforms and so forth. You know, this, this human behind the scenes going on. Yeah, totally. And yeah, some, I mean, at least I, I don't want to speak negatively of other companies, but I pride myself in, in that Rev, like, tries really hard to have these, to make these, these jobs as enjoyable as possible. I certainly heard some, you know, nightmare stories around, you know, other types of, of AI related work, but yeah, I think there's always going to be a need for that for sure. Right, and I've heard also that, as you were mentioning, call centers or like people who answer the phone, like that's what they do for work is answer, you know, like are these happy jobs? 
Like, can we make their, like, are there tools to make their lives easier as they're, you know, doing translation or I think kind of augmenting workflows is something that I think about very often. Okay, so now we hear about your work. How did you get into this field? Yeah, what's the word, serendipity? I, you know, I am from Montreal. I went to McGill University studying mathematics and the focus of my math degree was on bioinformatics and something called phylogenetics. And phylogenetics is the study of evolutionary trees and their mathematical properties. Okay, and then within that, you know, you analyze a lot of sequences of DNA and so on to try to understand the properties of these trees. And so I got into sequence analysis and stuff like that. And a friend of mine was just starting at Nuance back then when I finished my... And he just said, you know, hey, I'm working in speech rec, you know, there's a lot of overlap. But at the time, we were using something called hidden Markov models. And so there was lots of overlap between the two fields. And I thought, OK, well, I'll, you know, I'll give it a shot. So I kind of started working there and really fell in love with the field. But I do, yeah, I come from a different angle where I'm not a linguist. I'm interested in language, obviously, but definitely just came from a pure mathematics background. Yeah. I did not know that. And so then you worked at Nuance and their speech sciences. Yeah, so at Nuance, I worked eight years as a speech scientist within a group called professional services. And that's a group of people that are not in research per se, but they understand it well enough that they take the results of research and kind of like massage it in a way that customers can use it. And so it was a really good, you know, introduction to the field for me because I got to work on real projects. And I got to see, you know, first hand, like, you know, how everything works, you know, in the real world.
So we worked with Verizon, Bell Canada, AT&T, Time Warner Cable, all sorts of things.
Because I don't know if you remember, but I worked at Nuance briefly.
When did you?
I think I came after you.
Yeah, I think you did, but I left in 2014.
Yeah. I was still in grad school, but I worked there also on a Verizon backend. I guess I was working on the design and product work, and we'd occasionally get data sets over from the speech learning sciences and throw it back over to the other team. So siloed, these teams at the time, at least those years ago, very siloed.
Okay. Yeah, yeah. Interesting. Yeah, yeah, that's fascinating.
And then, yeah, over time, I did get involved in other projects. So I started to work on mobile applications near the end at Nuance. We had started a thing called Nina, which was a mobile app, you know, that could enable call centers to build their own mobile app. At the time, everything was new, you know, so it was like, oh, we need to jump on this iPhone frenzy. And it was a cool app. So we worked a little bit on that. And then I joined a company called VoiceBox, who were more focused on car speech recognition. At the time, the project was for Toyota. And I would say that's where I learned the most about what we call large vocabulary speech recognition. So just being able to say almost anything to the machine, because back then at Nuance it was very much a directed dialogue kind of thing where, you know, you can say billing, you can say this. Whereas the in-car experience was more open-ended. You could say, like, what's the weather in Seattle, or give me directions to the Needle, or something like that. And so, yeah, that was super interesting. And then I joined Rev and, you know, that's been my career so far.
That's awesome. That's awesome. And I wasn't expecting the math. I don't think I knew that part.
And I think, will you just explain for people who don't know hidden Markov models, when you think about math, how much does math play into your work today?
Yeah, it's a good question, and I don't think I should try to explain hidden Markov models on the spot. But no, I mean, now I kind of lead the team, you know, so for me personally, I do a lot less math than I used to. But for research scientists on the team, it's important to know those things well enough for sure. They come up in many scenarios, you know? So in the modeling for sure, you have to understand the basics of neural nets and transformers and all those things that are cool right now. But also even just for, let's say, accuracy of your model. For example, let's say the audio says, I love pizza. And then, well, too simple of an example, but if it misrecognized pizza as piazza or something, understanding how to align these two sentences, like the right words, I with I, love with love, you know, there's a lot of math involved in that. You know, just to do pure alignment stuff. And so, yeah, those things are useful, definitely, on a day-to-day basis. But for me, as I grew into more management, that is definitely less.
I know. That makes sense. And I think I wasn't specifically requesting you explain hidden Markov models, more so the fact of what is in vogue in the field, of the tools we use for machine learning. Back in the day, that was the best of the best, cream of the cream. And now today, I think I heard a talk recently where, like, transformers are the end, is what the person was saying on stage. And I was like, whoa, that's a pretty big claim. Do you wanna be documented saying that? It's just, you know, how machine learning evolves and doesn't evolve, or what is working for us. And yeah.
Interesting. I did not hear that claim, man.
I would say probably that's not going to happen. That's not going to be true. But who knows?
Yeah. Last year. Yeah, I could find his name, but.
But you're right, it's interesting. In my 17 years, definitely when I started, it was HMM, it was called hidden Markov model, Gaussian mixture model. So HMM-GMM, mixture models. And it was the state of the art, like speech recognition based on phonemes and all that stuff. And then over time, there has been a slow migration towards neural networks. And for a while, it was this hybrid scenario where you would do part of your stuff still with GMM-HMM, but then, no, sorry, part of your stuff with a neural net, and then the rest with the language model on top of it, and things kind of transitioned. And now it's mostly pure transformers. I don't think they're the end of it all. But that's an interesting comment for sure.
It was so bold and so absolute. It certainly wouldn't be something I would say on stage. But I think I'm also seeing some movement actually backward, to see if things can be more explainable. When we have these pure transformers, as you mentioned, like, do we know what's going on in there? Do we have control? Can we explain? Or just, there are different parts of the field that are taking things in different ways. So, yeah.
I feel like, actually, yeah, that's really interesting because, you know, for large language models, for example, when they first sort of came out, we were talking about emergent properties where, you know, it kind of surprised everybody that these models could learn so much, which is kind of not a good thing if you think about it, because it's like, oh, it's just a surprising thing that we don't understand, you know? It's like, oh, how come they're learning so much? I don't know, you know?
So I think people now are asking questions and, yeah, as you said, maybe going backwards a little bit to make things more understandable and more controllable in some way, you know? And yeah, I think the field will continue to evolve for sure.
Right, and even as I say, like, go back, it's like a different prioritization. It's like, we want to know what's going on. Like, actually, maybe, you know, that took a step in a direction we, yeah. Or just, I think about it as a step back, which I even have to question for myself. Okay, so let's imagine someone's listening to this and they're like, whoa, I wanna get into this field. Like, yes, please, I want his job. What advice or steps might you recommend for someone to get into this path?
Good question. I've been out of school for long enough that it may be, you know, hard to answer. But honestly, I mean, first of all, for me, as I think back, I feel lucky that I found something that aligns with my belief and with what I like. So I would say one of the things is, you know, understand why you're interested in something, because it's important. You're going to work a long time in this field, hopefully, and you want to work on something that you're passionate about, that you believe in. And so, I like this guy, Simon Sinek, who talks about starting with the why, and I feel like that's important. And then don't just follow the hype or the money just because AI is a lucrative job. I feel like that's maybe the wrong reason to get into it. But yeah, in terms of technology, I think computer science, obviously. If you really want to get into language technologies, I think there's got to be some really good schools out there now that will focus a lot on them. And a computer science or math background is, I think, a good foundation for it. I like school, so I kind of wanted to stay in the school system for a long time. So, you know, I didn't do a PhD like you.
I know you did a PhD, but, you know, I went to the master level. I don't know if it's necessary. But yeah, certainly computer science, math, I think. It's a good path.
Well, and if I'll even push you further, does someone, you know, in three years, how many coders will we need?
Yeah, well, yeah, that's a philosophical question. That's hard to answer. I know that the field of computer science has become way more competitive than it was during my time. And now to the point where you're like, oh, is that really the right path? Yeah, I don't know. You have to ask maybe my...
I think a lot of people are like, oh yeah, you know, work on your Python, work on your Java. And I was like, I mean, you can just generate code from ChatGPT these days. I'm not gonna say which university, but I've heard there's a crisis of consciousness about teaching computer science because of just how easy it is to get this much code. You know, why are we coding from scratch so much? Like, one has to question. There's a lot of pieces one could question about these different practices, I guess.
Yeah, definitely. I think, I mean, yeah, it's really interesting because, you know, I guess you could say the same thing about a lot of things, like writing. I think there's a lot of space for people who are interested in computer science to do things properly still, you know. But yeah, certainly that's why I say, don't jump in just because it's a cool field, but do it because you enjoy it. Because yeah, it's certainly a changing field.
Seriously, do you still enjoy it? He doesn't, no pause, no pause. Okay, well, I think other people have literally gone through a master's program and recommended the one they went through. Maybe because you are so senior, when you are hiring people for your teams, what do you look for in different candidates?
Good question.
I mean, Rev is at a size where I have the luxury of actually looking at resumes well. I don't get thousands of resumes a day. But I definitely try to see relevant experience. And so if it's through your school, at least put projects that are relevant to the company or the role you're applying to. If you have previous experience, then relevant experience on the projects is usually the first thing I look at. I don't necessarily look at the school you're from. I think that's maybe a secondary bonus thing, where I look usually at the experience and then eventually maybe I say, oh, okay, he or she is from a good school. But in general, I try to look for technical match first. And then I really like to talk to people, and we're a small enough team that, maybe cultural fit is the wrong word, but you know, team fit is important. And so I like to talk with people in person. And so maybe your first impressions and interviews, you know, if you can somehow work through that, that's important also in job search.
Absolutely, yeah, I've heard this thing on TikTok that just blew my mind about this exact topic. If you got the interview, your CV says you're qualified, everything else is a personality test.
I mean, that's accurate, I think.
It just blew my mind. I was like, well, yeah.
Yeah, yeah, I guess people underestimate, but if you're a team of like eight people, you know, adding one person is a big deal. And so they have to be a good fit for the team for sure.
You only spend how many hours a day, how many years with these people. Yeah, it's an intense commitment. Okay, any other advice or takeaways that you'd wanna give people?
Ah, oof, I don't know. I feel like, no, I think I said most of what I should say. Yeah.
Yeah, good question. Rev.com, on LinkedIn, Miguel Jetté. My Twitter handle is weird. It's called Bonwell Photog.
So maybe we can post it as a link or something. But I used to do a lot of photography and so that was my thing. But yeah, reach out, you know, on LinkedIn. I'm happy to connect with people and, you know, geek out about speech.
Well, thank you so much for your time and geeking out about speech a little bit with me today.
Yeah, I'd like to talk to you. Have a good rest of the day.
Nice. Have a good day.
Oh gosh, was that fun. Did you enjoy that episode as much as I did? Well, now be sure to check out our show notes for this episode, which have tons of links and resources and our guest bio, etc. Go check it out. If you're ready to dive in to personalize your AI journey, download the free Your AI Roadmap workbook at yourairoadmap.com/workbook. Well, maybe you work at a company and you're like, hey, we want to grow in data and AI and I'd love to work with you. Please schedule an intro and sync with me at Clarity AI at hireclarity.ai. We'd love to talk to you about it. My team builds custom AI solutions, digital twins, optimizations, data, fun stuff for small and medium sized businesses. Our price points start at five, six, seven, eight figures, depending on your needs, your time scales, et cetera. If you liked the podcast, please support us. Can you please rate, review, subscribe, send it to your friend, DM your boss, follow wherever you get your podcasts. I certainly learned something new and I hope you did too. Next episode drops soon. Can't wait to hear another amazing expert building in AI. Talk to you soon!
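The alignment Miguel describes in the conversation, lining up "I love pizza" against a misrecognized "I love piazza" word by word, is commonly done with a word-level edit distance. The following is a minimal sketch under that assumption, not Rev's actual scoring code. The function names `align_words` and `word_error_rate` are hypothetical, and the lowercase-and-split-on-spaces normalization is a simplification chosen for illustration.

```python
def align_words(reference, hypothesis):
    """Align two transcripts word by word using edit distance.

    Returns a list of (op, ref_word, hyp_word) tuples, where op is
    "match", "sub" (substitution), "del" (word missing from the
    hypothesis) or "ins" (extra word in the hypothesis).
    """
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # cost[i][j] = minimum number of edits to turn ref[:i] into hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, len(hyp) + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover the alignment.
    ops = []
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "match" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def word_error_rate(reference, hypothesis):
    """Errors (sub + del + ins) divided by the number of reference words."""
    ops = align_words(reference, hypothesis)
    errors = sum(1 for op, _, _ in ops if op != "match")
    ref_len = sum(1 for op, _, _ in ops if op != "ins")
    return errors / ref_len if ref_len else 0.0


if __name__ == "__main__":
    for op, ref_word, hyp_word in align_words("I love pizza", "I love piazza"):
        print(op, ref_word, hyp_word)
    print(word_error_rate("I love pizza", "I love piazza"))
```

On the pizza example this pairs "i" with "i", "love" with "love", and flags "pizza" versus "piazza" as a single substitution, giving one error out of three reference words. Real scoring pipelines usually add more text normalization (punctuation, numbers, spelling variants) before aligning, which is part of why transcript accuracy is harder to measure than it looks.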

Introduction & Background
Rev's AI Innovations: Leveraging AI to enhance transcription, captions, and subtitles.
Speech Recognition at Rev: Miguel's journey in building speech recognition tools.
Developing Speech Recognition: Challenges in creating effective models with unique datasets.
Field Surprises: The nuanced difficulties in transcription and speech recognition.
Complexity in the Field: Addressing the intricate challenges of accurate transcription.
Future Directions: Exploring multilingual support and advancements in speech technologies.
Upcoming Projects: Enhancing tools for subtitlers and expanding language capabilities.
Customization for Users: Tailoring solutions to meet diverse customer needs.
Advancements in Translation and Text-to-Speech
Speech Technology Perceptions: Debunking the notion that speech recognition is fully "solved."
Career Advice: Insights on entering the field and the importance of team fit in hiring.
Hiring Focus: Looking for relevant experience and a good fit for the team's culture.
Final Thoughts: Encouragement for those aspiring to join the AI and speech technology field.
Closing: How to connect with Miguel and learn more about Rev's work.
