Ever wonder what the cutting edge of an industry looks like? What it’s like to have the brightest and most innovative minds of an entire industry gathered in one place, exchanging their latest technological breakthroughs and discoveries with one another?

Well here it is in the flesh… and the nuts and bolts.

It’s called ICML, the International Conference on Machine Learning. It’s basically the Super Bowl of AI, and in 2023 it was held in Waikiki, where I, alongside a few other Deepgrammers, had the incredible opportunity to attend.



If you’d rather watch the experience than read about it, check out the video below! Otherwise, keep reading on 😄

Note that this blog is not meant to be a *complete* overview of ICML. After all, no single attendee can listen in on *all* the talks. Especially since many of them take place at the same time. Rather, we just want you to vicariously experience this incredible conference through us.

Ready? Let’s go.

Day 1 of 5: RLHF and Differential Privacy

On Day 1, we attended two major “tutorials,” or talks: the first on RLHF and the second on Differential Privacy. Let’s take these two topics in order.

First off, if you’re unfamiliar with RLHF, I highly recommend checking out this resource by Chip Huyen. She breaks down the concept in an extremely simple, elegant manner, with beautiful illustrations along the way. But for now, you just need to know two things:

First, AI is really just a bunch of computers learning the answers to questions by example, like the image below.

An image-recognition AI would be exposed to a massive dataset of images, some of cats and others not of cats. Then, after seeing enough examples, it becomes good at spotting cats, even in pictures it has never seen before.
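If you prefer code to pictures, here’s a purely illustrative sketch of that “learning by example” loop, with random numbers standing in for real cat photos (the data and the “cat rule” below are made up):

```python
# A toy version of supervised learning: show the model labeled examples,
# then test it on "images" it has never seen before.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))          # 1,000 fake "images," 64 features each
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = "cat", 0 = "not cat" (toy rule)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)              # learn from labeled examples
print("accuracy on unseen images:", model.score(X_test, y_test))
```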

Got it? Great!

The second thing you need to know is that RLHF stands for “Reinforcement Learning from Human Feedback.” It basically means that humans help the AI learn when the examples get really tough. This technique of using humans to help AI with hard problems especially applies to language models like chatbots. After all, how can we be sure that an AI’s answer is helpful if we don’t have humans to verify that it actually helped?
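In practice, that human feedback is usually distilled into a reward model trained on human preference rankings. Here’s a minimal toy sketch of that idea in PyTorch; the random tensors stand in for real chatbot responses, and none of this is the presenters’ code:

```python
# Toy reward-model training: learn to score the response a human preferred
# higher than the one they rejected (a Bradley-Terry style pairwise loss).
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Pretend embeddings of a "chosen" and a "rejected" chatbot response,
# as ranked by a human labeler.
chosen = torch.randn(32, 128)
rejected = torch.randn(32, 128)

for step in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Push the preferred response's score above the rejected one's.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```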

Anyway, that’s RLHF in a nutshell.

And with that in mind, here’s the juicy part of the RLHF talk at ICML: 

At this tutorial, the focus was on data labeling. In our toy example above, the labels would be “cat” or “not cat.”

But what about multimodal data? You know, data that combines text, images, audio, and maybe even video.

Well, when it comes to multimodal data and RLHF, data labeling becomes not only extremely important but also extremely difficult. After all, for a given set of images, there are many different questions you might want the machine to answer.

“Is it a cat?” versus “Where is the cat?” versus “What color is the cat?”

Or, in the case of the actual RLHF presentation in Hawaii, I think the image below depicts a baby koala or something…

Anyway, if there are many questions to be answered by the machine based on this dataset, we’d better be certain that the labels inside the dataset are not only correct, but also unambiguous and peer-verified. The talk then emphasized the importance of having well-planned workflows for data labeling so that we can label data as efficiently and accurately as possible.

We have to ask questions in the right way so that we have unambiguous answers.

And towards the end of the session, the speakers even teased the idea of automating data labeling using large language models (LLMs). After all, in theory, if you have an AI model that matches or even outperforms human accuracy, you should be able to trust it to label data.

And, in theory, it should be able to handle out-of-class examples.
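To make that concrete, here’s a hedged sketch of what LLM-assisted labeling might look like. The call_llm() function is a hypothetical placeholder for whatever model you’d actually wire up; the point is the unambiguous, constrained question plus a human-review fallback:

```python
# LLM-assisted labeling sketch: ask a tightly constrained question, and
# route anything outside the allowed label set to a human reviewer.
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call; returns the raw answer."""
    raise NotImplementedError("wire up your LLM of choice here")

ALLOWED_LABELS = {"cat", "not_cat", "unsure"}

def label_image(image_description: str) -> str:
    prompt = (
        "Answer with exactly one of: cat, not_cat, unsure.\n"
        f"Does this image contain a cat? Image: {image_description}"
    )
    answer = call_llm(prompt).strip().lower()
    # Anything outside the allowed label set gets a second, human pass.
    return answer if answer in ALLOWED_LABELS else "needs_human_review"
```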

But, as we’ve Tweeted in the past, using machines to check and balance other machines can be an uncomfortably gray area (at least for now).

Alright, that was talk number one. We’re off to a great start so far! Let’s talk about the second tutorial: “How to DP-fy ML.”

DP in this case stands for “Differential Privacy,” aka the practice of designing algorithms that minimize the probability of any individual’s information being leaked *without* sacrificing the usefulness of the data itself. Here’s the punchline: right now, training AI requires *a lot* of data that we need to be careful with.

As we’ve seen in the past (looking at you, Zuck), it’s crucial to be careful with human data, from addresses to banking information. And so, if we’re going to expose machines to these massive amounts of data, how can we be sure that they won’t leak private information? (Note that leaks can happen either accidentally or through a malicious user trying to “game” or “hack” the AI.)

Well, this talk proposed three possible solutions for protecting people’s privacy:

  1. Applying differential privacy at the input level, 

  2. Applying differential privacy at the training level, 

  3. Applying differential privacy at the prediction level.

Here’s what each of those means.

Applying DP at the input level means having a dataset that doesn’t include any private information in the first place. Or, at least, heavily anonymizing this initial dataset.

Applying DP at the training level means that privacy protection measures are taken while the machine is learning. That is, the dataset will totally contain private information, but the machine reads that information in such a way that once it’s fully trained, it will have an extremely small probability of leaking anything. This is the most common point at which DP is applied today.
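The most common recipe here is DP-SGD: clip each individual example’s gradient so no single person’s data can push the model too hard, then add noise before the update. Here’s a minimal toy sketch, not taken from the tutorial; the data and hyperparameters are made up:

```python
# DP-SGD style training sketch on a toy logistic regression:
# per-example gradient clipping + Gaussian noise on the summed gradient.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(10)

clip_norm, noise_mult, lr = 1.0, 1.1, 0.1

for step in range(200):
    grads = []
    for xi, yi in zip(X, y):
        pred = 1 / (1 + np.exp(-xi @ w))
        g = (pred - yi) * xi                                     # per-example gradient
        g *= min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))   # clip its norm
        grads.append(g)
    noise = rng.normal(scale=noise_mult * clip_norm, size=w.shape)
    w -= lr * (np.sum(grads, axis=0) + noise) / len(X)           # noisy averaged step
```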

And applying DP at the prediction level means that the machine itself has read a bunch of private information and knows it inside and out. However, the people in charge of the machine impose limits on what it can predict and output, for example by injecting noise during inference. So the machine will know the private information, but the engineers will put safety and DP measures in place such that the machine will “hold its tongue,” so to speak.
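One classic way to do that is the Laplace mechanism: add noise scaled to the query’s sensitivity before an answer ever leaves the system. A tiny sketch, assuming the “prediction” is a simple count (whose sensitivity is 1):

```python
# Laplace mechanism sketch: noise scale = sensitivity / epsilon.
import numpy as np

def laplace_private_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Smaller epsilon -> more noise -> stronger privacy, less accuracy.
print(laplace_private_count(true_count=412, epsilon=0.5))
```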

Much of the talk revolved around the math behind differential privacy, and rigorously defining what it means for an algorithm to “satisfy” differential privacy requirements. Feel free to check out those details on the ICML website.
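For a taste of that math: the standard definition says a randomized algorithm M is (ε, δ)-differentially private if, for any two datasets D and D′ that differ in a single person’s record, and for every set of possible outputs S,

\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
\]

The smaller ε and δ are, the less any single person’s data can influence what the algorithm outputs.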

There was also a talk about disinformation, fake news, and propaganda, but we’ll get to that tomorrow. For now, let’s take a nap and wake up ready for day 2.

Day 2 of 5: Disinformation, Fake News, and a Boat Party

Alright, as promised, here’s what was discussed during the disinformation talk:

First, we went over some of the challenges revolving around fake news. I’m sure you can find similar information in numerous places online. The interesting part came when the speakers introduced a fact-checking pipeline.

Manual fact checking focuses primarily on factuality, ignoring harm, as illustrated in the image below.

Right now there is a human pipeline for fact-checking. And this talk discussed the inefficiencies that those humans encounter. Long story short, we can introduce AI into this human-run fact-checking pipeline to speed up the efforts. So while humans are indeed still the ones fact checking and verifying the integrity of journalism and the 24-hour news cycle, AI can give them a boost in speed.

That is, humans are still running the show, but AI can act as a pair of running shoes that boosts their productivity.
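If you’re curious what that boost might look like in code, here’s a hedged sketch of an AI-assisted triage step. Both retrieve_evidence() and verdict_model() are hypothetical placeholders, and anything uncertain still lands on a human’s desk:

```python
# Sketch of AI-assisted fact-checking triage: the model makes a first pass,
# and low-confidence or unclear claims are routed to human fact-checkers.
from typing import List, Tuple

def retrieve_evidence(claim: str) -> List[str]:
    """Hypothetical placeholder: fetch candidate evidence passages for a claim."""
    raise NotImplementedError

def verdict_model(claim: str, evidence: List[str]) -> Tuple[str, float]:
    """Hypothetical placeholder: return ("supported" | "refuted" | "unclear", confidence)."""
    raise NotImplementedError

def triage(claim: str, confidence_threshold: float = 0.9) -> str:
    evidence = retrieve_evidence(claim)
    verdict, confidence = verdict_model(claim, evidence)
    # Humans still run the show: anything uncertain goes to a reviewer.
    if verdict == "unclear" or confidence < confidence_threshold:
        return "route_to_human_reviewer"
    return verdict
```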

Finally, before we get ready for Day 3, it’s important to note that Deepgram also held a boat party with some of the aforementioned researchers. The boat had a glass bottom where you could see the reefs and the fish! If you want to see what that looked like, click here.

Day 3 of 5: Marginalized Languages and Robot Dogs

On Day 3, we attended a really cool panel about AI and marginalized languages.

The abstract of that panel, according to ICML, is this:

During the past year or so, we've seen rapidly growing interest and excitement in large-scale language models and their applications to various domains beyond traditional problems in natural language processing and machine learning. And although it is indeed an exciting development, these language models have been trained on a large corpus that may not be representative of all the languages in the world, and may focus disproportionately on better-served languages such as English and European languages like French and Spanish. This raises both questions and concerns about the potential for these language models to exacerbate the digital divide, as well as inequality and inequity in information access.

One fun fact that I learned is that, sometimes, similar languages can actually blend together in datasets. For example, Norwegian, Danish, and Swedish are all similar enough that you can sometimes train a Norwegian model using Danish or Swedish data. However, the extent to which that model will be accurate is a question that we're still trying to answer. Nevertheless, further studying and representing marginalized languages, especially Asian, African, and Native American ones, continues to be a focus today.

And so, once the panel ended, we learned rather quickly that some people *really* wanted to see Deepgram’s robot dog, since we had posted videos of him on TikTok before the conference.

Long story short, our robot dog went so viral that a couple of fellow conference attendees recognized the logos on our shirts and asked if they could see him. We obliged, and soon enough, a crowd gathered around our little puppy. Here’s what I told them:

Deepgram bought the dog from a robotics company in China. And when he was delivered to us, he only had a controller. No intelligence built in whatsoever. So we wanted to give him robot ears. And that’s where Deepgram comes in.

If you hook the dog up to a Raspberry Pi and a Bluetooth microphone, you can write a program that uses Deepgram’s AI speech recognition software to map certain spoken words to actions. Basically, the microphone hears the words you say and sends them to the Raspberry Pi. Then, the Deepgram code inside the Raspberry Pi transcribes the words you spoke. That transcription is then mapped to a robot command, which the dog then follows.
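For the curious, here’s roughly what that program can look like. This is a hedged sketch rather than the exact code we ran: the Deepgram REST endpoint and response shape shown here are assumptions based on how the API worked at the time, and send_robot_command() is a hypothetical stub for the dog’s control interface.

```python
# Sketch of the voice-command pipeline: recorded audio -> Deepgram
# speech-to-text -> keyword lookup -> robot command.
import requests

DEEPGRAM_API_KEY = "YOUR_API_KEY"  # assumption: you have a Deepgram key

COMMANDS = {"sit": "SIT", "stand": "STAND", "forward": "WALK_FORWARD"}

def transcribe(wav_path: str) -> str:
    # Assumed endpoint and response layout for Deepgram's REST API.
    with open(wav_path, "rb") as f:
        resp = requests.post(
            "https://api.deepgram.com/v1/listen",
            headers={
                "Authorization": f"Token {DEEPGRAM_API_KEY}",
                "Content-Type": "audio/wav",
            },
            data=f.read(),
        )
    resp.raise_for_status()
    return resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]

def send_robot_command(command: str) -> None:
    """Hypothetical stub: forward the command to the dog's controller."""
    print(f"robot <- {command}")

def handle_utterance(wav_path: str) -> None:
    transcript = transcribe(wav_path).lower()
    for keyword, command in COMMANDS.items():
        if keyword in transcript:
            send_robot_command(command)
```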

The end result, a dog that follows spoken commands, looks like this.

Day 4 of 5: Fairness and the Test of Time Award

The Test of Time Award is given to a paper published 10 years ago that still has incredible impact today. In the words of ICML themselves:

“This year, we have considered all the papers that were presented at ICML 2013, and among those papers, have selected three papers that have been well cited and well followed up by the machine learning community since then. These three papers cover diverse aspects of machine learning, including unsupervised representation learning, hyperparameter tuning (model selection), and learning beyond average risk minimization.”

This year, the winning paper revolved around learning fair representations. In other words, the paper proposed a learning algorithm for fair classification that achieves both group fairness and individual fairness. You can find the actual paper right here. Note that it has been cited over a thousand times over the past 10 years, and upon reading it, it becomes clear why. Perhaps most prominently, it becomes quite easy to synthesize the ideas in this paper with the talks we attended about differential privacy, misinformation, and even RLHF earlier in the week.

Day 5 of 5: Surfing, Socials, and a Surprise

I’ll be honest with you: not much technological education happened on the fifth day, at least in my personal experience. Friday opened up with a surfing social, wherein I caught waves alongside thirty other ICML attendees. The whole thing was fun to experience, but would be boring to read about, so I refer you to this link. Please enjoy the final <90 seconds of the video linked above.

And with that, we’ve completed a full week of ICML! Hope you enjoyed. And if you’re headed to ICML next year, I’ll (potentially) see you in Austria!

