How Nvidia AI Robot Trained 42 Years In 32 Hours And Did THIS

Nvidia, an American chip maker, just advanced robotics 42 years in the span of 32 hours.


And the secret technique it used was detailed in a 90s kids show called Dragon Ball Z.

One of the biggest stumbling blocks in robotics was designing something as effective as a human hand. 

Human hands evolved over millions of years to be able to grasp, manipulate objects and craft fine tools.

Trying to replicate the same ability in robots proved elusive until very recently.

Earlier designs used claws or grippers, but these had limited applications.

A hand with fingers has more joints that must move in very specific and coordinated ways to be useful.

Training a robot to have generalized fine motor skills with unlimited degrees of motion has been extremely difficult.

One approach that has been very effective for training AI to complete certain tasks has been something called

deep reinforcement learning (Deep RL).

Remember this term; it will be important later.

Reinforcement learning is a technique where an AI learns from trial and error and is rewarded for the successful completion of the assigned tasks.

The AI learns by taking actions in its environment and receiving rewards or punishments for those actions. 

Over time, the agent learns which actions lead to the most reward and therefore the best outcomes.
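The trial-and-error loop described above can be sketched with tabular Q-learning, one of the simplest forms of reinforcement learning. This toy example is purely illustrative (it is not Nvidia's or OpenAI's training code): an agent on a five-cell track learns that stepping right leads to a reward.

```python
import random

# A minimal tabular Q-learning sketch (illustrative only): an agent on a
# one-dimensional track of 5 cells learns, by trial and error, to walk
# right toward a reward waiting in the last cell.
N_STATES = 5
ACTIONS = (0, 1)                     # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Move the agent; reward 1 only when it reaches the last cell."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action else -1)))
    return nxt, float(nxt == N_STATES - 1), nxt == N_STATES - 1

def greedy(state):
    """Pick the best-known action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

random.seed(0)
for episode in range(500):           # many trial-and-error episodes
    state, done = 0, False
    while not done:
        # explore occasionally, otherwise exploit what has been learned
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # rewarded actions are reinforced; their value flows backward
        target = reward + GAMMA * max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (target - q[(state, action)])
        state = nxt

# After training, stepping right is valued higher than stepping left.
print(all(q[(s, 1)] > q[(s, 0)] for s in range(N_STATES - 1)))
```

The agent is never told "go right"; the reward alone shapes its behavior, which is the essence of the technique.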

The AI evolves its abilities much as organic life evolves its abilities over time.

In fact, evolutionary approaches to machine learning borrow terms like generations, species and genomes to describe how different AI models improve over time.

Machine learning can be thought of as doing for artificial intelligence what evolution and survival of the fittest did for organic life.

Just like organic evolution takes millions or billions of years, machine learning can take millions or billions of virtual simulations to learn from.

Right now, this is the biggest limitation for training robots using reinforcement learning.

Robots are expensive to build and can take many real-world hours to train.

The amount of money and time it takes has limited how fast robotics can advance.

However, machine learning in virtual worlds has been accelerating very, very fast.

Neural nets are an approach to teaching computers that is loosely modeled on the human brain.
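As a toy illustration of that idea (the weights below are hand-picked for intuition, not learned): a single artificial "neuron" sums its weighted inputs and squashes the result through a nonlinearity, loosely analogous to a biological neuron deciding whether to fire.

```python
import math

# A single artificial "neuron": a weighted sum of inputs squashed through
# a nonlinearity, loosely analogous to a biological neuron firing.
def neuron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))    # sigmoid output in (0, 1)

# With these hand-picked weights the neuron fires strongly only when
# both inputs are active, behaving like a soft AND gate.
both_on = neuron([1.0, 1.0], [4.0, 4.0], -6.0)   # sigmoid(2)  is high
one_on = neuron([1.0, 0.0], [4.0, 4.0], -6.0)    # sigmoid(-2) is low
print(both_on > 0.5, one_on < 0.5)
```

A neural net is just many such neurons wired in layers, with training adjusting the weights instead of a human picking them.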

AlphaZero, Google DeepMind’s chess-playing AI, was tasked with playing games against itself to improve its abilities.

Over the course of nine hours, the chess version of the program played forty-four million games against itself.

After two hours, it began performing better than human players; after four, it was beating the best chess engine in the world.

OpenAI created a simulated environment where AI agents play a team-based game of hide-and-seek, one team competing against the other.

Hiders (the blue team) are tasked with avoiding line-of-sight from the seekers (the red team), and seekers are tasked with keeping vision of the hiders. 

There are objects scattered throughout the environment that hiders and seekers can grab and lock in place, as well as randomly generated immovable rooms and walls that agents must learn to navigate. 

Before the game begins, hiders are given a preparation phase where seekers are immobilized to give hiders a chance to run away or change their environment.

It’s important to understand that the AI is not given hints, such as incentives for moving around objects.

The agents start from zero; the first run-throughs are almost random as the AI learns even the most basic behaviors, like moving around.

Each of these games goes by many names: a run-through, an episode, an iteration.

As the AI runs through tens of millions of games, some glimmers of intelligent behavior can be seen:

The seekers learn to chase the hiders, and the hiders learn to run from the seekers.

Between 20 million and 80 million games, the hiders learn to construct shelters to hide in.

Around 100 million games, seekers learn to use ramps.

By 200 million games, hiders learn to lock the ramps to prevent seekers from using them.

By 300 million games seekers learn a skill that we can call box surfing to get over the forts that hiders built.

By 400 million games hiders learn to disable everything that the seekers can use to their advantage.

As the number of games scales into the billions, the AI begins to test the limits of the physics engine and even finds exploits that the developers did not intend.

These AI agents are also tested on other, unrelated tasks that probe their memory, their ability to solve puzzles, and their ability to predict their environment.

As you can see, running these simulations billions of times creates a sort of mastery in these digital AI agents.

Unfortunately running a billion scenarios or using a billion robots to run a scenario in the real world is not realistic.

Energy is expensive, robotic arms can cost hundreds of thousands of dollars and are subject to wear and tear and breaking from overuse.

The amount of money, time and resources needed to train real robots made of real atoms is astonishing.

And this is where we come back to Nvidia and the complexity of reproducing a human hand.

Say hello to the NVIDIA Isaac robotics simulator, which enables robots to be trained inside a simulated universe that can run more than 10,000x faster than the real world and yet obeys the laws of physics.

If you are familiar with the kids cartoon “Dragon Ball Z”, which started in the 90s, they had a similar idea.

The Hyperbolic Time Chamber is a mysterious time-compression chamber where 1 year is just a day on the outside.

The characters use this chamber to train themselves at an accelerated pace.

And now Nvidia has a similar idea for training a new army of robots.

Using NVIDIA Isaac Gym, a reinforcement learning simulator for robots, Nvidia was able to train a human-like robot hand to handle and rotate a cube one-handed.

The neural network brain learned to do this entirely in simulation before being transplanted to control a robot in the real world.

Similar work has only been shown once before, by researchers at OpenAI.

Their work required a far more sophisticated and expensive robot hand, a cube tricked out with precise motion control sensors, and a supercomputing cluster of hundreds of computers to train.

The hardware used by the Nvidia project was chosen to be as simple and inexpensive as possible, so that researchers worldwide can replicate the experiments.

The robot itself is an Allegro Hand, which costs as little as 1/10th the cost of some alternatives, has four fingers instead of five, and has no moving wrist. 

They used cheap, off-the-shelf cameras to track the 3D cube with vision; the cameras can be repositioned easily as needed without requiring special hardware.

The cube is 3D-printed with stickers affixed to each face.

Training a good policy takes about 32 hours on this system, equivalent to 42 years of a single robot’s experience in the real world.
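A quick back-of-the-envelope check shows these headline figures are consistent with the "more than 10,000x faster than the real world" claim made above:

```python
# Sanity-checking the headline numbers: 42 years of robot experience
# gathered in 32 wall-clock hours implies the simulator runs faster
# than real time by roughly this factor.
SIM_HOURS = 32
REAL_YEARS = 42
HOURS_PER_YEAR = 365.25 * 24          # about 8,766 hours in a year

speedup = REAL_YEARS * HOURS_PER_YEAR / SIM_HOURS
print(round(speedup))                  # about 11,500x, i.e. ">10,000x"
```

So the "42 years in 32 hours" figure and the "10,000x" figure are two views of the same speedup.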

When training in simulation, the most significant challenge is bridging the gap between the orderly world of the simulator and the messy, unpredictable real world.

To address this, Nvidia added some randomization of the physics settings in the simulator:

Changing object masses, friction levels, and other attributes across hundreds of thousands of simulated environments.

Adding this randomness makes the AI more robust in the real world.

The AI learns to adapt to the small variations that it will encounter in the real world.

Changes in friction, loose connections, different power sources, lighting, chipped edges, and so on.
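As a rough sketch of what that randomization might look like (the parameter names and ranges below are invented for illustration; this is not Isaac Gym's actual API):

```python
import random

# Illustrative domain-randomization sketch: every simulated environment
# gets slightly different physics, so the trained policy cannot overfit
# to one exact world. Parameter names and ranges are made up.
def randomized_physics(rng):
    return {
        "cube_mass_kg": rng.uniform(0.03, 0.30),    # vary the object's mass
        "friction": rng.uniform(0.5, 1.5),          # vary surface friction
        "motor_strength": rng.uniform(0.8, 1.2),    # vary actuator power
        "camera_jitter_px": rng.uniform(0.0, 2.0),  # vary observation noise
    }

rng = random.Random(42)
envs = [randomized_physics(rng) for _ in range(100_000)]

# Virtually no two environments share identical physics, so a policy that
# succeeds across all of them has learned robust, general behavior.
masses = [env["cube_mass_kg"] for env in envs]
print(len(envs), min(masses) >= 0.03, max(masses) <= 0.30)
```

A policy that only works for one exact cube mass or friction value fails in most of these environments, so training naturally selects for behavior that survives real-world variation.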

Here’s a thought…

We are using a simulation to acquire real-life, useful data that we can pull out and apply to the real, physical world.

As our technology advances, the simulations will get better, more like the real world.

The AI agents will get better, not only at playing hide and seek, but maybe even building and designing technology themselves.

They may even be able to design technology that we haven’t thought of.

If we launch trillions of simulations, each one evolving and getting better with time flowing much faster than in our “base reality”, the actual physical world we live in…

Is it possible that one day we replicate a reality that looks like our own?

Or if you believe that there are other, more advanced civilizations among the stars, is it possible they already created a virtual replica of our world?

I’ll leave you with this:

There’s a, um, sort of a philosophical concept that a sufficiently advanced civilization will be able to create a simulation. A simulation, yeah. Maybe you’ve answered this before. So the idea is, right, any sufficiently advanced civilization could create a simulation that’s like our existence. And so the theory follows that maybe we’re in the simulation. Have you thought about this? A lot. Are we?

The strongest argument for us probably being in a simulation, I think, is the following: forty years ago we had Pong, like, two rectangles and a dot. That was what games were. Now, forty years later, we have photorealistic 3D simulations with millions of people playing simultaneously, and it’s getting better every year.

And soon we’ll have, you know, virtual reality, we’ll have augmented reality. If you assume any rate of improvement at all, then the games will become indistinguishable from reality. So given that we’re clearly on a trajectory to have games that are indistinguishable from reality, and those games could be played on any set-top box or on a PC or whatever, and there would probably be, you know, billions of such computers or set-top boxes, it would seem to follow that the odds that we’re in base reality is one in billions.

So tell me, what’s wrong with that argument? Is the answer yes, somebody beat us to it, and this is a game? No, no. There’s a one in billions chance that this is base reality. Oh, okay. What do you think? Well, I think it’s one in billions.