Feed aggregator



Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS
IROS 2024: 14–18 October 2024, ABU DHABI, UAE
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

There’s a Canadian legend about a flying canoe, because of course there is. The legend involves drunkenness, a party with some ladies, swearing, and a pact with the devil, because of course it does. Fortunately for the drone in this video, it needs none of that to successfully land on this (nearly) flying canoe, just some high-friction, shock-absorbing legs and judicious application of reverse thrust.

[ Createk ]

Thanks, Alexis!

This paper summarizes an autonomous driving project carried out by musculoskeletal humanoids. The musculoskeletal humanoid, which mimics the human body in detail, has redundant sensors and a flexible body structure. We reconsider the hardware and software developed for the musculoskeletal humanoid Musashi in the context of autonomous driving. Each component of autonomous driving is performed by exploiting the benefits of this hardware and software. Finally, Musashi succeeded in pedal and steering-wheel operation with recognition.

[ Paper ] via [ JSK Lab ]

Thanks, Kento!

Robust AI has been kinda quiet for the last little while, but their Carter robot continues to improve.

[ Robust AI ]

One of the key arguments for building robots with form factors similar to human beings is that we can leverage the massive amount of human data available for training. In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. We demonstrate the system on our customized 33-degree-of-freedom, 180-centimeter humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot, with 60 to 100 percent success rates using up to 40 demonstrations.

[ HumanPlus ]

We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including real-time teleoperation through a VR headset, verbal instruction, and an RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or by integrating with frontier models such as GPT-4.

[ OmniH2O ]

A collaboration between Boxbot, Agility Robotics, and Robust.AI at Playground Global. Make sure to watch until the end to hear the roboticists in the background react, in a very roboticist way, when the demo works.

::clap clap clap:: yaaaaayyyyy....

[ Robust AI ]

The use of drones and robotic devices threatens civilian and military actors in conflict areas. We started trials with robots to see how we can adapt our HEAT (Hostile Environment Awareness Training) courses to this new reality.

[ CSD ]

Thanks, Ebe!

How can we make humanoids do versatile parkour jumping, clapping dances, cliff traversal, and box pick-and-move with a unified RL framework? We introduce WoCoCo: Whole-body humanoid Control with sequential Contacts.

[ WoCoCo ]

A selection of excellent demos from the Learning Systems and Robotics Lab at TUM and the University of Toronto.

[ Learning Systems and Robotics Lab ]

Harvest Automation, one of the OG autonomous mobile robot companies, hasn’t updated their website since like 2016, but some videos just showed up on YouTube this week.

[ Harvest Automation ]

Northrop Grumman has been pioneering capabilities in the undersea domain for more than 50 years. Now, we are creating a new class of uncrewed underwater vehicles (UUV) with Manta Ray. Taking its name from the massive “winged” fish, Manta Ray will operate long-duration, long-range missions in ocean environments where humans can’t go.

[ Northrop Grumman ]

Akara Robotics’ autonomous robotic UV disinfection demo.

[ Akara Robotics ]

Scientists have computationally predicted hundreds of thousands of novel materials that could be promising for new technologies—but testing to see whether any of those materials can be made in reality is a slow process. Enter A-Lab, which uses robots guided by artificial intelligence to speed up the process.

[ A-Lab ]

We wrote about this research from CMU a while back, but here’s a quite nice video.

[ CMU RI ]

Aw yiss pick and place robots.

[ Fanuc ]

Axel Moore describes his lab’s work in orthopedic biomechanics to relieve joint pain with robotic assistance.

[ CMU ]

The field of humanoid robots has grown in recent years, with several companies and research laboratories developing new humanoid systems. However, the number of robots capable of running has not noticeably risen, despite the need for fast locomotion to quickly serve given tasks, which may require traversing complex terrain by running and jumping over obstacles. To provide an overview of the design of humanoid robots with bioinspired mechanisms, this paper introduces the fundamental functions of the human running gait.

[ Paper ]



The original version of this post by Benjie Holson was published on Substack here, and includes Benjie’s original comics as part of his series on robots and startups.

I worked on this idea for months before I decided it was a mistake. The second time I heard someone mention it, I thought, “That’s strange, these two groups had the same idea. Maybe I should tell them it didn’t work for us.” The third and fourth time I rolled my eyes and ignored it. The fifth time I heard about a group struggling with this mistake, I decided it was worth a blog post all on its own. I call this idea “The Mythical Non-Roboticist.”

The Mistake

The idea goes something like this: Programming robots is hard. And there are some people with really arcane skills and PhDs who are really expensive and seem to be required for some reason. Wouldn’t it be nice if we could do robotics without them? 1 What if everyone could do robotics? That would be great, right? We should make a software framework so that non-roboticists can program robots.

This idea is so close to a correct idea that it’s hard to tell why it doesn’t work out. On the surface, it’s not wrong: All else being equal, it would be good if programming robots was more accessible. The problem is that we don’t have a good recipe for making working robots. So we don’t know how to make that recipe easier to follow. In order to make things simple, people end up removing things that folks might need, because no one knows for sure what’s absolutely required. It’s like saying you want to invent an invisibility cloak and want to be able to make it from materials you can buy from Home Depot. Sure, that would be nice, but if you invented an invisibility cloak that required some mercury and neodymium to manufacture, would you toss the recipe?

In robotics, this mistake is based on a very true and very real observation: Programming robots is super hard. Famously hard. It would be super great if programming robots was easier. The issue is this: Programming robots has two different kinds of hard parts.

Robots are hard because the world is complicated

Moor Studio/Getty Images

The first kind of hard part is that robots deal with the real world, imperfectly sensed and imperfectly actuated. Global mutable state is bad programming style because it’s really hard to deal with, but to robot software the entire physical world is global mutable state, and you only get to unreliably observe it and hope your actions approximate what you wanted to achieve. Getting robotics to work at all is often at the very limit of what a person can reason about, and requires the flexibility to employ whatever heuristic might work for your special problem. This is the intrinsic complexity of the problem: Robots live in complex worlds, and for every working solution there are millions of solutions that don’t work, and finding the right one is hard, and often very dependent on the task, robot, sensors, and environment.

Folks look at that challenge, see that it is super hard, and decide that, sure, maybe some fancy roboticist could solve it in one particular scenario, but what about “normal” people? “We should make this possible for non-roboticists,” they say. I call these users “Mythical Non-Roboticists” because once they are programming a robot, I feel they become roboticists. Isn’t anyone programming a robot for a purpose a roboticist? Stop gatekeeping, people.

Don’t design for amorphous groups

I also call them “mythical” because usually the “non-roboticist” implied is a vague, amorphous group. Don’t design for amorphous groups. If you can’t name three real people (that you have talked to) that your API is for, then you are designing for an amorphous group and only amorphous people will like your API.

And with this hazy group of users in mind (and seeing how difficult everything is), folks think, “Surely we could make this easier for everyone else by papering over these things with simple APIs?”

No. No you can’t. Stop it.

You can’t paper over intrinsic complexity with simple APIs, because if your APIs are simple they can’t cover the complexity of the problem. You will inevitably end up with a beautiful-looking API, with calls like “grasp_object” and “approach_person”, which demo nicely in a hackathon kickoff but last about 15 minutes once someone actually tries to get some work done. It will turn out that, for their particular application, “grasp_object()” makes 3 or 4 wrong assumptions about “grasp” and “object” and doesn’t work for them at all.
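As a concrete illustration of that failure mode, here is a minimal sketch, with an invented function and docstring rather than any real framework’s API, of what a no-knob grasp call quietly assumes:

    # Hypothetical sketch of an over-simple "beautiful" API.
    def grasp_object(object_label: str) -> bool:
        """Grasp the named object.

        Hidden assumptions the caller can't see or override:
          - a top-down approach works (it won't for a mug on a high shelf),
          - the object is one rigid blob (no way to prefer the handle),
          - one default grip force is safe (ceramic mug vs. steel bracket),
          - the scene is uncluttered enough to plan in a single shot.
        """
        ...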

Your users are just as smart as you

This is made worse by the pervasive assumption that these people are less savvy (read: less intelligent) than the creators of this magical framework. 2 That feeling of superiority will cause the designers to cling desperately to their beautiful, simple “grasp_object()”s and resist adding the knobs and arguments needed to cover more use cases and allow the users to customize what they get.

Ironically, this foists a bunch of complexity onto the poor users of the API, who have to come up with clever workarounds to get it to work at all.

Moor Studio/Getty Images

The sad, salty, bitter icing on this cake-of-frustration is that, even if done really well, the goal of this kind of framework would be to expand the group of people who can do the work. And to achieve that, it would sacrifice some performance you can only get by super-specializing your solution to your problem. If we lived in a world where expert roboticists could program robots that worked really well, but there was so much demand for robots that there just wasn’t enough time for those folks to do all the programming, this would be a great solution. 3

The obvious truth is that (outside of really constrained environments like manufacturing cells) even the very best collection of real bona fide, card-carrying roboticists working at the best of their ability struggle to get close to a level of performance that makes the robots commercially viable, even with long timelines and mountains of funding. 4 We don’t have any headroom to sacrifice power and effectiveness for ease.

What problem are we solving?

So should we give up making it easier? Is robotic development available only to a small group of elites with fancy PhDs? 5 No to both! I have worked with tons of undergrad interns who have been completely able to do robotics.6 I myself am mostly self-taught in robot programming.7 While there is a lot of intrinsic complexity in making robots work, I don’t think there is any more than, say, video game development.

In robotics, like in all things, experience helps, some things are teachable, and as you master many areas you can see things start to connect together. These skills are not magical or unique to robotics. We are not as special as we like to think we are.

But what about making programming robots easier? Remember way back at the beginning of the post when I said that there were two different kinds of hard parts? One is the intrinsic complexity of the problem, and that one will be hard no matter what. 8 But the second is the incidental complexity, or as I like to call it, the stupid BS complexity.

Stupid BS Complexity

Robots are asynchronous, distributed, real-time systems with weird hardware. All of that will be hard to configure for stupid BS reasons. Those drivers need to work in the weird flavor of Linux you want for hard real-time for your controls and getting that all set up will be hard for stupid BS reasons. You are abusing Wi-Fi so you can roam seamlessly without interruption but Linux’s Wi-Fi will not want to do that. Your log files are huge and you have to upload them somewhere so they don’t fill up your robot. You’ll need to integrate with some cloud something or other and deal with its stupid BS. 9

Moor Studio/Getty Images

There is a ton of crap to deal with before you even get to the complexity of dealing with 3D rotation, moving reference frames, time synchronization, and messaging protocols. Those things have intrinsic complexity (you have to think about when something was observed and how to reason about it as other things have moved) and stupid BS complexity (there’s a weird bug because someone multiplied two transform matrices in the wrong order, and now you’re getting an error message that deep in some protocol a quaternion is not normalized. WTF does that mean?) 10
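For the transform-order flavor of that, here is a tiny self-contained illustration in plain NumPy (the frame names are invented for this example): composing the same two rigid transforms in the two possible orders silently puts the same point in two different places.

    import numpy as np

    def transform(yaw_rad, tx, ty):
        """4x4 homogeneous transform: rotate about z by yaw, then translate by (tx, ty)."""
        c, s = np.cos(yaw_rad), np.sin(yaw_rad)
        return np.array([[c,  -s,  0.0, tx],
                         [s,   c,  0.0, ty],
                         [0.0, 0.0, 1.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])

    T_world_robot = transform(np.pi / 2, 1.0, 0.0)  # robot pose in the world frame
    T_robot_cam = transform(0.0, 0.2, 0.0)          # camera pose on the robot

    p_cam = np.array([0.0, 0.0, 0.0, 1.0])          # a point at the camera origin

    right = T_world_robot @ T_robot_cam @ p_cam     # camera origin expressed in the world
    wrong = T_robot_cam @ T_world_robot @ p_cam     # same matrices, wrong order

    print(right[:3])  # approximately [1.0, 0.2, 0.0] -- where the camera actually is
    print(wrong[:3])  # approximately [1.2, 0.0, 0.0] -- quietly somewhere else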

One of the biggest challenges of robot programming is wading through the sea of stupid BS you need to wrangle in order to start working on your interesting and challenging robotics problem.

So a simple heuristic to make good APIs is:

Design your APIs for someone as smart as you, but less tolerant of stupid BS.

That feels universal enough that I’m tempted to call it Holson’s Law of Tolerable API Design.

When you are using tools you’ve made, you know them well enough to know the rough edges and how to avoid them.

But rough edges are things that have to be held in a programmer’s memory while they are using your system. If you insist on making a robotics framework 11, you should strive to make it as powerful as you can with the least amount of stupid BS. Eradicate incidental complexity everywhere you can. You want to make APIs that have maximum flexibility but good defaults. I like Python’s default-argument syntax for this because it means you can write APIs that can be used like:
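A minimal sketch of that usage pattern (the function name and arguments here are invented for illustration, not taken from the original example):

    # Default-argument sketch: the easy call is one line, and every assumption
    # is a keyword argument the user can override when it stops being true.
    def move_to_pose(pose,
                     speed=0.25,            # m/s; conservative default
                     collision_check=True,  # turn off only if you know why
                     planner="default",
                     timeout_s=10.0):
        """Plan and execute a motion to `pose`. Stub: prints instead of moving."""
        print(f"moving to {pose} at {speed} m/s using the {planner} planner")
        return True

    target = (0.5, 0.0, 0.3)  # x, y, z in meters, in some base frame

    # Easy things are simple:
    move_to_pose(target)

    # Complex things are possible through the same call:
    move_to_pose(target, speed=1.0, planner="rrt_connect", timeout_s=2.0)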

It is possible to have easy things be simple and allow complex things. And please, please, please don’t make condescending APIs. Thanks!

1. Ironically it is very often the expensive arcane-knowledge-having PhDs who are proposing this.

2. Why is it always a framework?

3. The exception that might prove the rule is things like traditional manufacturing-cell automation. That is a place where the solutions exist, but the limit to expanding is setup cost. I’m not an expert in this domain, but I’d worry that physical installation and safety compliance might still dwarf the software programming cost, though.

4. As I well know from personal experience.

5. Or non-fancy PhDs for that matter?

6. I suspect that many bright high schoolers would also be able to do the work. Though, as Google tends not to hire them, I don’t have good examples.

7. My schooling was in Mechanical Engineering and I never got a PhD, though my ME classwork did include some programming fundamentals.

8. Unless we create effective general purpose AI. It feels weird that I have to add that caveat, but the possibility that it’s actually coming for robotics in my lifetime feels much more possible than it did two years ago.

9. And if you are unlucky, its API was designed by someone who thought they were smarter than their customers.

10. This particular flavor of BS complexity is why I wrote posetree.py. If you do robotics, you should check it out.

11. Which, judging by the trail of dead robot-framework-companies, is a fraught thing to do.



Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS
IROS 2024: 14–18 October 2024, ABU DHABI, UNITED ARAB EMIRATES
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

In this video, you see the start of 1X’s development of an advanced AI system that chains simple tasks into complex actions using voice commands, allowing seamless multi-robot control and remote operation. By starting with single-task models, we ensure smooth transitions to more powerful unified models, ultimately aiming to automate high-level actions using AI.

This video does not contain teleoperation, computer graphics, cuts, video speedups, or scripted trajectory playback. It’s all controlled via neural networks.

[ 1X ]

As the old adage goes, one cannot claim to be a true man without a visit to the Great Wall of China. XBot-L, a full-sized humanoid robot developed by Robot Era, recently acquitted itself well in a walk along sections of the Great Wall.

[ Robot Era ]

The paper presents a novel rotary-wing platform that is capable of folding and expanding its wings during flight. Our source of inspiration came from birds’ ability to fold their wings to navigate through small spaces and dive. The design of the rotorcraft is based on the monocopter platform, which is inspired by the flight of samara seeds.

[ AirLab ]

We present a variable stiffness robotic skin (VSRS), a concept that integrates stiffness-changing capabilities, sensing, and actuation into a single, thin modular robot design. Reconfiguring, reconnecting, and reshaping VSRSs allows them to achieve new functions both on and in the absence of a host body.

[ Yale Faboratory ]

Heimdall is a new rover design for the 2024 University Rover Challenge (URC). This video shows highlights of Heimdall’s trip during the four missions at URC 2024.

Heimdall features a split body design with whegs (wheel legs), and a drill for sub-surface sample collection. It also has the ability to manipulate a variety of objects, collect surface samples, and perform onboard spectrometry and chemical tests.

[ WVU ]

I think this may be the first time I’ve seen an autonomous robot using a train? This one is delivering lunch boxes!

[ JSME ]

The AI system identifies and separates red apples from green apples; a robotic arm then picks up the identified red apples with a qb SoftHand Industry and gently places them in a basket.

My favorite part is the magnetic apple stem system.

[ QB Robotics ]

DexNex (v0, June 2024) is an anthropomorphic teleoperation testbed for dexterous manipulation at the Center for Robotics and Biosystems at Northwestern University. DexNex recreates human upper-limb functionality through a near 1-to-1 mapping between Operator movements and Avatar actions.

Motion of the Operator’s arms, hands, fingers, and head are fed forward to the Avatar, while fingertip pressures, finger forces, and camera images are fed back to the Operator. DexNex aims to minimize the latency of each subsystem to provide a seamless, immersive, and responsive user experience. Future research includes gaining a better understanding of the criticality of haptic and vision feedback for different manipulation tasks; providing arm-level grounded force feedback; and using machine learning to transfer dexterous skills from the human to the robot.

[ Northwestern ]

Sometimes the best path isn’t the smoothest or straightest surface; it’s the path that’s actually meant to be a path.

[ RaiLab ]

Fulfilling a school requirement by working in a Romanian locomotive factory one week each month, Daniela Rus learned to operate “machines that help us make things.” Appreciation for the practical side of math and science stuck with Daniela, who is now Director of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

[ MIT ]

For AI to achieve its full potential, non-experts need to be let into the development process, says Rumman Chowdhury, CEO and cofounder of Humane Intelligence. She tells the story of farmers fighting for the right to repair their own AI-powered tractors (which some manufacturers actually made illegal), proposing everyone should have the ability to report issues, patch updates or even retrain AI technologies for their specific uses.

[ TED ]



Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

Do you have trouble multitasking? Cyborgize yourself through muscle stimulation to automate repetitive physical tasks while you focus on something else.

[ SplitBody ]

By combining a 5,000 frame-per-second (FPS) event camera with a 20-FPS RGB camera, roboticists from the University of Zurich have developed a much more effective vision system that keeps autonomous cars from crashing into stuff, as described in the current issue of Nature.

[ Nature ]

Mitsubishi Electric has been awarded the GUINNESS WORLD RECORDS title for the fastest robot to solve a puzzle cube. The robot’s time of 0.305 second beat the previous record of 0.38 second, for which it received a GUINNESS WORLD RECORDS certificate on 21 May 2024.

[ Mitsubishi ]

Sony’s AIBO is celebrating its 25th anniversary, which seems like a long time, and it is. But back then, the original AIBO could check your email for you. Email! In 1999!

I miss Hotmail.

[ AIBO ]

SchniPoSa: schnitzel with french fries and a salad.

[ Dino Robotics ]

Cloth-folding is still a really hard problem for robots, but progress was made at ICRA!

[ ICRA Cloth Competition ]

Thanks, Francis!

MIT CSAIL researchers enhance robotic precision with sophisticated tactile sensors in the palm and agile fingers, setting the stage for improvements in human-robot interaction and prosthetic technology.

[ MIT ]

We present a novel adversarial attack method designed to identify failure cases in any type of locomotion controller, including state-of-the-art reinforcement-learning-based controllers. Our approach reveals the vulnerabilities of black-box neural network controllers, providing valuable insights that can be leveraged to enhance robustness through retraining.

[ Fan Shi ]

In this work, we investigate a novel integrated flexible OLED display technology used as a robotic skin interface to improve robot-to-human communication in a real industrial setting at Volkswagen, for a collaborative human-robot interaction task in motor assembly. The interface was implemented in a workcell and validated qualitatively with a small group of operators (n=9) and quantitatively with a large group (n=42). The validation results showed that using flexible OLED technology could improve the operators’ attitude toward the robot; increase their intention to use the robot; enhance their perceived enjoyment, social influence, and trust; and reduce their anxiety.

[ Paper ]

Thanks, Bram!

We introduce InflatableBots, shape-changing inflatable robots for large-scale encountered-type haptics in VR. Unlike traditional inflatable shape displays, which are immobile and limited in interaction areas, our approach combines mobile robots with fan-based inflatable structures. This enables safe, scalable, and deployable haptic interactions on a large scale.

[ InflatableBots ]

We present a bioinspired passive dynamic foot in which the claws are actuated solely by the impact energy. Our gripper simultaneously resolves the issue of smooth absorption of the impact energy and fast closure of the claws by linking the motion of an ankle linkage and the claws through soft tendons.

[ Paper ]

In this video, a 3-UPU exoskeleton robot for a wrist joint is designed and controlled to perform wrist extension, flexion, radial-deviation, and ulnar-deviation motions in stroke-affected patients. This is the first time a 3-UPU robot has been used effectively for any kind of task.

“UPU” stands for “universal-prismatic-universal” and refers to the actuators—the prismatic joints between two universal joints.

[ BAS ]

Thanks, Tony!

BRUCE Got Spot-ted at ICRA2024.

[ Westwood Robotics ]

Parachutes: maybe not as good of an idea for drones as you might think.

[ Wing ]

In this paper, we propose a system for the artist-directed authoring of stylized bipedal walking gaits, tailored for execution on robotic characters. To demonstrate the utility of our approach, we animate gaits for a custom, free-walking robotic character, and show, with two additional in-simulation examples, how our procedural animation technique generalizes to bipeds with different degrees of freedom, proportions, and mass distributions.

[ Disney Research ]

The European drone project Labyrinth aims to keep new and conventional air traffic separate, especially in busy airspaces such as those expected in urban areas. The project provides a new drone-traffic service and illustrates its potential to improve the safety and efficiency of civil land, air, and sea transport, as well as emergency and rescue operations.

[ DLR ]

This Carnegie Mellon University Robotics Institute seminar, by Kim Baraka at Vrije Universiteit Amsterdam, is on the topic “Why We Should Build Robot Apprentices and Why We Shouldn’t Do It Alone.”

For robots to be able to truly integrate into human-populated, dynamic, and unpredictable environments, they will have to have strong adaptive capabilities. In this talk, I argue that these adaptive capabilities should leverage interaction with end users, who know how (they want) a robot to act in that environment. I will present an overview of my past and ongoing work on the topic of human-interactive robot learning, a growing interdisciplinary subfield that embraces rich, bidirectional interaction to shape robot learning. I will discuss contributions on the algorithmic, interface, and interaction design fronts, showcasing several collaborations with animal behaviorists/trainers, dancers, puppeteers, and medical practitioners.

[ CMU RI ]



This post was originally published on the author’s personal blog.

Last year’s Conference on Robot Learning (CoRL) was the biggest CoRL yet, with over 900 attendees, 11 workshops, and almost 200 accepted papers. While there were a lot of cool new ideas (see this great set of notes for an overview of technical content), one particular debate seemed to be front and center: Is training a large neural network on a very large dataset a feasible way to solve robotics?1

Of course, some version of this question has been on researchers’ minds for a few years now. However, in the aftermath of the unprecedented success of ChatGPT and other large-scale “foundation models” on tasks that were thought to be unsolvable just a few years ago, the question was especially topical at this year’s CoRL. Developing a general-purpose robot, one that can competently and robustly execute a wide variety of tasks of interest in any home or office environment that humans can, has been perhaps the holy grail of robotics since the inception of the field. And given the recent progress of foundation models, it seems possible that scaling existing network architectures by training them on very large datasets might actually be the key to that grail.

Given how timely and significant this debate seems to be, I thought it might be useful to write a post centered around it. My main goal here is to try to present the different sides of the argument as I heard them, without bias towards any side. Almost all the content is taken directly from talks I attended or conversations I had with fellow attendees. My hope is that this serves to deepen people’s understanding around the debate, and maybe even inspire future research ideas and directions.

I want to start by presenting the main arguments I heard in favor of scaling as a solution to robotics.

Why Scaling Might Work
  • It worked for Computer Vision (CV) and Natural Language Processing (NLP), so why not robotics? This was perhaps the most common argument I heard, and the one that seemed to excite most people given recent models like GPT4-V and SAM. The point here is that training a large model on an extremely large corpus of data has recently led to astounding progress on problems thought to be intractable just 3-4 years ago. Moreover, doing this has led to a number of emergent capabilities, where trained models are able to perform well at a number of tasks they weren’t explicitly trained for. Importantly, the fundamental method here of training a large model on a very large amount of data is general and not somehow unique to CV or NLP. Thus, there seems to be no reason why we shouldn’t observe the same incredible performance on robotics tasks.
    • We’re already starting to see some evidence that this might work well: Chelsea Finn, Vincent Vanhoucke, and several others pointed to the recent RT-X and RT-2 papers from Google DeepMind as evidence that training a single model on large amounts of robotics data yields promising generalization capabilities. Russ Tedrake of Toyota Research Institute (TRI) and MIT pointed to the recent Diffusion Policies paper as showing a similar surprising capability. Sergey Levine of UC Berkeley highlighted recent efforts and successes from his group in building and deploying a robot-agnostic foundation model for navigation. All of these works are somewhat preliminary in that they train a relatively small model with a paltry amount of data compared to something like GPT4-V, but they certainly do seem to point to the fact that scaling up these models and datasets could yield impressive results in robotics.
  • Progress in data, compute, and foundation models are waves that we should ride: This argument is closely related to the above one, but distinct enough that I think it deserves to be discussed separately. The main idea here comes from Rich Sutton’s influential essay: The history of AI research has shown that relatively simple algorithms that scale well with data always outperform more complex/clever algorithms that do not. A nice analogy from Karol Hausman’s early career keynote is that improvements to data and compute are like a wave that is bound to happen given the progress and adoption of technology. Whether we like it or not, there will be more data and better compute. As AI researchers, we can either choose to ride this wave, or we can ignore it. Riding this wave means recognizing all the progress that’s happened because of large data and large models, and then developing algorithms, tools, datasets, etc. to take advantage of this progress. It also means leveraging large pre-trained models from vision and language that currently exist or will exist for robotics tasks.
  • Robotics tasks of interest lie on a relatively simple manifold, and training a large model will help us find it: This was something rather interesting that Russ Tedrake pointed out during a debate in the workshop on robustly deploying learning-based solutions. The manifold hypothesis as applied to robotics roughly states that, while the space of possible tasks we could conceive of having a robot do is impossibly large and complex, the tasks that actually occur practically in our world lie on some much lower-dimensional and simpler manifold of this space. By training a single model on large amounts of data, we might be able to discover this manifold. If we believe that such a manifold exists for robotics — which certainly seems intuitive — then this line of thinking would suggest that robotics is not somehow different from CV or NLP in any fundamental way. The same recipe that worked for CV and NLP should be able to discover the manifold for robotics and yield a shockingly competent generalist robot. Even if this doesn’t exactly happen, Tedrake points out that attempting to train a large model for general robotics tasks could teach us important things about the manifold of robotics tasks, and perhaps we can leverage this understanding to solve robotics. (A toy numerical illustration of what low-dimensional structure hiding in high-dimensional data looks like appears right after this list.)
  • Large models are the best approach we have to get at “common sense” capabilities, which pervade all of robotics: Another thing Russ Tedrake pointed out is that “common sense” pervades almost every robotics task of interest. Consider the task of having a mobile manipulation robot place a mug onto a table. Even if we ignore the challenging problems of finding and localizing the mug, there are a surprising number of subtleties to this problem. What if the table is cluttered and the robot has to move other objects out of the way? What if the mug accidentally falls on the floor and the robot has to pick it up again, re-orient it, and place it on the table? And what if the mug has something in it, so it’s important it’s never overturned? These “edge cases” are actually much more common than it might seem, and often are the difference between success and failure for a task. Moreover, these seem to require some sort of ‘common sense’ reasoning to deal with. Several people argued that large models trained on a large amount of data are the best way we know of to yield some aspects of this ‘common sense’ capability. Thus, they might be the best way we know of to solve general robotics tasks.
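To make the manifold-hypothesis point above a little more concrete, here is a toy numerical sketch. It is not a robotics experiment, and every number is invented for illustration: data that looks 50-dimensional is secretly generated from just 2 underlying factors, and PCA (used here as a simple linear stand-in for true manifold discovery) recovers that low intrinsic dimensionality from the data alone.

```python
import numpy as np

# Toy illustration only: observations live in 50 dimensions, but are generated
# from just 2 underlying factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))              # 2 "true" factors per sample
mixing = rng.normal(size=(2, 50))                # fixed linear embedding into 50-D
observations = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))

# PCA via SVD: a linear stand-in for discovering low-dimensional structure.
centered = observations - observations.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)
print(np.round(explained[:5], 4))  # nearly all variance sits in the first 2 components
```

Real robot tasks would of course demand nonlinear methods and far messier data; the sketch is only meant to show what “low-dimensional structure hiding in a high-dimensional space” means.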

As you might imagine, there were a number of arguments against scaling as a practical solution to robotics. Interestingly, almost no one directly disputes that this approach could work in theory. Instead, most arguments fall into one of two buckets: (1) arguing that this approach is simply impractical, and (2) arguing that even if it does kind of work, it won’t really “solve” robotics.

Why Scaling Might Not Work
It’s impractical
  • We currently just don’t have much robotics data, and there’s no clear way we’ll get it: This is the elephant in pretty much every large-scale robot learning room. The Internet is chock-full of data for CV and NLP, but not at all for robotics. Recent efforts to collect very large datasets have required tremendous amounts of time, money, and cooperation, yet have yielded a very small fraction of the amount of vision and text data on the Internet. CV and NLP got so much data because they had an incredible “data flywheel”: tens of millions of people connecting to and using the Internet. Unfortunately for robotics, there seems to be no reason why people would upload a bunch of sensory input and corresponding action pairs. Collecting a very large robotics dataset seems quite hard, and given that we know that a lot of important “emergent” properties only showed up in vision and language models at scale, the inability to get a large dataset could render this scaling approach hopeless.
  • Robots have different embodiments: Another challenge with collecting a very large robotics dataset is that robots come in a large variety of different shapes, sizes, and form factors. The output control actions that are sent to a Boston Dynamics Spot robot are very different from those sent to a KUKA iiwa arm. Even if we ignore the problem of finding some kind of common output space for a large trained model, the variety in robot embodiments means we’ll probably have to collect data from each robot type, and that makes the above data-collection problem even harder.
  • There is extremely large variance in the environments we want robots to operate in: For a robot to really be “general purpose,” it must be able to operate in any practical environment a human might want to put it in. This means operating in any possible home, factory, or office building it might find itself in. Collecting a dataset that has even just one example of every possible building seems impractical. Of course, the hope is that we would only need to collect data in a small fraction of these, and the rest will be handled by generalization. However, we don’t know how much data will be required for this generalization capability to kick in, and it very well could also be impractically large.
  • Training a model on such a large robotics dataset might be too expensive/energy-intensive: It’s no secret that training large foundation models is expensive, both in terms of money and in energy consumption. GPT-4V — OpenAI’s biggest foundation model at the time of this writing — reportedly cost over US $100 million and 50 million kWh of electricity to train. This is well beyond the budget and resources that any academic lab can currently spare, so a larger robotics foundation model would need to be trained by a company or a government of some kind. Additionally, depending on how large both the dataset and model itself for such an endeavor are, the costs may balloon by another order of magnitude or more, which might make it completely infeasible.
Even if it works as well as in CV/NLP, it won’t solve robotics
  • The 99.X problem and long tails: Vincent Vanhoucke of Google Robotics started a talk with a provocative assertion: Most — if not all — robot learning approaches cannot be deployed for any practical task. The reason? Real-world industrial and home applications typically require 99.X percent or higher accuracy and reliability. What exactly that means varies by application, but it’s safe to say that robot learning algorithms aren’t there yet. Most results presented in academic papers top out at around an 80 percent success rate. While that might seem quite close to the 99.X percent threshold, people trying to actually deploy these algorithms have found that it isn’t so: getting higher success rates requires asymptotically more effort as we get closer to 100 percent. That means going from 85 to 90 percent might require just as much effort as — if not more than — going from 40 to 80 percent. Vincent asserted in his talk that getting up to 99.X percent is a fundamentally different beast than getting even up to 80 percent, one that might require a whole host of new techniques beyond just scaling.
    • Existing big models don’t get to 99.X percent even in CV and NLP: As impressive and capable as current large models like GPT-4V and DETIC are, even they don’t achieve 99.X percent or higher success rate on previously-unseen tasks. Current robotics models are very far from this level of performance, and I think it’s safe to say that the entire robot learning community would be thrilled to have a general model that does as well on robotics tasks as GPT-4V does on NLP tasks. However, even if we had something like this, it wouldn’t be at 99.X percent, and it’s not clear that it’s possible to get there by scaling either.
  • Self-driving car companies have tried this approach, and it doesn’t fully work (yet): This is closely related to the above point, but important and subtle enough that I think it deserves to stand on its own. A number of self-driving car companies — most notably Tesla and Wayve — have tried training such an end-to-end big model on large amounts of data to achieve Level 5 autonomy. Not only do these companies have the engineering resources and money to train such models, but they also have the data. Tesla in particular has a fleet of over 100,000 cars deployed in the real world that it is constantly collecting and then annotating data from. These cars are driven by expert human drivers, making the data ideal for large-scale supervised learning. And despite all this, Tesla has so far been unable to produce a Level 5 autonomous driving system. That’s not to say their approach doesn’t work at all. It competently handles a large number of situations — especially highway driving — and serves as a useful Level 2 (i.e., driver assist) system. However, it’s far from 99.X percent performance. Moreover, data seems to suggest that Tesla’s approach is faring far worse than Waymo or Cruise, which both use much more modular systems. While it isn’t inconceivable that Tesla’s approach could end up catching up to and surpassing its competitors’ performance in a year or so, the fact that it hasn’t worked yet should perhaps serve as evidence that the 99.X percent problem is hard to overcome for a large-scale ML approach. Moreover, given that self-driving is a special case of general robotics, Tesla’s case should give us reason to doubt the large-scale model approach as a full solution to robotics, especially in the medium term.
  • Many robotics tasks of interest are quite long-horizon: Accomplishing any task requires taking a number of correct actions in sequence. Consider the relatively simple problem of making a cup of tea given an electric kettle, water, a box of tea bags, and a mug. Success requires pouring the water into the kettle, turning it on, then pouring the hot water into the mug, and placing a tea bag inside it. If we want to solve this with a model trained to output motor torque commands given pixels as input, we’ll need to send torque commands to all 7 motors at around 40 Hz. Let’s suppose that this tea-making task requires 5 minutes. That requires 7 * 40 * 60 * 5 = 84,000 correct torque commands. This is all just for a stationary robot arm; things get much more complicated if the robot is mobile, or has more than one arm. It is well known that error tends to compound over longer horizons for most tasks. This is one reason why — despite their ability to produce long sequences of text — even LLMs cannot yet produce completely coherent novels or long stories: small deviations from a true prediction over time tend to add up and yield extremely large deviations over long horizons. Given that most, if not all, robotics tasks of interest require sending at least thousands, if not hundreds of thousands, of torques in just the right order, even a fairly well-performing model might really struggle to fully solve these robotics tasks. (The short calculation after this list makes the compounding effect concrete.)
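To make the compounding-error arithmetic in the last bullet concrete, here is a back-of-the-envelope sketch. The per-command reliability numbers are invented purely for illustration, and treating each command as an independent success/failure is a simplification; the point is only how quickly small per-step error rates add up over 84,000 commands.

```python
# Back-of-the-envelope sketch: how per-command errors compound over a long horizon.
# The reliability numbers below are invented for illustration; independence is assumed.
n_commands = 7 * 40 * 60 * 5                      # 84,000 torque commands, as above
for per_step in (0.9999, 0.99999, 0.999999):
    overall = per_step ** n_commands              # probability every command is correct
    print(f"per-command success {per_step}: overall task success ~ {overall:.1%}")
```

Under these toy assumptions, even a policy that is 99.99 percent reliable per command essentially never completes the full sequence, which is the heart of the long-horizon concern.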

Okay, now that we’ve sketched out all the main points on both sides of the debate, I want to spend some time diving into a few related points. Many of these are responses to the above points on the ‘against’ side, and some of them are proposals for directions to explore to help overcome the issues raised.

Miscellaneous Related Arguments
We can probably deploy learning-based approaches robustly

One point that gets brought up a lot against learning-based approaches is the lack of theoretical guarantees. At the time of this writing, we know very little about neural network theory: we don’t really know why they learn well, and more importantly, we don’t have any guarantees on what values they will output in different situations. On the other hand, most classical control and planning approaches that are widely used in robotics have various theoretical guarantees built-in. These are generally quite useful when certifying that systems are safe.

However, there seemed to be general consensus amongst a number of CoRL speakers that this point is perhaps given more significance than it should. Sergey Levine pointed out that most of the guarantees from controls aren’t really that useful for a number of real-world tasks we’re interested in. As he put it: “self-driving car companies aren’t worried about controlling the car to drive in a straight line, but rather about a situation in which someone paints a sky onto the back of a truck and drives in front of the car,” thereby confusing the perception system. Moreover, Scott Kuindersma of Boston Dynamics talked about how they’re deploying RL-based controllers on their robots in production, and are able to get the confidence and guarantees they need via rigorous simulation and real-world testing. Overall, I got the sense that while people feel that guarantees are important, and encouraged researchers to keep trying to study them, they don’t think that the lack of guarantees for learning-based systems means that they cannot be deployed robustly.

What if we strive to deploy Human-in-the-Loop systems?

In one of the organized debates, Emo Todorov pointed out that existing successful ML systems, like Codex and ChatGPT, work well only because a human interacts with and sanitizes their output. Consider the case of coding with Codex: it isn’t intended to directly produce runnable, bug-free code, but rather to act as an intelligent autocomplete for programmers, thereby making the overall human-machine team more productive than either alone. In this way, these models don’t have to achieve the 99.X percent performance threshold, because a human can help correct any issues during deployment. As Emo put it: “humans are forgiving, physics is not.”

Chelsea Finn largely agreed with Emo on this point: all successfully deployed and useful ML systems have humans in the loop, and so this is likely the setting that deployed robot learning systems will need to operate in as well. Of course, having a human operate in the loop with a robot isn’t as straightforward as in other domains, since having a human and robot inhabit the same space introduces potential safety hazards. However, it’s a useful setting to think about, especially if it can help address issues brought on by the 99.X percent problem.

Maybe we don’t need to collect that much real-world data for scaling

A number of people at the conference were thinking about creative ways to overcome the real-world data bottleneck without actually collecting more real-world data. Quite a few of these people argued that fast, realistic simulators could be vital here, and there were a number of works that explored creative ways to train robot policies in simulation and then transfer them to the real world. Another set of people argued that we can leverage existing vision, language, and video data and then just ‘sprinkle in’ some robotics data. Google’s recent RT-2 model showed that taking a large model trained on internet-scale vision and language data, and then fine-tuning it on a much smaller set of robotics data, can produce impressive performance on robotics tasks. Perhaps through a combination of simulation and pretraining on general vision and language data, we won’t actually have to collect too much real-world robotics data to get scaling to work well for robotics tasks.
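As a rough sketch of that “pretrain on general data, then sprinkle in robotics data” recipe — and to be clear, this is not RT-2’s actual architecture or training setup; the dataset, action dimensionality, and hyperparameters below are placeholders — one might freeze a model pretrained on general vision data and fine-tune only a small action head on a modest set of robot demonstrations:

```python
import torch
import torch.nn as nn
from torchvision import models

# Generic sketch of the recipe, not any specific paper's method: reuse a backbone
# pretrained on general vision data, and fine-tune only a small action head on a
# (hypothetical) small dataset of image -> action pairs.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()                  # expose the 512-D pretrained features
for p in backbone.parameters():
    p.requires_grad = False                  # keep the general visual knowledge frozen

action_head = nn.Linear(512, 7)              # e.g., a 7-D end-effector action (assumption)
optimizer = torch.optim.Adam(action_head.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(images: torch.Tensor, actions: torch.Tensor) -> float:
    """images: (B, 3, 224, 224); actions: (B, 7) from a small demonstration set."""
    with torch.no_grad():
        features = backbone(images)          # features come "for free" from pretraining
    loss = loss_fn(action_head(features), actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The actual RT-2 system co-fine-tunes a large vision-language model on a mixture of web and robot data rather than freezing a backbone outright; the sketch only captures the general leverage of getting most of the knowledge from non-robot data.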

Maybe combining classical and learning-based approaches can give us the best of both worlds

As with any debate, there were quite a few people advocating the middle path. Scott Kuindersma of Boston Dynamics titled one of his talks “Let’s all just be friends: model-based control helps learning (and vice versa)”. Throughout his talk, and the subsequent debates, he expressed his strong belief that, in the short to medium term, the best path towards reliable real-world systems involves combining learning with classical approaches. In her keynote speech for the conference, Andrea Thomaz talked about how such a hybrid system — using learning for perception and a few skills, and classical SLAM and path-planning for the rest — is what powers a real-world robot that’s deployed in tens of hospital systems in Texas (and growing!). Several papers explored how classical controls and planning, together with learning-based approaches, can enable much more capability than either could alone. Overall, most people seemed to argue that this ‘middle path’ is extremely promising, especially in the short to medium term, but perhaps in the long term either pure learning or an entirely different set of approaches might be best.
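For readers who prefer code to prose, here is a minimal, hypothetical sketch of what such a hybrid split can look like: a learned module handles perception, a classical module handles planning, and a confidence threshold decides when to defer. The interfaces, numbers, and stand-in functions are invented for illustration and do not correspond to any of the specific systems mentioned above.

```python
import numpy as np

def learned_grasp_detector(rgb: np.ndarray) -> tuple[np.ndarray, float]:
    """Stand-in for a trained perception network: returns a grasp pose (4x4) and a confidence."""
    pose = np.eye(4)
    pose[:3, 3] = [0.4, 0.0, 0.1]            # toy target 40 cm in front of the robot base
    return pose, 0.93

def classical_plan(target: np.ndarray, steps: int = 10) -> list[np.ndarray]:
    """Stand-in for a classical planner: a straight-line interpolation of waypoints.
    (Linear interpolation of poses is only valid here because the toy rotations are identity.)"""
    start = np.eye(4)
    return [start + (target - start) * t for t in np.linspace(0.0, 1.0, steps)]

def pick(rgb: np.ndarray, min_confidence: float = 0.8) -> bool:
    """Learning for perception, classical planning for motion, with a deferral threshold."""
    pose, confidence = learned_grasp_detector(rgb)
    if confidence < min_confidence:
        return False                          # defer to a human or re-perceive
    waypoints = classical_plan(pose)
    return len(waypoints) > 0                 # a real system would track these waypoints

print(pick(np.zeros((224, 224, 3))))          # True, with the toy stand-ins above
```

The design point is simply that the learned and classical pieces meet at a narrow, checkable interface (a pose plus a confidence), which is one reason hybrid stacks are attractive for near-term deployment.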

What Can/Should We Take Away From All This?

If you’ve read this far, chances are that you’re interested in some set of takeaways/conclusions. Perhaps you’re thinking “this is all very interesting, but what does all this mean for what we as a community should do? What research problems should I try to tackle?” Fortunately, there were a number of interesting suggestions on this front that seemed to have some consensus behind them.

We should pursue the direction of trying to just scale up learning with very large datasets

Despite the various arguments against scaling solving robotics outright, most people seem to agree that scaling in robot learning is a promising direction to be investigated. Even if it doesn’t fully solve robotics, it could lead to a significant amount of progress on a number of hard problems we’ve been stuck on for a while. Additionally, as Russ Tedrake pointed out, pursuing this direction carefully could yield useful insights about the general robotics problem, as well as current learning algorithms and why they work so well.

We should also pursue other existing directions

Even the most vocal proponents of the scaling approach were clear that they don’t think everyone should be working on this. It’s likely a bad idea for the entire robot learning community to put all of its eggs in the same basket, especially given all the reasons to believe scaling won’t fully solve robotics. Classical robotics techniques have gotten us quite far, and led to many successful and reliable deployments: pushing forward on them or integrating them with learning techniques might be the right way forward, especially in the short to medium term.

We should focus more on real-world mobile manipulation and easy-to-use systems

Vincent Vanhoucke made an observation that most papers at CoRL this year were limited to tabletop manipulation settings. While there are plenty of hard tabletop problems, things generally get a lot more complicated when the robot — and consequently its camera view — moves. Vincent speculated that it’s easy for the community to fall into a local minimum where we make a lot of progress that’s specific to the tabletop setting and therefore not generalizable. A similar thing could happen if we work predominantly in simulation. Avoiding these local minima by working on real-world mobile manipulation seems like a good idea.

Separately, Sergey Levine observed that a big reason why LLMs have seen so much excitement and adoption is because they’re extremely easy to use, especially by non-experts. One doesn’t have to know about the details of training an LLM, or perform any tough setup, to prompt and use these models for their own tasks. Most robot learning approaches are currently far from this. They often require significant knowledge of their inner workings to use, and involve very significant amounts of setup. Perhaps thinking more about how to make robot learning systems easier to use and widely applicable could help improve adoption and potentially the scalability of these approaches.

We should be more forthright about things that don’t work

There seemed to be a broadly-held complaint that many robot learning approaches don’t adequately report negative results, and this leads to a lot of unnecessary repeated effort. Additionally, perhaps patterns might emerge from consistent failures of things that we expect to work but don’t actually work well, and this could yield novel insight into learning algorithms. There is currently no good incentive for researchers to report such negative results in papers, but most people seemed to be in favor of designing one.

We should try to do something totally new

There were a few people who pointed out that all current approaches — be they learning-based or classical — are unsatisfying in a number of ways. There seem to be a number of drawbacks with each of them, and it’s very conceivable that there is a completely different set of approaches that ultimately solves robotics. Given this, it seems useful to try to think outside the box. After all, every one of the current approaches that’s part of the debate was only made possible because the few researchers who introduced them dared to think against the popular grain of their times.

Acknowledgements: Huge thanks to Tom Silver and Leslie Kaelbling for providing helpful comments, suggestions, and encouragement on a previous draft of this post.


1 In fact, this was the topic of a popular debate hosted at a workshop on the first day; many of the points in this post were inspired by the conversation during that debate.



Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

NAVER 1784 is the world’s largest robotics testbed. The Starbucks on the second floor of 1784 is the world’s most unique Starbucks, with more than 100 service robots called “Rookie” delivering Starbucks drinks to meeting rooms and private seats, and various experiments with a dual-arm robot.

[ Naver ]

If you’re gonna take a robot dog with you on a hike, the least it could do is carry your backpack for you.

[ Deep Robotics ]

Obligatory reminder that phrases like “no teleoperation” without any additional context can mean many different things.

[ Astribot ]

This video is presented at the ICRA 2024 conference and summarizes recent results of our Learning AI for Dextrous Manipulation Lab. It demonstrates how our learning AI methods allowed for breakthroughs in dextrous manipulation with the mobile humanoid robot DLR Agile Justin. Although the core of the mechatronic hardware is almost 20 years old, only the advent of learning AI methods enabled a level of dexterity, flexibility and autonomy coming close to human capabilities.

[ TUM ]

Thanks Berthold!

Hands of blue? Not a good look.

[ Synaptic ]

With all the humanoid stuff going on, there really should be more emphasis on intentional contact—humans lean and balance on things all the time, and robots should too!

[ Inria ]

LimX Dynamics W1 is now more than a wheeled quadruped. By evolving into a biped robot, W1 maneuvers slickly on two legs in different ways: non-stop 360° rotation, upright free gliding, slick maneuvering, random collision and self-recovery, and step walking.

[ LimX Dynamics ]

Animal brains use less data and energy compared to current deep neural networks running on Graphics Processing Units (GPUs). This makes it hard to develop tiny autonomous drones, which are too small and light for heavy hardware and big batteries. Recently, the emergence of neuromorphic processors that mimic how brains function has made it possible for researchers from Delft University of Technology to develop a drone that uses neuromorphic vision and control for autonomous flight.

[ Science ]

In the beginning of the universe, all was darkness — until the first organisms developed sight, which ushered in an explosion of life, learning and progress. AI pioneer Fei-Fei Li says a similar moment is about to happen for computers and robots. She shows how machines are gaining “spatial intelligence” — the ability to process visual data, make predictions and act upon those predictions — and shares how this could enable AI to interact with humans in the real world.

[ TED ]



Greetings from the IEEE International Conference on Robotics and Automation (ICRA) in Yokohama, Japan! We hope you’ve been enjoying our short videos on TikTok, YouTube, and Instagram. They are just a preview of our in-depth ICRA coverage, and over the next several weeks we’ll have lots of articles and videos for you. In today’s edition of Video Friday, we bring you a dozen of the most interesting projects presented at the conference.

Enjoy today’s videos, and stay tuned for more ICRA posts!

Upcoming robotics events for the next few months:

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH, SWITZERLAND

Please send us your events for inclusion.

The following two videos are part of the “Cooking Robotics: Perception and Motion Planning” workshop, which explored “the new frontiers of ‘robots in cooking,’ addressing various scientific research questions, including hardware considerations, key challenges in multimodal perception, motion planning and control, experimental methodologies, and benchmarking approaches.” The workshop featured robots handling food items like cookies, burgers, and cereal, and the two robots seen in the videos below used knives to slice cucumbers and cakes. You can watch all workshop videos here.

“SliceIt!: Simulation-Based Reinforcement Learning for Compliant Robotic Food Slicing,” by Cristian C. Beltran-Hernandez, Nicolas Erbetti, and Masashi Hamaya from OMRON SINIC X Corporation, Tokyo, Japan.

Cooking robots can enhance the home experience by reducing the burden of daily chores. However, these robots must perform their tasks dexterously and safely in shared human environments, especially when handling dangerous tools such as kitchen knives. This study focuses on enabling a robot to autonomously and safely learn food-cutting tasks. More specifically, our goal is to enable a collaborative robot or industrial robot arm to perform food-slicing tasks by adapting to varying material properties using compliance control. Our approach involves using Reinforcement Learning (RL) to train a robot to compliantly manipulate a knife, by reducing the contact forces exerted by the food items and by the cutting board. However, training the robot in the real world can be inefficient, and dangerous, and result in a lot of food waste. Therefore, we proposed SliceIt!, a framework for safely and efficiently learning robot food-slicing tasks in simulation. Following a real2sim2real approach, our framework consists of collecting a few real food slicing data, calibrating our dual simulation environment (a high-fidelity cutting simulator and a robotic simulator), learning compliant control policies on the calibrated simulation environment, and finally, deploying the policies on the real robot.

“Cafe Robot: Integrated AI Skillset Based on Large Language Models,” by Jad Tarifi, Nima Asgharbeygi, Shuhei Takamatsu, and Masataka Goto from Integral AI in Tokyo, Japan, and Mountain View, Calif., USA.

The cafe robot engages in natural language interaction to receive orders and subsequently prepares coffee and cakes. Each action involved in making these items is executed using AI skills developed by Integral, including Integral Liquid Pouring, Integral Powder Scooping, and Integral Cutting. The dialogue for making coffee, as well as the coordination of each action based on the dialogue, is facilitated by the Integral Task Planner.

“Autonomous Overhead Powerline Recharging for Uninterrupted Drone Operations,” by Viet Duong Hoang, Frederik Falk Nyboe, Nicolaj Haarhøj Malle, and Emad Ebeid from University of Southern Denmark, Odense, Denmark.

We present a fully autonomous self-recharging drone system capable of long-duration sustained operations near powerlines. The drone is equipped with a robust onboard perception and navigation system that enables it to locate powerlines and approach them for landing. A passively actuated gripping mechanism grasps the powerline cable during landing, after which a control circuit regulates the magnetic field inside a split-core current transformer to provide sufficient holding force as well as battery recharging. The system is evaluated in an active outdoor three-phase powerline environment. We demonstrate multiple contiguous hours of fully autonomous uninterrupted drone operations composed of several cycles of flying, landing, recharging, and takeoff, validating the capability of extended, essentially unlimited, operational endurance.

“Learning Quadrupedal Locomotion With Impaired Joints Using Random Joint Masking,” by Mincheol Kim, Ukcheol Shin, and Jung-Yup Kim from Seoul National University of Science and Technology, Seoul, South Korea, and Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa., USA.

Quadrupedal robots have played a crucial role in various environments, from structured environments to complex harsh terrains, thanks to their agile locomotion ability. However, these robots can easily lose their locomotion functionality if damaged by external accidents or internal malfunctions. In this paper, we propose a novel deep reinforcement learning framework to enable a quadrupedal robot to walk with impaired joints. The proposed framework consists of three components: 1) a random joint masking strategy for simulating impaired joint scenarios, 2) a joint state estimator to predict an implicit status of current joint condition based on past observation history, and 3) progressive curriculum learning to allow a single network to conduct both normal gait and various joint-impaired gaits. We verify that our framework enables the Unitree Go1 robot to walk under various impaired joint conditions in real-world indoor and outdoor environments.
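
The core idea of the random joint masking strategy can be sketched in a few lines of Python; the joint count, probability, and the way the mask is applied are assumptions for illustration, not the authors’ implementation.

import numpy as np

# During training, occasionally pick one joint at random and zero its
# torque command (the mask can also be appended to the observation so the
# estimator can learn to infer which joint is impaired).
NUM_JOINTS = 12  # e.g., a quadruped with three actuated joints per leg (assumed)

def sample_joint_mask(p_impaired=0.5, rng=None):
    """Return a binary mask over joints; a 0 marks the impaired joint, if any."""
    rng = rng or np.random.default_rng()
    mask = np.ones(NUM_JOINTS)
    if rng.random() < p_impaired:
        mask[rng.integers(NUM_JOINTS)] = 0.0
    return mask

def apply_mask(action, mask):
    """Zero the impaired joint's command before sending it to the simulator."""
    return action * mask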

“Synthesizing Robust Walking Gaits via Discrete-Time Barrier Functions With Application to Multi-Contact Exoskeleton Locomotion,” by Maegan Tucker, Kejun Li, and Aaron D. Ames from Georgia Institute of Technology, Atlanta, Ga., and California Institute of Technology, Pasadena, Calif., USA.

Successfully achieving bipedal locomotion remains challenging due to real-world factors such as model uncertainty, random disturbances, and imperfect state estimation. In this work, we propose a novel metric for locomotive robustness – the estimated size of the hybrid forward invariant set associated with the step-to-step dynamics. Here, the forward invariant set can be loosely interpreted as the region of attraction for the discrete-time dynamics. We illustrate the use of this metric towards synthesizing nominal walking gaits using a simulation-in-the-loop learning approach. Further, we leverage discrete-time barrier functions and a sampling-based approach to approximate sets that are maximally forward invariant. Lastly, we experimentally demonstrate that this approach results in successful locomotion for both flat-foot walking and multicontact walking on the Atalante lower-body exoskeleton.
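
For readers who want the formal flavor, a standard discrete-time barrier function condition (a generic statement of the technique, not necessarily the paper’s exact formulation) reads:

\[
  h(x_{k+1}) \;\ge\; (1 - \gamma)\, h(x_k), \qquad \gamma \in (0, 1],
\]

which guarantees that the set \(\{x : h(x) \ge 0\}\) is forward invariant under the step-to-step dynamics: if the gait starts inside the set, it stays inside it.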

“Supernumerary Robotic Limbs to Support Post-Fall Recoveries for Astronauts,” by Erik Ballesteros, Sang-Yoep Lee, Kalind C. Carpenter, and H. Harry Asada from MIT, Cambridge, Mass., USA, and Jet Propulsion Laboratory, California Institute of Technology, Pasadena, Calif., USA.

This paper proposes the utilization of Supernumerary Robotic Limbs (SuperLimbs) for augmenting astronauts during an Extra-Vehicular Activity (EVA) in a partial-gravity environment. We investigate the effectiveness of SuperLimbs in assisting astronauts to their feet following a fall. Based on preliminary observations from a pilot human study, we categorized post-fall recoveries into a sequence of statically stable poses called “waypoints”. The paths between the waypoints can be modeled with a simplified kinetic motion applied about a specific point on the body. Following the characterization of post-fall recoveries, we designed a task-space impedance control with high damping and low stiffness, where the SuperLimbs provide an astronaut with assistance in post-fall recovery while keeping the human in the loop. In order to validate this control scheme, a full-scale wearable analog space suit was constructed and tested with a SuperLimbs prototype. Results from the experimentation found that without assistance, astronauts would impulsively exert themselves to perform a post-fall recovery, which resulted in high energy consumption and instabilities maintaining an upright posture, concurring with prior NASA studies. When the SuperLimbs provided assistance, the astronaut’s energy consumption and deviation in their tracking as they performed a post-fall recovery were reduced considerably.
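
The “high damping, low stiffness” behavior corresponds to a generic task-space impedance law of the form below; the specific gains are the authors’ design choices and are not reproduced here.

\[
  F = K\,(x_d - x) + D\,(\dot{x}_d - \dot{x}), \qquad K \ \text{small}, \quad D \ \text{large},
\]

so the limbs gently pull the wearer toward the next waypoint while strongly resisting fast, impulsive motion.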

“ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch,” by Zhengrong Xue, Han Zhang, Jingwen Cheng, Zhengmao He, Yuanchen Ju, Changyi Lin, Gu Zhang, and Huazhe Xu from Tsinghua Embodied AI Lab, IIIS, Tsinghua University; Shanghai Qi Zhi Institute; Shanghai AI Lab; and Shanghai Jiao Tong University, Shanghai, China.

We present ArrayBot, a distributed manipulation system consisting of a 16 × 16 array of vertically sliding pillars integrated with tactile sensors. Functionally, ArrayBot is designed to simultaneously support, perceive, and manipulate tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. In the face of the massively redundant actions, we propose to reshape the action space by considering the spatially local action patch and the low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Intriguingly, we find that the discovered policy can not only generalize to unseen object shapes in the simulator but also have the ability to transfer to the physical robot without any sim-to-real fine-tuning. Leveraging the deployed policy, we derive additional real-world manipulation skills on ArrayBot to further illustrate the distinctive merits of our proposed system.
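
The action-space reshaping can be illustrated with a short sketch: the policy outputs only a few low-frequency coefficients, which are expanded back into a smooth 16 × 16 height command. This uses a discrete cosine transform as one reasonable reading of “low-frequency actions in the frequency domain”; the grid size matches the paper, but the number of retained coefficients is an assumption.

import numpy as np
from scipy.fft import dctn, idctn

GRID = 16  # 16 x 16 array of pillars, as in ArrayBot
K = 4      # low-frequency coefficients kept per axis (illustrative assumption)

def expand_low_freq_action(coeffs_kxk):
    """Expand a K x K low-frequency action into a smooth GRID x GRID command."""
    full = np.zeros((GRID, GRID))
    full[:K, :K] = coeffs_kxk
    return idctn(full, norm="ortho")

def compress_action(full_action):
    """Project a full GRID x GRID action onto its K x K low-frequency part."""
    return dctn(full_action, norm="ortho")[:K, :K]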

“SKT-Hang: Hanging Everyday Objects via Object-Agnostic Semantic Keypoint Trajectory Generation,” by Chia-Liang Kuo, Yu-Wei Chao, and Yi-Ting Chen from National Yang Ming Chiao Tung University, in Taipei and Hsinchu, Taiwan, and NVIDIA.

We study the problem of hanging a wide range of grasped objects on diverse supporting items. Hanging objects is a ubiquitous task that is encountered in numerous aspects of our everyday lives. However, both the objects and supporting items can exhibit substantial variations in their shapes and structures, bringing two challenging issues: (1) determining the task-relevant geometric structures across different objects and supporting items, and (2) identifying a robust action sequence to accommodate the shape variations of supporting items. To this end, we propose Semantic Keypoint Trajectory (SKT), an object-agnostic representation that is highly versatile and applicable to various everyday objects. We also propose Shape-conditioned Trajectory Deformation Network (SCTDN), a model that learns to generate SKT by deforming a template trajectory based on the task-relevant geometric structure features of the supporting items. We conduct extensive experiments and demonstrate substantial improvements of our framework over existing robot hanging methods in success rate and inference time. Finally, our simulation-trained framework shows promising hanging results in the real world.

“TEXterity: Tactile Extrinsic deXterity,” by Antonia Bronars, Sangwoon Kim, Parag Patre, and Alberto Rodriguez from MIT and Magna International Inc.

We introduce a novel approach that combines tactile estimation and control for in-hand object manipulation. By integrating measurements from robot kinematics and an image-based tactile sensor, our framework estimates and tracks object pose while simultaneously generating motion plans in a receding horizon fashion to control the pose of a grasped object. This approach consists of a discrete pose estimator that tracks the most likely sequence of object poses in a coarsely discretized grid, and a continuous pose estimator-controller to refine the pose estimate and accurately manipulate the pose of the grasped object. Our method is tested on diverse objects and configurations, achieving desired manipulation objectives and outperforming single-shot methods in estimation accuracy. The proposed approach holds potential for tasks requiring precise manipulation and limited intrinsic in-hand dexterity under visual occlusion, laying the foundation for closed-loop behavior in applications such as regrasping, insertion, and tool use.

“Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects With Video Tracking Enabled Memory Models,” by Yixuan Huang, Jialin Yuan, Chanho Kim, Pupul Pradhan, Bryan Chen, Li Fuxin, and Tucker Hermans from University of Utah, Salt Lake City, Utah, Oregon State University, Corvallis, Ore., and NVIDIA, Seattle, Wash., USA.

Robots need to have a memory of previously observed, but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks including reasoning with occluded objects, novel objects appearance, and object reappearance. Throughout our extensive simulation and real world experiments, we find that our approaches perform well in terms of different numbers of objects and different numbers

“Open Source Underwater Robot: Easys,” by Michikuni Eguchi, Koki Kato, Tatsuya Oshima, and Shunya Hara from University of Tsukuba and Osaka University, Japan.

“Sensorized Soft Skin for Dexterous Robotic Hands,” by Jana Egli, Benedek Forrai, Thomas Buchner, Jiangtao Su, Xiaodong Chen, and Robert K. Katzschmann from ETH Zurich, Switzerland, and Nanyang Technological University, Singapore.

Conventional industrial robots often use two-fingered grippers or suction cups to manipulate objects or interact with the world. Because of their simplified design, they are unable to reproduce the dexterity of human hands when manipulating a wide range of objects. While the control of humanoid hands has evolved greatly, hardware platforms still lack capabilities, particularly in tactile sensing and providing soft contact surfaces. In this work, we present a method that equips the skeleton of a tendon-driven humanoid hand with a soft and sensorized tactile skin. Multi-material 3D printing allows us to iteratively approach a cast skin design which preserves the robot’s dexterity in terms of range of motion and speed. We demonstrate that a soft skin enables firmer grasps and piezoresistive sensor integration enhances the hand’s tactile sensing capabilities.


It’s hard to think of a more dramatic way to make an entrance than falling from the sky. While it certainly happens often enough on the silver screen, whether or not it can be done in real life is a tantalizing challenge for our entertainment robotics team at Disney Research.

Falling is tricky for two reasons. The first and most obvious is what Douglas Adams referred to as “the sudden stop at the end.” Every second of free fall means another 9.8 m/s of velocity, and that can quickly add up to an extremely difficult energy dissipation problem. The other tricky thing about falling, especially for terrestrial animals like us, is that our normal methods for controlling our orientation disappear. We are used to relying on contact forces between our body and the environment to control which way we’re pointing. In the air, there’s nothing to push on except the air itself!
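
The numbers add up quickly. Ignoring drag, velocity and kinetic energy after t seconds of free fall are

\[
  v = g t, \qquad E = \tfrac{1}{2} m v^{2},
\]

so a 10-kilogram robot (an illustrative figure, not one of Disney’s) falling for just 2 seconds hits about 19.6 m/s and has to shed roughly 1.9 kilojoules on landing.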

Finding a solution to these problems is a big, open-ended challenge. In the clip below, you can see one approach we’ve taken to start chipping away at it.

The video shows a small, stick-like robot with an array of four ducted fans attached to its top. The robot has a piston-like foot that absorbs the impact of a small fall, and then the ducted fans keep the robot standing by counteracting any tilting motion using aerodynamic thrust.

Raphael Pilon [left] and Marcela de los Rios evaluate the performance of the monopod balancing robot. Disney Research

The standing portion demonstrates that pushing on the air isn’t only useful during freefall. Conventional walking and hopping robots depend on ground contact forces to maintain the required orientation. These forces can ramp up quickly because of the stiffness of the system, necessitating high-bandwidth control strategies. Aerodynamic forces are relatively soft, but even so, they were sufficient to keep our robots standing. And since these forces can also be applied during the flight phase of running or hopping, this approach might lead to robots that run before they walk. The thing that defines a running gait is the existence of a “flight phase”: a time when none of the feet are in contact with the ground. A running robot with aerodynamic control authority could potentially use a gait with a long flight phase. This would shift the burden of the control effort to mid-flight, simplifying the leg design and possibly making rapid bipedal motion more tractable than moving at a moderate pace.

Richard Landon uses a test rig to evaluate the thrust profile of a ducted fan. Disney Research

In the next video, a slightly larger robot tackles a much more dramatic fall, from 65 feet in the air. This simple machine has two piston-like feet and a similar array of ducted fans on top. The fans not only stabilize the robot upon landing, they also help keep it oriented properly as it falls. Inside each foot is a plug of single-use compressible foam. Crushing the foam on impact provides a nice, constant force profile, which maximizes the amount of energy dissipated per inch of contraction.
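
The reason a constant force profile is ideal comes straight from the definition of work: the energy absorbed over a crush stroke d is

\[
  E = \int_{0}^{d} F(x)\, dx,
\]

so for a given peak force the structure can tolerate and a given stroke, a flat profile F(x) = F_max absorbs the largest possible energy, E = F_max d.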

In the case of this little robot, the mechanical energy dissipation in the pistons is less than the total energy needed to be dissipated from the fall, so the rest of the mechanism takes a pretty hard hit. The size of the robot is an advantage in this case, because scaling laws mean that the strength-to-weight ratio is in its favor.

The strength of a component is a function of its cross-sectional area, while the weight of a component is a function of its volume. Area is proportional to length squared, while volume is proportional to length cubed. This means that as an object gets smaller, its weight becomes relatively small. This is why a toddler can be half the height of an adult but only a fraction of that adult’s weight, and why ants and spiders can run around on long, spindly legs. Our tiny robots take advantage of this, but we can’t stop there if we want to represent some of our bigger characters.
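
In symbols, for a characteristic length L,

\[
  \text{strength} \propto L^{2}, \qquad \text{weight} \propto L^{3}
  \quad\Longrightarrow\quad
  \frac{\text{strength}}{\text{weight}} \propto \frac{1}{L},
\]

so shrinking a robot by half roughly doubles its strength-to-weight ratio, and scaling it up pays the same penalty in reverse.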

Louis Lambie and Michael Lynch assemble an early ducted fan test platform. The platform was mounted on guidewires and was used for lifting capacity tests. Disney Research

In most aerial robotics applications, control is provided by a system that is capable of supporting the entire weight of the robot. In our case, being able to hover isn’t a necessity. The clip below shows an investigation into how much thrust is needed to control the orientation of a fairly large, heavy robot. The robot is supported on a gimbal, allowing it to spin freely. At the extremities are mounted arrays of ducted fans. The fans don’t have enough force to keep the frame in the air, but they do have a lot of control authority over the orientation.
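
This works because reorienting a body takes torque, not lift: thrust F applied at a lever arm r from the center of mass produces

\[
  \tau = r\, F,
\]

so, as an illustrative example, 20 newtons of thrust acting a meter from the center of mass delivers 20 newton-meters of control torque even if 20 newtons is only a small fraction of the robot’s weight.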

Complicated robots are less likely to survive unscathed when subjected to the extremely high accelerations of a direct ground impact, as you can see in this early test that didn’t quite go according to plan.

In this last video, we use a combination of the previous techniques and add one more capability – a dramatic mid-air stop. Ducted fans are part of this solution, but the high-speed deceleration is principally accomplished by a large water rocket. Then the mechanical legs only have to handle the last ten feet of the drop.

Whether it’s using water or rocket fuel, the principle underlying a rocket is the same – mass is ejected from the rocket at high speed, producing a reaction force in the opposite direction via Newton’s third law. The higher the flow rate and the denser the fluid, the more force is produced. To get a high flow rate and a quick response time, we needed a wide nozzle that went from closed to open cleanly in a matter of milliseconds. We designed a system using a piece of copper foil and a custom punch mechanism that accomplished just that.
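
In idealized form (ignoring the pressure term), a jet of density ρ leaving a nozzle of area A at exit velocity v_e produces

\[
  F = \dot{m}\, v_e = \rho A v_e^{2},
\]

which is why both a higher flow rate and a denser working fluid (water rather than gas) give more braking force, and why the nozzle had to open wide and fast.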

Grant Imahara pressurizes a test tank to evaluate an early valve prototype [left]. The water rocket in action: note the laminar, two-inch-wide flow as it passes through the specially designed nozzle. Disney Research

Once the water rocket has brought the robot to a mid-air stop, the ducted fans are able to hold it in a stable hover about ten feet above the deck. When they cut out, the robot falls again and the legs absorb the impact. In the video, the robot has a couple of loose tethers attached as a testing precaution, but they don’t provide any support, power, or guidance.

“It might not be so obvious as to what this can be directly used for today, but these rough proof-of-concept experiments show that we might be able to work within real-world physics to do the high falls our characters do on the big screen, and someday actually stick the landing,” explains Tony Dohi, the project lead.

There are still a large number of problems for future projects to address. Most characters have legs that bend on hinges rather than compress like pistons, and don’t wear a belt made of ducted fans. Beyond issues of packaging and form, making sure that the robot lands exactly where it intends to land has interesting implications for perception and control. Regardless, we think we can confirm that this kind of entrance has–if you’ll excuse the pun–quite the impact.



Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

ICRA 2024: 13–17 May 2024, YOKOHAMA, JAPAN
RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

Festo has robot bees!

It’s a very clever design, but the size makes me terrified of whatever the bees are that Festo seems to be familiar with.

[ Festo ]

Boing, boing, boing!

[ USC ]

Why the heck would you take the trouble to program a robot to make sweet potato chips and then not scarf them down yourself?

[ Dino Robotics ]

Mobile robots can transport payloads far greater than their mass through vehicle traction. However, off-road terrain features substantial variation in height, grade, and friction, which can cause traction to degrade or fail catastrophically. This paper presents a system that utilizes a vehicle-mounted, multipurpose manipulator to physically adapt the robot with unique anchors suitable for a particular terrain for autonomous payload transport.

[ DART Lab ]

Turns out that working on a collaborative task with a robot can make humans less efficient, because we tend to overestimate the robot’s capabilities.

[ CHI 2024 ]

Wing posts a video with the title “What Do Wing’s Drones Sound Like,” but it includes only a brief snippet of drone audio, none of it free of background room noise, so curious viewers and listeners never really learn exactly what Wing’s drones sound like.

Because, look, a couple seconds of muted audio underneath a voiceover is in fact not really answering the question.

[ Wing ]

This first instance of ROB 450 in Winter 2024 challenged students to synthesize the knowledge acquired through their Robotics undergraduate courses at the University of Michigan to use a systematic and iterative design and analysis process and apply it to solving a real, open-ended Robotics problem.

[ Michigan Robotics ]

This Microsoft Future Leaders in Robotics and AI Seminar is from Catie Cuan at Stanford, on “Choreorobotics: Teaching Robots How to Dance With Humans.”

As robots transition from industrial and research settings into everyday environments, robots must be able to (1) learn from humans while benefiting from the full range of the humans’ knowledge and (2) learn to interact with humans in safe, intuitive, and social ways. I will present a series of compelling robot behaviors, where human perception and interaction are foregrounded in a variety of tasks.

[ UMD ]



For years, Shadow Robot Company’s Shadow Hand has arguably been the gold standard for robotic manipulation. Beautiful and expensive, it is able to mimic the form factor and functionality of human hands, which has made it ideal for complex tasks. I’ve personally experienced how amazing it is to use Shadow Hands in a teleoperation context, and it’s hard to imagine anything better.

The problem with the original Shadow Hand was (and still is) fragility. In a research environment, this has been fine, except that research is changing: Roboticists no longer carefully program manipulation tasks by, uh, hand. Now it’s all about machine learning, in which you need robotic hands to fail massively over and over again until they build up enough data to understand how to succeed.

“We’ve aimed for robustness and performance over anthropomorphism and human size and shape.” —Rich Walker, Shadow Robot Company

Doing this with a Shadow Hand was just not realistic, which Google DeepMind understood five years ago when it asked Shadow Robot to build it a new hand with hardware that could handle the kind of training environments that now typify manipulation research. So Shadow Robot spent the last half-decade-ish working on a new, three-fingered Shadow Hand, which the company unveiled today. The company is calling it, appropriately enough, “the new Shadow Hand.”

As you can see, this thing is an absolute beast. Shadow Robot says that the new hand is “robust against a significant amount of misuse, including aggressive force demands, abrasion and impacts.” Part of the point, though, is that what robot-hand designers might call “misuse,” robot-manipulation researchers might very well call “progress,” and the hand is designed to stand up to manipulation research that pushes the envelope of what robotic hardware and software are physically capable of.

Shadow Robot understands that despite its best engineering efforts, this new hand will still occasionally break (because it’s a robot and that’s what robots do), so the company designed it to be modular and easy to repair. Each finger is its own self-contained unit that can be easily swapped out, with five Maxon motors in the base of the finger driving the four finger joints through cables in a design that eliminates backlash. The cables themselves will need replacement from time to time, but it’s much easier to do this on the new Shadow Hand than it was on the original. Shadow Robot says that you can swap out an entire New Hand’s worth of cables in the same time it would take you to replace a single cable on the old hand.

Shadow Robot

The new Shadow Hand itself is somewhat larger than a typical human hand, and heavier too: Each modular finger unit weighs 1.2 kilograms, and the entire three-fingered hand is just over 4 kg. The fingers have humanlike kinematics, and each joint can move up to 180 degrees per second with the capability of exerting at least 8 newtons of force at each fingertip. Both force control and position control are available, and the entire hand runs the Robot Operating System (ROS), the Open Source Robotics Foundation’s collection of open-source software libraries and tools.

One of the coolest new features of this hand is the tactile sensing. Shadow Robot has decided to take the optical route with fingertip sensors, GelSight-style. Each fingertip is covered in soft, squishy gel with thousands of embedded particles. Cameras in the fingers behind the gel track each of those particles, and when the fingertip touches something, the particles move. Based on that movement, the fingertips can very accurately detect the magnitude and direction of even very small forces. And there are even more sensors on the insides of the fingers too, with embedded Hall effect sensors to help provide feedback during grasping and manipulation tasks.
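
As a rough illustration of how such an optical tactile sensor turns particle motion into forces, here is a minimal Python sketch; the calibration constants and the simple linear model are assumptions for exposition, not Shadow Robot’s algorithm.

import numpy as np

K_SHEAR = 0.8   # newtons per pixel of mean tangential displacement (assumed)
K_NORMAL = 0.5  # newtons per pixel of mean outward (radial) displacement (assumed)

def estimate_fingertip_force(p_rest, p_now):
    """Estimate shear (x, y) and normal force from tracked particle positions.

    p_rest, p_now: (N, 2) arrays of particle pixel coordinates before and
    during contact, as tracked by the camera behind the gel.
    """
    disp = p_now - p_rest
    shear = K_SHEAR * disp.mean(axis=0)  # net sideways drag of the gel
    # Pressing in makes particles spread outward from the contact center,
    # so the mean radial component of displacement tracks normal load.
    radial_dirs = p_rest - p_rest.mean(axis=0)
    radial_dirs /= np.linalg.norm(radial_dirs, axis=1, keepdims=True) + 1e-9
    normal = K_NORMAL * np.mean(np.sum(disp * radial_dirs, axis=1))
    return shear, normal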

Shadow Robot

The most striking difference here is how completely different a robotic-manipulation philosophy this new hand represents for Shadow Robot. “We’ve aimed for robustness and performance over anthropomorphism and human size and shape,” says Rich Walker, director of Shadow Robot Company. “There’s a very definite design choice there to get something that really behaves much more like an optimized manipulator rather than a humanlike hand.”

Walker explains that Shadow Robot sees two different approaches to manipulation within the robotics community right now: There’s imitation learning, where a human does a task and then a robot tries to do the task the same way, and then there’s reinforcement learning, where a robot tries to figure out how to do the task by itself. “Obviously, this hand was built from the ground up to make reinforcement learning easy.”

The hand was also built from the ground up to be rugged and repairable, which had a significant effect on the form factor. To make the fingers modular, they have to be chunky, and trying to cram five of them onto one hand was just not practical. But because of this modularity, Shadow Robot could make you a five-fingered hand if you really wanted one. Or a two-fingered hand. Or (and this is the company’s suggestion, not mine) “a giant spider.” Really, though, it’s probably not useful to get stuck on the form factor. Instead, focus more on what the hand can do. In fact, Shadow Robot tells me that the best way to think about the hand in the context of agility is as having three thumbs, not three fingers, but Walker says that “if we describe it as that, people get confused.”

There’s still definitely a place for the original anthropomorphic Shadow Hand, and Shadow Robot has no plans to discontinue it. “It’s clear that for some people anthropomorphism is a deal breaker, they have to have it,” Walker says. “But for a lot of people, the idea that they could have something which is really robust and dexterous and can gather lots of data, that’s exciting enough to be worth saying okay, what can we do with this? We’re very interested to find out what happens.”

The new Shadow Hand is available now, starting at about US $74,000 depending on configuration.


