IEEE Spectrum Automation

Cigarette butts are the second most common undisposed-of litter on Earth. Of the six trillion-ish cigarettes inhaled every year, it’s estimated that over four trillion of the butts are just tossed onto the ground, each one leaching over 700 different toxic chemicals into the environment. Let’s not focus on the fact that all those toxic chemicals are also going into people’s lungs, and instead talk about the ecosystem damage that they can do and also just the general grossness of having bits of sucked-on trash everywhere. Ew.

Preventing those cigarette butts from winding up on the ground in the first place would be the best option, but would require a pretty big shift in human behavior. Operating under the assumption that humans changing their behavior is a non-starter, roboticists from the Dynamic Legged Systems unit at the Italian Institute of Technology (IIT) in Genoa have instead designed a novel platform for cigarette butt cleanup in the form of a quadrupedal robot with vacuums attached to its feet.


There are, of course, far more efficient ways of at least partially automating the cleanup of litter with machines. The challenge is that most of that automation relies on mobility systems with wheels, which won’t work on the many beautiful beaches (and many beautiful flights of stairs) of Genoa. In places like these, it still falls to humans to do the hard work, which is less than ideal.

This robot, developed in Claudio Semini’s lab at IIT, is called VERO (Vacuum-cleaner Equipped RObot). It’s based around an AlienGo from Unitree, with a commercial vacuum mounted on its back. Hoses go from the vacuum down the leg to each foot, with a custom 3D printed nozzle that puts as much suction near the ground as possible without tripping the robot up. While the vacuum is novel, the real contribution here is how the robot autonomously locates things on the ground and then plans out how to interact with those things using its feet.

First, an operator designates an area for VERO to clean, after which the robot operates by itself. After calculating a path that covers the entire area, the robot uses its onboard cameras and a neural network to detect cigarette butts. This is trickier than it sounds, because there may be a lot of cigarette butts on the ground, and they all probably look pretty much the same, so the system has to filter out all of the potential duplicates. The next step is to plan out its next steps: VERO has to plan footsteps to put the vacuum side of one of its feet right next to each cigarette butt, while calculating a safe, stable pose for the rest of its body. Since this whole process can take place on sand or stairs or other uneven surfaces, VERO has to prioritize not falling over before it decides how to do the collection. The final collecting maneuver is fine-tuned using an extra Intel RealSense depth camera mounted on the robot’s chin.
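To get a feel for the duplicate-filtering problem, here is a minimal Python sketch. It is not the method from the VERO paper, which this article doesn’t describe in detail. It assumes each detection has already been projected onto the ground as an (x, y) position in meters, and the function name and the 5-centimeter `merge_radius` are hypothetical choices made purely for illustration.

```python
import math


def filter_duplicates(detections, merge_radius=0.05):
    """Greedily drop detections that land within merge_radius (meters)
    of a detection that has already been kept.

    Assumption: detections are (x, y) ground-plane positions in meters.
    """
    unique = []
    for x, y in detections:
        # A detection is a duplicate if it sits very close to a kept one.
        is_duplicate = any(
            math.hypot(x - ux, y - uy) < merge_radius for ux, uy in unique
        )
        if not is_duplicate:
            unique.append((x, y))
    return unique


# Hypothetical detections: the same two butts seen in several camera frames.
detections = [
    (1.00, 2.00),
    (1.01, 2.02),
    (3.50, 0.40),
    (1.02, 1.99),
    (3.52, 0.41),
]
targets = filter_duplicates(detections)
print(f"{len(detections)} raw detections -> {len(targets)} unique targets")
```

A greedy pass like this is easy to reason about, but it would struggle when butts are piled closer together than the merge radius, which is presumably part of why the real problem is tricky.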

VERO has been tested successfully in six different scenarios that challenge both its locomotion and detection capabilities. IIT

Initial testing with the robot in a variety of different environments showed that it could successfully collect just under 90 percent of cigarette butts, which I bet is better than I could do, and I’m also much more likely to get fed up with the whole process. The robot is not very quick at the task, but unlike me it will never get fed up as long as it’s got energy in its battery, so speed is somewhat less important.

As far as the authors of this paper are aware (and I assume they’ve done their research), this is “the first time that the legs of a legged robot are concurrently utilized for locomotion and for a different task.” This is distinct from other robots that can (for example) open doors with their feet, because those robots stop using the feet as feet for a while and instead use them as manipulators.

So, this is about a lot more than cigarette butts, and the researchers suggest a variety of other potential use cases, including spraying weeds in crop fields, inspecting cracks in infrastructure, and placing nails and rivets during construction.

Some use cases include potentially doing multiple things at the same time, like planting different kinds of seeds, using different surface sensors, or driving both nails and rivets. And since quadrupeds have four feet, they could potentially host four completely different tools, and the software that the researchers developed for VERO can be slightly modified to put whatever foot you want on whatever spot you need.

VERO: A vacuum‐cleaner‐equipped quadruped robot for efficient litter removal, by Lorenzo Amatucci, Giulio Turrisi, Angelo Bratta, Victor Barasuol, and Claudio Semini from IIT, was published in the Journal of Field Robotics.

Scientists in China have built what they claim to be the smallest and lightest solar-powered aerial vehicle. It’s small enough to sit in the palm of a person’s hand, weighs less than a U.S. nickel, and can fly indefinitely while the sun shines on it.

Micro aerial vehicles (MAVs) are insect- and bird-size aircraft that might prove useful for reconnaissance and other possible applications. However, a major problem that MAVs currently face is their limited flight times, usually about 30 minutes. Ultralight MAVs—those weighing less than 10 grams—can often only stay aloft for less than 10 minutes.

One potential way to keep MAVs flying longer is to power them with a consistent source of energy such as sunlight. Now, in a new study, researchers have developed what they say is the first solar-powered MAV capable of sustained flight.

The new ultralight MAV, CoulombFly, weighs just 4.21 g and has a wingspan of 20 centimeters. That’s about one-tenth the width and roughly one six-hundredth the mass of the previous smallest sunlight-powered aircraft, a quadcopter that’s 2 meters wide and weighs 2.6 kilograms.

Sunlight powered flight test Nature

“My ultimate goal is to make a super tiny flying vehicle, about the size and weight of a mosquito, with a wingspan under 1 centimeter,” says Mingjing Qi, a professor of energy and power engineering at Beihang University in Beijing. Qi and the scientists who built CoulombFly developed a prototype of such an aircraft, measuring 8 millimeters wide and 9 milligrams in mass, “but it can’t fly on its own power yet. I believe that with the ongoing development of microcircuit technology, we can make this happen.”

Previous sunlight-powered aerial vehicles typically rely on electromagnetic motors, which use electromagnets to generate motion. However, the smaller a solar-powered aircraft gets, the less surface area it has with which to collect sunlight, reducing the amount of energy it can generate. In addition, the efficiency of electromagnetic motors decreases sharply as vehicles shrink in size. Smaller electromagnetic motors experience comparably greater friction than larger ones, as well as greater energy losses due to electrical resistance from their components. This results in low lift-to-power efficiencies, Qi and his colleagues explain.

CoulombFly instead employs an electrostatic motor, which produces motion using electrostatic fields. Electrostatic motors are generally used as sensors in microelectromechanical systems (MEMS), not for aerial propulsion. Nevertheless, with a mass of only 1.52 grams, the electrostatic motor the scientists used has a lift-to-power efficiency two to three times that of other MAV motors.

The electrostatic motor has two nested rings. The inner ring is a spinning rotor that possesses 64 slats, each made of a carbon fiber sheet covered with aluminum foil. It resembles a wooden fence curved into a circle, with gaps between the fence’s posts. The outer ring is equipped with eight alternating pairs of positive and negative electrode plates, which are each also made of a carbon fiber sheet bonded to aluminum foil. Each plate’s edge also possesses a brush made of aluminum that touches the inner ring’s slats.

Above CoulombFly’s electrostatic motor is a propeller 20 cm wide and connected to the rotor. Below the motor are two high-power-density thin-film gallium arsenide solar cells, each 4 by 6 cm in size, with a mass of 0.48 g and an energy conversion efficiency of more than 30 percent.

Sunlight electrically charges CoulombFly’s outer ring, and its 16 plates generate electric fields. The brushes on the outer ring’s plates touch the inner ring, electrically charging the rotor slats. The electric fields of the outer ring’s plates exert force on the charged rotor slats, making the inner ring and the propeller spin.

In tests under natural sunlight conditions—about 920 watts of light per square meter—CoulombFly successfully took off within one second and sustained flight for an hour without any deterioration in performance. Potential applications for sunlight-powered MAVs may include long-distance and long-duration aerial reconnaissance, the researchers say.
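For a rough sense of the power budget, the figures quoted above can be combined in a few lines of Python. This is a back-of-the-envelope sketch, not a calculation from the Nature paper: it assumes both solar cells face the sun squarely and are fully lit, it treats the “more than 30 percent” efficiency as a lower bound, and it ignores any losses in the electronics between the cells and the motor.

```python
# Figures quoted in the article.
IRRADIANCE_W_PER_M2 = 920.0  # natural sunlight during the tests
CELL_WIDTH_M = 0.04          # each cell is 4 cm by 6 cm
CELL_HEIGHT_M = 0.06
NUM_CELLS = 2
MIN_EFFICIENCY = 0.30        # "more than 30 percent", so a lower bound

# Assumption: both cells are perpendicular to the sunlight and unshaded.
total_area_m2 = NUM_CELLS * CELL_WIDTH_M * CELL_HEIGHT_M
incident_power_w = IRRADIANCE_W_PER_M2 * total_area_m2
min_electrical_power_w = incident_power_w * MIN_EFFICIENCY

print(f"Total cell area: {total_area_m2 * 1e4:.0f} cm^2")
print(f"Incident sunlight: {incident_power_w:.2f} W")
print(f"Electrical output at 30 percent: {min_electrical_power_w:.2f} W")
```

Under those assumptions, the two cells intercept a little over 4 watts of sunlight and deliver somewhere above 1.3 watts of electrical power.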

Long term test for hovering operation Nature

CoulombFly’s propulsion system can generate up to 5.8 g of lift. This means it could support an extra payload of roughly 1.59 g, which is “sufficient to accommodate the smallest available sensors, controllers, cameras and so on” to support future autonomous operations, Qi says. “Right now, there’s still a lot of room to improve things like motors, propellers, and circuits, so we think we can get the extra payload up to 4 grams in the future. If we need even more payload, we could switch to quadcopters or fixed-wing designs, which can carry up to 30 grams.”
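The payload figure follows directly from numbers already given in the article, and the arithmetic is short enough to check in Python. The variable names here are illustrative only, and lift is expressed in grams (as in the article) rather than in newtons.

```python
# Figures quoted in the article, all in grams.
MAX_LIFT_G = 5.8       # maximum lift the propulsion system can generate
VEHICLE_MASS_G = 4.21  # mass of the complete CoulombFly
MOTOR_MASS_G = 1.52    # mass of the electrostatic motor alone

# Whatever lift is not needed to hold up the vehicle is available for payload.
payload_margin_g = MAX_LIFT_G - VEHICLE_MASS_G
lift_to_weight = MAX_LIFT_G / VEHICLE_MASS_G
motor_mass_fraction = MOTOR_MASS_G / VEHICLE_MASS_G

print(f"Payload margin: {payload_margin_g:.2f} g")
print(f"Lift-to-weight ratio: {lift_to_weight:.2f}")
print(f"Motor share of vehicle mass: {motor_mass_fraction:.0%}")
```

The margin comes out to 1.59 g, matching the figure above, with the motor accounting for a bit more than a third of the vehicle’s mass.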

Qi adds that “it should be possible for the vehicle to carry a tiny lithium-ion battery.” That means it could store energy from its solar panels and fly even when the sun is not out, potentially enabling 24-hour operations.

In the future, “we plan to use this propulsion system in different types of flying vehicles, like fixed-wing and rotorcraft,” Qi says.

The scientists detailed their findings online 17 July in the journal Nature.

Among the many things that humans cannot do (without some fairly substantial modification) is shifting our body morphology around on demand. It sounds a little extreme to be talking about things like self-amputation, and it is a little extreme, but it’s also not at all uncommon for other animals to do—lizards can disconnect their tails to escape a predator, for example. And it works in the other direction, too, with animals like ants adding to their morphology by connecting to each other to traverse gaps that a single ant couldn’t cross alone.

In a new paper, roboticists from The Faboratory at Yale University have given a soft robot the ability to detach and reattach pieces of itself, editing its body morphology when necessary. It’s a little freaky to watch, but it kind of makes me wish I could do the same thing.

Faboratory at Yale

These are fairly standard soft-bodied silicone robots that use asymmetrically stiff air chambers that inflate and deflate (using a tethered pump and valves) to generate a walking or crawling motion. What’s new here are the joints, which rely on a new material called a bicontinuous thermoplastic foam (BTF) to form a supportive structure for a sticky polymer that’s solid at room temperature but can be easily melted.

The BTF acts like a sponge to prevent the polymer from running out all over the place when it melts, and means that you can pull two BTF surfaces apart by melting the joint, and stick them together again by reversing the procedure. The process takes about 10 minutes and the resulting joint is quite strong. It’s also good for a couple hundred detach/reattach cycles before degrading. It even stands up to dirt and water reasonably well.

Faboratory at Yale

This kind of thing has been done before with mechanical connections and magnets and other things like that—getting robots to attach to and detach from other robots is a foundational technique for modular robotics, after all. But these systems are inherently rigid, which is bad for soft robots, whose whole thing is about not being rigid. It’s all very preliminary, of course, because there are plenty of rigid things attached to these robots with tubes and wires and stuff. And there’s no autonomy or payloads here either. That’s not the point, though—the point is the joint, which (as the researchers point out) is “the first instantiation of a fully soft reversible joint” resulting in the “potential for soft artificial systems [that can] shape change via mass addition and subtraction.”

Self-Amputating and Interfusing Machines, by Bilige Yang, Amir Mohammadi Nasab, Stephanie J. Woodman, Eugene Thomas, Liana G. Tilton, Michael Levin, and Rebecca Kramer-Bottiglio from Yale, was published in May in Advanced Materials.


Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS
IROS 2024: 14–18 October 2024, ABU DHABI, UAE
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

At ICRA 2024, Spectrum editor Evan Ackerman sat down with Unitree Founder and CEO Xingxing Wang and Tony Yang, VP of Business Development, to talk about the company’s newest humanoid, the G1 model.

[ Unitree ]



From navigating uneven terrain outside the lab to pure vision perception, GR-1 continues to push the boundaries of what’s possible.

[ Fourier ]

Aerial manipulation has gained interest for completing high-altitude tasks that are challenging for human workers, such as contact inspection and defect detection. This letter addresses a more general and dynamic task: simultaneously tracking time-varying contact force and motion trajectories on tangential surfaces. We demonstrate the approach on an aerial calligraphy task using a novel sponge pen design as the end-effector.

[ CMU ]

LimX Dynamics Biped Robot P1 was kicked and hit: Faced with random impacts in a crowd, P1 with its new design once again showcased exceptional stability as a mobility platform.

[ LimX Dynamics ]

Thanks, Ou Yan!

This is from ICRA 2018, but it holds up pretty well in the novelty department.


I think someone needs to crank the humor setting up on this one.

[ Deep Robotics ]

The paper summarizes the work at the Micro Air Vehicle Laboratory on end-to-end neural control of quadcopters. A major challenge in bringing these controllers to life is the “reality gap” between the real platform and the training environment. To address this, we combine online identification of the reality gap with pre-trained corrections through a deep neural controller, which is orders of magnitude more efficient than traditional computation of the optimal solution.

[ MAVLab ]

This is a dedicated Track Actuator from HEBI Robotics. Why they didn’t just call it a “tracktuator” is beyond me.

[ HEBI Robotics ]

Menteebot can navigate complex environments by combining a 3D model of the world with a dynamic obstacle map. On the first day in a new location, Menteebot generates the 3D model by following a person who shows the robot around.

[ Mentee Robotics ]

Here’s that drone with a 68 kg payload and 70 km range you’ve always wanted.

[ Malloy ]

AMBIDEX is a dual-armed robot with an innovative mechanism developed for safe coexistence with humans. Based on an innovative cable structure, it is designed to be both strong and stable.

[ NAVER Labs ]

As quadrotors take on an increasingly diverse range of roles, researchers often need to develop new hardware platforms tailored for specific tasks, introducing significant engineering overhead. In this article, we introduce the UniQuad series, a unified and versatile quadrotor hardware platform series that offers high flexibility to adapt to a wide range of common tasks, excellent customizability for advanced demands, and easy maintenance in case of crashes.


The video demonstrates the field testing of a 43 kg (95 lb) amphibious cycloidal propeller unmanned underwater vehicle (Cyclo-UUV) developed at the Advanced Vertical Flight Laboratory, Texas A&M University. The vehicle utilizes a combination of cycloidal propellers (or cyclo-propellers), screw propellers, and tank treads for operations on land and underwater.

[ TAMU ]

The “pill” (the package hook) on Wing’s delivery drones is a crucial component to our aircraft! Did you know our package hook is designed to be aerodynamic and has stable flight characteristics, even at 65 mph?

[ Wing ]

Happy 50th to robotics at ABB!

[ ABB ]

This JHU Center for Functional Anatomy & Evolution Seminar is by Chen Li, on Terradynamics of Animals & Robots in Complex Terrain.

[ JHU ]

Food prep is one of those problems that seems like it should be solvable by robots. It’s a predictable, repetitive, basic manipulation task in a semi-structured environment—seems ideal, right? And obviously there’s a huge need, because human labor is expensive and getting harder and harder to find in these contexts. There are currently over a million unfilled jobs in the food industry in the United States, and even with jobs that are filled, the annual turnover rate is 150 percent (meaning a lot of workers don’t even last a year).

Food prep seems like a great opportunity for robots, which is why Chef Robotics and a handful of other robotics companies tackled it a couple years ago by bringing robots to fast casual restaurants like Chipotle or Sweetgreen, where you get served a custom-ish meal from a selection of ingredients at a counter.

But this didn’t really work out, for a couple of reasons. First, things that are mostly effortless for humans are inevitably extremely difficult for robots. And second, humans actually do a lot of useful things in a restaurant context besides just putting food onto plates, and the robots weren’t up for all of those things.

Still, Chef Robotics founder and CEO Rajat Bhageria wasn’t ready to let this opportunity go. “The food market is arguably the biggest market that’s tractable for AI today,” he told IEEE Spectrum. And with a bit of a pivot away from the complicated mess of fast casual restaurants, Chef Robotics has still managed to prepare over 20 million meals thanks to autonomous robot arms deployed all over North America. Without knowing it, you may even have eaten such a meal.

“The hard thing is, can you pick fast? Can you pick consistently? Can you pick the right portion size without spilling? And can you pick without making it look like the food was picked by a machine?” —Rajat Bhageria, Chef Robotics

When we spoke with Bhageria, he explained that there are three basic tasks involved in prepared food production: prep (tasks like chopping ingredients), the actual cooking process, and then assembly (or plating). Of these tasks, prep scales pretty well with industrial automation in that you can usually order pre-chopped or mixed ingredients, and cooking also scales well since you can cook more with only a minimal increase in effort just by using a bigger pot or pan or oven. What doesn’t scale well is the assembly, especially when any kind of flexibility or variety is required. You can clearly see this in action at any fast casual restaurant, where a couple of people are in the kitchen cooking up massive amounts of food while each customer gets served one at a time.

So with that bottleneck identified, let’s throw some robots at the problem, right? And that’s exactly what Chef Robotics did, explains Bhageria: “we went to our customers, who said that their biggest pain point was labor, and the most labor is in assembly, so we said, we can help you solve this.”

Chef Robotics started with fast casual restaurants. They weren’t the first to try this—many other robotics companies had attempted this before, with decidedly mixed results. “We actually had some good success in the early days selling to fast casual chains,” Bhageria says, “but then we had some technical obstacles. Essentially, if we want to have a human-equivalent system so that we can charge a human-equivalent service fee for our robot, we need to be able to do every ingredient. You’re either a full human equivalent, or our customers told us it wouldn’t be useful.”

Part of the challenge is that training robots to perform all of the different manipulations required for different assembly tasks requires different kinds of real-world data. That data simply doesn’t exist, or, if it does, any company that has it knows what it’s worth and isn’t sharing. You can’t easily simulate this kind of data, because food can be gross and difficult to handle, whether it’s gloopy or gloppy or squishy or slimy or unpredictably deformable in some other way, and you really need physical experience to train a useful manipulation model.

Setting fast casual restaurants aside for a moment, what about food prep situations where things are as predictable as possible, like mass-produced meals? We’re talking about food like frozen dinners, which have a handful of discrete ingredients packed into trays at factory scale. Frozen meal production relies on automation rather than robotics because the scale is such that the cost of dedicated equipment can be justified.

There’s a middle ground, though, where robots have found (some) opportunity: When you need to produce a high volume of the same meal, but that meal changes regularly. For example, think of any kind of pre-packaged meal that’s made in bulk, just not at frozen-food scale. It’s an opportunity for automation in a structured environment—but with enough variety that actual automation isn’t cost effective. Suddenly, robots and their tiny bit of flexible automation have a chance to be a practical solution.

“We saw these long assembly lines, where humans were scooping food out of big tubs and onto individual trays,” Bhageria says. “They do a lot of different meals on these lines; it’s going to change over and they’re going to do different meals throughout the week. But at any given moment, each person is doing one ingredient, and maybe on a weekly basis, that person would do six ingredients. This was really compelling for us because six ingredients is something we can bootstrap in a lab. We can get something good enough and if we can get something good enough, then we can ship a robot, and if we can ship a robot to production, then we will get real world training data.”

Chef Robotics has been deploying robot modules that they can slot into existing food assembly lines in place of humans without any retrofitting necessary. The modules consist of six-degree-of-freedom arms wearing swanky IP67 washable suits. To handle different kinds of food, the robots can be equipped with a variety of different utensils (and their accompanying manipulation software strategies). Sensing includes a few depth cameras, as well as a weight-sensing platform for the food tray to ensure consistent amounts of food are picked. And while arms with six degrees of freedom may be overkill for now, eventually the hope is that they’ll be able to handle more complex food like asparagus, where you need to do a little bit more than just scoop.
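As an illustration of how a weight-sensing platform could be used to keep portions consistent, here is a minimal Python sketch. It is purely hypothetical and not Chef Robotics’ software: the function name, the target weight, and the 5-gram tolerance are all assumptions made for the example.

```python
def check_portion(weight_before_g, weight_after_g, target_g, tolerance_g=5.0):
    """Compare the food added to a tray against a target portion.

    Returns (is_within_tolerance, deposited_grams). All values are in grams,
    and the default tolerance is an assumed, illustrative number.
    """
    deposited_g = weight_after_g - weight_before_g
    is_within_tolerance = abs(deposited_g - target_g) <= tolerance_g
    return is_within_tolerance, deposited_g


# Hypothetical scale readings before and after one scoop of an ingredient.
ok, deposited = check_portion(
    weight_before_g=120.0, weight_after_g=203.0, target_g=85.0
)
print(f"Deposited {deposited:.0f} g, within tolerance: {ok}")
```

A real system would presumably feed this kind of measurement back into the next scoop instead of just flagging a bad portion, but the check itself can be this simple.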

While Chef Robotics seems to have a viable business here, Bhageria tells us that he keeps coming back to that vision of robots being useful in fast casual restaurants, and eventually, robots making us food in our homes. Making that happen will require time, experience, technical expertise, and an astonishing amount of real-world training data, which is the real value behind those 20 million robot-prepared meals (and counting). The more robots the company deploys, the more data they collect, which will allow them to train their food manipulation models to handle a wider variety of ingredients to open up even more deployments. Their robots, Chef’s website says, “essentially act as data ingestion engines to improve our AI models.”

The next step is likely ghost kitchens where the environment is still somewhat controlled and human interaction isn’t necessary, followed by deployments in commercial kitchens more broadly. But even that won’t be enough for Bhageria, who wants robots that can take over from all of the drudgery in food service: “I’m really excited about this vision,” he says. “How do we deploy hundreds of millions of robots all over the world that allow humans to do what humans do best?”

Against all odds, Ukraine is still standing almost two and a half years after Russia’s massive 2022 invasion. Of course, hundreds of billions of dollars in Western support as well as Russian errors have helped immensely, but it would be a mistake to overlook Ukraine’s creative use of new technologies, particularly drones. While uncrewed aerial vehicles have grabbed most of the attention, it is naval drones that could be the key to bringing Russian president Vladimir Putin to the negotiating table.

These naval-drone operations in the Black Sea against Russian warships and other targets have been so successful that they are prompting, in London, Paris, Washington, and elsewhere, fundamental reevaluations of how drones will affect future naval operations. In August 2023, for example, the Pentagon launched the billion-dollar Replicator initiative to field air and naval drones (also called sea drones) on a massive scale. It’s widely believed that such drones could be used to help counter a Chinese invasion of Taiwan.

And yet Ukraine’s naval-drone initiative grew out of necessity, not grand strategy. Early in the war, Russia’s Black Sea Fleet launched cruise missiles into Ukraine and blockaded Odesa, effectively shutting down Ukraine’s exports of grain, metals, and manufactured goods. The missile strikes terrorized Ukrainian citizens and shut down the power grid, but Russia’s blockade was arguably more consequential, devastating Ukraine’s economy and creating food shortages from North Africa to the Middle East.

With its navy seized or sunk during the war’s opening days, Ukraine had few options to regain access to the sea. So Kyiv’s troops got creative. Lukashevich Ivan Volodymyrovych, a brigadier general in the Security Service of Ukraine, the country’s counterintelligence agency, proposed building a series of fast, uncrewed attack boats. In the summer of 2022, the service, which is known by the acronym SBU, began with a few prototype drones. These quickly led to a pair of naval drones that, when used with commercial satellite imagery, off-the-shelf uncrewed aircraft, and Starlink terminals, gave Ukrainian operators the means to sink or disable a third of Russia’s Black Sea Fleet, including the flagship Moskva and most of the fleet’s cruise-missile-equipped warships.

To protect their remaining vessels, Russian commanders relocated the Black Sea Fleet to Novorossiysk, 300 kilometers east of Crimea. This move sheltered the ships from Ukrainian drones and missiles, but it also put them too far away to threaten Ukrainian shipping or defend the Crimean Peninsula. Kyiv has exploited the opening by restoring trade routes and mounting sustained airborne and naval drone strikes against Russian bases on Crimea and the Kerch Strait Bridge connecting the peninsula with Russia.

How Maguras and Sea Babies Hunt and Attack

The first Ukrainian drone boats were cobbled together with parts from jet skis, motorboats, and off-the-shelf electronics. But within months, manufacturers working for the Ukrainian defense ministry and SBU fielded several designs that proved their worth in combat, most notably the Magura V5 and the Sea Baby.

Carrying a 300-kilogram warhead, on par with that of a heavyweight torpedo, the Magura V5 is a hunter-killer antiship drone designed to work in swarms that confuse and overwhelm a ship’s defenses. Equipped with Starlink terminals, which connect to SpaceX’s Starlink satellites, and GPS, a group of about three to five Maguras likely moves autonomously to a location near the potential target. From there, operators can wait until conditions are right and then attack the target from multiple angles using remote control and video feeds from the vehicles.

A Ukrainian Magura V5 hunter-killer sea drone was demonstrated at an undisclosed location in Ukraine on 13 April 2024. The domed pod toward the bow, which can rotate from side to side, contains a thermal camera used for guidance and targeting. Valentyn Origrenko/Reuters/Redux

Larger than a Magura, the Sea Baby is a multipurpose vehicle that can carry about 800 kg of explosives, which is close to twice the payload of a Tomahawk cruise missile. A Sea Baby was used in 2023 to inflict substantial damage to the Kerch Strait Bridge. A more recent version carries a rocket launcher that Ukrainian troops plan to use against Russian forces along the Dnipro River, which flows through eastern Ukraine and has often formed the frontline in that part of the country. Like a Magura, a Sea Baby is likely remotely controlled using Starlink and GPS. In addition to attack, it’s also equipped for surveillance and logistics.

Russia reduced the threat to its ships by moving them out of the region, but fixed targets like the Kerch Strait Bridge remain vulnerable to Ukrainian sea drones. To try to protect these structures from drone onslaughts, Russian commanders are taking a “kitchen sink” approach, submerging hulks around bridge supports, fielding more guns to shoot at incoming uncrewed vessels, and jamming GPS and Starlink around the Kerch Strait.

Ukrainian service members demonstrated the portable, ruggedized consoles used to remotely guide the Magura V5 naval drones in April 2024. Valentyn Origrenko/Reuters/Redux

While the war remains largely stalemated in the country’s north, Ukraine’s naval drones could yet force Russia into negotiations. The Crimean Peninsula was Moscow’s biggest prize from its decade-long assault on Ukraine. If the Kerch Bridge is severed and the Black Sea Fleet pushed back into Russian ports, Putin may need to end the fighting to regain control over Crimea.

Why the U.S. Navy Embraced the Swarm

Ukraine’s small, low-cost sea drones are offering a compelling view of future tactics and capabilities. But recent experiences elsewhere in the world are highlighting the limitations of drones for some crucial tasks. For example, for protecting shipping from piracy or stopping trafficking and illegal fishing, drones are less useful.

Before the Ukraine war, efforts by the U.S. Department of Defense to field surface sea drones focused mostly on large vehicles. In 2015, the Defense Advanced Research Projects Agency started, and the U.S. Navy later continued, a project that built two uncrewed surface vessels, called Sea Hunter and Sea Hawk. These were 130-tonne sea drones capable of roaming the oceans for up to 70 days while carrying payloads of thousands of pounds each. The point was to demonstrate the ability to detect, follow, and destroy submarines. The Navy and the Pentagon’s secretive Strategic Capabilities Office followed with the Ghost Fleet Overlord uncrewed vessel programs, which produced four larger prototypes designed to carry shipping-container-size payloads of missiles, sensors, or electronic countermeasures.

The U.S. Navy’s newly created Uncrewed Surface Vessel Division 1 (USVDIV-1) completed a deployment across the Pacific Ocean last year with four medium and large sea drones: Sea Hunter and Sea Hawk and two Overlord vessels, Ranger and Mariner. The five-month deployment from Port Hueneme, Calif., took the vessels to Hawaii, Japan, and Australia, where they joined in annual exercises conducted by U.S. and allied navies. The U.S. Navy continues to assess its drone fleet through sea trials lasting from several days to a few months.

The Sea Hawk is a U.S. Navy trimaran drone vessel designed to find, pursue, and attack submarines. The 130-tonne ship, photographed here in October of 2023 in Sydney Harbor, was built to operate autonomously on missions of up to 70 days, but it can also accommodate human observers on board. Ensign Pierson Hawkins/U.S. Navy

In contrast with Ukraine’s small sea drones, which are usually remotely controlled and operate outside shipping lanes, the U.S. Navy’s much larger uncrewed vessels have to follow the nautical rules of the road. To navigate autonomously, these big ships rely on robust onboard sensors, processing for computer vision and target-motion analysis, and automation based on predictable forms of artificial intelligence, such as expert- or agent-based algorithms rather than deep learning.

But thanks to the success of the Ukrainian drones, the focus and energy in sea drones are rapidly moving to the smaller end of the scale. The U.S. Navy initially envisioned platforms like Sea Hunter conducting missions in submarine tracking, electronic deception, or clandestine surveillance far out at sea. And large drones will still be needed for such missions. However, with the right tactics and support, a group of small sea drones can conduct similar missions as well as other vital tasks.

For example, though they are constrained in speed, maneuverability, and power generation, solar- or sail-powered drones can stay out for months with little human intervention. The earliest of these are wave gliders like the Liquid Robotics (a Boeing company) SHARC, which has been conducting undersea and surface surveillance for the U.S. Navy for more than a decade. Newer designs like the Saildrone Voyager and Ocius Blue Bottle incorporate motors and additional solar or diesel power to haul payloads such as radars, jammers, decoys, or active sonars. The Ocean Aero Triton takes this model one step further: It can submerge, to conduct clandestine surveillance or a surprise attack, or to avoid detection.

The Triton, from Ocean Aero in Gulfport, Miss., is billed as the world’s only autonomous sea drone capable of both cruising underwater and sailing on the surface. Ocean Aero

Ukraine’s success in the Black Sea has also unleashed a flurry of new small antiship attack drones. USVDIV-1 will use the GARC from Maritime Applied Physics Corp. to develop tactics. The Pentagon’s Defense Innovation Unit has also begun purchasing drones for the China-focused Replicator initiative. Among the likely craft being evaluated are fast-attack sea drones from Austin, Texas–based Saronic.

Behind the soaring interest in small and inexpensive sea drones is the changing value proposition for naval drones. As recently as four years ago, military planners were focused on using them to replace crewed ships in “dull, dirty, and dangerous” jobs. But now, the thinking goes, sea drones can provide scale, adaptability, and resilience across each link in the “kill chain” that extends from detecting a target to hitting it with a weapon.

Today, to attack a ship, most navies generally have one preferred sensor (such as a radar system), one launcher, and one missile. But what these planners are now coming to appreciate is that a fleet of crewed surface ships with a collection of a dozen or two naval drones would offer multiple paths to both find that ship and attack it. These craft would also be less vulnerable, because of their dispersion.

Defending Taiwan by Surrounding It With a “Hellscape”

U.S. efforts to protect Taiwan may soon reflect this new value proposition. Many classified and unclassified war games suggest Taiwan and its allies could successfully defend the island—but at costs high enough to potentially dissuade a U.S. president from intervening on Taiwan’s behalf. With U.S. defense budgets capped by law and procurement constrained by rising personnel and maintenance costs, substantially growing or improving today’s U.S. military for this specific purpose is unrealistic. Instead, commanders are looking for creative solutions to slow or stop a Chinese invasion without losing most U.S. forces in the process.

Naval drones look like a good, and maybe the best, solution. The Taiwan Strait is only 160 kilometers (100 miles) wide, and Taiwan’s coastline offers only a few areas where large numbers of troops could come ashore. U.S. naval attack drones positioned on the likely routes could disrupt or possibly even halt a Chinese invasion, much as Ukrainian sea drones have denied Russia access to the western Black Sea and, for that matter, Houthi-controlled drones have sporadically closed off large parts of the Red Sea in the Middle East.

The new U.S. Indo-Pacific Command leader, Admiral Sam Paparo, wants to apply this approach to defending Taiwan in a scenario he calls “Hellscape.” In it, U.S. surface and undersea drones would likely be based near Taiwan, perhaps in the Philippines or Japan. When the potential for an invasion rises, the drones would move themselves or be carried by larger uncrewed or crewed ships to the western coast of Taiwan to wait.

Sea drones are well-suited to this role, thanks in part to the evolution of naval technologies and tactics over the past half century. Until World War II, submarines were the most lethal threat to ships. But since the Cold War, long-range subsonic, supersonic, and now hypersonic antiship missiles have commanded navy leaders’ attention. They’ve spent decades devising ways to protect their ships against such antiship missiles.

Much less effort has gone into defending against torpedoes, mines—or sea drones. A dozen or more missiles might be needed to ensure that just one reaches a targeted ship, and even then, the damage may not be catastrophic. But a single surface or undersea drone could easily evade detection and explode at a ship’s waterline to sink it, because in this case, water pressure does most of the work.

The level of autonomy available in most sea drones today is more than enough to attack ships in the Taiwan Strait. Details of U.S. military plans are classified, but a recent Hudson Institute report that I wrote with Dan Patt proposes a possible approach. In it, a drone flotilla, consisting of about three dozen hunter-killer surface drones, two dozen uncrewed surface vessels carrying aerial drones, and three dozen autonomous undersea drones, would take up designated positions in a “kill box” adjacent to one of Taiwan’s western beaches if a Chinese invasion fleet had begun massing on the opposite side of the strait. Even if they were based in Japan or the Philippines, the drones could reach Taiwan within a day. Upon receiving a signal from operators remotely using Starlink or locally using a line-of-sight radio, the drones would act as a mobile minefield, attacking troop transports and their escorts inside Taiwan’s territorial waters. Widely available electro-optical and infrared sensors, coupled to recognition algorithms, would direct the drones to targets.

Although communications with operators onshore would likely be jammed, the drones could coordinate their actions locally using line-of-sight Internet Protocol–based networks like Silvus or TTNT. For example, surface vessels could launch aerial drones that would attack the pilot houses and radars of ships, while surface and undersea drones strike ships at the waterline. The drones could also coordinate to ensure they do not all strike the same target and to prioritize the largest targets first. These kinds of simple collaborations are routine in today’s drones.

Treating drones like mines reduces the complexity needed in their control systems and helps them comply with Pentagon rules for autonomous weapons. Rather than killer robots seeking out and destroying targets, the drones defending Taiwan would be passively waiting for Chinese forces to illegally enter a protected zone, within which they could be attacked.

Like Russia’s Black Sea Fleet, the Chinese navy will develop countermeasures to sea drones, such as employing decoy ships, attacking drones from the air, or using minesweepers to move them away from the invasion fleet. To stay ahead, operators will need to continue innovating tactics and behaviors through frequent exercises and experiments, like those underway at U.S. Navy Unmanned Surface Vessel Squadron Three. (Like the USVDIV-1, it is a unit under the U.S. Navy’s Surface Development Squadron One.) Lessons from such exercises would be incorporated into the defending drones as part of their programming before a mission.

The emergence of sea drones heralds a new era in naval warfare. After decades of focusing on increasingly lethal antiship missiles, navies now have to defend against capable and widely proliferating threats on, above, and below the water. And while sea drone swarms may be mainly a concern for coastal areas, these choke points are critical to the global economy and most nations’ security. For U.S. and allied fleets, especially, naval drones are a classic combination of threat and opportunity. As the Hellscape concept suggests, uncrewed vessels may be a solution to some of the most challenging and sweeping of modern naval scenarios for the Pentagon and its allies—and their adversaries.

This article was updated on 10 July 2024. An earlier version stated that sea drones from Saronic Technologies are being purchased by the U.S. Department of Defense’s Defense Innovation Unit. This could not be publicly confirmed.

Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS
IROS 2024: 14–18 October 2024, ABU DHABI, UAE
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

Figure is making progress toward a humanoid robot that can do something useful, but keep in mind that the “full use case” here is not one continuous shot.

[ Figure ]

Can this robot survive a 1-meter drop? Spoiler alert: it cannot.


One of those things that’s a lot harder for robots than it probably looks.

This is a demo of hammering a nail. The instantaneous rebound force from the hammer is absorbed through a combination of the elasticity of the rubber material securing the hammer, the deflection in torque sensors and harmonic gears, back-drivability, and impedance control. This allows the nail to be driven with a certain amount of force.

[ Tokyo Robotics ]

Although bin packing has been a key benchmark task for robotic manipulation, the community has mainly focused on the placement of rigid rectilinear objects within the container. We address this by presenting a soft robotic hand that combines vision, motor-based proprioception, and soft tactile sensors to identify, sort, and pack a stream of unknown objects.


Status Update: Extending traditional visual servo and compliant control by integrating the latest reinforcement and imitation learning control methodologies, UBTECH gradually trains the embodied intelligence-based “cerebellum” of its humanoid robot Walker S for diverse industrial manipulation tasks.


If you’re gonna ask a robot to stack bread, better make it flat.


Cassie has to be one of the most distinctive sounding legged robots there is.

[ Paper ]

Twice the robots are by definition twice as capable, right...?

[ Pollen Robotics ]

The Robotic Systems Lab participated in the Advanced Industrial Robotic Applications (AIRA) Challenge at the ACHEMA 2024 process industry trade show, where teams demonstrated their teleoperated robotic solutions for industrial inspection tasks. We competed with the ALMA legged manipulator robot, teleoperated using a second robot arm in a leader-follower configuration, placing us in third place for the competition.


This is apparently “peak demand” in a single market for Wing delivery drones.

[ Wing ]

Using a new type of surgical intervention and neuroprosthetic interface, MIT researchers, in collaboration with colleagues from Brigham and Women’s Hospital, have shown that a natural walking gait is achievable using a prosthetic leg fully driven by the body’s own nervous system. The surgical amputation procedure reconnects muscles in the residual limb, which allows patients to receive “proprioceptive” feedback about where their prosthetic limb is in space.

[ MIT ]

Coal mining in the Forest of Dean (UK) is a difficult and challenging job. Going into the mine as a human is sometimes almost impossible. We did it with our robot while inspecting the mine with our partners (Forestry England) and the local miners!



[ ABB ]

Would you tango with a robot? Inviting us into the fascinating world of dancing machines, robot choreographer Catie Cuan highlights why teaching robots to move with grace, intention and emotion is essential to creating AI-powered machines we will want to welcome into our daily lives.

[ TED ]

It may at times seem like there are as many humanoid robotics companies out there as the industry could possibly sustain, but the potential for useful and reliable and affordable humanoids is so huge that there’s plenty of room for any company that can actually get them to work. Joining the dozen or so companies already on this quest is Persona AI, founded last month by Nic Radford and Jerry Pratt, two people who know better than just about anyone what it takes to make a successful robotics company, although they also know enough to be wary of getting into commercial humanoids.

Persona AI may not be the first humanoid robotics startup, but its founders have some serious experience in the space:

Nic Radford led the team that developed NASA’s Valkyrie humanoid robot, before founding Houston Mechatronics (now Nauticus Robotics), which introduced a transforming underwater robot in 2019. He also founded Jacobi Motors, which is commercializing variable flux electric motors.

Jerry Pratt worked on walking robots for 20 years at the Institute for Human and Machine Cognition (IHMC) in Pensacola, Florida. He co-founded Boardwalk Robotics in 2017, and has spent the last two years as CTO of multi-billion-dollar humanoid startup Figure.

“It took me a long time to warm up to this idea,” Nic Radford tells us. “After I left Nauticus in January, I didn’t want anything to do with humanoids, especially underwater humanoids, and I didn’t even want to hear the word ‘robot.’ But things are changing so quickly, and I got excited and called Jerry and I’m like, this is actually very possible.” Jerry Pratt, who recently left Figure due primarily to the two-body problem, seems to be coming from a similar place: “There’s a lot of bashing your head against the wall in robotics, and persistence is so important. Nic and I have both gone through pessimism phases with our robots over the years. We’re a bit more optimistic about the commercial aspects now, but we want to be pragmatic and realistic about things too.”

Behind all of the recent humanoid hype lies the very, very difficult problem of making a highly technical piece of hardware and software compete effectively with humans in the labor market. But that’s also a very, very big opportunity—big enough that Persona doesn’t have to be the first company in this space, or the best funded, or the highest profile. They simply have to succeed, but of course sustainable commercial success with any robot (and bipedal robots in particular) is anything but simple. Step one will be building a founding team across two locations: Houston and Pensacola, Fla. But Radford says that the response so far to just a couple of LinkedIn posts about Persona has been “tremendous.” And with a substantial seed investment in the works, Persona will have more than just a vision to attract top talent.

For more details about Persona, we spoke with Persona AI co-founders Nic Radford and Jerry Pratt.

Why start this company, why now, and why you?

Nic Radford: The idea for this started a long time ago. Jerry and I have been working together off and on for quite a while, being in this field and sharing a love for what the humanoid potential is while at the same time being frustrated by where humanoids are at. As far back as probably 2008, we were thinking about starting a humanoids company, but for one reason or another the viability just wasn’t there. We were both recently searching for our next venture and we couldn’t imagine sitting this out completely, so we’re finally going to explore it, although we know better than anyone that robots are really hard. They’re not that hard to build; but they’re hard to make useful and make money with, and the challenge for us is whether we can build a viable business with Persona: can we build a business that uses robots and makes money? That’s our singular focus. We’re pretty sure that this is likely the best time in history to execute on that potential.

Jerry Pratt: I’ve been interested in commercializing humanoids for quite a while—thinking about it, and giving it a go here and there, but until recently it has always been the wrong time from both a commercial point of view and a technological readiness point of view. You can think back to the DARPA Robotics Challenge days when we had to wait about 20 seconds to get a good lidar scan and process it, which made it really challenging to do things autonomously. But we’ve gotten much, much better at perception, and now, we can get a whole perception pipeline to run at the framerate of our sensors. That’s probably the main enabling technology that’s happened over the last 10 years.

From the commercial point of view, now that we’re showing that this stuff’s feasible, there’s been a lot more pull from the industry side. It’s like we’re at the next stage of the Industrial Revolution, where the harder problems that weren’t roboticized from the 60s until now can now be. And so, there’s really good opportunities in a lot of different use cases.

A bunch of companies have started within the last few years, and several were even earlier than that. Are you concerned that you’re too late?

Radford: The concern is that we’re still too early! There might only be one Figure out there that raises a billion dollars, but I don’t think that’s going to be the case. There’s going to be multiple winners here, and if the market is as large as people claim it is, you could see quite a diversification of classes of commercial humanoid robots.

Pratt: We definitely have some catching up to do but we should be able to do that pretty quickly, and I’d say most people really aren’t that far from the starting line at this point. There’s still a lot to do, but all the technology is here now—we know what it takes to put together a really good team and to build robots. We’re also going to do what we can to increase speed, like by starting with a surrogate robot from someone else to get the autonomy team going while building our own robot in parallel.

Radford: I also believe that our capital structure is a big deal. We’re taking an anti-stealth approach, and we want to bring everyone along with us as our company grows and give out a significant chunk of the company to early joiners. It was an anxiety of ours that we would be perceived as a me-too and that nobody was going to care, but it’s been the exact opposite with a compelling response from both investors and early potential team members.

So your approach here is not to look at all of these other humanoid robotics companies and try and do something they’re not, but instead to pursue similar goals in a similar way in a market where there’s room for all?

Pratt: All robotics companies, and AI companies in general, are standing on the shoulders of giants. These are the thousands of robotics and AI researchers that have been collectively bashing their heads against the myriad problems for decades—some of the first humanoids were walking at Waseda University in the late 1960s. While there are some secret sauces that we might bring to the table, it is really the combined efforts of the research community that now enables commercialization.

So if you’re at a point where you need something new to be invented in order to get to applications, then you’re in trouble, because with invention you never know how long it’s going to take. What is available today and now, the technology that’s been developed by various communities over the last 50+ years—we all have what we need for the first three applications that are widely mentioned: warehousing, manufacturing, and logistics. The big question is, what’s the fourth application? And the fifth and the sixth? And if you can start detecting those and planning for them, you can get a leg up on everybody else.

The difficulty is in the execution and integration. It’s a ten thousand—no, that’s probably too small—it’s a hundred thousand piece puzzle where you gotta get each piece right, and occasionally you lose some pieces on the floor that you just can’t find. So you need a broad team that has expertise in like 30 different disciplines to try to solve the challenge of an end-to-end labor solution with humanoid robots.

Radford: The idea is like one percent of starting a company. The rest of it, and why companies fail, is in the execution. Things like, not understanding the market and the product-market fit, or not understanding how to run the company, the dimensions of the actual business. I believe we’re different because with our backgrounds and our experience we bring a very strong view on execution, and that is our focus on day one. There’s enough interest in the VC community that we can fund this company with a singular focus on commercializing humanoids for a couple different verticals.

But listen, we got some novel ideas in actuation and other tricks up our sleeve that might be very compelling for this, but we don’t want to emphasize that aspect. I don’t think Persona’s ultimate success comes just from the tech component. I think it comes mostly from ‘do we understand the customer, the market needs, the business model, and can we avoid the mistakes of the past?’

How is that going to change things about the way that you run Persona?

Radford: I started a company [Houston Mechatronics] with a bunch of research engineers. They don’t make the best product managers. More broadly, if you’re staffing all your disciplines with roboticists and engineers, you’ll learn that it may not be the most efficient way to bring something to market. Yes, we need those skills. They are essential. But there’s so many other aspects of a business that get overlooked when you’re fundamentally a research lab trying to commercialize a robot. I’ve been there, I’ve done that, and I’m not interested in making that mistake again.

Pratt: It’s important to get a really good product team that’s working with a customer from day one to have customer needs drive all the engineering. The other approach is ‘build it and they will come’ but then maybe you don’t build the right thing. Of course, we want to build multi-purpose robots, and we’re steering clear of saying ‘general purpose’ at this point. We don’t want to overfit to any one application, but if we can get to a dozen use cases, two or three per customer site, then we’ve got something.

There still seem to be a few unsolved technical challenges with humanoids, including hands, batteries, and safety. How will Persona tackle those things?

Pratt: Hands are such a hard thing—getting a hand that has the required degrees of freedom and is robust enough that if you accidentally hit it against your table, you’re not just going to break all your fingers. But we’ve seen robotic hand companies popping up now that are showing videos of hitting their hands with a hammer, so I’m hopeful.

Getting one to two hours of battery life is relatively achievable. Pushing up towards five hours is super hard. But batteries can now be charged in 20 minutes or so, as long as you’re going from 20 percent to 80 percent. So we’re going to need a cadence where robots are swapping in and out and charging as they go. And batteries will keep getting better.
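
The charging cadence Pratt describes can be made concrete with a little arithmetic. The sketch below is only an illustration. It assumes that discharge is linear in state of charge, that each robot works from 80 percent down to 20 percent and then recharges back to 80 percent, and that scheduling is ideal. The two-hour full-battery runtime is a hypothetical figure from the optimistic end of the range he mentions, and the function names are made up for this example.

```python
import math


def work_minutes_per_cycle(full_runtime_min, high_soc=0.8, low_soc=0.2):
    """Minutes of work between charges when cycling between two
    state-of-charge levels (assumes linear discharge)."""
    return full_runtime_min * (high_soc - low_soc)


def robots_needed(stations, work_min, charge_min):
    """Rough lower bound on the fleet size that keeps every station
    staffed while the remaining robots are charging."""
    cycle_min = work_min + charge_min
    # In steady state only work_min / cycle_min of the fleet is working.
    return math.ceil(stations * cycle_min / work_min)


work = work_minutes_per_cycle(120)            # about 72 minutes of work
fleet = robots_needed(10, work_min=72, charge_min=20)
print(round(work), fleet)
```

Under these assumptions a robot works for about 72 minutes per 20-minute charge, so it is available roughly 78 percent of the time, and keeping 10 stations continuously staffed would take at least 13 robots.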

Radford: We do have a focus on safety. It was paramount at NASA, and when we were working on Robonaut, it led to a lot of morphological considerations with padding. In fact, the first concepts and images we have of our robot illustrate extensive padding, but we have to do that carefully, because at the end of the day it’s mass and it’s inertia.

What does the near future look like for you?

Pratt: Building the team is really important—getting those first 10 to 20 people over the next few months. Then we’ll want to get some hardware and get going really quickly, maybe buying a couple of robot arms or something to get our behavior and learning pipelines going while in parallel starting our own robot design. From our experience, after getting a good team together and starting from a clean sheet, a new robot takes about a year to design and build. And then during that period we’ll be securing a customer or two or three.

Radford: We’re also working hard on some very high profile partnerships that could influence our early thinking dramatically. Like Jerry said earlier, it’s a massive 100,000 piece puzzle, and we’re working on the fundamentals: the people, the cash, and the customers.

This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.

One of the (many) great things about robots is that they don’t have to be constrained by how their biological counterparts do things. If you have a particular problem your robot needs to solve, you can get creative with extra sensors: many quadrupeds have side cameras and butt cameras for obstacle avoidance, and humanoids sometimes have chest cameras and knee cameras to help with navigation along with wrist cameras for manipulation. But how far can you take this? I have no idea, but it seems like we haven’t gotten to the end of things yet because now there’s a quadruped with cameras on the bottom of its feet.

Sensorized feet are not a new idea; it’s pretty common for quadrupedal robots to have some kind of foot-mounted force sensor to detect ground contact. Putting an actual camera down there is fairly novel, though, because it’s not at all obvious how you’d go about doing it. And the way that roboticists from the Southern University of Science and Technology in Shenzhen went about doing it is, indeed, not at all obvious.

Go1’s snazzy feetsies have soles made of transparent acrylic, with a slightly flexible plastic structure supporting a 60-millimeter gap up to each camera (640x480 at 120 frames per second) and a quartet of LEDs to provide illumination. While it’s complicated looking, at 120 grams, it doesn’t weigh all that much, and costs only about $50 per foot ($42 of which is the camera). The whole thing is sealed to keep out dirt and water.

So why bother with all of this (presumably somewhat fragile) complexity? As we ask quadruped robots to do more useful things in more challenging environments, having more information about what exactly they’re stepping on and how their feet are interacting with the ground is going to be super helpful. Robots that rely only on proprioceptive sensing (sensing self-movement) are great and all, but when you start trying to move over complex surfaces like sand, it can be really helpful to have vision that explicitly shows how your robot is interacting with the surface that it’s stepping on. Preliminary results showed that Foot Vision enabled the Go1 using it to perceive the flow of sand or soil around its foot as it takes a step, which can be used to estimate slippage, the bane of ground-contacting robots.
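
To make the slippage idea a bit more concrete, here is a minimal sketch of one generic way to measure how far ground texture shifts between two camera frames: phase correlation. This is not the method from the Foot Vision paper, which isn’t detailed here. It assumes the motion seen through the sole is a uniform, whole-pixel translation, and the function name `estimate_shift` is hypothetical.

```python
import numpy as np


def estimate_shift(frame_a, frame_b):
    """Return the (rows, cols) translation that maps frame_a onto frame_b,
    estimated by phase correlation of two same-sized grayscale frames."""
    fa = np.fft.fft2(frame_a)
    fb = np.fft.fft2(frame_b)
    cross = np.conj(fa) * fb
    cross /= np.abs(cross) + 1e-12          # keep phase, discard magnitude
    corr = np.fft.ifft2(cross).real         # peak sits at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    signed = [int(p) if p <= n // 2 else int(p) - n
              for p, n in zip(peak, corr.shape)]
    return signed[0], signed[1]


# Synthetic "sand texture" that moves 3 pixels down and 5 pixels left.
rng = np.random.default_rng(0)
before = rng.random((64, 64))
after = np.roll(before, shift=(3, -5), axis=(0, 1))
print(estimate_shift(before, after))
```

A foot that is planted firmly should see close to zero shift during stance, so a nonzero estimate is one simple signal that the surface under the sole is moving relative to the foot.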

The researchers acknowledge that their hardware could use a bit of robustifying, and they also want to try adding some tread patterns around the circumference of the foot, since that plexiglass window is pretty slippery. The overall idea is to make Foot Vision as useful as the much more common gripper-integrated vision systems for robotic manipulation, helping legged robots make better decisions about how to get where they need to go.

Foot Vision: A Vision-Based Multi-Functional Sensorized Foot for Quadruped Robots, by Guowei Shi, Chen Yao, Xin Liu, Yuntian Zhao, Zheng Zhu, and Zhenzhong Jia from Southern University of Science and Technology in Shenzhen, is accepted to the July 2024 issue of IEEE Robotics and Automation Letters.


Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS
IROS 2024: 14–18 October 2024, ABU DHABI, UAE
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

Agility has been working with GXO for a bit now, but the big news here (and it IS big news) is that Agility’s Digit robots at GXO now represent the first formal commercial deployment of humanoid robots.

[ GXO ]

GXO can’t seem to get enough humanoids, because they’re also starting some R&D with Apptronik.

[ GXO ]

In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world. Using the data collected, we then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills.


[ HumanPlus ]

Yeah these robots are impressive but it’s the sound effects that make it.

[ Deep Robotics ]

Meet CARMEN, short for Cognitively Assistive Robot for Motivation and Neurorehabilitation, a small tabletop robot designed to help people with mild cognitive impairment (MCI) learn skills to improve memory, attention, and executive functioning at home.

[ CARMEN ] via [ UCSD ]

Thanks, Ioana!

The caption of this video is, “it did not work...”

You had one job, e-stop person! ONE JOB!


This is a demo of cutting wood with a saw. When using position control for this task, precise measurement of the cutting amount is necessary. However, by using impedance control, this requirement is eliminated, allowing for successful cutting with only rough commands.

[ Tokyo Robotics ]
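
The position-versus-impedance contrast in the saw demo above can be illustrated with a one-dimensional toy model. This is a generic textbook-style sketch, not Tokyo Robotics’ controller: the spring-damper law, the gain values, and the function name `impedance_force` are all assumptions chosen for illustration.

```python
def impedance_force(x_des, x, v_des, v, stiffness=200.0, damping=20.0):
    """Virtual spring-damper: the commanded force (N) grows with the
    tracking error (m, m/s) instead of forcing the position to be met."""
    return stiffness * (x_des - x) + damping * (v_des - v)


# A "rough" command that overshoots the real surface by 5 cm while the
# tool is held still by the material it is pressing against.
force = impedance_force(x_des=0.05, x=0.0, v_des=0.0, v=0.0)
print(round(force, 3))
```

With these made-up gains, a 5-centimeter error in the commanded position turns into a modest 10-newton push rather than an attempt to reach the target at any cost, which is why only rough commands are needed.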

This is mesmerizing.

[ Oregon State ]

Quadrupeds are really starting to look like the new hotness in bipedal locomotion.

[ University of Leeds ]

I still think this is a great way of charging a robot. Make sure and watch until the end to see the detach trick.

[ YouTube ]

The Oasa R1, now on Kickstarter for $1,200, is the world’s first robotic lawn mower that uses one of them old timey reely things for cutting.

[ Kickstarter ]

ICRA next year is in Atlanta!

[ ICRA 2025 ]

Our Skunk Works team developed a modified version of the SR-71 Blackbird, titled the M-21, which carried an uncrewed reconnaissance drone called the D-21. The D-21 was designed to capture intelligence, release its camera, then self-destruct!

[ Lockheed Martin ]

The RPD 35 is a robotic powerhouse that surveys, distributes, and drives wide-flange solar piles up to 19 feet in length.

[ Built Robotics ]

Field AI’s brain technology is enabling robots to autonomously explore oil and gas facilities, navigating throughout the site and inspecting equipment for anomalies and hazardous conditions.

[ Field AI ]

Husky Observer was recently deployed at a busy automotive rail yard to carry out various autonomous inspection tasks including measuring train car positions and RFID data collection from the offloaded train inventory.

[ Clearpath ]

If you’re going to try to land a robot on the Moon, it’s useful to have a little bit of the Moon somewhere to practice on.

[ Astrobotic ]

Would you swallow a micro-robot? In a gutsy demo, physician Vivek Kumbhari navigates Pillbot, a wireless, disposable robot swallowed onstage by engineer Alex Luebke, modeling how this technology can swiftly provide direct visualization of internal organs. Learn more about how micro-robots could move us past the age of invasive endoscopies and open up doors to more comfortable, affordable medical imaging.

[ TED ]

How will AI improve our lives in the years to come? From its inception six decades ago to its recent exponential growth, futurist Ray Kurzweil highlights AI’s transformative impact on various fields and explains his prediction for the singularity: the point at which human intelligence merges with machine intelligence.

[ TED ]

Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS
IROS 2024: 14–18 October 2024, ABU DHABI, UAE
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

We present Morphy, a novel compliant and morphologically aware flying robot that integrates sensorized flexible joints in its arms, thus enabling resilient collisions at high speeds and the ability to squeeze through openings narrower than its nominal dimensions.

Morphy represents a new class of soft flying robots that can facilitate unprecedented resilience through innovations both in the “body” and “brain.” The novel soft body can, in turn, enable new avenues for autonomy. Collisions that previously had to be avoided have now become acceptable risks, while areas that are untraversable for a certain robot size can now be negotiated through self-squeezing. These novel bodily interactions with the environment can give rise to new types of embodied intelligence.

[ ARL ]

Thanks, Kostas!

Segments of daily training for robots driven by reinforcement learning. Multiple tests done in advance for friendly service humans. The training includes some extreme tests. Please do not imitate!

[ Unitree ]

Sphero is not only still around, it’s making new STEM robots!

[ Sphero ]

Googly eyes mitigate all robot failures.


Here I am, without the ability or equipment (or desire) required to iron anything that I own, and Flexiv’s got robots out there ironing fancy leather car seats.

[ Flexiv ]

Thanks, Noah!

We unveiled a significant leap forward in perception technology for our humanoid robot GR-1. The newly adapted pure-vision solution integrates bird’s-eye view, transformer models, and an occupancy network for precise and efficient environmental perception.

[ Fourier ]

Thanks, Serin!

LimX Dynamics’ humanoid robot CL-1 was launched in December 2023. It climbed stairs based on real-time terrain perception, two steps per stair. Four months later, in April 2024, the second demo video showcased CL-1 in the same scenario. It had advanced to climb the same stair, one step per stair.

[ LimX Dynamics ]

Thanks, Ou Yan!

New research from the University of Massachusetts Amherst shows that programming robots to create their own teams and voluntarily wait for their teammates results in faster task completion, with the potential to improve manufacturing, agriculture, and warehouse automation.

[ HCRL ] via [ UMass Amherst ]

Thanks, Julia!

LASDRA (Large-size Aerial Skeleton with Distributed Rotor Actuation system, ICRA18) is a scalable and modular aerial robot. It can assume a very slender, long, and dexterous form factor and is very lightweight.


We propose augmenting initially passive structures built from simple repeated cells, with novel active units to enable dynamic, shape-changing, and robotic applications. Inspired by metamaterials that can employ mechanisms, we build a framework that allows users to configure cells of this passive structure to allow it to perform complex tasks.

[ CMU ]

Testing autonomous exploration at the Exyn Office using Spot from Boston Dynamics. In this demo, Spot autonomously explores our flight space while on the hunt for one of our engineers.

[ Exyn ]

Meet Heavy Picker, the strongest robot in bulky-waste sorting and an absolute pro at lifting and sorting waste. With skills that would make a concert pianist jealous and a work ethic that never needs coffee breaks, Heavy Picker was on the lookout for new challenges.

[ Zen Robotics ]

AI is the biggest and most consequential business, financial, legal, technological, and cultural story of our time. In this panel, you will hear from the underrepresented community of women scientists who have been leading the AI revolution—from the beginning to now.

[ Stanford HAI ]

Insects have long been an inspiration for robots. The insect world is full of things that are tiny, fully autonomous, highly mobile, energy efficient, multimodal, self-repairing, and I could go on and on but you get the idea—insects are both an inspiration and a source of frustration to roboticists because it’s so hard to get robots to have anywhere close to insect capability.

We’re definitely making progress, though. In a paper published last month in IEEE Robotics and Automation Letters, roboticists from Shanghai Jiao Tong University demonstrated the most bug-like robotic bug I think I’ve ever seen.

A Multi-Modal Tailless Flapping-Wing Robot

Okay so it may not look the most bug-like, but it can do many very buggy bug things, including crawling, taking off horizontally, flying around (with six degrees of freedom control), hovering, landing, and self-righting if necessary. JT-fly weighs about 35 grams and has a wingspan of 33 centimeters, using four wings at once to fly at up to 5 meters per second and six legs to scurry at 0.3 m/s. Its 380 milliampere-hour battery powers it for an actually somewhat useful 8-ish minutes of flying and about 60 minutes of crawling.
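As a rough back-of-the-envelope check on those figures, here is a small sketch. It assumes (this is not stated in the paper) that the full rated battery capacity is spent in each mode and that top speed is held for the entire run, so the ranges are optimistic upper bounds rather than measured values.

```python
# Figures quoted in the article; everything derived below is an estimate.
BATTERY_MAH = 380.0          # rated battery capacity
FLY_MINUTES = 8.0            # "8-ish" minutes of flight
CRAWL_MINUTES = 60.0         # about 60 minutes of crawling
FLY_SPEED_MPS = 5.0          # top flight speed
CRAWL_SPEED_MPS = 0.3        # crawling speed


def average_current_ma(capacity_mah: float, minutes: float) -> float:
    """Average draw if the whole rated capacity is used up in `minutes`."""
    return capacity_mah * 60.0 / minutes


def upper_bound_range_m(speed_mps: float, minutes: float) -> float:
    """Distance covered if `speed_mps` were held for the full endurance."""
    return speed_mps * minutes * 60.0


fly_current_ma = average_current_ma(BATTERY_MAH, FLY_MINUTES)
crawl_current_ma = average_current_ma(BATTERY_MAH, CRAWL_MINUTES)
fly_range_m = upper_bound_range_m(FLY_SPEED_MPS, FLY_MINUTES)
crawl_range_m = upper_bound_range_m(CRAWL_SPEED_MPS, CRAWL_MINUTES)

print(f"flying:   ~{fly_current_ma:.0f} mA average, up to {fly_range_m:.0f} m")
print(f"crawling: ~{crawl_current_ma:.0f} mA average, up to {crawl_range_m:.0f} m")
```

Under those assumptions, flying draws roughly 7.5 times the average current of crawling, which is one way to see why crawling is the preferred way to get around when it is an option.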

While that amount of endurance may not sound like a lot, robots like these aren’t necessarily intended to be moving continuously. Rather, they move a little bit, find a nice safe perch, and then do some sensing or whatever until you ask them to move to a new spot. Ideally, most of that movement would be crawling, but having the option to fly makes JT-fly exponentially more useful.

Or, potentially more useful, because obviously this is still very much a research project. It does seem like there’s a bunch more optimization that could be done here; for example, JT-fly uses completely separate systems for flying and crawling, with two motors powering the legs and two additional motors powering the wings, plus two wing servos for control. There’s currently a limited amount of onboard autonomy, with an inertial measurement unit, barometer, and wireless communication, but otherwise not much in the way of useful payload.


It won’t surprise you to learn that the researchers have disaster relief applications in mind for this robot, suggesting that “after natural disasters such as earthquakes and mudslides, roads and buildings will be severely damaged, and in these scenarios, JT-fly can rely on its flight ability to quickly deploy into the mission area.” One day, robots like these will actually be deployed for disaster relief, and although that day is not today, we’re just a little bit closer than we were before.

“A Multi-Modal Tailless Flapping-Wing Robot Capable of Flying, Crawling, Self-Righting and Horizontal Takeoff,” by Chaofeng Wu, Yiming Xiao, Jiaxin Zhao, Jiawang Mou, Feng Cui, and Wu Liu from Shanghai Jiao Tong University, is published in the May issue of IEEE Robotics and Automation Letters.

Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS
IROS 2024: 14–18 October 2024, ABU DHABI, UAE
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

There’s a Canadian legend about a flying canoe, because of course there is. The legend involves drunkenness, a party with some ladies, swearing, and a pact with the devil, because of course it does. Fortunately for the drone in this video, it needs none of that to successfully land on this (nearly) flying canoe, just some high-friction shock absorbing legs and judicious application of reverse thrust.

[ Createk ]

Thanks, Alexis!

This paper summarizes an autonomous driving project by musculoskeletal humanoids. The musculoskeletal humanoid, which mimics the human body in detail, has redundant sensors and a flexible body structure. We reconsider the developed hardware and software of the musculoskeletal humanoid Musashi in the context of autonomous driving. The respective components of autonomous driving are conducted using the benefits of the hardware and software. Finally, Musashi succeeded in the pedal and steering wheel operations with recognition.

[ Paper ] via [ JSK Lab ]

Thanks, Kento!

Robust AI has been kinda quiet for the last little while, but their Carter robot continues to improve.

[ Robust AI ]

One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. We demonstrate the system on our customized 33-degrees-of-freedom 180 centimeter humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot with 60-100 percent success rates using up to 40 demonstrations.

[ HumanPlus ]

We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including using real-time teleoperation through VR headset, verbal instruction, and RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or integrating with frontier models such as GPT-4.

[ OmniH2O ]

A collaboration between Boxbot, Agility Robotics, and Robust.AI at Playground Global. Make sure and watch until the end to hear the roboticists in the background react when the demo works in a very roboticist way.

::clap clap clap:: yaaaaayyyyy....

[ Robust AI ]

The use of drones and robotic devices threatens civilian and military actors in conflict areas. We started trials with robots to see how we can adapt our HEAT (Hostile Environment Awareness Training) courses to this new reality.

[ CSD ]

Thanks, Ebe!

How to make humanoids do versatile parkour jumping, clapping dance, cliff traversal, and box pick-and-move with a unified RL framework? We introduce WoCoCo: Whole-body humanoid Control with sequential Contacts.

[ WoCoCo ]

A selection of excellent demos from the Learning Systems and Robotics Lab at TUM and the University of Toronto.

[ Learning Systems and Robotics Lab ]

Harvest Automation, one of the OG autonomous mobile robot companies, hasn’t updated their website since like 2016, but some videos just showed up on YouTube this week.

[ Harvest Automation ]

Northrop Grumman has been pioneering capabilities in the undersea domain for more than 50 years. Now, we are creating a new class of uncrewed underwater vehicles (UUV) with Manta Ray. Taking its name from the massive “winged” fish, Manta Ray will operate long-duration, long-range missions in ocean environments where humans can’t go.

[ Northrop Grumman ]

Akara Robotics’ autonomous robotic UV disinfection demo.

[ Akara Robotics ]

Scientists have computationally predicted hundreds of thousands of novel materials that could be promising for new technologies—but testing to see whether any of those materials can be made in reality is a slow process. Enter A-Lab, which uses robots guided by artificial intelligence to speed up the process.

[ A-Lab ]

We wrote about this research from CMU a while back, but here’s a quite nice video.

[ CMU RI ]

Aw yiss pick and place robots.

[ Fanuc ]

Axel Moore describes his lab’s work in orthopedic biomechanics to relieve joint pain with robotic assistance.

[ CMU ]

The field of humanoid robots has grown in recent years with several companies and research laboratories developing new humanoid systems. However, the number of running robots did not noticeably rise, despite the need for fast locomotion to quickly serve given tasks, which require traversing complex terrain by running and jumping over obstacles. To provide an overview of the design of humanoid robots with bioinspired mechanisms, this paper introduces the fundamental functions of the human running gait.

[ Paper ]

The original version of this post by Benjie Holson was published on Substack here, and includes Benjie’s original comics as part of his series on robots and startups.

I worked on this idea for months before I decided it was a mistake. The second time I heard someone mention it, I thought, “That’s strange, these two groups had the same idea. Maybe I should tell them it didn’t work for us.” The third and fourth time I rolled my eyes and ignored it. The fifth time I heard about a group struggling with this mistake, I decided it was worth a blog post all on its own. I call this idea “The Mythical Non-Roboticist.”

The Mistake

The idea goes something like this: Programming robots is hard. And there are some people with really arcane skills and PhDs who are really expensive and seem to be required for some reason. Wouldn’t it be nice if we could do robotics without them? 1 What if everyone could do robotics? That would be great, right? We should make a software framework so that non-roboticists can program robots.

This idea is so close to a correct idea that it’s hard to tell why it doesn’t work out. On the surface, it’s not wrong: All else being equal, it would be good if programming robots was more accessible. The problem is that we don’t have a good recipe for making working robots. So we don’t know how to make that recipe easier to follow. In order to make things simple, people end up removing things that folks might need, because no one knows for sure what’s absolutely required. It’s like saying you want to invent an invisibility cloak and want to be able to make it from materials you can buy from Home Depot. Sure, that would be nice, but if you invented an invisibility cloak that required some mercury and neodymium to manufacture would you toss the recipe?

In robotics, this mistake is based on a very true and very real observation: Programming robots is super hard. Famously hard. It would be super great if programming robots was easier. The issue is this: Programming robots has two different kinds of hard parts.

Robots are hard because the world is complicated


The first kind of hard part is that robots deal with the real world, imperfectly sensed and imperfectly actuated. Global mutable state is bad programming style because it’s really hard to deal with, but to robot software the entire physical world is global mutable state, and you only get to unreliably observe it and hope your actions approximate what you wanted to achieve. Getting robotics to work at all is often at the very limit of what a person can reason about, and requires the flexibility to employ whatever heuristic might work for your special problem. This is the intrinsic complexity of the problem: Robots live in complex worlds, and for every working solution there are millions of solutions that don’t work, and finding the right one is hard, and often very dependent on the task, robot, sensors, and environment.

Folks look at that challenge, see that it is super hard, and decide that, sure, maybe some fancy roboticist could solve it in one particular scenario, but what about “normal” people? “We should make this possible for non-roboticists,” they say. I call these users “Mythical Non-Roboticists” because once they are programming a robot, I feel they become roboticists. Isn’t anyone programming a robot for a purpose a roboticist? Stop gatekeeping, people.

Don’t design for amorphous groups

I also call them “mythical” because usually the “non-roboticist” implied is a vague, amorphous group. Don’t design for amorphous groups. If you can’t name three real people (that you have talked to) that your API is for, then you are designing for an amorphous group and only amorphous people will like your API.

And with this hazy group of users in mind (and seeing how difficult everything is), folks think, “Surely we could make this easier for everyone else by papering over these things with simple APIs?”

No. No you can’t. Stop it.

You can’t paper over intrinsic complexity with simple APIs because if your APIs are simple they can’t cover the complexity of the problem. You will inevitably end up with a beautiful looking API, with calls like “grasp_object” and “approach_person” which demo nicely in a hackathon kickoff but last about 15 minutes of someone actually trying to get some work done. It will turn out that, for their particular application, “grasp_object()” makes 3 or 4 wrong assumptions about “grasp” and “object” and doesn’t work for them at all.

Your users are just as smart as you

This is made worse by the pervasive assumption that these people are less savvy (read: less intelligent) than the creators of this magical framework. 2 That feeling of superiority will cause the designers to cling desperately to their beautiful, simple “grasp_object()”s and resist adding the knobs and arguments needed to cover more use cases and allow the users to customize what they get.

Ironically, this foists a bunch of complexity onto the poor users of the API, who have to come up with clever workarounds to get it to work at all.


The sad, salty, bitter icing on this cake-of-frustration is that, even if done really well, the goal of this kind of framework would be to expand the group of people who can do the work. And to achieve that, it would sacrifice some performance you can only get by super-specializing your solution to your problem. If we lived in a world where expert roboticists could program robots that worked really well, but there was so much demand for robots that there just wasn’t enough time for those folks to do all the programming, this would be a great solution. 3

The obvious truth is that (outside of really constrained environments like manufacturing cells) even the very best collection of real bona fide, card-carrying roboticists working at the best of their ability struggle to get close to a level of performance that makes the robots commercially viable, even with long timelines and mountains of funding. 4 We don’t have any headroom to sacrifice power and effectiveness for ease.

What problem are we solving?

So should we give up making it easier? Is robotic development available only to a small group of elites with fancy PhDs? 5 No to both! I have worked with tons of undergrad interns who have been completely able to do robotics. 6 I myself am mostly self-taught in robot programming. 7 While there is a lot of intrinsic complexity in making robots work, I don’t think there is any more than, say, video game development.

In robotics, like in all things, experience helps, some things are teachable, and as you master many areas you can see things start to connect together. These skills are not magical or unique to robotics. We are not as special as we like to think we are.

But what about making programming robots easier? Remember way back at the beginning of the post when I said that there were two different kinds of hard parts? One is the intrinsic complexity of the problem, and that one will be hard no matter what. 8 But the second is the incidental complexity, or as I like to call it, the stupid BS complexity.

Stupid BS Complexity

Robots are asynchronous, distributed, real-time systems with weird hardware. All of that will be hard to configure for stupid BS reasons. Those drivers need to work in the weird flavor of Linux you want for hard real-time for your controls and getting that all set up will be hard for stupid BS reasons. You are abusing Wi-Fi so you can roam seamlessly without interruption but Linux’s Wi-Fi will not want to do that. Your log files are huge and you have to upload them somewhere so they don’t fill up your robot. You’ll need to integrate with some cloud something or other and deal with its stupid BS. 9


There is a ton of crap to deal with before you even get to the complexity of dealing with 3D rotation, moving reference frames, time synchronization, and messaging protocols. Those things have intrinsic complexity (you have to think about when something was observed and how to reason about it as other things have moved) and stupid BS complexity (There’s a weird bug because someone multiplied two transform matrices in the wrong order and now you’re getting an error message that deep in some protocol a quaternion is not normalized. WTF does that mean?) 10
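To make those two failure modes concrete, here is a minimal, standard-library-only sketch. It uses 2D homogeneous transforms to keep things short (the same ordering issue applies in 3D), and the helper names are made up for illustration rather than taken from any robotics library. A quaternion that is "not normalized" simply means its four components no longer have unit length, so it no longer represents a pure rotation.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation(theta):
    """2D rotation by `theta` radians as a homogeneous transform."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def translation(x, y):
    """2D translation as a homogeneous transform."""
    return [[1.0, 0.0, x], [0.0, 1.0, y], [0.0, 0.0, 1.0]]


def apply(t, point):
    """Apply transform `t` to a 2D point (column-vector convention)."""
    x, y = point
    return (t[0][0] * x + t[0][1] * y + t[0][2],
            t[1][0] * x + t[1][1] * y + t[1][2])


rot = rotation(math.pi / 2)
shift = translation(1.0, 0.0)

# With column vectors, the right-most transform is applied first.
rotate_then_shift = matmul(shift, rot)
shift_then_rotate = matmul(rot, shift)

origin = (0.0, 0.0)
p_a = apply(rotate_then_shift, origin)   # ends up on the x axis
p_b = apply(shift_then_rotate, origin)   # ends up on the y axis


def quat_norm(q):
    """Euclidean length of a quaternion given as a 4-tuple."""
    return math.sqrt(sum(c * c for c in q))


def normalize(q):
    """Rescale a quaternion to unit length so it is a valid rotation."""
    n = quat_norm(q)
    if n == 0.0:
        raise ValueError("a zero quaternion cannot be normalized")
    return tuple(c / n for c in q)


def is_unit(q, tol=1e-6):
    return abs(quat_norm(q) - 1.0) <= tol


# Hand-typed or drifted values are a common source of this error.
drifted = (0.7072, 0.0, 0.0, 0.7072)
fixed = normalize(drifted)
```

Swapping the order of one multiplication moves the same point to a completely different place, and nothing in the types tells you which order was intended. That is the kind of bug that is easy to introduce and tedious to track down.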

One of the biggest challenges of robot programming is wading through the sea of stupid BS you need to wrangle in order to start working on your interesting and challenging robotics problem.

So a simple heuristic to make good APIs is:

Design your APIs for someone as smart as you, but less tolerant of stupid BS.

That feels universal enough that I’m tempted to call it Holson’s Law of Tolerable API Design.

When you are using tools you’ve made, you know them well enough to know the rough edges and how to avoid them.

But rough edges are things that have to be held in a programmer’s memory while they are using your system. If you insist on making a robotics framework 11, you should strive to make it as powerful as you can with the least amount of stupid BS. Eradicate incidental complexity everywhere you can. You want to make APIs that have maximum flexibility but good defaults. I like Python’s default-argument syntax for this because it means you can write APIs that can be used like:
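As a minimal sketch, assuming a hypothetical `grasp_object` whose parameter names and default values are invented purely for illustration (they do not come from any real library), the simple call and the customized call can share one function:

```python
from dataclasses import dataclass


@dataclass
class GraspPlan:
    """What the hypothetical grasp call decided to do."""
    target: str
    approach: str
    max_force_n: float
    retries: int


def grasp_object(target: str, *, approach: str = "top_down",
                 max_force_n: float = 20.0, retries: int = 1) -> GraspPlan:
    """Hypothetical grasp API: sensible defaults, every knob overridable.

    This stub only records the requested plan; a real implementation
    would drive the arm. All names and defaults here are assumptions.
    """
    if max_force_n <= 0:
        raise ValueError("max_force_n must be positive")
    if retries < 0:
        raise ValueError("retries must not be negative")
    return GraspPlan(target, approach, max_force_n, retries)


# The easy thing stays simple...
simple = grasp_object("mug")

# ...and the complex thing is still possible through the same call.
careful = grasp_object("egg", approach="side", max_force_n=2.0, retries=3)
```

Keyword-only arguments (everything after the `*`) keep call sites readable, and adding another knob later does not break anyone who relied on the defaults.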

It is possible to have easy things be simple and allow complex things. And please, please, please don’t make condescending APIs. Thanks!

1. Ironically it is very often the expensive arcane-knowledge-having PhDs who are proposing this.

2. Why is it always a framework?

3. The exception that might prove the rule is things like traditional manufacturing-cell automation. That is a place where the solutions exist, but the limit to expanding is setup cost. I’m not an expert in this domain, but I’d worry that physical installation and safety compliance might still dwarf the software programming cost.

4. As I well know from personal experience.

5. Or non-fancy PhDs for that matter?

6. I suspect that many bright high schoolers would also be able to do the work. Though, as Google tends not to hire them, I don’t have good examples.

7. My schooling was in Mechanical Engineering and I never got a PhD, though my ME classwork did include some programming fundamentals.

8. Unless we create effective general purpose AI. It feels weird that I have to add that caveat, but the possibility that it’s actually coming for robotics in my lifetime feels much more possible than it did two years ago.

9. And if you are unlucky, its API was designed by someone who thought they were smarter than their customers.

10. This particular flavor of BS complexity is why I wrote If you do robotics, you should check it out.

11. Which, judging by the trail of dead robot-framework-companies, is a fraught thing to do.

Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS
IROS 2024: 14–18 October 2024, ABU DHABI, UNITED ARAB EMIRATES
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

In this video, you see the start of 1X’s development of an advanced AI system that chains simple tasks into complex actions using voice commands, allowing seamless multi-robot control and remote operation. By starting with single-task models, we ensure smooth transitions to more powerful unified models, ultimately aiming to automate high-level actions using AI.

This video does not contain teleoperation, computer graphics, cuts, video speedups, or scripted trajectory playback. It’s all controlled via neural networks.

[ 1X ]

As the old adage goes, one cannot claim to be a true man without a visit to the Great Wall of China. XBot-L, a full-sized humanoid robot developed by Robot Era, recently acquitted itself well in a walk along sections of the Great Wall.

[ Robot Era ]

The paper presents a novel rotary wing platform that is capable of folding and expanding its wings during flight. Our source of inspiration came from birds’ ability to fold their wings to navigate through small spaces and dive. The design of the rotorcraft is based on the monocopter platform, which is inspired by the flight of samara seeds.

[ AirLab ]

We present a variable stiffness robotic skin (VSRS), a concept that integrates stiffness-changing capabilities, sensing, and actuation into a single, thin modular robot design. Reconfiguring, reconnecting, and reshaping VSRSs allows them to achieve new functions both on and in the absence of a host body.

[ Yale Faboratory ]

Heimdall is a new rover design for the 2024 University Rover Challenge (URC). This video shows highlights of Heimdall’s trip during the four missions at URC 2024.

Heimdall features a split body design with whegs (wheel legs), and a drill for sub-surface sample collection. It also has the ability to manipulate a variety of objects, collect surface samples, and perform onboard spectrometry and chemical tests.

[ WVU ]

I think this may be the first time I’ve seen an autonomous robot using a train? This one is delivering lunch boxes!

[ JSME ]

The AI system used identifies and separates red apples from green apples, after which a robotic arm picks up the red apples identified with a qb SoftHand Industry and gently places them in a basket.

My favorite part is the magnetic apple stem system.

[ QB Robotics ]

DexNex (v0, June 2024) is an anthropomorphic teleoperation testbed for dexterous manipulation at the Center for Robotics and Biosystems at Northwestern University. DexNex recreates human upper-limb functionality through a near 1-to-1 mapping between Operator movements and Avatar actions.

Motion of the Operator’s arms, hands, fingers, and head are fed forward to the Avatar, while fingertip pressures, finger forces, and camera images are fed back to the Operator. DexNex aims to minimize the latency of each subsystem to provide a seamless, immersive, and responsive user experience. Future research includes gaining a better understanding of the criticality of haptic and vision feedback for different manipulation tasks; providing arm-level grounded force feedback; and using machine learning to transfer dexterous skills from the human to the robot.

[ Northwestern ]

Sometimes the best path isn’t the smoothest or straightest surface, it’s the path that’s actually meant to be a path.

[ RaiLab ]

Fulfilling a school requirement by working in a Romanian locomotive factory one week each month, Daniela Rus learned to operate “machines that help us make things.” Appreciation for the practical side of math and science stuck with Daniela, who is now Director of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

[ MIT ]

For AI to achieve its full potential, non-experts need to be let into the development process, says Rumman Chowdhury, CEO and cofounder of Humane Intelligence. She tells the story of farmers fighting for the right to repair their own AI-powered tractors (which some manufacturers actually made illegal), proposing everyone should have the ability to report issues, patch updates or even retrain AI technologies for their specific uses.

[ TED ]

Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

Do you have trouble multitasking? Cyborgize yourself through muscle stimulation to automate repetitive physical tasks while you focus on something else.

[ SplitBody ]

By combining a 5,000 frame-per-second (FPS) event camera with a 20-FPS RGB camera, roboticists from the University of Zurich have developed a much more effective vision system that keeps autonomous cars from crashing into stuff, as described in the current issue of Nature.

[ Nature ]

Mitsubishi Electric has been awarded the GUINNESS WORLD RECORDS title for the fastest robot to solve a puzzle cube. The robot’s time of 0.305 second beat the previous record of 0.38 second, for which it received a GUINNESS WORLD RECORDS certificate on 21 May 2024.

[ Mitsubishi ]

Sony’s AIBO is celebrating its 25th anniversary, which seems like a long time, and it is. But back then, the original AIBO could check your email for you. Email! In 1999!

I miss Hotmail.

[ AIBO ]

SchniPoSa: schnitzel with french fries and a salad.

[ Dino Robotics ]

Cloth-folding is still a really hard problem for robots, but progress was made at ICRA!

[ ICRA Cloth Competition ]

Thanks, Francis!

MIT CSAIL researchers enhance robotic precision with sophisticated tactile sensors in the palm and agile fingers, setting the stage for improvements in human-robot interaction and prosthetic technology.

[ MIT ]

We present a novel adversarial attack method designed to identify failure cases in any type of locomotion controller, including state-of-the-art reinforcement-learning-based controllers. Our approach reveals the vulnerabilities of black-box neural network controllers, providing valuable insights that can be leveraged to enhance robustness through retraining.

[ Fan Shi ]

In this work, we investigate a novel integrated flexible OLED display technology used as a robotic skin-interface to improve robot-to-human communication in a real industrial setting at Volkswagen or a collaborative human-robot interaction task in motor assembly. The interface was implemented in a workcell and validated qualitatively with a small group of operators (n=9) and quantitatively with a large group (n=42). The validation results showed that using flexible OLED technology could improve the operators’ attitude toward the robot; increase their intention to use the robot; enhance their perceived enjoyment, social influence, and trust; and reduce their anxiety.

[ Paper ]

Thanks, Bram!

We introduce InflatableBots, shape-changing inflatable robots for large-scale encountered-type haptics in VR. Unlike traditional inflatable shape displays, which are immobile and limited in interaction areas, our approach combines mobile robots with fan-based inflatable structures. This enables safe, scalable, and deployable haptic interactions on a large scale.

[ InflatableBots ]

We present a bioinspired passive dynamic foot in which the claws are actuated solely by the impact energy. Our gripper simultaneously resolves the issue of smooth absorption of the impact energy and fast closure of the claws by linking the motion of an ankle linkage and the claws through soft tendons.

[ Paper ]

In this video, a 3-UPU exoskeleton robot for a wrist joint is designed and controlled to perform wrist extension, flexion, radial-deviation, and ulnar-deviation motions in stroke-affected patients. This is the first time a 3-UPU robot has been used effectively for any kind of task.

“UPU” stands for “universal-prismatic-universal” and refers to the actuators—the prismatic joints between two universal joints.

[ BAS ]

Thanks, Tony!

BRUCE Got Spot-ted at ICRA2024.

[ Westwood Robotics ]

Parachutes: maybe not as good of an idea for drones as you might think.

[ Wing ]

In this paper, we propose a system for the artist-directed authoring of stylized bipedal walking gaits, tailored for execution on robotic characters. To demonstrate the utility of our approach, we animate gaits for a custom, free-walking robotic character, and show, with two additional in-simulation examples, how our procedural animation technique generalizes to bipeds with different degrees of freedom, proportions, and mass distributions.

[ Disney Research ]

The European drone project Labyrinth aims to keep new and conventional air traffic separate, especially in busy airspaces such as those expected in urban areas. The project provides a new drone-traffic service and illustrates its potential to improve the safety and efficiency of civil land, air, and sea transport, as well as emergency and rescue operations.

[ DLR ]

This Carnegie Mellon University Robotics Institute seminar, by Kim Baraka at Vrije Universiteit Amsterdam, is on the topic “Why We Should Build Robot Apprentices and Why We Shouldn’t Do It Alone.”

For robots to be able to truly integrate into human-populated, dynamic, and unpredictable environments, they will have to have strong adaptive capabilities. In this talk, I argue that these adaptive capabilities should leverage interaction with end users, who know how (they want) a robot to act in that environment. I will present an overview of my past and ongoing work on the topic of human-interactive robot learning, a growing interdisciplinary subfield that embraces rich, bidirectional interaction to shape robot learning. I will discuss contributions on the algorithmic, interface, and interaction design fronts, showcasing several collaborations with animal behaviorists/trainers, dancers, puppeteers, and medical practitioners.

[ CMU RI ]

This post was originally published on the author’s personal blog.

Last year’s Conference on Robot Learning (CoRL) was the biggest CoRL yet, with over 900 attendees, 11 workshops, and almost 200 accepted papers. While there were a lot of cool new ideas (see this great set of notes for an overview of technical content), one particular debate seemed to be front and center: Is training a large neural network on a very large dataset a feasible way to solve robotics?1

Of course, some version of this question has been on researchers’ minds for a few years now. However, in the aftermath of the unprecedented success of ChatGPT and other large-scale “foundation models” on tasks that were thought to be unsolvable just a few years ago, the question was especially topical at last year’s CoRL. Developing a general-purpose robot, one that can competently and robustly execute a wide variety of tasks of interest in any home or office environment that humans can, has been perhaps the holy grail of robotics since the inception of the field. And given the recent progress of foundation models, it seems possible that scaling existing network architectures by training them on very large datasets might actually be the key to that grail.

Given how timely and significant this debate seems to be, I thought it might be useful to write a post centered around it. My main goal here is to try to present the different sides of the argument as I heard them, without bias towards any side. Almost all the content is taken directly from talks I attended or conversations I had with fellow attendees. My hope is that this serves to deepen people’s understanding around the debate, and maybe even inspire future research ideas and directions.

I want to start by presenting the main arguments I heard in favor of scaling as a solution to robotics.

Why Scaling Might Work
  • It worked for Computer Vision (CV) and Natural Language Processing (NLP), so why not robotics? This was perhaps the most common argument I heard, and the one that seemed to excite most people given recent models like GPT-4V and SAM. The point here is that training a large model on an extremely large corpus of data has recently led to astounding progress on problems thought to be intractable just 3-4 years ago. Moreover, doing this has led to a number of emergent capabilities, where trained models are able to perform well at a number of tasks they weren’t explicitly trained for. Importantly, the fundamental method here of training a large model on a very large amount of data is general and not somehow unique to CV or NLP. Thus, there seems to be no reason why we shouldn’t observe the same incredible performance on robotics tasks.
    • We’re already starting to see some evidence that this might work well: Chelsea Finn, Vincent Vanhoucke, and several others pointed to the recent RT-X and RT-2 papers from Google DeepMind as evidence that training a single model on large amounts of robotics data yields promising generalization capabilities. Russ Tedrake of Toyota Research Institute (TRI) and MIT pointed to the recent Diffusion Policies paper as showing a similar surprising capability. Sergey Levine of UC Berkeley highlighted recent efforts and successes from his group in building and deploying a robot-agnostic foundation model for navigation. All of these works are somewhat preliminary in that they train a relatively small model with a paltry amount of data compared to something like GPT-4V, but they certainly do seem to point to the fact that scaling up these models and datasets could yield impressive results in robotics.
  • Progress in data, compute, and foundation models are waves that we should ride: This argument is closely related to the above one, but distinct enough that I think it deserves to be discussed separately. The main idea here comes from Rich Sutton’s influential essay: The history of AI research has shown that relatively simple algorithms that scale well with data always outperform more complex/clever algorithms that do not. A nice analogy from Karol Hausman’s early career keynote is that improvements to data and compute are like a wave that is bound to happen given the progress and adoption of technology. Whether we like it or not, there will be more data and better compute. As AI researchers, we can either choose to ride this wave, or we can ignore it. Riding this wave means recognizing all the progress that’s happened because of large data and large models, and then developing algorithms, tools, datasets, etc. to take advantage of this progress. It also means leveraging large pre-trained models from vision and language that currently exist or will exist for robotics tasks.
  • Robotics tasks of interest lie on a relatively simple manifold, and training a large model will help us find it: This was something rather interesting that Russ Tedrake pointed out during a debate in the workshop on robustly deploying learning-based solutions. The manifold hypothesis as applied to robotics roughly states that, while the space of possible tasks we could conceive of having a robot do is impossibly large and complex, the tasks that actually occur practically in our world lie on some much lower-dimensional and simpler manifold of this space. By training a single model on large amounts of data, we might be able to discover this manifold. If we believe that such a manifold exists for robotics — which certainly seems intuitive — then this line of thinking would suggest that robotics is not somehow different from CV or NLP in any fundamental way. The same recipe that worked for CV and NLP should be able to discover the manifold for robotics and yield a shockingly competent generalist robot. Even if this doesn’t exactly happen, Tedrake points out that attempting to train a large model for general robotics tasks could teach us important things about the manifold of robotics tasks, and perhaps we can leverage this understanding to solve robotics.
  • Large models are the best approach we have to get at “common sense” capabilities, which pervade all of robotics: Another thing Russ Tedrake pointed out is that “common sense” pervades almost every robotics task of interest. Consider the task of having a mobile manipulation robot place a mug onto a table. Even if we ignore the challenging problems of finding and localizing the mug, there are a surprising number of subtleties to this problem. What if the table is cluttered and the robot has to move other objects out of the way? What if the mug accidentally falls on the floor and the robot has to pick it up again, re-orient it, and place it on the table? And what if the mug has something in it, so it’s important it’s never overturned? These “edge cases” are actually much more common than it might seem, and often are the difference between success and failure for a task. Moreover, these seem to require some sort of “common sense” reasoning to deal with. Several people argued that large models trained on a large amount of data are the best way we know of to yield some aspects of this “common sense” capability. Thus, they might be the best way we know of to solve general robotics tasks.

As you might imagine, there were a number of arguments against scaling as a practical solution to robotics. Interestingly, almost no one directly disputes that this approach could work in theory. Instead, most arguments fall into one of two buckets: (1) arguing that this approach is simply impractical, and (2) arguing that even if it does kind of work, it won’t really “solve” robotics.

Why Scaling Might Not Work
It’s impractical
  • We currently just don’t have much robotics data, and there’s no clear way we’ll get it: This is the elephant in pretty much every large-scale robot learning room. The Internet is chock-full of data for CV and NLP, but not at all for robotics. Recent efforts to collect very large datasets have required tremendous amounts of time, money, and cooperation, yet have yielded a very small fraction of the amount of vision and text data on the Internet. CV and NLP got so much data because they had an incredible “data flywheel”: tens of millions of people connecting to and using the Internet. Unfortunately for robotics, there seems to be no reason why people would upload a bunch of sensory input and corresponding action pairs. Collecting a very large robotics dataset seems quite hard, and given that we know that a lot of important “emergent” properties only showed up in vision and language models at scale, the inability to get a large dataset could render this scaling approach hopeless.
  • Robots have different embodiments: Another challenge with collecting a very large robotics dataset is that robots come in a large variety of different shapes, sizes, and form factors. The output control actions that are sent to a Boston Dynamics Spot robot are very different to those sent to a KUKA iiwa arm. Even if we ignore the problem of finding some kind of common output space for a large trained model, the variety in robot embodiments means we’ll probably have to collect data from each robot type, and that makes the above data-collection problem even harder.
  • There is extremely large variance in the environments we want robots to operate in: For a robot to really be “general purpose,” it must be able to operate in any practical environment a human might want to put it in. This means operating in any possible home, factory, or office building it might find itself in. Collecting a dataset that has even just one example of every possible building seems impractical. Of course, the hope is that we would only need to collect data in a small fraction of these, and the rest will be handled by generalization. However, we don’t know how much data will be required for this generalization capability to kick in, and it very well could also be impractically large.
  • Training a model on such a large robotics dataset might be too expensive/energy-intensive: It’s no secret that training large foundation models is expensive, both in terms of money and in energy consumption. GPT-4V, OpenAI’s biggest foundation model at the time of this writing, reportedly cost over US $100 million and 50 million kWh of electricity to train. This is well beyond the budget and resources that any academic lab can currently spare, so a larger robotics foundation model would need to be trained by a company or a government of some kind. Additionally, depending on how large both the dataset and the model itself are for such an endeavor, the costs may balloon by another order of magnitude or more, which might make it completely infeasible.
Even if it works as well as in CV/NLP, it won’t solve robotics
  • The 99.X problem and long tails: Vincent Vanhoucke of Google Robotics started a talk with a provocative assertion: Most (if not all) robot learning approaches cannot be deployed for any practical task. The reason? Real-world industrial and home applications typically require 99.X percent or higher accuracy and reliability. What exactly that means varies by application, but it’s safe to say that robot learning algorithms aren’t there yet. Most results presented in academic papers top out at 80 percent success rate. While that might seem quite close to the 99.X percent threshold, people trying to actually deploy these algorithms have found that it isn’t so: getting higher success rates requires asymptotically more effort as we get closer to 100 percent. That means going from 85 to 90 percent might require just as much effort as going from 40 to 80 percent, if not more. Vincent asserted in his talk that getting up to 99.X percent is a fundamentally different beast than getting even up to 80 percent, one that might require a whole host of new techniques beyond just scaling.
    • Existing big models don’t get to 99.X percent even in CV and NLP: As impressive and capable as current large models like GPT-4V and DETIC are, even they don’t achieve 99.X percent or higher success rate on previously-unseen tasks. Current robotics models are very far from this level of performance, and I think it’s safe to say that the entire robot learning community would be thrilled to have a general model that does as well on robotics tasks as GPT-4V does on NLP tasks. However, even if we had something like this, it wouldn’t be at 99.X percent, and it’s not clear that it’s possible to get there by scaling either.
  • Self-driving car companies have tried this approach, and it doesn’t fully work (yet): This is closely related to the above point, but important and subtle enough that I think it deserves to stand on its own. A number of self-driving car companies, most notably Tesla and Wayve, have tried training such an end-to-end big model on large amounts of data to achieve Level 5 autonomy. Not only do these companies have the engineering resources and money to train such models, but they also have the data. Tesla in particular has a fleet of over 100,000 cars deployed in the real world that it is constantly collecting and then annotating data from. These cars are being teleoperated by experts, making the data ideal for large-scale supervised learning. And despite all this, Tesla has so far been unable to produce a Level 5 autonomous driving system. That’s not to say their approach doesn’t work at all. It competently handles a large number of situations, especially highway driving, and serves as a useful Level 2 (i.e., driver assist) system. However, it’s far from 99.X percent performance. Moreover, data seems to suggest that Tesla’s approach is faring far worse than Waymo or Cruise, which both use much more modular systems. While it isn’t inconceivable that Tesla’s approach could end up catching up to and surpassing its competitors’ performance in a year or so, the fact that it hasn’t worked yet should perhaps serve as evidence that the 99.X percent problem is hard to overcome for a large-scale ML approach. Moreover, given that self-driving is a special case of general robotics, Tesla’s case should give us reason to doubt the large-scale model approach as a full solution to robotics, especially in the medium term.
  • Many robotics tasks of interest are quite long-horizon: Accomplishing any task requires taking a number of correct actions in sequence. Consider the relatively simple problem of making a cup of tea given an electric kettle, water, a box of tea bags, and a mug. Success requires pouring the water into the kettle, turning it on, then pouring the hot water into the mug, and placing a tea bag inside it. If we want to solve this with a model trained to output motor torque commands given pixels as input, we’ll need to send torque commands to all 7 motors at around 40 Hz. Let’s suppose that this tea-making task requires 5 minutes. That requires 7 * 40 * 60 * 5 = 84,000 correct torque commands. This is all just for a stationary robot arm; things get much more complicated if the robot is mobile, or has more than one arm. It is well known that error tends to compound with longer horizons for most tasks. This is one reason why, despite their ability to produce long sequences of text, even LLMs cannot yet produce completely coherent novels or long stories: small deviations from a true prediction over time tend to add up and yield extremely large deviations over long horizons. Given that most, if not all, robotics tasks of interest require sending at least thousands, if not hundreds of thousands, of torques in just the right order, even a fairly well-performing model might really struggle to fully solve these robotics tasks.
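
To make the long-horizon arithmetic above concrete, here is a minimal Python sketch. The command count comes straight from the tea-making example; everything else is an illustrative assumption of mine, not something claimed at the conference. In particular, the sketch assumes each torque command is independently "correct" with some fixed probability, and that a single wrong command fails the whole task. Real controllers can often recover from small errors, so treat this as a pessimistic toy model of compounding error. The function names are hypothetical.

```python
# Toy model of compounding error over a long-horizon task.
# Assumptions (illustrative only): every command succeeds independently
# with the same probability, and one bad command fails the task.

MOTORS = 7            # motors on the arm, from the tea-making example
RATE_HZ = 40          # control rate, from the example
DURATION_S = 5 * 60   # five minutes, from the example


def num_commands(motors: int, rate_hz: int, duration_s: int) -> int:
    """Total torque commands sent over the whole task."""
    return motors * rate_hz * duration_s


def task_success_probability(per_command_accuracy: float, n: int) -> float:
    """Probability that all n independent commands are correct."""
    return per_command_accuracy ** n


def required_per_command_accuracy(target_success: float, n: int) -> float:
    """Per-command accuracy needed to reach a target task success rate."""
    return target_success ** (1.0 / n)


n_commands = num_commands(MOTORS, RATE_HZ, DURATION_S)
print(f"commands per task: {n_commands}")

for accuracy in (0.999, 0.9999, 0.99999):
    p = task_success_probability(accuracy, n_commands)
    print(f"per-command accuracy {accuracy}: task success {p:.6f}")

needed = required_per_command_accuracy(0.99, n_commands)
print(f"per-command accuracy needed for 99% task success: {needed:.10f}")
```

Under these assumptions, even a per-command accuracy of 99.999 percent yields only roughly a 43 percent chance of finishing the five-minute task, which is one way to see why long horizons and the 99.X problem interact so badly.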

Okay, now that we’ve sketched out all the main points on both sides of the debate, I want to spend some time diving into a few related points. Many of these are responses to the above points on the ‘against’ side, and some of them are proposals for directions to explore to help overcome the issues raised.

Miscellaneous Related Arguments
We can probably deploy learning-based approaches robustly

One point that gets brought up a lot against learning-based approaches is the lack of theoretical guarantees. At the time of this writing, we know very little about neural network theory: we don’t really know why they learn well, and more importantly, we don’t have any guarantees on what values they will output in different situations. On the other hand, most classical control and planning approaches that are widely used in robotics have various theoretical guarantees built-in. These are generally quite useful when certifying that systems are safe.

However, there seemed to be general consensus amongst a number of CoRL speakers that this point is perhaps given more significance than it should be. Sergey Levine pointed out that most of the guarantees from controls aren’t really that useful for a number of real-world tasks we’re interested in. As he put it: “self-driving car companies aren’t worried about controlling the car to drive in a straight line, but rather about a situation in which someone paints a sky onto the back of a truck and drives in front of the car,” thereby confusing the perception system. Moreover, Scott Kuindersma of Boston Dynamics talked about how they’re deploying RL-based controllers on their robots in production, and are able to get the confidence and guarantees they need via rigorous simulation and real-world testing. Overall, I got the sense that while people feel that guarantees are important, and encouraged researchers to keep trying to study them, they don’t think that the lack of guarantees for learning-based systems means that they cannot be deployed robustly.

What if we strive to deploy Human-in-the-Loop systems?

In one of the organized debates, Emo Todorov pointed out that existing successful ML systems, like Codex and ChatGPT, work well only because a human interacts with and sanitizes their output. Consider the case of coding with Codex: it isn’t intended to directly produce runnable, bug-free code, but rather to act as an intelligent autocomplete for programmers, thereby making the overall human-machine team more productive than either alone. In this way, these models don’t have to achieve the 99.X percent performance threshold, because a human can help correct any issues during deployment. As Emo put it: “humans are forgiving, physics is not.”

Chelsea Finn responded to this by largely agreeing with Emo. She strongly agreed that all successfully-deployed and useful ML systems have humans in the loop, and so this is likely the setting that deployed robot learning systems will need to operate in as well. Of course, having a human operate in the loop with a robot isn’t as straightforward as in other domains, since having a human and robot inhabit the same space introduces potential safety hazards. However, it’s a useful setting to think about, especially if it can help address issues brought on by the 99.X percent problem.

Maybe we don’t need to collect that much real world data for scaling

A number of people at the conference were thinking about creative ways to overcome the real-world data bottleneck without actually collecting more real-world data. Quite a few of these people argued that fast, realistic simulators could be vital here, and there were a number of works that explored creative ways to train robot policies in simulation and then transfer them to the real world. Another set of people argued that we can leverage existing vision, language, and video data and then just “sprinkle in” some robotics data. Google’s recent RT-2 model showed how taking a large model trained on Internet-scale vision and language data, and then just fine-tuning it on a much smaller set of robotics data, can produce impressive performance on robotics tasks. Perhaps through a combination of simulation and pretraining on general vision and language data, we won’t actually have to collect too much real-world robotics data to get scaling to work well for robotics tasks.

Maybe combining classical and learning-based approaches can give us the best of both worlds

As with any debate, there were quite a few people advocating the middle path. Scott Kuindersma of Boston Dynamics titled one of his talks “Let’s all just be friends: model-based control helps learning (and vice versa)”. Throughout his talk, and the subsequent debates, he expressed his strong belief that in the short to medium term, the best path towards reliable real-world systems involves combining learning with classical approaches. In her keynote speech for the conference, Andrea Thomaz talked about how such a hybrid system, using learning for perception and a few skills and classical SLAM and path-planning for the rest, is what powers a real-world robot that’s deployed in tens of hospital systems in Texas (and growing!). Several papers explored how classical controls and planning, together with learning-based approaches, can enable much more capability than any system on its own. Overall, most people seemed to argue that this “middle path” is extremely promising, especially in the short to medium term, but that in the long term perhaps either pure learning or an entirely different set of approaches might be best.

What Can/Should We Take Away From All This?

If you’ve read this far, chances are that you’re interested in some set of takeaways/conclusions. Perhaps you’re thinking “this is all very interesting, but what does all this mean for what we as a community should do? What research problems should I try to tackle?” Fortunately for you, there seemed to be a number of interesting suggestions that had some consensus on this.

We should pursue the direction of trying to just scale up learning with very large datasets

Despite the various arguments against scaling solving robotics outright, most people seem to agree that scaling in robot learning is a promising direction to be investigated. Even if it doesn’t fully solve robotics, it could lead to a significant amount of progress on a number of hard problems we’ve been stuck on for a while. Additionally, as Russ Tedrake pointed out, pursuing this direction carefully could yield useful insights about the general robotics problem, as well as current learning algorithms and why they work so well.

We should also pursue other existing directions

Even the most vocal proponents of the scaling approach were clear that they don’t think everyone should be working on this. It’s likely a bad idea for the entire robot learning community to put its eggs in the same basket, especially given all the reasons to believe scaling won’t fully solve robotics. Classical robotics techniques have gotten us quite far, and led to many successful and reliable deployments: pushing forward on them or integrating them with learning techniques might be the right way forward, especially in the short to medium term.

We should focus more on real-world mobile manipulation and easy-to-use systems

Vincent Vanhoucke observed that most papers at CoRL last year were limited to tabletop manipulation settings. While there are plenty of hard tabletop problems, things generally get a lot more complicated when the robot, and consequently its camera view, moves. Vincent speculated that it’s easy for the community to fall into a local minimum where we make a lot of progress that’s specific to the tabletop setting and therefore not generalizable. A similar thing could happen if we work predominantly in simulation. Avoiding these local minima by working on real-world mobile manipulation seems like a good idea.

Separately, Sergey Levine observed that a big reason why LLMs have seen so much excitement and adoption is that they’re extremely easy to use, especially by non-experts. One doesn’t have to know about the details of training an LLM, or perform any tough setup, to prompt and use these models for one’s own tasks. Most robot learning approaches are currently far from this. They often require significant knowledge of their inner workings to use, and involve very significant amounts of setup. Perhaps thinking more about how to make robot learning systems easier to use and widely applicable could help improve adoption and potentially scalability of these approaches.

We should be more forthright about things that don’t work

There seemed to be a broadly-held complaint that many robot learning approaches don’t adequately report negative results, and this leads to a lot of unnecessary repeated effort. Additionally, perhaps patterns might emerge from consistent failures of things that we expect to work but don’t actually work well, and this could yield novel insight into learning algorithms. There is currently no good incentive for researchers to report such negative results in papers, but most people seemed to be in favor of designing one.

We should try to do something totally new

There were a few people who pointed out that all current approaches, be they learning-based or classical, are unsatisfying in a number of ways. There seem to be a number of drawbacks with each of them, and it’s very conceivable that there is a completely different set of approaches that ultimately solves robotics. Given this, it seems useful to try to think outside the box. After all, every one of the current approaches that’s part of the debate was only made possible because the few researchers who introduced them dared to think against the popular grain of their times.

Acknowledgements: Huge thanks to Tom Silver and Leslie Kaelbling for providing helpful comments, suggestions, and encouragement on a previous draft of this post.

1 In fact, this was the topic of a popular debate hosted at a workshop on the first day; many of the points in this post were inspired by the conversation during that debate.

Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH

Enjoy today’s videos!

NAVER 1784 is the world’s largest robotics testbed. The Starbucks on the second floor of 1784 is the world’s most unique Starbucks, with more than 100 service robots called “Rookie” delivering Starbucks drinks to meeting rooms and private seats, and various experiments with a dual-arm robot.

[ Naver ]

If you’re gonna take a robot dog with you on a hike, the least it could do is carry your backpack for you.

[ Deep Robotics ]

Obligatory reminder that phrases like “no teleoperation” without any additional context can mean many different things.

[ Astribot ]

This video is presented at the ICRA 2024 conference and summarizes recent results of our Learning AI for Dextrous Manipulation Lab. It demonstrates how our learning AI methods allowed for breakthroughs in dextrous manipulation with the mobile humanoid robot DLR Agile Justin. Although the core of the mechatronic hardware is almost 20 years old, only the advent of learning AI methods enabled a level of dexterity, flexibility and autonomy coming close to human capabilities.

[ TUM ]

Thanks Berthold!

Hands of blue? Not a good look.

[ Synaptic ]

With all the humanoid stuff going on, there really should be more emphasis on intentional contact—humans lean and balance on things all the time, and robots should too!

[ Inria ]

LimX Dynamics W1 is now more than a wheeled quadruped. By evolving into a biped robot, W1 maneuvers slickly on two legs in different ways: non-stop 360° rotation, upright free gliding, slick maneuvering, random collision and self-recovery, and step walking.

[ LimX Dynamics ]

Animal brains use less data and energy compared to current deep neural networks running on Graphics Processing Units (GPUs). This makes it hard to develop tiny autonomous drones, which are too small and light for heavy hardware and big batteries. Recently, the emergence of neuromorphic processors that mimic how brains function has made it possible for researchers from Delft University of Technology to develop a drone that uses neuromorphic vision and control for autonomous flight.

[ Science ]

In the beginning of the universe, all was darkness — until the first organisms developed sight, which ushered in an explosion of life, learning and progress. AI pioneer Fei-Fei Li says a similar moment is about to happen for computers and robots. She shows how machines are gaining “spatial intelligence” — the ability to process visual data, make predictions and act upon those predictions — and shares how this could enable AI to interact with humans in the real world.

[ TED ]

Greetings from the IEEE International Conference on Robotics and Automation (ICRA) in Yokohama, Japan! We hope you’ve been enjoying our short videos on TikTok, YouTube, and Instagram. They are just a preview of our in-depth ICRA coverage, and over the next several weeks we’ll have lots of articles and videos for you. In today’s edition of Video Friday, we bring you a dozen of the most interesting projects presented at the conference.

Enjoy today’s videos, and stay tuned for more ICRA posts!

Upcoming robotics events for the next few months:

RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS
ICSR 2024: 23–26 October 2024, ODENSE, DENMARK
Cybathlon 2024: 25–27 October 2024, ZURICH, SWITZERLAND

Please send us your events for inclusion.

The following two videos are part of the “Cooking Robotics: Perception and Motion Planning” workshop, which explored “the new frontiers of ‘robots in cooking,’ addressing various scientific research questions, including hardware considerations, key challenges in multimodal perception, motion planning and control, experimental methodologies, and benchmarking approaches.” The workshop featured robots handling food items like cookies, burgers, and cereal, and the two robots seen in the videos below used knives to slice cucumbers and cakes. You can watch all workshop videos here.

“SliceIt!: Simulation-Based Reinforcement Learning for Compliant Robotic Food Slicing,” by Cristian C. Beltran-Hernandez, Nicolas Erbetti, and Masashi Hamaya from OMRON SINIC X Corporation, Tokyo, Japan.

Cooking robots can enhance the home experience by reducing the burden of daily chores. However, these robots must perform their tasks dexterously and safely in shared human environments, especially when handling dangerous tools such as kitchen knives. This study focuses on enabling a robot to autonomously and safely learn food-cutting tasks. More specifically, our goal is to enable a collaborative robot or industrial robot arm to perform food-slicing tasks by adapting to varying material properties using compliance control. Our approach involves using Reinforcement Learning (RL) to train a robot to compliantly manipulate a knife, by reducing the contact forces exerted by the food items and by the cutting board. However, training the robot in the real world can be inefficient and dangerous, and can result in a lot of food waste. Therefore, we proposed SliceIt!, a framework for safely and efficiently learning robot food-slicing tasks in simulation. Following a real2sim2real approach, our framework consists of collecting a small amount of real food-slicing data, calibrating our dual simulation environment (a high-fidelity cutting simulator and a robotic simulator), learning compliant control policies on the calibrated simulation environment, and finally, deploying the policies on the real robot.

“Cafe Robot: Integrated AI Skillset Based on Large Language Models,” by Jad Tarifi, Nima Asgharbeygi, Shuhei Takamatsu, and Masataka Goto from Integral AI in Tokyo, Japan, and Mountain View, Calif., USA.

The cafe robot engages in natural language interaction to receive orders and subsequently prepares coffee and cakes. Each action involved in making these items is executed using AI skills developed by Integral, including Integral Liquid Pouring, Integral Powder Scooping, and Integral Cutting. The dialogue for making coffee, as well as the coordination of each action based on the dialogue, is facilitated by the Integral Task Planner.

“Autonomous Overhead Powerline Recharging for Uninterrupted Drone Operations,” by Viet Duong Hoang, Frederik Falk Nyboe, Nicolaj Haarhøj Malle, and Emad Ebeid from University of Southern Denmark, Odense, Denmark.

We present a fully autonomous self-recharging drone system capable of long-duration sustained operations near powerlines. The drone is equipped with a robust onboard perception and navigation system that enables it to locate powerlines and approach them for landing. A passively actuated gripping mechanism grasps the powerline cable during landing after which a control circuit regulates the magnetic field inside a split-core current transformer to provide sufficient holding force as well as battery recharging. The system is evaluated in an active outdoor three-phase powerline environment. We demonstrate multiple contiguous hours of fully autonomous uninterrupted drone operations composed of several cycles of flying, landing, recharging, and takeoff, validating the capability of extended, essentially unlimited, operational endurance.

“Learning Quadrupedal Locomotion With Impaired Joints Using Random Joint Masking,” by Mincheol Kim, Ukcheol Shin, and Jung-Yup Kim from Seoul National University of Science and Technology, Seoul, South Korea, and Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa., USA.

Quadrupedal robots have played a crucial role in various environments, from structured environments to complex harsh terrains, thanks to their agile locomotion ability. However, these robots can easily lose their locomotion functionality if damaged by external accidents or internal malfunctions. In this paper, we propose a novel deep reinforcement learning framework to enable a quadrupedal robot to walk with impaired joints. The proposed framework consists of three components: 1) a random joint masking strategy for simulating impaired joint scenarios, 2) a joint state estimator to predict an implicit status of the current joint condition based on past observation history, and 3) progressive curriculum learning to allow a single network to conduct both normal gait and various joint-impaired gaits. We verify that our framework enables Unitree’s Go1 robot to walk under various impaired joint conditions in real-world indoor and outdoor environments.

“Synthesizing Robust Walking Gaits via Discrete-Time Barrier Functions With Application to Multi-Contact Exoskeleton Locomotion,” by Maegan Tucker, Kejun Li, and Aaron D. Ames from Georgia Institute of Technology, Atlanta, Ga., and California Institute of Technology, Pasadena, Calif., USA.

Successfully achieving bipedal locomotion remains challenging due to real-world factors such as model uncertainty, random disturbances, and imperfect state estimation. In this work, we propose a novel metric for locomotive robustness – the estimated size of the hybrid forward invariant set associated with the step-to-step dynamics. Here, the forward invariant set can be loosely interpreted as the region of attraction for the discrete-time dynamics. We illustrate the use of this metric towards synthesizing nominal walking gaits using a simulation in-the-loop learning approach. Further, we leverage discrete time barrier functions and a sampling-based approach to approximate sets that are maximally forward invariant. Lastly, we experimentally demonstrate that this approach results in successful locomotion for both flat-foot walking and multicontact walking on the Atalante lower-body exoskeleton.

“Supernumerary Robotic Limbs to Support Post-Fall Recoveries for Astronauts,” by Erik Ballesteros, Sang-Yoep Lee, Kalind C. Carpenter, and H. Harry Asada from MIT, Cambridge, Mass., USA, and Jet Propulsion Laboratory, California Institute of Technology, Pasadena, Calif., USA.

This paper proposes the utilization of Supernumerary Robotic Limbs (SuperLimbs) for augmenting astronauts during an Extra-Vehicular Activity (EVA) in a partial-gravity environment. We investigate the effectiveness of SuperLimbs in assisting astronauts to their feet following a fall. Based on preliminary observations from a pilot human study, we categorized post-fall recoveries into a sequence of statically stable poses called “waypoints”. The paths between the waypoints can be modeled with a simplified kinetic motion applied about a specific point on the body. Following the characterization of post-fall recoveries, we designed a task-space impedance control with high damping and low stiffness, where the SuperLimbs provide an astronaut with assistance in post-fall recovery while keeping the human in-the-loop scheme. In order to validate this control scheme, a full-scale wearable analog space suit was constructed and tested with a SuperLimbs prototype. Results from the experimentation found that without assistance, astronauts would impulsively exert themselves to perform a post-fall recovery, which resulted in high energy consumption and instabilities maintaining an upright posture, concurring with prior NASA studies. When the SuperLimbs provided assistance, the astronaut’s energy consumption and deviation in their tracking as they performed a post-fall recovery was reduced considerably.

“ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch,” by Zhengrong Xue, Han Zhang, Jingwen Cheng, Zhengmao He, Yuanchen Ju, Changyi Lin, Gu Zhang, and Huazhe Xu from Tsinghua Embodied AI Lab, IIIS, Tsinghua University; Shanghai Qi Zhi Institute; Shanghai AI Lab; and Shanghai Jiao Tong University, Shanghai, China.

We present ArrayBot, a distributed manipulation system consisting of a 16 × 16 array of vertically sliding pillars integrated with tactile sensors. Functionally, ArrayBot is designed to simultaneously support, perceive, and manipulate the tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. In the face of the massively redundant actions, we propose to reshape the action space by considering the spatially local action patch and the low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Intriguingly, we find that the discovered policy can not only generalize to unseen object shapes in the simulator but also have the ability to transfer to the physical robot without any sim-to-real fine tuning. Leveraging the deployed policy, we derive more real world manipulation skills on ArrayBot to further illustrate the distinctive merits of our proposed system.

“SKT-Hang: Hanging Everyday Objects via Object-Agnostic Semantic Keypoint Trajectory Generation,” by Chia-Liang Kuo, Yu-Wei Chao, and Yi-Ting Chen from National Yang Ming Chiao Tung University, in Taipei and Hsinchu, Taiwan, and NVIDIA.

We study the problem of hanging a wide range of grasped objects on diverse supporting items. Hanging objects is a ubiquitous task that is encountered in numerous aspects of our everyday lives. However, both the objects and supporting items can exhibit substantial variations in their shapes and structures, bringing two challenging issues: (1) determining the task-relevant geometric structures across different objects and supporting items, and (2) identifying a robust action sequence to accommodate the shape variations of supporting items. To this end, we propose Semantic Keypoint Trajectory (SKT), an object agnostic representation that is highly versatile and applicable to various everyday objects. We also propose Shape-conditioned Trajectory Deformation Network (SCTDN), a model that learns to generate SKT by deforming a template trajectory based on the task-relevant geometric structure features of the supporting items. We conduct extensive experiments and demonstrate substantial improvements in our framework over existing robot hanging methods in the success rate and inference time. Finally, our simulation-trained framework shows promising hanging results in the real world.

“TEXterity: Tactile Extrinsic deXterity,” by Antonia Bronars, Sangwoon Kim, Parag Patre, and Alberto Rodriguez from MIT and Magna International Inc.

We introduce a novel approach that combines tactile estimation and control for in-hand object manipulation. By integrating measurements from robot kinematics and an image based tactile sensor, our framework estimates and tracks object pose while simultaneously generating motion plans in a receding horizon fashion to control the pose of a grasped object. This approach consists of a discrete pose estimator that tracks the most likely sequence of object poses in a coarsely discretized grid, and a continuous pose estimator-controller to refine the pose estimate and accurately manipulate the pose of the grasped object. Our method is tested on diverse objects and configurations, achieving desired manipulation objectives and outperforming single-shot methods in estimation accuracy. The proposed approach holds potential for tasks requiring precise manipulation and limited intrinsic in-hand dexterity under visual occlusion, laying the foundation for closed loop behavior in applications such as regrasping, insertion, and tool use.

“Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects With Video Tracking Enabled Memory Models,” by Yixuan Huang, Jialin Yuan, Chanho Kim, Pupul Pradhan, Bryan Chen, Li Fuxin, and Tucker Hermans from University of Utah, Salt Lake City, Utah, Oregon State University, Corvallis, Ore., and NVIDIA, Seattle, Wash., USA.

Robots need to have a memory of previously observed, but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks including reasoning with occluded objects, novel objects appearance, and object reappearance. Throughout our extensive simulation and real world experiments, we find that our approaches perform well in terms of different numbers of objects and different numbers

“Open Sourse Underwater Robot: Easys,” by Michikuni Eguchi, Koki Kato, Tatsuya Oshima, and Shunya Hara from University of Tsukuba and Osaka University, Japan.

“Sensorized Soft Skin for Dexterous Robotic Hands,” by Jana Egli, Benedek Forrai, Thomas Buchner, Jiangtao Su, Xiaodong Chen, and Robert K. Katzschmann from ETH Zurich, Switzerland, and Nanyang Technological University, Singapore.

Conventional industrial robots often use two-fingered grippers or suction cups to manipulate objects or interact with the world. Because of their simplified design, they are unable to reproduce the dexterity of human hands when manipulating a wide range of objects. While the control of humanoid hands evolved greatly, hardware platforms still lack capabilities, particularly in tactile sensing and providing soft contact surfaces. In this work, we present a method that equips the skeleton of a tendon-driven humanoid hand with a soft and sensorized tactile skin. Multi-material 3D printing allows us to iteratively approach a cast skin design which preserves the robot’s dexterity in terms of range of motion and speed. We demonstrate that a soft skin enables firmer grasps and piezoresistive sensor integration enhances the hand’s tactile sensing capabilities.

It’s hard to think of a more dramatic way to make an entrance than falling from the sky. While it certainly happens often enough on the silver screen, whether or not it can be done in real life is a tantalizing challenge for our entertainment robotics team at Disney Research.

Falling is tricky for two reasons. The first and most obvious is what Douglas Adams referred to as “the sudden stop at the end.” Every second of free fall means another 9.8 m/s of velocity, and that can quickly add up to an extremely difficult energy dissipation problem. The other tricky thing about falling, especially for terrestrial animals like us, is that our normal methods for controlling our orientation disappear. We are used to relying on contact forces between our body and the environment to control which way we’re pointing. In the air, there’s nothing to push on except the air itself!
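To put rough numbers on that “sudden stop,” here is a minimal sketch of the drag-free free-fall relations. The 1 kg mass and the drop heights are illustrative assumptions, not measurements from any of the robots, and the function names are hypothetical; real falls also involve air drag, which this ignores.

```python
import math

G = 9.8  # m/s^2, the free-fall acceleration quoted in the text


def fall_time(height_m: float) -> float:
    """Time to fall height_m from rest with no air drag: t = sqrt(2h / g)."""
    return math.sqrt(2.0 * height_m / G)


def impact_speed(height_m: float) -> float:
    """Speed after falling height_m from rest with no air drag: v = sqrt(2gh)."""
    return math.sqrt(2.0 * G * height_m)


def impact_energy(mass_kg: float, height_m: float) -> float:
    """Kinetic energy at impact in joules, equal to the potential energy m*g*h lost."""
    return mass_kg * G * height_m


if __name__ == "__main__":
    mass_kg = 1.0  # hypothetical robot mass
    for height_m in (1.0, 5.0, 20.0):  # hypothetical drop heights
        print(
            f"h = {height_m:5.1f} m: "
            f"t = {fall_time(height_m):.2f} s, "
            f"v = {impact_speed(height_m):.1f} m/s, "
            f"E = {impact_energy(mass_kg, height_m):.0f} J"
        )
```

Impact speed grows only with the square root of height, but the energy to be dissipated grows linearly with it, which is why taller drops become a hard problem so quickly.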

Finding a solution to these problems is a big, open-ended challenge. In the clip below, you can see one approach we’ve taken to start chipping away at it.

The video shows a small, stick-like robot with an array of four ducted fans attached to its top. The robot has a piston-like foot that absorbs the impact of a small fall, and then the ducted fans keep the robot standing by counteracting any tilting motion using aerodynamic thrust.

Raphael Pilon [left] and Marcela de los Rios evaluate the performance of the monopod balancing robot. Disney Research

The standing portion demonstrates that pushing on the air isn’t only useful during free fall. Conventional walking and hopping robots depend on ground contact forces to maintain the required orientation. These forces can ramp up quickly because of the stiffness of the system, necessitating high-bandwidth control strategies. Aerodynamic forces are relatively soft, but even so, they were sufficient to keep our robots standing. And since these forces can also be applied during the flight phase of running or hopping, this approach might lead to robots that run before they walk. The thing that defines a running gait is the existence of a “flight phase,” a time when none of the feet are in contact with the ground. A running robot with aerodynamic control authority could potentially use a gait with a long flight phase. This would shift the burden of the control effort to mid-flight, simplifying the leg design and possibly making rapid bipedal motion more tractable than a moderate pace.

Richard Landon uses a test rig to evaluate the thrust profile of a ducted fan. Disney Research

In the next video, a slightly larger robot tackles a much more dramatic fall, from 65 feet in the air. This simple machine has two piston-like feet and a similar array of ducted fans on top. The fans not only stabilize the robot upon landing, they also help keep it oriented properly as it falls. Inside each foot is a plug of single-use compressible foam. Crushing the foam on impact provides a nice, constant force profile, which maximizes the amount of energy dissipated per inch of contraction.
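Why does a constant force profile maximize energy per inch? The energy absorbed is the area under the force-versus-stroke curve. For a fixed peak force (the most the structure can tolerate), a constant-force absorber fills the whole rectangle, while an ideal linear spring fills only the triangle under its ramp. The sketch below compares the two idealized cases; the peak force and stroke are made-up illustrative values, and the function names are hypothetical.

```python
def energy_constant_force(peak_force_n: float, stroke_m: float) -> float:
    """Energy absorbed when the force stays at peak_force_n over the whole stroke."""
    return peak_force_n * stroke_m


def energy_linear_spring(peak_force_n: float, stroke_m: float) -> float:
    """Energy absorbed by an ideal spring whose force ramps from 0 to peak_force_n."""
    return 0.5 * peak_force_n * stroke_m


def stroke_needed(energy_j: float, peak_force_n: float) -> float:
    """Stroke a constant-force absorber needs to dissipate energy_j joules."""
    return energy_j / peak_force_n


if __name__ == "__main__":
    peak_force_n = 1000.0  # hypothetical force limit of the structure
    stroke_m = 0.05        # hypothetical piston travel (5 cm)

    foam = energy_constant_force(peak_force_n, stroke_m)
    spring = energy_linear_spring(peak_force_n, stroke_m)
    print(f"constant-force foam: {foam:.1f} J, linear spring: {spring:.1f} J")
```

In this idealization the crushable foam absorbs twice the energy of a spring with the same stroke and the same peak load.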

In the case of this little robot, the pistons dissipate less mechanical energy than the fall delivers, so the rest of the mechanism takes a pretty hard hit. The size of the robot is an advantage in this case, because scaling laws mean that the strength-to-weight ratio is in its favor.

The strength of a component is a function of its cross-sectional area, while the weight of a component is a function of its volume. Area is proportional to length squared, while volume is proportional to length cubed. This means that as an object gets smaller, its weight shrinks faster than its strength. This is why a toddler can be half the height of an adult but only a fraction of that adult’s weight, and why ants and spiders can run around on long, spindly legs. Our tiny robots take advantage of this, but we can’t stop there if we want to represent some of our bigger characters.
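The square-cube argument can be checked with a few lines of arithmetic. This sketch assumes perfectly geometrically similar objects made of the same material, which real toddlers and robots are not; the function names are hypothetical.

```python
def relative_strength(scale: float) -> float:
    """Strength relative to the full-size object; tracks cross-sectional area (length^2)."""
    return scale ** 2


def relative_weight(scale: float) -> float:
    """Weight relative to the full-size object; tracks volume (length^3)."""
    return scale ** 3


def strength_to_weight(scale: float) -> float:
    """Strength-to-weight ratio relative to the full-size object (equals 1 / scale)."""
    return relative_strength(scale) / relative_weight(scale)


if __name__ == "__main__":
    for scale in (1.0, 0.5, 0.1):
        print(
            f"scale {scale:4.1f}: strength x{relative_strength(scale):.3f}, "
            f"weight x{relative_weight(scale):.3f}, "
            f"strength-to-weight x{strength_to_weight(scale):.1f}"
        )
```

At half scale, the idealized object keeps a quarter of its strength but only an eighth of its weight, so its strength-to-weight ratio doubles. Scaling up works against you by the same factor.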

Louis Lambie and Michael Lynch assemble an early ducted fan test platform. The platform was mounted on guidewires and was used for lifting capacity tests. Disney Research

In most aerial robotics applications, control is provided by a system that is capable of supporting the entire weight of the robot. In our case, being able to hover isn’t a necessity. The clip below shows an investigation into how much thrust is needed to control the orientation of a fairly large, heavy robot. The robot is supported on a gimbal, allowing it to spin freely. At the extremities are mounted arrays of ducted fans. The fans don’t have enough force to keep the frame in the air, but they do have a lot of control authority over the orientation.

Complicated robots are less likely to survive unscathed when subjected to the extremely high accelerations of a direct ground impact, as you can see in this early test that didn’t quite go according to plan.

In this last video, we use a combination of the previous techniques and add one more capability: a dramatic mid-air stop. Ducted fans are part of this solution, but the high-speed deceleration is principally accomplished by a large water rocket. Then the mechanical legs only have to handle the last ten feet of the drop.

Whether it’s using water or rocket fuel, the principle underlying a rocket is the same – mass is ejected from the rocket at high speed, producing a reaction force in the opposite direction via Newton’s third law. The higher the flow rate and the denser the fluid, the more force is produced. To get a high flow rate and a quick response time, we needed a wide nozzle that went from closed to open cleanly in a matter of milliseconds. We designed a system using a piece of copper foil and a custom punch mechanism that accomplished just that.
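As a rough illustration of how flow rate and fluid density set the force, the sketch below uses the standard momentum-flux estimate: thrust equals mass flow rate times exit speed, and for a jet of density rho leaving a nozzle of area A at speed v, the mass flow rate is rho * A * v. The nozzle diameter and exit speed are illustrative assumptions, not specifications of the rocket described here, and the function names are hypothetical.

```python
import math

WATER_DENSITY = 1000.0  # kg/m^3
AIR_DENSITY = 1.2       # kg/m^3, approximate value at sea level


def nozzle_area(diameter_m: float) -> float:
    """Cross-sectional area of a circular nozzle."""
    return math.pi * (diameter_m / 2.0) ** 2


def mass_flow_rate(density: float, area_m2: float, exit_speed: float) -> float:
    """Mass leaving the nozzle per second: rho * A * v."""
    return density * area_m2 * exit_speed


def thrust(density: float, area_m2: float, exit_speed: float) -> float:
    """Reaction force from the jet: mass flow rate times exit speed."""
    return mass_flow_rate(density, area_m2, exit_speed) * exit_speed


if __name__ == "__main__":
    area = nozzle_area(0.05)  # hypothetical 5 cm nozzle
    speed = 20.0              # hypothetical exit speed in m/s
    print(f"water jet: {thrust(WATER_DENSITY, area, speed):.0f} N")
    print(f"air jet:   {thrust(AIR_DENSITY, area, speed):.2f} N")
```

For the same nozzle and exit speed, the water jet produces several hundred times the force of an air jet, which is the density effect described above. Thrust also grows with the square of exit speed, so flow rate matters even more.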

Grant Imahara pressurizes a test tank to evaluate an early valve prototype [left]. The water rocket in action; note the laminar, two-inch-wide flow as it passes through the specially designed nozzle. Disney Research

Once the water rocket has brought the robot to a mid-air stop, the ducted fans are able to hold it in a stable hover about ten feet above the deck. When they cut out, the robot falls again and the legs absorb the impact. In the video, the robot has a couple of loose tethers attached as a testing precaution, but they don’t provide any support, power, or guidance.

“It might not be so obvious as to what this can be directly used for today, but these rough proof-of-concept experiments show that we might be able to work within real-world physics to do the high falls our characters do on the big screen, and someday actually stick the landing,” explains Tony Dohi, the project lead.

There are still a large number of problems for future projects to address. Most characters have legs that bend on hinges rather than compress like pistons, and don’t wear a belt made of ducted fans. Beyond issues of packaging and form, making sure that the robot lands exactly where it intends to land has interesting implications for perception and control. Regardless, we think we can confirm that this kind of entrance has, if you’ll excuse the pun, quite the impact.