Jenga-playing robot combines vision and touch


MIT has developed a Jenga-playing robot that demonstrates machine learning through not just visual cues but also tactile, physical interactions – something previous systems have found challenging.

Equipped with a soft-pronged gripper, a force-sensing wrist cuff, and an external camera, the robot uses these tools to see and feel the tower and individual blocks.

As the robot carefully pushes against a block, a computer takes in visual and tactile feedback from its camera and cuff, and compares these measurements to moves that the robot previously made. It also considers the outcomes of those moves — specifically, whether a block, in a certain configuration and pushed with a certain amount of force, was successfully extracted or not. In real time, the robot then “learns” whether to keep pushing or move to a new block, in order to keep the tower from falling.
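
As a rough illustration of that decision step, the sketch below (Python) frames it as a nearest-neighbour comparison against previously recorded attempts; the feature vectors, the value of k, and the threshold are illustrative assumptions, not the team’s actual controller.

```python
# Minimal sketch of the "keep pushing or move on" decision, assuming each
# attempt is summarised as a feature vector of visual + force measurements.
# The nearest-neighbour rule and the threshold are assumptions for
# illustration, not the authors' actual method.
import numpy as np

def keep_pushing(current_features, past_features, past_successes, k=5, threshold=0.5):
    """Estimate success probability from the k most similar past attempts.

    current_features: 1-D array of measurements for the block being pushed now.
    past_features:    2-D array, one row per previously recorded attempt.
    past_successes:   1-D boolean array, True if that attempt extracted a block.
    """
    # Distance from the current measurements to every recorded attempt.
    distances = np.linalg.norm(past_features - current_features, axis=1)
    nearest = np.argsort(distances)[:k]
    # Fraction of similar past attempts that ended in a clean extraction.
    p_success = past_successes[nearest].mean()
    # Keep pushing only if similar situations usually worked out.
    return p_success >= threshold
```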

Unlike purely cognitive tasks or games such as chess, Jenga demands interactive perception and manipulation – mastery of physical skills such as probing, pushing, pulling, placing, and aligning pieces.

“This is very difficult to simulate, so the robot has to learn in the real world, by interacting with the real Jenga tower. The key challenge is to learn from a relatively small number of experiments by exploiting common sense about objects and physics,” explains Assistant Professor Alberto Rodriguez of MIT.

He says the tactile learning system the researchers have developed can be used in applications beyond Jenga, especially in tasks that need careful physical interaction, including separating recyclable objects from landfill trash and assembling consumer products.

“In a cellphone assembly line, in almost every single step, the feeling of a snap-fit, or a threaded screw, is coming from force and touch rather than vision,” Rodriguez says. “Learning models for those actions is prime real-estate for this kind of technology.”

The team customised an industry-standard ABB IRB 120 robotic arm, then set up a Jenga tower within the robot’s reach and began a training period in which it first chose a random block and a location on the block against which to push. It then exerted a small amount of force in an attempt to push the block out of the tower.

For each attempt, a computer recorded the associated visual and force measurements and labelled whether the attempt succeeded.
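
A minimal sketch of what one such record might look like is below; the field names and placeholder values are assumptions for illustration, not the team’s actual data schema.

```python
# Hypothetical record for a single push attempt; field names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class PushAttempt:
    visual: np.ndarray   # block pose/appearance features from the external camera
    force: np.ndarray    # readings from the force-sensing wrist cuff
    success: bool        # True if the block came out without toppling the tower

# Training amounts to accumulating a few hundred such records, e.g.:
dataset = [PushAttempt(visual=np.zeros(6), force=np.zeros(3), success=True)]  # placeholder values
features = np.array([np.concatenate([a.visual, a.force]) for a in dataset])
successes = np.array([a.success for a in dataset])
```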

Rather than carry out tens of thousands of such attempts (which would involve reconstructing the tower almost as many times), the robot trained on just about 300, with attempts of similar measurements and outcomes grouped in clusters representing certain block behaviours. For instance, one cluster of data might represent attempts on a block that was hard to move, versus one that was easier to move, or that toppled the tower when moved. For each data cluster, the robot developed a simple model to predict a block’s behaviour given its current visual and tactile measurements.
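
One way to picture that cluster-then-predict step is the sketch below, which assumes feature and success-label arrays like those built in the previous sketch; k-means and logistic regression are stand-ins chosen for illustration, since the article does not name the team’s actual models.

```python
# Illustrative cluster-then-predict step: group similar attempts, then fit a
# simple success predictor per cluster. k-means and logistic regression are
# stand-ins; the article does not specify the models actually used.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def fit_cluster_models(features, successes, n_clusters=4):
    """Group similar attempts and fit a simple per-cluster success model."""
    clustering = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    models = {}
    for c in range(n_clusters):
        mask = clustering.labels_ == c
        if np.unique(successes[mask]).size < 2:
            # Degenerate cluster (all successes or all failures): store the rate.
            models[c] = float(successes[mask].mean())
        else:
            models[c] = LogisticRegression().fit(features[mask], successes[mask])
    return clustering, models

def predict_success(clustering, models, current_features):
    """Assign the current measurements to a cluster and predict extraction success."""
    c = int(clustering.predict(current_features.reshape(1, -1))[0])
    model = models[c]
    return model if isinstance(model, float) else model.predict_proba(
        current_features.reshape(1, -1))[0, 1]
```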

The paper’s lead author Nima Fazeli says this clustering technique increases the efficiency with which the robot can learn to play the game and is inspired by the natural way in which humans cluster similar behaviour.

The researchers tested their approach against other machine-learning algorithms in a computer simulation of the game built with the MuJoCo simulator. The lessons learned in simulation gave the researchers insight into how the robot would learn in the real world.
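
The article gives no detail of that simulation, but for orientation, here is a minimal sketch of stepping a MuJoCo model using the open-source `mujoco` Python bindings (which post-date the original study); the XML body is a single stand-in block on a plane, not the team’s Jenga tower model.

```python
# Minimal MuJoCo stepping loop; the XML is a stand-in scene, not the team's
# actual Jenga tower model.
import mujoco

XML = """
<mujoco>
  <worldbody>
    <geom type="plane" size="1 1 0.1"/>
    <body name="block" pos="0 0 0.1">
      <freejoint/>
      <!-- roughly Jenga-block half-extents, in metres -->
      <geom type="box" size="0.0375 0.0125 0.0075" mass="0.02"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)
for _ in range(1000):
    mujoco.mj_step(model, data)   # advance the physics by one timestep
print(data.qpos)                  # final pose of the free block
```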

The team also compared the robot’s ability with that of a human player, reporting that although there is still some way to go before the robot becomes a Jenga champion, the difference between their skill levels was “not that much.”

For now, the team is focused on applying the robot’s new skills to other application domains.

“There are many tasks that we do with our hands where the feeling of doing it ‘the right way’ comes in the language of forces and tactile cues,” Rodriguez says. “For tasks like these, a similar approach to ours could figure it out.”