Hi! The new AI Weekly is here! This was a great week for AI developers: a huge amount of new code and libraries appeared, much of it published during the TF Dev Summit. The learning section also has articles worth reading, especially the one with notes on what might surprise you when you try to reproduce a paper. Speaking of papers, don’t miss the one explaining how GANs can be used for image compression. It’s awesome. Of course that’s not all… enjoy your weekend reading the other AI news, and don’t forget to share it with your friends 😉
Differentiable Plasticity: A New Method for Learning to Learn – biological brains exhibit plasticity—that is, the ability for connections between neurons to change continually and automatically throughout life, allowing animals to learn quickly and efficiently from ongoing experience. The levels of plasticity of different areas and connections in the brain are the result of millions of years of fine-tuning by evolution to allow efficient learning during the animal’s lifetime. The resultant ability to learn continually over life lets animals adapt to changing or unpredictable environments with very little additional data.
Lessons Learned Reproducing a Deep Reinforcement Learning Paper – reproducing papers is a good way of levelling up your machine learning skills. If you’re thinking about reproducing papers too, here are the author’s notes on what was surprising about working with deep RL.
Heroes of Deep Learning: Andrew Ng interviews Yann LeCun
Magenta – a collection of browser-based applications, many of which are implemented with TensorFlow.js for WebGL-accelerated inference.
Interactive supervision with TensorBoard – IBM Research AI implemented semi-supervision in TensorBoard t-SNE and contributed components required for interactive supervision to demonstrate cognitive-assisted labeling. A metadata editor, distance metric/space selection, neighborhood function selection, and t-SNE perturbation were added to TensorBoard in addition to semi-supervision for t-SNE. These components function in concert to apply a partial labeling that informs semi-supervised t-SNE to clarify the embedding and progressively ease the labeling burden.
Fitting larger networks into memory – the Python/TensorFlow package openai/gradient-checkpointing lets you fit 10x larger neural nets into memory at the cost of an additional 20% computation time.
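To make the memory-for-compute trade mentioned above more concrete, here is a toy sketch in pure Python. It is not the openai/gradient-checkpointing API; the function names (`grad_full`, `grad_checkpointed`) and the scalar `tanh` "layers" are assumptions made purely for illustration. The idea: instead of keeping every activation for the backward pass, keep only one per segment and recompute the rest when needed.

```python
import math


def forward_layer(x, w):
    """One toy 'layer': a scalar weight followed by tanh."""
    return math.tanh(w * x)


def backward_layer(x, w, grad_out):
    """Return (grad wrt input, grad wrt weight) for one toy layer."""
    y = math.tanh(w * x)
    local = 1.0 - y * y  # derivative of tanh at w * x
    return grad_out * local * w, grad_out * local * x


def grad_full(x0, weights):
    """Standard backprop: store the input of every layer."""
    acts = [x0]
    for w in weights:
        acts.append(forward_layer(acts[-1], w))
    grad = 1.0
    grads_w = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad, grads_w[i] = backward_layer(acts[i], weights[i], grad)
    return acts[-1], grads_w, len(acts)


def grad_checkpointed(x0, weights, segment):
    """Backprop that stores one activation per segment and recomputes the rest."""
    n = len(weights)
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # keep only the input of each segment
        x = forward_layer(x, w)
    output = x

    grad = 1.0
    grads_w = [0.0] * n
    peak = len(checkpoints)  # rough count of values held at once
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        # Recompute the layer inputs inside this segment (the extra compute).
        acts = [checkpoints[start]]
        for i in range(start, end - 1):
            acts.append(forward_layer(acts[-1], weights[i]))
        peak = max(peak, len(checkpoints) + len(acts))
        for i in reversed(range(start, end)):
            grad, grads_w[i] = backward_layer(acts[i - start], weights[i], grad)
    return output, grads_w, peak


if __name__ == "__main__":
    ws = [0.5 + 0.1 * i for i in range(16)]
    _, g_full, stored_full = grad_full(0.3, ws)
    _, g_ckpt, stored_ckpt = grad_checkpointed(0.3, ws, segment=4)
    print("stored values:", stored_full, "vs", stored_ckpt)
```

Both functions produce the same gradients; the checkpointed version holds fewer values at a time and pays for it with a second forward pass through each segment. The real package works on TensorFlow graphs, and its memory and time figures depend on the model.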
TensorFlow Probability – a probabilistic programming toolbox for machine learning researchers and practitioners to quickly and reliably build sophisticated models that leverage state-of-the-art hardware.
A Python script that downloads hundreds of images from Google Images. The code is ready to run!
Looking to Listen: Audio-Visual Speech Separation – people are remarkably good at focusing their attention on a particular person in a noisy environment, mentally “muting” all other voices and sounds. Known as the cocktail party effect, this capability comes naturally to us humans. However, automatic speech separation (separating an audio signal into its individual speech sources), while a well-studied problem, remains a significant challenge for computers.
Towards a Virtual Stuntman – Motion control problems have become standard benchmarks for reinforcement learning, and deep RL methods have been shown to be effective for a diverse suite of tasks ranging from manipulation to locomotion. However, characters trained with deep RL often exhibit unnatural behaviours, bearing artifacts such as jittering, asymmetric gaits, and excessive movement of limbs. Can we train our characters to produce more natural behaviours?
Generative Adversarial Networks for Extreme Learned Image Compression – a framework for extreme learned image compression based on Generative Adversarial Networks (GANs), obtaining visually pleasing images at significantly lower bitrates than previous methods. This is made possible by a GAN formulation of learned compression combined with a generator/decoder that operates on the full-resolution image and is trained together with a multi-scale discriminator. Additionally, the method can fully synthesize unimportant regions of the decoded image, such as streets and trees, from a semantic label map extracted from the original image, so only the preserved region and the semantic label map need to be stored. A user study confirms that at low bitrates this approach significantly outperforms state-of-the-art methods, saving up to 67% of the bits compared to the next-best method, BPG.
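To get a feel for what a saving of "up to 67%" means in practice, here is a small arithmetic sketch. The baseline bitrate and image size in the usage example are made-up numbers for illustration, not figures from the paper, and the helper names are hypothetical.

```python
def bitrate_after_savings(baseline_bpp, savings):
    """Bits per pixel left after saving a given fraction of the baseline."""
    if not 0.0 <= savings <= 1.0:
        raise ValueError("savings must be a fraction between 0 and 1")
    return baseline_bpp * (1.0 - savings)


def bitrate_savings(baseline_bpp, method_bpp):
    """Fraction of bits saved by a method relative to a baseline."""
    if baseline_bpp <= 0:
        raise ValueError("baseline_bpp must be positive")
    return 1.0 - method_bpp / baseline_bpp


def compressed_size_bytes(bpp, width, height):
    """Approximate file size in bytes for an image at a given bits per pixel."""
    return bpp * width * height / 8.0


if __name__ == "__main__":
    # Hypothetical numbers: a baseline codec at 0.09 bpp on a 2048x1024 image.
    baseline = 0.09
    ours = bitrate_after_savings(baseline, 0.67)
    print("baseline bytes:", compressed_size_bytes(baseline, 2048, 1024))
    print("after savings: ", compressed_size_bytes(ours, 2048, 1024))
```

Under these assumed numbers, a 67% saving means the same image takes roughly a third of the bytes the baseline needs.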
EPIC-Kitchens – the largest dataset in first-person (egocentric) vision: multi-faceted, non-scripted recordings in native environments, i.e. the wearers’ homes, capturing all daily activities in the kitchen over multiple days. Annotations are collected using a novel ‘live’ audio commentary approach. 55 hours of recording in Full HD at 60fps, 11.5M frames.