Less than a year ago, we introduced Cassiopeia, an audio source separation solution that surpassed Rocknet, the original LALAL.AI neural network, in both quality and accuracy. Since then, we've had tons of new ideas about how Cassiopeia's algorithm could be improved, and the development of machine learning and artificial intelligence has not stood still. To bring LALAL.AI closer to our idea of an ideal separator, we did a lot of research and identified the most promising new approaches, which became the foundation of the next-generation neural network called Phoenix.

Compared to the previous neural network, Phoenix processes and splits files into stems twice as fast, delivers higher-quality vocal extraction, handles backing vocals much more carefully, and produces significantly fewer artifacts.

Below you can find detailed information about the new neural network and how it came to be, evaluate the quality of separation on concrete examples, and learn about our plans for the development of the service.

Striving For Better Quality

When LALAL.AI was still using Rocknet, our first-generation neural network, we analyzed the network's performance and studied the existing popular AI-based stem separators in order to improve the stem splitting quality of our service. We came to the conclusion that all solutions, including ours, have the same key flaw: they focus on amplitude processing and ignore the phase aspects. This finding was a prerequisite for the creation of a new architecture, subsequently used for Cassiopeia, LALAL.AI's second-generation neural network.

As mentioned in our Cassiopeia article, ignoring the phase aspects leads to artifacts that sound alike in all of the aforementioned solutions. These artifacts produce a dry, plastic sound, and sometimes effects similar to those generated by sound processors. Both are quite unpleasant to the ear and are perceived as foreign in isolated stems.

There is a reason why most solutions don't take phase into account. It is an extremely difficult task because the behavior of the phase is specific and instrument-dependent. Attempts to solve the phase inclusion problem head-on, for instance by repeating the way amplitude processing is performed, don't work. This is demonstrated in open publications on the topic and confirmed by our own investigations.

Cassiopeia - A Unique Solution

Despite the complexity of the problem, our researchers managed to find the key to it. They created a neural network architecture that could work in the complex number field (all earlier solutions worked in the field of real numbers) and process both the amplitude and phase parts of the input mix and the output stems simultaneously. It was a completely new network structure that had very little in common with Rocknet, the neural network LALAL.AI operated on at the time. The new neural network was called Cassiopeia.

Cassiopeia provided a leap in separation quality, outperforming Rocknet by a whole 1 dB of SDR (signal-to-distortion ratio) in vocal isolation. In addition to the superiority in formal quality, the stems generated by Cassiopeia had a much fuller, denser and more pleasing sound due to the correct phase processing.

However, the higher quality came at a high price. Cassiopeia is a far more complex network than Rocknet. Just to isolate the vocal channel, Cassiopeia's training required almost a terabyte of data and an amount of machine time comparable to a year of operation of an average gaming computer.

The researchers at LALAL.AI have an idea of the quality that stem separation solutions can achieve. In spite of the greatly increased separation quality, Cassiopeia still didn't meet our vision of the perfect stem separator.

During the development of Cassiopeia, we came up with a lot of ideas on how its algorithm could be improved. We also closely followed the active development of machine learning and artificial intelligence. It seemed that some of the approaches invented in the fields of image and natural language processing could be applied to audio as well, including source separation. We did a lot of research to determine the best approaches and ideas, which formed the basis of the next-generation neural network called Phoenix. The ideas underlying the new neural network can be divided into three groups.

As the neural network processes an audio file, it divides a track into segments and "observes" each of them one at a time. The main thing in the first group is the increase in the amount of data that the network "observes" at a time to figure out the composition of the instruments and isolate the required ones, such as the voice or the drums. For Cassiopeia, the size of the segment is 1 second; for Phoenix, it's 8 seconds.
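To give a feel for what the segment sizes mentioned in this article (1 second for Cassiopeia, 8 seconds for Phoenix) mean in practice, here is a minimal sketch of cutting a track into fixed-length segments. It is only an illustration and not the actual LALAL.AI pipeline: the function name `split_into_segments`, the 44,100 Hz sample rate, the non-overlapping windows and the zero-padding of the last segment are all assumptions made for this example.

```python
def split_into_segments(samples, sample_rate, segment_seconds):
    """Cut a mono signal into consecutive, non-overlapping segments.

    Hypothetical helper for illustration only. The last segment is
    zero-padded so that every segment has the same length.
    """
    segment_len = int(round(segment_seconds * sample_rate))
    if segment_len <= 0:
        raise ValueError("a segment must contain at least one sample")

    segments = []
    for start in range(0, len(samples), segment_len):
        chunk = list(samples[start:start + segment_len])
        # Pad the final, shorter chunk with silence (an assumption).
        chunk.extend([0.0] * (segment_len - len(chunk)))
        segments.append(chunk)
    return segments


SAMPLE_RATE = 44100                 # assumed sample rate, in Hz
track = [0.0] * (20 * SAMPLE_RATE)  # 20 seconds of silence as a stand-in track

one_second = split_into_segments(track, SAMPLE_RATE, 1.0)    # Cassiopeia-sized
eight_second = split_into_segments(track, SAMPLE_RATE, 8.0)  # Phoenix-sized

print(len(one_second), len(one_second[0]))      # 20 44100
print(len(eight_second), len(eight_second[0]))  # 3 352800
```

With the longer window, each segment carries eight times as many samples, so whatever model looks at it has more context about which instruments are playing at that moment.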