S/N:05 – https://digitalproduction.com/2023/10/15/s-n05/ (15 October 2023)
A twenty-centimetre-tall robot has to compete against its rival, a cat, and save a young girl from an attack by a monstrous machine. The idea for the short film "S/N:05" (Sinos) came about when I read Catrin Misselhorn's book "Grundfragen der Maschinenethik" in 2019 and imagined a scene in which a small machine fights a cat, a symbol of freedom. Which position do we take? That of the clunky machine or that of the cute cat? At the time, however, I didn't realise that the topic of artificial intelligence would experience such a boom with ChatGPT and co. All the more reason to take another look at Catrin's Reclam book.

By Hans Jakob Harms

Jonas (script) and I worked out the story in detail and convinced the FFF Bayern to support the project with an animatic. It was important to Noah (camera) and me to achieve a dark, cool tone in the visual design, as we couldn’t tell the whole dystopia in nine minutes and with the tight budget. To achieve the look we wanted, we decided to shoot on a RED Gemini 5K with anamorphic lenses. The film mainly takes place in two locations, a children’s room and a remote house, which Andreas (set) decorated according to a precise construction plan and colour concept. Then Ivetta (producer) and I planned the shoot before we threw ourselves into the fun. Eight days of filming. With cat, child and CGI – you can find the rest of the key data here: is.gd/sinos_imdb.

Maya, our leading actress, impressed us with her talent both at the casting and during filming. Although she was only twelve years old at the time, she knew exactly where to go. It was a lot of fun working with her. Up until the first day of filming, we were very worried that Gizmo the cat wouldn’t be in the film. But Barbara’s (animal trainer) stars are used to the hustle and bustle on set. We did our best to create a feel-good atmosphere for the little one, and lo and behold, Gizmo delivered shot after shot. Now there was just one more hurdle: Sinos.

Shooting

It was important to us that the film looked like a snapshot from the future, as realistic as possible. To achieve this, we worked on the character of Sinos and the monstrous machine long before filming. Alan (metal artist) built the robot from scrap parts based on a sketch by Michael (illustrator). Using a small scaffold that we retouched in post, we were able to move Sinos’ head and limbs. So we used the real Sinos in the close-ups and a CGI double in the wide shots. As the images in the film often follow each other directly, we needed an exact digital copy.

This is where Markus comes into play. He recreated the figure in Cinema 4D true to the original. He used a scan and the real model as a reference and the result was amazing. In the end, we were even able to mix the real model with the 3D model. For example, we unscrewed the arms from the real Sinos during the shoot and animated them in post.
It was nice to see how the character gained personality with each step during the creation process. Based on my description and some references, Michael drew the first picture of Sinos and gave him the features of an old, limping man. From there we could have gone straight to 3D and made the film with a digital model. But Alan’s sculpture, which he assembled from metal parts (such as pliers, screwdrivers, pistons, etc.) and which is subject to the laws of physics, breathed a different essence into the little rascal. Now he looks more like a chunky baby who is struggling to hold his screws together. An ideal teammate for Maya, who plays his mother in the film.

Unfortunately, we were unable to build the large machine as a physical model. We did have the idea of constructing it from individual elements, but the budget was too tight. This time, Markus created the 3D model directly from a concept by Michael and refined it according to his ideas. During filming, we used tripods as a reference to estimate the size of the machine for the shots. This allowed the actors to visualise the machine with a lot of imagination.

Concept art of the machine and Sinos by Michael Haggenmüller

With my background in post-production, I was able to take on a dual role on set as director and VFX supervisor. A static camera and lots of references provided a solid basis for post-production, so that the 3D elements could be combined well with the filmed material. I realised the animation of the machines in Cinema 4D and rendered them with Arnold. The renderings were then composited onto the plates in After Effects and delivered to Resolve.

3D models of the machine and Sinos by Markus Kooss

The large number of VFX shots and the tight budget made detailed planning in advance essential. Even though the possibilities of digital post-production are becoming easier and more accessible to everyone these days, poor and often unnecessary visual effects can take away the authenticity of a film. We scrutinised the VFX twice for every shot and only shot what was necessary to tell the story. But of course, in this science fiction film with fantasy characters, there were still a lot of images that had to be supplemented with renderings.

Preview of the machine on the tracked plate


To avoid headaches in post-production, it helps to have a storyboard in which the tasks for after filming are precisely specified. This makes it possible to estimate and possibly reduce the scope of post-production in the pre-production phase. Of course, it is an advantage if you have mastered the craft yourself or at least have a basic understanding of it. Otherwise, I would always involve an experienced VFX supervisor in the planning.

A film lives through its actors and the roles they embody, and VFX are known to complicate the acting during filming. That’s why I distributed the focal points when breaking the scenes down into shots. In one shot, I focussed on the actors, giving them the freedom to perform. In other shots, however, the visual effects took centre stage and the entire team concentrated on the clean execution. Of course, it is particularly appealing to merge an elaborate visual design with an emotional performance in one shot. But the problems that arise during staging and the high, often incalculable effort involved in post-production should not be underestimated, especially for an independent film.

I acquired my post-production skills through my own film projects. After school, I made my own short films, in which I always had to take on several positions. As a few days of shooting were often followed by several weeks of post-production, I honed my skills in the areas of editing and animation in particular. I am first and foremost a filmmaker and like to tell stories that don’t require elaborate VFX. That’s why it’s important to me to work with programmes that produce meaningful results quickly.

Tracking markers to replace the window in post.

I want to keep the technical effort in post as low as possible and emphasise creativity. Cinema 4D offers a user-friendly introduction to the 3D world and is similar in structure to the standard Adobe products. At the same time, it is equipped for most challenges in terms of scope. In special areas (e.g. simulations) there are software solutions that are better equipped. However, when I reach my limits, I usually look directly for artists who can help me with their specialisation.


Marie and Thomas helped me with the organic animation of a deer. Tapan tracked a drone shot that Cinema 4D couldn’t solve. Tanver helped me rotoscope the actors and Nagendra removed a rig from Sinos’ eyes that I couldn’t retouch in After Effects. Of course, as with almost every project, there were problems that can get on your nerves. My tip: ask Reddit. The community is full of professionals who are happy to help if they have a solution.
I kept the animation pipeline as simple as possible. With a rough animation of a low-poly model of Sinos on the tracked plates, I was able to determine the timing in the cut and play out a picture lock with preview renderings. I then linked the rig of the low-poly model with that of the high-poly model for the final rendering. In this way, the turnaround times for the final scenes were kept within limits.
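
As an aside, the retargeting step can be pictured with a tiny conceptual sketch (plain Python with purely illustrative names, not the actual Cinema 4D setup): every keyframe blocked out on a proxy controller is simply copied to the matching controller of the high-poly rig.

```python
# Conceptual sketch only: keyframes from the approved low-poly blocking are
# copied onto the matching controllers of the high-poly rig.

# keyframes per controller: {controller: [(frame, (x, y, z)), ...]}
proxy_anim = {
    "proxy_head_ctrl":  [(0, (0.0, 12.0, 0.0)), (24, (0.0, 12.5, 1.0))],
    "proxy_arm_L_ctrl": [(0, (3.0, 8.0, 0.0)), (24, (3.5, 8.0, 0.5))],
}

# illustrative name mapping between the proxy rig and the final rig
rig_map = {"proxy_head_ctrl": "hi_head_ctrl", "proxy_arm_L_ctrl": "hi_arm_L_ctrl"}

def retarget(anim, mapping):
    """Copy every proxy keyframe onto the corresponding high-poly controller."""
    return {mapping[c]: list(keys) for c, keys in anim.items() if c in mapping}

for ctrl, keys in retarget(proxy_anim, rig_map).items():
    print(ctrl, keys)
```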

Green blocks as stand-ins for Sinos
The stand-in is replaced by Sinos.


The nature of Sinos, a robot made of rusty metal parts, worked in my favour for the animation. Simple mechanical movements and the combination with the real model enabled me to create the animation myself. Rigging and animating the monstrous machine was a little more complex, but manageable thanks to its few controls.
Colleagues asked me why I chose Arnold as the render engine. Markus had prepared the 3D model with Redshift materials. But I think GPU renderings are not yet “on par” with classic CPU renderings. Technically, I don’t know much about this area. That’s why I converted Markus’ materials to Arnold and compared them with the Redshift renderings. We both liked the look of Arnold better, and so the decision was quite easy.
The soundtrack by Giovanni, the sound design by Alexander and Florian and the colour correction by Lukas and Nadir put the finishing touches to the images and brought the menacing world to life. I would like to take this opportunity to thank everyone who was involved in this project and helped make this short film come to life.

Conclusion

We have submitted the finished film to several festivals and are waiting to see if it finds an audience there. It will then be made freely available to everyone on the internet. A trailer is available on the Instagram page. We are currently working on a screenplay for a feature film to continue the story of Sinos in the dystopia.

Topaz Video AI Revisited – Version 4 – https://digitalproduction.com/2023/10/15/topaz-video-ai-revisited-version-4/ (15 October 2023)
We only took a look at version 3 of Topaz Video AI (TVAI for short) at the beginning of the year, which had already been improved in terms of user-friendliness, but as we all know, AI is constantly learning. In version 4, the user interface has been revised again and technical developments have also been added in recent months. However, the latter also applies to DaVinci Resolve (DR), which has just been released in a new version, so it's worth taking another comparative look. This time we have dug up material from the analogue era. In addition to very poor quality video from amateur cameras, there is also video from a very high-quality Ikegami camera, which was only recorded on Super VHS after a professional recorder failed.
The UI has been thoroughly overhauled.

Part of Topaz Labs’ business model is to constantly keep users happy with updates, as these updates are only included for the first year after purchase. After that, users have to pay 99 US dollars a year if they want to keep up. Unlike Adobe, however, your own work is not held hostage: you can continue to work with the last licensed version. When the software reports an update, it fairly points out if that update is no longer included in your licence. There is also a trial licence for one month, which you can use to thoroughly test the software for your own needs and with your hardware. Since TVAI 3.0.3 from the last test, not only have new features been developed, but a number of bugs have also been fixed. However, similar to DR 18.6, a few other bugs have been added with the very fresh version 4.0, especially in the new user interface and the Nyx 2 model.

Stabilisation

First of all, we would like to tick off the less successful features. These include stabilisation, which was still listed as a beta in our last article. Although this is handled very quickly by both programmes, it doesn’t work miracles. We tested it with unsteady material from a digital handheld camera, whose sensor is not particularly fast.

With noticeable motion blur after stabilisation, Themis often does an amazing job in Deblur.


Although we had activated the treatment of the RS with “Rolling Shutter Correction” in TVAI, quite a lot of image distortion remained in the peripheral areas, especially when set to “Full Frame”. TVAI attempts to reconstruct missing edge areas instead of zooming into the image. Unfortunately, this leads to quite a lot of “jelly” in these edge areas. The results from Resolve were no better. In contrast to DR, rendering processes do not block further work on other tasks in the programme.

Our findings from the article on stabilisation with support from gyroscope data (DP 22:06), where DR gets a perfect grip on the RS, at least for the camera’s own movements, still apply here. We also tested this again with a clip from Sony’s Catalyst Browse, in which vibrations were stabilised with gyro data. Typically, short blurs become clearly visible with each shock. TVAI has a Deblur model called Themis for this purpose. This gets such motion blur under control quite well in some cases, but is still heavily dependent on the subject. It can sometimes have the opposite effect on contours with high contrast, and unfortunately there is no fine-tuning for this.

Slow motion

Both programmes offer the synthesis of intermediate images for subsequent slow motion; in DR, the highest-quality but also slowest algorithm is called “Speed Warp”. As this worked best with very high-quality source material in both programs during the last test, this time we also deliberately tried out the lower-quality, upscaled analogue material (see below). In detail, both programmes show almost identical problems with crossing movements, as is also known from Optical Flow. Nevertheless, the results are so good that they can be used for many purposes. Speed Warp showed slightly more artefacts in detail and, in rare cases, slight flickering in textures. Nevertheless, the slow motion alone is certainly no reason to buy TVAI if you already have DR Studio, as the two programmes are also about equal in terms of processing times.
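
For anyone curious what the classical technique mentioned above looks like in practice, here is a rough sketch of optical-flow frame interpolation with OpenCV. This is emphatically not the neural model behind TVAI or Speed Warp, just the textbook approach, and it shows why crossing movements and occlusions cause artefacts.

```python
# Rough sketch of classical optical-flow frame interpolation.
# Requires opencv-python and numpy; file names are placeholders.
import cv2
import numpy as np

prev_bgr = cv2.imread("frame_0001.png")
next_bgr = cv2.imread("frame_0002.png")
prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)

# Dense flow from the previous to the next frame (Farnebaeck algorithm)
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

h, w = prev_gray.shape
grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                             np.arange(h, dtype=np.float32))

# Sample both neighbours half a motion step towards the middle and blend them.
# Occlusions and crossing movements break this simple scheme, which is exactly
# where the artefacts described above come from.
from_next = cv2.remap(next_bgr, grid_x + 0.5 * flow[..., 0],
                      grid_y + 0.5 * flow[..., 1], cv2.INTER_LINEAR)
from_prev = cv2.remap(prev_bgr, grid_x - 0.5 * flow[..., 0],
                      grid_y - 0.5 * flow[..., 1], cv2.INTER_LINEAR)
middle = cv2.addWeighted(from_prev, 0.5, from_next, 0.5, 0)

cv2.imwrite("frame_0001_5.png", middle)
```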

Deinterlace and upscaling

If the source material is interlaced and only has SD resolution, conversion to progressive HD actually seems impossible. But here the AI should show what it can achieve by looking at the neighbouring images. There are two models recommended in TVAI for this purpose: Dione and Iris. With Dione we tried Dione TV and Dione Dehalo, with Iris we simply used the LQ version. Although this means “Low Quality”, it refers to the source material and not the result. Dione Dehalo is already able to achieve a more appealing look by reducing the ugly edge sharpening of the amateur camera.


Unfortunately, just like DR’s neural de-interlacer, it still struggles with staircase artefacts on critical curves in the image. The current version of TVAI now also offers inverse telecine, but there are still problems (at least they are correctly documented). In addition, many editing programmes can do this at least as well. A new option is the setting to pure black and white, so that the AI does not hallucinate colour with corresponding sources.

With Iris LQ, on the other hand, we were somewhat taken aback, and not just in terms of the perfect de-interlacing. The HD version looked much better than the DR version, despite the activation of “Superscale 2x Enhanced” and the neural deinterlacer. Although we had not yet done any fine-tuning here and had simply rendered with the default values, the edge sharpening (the halos) was also successfully removed (this did not play a major role with the professional camera above). We have never seen better upscaling of analogue material with a stand-alone program.

In contrast to some online offerings with AI processing on a third-party server, nobody has to hand over their material to TVAI because the AI models are still downloaded to their own computer.

In addition, the field sequence was reliably recognised by the automatic system, but can be set manually if required. Even if the result sometimes looks a little flat, it is not embarrassing as archive material in a modern environment. Faces are reconstructed better by Iris LQ than by Proteus. TVAI can also add a little ‘grain’ if desired, which makes the images look a little more natural again. The other big surprise was the speed. Until now, one of the criticisms levelled at TVAI was the sheer endless computing time, but even on a modest Apple MacBook M1 Pro, the better Iris LQ was generated at 5 frames per second. The less successful Dione TV model even managed almost 10, while DR only delivered 2.3 frames per second.

For the first time in our tests, TVAI was clearly superior to DR’s on-board tools, not only in terms of quality but also in terms of speed. Having become overconfident, we even dared to blow up the material to UHD with Iris LQ. However, the AI then invented a little too much graphic detail, and slight interlace artefacts returned at the archway. We had already tried to convert HD to UHD in an earlier test. However, the recognisable quality gain compared to DR shrank with high-quality source material, such as the conversion from good HD to UHD. We then needed “pixel peeping” to recognise the differences. If there is only a small amount of material to convert, many people will not want to change the programme. A suitable use case would be the upscaling of clips from a modern camera if the camera is not capable of full resolution in slow motion. Some CGI studios use it to upscale material in 2K to 4K because they can avoid gigantic render times.

Input and output

Incidentally, one of our sources was available as MPEG, which was read by TVAI without any problems, whereas we first had to have it repackaged as MOV for DR. This is because TVAI is based on ffmpeg and therefore understands an enormous number of formats. It can also deliver the ProRes format often requested by customers on PCs, which is not possible with DR. Although this is not approved by Apple, it works perfectly everywhere.

On the Mac, however, TVAI uses the system routines and should therefore produce completely “legal” ProRes. In the new version, it offers the setting for adaptive compression for H.264 and H.265. However, the three settings for this are somewhat unfortunate: High delivers extremely large files, while Medium compresses far too much and is not very different from Low. Here we would rather recommend selecting a constant bit rate until Topaz has optimised the values. FFV1 is also available for the output of an archive copy. The export settings are now directly accessible and with “Export as…” you can assign more meaningful names.
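
Because TVAI is built on FFmpeg, comparable exports can be reproduced with plain FFmpeg calls. Here is a sketch of the three output flavours discussed above; the flags are standard FFmpeg options, not the command line TVAI builds internally, and the file names are placeholders.

```python
# Illustrative FFmpeg exports, driven from Python for convenience.
import subprocess

SRC = "restored_master.mov"  # placeholder for the upscaled intermediate

# ProRes 422 HQ via the prores_ks software encoder (the route available on PCs)
subprocess.run(["ffmpeg", "-i", SRC, "-c:v", "prores_ks", "-profile:v", "3",
                "-c:a", "copy", "prores_hq.mov"], check=True)

# H.265 with a constant bit rate instead of the unfortunate High/Medium/Low presets
subprocess.run(["ffmpeg", "-i", SRC, "-c:v", "libx265", "-b:v", "40M",
                "-maxrate", "40M", "-bufsize", "80M",
                "-c:a", "aac", "delivery_h265.mp4"], check=True)

# Lossless FFV1 for the archive copy
subprocess.run(["ffmpeg", "-i", SRC, "-c:v", "ffv1", "-level", "3",
                "-c:a", "copy", "archive_ffv1.mkv"], check=True)
```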

Improved preview

With a program of this type, it is essential to optimise editing for your own raw material and customer requirements. Until now, it was quite cumbersome to compare the results of different models and manual adjustments. Now two previews, or the original and a preview, can be played side by side or in split view. As usual, not all veteran users are happy with the new interface. In our opinion, however, there are significant improvements, e.g. TVAI now remembers the settings for zoom, split and position of previously edited views when the original is changed. Unfortunately, it is not possible to save entire projects: when you leave the programme, everything is gone. The 4.0 version is not yet completely free of bugs, but the latest releases of version 3 are always available for download.
Some tools now work completely differently, such as Trim, or are subject to a few restrictions in the direct display, such as cropping. Some will look in vain for the crop tool, as it has moved to the top right and is only available when the original is displayed in full screen. The timeline now offers a zoom and the trimming tool also determines the playback area for previews. So don’t forget to adjust this for export if necessary. Version 3 users should look in the small menus with the three dots if they are missing something. The option of having two models edit the same clip one after the other must first be activated in the preferences.

Occasionally the synchronisation of clips still fails.

Performance requirements

We also wanted to know whether it would make sense to use TVAI only to remove the interlacing and edge sharpening from the material and then scale with DR. With pure de-interlacing, Iris LQ even manages 9 fps, whereby the laptop did not appear to be fully utilised, while with DR the GPU was running at full capacity.
However, this method was somewhat weaker in terms of quality, more cumbersome and not any faster. In contrast to our last test, TVAI has now been optimised for Apple Silicon and significantly outperforms older Intel Macs, even if the minimum hardware requirements for both PC and Mac remain modest and date back to around 2014/15 (AVX2 instructions are required for PC). For occasional, short restorations with overnight rendering, most computers that can work with video at all should be sufficient. They can also be better utilised if you allow multiple instances; our GPU was almost fully utilised with 2 processes.

The built-in benchmark shows you exactly what your hardware can do.


Unfortunately, even a Mac Studio Ultra cannot compete with TVAI on the best PCs for more intensive use. If such restoration jobs regularly come your way, you are better off buying a PC with the best Nvidia GPU that is financially within reach. For example, an AMD Ryzen 9 5950X with 16 cores and an NVIDIA GeForce RTX 4090 can convert from HD to UHD at over 13 frames per second. CPUs with more cores or multiple GPUs provide even more of a boost, but the latter doesn’t always seem to work. In addition, not all models work with Intel Arc GPUs yet; these will be included in the coming weeks. Details can be found in the user forum at: bit.ly/topaz_bench. Please note that the output format in the list is already full HD, in some cases even UHD. As one licence can be used on two devices (one PC and one Mac), such a powerhouse can also work away on its own.

Comment

Obviously, really bad analogue material is TVAI’s real domain. Here we were able to observe a clearly visible superiority in de-interlacing and upscaling compared to DaVinci Resolve. The speed is now also comparable, in some cases even better. The only wish that users who only rarely have a job with critical archive material could possibly still have: how about a low-cost monthly licence?

From the left… – https://digitalproduction.com/2024/08/15/von-links-her/ (15 August 2024)
Nomen est omen. When reading the company name "Left Angle", anyone with any mathematical knowledge will think: "There's no such thing!" And it's a bit like that with its product, Autograph: a new compositing and motion design application that stands up to comparison with Nuke and After Effects. That doesn't exist either. Or does it?

Autograph has been on the market since the beginning of 2023 and is distributed by the plug-in manufacturer Re:Vision Effects, but is produced by the French company Left Angle. The software is available on Windows, Mac and Linux and can be purchased either as a subscription model (monthly and annually) or as a perpetual licence. In addition, there are three versions of Autograph: Creator, Studio and Render. While Render is really intended as a pure command line rendering tool, Creator is a licence for freelancers and smaller companies with less than 1 million US dollars in capital or income. For those who land above that (congratulations!) there is the Studio licence, which also includes the Python API. A monthly subscription for Autograph Creator starts at 35 US dollars, for Studio it is then 60 US dollars. Those who prefer to buy their licence will then pay a price of 945 US dollars for Creator and 1,795 US dollars for Studio. Before that, however, you can easily get a first impression in a ninety-day trial – to be found at left-angle.com.

Unfortunately, the Node Graph is missing, but the interface can be customised to your liking.

The first impression

When you first look at the interface, you quickly think that you are simply sitting in front of a darker After Effects. And indeed, anyone who is familiar with Adobe’s motion design software will quickly find their way around Autograph. The viewer in the centre, properties on the right, the project window on the left and the timeline at the bottom. Similar to Nuke, you can create your own workspaces for different tasks so that you can customise the layout to suit your needs. However, there are only very few types of panels, which is why the design options are quite limited. Footage can be imported into the project via the project panel and compositions can be created.

However, as with After Effects, the real centre of attention is the timeline. Individual layers can be dragged in here and their parameters can be expanded and edited accordingly. The filter function, which helps to isolate not only individual layers but also specific operations, is very convenient. These filters are also assigned to hotkeys and can be combined with text input. As an artist, you can then isolate the position value of a specific layer quite quickly, for example, in order to animate it. All parameters that are displayed in the timeline are also available in the Properties panel. This may seem redundant, but users who don’t want to scroll through thirty layers will find an alternative workflow here. And if you only want to work in the timeline, you can modify your workspace accordingly. Speaking of parameters: those who prefer to enter values gesturally rather than numerically will be pleased with the hotkeys that adjust the intensity of the value input. For example, Shift lets you “drag” larger values, while Ctrl ensures finer input.

Autograph does one thing fundamentally differently to After Effects: the Curve Editor is not integrated into the already fully loaded timeline, but is available as a separate panel. The viewer, however, will undoubtedly remind many experienced artists of Nuke. No wonder: under the bonnet, Autograph processes all files in 32-bit float, and so the viewer has sliders that apply not only gamma and gain but also saturation as a post process in the viewer, so that you can visualise even the last highlight in the EXR file. The viewer has two inputs, where a HUD and the blend modes known from Nuke are also available, for example to compare a reference with the current comp.
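
For the curious, the maths behind such viewer sliders is standard fare and not specific to Autograph; a generic sketch in a few lines of Python (not Left Angle's actual code) looks like this:

```python
# Typical viewer-style gain / gamma / saturation post-process on linear
# 32-bit float pixels. Generic math for illustration only.
import numpy as np

def view_transform(rgb, gain=1.0, gamma=1.0, saturation=1.0):
    """rgb: float32 array of shape (H, W, 3) in linear light."""
    out = rgb * gain                                   # exposure / gain
    luma = out @ np.array([0.2126, 0.7152, 0.0722], dtype=np.float32)
    out = luma[..., None] + saturation * (out - luma[..., None])
    return np.clip(out, 0.0, None) ** (1.0 / gamma)    # display gamma

# e.g. pull the gain down to inspect the last highlight in an EXR preview:
# preview = view_transform(exr_pixels, gain=0.25)
```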

On to something new…

Up to this point, the interface looks familiar. But let’s take a look at the things that Autograph does differently from the two software packages between which it is positioned. The first thing that stands out is the use of the HUD. While Nuke’s on-screen controls are often very small, Autograph’s are not hidden at all. They are clearly visible and an important part of Autograph’s fast and fluid operation. The fact that right-clicking and the associated context menus are largely dispensed with also contributes to this fast interface. Pen and tablet users in particular will really appreciate this. But it is also the absence of clicks that makes the interface incredibly fast. For example, there is an option (which can of course be switched off) to briefly highlight layers that can be clicked on in the viewer so that you can see whether you have made the right selection. And how nicely the alpha channel is respected here – others can take a leaf out of this book. Hovering over drop-down menus works in a similar way. The selected parameter is displayed as a preview, so to speak, before you have even clicked on it. Not as a thumbnail, but actually applied to the current comp in the viewer. This works for the layer blend modes, for example, but also for the more than fifty interpolation presets. So you can run your animation in a loop and simply scroll through the presets and see what these different interpolations would look like.

No problem thanks to full floating point support: motion vector tracking

Let’s stay on the subject of animation for a moment: Autograph is also trying to rethink things here. All image content always has its pivot in the centre of the image, and the zero point of the coordinate system is also in the centre of the image. This may sound banal, but it is enormously helpful when it comes to adapting an animation for different output formats. It also helps tremendously when creating symmetrical animations. Autograph has a sophisticated system for dynamically linking values to each other without the artist needing a maths degree to write the corresponding expressions. Autograph uses so-called modifiers, which can simply be dragged onto the corresponding layer like an effect in After Effects. Somewhat confusingly, Autograph also refers to classic effects such as keyers or colour correctors as modifiers, but this shows how important they are for the workflow in Autograph. In addition to the modifiers, there are generators that can produce everything from images, i.e. classic colour areas or gradients, to text and numerical data and even 3D primitives.

This brings us to the 3D environment. It is based entirely on USD, and as the USD files are only linked and not imported into the Autograph project, common formats such as Alembic or the rather old FBX are not supported. As its render engine, Autograph uses Filament, the real-time rendering engine developed by Google. As a result, the 3D environment is quite fast and is likely to bring tears to the eyes of some Nuke artists. Camera and lights can simply be dragged into the timeline, and if 3D layers are stacked directly under each other, they are treated and rendered as a single 3D scene. This makes it possible to mix 2D and 3D elements in the same timeline without having to create precomps. And thanks to Filament, post effects such as ambient occlusion, fog and depth of field are also available.

Linking external data, for example from Excel spreadsheets, is also supported.

What is missing?

The developers at Left Angle do not seem to lack enthusiasm, as can be seen in the interview with Francois Grassard on page 30. For a first version, Autograph makes a well-rounded impression. OpenFX plug-ins are supported, even if at the moment only those from its own distributor, Re:Vision FX, are available. The extent to which plug-in manufacturers support the newcomer will be seen in the coming months. However, Nuke and After Effects, the two programmes that Autograph is competing with, both have an ecosystem of plug-ins and gizmos that has been growing for several years, not to mention a very large user base, which naturally does not exist right from the start. Nuke artists will miss their Node Graph the most. Even if the toolset is impressive, the speed of the interface and software is blazingly fast and the rendering is super-precise, the logic in a timeline is of course different from that in a node graph. Motion designers coming from After Effects will certainly find it easier to familiarise themselves with Autograph.

Conclusion

First impressions can often be deceiving. Autograph is far more than the After Effects clone that some people thought it was at launch. The user interface in particular has some innovative surprises and concepts in store that the developers of the long-established software either didn’t dare to tackle due to their user base or simply didn’t think of. This breath of fresh air is something that has definitely been missing in recent years. It’s reminiscent of the days of 5D Cyborg and Socratto, when innovative software forced established developers to rethink things and push their software forward. Competition stimulates business. That alone is a reason to be happy. Autograph is already interesting for a certain group of artists. Anyone who moves between motion design and compositing or wants to make the leap from motion design to compositing will find Autograph a viable option without having to learn a completely new paradigm such as working in a node graph. Finishing artists can find a complement to Flame or Resolve without having to commit to a monthly cloud subscription. And if the developers at Left Angle continue with the same verve, this circle will certainly grow even larger.

Shortcuts for more connection – https://digitalproduction.com/2023/10/31/shortcuts-for-more-connection/ (31 October 2023)
If you work with several laptops, you need help. Otherwise you will end up shuttling the USB mouse back and forth at some point. The MX Keys S Combo from Logitech with mouse and keyboard could be the solution. Views of a n00b who just wants to work efficiently.

by Pia Röder

Okay, I admit it. I hate (h.a.t.e.) installing software and getting it to work. I’m a copywriter and use Word, PowerPoint and, if things get stupid, Excel. Unfortunately, there are two laptops here that I have permanently in the office, and slowly (after 5 years) I’m getting really fed up with having to move the USB mouse from laptop A to laptop B. Apart from that, laptop keyboards are somehow semi-optimal anyway and not ergonomic at all. I need something simpler, something better. Something that allows me to operate both lappies with one mouse and one keyboard. That’s when I found the MX Keys S Combo from Logitech.

The unboxing experience

I’m not a big fan of the bitten apple, which is why I don’t like all this unboxing hype. The hardware I order comes in black cardboard boxes – cuboid, practical, good. That’s all the unboxing experience I can get as a PC user. So well, I’ll take it. Everything in the box is wrapped in greaseproof paper – keyboard, mouse, silicone wrist-rest cushion (a big hello to all those learning German as a foreign language). But at least it avoids plastic. One UN sustainability target has already been met.

Connection via Bluetooth

The instructions on how to connect the devices to the laptops are printed on the top of the lid. One option: via Bluetooth in just 3 steps. It can’t be any more than that, otherwise I’ll jump off halfway through the user journey. I won’t even try the connection with the Logi Bolt USB receiver (4 steps). If anyone wants to test it, let me know if it works.

So, Bluetooth switched on in the PC settings. Keyboard MX Keys S switched on at the back, mouse on the underside too. It works. The numbers 1, 2 or 3 flash on the Easy Switch buttons, like children’s television in the nineties. I decide in favour of 3 and press firmly. Last chance… over! MX Master 3 S is connected, says my laptop. Hooray! (I’m annoyed that I didn’t press 1, but never mind now.) The keyboard has secretly connected itself via key 2. Guys… really now.


I ignore the problem and concentrate on connecting to the other laptop. Same game. This time it’s mouse button 2 and it works. Only the keyboard doesn’t work. The setup wizard on mxsetup-logi.com promises help. Clear design, but they have separated “setup instructions” incorrectly. I get sad. At least it’s quick and easy. Switch the keyboard off and on again, press and hold Easy-Switch button 2 until it flashes, enter the PIN in the pop-up on the desktop using the keyboard, enter and you’re done. It wasn’t that difficult for laptop 1 either. Now I can use the mouse and keyboard with both laptops.

Hardware with Fancy Keys

The keyboard has already been extensively discussed in a previous issue: solid, comfortable to type on and easy to charge via USB. I can confirm this. The really cool thing is the function keys at the top. In particular, “Microphone off” when the cat in the background throws up on the carpet during a meeting (or, alternatively, the baby cries, for those with a low score on the Crazy Cat Lady scale). The dictation function at the touch of a button works surprisingly precisely, and the one-click calculator is indispensable for a copywriter with three points in the (Hessian) maths A-levels.

While the Logitech software Logi Options is installing, I stroke the ergonomic mouse, which caresses my palm like a Motorola PEBL from 2005. It sits really well in my palm, has a left and right button and a scroll wheel. So far, so familiar. The highlight: I can assign individual functions to the various buttons.

Logi Options software

It has to be said, Logitech has it all when it comes to UX. The software for the keyboard/mouse combination is slim, clean and doesn’t bother me with complicated bells and whistles. I can use it to check the battery charge level on the devices, check for updates and set all the functions. One click in the top navigator takes me to the Smart Actions – the “if this then that” feature for the hardware. I was sceptical at first, but now I use the middle mouse button to open my fee spreadsheet in Excel or send the editor-in-chief of this magazine “Entententententente” via WhatsApp. Just the important things. This also works with the keyboard, but unfortunately only with the F keys in the top row. I would like to have 30 pizzas delivered to my ex-boyfriend automatically using the “A” key. Doesn’t work.

There is room for improvement. I will soon see what is possible with AutoHotkey (alternatives here: is.gd/remapping). Using Logi Flow, you can theoretically simply switch to the other computer using the cursor and transfer files via copy-paste. Theoretically, because that doesn’t work for me. This is probably because the (customer) laptop does not tolerate such invasive behaviour. This function would actually have been extremely practical, because then I wouldn’t have to e-mail files to my own office laptop just to print them, for example. Installing a third-party printer driver is not an option on the customer’s device either.

Personal conclusion

Even tech n00bs have fun with the MX Keys S Combo. After a bumpy start (the error is usually in front of the computer), the connection for both laptops was quickly set up. Minus point: you have to turn the mouse round each time and press the Easy-Switch on the underside to activate it for the other laptop. However, this is also possible using Windows PowerToys (bit.ly/powertoys_mouse). The Easy-Switches on the keyboard are in the path of my fingers and I deactivate them from time to time. It’s a matter of getting used to it.

But the most important thing: I save a huge amount of time because I no longer have to fiddle around with the USB mouse, I type like a young goddess and everything looks smart (girls love that). Hallelujah.

3D Audio – Into the acoustic matrix – https://digitalproduction.com/2023/11/14/3d-audio-into-the-acoustic-matrix/ (14 November 2023)
Welcome to the second part, which is all about 3D sound, or rather 3D sound is all about us. In the last article, we answered the questions about how you can enjoy three-dimensional sound and what software and hardware you need.

So far, we’ve only scratched the surface of the question: “Yes, but what am I listening to anyway?” So now it’s all about content. It’s not so easy to say in general terms where 3D audio actually enables good content. Depending on the context, there can be a completely different technology behind it in the form of formats or game engines.

That’s why I’ve come up with a structure that I’ll simply call the “3D audio matrix” – or pyramid? Be that as it may, the whole thing is intended to provide a reference for applications and their prime examples, the advantages and disadvantages of 3D sound, and of course an overview of formats and tools with their respective peculiarities. You have to take a few steps to understand this: 3D audio is not just 3D audio. So let’s first go back several dimensions to the origin, i.e. from 3D audio to 2D, 1D, 0D..

The 3D audio matrix overview

What do you mean, 0D? Admittedly, that’s a bit abstract. How can you visualise it? I’m taking a mathematical approach here. Don’t worry, it won’t be any more difficult than in your first geometry lesson. Let’s imagine a coordinate system in which our head is at the origin.

Now let’s combine this with the question: what kind of geometric object do I have?
0D: A point in the coordinate system without spatial information, the origin (0|0|0)
1D: A line, here I can move from left to right (x|0|0)
2D: A plane, now I can also move forwards and backwards (x|y|0)
3D: A cube/sphere that adds height information (x|y|z)

Audio formats that you already know

So far so good. Now imagine you want to place an audio object in a room. There are already audio formats that we know from our everyday lives.

0D: Mono. No room information can be added.
1D: Stereo. You can at least move your sound to the left and right
2D: Surround. With 5.1, for example, you can also place sound at the back
3D Audio: Here you can also move sound up or down.

But wait, there’s more: degrees of freedom

All overviews in this direction that I know of stop at Dolby Atmos, but beyond that it’s just getting started with 3 or 6 degrees of freedom. OK – what are degrees of freedom? Also known as DoF (Degrees of Freedom), the degree describes the following.

  • 0DoF: Where you actually look while consuming the content is not defined; you are simply assumed to face forwards, as in films – you are not supposed to turn around.
  • 3DoF: Here you can also rotate your gaze, as we know it from 360° videos. Also known as head tracking (rotation).
  • 6DoF: This concept is already familiar from 3D games, where the player also moves through a 3D space (translation).

Headphones simplify understanding

It’s as simple as that – or not. Because it’s always a question of which direction I’m looking at my concept from. Here I am referring exclusively to headphone playback. Do some of you remember the in-head localisation and externalisation from the last article? This is a good example of how mono and stereo are always perceived inside the head. Even if I add reverb to create a depth gradation, I can still only move an object to the left and right and only create a difference in volume and time (ILD, ITD). But the third factor (HRTF) is missing, with which I can really differentiate between front and back. For me, stereo is therefore one-dimensional (left/right), which is often confused with its two channels.
With surround sound via headphones, a binaural attempt is made to generate an impression of front and rear – with 3D audio, this also includes height information. This is where the aforementioned HRTF comes into play in order to really “let the sound travel out of our head” during the calculation. So we go from in-head localisation to externalisation, even though, at the end of the day, a two-channel stereo signal is played back. But watch out! It’s not “normal” stereo, but binaural – stereo with an HRTF-filtered extension.
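
A minimal sketch of the ILD/ITD part of this story, deliberately without the HRTF filtering (which would require measured impulse responses), might look like this. The numbers are textbook approximations, not any product's implementation.

```python
# Pan a mono signal using only level and time differences (ILD / ITD).
# Without HRTF filtering the result stays "in the head", as described above.
import numpy as np

FS = 48000              # sample rate in Hz
HEAD_RADIUS = 0.0875    # m, average head
SPEED_OF_SOUND = 343.0  # m/s

def pan_ild_itd(mono, azimuth_deg):
    """azimuth_deg: 0 = front, +90 = hard right."""
    az = np.radians(azimuth_deg)
    # Woodworth approximation of the interaural time difference
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (np.sin(abs(az)) + abs(az))
    delay = int(round(itd * FS))
    # crude broadband level difference (real ILD is strongly frequency dependent)
    gain_far = 10 ** (-6.0 * abs(np.sin(az)) / 20)

    near = mono
    far = np.concatenate([np.zeros(delay, dtype=mono.dtype), mono])[:len(mono)] * gain_far
    left, right = (far, near) if azimuth_deg >= 0 else (near, far)
    return np.stack([left, right], axis=1)

# one second of test noise panned 60 degrees to the right
stereo = pan_ild_itd(np.random.randn(FS).astype(np.float32) * 0.1, 60.0)
```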

360° sunset tour on Lake Chiemsee with the stand-up paddle, also available on YouTube. Here is a still image as a “Little Planet” projection, because it looks kind of nice.
This is roughly what the mix looks like: many tracks, many parameters, the image shown equirectangular to keep the overview, and below that the view in the video player.

With loudspeakers, the walls of the concept start to shake

It’s not so easy for loudspeakers, because even if I only have one loudspeaker in the room, it’s still “somehow three-dimensional”. Besides, the term stereophony actually just means “more than mono”, so it would even include 3D audio, but I think that in everyday production everyone thinks of stereo as a “two-channel audio file”. For stereo playback, you place two speakers with yourself as the third point in an equilateral triangle. And as we know, triangles are actually two-dimensional. Nevertheless, I can only move my sound between the two loudspeakers, not beyond them. Even if I add reverb, you get a feeling of depth, but you don’t know whether the room is actually in front of or behind you. Nevertheless, two-channel stereo is still a stable reproduction method and, in my opinion, not broken, regardless of what the various hectic marketing newsletters claim.

0-2D aka “normal media”

OK, of course I could go on and on about where to find mono, stereo and surround content. But that’s everyday life. We use mono every day for voice messages, we stream music in stereo and, if you have the right TV, you can watch films in surround. The more I think about it, the more I realise that surround is not that far removed from 3D audio in this respect. All that’s missing is the height information – applause. So the quantum leap from stereo to surround actually seems greater than that from surround to 3D. I would also subscribe to this for film, for example, because when streaming, the great 3D sound from the cinema has to be compressed and is then usually a 5.1 mix that tries to retain a few height elements.

The Sony 360 Reality Audio plug-in makes it possible to place colourful audio objects in a spherical shape around our virtual head.

Audio should be immersive

But we remember that immersive audio should ideally be so natural that we don’t even think much about the technology. And 3D audio brings us a lot closer to this impression than surround sound alone.
However, there is another factor why 3D audio can bring even more benefits under the bonnet than height information. The surround formats meant here are either quadraphonic 4.0, 5.1 or 7.1. The numbers are channel information; for 5.1, for example, channels 1 to 6 are: Front Left, Front Right, Centre, LFE (low-frequency effects), Left Surround, Right Surround.
So if you want height information, you need even more channels, such as 5.1.4. You then have four more speakers on the ceiling. But as you can guess, that’s kind of impractical. And what if I have a 7.1.2 system – how are the channels converted? That’s why audio technology is moving away from channel-based formats and towards so-called NGA, next-generation audio formats.
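
To make the channel explosion tangible, here is one common naming of those layouts (vendor conventions differ slightly). Every extra height layer adds fixed speaker channels, which is exactly what the object-based NGA formats described below avoid.

```python
# One common naming of channel-based layouts; orderings vary between vendors.
LAYOUTS = {
    "5.1":   ["FL", "FR", "C", "LFE", "SL", "SR"],
    "7.1":   ["FL", "FR", "C", "LFE", "SL", "SR", "BL", "BR"],
    "5.1.4": ["FL", "FR", "C", "LFE", "SL", "SR",
              "TpFL", "TpFR", "TpBL", "TpBR"],   # four ceiling speakers
    "7.1.2": ["FL", "FR", "C", "LFE", "SL", "SR", "BL", "BR",
              "TpML", "TpMR"],                    # two ceiling speakers
}

for name, channels in LAYOUTS.items():
    print(f"{name}: {len(channels)} channels -> {', '.join(channels)}")
```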

MPEG-H not only enables 3D audio panning, but also personalised audio playback such as multilingualism etc.

Object-based audio

To understand why 3D audio can be even better than surround sound, let’s take a brief look at object-based audio. A major advantage of object-based audio is the independence from fixed channels, as the rendering only takes place on the end user’s device. Systems that want to use Next Generation Audio must therefore have a corresponding decoder integrated. This ensures optimised audio playback at all times.
Another exciting possibility, in addition to the movement of audio content in 3D space, is the personalisation of these audio objects. My favourite example is watching a football match, where I can simply mute the “commentator” audio object. Podcasts would be an equally exciting application. Let’s say we want to listen to a news podcast that is an hour long, but we only have 10 minutes. We tell our smartphone this and the podcast is automatically shortened to the most important 10 minutes using metadata.
MPEG-H is able to do this and is already standard in broadcasting in Korea and Brazil. The biggest competitor is AC-4, aka Dolby Atmos, which only allows such personalisation and interaction according to its own specifications.
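
Schematically, the object-based idea can be pictured like this; a made-up data structure for illustration, not actual MPEG-H or AC-4 syntax. Positions and metadata travel with the mix, and the renderer in the playback device decides how to map everything onto whatever speakers (or headphones) are actually there.

```python
# Made-up scene description to illustrate object-based audio with personalisation.
scene = {
    "objects": [
        {"name": "stadium_atmo", "position": (0.0, 0.0, 0.0), "interactive": False},
        {"name": "commentator",  "position": (0.0, 1.0, 0.2), "interactive": True,
         "user_gain_db": (-60, 6)},   # the listener may mute or boost this object
        {"name": "ball_kicks",   "position": (0.3, 2.0, -0.1), "interactive": False},
    ]
}

def render(scene, prefs, speaker_layout):
    """Pseudo-renderer: apply the user's preferences, then pan each object
    to the speakers that actually exist in the living room."""
    for obj in scene["objects"]:
        gain_db = prefs.get(obj["name"], 0.0) if obj["interactive"] else 0.0
        # ...panning of obj["position"] onto speaker_layout would happen here...
        print(f'{obj["name"]}: {gain_db:+.1f} dB on {speaker_layout}')

render(scene, prefs={"commentator": -60.0}, speaker_layout="5.1.4")
```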

The Dolby Atmos Music Panner allows the movement of audio objects to be synchronised with the tempo of the music.

Spatial Audio 0DOF with/without picture?

Let’s take a look at what 3D audio content is really available now. Most of them should be familiar, because thanks to Dolby Atmos, consumer favourites such as films and music streaming services are currently being supplied. Podcasts too, but in the vast majority of cases they make more sense in stereo – or even mono.

Many positioning options also mean many parameters that have to be written into the timeline as automation (keyframes).

3D audio is better than 3D video?

When I tell people that I do “something with 3D audio”, they immediately ask if I work with “Dolby Atmos”. In fact, Dolby is not that relevant in my immersive audio bubble because it doesn’t enable many things that I need in my daily work. But more on that when it comes to degrees of freedom.
Nevertheless, Dolby Atmos (AC-4 is actually the audio format behind the marketing term) is particularly relevant in the film industry. Thousands of Hollywood blockbusters have already been mixed in this format and people are happy to spend a few euros more to enjoy a surround system in the cinema. Sure, the sound experience is more fun when a helicopter suddenly sounds like it’s flying over your head. But at the end of the day, I would argue that all films also work with stereo sound – or mono. You don’t look in all directions, you only look forwards.
Even though I like to take the occasional potshot at Dolby Atmos, they do a good job of bringing this surround sound into the living room. Many soundbar models now support playback via the TV using streaming apps. And playback via Apple headphones is also really fun, even on an iPad. Although the AirPods Pro are in-ear headphones, you have the feeling of being enveloped by the sound and can almost save yourself an expensive home cinema. Now let’s take a look at pure audio enjoyment without visual content: music streaming has long been part of our everyday lives and is becoming increasingly popular. 3D music streaming is the latest innovation in the industry, adding a third dimension to the listening experience.

The 360° video production #EUsavesLives shows the school day of a Kenyan boy up close (here as an equirectangular projection, which shows the 360° image in its entirety, not just the later image section in the video player).

3D music

But now Dolby came up with the idea of converting their 3D audio format for music production. Admittedly, I’m a little sceptical here too, because I felt that most songs sounded better in stereo than with the 3D audio formats. In addition to Dolby Atmos Music, Sony is trying to add 360 Reality Audio to the list of 3D formats.
However, one advantage is definitely that you have to make fewer compromises when mixing music and have more options for placing the individual audio tracks. This gives some tracks more depth and you can hear the individual instruments better. You have the feeling that the musicians are sitting around you in the studio.
We are currently in a major learning phase here, similar to the move from mono to stereo. The first Beatles songs sound interesting by today’s standards. Today we know better how to mix in stereo. The same applies to these 3D music productions. So just see for yourself which streaming platform already offers it and have a listen. I think the quality is getting better and better, and since Apple Music got involved, there’s been a bit of a gold-rush atmosphere in the audio community.
And what about 8D audio? Admittedly, it causes maximum confusion to categorise this as well, but the 8 doesn’t even stand for dimensions, but for directions. Let’s just accept the phenomenon and define it for what it is: music circles around our heads in mono and works quite well through headphones. But in the long run it is a bit tiring and monotonous, so 3D music tries to do better. Here you work with individual tracks of the various instruments and place or move them through the room where it suits the composition. It rarely sounds as obviously 3D as 8D audio. So just listen to the examples on my blog that sound better or worse and form your own opinion.

Dolby Atmos Podcast

The last remaining audio-only format is podcasts, which are now also being tackled by Dolby Atmos. Here, however, my toenails roll up a little, for reasons that go beyond the scope of this article. The short version is that Audible has now jumped on the bandwagon, but AC-4 as a 3D format does not allow certain sound sources to be made non-3D. With audio dramas in particular, this means that the narrator’s voice is suddenly in the 3D scene with the protagonists, which means that the listener can no longer really distinguish who is actually part of the action.
But I’m already in dialogue with Dolby about this too, because what’s the point of always complaining 🙂 But there are already other courageous productions that have mixed with or without Dolby in order to get answers to the questions of how well radio plays work as 3D audio productions. Just listen to it yourself and form your own opinion. Head tracking comes later.
Most productions have “the problem” of being produced too classically. In other words, recording the speakers in the studio, then letting them move around virtually with 3D spatialisers and adding an atmo. But that doesn’t really sound convincing or immersive. Can I cite my own productions as a positive example? If so, I’ll throw my radio play for BKW Engineering into the ring. A more classically produced one is the “Erdsee” audio drama from WDR. I’m curious to hear your opinions.

Advantages and disadvantages of 3D sound

Said productions with 3D audio can be fun if they fit the content well. It is also often used as a marketing gimmick to simply offer the listener something new, to have a unique selling point. This is also a disadvantage, as 3D audio is not a seal of quality. However, you can usually rely on your ears to determine whether the production works for you or not, apart from your own taste.
That’s why productions are often made in 3D that would probably have worked just as well in stereo. Especially in the music sector, there are genres where the spatialised version, in contrast to the stereo version, has less punch. In addition, the conversion to headphones is not yet perfect. The sound often comes across somehow duller than you are used to with stereo delivered directly to the ears.
Here, too, we are in a transitional phase. Users first have to get used to this spatial sound again. It usually works better via loudspeaker systems and I have to admit that I had a lot of fun with Dolby Atmos Music in a demo car.
However, as you can imagine, with 3D audio mixes there are even more parameters that I can set for a sound. That’s why such mixes are usually more complex, more time-consuming and more expensive than a stereo mix. And, as I said, you usually need a selection of special devices to really get the full benefit.

The IEM AllRADecoder enables the playback of Ambisonics signals with any speaker setup – the more spherical, the better.

Formats and object-based audio

As mentioned, all formats are somehow based on the assumption that the listener is looking towards the front, where there is usually a centre speaker or screen. You don’t actually want people to look across the room because there is usually a TV picture in front of them while they are listening to 3D audio content.
As I said, Dolby Atmos has established itself in the form of AC-4, backed by a large marketing budget, and is spreading from films to music, podcasts and gaming. The alternative is MPEG-H, made in Germany, which is particularly well suited to live streaming in the broadcast sector. Dolby Atmos Music’s competitor is Sony 360 Reality Audio, an adaptation of MPEG-H meant to give the Sony Music label in particular a boost. Both formats can be found on Amazon Music, although Dolby already has around four times as many tracks, so that battle seems to have been decided.
One format that has been around for over 30 years without ever quite establishing itself, but is currently experiencing a renaissance, is Ambisonics. This sound-field-based format has audio channels too, but they map spatial axes rather than loudspeaker positions. It all sounds a little unusual and has only a very small sweet spot when played back over loudspeakers. That disadvantage disappears with headphones, because you have the perfect playback position right at your ears. The format can also easily be rotated around the X, Y and Z axes, which is why it has established itself above all for 360° videos and, with them, in the world of three degrees of freedom.
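To make that rotation idea a bit more concrete, here is a minimal sketch (not tied to any particular library) of how a first-order Ambisonics (B-format) signal can be rotated around the vertical axis, for example to compensate a head-tracking yaw angle. The exact channel ordering and sign convention depend on the Ambisonics flavour (e.g. AmbiX vs. FuMa), so treat this purely as an illustration:

```python
import numpy as np

def rotate_foa_yaw(w, x, y, z, yaw_rad):
    """Rotate a first-order Ambisonics (B-format) signal around the
    vertical (Z) axis, e.g. to compensate a head-tracking yaw angle.

    w, x, y, z: 1-D numpy arrays with the four B-format channels.
    yaw_rad:    rotation angle in radians (sign convention depends on
                the Ambisonics flavour in use).
    """
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    # W (omnidirectional) and Z (height) are unaffected by a yaw rotation;
    # only the horizontal components X and Y mix with each other.
    x_rot = c * x + s * y
    y_rot = -s * x + c * y
    return w, x_rot, y_rot, z

# Example: rotate a short noise burst by 90 degrees.
n = 48_000
w, x, y, z = (np.random.randn(n) for _ in range(4))
w, x, y, z = rotate_foa_yaw(w, x, y, z, np.deg2rad(90))
```

The key point is how cheap this is: rotating the whole sound field is a small matrix operation per sample block, which is exactly why Ambisonics pairs so well with head tracking.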

New audio freedoms in three degrees

With advances in technology, new 3D audio techniques are opening up new possibilities for media production. 3D audio with three degrees of freedom (3DoF) is one such concept: it enables immersive, dynamic sound experiences that benefit from the fact that you’re not just staring straight ahead.
Below we look at the pros and cons, the headphones required, applications where the technique can be used effectively, and the different formats available for integrating this feature into a production workflow.

360° videos from a sound perspective

Probably the best-known representatives of this genre are 360° videos. These spherical moving images triggered a real hype half a decade ago, when such videos suddenly appeared on the largest video platforms, YouTube and Facebook. Strictly speaking, there are several types of user experience:

  • On desktop devices, you use the mouse to turn your gaze while still looking straight ahead at the screen.
  • On smartphones, you hold the device in front of your nose and, instead of turning your head, rotate your whole body on its own axis.
  • Head-mounted displays (HMDs) are the top class, because the image really does adapt to the movement of your head in real time.

Such 360° videos are also available as stereoscopic videos – 3D videos, if you like – where each eye gets its own 360° panorama, creating a more vivid image in the brain. And of course there are also 360° videos with 3D sound. In this context I like to call it 360° sound, because you immediately understand that the sound is spherical, just like the image.

Audio head tracking

If you now remove the image component but still want the sound to react to head movements, you find yourself in the world of audio head tracking. Apple is already building this technology into ALL AirPods, from the entry-level AirPods to the AirPods Pro and, of course, the AirPods Max. However, the use cases are currently limited.
Technically, it is now possible to listen to Dolby Atmos Music tracks via Apple Music, but, as I said, these tracks were never mixed with the intention of “listening around” in the music. In addition, Dolby metadata is even bypassed so that this feature can be activated at all. As a result, Dolby Atmos Music tracks sound different on Amazon than on Apple Music – not exactly what you want to hear as an audio engineer. But Apple needs content to be able to use its technology as a selling point.

3D sound is more than entertainment

But there are applications that actually solve problems with 3D audio and head tracking and are not just a fun factor. The medium is also becoming increasingly relevant for communication. MS Teams has added the “spatial audio” feature to its video calls. The good thing is that you don’t need any additional hardware. Simple headphones are all you need and the microphone signal is automatically spatialised in the cloud for the other users – even without head tracking.
In a video conference with several people, things can quickly become chaotic as the current mono system has problems keeping the different voices apart. Our brain has difficulty differentiating between the voices as they all come from the same direction. 3D sound makes the conversation situation natural and actually makes it measurably easier to listen, because it takes the strain off our brain – similar to the cocktail party effect. The 3D audio spatialisation of voices makes it much easier to differentiate between them and noise is less noticeable.
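To make the idea a bit more concrete: below is a toy sketch of how voices could be spread around the listener. This is not Teams’ actual implementation, just a simple equal-power stereo pan with made-up participant names standing in for a real binaural renderer.

```python
import numpy as np

def equal_power_pan(mono, azimuth_deg):
    """Very rough stand-in for a binaural renderer: place a mono voice
    at an azimuth between -90 (left) and +90 (right) degrees using an
    equal-power stereo pan."""
    theta = np.deg2rad(np.clip(azimuth_deg, -90, 90))
    pan = (theta + np.pi / 2) / np.pi            # map to 0 .. 1
    left = np.cos(pan * np.pi / 2) * mono
    right = np.sin(pan * np.pi / 2) * mono
    return np.stack([left, right])

# Spread three hypothetical callers across the stereo stage.
voices = {"anna": -60, "ben": 0, "cleo": 60}     # name -> azimuth in degrees
mix = sum(equal_power_pan(np.random.randn(48_000) * 0.1, az)
          for az in voices.values())
```

Even this crude separation already hints at why the brain finds spatially separated voices easier to follow than a pile of mono signals from the same direction.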

In-head localisation is being fought – wrongly so

Films can also be watched with head tracking, but there is a fundamental problem: the usual approach is to make the entire sound 3D. That means even elements such as narration or background music become part of the scene, where they don’t belong. A very brief word on diegesis, which 360° videos illustrate best: everything you can see should also be three-dimensional in terms of sound, because it is part of the scene (diegetic). A narrator’s voice that cannot be seen (non-diegetic), however, should not be. Otherwise you hear a ghost coming from somewhere and wonder who is talking to you. If the voice is played in mono instead, it never moves, and you immediately understand that someone is speaking to you who isn’t part of the scene at all.
In other words, in the world of three degrees of freedom, not everything should be 3D audio just to create the feeling that it is coming “from outside”. What matters is the ability to combine the soundtrack with mono or stereo signals over headphones, so that listeners understand which sound belongs to which narrative level. This is not possible with loudspeakers, because they are always perceived as external, whereas with headphones you can – and should – make use of in-head localisation.

Confused? Then let’s go round in circles again

Apple does a good job of distinguishing whether I’m listening to stereo or multi-channel content. That is all the more interesting when you remember that Bose burnt its fingers on this very subject years ago. In Cupertino, however, they believe in the technology and have already built it into every pair of in-house AirPods. This also helps with the market launch of the Apple Vision Pro, but more on that later. For audio playback alone, the distinction is made not just between these two input types but between a total of five playback combinations.

Input: Stereo

You hear the sound as normal stereo – nothing special at first. If you don’t like this in-head localisation, you can activate “Spatialize Stereo/Stereo to 3D Audio”, which adds a reverb algorithm to the signal to make it sound more “natural”.
In addition to this spatialisation, you can also activate head tracking, which makes the signal sound through headphones as if you were listening to it through two speakers in the room (3DoF).

Input: Multi-channel

Mostly Dolby Atmos, via film platforms such as Disney or music streaming such as Apple Music. The multi-channel audio is automatically converted to binaural stereo so that it sounds as if it is happening around you. This setting makes the most sense if you are moving around and don’t want the sound to shift every time you move your head.
You can also activate head tracking, which is particularly useful for films if you have a visual reference point or if you want to distinguish objects from the front from those from behind (which is always difficult with binaural sound). I would call this level 3DoF.

Freedom comes with pitfalls – advantages of head tracking

A big advantage of this technology for 360-degree videos is that you can now hear when something is happening behind you, for example – something you couldn’t see, because our field of view is limited, while our hearing always covers the full 360°. I have often seen VR experiences that plaster their beautifully designed visual scene with arrows so that people look in the right direction at the right moment. Cleverly placed 3D sound can solve this problem intuitively.
It also solves one of the biggest obstacles of binaural audio: you often get the feeling of spatiality, of the sound happening around you, but you can rarely tell front from back. If you can turn your head even slightly, though, you immediately understand which sound is where. Head tracking doesn’t mean you have to spin through 360° – but you can.

It also helps with the aforementioned problem of narrative levels. With films, we never had to ask whether an element is part of the scene or not – when epic music plays in the cinema, nobody asks “where is Hans Zimmer right now?”. But because you now have this clear separation, you do have to think about how you use voice-overs and music. In most cases a scene with well-designed sound effects works better than desperately trying to keep people entertained with music and speech. The brain is usually well occupied with the 360° images anyway, so three layers of sound (speech, sound effects, music) are more likely to cause confusion.

Disadvantages of head tracking

As already mentioned, the whole thing is not so easy to implement with loudspeakers. In theory you can also show 360° videos as a projection, in planetariums for example, with surround sound from loudspeakers all around the audience. But the narrator’s voice still somehow comes from one fixed direction. What works fine in the cinema suddenly becomes a problem with spherical videos, and a series of workarounds and compromises become necessary.
Unfortunately, you rarely know when listening whether the mix should be heard with head tracking or not. There are mixes that really fall apart when you have the opportunity to turn your head. While other productions don’t even make sense if you’ve deactivated head tracking.
Admittedly, I always talk so cleverly about what you should and shouldn’t do. The reality is simply that there is usually neither the time, the money nor the knowledge to make an immersive media production really good. If anything, an Ambisonics microphone is set up and the result labelled as immersive audio – only to be mixed in stereo in the end for budget reasons, “because nobody can hear the difference anyway”. Listeners coming into contact with immersive media for the first time may not scrutinise the sound. But the more points of contact they have, the higher their expectations of the sound become. All the top podcasts are now produced in a studio, even if they started out as a hobby – there must be some kind of quality reason for that 😉

Formats for audio head tracking

I have already mentioned the most famous representatives with Dolby Atmos, 360 Reality Audio (based on MPEG-H) and Ambisonics. However, these are all 3D formats that were not primarily developed for audio head tracking.
That’s why I don’t want to go into the technical details, but rather briefly explain why Apple is once again showing a very good approach. Even if Dolby likes to describe its format as future-proof, at some point it will reach its limits.
As mentioned, in the world of degrees of freedom it’s not only 3D sounds that matter, but also precisely those non-diegetic sounds that are not part of the scene and make more sense in 0D, such as music and voice-overs. And of course there is more than just black-and-white thinking, i.e. 0D and 3D – there has to be something in between, often referred to as a bed. Apple refers to the three layers as:

  • 3D Audio Objects
  • Ambience Bed
  • Head-locked audio

The 3 layers for immersive soundtracks

We have already looked at 3D Audio Objects – the objects that I can place in the room. You usually have the option of setting distance parameters or the size of an object so that it doesn’t stand out from the scene. Let’s take a single cheering fan in a stadium as a 3D audio object. I would like a reverb that gives you the feeling of being in the same place. But if I simply add a reverb to the object, the reverb will only come from that corner. Sound, however, spreads in all directions, so the reverb should be heard all around us. In theory I could send the reverb to our head-locked audio track, but then the reverb would no longer be 3D.
This is where the aforementioned bed comes into play. All signals that should be spatial but can be diffuse can be sent here. So if you have not just one fan, but hundreds, you would otherwise have to fill 100 audio tracks with objects. This way, you can simply send the group to the bed and only need a fraction of the audio tracks.
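A minimal sketch of how such a three-layer routing could be organised in a mixing tool. The layer names follow Apple’s terminology from above; everything else (class names, signal names) is made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ImmersiveMix:
    """Toy model of the three layers of an immersive soundtrack."""
    objects: list = field(default_factory=list)       # discrete 3D audio objects
    ambience_bed: list = field(default_factory=list)  # spatial but diffuse material
    head_locked: list = field(default_factory=list)   # always glued to the listener's head

    def route(self, name, kind):
        {"object": self.objects,
         "bed": self.ambience_bed,
         "head_locked": self.head_locked}[kind].append(name)

mix = ImmersiveMix()
mix.route("single_cheering_fan", "object")   # precisely placeable in the scene
mix.route("stadium_crowd_group", "bed")      # hundreds of fans as one diffuse layer
mix.route("narrator_voice", "head_locked")   # non-diegetic, stays at the ears
```

The point of the bed is exactly what the example shows: grouped, diffuse material costs a handful of channels instead of one object track per source.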

What do the other formats do differently?

Dolby Atmos, for example, works with a channel-based 7.1.2 bed, to which you can add up to 128 mono objects. However, it doesn’t actually have a head-locked stereo track, because the format is based on loudspeakers – so for me it is not suitable for podcasts. In principle, the Dolby Atmos renderer offers the option of marking an audio object as “disable binauralisation”, which means it is not played back spatially. However, if you activate head tracking, the Apple renderer bypasses this metadata and only reads out where the object sits in the scene. In other words, Dolby Atmos mixes were never made with head tracking in mind and therefore rarely exploit the advantages of the technology.
Ambisonics, on the other hand, has 4, 9, 16 or more channels, depending on the order. So it has a bed, and I can even work with head-locked audio, but it has no objects. That’s why the sound is always a bit diffuse, or I would have to spend a lot of audio channels to get close to the resolution of object-based formats. It supports head-locked audio in mono, but not in stereo. An optional head-locked stereo track is, however, standard with Facebook360 and YouTubeVR, for example: an Ambisonics file is supplied that rotates with the viewing direction, plus, if required, an additional stereo file that always plays the same way, no matter where you look in the 360° video. This gives you the best of both worlds and a good compromise between resolution and quality.
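The channel counts mentioned follow directly from the Ambisonics order: a full-sphere mix of order N needs (N + 1)² channels, which is a handy sanity check when planning track counts:

```python
def ambisonics_channels(order: int) -> int:
    """Number of channels of a full-sphere Ambisonics signal of a given order."""
    return (order + 1) ** 2

for order in range(1, 5):
    print(f"order {order}: {ambisonics_channels(order)} channels")
# order 1: 4, order 2: 9, order 3: 16, order 4: 25
```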

6DoF – anything but dumb

Let’s move on to the premier class: six degrees of freedom. Here you not only have the three possible rotations around the X, Y and Z axes, but also three translations along those axes. Put less cleverly, this means you can also move towards or away from a sound, making it louder or quieter, for example.
All the applications discussed so far assume that the listener sits at the centre of the action, in the so-called sweet spot, the optimum listening position. Now we can suddenly move away from this point, and you can already guess that this makes sound design even more complex: there must be no acoustic holes in the 3D scene, and you shouldn’t be distracted by too many sound sources either.

Unity looks quite complex at first, but many 3D audio parameters can’t even be set for an audio object.

Games (more than a gimmick)

Here you will inevitably find yourself in game engines. This is why games are the best-known representative of this category. But not every game uses 3D audio. Once again, the genre question is a legitimate one.
A 2D game needs sounds from behind/above/below just as little as a strategy game in which I look down on my people from above like a god. Left/right is perfectly adequate here. 3D audio could be used here at most in the ambience, similar to films, so that you have more of a feeling of being part of the scene. Enveloped by sound is the keyword here again.
All games that take place in 3D worlds, especially first-person games, benefit from spatial sound. Fans of shooters have long known that being able to hear the enemies behind you before you get them in front of your virtual “weapon” can make all the difference in the game. That’s why gaming headsets are popular in this respect, so that your ears can tell your eyes where to look as well as possible.
However, such surround headsets are usually not even necessary to hear spatially – remember, our brain can do this with just two cleverly rendered audio channels. In most cases the game automatically detects whether you are using speakers or headphones and renders the sound accordingly. Nevertheless, such surround headsets may well be tuned more closely to the software and, above all, they enable communication. The PlayStation 5, for example, advertises its Tempest 3D audio engine together with its own Pulse 3D headset, which are very well matched to each other – recently also in combination with Dolby Atmos, in order to drive multi-channel soundbars.

This is why an interface to middleware is often integrated into VR productions, such as here in Unity.

AR (augmented reality audio)

Pokémon Go is often cited as best practice for augmented reality applications – even though I like to point out that the game didn’t go viral because it was AR, but because it was Pokémon and had a strong multiplayer character. AR usually relies on the built-in cameras of smartphones; lidar scanners, which enable even more precise tracking, are also increasingly being used.
AR glasses have not really become socially acceptable yet. Magic Leap and HoloLens cost several thousand euros and are only slowly establishing themselves in industry. Google Glass was way ahead of its time, and the features you can expect from such devices are correspondingly limited. That’s why augmented reality is currently even more exciting from an auditory perspective: the technology here is already very advanced and, as mentioned, headphones usually have at least head tracking built in. Combined with a smartphone, experiences with six degrees of freedom are possible too. Applications could include audio guides in museums, where paintings or statues are brought to life by sound and tell their life stories, for example.

VR (sound for virtual reality)

When it comes to VR applications, most people also think of gaming first, which is supported by the fact that such projects are almost exclusively developed with game engines such as Unity or Unreal. However, the VR bubble is far more diverse than you might think. Training and simulation applications in particular are currently thriving in the B2B sector without consumers noticing. The possibilities are virtually unlimited and, alongside the rollercoaster games for which the medium is mostly known, real use cases are establishing themselves that offer added value – such as saving time and money when training employees. Nevertheless, it is not easy to transfer games, apps and the like from 2D screens to VR; the user experience with HMDs and controllers is simply fundamentally different. In most cases VR attempts to replicate reality and ends up being a poor digital copy. The immersive medium really comes into its own when you do things that you can’t do in real life.

A prime example is “Notes on Blindness” (is.gd/notes_on_blindness), a VR experience in which you slip into the role of someone who is slowly going blind. Automatically, even people who don’t work with sound pay much more attention to the subtle nuances of the audio. In the vast majority of cases, however, sound is neglected by developers due to a lack of knowledge and time, which is why most apps sold as immersive experiences sound rather sterile. We are a long way from AAA budgets and need to give this young medium some time.

Unity then communicates with the FMOD middleware. This enables much more precise control over interactivity, such as how the music is looped and when the track for the next scene is played – depending on the tempo of the title.

Spatial computing meets spatial audio

Let’s not get confused by Apple’s new term. For me, spatial computing is the same as XR (eXtended Reality). You are in virtual reality, extending your reality or somehow in between. The boundaries are no longer so easy to separate when even VR glasses have cameras that look into reality. So if I see reality through an HMD, is that VR or AR?
Apple doesn’t make the confusion any easier, because they didn’t want to use terms already occupied by other companies: virtual reality and the metaverse by Meta, or mixed reality by Microsoft. In the long term, the result will be that we won’t have one VR device and one AR device, but a single device that can depict all realities. I won’t comment on how AI and other buzzwords such as blockchain will play into this 😉
For me as a sound engineer, however, it is important to separate which sound is part of which reality. In VR, I want to isolate myself, so I use a surround sound that matches what the display tells me. Whereas in AR, I want the sound to sound as if something is happening in my living room, where I am right now. Apple is also closer to a solution here than other companies because Apple Vision Pro, for example, also introduced ray tracing, which recognises the geometry of our surroundings and renders the sound accordingly.
In addition, Apple has already built a good infrastructure for 3D audio with the Airpods. If you want to further optimise the sound, you take pictures of your ears, which generates a personalised HRTF. Our hearing as a 3D model, so to speak. This allows the iPhone, for example, to tune the sound for us even more precisely and the distinction as to whether the sound is with us in VR or AR becomes even clearer.

Game audio – playful or gambled away?

Gamers know how important it is not only to see your opponents in time, but to hear them beforehand. For this reason, many people like to spend good money on expensive headsets that supposedly give them an advantage in the game. However, the sound design in AAA games is also very well budgeted and therefore correspondingly complex. A short beep sound is enough for the player to know immediately what is happening in the scene.

Communication and collaboration in 3D audio and with 6 degrees of freedom are crucial aspects for the feeling of presence in social VR. I recently had the opportunity to be in Social VR myself and was surprised at how long I was in there at a stretch. Even though the room was virtual, afterwards I had the feeling that the other person was actually in the same room as me. The cognitive load on our brain is lower because the sound is not coming from just one direction, as is the case with video calls, but a natural conversation situation is depicted.

But augmented audio can also make everyday life easier when we are out and about. Everyone knows the problem of travelling on the underground with a smartphone map service: you come up to the surface somewhere but have no idea which way to go, because the navigation is confused about which direction you are facing – GPS is simply too imprecise. With a second reference system, in the form of headphones that know the direction you are facing, you could simply hear a voice coming from the direction you need to move in.
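The underlying arithmetic is simple: subtract the head orientation (reported by the headphones) from the compass bearing to the destination, and you get the direction from which the guide voice should appear. A minimal sketch with made-up helper names:

```python
def guide_voice_azimuth(bearing_to_target_deg: float, head_yaw_deg: float) -> float:
    """Direction (relative to the listener's nose) from which a navigation
    voice should be rendered, in degrees; 0 = straight ahead, positive = right.

    bearing_to_target_deg: compass bearing from the GPS position to the destination.
    head_yaw_deg:          compass heading of the head, e.g. from the headphone IMU.
    """
    rel = (bearing_to_target_deg - head_yaw_deg) % 360
    return rel - 360 if rel > 180 else rel   # fold into -180 .. +180

# Destination lies to the north (0 deg), listener currently faces east (90 deg):
print(guide_voice_azimuth(0, 90))            # -90 -> voice appears from the left
```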

Game over with these problems?

We humans have been used to hearing in three dimensions since birth. Now we can finally approximate this impression naturally. It all sounds very simple, but to get back from stereo to the original, an extremely large number of parameters are required. Unfortunately, it is not enough to load an audio object into a game engine and tick the “3D Audio” box. Here is a brief overview of what is required for 3D audio:
Before you drop an audio clip into a game engine, you should ask yourself a few questions. Where does the sound come from, and does it fit with the other sounds in your library or with recordings you may already have processed with EQ? What is the purpose of the sound – will it run in the background, or will it serve as a trigger for specific actions in the game? The craft of game audio has evolved considerably in recent years; skipping these questions usually leaves you with a 3D world full of sounds that never really work together.
Since you not only give the sounds parameters as to where they are in the environment, but you also move through the world as a character, there are a variety of parameters that you can give your audio object. As already mentioned, a distinction is made between mono and 3D sound and the size of the sound. Another important parameter is the attenuation curve, which regulates how quickly and how much the volume of the sound decreases in the room. By setting the focus parameters and the air absorption and occlusion factors, you can further determine how the sound spreads in the room.
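As an illustration of such an attenuation curve, here is a common inverse-distance model with a minimum distance and a rolloff factor – similar in spirit to what game engines expose, but with hypothetical parameter names:

```python
def distance_gain(distance, min_distance=1.0, rolloff=1.0):
    """Inverse-distance attenuation: full level inside min_distance,
    then roughly -6 dB per doubling of distance (scaled by rolloff)."""
    d = max(distance, min_distance)
    return min_distance / (min_distance + rolloff * (d - min_distance))

for d in (0.5, 1, 2, 4, 8):
    print(f"{d:>4} m -> gain {distance_gain(d):.2f}")
# 0.5 m and 1 m stay at full level, 2 m is about half, 4 m about a quarter, ...
```

Air absorption and occlusion then act on top of this, typically as additional low-pass filtering rather than pure gain changes.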
So far, so good – but the sound still subjectively sticks very close to your face. It has a certain distance, but our brain doesn’t yet know what kind of room we are in; at the moment it is an abstract sound source in an empty space. So it’s time for reverb. Important considerations when creating realistic 3D reverb are the size of the room and the material of the walls. Here, too, the calculation is usually only approximate – you would actually need real-time ray tracing to realistically simulate early reflections and reverberation. With the right combination of the parameters mentioned above, however, you can get quite close without needing a render farm.
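One common shortcut instead of full ray tracing is to estimate the reverberation time from room volume and surface absorption with Sabine’s formula and feed that into a standard algorithmic reverb. A rough sketch with hypothetical room values:

```python
def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Estimate RT60 (seconds) with Sabine's formula:
    RT60 = 0.161 * V / A, where A is the total equivalent absorption area
    (sum of surface area * absorption coefficient)."""
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption_area

# Hypothetical 5 x 4 x 2.5 m room: concrete walls/ceiling, carpeted floor.
rt60 = sabine_rt60(
    volume_m3=5 * 4 * 2.5,
    surfaces=[(2 * (5 * 2.5 + 4 * 2.5), 0.02),  # walls, concrete
              (5 * 4, 0.02),                    # ceiling, concrete
              (5 * 4, 0.30)],                   # floor, carpet
)
print(f"estimated RT60: {rt60:.2f} s")
```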

If the software wasn’t so hardware…

Hard… you know? Anyway, in this context you usually come across the term “game audio”, which deals with the design of interactive audio content. The three new degrees of freedom create two new unknowns: you never know exactly when the player will actually be where.
That’s why an audio production no longer ends up as one long audio file with several channels. Instead, many small assets are delivered. These can be loops, for example a forest atmosphere that repeats in the background until we leave the forest. The second category is trigger sounds: if I hit a tree with an axe, the corresponding sound should arrive at exactly the right moment.
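A minimal sketch of these two asset categories as they typically show up in a game-audio setup – generic Python, not tied to Wwise, FMOD or any specific engine:

```python
import itertools

class LoopAsset:
    """Background asset that repeats seamlessly until its state ends,
    e.g. a forest atmosphere while the player is in the forest."""
    def __init__(self, clip):
        self.clip = clip
    def stream(self):
        return itertools.cycle([self.clip])   # repeat until the game stops it

class TriggerAsset:
    """One-shot asset fired by a game event, e.g. an axe hitting a tree."""
    def __init__(self, clip):
        self.clip = clip
    def on_event(self):
        return self.clip                      # play exactly once, right now

forest = LoopAsset("forest_atmo.wav")
axe_hit = TriggerAsset("axe_on_wood.wav")
```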
The only audio format still under development that could cover all of this is Fraunhofer’s MPEG-I, but that will take a few more years. Until then, this kind of work mostly happens directly in game engines such as Unity or Unreal Engine. The former only ships with very rudimentary 3D audio features; Epic’s engine takes you a bit further. Nevertheless, both platforms quickly reach their limits, which is why it is common to integrate middleware into your project. Audiokinetic Wwise and FMOD are popular tools for this and usually provide everything you need – and if not, you can always write your own scripts. Easier said than done, because here the sound designer has to become more of a developer, with endless possibilities but also endless complexity.

Conclusion on the large 3D Audio Matrix

To summarise, 6DoF makes it possible to move freely in space and changes the way we experience sound – initially in a playful way. Even though games have been using this in some form for decades, the added value goes well beyond entertainment. That’s why it makes no sense to me to call 3D audio the stereo killer, as Dolby likes to claim. I start with what wouldn’t have worked in stereo, true to the motto “Sound First”. In the sound community – the VDT (Association of German Sound Engineers) and the AES (Audio Engineering Society) – there is a lot of talk about “immersive audio”, but it’s mostly just about 3D music, and it often feels like nerdy details that users don’t understand anyway.
Audio professionals have to make sure the scene is scored in such a way that it sounds neither empty nor overloaded. Developers suddenly have to deal with very complex audio parameters, which they have to handle on their own more often than I would like. The bigger the project, though, the more budget there is for the respective specialists, and games can certainly be taken as a prime example in terms of creativity and technical realisation. It’s an exciting time for the audio world, because 3D audio really is celebrating one breakthrough after another in a wide variety of areas. So if you have a project you need help with, or would like to learn more about this field with my video course, just get in touch!
