
In our philosophy, the basic criterion for a great sound system is to honestly reproduce sounds as they originally were in nature, as judged by our auditory perceptual system. This might sound simple, but it is far harder than it seems.
Let’s say there is a perfect microphone with a perfectly flat response from 20 Hz to 20 kHz (our hearing spectrum) that captures every sound at any dynamic level, and a perfect speaker with the same flat response for playing it back. The input would be exactly identical to the output: what you hear from the sound source is what you get from the speaker. That is our definition of the best possible sound quality.
Numerous double-blind experiments conducted by reputable researchers have demonstrated that a flat frequency response is a key criterion for delivering the greatest sonic pleasure to listeners. There is no doubt that flat frequency response is the first priority in loudspeaker design and manufacturing. However, if we measure our sound system in the listening room, the measurements will almost never come out flat.
In fact, the speaker responses you normally see in product specifications are measured on the “acoustic-center axis” (the in-phase plane) in an anechoic room (without reflections). Manufacturers measure this way to eliminate the effects of the surrounding environment and off-axis interference, because once the sound bounces back from any reflective surface, the measurement is “contaminated”. Since we can’t control where and how users listen to the speaker, we cannot and should not standardize the combined in-room measurement as a target response. There are simply unlimited listening environments, and we can say that every room response is acoustically unique.
Unfortunately, these geometric variations significantly alter the direct sound on its way from the speaker to our ears. There are just too many factors that make the sound no longer what it was designed to be, and it is almost impossible to track them all.
No matter how much you have spent on the sound system, or even on acoustic treatment of the listening room (short of an anechoic room), the sound quality has been compromised to some degree and has yet to unlock its full potential. It’s as if multiple filters have been stacked on a picture, and we need to retrieve the details behind them. The objective is to restore its true colors through a reverse-engineering-like calibration. Is that even possible?
We understand that it is not easy to measure sound quality objectively, as auditory perception is very subjective. People nowadays don’t quite trust scientific data to determine sound quality, because it often fails to reflect what we actually hear, possibly because our hearing system is highly adaptive and still not fully understood in some areas. The calibration process has thus been drastically overlooked, and people are losing faith in it because, when it isn’t done right, the result often sounds worse than doing nothing.
We have been studying the science behind all these sound alterations for over 20 years. Everything we have learned from our experience and research tells us that the measurement data and our perception can be aligned when the method is applied correctly. What excites us even more is that all these responses and output signals can be “fixed” to restore the true color (timbre) of the sound, as long as we have the knowledge and a specialized spectral/spatial correction technique.
We are very proud of the results from our sophisticated method, which we have named the “Timbre Chart Tuning Method”. We truly hope we can share this outstanding sound quality with the world.
From our in-depth examinations over the years, we concluded that there are three main factors we must address to achieve the best sound quality:
And we take four basic approaches to achieve the best sound for you:

One day, I saw my 3-year-old daughter playing with her clay, mixing and rolling different colors together into a pretty little planet. Whenever I asked her what her favorite color was, she always said “rainbow”. I think everybody likes rainbows; they lay out almost all the colors of our visible spectrum before our eyes. That fullness of color gives us the highest “color contrast” we can possibly see, a beautiful phenomenon every human being can agree on. The same logic applies to audio. To reproduce the highest sound quality for listeners, a good sound system must not only reproduce all the frequencies across our hearing spectrum, but also deliver them “accurately”, in the same phase relationship. The question is, how do we know it is “accurate” for all of us? We understand that it is far more complex than just looking at a Real-Time Analyser (RTA), which doesn’t really illustrate what our human hearing system perceives.
It is not that hard to find a speaker with an excellent measured response on the market. But why do people often say that a speaker with a good response doesn’t necessarily give you good sound? One explanation is that we seldom get that same excellent response in a real-life situation.
The window of the ideal listening position (sweet spot) is so narrow that it is not easy to stay exactly at that spot. The typical frequency response we see in a specification is normally measured at 1 or 2 meters on the acoustic center axis in an anechoic chamber, and manufacturers cannot customize each product to fit every listening angle and distance for every user.
Most of the time, we are not hearing just the direct sound from the speaker, and the direct sound is easily altered by the environment, especially at low frequencies, whose long wavelengths are comparable to the room dimensions. The speakers act like the strings of a guitar, and the room acts like the guitar’s resonant body, which simply becomes part of the instrument. We hear a combination of direct sound and reflected sound. How much of what we hear is blended in our perception, and how much can our hearing system separate? It is honestly hard to tell. The more important question is whether we can control the outcome of this combined sound just by tuning the speaker. We can say this is truly possible, and we are able to show you how.
It is the nature of the sound source that causes sound propagation to behave differently at different angles and distances, and after reflection. The physical size of the sound source gives each frequency in the spectrum its own directivity and intensity attenuation rate. The arrangement (or dislocation) of multiple drivers and geometrically irregular reflections make these behaviors even more complicated and unpredictable where the drivers have to work together across the crossover frequency spans. It is simply impossible for us to track them all.
Let’s say there are three spotlights with three different colors (Red, Green, and Blue) aimed at a screen in a room. These spotlights represent the different drivers of a speaker (Red = woofer, Green = midrange, Blue = tweeter), and the screen represents our listening area. Each light has its own inherent beam angle, just as different audio frequencies have different directivities. To produce a perfect white at the aiming point on the screen (the on-axis sweet spot), the three lights are balanced with exactly the right intensities, so we get perfect white at that spot (Pic on top). If we look at the outer edge of the spot (off-axis), we start to see color distortion: it fades into yellow and ultimately turns red. That is why it is so important to listen at the acoustic center of the speaker, the only place where perfect spectral balance occurs.
Now, what makes things worse is that the room interior is no longer black; instead, it is painted randomly in different light colors (mostly red (LF)). The paint on the walls, ceiling, and floor makes the light rays reflect back onto the screen (Bottom Pic).
Consequently, we no longer get perfect white at the aiming point on the screen, and all the colors shift elsewhere. What we need to do is rebalance the RGB intensities to get the white spot back. In audio, we use equalizers and filters.
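As a rough illustration of this rebalancing step, a single peaking-EQ band can cut a room-boosted frequency region. This is only a generic sketch using the widely published RBJ biquad cookbook formulas, not our tuning method; the 60 Hz center, -4 dB cut, and Q of 2 are made-up example values.

```python
import math
import cmath

def peaking_eq_coeffs(f0, gain_db, q, fs):
    """Biquad peaking-EQ coefficients (RBJ Audio EQ Cookbook form)."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a]   # numerator
    den = [1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a]  # denominator
    return b, den

def magnitude_db(b, a, freq, fs):
    """Evaluate the biquad's magnitude response (dB) at one frequency."""
    z = cmath.exp(-2j * math.pi * freq / fs)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))

# Illustrative: a -4 dB cut at 60 Hz to tame a room-induced bass boost
b, a = peaking_eq_coeffs(60.0, -4.0, 2.0, 48000)
print(magnitude_db(b, a, 60.0, 48000))    # -4.00 dB at the center frequency
print(magnitude_db(b, a, 1000.0, 48000))  # ~0 dB far from the band
```

In practice a calibration would stack several such bands; this shows only the mechanism of one.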
It is easy to paint a room black to absorb all the reflected light. Unfortunately, in the audio realm, it is very hard to absorb all the sound (especially the bass) in a room, even in studios. The low energy and narrow directivity of high frequencies make them less troublesome, but the alterations caused by high-power, omnidirectional low frequencies are almost impossible to escape; they bounce around the room with no control. Besides, our human hearing system is much more complex than we think. At this moment, we still can’t mimic our high-level adaptive perception of direct and reflected sound in a measuring machine (analyzer). Using the ear to calibrate the sound system is the best way to make sure the output we hear is the same as the input.
Many people say that we cannot hear phase differences in sound. That statement only holds when the phase of the sound is already non-linear in the first place. Unless the speaker drivers are coaxial, the sounds from multiple drivers can only be in phase on a very thin listening plane (the acoustic center). Off this plane, the travel times from the drivers to the ear differ. Adding the factors of directivity and reflections, you get a directivity mismatch between the off-axis and on-axis frequency responses. At this point, the phase response is no longer linear and has been effectively randomized, and the result sounds electronically reverberant. That’s why, most of the time, when you play your voice through a speaker, it just sounds like it is coming from a speaker; it seldom sounds like a real person speaking.
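The geometry behind the off-axis phase error is easy to sketch. The toy calculation below (with illustrative driver spacing, distance, and crossover frequency, not the figures of any particular product) shows how moving off the acoustic-center plane turns a path-length difference between two drivers into a phase offset:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def driver_phase_offset_deg(spacing_m, distance_m, off_axis_m, freq_hz):
    """Phase offset (degrees) between two drivers spaced `spacing_m`
    apart, for a listener `distance_m` away and `off_axis_m` off the
    acoustic-center plane. All geometry values are illustrative."""
    d1 = math.hypot(distance_m, off_axis_m - spacing_m / 2)  # path to driver 1
    d2 = math.hypot(distance_m, off_axis_m + spacing_m / 2)  # path to driver 2
    delta_t = abs(d1 - d2) / SPEED_OF_SOUND   # arrival-time difference (s)
    return 360.0 * freq_hz * delta_t          # phase offset at freq_hz (deg)

# On the acoustic-center plane the two arrivals stay in phase ...
print(driver_phase_offset_deg(0.20, 2.0, 0.0, 2000.0))  # 0.0
# ... but 30 cm off that plane, a 2 kHz crossover tone is ~62 degrees out.
print(driver_phase_offset_deg(0.20, 2.0, 0.3, 2000.0))
```

The offset grows with frequency and off-axis displacement, which is why the error concentrates around the crossover region.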
It is hard to tell people what good sound quality is unless you give them a chance to hear an A/B comparison for themselves. The term “auditory proximity” has not been heard much in the audio realm, but we understand it has a great deal to do with sound quality. As coaxial and linear-phase speakers become popular, people may start to become aware of its existence, especially as the current trend of immersive sound makes this feature essential. In our opinion, this is the crucial quality that distinguishes real raw sound sources from reproduced ones. Achieving “auditory proximity” is conceptually simple: reproduce a linear phase response in the system, just as a real raw sound source does. The ear detects auditory proximity through the phase coherence of the upper harmonics in the direct sound, which are randomized by early reflections. Harmonic phase is an important element of realistic sound.
If the main sound-quality issue of an audio reproduction system is the inconsistent response between systems in listening environments with different reverberation patterns, then maintaining a flat frequency response (FR) in all the systems in the chain is the right direction. However, this FR maintenance must be relative to our human ears, not to the analysis machine, as the machine doesn’t align with our hearing.
Using our ears to flatten the responses is the key!
The first thing we need to identify is our frontal hearing response. It is definitely not flat; it differs from person to person to some degree, like our fingerprints; and it is also direction-dependent: sound coming from different angles produces different responses. Fortunately, we are not aiming to tune the response to fit our ears; we are only focused on flattening the sound as it naturally is (anechoic flat). We simply utilize our hearing mechanism to determine what an anechoic flat response would sound like.
If you are familiar with color grading, you might know there is a clever approach that makes display calibration simple for users without requiring deep color-management knowledge. The idea is simple: provide a set of graphic color/brightness charts and simple instructions so users can create a profile for their own displays themselves. The result is exceedingly accurate.
Our Timbre Chart Method takes a similar approach: we created a set of test tones designed for our well-trained engineers’ frontal hearing. It acts like a personalized color chart for audio, allowing us to identify the true relative level of each frequency band. Combined with other deep measurement tools, such as specific time-windowed dual-FFT analysis, we can recreate the best response the system was designed to give you.
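For readers curious about the measurement side, a dual-FFT transfer-function estimate compares the signal sent to the system (the reference) with the microphone capture. The sketch below is a generic, Welch-style H1 estimator; it assumes nothing about our proprietary time windowing, and the block size and test signal are arbitrary choices.

```python
import numpy as np

def dual_fft_transfer(reference, measured, fs, nfft=4096):
    """H1 transfer-function estimate: average cross- and auto-spectra
    over windowed blocks of the reference and measured signals."""
    n_blocks = min(len(reference), len(measured)) // nfft
    win = np.hanning(nfft)
    sxy = np.zeros(nfft // 2 + 1, dtype=complex)  # cross-spectrum sum
    sxx = np.zeros(nfft // 2 + 1)                 # reference auto-spectrum sum
    for b in range(n_blocks):
        x = np.fft.rfft(reference[b * nfft:(b + 1) * nfft] * win)
        y = np.fft.rfft(measured[b * nfft:(b + 1) * nfft] * win)
        sxy += np.conj(x) * y
        sxx += np.abs(x) ** 2
    h = sxy / np.maximum(sxx, 1e-20)              # H1 = Sxy / Sxx
    return np.fft.rfftfreq(nfft, 1.0 / fs), h

# Toy check: a "system" that simply halves the signal (-6 dB broadband)
rng = np.random.default_rng(0)
stimulus = rng.standard_normal(48000)             # placeholder test signal
freqs, H = dual_fft_transfer(stimulus, 0.5 * stimulus, fs=48000)
print(np.median(np.abs(H)))                       # 0.5 across the band
```

A real measurement would feed `measured` from a calibrated microphone and then apply a time window to the resulting impulse response; this sketch shows only the estimator itself.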

Suppose your ultimate goal is to achieve the best sound quality: reproducing the original nature of the sound before it was captured, in terms of our hearing perception. The flat curve is what you are looking for, but only in anechoic conditions, where you hear solely the direct sound. Hypothetically, if we could keep the direct-sound responses flat in every sound system and listen to them in anechoic conditions, whether in recording, mixing, mastering, or consumer playback, the inputs would exactly equal the outputs and there would be no “Circle of Confusion” problem.
However, in the practical world, when we measure that ideal direct sound in a typical reverberant listening room with an analyzer (steady state), we notice that the bass has been significantly boosted (by more than 18 dB in places, depending on room size and the distance from the speakers to the boundaries), because its wavelengths are so long that the reflections arrive within a fraction of a cycle and sum with the direct sound. The direct bass accumulates the LF reflections, while the HF may show an attenuation roll-off, mainly because of comb filtering and its narrow directivity. Perceptually, we need to attenuate the bass and boost the treble, but the truth is that we don’t know by how much, because the reflections are already mixed with the direct sound and we don’t know our hearing’s time window for differentiating them. Counterintuitively, flattening the steady-state frequency response weakens the bass and overshoots the treble, which leads to a piercingly bright sound. That is why people tend to do nothing rather than target a flat curve. Others take an alternative approach, using different target curves to predict how much LF and HF should be boosted or cut. What this actually does is try to match what we perceive from the accumulated direct and reflected sound. Unfortunately, because every room has its own unique reflection pattern, there is no “one target curve fits all”.
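To see why a single reflection produces the comb-filtering ripple mentioned above, consider the direct sound plus one delayed copy of itself. The minimal sketch below uses an illustrative 1 ms delay and 0.5 reflection gain; the first notch then lands at 1 / (2 x 0.001 s) = 500 Hz, with peaks and notches alternating above it.

```python
import numpy as np

def comb_magnitude_db(freqs_hz, delay_s, reflection_gain=0.5):
    """Magnitude (dB) of the sum of the direct sound and one delayed
    reflection. Delay and gain values here are illustrative only."""
    h = 1 + reflection_gain * np.exp(-2j * np.pi * freqs_hz * delay_s)
    return 20 * np.log10(np.abs(h))

delay = 0.001  # 1 ms reflection -> first notch at 500 Hz
for f in (250.0, 500.0, 1000.0, 1500.0):
    db = comb_magnitude_db(np.array([f]), delay)[0]
    print(f"{f:6.0f} Hz: {db:+6.2f} dB")  # notches at 500/1500 Hz, peak at 1000 Hz
```

A real room superimposes many such reflections with different delays and gains, which is why the measured ripple pattern is unique to each room.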
How about utilizing the FTW for the measurement? Can this solve the problem? The FTW limits the measurement to a specific short time window. The purpose is to avoid measuring the reflections by removing later-arriving energy. It brings the measurement closer to the direct sound, but it still cannot separate the reflections the way our ears do, and it is controversial to claim that it effectively removes the artifacts reflections cause in the frequency response, such as comb filtering. Even so, the FTW is no doubt a good reference for spectral balancing, especially at low frequencies.
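The idea of such a time window can be demonstrated on a synthetic impulse response: a direct arrival plus one reflection 5 ms later. Gating the response to 4 ms before the FFT excludes the reflection and removes its comb-filter ripple. All values below are illustrative assumptions, not a description of any specific analyzer.

```python
import numpy as np

fs = 48000                                # sample rate (Hz), assumed
ir = np.zeros(4096)
ir[100] = 1.0                             # direct sound arrival
ir[100 + 240] = 0.6                       # reflection 5 ms later (illustrative)

def gated_spectrum(ir, fs, window_ms):
    """FFT magnitude after truncating the impulse response to a short
    window starting at the direct-sound arrival (half-Hann fade-out)."""
    start = int(np.argmax(np.abs(ir)))    # locate the direct sound
    n = int(fs * window_ms / 1000)
    win = np.zeros_like(ir)
    win[start:start + n] = np.hanning(2 * n)[n:]  # fade from 1 to 0
    return np.abs(np.fft.rfft(ir * win))

full = np.abs(np.fft.rfft(ir))            # steady-state: comb-filter ripple
gated = gated_spectrum(ir, fs, 4.0)       # 4 ms gate excludes the reflection
print(full.std() / full.mean())           # large ripple
print(gated.std() / gated.mean())         # essentially flat
```

Note the trade-off the main text alludes to: a 4 ms gate also limits frequency resolution to roughly 250 Hz, which is why short windows are least informative exactly where the room does the most damage, in the bass.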
(except the LFE or other effect channels).
In 1931, engineer Alan Blumlein of EMI went to watch a movie with his wife at a local cinema. He found the single-channel mono sound disconcerting: the actor appeared on one side of the screen while his voice seemed to come from the other. Blumlein told his wife he had an idea that could make the sound follow the actor across the screen. He went on to invent multi-channel stereophonic technology and subsequently patented stereo records, stereo films, and surround sound.
To this day, all the advanced sound systems on the commercial market, including immersive audio, are still fundamentally built on Blumlein’s technology. The major differences that affect the resulting sound quality lie in the rendering solutions, speaker configurations, and calibration processes, which must not be underestimated.
In our RGB spotlight analogy, a single-channel signal provides one spot on the screen; multiple channels provide multiple spots. They act like the pixels of a soundscape, constructing what we call the Sonic Image; more channels give us a higher-resolution image. The difference is that, unlike visual perception, our binaural hearing system doesn’t require very high resolution to form a Sonic Image, because we can exploit the simple principle of “amplitude panning” (the “pan” derives from “panoramic”). Even a two-channel stereo configuration can give us a very good sensation of soundscape and envelopment. Of course, with the trend toward immersive audio, multi-channel speaker configurations have become more and more complicated and have evolved to utilize object-based audio systems.
Amplitude panning exploits interaural level difference (ILD) localization cues to create an approximate sonic image placement. Channel levels are offset against each other, placing individual signals along the panoramic horizon by tricking our brains with the level differences. The level relationship between channels is called the “panning law” or “panning rule”.
Based on this principle, amplitude panning only works when the listener stays within a narrow range of distances and angles from the speakers; we call this the “10ms listening window”. Outside this window, the Sonic Image distorts and ultimately collapses. Adding the factor of speaker coverage for different sizes of audience area, designing a multi-channel speaker configuration can be very challenging and cannot be taken lightly.
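The panning law itself is simple to state. A common choice is the constant-power ("-3 dB center") law, sketched below; this is the textbook rule, not necessarily the one any particular renderer uses.

```python
import math

def constant_power_pan(pan):
    """Constant-power (-3 dB center) pan law.
    pan: -1.0 = full left ... +1.0 = full right."""
    angle = (pan + 1.0) * math.pi / 4.0       # map pan to 0..pi/2
    return math.cos(angle), math.sin(angle)   # (left gain, right gain)

left, right = constant_power_pan(0.0)         # phantom center image
print(round(left, 4), round(right, 4))        # 0.7071 0.7071 (-3.01 dB each)
print(round(left**2 + right**2, 6))           # total power stays 1.0
```

Because cos^2 + sin^2 = 1 at every pan position, the perceived loudness stays roughly constant as a source sweeps across the panorama, while the level difference between channels steers the phantom image.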
Object-based Audio (OBA) benefit:
In its output masters, OBA keeps the unmixed sound clips and the mixing data recorded in the studio separate. All the final mixing actions that took place in the studio are recorded in this metadata and are subsequently rendered by the playback sound system for the listener. The benefit of this design is that OBA can more faithfully reconstruct the “Sonic Images” the content producers created in the studio, regardless of the speaker-configuration differences between the dubbing studio and the auditorium.
"Circle of confusion" is a fundamental problem in the audio industry, described by the legendary acoustical engineer and audio expert Floyd Toole. It refers to the lack of a standardized calibration method across the stages of sound monitoring, i.e., manufacturing, recording studio, mixing studio, mastering studio, and consumer listening environments, causing the sound quality to deviate from what was originally intended at each stage. Listeners then judge audio equipment by how these potentially flawed recordings sound on their own systems, creating a cycle with no fixed, objective reference point for accuracy.
The reason we don’t have a standardized calibration method is that sound quality has been calibrated solely by the analyzing machine, which cannot accurately reflect what we actually hear. As a result, sound quality cannot be replicated consistently across the different listening environments of the sound reproduction chain, each with its own unique reverberation pattern, in terms of our hearing perception.
If we have a method to keep the sound quality of the input perceptually equal to its output at each stage, we can essentially break this “circle of confusion”.
(Coming Soon…)
(Coming Soon…)
Copyright © 2025 Kino Acoustics - All Rights Reserved.