A brief introduction to audio codecs and how to optimize them

Audio codecs are mainly specified by a few major technical organizations, each for its own application scenarios: ITU-T, 3GPP, and MPEG. Some are also defined by individual companies or consortia, such as Microsoft's WMA. These bodies publish the codec specifications and also provide reference code for a software implementation, which makes it easier for the codecs to be widely adopted. This article first introduces these codecs and then discusses how to optimize a codec starting from its reference code (mainly to reduce CPU load).
1. Codec specifications
1) ITU-T
ITU-T specifies codec standards for wireline voice, namely the G series, which mainly includes G.711, G.722, G.726, G.728, and G.729. The sampling rate is 8 kHz for narrowband and 16 kHz for wideband, and the bit rate ranges from 64 kbps down to 8 kbps.
The following table lists the sampling rate and bit rate of each codec.
2) 3GPP
3GPP specifies codec standards for mobile voice, mainly the AMR (Adaptive Multi-Rate) family, which can adapt its bit rate to network conditions. The sampling rate is 8 kHz for narrowband and 16 kHz for wideband. In recent years, to compete with the Internet (Internet companies have promoted the OPUS codec, which covers both voice and music), 3GPP has released the EVS (Enhanced Voice Services) codec specification. EVS also covers both voice and music, can switch flexibly between the two, and supports multiple sampling rates and bit rates. The details are as follows.
3) MPEG
MPEG specifies music codecs, mainly MP3 and AAC. Everyone is familiar with MP3, which has been the dominant format for listening to music over the past two decades. AAC is the successor to MP3 and is set to be the most important music codec of the next generation. Music is usually sampled at 44.1 kHz, sometimes at 48 kHz. The bit rate can be chosen within a range; in general, the higher the bit rate, the better the sound quality.
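To put that bit-rate range in perspective, the sketch below (not from the original article) computes the raw PCM bit rate from sampling rate, sample width, and channel count, and the compression ratio of a coded stream against it. For example, CD-quality stereo (44.1 kHz, 16-bit, 2 channels) is 1411.2 kbps uncompressed, so a 128 kbps MP3 is roughly 11:1. The function names are illustrative only.

```c
/* Uncompressed PCM bit rate in kbps:
 * sample rate (Hz) x bits per sample x channels / 1000. */
double pcm_kbps(int sample_rate_hz, int bits_per_sample, int channels)
{
    return (double)sample_rate_hz * bits_per_sample * channels / 1000.0;
}

/* Compression ratio of a coded stream relative to the raw PCM it encodes. */
double compression_ratio(double raw_kbps, double coded_kbps)
{
    return raw_kbps / coded_kbps;
}
```

For instance, `compression_ratio(pcm_kbps(48000, 16, 2), 128.0)` gives 12.0, since 48 kHz 16-bit stereo is exactly 1536 kbps.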
4) Companies and consortia
Some companies or consortia develop their own audio codecs as needed, such as Microsoft's WMA, Skype's SILK, and GIPS's iLBC (GIPS was acquired by Google in 2011; Google built its GIPS-based audio and video solution, WebRTC, and open-sourced it, which has had a huge impact). OPUS must also be mentioned: it was jointly developed by the non-profit Xiph.org Foundation, Skype, Mozilla, and others. It covers the full band (8 kHz to 48 kHz), supports both voice and music (SILK for voice, CELT for music), and has been standardized by the IETF as the voice codec for the Internet (RFC 6716).
The codecs I have worked with, ranging from voice to music, include G.711 / G.722 / G.726 / G.728 / G.729 / AMR-NB / AMR-WB / iLBC / OPUS / MP3 / AAC / WMA / APE / Vorbis / ALAC / FLAC, etc.
2. Codec optimization
The optimization discussed here mainly means reducing CPU load, so that the optimized codec consumes less CPU and runs more smoothly on a specific hardware platform. How far to take it depends on the requirements. If the optimized codec is for a project, it depends on how much time the project gives you and what CPU load the project can accept. Generally, it is enough that the project runs smoothly in its most demanding scenarios with the optimized codec and that other functions are unaffected, because the project needs to free people up for other work; after all, schedule and quality matter most in a project. If the optimized codec is to be sold to customers as a library, it should be optimized to the extreme, because CPU load is a key criterion customers use to choose between vendors' libraries and is a selling point. In that case, many more optimization methods and techniques come into play. My optimization work was done for projects, not for libraries sold to customers, so the techniques I used are relatively few.
(1) Preparation before optimization
a) Read through the reference code of the codec to be optimized and try to understand it. Even where you don't fully understand the algorithm, figure out what each function does; this will help with the optimization later.
b) Prepare a profiling tool, which measures how many clock cycles a function takes to run. Ideally, use a ready-made profiling tool; if none is available, build your own for the specific OS and hardware platform (ARM, MIPS, etc.).
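When no ready-made profiler exists, a home-grown one can be as simple as timing a function and accumulating the result per function. The sketch below assumes a POSIX system with `clock_gettime(CLOCK_MONOTONIC)` and measures nanoseconds rather than clock cycles; on an embedded ARM or MIPS target you would instead read the platform's cycle counter, which depends on the OS and hardware. The `prof_slot` type, the `PROF_BEGIN`/`PROF_END` macros, and `frame_energy` are hypothetical names for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime (POSIX assumption) */
#endif
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. Assumes POSIX clock_gettime;
 * on the real target, replace with a hardware cycle counter. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical per-function accumulator: total time and number of calls. */
typedef struct {
    const char *name;
    uint64_t total_ns;
    uint64_t calls;
} prof_slot;

#define PROF_BEGIN()   uint64_t prof_t0_ = now_ns()
#define PROF_END(slot) do { (slot)->total_ns += now_ns() - prof_t0_; \
                            (slot)->calls++; } while (0)

static prof_slot prof_energy = {"frame_energy", 0, 0};

/* Example workload: sum of squares of one frame, instrumented. */
int64_t frame_energy(const int16_t *x, int n)
{
    PROF_BEGIN();
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * x[i];
    PROF_END(&prof_energy);
    return acc;
}
```

After running a test vector, `total_ns / calls` gives the average cost per call; multiplying nanoseconds by the CPU frequency in GHz converts it to an approximate cycle count.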
c) Prepare the test vectors, i.e., the test audio sources. These are provided with the official codec, usually as multiple vectors covering different scenarios. The principle of optimization is to reduce the CPU load without changing the results of the algorithm, so after each optimization, run the test vectors and check whether the output has changed. If it has, roll back to the previous version. While optimizing, I save at least one version every day, sometimes two or three, so that when a problem appears I can go back and locate the faulty optimization as quickly as possible.
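The "has the output changed?" check can be automated with a byte-by-byte comparison of the optimized codec's output against the reference output for each test vector. This is a minimal sketch using only standard C stdio; the function name `compare_streams` is hypothetical, and how the output files are produced depends on the codec's own test harness.

```c
#include <stdio.h>

/* Compare two streams byte by byte.
 * Returns -1 if they are identical (same length and content); otherwise
 * returns the offset of the first differing byte. A length mismatch counts
 * as a difference at the end of the shorter stream. */
long compare_streams(FILE *a, FILE *b)
{
    long offset = 0;
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb)
            return offset;   /* first mismatch, or one stream ended early */
        if (ca == EOF)
            return -1;       /* both streams ended together: bit-exact */
        offset++;
    }
}
```

In use, open the reference output and the optimized output with `fopen(path, "rb")` and call `compare_streams`; any result other than -1 means the optimization changed the output and should be rolled back.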
(2) Optimization steps and methods
a) Change the compiler optimization option from -O0 to -O3.
b) Add inline to short functions that are called frequently.
Normally, after steps a and b the load drops considerably; just like squeezing foam, a large part comes out at the first squeeze.
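As an illustration of steps a and b, the sketch below marks a tiny per-sample helper `static inline` so that, when built with something like `gcc -O3`, the hot loop carries no function-call overhead. `mac16` and `dot16` are hypothetical names, not functions from any reference code. Note that `inline` is only a hint; at -O3 the compiler may inline small static functions on its own, so confirm the effect with the profiler.

```c
#include <stdint.h>

/* Hypothetical short helper called once per sample: a good inlining candidate.
 * Assumes the accumulator does not overflow for the given inputs. */
static inline int32_t mac16(int32_t acc, int16_t a, int16_t b)
{
    return acc + (int32_t)a * b;
}

/* Hot loop: once mac16 is inlined, each iteration is a plain multiply-add. */
int32_t dot16(const int16_t *x, const int16_t *y, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac16(acc, x[i], y[i]);
    return acc;
}
```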
c) The ITU-T and 3GPP reference code contains many basic-operation functions (addition, subtraction, multiplication, and division). These functions are written very rigorously and are called very frequently, which adds to the computational cost. Some of them can be simplified while still guaranteeing correct results (for example, some saturation checks are unnecessary where overflow cannot occur), and the load drops once this is done.
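The sketch below shows the kind of simplification meant in step c. `add_sat` is written in the rigorous style of reference-code basic operators (saturate to the 16-bit range and set an overflow flag), while `add_nosat` drops the checks. Both names are hypothetical and this is not the actual ITU-T/3GPP source. The simplified version is only safe where analysis of the algorithm shows the sum can never leave the 16-bit range; otherwise its output differs from the reference, and the test vectors will catch it.

```c
#include <stdint.h>

/* Hypothetical global overflow flag, in the style of reference basic operators. */
static int overflow_flag = 0;

/* Rigorous 16-bit add: saturates to [-32768, 32767] and records overflow. */
int16_t add_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;
    if (sum > INT16_MAX) { overflow_flag = 1; return INT16_MAX; }
    if (sum < INT16_MIN) { overflow_flag = 1; return INT16_MIN; }
    return (int16_t)sum;
}

/* Simplified add for call sites where the sum is known to stay in range,
 * so it is bit-identical to add_sat there. Out of range, the cast is
 * implementation-defined and the result no longer matches the reference. */
static inline int16_t add_nosat(int16_t a, int16_t b)
{
    return (int16_t)(a + b);
}
```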
d) Use the profiling tool to find, step by step, which functions account for the most load; understand what each of them does, then analyze the specific problem and decide how to optimize it.
e) Some functions implement only a small algorithm, yet their reference code is complicated and they are called relatively often. Look for a simpler implementation that can replace them; once replaced, the load drops somewhat. For example, square roots are computed frequently in codecs, and the reference implementation is usually complicated. Since a square root can also be found with Newton's iterative method, we can substitute Newton's method to lower the load.
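A Newton's-method square root of the kind described in step e might look like the following integer version. It iterates x(k+1) = (x(k) + n / x(k)) / 2 from an overestimate and stops when the estimate no longer decreases, which yields floor(sqrt(n)). This is a sketch: real codecs usually work on fixed-point (Q-format) values, and the replacement must still reproduce the reference routine's results on the test vectors, so its rounding has to match. `isqrt_newton` is a hypothetical name.

```c
#include <stdint.h>

/* Integer square root (floor) of a 32-bit value by Newton's iteration.
 * Starting from x = n (>= sqrt(n)), the iterates decrease monotonically
 * until they reach floor(sqrt(n)). 64-bit intermediates avoid overflow. */
uint32_t isqrt_newton(uint32_t n)
{
    if (n < 2)
        return n;
    uint64_t x = n;
    uint64_t y = (x + n / x) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return (uint32_t)x;
}
```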
f) Use assembly optimization. If a problem can be solved at the C level, don't resort to assembly. Each processor has its own assembly instruction set, whose concepts and techniques you need to learn and master. Usually the functions that are called most often and consume the most load are rewritten in assembly, i.e., C and assembly are mixed. Assembly optimization takes a relatively long time.
Of course, there are also small tricks such as unrolling for loops and replacing array indexing with pointers, which I won't go through one by one here.
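For illustration, the two small tricks just mentioned look like this: the second function walks a pointer instead of indexing an array and unrolls the loop by four, with a tail loop for the remaining samples. Both return the same value; the names are hypothetical. Modern compilers at -O3 often unroll loops themselves, so it is worth checking with the profiler whether the manual version actually helps on the target.

```c
#include <stdint.h>

/* Straightforward version: array indexing, one sample per iteration. */
int64_t energy_ref(const int16_t *x, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * x[i];
    return acc;
}

/* Same result: pointer walk, unrolled by 4, plus a tail loop for n % 4.
 * Assumes n >= 0. */
int64_t energy_unrolled(const int16_t *x, int n)
{
    int64_t acc = 0;
    const int16_t *p = x;
    const int16_t *end4 = x + (n & ~3);   /* last position of the unrolled part */
    const int16_t *end = x + n;
    while (p < end4) {
        acc += (int32_t)p[0] * p[0];
        acc += (int32_t)p[1] * p[1];
        acc += (int32_t)p[2] * p[2];
        acc += (int32_t)p[3] * p[3];
        p += 4;
    }
    while (p < end) {                      /* remaining 0 to 3 samples */
        acc += (int32_t)p[0] * p[0];
        p++;
    }
    return acc;
}
```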
