Audio capture and playback.
Audio processing. This mainly means processing the captured (recorded) audio, the so-called 3A processing: AEC (Acoustic Echo Cancellation), ANS (Automatic Noise Suppression), and AGC (Automatic Gain Control).
Audio effects, such as voice changing, reverberation, and equalization.
Audio encoding and decoding. This includes codecs such as AAC and OPUS, as well as handling for weak networks, such as NetEQ.
Network transport. RTP/RTCP is commonly used to transport the encoded audio data.
Assembly of the whole audio processing pipeline.
WebRTC's audio processing pipeline looks roughly like the figure below:
Apart from audio effects, WebRTC's audio pipeline covers all of the parts listed above: audio capture and playback, audio processing, audio encoding and decoding, and network transport.
In WebRTC, audio capture and playback are done through the AudioDeviceModule. Different operating systems talk to audio devices in different ways, so each platform implements its own platform-specific AudioDeviceModule. Some platforms even offer several audio solutions: Linux has PulseAudio and ALSA, Android has the framework-provided Java APIs, OpenSL ES, and AAudio, and Windows likewise has multiple options.
WebRTC's audio pipeline only processes audio in 10 ms chunks. Some platforms provide interfaces that capture and play audio in 10 ms chunks, such as Linux; others, such as Android and iOS, do not. The data that the AudioDeviceModule plays and captures always passes through an AudioDeviceBuffer, which hands out or takes in audio in 10 ms frames. For platforms that cannot capture or play 10 ms chunks directly, a FineAudioBuffer is inserted between the platform AudioDeviceModule and the AudioDeviceBuffer to adapt the platform's audio data into 10 ms audio frames that WebRTC can process.
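To illustrate the adaptation, here is a minimal sketch, not WebRTC's actual FineAudioBuffer, of how capture callbacks of arbitrary size can be re-chunked into 10 ms frames. The class name and the callback interface are invented for this example:

#include <cstdint>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal sketch: accumulate platform capture callbacks of arbitrary size
// and emit fixed 10 ms frames. Names are hypothetical, not WebRTC's API.
class TenMsChunker {
 public:
  TenMsChunker(int sample_rate_hz, size_t num_channels,
               std::function<void(const int16_t*, size_t)> on_10ms_frame)
      : samples_per_10ms_(sample_rate_hz / 100 * num_channels),
        on_10ms_frame_(std::move(on_10ms_frame)) {}

  // Called from the platform capture callback with any number of samples.
  void Push(const int16_t* data, size_t num_samples) {
    buffer_.insert(buffer_.end(), data, data + num_samples);
    while (buffer_.size() >= samples_per_10ms_) {
      on_10ms_frame_(buffer_.data(), samples_per_10ms_);
      buffer_.erase(buffer_.begin(), buffer_.begin() + samples_per_10ms_);
    }
  }

 private:
  const size_t samples_per_10ms_;
  std::function<void(const int16_t*, size_t)> on_10ms_frame_;
  std::vector<int16_t> buffer_;
};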
The AudioDeviceModule connects to a module called AudioTransport. On the capture/send path, AudioTransport performs the audio processing, mainly the 3A processing. On the playback path there is a mixer, which mixes the multiple received audio streams. Echo cancellation works by removing the played-out sound from the recorded signal, so when audio data is pulled from AudioTransport for playback, that data is also fed into the APM as the echo canceller's reference.
AudioTransport connects to AudioSendStream and AudioReceiveStream, which carry out audio encoding and sending, receiving and decoding, and network transport.
The basic audio operations appear throughout WebRTC's audio pipeline. No matter how many audio streams the remote side sends, and no matter what sample rate and channel count each of those streams has, they must go through resampling, channel conversion, and mixing, and finally become a single stream at a sample rate and channel count the playout device accepts. Concretely, each stream is first resampled and channel-converted to a common sample rate and channel count, then mixed; after mixing, the result is resampled and channel-converted again into audio the device can accept. (Every node of WebRTC's audio pipeline uniformly represents samples as 16-bit integer values.) Like this:
WebRTC provides a number of utility classes and functions for these operations.
How is mixing done? WebRTC provides the AudioMixer interface as the abstraction of a mixer. The interface is defined (in webrtc/src/api/audio/audio_mixer.h) as follows:
namespace webrtc {

// WORK IN PROGRESS
// This class is under development and is not yet intended for for use outside
// of WebRtc/Libjingle.
class AudioMixer : public rtc::RefCountInterface {
 public:
  // A callback class that all mixer participants must inherit from/implement.
  class Source {
   public:
    enum class AudioFrameInfo {
      kNormal,  // The samples in audio_frame are valid and should be used.
      kMuted,   // The samples in audio_frame should not be used, but
                // should be implicitly interpreted as zero. Other
                // fields in audio_frame may be read and should
                // contain meaningful values.
      kError,   // The audio_frame will not be used.
    };

    // Overwrites |audio_frame|. The data_ field is overwritten with
    // 10 ms of new audio (either 1 or 2 interleaved channels) at
    // |sample_rate_hz|. All fields in |audio_frame| must be updated.
    virtual AudioFrameInfo GetAudioFrameWithInfo(int sample_rate_hz,
                                                 AudioFrame* audio_frame) = 0;

    // A way for a mixer implementation to distinguish participants.
    virtual int Ssrc() const = 0;

    // A way for this source to say that GetAudioFrameWithInfo called
    // with this sample rate or higher will not cause quality loss.
    virtual int PreferredSampleRate() const = 0;

    virtual ~Source() {}
  };

  // Returns true if adding was successful. A source is never added
  // twice. Addition and removal can happen on different threads.
  virtual bool AddSource(Source* audio_source) = 0;

  // Removal is never attempted if a source has not been successfully
  // added to the mixer.
  virtual void RemoveSource(Source* audio_source) = 0;

  // Performs mixing by asking registered audio sources for audio. The
  // mixed result is placed in the provided AudioFrame. This method
  // will only be called from a single thread. The channels argument
  // specifies the number of channels of the mix result. The mixer
  // should mix at a rate that doesn't cause quality loss of the
  // sources' audio. The mixing rate is one of the rates listed in
  // AudioProcessing::NativeRate. All fields in
  // |audio_frame_for_mixing| must be updated.
  virtual void Mix(size_t number_of_channels,
                   AudioFrame* audio_frame_for_mixing) = 0;

 protected:
  // Since the mixer is reference counted, the destructor may be
  // called from any thread.
  ~AudioMixer() override {}
};

}  // namespace webrtc

WebRTC's AudioMixer mixes zero, one, or more mixer Sources into a single audio frame with a given number of channels. The sample rate of the output frame is determined by the concrete AudioMixer implementation according to its own rules.
A mixer Source supplies the AudioMixer with mono or stereo audio frames at a requested sample rate; it is responsible for resampling whatever audio it has into the sample rate the AudioMixer asks for. It can also report its preferred output sample rate to help the AudioMixer compute a suitable output rate. Through Ssrc(), a mixer Source provides an identifier for its stream.
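As an illustration, here is a minimal sketch of a Source implementation that produces muted (silent) frames. It assumes the WebRTC headers are available (the AudioFrame header location varies across WebRTC versions); only the AudioMixer::Source interface quoted above is WebRTC's, everything else here is made up for the example:

#include "api/audio/audio_mixer.h"
// AudioFrame lives in different headers depending on the WebRTC version,
// e.g. api/audio/audio_frame.h in newer trees.

// Hypothetical example source: feeds 10 ms of mono silence at whatever
// rate the mixer asks for.
class SilenceSource : public webrtc::AudioMixer::Source {
 public:
  explicit SilenceSource(int ssrc) : ssrc_(ssrc) {}

  AudioFrameInfo GetAudioFrameWithInfo(
      int sample_rate_hz, webrtc::AudioFrame* audio_frame) override {
    // A real source would resample its own data to sample_rate_hz here.
    // Passing nullptr as data marks the frame as muted.
    audio_frame->UpdateFrame(0 /* timestamp */, nullptr /* data */,
                             sample_rate_hz / 100 /* samples per 10 ms */,
                             sample_rate_hz, webrtc::AudioFrame::kNormalSpeech,
                             webrtc::AudioFrame::kVadUnknown,
                             1 /* num_channels */);
    return AudioFrameInfo::kMuted;  // Samples should be treated as zero.
  }

  int Ssrc() const override { return ssrc_; }
  int PreferredSampleRate() const override { return 48000; }

 private:
  const int ssrc_;
};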
WebRTC provides one AudioMixer implementation, the AudioMixerImpl class, located in webrtc/src/modules/audio_mixer/. The class is defined (in webrtc/src/modules/audio_mixer/audio_mixer_impl.h) as follows:
namespace webrtc {

typedef std::vector<AudioFrame*> AudioFrameList;

class AudioMixerImpl : public AudioMixer {
 public:
  struct SourceStatus {
    SourceStatus(Source* audio_source, bool is_mixed, float gain)
        : audio_source(audio_source), is_mixed(is_mixed), gain(gain) {}
    Source* audio_source = nullptr;
    bool is_mixed = false;
    float gain = 0.0f;

    // A frame that will be passed to audio_source->GetAudioFrameWithInfo.
    AudioFrame audio_frame;
  };

  using SourceStatusList = std::vector<std::unique_ptr<SourceStatus>>;

  // AudioProcessing only accepts 10 ms frames.
  static const int kFrameDurationInMs = 10;
  static const int kMaximumAmountOfMixedAudioSources = 3;

  static rtc::scoped_refptr<AudioMixerImpl> Create();

  static rtc::scoped_refptr<AudioMixerImpl> Create(
      std::unique_ptr<OutputRateCalculator> output_rate_calculator,
      bool use_limiter);

  ~AudioMixerImpl() override;

  // AudioMixer functions
  bool AddSource(Source* audio_source) override;
  void RemoveSource(Source* audio_source) override;

  void Mix(size_t number_of_channels,
           AudioFrame* audio_frame_for_mixing) override
      RTC_LOCKS_EXCLUDED(crit_);

  // Returns true if the source was mixed last round. Returns
  // false and logs an error if the source was never added to the
  // mixer.
  bool GetAudioSourceMixabilityStatusForTest(Source* audio_source) const;

 protected:
  AudioMixerImpl(std::unique_ptr<OutputRateCalculator> output_rate_calculator,
                 bool use_limiter);

 private:
  // Set mixing frequency through OutputFrequencyCalculator.
  void CalculateOutputFrequency();
  // Get mixing frequency.
  int OutputFrequency() const;

  // Compute what audio sources to mix from audio_source_list_. Ramp
  // in and out. Update mixed status. Mixes up to
  // kMaximumAmountOfMixedAudioSources audio sources.
  AudioFrameList GetAudioFromSources() RTC_EXCLUSIVE_LOCKS_REQUIRED(crit_);

  // The critical section lock guards audio source insertion and
  // removal, which can be done from any thread. The race checker
  // checks that mixing is done sequentially.
  rtc::CriticalSection crit_;
  rtc::RaceChecker race_checker_;

  std::unique_ptr<OutputRateCalculator> output_rate_calculator_;
  // The current sample frequency and sample size when mixing.
  int output_frequency_ RTC_GUARDED_BY(race_checker_);
  size_t sample_size_ RTC_GUARDED_BY(race_checker_);

  // List of all audio sources. Note all lists are disjunct
  SourceStatusList audio_source_list_ RTC_GUARDED_BY(crit_);  // May be mixed.

  // Component that handles actual adding of audio frames.
  FrameCombiner frame_combiner_ RTC_GUARDED_BY(race_checker_);

  RTC_DISALLOW_COPY_AND_ASSIGN(AudioMixerImpl);
};
}  // namespace webrtc

The implementation of the AudioMixerImpl class (in webrtc/src/modules/audio_mixer/audio_mixer_impl.cc) is as follows:
namespace webrtc {
namespace {

struct SourceFrame {
  SourceFrame(AudioMixerImpl::SourceStatus* source_status,
              AudioFrame* audio_frame,
              bool muted)
      : source_status(source_status), audio_frame(audio_frame), muted(muted) {
    RTC_DCHECK(source_status);
    RTC_DCHECK(audio_frame);
    if (!muted) {
      energy = AudioMixerCalculateEnergy(*audio_frame);
    }
  }

  SourceFrame(AudioMixerImpl::SourceStatus* source_status,
              AudioFrame* audio_frame,
              bool muted,
              uint32_t energy)
      : source_status(source_status),
        audio_frame(audio_frame),
        muted(muted),
        energy(energy) {
    RTC_DCHECK(source_status);
    RTC_DCHECK(audio_frame);
  }

  AudioMixerImpl::SourceStatus* source_status = nullptr;
  AudioFrame* audio_frame = nullptr;
  bool muted = true;
  uint32_t energy = 0;
};

// ShouldMixBefore(a, b) is used to select mixer sources.
bool ShouldMixBefore(const SourceFrame& a, const SourceFrame& b) {
  if (a.muted != b.muted) {
    return b.muted;
  }

  const auto a_activity = a.audio_frame->vad_activity_;
  const auto b_activity = b.audio_frame->vad_activity_;

  if (a_activity != b_activity) {
    return a_activity == AudioFrame::kVadActive;
  }

  return a.energy > b.energy;
}

void RampAndUpdateGain(
    const std::vector<SourceFrame>& mixed_sources_and_frames) {
  for (const auto& source_frame : mixed_sources_and_frames) {
    float target_gain = source_frame.source_status->is_mixed ? 1.0f : 0.0f;
    Ramp(source_frame.source_status->gain, target_gain,
         source_frame.audio_frame);
    source_frame.source_status->gain = target_gain;
  }
}

AudioMixerImpl::SourceStatusList::const_iterator FindSourceInList(
    AudioMixerImpl::Source const* audio_source,
    AudioMixerImpl::SourceStatusList const* audio_source_list) {
  return std::find_if(
      audio_source_list->begin(), audio_source_list->end(),
      [audio_source](const std::unique_ptr<AudioMixerImpl::SourceStatus>& p) {
        return p->audio_source == audio_source;
      });
}
}  // namespace

AudioMixerImpl::AudioMixerImpl(
    std::unique_ptr<OutputRateCalculator> output_rate_calculator,
    bool use_limiter)
    : output_rate_calculator_(std::move(output_rate_calculator)),
      output_frequency_(0),
      sample_size_(0),
      audio_source_list_(),
      frame_combiner_(use_limiter) {}

AudioMixerImpl::~AudioMixerImpl() {}

rtc::scoped_refptr<AudioMixerImpl> AudioMixerImpl::Create() {
  return Create(std::unique_ptr<DefaultOutputRateCalculator>(
                    new DefaultOutputRateCalculator()),
                true);
}

rtc::scoped_refptr<AudioMixerImpl> AudioMixerImpl::Create(
    std::unique_ptr<OutputRateCalculator> output_rate_calculator,
    bool use_limiter) {
  return rtc::scoped_refptr<AudioMixerImpl>(
      new rtc::RefCountedObject<AudioMixerImpl>(
          std::move(output_rate_calculator), use_limiter));
}

void AudioMixerImpl::Mix(size_t number_of_channels,
                         AudioFrame* audio_frame_for_mixing) {
  RTC_DCHECK(number_of_channels >= 1);
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);

  CalculateOutputFrequency();

  {
    rtc::CritScope lock(&crit_);
    const size_t number_of_streams = audio_source_list_.size();
    frame_combiner_.Combine(GetAudioFromSources(), number_of_channels,
                            OutputFrequency(), number_of_streams,
                            audio_frame_for_mixing);
  }

  return;
}

void AudioMixerImpl::CalculateOutputFrequency() {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  rtc::CritScope lock(&crit_);

  std::vector<int> preferred_rates;
  std::transform(audio_source_list_.begin(), audio_source_list_.end(),
                 std::back_inserter(preferred_rates),
                 [&](std::unique_ptr<SourceStatus>& a) {
                   return a->audio_source->PreferredSampleRate();
                 });

  output_frequency_ =
      output_rate_calculator_->CalculateOutputRate(preferred_rates);
  sample_size_ = (output_frequency_ * kFrameDurationInMs) / 1000;
}

int AudioMixerImpl::OutputFrequency() const {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  return output_frequency_;
}

bool AudioMixerImpl::AddSource(Source* audio_source) {
  RTC_DCHECK(audio_source);
  rtc::CritScope lock(&crit_);
  RTC_DCHECK(FindSourceInList(audio_source, &audio_source_list_) ==
             audio_source_list_.end())
      << "Source already added to mixer";
  audio_source_list_.emplace_back(new SourceStatus(audio_source, false, 0));
  return true;
}

void AudioMixerImpl::RemoveSource(Source* audio_source) {
  RTC_DCHECK(audio_source);
  rtc::CritScope lock(&crit_);
  const auto iter = FindSourceInList(audio_source, &audio_source_list_);
  RTC_DCHECK(iter != audio_source_list_.end())
      << "Source not present in mixer";
  audio_source_list_.erase(iter);
}

AudioFrameList AudioMixerImpl::GetAudioFromSources() {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  AudioFrameList result;
  std::vector<SourceFrame> audio_source_mixing_data_list;
  std::vector<SourceFrame> ramp_list;

  // Get audio from the audio sources and put it in the SourceFrame vector.
  for (auto& source_and_status : audio_source_list_) {
    const auto audio_frame_info =
        source_and_status->audio_source->GetAudioFrameWithInfo(
            OutputFrequency(), &source_and_status->audio_frame);

    if (audio_frame_info == Source::AudioFrameInfo::kError) {
      RTC_LOG_F(LS_WARNING) << "failed to GetAudioFrameWithInfo() from source";
      continue;
    }
    audio_source_mixing_data_list.emplace_back(
        source_and_status.get(), &source_and_status->audio_frame,
        audio_frame_info == Source::AudioFrameInfo::kMuted);
  }

  // Sort frames by sorting function.
  std::sort(audio_source_mixing_data_list.begin(),
            audio_source_mixing_data_list.end(), ShouldMixBefore);

  int max_audio_frame_counter = kMaximumAmountOfMixedAudioSources;

  // Go through list in order and put unmuted frames in result list.
  for (const auto& p : audio_source_mixing_data_list) {
    // Filter muted.
    if (p.muted) {
      p.source_status->is_mixed = false;
      continue;
    }

    // Add frame to result vector for mixing.
    bool is_mixed = false;
    if (max_audio_frame_counter > 0) {
      --max_audio_frame_counter;
      result.push_back(p.audio_frame);
      ramp_list.emplace_back(p.source_status, p.audio_frame, false, -1);
      is_mixed = true;
    }
    p.source_status->is_mixed = is_mixed;
  }

  RampAndUpdateGain(ramp_list);
  return result;
}

bool AudioMixerImpl::GetAudioSourceMixabilityStatusForTest(
    AudioMixerImpl::Source* audio_source) const {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  rtc::CritScope lock(&crit_);

  const auto iter = FindSourceInList(audio_source, &audio_source_list_);
  if (iter != audio_source_list_.end()) {
    return (*iter)->is_mixed;
  }

  RTC_LOG(LS_ERROR) << "Audio source unknown";
  return false;
}
}  // namespace webrtc

It is not hard to see that AudioMixerImpl's AddSource(Source* audio_source) and RemoveSource(Source* audio_source) are just ordinary container operations, except that they enforce that an already-added mixer Source cannot be added again and a Source that was never added cannot be removed. The heart of the class is without doubt Mix(size_t number_of_channels, AudioFrame* audio_frame_for_mixing).
void AudioMixerImpl::Mix(size_t number_of_channels,
                         AudioFrame* audio_frame_for_mixing) {
  RTC_DCHECK(number_of_channels >= 1);
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);

  CalculateOutputFrequency();

  {
    rtc::CritScope lock(&crit_);
    const size_t number_of_streams = audio_source_list_.size();
    frame_combiner_.Combine(GetAudioFromSources(), number_of_channels,
                            OutputFrequency(), number_of_streams,
                            audio_frame_for_mixing);
  }

  return;
}

AudioMixerImpl::Mix() proceeds roughly as follows:
Compute the sample rate of the output audio frame. This is why the interface does not take an output sample rate: the AudioMixer implementation computes it internally, normally from the preferred sample rates of the individual mixer Sources.
Get a list of audio frames at that sample rate from all mixer Sources. The AudioMixer does not simply take one frame from each Source and build a list; it also applies some simple transformations to the frames and selects among them.
Mix the different audio frames through the FrameCombiner (a minimal usage sketch of the whole flow follows this list).
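For orientation, here is a minimal sketch of how a client might drive AudioMixerImpl, based only on the interfaces quoted above. The driver function, the two sources, and the 10 ms cadence are this example's assumptions, not code from WebRTC:

#include "modules/audio_mixer/audio_mixer_impl.h"

// Hypothetical driver: create a mixer, register two sources, and pull one
// mixed 10 ms frame per iteration. SilenceSource is the example source
// sketched earlier; error handling is omitted.
void RunMixerForOneSecond() {
  rtc::scoped_refptr<webrtc::AudioMixerImpl> mixer =
      webrtc::AudioMixerImpl::Create();

  SilenceSource source_a(/*ssrc=*/111);
  SilenceSource source_b(/*ssrc=*/222);
  mixer->AddSource(&source_a);
  mixer->AddSource(&source_b);

  webrtc::AudioFrame mixed;
  for (int i = 0; i < 100; ++i) {  // 100 frames of 10 ms = 1 second
    // The mixer chooses the output sample rate itself from the sources'
    // preferred rates; the caller only chooses the channel count.
    mixer->Mix(/*number_of_channels=*/1, &mixed);
    // ... hand |mixed| to playout / APM ...
  }

  mixer->RemoveSource(&source_a);
  mixer->RemoveSource(&source_b);
}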
Computing the output sample rate

The output audio sample rate is computed as follows:
void AudioMixerImpl::CalculateOutputFrequency() {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  rtc::CritScope lock(&crit_);

  std::vector<int> preferred_rates;
  std::transform(audio_source_list_.begin(), audio_source_list_.end(),
                 std::back_inserter(preferred_rates),
                 [&](std::unique_ptr<SourceStatus>& a) {
                   return a->audio_source->PreferredSampleRate();
                 });

  output_frequency_ =
      output_rate_calculator_->CalculateOutputRate(preferred_rates);
  sample_size_ = (output_frequency_ * kFrameDurationInMs) / 1000;
}

AudioMixerImpl first collects each mixer Source's preferred sample rate into a list, then computes the output rate through the OutputRateCalculator interface (in webrtc/modules/audio_mixer/output_rate_calculator.h):
class OutputRateCalculator {
 public:
  virtual int CalculateOutputRate(
      const std::vector<int>& preferred_sample_rates) = 0;
  virtual ~OutputRateCalculator() {}
};

WebRTC provides a default implementation of OutputRateCalculator, DefaultOutputRateCalculator, defined (in webrtc/src/modules/audio_mixer/default_output_rate_calculator.h) as follows:
namespace webrtc {

class DefaultOutputRateCalculator : public OutputRateCalculator {
 public:
  static const int kDefaultFrequency = 48000;

  // Produces the least native rate greater or equal to the preferred
  // sample rates. A native rate is one in
  // AudioProcessing::NativeRate. If |preferred_sample_rates| is
  // empty, returns |kDefaultFrequency|.
  int CalculateOutputRate(
      const std::vector<int>& preferred_sample_rates) override;
  ~DefaultOutputRateCalculator() override {}
};

}  // namespace webrtc

The definition is straightforward. The default computation of the AudioMixer output sample rate looks like this:
namespace webrtc {

int DefaultOutputRateCalculator::CalculateOutputRate(
    const std::vector<int>& preferred_sample_rates) {
  if (preferred_sample_rates.empty()) {
    return DefaultOutputRateCalculator::kDefaultFrequency;
  }
  using NativeRate = AudioProcessing::NativeRate;
  const int maximal_frequency = *std::max_element(
      preferred_sample_rates.begin(), preferred_sample_rates.end());
  RTC_DCHECK_LE(NativeRate::kSampleRate8kHz, maximal_frequency);
  RTC_DCHECK_GE(NativeRate::kSampleRate48kHz, maximal_frequency);

  static constexpr NativeRate native_rates[] = {
      NativeRate::kSampleRate8kHz, NativeRate::kSampleRate16kHz,
      NativeRate::kSampleRate32kHz, NativeRate::kSampleRate48kHz};
  const auto* rounded_up_index = std::lower_bound(
      std::begin(native_rates), std::end(native_rates), maximal_frequency);
  RTC_DCHECK(rounded_up_index != std::end(native_rates));
  return *rounded_up_index;
}

}  // namespace webrtc

For audio, WebRTC internally supports a set of native sample rates: 8 kHz, 16 kHz, 32 kHz, and 48 kHz. DefaultOutputRateCalculator takes the maximum of the preferred rates passed in and finds the smallest native rate greater than or equal to it. In other words, if all of an AudioMixerImpl's mixer Sources prefer rates above 48 kHz, the computation fails (the DCHECK against the 48 kHz upper bound fires).
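As a standalone illustration of this rounding rule (the array and std::lower_bound mirror the code above; the scaffolding around them is just for the example):

#include <algorithm>
#include <cstdio>
#include <iterator>
#include <vector>

// Round a set of preferred rates up to WebRTC's native rates, the same way
// DefaultOutputRateCalculator does. Standalone sketch, not WebRTC code.
int CalculateOutputRate(const std::vector<int>& preferred_sample_rates) {
  static constexpr int kNativeRates[] = {8000, 16000, 32000, 48000};
  if (preferred_sample_rates.empty()) return 48000;  // kDefaultFrequency
  const int maximal = *std::max_element(preferred_sample_rates.begin(),
                                        preferred_sample_rates.end());
  // Smallest native rate >= the largest preferred rate.
  const int* it = std::lower_bound(std::begin(kNativeRates),
                                   std::end(kNativeRates), maximal);
  return *it;  // Would run past the end if maximal > 48000.
}

int main() {
  // {16000, 44100} -> max is 44100 -> rounded up to 48000.
  std::printf("%d\n", CalculateOutputRate({16000, 44100}));  // 48000
  std::printf("%d\n", CalculateOutputRate({8000, 16000}));   // 16000
  return 0;
}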
Getting the list of audio frames

AudioMixerImpl::GetAudioFromSources() obtains the list of audio frames:
AudioFrameList AudioMixerImpl::GetAudioFromSources() {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  AudioFrameList result;
  std::vector<SourceFrame> audio_source_mixing_data_list;
  std::vector<SourceFrame> ramp_list;

  // Get audio from the audio sources and put it in the SourceFrame vector.
  for (auto& source_and_status : audio_source_list_) {
    const auto audio_frame_info =
        source_and_status->audio_source->GetAudioFrameWithInfo(
            OutputFrequency(), &source_and_status->audio_frame);

    if (audio_frame_info == Source::AudioFrameInfo::kError) {
      RTC_LOG_F(LS_WARNING) << "failed to GetAudioFrameWithInfo() from source";
      continue;
    }
    audio_source_mixing_data_list.emplace_back(
        source_and_status.get(), &source_and_status->audio_frame,
        audio_frame_info == Source::AudioFrameInfo::kMuted);
  }

  // Sort frames by sorting function.
  std::sort(audio_source_mixing_data_list.begin(),
            audio_source_mixing_data_list.end(), ShouldMixBefore);

  int max_audio_frame_counter = kMaximumAmountOfMixedAudioSources;

  // Go through list in order and put unmuted frames in result list.
  for (const auto& p : audio_source_mixing_data_list) {
    // Filter muted.
    if (p.muted) {
      p.source_status->is_mixed = false;
      continue;
    }

    // Add frame to result vector for mixing.
    bool is_mixed = false;
    if (max_audio_frame_counter > 0) {
      --max_audio_frame_counter;
      result.push_back(p.audio_frame);
      ramp_list.emplace_back(p.source_status, p.audio_frame, false, -1);
      is_mixed = true;
    }
    p.source_status->is_mixed = is_mixed;
  }

  RampAndUpdateGain(ramp_list);
  return result;
}

AudioMixerImpl::GetAudioFromSources() fetches an audio frame from each mixer Source and builds a list of SourceFrame. Note that SourceFrame's constructor calls AudioMixerCalculateEnergy() (in webrtc/src/modules/audio_mixer/audio_frame_manipulator.cc) to compute the frame's energy. The computation is as follows:
uint32_t AudioMixerCalculateEnergy(const AudioFrame& audio_frame) {
  if (audio_frame.muted()) {
    return 0;
  }

  uint32_t energy = 0;
  const int16_t* frame_data = audio_frame.data();
  for (size_t position = 0;
       position < audio_frame.samples_per_channel_ * audio_frame.num_channels_;
       position++) {
    // TODO(aleloi): This can overflow. Convert to floats.
    energy += frame_data[position] * frame_data[position];
  }
  return energy;
}

The energy is the sum of squares of all the sample values.
The obtained frames are then sorted, with the following ordering logic:
bool ShouldMixBefore(const SourceFrame& a, const SourceFrame& b) {
  if (a.muted != b.muted) {
    return b.muted;
  }

  const auto a_activity = a.audio_frame->vad_activity_;
  const auto b_activity = b.audio_frame->vad_activity_;

  if (a_activity != b_activity) {
    return a_activity == AudioFrame::kVadActive;
  }

  return a.energy > b.energy;
}
From the sorted list, at most three frames with the strongest signal are selected and returned: unmuted frames sort before muted ones, VAD-active frames before inactive ones, and, within the same category, higher-energy frames first.
The selected frames are then ramped and their gains updated:
void RampAndUpdateGain(
    const std::vector<SourceFrame>& mixed_sources_and_frames) {
  for (const auto& source_frame : mixed_sources_and_frames) {
    float target_gain = source_frame.source_status->is_mixed ? 1.0f : 0.0f;
    Ramp(source_frame.source_status->gain, target_gain,
         source_frame.audio_frame);
    source_frame.source_status->gain = target_gain;
  }
}

Ramp() (in webrtc/src/modules/audio_mixer/audio_frame_manipulator.cc) works as follows:
void Ramp(float start_gain, float target_gain, AudioFrame* audio_frame) {
  RTC_DCHECK(audio_frame);
  RTC_DCHECK_GE(start_gain, 0.0f);
  RTC_DCHECK_GE(target_gain, 0.0f);
  if (start_gain == target_gain || audio_frame->muted()) {
    return;
  }

  size_t samples = audio_frame->samples_per_channel_;
  RTC_DCHECK_LT(0, samples);
  float increment = (target_gain - start_gain) / samples;
  float gain = start_gain;
  int16_t* frame_data = audio_frame->mutable_data();
  for (size_t i = 0; i < samples; ++i) {
    // If the audio is interleaved of several channels, we want to
    // apply the same gain change to the ith sample of every channel.
    for (size_t ch = 0; ch < audio_frame->num_channels_; ++ch) {
      frame_data[audio_frame->num_channels_ * i + ch] *= gain;
    }
    gain += increment;
  }
}

This step exists because, at different mix times, the same stream may be taken into or dropped out of the mix depending on the relative signal strength of its frames; ramping the gain makes those transitions of a given stream sound smoother.
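A quick standalone illustration of the ramp arithmetic: at 48 kHz a 10 ms frame has 480 samples, so fading in from gain 0 to 1 raises the gain by 1/480 per sample. The helper below re-implements the loop above for a mono buffer; the numbers are just an example:

#include <cstdio>
#include <vector>

// Standalone re-implementation of the linear gain ramp for a mono int16
// buffer, showing how a fade-in is spread across one 10 ms frame.
void RampMono(float start_gain, float target_gain, std::vector<short>* samples) {
  const float increment = (target_gain - start_gain) / samples->size();
  float gain = start_gain;
  for (short& s : *samples) {
    s = static_cast<short>(s * gain);
    gain += increment;
  }
}

int main() {
  std::vector<short> frame(480, 10000);  // 10 ms at 48 kHz, constant signal
  RampMono(0.0f, 1.0f, &frame);          // fade the frame in
  // The first sample is fully attenuated, the middle is about half,
  // and the last one is near full scale.
  std::printf("%d %d %d\n", frame[0], frame[240], frame[479]);  // 0 5000 9979
  return 0;
}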
FrameCombiner

FrameCombiner is where the mixing finally happens:
void FrameCombiner::Combine(const std::vector<AudioFrame*>& mix_list,
                            size_t number_of_channels,
                            int sample_rate,
                            size_t number_of_streams,
                            AudioFrame* audio_frame_for_mixing) {
  RTC_DCHECK(audio_frame_for_mixing);
  LogMixingStats(mix_list, sample_rate, number_of_streams);

  SetAudioFrameFields(mix_list, number_of_channels, sample_rate,
                      number_of_streams, audio_frame_for_mixing);

  const size_t samples_per_channel = static_cast<size_t>(
      (sample_rate * webrtc::AudioMixerImpl::kFrameDurationInMs) / 1000);

  for (const auto* frame : mix_list) {
    RTC_DCHECK_EQ(samples_per_channel, frame->samples_per_channel_);
    RTC_DCHECK_EQ(sample_rate, frame->sample_rate_hz_);
  }

  // The 'num_channels_' field of frames in 'mix_list' could be
  // different from 'number_of_channels'.
  for (auto* frame : mix_list) {
    RemixFrame(number_of_channels, frame);
  }

  if (number_of_streams <= 1) {
    MixFewFramesWithNoLimiter(mix_list, audio_frame_for_mixing);
    return;
  }

  std::array<OneChannelBuffer, kMaximumAmountOfChannels> mixing_buffer =
      MixToFloatFrame(mix_list, samples_per_channel, number_of_channels);

  // Put float data in an AudioFrameView.
  std::array<float*, kMaximumAmountOfChannels> channel_pointers{};
  for (size_t i = 0; i < number_of_channels; ++i) {
    channel_pointers[i] = &mixing_buffer[i][0];
  }
  AudioFrameView<float> mixing_buffer_view(
      &channel_pointers[0], number_of_channels, samples_per_channel);

  if (use_limiter_) {
    RunLimiter(mixing_buffer_view, &limiter_);
  }

  InterleaveToAudioFrame(mixing_buffer_view, audio_frame_for_mixing);
}

FrameCombiner first remixes each frame's channels to the target channel count:
void RemixFrame(size_t target_number_of_channels, AudioFrame* frame) {
  RTC_DCHECK_GE(target_number_of_channels, 1);
  RTC_DCHECK_LE(target_number_of_channels, 2);
  if (frame->num_channels_ == 1 && target_number_of_channels == 2) {
    AudioFrameOperations::MonoToStereo(frame);
  } else if (frame->num_channels_ == 2 && target_number_of_channels == 1) {
    AudioFrameOperations::StereoToMono(frame);
  }
}

Performing the mix
std::array<OneChannelBuffer, kMaximumAmountOfChannels> MixToFloatFrame(
    const std::vector<AudioFrame*>& mix_list,
    size_t samples_per_channel,
    size_t number_of_channels) {
  // Convert to FloatS16 and mix.
  using OneChannelBuffer = std::array<float, kMaximumChannelSize>;
  std::array<OneChannelBuffer, kMaximumAmountOfChannels> mixing_buffer{};

  for (size_t i = 0; i < mix_list.size(); ++i) {
    const AudioFrame* const frame = mix_list[i];
    for (size_t j = 0; j < number_of_channels; ++j) {
      for (size_t k = 0; k < samples_per_channel; ++k) {
        mixing_buffer[j][k] += frame->data()[number_of_channels * k + j];
      }
    }
  }
  return mixing_buffer;
}

As you can see, the "mixing" itself is nothing more than adding up the corresponding sample values of the different streams' audio frames.
RunLimiter
This step processes the mixed audio signal with the AGC's limiter, which keeps the summed signal from clipping.
Data format conversion
// Both interleaves and rounds.
void InterleaveToAudioFrame(AudioFrameView<const float> mixing_buffer_view,
                            AudioFrame* audio_frame_for_mixing) {
  const size_t number_of_channels = mixing_buffer_view.num_channels();
  const size_t samples_per_channel = mixing_buffer_view.samples_per_channel();

  // Put data in the result frame.
  for (size_t i = 0; i < number_of_channels; ++i) {
    for (size_t j = 0; j < samples_per_channel; ++j) {
      audio_frame_for_mixing->mutable_data()[number_of_channels * j + i] =
          FloatS16ToS16(mixing_buffer_view.channel(i)[j]);
    }
  }
}

The preceding steps produced floating-point sample data. This step converts the floats back into the 16-bit integer samples the pipeline needs.
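FloatS16ToS16() itself is not shown above. Judging by its name and use, it rounds a float sample that is already in S16 range back to int16 with clamping; a plausible standalone equivalent (an assumption, not the verified WebRTC implementation) would be:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Plausible equivalent of webrtc::FloatS16ToS16 (sketch only): clamp a
// float sample to [-32768, 32767] and round away from zero.
int16_t FloatS16ToS16(float v) {
  v = std::min(v, 32767.f);
  v = std::max(v, -32768.f);
  return static_cast<int16_t>(v + std::copysign(0.5f, v));
}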
At this point the mix is complete.
Conclusion: mixing is just adding up the sample values of the individual audio streams.
How is channel conversion done? WebRTC provides utility functions for converting audio frames between mono, stereo, and quad, in webrtc/audio/utility/audio_frame_operations.cc. From their implementations we can see exactly what channel-count conversion means for an audio frame.
Mono to stereo:
void AudioFrameOperations::MonoToStereo(const int16_t* src_audio,
                                        size_t samples_per_channel,
                                        int16_t* dst_audio) {
  for (size_t i = 0; i < samples_per_channel; i++) {
    dst_audio[2 * i] = src_audio[i];
    dst_audio[2 * i + 1] = src_audio[i];
  }
}

int AudioFrameOperations::MonoToStereo(AudioFrame* frame) {
  if (frame->num_channels_ != 1) {
    return -1;
  }
  if ((frame->samples_per_channel_ * 2) >= AudioFrame::kMaxDataSizeSamples) {
    // Not enough memory to expand from mono to stereo.
    return -1;
  }

  if (!frame->muted()) {
    // TODO(yujo): this operation can be done in place.
    int16_t data_copy[AudioFrame::kMaxDataSizeSamples];
    memcpy(data_copy, frame->data(),
           sizeof(int16_t) * frame->samples_per_channel_);
    MonoToStereo(data_copy, frame->samples_per_channel_,
                 frame->mutable_data());
  }
  frame->num_channels_ = 2;

  return 0;
}

Mono to stereo simply duplicates the single channel's data, so that both channels play the same audio.
Stereo to mono:
void AudioFrameOperations::StereoToMono(const int16_t* src_audio,
                                        size_t samples_per_channel,
                                        int16_t* dst_audio) {
  for (size_t i = 0; i < samples_per_channel; i++) {
    dst_audio[i] =
        (static_cast<int32_t>(src_audio[2 * i]) + src_audio[2 * i + 1]) >> 1;
  }
}

int AudioFrameOperations::StereoToMono(AudioFrame* frame) {
  if (frame->num_channels_ != 2) {
    return -1;
  }

  RTC_DCHECK_LE(frame->samples_per_channel_ * 2,
                AudioFrame::kMaxDataSizeSamples);

  if (!frame->muted()) {
    StereoToMono(frame->data(), frame->samples_per_channel_,
                 frame->mutable_data());
  }
  frame->num_channels_ = 1;

  return 0;
}

Stereo to mono adds the two channels' samples and divides by two (the >> 1), producing one channel of audio.
Quad to stereo:
void AudioFrameOperations::QuadToStereo(const int16_t* src_audio,
                                        size_t samples_per_channel,
                                        int16_t* dst_audio) {
  for (size_t i = 0; i < samples_per_channel; i++) {
    dst_audio[i * 2] =
        (static_cast<int32_t>(src_audio[4 * i]) + src_audio[4 * i + 1]) >> 1;
    dst_audio[i * 2 + 1] =
        (static_cast<int32_t>(src_audio[4 * i + 2]) + src_audio[4 * i + 3]) >>
        1;
  }
}

int AudioFrameOperations::QuadToStereo(AudioFrame* frame) {
  if (frame->num_channels_ != 4) {
    return -1;
  }

  RTC_DCHECK_LE(frame->samples_per_channel_ * 4,
                AudioFrame::kMaxDataSizeSamples);

  if (!frame->muted()) {
    QuadToStereo(frame->data(), frame->samples_per_channel_,
                 frame->mutable_data());
  }
  frame->num_channels_ = 2;

  return 0;
}

Quad to stereo averages channels 1 and 2 into one output channel, and channels 3 and 4 into the other.
Quad to mono:
void AudioFrameOperations::QuadToMono(const int16_t* src_audio,
                                      size_t samples_per_channel,
                                      int16_t* dst_audio) {
  for (size_t i = 0; i < samples_per_channel; i++) {
    dst_audio[i] =
        (static_cast<int32_t>(src_audio[4 * i]) + src_audio[4 * i + 1] +
         src_audio[4 * i + 2] + src_audio[4 * i + 3]) >> 2;
  }
}

int AudioFrameOperations::QuadToMono(AudioFrame* frame) {
  if (frame->num_channels_ != 4) {
    return -1;
  }

  RTC_DCHECK_LE(frame->samples_per_channel_ * 4,
                AudioFrame::kMaxDataSizeSamples);

  if (!frame->muted()) {
    QuadToMono(frame->data(), frame->samples_per_channel_,
               frame->mutable_data());
  }
  frame->num_channels_ = 1;

  return 0;
}

Quad to mono adds all four channels' samples and divides by four to get one channel of data.
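A tiny standalone check of these downmix formulas on a single quad sample frame (purely illustrative):

#include <cstdint>
#include <cstdio>

int main() {
  // One quad sample frame: channels 1..4.
  const int16_t quad[4] = {1000, 2000, 3000, 4000};
  // Quad -> stereo: average channel pairs (1,2) and (3,4).
  const int16_t left = (quad[0] + quad[1]) >> 1;   // 1500
  const int16_t right = (quad[2] + quad[3]) >> 1;  // 3500
  // Quad -> mono: average all four channels.
  const int16_t mono = (quad[0] + quad[1] + quad[2] + quad[3]) >> 2;  // 2500
  std::printf("stereo=(%d,%d) mono=%d\n", left, right, mono);
  return 0;
}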
For the other audio data operations WebRTC provides, see WebRTC's header files.
Resampling

Resampling converts audio data from one sample rate to another. Resampling in WebRTC is mainly done by a few components: PushResampler, PushSincResampler, and SincResampler. For example, Resample() in webrtc/src/audio/audio_transport_impl.cc:
// Resample audio in |frame| to given sample rate preserving the
// channel count and place the result in |destination|.
int Resample(const AudioFrame& frame,
             const int destination_sample_rate,
             PushResampler<int16_t>* resampler,
             int16_t* destination) {
  const int number_of_channels = static_cast<int>(frame.num_channels_);
  const int target_number_of_samples_per_channel =
      destination_sample_rate / 100;
  resampler->InitializeIfNeeded(frame.sample_rate_hz_, destination_sample_rate,
                                number_of_channels);

  // TODO(yujo): make resampler take an AudioFrame, and add special case
  // handling of muted frames.
  return resampler->Resample(
      frame.data(), frame.samples_per_channel_ * number_of_channels,
      destination, number_of_channels * target_number_of_samples_per_channel);
}

PushResampler is a template class with a fairly simple interface, defined (in webrtc/src/common_audio/resampler/include/push_resampler.h) as follows:
namespace webrtc {

class PushSincResampler;

// Wraps PushSincResampler to provide stereo support.
// TODO(ajm): add support for an arbitrary number of channels.
template <typename T>
class PushResampler {
 public:
  PushResampler();
  virtual ~PushResampler();

  // Must be called whenever the parameters change. Free to be called at any
  // time as it is a no-op if parameters have not changed since the last call.
  int InitializeIfNeeded(int src_sample_rate_hz,
                         int dst_sample_rate_hz,
                         size_t num_channels);

  // Returns the total number of samples provided in destination (e.g. 32 kHz,
  // 2 channel audio gives 640 samples).
  int Resample(const T* src, size_t src_length, T* dst, size_t dst_capacity);

 private:
  std::unique_ptr<PushSincResampler> sinc_resampler_;
  std::unique_ptr<PushSincResampler> sinc_resampler_right_;
  int src_sample_rate_hz_;
  int dst_sample_rate_hz_;
  size_t num_channels_;

  std::unique_ptr<T[]> src_left_;
  std::unique_ptr<T[]> src_right_;
  std::unique_ptr<T[]> dst_left_;
  std::unique_ptr<T[]> dst_right_;
};

}  // namespace webrtc

Its implementation (in webrtc/src/common_audio/resampler/push_resampler.cc) is as follows:
template <typename T>
PushResampler<T>::PushResampler()
    : src_sample_rate_hz_(0), dst_sample_rate_hz_(0), num_channels_(0) {}

template <typename T>
PushResampler<T>::~PushResampler() {}

template <typename T>
int PushResampler<T>::InitializeIfNeeded(int src_sample_rate_hz,
                                         int dst_sample_rate_hz,
                                         size_t num_channels) {
  CheckValidInitParams(src_sample_rate_hz, dst_sample_rate_hz, num_channels);

  if (src_sample_rate_hz == src_sample_rate_hz_ &&
      dst_sample_rate_hz == dst_sample_rate_hz_ &&
      num_channels == num_channels_) {
    // No-op if settings haven't changed.
    return 0;
  }

  if (src_sample_rate_hz <= 0 || dst_sample_rate_hz <= 0 ||
      num_channels <= 0 || num_channels > 2) {
    return -1;
  }

  src_sample_rate_hz_ = src_sample_rate_hz;
  dst_sample_rate_hz_ = dst_sample_rate_hz;
  num_channels_ = num_channels;

  const size_t src_size_10ms_mono =
      static_cast<size_t>(src_sample_rate_hz / 100);
  const size_t dst_size_10ms_mono =
      static_cast<size_t>(dst_sample_rate_hz / 100);
  sinc_resampler_.reset(
      new PushSincResampler(src_size_10ms_mono, dst_size_10ms_mono));
  if (num_channels_ == 2) {
    src_left_.reset(new T[src_size_10ms_mono]);
    src_right_.reset(new T[src_size_10ms_mono]);
    dst_left_.reset(new T[dst_size_10ms_mono]);
    dst_right_.reset(new T[dst_size_10ms_mono]);
    sinc_resampler_right_.reset(
        new PushSincResampler(src_size_10ms_mono, dst_size_10ms_mono));
  }

  return 0;
}

template <typename T>
int PushResampler<T>::Resample(const T* src,
                               size_t src_length,
                               T* dst,
                               size_t dst_capacity) {
  CheckExpectedBufferSizes(src_length, dst_capacity, num_channels_,
                           src_sample_rate_hz_, dst_sample_rate_hz_);

  if (src_sample_rate_hz_ == dst_sample_rate_hz_) {
    // The old resampler provides this memcpy facility in the case of matching
    // sample rates, so reproduce it here for the sinc resampler.
    memcpy(dst, src, src_length * sizeof(T));
    return static_cast<int>(src_length);
  }

  if (num_channels_ == 2) {
    const size_t src_length_mono = src_length / num_channels_;
    const size_t dst_capacity_mono = dst_capacity / num_channels_;
    T* deinterleaved[] = {src_left_.get(), src_right_.get()};
    Deinterleave(src, src_length_mono, num_channels_, deinterleaved);

    size_t dst_length_mono = sinc_resampler_->Resample(
        src_left_.get(), src_length_mono, dst_left_.get(), dst_capacity_mono);
    sinc_resampler_right_->Resample(src_right_.get(), src_length_mono,
                                    dst_right_.get(), dst_capacity_mono);

    deinterleaved[0] = dst_left_.get();
    deinterleaved[1] = dst_right_.get();
    Interleave(deinterleaved, dst_length_mono, num_channels_, dst);
    return static_cast<int>(dst_length_mono * num_channels_);
  } else {
    return static_cast<int>(
        sinc_resampler_->Resample(src, src_length, dst, dst_capacity));
  }
}

// Explictly generate required instantiations.
template class PushResampler<int16_t>;
template class PushResampler<float>;

PushResampler<T>::InitializeIfNeeded() sets up the buffers and the necessary PushSincResamplers according to the source and destination sample rates.
In PushResampler<T>::Resample(), the actual resampling is done by PushSincResampler, which resamples a single channel of audio. For stereo data, PushResampler<T>::Resample() first splits the frame into two mono buffers, resamples each separately, and finally puts them back together.
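Based on the Resample() helper above, a minimal usage sketch of PushResampler looks like this; the function, buffer sizes, and rates are the example's assumptions:

#include <cstdint>
#include "common_audio/resampler/include/push_resampler.h"

// Sketch: upsample one 10 ms mono frame from 16 kHz to 48 kHz.
// 10 ms at 16 kHz = 160 samples in; 10 ms at 48 kHz = 480 samples out.
void Upsample10ms(const int16_t (&in)[160], int16_t (&out)[480]) {
  static webrtc::PushResampler<int16_t> resampler;
  // No-op if the rates/channel count are unchanged since the last call.
  resampler.InitializeIfNeeded(/*src_sample_rate_hz=*/16000,
                               /*dst_sample_rate_hz=*/48000,
                               /*num_channels=*/1);
  resampler.Resample(in, /*src_length=*/160, out, /*dst_capacity=*/480);
}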
The splitting of stereo data into two mono buffers, and the merging of two mono buffers back into a stereo frame, are implemented in webrtc/src/common_audio/include/audio_util.h as follows:
// Deinterleave audio from |interleaved| to the channel buffers pointed to
// by |deinterleaved|. There must be sufficient space allocated in the
// |deinterleaved| buffers (|num_channel| buffers with |samples_per_channel|
// per buffer).
template <typename T>
void Deinterleave(const T* interleaved,
                  size_t samples_per_channel,
                  size_t num_channels,
                  T* const* deinterleaved) {
  for (size_t i = 0; i < num_channels; ++i) {
    T* channel = deinterleaved[i];
    size_t interleaved_idx = i;
    for (size_t j = 0; j < samples_per_channel; ++j) {
      channel[j] = interleaved[interleaved_idx];
      interleaved_idx += num_channels;
    }
  }
}

// Interleave audio from the channel buffers pointed to by |deinterleaved| to
// |interleaved|. There must be sufficient space allocated in |interleaved|
// (|samples_per_channel| * |num_channels|).
template <typename T>
void Interleave(const T* const* deinterleaved,
                size_t samples_per_channel,
                size_t num_channels,
                T* interleaved) {
  for (size_t i = 0; i < num_channels; ++i) {
    const T* channel = deinterleaved[i];
    size_t interleaved_idx = i;
    for (size_t j = 0; j < samples_per_channel; ++j) {
      interleaved[interleaved_idx] = channel[j];
      interleaved_idx += num_channels;
    }
  }
}

These, then, are the basic operations on audio data: mixing, channel conversion, and resampling.