Audio capture and playback.
Audio processing. This mainly means processing the captured (recorded) audio, the so-called 3A processing: AEC (Acoustic Echo Cancellation), ANS (Automatic Noise Suppression), and AGC (Automatic Gain Control).
Audio effects, such as voice changing, reverberation, and equalization.
Audio encoding and decoding. This includes codecs such as AAC and OPUS, as well as handling for weak networks, such as NetEQ.
Network transport. RTP/RTCP is commonly used to transport the encoded audio data.
Assembly of the whole audio processing pipeline.
WebRTC's audio processing pipeline looks roughly like the figure below:
Apart from audio effects, WebRTC's audio pipeline covers all of the parts listed above: audio capture and playback, audio processing, audio encoding and decoding, and network transport.
In WebRTC, audio capture and playback are done through the AudioDeviceModule. Different operating systems talk to audio devices in different ways, so each platform implements its own platform-specific AudioDeviceModule. Some platforms even offer several audio solutions: Linux has PulseAudio and ALSA, Android has the framework-provided Java APIs, OpenSL ES, and AAudio, and Windows likewise has multiple options.
WebRTC's audio pipeline only processes audio in 10 ms chunks. Some platforms provide interfaces that capture and play audio in 10 ms chunks, such as Linux; others, such as Android and iOS, do not. The data that the AudioDeviceModule plays and captures always passes through an AudioDeviceBuffer, which hands out or takes in audio in 10 ms frames. For platforms that cannot capture or play 10 ms chunks directly, a FineAudioBuffer is inserted between the platform AudioDeviceModule and the AudioDeviceBuffer to adapt the platform's audio data into 10 ms audio frames that WebRTC can process.
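To illustrate the adaptation, here is a minimal sketch, not WebRTC's actual FineAudioBuffer, of how capture callbacks of arbitrary size can be re-chunked into 10 ms frames. The class name and the callback interface are invented for this example:

#include <cstdint>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal sketch: accumulate platform capture callbacks of arbitrary size
// and emit fixed 10 ms frames. Names are hypothetical, not WebRTC's API.
class TenMsChunker {
 public:
  TenMsChunker(int sample_rate_hz, size_t num_channels,
               std::function<void(const int16_t*, size_t)> on_10ms_frame)
      : samples_per_10ms_(sample_rate_hz / 100 * num_channels),
        on_10ms_frame_(std::move(on_10ms_frame)) {}

  // Called from the platform capture callback with any number of samples.
  void Push(const int16_t* data, size_t num_samples) {
    buffer_.insert(buffer_.end(), data, data + num_samples);
    while (buffer_.size() >= samples_per_10ms_) {
      on_10ms_frame_(buffer_.data(), samples_per_10ms_);
      buffer_.erase(buffer_.begin(), buffer_.begin() + samples_per_10ms_);
    }
  }

 private:
  const size_t samples_per_10ms_;
  std::function<void(const int16_t*, size_t)> on_10ms_frame_;
  std::vector<int16_t> buffer_;
};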
The AudioDeviceModule connects to a module called AudioTransport. On the capture/send path, AudioTransport performs the audio processing, mainly the 3A processing. On the playback path there is a mixer, which mixes the multiple received audio streams. Echo cancellation works by removing the played-out sound from the recorded signal, so when audio data is pulled from AudioTransport for playback, that data is also fed into the APM as the echo canceller's reference.
AudioTransport connects to AudioSendStream and AudioReceiveStream, which carry out audio encoding and sending, receiving and decoding, and network transport.
The basic audio operations appear throughout WebRTC's audio pipeline. No matter how many audio streams the remote side sends, and no matter what sample rate and channel count each of those streams has, they must go through resampling, channel conversion, and mixing, and finally become a single stream at a sample rate and channel count the playout device accepts. Concretely, each stream is first resampled and channel-converted to a common sample rate and channel count, then mixed; after mixing, the result is resampled and channel-converted again into audio the device can accept. (Every node of WebRTC's audio pipeline uniformly represents samples as 16-bit integer values.) Like this:
WebRTC provides a number of utility classes and functions for these operations.
How is mixing done? WebRTC provides the AudioMixer interface as the abstraction of a mixer. The interface is defined (in webrtc/src/api/audio/audio_mixer.h) as follows:
namespace webrtc {

// WORK IN PROGRESS
// This class is under development and is not yet intended for for use outside
// of WebRtc/Libjingle.
class AudioMixer : public rtc::RefCountInterface {
 public:
  // A callback class that all mixer participants must inherit from/implement.
  class Source {
   public:
    enum class AudioFrameInfo {
      kNormal,  // The samples in audio_frame are valid and should be used.
      kMuted,   // The samples in audio_frame should not be used, but
                // should be implicitly interpreted as zero. Other
                // fields in audio_frame may be read and should
                // contain meaningful values.
      kError,   // The audio_frame will not be used.
    };

    // Overwrites |audio_frame|. The data_ field is overwritten with
    // 10 ms of new audio (either 1 or 2 interleaved channels) at
    // |sample_rate_hz|. All fields in |audio_frame| must be updated.
    virtual AudioFrameInfo GetAudioFrameWithInfo(int sample_rate_hz,
                                                 AudioFrame* audio_frame) = 0;

    // A way for a mixer implementation to distinguish participants.
    virtual int Ssrc() const = 0;

    // A way for this source to say that GetAudioFrameWithInfo called
    // with this sample rate or higher will not cause quality loss.
    virtual int PreferredSampleRate() const = 0;

    virtual ~Source() {}
  };

  // Returns true if adding was successful. A source is never added
  // twice. Addition and removal can happen on different threads.
  virtual bool AddSource(Source* audio_source) = 0;

  // Removal is never attempted if a source has not been successfully
  // added to the mixer.
  virtual void RemoveSource(Source* audio_source) = 0;

  // Performs mixing by asking registered audio sources for audio. The
  // mixed result is placed in the provided AudioFrame. This method
  // will only be called from a single thread. The channels argument
  // specifies the number of channels of the mix result. The mixer
  // should mix at a rate that doesn't cause quality loss of the
  // sources' audio. The mixing rate is one of the rates listed in
  // AudioProcessing::NativeRate. All fields in
  // |audio_frame_for_mixing| must be updated.
  virtual void Mix(size_t number_of_channels,
                   AudioFrame* audio_frame_for_mixing) = 0;

 protected:
  // Since the mixer is reference counted, the destructor may be
  // called from any thread.
  ~AudioMixer() override {}
};

}  // namespace webrtc

WebRTC's AudioMixer mixes zero, one, or more mixer Sources into a single audio frame with a given number of channels. The sample rate of the output frame is determined by the concrete AudioMixer implementation according to its own rules.
A mixer Source supplies the AudioMixer with mono or stereo audio frames at a requested sample rate; it is responsible for resampling whatever audio it has into the sample rate the AudioMixer asks for. It can also report its preferred output sample rate to help the AudioMixer compute a suitable output rate. Through Ssrc(), a mixer Source provides an identifier for its stream.
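As an illustration, here is a minimal sketch of a Source implementation that produces muted (silent) frames. It assumes the WebRTC headers are available (the AudioFrame header location varies across WebRTC versions); only the AudioMixer::Source interface quoted above is WebRTC's, everything else here is made up for the example:

#include "api/audio/audio_mixer.h"
// AudioFrame lives in different headers depending on the WebRTC version,
// e.g. api/audio/audio_frame.h in newer trees.

// Hypothetical example source: feeds 10 ms of mono silence at whatever
// rate the mixer asks for.
class SilenceSource : public webrtc::AudioMixer::Source {
 public:
  explicit SilenceSource(int ssrc) : ssrc_(ssrc) {}

  AudioFrameInfo GetAudioFrameWithInfo(
      int sample_rate_hz, webrtc::AudioFrame* audio_frame) override {
    // A real source would resample its own data to sample_rate_hz here.
    // Passing nullptr as data marks the frame as muted.
    audio_frame->UpdateFrame(0 /* timestamp */, nullptr /* data */,
                             sample_rate_hz / 100 /* samples per 10 ms */,
                             sample_rate_hz, webrtc::AudioFrame::kNormalSpeech,
                             webrtc::AudioFrame::kVadUnknown,
                             1 /* num_channels */);
    return AudioFrameInfo::kMuted;  // Samples should be treated as zero.
  }

  int Ssrc() const override { return ssrc_; }
  int PreferredSampleRate() const override { return 48000; }

 private:
  const int ssrc_;
};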
WebRTC provides one AudioMixer implementation, the AudioMixerImpl class, located in webrtc/src/modules/audio_mixer/. The class is defined (in webrtc/src/modules/audio_mixer/audio_mixer_impl.h) as follows:
namespace webrtc {

typedef std::vector<AudioFrame*> AudioFrameList;

class AudioMixerImpl : public AudioMixer {
 public:
  struct SourceStatus {
    SourceStatus(Source* audio_source, bool is_mixed, float gain)
        : audio_source(audio_source), is_mixed(is_mixed), gain(gain) {}
    Source* audio_source = nullptr;
    bool is_mixed = false;
    float gain = 0.0f;

    // A frame that will be passed to audio_source->GetAudioFrameWithInfo.
    AudioFrame audio_frame;
  };

  using SourceStatusList = std::vector<std::unique_ptr<SourceStatus>>;

  // AudioProcessing only accepts 10 ms frames.
  static const int kFrameDurationInMs = 10;
  static const int kMaximumAmountOfMixedAudioSources = 3;

  static rtc::scoped_refptr<AudioMixerImpl> Create();

  static rtc::scoped_refptr<AudioMixerImpl> Create(
      std::unique_ptr<OutputRateCalculator> output_rate_calculator,
      bool use_limiter);

  ~AudioMixerImpl() override;

  // AudioMixer functions
  bool AddSource(Source* audio_source) override;
  void RemoveSource(Source* audio_source) override;

  void Mix(size_t number_of_channels,
           AudioFrame* audio_frame_for_mixing) override
      RTC_LOCKS_EXCLUDED(crit_);

  // Returns true if the source was mixed last round. Returns
  // false and logs an error if the source was never added to the
  // mixer.
  bool GetAudioSourceMixabilityStatusForTest(Source* audio_source) const;

 protected:
  AudioMixerImpl(std::unique_ptr<OutputRateCalculator> output_rate_calculator,
                 bool use_limiter);

 private:
  // Set mixing frequency through OutputFrequencyCalculator.
  void CalculateOutputFrequency();
  // Get mixing frequency.
  int OutputFrequency() const;

  // Compute what audio sources to mix from audio_source_list_. Ramp
  // in and out. Update mixed status. Mixes up to
  // kMaximumAmountOfMixedAudioSources audio sources.
  AudioFrameList GetAudioFromSources() RTC_EXCLUSIVE_LOCKS_REQUIRED(crit_);

  // The critical section lock guards audio source insertion and
  // removal, which can be done from any thread. The race checker
  // checks that mixing is done sequentially.
  rtc::CriticalSection crit_;
  rtc::RaceChecker race_checker_;

  std::unique_ptr<OutputRateCalculator> output_rate_calculator_;
  // The current sample frequency and sample size when mixing.
  int output_frequency_ RTC_GUARDED_BY(race_checker_);
  size_t sample_size_ RTC_GUARDED_BY(race_checker_);

  // List of all audio sources. Note all lists are disjunct
  SourceStatusList audio_source_list_ RTC_GUARDED_BY(crit_);  // May be mixed.

  // Component that handles actual adding of audio frames.
  FrameCombiner frame_combiner_ RTC_GUARDED_BY(race_checker_);

  RTC_DISALLOW_COPY_AND_ASSIGN(AudioMixerImpl);
};
}  // namespace webrtc

The implementation of the AudioMixerImpl class (in webrtc/src/modules/audio_mixer/audio_mixer_impl.cc) is as follows:
namespace webrtc {
namespace {

struct SourceFrame {
  SourceFrame(AudioMixerImpl::SourceStatus* source_status,
              AudioFrame* audio_frame,
              bool muted)
      : source_status(source_status), audio_frame(audio_frame), muted(muted) {
    RTC_DCHECK(source_status);
    RTC_DCHECK(audio_frame);
    if (!muted) {
      energy = AudioMixerCalculateEnergy(*audio_frame);
    }
  }

  SourceFrame(AudioMixerImpl::SourceStatus* source_status,
              AudioFrame* audio_frame,
              bool muted,
              uint32_t energy)
      : source_status(source_status),
        audio_frame(audio_frame),
        muted(muted),
        energy(energy) {
    RTC_DCHECK(source_status);
    RTC_DCHECK(audio_frame);
  }

  AudioMixerImpl::SourceStatus* source_status = nullptr;
  AudioFrame* audio_frame = nullptr;
  bool muted = true;
  uint32_t energy = 0;
};

// ShouldMixBefore(a, b) is used to select mixer sources.
bool ShouldMixBefore(const SourceFrame& a, const SourceFrame& b) {
  if (a.muted != b.muted) {
    return b.muted;
  }

  const auto a_activity = a.audio_frame->vad_activity_;
  const auto b_activity = b.audio_frame->vad_activity_;

  if (a_activity != b_activity) {
    return a_activity == AudioFrame::kVadActive;
  }

  return a.energy > b.energy;
}

void RampAndUpdateGain(
    const std::vector<SourceFrame>& mixed_sources_and_frames) {
  for (const auto& source_frame : mixed_sources_and_frames) {
    float target_gain = source_frame.source_status->is_mixed ? 1.0f : 0.0f;
    Ramp(source_frame.source_status->gain, target_gain,
         source_frame.audio_frame);
    source_frame.source_status->gain = target_gain;
  }
}

AudioMixerImpl::SourceStatusList::const_iterator FindSourceInList(
    AudioMixerImpl::Source const* audio_source,
    AudioMixerImpl::SourceStatusList const* audio_source_list) {
  return std::find_if(
      audio_source_list->begin(), audio_source_list->end(),
      [audio_source](const std::unique_ptr<AudioMixerImpl::SourceStatus>& p) {
        return p->audio_source == audio_source;
      });
}
}  // namespace

AudioMixerImpl::AudioMixerImpl(
    std::unique_ptr<OutputRateCalculator> output_rate_calculator,
    bool use_limiter)
    : output_rate_calculator_(std::move(output_rate_calculator)),
      output_frequency_(0),
      sample_size_(0),
      audio_source_list_(),
      frame_combiner_(use_limiter) {}

AudioMixerImpl::~AudioMixerImpl() {}

rtc::scoped_refptr<AudioMixerImpl> AudioMixerImpl::Create() {
  return Create(std::unique_ptr<DefaultOutputRateCalculator>(
                    new DefaultOutputRateCalculator()),
                true);
}

rtc::scoped_refptr<AudioMixerImpl> AudioMixerImpl::Create(
    std::unique_ptr<OutputRateCalculator> output_rate_calculator,
    bool use_limiter) {
  return rtc::scoped_refptr<AudioMixerImpl>(
      new rtc::RefCountedObject<AudioMixerImpl>(
          std::move(output_rate_calculator), use_limiter));
}

void AudioMixerImpl::Mix(size_t number_of_channels,
                         AudioFrame* audio_frame_for_mixing) {
  RTC_DCHECK(number_of_channels >= 1);
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);

  CalculateOutputFrequency();

  {
    rtc::CritScope lock(&crit_);
    const size_t number_of_streams = audio_source_list_.size();
    frame_combiner_.Combine(GetAudioFromSources(), number_of_channels,
                            OutputFrequency(), number_of_streams,
                            audio_frame_for_mixing);
  }

  return;
}

void AudioMixerImpl::CalculateOutputFrequency() {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  rtc::CritScope lock(&crit_);

  std::vector<int> preferred_rates;
  std::transform(audio_source_list_.begin(), audio_source_list_.end(),
                 std::back_inserter(preferred_rates),
                 [&](std::unique_ptr<SourceStatus>& a) {
                   return a->audio_source->PreferredSampleRate();
                 });

  output_frequency_ =
      output_rate_calculator_->CalculateOutputRate(preferred_rates);
  sample_size_ = (output_frequency_ * kFrameDurationInMs) / 1000;
}

int AudioMixerImpl::OutputFrequency() const {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  return output_frequency_;
}

bool AudioMixerImpl::AddSource(Source* audio_source) {
  RTC_DCHECK(audio_source);
  rtc::CritScope lock(&crit_);
  RTC_DCHECK(FindSourceInList(audio_source, &audio_source_list_) ==
             audio_source_list_.end())
      << "Source already added to mixer";
  audio_source_list_.emplace_back(new SourceStatus(audio_source, false, 0));
  return true;
}

void AudioMixerImpl::RemoveSource(Source* audio_source) {
  RTC_DCHECK(audio_source);
  rtc::CritScope lock(&crit_);
  const auto iter = FindSourceInList(audio_source, &audio_source_list_);
  RTC_DCHECK(iter != audio_source_list_.end())
      << "Source not present in mixer";
  audio_source_list_.erase(iter);
}

AudioFrameList AudioMixerImpl::GetAudioFromSources() {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  AudioFrameList result;
  std::vector<SourceFrame> audio_source_mixing_data_list;
  std::vector<SourceFrame> ramp_list;

  // Get audio from the audio sources and put it in the SourceFrame vector.
  for (auto& source_and_status : audio_source_list_) {
    const auto audio_frame_info =
        source_and_status->audio_source->GetAudioFrameWithInfo(
            OutputFrequency(), &source_and_status->audio_frame);

    if (audio_frame_info == Source::AudioFrameInfo::kError) {
      RTC_LOG_F(LS_WARNING) << "failed to GetAudioFrameWithInfo() from source";
      continue;
    }
    audio_source_mixing_data_list.emplace_back(
        source_and_status.get(), &source_and_status->audio_frame,
        audio_frame_info == Source::AudioFrameInfo::kMuted);
  }

  // Sort frames by sorting function.
  std::sort(audio_source_mixing_data_list.begin(),
            audio_source_mixing_data_list.end(), ShouldMixBefore);

  int max_audio_frame_counter = kMaximumAmountOfMixedAudioSources;

  // Go through list in order and put unmuted frames in result list.
  for (const auto& p : audio_source_mixing_data_list) {
    // Filter muted.
    if (p.muted) {
      p.source_status->is_mixed = false;
      continue;
    }

    // Add frame to result vector for mixing.
    bool is_mixed = false;
    if (max_audio_frame_counter > 0) {
      --max_audio_frame_counter;
      result.push_back(p.audio_frame);
      ramp_list.emplace_back(p.source_status, p.audio_frame, false, -1);
      is_mixed = true;
    }
    p.source_status->is_mixed = is_mixed;
  }

  RampAndUpdateGain(ramp_list);
  return result;
}

bool AudioMixerImpl::GetAudioSourceMixabilityStatusForTest(
    AudioMixerImpl::Source* audio_source) const {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  rtc::CritScope lock(&crit_);

  const auto iter = FindSourceInList(audio_source, &audio_source_list_);
  if (iter != audio_source_list_.end()) {
    return (*iter)->is_mixed;
  }

  RTC_LOG(LS_ERROR) << "Audio source unknown";
  return false;
}
}  // namespace webrtc

It is not hard to see that AudioMixerImpl's AddSource(Source* audio_source) and RemoveSource(Source* audio_source) are just ordinary container operations, except that they enforce that an already-added mixer Source cannot be added again and a Source that was never added cannot be removed. The heart of the class is without doubt Mix(size_t number_of_channels, AudioFrame* audio_frame_for_mixing).
void AudioMixerImpl::Mix(size_t number_of_channels,
                         AudioFrame* audio_frame_for_mixing) {
  RTC_DCHECK(number_of_channels >= 1);
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);

  CalculateOutputFrequency();

  {
    rtc::CritScope lock(&crit_);
    const size_t number_of_streams = audio_source_list_.size();
    frame_combiner_.Combine(GetAudioFromSources(), number_of_channels,
                            OutputFrequency(), number_of_streams,
                            audio_frame_for_mixing);
  }

  return;
}

AudioMixerImpl::Mix() proceeds roughly as follows:
Compute the sample rate of the output audio frame. This is why the interface does not take an output sample rate: the AudioMixer implementation computes it internally, normally from the preferred sample rates of the individual mixer Sources.
Get a list of audio frames at that sample rate from all mixer Sources. The AudioMixer does not simply take one frame from each Source and build a list; it also applies some simple transformations to the frames and selects among them.
Mix the different audio frames through the FrameCombiner (a minimal usage sketch of the whole flow follows this list).
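For orientation, here is a minimal sketch of how a client might drive AudioMixerImpl, based only on the interfaces quoted above. The driver function, the two sources, and the 10 ms cadence are this example's assumptions, not code from WebRTC:

#include "modules/audio_mixer/audio_mixer_impl.h"

// Hypothetical driver: create a mixer, register two sources, and pull one
// mixed 10 ms frame per iteration. SilenceSource is the example source
// sketched earlier; error handling is omitted.
void RunMixerForOneSecond() {
  rtc::scoped_refptr<webrtc::AudioMixerImpl> mixer =
      webrtc::AudioMixerImpl::Create();

  SilenceSource source_a(/*ssrc=*/111);
  SilenceSource source_b(/*ssrc=*/222);
  mixer->AddSource(&source_a);
  mixer->AddSource(&source_b);

  webrtc::AudioFrame mixed;
  for (int i = 0; i < 100; ++i) {  // 100 frames of 10 ms = 1 second
    // The mixer chooses the output sample rate itself from the sources'
    // preferred rates; the caller only chooses the channel count.
    mixer->Mix(/*number_of_channels=*/1, &mixed);
    // ... hand |mixed| to playout / APM ...
  }

  mixer->RemoveSource(&source_a);
  mixer->RemoveSource(&source_b);
}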
Computing the output sample rate

The output audio sample rate is computed as follows:
void AudioMixerImpl::CalculateOutputFrequency() {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  rtc::CritScope lock(&crit_);

  std::vector<int> preferred_rates;
  std::transform(audio_source_list_.begin(), audio_source_list_.end(),
                 std::back_inserter(preferred_rates),
                 [&](std::unique_ptr<SourceStatus>& a) {
                   return a->audio_source->PreferredSampleRate();
                 });

  output_frequency_ =
      output_rate_calculator_->CalculateOutputRate(preferred_rates);
  sample_size_ = (output_frequency_ * kFrameDurationInMs) / 1000;
}

AudioMixerImpl first collects each mixer Source's preferred sample rate into a list, then computes the output rate through the OutputRateCalculator interface (in webrtc/modules/audio_mixer/output_rate_calculator.h):
class OutputRateCalculator {
 public:
  virtual int CalculateOutputRate(
      const std::vector<int>& preferred_sample_rates) = 0;
  virtual ~OutputRateCalculator() {}
};

WebRTC provides a default implementation of OutputRateCalculator, DefaultOutputRateCalculator, defined (in webrtc/src/modules/audio_mixer/default_output_rate_calculator.h) as follows:
namespace webrtc {

class DefaultOutputRateCalculator : public OutputRateCalculator {
 public:
  static const int kDefaultFrequency = 48000;

  // Produces the least native rate greater or equal to the preferred
  // sample rates. A native rate is one in
  // AudioProcessing::NativeRate. If |preferred_sample_rates| is
  // empty, returns |kDefaultFrequency|.
  int CalculateOutputRate(
      const std::vector<int>& preferred_sample_rates) override;
  ~DefaultOutputRateCalculator() override {}
};

}  // namespace webrtc

The definition is straightforward. The default computation of the AudioMixer output sample rate looks like this:
namespace webrtc {

int DefaultOutputRateCalculator::CalculateOutputRate(
    const std::vector<int>& preferred_sample_rates) {
  if (preferred_sample_rates.empty()) {
    return DefaultOutputRateCalculator::kDefaultFrequency;
  }
  using NativeRate = AudioProcessing::NativeRate;
  const int maximal_frequency = *std::max_element(
      preferred_sample_rates.begin(), preferred_sample_rates.end());
  RTC_DCHECK_LE(NativeRate::kSampleRate8kHz, maximal_frequency);
  RTC_DCHECK_GE(NativeRate::kSampleRate48kHz, maximal_frequency);

  static constexpr NativeRate native_rates[] = {
      NativeRate::kSampleRate8kHz, NativeRate::kSampleRate16kHz,
      NativeRate::kSampleRate32kHz, NativeRate::kSampleRate48kHz};
  const auto* rounded_up_index = std::lower_bound(
      std::begin(native_rates), std::end(native_rates), maximal_frequency);
  RTC_DCHECK(rounded_up_index != std::end(native_rates));
  return *rounded_up_index;
}

}  // namespace webrtc

For audio, WebRTC internally supports a set of native sample rates: 8 kHz, 16 kHz, 32 kHz, and 48 kHz. DefaultOutputRateCalculator takes the maximum of the preferred rates passed in and finds the smallest native rate greater than or equal to it. In other words, if all of an AudioMixerImpl's mixer Sources prefer rates above 48 kHz, the computation fails (the DCHECK against the 48 kHz upper bound fires).
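As a standalone illustration of this rounding rule (the array and std::lower_bound mirror the code above; the scaffolding around them is just for the example):

#include <algorithm>
#include <cstdio>
#include <iterator>
#include <vector>

// Round a set of preferred rates up to WebRTC's native rates, the same way
// DefaultOutputRateCalculator does. Standalone sketch, not WebRTC code.
int CalculateOutputRate(const std::vector<int>& preferred_sample_rates) {
  static constexpr int kNativeRates[] = {8000, 16000, 32000, 48000};
  if (preferred_sample_rates.empty()) return 48000;  // kDefaultFrequency
  const int maximal = *std::max_element(preferred_sample_rates.begin(),
                                        preferred_sample_rates.end());
  // Smallest native rate >= the largest preferred rate.
  const int* it = std::lower_bound(std::begin(kNativeRates),
                                   std::end(kNativeRates), maximal);
  return *it;  // Would run past the end if maximal > 48000.
}

int main() {
  // {16000, 44100} -> max is 44100 -> rounded up to 48000.
  std::printf("%d\n", CalculateOutputRate({16000, 44100}));  // 48000
  std::printf("%d\n", CalculateOutputRate({8000, 16000}));   // 16000
  return 0;
}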
Getting the list of audio frames

AudioMixerImpl::GetAudioFromSources() obtains the list of audio frames:
AudioFrameList AudioMixerImpl::GetAudioFromSources() {
  RTC_DCHECK_RUNS_SERIALIZED(&race_checker_);
  AudioFrameList result;
  std::vector<SourceFrame> audio_source_mixing_data_list;
  std::vector<SourceFrame> ramp_list;

  // Get audio from the audio sources and put it in the SourceFrame vector.
  for (auto& source_and_status : audio_source_list_) {
    const auto audio_frame_info =
        source_and_status->audio_source->GetAudioFrameWithInfo(
            OutputFrequency(), &source_and_status->audio_frame);

    if (audio_frame_info == Source::AudioFrameInfo::kError) {
      RTC_LOG_F(LS_WARNING) << "failed to GetAudioFrameWithInfo() from source";
      continue;
    }
    audio_source_mixing_data_list.emplace_back(
        source_and_status.get(), &source_and_status->audio_frame,
        audio_frame_info == Source::AudioFrameInfo::kMuted);
  }

  // Sort frames by sorting function.
  std::sort(audio_source_mixing_data_list.begin(),
            audio_source_mixing_data_list.end(), ShouldMixBefore);

  int max_audio_frame_counter = kMaximumAmountOfMixedAudioSources;

  // Go through list in order and put unmuted frames in result list.
  for (const auto& p : audio_source_mixing_data_list) {
    // Filter muted.
    if (p.muted) {
      p.source_status->is_mixed = false;
      continue;
    }

    // Add frame to result vector for mixing.
    bool is_mixed = false;
    if (max_audio_frame_counter > 0) {
      --max_audio_frame_counter;
      result.push_back(p.audio_frame);
      ramp_list.emplace_back(p.source_status, p.audio_frame, false, -1);
      is_mixed = true;
    }
    p.source_status->is_mixed = is_mixed;
  }

  RampAndUpdateGain(ramp_list);
  return result;
}

AudioMixerImpl::GetAudioFromSources() fetches an audio frame from each mixer Source and builds a list of SourceFrame. Note that SourceFrame's constructor calls AudioMixerCalculateEnergy() (in webrtc/src/modules/audio_mixer/audio_frame_manipulator.cc) to compute the frame's energy. The computation is as follows:
uint32_t AudioMixerCalculateEnergy(const AudioFrame& audio_frame) {
  if (audio_frame.muted()) {
    return 0;
  }

  uint32_t energy = 0;
  const int16_t* frame_data = audio_frame.data();
  for (size_t position = 0;
       position < audio_frame.samples_per_channel_ * audio_frame.num_channels_;
       position++) {
    // TODO(aleloi): This can overflow. Convert to floats.
    energy += frame_data[position] * frame_data[position];
  }
  return energy;
}

The energy is the sum of squares of all the sample values.
The obtained frames are then sorted, with the following ordering logic:
bool ShouldMixBefore(const SourceFrame& a, const SourceFrame& b) {
  if (a.muted != b.muted) {
    return b.muted;
  }

  const auto a_activity = a.audio_frame->vad_activity_;
  const auto b_activity = b.audio_frame->vad_activity_;

  if (a_activity != b_activity) {
    return a_activity == AudioFrame::kVadActive;
  }

  return a.energy > b.energy;
}
From the sorted list, at most three frames with the strongest signal are selected and returned: unmuted frames sort before muted ones, VAD-active frames before inactive ones, and, within the same category, higher-energy frames first.
The selected frames are then ramped and their gains updated:
void RampAndUpdateGain(
    const std::vector<SourceFrame>& mixed_sources_and_frames) {
  for (const auto& source_frame : mixed_sources_and_frames) {
    float target_gain = source_frame.source_status->is_mixed ? 1.0f : 0.0f;
    Ramp(source_frame.source_status->gain, target_gain,
         source_frame.audio_frame);
    source_frame.source_status->gain = target_gain;
  }
}

Ramp() (in webrtc/src/modules/audio_mixer/audio_frame_manipulator.cc) works as follows:
void Ramp(float start_gain, float target_gain, AudioFrame* audio_frame) {
  RTC_DCHECK(audio_frame);
  RTC_DCHECK_GE(start_gain, 0.0f);
  RTC_DCHECK_GE(target_gain, 0.0f);
  if (start_gain == target_gain || audio_frame->muted()) {
    return;
  }

  size_t samples = audio_frame->samples_per_channel_;
  RTC_DCHECK_LT(0, samples);
  float increment = (target_gain - start_gain) / samples;
  float gain = start_gain;
  int16_t* frame_data = audio_frame->mutable_data();
  for (size_t i = 0; i < samples; ++i) {
    // If the audio is interleaved of several channels, we want to
    // apply the same gain change to the ith sample of every channel.
    for (size_t ch = 0; ch < audio_frame->num_channels_; ++ch) {
      frame_data[audio_frame->num_channels_ * i + ch] *= gain;
    }
    gain += increment;
  }
}

This step exists because, at different mix times, the same stream may be taken into or dropped out of the mix depending on the relative signal strength of its frames; ramping the gain makes those transitions of a given stream sound smoother.
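A quick standalone illustration of the ramp arithmetic: at 48 kHz a 10 ms frame has 480 samples, so fading in from gain 0 to 1 raises the gain by 1/480 per sample. The helper below re-implements the loop above for a mono buffer; the numbers are just an example:

#include <cstdio>
#include <vector>

// Standalone re-implementation of the linear gain ramp for a mono int16
// buffer, showing how a fade-in is spread across one 10 ms frame.
void RampMono(float start_gain, float target_gain, std::vector<short>* samples) {
  const float increment = (target_gain - start_gain) / samples->size();
  float gain = start_gain;
  for (short& s : *samples) {
    s = static_cast<short>(s * gain);
    gain += increment;
  }
}

int main() {
  std::vector<short> frame(480, 10000);  // 10 ms at 48 kHz, constant signal
  RampMono(0.0f, 1.0f, &frame);          // fade the frame in
  // The first sample is fully attenuated, the middle is about half,
  // and the last one is near full scale.
  std::printf("%d %d %d\n", frame[0], frame[240], frame[479]);  // 0 5000 9979
  return 0;
}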
FrameCombiner

FrameCombiner is where the mixing finally happens:
void FrameCombiner::Combine(const std::vector<AudioFrame*>& mix_list,
                            size_t number_of_channels,
                            int sample_rate,
                            size_t number_of_streams,
                            AudioFrame* audio_frame_for_mixing) {
  RTC_DCHECK(audio_frame_for_mixing);
  LogMixingStats(mix_list, sample_rate, number_of_streams);

  SetAudioFrameFields(mix_list, number_of_channels, sample_rate,
                      number_of_streams, audio_frame_for_mixing);

  const size_t samples_per_channel = static_cast<size_t>(
      (sample_rate * webrtc::AudioMixerImpl::kFrameDurationInMs) / 1000);

  for (const auto* frame : mix_list) {
    RTC_DCHECK_EQ(samples_per_channel, frame->samples_per_channel_);
    RTC_DCHECK_EQ(sample_rate, frame->sample_rate_hz_);
  }

  // The 'num_channels_' field of frames in 'mix_list' could be
  // different from 'number_of_channels'.
  for (auto* frame : mix_list) {
    RemixFrame(number_of_channels, frame);
  }

  if (number_of_streams <= 1) {
    MixFewFramesWithNoLimiter(mix_list, audio_frame_for_mixing);
    return;
  }

  std::array<OneChannelBuffer, kMaximumAmountOfChannels> mixing_buffer =
      MixToFloatFrame(mix_list, samples_per_channel, number_of_channels);

  // Put float data in an AudioFrameView.
  std::array<float*, kMaximumAmountOfChannels> channel_pointers{};
  for (size_t i = 0; i < number_of_channels; ++i) {
    channel_pointers[i] = &mixing_buffer[i][0];
  }
  AudioFrameView<float> mixing_buffer_view(
      &channel_pointers[0], number_of_channels, samples_per_channel);

  if (use_limiter_) {
    RunLimiter(mixing_buffer_view, &limiter_);
  }

  InterleaveToAudioFrame(mixing_buffer_view, audio_frame_for_mixing);
}

FrameCombiner first remixes each frame's channels to the target channel count:
void RemixFrame(size_t target_number_of_channels, AudioFrame* frame) {
  RTC_DCHECK_GE(target_number_of_channels, 1);
  RTC_DCHECK_LE(target_number_of_channels, 2);
  if (frame->num_channels_ == 1 && target_number_of_channels == 2) {
    AudioFrameOperations::MonoToStereo(frame);
  } else if (frame->num_channels_ == 2 && target_number_of_channels == 1) {
    AudioFrameOperations::StereoToMono(frame);
  }
}

Performing the mix
std::array<OneChannelBuffer, kMaximumAmountOfChannels> MixToFloatFrame(
    const std::vector<AudioFrame*>& mix_list,
    size_t samples_per_channel,
    size_t number_of_channels) {
  // Convert to FloatS16 and mix.
  using OneChannelBuffer = std::array<float, kMaximumChannelSize>;
  std::array<OneChannelBuffer, kMaximumAmountOfChannels> mixing_buffer{};

  for (size_t i = 0; i < mix_list.size(); ++i) {
    const AudioFrame* const frame = mix_list[i];
    for (size_t j = 0; j < number_of_channels; ++j) {
      for (size_t k = 0; k < samples_per_channel; ++k) {
        mixing_buffer[j][k] += frame->data()[number_of_channels * k + j];
      }
    }
  }
  return mixing_buffer;
}

As you can see, the "mixing" itself is nothing more than adding up the corresponding sample values of the different streams' audio frames.
RunLimiter
This step processes the mixed audio signal with the AGC's limiter, which keeps the summed signal from clipping.
Data format conversion
// Both interleaves and rounds.
void InterleaveToAudioFrame(AudioFrameView<const float> mixing_buffer_view,
                            AudioFrame* audio_frame_for_mixing) {
  const size_t number_of_channels = mixing_buffer_view.num_channels();
  const size_t samples_per_channel = mixing_buffer_view.samples_per_channel();

  // Put data in the result frame.
  for (size_t i = 0; i < number_of_channels; ++i) {
    for (size_t j = 0; j < samples_per_channel; ++j) {
      audio_frame_for_mixing->mutable_data()[number_of_channels * j + i] =
          FloatS16ToS16(mixing_buffer_view.channel(i)[j]);
    }
  }
}

The preceding steps produced floating-point sample data. This step converts the floats back into the 16-bit integer samples the pipeline needs.
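FloatS16ToS16() itself is not shown above. Judging by its name and use, it rounds a float sample that is already in S16 range back to int16 with clamping; a plausible standalone equivalent (an assumption, not the verified WebRTC implementation) would be:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Plausible equivalent of webrtc::FloatS16ToS16 (sketch only): clamp a
// float sample to [-32768, 32767] and round away from zero.
int16_t FloatS16ToS16(float v) {
  v = std::min(v, 32767.f);
  v = std::max(v, -32768.f);
  return static_cast<int16_t>(v + std::copysign(0.5f, v));
}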
At this point the mix is complete.
Conclusion: mixing is just adding up the sample values of the individual audio streams.
How is channel conversion done? WebRTC provides utility functions for converting audio frames between mono, stereo, and quad, in webrtc/audio/utility/audio_frame_operations.cc. From their implementations we can see exactly what channel-count conversion means for an audio frame.
Mono to stereo:
void AudioFrameOperations::MonoToStereo(const int16_t* src_audio,
                                        size_t samples_per_channel,
                                        int16_t* dst_audio) {
  for (size_t i = 0; i < samples_per_channel; i++) {
    dst_audio[2 * i] = src_audio[i];
    dst_audio[2 * i + 1] = src_audio[i];
  }
}

int AudioFrameOperations::MonoToStereo(AudioFrame* frame) {
  if (frame->num_channels_ != 1) {
    return -1;
  }
  if ((frame->samples_per_channel_ * 2) >= AudioFrame::kMaxDataSizeSamples) {
    // Not enough memory to expand from mono to stereo.
    return -1;
  }

  if (!frame->muted()) {
    // TODO(yujo): this operation can be done in place.
    int16_t data_copy[AudioFrame::kMaxDataSizeSamples];
    memcpy(data_copy, frame->data(),
           sizeof(int16_t) * frame->samples_per_channel_);
    MonoToStereo(data_copy, frame->samples_per_channel_,
                 frame->mutable_data());
  }
  frame->num_channels_ = 2;

  return 0;
}

Mono to stereo simply duplicates the single channel's data, so that both channels play the same audio.
Stereo to mono:
void AudioFrameOperations::StereoToMono(const int16_t* src_audio,
                                        size_t samples_per_channel,
                                        int16_t* dst_audio) {
  for (size_t i = 0; i < samples_per_channel; i++) {
    dst_audio[i] =
        (static_cast<int32_t>(src_audio[2 * i]) + src_audio[2 * i + 1]) >> 1;
  }
}

int AudioFrameOperations::StereoToMono(AudioFrame* frame) {
  if (frame->num_channels_ != 2) {
    return -1;
  }

  RTC_DCHECK_LE(frame->samples_per_channel_ * 2,
                AudioFrame::kMaxDataSizeSamples);

  if (!frame->muted()) {
    StereoToMono(frame->data(), frame->samples_per_channel_,
                 frame->mutable_data());
  }
  frame->num_channels_ = 1;

  return 0;
}

Stereo to mono adds the two channels' samples and divides by two (the >> 1), producing one channel of audio.
Quad to stereo:
void AudioFrameOperations::QuadToStereo(const int16_t* src_audio,
                                        size_t samples_per_channel,
                                        int16_t* dst_audio) {
  for (size_t i = 0; i < samples_per_channel; i++) {
    dst_audio[i * 2] =
        (static_cast<int32_t>(src_audio[4 * i]) + src_audio[4 * i + 1]) >> 1;
    dst_audio[i * 2 + 1] =
        (static_cast<int32_t>(src_audio[4 * i + 2]) + src_audio[4 * i + 3]) >>
        1;
  }
}

int AudioFrameOperations::QuadToStereo(AudioFrame* frame) {
  if (frame->num_channels_ != 4) {
    return -1;
  }

  RTC_DCHECK_LE(frame->samples_per_channel_ * 4,
                AudioFrame::kMaxDataSizeSamples);

  if (!frame->muted()) {
    QuadToStereo(frame->data(), frame->samples_per_channel_,
                 frame->mutable_data());
  }
  frame->num_channels_ = 2;

  return 0;
}

Quad to stereo averages channels 1 and 2 into one output channel, and channels 3 and 4 into the other.
Quad to mono:
void AudioFrameOperations::QuadToMono(const int16_t* src_audio,
                                      size_t samples_per_channel,
                                      int16_t* dst_audio) {
  for (size_t i = 0; i < samples_per_channel; i++) {
    dst_audio[i] =
        (static_cast<int32_t>(src_audio[4 * i]) + src_audio[4 * i + 1] +
         src_audio[4 * i + 2] + src_audio[4 * i + 3]) >> 2;
  }
}

int AudioFrameOperations::QuadToMono(AudioFrame* frame) {
  if (frame->num_channels_ != 4) {
    return -1;
  }

  RTC_DCHECK_LE(frame->samples_per_channel_ * 4,
                AudioFrame::kMaxDataSizeSamples);

  if (!frame->muted()) {
    QuadToMono(frame->data(), frame->samples_per_channel_,
               frame->mutable_data());
  }
  frame->num_channels_ = 1;

  return 0;
}

Quad to mono adds all four channels' samples and divides by four to get one channel of data.
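A tiny standalone check of these downmix formulas on a single quad sample frame (purely illustrative):

#include <cstdint>
#include <cstdio>

int main() {
  // One quad sample frame: channels 1..4.
  const int16_t quad[4] = {1000, 2000, 3000, 4000};
  // Quad -> stereo: average channel pairs (1,2) and (3,4).
  const int16_t left = (quad[0] + quad[1]) >> 1;   // 1500
  const int16_t right = (quad[2] + quad[3]) >> 1;  // 3500
  // Quad -> mono: average all four channels.
  const int16_t mono = (quad[0] + quad[1] + quad[2] + quad[3]) >> 2;  // 2500
  std::printf("stereo=(%d,%d) mono=%d\n", left, right, mono);
  return 0;
}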
For the other audio data operations WebRTC provides, see WebRTC's header files.
Resampling

Resampling converts audio data from one sample rate to another. Resampling in WebRTC is mainly done by a few components: PushResampler, PushSincResampler, and SincResampler. For example, Resample() in webrtc/src/audio/audio_transport_impl.cc:
// Resample audio in |frame| to given sample rate preserving the
// channel count and place the result in |destination|.
int Resample(const AudioFrame& frame,
             const int destination_sample_rate,
             PushResampler<int16_t>* resampler,
             int16_t* destination) {
  const int number_of_channels = static_cast<int>(frame.num_channels_);
  const int target_number_of_samples_per_channel =
      destination_sample_rate / 100;
  resampler->InitializeIfNeeded(frame.sample_rate_hz_, destination_sample_rate,
                                number_of_channels);

  // TODO(yujo): make resampler take an AudioFrame, and add special case
  // handling of muted frames.
  return resampler->Resample(
      frame.data(), frame.samples_per_channel_ * number_of_channels,
      destination, number_of_channels * target_number_of_samples_per_channel);
}

PushResampler is a template class with a fairly simple interface, defined (in webrtc/src/common_audio/resampler/include/push_resampler.h) as follows:
namespace webrtc {

class PushSincResampler;

// Wraps PushSincResampler to provide stereo support.
// TODO(ajm): add support for an arbitrary number of channels.
template <typename T>
class PushResampler {
 public:
  PushResampler();
  virtual ~PushResampler();

  // Must be called whenever the parameters change. Free to be called at any
  // time as it is a no-op if parameters have not changed since the last call.
  int InitializeIfNeeded(int src_sample_rate_hz,
                         int dst_sample_rate_hz,
                         size_t num_channels);

  // Returns the total number of samples provided in destination (e.g. 32 kHz,
  // 2 channel audio gives 640 samples).
  int Resample(const T* src, size_t src_length, T* dst, size_t dst_capacity);

 private:
  std::unique_ptr<PushSincResampler> sinc_resampler_;
  std::unique_ptr<PushSincResampler> sinc_resampler_right_;
  int src_sample_rate_hz_;
  int dst_sample_rate_hz_;
  size_t num_channels_;

  std::unique_ptr<T[]> src_left_;
  std::unique_ptr<T[]> src_right_;
  std::unique_ptr<T[]> dst_left_;
  std::unique_ptr<T[]> dst_right_;
};

}  // namespace webrtc

Its implementation (in webrtc/src/common_audio/resampler/push_resampler.cc) is as follows:
template <typename T>
PushResampler<T>::PushResampler()
    : src_sample_rate_hz_(0), dst_sample_rate_hz_(0), num_channels_(0) {}

template <typename T>
PushResampler<T>::~PushResampler() {}

template <typename T>
int PushResampler<T>::InitializeIfNeeded(int src_sample_rate_hz,
                                         int dst_sample_rate_hz,
                                         size_t num_channels) {
  CheckValidInitParams(src_sample_rate_hz, dst_sample_rate_hz, num_channels);

  if (src_sample_rate_hz == src_sample_rate_hz_ &&
      dst_sample_rate_hz == dst_sample_rate_hz_ &&
      num_channels == num_channels_) {
    // No-op if settings haven't changed.
    return 0;
  }

  if (src_sample_rate_hz <= 0 || dst_sample_rate_hz <= 0 ||
      num_channels <= 0 || num_channels > 2) {
    return -1;
  }

  src_sample_rate_hz_ = src_sample_rate_hz;
  dst_sample_rate_hz_ = dst_sample_rate_hz;
  num_channels_ = num_channels;

  const size_t src_size_10ms_mono =
      static_cast<size_t>(src_sample_rate_hz / 100);
  const size_t dst_size_10ms_mono =
      static_cast<size_t>(dst_sample_rate_hz / 100);
  sinc_resampler_.reset(
      new PushSincResampler(src_size_10ms_mono, dst_size_10ms_mono));
  if (num_channels_ == 2) {
    src_left_.reset(new T[src_size_10ms_mono]);
    src_right_.reset(new T[src_size_10ms_mono]);
    dst_left_.reset(new T[dst_size_10ms_mono]);
    dst_right_.reset(new T[dst_size_10ms_mono]);
    sinc_resampler_right_.reset(
        new PushSincResampler(src_size_10ms_mono, dst_size_10ms_mono));
  }

  return 0;
}

template <typename T>
int PushResampler<T>::Resample(const T* src,
                               size_t src_length,
                               T* dst,
                               size_t dst_capacity) {
  CheckExpectedBufferSizes(src_length, dst_capacity, num_channels_,
                           src_sample_rate_hz_, dst_sample_rate_hz_);

  if (src_sample_rate_hz_ == dst_sample_rate_hz_) {
    // The old resampler provides this memcpy facility in the case of matching
    // sample rates, so reproduce it here for the sinc resampler.
    memcpy(dst, src, src_length * sizeof(T));
    return static_cast<int>(src_length);
  }

  if (num_channels_ == 2) {
    const size_t src_length_mono = src_length / num_channels_;
    const size_t dst_capacity_mono = dst_capacity / num_channels_;
    T* deinterleaved[] = {src_left_.get(), src_right_.get()};
    Deinterleave(src, src_length_mono, num_channels_, deinterleaved);

    size_t dst_length_mono = sinc_resampler_->Resample(
        src_left_.get(), src_length_mono, dst_left_.get(), dst_capacity_mono);
    sinc_resampler_right_->Resample(src_right_.get(), src_length_mono,
                                    dst_right_.get(), dst_capacity_mono);

    deinterleaved[0] = dst_left_.get();
    deinterleaved[1] = dst_right_.get();
    Interleave(deinterleaved, dst_length_mono, num_channels_, dst);
    return static_cast<int>(dst_length_mono * num_channels_);
  } else {
    return static_cast<int>(
        sinc_resampler_->Resample(src, src_length, dst, dst_capacity));
  }
}

// Explictly generate required instantiations.
template class PushResampler<int16_t>;
template class PushResampler<float>;

PushResampler<T>::InitializeIfNeeded() sets up the buffers and the necessary PushSincResamplers according to the source and destination sample rates.
In PushResampler<T>::Resample(), the actual resampling is done by PushSincResampler, which resamples a single channel of audio. For stereo data, PushResampler<T>::Resample() first splits the frame into two mono buffers, resamples each separately, and finally puts them back together.
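Based on the Resample() helper above, a minimal usage sketch of PushResampler looks like this; the function, buffer sizes, and rates are the example's assumptions:

#include <cstdint>
#include "common_audio/resampler/include/push_resampler.h"

// Sketch: upsample one 10 ms mono frame from 16 kHz to 48 kHz.
// 10 ms at 16 kHz = 160 samples in; 10 ms at 48 kHz = 480 samples out.
void Upsample10ms(const int16_t (&in)[160], int16_t (&out)[480]) {
  static webrtc::PushResampler<int16_t> resampler;
  // No-op if the rates/channel count are unchanged since the last call.
  resampler.InitializeIfNeeded(/*src_sample_rate_hz=*/16000,
                               /*dst_sample_rate_hz=*/48000,
                               /*num_channels=*/1);
  resampler.Resample(in, /*src_length=*/160, out, /*dst_capacity=*/480);
}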
The splitting of stereo data into two mono buffers, and the merging of two mono buffers back into a stereo frame, are implemented in webrtc/src/common_audio/include/audio_util.h as follows:
// Deinterleave audio from |interleaved| to the channel buffers pointed to
// by |deinterleaved|. There must be sufficient space allocated in the
// |deinterleaved| buffers (|num_channel| buffers with |samples_per_channel|
// per buffer).
template <typename T>
void Deinterleave(const T* interleaved,
                  size_t samples_per_channel,
                  size_t num_channels,
                  T* const* deinterleaved) {
  for (size_t i = 0; i < num_channels; ++i) {
    T* channel = deinterleaved[i];
    size_t interleaved_idx = i;
    for (size_t j = 0; j < samples_per_channel; ++j) {
      channel[j] = interleaved[interleaved_idx];
      interleaved_idx += num_channels;
    }
  }
}

// Interleave audio from the channel buffers pointed to by |deinterleaved| to
// |interleaved|. There must be sufficient space allocated in |interleaved|
// (|samples_per_channel| * |num_channels|).
template <typename T>
void Interleave(const T* const* deinterleaved,
                size_t samples_per_channel,
                size_t num_channels,
                T* interleaved) {
  for (size_t i = 0; i < num_channels; ++i) {
    const T* channel = deinterleaved[i];
    size_t interleaved_idx = i;
    for (size_t j = 0; j < samples_per_channel; ++j) {
      interleaved[interleaved_idx] = channel[j];
      interleaved_idx += num_channels;
    }
  }
}

These, then, are the basic operations on audio data: mixing, channel conversion, and resampling.