- The early works focused on estimating the spectral envelope of the speech signal and model the mapping from narrowband to wideband signals.
- Our work is related to earlier work on auto-regressive multi-layer neural networks, starting with Bengio & Bengio (1999), then NADE (Larochelle & Murray, 2011) and more recently PixelRNN (van den Oord et al., 2016). Similar to how they tractably model joint distribution over units of the data (e.g. words in sentences, pixels in images, etc.) through an auto-regressive decomposition, we transform the joint distribution of acoustic samples using Eq. 1.
To investigate the behavior of RPNs as a proposal method, we conducted several ablation studies. First, we show the effect of sharing convolutional layers between the RPN and Fast R-CNN detection network. To do this, we stop
after the second step in the 4-step training process. Using separate networks reduces the result slightly to 58.7% (RPN+ZF, unshared, Table 2).** We observe that** this is because in the third step when the detectortuned features are used to fine-tune the RPN, the proposal quality is improved.
Next, we disentangle the RPN’s influence on training the Fast R-CNN detection network. For this purpose, we train a Fast R-CNN model by using the 2000 SS proposals and ZF net. We fix this detector and evaluate the detection mAP by changing the proposal regions used at test-time. In these ablation experiments, the RPN does not share features with the detector
- complicated 复杂的
- For some cases, a slight improvement can be observed from the 2nd and 3rd columns in Fig. 8
- We can further observe that …
- We observe that this is because …
Best results are shown in bold.