Friday, May 18, 2018

Voice coder

Description: While making music, I've come across many different virtual instruments and effects. One of the most interesting effects is the vocoder, which lets you modulate your voice and make it sound like, for example, a robot. The vocoder was originally used to compress voice data, and only later was it adopted by the music industry. Since I had some free time, I decided to write one as an experiment and describe the stages of its development in VB6 in detail.

Download from VBForums
Download from Me

So, let's look at the simplest vocoder scheme.

The signal from the microphone (speech) is fed to a bank of band-pass filters, each of which passes only a narrow part of the speech frequency band. The more filters there are, the better the speech intelligibility. At the same time, the carrier signal (e.g. a sawtooth) is passed through an identical filter bank. The outputs of the speech filters are fed to envelope detectors, which control the modulators, while the outputs of the carrier filters go to the other input of the modulators. As a result, each speech band controls the level of the corresponding carrier band (modulates it). Finally, the outputs of all the modulators are mixed and sent to the output. To improve intelligibility further, additional blocks are sometimes used, such as a detector for sibilant ("hissing") sounds.

So, to begin development we need to decide where the source signals will come from. For example, data can be captured from a file, or processed in real time from a microphone or line input. A file is very convenient for testing, so we will support both. As the carrier we will use an external file played in a loop; to adjust its pitch we simply add the ability to change the playback speed, which changes the tone. To read audio from files we will use the Audio Compression Manager (ACM), which makes converting between formats very convenient (a file can be in any format, so otherwise you would have to write conversion functions for each one). If no suitable ACM driver is installed to convert the file to the desired format, the file cannot be played (although you can try to convert it in two stages). As input we will use WAV files, because the system has special functions that make it easy to read data from them.
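As a rough illustration of the block diagram described above, here is a minimal time-domain channel vocoder in Python. It is a sketch, not the project's VB6 code (which, judging by the FFT size and complex buffers it sets up, works on overlapped FFT frames instead). The assumptions are all mine: the band-pass filters are RBJ "Audio EQ Cookbook" biquads with Q = 4, the envelope detector is a rectifier followed by a one-pole smoother with a 10 ms time constant, the band centers are chosen by the caller, and every function name is hypothetical.

```python
import math

def bandpass_coeffs(fc, q, fs):
    """RBJ cookbook band-pass (0 dB peak gain), coefficients normalized by a0."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)  # (a1/a0, a2/a0)
    return b, a

def biquad(x, b, a):
    """Direct form I biquad filter applied to a list of samples."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def envelope(x, fs, smoothing_ms=10.0):
    """Envelope detector: full-wave rectify, then smooth with a one-pole low-pass."""
    k = math.exp(-1.0 / (fs * smoothing_ms / 1000.0))
    env, e = [], 0.0
    for s in x:
        e = k * e + (1 - k) * abs(s)
        env.append(e)
    return env

def channel_vocoder(speech, carrier, fs, bands, q=4.0):
    """For each band: filter speech and carrier with the same band-pass,
    let the speech-band envelope scale (modulate) the carrier band,
    and mix all bands into one output."""
    n = min(len(speech), len(carrier))
    out = [0.0] * n
    for fc in bands:
        b, a = bandpass_coeffs(fc, q, fs)
        env = envelope(biquad(speech[:n], b, a), fs)
        car = biquad(carrier[:n], b, a)
        for i in range(n):
            out[i] += env[i] * car[i]
    return out
```

For example, with fs = 8000 and bands half an octave apart starting at 250 Hz, `channel_vocoder(speech, saw, 8000, [250 * 2 ** (k / 2) for k in range(8)])` returns one output sample per input sample. More bands give better intelligibility, as noted above, at the cost of more filtering work.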

When the form loads, we initialize all the components: capture, playback, the FFT size, the amount of overlap, the overlapped buffers, and the buffers for integer and complex data. Next, I gave the window a rounded-rectangle shape, since it has no frame (I had no desire to draw in the non-client area). Now the whole task comes down to handling two events, AudioPlayback_NewData and AudioCapture_NewData. The first occurs when the playback device needs the next portion of audio data; the second occurs when the capture buffer is filled, and in its handler we simply copy the data into a temporary buffer, from which it is taken during processing in AudioPlayback_NewData. The main method is Process, which performs the conversion. First we check whether we are capturing from a file or from a device. To do this, we check the variable mInpFile, which holds the name of the input file. If we are capturing from a file, we use the object inpConv, an instance of clsTrickWavConverter, to convert the data into the format we need. If the data runs out (the number of bytes read does not match the number requested), we have reached the end of the file and must start over from the beginning. We also check the carrier signal: if it is not set, we simply copy the input data to the output, and in this case we hear the raw sound. Otherwise, we convert the data to complex form (the real part is the signal and the imaginary part is zeroed out) and put the resulting array into an overlapped buffer. Next, we process the carrier signal. Because the carrier can be very short (a single wave period is enough), I repeat the signal as needed for efficiency. Let me explain. For example, if we have a 10 ms carrier and a 100 ms buffer, we could call the ACM conversion repeatedly, moving the destination pointer along the array each time, but that is not efficient.
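To illustrate the repetition idea: the short carrier is converted once, and the converted samples are then tiled across the output buffer. In this Python sketch, `tile_carrier` is a hypothetical helper, the carrier is assumed to be already converted (by ACM in the project) into a list of samples, and the returned position plays the role of the read position in the source, which has to carry over to the next buffer so the waveform stays in phase.

```python
def tile_carrier(period, length, start=0):
    """Fill a buffer of `length` samples by repeating the already converted
    carrier `period`, beginning at read position `start`.
    Returns (buffer, next_start); passing next_start to the following call
    keeps the carrier continuous across buffers, avoiding clicks at the seams."""
    if not period:
        raise ValueError("carrier is empty")
    n = len(period)
    pos = start % n
    out = []
    while len(out) < length:
        take = min(n - pos, length - len(out))   # copy up to the end of the period
        out.extend(period[pos:pos + take])
        pos = (pos + take) % n                   # wrap around to the start
    return out, pos
```

With a 10-sample carrier and a 100-sample buffer (the 10 ms / 100 ms case at a 1 kHz rate), one call copies the period ten times and returns position 0; with a 25-sample buffer it returns 5, and the next call resumes mid-period.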
To optimize this, we convert the carrier only once and then simply duplicate the data up to the end of the array, which is what we do here. Just don't forget to update the position in the source file; otherwise the phase of the next read will not match and there will be clicks. We write into another buffer (rawBuffer), whose length depends on the pitch shift. For example, to shift the pitch by a given number of semitones, rawBuffer must be 2^(semitones / 12) times larger. Then we simply compress or stretch the buffer to mFFTSize samples, which speeds playback up or slows it down and, as a result, raises or lowers the pitch. After all these manipulations we write the data to an overlapped buffer and start processing. To do this, we loop over the overlapping frames and process each one; the clsTrickOverlappedBuffer objects return the correct data. The processing itself is clear from the code, and we will examine how each class works in detail. After processing all the overlapping frames, we take the output and convert it to integers suitable for playback. Settings are handled by the frmSettings form. The device list is a standard ListBox, just drawn through my own drawing class. Devices are added to the list in the following order:
  • The default device (predefined format)
  • Device 1
  • Device 2
  • ...
  • Device n
  • Capture from a file
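The pitch-shift step in the processing description above (read 2^(semitones / 12) times as many carrier samples into rawBuffer, then compress or stretch them to mFFTSize samples) can be sketched in Python as follows. The names `raw_length` and `stretch` are hypothetical, `fft_size` stands in for mFFTSize, equal-tempered semitones are assumed, and plain linear interpolation is an assumed resampling choice; the original code may resample differently.

```python
def raw_length(fft_size, semitones):
    """Carrier samples to read so that squeezing them into fft_size samples
    shifts the pitch by `semitones` (equal temperament: ratio 2^(s/12))."""
    return max(1, round(fft_size * 2 ** (semitones / 12)))

def stretch(raw, out_len):
    """Resample `raw` to `out_len` samples by linear interpolation.
    len(raw) > out_len compresses (pitch up); len(raw) < out_len stretches (pitch down)."""
    if not raw or out_len < 1:
        raise ValueError("need a non-empty input and a positive output length")
    step = len(raw) / out_len            # input samples consumed per output sample
    last = len(raw) - 1
    out = []
    for i in range(out_len):
        x = i * step
        j = int(x)                       # left neighbour
        frac = x - j                     # distance to it, 0 <= frac < 1
        nxt = raw[min(j + 1, last)]      # right neighbour, clamped at the end
        out.append(raw[j] + (nxt - raw[j]) * frac)
    return out
```

With fft_size = 1024, a shift of +12 semitones reads 2048 samples and compresses them by half, doubling every frequency; -12 semitones reads 512 samples and stretches them, halving every frequency.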

To check whether a click actually hit the last item, the LB_GETITEMRECT message is used, which retrieves the coordinates and size of an item in the list. Without this check, if there is empty space at the bottom of the list, clicking it would be equivalent to clicking the last item. In the settings button handler of the main form frmTrickVocoder, we check the capture device and either open the file for conversion or initialize capture. The volume and mix controls use a logarithmic scale, because the sensitivity of the human ear is not linear.
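A minimal sketch of such a logarithmic control, assuming a slider position normalized to 0..1 and an arbitrary -60 dB floor (neither value comes from the project): equal slider movements then produce equal changes in decibels rather than in linear amplitude. The function names are hypothetical.

```python
import math

def slider_to_gain(pos, min_db=-60.0):
    """Map a slider position in [0, 1] to a linear gain on a decibel scale.
    Position 0 is treated as silence; position 1 is unity gain (0 dB)."""
    if pos <= 0.0:
        return 0.0
    db = min_db * (1.0 - min(pos, 1.0))   # linear in dB across the slider travel
    return 10 ** (db / 20)

def gain_to_db(gain):
    """Convert a positive linear amplitude gain back to decibels."""
    return 20 * math.log10(gain)
```

Half travel gives -30 dB, about 0.032 in linear amplitude, so the lower half of the slider still gives fine control over quiet levels instead of jumping to near-silence.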

