Voice chip / what is Voice IC / definition of voice chip

Date: 2020-11-25

As the name suggests, a voice chip is a chip related to sound, where "voice" means electronically stored sound. Any chip that can produce sound is commonly called a voice chip; the more accurate English term is Voice IC.

Within the large family of voice chips, chips are divided by the type of sound into two kinds: speech ICs (Speech IC) and music ICs (Music IC). This is the usual professional way of classifying voice chips.

In everyday use, voice chips are also classified by the industry they serve: toy chips (used in the toy industry, such as AC80E5), doorbell chips (AC8DM32), OTP voice chips (AC8040), nursery-rhyme ICs (AC8DE12), stroller ICs, and so on. Each of these categories in turn contains both speech ICs (Speech IC) and music ICs (Music IC).

Voice chips can also be divided by how many channels the IC physically provides, that is, how many sounds it can play at the same time:

One, single channel:
1, Single-channel speech IC (Speech IC): this kind of voice chip does not support the music storage mode of a music IC. Ordinary speech ICs are single-channel voice chips; the AC8020-OTP (20 seconds) and the AC83E12 (animal calls) are the most typical examples.
2, Single-channel music IC (Music IC): a music IC that can play only one sound at a time; its electronic sound source file has only one channel (a .mid file).
The often-mentioned monophonic chip is one of the most basic music ICs. Its effect depends on how many notes it can output within a given period, for example 64 notes, 128 notes or more. Monophonic chips are widely used and extremely cheap; the most common are general-purpose monophonic chips and happy-birthday greeting-card chips. A typical example is the AC8SE07.
Strictly speaking, a two-channel music IC and a monophonic chip are structurally different.

Two, 2 channels:
1, 2-channel speech IC: in practice, 2-channel and multi-channel voice chips usually play speech through one fixed channel (equivalent to single-channel playback). Multi-channel speech ICs (Speech IC) cost more to make, so they are priced higher. To balance price against application, manufacturers generally give these chips more complete feature support and better sound effects.
This structure is probably determined by the application field and price of the products and solutions. Voice chip output is generally single-channel (mono), and few products support stereo; high-end products need an MP3 master-chip solution.
2, 2-channel music chip, commonly called a dual-tone music IC (Music With Dual Tone IC): as the name implies, a music IC that can play music on two channels at the same time. Its electronic sound source file is generally a two-channel .mid file. A common example is the Christmas-series music IC AC8DC12.
Two more points are worth adding here. The market also offers music chips called "melody" chips. How are they defined? Put simply, a melody chip sounds better than a monophonic chip but worse than a chord music chip, and dual-tone chips are also called melody music chips. The melody structure can be seen as a more advanced monophonic chip, or a monophonic chip with a doubled effect.

Three, 4 channels, 8 channels or more:
Chips that play more than three channels at once are also called chord music chips. The often-mentioned "4-chord music IC" refers to a 4-channel music IC...
Generally, multi-channel voice chips support both music IC (Music IC) and speech IC (Speech IC) functions.

How to distinguish voice chips with and without integrated MCU

First, the definition of an MCU (Micro Controller Unit), also known as a single-chip microcomputer (Single Chip Microcomputer): with the emergence and development of large-scale integrated circuits, a computer's CPU, RAM, ROM, timers/counters and various I/O interfaces were integrated onto a single chip, forming a chip-level computer.
There are two kinds of voice chips. One kind integrates an MCU, such as the AC4060 and AC5080 programmable OTP series, and is more capable: physically, it contains RAM (random-access memory), timers/counters and so on, as well as ROM. The other kind does not integrate an MCU and contains only a ROM (without computation, random-access storage, timers or similar functions), such as the AC9020, AC8040 and AC3030 series chips; these are considerably cheaper.
Simply put, an OTP voice chip with RAM has an integrated MCU. Because RAM can be read and written at random, such a chip can carry out more complex functions.

Defined by its physical characteristics:

A voice chip is an integrated circuit that converts a sound signal into a digital signal by sampling, stores the data in the chip's ROM, and then uses a circuit to convert the digital data in ROM back into a sound signal.

A conventional voice chip outputs its sound signal in one of two ways: pulse-width modulation (PWM) output or digital-to-analog conversion (DAC) output.

Pulse-width modulation (PWM) output is a very effective technique that uses a microprocessor's digital output to control analog circuits. Compared with DAC output, it is widely applicable and low cost, and it is the main sound output method in the toy industry.
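As a rough illustration of PWM sound output, the Python sketch below shows how each unsigned 8-bit sample can set the duty cycle of one PWM period; the speaker or an RC filter then averages the pulses into an analog level. The 8-bit counter (256 steps per period) and the 3.0 V supply are assumptions chosen for the example, not the parameters of any particular chip.

```python
# Sketch: one unsigned 8-bit sample -> one PWM period.
# Assumed (hypothetical) hardware: 8-bit PWM counter, 3.0 V supply.
PWM_PERIOD = 256  # counter steps per PWM period
VDD = 3.0         # supply voltage in volts


def duty_cycle(sample: int) -> float:
    """Fraction of the PWM period during which the output pin is high."""
    if not 0x00 <= sample <= 0xFF:
        raise ValueError("sample must be in 0x00..0xFF")
    return sample / PWM_PERIOD


def pwm_period_levels(sample: int) -> list[int]:
    """Pin level at each counter step: high (1) while counter < sample, else low (0)."""
    return [1 if count < sample else 0 for count in range(PWM_PERIOD)]


def average_voltage(sample: int) -> float:
    """Level left after the speaker or a low-pass filter averages the pulses."""
    return VDD * duty_cycle(sample)


for s in (0x00, 0x80, 0xFF):
    print(f"sample {s:#04x}: duty {duty_cycle(s):.3f}, average {average_voltage(s):.3f} V")
```

Only a timer and an output pin are needed, which fits the low cost mentioned above. The midpoint value 80H gives a 50% duty cycle.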

Digital-to-analog conversion (DAC) output: an ordinary voice chip is an integrated circuit that only plays back sound, which is essentially a DAC process; the ADC side, including sampling, compression, EQ and other processing of the sound signal, is done beforehand on a computer.

ADC = Analog-to-Digital Converter

DAC = Digital-to-Analog Converter

Sound quality depends on the bit depth of the ADC and DAC. Some ADCs and DACs are 32-bit, which approaches the quality of a real human voice; some are 16-bit, close to CD quality; and common DACs are 8-bit, which gives ordinary sound quality.
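One way to make the link between bit depth and quality concrete is the standard rule of thumb for an ideal n-bit converter driven by a full-scale sine wave: each extra bit adds about 6 dB of signal-to-quantization-noise ratio. These are theoretical limits; real 32-bit converters are limited by analog noise long before they reach them.

$$
\mathrm{SNR} \approx 6.02\,n + 1.76\ \text{dB}
$$

$$
n = 8:\ \approx 49.9\ \text{dB}, \qquad n = 16:\ \approx 98.1\ \text{dB}, \qquad n = 32:\ \approx 194.4\ \text{dB}
$$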

A recording chip performs both the ADC and DAC processes itself, covering voice data collection, analysis, compression, storage and playback.

2. Quantitative representation of sound signals (classified as speech ICs and music ICs)

(A) Introduction to the "Speech IC":

(1) Quantization of speech signal

Sampling rate (f), number of bits (n), bit rate (T)

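Sampling: converting the analog sound signal into a digital signal.

Sampling rate: the number of samples taken per second, in Hz; with 8-bit samples this equals the number of bytes per second.

Bit rate: the number of bits produced per second (bps, bits per second), i.e. the sampling rate multiplied by the number of bits per sample (T = f × n). The bit rate directly determines the sound quality.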
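The relationship T = f × n can be checked with two familiar figures. The sketch below adds a channel count, which the definition above does not include, so that the stereo CD case can be computed too.

```python
def bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate T = f * n (* channels), in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


telephone = bit_rate(8_000, 8)          # 8 kHz, 8-bit, mono
cd = bit_rate(44_100, 16, channels=2)   # 44.1 kHz, 16-bit, stereo
print(telephone, "bps")  # 64000 bps
print(cd, "bps")         # 1411200 bps
```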

The number of sampling bits is the width of each sample in binary. Unless otherwise specified, a sound sample is 8 bits, ranging from 00H to FFH, with silence at 80H.
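Voice data prepared on a computer is often recorded as signed 16-bit PCM. The following minimal sketch maps it onto the unsigned 8-bit convention described above (00H to FFH, silence at 80H). It simply keeps the top 8 bits and adds the 80H offset, with no rounding or dithering; that choice is an assumption for illustration, not a description of any particular conversion tool.

```python
def to_unsigned8(sample16: int) -> int:
    """Signed 16-bit sample (-32768..32767) -> unsigned 8-bit (00H..FFH), silence at 80H."""
    if not -32768 <= sample16 <= 32767:
        raise ValueError("sample16 out of 16-bit range")
    return (sample16 >> 8) + 0x80  # keep the top 8 bits, move zero to 80H


def to_signed16(sample8: int) -> int:
    """Inverse mapping; the 8 low bits dropped by to_unsigned8 cannot be recovered."""
    if not 0x00 <= sample8 <= 0xFF:
        raise ValueError("sample8 out of 8-bit range")
    return (sample8 - 0x80) << 8


print([hex(to_unsigned8(s)) for s in (-32768, -1, 0, 32767)])  # ['0x0', '0x7f', '0x80', '0xff']
```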

(2) Sampling rate

Nyquist sampling theorem: to restore the original signal from its samples without distortion, the sampling frequency must be greater than twice the highest frequency in the signal. If the sampling frequency is less than twice the highest frequency in the spectrum, the spectrum aliases; if it is greater than twice that frequency, no aliasing occurs.
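The aliasing described above can be demonstrated numerically. In the Python sketch below, 8 kHz is just an example sampling rate: a 5 kHz tone lies above half the sampling rate, and its samples come out identical to those of a 3 kHz tone, so after sampling the two cannot be told apart.

```python
import math


def apparent_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency (between 0 and fs/2) at which a pure tone f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)


fs = 8_000
print(apparent_frequency(3_000, fs))  # 3000: below fs/2, preserved
print(apparent_frequency(5_000, fs))  # 3000: above fs/2, folds onto 3 kHz

# Sampled values of a 5 kHz and a 3 kHz cosine coincide at fs = 8 kHz.
s5 = [math.cos(2 * math.pi * 5_000 * k / fs) for k in range(16)]
s3 = [math.cos(2 * math.pi * 3_000 * k / fs) for k in range(16)]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(s5, s3)))  # True
```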

The audible sound band is about 20 Hz to 20 kHz, while ordinary speech lies mostly below about 3 kHz. CD audio therefore uses 44.1 kHz and 16 bits, since 44.1 kHz is more than twice 20 kHz. Special sounds such as musical instruments are sometimes recorded at 48 kHz and 24 bits, but this is not mainstream.

For ordinary voice ICs, the sampling rate is at most 16 kHz; speech is generally sampled at 8 kHz (telephone quality) or about 6 kHz. Below 6 kHz the result is noticeably poorer.

When playback runs on a single-chip microcomputer, a higher sampling rate means more frequent timer interrupts, which can interfere with monitoring and detecting other signals, so the trade-off must be weighed as a whole.

(3) Voice compression technology.

Because voice data is very large, it must be compressed effectively so that more voice content fits in the limited ROM space. There are several methods:

Voice segmentation: cut out the repeatable parts of the voice and store them once, then reproduce the complete content by arranging and combining the segments.
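A minimal sketch of voice segmentation, assuming a hypothetical talking-clock product: each phrase segment is stored once, and sentences are built as lists of segment names at play time. The segment names and sizes are placeholders.

```python
# Hypothetical ROM contents: each segment is stored only once.
SEGMENTS = {
    "the_time_is": bytes(4000),  # placeholder sample data
    "one": bytes(1500),
    "two": bytes(1500),
    "oclock": bytes(2000),
}


def build_playback(words: list[str]) -> bytes:
    """Concatenate stored segments in the requested order."""
    return b"".join(SEGMENTS[w] for w in words)


sentences = [
    ["the_time_is", "one", "oclock"],
    ["the_time_is", "two", "oclock"],
]
stored = sum(len(d) for d in SEGMENTS.values())               # bytes actually in ROM
played = sum(len(build_playback(s)) for s in sentences)       # bytes heard in total
print(f"ROM holds {stored} bytes, playback covers {played} bytes")
```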

Voice sampling: the speakers we use generally respond best in the mid-frequency range and rarely reproduce high frequencies well. So, as long as the sound through the speaker remains acceptable, the sampling rate can be reduced appropriately to compress the data. This process is irreversible, because the original cannot be restored; it is called lossy compression.
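Reducing the sampling rate can be sketched as below: the rate is halved by averaging each pair of samples (a crude low-pass filter) and keeping one value per pair. A real tool would use a proper anti-aliasing filter; the pairwise averaging is only an illustrative assumption. The example also shows why the step is lossy: different inputs can produce the same output.

```python
def halve_sample_rate(samples: list[int]) -> list[int]:
    """Average each pair of samples and keep one value per pair (a trailing odd sample is dropped)."""
    return [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)]


a = [0, 10, 20, 30]
b = [5, 5, 25, 25]
# Both print [5, 25]: the two originals can no longer be told apart.
print(halve_sample_rate(a), halve_sample_rate(b))
```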

Mathematical compression: compression based mainly on the number of bits per sample, which is also lossy. For example, the commonly used ADPCM format compresses voice data from 16 bits to 4 bits per sample, a compression ratio of 4:1. MP3 compresses the whole data stream, with a bit-rate compression ratio of about 10:1.

Usually, the above several compression methods are used in combination.

(4) Commonly used voice formats

PCM format: Pulse Code Modulation samples the analog sound signal to obtain quantized voice data. It is the most basic, raw voice format and is very similar to the RAW and SND formats; all of them hold plain, uncompressed sample data.

WAV format: WAV (Waveform Audio File Format), also called a wave sound file, is a sound file format developed by Microsoft and widely supported on the Windows platform and its applications. WAV supports many compression algorithms, audio bit depths, sampling rates and channel counts, but WAV files take too much storage space for convenient transfer and distribution. Each block of data stored in a WAV file carries its own identifier, which tells the reader what the data is; this information includes the sampling rate, the number of bits, mono or stereo, and so on.
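The per-block identifiers can be seen by writing a small file with Python's standard `wave` module. The sketch writes one second of 8-bit mono silence at 8 kHz into memory and reads the header back; 8-bit WAV samples are unsigned, so silence is 80H, the same convention described earlier for voice data.

```python
import io
import wave

samples = bytes([0x80]) * 8_000  # 1 s of 8-bit silence at 8 kHz

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(1)      # 1 byte = 8 bits per sample
    w.setframerate(8_000)  # sampling rate in Hz
    w.writeframes(samples)
data = buf.getvalue()

# Chunk identifiers at the start of the file
print(data[0:4], data[8:12], data[12:16])  # b'RIFF' b'WAVE' b'fmt '

with wave.open(io.BytesIO(data), "rb") as r:
    info = (r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())
print(info)  # (1, 1, 8000, 8000)
```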

ADPCM format: it uses several past sample values to predict the current input sample. The prediction is adaptive: it is compared with the actual value, and the difference is processed automatically at every step so that the coder keeps tracking changes in the signal. ADPCM suits sounds that change at a moderate rate and short playback. Its advantage is realistic reproduction of the human voice, generally reaching more than 90%, and it is widely used in telephone communication.
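As an illustration, here is a compact sketch of IMA ADPCM, one widely used ADPCM variant. It is simpler than the description above in one respect: it predicts each sample from only the previous reconstructed sample, and the adaptation lies in the quantizer step size, which grows or shrinks with each 4-bit code. It turns 16-bit samples into 4-bit codes, the 4:1 ratio mentioned earlier. It is not the codec of any particular voice chip.

```python
# IMA ADPCM sketch: signed 16-bit PCM <-> 4-bit codes (bit 3 = sign, bits 0-2 = magnitude).
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]  # step-index change per magnitude code


def _update(predictor: int, index: int, code: int) -> tuple[int, int]:
    """Apply one code to the (predictor, step index) state; shared by encoder and decoder."""
    step = STEP_TABLE[index]
    delta = step >> 3
    if code & 4:
        delta += step
    if code & 2:
        delta += step >> 1
    if code & 1:
        delta += step >> 2
    predictor = predictor - delta if code & 8 else predictor + delta
    predictor = max(-32768, min(32767, predictor))
    index = max(0, min(88, index + INDEX_TABLE[code & 7]))
    return predictor, index


def encode(samples: list[int]) -> list[int]:
    """Encode signed 16-bit samples into 4-bit codes (0..15)."""
    predictor, index, codes = 0, 0, []
    for s in samples:
        step = STEP_TABLE[index]
        diff = s - predictor            # prediction error: only this is coded
        code = 8 if diff < 0 else 0     # sign bit
        diff = abs(diff)
        if diff >= step:                # three magnitude bits: step, step/2, step/4
            code |= 4
            diff -= step
        if diff >= step >> 1:
            code |= 2
            diff -= step >> 1
        if diff >= step >> 2:
            code |= 1
        predictor, index = _update(predictor, index, code)  # mirror the decoder's state
        codes.append(code)
    return codes


def decode(codes: list[int]) -> list[int]:
    """Rebuild 16-bit samples from 4-bit codes using the same adaptive state."""
    predictor, index, out = 0, 0, []
    for code in codes:
        predictor, index = _update(predictor, index, code)
        out.append(predictor)
    return out
```

For example, a 440 Hz tone sampled at 8 kHz can be encoded and decoded like this; after a few samples of adaptation the rebuilt waveform follows the original closely.

```python
import math

pcm = [int(8000 * math.sin(2 * math.pi * 440 * k / 8000)) for k in range(800)]
rebuilt = decode(encode(pcm))
```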

MP3 format: MP3 is short for Moving Picture Experts Group Audio Layer III. It uses MPEG Audio Layer 3 technology with a perceptual coding algorithm: during encoding, the audio is first analyzed in the frequency domain, a filter removes components below the noise level, the remaining data is quantized and its bits are rearranged, and the result is an MP3 file with a high compression ratio that still sounds close to the original source when played back. A key feature is VBR (Variable Bitrate), which selects a suitable bit rate dynamically according to the content being encoded, so that the encoding preserves sound quality while keeping the file small.

MP3 compresses audio by a factor of 10 or even 12, and it was one of the first high-compression audio formats.
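Where the 10 to 12 times figure comes from can be checked with a worked example, assuming a CD-quality source (44.1 kHz, 16-bit, stereo) encoded at 128 kbit/s, a common MP3 bit rate chosen here only for illustration:

$$
\frac{44\,100\ \text{samples/s} \times 16\ \text{bit} \times 2\ \text{channels}}{128\,000\ \text{bit/s}} = \frac{1\,411\,200\ \text{bit/s}}{128\,000\ \text{bit/s}} \approx 11
$$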

Linear Scale format: the sound is divided into segments according to how fast it changes, and each segment is compressed on a linear scale whose ratio varies from segment to segment. The Linear Scale formats of SUNLINK and ALPHA use 5 bits.

Logpcm format: the whole sound is compressed essentially linearly by removing the last few (least significant) bits. This method is easy to implement in hardware, but its sound quality is worse than Linear Scale, especially for quiet, delicate passages. It is mainly used for pure speech.

(B) Introduction to the "Music IC":

(1) Channels and timbre of music:

Envelope, square wave (patch), channel

Envelope: part of a synthesized tone; it describes how a note's output changes over time, commonly known as "ADSR" (attack, decay, sustain, release).

Square wave: part of a synthesized tone; the square-wave current that produces the note over time (triangle waves and other waveforms are also used).

Channel: the maximum number of notes the IC can output at the same time, that is, the number of "single instruments" it can play at once.
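To show how envelope, waveform and channel fit together, the Python sketch below builds each note as a square wave shaped by a piecewise-linear ADSR envelope, then mixes three notes played at once; a chip would need at least three channels to play this chord. The output rate, envelope proportions and helper names are assumptions for illustration, not how any particular music IC is implemented.

```python
SAMPLE_RATE = 8_000  # assumed output rate in Hz


def adsr(n: int, attack: int, decay: int, sustain_level: float, release: int) -> list[float]:
    """Piecewise-linear ADSR envelope, n samples long (attack/decay/release in samples)."""
    sustain_len = n - attack - decay - release
    if sustain_len < 0:
        raise ValueError("attack + decay + release must not exceed n")
    env = [i / attack for i in range(attack)]                               # rise 0 -> 1
    env += [1.0 - (1.0 - sustain_level) * i / decay for i in range(decay)]  # fall 1 -> sustain
    env += [sustain_level] * sustain_len                                    # hold
    env += [sustain_level * (1.0 - i / release) for i in range(release)]    # fall toward 0
    return env


def square_note(freq_hz: float, n: int) -> list[float]:
    """Square wave alternating between +1 and -1 at freq_hz, n samples long."""
    period = SAMPLE_RATE / freq_hz
    return [1.0 if (i % period) < period / 2 else -1.0 for i in range(n)]


def note(freq_hz: float, n: int) -> list[float]:
    """One channel: a square-wave tone shaped by the ADSR envelope."""
    env = adsr(n, attack=n // 10, decay=n // 10, sustain_level=0.6, release=n // 5)
    return [s * e for s, e in zip(square_note(freq_hz, n), env)]


def mix(channels: list[list[float]]) -> list[float]:
    """Average simultaneous channels so the result stays within -1..1."""
    return [sum(vals) / len(channels) for vals in zip(*channels)]


n = SAMPLE_RATE // 2                                      # half a second
chord = mix([note(262, n), note(330, n), note(392, n)])   # three channels: C, E, G
```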

PCT: a type of sampled sound that reproduces the pitch of each note from a 256-point sample of an instrument's sound. (The sound is soft and takes up little space, but it is not very realistic.)

FULL WAVE: reproduces the pitch of each note from a full recording of the instrument's sound. (The instrument sounds real, but this takes up a lot of space and requires high-quality timbre samples.)
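The idea of a short 256-point timbre can be sketched as wavetable playback: one single-cycle table is stored, and reading through it faster or slower changes the pitch, so one small table covers every note. The table contents and the 8 kHz output rate are placeholders, and that PCT chips play their tables exactly this way is an assumption.

```python
import math

TABLE_SIZE = 256
SAMPLE_RATE = 8_000  # assumed output rate in Hz

# Hypothetical single-cycle timbre; a real table would be sampled from an instrument.
TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) + 0.3 * math.sin(4 * math.pi * i / TABLE_SIZE)
         for i in range(TABLE_SIZE)]


def play_pitch(freq_hz: float, n: int) -> list[float]:
    """Read the same table at a speed set by freq_hz (phase-increment wavetable playback)."""
    step = freq_hz * TABLE_SIZE / SAMPLE_RATE  # table entries advanced per output sample
    phase = 0.0
    out = []
    for _ in range(n):
        out.append(TABLE[int(phase) % TABLE_SIZE])
        phase += step
    return out


low = play_pitch(262.0, 800)   # middle C
high = play_pitch(524.0, 800)  # one octave higher: the table is read twice as fast
```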

(2) Compression of music:

Because music data is very large, it must be compressed effectively so that more music content fits in the limited ROM space. There are several methods:

Music segmentation: cut out the repeatable parts of the music and store them once, then reproduce the complete content by arranging and combining the segments.

Timbre: Full wave, PCT or dual tone is chosen according to how full the music needs to sound. Each timbre type takes up a different amount of space and gives a different sound quality.

Mathematical compression: mainly used to compress sampled timbres (Full wave). This is also lossy: the timbre to be sampled is downsampled and processed to reduce its size (the same timbre, modified).

(3) Common music formats:

MID format: MIDI (Musical Instrument Digital Interface) was proposed in the early 1980s to solve the problem of communication between electronic musical instruments. MIDI does not transmit sound signals but instructions such as notes and control parameters.
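Because MIDI carries instructions rather than sound, playing a note means sending a short message and letting the chip generate the pitch. The sketch below builds a standard three-byte Note On message and converts a MIDI note number to its equal-tempered frequency (A4 = note 69 = 440 Hz); the helper names are illustrative.

```python
def midi_note_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Raw MIDI Note On message: status 0x90 | channel, then note and velocity."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15, note and velocity 0-127")
    return bytes([0x90 | channel, note, velocity])


print(note_on(0, 60, 100).hex())      # 903c64
print(round(midi_note_to_hz(60), 2))  # 261.63 (middle C)
```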

WAV format: (see the speech IC section) the format used for sampled timbres.

3. How voice ROM capacity is expressed

The capacity of a voice chip is expressed intuitively as the length of voice it can hold:

a) Ordinary voice chips: voice length is calculated at a 6 kHz sampling rate.

b) Recording ICs: voice length is calculated at a 4 kHz sampling rate (for example, the AC6006, AC6009 and AC6012).

That is, the stated length is how long the chip can play at a 6 kHz (or 4 kHz) sampling rate.
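The conversion from ROM size to seconds is simple arithmetic: length = ROM bits ÷ (sampling rate × bits per sample). The rating above does not state how many bits per sample it assumes, so the sketch takes a hypothetical 1 Mbit ROM and shows results for 4-bit (ADPCM-style) and 8-bit (PCM) storage; these are illustrations, not chip specifications.

```python
def play_seconds(rom_bits: int, sample_rate_hz: int, bits_per_sample: int) -> float:
    """Playable length in seconds: ROM bits / (sampling rate x bits per sample)."""
    return rom_bits / (sample_rate_hz * bits_per_sample)


ROM_BITS = 1 * 1024 * 1024  # hypothetical 1 Mbit voice ROM

for fs in (6_000, 4_000):  # rating rates for ordinary voice chips and recording ICs
    for bits in (4, 8):    # assumed 4-bit ADPCM-style or 8-bit PCM storage
        print(f"{fs} Hz, {bits}-bit: {play_seconds(ROM_BITS, fs, bits):.1f} s")
```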

4. Elements of a voice chip
