Search results

Number of results: 3

Abstract

Glottal waveform models have long been employed to improve the quality of speech synthesis. This paper presents a new approach to modeling the glottal flow. The model is based on three control volumes that sequentially strike a system of one mass and two springs and generate a glottal pulse. The first, second and third control volumes represent the opening, closing and closed phases of the vocal folds, respectively. The masses of the three control volumes and the size of the first one are the four parameters that define the shape, pitch and amplitude of the glottal pulse. The model may be viewed as a parametric approach governed by second-order differential equations rather than analytical functions, and it is very flexible for designing a glottal pulse. Comparison of the glottal pulse generated by the present model with those generated by the Rosenberg, LF and mucosal wave propagation models shows that it appropriately represents the opening, closing and closed phases of the vocal fold oscillation, which supports the validity of the model. The numerical solution of the present model has been found to be very efficient compared with its analytical solution and with two other well-known parametric models, Rosenberg++ and LF. The accuracy of the numerical solution has been assessed against the analytical solution. The accuracy improves as the size of the first control volume increases and may decrease insignificantly as the mass of any of the control volumes increases. Two experiments with the present model support its successful implementation as a voice source in speech synthesis. The model thus lends itself as an efficient, accurate and realistic voice source for real-time speech production.
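
For orientation, the three phases named in the abstract (opening, closing, closed) can be illustrated with a trigonometric Rosenberg-type pulse, one of the reference models the abstract compares against. This is only a sketch of that commonly cited reference shape, not the control-volume model of the paper; the function name `rosenberg_pulse` and the default phase fractions are assumptions chosen for illustration.

```python
import math


def rosenberg_pulse(n_samples, open_frac=0.4, close_frac=0.16, amplitude=1.0):
    """Return one period of a trigonometric Rosenberg-type glottal pulse.

    open_frac and close_frac are the fractions of the period spent in the
    opening and closing phases (illustrative defaults, not from the paper).
    The remainder of the period is the closed phase, where the flow is zero.
    """
    n_open = int(round(open_frac * n_samples))
    n_close = int(round(close_frac * n_samples))
    pulse = []
    for n in range(n_samples):
        if n < n_open:
            # Opening phase: raised-cosine rise from 0 towards the peak.
            value = 0.5 * (1.0 - math.cos(math.pi * n / n_open))
        elif n < n_open + n_close:
            # Closing phase: quarter-cosine fall from the peak towards 0.
            value = math.cos(0.5 * math.pi * (n - n_open) / n_close)
        else:
            # Closed phase: no glottal flow.
            value = 0.0
        pulse.append(amplitude * value)
    return pulse


pulse = rosenberg_pulse(100)
```

With the defaults above, the pulse rises over the first 40 samples, falls over the next 16, and stays at zero for the rest of the period. Changing the two fractions reshapes the pulse, and changing `n_samples` at a fixed sampling rate changes the pitch.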

Authors and Affiliations

Tahir Qureshi
Khalid Syed

Abstract

A vocal tract model based on a digital waveguide is presented in which the vocal tract is decomposed into uniform cylindrical segments of variable length. We present a model for the real-time numerical solution of the digital waveguide equations in a uniform tube with a temporally varying cross section. Because the uniform cylindrical segments of the vocal tract may have different lengths, the times taken by the sound wave to propagate axially through the segments may not be integer multiples of one another. In such a case, the delay in the axial direction is necessarily a fractional delay. Lagrange interpolation is used in the current model to approximate the fractional-delay filters. The variable length of the individual segments of the vocal tract enables the model to produce realistic results, which are validated against an accurate benchmark model. The proposed model has been devised to elongate or shorten any arbitrary cylindrical segment by a suitable scaling factor. The model uses a single algorithm, and there is no need to make sections of segments in order to elongate or shorten the intermediate segments. The proposed model is about 23% more efficient than the previous model.
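
The Lagrange-interpolation step mentioned in the abstract can be sketched with the textbook formula for fractional-delay FIR coefficients, h[k] = Π_{m≠k} (D − m)/(k − m) for k = 0..N. The code below is a generic illustration of that formula, not the authors' implementation; the function names `lagrange_fd_coeffs` and `fir_filter` and the chosen delay and order are assumptions.

```python
def lagrange_fd_coeffs(delay, order):
    """Return FIR coefficients h[0..order] approximating a delay of `delay` samples.

    Uses the standard Lagrange interpolation formula
    h[k] = prod over m != k of (delay - m) / (k - m).
    """
    coeffs = []
    for k in range(order + 1):
        h = 1.0
        for m in range(order + 1):
            if m != k:
                h *= (delay - m) / (k - m)
        coeffs.append(h)
    return coeffs


def fir_filter(signal, coeffs):
    """Apply an FIR filter by direct convolution, assuming zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


# Delay a ramp by 1.5 samples with a third-order Lagrange filter.
coeffs = lagrange_fd_coeffs(1.5, 3)
ramp = [float(n) for n in range(20)]
delayed = fir_filter(ramp, coeffs)
```

Because Lagrange interpolation of order N is exact for polynomials up to degree N, the delayed ramp equals n − 1.5 once the filter has filled (n ≥ 3). For an integer delay the coefficients collapse to a single unit tap, which is the case where no interpolation is needed.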

Authors and Affiliations

Tahir Mushtaq Qureshi
Muhammad Ishaq

Abstract

For many years, digital waveguide models have been used to model sound propagation in the vocal tract with a structured, uniform mesh of scattering junctions connected by identical delay lines. The mesh grid can be formed and laid out in many ways, called topologies. The present work is devoted to a two-dimensional digital waveguide mesh for sound propagation in the vocal tract that uses a structured, non-uniform rectilinear grid. In this work there are two types of delay lines: smaller-delay lines and larger-delay lines. The larger-delay lines are twice as long as the smaller-delay lines. Combining smaller- and larger-delay lines generates the non-uniform rectilinear two-dimensional waveguide mesh. The advantage of this approach is that a transfer function can be obtained without fractional delay. This eliminates the need for interpolation to approximate fractional delays and gives an efficient simulation of sound wave propagation in two-dimensional waveguide modeling of the vocal tract. The simulation has been performed for the vowels /ɔ/, /a/, /i/ and /u/. The standard two-dimensional waveguide model with a uniform mesh, run at the same sampling frequency, serves as the benchmark model. The results and efficiency of the proposed model have been compared with those of the benchmark model.
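
The two building blocks the abstract refers to, scattering junctions and integer-length delay lines, can be sketched as follows. The sketch uses the standard lossless junction for ports of equal impedance and assumes the smaller-delay line is one sample long, so the larger-delay line is two samples; it does not reproduce the paper's mesh layout, and the names `DelayLine` and `scatter` are hypothetical.

```python
from collections import deque


class DelayLine:
    """Integer-length delay line: a sample written now is read `length` steps later."""

    def __init__(self, length):
        self.buf = deque([0.0] * length, maxlen=length)

    def step(self, x):
        y = self.buf[0]      # oldest sample leaves the line
        self.buf.append(x)   # newest sample enters; maxlen drops the oldest
        return y


def scatter(incoming):
    """Lossless scattering junction with equal-impedance ports.

    Returns the junction pressure and the outgoing wave on each port:
    p_J = (2 / N) * sum(incoming), outgoing_i = p_J - incoming_i.
    """
    p_junction = 2.0 * sum(incoming) / len(incoming)
    outgoing = [p_junction - p for p in incoming]
    return p_junction, outgoing


# An impulse sent down a smaller (1-sample) and a larger (2-sample) delay line.
small, large = DelayLine(1), DelayLine(2)
impulse = [1.0, 0.0, 0.0, 0.0]
out_small = [small.step(x) for x in impulse]
out_large = [large.step(x) for x in impulse]

# One scattering step at a four-port junction with a unit wave on the first port.
p_j, outgoing = scatter([1.0, 0.0, 0.0, 0.0])
```

Because the two delays are in a 2:1 ratio, both are whole numbers of samples, so the impulse arrives exactly on the sampling grid and no fractional-delay interpolation is required.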

Authors and Affiliations

Tahir Mushtaq Qureshi
Khalid Saifullah Syed
Asim Zafar
