Kepuska, Veton

Associate Professor
Electrical and Computer Engineering

My goal is to make a significant contribution to advancing Human-Machine Interaction and Communication through my Wake-Up-Word (WUW) Speech Recognition (SR) Technology. Conventional Speech Recognition Systems typically operate, at their best, at around 99% accuracy. At a natural conversational rate of about 100 words per minute, this implies at least 1 (one) error per minute. My research has shown that WUW SR will make 1 (one) error per 3 hours.
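As a rough illustration of the arithmetic above, the short sketch below computes the expected errors per minute at 99% accuracy and the word-level accuracy that one error per 3 hours would correspond to. It assumes the 3-hour figure is measured against continuous speech at the same 100 words per minute, which is an assumption made here for comparison only; the function names are hypothetical.

```python
WORDS_PER_MINUTE = 100  # speaking rate quoted in the text


def errors_per_minute(accuracy, wpm=WORDS_PER_MINUTE):
    """Expected word errors per minute at a given word accuracy."""
    return (1.0 - accuracy) * wpm


def accuracy_for_error_interval(minutes_per_error, wpm=WORDS_PER_MINUTE):
    """Word accuracy implied by one error every `minutes_per_error` minutes.

    Assumes (for illustration only) continuous speech at `wpm` words per minute.
    """
    words_per_error = wpm * minutes_per_error
    return 1.0 - 1.0 / words_per_error


conventional = errors_per_minute(0.99)                 # about 1 error per minute
wuw_equivalent = accuracy_for_error_interval(3 * 60)   # 1 error per 18,000 words

print(f"{conventional:.2f} errors/minute at 99% accuracy")
print(f"{wuw_equivalent:.4%} equivalent accuracy for 1 error per 3 hours")
```

Under these assumptions, one error per 3 hours corresponds to one error in 18,000 words.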

Educational Background

1990

Ph.D.

Computer Engineering
Clemson University

Dissertation

Artificial Neural Networks for Speech Recognition Applications

Advisor

John N. Gowdy

1986

M.S.

Computer Engineering
Clemson University

Advisor

John N. Gowdy

1981

Dipl. Eng.

Electrical Engineering
University of Prishtina

Thesis

The use of the Analog Computers for Simulation and Automatic Control

Advisor

Abdurrahman Grapci

1976

Diploma

Mathematical Gymnasium

Diploma Work

Experimental Methods for Measurements of the Speed of Light

Advisor

Skender Skenderi

Recognition & Awards

2011

FaST - Calculate Potential Energy Savings from Using Mobile Smart Technologies.
http://science.energy.gov/wdts/fast/project-descriptions/2011-projects/epa-calculate-potential-energy-savings-from-using-mobile-smart-technologies/

2008 - 2009


Kerry Bruce Clark Teacher

2008

Greatest Commercial Potential - "Smart Room" Senior Design 2008.

2007

Third Place in IEEE SouthEastCon Student Hardware Competition: Basketball Robot

2007

Best Junior Design 2007 - Visual Audio

2006

Best Paper Nomination "2006-472: A MATLAB TOOL FOR SPEECH PROCESSING, ANALYSIS AND RECOGNITION: SAR-LAB"

2005

UML-ADI Assistive Device Competition, June 2005, University of Massachusetts Lowell MA, First Place
http://faculty.uml.edu/Mufeed_Mahd/UML_ADI/photo_fit.htm

1984 – 1985


Fulbright Fellow

1987 – 1988


Harris Fellow

1977 – 1979


University of Prishtina Fellow

Current Courses

Graduate:

  1. Speech Processing – ECE 5525
  2. Speech Recognition – ECE 5526
  3. Search and Decoding in Speech Recognition – ECE 5527
  4. Computer Networks 2 – ECE 5535
  5. Special Topics Course: Android Programming – ECE 5570
  6. Digital System Design 1 – ECE 5571
  7. Digital System Design 2 – ECE 5572

Undergraduate:

  1. Hardware Software Design – ECE 2551
  2. Hardware Software Integration – ECE 2552
  3. Signal and Systems – ECE 3222
  4. Digital State Machines – ECE 3541
  5. Microcomputer Systems 1 – ECE 3551
  6. Microcomputer Systems 2 – ECE 3552
  7. Multifarious Systems 1 – ECE 3553
  8. Multifarious Systems 2 – ECE 4553
  9. Computer Architecture – ECE 4551
  10. Computer Communications – ECE 4561
  11. Electric and Electronic Circuits – ECE 4991

Professional Experience

2003 - Present

Florida Institute of Technology, Electrical and Computer Engineering – Associate Professor:

  • Accomplishments:
  • Developed a web portal with my graduate students for TRDA of Melbourne in collaboration with Nterspec.
  • PPT Commander – A Voice Only Activated Power Point Presentation Application
  • Ported PPT Commander to Apple Mac OS
  • Developed Voice Activated Elevator Simulator:
    • http://www.youtube.com/watch?v=j5CeVtQMvK0
    • http://www.youtube.com/watch?v=OQ8eyBTbS_E
  • Developed a Nursing Call Station Voice Only Activated interface for patients. Researching ways to extend its capability so that the patient can control the bed, TV and other devices connected to the system.
  • Developed a Voice Activated Car Inspection System prototype for BMW:
    • http://files.me.com/hardcaseron/l3byyd.mov
  • Designed and Developed a High Speed Currency Bill Reader system using Embedded Hardware.
  • Organized and Hosted NIST Rich Transcription Evaluation Workshop,
  • Hosted and Participated in International "NIST Rich Transcription Evaluation" 2009
  • First Place in the First Annual Analog Devices & University of Massachusetts DSP Contest 2005 (Brian Ramos and Don McMann; http://faculty.uml.edu/Mufeed_Mahd/UML_ADI/photo_fit.htm),

  • Third Place in IEEE SouthEastCon 2007 Student Hardware Competition: Basketball Robot among 38 Universities (Ronald Ramdhan, Xerxes Beharry & Sean Powers). http://www.southeastcon.org/2007/students/. The robot is displayed in the Dean's Conference Room.
  • Best Paper Nomination "2006-472: A MATLAB TOOL FOR SPEECH PROCESSING, ANALYSIS AND RECOGNITION: SAR-LAB" ASEE 2006 (undergraduate co-authors Rogers N., Patel M.),
  • Best Junior Design 2007 - Visual Audio - (Brandon Schmitt).
  • Greatest Commercial Potential - "Smart Room" Senior Design 2008. (Matt Hopkins, David Herndon, Patrick Marinelli).
  • AWARDS:
    • FaST - Program Calculate Potential Energy Savings from Using Mobile Smart Technologies. http://science.energy.gov/wdts/fast/project-descriptions/2011-projects/epa-calculate-potential-energy-savings-from-using-mobile-smart-technologies/, 2011
    • Kerry Bruce Clark Teacher, 2009
    • UML-ADI Assistive Device Competition, June 2005 Lowell MA, First Place.
      Developed and Ported Wiener Based Noise Removal Algorithm to Analog Devices ADSP-21161 DSP.
  • Notable Presentations:
    • CS Dept. Curriculum Series Presentation, 2005: “Wake-Up-Word Speech Recognition: A Missing Link toward Natural Language Understanding”.

  • NSF Proposals – PI: Wrote over 20 NSF proposals.
  • NSF Proposals – Co-PI: Participated in over 15 NSF proposals.

 

2001 - 2003

Speech Recognition Scientist - ThinkEngine Networks, Inc., 175 Maple Street, Marlborough, MA 01745. USA.

  • Invented, Designed and Developed a unique solution to “Wake-Up-Word” or “OnWord™” Spotting Technology. Wake-Up-Word Spotting entails recognition of a specific word/phrase uttered in isolation or in the context of continuous speech. Currently this technology is not as widely used as other Speech Recognition Applications/Tasks because of the poor performance of Speech Recognition Systems offering such technology, whether commercially (Nuance, SpeechWorks, Philips, Conversay, ART, etc.) or as research tools, that is, speech recognition technologies of primarily research and development institutions such as Byblos (BBN), Sphinx (CMU), HTK Speech Recognition Tool Kit (Cambridge University, Entropic and Microsoft), etc. Furthermore, all those systems require computer systems with powerful CPUs (~1.5 GHz Pentiums) and large memory (512 Mbytes RAM), with the Speech Recognition process itself requiring tens or hundreds of Mbytes for this feature alone to even run in real time. An additional advantage of the developed system is that it is also designed to run on a Fixed Point DSP, requiring less than 36.2 Kbytes of program memory space and 2 Kbytes for Model space, and consuming less than 2 Million Cycles per Second on a TI C62xx.
  • Inventor of 3 Patented Solutions – Patent Pending:
  • Working on Generalized scoring using Reversed and Normal Ordered Features for any Pattern Matching Method (e.g, DTW, HMM) to be filed for patent.
  • Designed and Assisted in Development of Voice Data Collection System - necessary for research, development, testing and evaluation of the Wake-Up-Word Recognition System.
  • Performed and Managed 2 data collections over various calling environments (noisy, quiet, public, car, etc.) using various calling devices (cellular, landline, speaker phone). Created 2 Corpora from the recorded data. Those Corpora are used for:
  • Transcribed and/or Supervised transcription process of recorded data. Set up conventions and standards so that all the tools to be developed that use data of created Corpora comply with a clear set of standards.
  • Converted other (CallHome and PhoneBook) Corpora to this set of standards for easy and consistent use.
  • Directed and Supervised Code Conversion and Porting from Floating Point to Fixed Point of Wake-Up-Word Spotting Technology.
  • Developed Automated Process using combination of perl scripts and perl configuration files controlling various parameters affecting each step of the complex process of:
  • Voice Activity Detection Based on Cepstral Features.
  • Dynamic Time Warping (DTW) Matching using Reverse Ordered Feature Vectors.
  • Rescoring using Distribution Distortion Measurements of Dynamic Time Warping Match.
  • Building Models of a particular Wake-Up-Word
  • Testing and Evaluation of the System, and
  • Research, Development, and Refinement of Wake-Up-Word Recognition System,
  • Generating Features from a Voice Data Corpus,
  • Building a Model of a Wake-Up-Word (e.g., “Operator”, “Help”, “MapQuest”, “Verizon”, etc.) from the features,
  • Using built Model to test and evaluate Wake-Up-Word Recognition System,
  • Generating Performance Plots, Charts and Graphs.

Those scripts use numerous executables, gnuplot (a graph plotting tool), as well as other perl scripts. The end result of this process is the automatic generation of a number of plots, charts, and graphs that depict the performance of the system for easy evaluation and comparison.
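The Dynamic Time Warping (DTW) matching step named in the list above can be illustrated with a textbook DTW distance between two sequences of feature vectors. This is a minimal generic sketch only: it does not reproduce the reverse-ordered-feature matching or the rescoring described above, and the function name and toy feature sequences are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of feature vectors.

    Uses Euclidean local distance and the standard step pattern
    (insertion, deletion, match). Illustrative only.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats
                                 cost[i][j - 1],      # b[j-1] repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Toy one-dimensional "features": the utterance is a time-stretched template.
template = [(0.0,), (1.0,), (2.0,), (1.0,)]
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
score = dtw_distance(template, utterance)
print(score)
```

Because the second sequence is only a time-stretched copy of the first, the warping absorbs the difference in duration and the accumulated distance is zero.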

  • Trained and Supervised a DSP engineer to port, test and evaluate Wake-Up-Word Technology.
  • Worked with Application Developers to integrate Wake-Up-Word Spotting Technology into a viable Demo and potentially viable product.
  • Wrote Technical Document and Manual for this Technology.
  • Advised the CTO in the decision-making process regarding Speech Recognition, Text to Speech, as well as Wake-Up-Word Spotting Technologies.

1999 – 2001

Speech Recognition Scientist - SpeechWorks International, Inc., Product Group, 695 Atlantic Ave., Boston, MA 02111. USA.

  • Developed Noise Compensation Algorithm to increase recognition robustness against Noise and Channel varying characteristics.
  • Conducted Study of Wireless/Cellular vs. Wireline/Landline signal differences and their effect on recognition performance.
  • Developed Nonlinear Front End Signal Processing.
  • Performed Comparative Studies of various Speech Recognition Technologies (e.g., AT&T, NUANCE, SPEECHWORKS recognizers).
  • Developed algorithms to investigate various features (confidence score, acoustic score, etc.) and their optimal use for combining N-best lists produced by different features (mfcc, lpc, etc.) and different recognizers (segmental, HMM, Watson). Combining algorithm achieved significant error reduction as compared to the best.
  • Developed diphone clustering for HMM models to minimize model size.
  • Involved in re-alignment of acoustic segments for Text to Speech (TTS) model building data. Developed a framework for modular expansion and refinement of the re-alignment process using perl scripts combined with perl configuration files. Implemented various heuristic rules to improve alignments generated by the Speech Recognizer to better fit TTS.
  • Developed a data collection program for the Dialogic JCT board that supports CSP. Developed, Ran, Digested, and Processed the “Call Environment Data Collection” using this application.

 

1997 - 1999

Scientist - GTE, BBN Technologies, Speech Solutions Group, 70 Fawcett St., Cambridge, MA 02138. USA.

  • Compiled and Analyzed BYBLOS (research speech recognition technology) and BBN HARK (commercial technology) system differences; Analyzed possible BYBLOS technologies for porting into BBN HARK; Developed and Coded Voice Model Filter that loads BYBLOS and/or BBN HARK training files and converts them into new-format files in compliance with the designed specifications; Ran various tests (BYBLOS and BBN HARK) for Continuous Densities BBN HARK for benchmarking.
  • Peer reviewed a paper for Speech Communication Journal.

 

1993 - 1997

Speech Scientist – Voice Processing Corporation/Voice Control Systems, Advanced Technology Development Group, One Main Street, MA 02142. USA.

  • Enhanced the performance of existing Front End of Speech Recognition System, implemented in VPro line of products, by designing a non-linear smoothing algorithm based on median filtering.
  • Developed and Implemented Dynamic Features that augmented existing Front End Features.
  • Developed a universal preprocessing module of the Front End that enables run-time front-end configuration, decompression, and sample-rate transformations of the original wave file.
  • Performed numerous tests that provided critical insights into enhancement and debugging of VProFlex Technology.
  • Invented, Developed and Integrated a very efficient novel Code Book Search strategy (internally named Fickle Search).
  • Compiled a condensed Internal Report of the Literature Review Study on different ways to perform fast FFTs of a real valued sequence.
  • Developed, Tested, and Integrated Split Radix FFT algorithm. The function can handle Real Valued FFTs of any power-of-2 size.
  • Modified Front End to take advantage of higher FFT size and increased frequency resolution:
    • Analyzed the conflicting effect of window size and type (higher frequency resolution causing break down of enhancement due to harmonics),
    • Analyzed several possible modifications of enhancement algorithm to accommodate higher frequency resolution, and
    • Proposed elimination of pitch harmonics from the spectrum with Homomorphic filtering or LPC-based Spectrum.

  • Implemented LPC based spectrum integrating it with existing Spectral Enhancement module of the Front End.
  • Initiated the study toward enhanced composition of boundary and internal acoustic phonemic features.
  • Invented, Developed, Ported, and extensively Tested a novel Noise Compensation with Speech Enhancement Algorithm. Also invented several integration strategies that take further advantage of the algorithm through better interaction of the Front End with the API and that take advantage of calibration when feasible. Default mode of operation is fully unsupervised in real-time.
  • Developed an ANN software tool currently supporting five different types of feed-forward back-propagation learning.
  • Developed a Pitch Tracking Algorithm based on enhanced Super-Resolution Pitch Determination Algorithm.

 

1990 – 1993

Post-Doctoral Research Associate - Swiss Federal Institute of Technology, IGP, ETH-Hönggerberg, CH-8093 Zürich, Switzerland.

  • Swiss National Science Foundation Research Project in Image Understanding - Design and Analysis of Spatial Image Sequences

 

1985 – 1990

Teaching Assistant – Electrical and Computer Engineering Department, Clemson University.

  • Digital Processing of the Speech Signals, Digital Systems, Digital Circuit Design and Microprocessor Applications, Electronics, Programming.

 

1987 - 1990

Consultant - Engineering Research and Computer Services Department, Clemson University, Electrical and Computer Engineering Department, Clemson, SC 29634-0915. USA.

  • Design and Development of a database system for processing of the expenditures of the College of Engineering, Clemson University.
  • Design and Development of a database system prototype for automation of:
    • Management of the repair and maintenance orders,
    • Task allocation and duty assignment,
    • Time-table management of the assigned personnel, and
    • Generation of relevant statistical data.

 

1985 - 1986
Summer Job

Software Engineer - Keiltronix: Textile Control Systems Inc., 2910 Horseshoe Lane, P.O. Box 1923, Charlotte, NC 28219.

  • Developed software for polling and analyzing data from peripheral machine controllers. Developed software for graphical display of the status of a manufacturing dyeing process in real-time.
  • Developed a Software Package using REGIS as a low-level software tool for dynamic display of the state of a technological process in real-time.

 

1981 - 1984

Assistant Lecturer - Electrical Engineering Faculty, University of Prishtina, Republic of Kosova.

  • Taught courses in Control Theory, Systems Theory, Algorithms, Digital Communications, Boolean Algebra, Digital Systems, and Programming.
  • Contributed to the publication of the first Automatic Control Theory textbook in the Albanian language.
  • Key member of the commission that prepared a detailed proposal for Advancement of Curricula of Electrical and Electronics Engineering Faculty.

Current Research

Wake-Up-Word Speech Recognition: http://spie.org/x42008.xml

http://cdn.intechweb.org/pdfs/15946.pdf

http://activities.fit.edu/crimsons/?p=2919

http://www.youtube.com/watch?v=OQ8eyBTbS_E

Selected Publications

PATENTS:

  • Dynamic Time Warping (DTW) Using Frequency Distributed Distance Measures: 6983246, January 3, 2006.
  • Scoring and Rescoring Dynamic Time Warping of Speech: 7085717, April 1, 2006.
  • Exploiting Differences in Correlations for Modeled and Un-Modeled Sequences by Transforming Trained Model Topology in Sequence Recognition: Provisional Patent Application, August 2009

BOOK CHAPTER

  • Këpuska, V. "Wake-Up-Word Speech Recognition", Speech Technologies / Book 1, Intech, ISBN 978-953-307-152-7, February 2011.

JOURNAL PUBLICATIONS

  • Këpuska, V. et al. (2012). Energy Savings from using Mobile Smart Technologies, Journal of Renewable and Sustainable Energy, Submitted 2012
  • Këpuska, V., Beharry, X., & Powers, S. (2011). "Phoning Home: Bridging the Gap between Conservation and Convenience", JSTEM, 2012.
  • Këpuska, V, & Rojanasthien, P. (2011) Speech Corpus Generation from DVDs of Movies and TV Series, JITIM, 2011-2012
  • Këpuska, V. (2010). Wake-Up-Word Recognition. SPIE Newsroom, Oct 6 2010. DOI: 10.1117/2.1201009.003154 http://spie.org/x42008.xml?ArticleID=x42008
  • Rodriguez, W., Fiore, S., De Welde, K., Carstens, D., Këpuska, V. (2010). Ubiquitous Collaboration (uC) Learning, Ubiquitous Learning: Journal of International Technology and Information Management.
  • Këpuska, V., & Klein, T. (2009). On Wake-Up-Word Speech Recognition Task, Technology, and Evaluation. Elsevier Journal of Nonlinear Analysis.
  • Këpuska, V., Gurbuz, S., Rodriguez, W., Fiore, S., Carstens, D., Converse, P., Metcalf, D. (2009). uC: Ubiquitous Collaboration Platform for Multimodal Team Interaction Support, Submitted to Journal of International Technology and Information Management (IJTIM), Invited Paper Special Issue on Knowledge Management and Business Intelligence
  • Këpuska, V. and Mason, S., (1995). A Neural Network Approach to Signalized Point Recognition in Aerial Photographs, Photogrammetric Engineering & Remote Sensing, Vol. 61, No. 7, pp. 917-925, July 1995.
  • Mason, S. and Këpuska, V., (1992). CONSENS: An Expert System for Photogrammetric Network Design, Allgemeine Vermessungs Nachrichten, pp. 384-393, September 1992.

CONFERENCE PUBLICATIONS:

  • Këpuska, V. (2012). Elevator Simulator, IEEE-ESPA, Las Vegas, 2012
  • Këpuska, V., & Shih, C. (2010). Prosodic Analysis of Alerting and Referential Contexts of Sentinel Words. International Conference on Artificial Intelligence and Pattern Recognition (AIPR'10), Orlando, Florida, 2010
  • Këpuska, V., & Klein, T. (2008). On Wake-Up-Word Speech Recognition Task, Technology, and Evaluation Results against HTK and Microsoft SDK 5.1. Invited Paper: World Congress on Nonlinear Analysts, Orlando 2008, To appear in Journal of Nonlinear Analysis, Theory, Methods & Applications.
  • Beharry, X., Këpuska, V., Powers, S., Ramdhan, R., Rojanasthien, P., Weerasooriya, A., (2008). Patriot Robotic System Design, Florida Conference on Recent Advances in Robotics, FCRAR 2008
  • Këpuska, V., Carstens, D. S., & Wallace, R. (2006). Leading and Trailing Silence in Wake-Up-Word Speech Recognition, Proceedings of the International Conference: Industry, Engineering & Management Systems 2006, Cocoa Beach, FL., 259-266.
  • Këpuska V., (2006). Wake-Up-Word Application for First Responder Communication Enhancement, SPIE, Orlando, 2006.
  • Këpuska V., Rogers N., Patel M., (2006). A MATLAB Tool for Speech Analysis, Processing and Recognition: SAR-LAB, ASEE, Chicago, 2006.
  • Kasza T., Shahsavari M., Këpuska V., Chen Ch., (2006). Communications Protocol for RF-based Indoor Wireless Localization Systems, SPIE, Orlando, 2006.
  • Anagnostopoulos G., Georgiopoulos M., Ports K., Richie S., White M., Këpuska V., Chan P. K., Wu A., Kysilka M., (2006).  Engaging Undergraduate Students in Machine Learning Research: Progress, Experiences and Achievements of Project EMD-MLR, Proceedings of the ASEE 2006 Annual Conference and Exposition, June 18-21, Chicago, Illinois.
  • Anagnostopoulos G., Georgiopoulos M., Ports K., Richie S., Cardinale N., White M., Këpuska V., Chan P., Wu A., Kysilka M., (2005).  Project EMD-MLR: Educational Material Development and Research in Machine Learning for Undergraduate Students, Session 3232, Proceedings of the ASEE 2005 Annual Conference and Exposition, June 12-15, Portland, Oregon.
  • Mason, S. and Këpuska, V., (1992). On the Representation of Close-Range Network Design Knowledge, XVII ISPRS Congress, Washington D.C., August 1992.
  • Këpuska, V. and Mason, S., (1991). Automatic Signalized Point Recognition with Feed-Forward Neural Network, IEE Second International conference on Artificial Neural Networks, Bournemouth, U.K., November, 1991.
  • Mason, S., Beyer, H., and Këpuska, V., (1991). An AI-based Photogrammetric Network Design System, First Australian Photogrammetric Conference, University of Newcastle, Australia, November 1991.
  • Këpuska, V. and Mason, S., (1991). Artificial Neural Network Approach to Signalized Point Recognition in Aerial Photographs, First Australian Photogrammetric Conference, University of Newcastle, Australia, November 1991.
  • Këpuska, V., Beyer, H. and Mason, S., (1991). Artificial Neural Networks for Calibration of CCD-Cameras, Workshop on Industrial Applications of Neural Networks, Ascona, Switzerland, September 1991.
  • Këpuska, V. and Gowdy, J., (1990). On the Effect of Topological Structure of the Kohonen Network on the Performance of the Hierarchical two Layered Isolated Word Recognition System, IEEE Southeastcon Symposium, New Orleans, April 1990.
  • Këpuska, V. and Gowdy, J., (1989). Investigation of Phonemic Context in Speech using Self-Organizing Feature Maps, IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP’89, Glasgow, Scotland, May 1989.
  • Këpuska, V. and Gowdy, J., (1989). Phonemic Speech Recognition Based on Neural Network, IEEE Southeastcon Symposium, Columbia, April 1989.
  • Këpuska, V. and Gowdy, J., (1988).  The Kohonen Net for Speaker Dependent Isolated Word Recognition, IEEE Southeastern Symposium on Systems Theory, UNCC Charlotte, March 1988.
  • Këpuska, V. and Gowdy, J., (1987).  Evaluation of Digital Signal Processing Chips for Speech Processing Applications, IEEE Southeastern Symposium on Systems Theory, Clemson University, Clemson, March 1987.
  • Këpuska, V. and Gacaferri, J., (1979). The Determination of the Polynomial Coefficients for Approximation of the EKG with Computer, (in Serbo-Croatian), Symposium JUREMA, Zagreb 1979.
  • Këpuska, V. and Mason, S., (1992) NFP23: Design and Analysis of Spatial Image Sequences, Wissenschaftlicher Bericht zum Schweizerischen Nationalfonds zur Förderung der wissenschaftlichen Forschung, 1992.
  • Këpuska, V. and Mason, S., (1992) Design and Analysis of Spatial Image Sequences, NFP 23 Third Annual Status Report, Bern, July 6, 1992.
  • Këpuska, V. and Mason, S., (1991) NFP23: Design and Analysis of Spatial Image Sequences, Wissenschaftlicher Bericht zum Schweizerischen Nationalfonds zur Förderung der wissenschaftlichen Forschung, 1991.
  • Këpuska, V. and Mason, S., (1991) Design and Analysis of Spatial Image Sequences, NFP 23 Second Annual Status Report, Bern, June 5, 1992.
  • Mason, S. and Këpuska, V., (1991) NFP 23: Design and Analysis of Spatial Image Sequences (Project Summary), SGAICO Newsletter, Swiss Group for Artificial Intelligence and Cognitive Science, 1991.