<template>
    <div class='project-page'>
        <ProjectHero :src="'NoiseDecoder/rolls-2.jpg'"
        :sides="7"  :min-radius="0.3" :theme-color="themeColor">
            <template v-slot:title>
                Noise Decoder
            </template>
            <template v-slot:subtitle>
                Concrete Poetry of Fleeting Sounds
            </template>
        </ProjectHero>
        <v-divider></v-divider>
        <v-row class="project-statement-wrapper wrapper">
            <v-col>
                <span class="project-statement-title">Project Statement</span>
                <span class="project-statement-text">The Noise Decoder is a machine that classifies environmental noises and translates them into words of human language. Across Iterations 1 to 4, it lives inside an Arduino, a laptop, and a Raspberry Pi. Its output ranges from words shown on an LCD screen and a webpage to words printed on a roll of thermal paper. I also explored several methods for sound feature extraction and classification, such as FHT, FFT, and MFCC features fed into a neural network. Beyond its aesthetically pleasing output, the Noise Decoder has applications ranging from pure leisure and a playful demonstration of machine learning to potential uses such as automatically generating movie subtitles for environmental sounds and bridging creative audio and visual expression.
                </span>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class='wrapper team-wrapper'>
            <v-col>
                <span>Team: <b>Yumeng Zhuang</b>, David Faizi</span>
                <span>My Role: Everything except circuitry and diagrams in Iteration 1</span>
                <span>Time: September 2019, September 2020</span>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-title'>Inspiration: Macchina Poetica</span>
                <span>This project was inspired by <b>Macchina Poetica</b>, a device that prints onomatopoeic words when certain metal parts on its box are hit (Romano, 2016).</span>
                <span>I was attracted to the idea of capturing meaningless, fleeting sounds with text and even images, because it expands the <b>dimensions of human experience</b> of sound and leaves room for creative expression.</span>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-title'>Iteration 1: Arduino</span>
                <span>Iteration 1 uses an <b>Arduino board with a microphone</b> to collect sounds and a <b>neural network</b> to classify them in real time.</span>
            </v-col>
        </v-row>
        <v-row class="wrapper">
            <v-col md='4' sm='12' class='text-center'>
                <v-img class="illustration-icon"
                src="@/assets/NoiseDecoder/noun_pcb_3160544.svg"></v-img>
                <span>Set up the microphone piece and LCD display on the circuit board</span>
            </v-col>
            <v-col md='4' sm='12' class='text-center'>
                <v-img class="illustration-icon"
                src="@/assets/NoiseDecoder/noun_Laptop_3574496.svg"></v-img>
                <span>Set up the code in the Arduino IDE and analyze the data in Python</span>
            </v-col>
            <v-col md='4' sm='12' class='text-center'>
                <v-img class="illustration-icon"
                src="@/assets/NoiseDecoder/noun_integrated circuit_2833946.svg"></v-img>
                <span>Train the model to differentiate sound types and display the result on the LCD screen</span>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Feature Extraction</span>
                <span>The <b>Fast Hartley Transform</b> (FHT), a real-valued relative of the <b>Fast Fourier Transform</b> (FFT), converts a signal from the <b>time</b> domain, i.e. amplitude as a function of time, into the <b>frequency</b> domain.</span>
                <ul>
                    <li>The Arduino's ADC is reconfigured to sample at 38.4 kHz</li>
                    <li>The FHT length is 256</li>
                    <li>The microphone sample length is 2048</li>
                    <li>The top 10 frequencies are fed into the neural network</li>
                </ul>
            </v-col>
            <v-col md='6' sm='12'  class='text-center'>
                <v-img class="illustration-icon" src="@/assets/NoiseDecoder/illustrations_waveform.svg"></v-img>
                <span class='pt-3'>waveform of sound</span>
            </v-col>
            <v-col md='6' sm='12'  class='text-center'>
                <v-img  class="illustration-icon" src="@/assets/NoiseDecoder/illustrations_frequency_peaks.svg"></v-img>
                <span class='pt-3'>frequency peaks</span>
            </v-col>
        </v-row>
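        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>For readers who want to experiment away from the Arduino, here is a minimal Python sketch of this feature extraction step. It assumes NumPy and computes the discrete Hartley transform from NumPy's FFT (H = Re(X) - Im(X)) rather than using the Arduino FHT routine. It also assumes that "top 10 frequencies" means the 10 bins with the most power in one 256-sample frame. The function names are hypothetical.</span>
<pre v-pre>
```python
import numpy as np

SAMPLE_RATE = 38_400  # Hz, the Iteration 1 sampling rate
FHT_LEN = 256         # transform length used in Iteration 1


def hartley(x):
    """Discrete Hartley transform via the FFT: H[k] = Re(X[k]) - Im(X[k])."""
    spectrum = np.fft.fft(x)
    return spectrum.real - spectrum.imag


def top_frequencies(frame, k=10):
    """Return the k strongest frequencies (in Hz) of one frame, strongest first."""
    n = len(frame)
    h = hartley(frame)
    # For a real signal, the power at bin j is (H[j]^2 + H[n - j]^2) / 2.
    # Keep bins 1 .. n/2 - 1, skipping DC and Nyquist.
    power = (h[1:n // 2] ** 2 + h[n - 1:n // 2:-1] ** 2) / 2
    strongest_bins = np.argsort(power)[::-1][:k] + 1
    return strongest_bins * SAMPLE_RATE / n
```
</pre>
                <span>Calling top_frequencies on a 256-sample array returns ten frequencies in Hz. A pure 3 kHz tone, which falls exactly on bin 20, comes back first.</span>
            </v-col>
        </v-row>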
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='6' sm='12'>
                <span class='section-subtitle'>Fully Connected Neural Network</span>
                <span>Our multilayer classification model consists of <b>10 inputs</b>, one hidden layer with <b>7 nodes</b>, and <b>4 outputs</b>. </span>
                <span>It learns to classify sounds by adjusting the weights in its matrices in the direction that minimizes the discrepancy between the predicted labels and the labels we provided (gradient descent).</span>
            </v-col>
            <v-col md='6' sm='12'>
                <v-img class="chart-ann" src="@/assets/NoiseDecoder/illustrations_ANN.svg"></v-img>
            </v-col>
        </v-row>
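        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>The 10-7-4 network and its training step can be sketched in a few lines of NumPy. The page does not say which activations, loss, learning rate or initialization were used, so the sigmoid hidden layer, softmax output, mean cross-entropy loss and the values below are all assumptions made for illustration.</span>
<pre v-pre>
```python
import numpy as np


def init_params(n_in=10, n_hidden=7, n_out=4, seed=0):
    """Random weights for a 10-7-4 fully connected network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    """Hidden activations and class probabilities for a batch X of shape (n, n_in)."""
    hidden = 1.0 / (1.0 + np.exp(-(X @ params["W1"] + params["b1"])))  # sigmoid
    logits = hidden @ params["W2"] + params["b2"]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return hidden, exp / exp.sum(axis=1, keepdims=True)  # softmax


def train_step(params, X, y, lr=0.5):
    """One gradient-descent step on mean cross-entropy; y holds integer labels."""
    hidden, probs = forward(params, X)
    n = len(X)
    loss = -np.log(probs[np.arange(n), y]).mean()
    # Gradient of the mean cross-entropy with respect to the logits.
    d_logits = probs.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    # Backpropagate through the sigmoid hidden layer.
    d_hidden = (d_logits @ params["W2"].T) * hidden * (1.0 - hidden)
    params["W2"] -= lr * hidden.T @ d_logits
    params["b2"] -= lr * d_logits.sum(axis=0)
    params["W1"] -= lr * X.T @ d_hidden
    params["b1"] -= lr * d_hidden.sum(axis=0)
    return loss  # loss before this update
```
</pre>
                <span>After training, forward(params, X)[1].argmax(axis=1) gives the predicted class index for each row of X.</span>
            </v-col>
        </v-row>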
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='6' sm='12'>
                <v-img src="@/assets/NoiseDecoder/illustrations_circuit_diagram.svg"></v-img>
            </v-col>
            <v-col md='6' sm='12'>
                <span class='section-subtitle'>Circuit Components</span>
                <span>Component list:</span>
                <ol type='a'>
                    <li>10k Potentiometer</li>
                    <li>18 wires</li>
                    <li>LCD 1602 module (with pin header)</li>
                    <li>Arduino Uno R3 controller board</li>
                    <li>USB cable</li>
                    <li>Microphone piece</li>
                </ol>
            </v-col>
        </v-row>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>The Circuit We Built</span>
                <v-img src="@/assets/NoiseDecoder/illustrations_real_circuit.svg"></v-img>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Data Taking and Analysis</span>
                <ul>
                    <li>We recorded <b>310 clips</b> of training data with the Arduino on two different days in the lab to help the model generalize. For each clip, we produced the sound, printed its FHT result to the serial port, and copied the output into a .txt file.</li>
                    <li>A <b>Jupyter notebook</b> then read in the training data sets and generated the true labels for training.</li>
                    <li>After <b>100,000 epochs</b> of training, we achieved an accuracy of 83.2%.</li>
                    <li>We printed the trained weight <b>matrices</b> in the Python notebook and copied them into the Arduino code.</li>
                    <li>This reproduced the exact <b>same model</b> on the Arduino.</li>
                </ul>
            </v-col>
        </v-row>
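        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>The manual copying step above can be scripted. Below is a hypothetical helper, not the code we actually used, that formats a trained NumPy weight matrix as a C float array initializer ready to paste into an Arduino sketch. The array names and the six-decimal precision are assumptions. Six decimals is roughly what a 32-bit Arduino float can hold.</span>
<pre v-pre>
```python
import numpy as np


def to_c_array(name, matrix, fmt="%.6f"):
    """Format a 1-D or 2-D array as a C 'const float name[rows][cols]' initializer."""
    matrix = np.atleast_2d(np.asarray(matrix, dtype=float))
    # One brace-enclosed row per matrix row.
    rows = ["  {" + ", ".join(fmt % value for value in row) + "}" for row in matrix]
    header = "const float %s[%d][%d] = {" % (name, matrix.shape[0], matrix.shape[1])
    return header + "\n" + ",\n".join(rows) + "\n};"
```
</pre>
                <span>For example, print(to_c_array("W1", weights)) for a 10 × 7 hidden-layer matrix prints a declaration with 10 rows of 7 values each.</span>
            </v-col>
        </v-row>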
        <v-row class="wrapper">
            <v-col md='6' sm='12'>
                <v-img src="@/assets/NoiseDecoder/FHT_jingle.png"  max-height="320" contain></v-img>
                <v-img src="@/assets/NoiseDecoder/FHT_knock.png"  max-height="320" contain></v-img>
                <span>The top ten normalized frequencies vary considerably between sound types (top: jingle; bottom: knock).</span>
            </v-col>
            <v-col md='6' sm='12' class='text-center model-in-python'>
                <span class="mt-5 image-caption">The model in Python</span>
                <v-img  src="@/assets/NoiseDecoder/model-in-python.png"  max-height="150" contain ></v-img>
                <span class="mt-6 image-caption">The model on Arduino</span>
                <v-img src="@/assets/NoiseDecoder/arduino_code.png" max-height="450" contain></v-img>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Demonstration</span>
                <iframe src="https://player.vimeo.com/video/482205032" width="640" height="360" frameborder="0" allow="autoplay; fullscreen" allowfullscreen></iframe>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Accuracy</span>
                <span>We performed accuracy tests at two different locations to see <b>how well the model generalized</b>. Overall performance was satisfactory: knock had the highest accuracy in the lab, and jingle had the highest in my apartment. Knock scored lower in my apartment because the table I knocked on there was much smaller than the one in the lab, so the two knocks did not sound alike even to a human.</span>
                <v-img class="chart" src="@/assets/NoiseDecoder/performance_1lab.png"></v-img>
                <v-img class="chart" src="@/assets/NoiseDecoder/performance_1apartment.png"></v-img>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-title'>Iteration 2: Website</span>
                <span>Iteration 2 uses <b>Python</b> to extract features with the <b>Fast Fourier Transform</b> (FFT), infer sound names, and serve <b>a website</b> that displays the results.</span>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Inspiration: Eugen Gomringer</span>
                <ul>
                    <li>Concrete poetry.</li>
                    <li>The power of individual words in society (e.g. slogans and advertisements).</li>
                    <li>Poetry should regain its place in society.</li>
                    <li>The Noise Decoder generates the opposite of “silence”.</li>
                </ul>
                <v-img src="@/assets/NoiseDecoder/schweigen.jpg"></v-img>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Feature Extraction</span>
                <span>Iteration 2 uses <b>NumPy’s FFT</b> function. </span>
                <ul>
                    <li>The sampling rate is 44.1 kHz and the FFT length is 2048</li>
                    <li>The top 20 frequencies of each FFT are used</li>
                    <li>10 consecutive blocks of 20 frequencies, taken with stride 1, are fed into the network</li>
                    <li>The network therefore classifies 10 × 20 images like the ones shown below</li>
                </ul>
            </v-col>
            <v-col md='12' sm='12'>
                <v-row>
                    <v-col md='6' sm='12'>
                        <v-img src="@/assets/NoiseDecoder/fft-jingle.png"></v-img>
                    </v-col>
                    <v-col md='6' sm='12' class='d-flex flex-column-reverse'>
                        <span>The jingling of a key chain has many high-frequency components, displayed in yellow in the figure.</span>
                    </v-col>
                </v-row>
            </v-col>
            <v-col md='12' sm='12'>
                <v-row>
                    <v-col md='6' sm='12' class='d-flex flex-column-reverse'>
                        <span>Knocking on a table, on the other hand, produces low-frequency sounds.</span>
                    </v-col>
                    <v-col md='6' sm='12'>
                        <v-img src="@/assets/NoiseDecoder/fft-knock.png"></v-img>
                    </v-col>
                    
                </v-row>
            </v-col>
        </v-row>
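        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>The Iteration 2 pipeline can be sketched in NumPy under three stated assumptions. First, frames are consecutive, non-overlapping 2048-sample blocks. Second, the "top 20 frequencies" are the 20 strongest rfft bins converted to Hz, strongest first. Third, "stride 1" means the 10-block window advances one block at a time. Flattening each 10 × 20 window gives the 200 inputs used by the classifier. The function names are hypothetical.</span>
<pre v-pre>
```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz
FFT_LEN = 2048        # samples per FFT block
TOP_K = 20            # frequencies kept per block
N_BLOCKS = 10         # blocks per network input


def top_k_frequencies(frame, k=TOP_K):
    """Frequencies (Hz) of the k strongest FFT bins of one frame, strongest first."""
    magnitudes = np.abs(np.fft.rfft(frame))
    strongest = np.argsort(magnitudes)[::-1][:k]
    return np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)[strongest]


def frame_features(signal):
    """Split a signal into non-overlapping FFT_LEN frames; returns (n_frames, TOP_K)."""
    n_frames = len(signal) // FFT_LEN
    frames = signal[: n_frames * FFT_LEN].reshape(n_frames, FFT_LEN)
    return np.stack([top_k_frequencies(frame) for frame in frames])


def sliding_images(features, n_blocks=N_BLOCKS, stride=1):
    """All (n_blocks, TOP_K) windows over consecutive frames, moving stride frames each time."""
    last_start = len(features) - n_blocks
    return [features[i:i + n_blocks] for i in range(0, last_start + 1, stride)]
```
</pre>
                <span>Each window's reshape(-1) is then a 200-value input vector for the network.</span>
            </v-col>
        </v-row>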
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='2' sm='12'>
                <v-img class='illustration-icon'
                src="@/assets/NoiseDecoder/noun_Laptop_3574496.svg"></v-img>
            </v-col>
            <v-col md='10' sm='12'>
                <span class='section-subtitle'>Hardware and Display</span>
                <ul>
                    <li>Laptop Microphone</li>
                    <li>Display words on a webpage hosted with Flask</li>
                </ul>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Training Results</span>
                <ul>
                    <li>The multilayer classifier consists of <b>200 inputs</b>, one hidden layer with <b>7 nodes</b>, and <b>5 outputs</b>. </li>
                    <li>I trained for up to <b>100,000 epochs</b>, <b>stopping early</b> if performance on the test set deteriorated.</li>
                    <li>The confusion matrix of this model is shown below.</li>
                    <li>Water is most likely to be confused with clap and jingle. </li>
                </ul>
                <v-img class="chart" src="@/assets/NoiseDecoder/confusion-7-5-fft.png"></v-img>
            </v-col>
        </v-row>
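        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>The early-stopping rule above can be sketched as a small loop that works with any framework. Here step and evaluate are hypothetical callables: one runs a single training epoch, and the other returns a score on the held-out set, where higher is better. Setting patience=1 assumes training stops at the first epoch that fails to beat the best score so far.</span>
<pre v-pre>
```python
def train_with_early_stopping(step, evaluate, max_epochs=100_000, patience=1):
    """Train for up to max_epochs, stopping after patience epochs without improvement.

    step() runs one training epoch; evaluate() returns a held-out score
    (higher is better). Returns (best_epoch, best_score).
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        step()
        score = evaluate()
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_score
```
</pre>
                <span>A larger patience tolerates noisy scores. In practice, one would also save the weights from best_epoch and restore them after the loop.</span>
            </v-col>
        </v-row>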
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Visualization</span>
                <span> The Noise Decoder reacting to sounds I produced. The <b>size of the words</b> represents the volume of the sound. </span>
                <v-img src="@/assets/NoiseDecoder/Noise_Decoder.gif"></v-img>
            </v-col>
        </v-row>
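        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>The page does not say how volume was measured or scaled, so here is one hypothetical way to map loudness to word size. The RMS level in dBFS, the -60 dBFS floor and the 12 to 96 px range are all assumptions.</span>
<pre v-pre>
```python
import numpy as np


def rms_dbfs(block):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    rms = np.sqrt(np.mean(np.square(block)))
    return 20.0 * np.log10(max(rms, 1e-10))  # floor avoids log10(0) for silence


def font_size_px(block, min_px=12, max_px=96, floor_db=-60.0):
    """Map loudness linearly from [floor_db, 0] dBFS onto [min_px, max_px] pixels."""
    level = float(np.clip(rms_dbfs(block), floor_db, 0.0))
    fraction = (level - floor_db) / -floor_db
    return int(round(min_px + fraction * (max_px - min_px)))
```
</pre>
                <span>A full-scale block maps to 96 px and silence to 12 px. A Flask page could then use this value as each word's CSS font-size.</span>
            </v-col>
        </v-row>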
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-title'>Iteration 3: MFCCs</span>
                <span>Iteration 3 uses <b>20 MFCCs</b> instead of 20 FFT frequencies.</span>
                <span><b>Mel-frequency cepstral coefficients</b> (MFCCs) are the coefficients that make up a mel-frequency cepstrum (MFC). The MFC's frequency bands are equally spaced on the mel scale, which approximates the <b>human auditory system’s response</b> more closely than linearly spaced bands. MFCCs are widely used in speech recognition (Wikipedia, 2020).</span>
                <ul>
                    <li>The MFCC window length is <b>2048</b> samples and the <b>stride is 512</b>; a block of data contains 20,480 samples and <b>41 frames</b>.</li>
                    <li>Each frame has <b>20 coefficients</b>, so each block forms a feature vector of <b>820 values</b> (41 × 20).</li>
                </ul>
                <v-img class="chart" src="@/assets/NoiseDecoder/mfccs-jingle.png"></v-img>
                <span>The MFCCs of jingle have many high-frequency components, shown as faint yellow toward the right of the image.</span>
                <v-img class="chart" src="@/assets/NoiseDecoder/mfccs-knock.png"></v-img>
                <span>Most of the power in knock is concentrated in the low-frequency part.</span>
            </v-col>
        </v-row>
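        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>To make the framing arithmetic concrete, here is a from-scratch NumPy sketch of MFCC extraction. It uses assumed settings: 44.1 kHz audio, a Hann window, 40 triangular mel filters, natural-log energies and an orthonormal DCT-II. It is neither the librosa implementation nor the ILM (2017) code, so its values will differ from theirs. However, padding half a window on each side (centred frames, as in librosa's default STFT) reproduces the 41 frames per 20,480-sample block, and flattening the 41 × 20 coefficients gives 820 features.</span>
<pre v-pre>
```python
import numpy as np

SAMPLE_RATE = 44_100  # assumed sampling rate
N_FFT = 2048          # window length
HOP = 512             # stride
N_MELS = 40           # assumed number of mel filters
N_MFCC = 20


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=SAMPLE_RATE, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            bank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            bank[m - 1, k] = (right - k) / (right - centre)
    return bank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(block):
    """Return an (n_frames, N_MFCC) matrix of MFCCs for one block of audio."""
    # Pad half a window on each side so frames are centred; this gives
    # 1 + len(block) // HOP frames (41 for a 20,480-sample block).
    padded = np.pad(block, N_FFT // 2, mode="reflect")
    n_frames = 1 + (len(padded) - N_FFT) // HOP
    idx = HOP * np.arange(n_frames)[:, None] + np.arange(N_FFT)[None, :]
    frames = padded[idx] * np.hanning(N_FFT)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank().T + 1e-10)
    return dct_ii(log_mel, N_MFCC)
```
</pre>
                <span>mfcc(block).reshape(-1) yields the 820-value input vector. In a real-time loop, the filterbank would be computed once and reused instead of being rebuilt on every call.</span>
            </v-col>
        </v-row>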
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Training Results</span>
                <ul>
                    <li>The multilayer classifier consists of <b>820 inputs</b>, one hidden layer with <b>7 nodes</b>, and <b>5 outputs</b>. </li>
                    <li>As before, I trained for up to <b>100,000 epochs</b>, <b>stopping early</b> if performance on the test set deteriorated.</li>
                    <li>The confusion matrix of this model is shown below.</li>
                    <li>It performs much better than the FFT-based model.</li>
                    <li>Accuracy in every category is close to 100%. </li>
                </ul>
                <v-img class="chart" src="@/assets/NoiseDecoder/confusion-7-5-mfccs-life.png"></v-img>
            </v-col>
        </v-row>
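        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>A confusion matrix and the per-category accuracy read from it can be computed in a few lines of NumPy. This is a minimal sketch. The label order below is hypothetical and simply lists the five sounds named on this page.</span>
<pre v-pre>
```python
import numpy as np

# Hypothetical class order; the five sounds named on this page.
LABELS = ["knock", "jingle", "clap", "water", "ahh"]


def confusion_matrix(y_true, y_pred, n_classes=len(LABELS)):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def per_class_accuracy(cm):
    """Fraction of each true class predicted correctly: the diagonal over the row sums."""
    return np.diag(cm) / cm.sum(axis=1)
```
</pre>
                <span>An accuracy "close to 100% in every category" corresponds to a matrix whose mass sits almost entirely on the diagonal. A class with no test samples would have a zero row sum, so real code should guard against that case.</span>
            </v-col>
        </v-row>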
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-title'>Iteration 4: Raspberry Pi + Printer</span>
                <span>Iteration 4 increases the <b>portability</b> of the system and enhances the interaction between the user and the machine by deploying <b>Python</b> code onto <b>a Raspberry Pi</b> and printing results onto a <b>receipt</b>. </span>
            </v-col>
        </v-row>
        <v-row class="wrapper">
            <v-col md='6' sm='12'>
                <v-img src="@/assets/NoiseDecoder/both.png"></v-img>
            </v-col>
            <v-col md='6' sm='12'>
                <span><b>Raspberry Pi 4</b> and a <b>microphone</b> are used for data collection and prediction.</span> 
                <span>A <b>thermal receipt printer</b>, driven through the Python <b>ESC/POS</b> library, prints out the predictions in real time.</span>
                <span><b>MFCCs</b> are extracted with publicly available code (ILM, 2017) instead of the librosa library, which could not compile on my Raspberry Pi.</span>
                <v-img src='@/assets/NoiseDecoder/illustrations_pi.svg'></v-img>
            </v-col>
        </v-row>
        <v-row class="wrapper">
            <v-col md='6' sm="12">
                <v-img src='@/assets/NoiseDecoder/many_strips.png'></v-img>
            </v-col>
            <v-col md='6' sm="12">
                <v-img src='@/assets/NoiseDecoder/many_strips-2.jpg'></v-img>
                <span class="pt-2"> Rolls of paper printed by the Noise Decoder. </span>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>Performance</span>
                <span>The device <b>performed well</b> on knock, water, and clap, while ahh and jingle were classified correctly <b>less than half of the time</b>, even though their training accuracy was high.</span>
                <span>The average accuracy was 73%.</span>
                <span>My hypothesis is that the microphone I used with my Raspberry Pi was <b>poor at recording high-frequency sounds</b> such as ahh and jingle. Jingle can also be quite weak, so with the sound threshold set to suit the other sounds, these high-frequency sounds could not pass the threshold to be classified.</span>
                <v-img class="chart" src="@/assets/NoiseDecoder/performance_mfccs.png"></v-img>
            </v-col>
        </v-row>
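        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span>The threshold effect described above can be illustrated with a simple RMS loudness gate. This is a hypothetical sketch, not the device's code. The RMS measure, the 0.05 threshold on samples scaled to [-1, 1] and the classify callback are all assumptions.</span>
<pre v-pre>
```python
import numpy as np


def rms(block):
    """Root-mean-square amplitude of a block of float samples."""
    return float(np.sqrt(np.mean(np.square(block))))


def gated_predictions(blocks, classify, threshold=0.05):
    """Classify only the blocks whose RMS exceeds the loudness threshold."""
    return [classify(block) for block in blocks if rms(block) > threshold]
```
</pre>
                <span>Suppose the threshold is tuned for a loud knock (amplitude 0.5, RMS about 0.35). A faint jingle (amplitude 0.03, RMS about 0.02) is then dropped before it ever reaches the classifier, so it can never be printed, however accurate the model is.</span>
            </v-col>
        </v-row>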
        <v-divider></v-divider>
        <v-row class="wrapper">
            <v-col md='12' sm='12'>
                <span class='section-title'>Prospects</span>
                <span>The iterations of the Noise Decoder document <b>my exploration</b> of hardware components, different feature extraction code and libraries, and the evolving design of the device itself. I envision the core Noise Decoder idea, converting environmental sounds into words, as a <b>flexible language</b> in which both <b>practical</b> and <b>expressive</b> works may be written.</span>
                <span>The Noise Decoder algorithm can be improved to <b>include more sounds</b>, sounds of different lengths, and possibly <b>meaningful sequences</b> of sounds. With these features implemented, the Noise Decoder could be used to generate <b>automatic subtitles</b> for videos, potentially benefiting people with hearing difficulties. </span>
                <span>I have several ideas for using the Noise Decoder as a tool for <b>creative expression</b>. For example, it could classify phonemes and <b>decompose normal speech</b> into a seemingly undecipherable sequence of symbols. It could also learn to classify musical instruments, thereby <b>converting songs</b> into text with a similar rhythm.</span>
            </v-col>
        </v-row>
        <v-divider></v-divider>
        <v-row class="wrapper reference">
            <v-col md='12' sm='12'>
                <span class='section-title'>References and Attributions</span>
            </v-col>
        </v-row>
        <v-row class="wrapper reference">
            <v-col md='12' sm='12'>
                <span class='section-subtitle'>References</span>
                <ul>
                    <li>ILM. (2017) <i>MFCC theory and implementation</i> [Online]. Available at https://www.kaggle.com/ilyamich/mfcc-implementation-and-tutorial (Accessed: 24 October 2020)</li>
                    <li>Romano, Z. (2016) <i>Macchina poetica converts sounds into onomatopoeic words</i> [Online]. Available at https://blog.arduino.cc/2016/02/17/macchina-poetica/#more-13106 (Accessed: 24 October 2020)</li>
                    <li>Wikipedia. (2020) <i>Mel-frequency cepstrum</i> [Online]. Available at https://en.wikipedia.org/wiki/Mel-frequency_cepstrum (Accessed: 24 October 2020)</li>
                </ul>
                <span class='section-subtitle'>Attributions</span>
                <ul>
                    <li>schweigen (Eugen Gomringer), image from Der Tagesspiegel: https://www.tagesspiegel.de/kultur/umstrittenes-hauswand-gedicht-vier-ansichten-zum-umgang-mit-den-gomringer-versen/20938218.html</li>
                    <li>pcb by Eucalyp from the Noun Project</li>
                    <li>integrated circuit by Bombasticon Studio from the Noun Project</li>
                    <li>Laptop by Marcel Boer from the Noun Project</li>
                </ul>
            </v-col>
        </v-row>
    </div>
</template>
<script>
//import colors from 'vuetify/lib/util/colors';
import ProjectHero from '@/components/ProjectHero.vue'
//import AnimatedPolygon from "@/components/AnimatedPolygon.vue"

export default {
    name:"NoiseDecoder",
    components:{
        //AnimatedPolygon,
        ProjectHero
    },
    data: ()=>{
        return {
            //themeColor: '#504de5'
        }
    },
    computed:{
        // Use this project's title background from the store as the hero theme color.
        themeColor(){
            for (let entry of this.$store.state.compDesEntries){
                if(entry.title=="Noise Decoder"){
                    return entry.titleBg
                }
            }
            // Fall back to black if the entry is missing.
            return "#000000"
        }
    },
}
</script>
<style scoped>
.illustration-icon{
    max-width: 150px;
    max-height: 150px;
    margin: auto;
}

.image-caption{
    margin-bottom: 0px;
}

.chart{
    max-width: 400px;
    margin: auto;
}

.chart-ann{
    max-width: 300px;
}

.model-in-python{
    background-color: #f7f7f7;
}
</style>