<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>𝚃𝚛𝚊𝚗𝚜𝚙𝚘𝚗𝚜𝚝𝚎𝚛</title>
    <link>https://almostkapil.netlify.com/</link>
      <atom:link href="https://almostkapil.netlify.com/index.xml" rel="self" type="application/rss+xml" />
    <description>𝚃𝚛𝚊𝚗𝚜𝚙𝚘𝚗𝚜𝚝𝚎𝚛</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>© 2018 Kapil Khanal</copyright><lastBuildDate>Sat, 01 Jun 2030 13:00:00 +0000</lastBuildDate>
    <image>
      <url>https://almostkapil.netlify.com/img/aph-salt-spring-zoom.jpg</url>
      <title>𝚃𝚛𝚊𝚗𝚜𝚙𝚘𝚗𝚜𝚝𝚎𝚛</title>
      <link>https://almostkapil.netlify.com/</link>
    </image>
    
    <item>
      <title>Example Page 1</title>
      <link>https://almostkapil.netlify.com/courses/example/example1/</link>
      <pubDate>Sun, 05 May 2019 00:00:00 +0100</pubDate>
      <guid>https://almostkapil.netlify.com/courses/example/example1/</guid>
      <description>

&lt;p&gt;In this tutorial, I&amp;rsquo;ll share my top 10 tips for getting started with Academic:&lt;/p&gt;

&lt;h2 id=&#34;tip-1&#34;&gt;Tip 1&lt;/h2&gt;

&lt;p&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis posuere tellus ac convallis placerat. Proin tincidunt magna sed ex sollicitudin condimentum. Sed ac faucibus dolor, scelerisque sollicitudin nisi. Cras purus urna, suscipit quis sapien eu, pulvinar tempor diam. Quisque risus orci, mollis id ante sit amet, gravida egestas nisl. Sed ac tempus magna. Proin in dui enim. Donec condimentum, sem id dapibus fringilla, tellus enim condimentum arcu, nec volutpat est felis vel metus. Vestibulum sit amet erat at nulla eleifend gravida.&lt;/p&gt;

&lt;p&gt;Nullam vel molestie justo. Curabitur vitae efficitur leo. In hac habitasse platea dictumst. Sed pulvinar mauris dui, eget varius purus congue ac. Nulla euismod, lorem vel elementum dapibus, nunc justo porta mi, sed tempus est est vel tellus. Nam et enim eleifend, laoreet sem sit amet, elementum sem. Morbi ut leo congue, maximus velit ut, finibus arcu. In et libero cursus, rutrum risus non, molestie leo. Nullam congue quam et volutpat malesuada. Sed risus tortor, pulvinar et dictum nec, sodales non mi. Phasellus lacinia commodo laoreet. Nam mollis, erat in feugiat consectetur, purus eros egestas tellus, in auctor urna odio at nibh. Mauris imperdiet nisi ac magna convallis, at rhoncus ligula cursus.&lt;/p&gt;

&lt;p&gt;Cras aliquam rhoncus ipsum, in hendrerit nunc mattis vitae. Duis vitae efficitur metus, ac tempus leo. Cras nec fringilla lacus. Quisque sit amet risus at ipsum pharetra commodo. Sed aliquam mauris at consequat eleifend. Praesent porta, augue sed viverra bibendum, neque ante euismod ante, in vehicula justo lorem ac eros. Suspendisse augue libero, venenatis eget tincidunt ut, malesuada at lorem. Donec vitae bibendum arcu. Aenean maximus nulla non pretium iaculis. Quisque imperdiet, nulla in pulvinar aliquet, velit quam ultrices quam, sit amet fringilla leo sem vel nunc. Mauris in lacinia lacus.&lt;/p&gt;

&lt;p&gt;Suspendisse a tincidunt lacus. Curabitur at urna sagittis, dictum ante sit amet, euismod magna. Sed rutrum massa id tortor commodo, vitae elementum turpis tempus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean purus turpis, venenatis a ullamcorper nec, tincidunt et massa. Integer posuere quam rutrum arcu vehicula imperdiet. Mauris ullamcorper quam vitae purus congue, quis euismod magna eleifend. Vestibulum semper vel augue eget tincidunt. Fusce eget justo sodales, dapibus odio eu, ultrices lorem. Duis condimentum lorem id eros commodo, in facilisis mauris scelerisque. Morbi sed auctor leo. Nullam volutpat a lacus quis pharetra. Nulla congue rutrum magna a ornare.&lt;/p&gt;

&lt;p&gt;Aliquam in turpis accumsan, malesuada nibh ut, hendrerit justo. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Quisque sed erat nec justo posuere suscipit. Donec ut efficitur arcu, in malesuada neque. Nunc dignissim nisl massa, id vulputate nunc pretium nec. Quisque eget urna in risus suscipit ultricies. Pellentesque odio odio, tincidunt in eleifend sed, posuere a diam. Nam gravida nisl convallis semper elementum. Morbi vitae felis faucibus, vulputate orci placerat, aliquet nisi. Aliquam erat volutpat. Maecenas sagittis pulvinar purus, sed porta quam laoreet at.&lt;/p&gt;

&lt;h2 id=&#34;tip-2&#34;&gt;Tip 2&lt;/h2&gt;

&lt;p&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis posuere tellus ac convallis placerat. Proin tincidunt magna sed ex sollicitudin condimentum. Sed ac faucibus dolor, scelerisque sollicitudin nisi. Cras purus urna, suscipit quis sapien eu, pulvinar tempor diam. Quisque risus orci, mollis id ante sit amet, gravida egestas nisl. Sed ac tempus magna. Proin in dui enim. Donec condimentum, sem id dapibus fringilla, tellus enim condimentum arcu, nec volutpat est felis vel metus. Vestibulum sit amet erat at nulla eleifend gravida.&lt;/p&gt;

&lt;p&gt;Nullam vel molestie justo. Curabitur vitae efficitur leo. In hac habitasse platea dictumst. Sed pulvinar mauris dui, eget varius purus congue ac. Nulla euismod, lorem vel elementum dapibus, nunc justo porta mi, sed tempus est est vel tellus. Nam et enim eleifend, laoreet sem sit amet, elementum sem. Morbi ut leo congue, maximus velit ut, finibus arcu. In et libero cursus, rutrum risus non, molestie leo. Nullam congue quam et volutpat malesuada. Sed risus tortor, pulvinar et dictum nec, sodales non mi. Phasellus lacinia commodo laoreet. Nam mollis, erat in feugiat consectetur, purus eros egestas tellus, in auctor urna odio at nibh. Mauris imperdiet nisi ac magna convallis, at rhoncus ligula cursus.&lt;/p&gt;

&lt;p&gt;Cras aliquam rhoncus ipsum, in hendrerit nunc mattis vitae. Duis vitae efficitur metus, ac tempus leo. Cras nec fringilla lacus. Quisque sit amet risus at ipsum pharetra commodo. Sed aliquam mauris at consequat eleifend. Praesent porta, augue sed viverra bibendum, neque ante euismod ante, in vehicula justo lorem ac eros. Suspendisse augue libero, venenatis eget tincidunt ut, malesuada at lorem. Donec vitae bibendum arcu. Aenean maximus nulla non pretium iaculis. Quisque imperdiet, nulla in pulvinar aliquet, velit quam ultrices quam, sit amet fringilla leo sem vel nunc. Mauris in lacinia lacus.&lt;/p&gt;

&lt;p&gt;Suspendisse a tincidunt lacus. Curabitur at urna sagittis, dictum ante sit amet, euismod magna. Sed rutrum massa id tortor commodo, vitae elementum turpis tempus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean purus turpis, venenatis a ullamcorper nec, tincidunt et massa. Integer posuere quam rutrum arcu vehicula imperdiet. Mauris ullamcorper quam vitae purus congue, quis euismod magna eleifend. Vestibulum semper vel augue eget tincidunt. Fusce eget justo sodales, dapibus odio eu, ultrices lorem. Duis condimentum lorem id eros commodo, in facilisis mauris scelerisque. Morbi sed auctor leo. Nullam volutpat a lacus quis pharetra. Nulla congue rutrum magna a ornare.&lt;/p&gt;

&lt;p&gt;Aliquam in turpis accumsan, malesuada nibh ut, hendrerit justo. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Quisque sed erat nec justo posuere suscipit. Donec ut efficitur arcu, in malesuada neque. Nunc dignissim nisl massa, id vulputate nunc pretium nec. Quisque eget urna in risus suscipit ultricies. Pellentesque odio odio, tincidunt in eleifend sed, posuere a diam. Nam gravida nisl convallis semper elementum. Morbi vitae felis faucibus, vulputate orci placerat, aliquet nisi. Aliquam erat volutpat. Maecenas sagittis pulvinar purus, sed porta quam laoreet at.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Example Page 2</title>
      <link>https://almostkapil.netlify.com/courses/example/example2/</link>
      <pubDate>Sun, 05 May 2019 00:00:00 +0100</pubDate>
      <guid>https://almostkapil.netlify.com/courses/example/example2/</guid>
      <description>

&lt;p&gt;Here are some more tips for getting started with Academic:&lt;/p&gt;

&lt;h2 id=&#34;tip-3&#34;&gt;Tip 3&lt;/h2&gt;

&lt;p&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis posuere tellus ac convallis placerat. Proin tincidunt magna sed ex sollicitudin condimentum. Sed ac faucibus dolor, scelerisque sollicitudin nisi. Cras purus urna, suscipit quis sapien eu, pulvinar tempor diam. Quisque risus orci, mollis id ante sit amet, gravida egestas nisl. Sed ac tempus magna. Proin in dui enim. Donec condimentum, sem id dapibus fringilla, tellus enim condimentum arcu, nec volutpat est felis vel metus. Vestibulum sit amet erat at nulla eleifend gravida.&lt;/p&gt;

&lt;p&gt;Nullam vel molestie justo. Curabitur vitae efficitur leo. In hac habitasse platea dictumst. Sed pulvinar mauris dui, eget varius purus congue ac. Nulla euismod, lorem vel elementum dapibus, nunc justo porta mi, sed tempus est est vel tellus. Nam et enim eleifend, laoreet sem sit amet, elementum sem. Morbi ut leo congue, maximus velit ut, finibus arcu. In et libero cursus, rutrum risus non, molestie leo. Nullam congue quam et volutpat malesuada. Sed risus tortor, pulvinar et dictum nec, sodales non mi. Phasellus lacinia commodo laoreet. Nam mollis, erat in feugiat consectetur, purus eros egestas tellus, in auctor urna odio at nibh. Mauris imperdiet nisi ac magna convallis, at rhoncus ligula cursus.&lt;/p&gt;

&lt;p&gt;Cras aliquam rhoncus ipsum, in hendrerit nunc mattis vitae. Duis vitae efficitur metus, ac tempus leo. Cras nec fringilla lacus. Quisque sit amet risus at ipsum pharetra commodo. Sed aliquam mauris at consequat eleifend. Praesent porta, augue sed viverra bibendum, neque ante euismod ante, in vehicula justo lorem ac eros. Suspendisse augue libero, venenatis eget tincidunt ut, malesuada at lorem. Donec vitae bibendum arcu. Aenean maximus nulla non pretium iaculis. Quisque imperdiet, nulla in pulvinar aliquet, velit quam ultrices quam, sit amet fringilla leo sem vel nunc. Mauris in lacinia lacus.&lt;/p&gt;

&lt;p&gt;Suspendisse a tincidunt lacus. Curabitur at urna sagittis, dictum ante sit amet, euismod magna. Sed rutrum massa id tortor commodo, vitae elementum turpis tempus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean purus turpis, venenatis a ullamcorper nec, tincidunt et massa. Integer posuere quam rutrum arcu vehicula imperdiet. Mauris ullamcorper quam vitae purus congue, quis euismod magna eleifend. Vestibulum semper vel augue eget tincidunt. Fusce eget justo sodales, dapibus odio eu, ultrices lorem. Duis condimentum lorem id eros commodo, in facilisis mauris scelerisque. Morbi sed auctor leo. Nullam volutpat a lacus quis pharetra. Nulla congue rutrum magna a ornare.&lt;/p&gt;

&lt;p&gt;Aliquam in turpis accumsan, malesuada nibh ut, hendrerit justo. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Quisque sed erat nec justo posuere suscipit. Donec ut efficitur arcu, in malesuada neque. Nunc dignissim nisl massa, id vulputate nunc pretium nec. Quisque eget urna in risus suscipit ultricies. Pellentesque odio odio, tincidunt in eleifend sed, posuere a diam. Nam gravida nisl convallis semper elementum. Morbi vitae felis faucibus, vulputate orci placerat, aliquet nisi. Aliquam erat volutpat. Maecenas sagittis pulvinar purus, sed porta quam laoreet at.&lt;/p&gt;

&lt;h2 id=&#34;tip-4&#34;&gt;Tip 4&lt;/h2&gt;

&lt;p&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis posuere tellus ac convallis placerat. Proin tincidunt magna sed ex sollicitudin condimentum. Sed ac faucibus dolor, scelerisque sollicitudin nisi. Cras purus urna, suscipit quis sapien eu, pulvinar tempor diam. Quisque risus orci, mollis id ante sit amet, gravida egestas nisl. Sed ac tempus magna. Proin in dui enim. Donec condimentum, sem id dapibus fringilla, tellus enim condimentum arcu, nec volutpat est felis vel metus. Vestibulum sit amet erat at nulla eleifend gravida.&lt;/p&gt;

&lt;p&gt;Nullam vel molestie justo. Curabitur vitae efficitur leo. In hac habitasse platea dictumst. Sed pulvinar mauris dui, eget varius purus congue ac. Nulla euismod, lorem vel elementum dapibus, nunc justo porta mi, sed tempus est est vel tellus. Nam et enim eleifend, laoreet sem sit amet, elementum sem. Morbi ut leo congue, maximus velit ut, finibus arcu. In et libero cursus, rutrum risus non, molestie leo. Nullam congue quam et volutpat malesuada. Sed risus tortor, pulvinar et dictum nec, sodales non mi. Phasellus lacinia commodo laoreet. Nam mollis, erat in feugiat consectetur, purus eros egestas tellus, in auctor urna odio at nibh. Mauris imperdiet nisi ac magna convallis, at rhoncus ligula cursus.&lt;/p&gt;

&lt;p&gt;Cras aliquam rhoncus ipsum, in hendrerit nunc mattis vitae. Duis vitae efficitur metus, ac tempus leo. Cras nec fringilla lacus. Quisque sit amet risus at ipsum pharetra commodo. Sed aliquam mauris at consequat eleifend. Praesent porta, augue sed viverra bibendum, neque ante euismod ante, in vehicula justo lorem ac eros. Suspendisse augue libero, venenatis eget tincidunt ut, malesuada at lorem. Donec vitae bibendum arcu. Aenean maximus nulla non pretium iaculis. Quisque imperdiet, nulla in pulvinar aliquet, velit quam ultrices quam, sit amet fringilla leo sem vel nunc. Mauris in lacinia lacus.&lt;/p&gt;

&lt;p&gt;Suspendisse a tincidunt lacus. Curabitur at urna sagittis, dictum ante sit amet, euismod magna. Sed rutrum massa id tortor commodo, vitae elementum turpis tempus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean purus turpis, venenatis a ullamcorper nec, tincidunt et massa. Integer posuere quam rutrum arcu vehicula imperdiet. Mauris ullamcorper quam vitae purus congue, quis euismod magna eleifend. Vestibulum semper vel augue eget tincidunt. Fusce eget justo sodales, dapibus odio eu, ultrices lorem. Duis condimentum lorem id eros commodo, in facilisis mauris scelerisque. Morbi sed auctor leo. Nullam volutpat a lacus quis pharetra. Nulla congue rutrum magna a ornare.&lt;/p&gt;

&lt;p&gt;Aliquam in turpis accumsan, malesuada nibh ut, hendrerit justo. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Quisque sed erat nec justo posuere suscipit. Donec ut efficitur arcu, in malesuada neque. Nunc dignissim nisl massa, id vulputate nunc pretium nec. Quisque eget urna in risus suscipit ultricies. Pellentesque odio odio, tincidunt in eleifend sed, posuere a diam. Nam gravida nisl convallis semper elementum. Morbi vitae felis faucibus, vulputate orci placerat, aliquet nisi. Aliquam erat volutpat. Maecenas sagittis pulvinar purus, sed porta quam laoreet at.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Example Talk</title>
      <link>https://almostkapil.netlify.com/talk/example/</link>
      <pubDate>Sat, 01 Jun 2030 13:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/talk/example/</guid>
      <description>&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click on the &lt;strong&gt;Slides&lt;/strong&gt; button above to view the built-in slides feature.
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Slides can be added in a few ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Create&lt;/strong&gt; slides using Academic&amp;rsquo;s &lt;a href=&#34;https://sourcethemes.com/academic/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34;&gt;&lt;em&gt;Slides&lt;/em&gt;&lt;/a&gt; feature and link using &lt;code&gt;slides&lt;/code&gt; parameter in the front matter of the talk file&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Upload&lt;/strong&gt; an existing slide deck to &lt;code&gt;static/&lt;/code&gt; and link using &lt;code&gt;url_slides&lt;/code&gt; parameter in the front matter of the talk file&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Embed&lt;/strong&gt; your slides (e.g. Google Slides) or presentation video on this page using &lt;a href=&#34;https://sourcethemes.com/academic/docs/writing-markdown-latex/&#34; target=&#34;_blank&#34;&gt;shortcodes&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Further talk details can easily be added to this page using &lt;em&gt;Markdown&lt;/em&gt; and $\rm \LaTeX$ math code.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Sales Impact Analysis with Clustering and Causal effects</title>
      <link>https://almostkapil.netlify.com/post/sales-impact-analysis-with-clustering-and-b/</link>
      <pubDate>Mon, 30 Mar 2020 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/post/sales-impact-analysis-with-clustering-and-b/</guid>
      <description>&lt;p&gt;This project looks at how introducing a discount during the holidays affects the total sales of customer groups over a one-year timeframe. The statistical techniques used are:&lt;/p&gt;

&lt;p&gt;RFM analysis (recency, frequency, monetary) to analyse customer behavior by examining their transaction history, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how recently a customer has purchased (recency)&lt;/li&gt;
&lt;li&gt;how often they purchase (frequency)&lt;/li&gt;
&lt;li&gt;how much the customer spends (monetary)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RFM helps us identify customers who are more likely to respond to promotions.&lt;/p&gt;
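
&lt;p&gt;A minimal sketch of how the three RFM values could be computed from a transaction log, using only the Python standard library. The transactions, customer ids and reference date are made-up placeholders, not data from the project.&lt;/p&gt;

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2019, 11, 2), 40.0),
    ("c1", date(2019, 12, 20), 60.0),
    ("c2", date(2019, 3, 15), 25.0),
    ("c3", date(2019, 12, 28), 10.0),
    ("c3", date(2019, 12, 30), 15.0),
    ("c3", date(2019, 12, 31), 5.0),
]


def rfm_table(rows, today):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        # Recency is measured from the most recent purchase.
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((today - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


rfm = rfm_table(transactions, today=date(2020, 1, 1))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

&lt;p&gt;A low recency value together with high frequency and monetary values marks the customers who are usually the first candidates for a promotion.&lt;/p&gt;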

&lt;p&gt;K-means to segment customers into various category groups.&lt;/p&gt;
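
&lt;p&gt;As a rough illustration of the segmentation step, here is a plain-Python k-means sketch. It is not the project code: the function name, the seed and the four pre-scaled (recency, frequency, monetary) rows are assumptions made for the example, and a real analysis would more likely use a library implementation.&lt;/p&gt;

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical customers, already scaled to the 0-1 range:
# (recency, frequency, monetary).
customers = [
    (0.1, 0.9, 0.8),
    (0.2, 0.8, 0.9),
    (0.9, 0.1, 0.2),
    (0.8, 0.2, 0.1),
]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

&lt;p&gt;Scaling matters here: without it, the monetary column would dominate the distance calculation simply because its raw values are larger.&lt;/p&gt;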

&lt;p&gt;Causal impact analysis to study the impact of discounts within each customer group.&lt;/p&gt;

&lt;p&gt;Link to the GitHub project: &lt;a href=&#34;https://github.com/KapilKhanal/Sales_Impact&#34; target=&#34;_blank&#34;&gt;https://github.com/KapilKhanal/Sales_Impact&lt;/a&gt;&lt;br /&gt;
Link to the data product: &lt;a href=&#34;https://salesimpact.herokuapp.com/&#34; target=&#34;_blank&#34;&gt;https://salesimpact.herokuapp.com/&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Minnesota Lake Project: ML in Production Exercise</title>
      <link>https://almostkapil.netlify.com/post/minnesota-lake-project/</link>
      <pubDate>Sun, 19 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/post/minnesota-lake-project/</guid>
      <description>

&lt;h2 id=&#34;so-you-have-a-good-model-want-to-make-it-available-to-serve-the-world&#34;&gt;So you have a good model? Want to make it available to serve the world?&lt;/h2&gt;

&lt;h5 id=&#34;prototype-grade-model-workflow-to-production-land-workflow&#34;&gt;Prototype-grade model workflow to Production land workflow&lt;/h5&gt;

&lt;p&gt;Making a good model is awesome. It takes an enormous amount of experimentation and research. When we have a decent model, that is a eureka moment.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every model wants to go out in the real world and serve its purpose.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But actually using the model in production is a whole other pain. Recently I have been learning ways to deploy models.&lt;br /&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/2020-01-19-minnesota-lake-project_files/mlsystem.jpg&#34; alt=&#34;Source:Manning Reactive Machine learning book&#34; /&gt;
Source: Reactive machine learning book&lt;/p&gt;

&lt;h4 id=&#34;model-predictions-as-webservice&#34;&gt;Model Predictions as WebService&lt;/h4&gt;

&lt;p&gt;Now, as we can see, it is a lifecycle. There are a lot of nuances in deploying models. The workflow has to be reproducible, elastic and easy to manage. If you end up changing the model, the infrastructure should not have to change. For example, I used a simple regression model for this project; if I later train a random forest model, the parts that need to change should be easy to change without touching the infrastructure. To that end, I collect all the parameters, file locations and data locations in one file, say a &lt;strong&gt;&lt;em&gt;config&lt;/em&gt;&lt;/strong&gt; file. Similarly, if I separate feature engineering, feature selection, data validation, etc. into their own files, then deployment becomes easier (&lt;strong&gt;modular code&lt;/strong&gt;).
&lt;br&gt;&lt;/p&gt;
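
&lt;p&gt;One way the config idea could look in practice is sketched below. Everything in it is hypothetical: the keys, the paths and the two toy model classes are stand-ins for the regression and random forest models mentioned above, chosen so the example runs with the standard library alone.&lt;/p&gt;

```python
# Hypothetical settings that would live in a config.py file.
CONFIG = {
    "model_name": "mean_baseline",       # swap to "last_value" to change the model
    "training_data": "data/train.csv",   # placeholder path, not from the post
    "model_output": "models/model.pkl",  # placeholder path, not from the post
    "seed": 42,
}


class MeanBaseline:
    """Toy stand-in for a regression model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LastValue:
    """Toy stand-in for a second model: always predicts the last training target."""

    def fit(self, X, y):
        self.last_ = y[-1]
        return self

    def predict(self, X):
        return [self.last_ for _ in X]


# Maps the name used in the config to the class that implements it.
MODEL_REGISTRY = {"mean_baseline": MeanBaseline, "last_value": LastValue}


def build_model(config):
    """Look up the model class named in the config and instantiate it."""
    name = config["model_name"]
    if name not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model_name: {name!r}")
    return MODEL_REGISTRY[name]()


def train(config, X, y):
    """The pipeline code stays the same whichever model the config names."""
    return build_model(config).fit(X, y)


model = train(CONFIG, X=[[1.0], [2.0], [3.0]], y=[2.0, 4.0, 6.0])
print(model.predict([[4.0]]))
```

&lt;p&gt;Switching models then means editing one value in the config, while the training and serving code stay untouched.&lt;/p&gt;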

&lt;p&gt;I can always train two different models and put them in a Python package or a cloud location such as PyPI or S3; then I can easily retrieve those models and use them in the Flask API I design just to serve the model.&lt;/p&gt;

&lt;p&gt;Thinking of each service as a separate code repository, we will have three different repos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python package for retrieving data, training the model and uploading the final model to PyPI or S3&lt;/li&gt;
&lt;li&gt;ML API: Flask application that serves the model by downloading it from PyPI/S3 and exposing an API that accepts data and returns the prediction&lt;/li&gt;
&lt;li&gt;Another Flask/front-end framework app that gets the JSON from the API and populates the dashboard with plots and predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While learning about this, I came across the ideas of DataOps and MLOps. I think that in the future most software will be ML software doing real-time prediction and inference with very little slowdown. Wait, don&amp;rsquo;t humans do that?&lt;/p&gt;

&lt;h2 id=&#34;basic-software-engineering-skills-and-gotchas&#34;&gt;Basic software engineering skills and gotchas:&lt;/h2&gt;

&lt;p&gt;What packages does your application need? &lt;strong&gt;&lt;em&gt;requirements.txt&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the right data types and schema for your data/database. Making every column StringType() is not a wise move in a database.&lt;br&gt;&lt;/li&gt;
&lt;li&gt;Do you really need all those dataframes in memory? Remove or avoid storing intermediate dataframes. If possible, do all the column operations in the database itself. &lt;br&gt;&lt;/li&gt;
&lt;li&gt;Always use Git and version your code (and your data, with Data Version Control) in the right way. The simplest way is to store all the data used in prediction in a database, along with the model version and predicted value. &lt;br&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Use venv (virtual environments). You don&amp;rsquo;t want conflicting libraries quarrelling with each other.&lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Set a seed while training (set.seed() in R, random.seed() in Python) to increase reproducibility. Use the same seed across different models.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Think about logging: what do you want to monitor? In Python, start with &lt;code&gt;import logging&lt;/code&gt;. &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Think about how you are going to share your final model (pickle file? parametrized formula?). &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;A Docker image, ready to be used, is a good choice. &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Separate the data collection tool from the ML pipeline. Use different repositories for data wrangling, ML training and the dashboard/front-end.
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Your tools should have clear input parameters,
e.g., the path to the repository. &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The command line tool should refuse to run if the input parameters are wrong. &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Make &lt;em&gt;config&lt;/em&gt; parameters very clear &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;A config.py file where people can tune specific configs &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;If you use environment variables, document them clearly &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Do not use hard coded paths &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;During development, consider storing intermediate steps (R Markdown or Jupyter notebooks). &lt;br&gt;
Understand the importance of the data you are passing to your model. &lt;br&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Pay attention to “garbage-in garbage-out” &lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
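
&lt;p&gt;Several of the tips above (a fixed seed, logging, clear input parameters, and a command line tool that refuses bad input) can be sketched together with the Python standard library. The flag names &lt;code&gt;--data-dir&lt;/code&gt; and &lt;code&gt;--seed&lt;/code&gt;, the logger name and the fake training step are assumptions made for this example only.&lt;/p&gt;

```python
import argparse
import logging
import os
import random

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("train")  # hypothetical logger name


def parse_args(argv=None):
    """Parse and validate the command line; exit early on wrong input."""
    parser = argparse.ArgumentParser(description="Train a model (illustrative CLI).")
    parser.add_argument("--data-dir", required=True, help="Path to the data directory")
    parser.add_argument("--seed", type=int, default=42, help="Random seed for training")
    args = parser.parse_args(argv)
    if not os.path.isdir(args.data_dir):
        # parser.error prints a usage message and exits with status 2.
        parser.error(f"--data-dir does not exist: {args.data_dir}")
    return args


def main(argv=None):
    args = parse_args(argv)
    random.seed(args.seed)  # same seed, same pseudo-random draws
    logger.info("Training with seed=%s on data in %s", args.seed, args.data_dir)
    # Stand-in for a real training step: three pseudo-random numbers.
    return [random.random() for _ in range(3)]


# Example invocation, using the current directory as the data directory.
first_run = main(["--data-dir", ".", "--seed", "1"])
second_run = main(["--data-dir", ".", "--seed", "1"])
print(first_run == second_run)
```

&lt;p&gt;Passing the path as an argument also keeps hard coded paths out of the code, and the log line records which seed and data location produced a given run.&lt;/p&gt;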

&lt;blockquote&gt;
&lt;p&gt;If you would prefer a more in-depth resource on software skills, I found these summary notes of the famous book
&lt;a href=&#34;https://gist.github.com/wojteklu/73c6914cc446146b8b533c0988cf8d29&#34;&gt;Clean Code&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;All these different services are extensive on their own. Without a dedicated team, these services will not succeed. But this exercise was meant to give a general understanding of the overall ecosystem of data and ML systems. To be a good data scientist, I think it is good to get a lay of the land.&lt;/p&gt;

&lt;p&gt;If you would like to see the end product, this link will take you there: &lt;a href=&#34;http://lakedashboard.team/&#34;&gt;Lake Dashboard&lt;/a&gt; &lt;br&gt;&lt;/p&gt;

&lt;p&gt;Some tips are &lt;strong&gt;blatantly copied&lt;/strong&gt; from the reference below.&lt;br&gt;
Reference:
&lt;a href=&#34;http://gousios.org/courses/ml4se/building-your-ml-pipeline.html&#34; target=&#34;_blank&#34;&gt;http://gousios.org/courses/ml4se/building-your-ml-pipeline.html&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Data Driven Public Policy</title>
      <link>https://almostkapil.netlify.com/post/data-driven-public-policy/</link>
      <pubDate>Fri, 27 Dec 2019 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/post/data-driven-public-policy/</guid>
      <description>&lt;p&gt;Most policies that get enacted have their own reasons. Policymakers and policy analysts have debated policies for years. These policies affect people&amp;rsquo;s lives, but very few people have had a direct say in how a policy was proposed, analyzed and enacted. Democracy thrives on public opinion. Ironically, nowadays the very foundation of democracy is being shaken by public opinion. And it turns out that a factual understanding of the world is also getting blurrier day by day. When people do not own the research funding or are not in the policy debate loop, they disagree with policies even when those policies benefit them.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/2019-12-27-data-driven-public-policy_files/Drawing.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;A lot of policies, especially science and public-good policies, should be data driven. National representatives and other policymakers should not just make the data public but also hold a wider discussion before presenting a policy in the public sphere.&lt;/p&gt;

&lt;p&gt;Conspiracies and falsified information can only be rooted out if we let people know how much money, time and effort went into a policy. The average person does not know the level of investment when sharing, retweeting or making memes about falsified information. People should be made aware that scientific research is not just a Google search. A Google search is not research; one is merely fishing for information. For example, only a few scholars know how much investment it took to eradicate measles through immunization policy, but now it takes only a share or retweet to bring it back, because people don&amp;rsquo;t see how many professionals spent their time, how much money governments invested, and how many schools and universities debated, cross-checked and verified the policy before it reached the public sphere. I think this is exactly where the general public should be included. This is only possible through the sharing of public data: a repository of metadata for every single public policy that gets enacted.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Minnesota Lake Project</title>
      <link>https://almostkapil.netlify.com/project/educational/</link>
      <pubDate>Sun, 27 Oct 2019 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/project/educational/</guid>
      <description>&lt;p&gt;This dataset contains lake water quality measurements for each lake and year.&lt;/p&gt;

&lt;p&gt;MCES data:
the MCES Citizen-Assisted Monitoring Program (CAMP)&lt;/p&gt;

&lt;p&gt;The goal of the MCES lake monitoring program is to obtain and provide information that enables cities, counties, lake associations, and watershed management districts to better manage TCMA lakes, thereby protecting and improving lake water quality.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>CMU Sport Analytics Projects Slideshows</title>
      <link>https://almostkapil.netlify.com/post/baseball/</link>
      <pubDate>Wed, 21 Aug 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/baseball/</guid>
      <description>


&lt;div id=&#34;my-cmsac-experience&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;My CMSAC Experience&lt;/h2&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Jeremy Sanchez &lt;a href=&#34;https://twitter.com/_jsanchez1?ref_src=twsrc%5Etfw&#34;&gt;@_jsanchez1&lt;/a&gt;, Nathan Moss &lt;a href=&#34;https://twitter.com/CMU_Stats?ref_src=twsrc%5Etfw&#34;&gt;@CMU_Stats&lt;/a&gt;, and Kapil Khanal @Kapil71001628 working on soccer with &lt;a href=&#34;https://twitter.com/kpelechrinis?ref_src=twsrc%5Etfw&#34;&gt;@kpelechrinis&lt;/a&gt; &lt;a href=&#34;https://t.co/Ij2eFiJ8eH&#34;&gt;pic.twitter.com/Ij2eFiJ8eH&lt;/a&gt;&lt;/p&gt;&amp;mdash; CMU Stats &amp;amp; DS (@CMU_Stats) &lt;a href=&#34;https://twitter.com/CMU_Stats/status/1154857646429749248?ref_src=twsrc%5Etfw&#34;&gt;July 26, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;

&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/Baseball_files/CMSAC.jpeg&#34; alt=&#34;Presenting our Final Project&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Presenting our Final Project&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;my-first-project-at-cmu-statistics-sport-analytics-camp&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;My First Project at CMU Statistics: Sport Analytics Camp&lt;/h3&gt;
&lt;p&gt;The first week has been a good review of basic dplyr syntax and ggplot2 philosophy. I like how the professors and TAs are always there for us. From small data manipulation problems to points being masked in scatterplots, I ran into all sorts of problems. &lt;br&gt;
These are practice projects before we actually work on the research projects of our choice.&lt;/p&gt;
&lt;p&gt;Here is the schedule of this summer camp.
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Last day of &lt;a href=&#34;https://twitter.com/hashtag/CMSACamp?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#CMSACamp&lt;/a&gt;! Jam-packed summer full of &lt;a href=&#34;https://twitter.com/hashtag/datascience?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#datascience&lt;/a&gt;, &lt;a href=&#34;https://twitter.com/hashtag/sportsanalytics?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#sportsanalytics&lt;/a&gt;, speakers, tours, amazing partners &lt;a href=&#34;https://twitter.com/TruMediaSports?ref_src=twsrc%5Etfw&#34;&gt;@TruMediaSports&lt;/a&gt; &lt;a href=&#34;https://twitter.com/albert_larcada?ref_src=twsrc%5Etfw&#34;&gt;@albert_larcada&lt;/a&gt; &lt;a href=&#34;https://twitter.com/stat_sam?ref_src=twsrc%5Etfw&#34;&gt;@stat_sam&lt;/a&gt; &lt;a href=&#34;https://twitter.com/penguins?ref_src=twsrc%5Etfw&#34;&gt;@penguins&lt;/a&gt; &lt;a href=&#34;https://twitter.com/kpelechrinis?ref_src=twsrc%5Etfw&#34;&gt;@kpelechrinis&lt;/a&gt;  &lt;a href=&#34;https://twitter.com/Stat_Ron?ref_src=twsrc%5Etfw&#34;&gt;@Stat_Ron&lt;/a&gt; &lt;a href=&#34;https://twitter.com/NFL?ref_src=twsrc%5Etfw&#34;&gt;@NFL&lt;/a&gt; &lt;a href=&#34;https://twitter.com/albertbayes?ref_src=twsrc%5Etfw&#34;&gt;@albertbayes&lt;/a&gt; &lt;a href=&#34;https://twitter.com/bklynmaks?ref_src=twsrc%5Etfw&#34;&gt;@bklynmaks&lt;/a&gt; &lt;a href=&#34;https://twitter.com/ATLHawks?ref_src=twsrc%5Etfw&#34;&gt;@ATLHawks&lt;/a&gt; &lt;a href=&#34;https://twitter.com/acthomasca?ref_src=twsrc%5Etfw&#34;&gt;@acthomasca&lt;/a&gt; &lt;a href=&#34;https://twitter.com/sarah_malle?ref_src=twsrc%5Etfw&#34;&gt;@sarah_malle&lt;/a&gt; &lt;a href=&#34;https://twitter.com/nflscrapR?ref_src=twsrc%5Etfw&#34;&gt;@nflscrapR&lt;/a&gt; &lt;a href=&#34;https://twitter.com/Pirates?ref_src=twsrc%5Etfw&#34;&gt;@Pirates&lt;/a&gt; &lt;a href=&#34;https://t.co/feG2cZnGQR&#34;&gt;pic.twitter.com/feG2cZnGQR&lt;/a&gt;&lt;/p&gt;&amp;mdash; CMU 
Stats &amp;amp; DS (@CMU_Stats) &lt;a href=&#34;https://twitter.com/CMU_Stats/status/1154736616646283264?ref_src=twsrc%5Etfw&#34;&gt;July 26, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
&lt;/p&gt;
&lt;div id=&#34;project1-baseball&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Project1: Baseball&lt;/h6&gt;
&lt;p&gt;For this project, we looked into how similar the top 5 hitters in baseball are. Below are the slides we presented at the camp.
&lt;iframe src=&#34;https://docs.google.com/presentation/d/e/2PACX-1vRe0m6fqFga0-BulFHn6_wXG7qKkp1G7Y8zpTAS6nrDmH69k_574IjHaGK_MrQxagGN_mQtNBF33uvo/embed?start=true&amp;loop=true&amp;delayms=2000&#34; frameborder=&#34;0&#34; width=&#34;860&#34; height=&#34;469&#34; allowfullscreen=&#34;true&#34; mozallowfullscreen=&#34;true&#34; webkitallowfullscreen=&#34;true&#34;&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;p&gt;Similarly, for Project 2, we worked with a tennis dataset.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;project-2-tennis&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Project 2: Tennis&lt;/h6&gt;
&lt;p&gt;&lt;b&gt;What factors are best at predicting point ratio for a match during a Grand Slam?
&lt;/b&gt;
&lt;iframe src=&#34;https://docs.google.com/presentation/d/e/2PACX-1vTA379JFEoMzgqXndoeEaU3ZIC0P1P8f0d8dna7Je4QsnKWGDKW6-sTWIU5FTCvPAEynta1l1NWI1Na/embed?start=true&amp;loop=true&amp;delayms=3000&#34; frameborder=&#34;0&#34; width=&#34;860&#34; height=&#34;469&#34; allowfullscreen=&#34;true&#34; mozallowfullscreen=&#34;true&#34; webkitallowfullscreen=&#34;true&#34;&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;project-3simulating-office-environment-in-analytics&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Project 3: Simulating Office Environment in Analytics &lt;br&gt;&lt;/h6&gt;
&lt;p&gt;This was a non-technical project, but the most fun one. Our class of 16 students was split into 4 analytics departments for a hypothetical team. There were a lot of rumours on the player market: some players who were essential for our team were up for grabs, and we also had to let some players go. The crazy part of this project was that the clock was ticking. Our boss changed her decision every few minutes as the market changed, and we had to come up with numbers to back up the decisions we were about to recommend.&lt;/p&gt;
&lt;p&gt;Below are the slides we prepared within 10 minutes, with many factors changing while we were working on them.
&lt;iframe src=&#34;https://docs.google.com/presentation/d/e/2PACX-1vQnqAWxyL5W47Sd0FPRzSNdeWEq9uXE2T_3S_enY2YUNgIIhiJvFTbrA9tDmVztZtENwd9Rv3aT6QBV/embed?start=true&amp;loop=true&amp;delayms=3000&#34; frameborder=&#34;0&#34; width=&#34;960&#34; height=&#34;569&#34; allowfullscreen=&#34;true&#34; mozallowfullscreen=&#34;true&#34; webkitallowfullscreen=&#34;true&#34;&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;p&gt;This project shed some light on the life of working data scientists and data analysts. It’s not always about fancy graphs or complicated, tongue-twisting models. I learned that we start with the problem we have, collect the necessary data, build new metrics suited to the problem, graph the problem and the proposed solutions so that they are intuitive to all concerned parties, and then use models to test our hypotheses and make a decision.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;project-3&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Project 3:&lt;/h3&gt;
&lt;p&gt;This is the final project I worked on, for half of this summer camp. It is actually a work in progress. We will be changing a lot of things (I guess that is research: &lt;code&gt;change until you no longer find a justification to change&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;I chose this because soccer has interested me since childhood. I played a lot of soccer in high school, and it still fascinates me with all the complexity involved from a mathematical, statistical and data point of view.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/Baseball_files/kapilcmu.jpeg&#34; alt=&#34;Presenting to classmates before poster presentation&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Presenting to classmates before poster presentation&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Like I tweeted, I am extremely grateful to CMU Stats for letting me experience life as a data scientist.
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;The best 8 weeks. I got to learn so many things and enjoy Pittsburgh. The spirit at &lt;a href=&#34;https://twitter.com/CMU_Stats?ref_src=twsrc%5Etfw&#34;&gt;@CMU_Stats&lt;/a&gt;  is amazing, like a Stat-Disney land. Thank you for everything especially all those free foods and tickets to game and Kennywood. &lt;a href=&#34;https://twitter.com/hashtag/CMSACamp?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#CMSACamp&lt;/a&gt;&lt;/p&gt;&amp;mdash; Kapil.Khanal (@almost_kapil) &lt;a href=&#34;https://twitter.com/almost_kapil/status/1154867446177763328?ref_src=twsrc%5Etfw&#34;&gt;July 26, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Data Dashboard for StockX Contest</title>
      <link>https://almostkapil.netlify.com/post/stockx/</link>
      <pubDate>Wed, 21 Aug 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/stockx/</guid>
      <description>


&lt;div id=&#34;stockx-data-contest-2019&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Stock&lt;span style=&#34;color:green&#34;&gt;&lt;b&gt;X&lt;/b&gt;&lt;/span&gt; Data Contest 2019&lt;/h2&gt;
&lt;p&gt;&lt;a href = &#34;https://stockx.com/news/the-2019-data-contest/&#34;&gt;StockX Challenge&lt;/a&gt; is a call for data and sneaker nerds to have fun.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/stockX_files/sneaker.jpg&#34; alt=&#34;source: stockX&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;source: stockX&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea is this: they give you a bunch of original StockX sneaker data, then you crunch the numbers and come up with the coolest, smartest, most compelling story you can tell. It can be literally anything you want. A theory, an insight, even just a really original data visualization. It could be a novel hypothesis about resale prices you’ve always wanted to test. Or maybe it’s just a beautiful chart to visualize the data. It can be on any subject – sneakers, brands, buyers, or even StockX itself. Whatever you find interesting, just follow your bliss.&lt;/p&gt;
&lt;p&gt;I also took a shot at coming up with something useful. Below is my finished data dashboard. &lt;br&gt;&lt;/p&gt;
&lt;div id=&#34;my-data-dashboard-for-stockx&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;My Data Dashboard for Stock&lt;span style=&#34;color:green&#34;&gt;&lt;b&gt;X&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/index_files/stockX.png&#34; alt=&#34;Dashboard&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Dashboard&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The Tableau worksheet is available &lt;a href = &#34;https://public.tableau.com/views/StockX_0/Dashboard1?:embed=y&amp;:display_count=yes&amp;:origin=viz_share_link&#34;&gt;here&lt;/a&gt;. &lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Calculations on the Dashboards&lt;/em&gt; &lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Price ratio&lt;/code&gt;: Ratio of sale price to retail price for each sneaker. &lt;br&gt;
&lt;code&gt;Weeks&lt;/code&gt;: (Order Date - Release Date), converted to weeks.&lt;br&gt;
&lt;code&gt;Median Price ratio&lt;/code&gt;: Chosen to reduce the effect of the uneven range of dates (2017 and 2019 are not as complete as 2018) and of the differing counts of sneaker sales.&lt;br&gt;
&lt;code&gt;Color&lt;/code&gt;: The color scale for the two brands is consistent across every plot that relates to brands.&lt;br&gt;&lt;/p&gt;
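&lt;p&gt;The dashboard itself was built in Tableau, so the following is only a rough sketch of how the two calculated fields could be reproduced in R. The column names (&lt;code&gt;sale_price&lt;/code&gt;, &lt;code&gt;retail_price&lt;/code&gt;, &lt;code&gt;order_date&lt;/code&gt;, &lt;code&gt;release_date&lt;/code&gt;) and the two rows of numbers are made up for illustration and are not taken from the contest data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical orders: invented values, not StockX data
orders &amp;lt;- data.frame(
  sale_price   = c(440, 330),
  retail_price = c(220, 220),
  order_date   = as.Date(c(&amp;quot;2019-01-15&amp;quot;, &amp;quot;2018-12-25&amp;quot;)),
  release_date = as.Date(c(&amp;quot;2019-01-01&amp;quot;, &amp;quot;2019-01-01&amp;quot;))
)

# Price ratio: sale price divided by retail price
orders$price_ratio &amp;lt;- orders$sale_price / orders$retail_price

# Weeks: (order date - release date) expressed in weeks;
# a negative value means the order was placed before the release
orders$weeks &amp;lt;- as.numeric(difftime(orders$order_date, orders$release_date, units = &amp;quot;weeks&amp;quot;))

orders[, c(&amp;quot;price_ratio&amp;quot;, &amp;quot;weeks&amp;quot;)]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these invented rows the price ratios are 2 and 1.5, and the weeks are 2 and -1.&lt;/p&gt;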
&lt;p&gt;&lt;b&gt;1) Order of Sneakers by brand for weeks from Release Date&lt;/b&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;This plot shows the total count of orders for different sneakers of the two brands. Both brands were ordered before the release date. Off-White has more orders than Yeezy in the dataset.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;its-interesting-how-the-demand-of-yeezy-increased-at-around-90-weeks-after-the-release-of-the-shoes.&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;It’s interesting how the demand for Yeezy increased at around &lt;code&gt;90 weeks&lt;/code&gt; after the release of the shoes.&lt;/h2&gt;
&lt;p&gt;2)&lt;b&gt;Ratio of Sales Price to Retail Price For each Brand by Weeks&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;This plot looks at how the ratio of sale price to retail price relates to the weeks after the release date for each brand. Clearly, the sale price of both brands is higher than the retail price. The ratio for Off-White generally increases regardless of the individual sneaker, while the ratio for Yeezy is somewhat noisy but follows a trend similar to Off-White. The price ratio of both brands increases after the release date.&lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;3) Distribution of Median Sales price given the retail price for each brand&lt;/b&gt;&lt;br&gt;
This plot looks in detail at how the median sale price is distributed for each sneaker. It shows the distribution of the median sale price for the top 28 sneakers that sold for at least 5 times the retail price.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;4) Median Price and States&lt;/b&gt; &lt;br&gt;&lt;/p&gt;
&lt;p&gt;This plot looks at the median price ratio for all the states. The color scale encodes the ratio, and the size of each sneaker shows total sales relative to the other states. Which states usually pay more for sneakers? Clearly, Delaware, Vermont and Utah had some sales with a high price ratio. States like California and New York have a lot of sales, as shown by their relative sizes. The relative size is calculated by taking the log of total sales in each state. States like Wyoming have fewer sales and also a lower price ratio.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Winona Area Public Schools: Community Contribution</title>
      <link>https://almostkapil.netlify.com/post/waps/</link>
      <pubDate>Wed, 21 Aug 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/waps/</guid>
      <description>


&lt;div id=&#34;winona-area-public-schools-data-visualization&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;&lt;span style=&#34;color:purple&#34;&gt;Winona Area Public Schools Data Visualization &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Introduction&lt;/code&gt;:&lt;br&gt;
This project addresses the need to communicate public school data to community members in a meaningful way, and to make the data available to the general public in a proper, usable format. &lt;br&gt;&lt;/p&gt;
&lt;p&gt;There has been a wider discussion about the budget issues in Winona area schools. Here is
&lt;a href = &#34;https://www.winonadailynews.com/news/local/what-will-waps-cut-board-to-weigh-new-options-for/article_23c25b9f-7365-5aa2-b370-1ed251eb8231.html&#34;
 width=&#34;645&#34; height=&#34;955&#34;&gt;the article &lt;/a&gt;&lt;/p&gt;
Primarily, this project focused on cleaning and visualizing the Enrollment, Expenditures and Staffing History reports of the Winona Area Public Schools (WAPS) district, which are publicly available through the Minnesota Department of Education Data Center.
Link:&lt;a href=&#34;http://education.state.mn.us/MDE/Data/&#34; class=&#34;uri&#34;&gt;http://education.state.mn.us/MDE/Data/&lt;/a&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/WAPS_files/waps.png&#34; /&gt; &lt;br&gt;
&lt;h5&gt;
Methods and Steps of Projects
&lt;/h5&gt;
&lt;p&gt;1) Data Inspection/Acquisition:&lt;br&gt;
The public data was collected by Alison Quam (representative from the WAPS district).
The data was made available in various PDF and Excel files, and the information was scattered across them.&lt;br&gt;&lt;/p&gt;
&lt;p&gt;2) Data Cleaning and Formatting&lt;br&gt;
First, most of the PDF files were converted to Excel with Tabula (Link: &lt;a href=&#34;http://tabula.technology/&#34; class=&#34;uri&#34;&gt;http://tabula.technology/&lt;/a&gt;) and an online tool (&lt;a href=&#34;http://pdftoexcel.com&#34; class=&#34;uri&#34;&gt;http://pdftoexcel.com&lt;/a&gt;).
Then they were cleaned into a proper format and stacked using Python (Pandas).&lt;br&gt;&lt;/p&gt;
&lt;p&gt;3) Data Exploration and Visualization &lt;br&gt;
This part of the project focused on addressing the questions provided by the WAPS representative (Alison Quam).
Tableau was used extensively to explore and visualize the data.
Primarily, I focused on answering the following questions.&lt;br&gt;
1. &lt;span style=&#34;color:purple&#34;&gt;&lt;strong&gt;I was curious about how enrollment and the capture rate (the rate of newborns later enrolling in kindergarten) are changing in the WAPS district.&lt;/strong&gt; &lt;/span&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;After a few meetings with the representative, I realized she was more curious about how the schools spend across different programs.&lt;br&gt;&lt;/p&gt;
&lt;p&gt;2. &lt;span style=&#34;color:purple&#34;&gt;&lt;strong&gt;How are the expenditure per average daily membership (the count of students served daily in schools) and the spending on various categories changing?&lt;/strong&gt;&lt;/span&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;The link to the Tableau file and the data is &lt;a href = &#34;https://public.tableau.com/views/WinonaAreaPublicSchoolsDataStory/FourthDashboard?:retry=yes&amp;:embed=y&amp;:display_count=yes&amp;:origin=viz_share_link&#34;&gt; &lt;b&gt;here&lt;/b&gt; &lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now, the visual story begins…&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/WAPS_files/Second_Dashboard.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/WAPS_files/Third_Dashboard.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/WAPS_files/Fourth_Dashboard.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;This project actually helped inform decision makers at the local level. Thus, I was able to contribute to something meaningful with my Python and Tableau skills.&lt;/code&gt;&lt;/p&gt;
&lt;div id=&#34;acknowledgement&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Acknowledgement&lt;/h4&gt;
&lt;p&gt;I would like to thank the WAPS representative and Prof. Silas Bergen for helping and guiding me to understand the terms and calculations already done in the reports, and Prof. Todd Iverson for helping me figure out the Python code for cleaning the data.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Animation: Internet Usage</title>
      <link>https://almostkapil.netlify.com/post/internetusage/</link>
      <pubDate>Thu, 15 Aug 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/internetusage/</guid>
      <description>


&lt;div id=&#34;how-internet-is-eating-the-world-internet-usage-animation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;How internet is eating the world? Internet Usage animation&lt;/h2&gt;
&lt;p&gt;Internet usage is a World Bank development indicator. In this project I used the World Bank dataset (available through the link below).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/internetusage_files/internetUsage.gif&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Link to the Tableau &lt;a href = &#34;https://public.tableau.com/shared/NXKC4HKX7?:display_count=yes&amp;:origin=viz_share_link&#34;&gt;worksheet&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Sankey diagrams for Bacteria and antibiotics</title>
      <link>https://almostkapil.netlify.com/post/sankey/</link>
      <pubDate>Wed, 24 Jul 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/sankey/</guid>
      <description>


&lt;div id=&#34;visually-classifying-bacteria-and-antibiotics&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Visually Classifying Bacteria and Antibiotics&lt;/h2&gt;
&lt;p&gt;After World War II, antibiotics earned the moniker “wonder drugs” for quickly treating previously-incurable diseases. Data was gathered to determine which drug worked best for each bacterial infection. Comparing drug performance was an enormous aid for practitioners and scientists alike. In the fall of 1951, Will Burtin published a &lt;a href = &#34;https://mbostock.github.io/protovis/ex/antibiotics-burtin.html&#34;&gt;graph &lt;/a&gt; showing the effectiveness of three popular antibiotics on &lt;B&gt;16&lt;/B&gt; different bacteria, measured in terms of minimum inhibitory concentration.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/sankey_files/avb.jpg&#34; alt=&#34;Image credit: Ask a biologist&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Image credit: Ask a biologist&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I am reproducing in ggplot2 this &lt;a href = &#34;https://www.dropbox.com/s/68ahri9xnnabce4/Bacteria-sigmoid-howto.docx?dl=0&#34;&gt;wonderful visualization&lt;/a&gt; by my professor (&lt;a href = &#34;http://driftlessdata.space/&#34;&gt; Silas Bergen&lt;/a&gt;), who made it in Tableau.&lt;/p&gt;
&lt;p&gt;Let’s bring in the dataset:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(knitr)
library(kableExtra)
df &amp;lt;- read.csv(&amp;quot;https://cdn.rawgit.com/plotly/datasets/5360f5cd/Antibiotics.csv&amp;quot;, stringsAsFactors = FALSE)
# Keep strings as plain characters: factors are rarely needed here.
# There are 16 bacteria, so give each one an ID to reference later.
df &amp;lt;- df %&amp;gt;% mutate(ID = row_number())&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;kable(head(df,n = 16))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Penicillin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Streptomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Neomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Gram
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
ID
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Salmonella schottmuelleri
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.80
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.090
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Proteus vulgaris
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.10
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.100
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Klebsiella pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.20
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella abortus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.020
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Pseudomonas aeruginosa
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.400
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Escherichia coli
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
100.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.40
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.100
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
7
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Salmonella (Eberthella) typhosa
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.40
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.008
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
8
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Aerobacter aerogenes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
870.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.600
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
9
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella antracis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.01
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.007
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Streptococcus fecalis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.100
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
11
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Staphylococcus aureus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.030
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.03
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
12
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Staphylococcus albus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.007
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.10
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
13
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Streptococcus hemolyticus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
14.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
14
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Streptococcus viridans
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.005
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
40.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
15
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Diplococcus pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.005
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
11.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
16
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Before proceeding further with the data manipulation, we need to think about the format of the visualization. Here we will build the visualization at the bacteria level, which means we need, for each bacterium, its Gram stain and the concentration of each drug required.&lt;/p&gt;
&lt;p&gt;If you look at the table above, we do have all the data we need, but not in the format we have in mind. We want one drug measurement per row, unlike the table above, where a single row holds all the information for a bacterium.
Let’s change the format of the data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Stack the three drug columns into Drug/Concentration pairs (wide to long)
key_value &amp;lt;- df %&amp;gt;% gather(&amp;quot;Drug&amp;quot;, &amp;quot;Concentration&amp;quot;, Penicillin:Neomycin)
kable(head(key_value))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Gram
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
ID
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Drug
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Concentration
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Salmonella schottmuelleri
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Proteus vulgaris
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Klebsiella pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella abortus
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Pseudomonas aeruginosa
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
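&lt;p&gt;As a side note, in tidyr 1.0.0 and later &lt;code&gt;gather()&lt;/code&gt; is superseded by &lt;code&gt;pivot_longer()&lt;/code&gt;. Below is a minimal sketch of the same reshaping, run on two rows copied from the first table so that it works on its own. The names &lt;code&gt;toy&lt;/code&gt; and &lt;code&gt;toy_long&lt;/code&gt; are only for illustration and are not used in the rest of the post.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyr)

# Two rows copied from the table of raw data above
toy &amp;lt;- data.frame(
  Bacteria     = c(&amp;quot;Brucella abortus&amp;quot;, &amp;quot;Brucella antracis&amp;quot;),
  Penicillin   = c(1, 0.001),
  Streptomycin = c(2, 0.01),
  Neomycin     = c(0.02, 0.007),
  Gram         = c(&amp;quot;negative&amp;quot;, &amp;quot;positive&amp;quot;),
  stringsAsFactors = FALSE
)

# One row per bacterium-drug pair, like the gathered table
toy_long &amp;lt;- pivot_longer(toy, cols = Penicillin:Neomycin,
                         names_to = &amp;quot;Drug&amp;quot;, values_to = &amp;quot;Concentration&amp;quot;)
nrow(toy_long)  # 2 bacteria x 3 drugs = 6 rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unlike &lt;code&gt;gather()&lt;/code&gt;, &lt;code&gt;pivot_longer()&lt;/code&gt; keeps the rows of each bacterium together, so the row order differs from the table above.&lt;/p&gt;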
&lt;p&gt;Okay, so now we need to add the minimum concentration for each bacterium, basically as a column on the gathered table above. The only thing to keep in mind is that we should group the rows by bacterium and select the minimum concentration within each group. We could have done this first (basically for each bacterium on the original table) and gathered afterwards, but this is how my thought process went.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df_min&amp;lt;- key_value  %&amp;gt;% 
  group_by(Bacteria) %&amp;gt;% summarise(Min = min(Concentration))
kable(head(df_min))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Min
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Aerobacter aerogenes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella abortus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.020
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella antracis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Diplococcus pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.005
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Escherichia coli
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.100
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Klebsiella pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
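&lt;p&gt;For comparison, here is a small sketch of the other order mentioned above: taking the minimum directly on the wide table, before any gathering. It assumes &lt;code&gt;pmin()&lt;/code&gt; from base R for the row-wise minimum, and it uses three rows copied from the first table so that it runs on its own. The names &lt;code&gt;toy&lt;/code&gt; and &lt;code&gt;toy_min&lt;/code&gt; are only for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

# Three rows copied from the table of raw data above
toy &amp;lt;- data.frame(
  Bacteria     = c(&amp;quot;Mycobacterium tuberculosis&amp;quot;, &amp;quot;Brucella abortus&amp;quot;, &amp;quot;Brucella antracis&amp;quot;),
  Penicillin   = c(800, 1, 0.001),
  Streptomycin = c(5, 2, 0.01),
  Neomycin     = c(2, 0.02, 0.007),
  stringsAsFactors = FALSE
)

# pmin() takes the minimum across the three drug columns, row by row
toy_min &amp;lt;- toy %&amp;gt;% mutate(Min = pmin(Penicillin, Streptomycin, Neomycin))
toy_min$Min  # 2, 0.02 and 0.001&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The values for Brucella abortus and Brucella antracis agree with the &lt;code&gt;df_min&lt;/code&gt; table above.&lt;/p&gt;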
&lt;p&gt;So now, let’s join the &lt;code&gt;df_min&lt;/code&gt; dataframe from above with &lt;code&gt;df&lt;/code&gt; so that the minimum is available in the main dataframe.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Attach the minimum concentration of each bacterium to the wide table
df &amp;lt;- inner_join(df, df_min, by = &amp;quot;Bacteria&amp;quot;)
# Label the drug whose concentration equals that minimum
df &amp;lt;- df %&amp;gt;% mutate(Best = case_when(
  Penicillin == Min ~ &amp;quot;Penicillin&amp;quot;,
  Neomycin == Min ~ &amp;quot;Neomycin&amp;quot;,
  Streptomycin == Min ~ &amp;quot;Streptomycin&amp;quot;
))&lt;/code&gt;&lt;/pre&gt;
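&lt;p&gt;One detail worth knowing: &lt;code&gt;case_when()&lt;/code&gt; returns the value of the first condition that is true, so ties are decided by the order of the conditions. The sketch below replays this on the Proteus vulgaris row, where Streptomycin and Neomycin are both 0.1. The data frame &lt;code&gt;tie&lt;/code&gt; is only for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

# The Proteus vulgaris row from the table, with its minimum already attached
tie &amp;lt;- data.frame(
  Bacteria = &amp;quot;Proteus vulgaris&amp;quot;,
  Penicillin = 3, Streptomycin = 0.1, Neomycin = 0.1, Min = 0.1,
  stringsAsFactors = FALSE
)

# Same conditions, in the same order, as in the code above
tie &amp;lt;- tie %&amp;gt;% mutate(Best = case_when(
  Penicillin == Min ~ &amp;quot;Penicillin&amp;quot;,
  Neomycin == Min ~ &amp;quot;Neomycin&amp;quot;,
  Streptomycin == Min ~ &amp;quot;Streptomycin&amp;quot;
))
tie$Best  # Neomycin, because it is tested before Streptomycin&lt;/code&gt;&lt;/pre&gt;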
&lt;p&gt;Now the data is ready and in the format we want:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;kable(head(df))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Penicillin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Streptomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Neomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Gram
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
ID
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Min
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Best
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5.0
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.00
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.00
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Salmonella schottmuelleri
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.8
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.09
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.09
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Proteus vulgaris
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.10
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.10
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Klebsiella pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.2
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.00
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.00
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella abortus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.0
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.02
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.02
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Pseudomonas aeruginosa
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.0
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.40
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.40
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Okay, this step might be a little unintuitive, but it makes sense if we think in terms of the &lt;code&gt;grammar of graphics&lt;/code&gt; philosophy.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;seq1 &amp;lt;- rep(1:16,each=100)
seq2 &amp;lt;-rep(seq(-6,6,length=100),16)
newdat &amp;lt;-data.frame(ID=seq1,T=seq2)
write.csv(newdat,&amp;quot;new_data.csv&amp;quot;,row.names=FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are making a new dataframe that holds the data points for the sigmoid curve (you could just draw a sigmoid curve in R directly, but this way it is linked to our data through ID).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Joining the data by ID
final_df&amp;lt;-inner_join(df,newdat,by = &amp;quot;ID&amp;quot;)
kable(head(final_df))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Penicillin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Streptomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Neomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Gram
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
ID
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Min
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Best
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
T
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-6.000000
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.878788
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.757576
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.636364
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.515151
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.393939
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Computing the sigmoid value for every T
final_df &amp;lt;- final_df %&amp;gt;% mutate(Sigmoid = 1/(1 + exp(-T)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Okay, now that we have the final dataset, we can enter ggplot2 land.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = final_df , aes(x = T , y = Sigmoid ))
p + geom_point() &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/sankey_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;1344&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Making best slope
#Different slopes will separate our curves
final_df&amp;lt;-final_df %&amp;gt;% mutate(bestBacSlope = case_when(
  Best ==&amp;quot;Streptomycin&amp;quot; ~ 4 - ID,
  Best ==&amp;quot;Neomycin&amp;quot; ~ 9 - ID,
  Best ==&amp;quot;Penicillin&amp;quot; ~ 14 - ID
))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;final_df&amp;lt;-final_df %&amp;gt;% mutate(curveBest = ID + bestBacSlope * Sigmoid)
#Figuring out ID and labels

label_df&amp;lt;-final_df %&amp;gt;% dplyr::select(c(ID, Bacteria))%&amp;gt;% group_by(Bacteria,ID) %&amp;gt;% summarise(count = n()) %&amp;gt;% dplyr::select(Bacteria,ID) %&amp;gt;% arrange(ID)&lt;/code&gt;&lt;/pre&gt;
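&lt;p&gt;To see why the slope trick works, here is a minimal sketch of the same arithmetic in plain Python rather than R. The helper names &lt;code&gt;sigmoid&lt;/code&gt; and &lt;code&gt;curve_best&lt;/code&gt; are made up for illustration. Because the slope is the antibiotic’s height minus &lt;code&gt;ID&lt;/code&gt;, each curve starts near its bacterium’s row on the left and ends near its antibiotic’s height on the right.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;import math

def sigmoid(t):
    # Logistic function, same formula as the Sigmoid column
    return 1 / (1 + math.exp(-t))

def curve_best(bacteria_id, target, t):
    # Mirrors curveBest = ID + bestBacSlope * Sigmoid, with bestBacSlope = target - ID
    slope = target - bacteria_id
    return bacteria_id + slope * sigmoid(t)

# Bacterium with ID 1 whose best antibiotic is Neomycin (target height 9)
left = curve_best(1, 9, -6)   # close to 1, the bacterium row
mid = curve_best(1, 9, 0)     # exactly halfway between 1 and 9
right = curve_best(1, 9, 6)   # close to 9, the Neomycin label
print(round(left, 2), mid, round(right, 2))   # 1.02 5.0 8.98&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The range of T from -6 to 6 is wide enough that the sigmoid is almost flat at both ends, which is why the curves appear to start and end exactly on their labels.&lt;/p&gt;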
&lt;p&gt;Below are the labels we will use on the y-axis.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;label_y= c(&amp;quot;Mycobacterium tuberculosis&amp;quot; ,  &amp;quot;Salmonella schottmuelleri&amp;quot;  ,    
           &amp;quot;Proteus vulgaris&amp;quot;        ,        &amp;quot;Klebsiella pneumoniae&amp;quot;  ,        
           &amp;quot;Brucella abortus&amp;quot;      ,          &amp;quot;Pseudomonas aeruginosa&amp;quot;    ,     
           &amp;quot;Escherichia coli&amp;quot;    ,            &amp;quot;Salmonella (Eberthella) typhosa&amp;quot;,
           &amp;quot;Aerobacter aerogenes&amp;quot;     ,       &amp;quot;Brucella antracis&amp;quot;    ,          
           &amp;quot;Streptococcus fecalis&amp;quot;    ,       &amp;quot;Staphylococcus aureus&amp;quot;      ,    
           &amp;quot;Staphylococcus albus&amp;quot;    ,        &amp;quot;Streptococcus hemolyticus&amp;quot;      ,
           &amp;quot;Streptococcus viridans&amp;quot;    ,      &amp;quot;Diplococcus pneumoniae&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now it’s &lt;code&gt;plotting time&lt;/code&gt;!&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Plotting the sigmoid plots
library(ggthemes)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sankey &amp;lt;- ggplot(data = final_df,
                 aes(x = T, y = curveBest, color = Gram, size = Min, alpha = 0.9, group = Bacteria)) +
  geom_line() +
  scale_fill_manual(values = c(&amp;quot;green&amp;quot;, &amp;quot;red&amp;quot;)) +
  scale_y_continuous(breaks = seq(1:16), labels = label_y) +
  theme(axis.title.y = element_blank(), axis.line.x = element_blank(),
        axis.ticks.x = element_blank(), axis.title.x = element_blank(),
        axis.text.x.bottom = element_blank()) +
  annotate(&amp;quot;text&amp;quot;, x = 6, y = 14, label = &amp;quot;Penicillin&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 6, y = 9, label = &amp;quot;Neomycin&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 6, y = 4, label = &amp;quot;Streptomycin&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 5.5, y = 15, label = &amp;quot;Best Antibiotics&amp;quot;, size = 5, colour = &amp;#39;blue&amp;#39;) +
  theme_minimal()&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sankey&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span id=&#34;fig:sankey&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/sankey_files/figure-html/sankey-1.png&#34; alt=&#34;Classification of Bacteria&#34; width=&#34;1344&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Classification of Bacteria
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Truck Factor</title>
      <link>https://almostkapil.netlify.com/post/truckfactor/</link>
      <pubDate>Tue, 23 Jul 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/truckfactor/</guid>
      <description>


&lt;div id=&#34;truck-factor&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Truck Factor&lt;/h2&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/TruckFactor_files/truckfactor.png&#34; /&gt;
Today I learned an interesting concept in software engineering and project management called the “Truck Factor”: the minimum number of contributors to a project who need to be hit by a truck before the project is crippled and left unfinished.&lt;/p&gt;
&lt;p&gt;My first thought was: why would you think of such an extreme example? It seems there is an emphasis on how important some people are to the project. This suggests a need for many heroes rather than a single hero. It’s a good metric to see how centralized your project is in terms of contributions. You would want many people on the project, helping each other, so that if one gets hit by a truck, the project is not in serious trouble.&lt;/p&gt;
&lt;p&gt;There is a whole discipline of organizing projects, called Agile methodology, if anyone is interested.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>An example preprint / working paper</title>
      <link>https://almostkapil.netlify.com/publication/preprint/</link>
      <pubDate>Sun, 07 Apr 2019 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/publication/preprint/</guid>
      <description>&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click the &lt;em&gt;Slides&lt;/em&gt; button above to demo Academic&amp;rsquo;s Markdown slides feature.
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Supplementary notes can be added here, including &lt;a href=&#34;https://sourcethemes.com/academic/docs/writing-markdown-latex/&#34; target=&#34;_blank&#34;&gt;code and math&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Slides</title>
      <link>https://almostkapil.netlify.com/slides/example/</link>
      <pubDate>Tue, 05 Feb 2019 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/slides/example/</guid>
      <description>

&lt;h1 id=&#34;welcome-to-slides&#34;&gt;Welcome to Slides&lt;/h1&gt;

&lt;p&gt;&lt;a href=&#34;https://sourcethemes.com/academic/&#34; target=&#34;_blank&#34;&gt;Academic&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;features&#34;&gt;Features&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Efficiently write slides in Markdown&lt;/li&gt;
&lt;li&gt;3-in-1: Create, Present, and Publish your slides&lt;/li&gt;
&lt;li&gt;Supports speaker notes&lt;/li&gt;
&lt;li&gt;Mobile friendly slides&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;controls&#34;&gt;Controls&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Next: &lt;code&gt;Right Arrow&lt;/code&gt; or &lt;code&gt;Space&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Previous: &lt;code&gt;Left Arrow&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;Home&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Finish: &lt;code&gt;End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Overview: &lt;code&gt;Esc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Speaker notes: &lt;code&gt;S&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Fullscreen: &lt;code&gt;F&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zoom: &lt;code&gt;Alt + Click&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/hakimel/reveal.js#pdf-export&#34; target=&#34;_blank&#34;&gt;PDF Export&lt;/a&gt;: &lt;code&gt;E&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;code-highlighting&#34;&gt;Code Highlighting&lt;/h2&gt;

&lt;p&gt;Inline code: &lt;code&gt;variable&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Code block:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;porridge = &amp;quot;blueberry&amp;quot;
if porridge == &amp;quot;blueberry&amp;quot;:
    print(&amp;quot;Eating...&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;math&#34;&gt;Math&lt;/h2&gt;

&lt;p&gt;In-line math: $x + y = z$&lt;/p&gt;

&lt;p&gt;Block math:&lt;/p&gt;

&lt;p&gt;$$
f\left( x \right) = \;\frac{{2\left( {x + 4} \right)\left( {x - 4} \right)}}{{\left( {x + 4} \right)\left( {x + 1} \right)}}
$$&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;fragments&#34;&gt;Fragments&lt;/h2&gt;

&lt;p&gt;Make content appear incrementally&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{{% fragment %}} One {{% /fragment %}}
{{% fragment %}} **Two** {{% /fragment %}}
{{% fragment %}} Three {{% /fragment %}}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Press &lt;code&gt;Space&lt;/code&gt; to play!&lt;/p&gt;

&lt;p&gt;&lt;span class=&#34;fragment &#34; &gt;
   One
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
   &lt;strong&gt;Two&lt;/strong&gt;
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
   Three
&lt;/span&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;A fragment can accept two optional parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;class&lt;/code&gt;: use a custom style (requires definition in custom CSS)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;weight&lt;/code&gt;: sets the order in which a fragment appears&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;speaker-notes&#34;&gt;Speaker Notes&lt;/h2&gt;

&lt;p&gt;Add speaker notes to your presentation&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{% speaker_note %}}
- Only the speaker can read these notes
- Press `S` key to view
{{% /speaker_note %}}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Press the &lt;code&gt;S&lt;/code&gt; key to view the speaker notes!&lt;/p&gt;

&lt;aside class=&#34;notes&#34;&gt;
  &lt;ul&gt;
&lt;li&gt;Only the speaker can read these notes&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;S&lt;/code&gt; key to view&lt;/li&gt;
&lt;/ul&gt;
&lt;/aside&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;themes&#34;&gt;Themes&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;black: Black background, white text, blue links (default)&lt;/li&gt;
&lt;li&gt;white: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;league: Gray background, white text, blue links&lt;/li&gt;
&lt;li&gt;beige: Beige background, dark text, brown links&lt;/li&gt;
&lt;li&gt;sky: Blue background, thin dark text, blue links&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;ul&gt;
&lt;li&gt;night: Black background, thick white text, orange links&lt;/li&gt;
&lt;li&gt;serif: Cappuccino background, gray text, brown links&lt;/li&gt;
&lt;li&gt;simple: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;solarized: Cream-colored background, dark green text, blue links&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;


&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/img/boards.jpg&#34;
  &gt;


&lt;h2 id=&#34;custom-slide&#34;&gt;Custom Slide&lt;/h2&gt;

&lt;p&gt;Customize the slide style and background&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{&amp;lt; slide background-image=&amp;quot;/img/boards.jpg&amp;quot; &amp;gt;}}
{{&amp;lt; slide background-color=&amp;quot;#0000FF&amp;quot; &amp;gt;}}
{{&amp;lt; slide class=&amp;quot;my-style&amp;quot; &amp;gt;}}
&lt;/code&gt;&lt;/pre&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;custom-css-example&#34;&gt;Custom CSS Example&lt;/h2&gt;

&lt;p&gt;Let&amp;rsquo;s make headers navy colored.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;assets/css/reveal_custom.css&lt;/code&gt; with:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-css&#34;&gt;.reveal section h1,
.reveal section h2,
.reveal section h3 {
  color: navy;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;hr /&gt;

&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;

&lt;p&gt;&lt;a href=&#34;https://discourse.gohugo.io&#34; target=&#34;_blank&#34;&gt;Ask&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://sourcethemes.com/academic/docs/&#34; target=&#34;_blank&#34;&gt;Documentation&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Watershed Quality in Minnesota</title>
      <link>https://almostkapil.netlify.com/post/lake_data/</link>
      <pubDate>Sat, 23 Jun 2018 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/lake_data/</guid>
      <description>


&lt;p&gt;&lt;strong&gt;Data Product Below…&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Link to the competition the data comes from: &lt;a href=&#34;http://minneanalytics.org/minnemudac-2016/data/&#34; class=&#34;uri&#34;&gt;http://minneanalytics.org/minnemudac-2016/data/&lt;/a&gt;
Our submission as freshmen:
&lt;B&gt; MINNEMUDAC:2016 &lt;/B&gt; &lt;br&gt;
&lt;I&gt;Water Quality Analytics Competition:&lt;/I&gt; &lt;br&gt;
A blog post on a data analytics competition that we recently participated in. We were ranked 5th out of the 19 teams that participated from regional universities of the Midwest USA.
This is the analysis report of an analytics competition that I participated in on Nov 4 and Nov 5 at Optum Technologies in Eden Prairie, Minnesota. The competition was organized by MinneAnalytics [the biggest analytics group in Minneapolis], MUDAC [the yearly analytics event of Winona State University] and Social Data Science [a Data Science for Social Good platform based in Minneapolis].&lt;/p&gt;
&lt;p&gt;Thanks to my wonderful team for the collaboration and to our professor for helping make this happen!
For the interactive dashboard of our report:
&lt;a href=&#34;https://public.tableau.com/profile/malek.hakim#!/vizhome/PARCELS_Story/WaterQualityVisualizationsintheTwinCities&#34; class=&#34;uri&#34;&gt;https://public.tableau.com/profile/malek.hakim#!/vizhome/PARCELS_Story/WaterQualityVisualizationsintheTwinCities&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;My first data analytics competition, and we got an honourable mention.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This project is divided into two parts.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data Cleaning and Data Management&lt;/li&gt;
&lt;li&gt;Data Product and Presentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first part of this project was done in Python. Here is the link to the code:
&lt;a href=&#34;https://github.com/KapilKhanal/DSCI430/blob/master/project_data_khanal.py&#34; class=&#34;uri&#34;&gt;https://github.com/KapilKhanal/DSCI430/blob/master/project_data_khanal.py&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The data was sponsored by MinneMUDAC as part of the Fall Data Challenge.&lt;/p&gt;
&lt;p&gt;The second part of the project focused on making a usable data product.
Here is the link to the code: &lt;a href=&#34;https://github.com/KapilKhanal/DSCI430/blob/master/app.R&#34; class=&#34;uri&#34;&gt;https://github.com/KapilKhanal/DSCI430/blob/master/app.R&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;You can use the data product below.&lt;/em&gt;
&lt;iframe width=&#34;800&#34; height=&#34;1000&#34; scrolling=&#34;no&#34; frameborder=&#34;yes&#34;  src=&#34;https://kapilkhanal.shinyapps.io/r_final_app/&#34;&gt; &lt;/iframe&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Testing Alcohol level</title>
      <link>https://almostkapil.netlify.com/post/beer/</link>
      <pubDate>Wed, 23 May 2018 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/beer/</guid>
      <description>


&lt;div id=&#34;is-there-really-5.4-alcohol-in-that-beer-brand&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Is there really 5.4% alcohol in that beer brand?&lt;/h2&gt;
&lt;p&gt;We all see that a lot of brands print on their label that the alcohol level is 5.4%. Let’s say we collected the alcohol percentage by volume for one such brand: we sampled beers randomly and measured the alcohol level ourselves.&lt;/p&gt;
&lt;p&gt;So we believe that the actual alcohol level should be 5.4%, but as beer consumers we sometimes feel it is not.&lt;/p&gt;
&lt;p&gt;If we measured one beer and found that it has 6.7%, we would immediately complain that the brand is lying to us about the 5.4%. They may argue that our measuring apparatus or technique is not 100% accurate. There is no way of finding out how inaccurate our measurement is without measuring multiple times or taking measurements of multiple beers. It might be the case that our measurement is 100% accurate and the beer has more alcohol than the company says. We don’t really know. Also, we can’t measure every single beer they ever manufactured.
This is the perfect time to test this with our statistical sense.
Below we have a list of measurements from different beers bought at random, some from midtown, some from Walmart.
Let’s do a t-test.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;level = c(5.1,5.2,6,7,5.01,5.0,6.5,5.6,5.2,6.1,6.2,5.0)
t.test(level, mu = 5.4)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##  One Sample t-test
## 
## data:  level
## t = 1.3139, df = 11, p-value = 0.2156
## alternative hypothesis: true mean is not equal to 5.4
## 95 percent confidence interval:
##  5.225010 6.093323
## sample estimates:
## mean of x 
##  5.659167&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The p-value is greater than 0.05, and the 95% confidence interval is [5.23, 6.09], which contains the claimed mean of 5.4. So we cannot reject the brand’s claim. Roughly speaking, if 100 people did this random sampling of beer and each calculated a confidence interval, about 95 of those intervals would contain the true mean.&lt;/p&gt;
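&lt;p&gt;As a sanity check, the numbers in the R output can be reproduced by hand. The sketch below uses plain Python instead of R, and the critical value 2.201 for 11 degrees of freedom is hard-coded from a t-table (an assumption of this sketch; R computes it exactly).&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;import math
import statistics

level = [5.1, 5.2, 6, 7, 5.01, 5.0, 6.5, 5.6, 5.2, 6.1, 6.2, 5.0]
mu0 = 5.4                      # alcohol level claimed on the label

n = len(level)
mean = statistics.mean(level)
se = statistics.stdev(level) / math.sqrt(n)   # standard error of the mean

t_stat = (mean - mu0) / se
t_crit = 2.201                 # two-sided 95% critical value for df = 11, from a t-table
ci_low = mean - t_crit * se
ci_high = mean + t_crit * se

# Should line up with the t statistic and interval in the R output
print(round(t_stat, 4), round(ci_low, 3), round(ci_high, 3))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The claimed value 5.4 sits between the two interval endpoints, which is the same conclusion as the p-value being above 0.05.&lt;/p&gt;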
&lt;p&gt;Enough with the statistical jargon? Okay, let’s enjoy the beer.&lt;img src=&#34;https://almostkapil.netlify.com/post/Beer_files/giphy.gif&#34; alt=&#34;Cold Beer and Confidence Interval&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Police  Data Challenge</title>
      <link>https://almostkapil.netlify.com/post/police-data-challenge/</link>
      <pubDate>Sun, 23 Jul 2017 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/police-data-challenge/</guid>
      <description>


&lt;div id=&#34;police-data-challenge-winner-recommendations&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Police Data Challenge: Winner Recommendations&lt;/h1&gt;
&lt;p&gt;February 1, 2018&lt;/p&gt;
&lt;p&gt;The Police Data Challenge contest brought talented high school and undergraduate students across the nation to show their passion for the good statistics can do.&lt;/p&gt;
&lt;p&gt;With the Police Foundation’s efforts to make the information available, the 70 teams used real crime data sets from Baltimore, Seattle and Cincinnati police departments to analyze the best possible solutions for safer communities.&lt;/p&gt;
&lt;p&gt;Check out below how the winning teams analyzed the best way to fight crime through statistics:&lt;/p&gt;
&lt;p&gt;Winona State University, Winona, MN
Jimmy Hickey, Kapil Khanal, Luke Peacock divided the crimes into more detailed categories than what the Seattle Police Department data provided. They used the crime types and locations to discover that gun related crimes are condensed in specific areas. Their recommendation was to raise public awareness of the times and locations of high crimes and include more police for patrol.&lt;/p&gt;
&lt;p&gt;Sponsored by: Silas Bergen&lt;/p&gt;
&lt;p&gt;Link to Presentation: &lt;a href=&#34;https://thisisstatistics.org/wp-content/uploads/2018/01/Best-Overall-College-.pptx&#34; class=&#34;uri&#34;&gt;https://thisisstatistics.org/wp-content/uploads/2018/01/Best-Overall-College-.pptx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The code is available here:
&lt;a href=&#34;https://github.com/KapilKhanal/Police-Data-Challenge&#34; class=&#34;uri&#34;&gt;https://github.com/KapilKhanal/Police-Data-Challenge&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Secretary Problem</title>
      <link>https://almostkapil.netlify.com/post/secretoryproblem/</link>
      <pubDate>Sun, 23 Jul 2017 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/secretoryproblem/</guid>
      <description>


&lt;div id=&#34;when-to-give-up-exploration-vs-exploitation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;&lt;B&gt;When to give up? Exploration vs Exploitation&lt;/B&gt;&lt;/h2&gt;
&lt;p&gt;A lot of hard-working students don’t end up being selected for scholarships. I should know, because I lost 3 years trying.&lt;/p&gt;
&lt;p&gt;Now I turn it into an information-theoretic game to find out when I should have quit the whole process.&lt;/p&gt;
&lt;p&gt;Assumption: your best score will get you the scholarship if you are one of the sufficiently prepared students.&lt;/p&gt;
&lt;p&gt;Say entrance exams are the game. We can all agree that they behave like one. If a student is well prepared, as indicated by practice questions and exams, then getting their name on the scholarship list is basically a game of chance. This is not to say that it is impossible, but given that we all have time and money constraints in our lives, when is the right time to quit? Thus, a player in this game is a sufficiently prepared, hard-working student. Everyone else has to become sufficiently prepared before playing this game.&lt;/p&gt;
&lt;p&gt;Now that we agree that getting your name on that list is a matter of chance, say that you are prepared to take the entrance exam 10 times, but that comes at a cost of time and money. The 10 exams you take can be ranked from your best score to your worst score, from 1 to 10. We can agree on one thing: your best score has the highest chance of getting the scholarship [which may not necessarily be true for everyone, but our player is a smart, hard-working, well-prepared one].&lt;/p&gt;
&lt;p&gt;Now, we take the exams one by one, and the score one gets is random above some cutoff [for me it was 90]. We can all relate to the “fact” that some questions are actually random and they determine our fate.&lt;/p&gt;
&lt;p&gt;So we don’t know which exam is going to give us our best score, and it is reasonable to assume that this is random. After each exam, we can certainly rank the exams taken so far from best to worst.&lt;/p&gt;
&lt;p&gt;The optimal solution is to take n/e exams before deciding whether to quit, and to quit after those n/e exams if the score on exam n/e + 1 is not better than the scores before it.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;import numpy as np
import matplotlib.pyplot as plt


def quit_candidate(n, reject=None):
    &amp;#39;&amp;#39;&amp;#39;Choose an exam to quit after from a list of n exams.
    Rank 1 = best time to quit, rank n = worst time to quit.
    reject is the percentage of exams to sit out first;
    by default the optimal n/e (about 37%) is used.&amp;#39;&amp;#39;&amp;#39;

    exams = np.arange(1, n+1)
    np.random.shuffle(exams)

    if reject is None:
        stop = int(round(n/np.e))
    else:
        stop = int(round(reject*n/100))
    best_from_rejected = np.min(exams[:stop])
    rest = exams[stop:]

    try:
        # first later exam that beats the best of the ones we sat out
        return rest[rest &amp;lt; best_from_rejected][0]
    except IndexError:
        # nothing better came along, so we are stuck with the last exam
        return exams[-1]

#Now let&amp;#39;s see if it actually holds by having 100,000 students take 100 exams each

sim = np.array([quit_candidate(n=100) for i in range(100000)])

plt.figure(figsize=(10, 6))
plt.hist(sim, bins=100)
plt.xticks(np.arange(0, 101, 10))
plt.ylim(0, 40000)
plt.xlabel(&amp;#39;Chosen candidate&amp;#39;)
plt.ylabel(&amp;#39;frequency&amp;#39;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://imrankhan17.github.io/pages/figs/secretary/fig1.png&#34; alt=&#34;img&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;img&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We see that the single most common outcome is quitting at the prime time [rank 1 is the prime time to quit].&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;best_candidate = []
for r in range(5, 101, 5):
    sim = np.array([quit_candidate(n=100, reject=r) for i in range(100000)])
    # np.histogram counts frequency of each candidate
    best_candidate.append(np.histogram(sim, bins=100)[0][0]/100000)

plt.figure(figsize=(10, 6))
plt.scatter(range(5, 101, 5), best_candidate)
plt.xlim(0, 100)
plt.xticks(np.arange(0, 101, 10))
plt.ylim(0, 0.4)
plt.xlabel(&amp;#39;% of candidates rejected&amp;#39;)
plt.ylabel(&amp;#39;Probability of choosing best candidate&amp;#39;)
plt.grid(True)
plt.axvline(100/np.e, ls=&amp;#39;--&amp;#39;, c=&amp;#39;black&amp;#39;)
plt.show()
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://imrankhan17.github.io/pages/figs/secretary/fig3.png&#34; alt=&#34;img&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;img&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Hence, the optimal time to quit is found by first taking 37% of the exams no matter what, and then quitting at the first exam whose rank is better (lower) than the best rank you got before.&lt;/p&gt;
&lt;p&gt;So I was ready to take 8 exams, and my scores were [84, 87, 88, 94, 90, 92].&lt;/p&gt;
&lt;p&gt;37% of 8 = 3.&lt;/p&gt;
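&lt;p&gt;The cutoff can be computed the same way as the &lt;code&gt;stop&lt;/code&gt; value in the simulation above. This is a minimal sketch; the helper name &lt;code&gt;observation_cutoff&lt;/code&gt; is hypothetical and not part of the original code.&lt;/p&gt;

```python
import math


def observation_cutoff(n):
    """Number of exams to sit before starting to look for a stopping point."""
    # Same rounding as the simulation: n divided by e, rounded to an integer.
    return int(round(n / math.e))


print(observation_cutoff(8))    # 3
print(observation_cutoff(100))  # 37
```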
&lt;p&gt;My score was improving after the 3rd exam, so I guess I was right to keep taking exams. But on the 5th exam my score went down, so I guess I should have quit then instead of taking one more exam. I lost another 3 months preparing for that.&lt;/p&gt;
&lt;p&gt;:-by Kapil Khanal&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Why do you have to wait more for the buses?</title>
      <link>https://almostkapil.netlify.com/post/inspectionparadox/</link>
      <pubDate>Sun, 23 Jul 2017 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/inspectionparadox/</guid>
      <description>


&lt;div id=&#34;average-for-group-vs-individual&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Average for group vs Individual&lt;/h1&gt;
&lt;p&gt;&lt;B&gt;Inspection Paradox&lt;/B&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/InspectionParadox_files/sajha.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Buses and trains are supposed to arrive at constant intervals, but in practice some intervals are longer than others, so the buses never follow the schedule exactly. There is always some randomness. With your luck, you might think you are more likely to arrive during a long interval. It turns out you are right: a random arrival is more likely to fall in a long interval because, well, it’s longer!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Let’s think of a scenario…&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Suppose a bus service in your city says a bus passes a station every 10 minutes. If you go to the station at a random time, you would expect to wait 5 minutes on average. But when the gaps between buses are irregular, you will more often wait longer than five minutes. If the buses arrive completely at random (a Poisson process), the average wait is actually 10 minutes.&lt;/p&gt;
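&lt;p&gt;The 10-minute figure can be checked with a small simulation. This is only a sketch, and it rests on one assumption that is not part of the scenario: the gaps between buses are exponentially distributed with a mean of 10 minutes, as in a Poisson process. With perfectly regular buses the average wait would be 5 minutes.&lt;/p&gt;

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

mean_gap = 10.0  # advertised minutes between buses

# Assumption: gaps between buses are exponentially distributed.
gaps = [random.expovariate(1.0 / mean_gap) for _ in range(200_000)]

# The average gap between buses, as the bus company would report it.
average_gap = sum(gaps) / len(gaps)

# A rider arriving at a uniformly random time lands in a gap with probability
# proportional to its length, and then waits half of that gap on average.
average_wait = sum(g * g for g in gaps) / (2.0 * sum(gaps))

print(round(average_gap, 2), round(average_wait, 2))
```

&lt;p&gt;Both numbers come out close to 10: the company sees a 10-minute average gap, and the rider sees a 10-minute average wait.&lt;/p&gt;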
&lt;p&gt;Another example of this paradox: most schools report their average class size, but from a student’s point of view that average is not accurate. Say there are 4 classes of sizes 75, 13, 12, and 10. Then the average the school will report is &lt;span class=&#34;math inline&#34;&gt;\((75 + 13 +12 +10)/4 = 27.5\)&lt;/span&gt;, but for you as a prospective student, the average is different.&lt;/p&gt;
&lt;p&gt;You are more likely to be in the room with 75 students, because more students sit there. Weighting each class by its size, the average class size a student experiences is &lt;span class=&#34;math inline&#34;&gt;\(((75*75) + (13*13)+(12*12)+(10*10))/110 = 54.89\)&lt;/span&gt;. Hence, the reported average is not the one you will experience. This kind of paradox happens everywhere.&lt;/p&gt;
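&lt;p&gt;The two averages can be reproduced in a few lines. This is a minimal sketch using the four class sizes from the example; the variable names are illustrative.&lt;/p&gt;

```python
class_sizes = [75, 13, 12, 10]

# Average reported by the school: every class counts once.
school_average = sum(class_sizes) / len(class_sizes)

# Average experienced by students: every student counts once,
# so a class of size s is counted s times.
total_students = sum(class_sizes)
student_average = sum(s * s for s in class_sizes) / total_students

print(school_average)             # 27.5
print(round(student_average, 2))  # 54.89
```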
&lt;p&gt;To generalize it in a more abstract way:&lt;/p&gt;
&lt;p&gt;This is one case where the perspective of the individual and that of the group differ. For the group, the average describes what happens, but for an individual in that group the same average may not make any sense.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>External Project</title>
      <link>https://almostkapil.netlify.com/project/research/</link>
      <pubDate>Wed, 27 Apr 2016 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/project/research/</guid>
      <description></description>
    </item>
    
    <item>
      <title>An example journal article</title>
      <link>https://almostkapil.netlify.com/publication/journal-article/</link>
      <pubDate>Tue, 01 Sep 2015 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/publication/journal-article/</guid>
      <description>&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click the &lt;em&gt;Cite&lt;/em&gt; button above to demo the feature to enable visitors to import publication metadata into their reference management software.
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click the &lt;em&gt;Slides&lt;/em&gt; button above to demo Academic&amp;rsquo;s Markdown slides feature.
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Supplementary notes can be added here, including &lt;a href=&#34;https://sourcethemes.com/academic/docs/writing-markdown-latex/&#34; target=&#34;_blank&#34;&gt;code and math&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>An example conference paper</title>
      <link>https://almostkapil.netlify.com/publication/conference-paper/</link>
      <pubDate>Mon, 01 Jul 2013 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/publication/conference-paper/</guid>
      <description>&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click the &lt;em&gt;Cite&lt;/em&gt; button above to demo the feature to enable visitors to import publication metadata into their reference management software.
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click the &lt;em&gt;Slides&lt;/em&gt; button above to demo Academic&amp;rsquo;s Markdown slides feature.
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Supplementary notes can be added here, including &lt;a href=&#34;https://sourcethemes.com/academic/docs/writing-markdown-latex/&#34; target=&#34;_blank&#34;&gt;code and math&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://almostkapil.netlify.com/post/dsci_final_project/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/post/dsci_final_project/</guid>
      <description>

&lt;h1 id=&#34;smoothing-of-images-and-edge-detection&#34;&gt;Smoothing of Images and Edge Detection&lt;/h1&gt;

&lt;pre&gt;&lt;code&gt;              `The one with Criminal Behind the Gaussian Noise`

            KAPIL KHANAL, DANIEL LEW, NILIMA PANDEY
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h3&gt;

&lt;p&gt;In a parallel universe, the Winona Police Department came to us to identify the location of a criminal. The &lt;code&gt;Criminal was hiding behind the gaussian noise&lt;/code&gt;.
We took three steps to help the department:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identify whether or not the noise was Gaussian.&lt;/li&gt;
&lt;li&gt;Smooth the photo using different techniques.&lt;/li&gt;
&lt;li&gt;Locate the important edges in the photograph.&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
%matplotlib inline
import cv2
import numpy as np
from matplotlib import pyplot as plt
#plt.style.use(&#39;ggplot&#39;)
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from IPython.display import display
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A histogram is a graph or plot that represents the distribution of the pixel intensities in an image.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;We focus on the RGB color space.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Calculating the histogram of an image is very useful because it gives an intuition about properties of the image such as the tonal range, the contrast, and the brightness.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;image = cv2.imread(&amp;quot;Unidentified.png&amp;quot;)

def histogram(image):
    &amp;quot;&amp;quot;&amp;quot; Plot per-channel intensity histograms above the image itself &amp;quot;&amp;quot;&amp;quot;
    fig, axs = plt.subplots(2, 1, figsize=(12, 11))
    channels = cv2.split(image)

    # OpenCV loads images in BGR channel order
    colors = (&amp;quot;b&amp;quot;, &amp;quot;g&amp;quot;, &amp;quot;r&amp;quot;)

    for channel, c in zip(channels, colors):
        hist = cv2.calcHist([channel], [0], None, [256], [0, 256])
        axs[0].plot(hist, color=c, linewidth=1.0)
    # Reverse the channel axis (BGR to RGB) so matplotlib shows true colors
    axs[1].imshow(image[:, :, ::-1])

histogram(image)

    
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&#34;DSCI_FINAL_Project_files/DSCI_FINAL_Project_4_0.png&#34; alt=&#34;png&#34; /&gt;&lt;/p&gt;

&lt;h3 id=&#34;gaussian-blur&#34;&gt;Gaussian Blur&lt;/h3&gt;

&lt;p&gt;We performed a Gaussian blur on the image. The blur removes some of the noise before the image is processed further. An appropriate sigma can be found by trial and error.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In Gaussian Blurring, a Gaussian Kernel is used to blur the image.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The cv2.GaussianBlur() function is used to blur the image.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The cv2.getGaussianKernel() function can be used to create a Gaussian kernel.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The width and height of the kernel must be specified, and both must be positive and odd.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The standard deviations in the X and Y directions, sigmaX and sigmaY, should also be specified.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;If only sigmaX is specified, sigmaY is taken to be the same as sigmaX.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;If both sigmaX and sigmaY are zero, they are calculated from the kernel size.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Gaussian blurring is highly effective at removing Gaussian noise from the image.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
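
&lt;p&gt;To make the idea of a Gaussian kernel concrete, here is a minimal pure-Python sketch that builds one. It is not the OpenCV implementation: the function name &lt;code&gt;gaussian_kernel_1d&lt;/code&gt; is hypothetical, and the sketch assumes a positive sigma and an odd kernel size.&lt;/p&gt;

```python
import math


def gaussian_kernel_1d(size, sigma):
    """Return `size` Gaussian weights, centred and normalised to sum to 1."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    centre = (size - 1) / 2
    weights = [
        math.exp(-((i - centre) ** 2) / (2 * sigma ** 2)) for i in range(size)
    ]
    total = sum(weights)
    return [w / total for w in weights]


kernel = gaussian_kernel_1d(5, sigma=1.0)

# A 2D Gaussian kernel is the outer product of the 1D kernel with itself.
kernel_2d = [[a * b for b in kernel] for a in kernel]

print([round(w, 4) for w in kernel])
```

&lt;p&gt;The centre weight is the largest and the weights fall off symmetrically, which is why pixels near the centre of the window dominate the blurred value.&lt;/p&gt;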

&lt;h3 id=&#34;gaussian-noise&#34;&gt;Gaussian Noise&lt;/h3&gt;

&lt;p&gt;&lt;img src=&#34;attachment:Screen%20Shot%202018-12-07%20at%208.27.39%20PM%202.png&#34; alt=&#34;Screen%20Shot%202018-12-07%20at%208.27.39%20PM%202.png&#34; /&gt;&lt;/p&gt;


&lt;p&gt;&lt;img src=&#34;attachment:Screen%20Shot%202018-12-07%20at%208.24.49%20PM%202.png&#34; alt=&#34;Screen%20Shot%202018-12-07%20at%208.24.49%20PM%202.png&#34; /&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def smoothed_gaussian(img,window_size,sigma,hist = True):
    &amp;quot;&amp;quot;&amp;quot; Function called by interact &amp;quot;&amp;quot;&amp;quot;
    img = cv2.GaussianBlur(img, (window_size, window_size), sigma)
    if hist:
        histogram(img)
    return img
y = interact(smoothed_gaussian,img = fixed(image),window_size = [3,5,7],sigma = widgets.IntSlider(min=0,max=10,step=1,value=4) )

&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;interactive(children=(Dropdown(description=&#39;window_size&#39;, options=(3, 5, 7), value=3), IntSlider(value=4, desc…
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&#34;median-blur&#34;&gt;Median Blur&lt;/h3&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def smoothed_median(img ,window_size,hist = True):
    &amp;quot;&amp;quot;&amp;quot; Function called by interact &amp;quot;&amp;quot;&amp;quot;
    img = cv2.medianBlur(img, window_size)
    if hist:
        histogram(img)
    return img
y = interact(smoothed_median,img = fixed(image),window_size  = widgets.IntSlider(min=1,max=10,step=2,value=3))

&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;interactive(children=(IntSlider(value=3, description=&#39;window_size&#39;, max=10, min=1, step=2), Checkbox(value=Tru…
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&#34;convolution&#34;&gt;Convolution&lt;/h3&gt;

&lt;p&gt;Convolution is an important operation in signal and image processing.
Convolution operates on two signals (in 1D) or two images (in 2D): you can think of one as the “input”
signal (or image), and the other (called the kernel) as a “filter” on the input image, producing
an output image (so convolution takes two images as input and produces a third
as output).&lt;/p&gt;
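
&lt;p&gt;As an illustration, the sketch below applies an averaging (box) kernel to a tiny grayscale image stored as a list of lists. It is a simplified stand-in, not what OpenCV does internally: the name &lt;code&gt;box_filter&lt;/code&gt; is hypothetical, and border pixels are dropped instead of padded, so the output is smaller than the input.&lt;/p&gt;

```python
def box_filter(image, window_size):
    """Average each pixel with its neighbours, leaving out the border."""
    half = window_size // 2
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(half, rows - half):
        out_row = []
        for c in range(half, cols - half):
            # Sum every pixel under the window centred on (r, c).
            total = sum(
                image[r + dr][c + dc]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            )
            # Every kernel weight is 1 / window_size**2.
            out_row.append(total / (window_size * window_size))
        output.append(out_row)
    return output


# A single bright pixel in the middle of a dark 5x5 image.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
smoothed = box_filter(image, 3)
print(smoothed)
```

&lt;p&gt;The bright pixel is spread evenly over the 3x3 output: every window contains it exactly once, so each output value is 9 / 9 = 1.0.&lt;/p&gt;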

&lt;p&gt;&lt;img src=&#34;http://machinelearninguru.com/_images/topics/computer_vision/basics/convolution/1.JPG&#34; alt=&#34;Alt text&#34; title=&#34;Title text&#34; /&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def smoothed_convolution(img,window_size,hist =True):
    kernel = np.ones((window_size,window_size))/(window_size*window_size)
    img = cv2.filter2D(img, -1,kernel)
    if hist:
        histogram(img)
    return img
y = interact(smoothed_convolution,img = fixed(image),window_size = widgets.IntSlider(min=0,max=10,step=1,value=3))


&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;interactive(children=(IntSlider(value=3, description=&#39;window_size&#39;, max=10), Checkbox(value=True, description=…
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Edge detection is one of the fundamental operations in image processing. It reduces the amount of data (pixels) to process while preserving the structural aspects of the image.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def showEdges(img,blur,thresholds, blurType,window_size,sigma):
    &amp;quot;&amp;quot;&amp;quot; Function called by interact &amp;quot;&amp;quot;&amp;quot;
    if blurType == &#39;Median&#39;:
        img = smoothed_median(img,window_size,hist = False)
    elif blurType == &#39;Gaussian&#39;:
        img = smoothed_gaussian(img,window_size,sigma,hist = False)
    elif blurType == &#39;Convolution&#39;:
        img = smoothed_convolution(img,window_size,hist = False)
    
    # Canny uses the lower and upper thresholds for hysteresis
    thresh1, thresh2 = thresholds
    edges = cv2.Canny(img, thresh1, thresh2)
    plt.imshow(edges)
    
rangeSlider = widgets.IntRangeSlider(
    value = [50, 200],
    min = 0,
    max = 255,
    step = 1,
    description = &#39;Thresholds&#39;,
    continuous_update = True
)
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;y = interact(showEdges,
         img = fixed(image),
         blur = True,
         thresholds = rangeSlider,
         blurType = [&#39;Median&#39;, &#39;Gaussian&#39;, &#39;Convolution&#39;],
         window_size = [3,5,7,9],
        sigma = widgets.IntSlider(min=0,max=10,step=1,value=4))
        

display(y)
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;interactive(children=(Checkbox(value=True, description=&#39;blur&#39;), IntRangeSlider(value=(50, 200), description=&#39;T…



&amp;lt;function __main__.showEdges(img, blur, thresholds, blurType, window_size, sigma)&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
</description>
    </item>
    
    <item>
      <title>Verifying empirical rule and Chebyshev&#39;s theorem</title>
      <link>https://almostkapil.netlify.com/post/untitled/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://almostkapil.netlify.com/post/untitled/</guid>
      <description>


&lt;div id=&#34;empirical-rule-and-chebyshevs-theorem&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Empirical rule and Chebyshev’s theorem&lt;/h2&gt;
&lt;p&gt;Let’s talk about a really simple but powerful concept: &lt;code&gt;Data Distributions&lt;/code&gt;. A data distribution is an abstract concept (a function) that gives the possible values of the data and how often each value is generated. When you want to talk about all the data from your experiments at once, you talk about the data distribution. A data distribution gives us the probability that a given value will be the output if we keep repeating the experiment.&lt;/p&gt;
&lt;p&gt;We rarely have the complete dataset from an experiment. So it is powerful to have an idea of how the data is distributed and which values occur more often than others. We can intuitively understand some distributions, like the heights of a population. We know there will be a few very short people and a few very tall people, but we are sure that most people will be somewhere in between. It is really convenient to know the spread and frequency of the data in advance.
&lt;img src=&#34;https://almostkapil.netlify.com/post/Untitled_files/normal.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The interesting thing is that there is more than one kind of distribution in the world, so it helps to be able to say something about the spread of the data without knowing which kind we have. There is a famous theorem that gives us an idea of how our data is distributed. It’s called Chebyshev’s theorem.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/Untitled_files/chebyshev.jpg&#34; alt=&#34;image credit: libretext&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;image credit: libretext&lt;/p&gt;
&lt;/div&gt;
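&lt;p&gt;It says that at least 3/4 of our data lies within two standard deviations of the mean, whatever the shape of the distribution. More generally, at least &lt;span class=&#34;math inline&#34;&gt;\(1 - 1/k^2\)&lt;/span&gt; of the data lies within &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt; standard deviations, for any &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt; greater than 1.&lt;/p&gt;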
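&lt;p&gt;Chebyshev’s bound, at least 1 - 1/k^2 of the data within k standard deviations of the mean, can be checked on data that is far from bell-shaped. The sketch below is in Python, not the R used in the rest of this post, and it uses a simulated exponential sample. That sample is an assumption made for illustration and has nothing to do with the stock data.&lt;/p&gt;

```python
import bisect
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Assumption: a deliberately skewed (exponential) sample, to show that the
# bound does not rely on the data being bell-shaped.
data = [random.expovariate(1.0) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.pstdev(data)

# Sorted distances from the mean, measured in standard deviations.
z_scores = sorted(abs(x - mean) / sd for x in data)

results = {}
for k in (2, 3):
    # bisect_right counts how many distances are at most k.
    observed = bisect.bisect_right(z_scores, k) / len(z_scores)
    chebyshev_bound = 1 - 1 / k ** 2
    results[k] = (chebyshev_bound, observed)
    print(k, round(chebyshev_bound, 3), round(observed, 3))
```

&lt;p&gt;The observed proportions sit above the bound. Chebyshev’s theorem only promises a minimum, so real data usually does better.&lt;/p&gt;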
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(knitr)
library(kableExtra)
stock&amp;lt;- read.csv(&amp;quot;~/OneDrive - MNSCU/FALL 2019/MathStat/Data/Stock Trade.csv&amp;quot;,stringsAsFactors = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s clean the name,&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;stock&amp;lt;- stock %&amp;gt;% select(percentStock = X..of.Shares.Outstanding)&lt;/code&gt;&lt;/pre&gt;
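&lt;p&gt;The empirical rule says that, for approximately normal data, about 68% of the data will be within one standard deviation of the mean, about 95% within two, and about 99.7% within three.&lt;/p&gt;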
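&lt;p&gt;For an exactly normal distribution, the proportions within one, two, and three standard deviations of the mean can be computed directly. Here is a short Python sketch using the standard library’s &lt;code&gt;statistics.NormalDist&lt;/code&gt;; it is independent of the R code in this post.&lt;/p&gt;

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass between -k and +k standard deviations.
proportions = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}

for k, within in proportions.items():
    print(k, round(within, 4))
```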
&lt;p&gt;This function below:
1. standardizes the data &lt;br&gt;
2. counts data within &lt;code&gt;z&lt;/code&gt; standard deviations &lt;br&gt;
3. outputs the proportion&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data_within&amp;lt;- function(df, z){
  func_normalize&amp;lt;-function(x){(x-mean(x))/sd(x)}
  # Remove the outlying data point: keep only values below 11
  df&amp;lt;-df %&amp;gt;% filter(percentStock&amp;lt;11)
  df_scaled&amp;lt;- df %&amp;gt;% mutate(percentStock_normal = func_normalize(percentStock)) %&amp;gt;% filter(abs(percentStock_normal)&amp;lt;z)
  proportion = dim(df_scaled)[1]/dim(df)[1]
  return (round(proportion,2))
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s collect the output in a small tibble.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tb&amp;lt;- tibble(
  first_std_dev = data_within(stock,1),
  second_std_dev = data_within(stock,2),
  third_std_dev = data_within(stock,3),
)

kable(tb)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
first_std_dev
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
second_std_dev
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
third_std_dev
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.64
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.92
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We can also test if our function is working correctly,&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(testthat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: package &amp;#39;testthat&amp;#39; was built under R version 3.5.2&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;normal_generated = tibble(percentStock = rnorm(10,mean = 6.2,sd = 1.2))

#Testing our function
tb_test&amp;lt;- tibble(
  first_std_dev = data_within(normal_generated,1),
  second_std_dev = data_within(normal_generated,2),
  third_std_dev = data_within(normal_generated,3),
)


kable(tb_test)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
first_std_dev
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
second_std_dev
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
third_std_dev
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.7
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;testthat::expect_gt(tb_test$first_std_dev, 0.68,label = &amp;quot;data proportion within first deviation&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Hence, our function is working correctly. Note that the data is randomly generated every time the code is run, so with only 10 points the proportions vary from run to run and this check can occasionally fail.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
