<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Data Visualization | 𝚃𝚛𝚊𝚗𝚜𝚙𝚘𝚗𝚜𝚝𝚎𝚛</title>
    <link>https://almostkapil.netlify.com/tags/data-visualization/</link>
      <atom:link href="https://almostkapil.netlify.com/tags/data-visualization/index.xml" rel="self" type="application/rss+xml" />
    <description>Data Visualization</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>© 2018 Kapil Khanal</copyright><lastBuildDate>Wed, 21 Aug 2019 21:13:14 -0500</lastBuildDate>
    <image>
      <url>https://almostkapil.netlify.com/img/aph-salt-spring-zoom.jpg</url>
      <title>Data Visualization</title>
      <link>https://almostkapil.netlify.com/tags/data-visualization/</link>
    </image>
    
    <item>
      <title>CMU Sport Analytics Projects Slideshows</title>
      <link>https://almostkapil.netlify.com/post/baseball/</link>
      <pubDate>Wed, 21 Aug 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/baseball/</guid>
      <description>


&lt;div id=&#34;my-cmsac-experience&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;My CMSAC Experience&lt;/h2&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Jeremy Sanchez &lt;a href=&#34;https://twitter.com/_jsanchez1?ref_src=twsrc%5Etfw&#34;&gt;@_jsanchez1&lt;/a&gt;, Nathan Moss &lt;a href=&#34;https://twitter.com/CMU_Stats?ref_src=twsrc%5Etfw&#34;&gt;@CMU_Stats&lt;/a&gt;, and Kapil Khanal @Kapil71001628 working on soccer with &lt;a href=&#34;https://twitter.com/kpelechrinis?ref_src=twsrc%5Etfw&#34;&gt;@kpelechrinis&lt;/a&gt; &lt;a href=&#34;https://t.co/Ij2eFiJ8eH&#34;&gt;pic.twitter.com/Ij2eFiJ8eH&lt;/a&gt;&lt;/p&gt;&amp;mdash; CMU Stats &amp;amp; DS (@CMU_Stats) &lt;a href=&#34;https://twitter.com/CMU_Stats/status/1154857646429749248?ref_src=twsrc%5Etfw&#34;&gt;July 26, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;

&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/Baseball_files/CMSAC.jpeg&#34; alt=&#34;Presenting our Final Project&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Presenting our Final Project&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;my-first-project-at-cmu-statistics-sport-analytics-camp&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;My First Project at CMU Statistics: Sport Analytics Camp&lt;/h3&gt;
&lt;p&gt;The first week was a good review of basic dplyr syntax and the ggplot2 philosophy. I like how the professors and TAs are always there for us. I ran into all sorts of problems, from small data manipulation issues to points being masked in scatterplots. &lt;br&gt;
These are practice projects before we work on research projects of our own choice.&lt;/p&gt;
&lt;p&gt;Here is the schedule of this summer camp.
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Last day of &lt;a href=&#34;https://twitter.com/hashtag/CMSACamp?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#CMSACamp&lt;/a&gt;! Jam-packed summer full of &lt;a href=&#34;https://twitter.com/hashtag/datascience?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#datascience&lt;/a&gt;, &lt;a href=&#34;https://twitter.com/hashtag/sportsanalytics?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#sportsanalytics&lt;/a&gt;, speakers, tours, amazing partners &lt;a href=&#34;https://twitter.com/TruMediaSports?ref_src=twsrc%5Etfw&#34;&gt;@TruMediaSports&lt;/a&gt; &lt;a href=&#34;https://twitter.com/albert_larcada?ref_src=twsrc%5Etfw&#34;&gt;@albert_larcada&lt;/a&gt; &lt;a href=&#34;https://twitter.com/stat_sam?ref_src=twsrc%5Etfw&#34;&gt;@stat_sam&lt;/a&gt; &lt;a href=&#34;https://twitter.com/penguins?ref_src=twsrc%5Etfw&#34;&gt;@penguins&lt;/a&gt; &lt;a href=&#34;https://twitter.com/kpelechrinis?ref_src=twsrc%5Etfw&#34;&gt;@kpelechrinis&lt;/a&gt;  &lt;a href=&#34;https://twitter.com/Stat_Ron?ref_src=twsrc%5Etfw&#34;&gt;@Stat_Ron&lt;/a&gt; &lt;a href=&#34;https://twitter.com/NFL?ref_src=twsrc%5Etfw&#34;&gt;@NFL&lt;/a&gt; &lt;a href=&#34;https://twitter.com/albertbayes?ref_src=twsrc%5Etfw&#34;&gt;@albertbayes&lt;/a&gt; &lt;a href=&#34;https://twitter.com/bklynmaks?ref_src=twsrc%5Etfw&#34;&gt;@bklynmaks&lt;/a&gt; &lt;a href=&#34;https://twitter.com/ATLHawks?ref_src=twsrc%5Etfw&#34;&gt;@ATLHawks&lt;/a&gt; &lt;a href=&#34;https://twitter.com/acthomasca?ref_src=twsrc%5Etfw&#34;&gt;@acthomasca&lt;/a&gt; &lt;a href=&#34;https://twitter.com/sarah_malle?ref_src=twsrc%5Etfw&#34;&gt;@sarah_malle&lt;/a&gt; &lt;a href=&#34;https://twitter.com/nflscrapR?ref_src=twsrc%5Etfw&#34;&gt;@nflscrapR&lt;/a&gt; &lt;a href=&#34;https://twitter.com/Pirates?ref_src=twsrc%5Etfw&#34;&gt;@Pirates&lt;/a&gt; &lt;a href=&#34;https://t.co/feG2cZnGQR&#34;&gt;pic.twitter.com/feG2cZnGQR&lt;/a&gt;&lt;/p&gt;&amp;mdash; CMU 
Stats &amp;amp; DS (@CMU_Stats) &lt;a href=&#34;https://twitter.com/CMU_Stats/status/1154736616646283264?ref_src=twsrc%5Etfw&#34;&gt;July 26, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
&lt;/p&gt;
&lt;div id=&#34;project1-baseball&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Project1: Baseball&lt;/h6&gt;
&lt;p&gt;For this project, we looked into how similar the top 5 hitters in baseball are. Below are the slides we presented at the camp.
&lt;iframe src=&#34;https://docs.google.com/presentation/d/e/2PACX-1vRe0m6fqFga0-BulFHn6_wXG7qKkp1G7Y8zpTAS6nrDmH69k_574IjHaGK_MrQxagGN_mQtNBF33uvo/embed?start=true&amp;loop=true&amp;delayms=2000&#34; frameborder=&#34;0&#34; width=&#34;860&#34; height=&#34;469&#34; allowfullscreen=&#34;true&#34; mozallowfullscreen=&#34;true&#34; webkitallowfullscreen=&#34;true&#34;&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;p&gt;Similarly, for project 2 we worked with a tennis dataset.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;project-2-tennis&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Project 2: Tennis&lt;/h6&gt;
&lt;p&gt;&lt;b&gt;What factors are best at predicting point ratio for a match during a Grand Slam?
&lt;/b&gt;
&lt;iframe src=&#34;https://docs.google.com/presentation/d/e/2PACX-1vTA379JFEoMzgqXndoeEaU3ZIC0P1P8f0d8dna7Je4QsnKWGDKW6-sTWIU5FTCvPAEynta1l1NWI1Na/embed?start=true&amp;loop=true&amp;delayms=3000&#34; frameborder=&#34;0&#34; width=&#34;860&#34; height=&#34;469&#34; allowfullscreen=&#34;true&#34; mozallowfullscreen=&#34;true&#34; webkitallowfullscreen=&#34;true&#34;&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;project-3simulating-office-environment-in-analytics&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Project 3: Simulating an Office Environment in Analytics &lt;br&gt;&lt;/h6&gt;
&lt;p&gt;This was a non-technical project, but the most fun one. Our class of 16 students was split into 4 analytics departments for a hypothetical team. There are a lot of rumours on the player market: some players who are essential for our team are up for grabs, and we also have to let some players go. The crazy part of this project is that time is ticking. Our boss changes her decision every few minutes as the market changes. We have to come up with some numbers to back up the decisions we are about to recommend.&lt;/p&gt;
&lt;p&gt;Below are the slides we prepared within 10 minutes, with many factors changing while we were working on them.
&lt;iframe src=&#34;https://docs.google.com/presentation/d/e/2PACX-1vQnqAWxyL5W47Sd0FPRzSNdeWEq9uXE2T_3S_enY2YUNgIIhiJvFTbrA9tDmVztZtENwd9Rv3aT6QBV/embed?start=true&amp;loop=true&amp;delayms=3000&#34; frameborder=&#34;0&#34; width=&#34;960&#34; height=&#34;569&#34; allowfullscreen=&#34;true&#34; mozallowfullscreen=&#34;true&#34; webkitallowfullscreen=&#34;true&#34;&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;p&gt;This project shed some light on the life of working data scientists and data analysts. It’s not always about fancy graphs or complicated, tongue-twisting models. I learned that we start with the problem we have, collect the necessary data, build new metrics that fit the problem, graph the problem and the proposed solutions so that they are intuitive to all concerned parties, and then use models to test our hypotheses and make decisions.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;project-3&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Project 3:&lt;/h3&gt;
&lt;p&gt;This is the final project I worked on for half of this summer camp. It is actually a work in progress. We will be changing a lot of things (I guess that is research: &lt;code&gt;change until you no longer find a justification to change&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;I chose this because soccer has interested me since childhood. I played a lot of soccer in high school, and it still fascinates me with all the complexity involved from a mathematical, statistical, and data point of view.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/Baseball_files/kapilcmu.jpeg&#34; alt=&#34;Presenting to classmates before the poster presentation&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Presenting to classmates before the poster presentation&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As I tweeted, I am extremely grateful to CMU Stats for letting me experience life as a data scientist.
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;The best 8 weeks. I got to learn so many things and enjoy Pittsburgh. The spirit at &lt;a href=&#34;https://twitter.com/CMU_Stats?ref_src=twsrc%5Etfw&#34;&gt;@CMU_Stats&lt;/a&gt;  is amazing, like a Stat-Disney land. Thank you for everything especially all those free foods and tickets to game and Kennywood. &lt;a href=&#34;https://twitter.com/hashtag/CMSACamp?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#CMSACamp&lt;/a&gt;&lt;/p&gt;&amp;mdash; Kapil.Khanal (@almost_kapil) &lt;a href=&#34;https://twitter.com/almost_kapil/status/1154867446177763328?ref_src=twsrc%5Etfw&#34;&gt;July 26, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Data Dashboard for StockX Contest</title>
      <link>https://almostkapil.netlify.com/post/stockx/</link>
      <pubDate>Wed, 21 Aug 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/stockx/</guid>
      <description>


&lt;div id=&#34;stockx-data-contest-2019&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Stock&lt;span style=&#34;color:green&#34;&gt;&lt;b&gt;X&lt;/b&gt;&lt;/span&gt; Data Contest 2019&lt;/h2&gt;
&lt;p&gt;The &lt;a href = &#34;https://stockx.com/news/the-2019-data-contest/&#34;&gt;StockX Challenge&lt;/a&gt; is a call for data and sneaker nerds to have fun.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/stockX_files/sneaker.jpg&#34; alt=&#34;source: stockX&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;source: stockX&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea is this: they give you a bunch of original StockX sneaker data, then you crunch the numbers and come up with the coolest, smartest, most compelling story you can tell. It can be literally anything you want. A theory, an insight, even just a really original data visualization. It could be a novel hypothesis about resale prices you’ve always wanted to test. Or maybe it’s just a beautiful chart to visualize the data. It can be on any subject – sneakers, brands, buyers, or even StockX itself. Whatever you find interesting, just follow your bliss.&lt;/p&gt;
&lt;p&gt;I also took a shot at coming up with something useful. Below is my finished data dashboard. &lt;br&gt;&lt;/p&gt;
&lt;div id=&#34;my-data-dashboard-for-stockx&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;My Data Dashboard for Stock&lt;span style=&#34;color:green&#34;&gt;&lt;b&gt;X&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/index_files/stockX.png&#34; alt=&#34;Dashboard&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Dashboard&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The Tableau worksheet is available &lt;a href = &#34;https://public.tableau.com/views/StockX_0/Dashboard1?:embed=y&amp;:display_count=yes&amp;:origin=viz_share_link&#34;&gt;here&lt;/a&gt;. &lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Calculations on the Dashboard&lt;/em&gt; &lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Price ratio&lt;/code&gt;: ratio of sale price to retail price for each sneaker. &lt;br&gt;
&lt;code&gt;Weeks&lt;/code&gt;: (Order Date - Release Date), converted to weeks.&lt;br&gt;
&lt;code&gt;Median Price ratio&lt;/code&gt;: the median is used to reduce the effect of the uneven date coverage (2017 and 2019 are not as complete as 2018) and of the differing sales counts across sneakers.&lt;br&gt;
&lt;code&gt;Color&lt;/code&gt;: the color scale for the two brands is consistent across every plot that involves brands.&lt;br&gt;&lt;/p&gt;
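&lt;p&gt;For anyone who wants to redo these calculations outside Tableau, here is a minimal sketch in R with dplyr. The column names (&lt;code&gt;sale_price&lt;/code&gt;, &lt;code&gt;retail_price&lt;/code&gt;, &lt;code&gt;order_date&lt;/code&gt;, &lt;code&gt;release_date&lt;/code&gt;) and the five sample rows are made up for illustration and are not the contest file’s actual fields. Rounding the date difference down to whole weeks is also an assumption.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

# Hypothetical sample rows; the column names are assumptions.
sales &amp;lt;- data.frame(
  brand        = c(&amp;quot;Yeezy&amp;quot;, &amp;quot;Yeezy&amp;quot;, &amp;quot;Off-White&amp;quot;, &amp;quot;Off-White&amp;quot;, &amp;quot;Off-White&amp;quot;),
  order_date   = as.Date(c(&amp;quot;2018-01-15&amp;quot;, &amp;quot;2018-01-17&amp;quot;, &amp;quot;2018-03-08&amp;quot;, &amp;quot;2018-03-10&amp;quot;, &amp;quot;2018-03-12&amp;quot;)),
  release_date = as.Date(c(&amp;quot;2018-01-01&amp;quot;, &amp;quot;2018-01-01&amp;quot;, &amp;quot;2018-03-01&amp;quot;, &amp;quot;2018-03-01&amp;quot;, &amp;quot;2018-03-01&amp;quot;)),
  sale_price   = c(440, 330, 1000, 800, 600),
  retail_price = c(220, 220, 200, 200, 200)
)

dashboard &amp;lt;- sales %&amp;gt;%
  mutate(
    # Price ratio: sale price relative to retail price
    price_ratio = sale_price / retail_price,
    # Weeks: days between order and release, rounded down to whole weeks
    weeks = as.numeric(order_date - release_date) %/% 7
  ) %&amp;gt;%
  group_by(brand, weeks) %&amp;gt;%
  # The median is less sensitive than the mean to uneven sales counts
  summarise(median_price_ratio = median(price_ratio), .groups = &amp;quot;drop&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result has one row per brand and week, which is the shape needed for a plot of price ratio against weeks since release.&lt;/p&gt;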
&lt;p&gt;&lt;b&gt;1) Order of Sneakers by brand for weeks from Release Date&lt;/b&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;This plot shows the total count of orders for the different sneakers of the two brands.
Both brands receive orders before the release date. Off-White has more orders than Yeezy in the dataset.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;its-interesting-how-the-demand-of-yeezy-increased-at-around-90-weeks-after-the-release-of-the-shoes.&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;It’s interesting how the demand for Yeezy increased at around &lt;code&gt;90 weeks&lt;/code&gt; after the release of the shoes.&lt;/h2&gt;
&lt;p&gt;2)&lt;b&gt;Ratio of Sales Price to Retail Price For each Brand by Weeks&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;This plot looks at how the ratio of sale price to retail price for each brand relates to the weeks after the release
date. Clearly, both brands’ sale prices are higher than the retail price. The ratio for Off-White increases in
general regardless of the individual sneaker, while the ratio for Yeezy is somewhat noisy but follows
a trend like Off-White’s. Both brands’ price ratios increase after the release date.&lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;3) Distribution of Median Sale Price given the retail price for each brand&lt;/b&gt;&lt;br&gt;
This plot looks in detail at how the median sale price is distributed for each sneaker. It shows the distribution of
median sale price for the top 28 sneakers that were sold at least 5 times over retail price.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;4) Median Price and States&lt;/b&gt; &lt;br&gt;&lt;/p&gt;
&lt;p&gt;This plot looks at the median price ratio for all the states. The color scale encodes the ratio, and
the size of each sneaker shows total sales relative to other states. Which states usually pay more for
sneakers? Clearly, Delaware, Vermont, and Utah had some sales with a high price ratio. States like California and
New York have a lot of sales, as shown by their relative sizes. The relative size is calculated by taking the
log of total sales in each state. States like Wyoming have fewer sales and also a lower price ratio.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Winona Area Public Schools: Community Contribution</title>
      <link>https://almostkapil.netlify.com/post/waps/</link>
      <pubDate>Wed, 21 Aug 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/waps/</guid>
      <description>


&lt;div id=&#34;winona-area-public-schools-data-visualization&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;&lt;span style=&#34;color:purple&#34;&gt;Winona Area Public Schools Data Visualization &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Introduction&lt;/code&gt;:&lt;br&gt;
This project addresses the need to communicate public school data to community members in a meaningful way. It also makes the data available to the general public in a proper and usable format. &lt;br&gt;&lt;/p&gt;
&lt;p&gt;There has been a wider discussion regarding the budget issue in Winona area schools. Here is
&lt;a href = &#34;https://www.winonadailynews.com/news/local/what-will-waps-cut-board-to-weigh-new-options-for/article_23c25b9f-7365-5aa2-b370-1ed251eb8231.html&#34;
 width=&#34;645&#34; height=&#34;955&#34;&gt;the article &lt;/a&gt;&lt;/p&gt;
Primarily, this project focused on cleaning and visualizing the Enrollment, Expenditures, and Staffing History reports of the Winona Area Public Schools (WAPS) district, which are publicly available through the Minnesota Department of Education Data Center.
Link: &lt;a href=&#34;http://education.state.mn.us/MDE/Data/&#34; class=&#34;uri&#34;&gt;http://education.state.mn.us/MDE/Data/&lt;/a&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/WAPS_files/waps.png&#34; /&gt; &lt;br&gt;
&lt;h5&gt;
Methods and Steps of Projects
&lt;/h5&gt;
&lt;p&gt;1) Data Inspection/Acquisition:&lt;br&gt;
The public data was collected by Alison Quam (representative from the WAPS district).
The data was made available in various PDF and Excel files, and the information was scattered across them.&lt;br&gt;&lt;/p&gt;
&lt;p&gt;2) Data Cleaning and Formatting&lt;br&gt;
First, most of the PDF files were converted to Excel with Tabula (link: &lt;a href=&#34;http://tabula.technology/&#34; class=&#34;uri&#34;&gt;http://tabula.technology/&lt;/a&gt;) and an online tool (&lt;a href=&#34;http://pdftoexcel.com&#34; class=&#34;uri&#34;&gt;http://pdftoexcel.com&lt;/a&gt;).
Then they were cleaned into a proper format and stacked using Python (Pandas).&lt;br&gt;&lt;/p&gt;
&lt;p&gt;3) Data Exploration and Visualization &lt;br&gt;
This part of the project focused on addressing the questions provided by the WAPS representative (Alison Quam).
Tableau was used extensively to explore and visualize the data.
Primarily, I focused on answering the following questions.&lt;br&gt;
1. &lt;span style=&#34;color:purple&#34;&gt;&lt;strong&gt;I was curious about how enrollment and the capture rate (the rate of newborns who go on to enroll in kindergarten) are changing in the WAPS district.&lt;/strong&gt; &lt;/span&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;After a few meetings with the representative, I realized she was more curious about how schools spend across different programs.&lt;br&gt;&lt;/p&gt;
&lt;p&gt;2. &lt;span style=&#34;color:purple&#34;&gt;&lt;strong&gt;How are the expenditure per average daily membership (the count of students served daily in schools) and the spending in various categories changing?&lt;/strong&gt;&lt;/span&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;The link to the tableau file and the data is &lt;a href = &#34;https://public.tableau.com/views/WinonaAreaPublicSchoolsDataStory/FourthDashboard?:retry=yes&amp;:embed=y&amp;:display_count=yes&amp;:origin=viz_share_link&#34;&gt; &lt;b&gt;here&lt;/b&gt; &lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now, the visual story begins…&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/WAPS_files/Second_Dashboard.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/WAPS_files/Third_Dashboard.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/WAPS_files/Fourth_Dashboard.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;This project actually helped inform decision makers at the local level. Thus, I was able to contribute to something meaningful with my Python and Tableau skills.&lt;/code&gt;&lt;/p&gt;
&lt;div id=&#34;acknowledgement&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Acknowledgement&lt;/h4&gt;
&lt;p&gt;I would like to thank the WAPS representative and Prof. Silas Bergen for helping and guiding me to understand the terms and calculations already done in the reports, and Prof. Todd Iverson for helping me figure out the Python code for cleaning the data.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Animation: Internet Usage</title>
      <link>https://almostkapil.netlify.com/post/internetusage/</link>
      <pubDate>Thu, 15 Aug 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/internetusage/</guid>
      <description>


&lt;div id=&#34;how-internet-is-eating-the-world-internet-usage-animation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;How is the internet eating the world? Internet usage animation&lt;/h2&gt;
&lt;p&gt;Internet usage is a World Bank development indicator. In this project I used the World Bank dataset (available via the link provided below).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/internetusage_files/internetUsage.gif&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Link to the tableau &lt;a href = &#34;https://public.tableau.com/shared/NXKC4HKX7?:display_count=yes&amp;:origin=viz_share_link&#34;&gt;worksheet&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Sankey diagrams for Bacteria and antibiotics</title>
      <link>https://almostkapil.netlify.com/post/sankey/</link>
      <pubDate>Wed, 24 Jul 2019 21:13:14 -0500</pubDate>
      <guid>https://almostkapil.netlify.com/post/sankey/</guid>
      <description>


&lt;div id=&#34;visually-classifying-bacteria-and-antibiotics&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Visually Classifying Bacteria and Antibiotics&lt;/h2&gt;
&lt;p&gt;After World War II, antibiotics earned the moniker “wonder drugs” for quickly treating previously-incurable diseases. Data was gathered to determine which drug worked best for each bacterial infection. Comparing drug performance was an enormous aid for practitioners and scientists alike. In the fall of 1951, Will Burtin published a &lt;a href = &#34;https://mbostock.github.io/protovis/ex/antibiotics-burtin.html&#34;&gt;graph &lt;/a&gt; showing the effectiveness of three popular antibiotics on &lt;B&gt;16&lt;/B&gt; different bacteria, measured in terms of minimum inhibitory concentration.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/sankey_files/avb.jpg&#34; alt=&#34;Image credit: Ask a Biologist&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Image credit: Ask a Biologist&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I am reproducing in ggplot2 this &lt;a href = &#34;https://www.dropbox.com/s/68ahri9xnnabce4/Bacteria-sigmoid-howto.docx?dl=0&#34;&gt;wonderful visualization&lt;/a&gt; by my professor (&lt;a href = &#34;http://driftlessdata.space/&#34;&gt;Silas Bergen&lt;/a&gt;), who made it in Tableau.&lt;/p&gt;
&lt;p&gt;Let’s bring in the dataset:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(knitr)
library(kableExtra)
# Read strings as plain character vectors; factors are rarely needed here.
df &amp;lt;- read.csv(&amp;quot;https://cdn.rawgit.com/plotly/datasets/5360f5cd/Antibiotics.csv&amp;quot;, stringsAsFactors = FALSE)
# There are 16 bacteria, so give each one an ID to reference later.
df &amp;lt;- df %&amp;gt;% mutate(ID = seq_len(16))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;kable(head(df,n = 16))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Penicillin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Streptomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Neomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Gram
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
ID
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Salmonella schottmuelleri
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.80
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.090
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Proteus vulgaris
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.10
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.100
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Klebsiella pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.20
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella abortus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.020
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Pseudomonas aeruginosa
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.400
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Escherichia coli
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
100.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.40
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.100
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
7
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Salmonella (Eberthella) typhosa
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.40
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.008
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
8
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Aerobacter aerogenes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
870.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.600
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
9
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella antracis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.01
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.007
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Streptococcus fecalis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.100
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
11
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Staphylococcus aureus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.030
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.03
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
12
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Staphylococcus albus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.007
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.10
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
13
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Streptococcus hemolyticus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
14.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
14
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Streptococcus viridans
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.005
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
40.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
15
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Diplococcus pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.005
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
11.00
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10.000
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
positive
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
16
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Before proceeding further with the data manipulation, we need to think about the format of the visualization. Here we will be making our visualization at the bacteria level, which means we will have information for each bacteria: its Gram stain and the concentration of drug required.&lt;/p&gt;
&lt;p&gt;If you look at the table above, we do have all the data we need, but not in the format we want. We want one piece of information per row (one drug concentration for one bacteria), unlike above, where a single row holds all the information for a bacteria.
Let’s change the format of the data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Reshape from wide to long: one row per bacteria-drug pair.
key_value &amp;lt;- df %&amp;gt;% gather(&amp;quot;Drug&amp;quot;, &amp;quot;Concentration&amp;quot;, Penicillin:Neomycin)
kable(head(key_value))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Gram
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
ID
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Drug
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Concentration
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Salmonella schottmuelleri
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Proteus vulgaris
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Klebsiella pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella abortus
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Pseudomonas aeruginosa
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Penicillin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
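&lt;p&gt;As a side note, the tidyr documentation now marks &lt;code&gt;gather()&lt;/code&gt; as superseded by &lt;code&gt;pivot_longer()&lt;/code&gt;. Below is a minimal sketch of the same reshaping, assuming tidyr 1.0.0 or later. &lt;code&gt;toy_df&lt;/code&gt; and &lt;code&gt;toy_long&lt;/code&gt; are hypothetical names, and the values are copied from two rows of the data in this post. Note that &lt;code&gt;pivot_longer()&lt;/code&gt; keeps the rows of one bacteria together, so the row order differs from the &lt;code&gt;gather()&lt;/code&gt; output.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyr)

#Toy stand-in for df (hypothetical name), two rows only
toy_df &amp;lt;- data.frame(
  Bacteria     = c(&amp;quot;Mycobacterium tuberculosis&amp;quot;, &amp;quot;Salmonella schottmuelleri&amp;quot;),
  Penicillin   = c(800, 10),
  Streptomycin = c(5, 0.8),
  Neomycin     = c(2, 0.09),
  Gram         = c(&amp;quot;negative&amp;quot;, &amp;quot;negative&amp;quot;),
  ID           = 1:2
)

#One row per bacteria and drug
toy_long &amp;lt;- pivot_longer(
  toy_df,
  cols      = Penicillin:Neomycin,
  names_to  = &amp;quot;Drug&amp;quot;,
  values_to = &amp;quot;Concentration&amp;quot;
)
toy_long&lt;/code&gt;&lt;/pre&gt;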
&lt;p&gt;Now we need to add the minimum concentration for each bacteria, which is basically a new column computed from the gathered table above. The only thing to keep in mind is that we should group by bacteria and then select the minimum concentration. We could have computed this first on the original table and then gathered as above, but this is my thought process.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df_min&amp;lt;- key_value  %&amp;gt;% 
  group_by(Bacteria) %&amp;gt;% summarise(Min = min(Concentration))
kable(head(df_min))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Min
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Aerobacter aerogenes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella abortus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.020
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella antracis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.001
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Diplococcus pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.005
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Escherichia coli
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.100
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Klebsiella pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.000
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
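&lt;p&gt;For comparison, here is a rough sketch of the other route mentioned above: computing the minimum directly on the wide table with base R’s &lt;code&gt;pmin()&lt;/code&gt;, which takes the element-wise minimum across vectors. &lt;code&gt;toy_df&lt;/code&gt; is a hypothetical two-row stand-in for &lt;code&gt;df&lt;/code&gt;, with values taken from the data in this post.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Toy stand-in for the wide table (hypothetical name)
toy_df &amp;lt;- data.frame(
  Bacteria     = c(&amp;quot;Mycobacterium tuberculosis&amp;quot;, &amp;quot;Brucella abortus&amp;quot;),
  Penicillin   = c(800, 1),
  Streptomycin = c(5, 2),
  Neomycin     = c(2, 0.02)
)

#Row-wise minimum across the three drug columns
toy_df$Min &amp;lt;- pmin(toy_df$Penicillin, toy_df$Streptomycin, toy_df$Neomycin)
toy_df&lt;/code&gt;&lt;/pre&gt;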
&lt;p&gt;Now let’s join the &lt;code&gt;df_min&lt;/code&gt; dataframe from above with &lt;code&gt;df&lt;/code&gt; so that the minimum is available in the dataframe.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df&amp;lt;- inner_join(df,df_min,by = &amp;quot;Bacteria&amp;quot;)
df&amp;lt;- df %&amp;gt;% mutate(Best = case_when(
  Penicillin == Min~ &amp;quot;Penicillin&amp;quot;,
  Neomycin == Min~ &amp;quot;Neomycin&amp;quot;,
  Streptomycin == Min~ &amp;quot;Streptomycin&amp;quot;
))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now the data is ready and in the format we want:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;kable(head(df))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Penicillin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Streptomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Neomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Gram
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
ID
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Min
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Best
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5.0
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.00
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.00
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Salmonella schottmuelleri
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
10
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.8
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.09
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.09
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Proteus vulgaris
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.10
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.10
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Klebsiella pneumoniae
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.2
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.00
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1.00
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Brucella abortus
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.0
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.02
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.02
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Pseudomonas aeruginosa
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
850
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2.0
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.40
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
0.40
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
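&lt;p&gt;One thing to be aware of: &lt;code&gt;case_when()&lt;/code&gt; returns the first branch whose condition is true, so when two drugs tie at the minimum only the first one listed is kept. Proteus vulgaris, for example, needs 0.1 of both Streptomycin and Neomycin and ends up labelled Neomycin. If you wanted to see the ties, a possible sketch on the long format is below; &lt;code&gt;toy_long&lt;/code&gt; and &lt;code&gt;toy_best&lt;/code&gt; are hypothetical names and this is not part of the pipeline in this post.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

#Toy long-format data for one bacteria (hypothetical name)
toy_long &amp;lt;- data.frame(
  Bacteria      = &amp;quot;Proteus vulgaris&amp;quot;,
  Drug          = c(&amp;quot;Penicillin&amp;quot;, &amp;quot;Streptomycin&amp;quot;, &amp;quot;Neomycin&amp;quot;),
  Concentration = c(3, 0.1, 0.1),
  stringsAsFactors = FALSE
)

#Keep every drug that reaches the minimum and list them together
toy_best &amp;lt;- toy_long %&amp;gt;%
  group_by(Bacteria) %&amp;gt;%
  filter(Concentration == min(Concentration)) %&amp;gt;%
  summarise(Best = paste(Drug, collapse = &amp;quot; / &amp;quot;))
toy_best&lt;/code&gt;&lt;/pre&gt;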
&lt;p&gt;This step might be a little unintuitive, but it makes sense if we think in terms of the &lt;code&gt;grammar of graphics&lt;/code&gt; philosophy.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;seq1 &amp;lt;- rep(1:16,each=100)
seq2 &amp;lt;-rep(seq(-6,6,length=100),16)
newdat &amp;lt;-data.frame(ID=seq1,T=seq2)
write.csv(newdat,&amp;quot;new_data.csv&amp;quot;,row.names=FALSE)&lt;/code&gt;&lt;/pre&gt;
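&lt;p&gt;We are making a new dataframe that holds the data points for the sigmoid curve: 100 values of &lt;code&gt;T&lt;/code&gt; between -6 and 6 for each of the 16 bacteria. You could draw a sigmoid curve in R directly, but this way it is linked to our data through &lt;code&gt;ID&lt;/code&gt;.&lt;/p&gt;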
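&lt;p&gt;The same grid can also be built in one call with base R’s &lt;code&gt;expand.grid()&lt;/code&gt;, which returns every combination of its arguments with the first argument varying fastest. This is only an alternative sketch; &lt;code&gt;curve_grid&lt;/code&gt; is a hypothetical name.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Every combination of 100 T values and 16 IDs; T varies fastest
curve_grid &amp;lt;- expand.grid(T = seq(-6, 6, length.out = 100), ID = 1:16)

#Reorder the columns to match newdat
curve_grid &amp;lt;- curve_grid[, c(&amp;quot;ID&amp;quot;, &amp;quot;T&amp;quot;)]
head(curve_grid)&lt;/code&gt;&lt;/pre&gt;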
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Joining the data by ID
final_df&amp;lt;-inner_join(df,newdat,by = &amp;quot;ID&amp;quot;)
kable(head(final_df))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Bacteria
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Penicillin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Streptomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Neomycin
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Gram
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
ID
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Min
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Best
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
T
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-6.000000
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.878788
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.757576
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.636364
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.515151
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Mycobacterium tuberculosis
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
800
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
negative
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Neomycin
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
-5.393939
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Computing the sigmoid value for every T
final_df &amp;lt;- final_df %&amp;gt;% mutate(Sigmoid = 1/(1 + exp(-T)))&lt;/code&gt;&lt;/pre&gt;
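&lt;p&gt;The expression &lt;code&gt;1/(1 + exp(-T))&lt;/code&gt; is the standard logistic function, which R also provides as &lt;code&gt;plogis()&lt;/code&gt; in the stats package. A small check, with hypothetical variable names, that the two agree on the range used here (the values run from about 0.0025 to about 0.9975):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;t_vals  &amp;lt;- seq(-6, 6, length.out = 100)

manual  &amp;lt;- 1 / (1 + exp(-t_vals))   #formula used in this post
builtin &amp;lt;- plogis(t_vals)           #logistic function from stats

all.equal(manual, builtin)
range(manual)&lt;/code&gt;&lt;/pre&gt;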
&lt;p&gt;Now that we have the final dataset, we can step into ggplot2 land.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = final_df , aes(x = T , y = Sigmoid ))
p + geom_point() &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://almostkapil.netlify.com/post/sankey_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;1344&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Making the best slope
#Different slopes will separate our curves
final_df&amp;lt;-final_df %&amp;gt;% mutate(bestBacSlope = case_when(
  Best ==&amp;quot;Streptomycin&amp;quot; ~ 4 - ID,
  Best ==&amp;quot;Neomycin&amp;quot; ~ 9 - ID,
  Best ==&amp;quot;Penicillin&amp;quot; ~ 14 - ID
))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;final_df&amp;lt;-final_df %&amp;gt;% mutate(curveBest = ID + bestBacSlope * Sigmoid)
#Figuring out ID and labels

label_df &amp;lt;- final_df %&amp;gt;% distinct(Bacteria, ID) %&amp;gt;% arrange(ID)&lt;/code&gt;&lt;/pre&gt;
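&lt;p&gt;To see why &lt;code&gt;ID + bestBacSlope * Sigmoid&lt;/code&gt; gives the shape we want: the sigmoid is close to 0 at &lt;code&gt;T = -6&lt;/code&gt; and close to 1 at &lt;code&gt;T = 6&lt;/code&gt;, so each curve starts near its own &lt;code&gt;ID&lt;/code&gt; on the left and ends near the y position of its best drug (4, 9 or 14) on the right. A quick worked example for Mycobacterium tuberculosis (ID 1, best drug Neomycin), using hypothetical variable names:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sigmoid &amp;lt;- function(t) 1 / (1 + exp(-t))

id     &amp;lt;- 1            #Mycobacterium tuberculosis
target &amp;lt;- 9            #y position used for Neomycin
slope  &amp;lt;- target - id  #same as bestBacSlope

#Curve height at the left edge, the middle and the right edge
curve_y &amp;lt;- id + slope * sigmoid(c(-6, 0, 6))
round(curve_y, 2)  #1.02 5.00 8.98&lt;/code&gt;&lt;/pre&gt;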
&lt;p&gt;Below are the labels we will use on the y-axis:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;label_y= c(&amp;quot;Mycobacterium tuberculosis&amp;quot; ,  &amp;quot;Salmonella schottmuelleri&amp;quot;  ,    
           &amp;quot;Proteus vulgaris&amp;quot;        ,        &amp;quot;Klebsiella pneumoniae&amp;quot;  ,        
           &amp;quot;Brucella abortus&amp;quot;      ,          &amp;quot;Pseudomonas aeruginosa&amp;quot;    ,     
           &amp;quot;Escherichia coli&amp;quot;    ,            &amp;quot;Salmonella (Eberthella) typhosa&amp;quot;,
           &amp;quot;Aerobacter aerogenes&amp;quot;     ,       &amp;quot;Brucella antracis&amp;quot;    ,          
           &amp;quot;Streptococcus fecalis&amp;quot;    ,       &amp;quot;Staphylococcus aureus&amp;quot;      ,    
           &amp;quot;Staphylococcus albus&amp;quot;    ,        &amp;quot;Streptococcus hemolyticus&amp;quot;      ,
           &amp;quot;Streptococcus viridans&amp;quot;    ,      &amp;quot;Diplococcus pneumoniae&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
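&lt;p&gt;Typing the labels by hand works, but they could also be pulled from the data so that they always follow the &lt;code&gt;ID&lt;/code&gt; order. A minimal sketch on a toy dataframe, where &lt;code&gt;toy_df&lt;/code&gt; and &lt;code&gt;toy_labels&lt;/code&gt; are hypothetical names:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

#Toy stand-in with the rows deliberately out of order
toy_df &amp;lt;- data.frame(
  Bacteria = c(&amp;quot;Proteus vulgaris&amp;quot;, &amp;quot;Mycobacterium tuberculosis&amp;quot;, &amp;quot;Salmonella schottmuelleri&amp;quot;),
  ID       = c(3, 1, 2),
  stringsAsFactors = FALSE
)

#Sort by ID and extract the names as a character vector
toy_labels &amp;lt;- toy_df %&amp;gt;% arrange(ID) %&amp;gt;% pull(Bacteria)
toy_labels&lt;/code&gt;&lt;/pre&gt;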
&lt;p&gt;Now it’s &lt;code&gt;plotting time&lt;/code&gt;!&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Plotting the sigmoid plots
library(ggthemes)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sankey &amp;lt;- ggplot(data = final_df, aes(x = T, y = curveBest, color = Gram, size = Min, group = Bacteria)) +
  geom_line(alpha = 0.9) +
  scale_color_manual(values = c(&amp;quot;green&amp;quot;, &amp;quot;red&amp;quot;)) + #colors for the two Gram groups
  scale_y_continuous(breaks = 1:16, labels = label_y) +
  annotate(&amp;quot;text&amp;quot;, x = 6, y = 14, label = &amp;quot;Penicillin&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 6, y = 9, label = &amp;quot;Neomycin&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 6, y = 4, label = &amp;quot;Streptomycin&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 5.5, y = 15, label = &amp;quot;Best Antibiotics&amp;quot;, size = 5, colour = &amp;#39;blue&amp;#39;) +
  theme_minimal() + #base theme first, then the axis tweaks, so the tweaks are kept
  theme(axis.title.y = element_blank(), axis.line.x = element_blank(), axis.ticks.x = element_blank(),
        axis.title.x = element_blank(), axis.text.x.bottom = element_blank())&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sankey&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span id=&#34;fig:sankey&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://almostkapil.netlify.com/post/sankey_files/figure-html/sankey-1.png&#34; alt=&#34;Classification of Bacteria&#34; width=&#34;1344&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Classification of Bacteria
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
