<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="https://paulgoetze.com/">
  <id>https://paulgoetze.com/</id>
  <title>Paul Götze’s tech blog</title>
  <updated>2021-10-25T00:00:00Z</updated>
  <link rel="alternate" href="https://paulgoetze.com/" type="text/html"/>
  <link rel="self" href="https://paulgoetze.com/blog/feed.xml" type="application/atom+xml"/>
  <author>
    <name>Paul Götze</name>
    <uri>https://paulgoetze.com</uri>
  </author>
  <entry>
    <id>tag:paulgoetze.com,2021-10-25:/2021/10/25/testing-your-json-api-in-ruby-with-dry-rb/</id>
    <title type="html">Testing your JSON API in Ruby with dry-rb</title>
    <published>2021-10-25T00:00:00Z</published>
    <updated>2021-10-25T00:00:00Z</updated>
    <link rel="alternate" href="https://medium.com/@paulgoetze/dffb6a9bccdf?sk=13183f3f6e1f96a9db459fdf10f40360" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">Using dry-schema and dry-validate to keep endpoint tests readable and maintainable</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2020-08-02:/2020/08/02/using-mjml-in-elixir-and-phoenix/</id>
    <title type="html">Using MJML in Elixir &amp; Phoenix</title>
    <published>2020-08-02T00:00:00Z</published>
    <updated>2020-08-02T00:00:00Z</updated>
    <link rel="alternate" href="https://medium.com/p/ca27050ff26f?sk=c92e6c2e58246868aaf7b4b231d4d501" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">How to create responsive HTML emails for your Phoenix app with ease</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2020-06-24:/2020/06/24/experiencing-the-stroop-effect-with-a-ruby-cli/</id>
    <title type="html">Experiencing The Stroop Effect with a Ruby CLI</title>
    <published>2020-06-24T00:00:00Z</published>
    <updated>2020-06-24T00:00:00Z</updated>
    <link rel="alternate" href="https://paulgoetze.com/2020/06/24/experiencing-the-stroop-effect-with-a-ruby-cli/" type="text/html"/>
    <content type="html">&lt;div class="images-panel images-panel--full-width"&gt;
  &lt;img alt="Stroop test" src="/images/post/stroop.webp"&gt;
&lt;/div&gt;

&lt;p&gt;You’ve probably seen these weird lists of color words before. Words whose text color does not match their content. Maybe you’ve even tried to read the text colors out loud quickly and have struggled to do&amp;nbsp;so?&lt;/p&gt;

&lt;p&gt;These lists are part of psychological tests to demonstrate the so-called &lt;a href="https://en.wikipedia.org/wiki/Stroop_effect"&gt;Stroop effect&lt;/a&gt;. It is named after the American psychologist John Ridley Stroop, who first published the effect in English in&amp;nbsp;1935.&lt;/p&gt;

&lt;p&gt;Wikipedia tells&amp;nbsp;us:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The Stroop effect is the delay in reaction time between congruent and incongruent&amp;nbsp;stimuli.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But what does that actually mean?&amp;nbsp;🤔&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;I built a tiny command line interface with Ruby – just because it was fun and so that you can experience the Stroop effect yourself. You can install and run it&amp;nbsp;with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;gem install stroop
stroop
&lt;/code&gt;&lt;/pre&gt;

&lt;hr&gt;

&lt;h1 id="congruent-incongruent"&gt;Congruent?&amp;nbsp;Incongruent?&lt;/h1&gt;

&lt;p&gt;A Stroop test consists of a small task, like reading color words. This task can be done with different kinds of stimuli, namely: neutral, congruent, and&amp;nbsp;incongruent.&lt;/p&gt;

&lt;p&gt;Let’s take a list of color words with a neutral stimulus, which means that all words are written in the same text&amp;nbsp;color:&lt;/p&gt;

&lt;div class="images-panel single"&gt;
  &lt;img alt="Color word box - neutral stimulus" src="/images/post/stroop-neutral.webp"&gt;
  &lt;figcaption&gt;Neutral Stimulus – All color words are written in a neutral text color (black, or here gray)&lt;/figcaption&gt;
&lt;/div&gt;

&lt;p&gt;Reading all words out loud should not be hard to&amp;nbsp;do.&lt;/p&gt;

&lt;p&gt;Next let’s take the same list of color words, but let’s print them in a color that matches the respective text. That’s a congruent&amp;nbsp;stimulus:&lt;/p&gt;

&lt;div class="images-panel single"&gt;
  &lt;img alt="Color word box - congruent stimulus" src="/images/post/stroop-congruent.webp"&gt;
  &lt;figcaption&gt;Congruent Stimulus – The word content matches the text color.&lt;/figcaption&gt;
&lt;/div&gt;

&lt;p&gt;If you read the words again, it might even be easier than with the neutral&amp;nbsp;stimulus.&lt;/p&gt;

&lt;p&gt;Last, let’s print the same word list again, but this time in a text color that differs from the color each word names. That’s an incongruent&amp;nbsp;stimulus:&lt;/p&gt;

&lt;div class="images-panel single"&gt;
  &lt;img alt="Color word box - incongruent stimulus" src="/images/post/stroop-incongruent.webp"&gt;
  &lt;figcaption&gt;Incongruent Stimulus – The word content and the text color are different.&lt;/figcaption&gt;
&lt;/div&gt;

&lt;p&gt;Reading the color words might feel a bit slower now. However, the Stroop effect becomes much more apparent if we slightly change our&amp;nbsp;task:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Instead of reading out loud the color words, try saying the text color for each word as fast as&amp;nbsp;possible!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With the incongruent stimulus you probably need much longer now and you might even make a mistake or&amp;nbsp;two.&lt;/p&gt;

&lt;p&gt;And that’s the Stroop effect: the delay that appears if we speak the text color out loud for the incongruent word list compared to the congruent word&amp;nbsp;list.&lt;/p&gt;
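
&lt;p&gt;In code terms, the three stimuli differ only in how the text color for each word is chosen. Here is a minimal Python sketch of that idea. It is not how the stroop gem is implemented; the &lt;code&gt;stimulus&lt;/code&gt; function and the color list are made up for&amp;nbsp;illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import random

# Hypothetical color set, chosen for this sketch only.
COLORS = ["red", "green", "blue", "yellow", "magenta", "cyan"]


def stimulus(words, kind, rng):
    """Pair every color word with the color it should be printed in."""
    pairs = []
    for word in words:
        if kind == "neutral":
            color = "gray"  # one text color for all words
        elif kind == "congruent":
            color = word  # text color matches the word
        elif kind == "incongruent":
            # any text color except the one the word names
            color = rng.choice([c for c in COLORS if c != word])
        else:
            raise ValueError(f"unknown stimulus kind: {kind}")
        pairs.append((word, color))
    return pairs


rng = random.Random(42)  # fixed seed, so the word list is reproducible
words = [rng.choice(COLORS) for _ in range(5)]
for kind in ("neutral", "congruent", "incongruent"):
    print(kind, stimulus(words, kind, rng))
&lt;/code&gt;&lt;/pre&gt;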

&lt;p&gt;Isn’t it fun – and really&amp;nbsp;exhausting?&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;🌱 The above word lists were generated using the &lt;a href="https://github.com/paulgoetze/stroop"&gt;stroop gem&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;stroop neutral
stroop congruent
stroop incongruent
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;with the seed:&amp;nbsp;134414671674842647560860440639024210370&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(&lt;span class="caps"&gt;FYI&lt;/span&gt;: I found this tiny stroop &lt;span class="caps"&gt;CLI&lt;/span&gt; to also be perfect for creating colorful artsy-fartsy wallpapers &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; online profile backgrounds&lt;/em&gt; 😉&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;
</content>
    <summary type="html">Creating colorful Stroop tests in your terminal</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2020-05-14:/2020/05/14/how-to-find-new-maintainers-for-your-open-source-project/</id>
    <title type="html">How to find new maintainers for your open source project</title>
    <published>2020-05-14T00:00:00Z</published>
    <updated>2020-05-14T00:00:00Z</updated>
    <link rel="alternate" href="https://opensource.com/article/20/5/adoptoposs" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">This open source software helps you build a healthy team of co-maintainers so that no project gets left behind. – The story behind how and why I built Adoptoposs.org.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2020-05-09:/2020/05/09/the-details-of-a-dropdown/</id>
    <title type="html">The &amp;lt;details&amp;gt; Of a Dropdown</title>
    <published>2020-05-09T00:00:00Z</published>
    <updated>2020-05-09T00:00:00Z</updated>
    <link rel="alternate" href="https://paulgoetze.com/2020/05/09/the-details-of-a-dropdown/" type="text/html"/>
    <content type="html">&lt;p&gt;I would by no means consider myself an expert when it comes to &lt;span class="caps"&gt;HTML&lt;/span&gt; &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; &lt;span class="caps"&gt;CSS.&lt;/span&gt; But I thought I’d share what worked really well for me when I tried to build a dropdown panel without any&amp;nbsp;JavaScript.&lt;/p&gt;

&lt;div class="tldr start"&gt;
  &lt;hr&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;You can build a fully functional dropdown panel by using plain &lt;span class="caps"&gt;HTML&lt;/span&gt; and &lt;span class="caps"&gt;CSS&lt;/span&gt; with no JavaScript involved by leveraging the &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt; tags. Here’s a minimal&amp;nbsp;example:&lt;/em&gt;&lt;/p&gt;

&lt;div class="panel"&gt;
  &lt;p class="codepen" data-height="354" data-theme-id="dark" data-default-tab="html,result" data-user="paulgoetze" data-slug-hash="NWGaZPP" style="height: 354px; box-sizing: border-box; display: flex; align-items: center; justify-content: center; border: 2px solid; margin: 1em 0; padding: 1em;" data-pen-title="details &amp;amp;amp; summary – dropdown (minimal example)"&gt;
    &lt;span&gt;See the Pen &lt;a href="https://codepen.io/paulgoetze/pen/NWGaZPP"&gt;
    details &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; summary – dropdown (minimal example)&lt;/a&gt; by Paul Götze (&lt;a href="https://codepen.io/paulgoetze"&gt;@paulgoetze&lt;/a&gt;)
    on &lt;a href="https://codepen.io"&gt;CodePen&lt;/a&gt;.&lt;/span&gt;
  &lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;If you want to see more advanced dropdowns built with this approach and want to learn about what’s going on here, then read&amp;nbsp;on…&lt;/em&gt;&lt;/p&gt;

&lt;div class="tldr end"&gt;
  &lt;hr&gt;
&lt;/div&gt;

&lt;p&gt;There is an &lt;span class="caps"&gt;HTML&lt;/span&gt; tag that might count among the little known but most powerful tags in the world of websites: the &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt;&amp;nbsp;tag.&lt;/p&gt;

&lt;p&gt;It usually appears together with another tag that makes the &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; tag shine: the &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt;&amp;nbsp;tag.&lt;/p&gt;

&lt;p&gt;It basically does what it says – it shows you a summary. And you can click on it to see some details. The best feature is that it works out of the box in every browser, with no JavaScript needed for toggling the details. Just plain old &lt;span class="caps"&gt;HTML.&lt;/span&gt; Here’s what it looks like by&amp;nbsp;default:&lt;/p&gt;

&lt;div class="panel"&gt;
  &lt;p class="codepen" data-height="226" data-theme-id="dark" data-default-tab="html,result" data-user="paulgoetze" data-slug-hash="RwWLQqe" style="height: 226px; box-sizing: border-box; display: flex; align-items: center; justify-content: center; border: 2px solid; margin: 1em 0; padding: 1em;" data-pen-title="plain details &amp;amp;amp; summary"&gt;
    &lt;span&gt;See the Pen &lt;a href="https://codepen.io/paulgoetze/pen/RwWLQqe"&gt;
    plain details &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; summary&lt;/a&gt; by Paul Götze (&lt;a href="https://codepen.io/paulgoetze"&gt;@paulgoetze&lt;/a&gt;)
    on &lt;a href="https://codepen.io"&gt;CodePen&lt;/a&gt;.&lt;/span&gt;
  &lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Wow, not sparking too much joy, yet. But we’ll get&amp;nbsp;there.&lt;/p&gt;

&lt;h2 id="lets-spice-it-up-"&gt;Let’s Spice It Up&amp;nbsp;🌶️&lt;/h2&gt;

&lt;p&gt;With a bit of styling, the plain &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt; snippet from above can already be used for such exciting things as questions (summary) and answers (details) on your &lt;a href="https://adoptoposs.org/faq"&gt;&lt;span class="caps"&gt;FAQ&lt;/span&gt; page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But let us explore some even more exciting use cases. Let’s build a dropdown menu that might live in the navigation on your web&amp;nbsp;page.&lt;/p&gt;

&lt;p&gt;First, we put a container around our &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; snippet which will represent the dropdown element. Then we replace the former summary text with a hamburger menu icon and insert a list as the actual dropdown content. Last, we wrap the whole dropdown with a &lt;code&gt;&amp;lt;nav&amp;gt;&lt;/code&gt; tag and give all of that some quick color styles. Et&amp;nbsp;voilà:&lt;/p&gt;

&lt;div class="panel"&gt;
  &lt;p class="codepen" data-height="506" data-theme-id="dark" data-default-tab="html,result" data-user="paulgoetze" data-slug-hash="gOaGepN" style="height: 506px; box-sizing: border-box; display: flex; align-items: center; justify-content: center; border: 2px solid; margin: 1em 0; padding: 1em;" data-pen-title="plain details &amp;amp;amp; summary"&gt;
    &lt;span&gt;See the Pen &lt;a href="https://codepen.io/paulgoetze/pen/gOaGepN"&gt;
    plain details &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; summary&lt;/a&gt; by Paul Götze (&lt;a href="https://codepen.io/paulgoetze"&gt;@paulgoetze&lt;/a&gt;)
    on &lt;a href="https://codepen.io"&gt;CodePen&lt;/a&gt;.&lt;/span&gt;
  &lt;/p&gt;
&lt;/div&gt;

&lt;h3 id="hiding-the-disclosure-widget"&gt;Hiding The Disclosure&amp;nbsp;Widget&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; tag comes with an &lt;code&gt;open&lt;/code&gt; attribute and a disclosure widget (▶, ▼) which indicates whether the dropdown is opened and details are visible. Let’s hide this marker, so that the click area looks more like a general&amp;nbsp;button.&lt;/p&gt;

&lt;p&gt;In Firefox we can do so by setting &lt;code&gt;list-style: none;&lt;/code&gt; for our summary. In other browsers you need to apply &lt;code&gt;display: none;&lt;/code&gt; for the summary’s pseudo-element&amp;nbsp;&lt;code&gt;::-webkit-details-marker&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We’ll also fix the cursor behavior for the &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt; on the fly. So, this is what we end up with after applying these&amp;nbsp;fixes:&lt;/p&gt;

&lt;div class="panel"&gt;
  &lt;p class="codepen" data-height="500" data-theme-id="dark" data-default-tab="css,result" data-user="paulgoetze" data-slug-hash="pojWOEK" style="height: 500px; box-sizing: border-box; display: flex; align-items: center; justify-content: center; border: 2px solid; margin: 1em 0; padding: 1em;" data-pen-title="plain details &amp;amp;amp; summary"&gt;
    &lt;span&gt;See the Pen &lt;a href="https://codepen.io/paulgoetze/pen/pojWOEK"&gt;
    plain details &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; summary&lt;/a&gt; by Paul Götze (&lt;a href="https://codepen.io/paulgoetze"&gt;@paulgoetze&lt;/a&gt;)
    on &lt;a href="https://codepen.io"&gt;CodePen&lt;/a&gt;.&lt;/span&gt;
  &lt;/p&gt;
&lt;/div&gt;

&lt;h3 id="detaching-the-details"&gt;Detaching The&amp;nbsp;Details&lt;/h3&gt;

&lt;p&gt;This might already serve well as a simple dropdown menu. But we can do much&amp;nbsp;better.&lt;/p&gt;

&lt;p&gt;Right now, when opening the details, the content enlarges the details container and therefore also pushes down any other content that lives below our navigation bar. In more complex navigations we might like to have a dropdown panel that is detached from the triggering summary. So, next up, we’ll add some styles to display the actual menu content in a position-wise independent&amp;nbsp;panel.&lt;/p&gt;

&lt;p&gt;We can make the menu content independent by adjusting its position property. To do so, we wrap our content in a &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; container and give it an &lt;code&gt;absolute&lt;/code&gt; position. With this, the opened menu content is now displayed directly below the summary, which is the opening trigger for our dropdown&amp;nbsp;menu.&lt;/p&gt;

&lt;p&gt;In order to define if the panel is opened to the left or the right of the triggering summary, we apply a &lt;code&gt;position: relative;&lt;/code&gt; for the dropdown &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; container. You can now customize the dropdown panel’s anchor by applying &lt;code&gt;right: 0;&lt;/code&gt; or &lt;code&gt;left: 0;&lt;/code&gt; (which is the default) to the summary. When using a relative position, we also need to make sure the dropdown panel has a minimum inline size of its maximum content width – else our menu items will have unwanted line&amp;nbsp;breaks.&lt;/p&gt;

&lt;p&gt;By giving the dropdown &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; container an additional &lt;code&gt;display: inline-block;&lt;/code&gt; we make sure the dropdown panel only opens when clicking the hamburger menu icon&amp;nbsp;directly.&lt;/p&gt;

&lt;p&gt;Similar to before, we also add some list styles to make it already look like a menu and make it more distinct from the navigation and the text content&amp;nbsp;below:&lt;/p&gt;

&lt;div class="panel"&gt;
  &lt;p class="codepen" data-height="506" data-theme-id="dark" data-default-tab="html,result" data-user="paulgoetze" data-slug-hash="qBOPMPE" style="height: 506px; box-sizing: border-box; display: flex; align-items: center; justify-content: center; border: 2px solid; margin: 1em 0; padding: 1em;" data-pen-title="details &amp;amp;amp; summary – dropdown III"&gt;
    &lt;span&gt;See the Pen &lt;a href="https://codepen.io/paulgoetze/pen/qBOPMPE"&gt;
    details &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; summary – dropdown &lt;span class="caps"&gt;III&lt;/span&gt;&lt;/a&gt; by Paul Götze (&lt;a href="https://codepen.io/paulgoetze"&gt;@paulgoetze&lt;/a&gt;)
    on &lt;a href="https://codepen.io"&gt;CodePen&lt;/a&gt;.&lt;/span&gt;
  &lt;/p&gt;
&lt;/div&gt;

&lt;h3 id="closing-the-menu"&gt;Closing The&amp;nbsp;Menu&lt;/h3&gt;

&lt;p&gt;Opening and closing the dropdown works nicely and it looks like a navigation menu alright. However, there’s still an issue if we open our menu and decide not to click anything in it, but move on to interact with other content on the page. Then our menu will still be wide open and cover the underlying&amp;nbsp;content.&lt;/p&gt;

&lt;p&gt;So, we need to figure out how to close the &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; again, whenever we click somewhere else outside the menu&amp;nbsp;area.&lt;/p&gt;

&lt;p&gt;We can use a neat little trick to achieve this behavior – again without any JavaScript involved. When the &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; container is in the open state, clicking any &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt; content will hide the menu content. We can leverage this behavior by making sure that the only area we can click on outside the menu content is always the &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt; area. So, a click anywhere else on the page always triggers closing the&amp;nbsp;menu.&lt;/p&gt;

&lt;p&gt;Technically this is possible by enlarging the summary’s &lt;code&gt;::before&lt;/code&gt; pseudo-element to the full view size. This is done by giving it a &lt;code&gt;fixed&lt;/code&gt; position and expanding it to all four view corners (setting &lt;code&gt;top&lt;/code&gt;, &lt;code&gt;right&lt;/code&gt;, &lt;code&gt;bottom&lt;/code&gt;, &lt;code&gt;left&lt;/code&gt; to 0). In order to fill the whole screen we also need to set the &lt;code&gt;content&lt;/code&gt; of the &lt;code&gt;::before&lt;/code&gt; pseudo-element. By default the details content is displayed with a higher z-index than the related summary content, so we don’t need to care about this. You can still interact with the menu content. The next example applies a transparent background color for the &lt;code&gt;summary::before&lt;/code&gt;’s content, so that we can see the area covering the entire screen behind the menu&amp;nbsp;content:&lt;/p&gt;

&lt;div class="panel"&gt;
  &lt;p class="codepen" data-height="506" data-theme-id="dark" data-default-tab="css,result" data-user="paulgoetze" data-slug-hash="bGVoxyj" style="height: 506px; box-sizing: border-box; display: flex; align-items: center; justify-content: center; border: 2px solid; margin: 1em 0; padding: 1em;" data-pen-title="details &amp;amp;amp; summary – dropdown IV"&gt;
    &lt;span&gt;See the Pen &lt;a href="https://codepen.io/paulgoetze/pen/bGVoxyj"&gt;
    details &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; summary – dropdown &lt;span class="caps"&gt;IV&lt;/span&gt;&lt;/a&gt; by Paul Götze (&lt;a href="https://codepen.io/paulgoetze"&gt;@paulgoetze&lt;/a&gt;)
    on &lt;a href="https://codepen.io"&gt;CodePen&lt;/a&gt;.&lt;/span&gt;
  &lt;/p&gt;
&lt;/div&gt;

&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We built a fully functional dropdown menu without any JavaScript by using the &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt; &lt;span class="caps"&gt;HTML&lt;/span&gt; tags and some &lt;span class="caps"&gt;CSS&lt;/span&gt;&amp;nbsp;styles.&lt;/p&gt;

&lt;p&gt;This approach can be used for a multitude of different dropdown panels, including navigation menus, sharing widgets, and all sorts of buttons that open a panel with further details or actions. In fact, GitHub is using a similar approach for their clone button and the branch select panel – which is also where I took inspiration from to kind of reverse-engineer the described&amp;nbsp;approach:&lt;/p&gt;

&lt;div class="images-panel"&gt;
  &lt;img src="/images/post/github-dropdown-button.webp" alt="GitHub clone button dropdown"&gt;

  &lt;img src="/images/post/github-branch-select.webp" alt="GitHub branch select button dropdown"&gt;
&lt;/div&gt;

&lt;p&gt;With some minor additions and a couple more &lt;span class="caps"&gt;CSS&lt;/span&gt; styles we can turn our example into a shiny dropdown menu that works just as you would expect a dropdown menu to&amp;nbsp;work:&lt;/p&gt;

&lt;div class="panel"&gt;
  &lt;p class="codepen" data-height="525" data-theme-id="dark" data-default-tab="html,result" data-user="paulgoetze" data-slug-hash="OJyxByw" style="height: 525px; box-sizing: border-box; display: flex; align-items: center; justify-content: center; border: 2px solid; margin: 1em 0; padding: 1em;" data-pen-title="details &amp;amp;amp; summary – dropdown V"&gt;
    &lt;span&gt;See the Pen &lt;a href="https://codepen.io/paulgoetze/pen/OJyxByw"&gt;
    details &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; summary – dropdown V&lt;/a&gt; by Paul Götze (&lt;a href="https://codepen.io/paulgoetze"&gt;@paulgoetze&lt;/a&gt;)
    on &lt;a href="https://codepen.io"&gt;CodePen&lt;/a&gt;.&lt;/span&gt;
  &lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;As long as you don’t want to put any more complex interactions into the dropdown, the described approach does not need any JavaScript. However, if you don’t have a page reload after clicking a link in the dropdown panel, you would need to add some JavaScript to toggle the &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt;’s open state. The same applies to any additional actions from within the dropdown panel that should change its open&amp;nbsp;state.&lt;/p&gt;

&lt;p&gt;I hope you learned some useful details about how to build dropdowns. We can give the positive summary that you might not always need JavaScript to build interactive web components. &lt;span class="caps"&gt;HTML&lt;/span&gt; and &lt;span class="caps"&gt;CSS&lt;/span&gt; might have you covered in more cases than you think. For dropdown menus and dropdown panels they certainly&amp;nbsp;do.&lt;/p&gt;

&lt;script async="" src="https://static.codepen.io/assets/embed/ei.js"&gt;&lt;/script&gt;

</content>
    <summary type="html">How to build a beautiful dropdown panel with just HTML and CSS. (Including a customizable minimal example for you to use!)</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2020-03-30:/2020/03/20/how-to-keep-open-source-software-maintained/</id>
    <title type="html">Announcing Adoptoposs.org</title>
    <published>2020-03-30T00:00:00Z</published>
    <updated>2020-03-30T00:00:00Z</updated>
    <link rel="alternate" href="https://paulgoetze.com/2020/03/20/how-to-keep-open-source-software-maintained/" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">I am happy to announce adoptoposs.org – an open source app that connects open source software maintainers with people who want to help keep projects and maintainers healthy in the long term.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2019-08-06:/2019/08/06/uploading-files-to-gcs-with-a-flask-api-3/</id>
    <title type="html">File upload to Google Cloud Storage using a Flask API (3 of 3)</title>
    <published>2019-08-06T00:00:00Z</published>
    <updated>2019-08-06T00:00:00Z</updated>
    <link rel="alternate" href="https://bit.ly/file-upload-gcs-flask-3" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">In part 3 we’ll have a look at how to customize upload directories and allow multiple storages for different attachments.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2019-08-06:/2019/08/06/uploading-files-to-gcs-with-a-flask-api-2/</id>
    <title type="html">File upload to Google Cloud Storage using a Flask API (2 of 3)</title>
    <published>2019-08-06T00:00:00Z</published>
    <updated>2019-08-06T00:00:00Z</updated>
    <link rel="alternate" href="https://bit.ly/file-upload-gcs-flask-2" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">In part 2 we’ll implement the attachment in the model, the file upload endpoint and some upload tests using an in-memory file storage.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2019-08-06:/2019/08/06/uploading-files-to-gcs-with-a-flask-api-1/</id>
    <title type="html">File upload to Google Cloud Storage using a Flask API (1 of 3)</title>
    <published>2019-08-06T00:00:00Z</published>
    <updated>2019-08-06T00:00:00Z</updated>
    <link rel="alternate" href="https://bit.ly/file-upload-gcs-flask-1" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">In part 1 we’ll setup a basic Flask app for file uploading to GCS using the filedepot package.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2017-06-28:/2017/06/28/handling-complex-json-schemas-in-python/</id>
    <title type="html">Handling complex JSON Schemas in Python</title>
    <published>2017-06-28T00:00:00Z</published>
    <updated>2017-06-28T00:00:00Z</updated>
    <link rel="alternate" href="https://medium.com/grammofy/handling-complex-json-schemas-in-python-9eacc04a60cf?source=friends_link&amp;sk=2acab57dc699857181742e153f3e58bb" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">JSON schemas can get confusing if you have to deal with complex data. We’ll look into how to use references to clean up your schemas.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2017-05-31:/2017/05/31/testing-your-python-api-app-with-json-Schema/</id>
    <title type="html">Testing Your Python API App with JSON Schema</title>
    <published>2017-05-31T00:00:00Z</published>
    <updated>2017-05-31T00:00:00Z</updated>
    <link rel="alternate" href="https://medium.com/grammofy/testing-your-python-api-app-with-json-schema-52677fe73351?source=friends_link&amp;sk=17be56f67f578a80395f5b520fb7ff60" type="text/html"/>
    <content type="html">
</content>
    <summary type="html">A nice way to test JSON APIs is verifying a request’s response against a JSON Schema. Here’s how you can cleanly test your Python API app by using the jsonschema package and a custom assertion helper.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2017-04-19:/2017/04/19/building-a-city-search-with-elixir-and-python/</id>
    <title type="html">Building a City Search with Elixir and Python</title>
    <published>2017-04-19T00:00:00Z</published>
    <updated>2017-04-19T00:00:00Z</updated>
    <link rel="alternate" href="https://paulgoetze.com/2017/04/19/building-a-city-search-with-elixir-and-python/" type="text/html"/>
    <content type="html">&lt;p&gt;The other day I was wondering whether there was an easy self-made local&amp;nbsp;alternative
to something like the &lt;a href="https://developers.google.com/places"&gt;Google Places &lt;span class="caps"&gt;API&lt;/span&gt;&lt;/a&gt;, that I could use in a Phoenix&amp;nbsp;app.
I wanted to search for a city and wanted to get back the city itself, its state, and the&amp;nbsp;country.&lt;/p&gt;

&lt;p&gt;I found the free &lt;a href="http://dev.maxmind.com/geoip/geoip2/geolite2/#Downloads"&gt;GeoLite2&lt;/a&gt; city dataset, provided by Maxmind, which&amp;nbsp;I
could use to create a city search&amp;nbsp;index.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(In case you directly want to dive into the programmatic materialisation&amp;nbsp;of
what I came up with, it is available &lt;a href="https://github.com/paulgoetze/elixir-python"&gt;on GitHub&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I did a quick search and stumbled upon the &lt;a href="https://github.com/elixir-search/searchex"&gt;searchex&lt;/a&gt; project by &lt;a href="https://github.com/andyl"&gt;@andyl&lt;/a&gt;. This actually looked like it was&amp;nbsp;exactly
what I was searching for. However, there is very little documentation&amp;nbsp;yet.
So, unfortunately, I couldn’t really figure out how to get it&amp;nbsp;working.&lt;/p&gt;

&lt;p&gt;Then, while thinking about how to approach this, &lt;a href="https://whoosh.readthedocs.io"&gt;Whoosh&lt;/a&gt;, a Python&amp;nbsp;package
that I have used at work, came to my mind. Whoosh is a library for indexing text&amp;nbsp;and
searching the index. It is pretty easy to set up and delivers great&amp;nbsp;search
results with little&amp;nbsp;effort.&lt;/p&gt;

&lt;p&gt;With this in mind, I was wondering whether there was a way to call Python&amp;nbsp;code
from Elixir. After some further research and &lt;a href="https://medium.com/@Stephanbv/ruby-code-in-elixir-project-97614a9543d#.rp7o5vrpl"&gt;some&lt;/a&gt; &lt;a href="https://hackernoon.com/calling-python-from-elixir-erlport-vs-thrift-be75073b6536#.netzr6o72"&gt;articles&lt;/a&gt; later, I found the Erlang&amp;nbsp;library
&lt;a href="http://erlport.org"&gt;erlport&lt;/a&gt;, which allows you to call Ruby and Python code from&amp;nbsp;Elixir.
There is also an Elixir wrapper for it, bearing the fitting name &lt;a href="https://github.com/fazibear/export"&gt;Export&lt;/a&gt;.
You could also use erlport directly in Elixir, but Export gives you some&amp;nbsp;convenient
functions on top and a more Elixir-like&amp;nbsp;feel.&lt;/p&gt;

&lt;h2 id="setting-up-a-new-mix-project--python-virtualenv"&gt;Setting Up a New mix Project &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; Python&amp;nbsp;virtualenv&lt;/h2&gt;

&lt;p&gt;In order to get started with our custom city&amp;nbsp;index,
let’s set up a prototype mix project, called&amp;nbsp;&lt;code&gt;elixir_python&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;mix new elixir_python
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Head to the &lt;code&gt;mix.exs&lt;/code&gt; file and add the &lt;code&gt;export&lt;/code&gt;&amp;nbsp;dependency:&lt;/p&gt;

&lt;div class="language-elixir highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# mix.exs&lt;/span&gt;
&lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:export&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 0.1.0"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then install the dependencies&amp;nbsp;with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;mix deps.get
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For setting up a Python environment you can use &lt;a href="https://virtualenv.pypa.io"&gt;virtualenv&lt;/a&gt; to create a&amp;nbsp;local
virtual environment. Remember to activate it after creating&amp;nbsp;it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;virtualenv -p python3 venv
source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We will use Whoosh, so we need a &lt;code&gt;requirements.txt&lt;/code&gt; next to our &lt;code&gt;mix.exs&lt;/code&gt; that&amp;nbsp;defines
the Python&amp;nbsp;dependencies:&lt;/p&gt;

&lt;div class="language-python highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# /requirements.txt
&lt;/span&gt;
&lt;span class="n"&gt;whoosh&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="mf"&gt;2.7&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Install the requirements&amp;nbsp;with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Next, we need a directory where our Python code will live. Let’s create a&amp;nbsp;&lt;code&gt;lib/python&lt;/code&gt;
directory where we will put the *.py files later&amp;nbsp;on.
You can put them wherever you want, as long as you point Export to that&amp;nbsp;directory
when starting&amp;nbsp;Python.&lt;/p&gt;

&lt;p&gt;In your &lt;code&gt;lib/python&lt;/code&gt; directory create a &lt;code&gt;geolite2.py&lt;/code&gt; file. This is where we&amp;nbsp;will
put the code for our city search index. Next, download the GeoLite2 &lt;span class="caps"&gt;CSV&lt;/span&gt; files&amp;nbsp;from
&lt;a href="http://dev.maxmind.com/geoip/geoip2/geolite2/#Downloads"&gt;dev.maxmind.com&lt;/a&gt; and put the English city locations file in the&amp;nbsp;&lt;code&gt;lib/python/data&lt;/code&gt;
directory. The &lt;code&gt;requirements.txt&lt;/code&gt; file we created earlier stays&amp;nbsp;in
our project’s root&amp;nbsp;directory.&lt;/p&gt;

&lt;p&gt;Our Elixir code will live in&amp;nbsp;&lt;code&gt;lib/elixir_python/geolite2.ex&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The overall project structure should now look like&amp;nbsp;this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;└── elixir_python
    ├── config
    ├── lib
    │   ├── elixir_python
    │   │   └── geolite2.ex
    │   ├── python
    │   │   ├── data
    │   │   │   └── GeoLite2-City-Locations-en.csv
    │   │   ├── __init__.py
    │   │   └── geolite2.py
    │   └── elixir_python.ex
    ├── mix.exs
    ├── requirements.txt
    └── …
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id="the-python-part"&gt;The Python&amp;nbsp;Part&lt;/h2&gt;

&lt;p&gt;Our geolite2 Python module will have an &lt;span class="caps"&gt;API&lt;/span&gt; composed of two&amp;nbsp;functions:&lt;/p&gt;

&lt;div class="language-python highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/python/geolite2.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# We will add code here in some minutes...
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# We will add some code here soon...
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first one creates our search index using the GeoLite2 city &lt;span class="caps"&gt;CSV&lt;/span&gt;&amp;nbsp;file.
The second lets us search for cities, states or countries and will pass&amp;nbsp;the
results back to&amp;nbsp;Elixir.&lt;/p&gt;

&lt;h3 id="indexing-the-city-data"&gt;Indexing the City&amp;nbsp;Data&lt;/h3&gt;

&lt;p&gt;For each Whoosh index you can define a certain structure, its&amp;nbsp;schema.
The schema defines which data you want to store in the index and&amp;nbsp;which
full text (the content) you want to run the search&amp;nbsp;on.&lt;/p&gt;

&lt;p&gt;Our city schema looks like&amp;nbsp;this:&lt;/p&gt;

&lt;div class="language-python highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;whoosh.fields&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SchemaClass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TEXT&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;whoosh.analysis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NgramWordAnalyzer&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CitySchema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SchemaClass&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stored&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stored&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stored&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;NgramWordAnalyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We want to store the city, the state, and the country. The &lt;code&gt;content&lt;/code&gt; field&amp;nbsp;will
hold the full text to search in. In our case that is the joined city, state,&amp;nbsp;and
country name. This allows us to search for cities, states, or countries&amp;nbsp;and
to provide multiple query terms to narrow down our&amp;nbsp;results.
We use an &lt;code&gt;NgramWordAnalyzer&lt;/code&gt; and set the &lt;code&gt;phrase&lt;/code&gt; argument to &lt;code&gt;False&lt;/code&gt; in&amp;nbsp;order
to save some space (see &lt;a href="https://whoosh.readthedocs.io/en/latest/recipes.html#itunes-style-search-as-you-type"&gt;this whoosh recipe&lt;/a&gt; for more&amp;nbsp;details).&lt;/p&gt;
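
&lt;p&gt;To get a feeling for what an n-gram analyzer produces, here is a plain-Python sketch that does not use Whoosh at all. The function &lt;code&gt;word_ngrams&lt;/code&gt; and its parameters are names invented for this illustration. Whoosh’s own analyzer does more work (tokenizing, filtering) and may differ in&amp;nbsp;details.&lt;/p&gt;

```python
def word_ngrams(text, minsize, maxsize):
    """Return the lowercased character n-grams of every word in `text`."""
    grams = []
    for word in text.lower().split():
        for size in range(minsize, maxsize + 1):
            # Slide a window of `size` characters over the word
            for start in range(len(word) - size + 1):
                grams.append(word[start:start + size])
    return grams


print(word_ngrams("Berlin", 2, 3))
```

&lt;p&gt;Indexing such fragments instead of whole words is what makes search-as-you-type possible: a partially typed city name can already hit indexed&amp;nbsp;terms.&lt;/p&gt;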

&lt;p&gt;Before creating the index let’s define our directory names and files we want to&amp;nbsp;use
along with some handy functions for building the absolute paths to these&amp;nbsp;files:&lt;/p&gt;

&lt;div class="language-python highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/python/geolite2.py
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;


&lt;span class="c1"&gt;# The base directory where out data lies, relative to this file
&lt;/span&gt;&lt;span class="n"&gt;DATA_BASE_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# The actual city data file
&lt;/span&gt;&lt;span class="n"&gt;CITY_DATA_FILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GeoLite2-City-Locations-en.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Our base directory where the index files are stored
&lt;/span&gt;&lt;span class="n"&gt;INDEX_BASE_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# The name of our index
&lt;/span&gt;&lt;span class="n"&gt;CITY_INDEX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Returns the absolute index path for the given index name &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;index_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{}_index&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;current_path&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;INDEX_BASE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;data_file_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Returns the absolute path to the file with the given name &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;current_path&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;DATA_BASE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;current_path&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Returns the absolute directory of this file &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abspath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
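
&lt;p&gt;To see which paths these helpers produce, here is a variant that can be run anywhere. As an assumption for this sketch, it takes the base directory as an explicit &lt;code&gt;base_dir&lt;/code&gt; argument instead of deriving it from &lt;code&gt;__file__&lt;/code&gt;. The checkout location used below is made up as&amp;nbsp;well.&lt;/p&gt;

```python
import os

INDEX_BASE_DIR = 'index'
DATA_BASE_DIR = 'data'


def index_path(base_dir, index_name):
    """Build the index directory path below the given base directory."""
    return os.path.join(base_dir, INDEX_BASE_DIR, '{}_index'.format(index_name))


def data_file_path(base_dir, file_name):
    """Build the data file path below the given base directory."""
    return os.path.join(base_dir, DATA_BASE_DIR, file_name)


# Hypothetical location of the Python directory inside the mix project
base = os.path.join('elixir_python', 'lib', 'python')

print(index_path(base, 'city'))
print(data_file_path(base, 'GeoLite2-City-Locations-en.csv'))
```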

&lt;p&gt;Armed with these helpers we can now go ahead and define the actual index&amp;nbsp;creation
function. We read the &lt;span class="caps"&gt;CSV&lt;/span&gt; file row by row and add a document to the index for each&amp;nbsp;city.
Some rows in the &lt;span class="caps"&gt;CSV&lt;/span&gt; do not represent cities but states or countries, so we&amp;nbsp;skip
every row that has no value in the city&amp;nbsp;column:&lt;/p&gt;

&lt;div class="language-python highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/python/geolite2.py
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;whoosh.index&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_in&lt;/span&gt;

&lt;span class="c1"&gt;# ...
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Create search index files &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;index_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CITY_INDEX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;_recreate_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CitySchema&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;data_file_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CITY_DATA_FILE&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;csv_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DictReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;_add_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_recreate_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Deletes and recreates the given path &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rmtree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_add_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Writes the data to the index &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;city_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;subdivision_1_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;country_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that the content (&lt;code&gt;"&amp;lt;city&amp;gt; &amp;lt;state&amp;gt; &amp;lt;country&amp;gt;"&lt;/code&gt;) is the actual text we&amp;nbsp;analyse
and put into the index. The other schema fields are just stored&amp;nbsp;data,
which we can access again later on in our results and pass on to our Elixir&amp;nbsp;app.&lt;/p&gt;
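
&lt;p&gt;The row filtering and the joined content can be tried out without Whoosh. The following sketch runs the same logic over a tiny in-memory &lt;span class="caps"&gt;CSV&lt;/span&gt;. The sample rows and the helper name &lt;code&gt;build_documents&lt;/code&gt; are made up for illustration, and the sample only contains the three columns used in this&amp;nbsp;post.&lt;/p&gt;

```python
import csv
import io

# Made-up sample rows; only the three columns used in this post are included
SAMPLE_CSV = """city_name,subdivision_1_name,country_name
Berlin,Land Berlin,Germany
,Bavaria,Germany
,,France
Lyon,Auvergne-Rhone-Alpes,France
"""


def build_documents(csv_text):
    """Return one dict per city row, mirroring the fields of the city schema."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        city = row.get('city_name')
        if not city:
            # State-only and country-only rows have an empty city column
            continue
        state = row.get('subdivision_1_name')
        country = row.get('country_name')
        documents.append({
            'city': city,
            'state': state,
            'country': country,
            'content': ' '.join([city, state, country]),
        })
    return documents


for document in build_documents(SAMPLE_CSV):
    print(document['content'])
```

&lt;p&gt;Only the two city rows survive. The state-only and country-only rows are&amp;nbsp;dropped.&lt;/p&gt;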

&lt;h3 id="searching-for-cities"&gt;Searching for&amp;nbsp;Cities&lt;/h3&gt;

&lt;p&gt;Let’s now implement the function for making a search&amp;nbsp;request:&lt;/p&gt;

&lt;div class="language-python highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/python/geolite2.py
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;whoosh.qparser&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QueryParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;whoosh.query&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;

&lt;span class="c1"&gt;# ...
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Searches for the given query and returns `count` results &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open_dir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;index_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CITY_INDEX&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QueryParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;termclass&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;parsed_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                 &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                 &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;country&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First, we get the city index that we created with&amp;nbsp;&lt;code&gt;create_index()&lt;/code&gt;.
Then we build an instance of whoosh’s &lt;code&gt;QueryParser&lt;/code&gt; in order to parse our&amp;nbsp;query
using our city schema. We use the &lt;code&gt;termclass=Prefix&lt;/code&gt; here to only match&amp;nbsp;documents
that contain any term that starts with the given query&amp;nbsp;text
(see the &lt;a href="http://whoosh.readthedocs.io/en/latest/api/query.html#whoosh.query.Prefix"&gt;whoosh.query.Prefix&lt;/a&gt;&amp;nbsp;docs).
The parsed query is then passed to a searcher which finally runs the search&amp;nbsp;and
compiles the results for&amp;nbsp;us.
To keep it simple we collect the needed data in a list of&amp;nbsp;lists.
This is the data our Elixir function will receive in a&amp;nbsp;moment.&lt;/p&gt;
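
&lt;p&gt;As a rough mental model of prefix matching, consider the following plain-Python sketch. It is a deliberate simplification and not what Whoosh does internally: Whoosh works on analyzed index terms and scores its results. The sketch also assumes that all query words must match. The function &lt;code&gt;matches&lt;/code&gt; and the sample data are&amp;nbsp;hypothetical.&lt;/p&gt;

```python
def matches(query, content):
    """Return True if every query word is a prefix of some word in `content`."""
    words = content.lower().split()
    return all(
        any(word.startswith(term) for word in words)
        for term in query.lower().split()
    )


# Made-up content strings in the "city state country" format used above
cities = [
    "Berlin Land Berlin Germany",
    "Bern Bern Switzerland",
    "Lyon Auvergne-Rhone-Alpes France",
]

print([city for city in cities if matches("ber", city)])
print([city for city in cities if matches("ber ger", city)])
```

&lt;p&gt;The first query matches both Berlin and Bern. Adding a second term narrows the result down to&amp;nbsp;Berlin.&lt;/p&gt;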

&lt;h2 id="the-elixir-part"&gt;The Elixir&amp;nbsp;part&lt;/h2&gt;

&lt;p&gt;Our Elixir &lt;span class="caps"&gt;API&lt;/span&gt; will look pretty much the same as the Python&amp;nbsp;&lt;span class="caps"&gt;API&lt;/span&gt;:&lt;/p&gt;

&lt;div class="language-elixir highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# /lib/elixir_python/geolite2.ex&lt;/span&gt;

&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;ElixirPython&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;GeoLite2&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;create_index&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c1"&gt;# We will add code here in some more minutes...&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="p"&gt;\\&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c1"&gt;# We will add some code here later...&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id="calling-python"&gt;Calling&amp;nbsp;Python&lt;/h2&gt;

&lt;p&gt;To prepare for calling our Python functions from Elixir, add a&amp;nbsp;&lt;code&gt;python_call/3&lt;/code&gt;
function to the &lt;code&gt;ElixirPython&lt;/code&gt; module. It creates a Python instance for&amp;nbsp;us
and runs the Python code we provide it&amp;nbsp;with.&lt;/p&gt;

&lt;div class="language-elixir highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/elixir_python.ex&lt;/span&gt;

&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;ElixirPython&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Export&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Python&lt;/span&gt;

  &lt;span class="nv"&gt;@python_dir&lt;/span&gt; &lt;span class="s2"&gt;"lib/python"&lt;/span&gt; &lt;span class="c1"&gt;# &amp;lt;-- this is the dir we created before&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;python_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="p"&gt;\\&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Python&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;python_path:&lt;/span&gt; &lt;span class="no"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;@python_dir&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="no"&gt;Python&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We make use of Export’s Python module. &lt;code&gt;Python.start/1&lt;/code&gt; returns a tuple&amp;nbsp;including
a Python instance. In order to pick up our modules we pass the path to our&amp;nbsp;Python
directory as base&amp;nbsp;path.
&lt;code&gt;Python.call/4&lt;/code&gt; takes care of calling the given Python function from&amp;nbsp;the
respective module&amp;nbsp;file.&lt;/p&gt;
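
&lt;p&gt;Conceptually, what happens on the Python side resembles a dynamic import followed by a function call. The following plain-Python sketch is only an analogy, not how Export is implemented. The helper name &lt;code&gt;call&lt;/code&gt; is&amp;nbsp;hypothetical.&lt;/p&gt;

```python
import importlib


def call(module_name, function_name, args=()):
    """Import `module_name` and call `function_name` with the given arguments."""
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    return function(*args)


# Uses a standard library module so the sketch runs without our project files
print(call("os.path", "basename", ["lib/python/geolite2.py"]))
```

&lt;p&gt;In our project the module name would be &lt;code&gt;geolite2&lt;/code&gt;, which can only be found because the &lt;code&gt;lib/python&lt;/code&gt; directory is passed as the Python&amp;nbsp;path.&lt;/p&gt;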

&lt;h3 id="creating-the-city-index"&gt;Creating the City&amp;nbsp;Index&lt;/h3&gt;

&lt;p&gt;We use the &lt;code&gt;python_call&lt;/code&gt; function we just defined to run the&amp;nbsp;&lt;code&gt;create_index&lt;/code&gt;
function in&amp;nbsp;Python:&lt;/p&gt;

&lt;div class="language-elixir highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/elixir_python/geolite2.ex&lt;/span&gt;

&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;ElixirPython&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;GeoLite2&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="no"&gt;ElixirPython&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;only:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;python_call:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nv"&gt;@python_module&lt;/span&gt; &lt;span class="s2"&gt;"geolite2"&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;create_index&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;python_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;@python_module&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"create_index"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id="searching-for-cities-1"&gt;Searching for&amp;nbsp;Cities&lt;/h3&gt;

&lt;p&gt;To run a search query we use our &lt;code&gt;python_call&lt;/code&gt; function again to call the&amp;nbsp;Python
&lt;code&gt;search&lt;/code&gt; function we defined. The returned value is a list of lists&amp;nbsp;holding
the stored index data. We just loop over it and create Maps from&amp;nbsp;it:&lt;/p&gt;

&lt;div class="language-elixir highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/elixir_python/geolite2.ex&lt;/span&gt;

&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;ElixirPython&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;GeoLite2&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="no"&gt;ElixirPython&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;only:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;python_call:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;python_call:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nv"&gt;@python_module&lt;/span&gt; &lt;span class="s2"&gt;"geolite2"&lt;/span&gt;

  &lt;span class="c1"&gt;# ...&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="p"&gt;\\&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;python_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;@python_module&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"search"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;for&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;city:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;state:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;country:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id="running-a-search"&gt;Running a&amp;nbsp;Search&lt;/h2&gt;

&lt;p&gt;And with that, our hunt for a city search index is over and we can start using&amp;nbsp;it.
Make sure you have activated your Python virtualenv, then open up iex and give it a&amp;nbsp;try:&lt;/p&gt;

&lt;div class="language-bash highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
iex &lt;span class="nt"&gt;-S&lt;/span&gt; mix
iex&lt;span class="o"&gt;(&lt;/span&gt;1&lt;span class="o"&gt;)&amp;gt;&lt;/span&gt; ElixirPython.GeoLite2.create_index&lt;span class="o"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;⌛&lt;/p&gt;

&lt;div class="language-bash highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;iex&lt;span class="o"&gt;(&lt;/span&gt;2&lt;span class="o"&gt;)&amp;gt;&lt;/span&gt; ElixirPython.GeoLite2.search&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Berlin"&lt;/span&gt;, 3&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;%&lt;span class="o"&gt;{&lt;/span&gt;city: &lt;span class="s2"&gt;"Berlin"&lt;/span&gt;, country: &lt;span class="s2"&gt;"Germany"&lt;/span&gt;, state: &lt;span class="s2"&gt;"Land Berlin"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
 %&lt;span class="o"&gt;{&lt;/span&gt;city: &lt;span class="s2"&gt;"Berlingen"&lt;/span&gt;, country: &lt;span class="s2"&gt;"Belgium"&lt;/span&gt;, state: &lt;span class="s2"&gt;"Flanders"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
 %&lt;span class="o"&gt;{&lt;/span&gt;city: &lt;span class="s2"&gt;"Falkenberg"&lt;/span&gt;, country: &lt;span class="s2"&gt;"Germany"&lt;/span&gt;, state: &lt;span class="s2"&gt;"Land Berlin"&lt;/span&gt;&lt;span class="o"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Yay. It&amp;nbsp;works!&lt;/p&gt;

&lt;p&gt;Let’s try another&amp;nbsp;one:&lt;/p&gt;

&lt;div class="language-bash highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;iex&lt;span class="o"&gt;(&lt;/span&gt;3&lt;span class="o"&gt;)&amp;gt;&lt;/span&gt; ElixirPython.GeoLite2.search&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"San José"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Hm, why is that? We couldn’t find any results, although San José is definitely in the index. It’s because our system does not normalise special characters and accents&amp;nbsp;yet.
Let’s fix this in one final&amp;nbsp;step.&lt;/p&gt;

&lt;h2 id="handling-special-characters"&gt;Handling Special&amp;nbsp;Characters&lt;/h2&gt;

&lt;p&gt;On the Elixir side this is easy to do. There is an Erlang library called &lt;a href="https://github.com/processone/iconv"&gt;iconv&lt;/a&gt;.
Let’s just add it to our&amp;nbsp;&lt;code&gt;mix.exs&lt;/code&gt;:&lt;/p&gt;

&lt;div class="language-elixir highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# mix.exs&lt;/span&gt;

&lt;span class="c1"&gt;#...&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:export&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 0.1.0"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
   &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:iconv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 1.0"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then install the dependency&amp;nbsp;with:&lt;/p&gt;

&lt;div class="language-bash highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;mix deps.get
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s now preprocess the query before we pass it to our Python&amp;nbsp;function:&lt;/p&gt;

&lt;div class="language-elixir highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/elixir_python/geolite2.ex&lt;/span&gt;

&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;ElixirPython&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;GeoLite2&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="p"&gt;\\&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ...&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="ss"&gt;:iconv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"utf-8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ascii//translit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When we rerun our query now we get the wanted city in the&amp;nbsp;results:&lt;/p&gt;

&lt;div class="language-bash highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;iex&lt;span class="o"&gt;(&lt;/span&gt;4&lt;span class="o"&gt;)&amp;gt;&lt;/span&gt; ElixirPython.GeoLite2.search&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"San José"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;%&lt;span class="o"&gt;{&lt;/span&gt;city: &lt;span class="s2"&gt;"San José"&lt;/span&gt;, country: &lt;span class="s2"&gt;"Costa Rica"&lt;/span&gt;, state: &lt;span class="s2"&gt;"Provincia de San Jose"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
&lt;span class="c"&gt;# ...&lt;/span&gt;
&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We still have problems with cities whose names contain umlauts, such as the German city of&amp;nbsp;Görlitz,
so let’s transform umlauts to their &lt;span class="caps"&gt;ASCII&lt;/span&gt; counterparts before creating the&amp;nbsp;index:&lt;/p&gt;

&lt;div class="language-python highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/python/geolite2.py
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;unicodedata&lt;/span&gt;

&lt;span class="c1"&gt;# ...
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_add_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Writes the data to the index &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# ...
&lt;/span&gt;
    &lt;span class="c1"&gt;# clean up the content that goes to the index by using _cleanup_text():
&lt;/span&gt;    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_cleanup_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

    &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_cleanup_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; Removes accents and replaces umlauts &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;replaces&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ä&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ae&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ö&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;oe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ü&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ä&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ae&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ö&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Oe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ü&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ß&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replacement&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;replaces&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;unicodedata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NFKD&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;unicodedata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;combining&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;char&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ascii&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ascii&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we recreate our index and run a &lt;code&gt;Görlitz&lt;/code&gt; query now, we will get some fitting&amp;nbsp;results.&lt;/p&gt;
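
&lt;p&gt;To see the effect in isolation, here is a minimal, self-contained sketch of the same cleanup steps.
The names &lt;code&gt;cleanup_text&lt;/code&gt; and &lt;code&gt;UMLAUTS&lt;/code&gt; are made up for this illustration; the sketch only mirrors
the &lt;code&gt;_cleanup_text&lt;/code&gt; function above and is not meant to replace&amp;nbsp;it:&lt;/p&gt;

&lt;div class="language-python highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;import unicodedata

# Hypothetical lookup table for this sketch: German umlauts and the sharp s
# get explicit ASCII spellings, because plain accent stripping would turn
# "ö" into "o" instead of "oe".
UMLAUTS = {
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
    'ß': 'ss',
}


def cleanup_text(text):
    """ Replaces umlauts, then strips all remaining accents """

    for original, replacement in UMLAUTS.items():
        text = text.replace(original, replacement)

    # NFKD splits an accented character into base character + combining mark
    text = unicodedata.normalize('NFKD', text)
    text = ''.join(char for char in text if not unicodedata.combining(char))

    # drop whatever is still not ASCII
    return text.encode('ascii', 'ignore').decode('ascii')


print(cleanup_text('Görlitz'))   # Goerlitz
print(cleanup_text('San José'))  # San Jose
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The umlaut replacement has to run before the normalisation step. Otherwise the umlaut dots would be stripped like
any other accent and “Görlitz” would end up as “Gorlitz” in the&amp;nbsp;index.&lt;/p&gt;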

&lt;h2 id="wrapping-up"&gt;Wrapping&amp;nbsp;Up&lt;/h2&gt;

&lt;p&gt;We managed to build a small local fulltext index for a city search without too much&amp;nbsp;effort.
Our system returns great search results and lets us search for cities with or&amp;nbsp;without
special&amp;nbsp;characters.&lt;/p&gt;

&lt;p&gt;All in all, it does not scale very well,&amp;nbsp;though.
I tried to build another, more sophisticated city index including about &lt;span class="caps"&gt;4.4&lt;/span&gt;&amp;nbsp;M
cities and villages world-wide and the location&amp;nbsp;coordinates
(latitude and longitude) and time zone for each&amp;nbsp;place.
If you are interested you can find the script for combining the city data&amp;nbsp;and
location data in &lt;a href="https://gist.github.com/paulgoetze/3fea5dfb2b757a46aec25d5bcfd1359d"&gt;this gist&lt;/a&gt;.
It took quite some time to build the index (about 40 minutes on my laptop)&amp;nbsp;and
resulted in an index file of &lt;span class="caps"&gt;1.3&lt;/span&gt; &lt;span class="caps"&gt;GB&lt;/span&gt; size (compared to ~29 &lt;span class="caps"&gt;MB&lt;/span&gt; for the GeoLite2&amp;nbsp;index).
Although it also worked well and returned fitting search results, it took&amp;nbsp;about
10 seconds to finish a single request. This approach would need some&amp;nbsp;additional
caching and further optimisation to be of any practical&amp;nbsp;use.&lt;/p&gt;

&lt;p&gt;So, eventually, I ended up using the Google Places &lt;span class="caps"&gt;API&lt;/span&gt; anyway&amp;nbsp;😉.
But, hey: “Wieder was&amp;nbsp;gelernt” (German for “learned something new&amp;nbsp;again”).&lt;/p&gt;

</content>
    <summary type="html">In this “let’s-combine-languages” experiment I show how to build a file-based fulltext search index for cities using Elixir and “Whoosh”, a mature Python package for fulltext indexing.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2017-03-19:/2017/03/19/why-i-dont-like-switching-the-programming-language/</id>
    <title type="html">Why I Don’t Like Switching the Programming Language</title>
    <published>2017-03-19T00:00:00Z</published>
    <updated>2017-03-19T00:00:00Z</updated>
    <link rel="alternate" href="https://paulgoetze.com/2017/03/19/why-i-dont-like-switching-the-programming-language/" type="text/html"/>
    <content type="html">&lt;p&gt;A bit more than 5 months ago I started working at &lt;a href="https://grammofy.com"&gt;Grammofy&lt;/a&gt;,
where we use Python a&amp;nbsp;lot.
Before that, I had written only a tiny bit of production code in&amp;nbsp;Python.
I had a rather hard time starting with it, but by now I have come to have a lot of fun with&amp;nbsp;it.
As always, there were people who congratulated me on finally joining&amp;nbsp;the
right&amp;nbsp;side.
However, after a couple of months of voluntarily forcing myself to work with a&amp;nbsp;new
language, I can say that I don’t like to switch to&amp;nbsp;Python.&lt;/p&gt;

&lt;p&gt;Don’t get me wrong: It isn’t solely Python. I realised that I don’t&amp;nbsp;like
&lt;em&gt;switching&lt;/em&gt; to any language &lt;em&gt;entirely&lt;/em&gt;. Ever. Instead I choose to learn&amp;nbsp;even
more&amp;nbsp;languages.
And most importantly: to never stick to only a single&amp;nbsp;language.&lt;/p&gt;

&lt;p&gt;If you are tempted to do exactly this, be it because you think you have already&amp;nbsp;found
your favourite language or because everything else looks less promising than&amp;nbsp;what
you’re using right now – here’s why you had better not settle&amp;nbsp;down.&lt;/p&gt;

&lt;h2 id="the-excitement"&gt;The&amp;nbsp;Excitement&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Six months ago.&lt;/em&gt; It was the time when we had this book club at my former job&amp;nbsp;and
had just finished &lt;a href="http://www.poodr.com"&gt;&lt;span class="caps"&gt;POODR&lt;/span&gt;&lt;/a&gt; and the &lt;a href="https://pragprog.com/book/btlang/seven-languages-in-seven-weeks"&gt;“7 languages in (definitely more than 😉) 7 weeks”&lt;/a&gt; book.
And because we were still hungry, we directly started reading a couple of&amp;nbsp;Elixir
books afterwards. I had all these new languages in my head, each one vying for&amp;nbsp;my
attention.
I had mostly done Ruby in recent years and was looking into Elixir and a bit&amp;nbsp;into
JavaScript, but I had not done much Python&amp;nbsp;before.
So, when I started working in my new job, I was curious to see how I would&amp;nbsp;do,
spending most of my time with a – for me rather foreign –&amp;nbsp;language.&lt;/p&gt;

&lt;p&gt;Well, it turned out to be a nightmare for the first couple of&amp;nbsp;weeks.&lt;/p&gt;

&lt;h2 id="the-pain"&gt;The&amp;nbsp;Pain&lt;/h2&gt;

&lt;p&gt;If you learn a new programming language, it feels a bit like learning a&amp;nbsp;real
language from&amp;nbsp;scratch.
You hardly understand the basics, and yet you are asked to build full&amp;nbsp;sentences.
You are restarting as a “Junior”, even if your position is not called&amp;nbsp;“Junior”.
You love what you’re doing, and you decided a long time ago that you don’t give&amp;nbsp;a
shit about your job title. Still, now you are tempted to beg to be&amp;nbsp;titled
“Junior”, just to get even more time to understand&amp;nbsp;things.&lt;/p&gt;

&lt;p&gt;But no fear! The good thing is: Learning a new language is just like learning&amp;nbsp;a
&lt;em&gt;dialect&lt;/em&gt;.
You already know a lot about the language from stuff you’ve heard&amp;nbsp;before.
Breaking it down, it is just a couple of words, rules, and concepts you&amp;nbsp;don’t
know yet. Doesn’t sound too hard to manage, does&amp;nbsp;it?&lt;/p&gt;

&lt;p&gt;Nevertheless, my first two weeks were horribly unproductive. Every couple&amp;nbsp;of
hours I asked myself why people loved Python. What was so special about&amp;nbsp;it,
that I couldn’t achieve the same with Ruby for&amp;nbsp;instance?
After my first week of subtle frustration, I decided to get to the bottom&amp;nbsp;of
what was so special about&amp;nbsp;it.
I searched for Python books and after a while I luckily&amp;nbsp;found
&lt;a href="http://shop.oreilly.com/product/0636920032519.do"&gt;“Fluent Python”&lt;/a&gt;, which did a great job teaching me the&amp;nbsp;basic
principles and specialities of the language. Problem&amp;nbsp;solved.&lt;/p&gt;

&lt;p&gt;Not quite. I had the motivation but still no clue, and writing code went terribly&amp;nbsp;slowly.&lt;/p&gt;

&lt;p&gt;I realised that when learning a new programming language, you spend a lot&amp;nbsp;of
time thinking about how to do things. If you care (and you should!), you want&amp;nbsp;to
write some code that other (more experienced) developers can read and modify&amp;nbsp;without
getting a&amp;nbsp;headache.&lt;/p&gt;

&lt;p&gt;Of course you know all about clean code, object oriented design, patterns&amp;nbsp;etc.
The problem is, you still don’t really&amp;nbsp;know:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;how to build your&amp;nbsp;architecture&lt;/li&gt;
  &lt;li&gt;where to put&amp;nbsp;things&lt;/li&gt;
  &lt;li&gt;how to name&amp;nbsp;things&lt;/li&gt;
  &lt;li&gt;how to build certain parts like packages, modules, and&amp;nbsp;classes&lt;/li&gt;
  &lt;li&gt;what conventions are followed by your&amp;nbsp;team&lt;/li&gt;
  &lt;li&gt;which of the pythillion different syntaxes you should&amp;nbsp;prefer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Luckily most programming languages come with a community-driven or even&amp;nbsp;in-house
style guide. Python has the so-called Python Enhancement Proposal number&amp;nbsp;8, or
&lt;a href="https://www.python.org/dev/peps/pep-0008"&gt;&lt;span class="caps"&gt;PEP&lt;/span&gt;&amp;nbsp;8&lt;/a&gt; for short.&lt;/p&gt;

&lt;p&gt;Next thing I noticed: Whenever you learn a new language, the first thing you should&amp;nbsp;do
is set up a linting tool for the editor of your choice. It will&amp;nbsp;massively
speed you up and leave you with some really nice aha moments. As painful as&amp;nbsp;a
linter sometimes seems once you already know what you’re doing, it is just as&amp;nbsp;helpful
for starting off. You will learn what things should look like, you will&amp;nbsp;get
immediate feedback about what is wrong with your code, and it will be&amp;nbsp;right
at least syntax-wise and&amp;nbsp;style-wise.&lt;/p&gt;

&lt;h2 id="the-new"&gt;The&amp;nbsp;New&lt;/h2&gt;

&lt;p&gt;Before I wrote Python there were lots of things I could never understand,&amp;nbsp;like:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Why would one ever use 4 spaces for&amp;nbsp;indentation?&lt;/li&gt;
  &lt;li&gt;Why would you not put spaces between keyword arguments and default&amp;nbsp;values?&lt;/li&gt;
  &lt;li&gt;Why that waste of space by putting two blank lines between two&amp;nbsp;functions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(It absolutely makes sense to me now. I’ll leave it to the kind reader to&amp;nbsp;discover
the answers and arguments&amp;nbsp;😏)&lt;/p&gt;

&lt;p&gt;The point is, if you don’t explore other languages, you will never see the&amp;nbsp;different
communities and thoughts behind&amp;nbsp;them.
Learning another language forces you to think about why other communities&amp;nbsp;are
doing some things&amp;nbsp;differently.
It forces you to question the approaches you simply took to be&amp;nbsp;right
because you have known them long enough and are comfortable with&amp;nbsp;them.&lt;/p&gt;

&lt;p&gt;I really like looking into the code of libraries I&amp;nbsp;use.
Sometimes it’s needed because documentation is lacking. But even if you&amp;nbsp;have
all the information you need, there are some good reasons why you still&amp;nbsp;should
read some open source library code. First of all, it helps to judge how well&amp;nbsp;the
project is built and maintained. It helps to make a well-informed decision&amp;nbsp;on
whether to use it or better just move on. Secondly, it will reveal&amp;nbsp;some
patterns in how the community likes to structure their&amp;nbsp;code.
You will–hopefully–see best practices and–wham-bam–you have learned another&amp;nbsp;important
piece for your daily&amp;nbsp;work.
Additionally, browsing the code of apps and libraries built with a&amp;nbsp;certain
language will give you an idea of what problems can be solved&amp;nbsp;elegantly
using it. You will very soon have a feeling for whether the problem&amp;nbsp;you’re
currently working on can be solved more easily by using another&amp;nbsp;language.&lt;/p&gt;

&lt;p&gt;This brings us to a very important point: You should be open-minded to&amp;nbsp;explore
alternative solutions. Sometimes this even means trying another&amp;nbsp;language.
I always try not to be prejudiced by all the rumours out there. Just&amp;nbsp;because
someone says: “Nah, don’t use it. It’s bullshit!”, doesn’t mean that you are&amp;nbsp;not
allowed to verify whether it actually is bullshit. Maybe it’s&amp;nbsp;not.&lt;/p&gt;

&lt;p&gt;Sometimes there is no way around working with your favourite “worst language&amp;nbsp;in
the world”. Who has for instance not dealt with legacy&amp;nbsp;code?
If you are forced to work with something you don’t like for the&amp;nbsp;moment,
here’s what I’d&amp;nbsp;do:&lt;/p&gt;

&lt;p&gt;Find something special about the&amp;nbsp;language.
You can’t have fun with something that seems utterly boring and useless to&amp;nbsp;you.&lt;/p&gt;

&lt;p&gt;When stumbling upon a new language ask&amp;nbsp;yourself:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Which different concepts are used and&amp;nbsp;why?&lt;/li&gt;
  &lt;li&gt;What specialities does it&amp;nbsp;have?&lt;/li&gt;
  &lt;li&gt;What kind of problems does it solve really&amp;nbsp;well?&lt;/li&gt;
  &lt;li&gt;For which problems should it not be&amp;nbsp;used?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I truly believe that once you have found something exciting about a&amp;nbsp;language,
all the pain you experienced and the too-many-new-things will very soon give&amp;nbsp;you
small benefits in a lot of&amp;nbsp;fields.
Just keep&amp;nbsp;learning!&lt;/p&gt;

&lt;h2 id="the-benefit"&gt;The&amp;nbsp;Benefit&lt;/h2&gt;

&lt;p&gt;Unfortunately, there is no language which is perfect for all kinds of&amp;nbsp;problems.
Every language was built with the goal of solving problems in a certain problem&amp;nbsp;space.
This also means knowing about lots of languages will give you the ability to&amp;nbsp;work
in different fields. It allows you to solve each problem with a fitting&amp;nbsp;tool.&lt;/p&gt;

&lt;p&gt;I can feel a benefit from working with different languages in&amp;nbsp;that
it helps me to dynamically choose what to work&amp;nbsp;on.
Programming is still a kind of creative work and you can’t always force yourself into the&amp;nbsp;flow.
In case I’m in the mood for doing some fast prototyping I can choose&amp;nbsp;Ruby.
Need to solve some Maths? I’ll probably see whether Python can help&amp;nbsp;here.
If I like to try some spacy &lt;span class="caps"&gt;UI&lt;/span&gt; stuff, I can leverage&amp;nbsp;JavaScript.
If I have to work on some highly available, distributed &lt;em&gt;[put your favourite 🦄 and 🌈 here]&lt;/em&gt; loveliness I’ll maybe go and write some Elixir these&amp;nbsp;days.
Personally, it helps me to have a more balanced programming experience throughout the&amp;nbsp;day.&lt;/p&gt;

&lt;p&gt;Besides, learning a new language is just a nice&amp;nbsp;challenge.
It keeps your head spinning and demands some patience. Sometimes it requires you&amp;nbsp;to
be more tolerant of approaches that differ a lot from what you already&amp;nbsp;know.
It gives you loads of valuable insights and when you learn together&amp;nbsp;with
someone else or you visit some developer meetups while learning, it will&amp;nbsp;even
affect your social&amp;nbsp;surroundings.&lt;/p&gt;

&lt;p&gt;To start off, you could go and pick a language you could never really&amp;nbsp;stand.
Just keep in mind that you don’t have to switch to it. You should learn&amp;nbsp;about
its concepts, the reasons why they exist and what the language is used&amp;nbsp;for.
If you like what you learned, look at the community behind it, understand&amp;nbsp;their
approaches, and maybe build something together with&amp;nbsp;them.
If you don’t like what you learned, then ask&amp;nbsp;yourself,
&lt;a href="https://www.bloomberg.com/news/articles/2015-06-23/the-old-coding-languages-that-refuse-to-die"&gt;why people still use it&lt;/a&gt;.
It’s certainly not because they are&amp;nbsp;stupid.&lt;/p&gt;

&lt;h2 id="the-wrap-up"&gt;The Wrap&amp;nbsp;Up&lt;/h2&gt;

&lt;p&gt;Phew, that was quite a mix of personal stories and&amp;nbsp;insights.&lt;/p&gt;

&lt;p&gt;With these, I feel like not sticking to a single language seems to be a rather good&amp;nbsp;idea.
So, the question is: Should I learn &lt;em&gt;all&lt;/em&gt; the languages out there? How&amp;nbsp;many
languages are enough and how could I possibly learn all this and at the&amp;nbsp;same
time still have a social&amp;nbsp;life?&lt;/p&gt;

&lt;p&gt;I’m totally aware that it’s nearly impossible to keep up with all&amp;nbsp;the
programming languages you already know, much less to follow the “Learn a new&amp;nbsp;language
each year!” advice that some folks are preaching. I think it’s about something&amp;nbsp;else.
It’s about developing a mindset and skills that allow you to choose&amp;nbsp;interesting
projects, move on if they don’t fit anymore, and make a living while having&amp;nbsp;fun
solving&amp;nbsp;problems.&lt;/p&gt;

&lt;p&gt;It’s like with real languages. You don’t have to know every language&amp;nbsp;perfectly
but it might help to know some more in-depth to get along in foreign&amp;nbsp;regions.
And if you are missing some bits and pieces you can still wave and make&amp;nbsp;some
noise to make your&amp;nbsp;point.&lt;/p&gt;

&lt;p&gt;So – back to programming languages – these are my final&amp;nbsp;words:&lt;/p&gt;

&lt;p&gt;Do not switch! Rather, use the language that suits the problem&amp;nbsp;best.
Not switching entirely will enlarge your toolset, keep you open-minded,&amp;nbsp;and
will eventually make you a better&amp;nbsp;developer.&lt;/p&gt;

</content>
    <summary type="html">Welcome to the “right side”? Instead of switching, I choose to learn more languages and to never stick to only a single language. If you are tempted to settle on just one, here’s why you’d better not.</summary>
  </entry>
  <entry>
    <id>tag:paulgoetze.com,2016-10-03:/2016/10/03/creating-machine-learning-systems-with-jruby/</id>
    <title type="html">Creating Machine Learning Systems with JRuby</title>
    <published>2016-10-03T00:00:00Z</published>
    <updated>2016-10-03T00:00:00Z</updated>
    <link rel="alternate" href="https://paulgoetze.com/2016/10/03/creating-machine-learning-systems-with-jruby/" type="text/html"/>
    <content type="html">&lt;p&gt;All the different programming languages out there seem to be a better fit for machine learning tasks than Ruby, right? Python has &lt;a href="https://scikit-learn.org/"&gt;scikit-learn&lt;/a&gt;, Java has &lt;a href="https://www.cs.waikato.ac.nz/ml/weka/index.html"&gt;Weka&lt;/a&gt;, and there’s &lt;a href="https://shogun-toolbox.org/"&gt;Shogun&lt;/a&gt; for machine learning in C++, just to name a few. On the other hand, Ruby has an excellent reputation for fast&amp;nbsp;prototyping.&lt;/p&gt;

&lt;p&gt;So, why shouldn’t you prototype machine learning systems with Ruby? Challenge accepted! In this tutorial, we will build a system that can automatically categorize &lt;span class="caps"&gt;BBC&lt;/span&gt; sports articles for&amp;nbsp;you.&lt;/p&gt;

&lt;p&gt;Oh, and we’ll do it in Ruby. Well, that’s not entirely true—we will use JRuby and Java’s Weka library via the &lt;a href="https://rubygems.org/gems/weka"&gt;weka gem&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id="preparation"&gt;Preparation&lt;/h2&gt;

&lt;p&gt;First, install &lt;a href="http://jruby.org/"&gt;JRuby&lt;/a&gt; v&lt;span class="caps"&gt;10.0.0.1&lt;/span&gt;+. Then create an &lt;code&gt;ml_with_jruby&lt;/code&gt; directory and put the following Gemfile into&amp;nbsp;it:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Gemfile&lt;/span&gt;

&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="s1"&gt;'https://rubygems.org'&lt;/span&gt;

&lt;span class="c1"&gt;# use your JRuby version here&lt;/span&gt;
&lt;span class="n"&gt;ruby&lt;/span&gt; &lt;span class="s1"&gt;'3.4.2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;engine: &lt;/span&gt;&lt;span class="s1"&gt;'jruby'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;engine_version: &lt;/span&gt;&lt;span class="s1"&gt;'10.0.0.1'&lt;/span&gt;

&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s1"&gt;'weka'&lt;/span&gt;    &lt;span class="c1"&gt;# this provides us with the weka lib&lt;/span&gt;
&lt;span class="n"&gt;gem&lt;/span&gt; &lt;span class="s1"&gt;'scalpel'&lt;/span&gt; &lt;span class="c1"&gt;# used for text processing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In your JRuby environment, run &lt;code&gt;bundle install&lt;/code&gt; to install the&amp;nbsp;gems.&lt;/p&gt;

&lt;p&gt;Next, &lt;a href="http://mlg.ucd.ie/files/datasets/bbcsport-fulltext.zip"&gt;download&lt;/a&gt; the free dataset of &lt;a href="http://mlg.ucd.ie/datasets/bbc.html"&gt;&lt;span class="caps"&gt;BBC&lt;/span&gt; sport articles&lt;/a&gt; and move the unpacked article directories into a &lt;code&gt;./data/training&lt;/code&gt;&amp;nbsp;directory.&lt;/p&gt;

&lt;p&gt;Finally, move the last two articles of each sports type into a separate &lt;code&gt;./data/test&lt;/code&gt;&amp;nbsp;directory.&lt;/p&gt;
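
&lt;p&gt;If you would rather not move the files by hand, the step above could be scripted roughly like this. This is only a sketch: the &lt;code&gt;move_last_articles&lt;/code&gt; helper is a hypothetical name, and it assumes that “last” means last when the &lt;code&gt;.txt&lt;/code&gt; files are sorted by file name.&lt;/p&gt;

```ruby
require 'fileutils'

# Hypothetical helper: moves the last `count` .txt files (sorted by name)
# of every category directory below training_dir into the matching
# directory below test_dir.
# Returns a Hash of category name => moved file names.
def move_last_articles(training_dir, test_dir, count: 2)
  categories = Dir.children(training_dir).sort.select do |name|
    File.directory?(File.join(training_dir, name))
  end

  categories.to_h do |category|
    files  = Dir.glob(File.join(training_dir, category, '*.txt')).sort
    target = File.join(test_dir, category)

    FileUtils.mkdir_p(target)
    moved = files.last(count)
    FileUtils.mv(moved, target)

    [category, moved.map { |file| File.basename(file) }]
  end
end
```

&lt;p&gt;From the project root this could be called as &lt;code&gt;move_last_articles('data/training', 'data/test')&lt;/code&gt;.&lt;/p&gt;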

&lt;p&gt;Your project structure should look like&amp;nbsp;this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;└── ml_with_jruby
    ├── data
    │   ├── test
    │   │   ├── athletics
    │   │   ├── cricket
    │   │   ├── football
    │   │   ├── rugby
    │   │   └── tennis
    │   └── training
    │       ├── athletics
    │       ├── cricket
    │       ├── football
    │       ├── rugby
    │       └── tennis
    └── Gemfile
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The texts in the &lt;code&gt;test&lt;/code&gt; directory will be our test files and will be classified with our trained&amp;nbsp;classifier.&lt;/p&gt;

&lt;p&gt;Wait…&lt;em&gt;training&lt;/em&gt;, &lt;em&gt;classification&lt;/em&gt;, &lt;em&gt;classifier&lt;/em&gt;? Lots of terms here. Let’s have a quick look at what they&amp;nbsp;mean.&lt;/p&gt;

&lt;h2 id="what-is-classification"&gt;What is&amp;nbsp;Classification?&lt;/h2&gt;

&lt;p&gt;Classification means “labeling given data”. An article could, for instance, be labeled as &lt;em&gt;Tennis&lt;/em&gt; or &lt;em&gt;Cricket&lt;/em&gt;. These labels, &lt;em&gt;Tennis&lt;/em&gt; and &lt;em&gt;Cricket&lt;/em&gt;, are called &lt;em&gt;classes&lt;/em&gt;. The algorithm that chooses the label for one of our articles is called the &lt;em&gt;classifier&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Now, there are different types of classification problems: &lt;em&gt;supervised&lt;/em&gt; and &lt;em&gt;unsupervised&lt;/em&gt;. The latter is often referred to as “Clustering”, where you don’t have any example data and you don’t know beforehand into which classes your algorithm will split your&amp;nbsp;data.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Supervised&lt;/em&gt; means we have pre-labeled data, like our labeled articles. We use these labeled articles to train our classifier or in other words: to build a model that can decide on how to categorize new data. After the training, we can pass unlabeled articles to our classifier and it will give us a label for each of&amp;nbsp;them.&lt;/p&gt;

&lt;p&gt;This said, the three steps to build a system for supervised classification&amp;nbsp;are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Creating a training dataset from raw&amp;nbsp;data&lt;/li&gt;
  &lt;li&gt;Training the classifier with the training&amp;nbsp;dataset&lt;/li&gt;
  &lt;li&gt;Classifying new data with the trained&amp;nbsp;classifier&lt;/li&gt;
&lt;/ul&gt;
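
&lt;p&gt;To make these three steps a bit more tangible, here is a toy sketch in plain Ruby. It is not what Weka does internally: the made-up data, the &lt;code&gt;train&lt;/code&gt; and &lt;code&gt;classify&lt;/code&gt; helpers, and the simple “closest average” rule are assumptions chosen only for illustration.&lt;/p&gt;

```ruby
# Step 1: a tiny, made-up training dataset of feature vectors with labels.
# The two numbers per article are hypothetical keyword counts
# ("tennis" hits, "cricket" hits).
TRAINING_DATA = [
  { features: [5, 0], label: :tennis },
  { features: [7, 1], label: :tennis },
  { features: [0, 6], label: :cricket },
  { features: [1, 4], label: :cricket }
].freeze

# Step 2: "training" here just averages the feature vectors per label.
def train(examples)
  examples.group_by { |example| example[:label] }.transform_values do |group|
    vectors = group.map { |example| example[:features] }
    vectors.transpose.map { |values| values.sum.to_f / values.size }
  end
end

# Step 3: classify new data by picking the label with the closest average
# (squared Euclidean distance).
def classify(model, features)
  model.min_by do |_label, centroid|
    centroid.zip(features).sum { |a, b| (a - b)**2 }
  end.first
end

model = train(TRAINING_DATA)
classify(model, [6, 0]) # => :tennis
classify(model, [0, 3]) # => :cricket
```

&lt;p&gt;A real classifier is more sophisticated, but the overall flow of dataset, training, and classification stays the same.&lt;/p&gt;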

&lt;h2 id="creating-the-dataset"&gt;Creating the&amp;nbsp;Dataset&lt;/h2&gt;

&lt;p&gt;Let’s start with compiling our training&amp;nbsp;data.&lt;/p&gt;

&lt;p&gt;We need some example data to tell our classifier what different article types look like. Computers are smart, but we can’t expect them to take text and have a good gut feeling for which sport it is about. So, the first step is to transform our raw text into a representation that our classifier can work with. Computers are good with numbers, so we will use a set of numbers that describes the properties of the text (the so-called &lt;em&gt;features&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;We have to find some features that can best divide our data into the different article types. We could calculate, for example, the total length of the text or the number of certain keywords in the text. At this step you can be creative, choosing whatever comes to your mind and makes sense. There are feature combinations that work well together, whereas others reduce the performance of the classifier. Once you have a pool of features, you can use algorithms to select the most valuable features. To keep it simple we won’t cover the feature selection in this tutorial and just use our good sense to select a small set of&amp;nbsp;features.&lt;/p&gt;

&lt;h3 id="extracting-features-from-the-text"&gt;Extracting Features From the&amp;nbsp;Text&lt;/h3&gt;

&lt;p&gt;We’ll do the feature extraction in a &lt;code&gt;FeatureExtractor&lt;/code&gt; class that takes a piece of text and returns a Hash of properties and their numeric representations. Let’s directly process the given text into paragraphs, sentences (using &lt;a href="https://rubygems.org/gems/scalpel"&gt;Scalpel&lt;/a&gt;), and words. We will need these soon enough as we fill up our features&amp;nbsp;Hash:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# feature_extractor.rb&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'scalpel'&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FeatureExtractor&lt;/span&gt;
  &lt;span class="nb"&gt;attr_reader&lt;/span&gt; &lt;span class="ss"&gt;:text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:paragraphs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:sentences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:words&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@text&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;
    &lt;span class="vi"&gt;@paragraphs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\n{2,}/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@sentences&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Scalpel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@words&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/[\w'-]+/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;features&lt;/span&gt;
    &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="c1"&gt;# to be implemented next :)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
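
&lt;p&gt;To see what the paragraph and word splitting does, you can try the two regular expressions from the constructor on a small sample in plain Ruby. The sample text is made up, and the sentence splitting is left out here because it needs the Scalpel gem.&lt;/p&gt;

```ruby
text = "Federer wins again.\nIt wasn't close.\n\nThe crowd's favourite was on top-form."

# Two or more consecutive newlines separate paragraphs.
paragraphs = text.split(/\n{2,}/)

# Words may contain apostrophes and hyphens, but no other punctuation.
words = text.scan(/[\w'-]+/)

paragraphs.count # => 2
words.first(3)   # => ["Federer", "wins", "again"]
words.last       # => "top-form"
```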

&lt;p&gt;First, we will add some obvious features: Count the occurrences of words describing the sport itself, such as “tennis” in an article about tennis, “cricket” in an article about cricket, etc. (note that e.g. “athlet” matches “athletes” as well as “athletic”, and so&amp;nbsp;on).&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FeatureExtractor&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;features&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="ss"&gt;athletics_hints_count: &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'athlet'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;cricket_hints_count:   &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cricket'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;football_hints_count:  &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'football'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;rugby_hints_count:     &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'rugby'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;tennis_hints_count:    &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tennis'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
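
&lt;p&gt;The substring matching can be checked in isolation. The following sketch uses a standalone variant of &lt;code&gt;match_count&lt;/code&gt; that takes the text as an argument. The &lt;code&gt;Regexp.escape&lt;/code&gt; call is an addition that is not part of the class above; it only matters if a search word contains regex special characters.&lt;/p&gt;

```ruby
# Standalone variant of match_count for experimenting.
# Counts case-insensitive substring matches of word in text.
def match_count(text, word)
  text.scan(/#{Regexp.escape(word)}/i).count
end

text = 'Athletes love athletics. The athletic club trains daily.'

match_count(text, 'athlet') # => 3
match_count(text, 'tennis') # => 0
```

&lt;p&gt;Keep in mind that this kind of matching also counts hits inside longer words, which is exactly what we want for “athlet” but may not be wanted for every keyword.&lt;/p&gt;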

&lt;p&gt;It might be interesting how many proper nouns, like names and teams, appear in the text. So we’ll add a &lt;code&gt;capitalized_words_count&lt;/code&gt;&amp;nbsp;feature.&lt;/p&gt;

&lt;p&gt;Articles about e.g. tennis and athletics might be more likely to talk about women than, for example, football articles. As such, we’ll cover this in a feature that scans for male and female keywords and says which appear most often. Let’s call it&amp;nbsp;&lt;code&gt;gender_dominance&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Also, add some more generic text features, like &lt;code&gt;text_length&lt;/code&gt;, &lt;code&gt;sentences_count&lt;/code&gt;, &lt;code&gt;paragraphs_count&lt;/code&gt;, and&amp;nbsp;&lt;code&gt;words_per_sentence_average&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When you read through a couple of our training articles, it seems like some people have more to say than others, so let’s count the quotes in the text,&amp;nbsp;too.&lt;/p&gt;

&lt;p&gt;You get the idea. Just try to extract some properties that are likely to distinguish the content of one article from&amp;nbsp;another.&lt;/p&gt;

&lt;p&gt;We will add some additional features that indicate whether it’s more a team or individual sport, by counting the number of hints like pronouns (I, my, me vs. we, our, us) and a &lt;code&gt;number_count&lt;/code&gt; feature that might indicate sports where scores or times are&amp;nbsp;important.&lt;/p&gt;

&lt;p&gt;With this, we are good for now and we can finish up our extractor&amp;nbsp;class:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FeatureExtractor&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;features&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="ss"&gt;athletics_hints_count:      &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'athlet'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;cricket_hints_count:        &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cricket'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;football_hints_count:       &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'football'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;rugby_hints_count:          &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'rugby'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;tennis_hints_count:         &lt;/span&gt;&lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tennis'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;capitalized_words_count:    &lt;/span&gt;&lt;span class="n"&gt;capitalized_words_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;gender_dominance:           &lt;/span&gt;&lt;span class="n"&gt;gender_dominance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;text_length:                &lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;sentences_count:            &lt;/span&gt;&lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;paragraphs_count:           &lt;/span&gt;&lt;span class="n"&gt;paragraphs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;words_per_sentence_average: &lt;/span&gt;&lt;span class="n"&gt;words_per_sentence_average&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;quote_count:                &lt;/span&gt;&lt;span class="n"&gt;quote_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;single_sport_hints_count:   &lt;/span&gt;&lt;span class="n"&gt;terms_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sx"&gt;%w(I me my)&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;team_sport_hints_count:     &lt;/span&gt;&lt;span class="n"&gt;terms_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sx"&gt;%w(we us our team)&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="ss"&gt;number_count:               &lt;/span&gt;&lt;span class="n"&gt;number_count&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;capitalized_words_count&lt;/span&gt;
    &lt;span class="c1"&gt;# Only count words that start with an uppercase letter (not digits)&lt;/span&gt;
    &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\A[[:upper:]]/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;gender_dominance&lt;/span&gt;
    &lt;span class="n"&gt;terms_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sx"&gt;%w(she her)&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;terms_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sx"&gt;%w(he his)&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;terms_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;terms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
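    &lt;span class="c1"&gt;# Compare case-insensitively so that terms like "I" are matched, too&lt;/span&gt;
    &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;terms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;term&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;term&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;casecmp?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;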
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;words_per_sentence_average&lt;/span&gt;
    &lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zero?&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;quote_count&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/"[^"]+"/&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;number_count&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\d+[\.,]\d+|\d+/&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
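
&lt;p&gt;Before wiring everything into a dataset, it can help to sanity-check a few of these helpers on a made-up sentence. The snippet below is a standalone sketch: &lt;code&gt;count_terms&lt;/code&gt; is a hypothetical lambda that counts terms case-insensitively, while the two regular expressions are the ones used for quotes and numbers above.&lt;/p&gt;

```ruby
text  = 'She said: "I am happy." He lost 6-4 in 1.5 hours, but we like our team.'
words = text.scan(/[\w'-]+/)

# Hypothetical standalone helper: counts the words that equal one of
# the given terms, ignoring case.
count_terms = lambda do |terms|
  words.count { |word| terms.any? { |term| term.casecmp?(word) } }
end

single_hints = count_terms.call(%w(I me my))        # => 1
team_hints   = count_terms.call(%w(we us our team)) # => 3
quotes       = text.scan(/"[^"]+"/).count           # => 1
numbers      = text.scan(/\d+[\.,]\d+|\d+/)         # => ["6", "4", "1.5"]
```

&lt;p&gt;Note how the score “6-4” yields two separate numbers, whereas “1.5” is kept as a single one.&lt;/p&gt;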

&lt;h3 id="compiling-the-training-dataset"&gt;Compiling the Training&amp;nbsp;Dataset&lt;/h3&gt;

&lt;p&gt;We want to compile a dataset from our text features and save it as a file so that we can load it later on and train our classifier. We could also do it all in memory, but when storing the dataset as a file we can have a look into it and get a better understanding of what’s actually going on in this step. Weka provides a nice way for doing this with its &lt;code&gt;Instances&lt;/code&gt; class in the &lt;code&gt;Weka::Core&lt;/code&gt;&amp;nbsp;module.&lt;/p&gt;

&lt;p&gt;In a separate script, we load our training texts and extract their features, create an &lt;code&gt;Instances&lt;/code&gt; object out of them, and finally store the dataset on disk. Before we start with this, let’s create two more classes, &lt;code&gt;FileLoader&lt;/code&gt; and &lt;code&gt;Text&lt;/code&gt;, that will nicely abstract the file loading and the feature extraction from a given&amp;nbsp;file.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;FileLoader&lt;/code&gt; will return all text files from the given data&amp;nbsp;directory:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# file_loader.rb&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FileLoader&lt;/span&gt;
  &lt;span class="nb"&gt;attr_reader&lt;/span&gt; &lt;span class="ss"&gt;:data_directory&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_directory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@data_directory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"../&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;data_directory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;__FILE__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;files_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="no"&gt;Dir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;data_directory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/*.txt"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
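
&lt;p&gt;The glob pattern used in &lt;code&gt;files_for&lt;/code&gt; only picks up &lt;code&gt;.txt&lt;/code&gt; files directly inside the article type’s directory. Here is a small sketch that shows this behavior in a temporary directory. The &lt;code&gt;txt_files_for&lt;/code&gt; helper and the file names are made up for the example, and the added &lt;code&gt;sort&lt;/code&gt; is an assumption to get a stable file order.&lt;/p&gt;

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical standalone variant of FileLoader#files_for.
def txt_files_for(data_directory, article_type)
  Dir.glob(File.join(data_directory, article_type.to_s, '*.txt')).sort
end

basenames = Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'tennis'))
  File.write(File.join(dir, 'tennis', '001.txt'), 'Some article')
  File.write(File.join(dir, 'tennis', 'notes.md'), 'Not an article')

  txt_files_for(dir, :tennis).map { |file| File.basename(file) }
end

basenames # => ["001.txt"]
```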

&lt;p&gt;Our &lt;code&gt;Text&lt;/code&gt; class lets us pass in a text file and get its features by using the &lt;code&gt;FeatureExtractor&lt;/code&gt; we created&amp;nbsp;above:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# text.rb&lt;/span&gt;

&lt;span class="nb"&gt;require_relative&lt;/span&gt; &lt;span class="s1"&gt;'feature_extractor'&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Text&lt;/span&gt;
  &lt;span class="nb"&gt;attr_reader&lt;/span&gt; &lt;span class="ss"&gt;:text&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__dir__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# There seem to be some invalid UTF-8 characters in the texts,&lt;/span&gt;
    &lt;span class="c1"&gt;# so we replace them with a placeholder character here.&lt;/span&gt;
    &lt;span class="vi"&gt;@text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;invalid: :replace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;features&lt;/span&gt;
    &lt;span class="no"&gt;FeatureExtractor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;features&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
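
&lt;p&gt;If you are curious about what invalid UTF-8 looks like, here is a tiny sketch with a made-up string. It uses &lt;code&gt;String#scrub&lt;/code&gt;, which is a different method than the &lt;code&gt;encode!&lt;/code&gt; call in the &lt;code&gt;Text&lt;/code&gt; class, but which makes the difference between replacing and removing invalid bytes easy to see.&lt;/p&gt;

```ruby
# The byte \xFF can never appear in valid UTF-8.
raw = "Wimbledon\xFF 2004"

valid    = raw.valid_encoding? # => false
replaced = raw.scrub           # => "Wimbledon\uFFFD 2004"
removed  = raw.scrub('')       # => "Wimbledon 2004"
```

&lt;p&gt;For our features it makes little difference whether the broken bytes are replaced or dropped, as long as the resulting string is valid and can be scanned with regular expressions.&lt;/p&gt;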

&lt;p&gt;We can now use these classes to write a script for creating the training dataset. Create a new file called&amp;nbsp;&lt;code&gt;create_dataset.rb&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First, create an empty &lt;code&gt;Instances&lt;/code&gt; object that represents our training dataset. Add a numeric attribute for each feature and a nominal class attribute. We configure the different article types as possible class&amp;nbsp;values:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# create_dataset.rb&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'weka'&lt;/span&gt;
&lt;span class="nb"&gt;require_relative&lt;/span&gt; &lt;span class="s1"&gt;'feature_extractor'&lt;/span&gt;
&lt;span class="nb"&gt;require_relative&lt;/span&gt; &lt;span class="s1"&gt;'file_loader'&lt;/span&gt;
&lt;span class="nb"&gt;require_relative&lt;/span&gt; &lt;span class="s1"&gt;'text'&lt;/span&gt;

&lt;span class="n"&gt;article_types&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sx"&gt;%i(athletics cricket football rugby tennis)&lt;/span&gt;
&lt;span class="n"&gt;attribute_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;FeatureExtractor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;features&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;

&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Weka&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Core&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Instances&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_attributes&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;attribute_names&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;nominal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;values: &lt;/span&gt;&lt;span class="n"&gt;article_types&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;class_attribute: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, calculate the features for all articles and add them to our&amp;nbsp;&lt;code&gt;dataset&lt;/code&gt;:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;feature_list_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;FileLoader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'data/training'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;files_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="c1"&gt;# Remember that Text#features returns a Hash.&lt;/span&gt;
    &lt;span class="c1"&gt;# We only need the feature values.&lt;/span&gt;
    &lt;span class="c1"&gt;# Since the class value is still missing, we append the&lt;/span&gt;
    &lt;span class="c1"&gt;# article_type as the class value.&lt;/span&gt;
    &lt;span class="no"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;features&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;article_type&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;article_types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;feature_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;feature_list_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_instances&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Last, we can save all our calculated features to a file in the &lt;code&gt;/generated&lt;/code&gt; directory. &lt;code&gt;Instances&lt;/code&gt; allows saving and loading datasets to and from different file formats like &lt;span class="caps"&gt;CSV&lt;/span&gt;, &lt;span class="caps"&gt;JSON&lt;/span&gt;, &lt;span class="caps"&gt;ARFF&lt;/span&gt;, and the less common &lt;span class="caps"&gt;C4.5&lt;/span&gt; file format. Let’s pick &lt;a href="https://weka.wikispaces.com/ARFF"&gt;&lt;span class="caps"&gt;ARFF&lt;/span&gt;&lt;/a&gt; (&lt;em&gt;Attribute-Relation File Format&lt;/em&gt;) here, which was developed specifically for machine learning datasets and is also nicely legible for&amp;nbsp;humans:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_arff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'generated/articles.arff'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run the script in your terminal to create the training&amp;nbsp;dataset:&lt;/p&gt;

&lt;div class="language-bash highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;jruby create_dataset.rb &lt;span class="c"&gt;# If you're using RVM, this is just `ruby...`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you have a quick look into the generated &lt;code&gt;.arff&lt;/code&gt; file, you’ll see a header with the customizable relation name and the defined attributes, followed by the actual data&amp;nbsp;rows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@relation Instances

@attribute athletics_hints_count numeric
@attribute cricket_hints_count numeric
% ...
@attribute class {athletics,cricket,football,rugby,tennis}

@data
1,0,0,0,0,47,1,1237,11,3,19,2,1,0,7,athletics
1,0,0,0,0,46,1,901,7,2,20,0,0,2,5,athletics
% ...
&lt;/code&gt;&lt;/pre&gt;
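
&lt;p&gt;To make the layout of the header more tangible, here is a minimal sketch that reads the attribute names and types from &lt;span class="caps"&gt;ARFF&lt;/span&gt; header text using plain Ruby string handling. It is not how Weka parses the file, and the sample text only contains the header lines shown above, without any data&amp;nbsp;rows:&lt;/p&gt;

```ruby
# Illustration only: a tiny header reader in plain Ruby, not Weka's ARFF parser.
# The sample holds just the header lines shown in the article.
arff = [
  '@relation Instances',
  '',
  '@attribute athletics_hints_count numeric',
  '@attribute cricket_hints_count numeric',
  '@attribute class {athletics,cricket,football,rugby,tennis}',
  '',
  '@data'
].join("\n")

# Everything before the @data marker belongs to the header.
header_lines = arff.lines.map { |line| line.strip }.take_while { |line| line != '@data' }

# An attribute line has the shape "@attribute NAME TYPE".
# Limiting the split to 3 parts keeps a nominal value list like {a,b,c} in one piece.
attributes = header_lines
  .select { |line| line.start_with?('@attribute') }
  .map { |line| line.split(' ', 3) }
  .map { |_keyword, name, type| [name, type] }
  .to_h

attributes.each { |name, type| puts "#{name}: #{type}" }
```

&lt;p&gt;Numeric attributes only carry the keyword &lt;code&gt;numeric&lt;/code&gt;, while the nominal class attribute lists every allowed value in curly&amp;nbsp;braces.&lt;/p&gt;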

&lt;p&gt;With our training dataset compiled, we can now go ahead and train our classifier and classify our test&amp;nbsp;articles.&lt;/p&gt;

&lt;h2 id="training-the-classifier"&gt;Training the&amp;nbsp;Classifier&lt;/h2&gt;

&lt;p&gt;There are loads of different built-in classifiers from which we can choose. We could use Bayes classifiers, Neural Networks, Logistic Regression, Decision Trees, and many more. For simplicity we will use the &lt;a href="https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf"&gt;RandomForest&lt;/a&gt; classifier. With RandomForest, we get an easy-to-configure classifier that is based on Decision Trees and performs well for common&amp;nbsp;problems.&lt;/p&gt;

&lt;p&gt;It’s time to load the training dataset and train a RandomForest classifier. Let’s do it in a new file called&amp;nbsp;&lt;code&gt;run_classification.rb&lt;/code&gt;.&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# run_classification.rb&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'weka'&lt;/span&gt;

&lt;span class="n"&gt;instances&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Weka&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Core&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Instances&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_arff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'generated/articles.arff'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;class_attribute&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:class&lt;/span&gt;

&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Weka&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Classifiers&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Trees&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RandomForest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;

&lt;span class="c1"&gt;# The -I option determines the number of decision trees that are used in each&lt;/span&gt;
&lt;span class="c1"&gt;# learning iteration, the default is 100, we increase it to 200 here to gain a&lt;/span&gt;
&lt;span class="c1"&gt;# better performance.&lt;/span&gt;
&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'-I 200'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train_with_instances&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that we have to set the class attribute manually after loading our dataset. This is necessary because our &lt;span class="caps"&gt;ARFF&lt;/span&gt; file contains no information about the position of the class attribute (it doesn’t always have to be the last&amp;nbsp;one!).&lt;/p&gt;

&lt;p&gt;That was easy enough. Our test articles are already waiting for&amp;nbsp;us!&lt;/p&gt;

&lt;h2 id="classifying-test-articles"&gt;Classifying Test&amp;nbsp;Articles&lt;/h2&gt;

&lt;p&gt;We can now use our trained classifier to classify the (let’s pretend) unlabeled articles in our &lt;code&gt;data/test&lt;/code&gt;&amp;nbsp;directory.&lt;/p&gt;

&lt;p&gt;Before we can pass our test articles to the classifier, we have to extract the same features from them as we did for our training texts. Luckily, we can use our &lt;code&gt;FileLoader&lt;/code&gt; and &lt;code&gt;Text&lt;/code&gt; classes&amp;nbsp;again:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# run_classification.rb&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'weka'&lt;/span&gt;
&lt;span class="nb"&gt;require_relative&lt;/span&gt; &lt;span class="s1"&gt;'file_loader'&lt;/span&gt; &lt;span class="c1"&gt;# &amp;lt;= added!&lt;/span&gt;
&lt;span class="nb"&gt;require_relative&lt;/span&gt; &lt;span class="s1"&gt;'text'&lt;/span&gt;        &lt;span class="c1"&gt;# &amp;lt;= added!&lt;/span&gt;

&lt;span class="c1"&gt;# ...&lt;/span&gt;

&lt;span class="n"&gt;article_types&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sx"&gt;%i(athletics cricket football rugby tennis)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;feature_list_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;FileLoader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'data/test'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;files_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="c1"&gt;# Remember again that Text#features returns a Hash.&lt;/span&gt;
    &lt;span class="c1"&gt;# We only need the feature values.&lt;/span&gt;
    &lt;span class="c1"&gt;# The class value is still missing, but this time, we append a "missing"&lt;/span&gt;
    &lt;span class="c1"&gt;# as class value. You can use nil, '?' or Float::NAN.&lt;/span&gt;
    &lt;span class="no"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;features&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'?'&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;article_types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;feature_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;feature_list_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;feature_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"* article about &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;article_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; classified as &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, we load our test texts and pass their extracted features to the &lt;code&gt;classify&lt;/code&gt; method of our classifier. After classifying, we print the predicted classes to&amp;nbsp;stdout.&lt;/p&gt;

&lt;p&gt;Run the script and have a look at the&amp;nbsp;output:&lt;/p&gt;

&lt;div class="language-bash highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;jruby run_classification.rb

&lt;span class="k"&gt;*&lt;/span&gt; article about athletics classified as athletics
&lt;span class="k"&gt;*&lt;/span&gt; article about athletics classified as athletics
&lt;span class="k"&gt;*&lt;/span&gt; article about cricket classified as cricket
&lt;span class="k"&gt;*&lt;/span&gt; article about cricket classified as cricket
&lt;span class="k"&gt;*&lt;/span&gt; article about football classified as football
&lt;span class="k"&gt;*&lt;/span&gt; article about football classified as football
&lt;span class="k"&gt;*&lt;/span&gt; article about rugby classified as rugby
&lt;span class="k"&gt;*&lt;/span&gt; article about rugby classified as rugby
&lt;span class="k"&gt;*&lt;/span&gt; article about tennis classified as tennis
&lt;span class="k"&gt;*&lt;/span&gt; article about tennis classified as tennis
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Yay. Looks like all our articles got the right&amp;nbsp;label!&lt;/p&gt;

&lt;p&gt;This doesn’t mean that our classification system is perfect, though. When training classifiers, their performance can be evaluated by an approach called &lt;em&gt;&lt;a href="https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation"&gt;cross validation&lt;/a&gt;&lt;/em&gt;. Weka also gives us a &lt;code&gt;cross_validate&lt;/code&gt; method for our&amp;nbsp;classifier.&lt;/p&gt;

&lt;p&gt;Cross validation splits the training dataset into N parts with a roughly equal number of instances. By default, it uses 10 splits. It then trains the classifier on 9 of the subsets and classifies the leftover one. This is repeated until each subset has been classified by a classifier trained on the other 9 subsets. With this procedure, you get an idea of how well your classifier performs, because you already know all the labels and can calculate certain&amp;nbsp;measures.&lt;/p&gt;
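
&lt;p&gt;The splitting itself can be sketched in a few lines of plain Ruby. This is only an illustration of the idea and not Weka’s implementation, which may additionally shuffle or stratify the instances. The helper name &lt;code&gt;k_fold_splits&lt;/code&gt; is made up for this&amp;nbsp;example:&lt;/p&gt;

```ruby
# Plain-Ruby sketch of k-fold splitting (no Weka involved).
# k_fold_splits is a hypothetical helper name used only for this example.
def k_fold_splits(items, folds: 10)
  # Deal the items round-robin into buckets, so bucket sizes differ by at most one.
  buckets = Array.new(folds) { [] }
  items.each_with_index { |item, index| buckets[index % folds].push(item) }

  # Every bucket is the held-out test set exactly once;
  # the other buckets combined form the training set for that round.
  buckets.each_index.map do |i|
    training = (buckets.take(i) + buckets.drop(i + 1)).flatten(1)
    { training: training, test: buckets[i] }
  end
end

# Twenty stand-in "instances" split into 10 folds.
splits = k_fold_splits((1..20).to_a, folds: 10)

splits.each_with_index do |split, round|
  puts "round #{round + 1}: train on #{split[:training].size}, test on #{split[:test].inspect}"
end
```

&lt;p&gt;Each instance ends up in a test set exactly once, so every prediction that goes into the final measures comes from a classifier that never saw that instance during&amp;nbsp;training.&lt;/p&gt;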

&lt;p&gt;Let’s look at the 10-fold cross validation for our&amp;nbsp;classifier:&lt;/p&gt;

&lt;div class="language-ruby highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="n"&gt;evaluation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cross_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;folds: &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;evaluation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;

&lt;span class="c1"&gt;# Correctly Classified Instances         602               82.8061 %&lt;/span&gt;
&lt;span class="c1"&gt;# Incorrectly Classified Instances       125               17.1939 %&lt;/span&gt;
&lt;span class="c1"&gt;# Kappa statistic                          0.7708&lt;/span&gt;
&lt;span class="c1"&gt;# Mean absolute error                      0.1223&lt;/span&gt;
&lt;span class="c1"&gt;# Root mean squared error                  0.2281&lt;/span&gt;
&lt;span class="c1"&gt;# Relative absolute error                 39.9808 %&lt;/span&gt;
&lt;span class="c1"&gt;# Root relative squared error             58.3231 %&lt;/span&gt;
&lt;span class="c1"&gt;# Coverage of cases (0.95 level)          97.9367 %&lt;/span&gt;
&lt;span class="c1"&gt;# Mean rel. region size (0.95 level)      52.7373 %&lt;/span&gt;
&lt;span class="c1"&gt;# Total Number of Instances              727&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
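
&lt;p&gt;The two percentages at the top of the summary follow directly from the instance counts. As a quick sanity check, here is the arithmetic in plain Ruby, using only the numbers printed&amp;nbsp;above:&lt;/p&gt;

```ruby
# Counts copied from the evaluation summary above.
correct   = 602
incorrect = 125
total     = correct + incorrect

# Share of correctly and incorrectly classified instances in percent.
accuracy   = (100.0 * correct / total).round(4)
error_rate = (100.0 * incorrect / total).round(4)

puts "#{total} instances: #{accuracy} % correct, #{error_rate} % incorrect"
```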

&lt;p&gt;The first two lines show that our classifier labeled only about 83% of our articles correctly. That’s actually not too bad for our small, contrived feature set. You can expect the performance to improve with a set of carefully selected features. It’s up to you now: let the hunt for the best features&amp;nbsp;begin!&lt;/p&gt;

&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In this article, we used JRuby to automatically categorize sports articles. We went through three basic steps for building a classification system: extracting features from raw texts, building a training dataset, and training a classifier. With our trained classifier, we classified unlabeled&amp;nbsp;articles.&lt;/p&gt;

&lt;p&gt;It looks like Ruby can also be your best friend for machine learning tasks, and I really encourage you to check out the Weka framework and play around with it a bit. It’s not only a good exercise but also lets you discover that basic machine learning is actually not rocket science! Give it a try and let me know how it goes. Thanks for&amp;nbsp;reading!&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;&lt;em&gt;You can find the code from this blog post on&amp;nbsp;GitHub:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/paulgoetze/ml_with_jruby"&gt;github.com/paulgoetze/ml_with_jruby&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
</content>
    <summary type="html">All the different programming languages out there seem to be a better fit for machine learning tasks than Ruby, right? On the other hand, Ruby has an excellent reputation for fast prototyping. So, why shouldn’t you prototype machine learning systems with Ruby? Challenge accepted!</summary>
  </entry>
</feed>

