Updated on **May 28, 2020**

Now that we have Julia set up on our local machine, it is time to really start cooking! And by cooking I mean analyzing some interesting data...
In (maybe?) contradiction to the vastly more common **bottom up** approach to teaching computational topics, these tutorials will (possibly to a fault) be very *top down*. What do I mean by that? Usually with computer science topics, and even more so with computational approaches to analytical topics, teaching starts from the most basic/atomic concepts: i.e. learn data science by writing algorithms (and to write algorithms you first have to learn about variables and `for` loops).

While this often seems like a perfectly logical approach for an expert trying to teach novices (teacher/student), this method of curriculum development really optimizes for the **teacher's** experience rather than the *learner's*^{[1]}. Much like building a house: for a **builder** it is vastly more efficient to start by pouring the concrete for the foundation (and pretty impossible to do otherwise). But for the *homeowner*, it is much more rewarding and exciting to see windows, a façade, and a `siiiiiic` door knocker.

[1] | ...in an innocuously perverse kind of way. |

Thankfully for me/us, we are not building a house... we are learning how to do data science in Julia! And also, I am not a Julia expert. So how does a novice teach a novice? Especially when that novice is themself?

I personally find the quickest way to learn a new topic is to set up an engaging problem you are curious about that has a semi-definite answer (i.e. you'll know the answer when you see it). Then develop a sequence of steps to get you to that answer.

It can be a little mind-bendy to think about learning this way, but it is similar to the steps of a proof by (backwards) induction (or recursion). Just like with the mathematical proof, however, proving the *base case* is a non-trivial task (that may not even be possible).

The **base case** in our curricula bootstrapping is that an **answer exists** for the problem you set up for yourself. But since you haven't actually solved this problem, you somehow need to determine whether or not you scoped the problem correctly. And once you figure out the base case (there does in fact exist a solution to this problem)^{[2]}, you then work backwards on a (slightly) smaller problem (the **N-1** case). And repeat:

[2] | You can definitely learn from an unsolvable (or unsolvable-in-the-amount-of-time-you-have) problem, but it is often much more rewarding when learning a new subject to feel a sense of accomplishing something. |

1. Does a solution exist for (sub)problem **N**?
2. Am I at **N = 0** (i.e. a subproblem I know the solution to)?
3. If no, go back to step *1* and set **N := N - 1**^{[3]}
4. If yes, *celebrate*, you did it!

[3] | Implicit in this step is determining a suitable subproblem that corresponds to N-1, ideally within a comfortable ZPD for the learner (i.e. yourself). |
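To make the recursion analogy concrete, the loop above can be sketched as a tiny recursive function. This is a hypothetical illustration, not real curriculum machinery: `known` stands in for your own judgment call of "is this a subproblem I already know how to solve?", and step 1 (a solution exists) is simply assumed to hold at every level.

```julia
# A hypothetical sketch of the learning loop as backwards recursion.
# `known` is a stand-in for your own judgment -- not a real library call.
function learn(N; known = n -> n == 0)
    if known(N)            # step 2: at N = 0, a subproblem I can solve
        return :celebrate  # step 4: you did it!
    else                   # step 3: shrink the problem and repeat
        return learn(N - 1; known = known)
    end
end

learn(3)  # checks N = 3, 2, 1, then hits the base case at N = 0
```

Just like the proof by induction, everything rests on the base case actually being reachable: if `known` never returns `true`, the recursion never bottoms out.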

1. Brainstorm a motivating problem you are curious about
   - Are higher rated Airbnb listings more expensive per night?
2. Research if it has been solved^{[4]}
   - Has anyone written a blog post/research paper/conference talk showing this?
   - Or can I imagine a way it **could** be solved given *what I currently know*?
3. Decompose it into manageable subproblems (that you assume can be solved)
   - How can I get a structured dataset of Airbnb listings that contains *both* price and ratings?

[4] | Often the easiest way is to find if anyone has done it before. |

Many data science learning resources often use **well structured** and "canned" datasets. This has the benefit of circumventing step 1 and being much more predictable/reliable. But what one gains in convenience, one loses in motivation. Now maybe *some* people really do care about the fuel economy of cars from the 1974 Motor Trend issue^{[5]}.

[5] | These canonical datasets are canonical for good reason however. And are actually decent for learning machine learning/modeling (among other things). |

If I can coin a term here^{[6]}, **simulated experience**: a structured exercise that emulates the *challenges* and *uncertainties* of a real world task.

[6] | I mean I will, it is my blog after all. |

- The data can be **faceted** by region (city/county) so everyone can find a facet that they have a particular affinity towards.
- These data are generated from *real world processes* and contain authentic problems (that Airbnb and epidemiologists have likely solved).

Two of my favorite data sources that fit these criteria are:

Now that we have learned how to learn, we are metacognitively^{[7]} set up for success! For the foreseeable future, the long running (and timely) case study we will be working with is analyzing the COVID tracking project data.

[7] | Roger Azevedo. Computer environments as metacognitive tools for enhancing learning. Educational Psychologist. 2005. |

DISCLAIMER: I am not an epidemiologist nor a public health professional. Always take care when making **inferences** from any data, especially if it is **unfamiliar high-stakes** data (like the COVID testing data is for me).

I promise tomorrow we will write at least *one* line of Julia code.

To the extent possible under law,
Jonathan Dinu
has waived all copyright and related or neighboring rights to
Bootstrapping Curricula (for yourself).

This work is published from:
United States.

A production of the
hyphaebeast.club