Think about you join a database to analytics and the info isn’t within the form you want to have it. For instance, coordinates ought to be separated by latitude and longitude, some values are in a unique format or kind, and a few tables can actually have a dangerous design. Usually, you would want to spend so much of time context-switching between the ELT/ETL pipeline and analytics, the place, within the transformation (T) section you would want to edit code that transforms knowledge from the applying form to the analytics form. This expertise is type of regular in our trade however does it should be? Because of this, we’re introducing Final Mile ETL! With it, you are able to do all the pieces in a single software which considerably improves the flexibility to iterate on and the pace of improvement, customization, and safety. It’s possible you’ll now ask how? Effectively, sufficient phrases, it’s time for an instance. Let’s deep dive into it!
What do I imply by “exploratory analytics”? Let’s say, we now have simply three tables with knowledge, and we want to discover its worth or study some info primarily based on this knowledge. It signifies that as an alternative of 1 concrete objective to realize, we are going to primarily attempt to discover some worth on this knowledge! With outlined exploratory analytics, listed here are three tables in a database (Airports
, Nation record
, and GDP
— Gross Home Product):

You possibly can see that the coordinates are in a single column known as coordinates
as an alternative of latitude
and longitude
, or worth
in GDP
the desk is textual content moderately than numerics. We will tackle these points with the assistance of Final Mile ETL contained in the analytics.
Let’s join the database to the analytics (in case you are not accustomed to GoodData, I encourage you to examine the documentation). The result’s the next:
What you’ll be able to see within the picture above are datasets. We will convert a dataset to a so-called SQL dataset:
The SQL dataset offers us the likelihood to jot down SQL queries which are executed immediately within the database. Let’s simply examine what varieties of airports we now have within the database:
Plainly the Airports
desk accommodates a number of varieties of airports reminiscent of heliports, and even closed airports. Let’s say that I’m simply within the medium and enormous airports — it’s not an issue in any respect. I don’t have to go to the ELT/ETL pipeline, I merely create a brand new SQL dataset known as Airports reworked
immediately within the analytics, and I can do this with the next SQL code:
You possibly can see that I can do it with fairly a easy SQL question and the result’s the next:
Sadly, there may be another downside we have to resolve. The coordinates are usually not within the format we want for analytics. We have to break this column into longitude
and latitude
to render a geo chart. Let’s edit the Airports reworked
dataset:
Now, with this little transformation, we are able to merely render a geo chart to see the all airports in the entire world:
As this was moderately easy, let’s discover one thing extra priceless — what’s the correlation between GDP and the variety of airports in a rustic?
First, we are able to simply record the values of Airports
and GDP
tables:
The dangerous information is that the Airports
desk doesn’t have the total nation names (for instance, Albania) however solely iso codes of nations (for instance, AL). Alternatively, the GDP
desk doesn’t have iso codes however solely has the total identify of the nation. Sadly, there may be another challenge with the column worth
. You possibly can see that the values include commas and this isn’t the proper format for numbers. Let’s resolve the talked about issues. We will begin with the Nation record
desk the place each the total names and the iso codes are discovered:
Now, we simply have to create a brand new SQL dataset for the GDP the place we be a part of Nation record
so as to add the lacking iso code (or as I named it — nation code), and take away commas from the column worth
and convert it to numeric kind:
Good, we now have the GDP SQL dataset that we want! Additionally, to depend the airports accurately, I’ll add an identifior to the Airport reworked
SQL dataset:
Each of the SQL datasets now include nation codes, we are able to merely be a part of them primarily based on these nation codes:
The result’s as follows. I additionally eliminated the Nation record
dataset because it doesn’t have any worth for us proper now:
Let’s transfer to the Analyze tab to seek out out what the correlation between the GDP and the variety of airports within the nation is. To begin with, we have to calculate the variety of airports within the nation (for extra info, examine learn how to create metrics in GoodData):
The final step is to calculate the correlation between GDP and the variety of airports within the nation:
All the things is reworked and computed, the result’s as follows:
Collectively we explored what Final Mile ETL is, and how one can profit from it. Briefly, you are able to do all the pieces throughout the analytics and, consequently, keep away from the necessity for context-switching. On high of that, you’ll be able to profit from the safety implications — that being, you do not want to offer entry to the database so as to make easy adjustments throughout the analytics. An extra plus level is the flexibility to iterate, that means that you could begin with one thing easy, after which transfer to extra complicated outcomes. Lastly, because of GoodData’s analytics-as-code strategy you’ll be able to simply model all the pieces in Git and thus apply software program engineering best-practices to the analytics.
Wish to strive it out your self? Join a free trial and get began at the moment.