Analytics using code and visuals
There are two fundamentally different approaches to data analytics - code-based and visual-based. This is possibly why “data science” and “business intelligence” have diverged.
I’m surprised I’ve taken so long to articulate this, but there are two fundamentally different approaches to data analytics - code-based and visual-based. Both have their pros and cons, but appreciating this distinction can help you better understand the divergent paths the industry has taken.
And among other things, this explains why most practitioners treat “data science” and “business intelligence” as largely separate fields.
Tableau and Python
At one end of the spectrum you have the “python approach”, which I have criticised sufficiently on this blog. Here, you look at data analysis as a branch of software engineering, and analysing data is reduced to a cookie-cutter series of “steps” you need to follow to get the insight.
In this approach, visual processes such as inspecting the data and drawing good graphs are deprioritised. The important thing is having the right kind of “pipelines” and replicable code.
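To make this concrete, here is a minimal sketch of what such a “pipeline” style analysis might look like in Python - the file name and column names are made up purely for illustration:

```python
# A minimal, illustrative "pipeline" style analysis.
# The file name and column names are made up for illustration.
import pandas as pd

def load(path: str) -> pd.DataFrame:
    """Step 1: read the raw data."""
    return pd.read_csv(path)

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Step 2: drop incomplete rows and fix types."""
    df = df.dropna(subset=["revenue"])
    df["date"] = pd.to_datetime(df["date"])
    return df

def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """Step 3: aggregate to the level at which we want 'insight'."""
    return df.groupby("region", as_index=False)["revenue"].sum()

if __name__ == "__main__":
    result = summarise(clean(load("sales.csv")))
    print(result)
```

The point is less the specific steps than the fact that the whole thing can be rerun, end to end, whenever the data changes.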
The other extreme is what I would call the “tableau approach”, where everything is done using drag and drop (and other visual aids). You might write SQL *code* to get the data ready for the dashboard, but the dashboard itself is constructed visually. You drag and drop elements in place. Any edits to the graphs are made using the mouse. The idea here is to get the dashboard in a visually pleasing format for easy consumption and insight.
That these two diametrically opposite approaches exist for analytics might explain why two parallel fields - “data science” and “business intelligence” - have developed.
All over the stack
This dichotomy between a visual and code-based approach is not just restricted to the analytics layer - it is present up and down the stack as well.
The trigger for this post, for example, was this post by David J on semantic layers and data models. In it, he writes:
The quote is from the same Qlik link that the image above is taken from - it talks about data modelling being a visual process. I don’t feel this is necessarily true: you can just as easily think of it as a code-first exercise that then generates a visual artefact, rather than the other way round. Either way, the end result is the same, and I agree with the rest of the quote.
You notice that there is one code-based approach and one visual-based approach when it comes to data modelling as well.
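To illustrate the code-first idea (this is my own sketch, not anything from David J’s post or the Qlik piece), here is how a small, made-up data model might be declared in code using SQLAlchemy; a visual diagram can then be generated from these definitions rather than the other way round:

```python
# A made-up two-table model, defined code-first with SQLAlchemy.
# A visual ER diagram can be generated from these definitions,
# instead of the definitions being produced by a visual tool.
from sqlalchemy import Column, Date, ForeignKey, Integer, Numeric, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    orders = relationship("Order", back_populates="customer")

class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"))
    order_date = Column(Date)
    amount = Column(Numeric(10, 2))
    customer = relationship("Customer", back_populates="orders")
```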
Pros and cons
The pros and cons of a visual and code-based approach are fairly straightforward.
With a visual approach
You can SEE the data, which is super important in terms of knowing what to do
You don’t need to be particularly proficient when it comes to programming. Taken to an extreme, SQL can also be replaced with a query builder
What you see is what you get - there is instant feedback in terms of what the analysis / visual looks like
Formatting is relatively easier. Anyone who has tried to adjust relative sizes of graphs using code will be able to attest to this
Analysis is far more nimble. Sometimes, with a code-based approach, you might adopt too formal a software-engineering process, which makes changes difficult. You can get into “every change needs a thousand approvals” hell. Visual processes, usually outside the formality of software engineering practices, allow for more exploration.
With a code-based approach
The process becomes infinitely more repeatable and scalable. It is far easier to rerun your analysis when new data comes about (by building an end-to-end pipeline)
You can version control your analysis. One of the biggest problems with data analysis is the ease of making “silly mistakes” or introducing bugs. The ability to version control makes it easier to modify the analysis
Testing can be automated. Again, because data analysis is highly prone to bugs, it’s important that they get caught asap. And when you turn your analysis into code, you can automate tests as well
It is far easier to reuse analysis. If you want to copy-paste one widget from a code-based dashboard to another, you can just copy-paste the code rather than trying to replicate the process of building the widget
Sort of related to this - comparisons are easier to run. If you want to compare something across different periods or categories (and those periods or categories haven’t been defined in the “model”), you can simply run the same code with slight changes in parameters and get your results (see the sketch below).
Once you get used to coding, the speed you can achieve is significantly higher than anything you can ever get with drag and drop. This is also why you see most developers being extremely familiar with keyboard shortcuts and hardly using the mouse.
Neither of these lists is exhaustive - they are what I’ve been able to come up with at the time of writing.
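As a rough illustration of the reuse, testing and comparison points above - with entirely made-up column and function names - a single parameterised function plus a trivial automated check might look like this:

```python
# Illustrative only: a parameterised analysis that can be rerun for
# any period or category, plus a trivial automated sanity check.
import pandas as pd

def revenue_by(df: pd.DataFrame, category: str, start: str, end: str) -> pd.Series:
    """Total revenue per value of `category` between two dates."""
    window = df[(df["date"] >= start) & (df["date"] < end)]
    return window.groupby(category)["revenue"].sum()

def test_revenue_is_non_negative(df: pd.DataFrame) -> None:
    """The sort of bug-catching check that is easy to automate in code."""
    assert (df["revenue"] >= 0).all(), "negative revenue found - check the source data"

# Rerunning the same analysis for two periods is just two calls:
# q1 = revenue_by(df, "region", "2024-01-01", "2024-04-01")
# q2 = revenue_by(df, "region", "2024-04-01", "2024-07-01")
```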
Striking a balance
Like a good analyst, I’ve swung wildly between these two approaches over time. I started my career with the visual SPSS and then Excel, though with Excel I took a half-and-half approach, using VBA heavily. Then, as I moved jobs, I started using SAS, and then R - thus progressively moving more towards code.
That said, having started off with visuals - and given a breakthrough I once had simply by looking at the data in Excel - I’ve remained partial towards visual methods. In fact, I now follow a hybrid approach, using code to analyse the data but then looking at the results visually. As far as I am concerned, this is a decent middle ground.
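For what it’s worth, a minimal sketch of this hybrid approach (made-up file and column names again): the aggregation lives in code, so it is repeatable and version-controllable, and the output is then simply looked at as a chart:

```python
# Hybrid approach, illustratively: compute in code, then look at it.
# The file name and column names are made up for the example.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv", parse_dates=["date"])

# The analysis itself lives in code and can be rerun on new data...
monthly = df.set_index("date")["revenue"].resample("M").sum()

# ...but the judgement about what it means is made by looking at it.
monthly.plot(kind="bar", title="Monthly revenue")
plt.tight_layout()
plt.show()
```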
Best of both worlds
Can we really have the best of both worlds, though? In Visual C++, for example (I used it for a project back in 2001), you could build a UI by writing code and see the UI appear to the side. You could even drag and drop a UI element and see the appropriate code added alongside.
If you think about it, Excel macros allow something like this as well. I haven’t really used Power BI, so I don’t know if it actually offers this - it should, since Microsoft has an illustrious history of offering this kind of feature.
What this combination (visual and code-based analysis) offers is an interoperable analytics tool, which each person can work with in the way they are most comfortable. You can pay attention to detail the way you want to, have version control, and easily reuse code.
Now, I haven’t really used too many BI tools (I tried QuickSight once and gave up as it was too cumbersome), so I asked my friendly intern ChatGPT. This is what it had to say:
• Power BI: Rare. Most users define elements in the Power BI Desktop application. The REST API is more commonly used for automation and embedding.
• Tableau: Rare. Tableau dashboards are usually designed in Tableau Desktop. The API is used for embedding and automation, not for defining individual elements.
• Grafana: Very common. Grafana JSON models and Grafonnet-lib are often used to define dashboards and elements programmatically.
• Looker: Common. LookML is used to define data models and visualizations programmatically, which can include dashboard elements.
• Metabase: Less common. Most dashboards are designed via the UI, though the API allows for programmatic control over dashboards.
• Kibana: Common. Users often define visualizations and dashboards in JSON format, especially in DevOps and automated environments.
• Superset: Common. Users can define dashboards and charts using JSON configurations, fitting into data engineering workflows and CI/CD pipelines.
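To give a flavour of what “defining a dashboard in code” can look like, here is a rough Python sketch that emits a Grafana-style dashboard JSON. Grafana’s actual schema has many more fields than this, so treat the structure below as indicative rather than exact:

```python
# Indicative only: building a dashboard definition in code and emitting JSON.
# Grafana's real dashboard schema is considerably richer than this sketch.
import json

def panel(title: str, query: str, x: int, y: int) -> dict:
    """A minimal panel definition with a title, a query and a position."""
    return {
        "title": title,
        "type": "timeseries",
        "gridPos": {"x": x, "y": y, "w": 12, "h": 8},
        "targets": [{"expr": query}],
    }

dashboard = {
    "title": "Sales overview",
    "panels": [
        panel("Orders per minute", "rate(orders_total[5m])", 0, 0),
        panel("Error rate", "rate(errors_total[5m])", 12, 0),
    ],
}

print(json.dumps(dashboard, indent=2))
```

Because the definition is just text, it can be version-controlled, reviewed, and generated in a loop - the reuse and comparison advantages from earlier apply to the dashboard itself.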
How do you prefer to do your analysis? Drag and drop, or using code? Is there a particular preference in terms of when you want to use what? I’d love to hear from you!
As someone who has gotten into the data world recently, I prefer the visual to the code. In fact, my usual approach to analysing any new dataset is to immediately load it into Power BI to look at some key visuals and what insights it might give, before the heavy lifting using code begins. Excel is too clunky for whipping up quick visuals.
A few points about Power BI:
1. Data ingestion using M & Power Query simplifies and automates a lot of data cleaning activity, because each transformation is stored as a step, which helps avoid the repetitive effort that would be required in Excel
2. Data modeling is very convenient using drag & drop and automatic identification of relationships. Also, the data model can be comprehended visually, which is quite helpful
3. Visual tweaking and formatting features are quite comprehensive
4. The Power BI workspace for deployment, automatic data refresh schedules, and flexibility of distribution is what makes it the go-to dashboarding environment. This significantly reduces the maintenance of published dashboards
5. Just like in PPT & Excel, where one can use UI elements (shapes etc.) with an embedded action using VB, one can do the same in Power BI, but I have not had the need to use it till now
6. There is now also support for Python embedding in Power BI. One can do two things: data transformation using Python if you don't want to use DAX and M, and embedding Python visuals (Seaborn etc.) in Power BI
An expert R/Python programmer would prefer the "code" approach, while a Power BI expert would bat for the "drag & drop" ("GUI" or "visual") approach. If every visual requirement (think of complex graphs) could be achieved with both approaches, then I think the whole comparison doesn't matter much (it might matter in relatively smaller aspects like reuse and other things). The critical difference is that some visual requirements simply cannot be met by the "visual" approach. I think the visual approach comes in handy when the analysis is a one-time activity and straightforward. The biggest advantage with visual tools like Power BI is the ability to rapidly build visuals; even an expert programmer would require some time to build a basic graph with code.
The formality & software engineering aspect of the "code" approach (the con in the last bullet) doesn't hold - you can always work in a prototype/PoC mode and override all software engineering standards. This is quite common in the industry now.
Another big advantage of the "code" approach is that the data analysis might clear up some questions about the data, after which the next phase of building a data science model can start. With a code approach you can continue this smoothly (think of Jupyter notebooks), while with a "visual" approach you either hand over to another person or have to start afresh in a code IDE.
~Mukund