
Metabase+SparkSQL+JSON+Parquet+ORC Files.

CA Amit Singh
3 min read · Nov 10, 2024

Learn to Connect Metabase (Open Source Data Visualization) with JSON/Parquet/ORC Files via SparkSQL

What is Apache Spark?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

Batch/streaming data

Unify the processing of your data in batches and real-time streaming, using your preferred language: Python, SQL, Scala, Java or R.

SQL analytics

Execute fast, distributed ANSI SQL queries for dashboarding and ad-hoc reporting. The Spark project states that these queries run faster than most data warehouses.
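To make the SQL analytics point concrete for JSON, Parquet, and ORC files, here is a minimal sketch in plain Python that only builds Spark SQL statement strings; it does not start Spark or connect to anything. It relies on two Spark SQL forms: querying a file in place with a format prefix and a backtick-quoted path, and registering files as a table with `CREATE TABLE ... USING ... LOCATION`. The helper names, table names, and paths are hypothetical placeholders, not taken from the article.

```python
# Sketch only: builds Spark SQL text for JSON/Parquet/ORC files.
# Function names, table names, and paths are illustrative assumptions.

SUPPORTED_FORMATS = {"json", "parquet", "orc"}  # built-in Spark SQL file sources


def _check(fmt: str, path: str) -> str:
    """Normalize the format name and reject inputs that would break quoting."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if "`" in path or "'" in path:
        raise ValueError("path must not contain quotes or backticks")
    return fmt


def direct_file_query(fmt: str, path: str, limit: int = 10) -> str:
    """Return a query that reads a file in place: FROM <format>.`<path>`."""
    fmt = _check(fmt, path)
    return f"SELECT * FROM {fmt}.`{path}` LIMIT {int(limit)}"


def create_table_statement(table: str, fmt: str, path: str) -> str:
    """Return DDL that registers the files at `path` as a named table."""
    fmt = _check(fmt, path)
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table}")
    return f"CREATE TABLE IF NOT EXISTS {table} USING {fmt} LOCATION '{path}'"


if __name__ == "__main__":
    # Hypothetical paths; replace with real file locations.
    print(direct_file_query("parquet", "/data/sales.parquet"))
    print(create_table_statement("sales_orc", "orc", "/data/sales_orc"))
```

Statements like these could be pasted into a `spark-sql` session. Registering a named table, rather than querying the path directly each time, is likely the more convenient route when a BI tool such as Metabase needs to browse tables, though that depends on how the Spark SQL endpoint and its metastore are configured.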

Data science at scale

Perform Exploratory Data Analysis (EDA) on petabyte-scale data without having to resort to downsampling.

Machine learning

Train machine learning algorithms on a laptop and use the same code to scale to fault-tolerant clusters…


