Tags | towards data

access-modifiers

Access Modifiers in Scala

Access modifiers, also known as access specifiers, determine the accessibility and scope of classes, methods, and other members. Scala's access modifiers closely resemble Java's, but they offer more granular visibility control.

Senthil Nayagan
Sep 5, 2022 - 4 Mins Read

airflow

How To Set SLA in Apache Airflow

Apache Airflow enables us to schedule tasks as code. In Airflow, an SLA determines the maximum completion time for a task or DAG. Note that SLAs are established based on the DAG execution date, not the task start time.

Senthil Nayagan
Jul 24, 2022 - 4 Mins Read

algorithms

An Introduction to Algorithms and Data Structures

An algorithm is a series of instructions in a particular order for performing a specific task.

Senthil Nayagan
Aug 25, 2022 - 10 Mins Read

algorithms-and-data-structures

An Introduction to Algorithms and Data Structures

An algorithm is a series of instructions in a particular order for performing a specific task.

Senthil Nayagan
Aug 25, 2022 - 10 Mins Read

amazon-emr

Overview of Amazon EMR

Amazon EMR is a managed cluster platform that makes it easier to run big data frameworks like Apache Hadoop and Apache Spark on AWS to process and analyze huge amounts of data.

Senthil Nayagan
Sep 5, 2022 - 21 Mins Read

anti-pattern

Anti-Pattern

Anti-patterns seem quick and reasonable at first, but they typically have adverse effects later. They are design and code smells that harm our software and add technical debt. We should avoid them at all costs.

Senthil Nayagan
Jul 20, 2022 - 2 Mins Read

apache spark

apache-pinot

Apache Pinot joins hands with Kafka and Presto to provide low-latency, high-throughput user-facing analytics

Senthil Nayagan
Nov 14, 2022 - 18 Mins Read

apache-spark

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

Senthil Nayagan
Jul 25, 2022 - 1 Min Read

Partitions and Bucketing in Spark

Partitioning and bucketing are used to improve the reading of data by reducing the cost of shuffles, the need for serialization, and the amount of network traffic.

Senthil Nayagan
Jul 25, 2022 - 13 Mins Read

Need for Caching in Apache Spark

Caching is one of Spark's optimization strategies for reusing computations. It stores interim and partial results so they can be reused in subsequent computation stages.

Senthil Nayagan
Jul 24, 2022 - 2 Mins Read

apache-yarn

apm

application-performance-monitoring

aws

Overview of Amazon EMR

Amazon EMR is a managed cluster platform that makes it easier to run big data frameworks like Apache Hadoop and Apache Spark on AWS to process and analyze huge amounts of data.

Senthil Nayagan
Sep 5, 2022 - 21 Mins Read

AWS Command Line Interface (AWS CLI)

Senthil Nayagan
Aug 12, 2022 - 4 Mins Read

aws
aws-cli

aws-cli

AWS Command Line Interface (AWS CLI)

Senthil Nayagan
Aug 12, 2022 - 4 Mins Read

aws
aws-cli

aws-glue

aws-lake-formation

big-data

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

Senthil Nayagan
Jul 26, 2022 - 4 Mins Read

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

Senthil Nayagan
Jul 25, 2022 - 3 Mins Read

Introduction to Data Engineering

Data engineering is the process of designing and building systems that gather vast quantities of raw operational data from a variety of sources and formats, then analyze, convert, and store it at scale.

Senthil Nayagan
Jul 24, 2022 - 5 Mins Read

bucketing

Partitions and Bucketing in Spark

Partitioning and bucketing are used to improve the reading of data by reducing the cost of shuffles, the need for serialization, and the amount of network traffic.

Senthil Nayagan
Jul 25, 2022 - 13 Mins Read

cache

Need for Caching in Apache Spark

Caching is one of Spark's optimization strategies for reusing computations. It stores interim and partial results so they can be reused in subsequent computation stages.

Senthil Nayagan
Jul 24, 2022 - 2 Mins Read

callback

coding-principles

Singleton Pattern

A singleton pattern limits the number of instances of a class to one.

Senthil Nayagan
Jul 21, 2022 - 2 Mins Read

coding-problem

coding-problem-solving

columnar-format

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

Senthil Nayagan
Jul 25, 2022 - 1 Min Read

columnar-storage

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

Senthil Nayagan
Jul 25, 2022 - 1 Min Read

container-management

container-orchestration

copy-on-write

cow

data

data-as-a-product

Data Product vs. Data as a Product

A data product is not the same as data as a product. A data product aids the accomplishment of the product's goal by using the data, whereas in data as a product, the data itself is seen as the actual product.

Senthil Nayagan
Aug 2, 2022 - 3 Mins Read

data-caching

Need for Caching in Apache Spark

Caching is one of Spark's optimization strategies for reusing computations. It stores interim and partial results so they can be reused in subsequent computation stages.

Senthil Nayagan
Jul 24, 2022 - 2 Mins Read

data-catalog

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

Senthil Nayagan
Jul 25, 2022 - 3 Mins Read

data-engineering

Data Product vs. Data as a Product

Senthil Nayagan
Aug 2, 2022 - 3 Mins Read

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

Senthil Nayagan
Jul 26, 2022 - 4 Mins Read

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

Senthil Nayagan
Jul 25, 2022 - 3 Mins Read

Introduction to Data Engineering

Data engineering is the process of designing and building systems that gather vast quantities of raw operational data from a variety of sources and formats, then analyze, convert, and store it at scale.

Senthil Nayagan
Jul 24, 2022 - 5 Mins Read

Data Deluge

As the granularity of data increases, so does its complexity. Eventually, we reach a point where we cannot handle the volume of fresh data being generated.

Senthil Nayagan
Jul 18, 2022 - 1 Min Read

data-goverance

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

Senthil Nayagan
Jul 26, 2022 - 4 Mins Read

data-inventory

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

Senthil Nayagan
Jul 25, 2022 - 3 Mins Read

data-key

Envelope Encryption - Putting Your Encryption Key in an Envelope Is the Safer Option

Envelope encryption is a way of encrypting plaintext data using a key and then encrypting that key using another key. This strategy is intended not just to make things more secure but also to enhance performance.

Senthil Nayagan
Jul 22, 2022 - 6 Mins Read

data-lake

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

Senthil Nayagan
Jul 25, 2022 - 3 Mins Read

data-management

Data Product vs. Data as a Product

Senthil Nayagan
Aug 2, 2022 - 3 Mins Read

Data Deluge

As the granularity of data increases, so does its complexity. Eventually, we reach a point where we cannot handle the volume of fresh data being generated.

Senthil Nayagan
Jul 18, 2022 - 1 Min Read

data-mesh

data-observability

data-pipeline

Introduction to Data Engineering

Data engineering is the process of designing and building systems that gather vast quantities of raw operational data from a variety of sources and formats, then analyze, convert, and store it at scale.

Senthil Nayagan
Jul 24, 2022 - 5 Mins Read

data-product

Data Product vs. Data as a Product

Senthil Nayagan
Aug 2, 2022 - 3 Mins Read

data-protection

Envelope Encryption - Putting Your Encryption Key in an Envelope Is the Safer Option

Senthil Nayagan
Jul 22, 2022 - 6 Mins Read

data-quality

data-quality-dimensions

data-science

data-security

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

Senthil Nayagan
Jul 26, 2022 - 4 Mins Read

data-streaming

data-structures

An Introduction to Algorithms and Data Structures

An algorithm is a series of instructions in a particular order for performing a specific task.

Senthil Nayagan
Aug 25, 2022 - 10 Mins Read

database

database-indexing

delta lake

delta-lake

design-patterns

Singleton Pattern

A singleton pattern limits the number of instances of a class to one.

Senthil Nayagan
Jul 21, 2022 - 2 Mins Read

Anti-Pattern

Senthil Nayagan
Jul 20, 2022 - 2 Mins Read

distributed-system

elastic-apm

elasticsearch

envelope-encryption

Envelope Encryption - Putting Your Encryption Key in an Envelope Is the Safer Option

Senthil Nayagan
Jul 22, 2022 - 6 Mins Read

etl

functional-programming

Defining Variables Using the `def` Keyword in Scala

Difference between `lazy val` and `def`.

Senthil Nayagan
Jul 21, 2022 - 2 Mins Read

ge

great-expectations

grpc

gx

hadoop

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

Senthil Nayagan
Jul 25, 2022 - 1 Min Read

iac

Terraform Basics

Senthil Nayagan
Aug 19, 2022 - 25 Mins Read

index

infrastructure-as-code

Terraform Basics

Senthil Nayagan
Aug 19, 2022 - 25 Mins Read

inter-process-communication

inverted-index

kibana

knowledge-graph

kubernetes

lake-formation

lakefs

lakehouse

lakehouses

memory-management

Rust’s Ownership and Borrowing Enforce Memory Safety

Senthil Nayagan
Jul 19, 2022 - 40 Mins Read

merge-on-read

mor

nginx

object-oriented-programming

Case Class in Scala

The case class represents immutable data. It is a type of class that is often used for data storage.

Senthil Nayagan
Jul 24, 2022 - 4 Mins Read

olap

Apache Pinot joins hands with Kafka and Presto to provide low-latency, high-throughput user-facing analytics

Senthil Nayagan
Nov 14, 2022 - 18 Mins Read

olap-datastore

Apache Pinot joins hands with Kafka and Presto to provide low-latency, high-throughput user-facing analytics

Senthil Nayagan
Nov 14, 2022 - 18 Mins Read

oops

Access Modifiers in Scala

Senthil Nayagan
Sep 5, 2022 - 4 Mins Read

Case Class in Scala

The case class represents immutable data. It is a type of class that is often used for data storage.

Senthil Nayagan
Jul 24, 2022 - 4 Mins Read

parquet

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

Senthil Nayagan
Jul 25, 2022 - 1 Min Read

partition

Partitions and Bucketing in Spark

Partitioning and bucketing are used to improve the reading of data by reducing the cost of shuffles, the need for serialization, and the amount of network traffic.

Senthil Nayagan
Jul 25, 2022 - 13 Mins Read

pinot

Apache Pinot joins hands with Kafka and Presto to provide low-latency, high-throughput user-facing analytics

Senthil Nayagan
Nov 14, 2022 - 18 Mins Read

postgres

postgresql

presto

prestodb

problem-solving

programming

remote-procedure-call

resource-management

reverse-etl

root-key

Envelope Encryption - Putting Your Encryption Key in an Envelope Is the Safer Option

Senthil Nayagan
Jul 22, 2022 - 6 Mins Read

rpc

rust

Rust’s Ownership and Borrowing Enforce Memory Safety

Senthil Nayagan
Jul 19, 2022 - 40 Mins Read

scala

scala-collections

sdlc

secured-data-lake

service-level-agreement

How To Set SLA in Apache Airflow

Senthil Nayagan
Jul 24, 2022 - 4 Mins Read

shuffling

singleton-pattern

Singleton Pattern

A singleton pattern limits the number of instances of a class to one.

Senthil Nayagan
Jul 21, 2022 - 2 Mins Read

sla

How To Set SLA in Apache Airflow

Senthil Nayagan
Jul 24, 2022 - 4 Mins Read

solid

sql

storage-strategies

terraform

Terraform Basics

Senthil Nayagan
Aug 19, 2022 - 25 Mins Read

transactionsl-data-lake

version control

web-server

webhook

workflow-engine

How To Set SLA in Apache Airflow

Senthil Nayagan
Jul 24, 2022 - 4 Mins Read
