Apache Project(s) of the month


Apache CarbonData

In the last few years I have been working quite extensively with Apache Spark, and I have come to realize that a good storage format goes a long way toward efficiency and speed. For instance, when dealing with large CSV or JSON files, adding an Apache Parquet writing step would …

Apache Rya

Since I have been working with Semantic Web technologies for quite some time, I was looking forward to exploring new Apache projects within the area. Apache Rya fit the purpose perfectly, as it is a SPARQL-enabled triplestore for Big Data, promising to scale to billions of triples across multiple nodes …

Apache Atlas (part 2)

Since Atlas is a fairly large and complex project, one article was definitely not enough to explore all of its capabilities. Building on the previous article, we will explore classifications and glossary, the REST API, and two more sources of lineage information (Spark and Kafka). Classification: Let's start with classification …

Apache Atlas

Since I have always been interested in (and mainly working with) Semantic Web technologies and knowledge engineering, metadata is a topic I care about quite a lot. "Metadata" means "data about data", which practically speaking may include the format, the source, the purpose, the author, the creation date, and many …