APOTHEM | Apache Project(s) of the month

Articles

Mon 30 September 2019
projects

Apache CarbonData

In the last few years I have been working quite extensively with Apache Spark, and I have come to realize that a good storage format goes a long way toward efficiency and speed. For instance, when dealing with large CSV or JSON files, adding an Apache Parquet writing step would …

Sat 31 August 2019
projects

Apache Rya

Since I have been working with Semantic Web technologies for quite some time, I was looking forward to exploring new Apache projects within the area. Apache Rya fit the purpose perfectly, as it is a SPARQL-enabled triplestore for Big Data, promising to scale to billions of triples across multiple nodes …

Wed 31 July 2019
projects

Apache Atlas (part 2)

Since Atlas is a fairly large and complex project, one article was definitely not enough to explore all of its capabilities. Building on the previous article, we will explore classifications and the glossary, the REST API, and two more sources of lineage information (Spark and Kafka). Classification: Let's start with classification …

Sun 30 June 2019
projects

Apache Atlas

Since I have always been interested in (and mainly working with) Semantic Web technologies and knowledge engineering, metadata is a topic I care about quite a lot. "Metadata" means "data about data", which practically speaking may include the format, the source, the purpose, the author, the creation date, and many …
