Treasury Pay | Sr. Data Engineer | 2018-2020

About

My first significant contract project was a financial platform for global retailers, built with Invictus Gurus, a consultancy based in Dallas, TX.

The software manages high-throughput point-of-sale transactions and gives companies with a global retail footprint highly granular metrics on their banking, revenue, and forecasting data.

I was initially engaged to bring the proof-of-concept application back online after some unexpected downtime. I then led the technical architecture and implementation of the big data components for the production phase that followed. I settled on a series of open source big data technologies, tied together with distributed infrastructure on Kubernetes.

I had worked with most of the technologies in the final stack, but this was my first experience tying so many tools together in a high-volume production context. Standing up this data stack was thrilling in many ways, and it was very satisfying to see my design ideas work at scale and meet the client's requirements.

Impact

  • Led big data architecture and implementation of a high-throughput financial platform supporting 100k+ transactions per second
  • Wrote complex data manipulation pipelines in Spark and Spark Streaming
  • Built out the base Elasticsearch data caching layer for low-latency frontend consumption via API
  • Built out distributed, platform-agnostic infrastructure on Kubernetes

Technology

Data Architecture Context

  • High throughput - system needed to support hundreds of thousands of transactions per second
  • Linear scaling - system needed to scale linearly with data volume
  • Processing load - system required complex and compute heavy data processing
  • Low latency - system needed to supply a high volume of processed and formatted data to the front-end
  • Flexibility - ingesting and building new datasets on a per-client basis
  • Geo + temporal domains - incoming data was primarily time-series and segmented by location, so the system needed to focus on these dimensions for search and scoping

Data Architecture Delivery

  • Data Firehose - socket-based data streamer written in Python to benchmark data volumes (see the first sketch below)
  • Spark Streaming - handling ingest and initial processing of raw external data (sketched below)
  • Cassandra - wide-column data warehouse storing incoming data streams plus cold, long-term data (schema sketched below)
  • Spark Core - loading and processing datasets from Cassandra into their final format (see the roll-up sketch below)
  • Elasticsearch - application data caching layer for low-latency consumption over API (populated by the roll-up sketch below)
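
To make the stack concrete, here are minimal sketches of each component, written in Python since that was the firehose's language. First, the data firehose: a socket server pushing synthetic, newline-delimited JSON transactions at a target rate. The wire format, field names, port, and rate here are hypothetical stand-ins rather than the production values.

    import json
    import random
    import socket
    import time

    HOST, PORT = "0.0.0.0", 9999  # assumed listen address for consumers

    def make_transaction():
        """Generate one synthetic point-of-sale transaction (hypothetical schema)."""
        return {
            "store_id": random.randint(1, 500),
            "amount": round(random.uniform(1.0, 500.0), 2),
            "currency": "USD",
            "ts": time.time(),
        }

    def serve(rate_per_sec=10_000):
        """Accept one consumer and push newline-delimited JSON at a target rate."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind((HOST, PORT))
            srv.listen(1)
            conn, _ = srv.accept()
            with conn:
                while True:
                    start = time.monotonic()
                    for _ in range(rate_per_sec):
                        conn.sendall((json.dumps(make_transaction()) + "\n").encode())
                    # Sleep off whatever remains of the one-second window.
                    time.sleep(max(0.0, 1.0 - (time.monotonic() - start)))

    if __name__ == "__main__":
        serve()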
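
Next, the Cassandra layout. A wide-column, time-series table partitioned on the geo + temporal dimensions is the natural fit for the requirements above. This sketch assumes the cassandra-driver package; the keyspace, table, and column names are hypothetical, and amount is a double purely for brevity (a real ledger would want decimal).

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])  # assumed contact point
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS retail
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)

    # Partitioning on (store_id, day) keeps partitions bounded while the
    # clustering key keeps each day's transactions time-ordered for range scans.
    session.execute("""
        CREATE TABLE IF NOT EXISTS retail.transactions (
            store_id int,
            day date,
            ts timestamp,
            amount double,
            currency text,
            PRIMARY KEY ((store_id, day), ts)
        ) WITH CLUSTERING ORDER BY (ts DESC)
    """)

Bounding each partition to a single store-day avoids the unbounded-partition problem a busy store would otherwise create, while still serving time-scoped, location-scoped queries directly.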
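
The ingest path, sketched against the DStream API Spark Streaming offered at the time, reads from the firehose socket and appends micro-batches to the table above via the spark-cassandra-connector. Hosts, the batch interval, and the table names are assumptions carried over from the sketches above.

    import json

    from pyspark.sql import SparkSession, functions as F
    from pyspark.streaming import StreamingContext

    spark = (SparkSession.builder
             .appName("txn-ingest")
             .config("spark.cassandra.connection.host", "127.0.0.1")
             .getOrCreate())
    ssc = StreamingContext(spark.sparkContext, batchDuration=1)  # 1s micro-batches

    lines = ssc.socketTextStream("firehose-host", 9999)  # assumed firehose endpoint
    txns = lines.map(json.loads)

    def save_batch(_time, rdd):
        """Persist one micro-batch to the Cassandra transactions table."""
        if rdd.isEmpty():
            return
        df = (spark.createDataFrame(rdd)
              .withColumn("ts", F.col("ts").cast("timestamp"))  # epoch seconds -> timestamp
              .withColumn("day", F.to_date(F.col("ts"))))       # derive the partition key
        (df.write
           .format("org.apache.spark.sql.cassandra")
           .options(keyspace="retail", table="transactions")
           .mode("append")
           .save())

    txns.foreachRDD(save_batch)
    ssc.start()
    ssc.awaitTermination()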
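
Finally, the batch side: Spark Core loads the raw table back out of Cassandra, rolls it up into per-store, per-day metrics, and writes the result into Elasticsearch via the elasticsearch-hadoop connector. The aggregation and index name are illustrative, not the production datasets.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("txn-rollup")
             .config("spark.cassandra.connection.host", "127.0.0.1")
             .config("spark.es.nodes", "127.0.0.1")  # assumed Elasticsearch node
             .getOrCreate())

    txns = (spark.read
            .format("org.apache.spark.sql.cassandra")
            .options(keyspace="retail", table="transactions")
            .load())

    # Roll raw transactions up along the geo + temporal dimensions the
    # frontend scoped on: one document per store per day.
    daily = (txns.groupBy("store_id", "day")
             .agg(F.sum("amount").alias("revenue"),
                  F.count("*").alias("txn_count")))

    (daily.write
     .format("org.elasticsearch.spark.sql")
     .option("es.resource", "store_metrics_daily")  # hypothetical index name
     .mode("append")
     .save())

With documents pre-aggregated like this, the API tier only ever issues keyed lookups and small range queries against Elasticsearch, which is what kept frontend consumption low latency.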