Blockchain Analytics and Data Pipelines - Progress Report #1

Report Name

Blockchain Analytics and Data Pipelines - Report 1

Reporting Period

July 3 - July 22 (19 days)

Report Description

Work has progressed in several areas crucial to this proposal during the initial period of the grant. The first focus was canvassing the community to prioritize metrics for exploration. Results from that survey have just been finalized, and various members of the community are beginning to engage with our efforts. Development of the infrastructure needed to run the production workloads has begun, and we expect it to be up and running by the next report. The most challenging part of the grant is the development of the chain parsers and associated ETL tooling; this work has started, with major progress to report. The tooling follows the same patterns as the most widely used blockchain ETL packages and is currently available for use.

Project Completion Percentage

40%

Remaining Time to Completion

7.5 weeks

Expected Results for the Next Period

  • Completion of infrastructure and blockchain data ingest with Airflow.
  • Supporting modules/packages

Materials Proving Progress on the Project

Review of each specific goal/milestone

Phase 1: Metrics High Grading (80%)

Milestones:

  • Canvass community and foundation for a high grading of metrics to be collected (completed)

Deliverables:

  • Consolidated list of metrics and associated tables needed to feed analytics (forthcoming)
  • DDL for initial SQL schema design and SQLAlchemy object model (forthcoming)

Current progress update:

  • We sent out a questionnaire to the community and are acting on the responses.
  • Since we have already built major parts of the chain parser, we now have a good understanding of which tables are readily available to build the associated analytics on.
  • Initial SQL schemas for the base tables are being completed and we’re now building the schemas for the enriched analytics tables.
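
To illustrate the split between base tables and enriched analytics tables described above, here is a minimal sketch using Python's built-in sqlite3 module. It is not the project's actual schema: the table names, column names, and the daily transaction count metric are all assumptions chosen for illustration, and the production design may differ in every detail.

```python
import sqlite3

# Hypothetical base-table DDL. All table and column names are illustrative
# assumptions, not the schema delivered under this grant.
BASE_DDL = """
CREATE TABLE blocks (
    number     INTEGER PRIMARY KEY,
    hash       TEXT NOT NULL UNIQUE,
    timestamp  INTEGER NOT NULL          -- Unix epoch seconds
);
CREATE TABLE transactions (
    hash          TEXT PRIMARY KEY,
    block_number  INTEGER NOT NULL REFERENCES blocks(number),
    from_address  TEXT,
    to_address    TEXT
);
CREATE TABLE logs (
    transaction_hash  TEXT NOT NULL REFERENCES transactions(hash),
    log_index         INTEGER NOT NULL,
    address           TEXT,
    data              TEXT,
    PRIMARY KEY (transaction_hash, log_index)
);
"""


def create_base_schema(conn):
    """Create the illustrative base tables on an open connection."""
    conn.executescript(BASE_DDL)


def daily_transaction_counts(conn):
    """Example enriched metric: number of transactions per UTC day."""
    query = """
        SELECT date(b.timestamp, 'unixepoch') AS day,
               COUNT(t.hash) AS tx_count
        FROM blocks AS b
        LEFT JOIN transactions AS t ON t.block_number = b.number
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(query).fetchall()
```

An enriched analytics table in this sketch would simply be the stored result of a query like `daily_transaction_counts`, rebuilt on a schedule from the base tables.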

Phase 2: Infrastructure Deployment (40%)

Milestones:

  • Get infrastructure up in a pattern that can support multiple environments (in progress)
  • Build high throughput architecture for delivering analytics (in progress)
  • Selection of long term storage options and short term query optimized solutions (in progress)

Deliverables:

  • Terraform and Ansible to stand up Airflow, workers, OLTP DB, OLAP DB, and business intelligence dashboarding tools with automation

Current progress update:

  • We have the patterns and associated components developed to support the analytics stack.
  • The remaining work is to integrate these components and stand up the associated infrastructure.

Phase 3: Data Pipelines (40%)

Milestones:

  • Chain parsers and data pipelines feeding intermediary tables and data warehouse (in progress)
  • Database tuning and index optimization for high fidelity exploratory queries (forthcoming)

Deliverables:

  • A collection of Airflow DAGs to construct data pipelines (forthcoming)
  • Scheduled jobs to build reports and analysis tables (forthcoming)

Current progress update:

  • The ETL package for extracting blocks, transactions, and transaction logs has been completed. It follows the blockchain-etl template and includes automated tests and deployment to PyPI.
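
The extraction step for blocks, transactions, and transaction logs can be pictured with the following minimal sketch. It is not the published package's API: the function name `flatten_block`, the field names, and the assumption that a block arrives as one nested dictionary with logs attached to each transaction are all hypothetical, chosen only to show how one record fans out into three row sets.

```python
def flatten_block(block):
    """Split one nested block record into block, transaction, and log rows.

    The input shape (a dict with a "transactions" list, each carrying a
    "logs" list) is an assumption made for this illustration.
    """
    block_row = {
        "number": block["number"],
        "hash": block["hash"],
        "timestamp": block["timestamp"],
    }
    transaction_rows = []
    log_rows = []
    for tx in block.get("transactions", []):
        transaction_rows.append(
            {
                "hash": tx["hash"],
                "block_number": block["number"],
                "from_address": tx.get("from"),
                "to_address": tx.get("to"),
            }
        )
        # Number each log by its position within the transaction.
        for log_index, log in enumerate(tx.get("logs", [])):
            log_rows.append(
                {
                    "transaction_hash": tx["hash"],
                    "log_index": log_index,
                    "address": log.get("address"),
                    "data": log.get("data"),
                }
            )
    return block_row, transaction_rows, log_rows
```

Keeping the three row sets separate means each can be loaded into its own table, which is the usual reason ETL tools of this kind emit one output stream per entity type.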

Phase 4: Data Visualization (0%)

Milestones:

  • Build dashboards to support high-graded metrics from initial community canvassing (forthcoming)

Deliverables:

  • Deployment of Superset business analytics tooling with support for integration with Tableau (forthcoming)
  • Visualizations that can be wrapped in iframes and embedded into various block explorers and other tools the community uses to visualize metrics (forthcoming)

Current progress update:

  • Visualization work has not yet started.

Review Result Comments

Review Result

Approve

Review Comments

Thank you for your work. We look forward to the completion of this product.