Blockchain Analytics and Data Pipelines - Progress Report #2

Report Name

Blockchain Analytics and Data Pipelines - Report 2

Reporting Period

July 23 - August 20 (29 days)

Report Description

The first beta release of ICON-ETL is out, with the basic blockchain data exporters providing blocks, transactions, and transaction-related information. Integration with Airflow has begun, with the exporter correctly batching export jobs. However, a recent change to the ICON RPC Server (part of the 1.5.0 release) has broken our ability to send batch requests to any ICON node, which is impeding progress on the non-infrastructure aspects of this project. An issue has been filed on GitHub and we are awaiting a fix. Work on deployments for the analysis platforms has progressed, with Airflow and the supporting databases almost ready for normal use.
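For readers unfamiliar with batch requests, the sketch below illustrates the general idea: a block range is split into export batches, and each batch becomes one JSON-RPC 2.0 batch payload (an array of request objects). It is not taken from ICON-ETL; the helper names `split_into_batches` and `build_batch_request` are hypothetical, and the use of the `icx_getBlockByHeight` method with a hex-encoded `height` parameter is an assumption based on the ICON JSON-RPC v3 API. No network call is made.

```python
import json
from typing import Iterator, List, Tuple


def split_into_batches(start: int, end: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) block-height ranges covering start..end."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for first in range(start, end + 1, batch_size):
        yield first, min(first + batch_size - 1, end)


def build_batch_request(first: int, last: int) -> List[dict]:
    """Build a JSON-RPC 2.0 batch: one request object per block height.

    Assumption: blocks are fetched with icx_getBlockByHeight and the
    height is passed as a hex string.
    """
    return [
        {
            "jsonrpc": "2.0",
            "method": "icx_getBlockByHeight",
            "id": height,  # the id lets each response be matched to its request
            "params": {"height": hex(height)},
        }
        for height in range(first, last + 1)
    ]


if __name__ == "__main__":
    # Show how blocks 0..9 would be grouped into batches of four.
    for first, last in split_into_batches(0, 9, 4):
        payload = build_batch_request(first, last)
        print(first, last, len(json.dumps(payload)))
```

Sending one array per batch instead of one request per block is what keeps the number of round trips low, which is why a node that rejects batch arrays blocks the export work described above.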

Project Completion Percentage

60%

Remaining Time to Completion

3.5 weeks, counted from when the ICON RPC Server bug is fixed

Expected Results for the Next Period

Provided the ICON RPC Server bug is fixed in short order, the analysis platform will be ready to run the analyses requested by the Foundation and the Community. Repositories for the infrastructure deployments required to run the analysis platform will be complete, and the code required to deploy the ETL DAGs will be ready.

Materials Proving Progress on the Project

Review of each KPI (Key Performance Indicator) or specific goal/milestone

Phase 1: Metrics High Grading (80%)

Milestones:

  • Canvass community and foundation for a high grading of metrics to be collected (completed)

Deliverables:

  • Consolidated list of metrics and associated tables needed to feed analytics (forthcoming)
  • DDL for initial SQL schema design and SQLAlchemy object model (forthcoming)

Current progress update:

  • Community members from Transcranial Solutions have been familiarized with the basics of the ICON-ETL package and will be working with us to determine which data elements are necessary for their analyses, and will be

Phase 2: Infrastructure Deployment (60%)

Milestones:

  • Get infrastructure up in a pattern that can support multiple environments (in progress)
  • Build high throughput architecture for delivering analytics (in progress)
  • Selection of long term storage options and short term query optimized solutions (in progress)

Deliverables:

  • Terraform and Ansible to stand up Airflow, workers, OLTP DB, OLAP DB, and business intelligence dashboarding tools with automation

Current progress update:

  • Work on the IaC repos for Airflow and the databases is underway and should be completed shortly.

Phase 3: Data Pipelines (50%)

Milestones:

  • Chain parsers and data pipelines feeding intermediary tables and data warehouse (in progress)
  • Database tuning and index optimization for high fidelity exploratory queries (forthcoming)

Deliverables:

  • A collection of Airflow DAGs to construct data pipelines (in progress)
  • Scheduled jobs to build reports and analysis tables (forthcoming)

Current progress update:

  • Creation of DAGs using the ICON-ETL package has begun; however, this development is blocked by the bug in the ICON RPC Server.

Phase 4: Data Visualization (20%)

Milestones:

  • Build dashboards to support high-graded metrics from initial community canvassing (forthcoming)

Deliverables:

  • Deployment of Superset business analytics tooling with support for integration with Tableau (in progress)
  • Visualizations that can be wrapped in iframes and embedded into various block explorers and other tools the community uses to visualize metrics (forthcoming)

Current progress update:

  • Visualization work has not yet started; however, development of the supporting infrastructure is underway.

Review Result Comments

Review Result

Approve

Review Comments

Thank you for your work.