Report Name
Blockchain Analytics and Data Pipelines - Report 3
Reporting Period
August 21 - September 19 (30 days)
Report Description
The ICON-ETL package has had its first full release (v0.1.0) and is available on PyPI! Integration with Airflow is underway, with DAGs working to extract the blockchain data and load it into a Postgres database. A bug introduced by the recent change to the ICON RPC Server (as part of the 1.5.0 release) has been fixed in the 1.5.1 release; however, it delayed our work on this project by nearly a month. Basic queries on the data are functioning, with visualizations coming in the next week or so.
Project Completion Percentage
80%
Remaining Time to Completion
3.5 weeks
Expected Results for the Next Period
The next reporting period should see the final deliverables for the project, which was delayed as mentioned above. We also have two fellows working this session on building analytics solutions on top of our process. Work from their projects will be integrated when their fellowship ends in 2 months.
Materials Proving Progress on the Project
- icon-etl
- icon-etl-airflow
- Infrastructure
Review of each KPI (Key Performance Indicator) or specific goal/milestone
Phase 1: Metrics High Grading (80%)
Milestones:
- Canvass community and foundation for a high grading of metrics to be collected (completed)
Deliverables:
- Consolidated list of metrics and associated tables needed to feed analytics (in progress)
- DDL for initial SQL schema design and SQLAlchemy object model (completed)
Current progress update:
- The final Airflow bugs are being worked out, so the database should be ready for community analytics use shortly.
Phase 2: Infrastructure Deployment (80%)
Milestones:
- Get infrastructure up in a pattern that can support multiple environments (nearly complete)
- Build high throughput architecture for delivering analytics (in progress)
- Selection of long term storage options and short term query optimized solutions (in progress)
Deliverables:
- Terraform and Ansible to stand up Airflow, workers, OLTP DB, OLAP DB, and business intelligence dashboarding tools with automation
Current progress update:
- The IaC repos for Airflow and the databases are essentially complete, with some final bug fixes in progress.
Phase 3: Data Pipelines (75%)
Milestones:
- Chain parsers and data pipelines feeding intermediary tables and data warehouse (in progress)
- Database tuning and index optimization for high fidelity exploratory queries (forthcoming)
Deliverables:
- A collection of Airflow DAGs to construct data pipelines (nearly complete)
- Scheduled jobs to build reports and analysis tables (forthcoming)
Current progress update:
- DAGs to export and load blockchain data are essentially complete, with final bug fixes pending.
Phase 4: Data Visualization (50%)
Milestones:
- Build dashboards to support high-graded metrics from initial community canvassing (forthcoming)
Deliverables:
- Deployment of Superset business analytics tooling with support for integration with Tableau (completed)
- Visualizations that can be wrapped in iframes and embedded into various block explorers and other tools the community uses to visualize metrics (forthcoming)
Current progress update:
- Superset has been deployed and connected to the dev database, and basic visualizations are being built.
- Work on developing useful visualizations is starting.