DNA Exchange Proposal on ICE CPS

This is a simple overview. We will polish this up once we get some feedback. Thanks for taking a look.

  1. Title: DNA Exchange
  2. Project Category
  • Development
    • Developer support and product ideas - wallets, block explorers, dapps, developer documentation, etc.
  3. Project Description

Introduction

DNA data was once for the scientific elite. Today over 26 million people have taken a DNA test. DNA test kit companies hold the data from millions of consumers with the potential to deliver huge profits. Now DNA Exchange puts control of DNA data into the hands of ordinary consumers. DNA Exchange will anonymously host DNA data and compensate consumers upon usage of their DNA data.

Opportunity Overview

DNA testing companies are compiling large amounts of our data from millions of tests. Consumers are unaware that their data is more valuable to the companies than their test results. Companies that collect this data sell it to major research labs and pharmaceutical companies for millions of dollars. We, the consumers, see no compensation when these deals are done. In reality, most consumers have no idea that these multimillion-dollar deals are happening. In 2020, 23andMe sold the right to develop a drug based on users' data.

Many genetic testing companies say that we own our data, but consumers are generally unaware of this. Everyone who takes a direct-to-consumer test, such as 23andMe or any other genetic test, has access to their data and can do what they want with it. However, consumers have no idea what they can do with their data or how valuable it actually is. This is where the DNA Exchange can help consumers monetize their data.

We would like to add a DeFi and NFT component to this project, making your DNA data even more valuable. Minting a native token and an NFT based on your DNA profile will increase your rewards and your share of transaction fees.

  4. Project Duration would be more than 6 months as there is quite a bit of development.

  5. Project Milestones

  • Stage 1 - Start building front end website
  • Stage 2 - DNA Blockchain wallet - KYC for yourself - interface (ICE blockchain)
  • Stage 3 - Direct to consumer DNA pool - Data / Mining native token
  • Stage 4 - DeFi LP for mining native token
  • Stage 5 - Design NFTs for farming boost
  • Stage 6 - Data set access - monetization works (staking)
  • Stage 7 - Data set access - monetization works
  • Stage 8 - Other sequence data pools
  6. Funding Amount Requested would be around 60K ICX

Technology/roadmap addendum/appendix:

How to verify on-chain and off-chain data? We preserve privacy by storing only the Schnorr signature and searchable metadata (e.g. age, type of DNA included) on-chain. The actual personal information is stored off-chain on user devices. When an agreement to purchase data access is made, a smart contract providing access to the data is created. When they upload, users can choose between three options: distributed data availability (likely more and easier transactions), keeping their data only on their own device (which requires the device to be online and connected to provide data), or entrusting their data to an escrow provider.
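
As a rough illustration of this split, here is a minimal Python sketch. It is not the planned implementation: a SHA-256 digest stands in for the Schnorr signature (Python's standard library has no Schnorr support), and the names `OnChainRecord`, `make_record`, and `verify_delivery` are hypothetical.

```python
import hashlib
from dataclasses import dataclass

# The three storage options described above (labels are assumptions).
AVAILABILITY_MODES = {"distributed", "device", "escrow"}


@dataclass(frozen=True)
class OnChainRecord:
    """Hypothetical on-chain entry: a fingerprint plus searchable metadata only."""
    data_digest: str   # SHA-256 of the off-chain file (stand-in for the Schnorr signature)
    age: int
    dna_type: str      # e.g. "array"
    availability: str  # one of AVAILABILITY_MODES


def make_record(raw_dna: bytes, age: int, dna_type: str, availability: str) -> OnChainRecord:
    """Build the public record; the raw DNA bytes never leave the caller."""
    if availability not in AVAILABILITY_MODES:
        raise ValueError(f"unknown availability mode: {availability}")
    digest = hashlib.sha256(raw_dna).hexdigest()
    return OnChainRecord(digest, age, dna_type, availability)


def verify_delivery(record: OnChainRecord, delivered: bytes) -> bool:
    """A buyer checks that the delivered file matches the on-chain fingerprint."""
    return hashlib.sha256(delivered).hexdigest() == record.data_digest
```

A buyer who receives the file off-chain can call `verify_delivery` to confirm it is the same data that was registered, without the data itself ever being published.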

How to do searching? The data, and thus the network, is valuable when people who want DNA info can search for the datasets they need. A typical flow may look as follows: a scientist at HealthyLife Pharma LLC wants to initiate a GWAS study on a particular SNP trait, and also wants to collect data on heart conditions. The scientist logs onto Ploidy DNA exchange, starts a query for the relevant heart condition code, and finds data sets and records associated with the disease they are studying. The scientist can then propose a price, and for record holders who have already set an Ask price below the Bid threshold, the smart contract will execute automatically to share their DNA data. For the blockchain, nodes will index the important parameters for easy searchability, and nodes can provide decentralized RPC search service calls for a specified compute gas fee.
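
The Bid/Ask matching step in this flow could look roughly like the Python sketch below. The `Listing` type, the `match_bid` function, and the `HEART-01` code are hypothetical placeholders, and treating an Ask equal to the Bid as a match is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Listing:
    """Hypothetical indexed entry for one record holder."""
    record_id: str
    condition_code: str  # health condition code indexed by the nodes
    ask_price: int       # minimum price the holder accepts, in native token units


def match_bid(listings: Iterable[Listing], condition_code: str, bid_price: int) -> List[Listing]:
    """Return listings for the condition whose Ask is at or below the Bid.

    Assumption: an Ask equal to the Bid also matches.
    """
    return [
        listing
        for listing in listings
        if listing.condition_code == condition_code and listing.ask_price <= bid_price
    ]


listings = [
    Listing("rec-1", "HEART-01", 50),
    Listing("rec-2", "HEART-01", 120),
    Listing("rec-3", "OTHER-07", 10),
]
matched = match_bid(listings, "HEART-01", 100)
```

Here only `rec-1` matches: `rec-2` asks for more than the bid and `rec-3` is for a different condition. In the real system the matched records would trigger the access contract rather than be returned as a list.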

How to upload data, and how to keep the data format consistent and usable? We plan to accept the most popular formats of DNA data produced by both commercial and academic labs (e.g. physician-ordered lab tests) and by consumer DNA testing services (e.g. 23andMe, AncestryDNA). The data is most useful when it is also associated with health records and codes (e.g. particular health conditions coded as well), so Ploidy DNA exchange will also handle the most common types of text metadata and electronic medical record systems.
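
To make the format question concrete, here is a minimal Python sketch that normalizes one input format. It assumes a 23andMe-style raw export: `#` comment lines followed by tab-separated rsid, chromosome, position, and genotype. The `Genotype` type and `parse_raw_array` function are hypothetical, and other formats would need their own parsers.

```python
from typing import Iterable, List, NamedTuple


class Genotype(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_array(lines: Iterable[str]) -> List[Genotype]:
    """Parse tab-separated array rows, skipping blank and '#' comment lines."""
    calls = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            # Reject malformed rows instead of guessing at their meaning.
            raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}: {line!r}")
        rsid, chromosome, position, genotype = fields
        calls.append(Genotype(rsid, chromosome, int(position), genotype))
    return calls
```

Mapping every accepted format onto one internal record type like this is one way to keep the pooled data consistent, and rejecting malformed rows gives a first, very basic validation pass.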

How to validate data that users upload? We will use NLP and AI-powered validation techniques to prevent data spam and false data from being uploaded.

How to package data together in big data sets? Most large-scale consumers of data, e.g. pharmaceutical companies, want to run analyses on thousands of data points or more. Thus we will allow larger institutional data providers and labs to upload larger data sets to our system.

Usage as it relates to on-chain technology:
Users will upload their DNA data, and receive an initial reward, to help seed the network. We will partner as needed to get the larger initial seed datasets. We want these to be paired with health records. Additionally, we will also get the records from various genetic testing companies, and back-trace to connect these with actual people and their characteristics.

Nodes will run software that continues to create and maintain the blockchain, and also provide database search service.

We will offer a convenient web interface and APIs for both data providers and data users to upload and search the DNA records.

This is a very interesting idea indeed. The exact reason I’ve held off from doing any of these DNA tests is that I was never satisfied with how my data would be handled/monetized.

The first thing that comes to mind is that it is not clear to me how you plan to collect and then monetize users’ DNA data. Do you already have a data set? Will you exclusively use DNA data brought to you by users of the platform? Do you have connections to a lab that can handle sequencing? Do you have connections to entities that would pay for this data? If not, then how do you plan to monetize the data set and compete with other entities with larger, more mature sets?

These are just a few of my initial thoughts on the proposal. It would be nice to see some further discussion around this topic as I think it’s very interesting and very current. I also like the slightly different take on the use of NFTs as digital ownership of your DNA data, and would be very interested to hear more about how that would work.

Thanks

Ed

Hi Ed,

Thanks for taking a look. The idea is for users to sign up and deposit their data or store it on the blockchain. Array data like 23andMe is small, so storing it on chain might be the best solution. We would incentivize users to store their data for future access. We would also like to partner with labs that do similar testing to either store their data or allow their patients to monetize it. We would also approach researchers that have similar data lying around. The hope is to grow a large enough data set to offer to many groups at a much lower rate than what is offered now. We may offer free access to general data, while buyers pay a premium for specific user info (given with the user’s consent/questionnaires).

I currently work for a very large genetic testing company and have many contacts in the genetic and pharma space.

The NFTs will be built around people’s specific data. There are rare variants that some people in the population carry. Those people would generate rarer DNA NFTs (art). Those NFTs can then be used to increase their mining emissions or increase their percentage of the transaction fees pool.

There is a lot of programming to be done, and marketing as well. I will post some costs, but this grant won’t cover all the costs.

I am a molecular biologist with emphasis on genetics and proteomics (by education… I never really worked in the field professionally). So this proposal is quite interesting; however, it feels like it needs to be at scale.

  1. To start with, my worry would be that DNA testing still is not very common and usually, in my opinion at least, is done as a reaction to something. This proposal will require people to be educated about the potential benefit of paying for the tests and then storing the data.

  2. DNA testing is broad and varied; do you plan to store data from all types of tests?

  3. What would be a minimum sample size for this to be a viable product?

  4. Could you elaborate on the role the native token will play a little more?

I think this is very exciting honestly but feel that your initial pitch requires a little more elaboration. Good luck!

Great questions…

  1. You’d be surprised how many DTC genetic companies are out there. In Asia I know of 5. The array data is all the same so tapping into that market would be easy. Everyone in Asia is looking for passive income.

  2. The first step is to store array data… it’s much like a VCF file. Arrays typically have over 500K SNPs, so there will be a lot of data points. Eventually we would like to expand to other file types and single variant results.

  3. The sample size will vary depending on your study. GWAS studies need a sample size of 10K or more, and the more complex the disorder, the larger the sample. You actually don’t need a large sample if you have a well-defined disorder, such as a single allelic disorder.

  4. The token will be generated when people upload or lock their data. The rewards will only be minted based on how long they keep their data accessible. It is like getting paid to contribute data. The user can then stake their tokens to get rewards from the transaction pool or contribute to an LP (the model will be much like iconbet staking). The user can also mint more native tokens by answering specific questions regarding their health or contributing to an LP. The native token will also be used to mint NFTs (and burned in the process), for larger token rewards.
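
As a rough sketch of minting based on how long data stays accessible, the Python below uses a linear accrual with an optional NFT boost. The linear formula, the `rate_per_day` and `boost_percent` parameters, and the function name are all assumptions; the actual emission schedule has not been decided.

```python
def accrued_reward(days_accessible: int, rate_per_day: int, boost_percent: int = 0) -> int:
    """Tokens minted for keeping data accessible, in the token's smallest unit.

    Assumed model: linear in time, with an NFT boost applied as a percentage
    on top of the base amount. Integer math avoids rounding drift.
    """
    if days_accessible < 0 or rate_per_day < 0 or boost_percent < 0:
        raise ValueError("inputs must be non-negative")
    base = days_accessible * rate_per_day
    return base + base * boost_percent // 100


no_boost = accrued_reward(30, 10)        # 30 days at 10 units per day
with_nft = accrued_reward(30, 10, 25)    # same period with a 25% NFT boost
```

With these example numbers the base reward is 300 units and the boosted reward is 375 units.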

Please let me know if you have any other questions. I welcome the feedback.

Interesting proposal. Since all data on the chain is publicly available, how will you handle permission/access to this sequence data?

Hi skyvell,

We would store the data at the storage layer, via smart contracts. Smart contracts will control the ability to query the database and who has access to what data.

This is actually a very interesting project because I know how valuable data can be. You haven’t really described how a user (someone who wants to buy the data) will acquire it, how much it’ll be per data point, and how a consumer (person who supplies the data) will benefit from that cost. I’m assuming this would all be described in detail at some point so just checking in on it. I guess my concern with this is making sure that the consumer gets a decent amount for their data and it’s not just another 23andMe making all the money with a new “blockchain” hype to it.

I’m assuming there would be a token that your team would hold a certain % of. Would there also be a marketing fund to acquire the users (data purchasers)?

As someone mentioned above the blockchain transactions are public, so anyone storing data would have to send the data to the contract in a public transaction. I could scrape your entire contract’s tx history and just grab all the data entered. How will you make sure something like this can’t happen?

Hi Geo,

Thanks for the reply. We will set it up so that token holders will share in 100% of the transaction fees that are generated from accessing the data. Users will be able to mine the token by staking their data and contributing phenotypic information about themselves.
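
A minimal sketch of sharing 100% of the access fees among stakers, assuming a simple pro-rata split by staked amount. The `distribute_fees` function and the holder names are hypothetical, and the split rule itself is an assumption.

```python
from typing import Dict


def distribute_fees(fee_pool: int, stakes: Dict[str, int]) -> Dict[str, int]:
    """Split a fee pool pro rata by stake, in the token's smallest unit.

    Floor division means a few leftover units ("dust") may remain
    undistributed; here they simply stay in the pool.
    """
    total_staked = sum(stakes.values())
    if total_staked <= 0:
        return {holder: 0 for holder in stakes}
    return {holder: fee_pool * stake // total_staked for holder, stake in stakes.items()}


payouts = distribute_fees(1000, {"alice": 3, "bob": 1})
```

With a pool of 1000 units and stakes of 3 and 1, the holders receive 750 and 250 units respectively.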

We will hold a DAO fund for marketing and development (much like iconbet).

Data storage is something we are debating… there are a lot of public databases out there; however, they don’t mean a lot without phenotypic information. This information will be stored on the user’s app and linked to their data for access. The phenotypic data will only be accessible if the user allows it (for payment). So the data on chain will basically be free. Anyone can look at it and run it, but you only get allele frequency… which is already known in many cases.
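
To show what someone could compute from the free on-chain data alone, here is a minimal Python sketch of an allele frequency calculation for a single SNP. The `allele_frequencies` function is hypothetical, and skipping any character other than A, C, G, or T (for example a no-call marker) is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable


def allele_frequencies(genotypes: Iterable[str]) -> Dict[str, float]:
    """Frequency of each allele across genotype calls for one SNP.

    Each genotype is a short string such as "AG"; characters other than
    A, C, G, T are ignored (assumed to be no-calls).
    """
    counts = Counter()
    for genotype in genotypes:
        for allele in genotype:
            if allele in "ACGT":
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: count / total for allele, count in counts.items()}


freqs = allele_frequencies(["AA", "AG", "GG", "AG", "--"])
```

In this example both A and G come out at 0.5. Nothing here says which person has which trait, which is why the phenotypic data is the part that stays behind the user's consent.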

We are looking for development help if you have anyone in mind. We would like to build this on ICE.

Hi Geo,

I have updated some of the technical parts. Please take a look. We are looking for a p-rep sponsor if you or anyone would like to sponsor us.

This is an interesting proposal. I have a PhD in biology and I agree with the direction that you are proposing. I do believe that you already have competition, and would love to hear more about it.
I am missing a governance function here. I am concerned that ppl can set an unrealistic price for their data and actually prevent scientific progress, or exploit the system. That’s why I think that a governance token is a must. Some diseases are rare and some are common, and the reward structure needs to deal with both scenarios. Have you given it any thought? I think it’s a good idea to consider subscriptions to the data for universities and institutes.
I think that your idea is cutting edge and is the next step in data distribution and ownership.
I would love to hear about you, your team, competition, and how to deal with ppl that want tons of money for their data.

Thank you for taking the time to review our project and write feedback. We don’t have all the details on data pricing at this time. There are a lot of working parts around the data, and I also don’t want to give people the blueprint to start their own project. We will have a governance token distributed to those who contribute data and supply phenotypic data. The competitors I have seen don’t allow users to upload their own data and are on ETH. I have some thoughts and strategies for getting big labs involved too.

Thanks for the reply. I think that you presented an important real-life problem that involves both money and people’s health. I think it’s important to distinguish between industry and academia when considering pricing and such. Wish you the best of luck!
