This tutorial web page provides an introduction to the CellCards features.
CellCards Data Collection, Annotation, and Processing
There are different ways to generate the cell cards, including extraction of cell knowledge from existing resources, manual collection from published papers, and automatic or semi-automatic literature mining.
To draw on existing knowledge bases, we first extracted cell knowledge from the ASCT+B tables. The Anatomical Structures, Cell Types, plus Biomarkers (ASCT+B) tables are authored and reviewed by international teams of anatomists, pathologists, physicians, and other experts. Extracting and reformatting the knowledge in these tables gives CellCards a source of high-quality data.
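The extraction step can be pictured with a small sketch. It assumes a CSV export whose columns follow an ASCT+B-like naming pattern (`AS/1/ID`, `CT/1/LABEL`, `BGene/1/LABEL`); the rows, the identifiers, and the `extract_cell_records` helper are illustrative assumptions, not the actual CellCards extraction code, and the identifiers should be checked against the ontologies before reuse.

```python
import csv
import io
from collections import defaultdict

# Toy excerpt in an ASCT+B-like layout. Column names and rows are
# illustrative assumptions, not a copy of a published table.
SAMPLE_TABLE = """AS/1/LABEL,AS/1/ID,CT/1/LABEL,CT/1/ID,BGene/1/LABEL
kidney,UBERON:0002113,podocyte,CL:0000653,NPHS1
kidney,UBERON:0002113,podocyte,CL:0000653,NPHS2
kidney,UBERON:0002113,mesangial cell,CL:0000650,PDGFRB
"""


def extract_cell_records(table_text):
    """Group table rows by cell type ID into one record per cell type."""
    records = defaultdict(
        lambda: {"label": None, "anatomical_structures": set(), "biomarkers": set()}
    )
    for row in csv.DictReader(io.StringIO(table_text)):
        record = records[row["CT/1/ID"]]
        record["label"] = row["CT/1/LABEL"]
        record["anatomical_structures"].add(row["AS/1/ID"])
        if row["BGene/1/LABEL"]:  # skip rows without a gene biomarker
            record["biomarkers"].add(row["BGene/1/LABEL"])
    return dict(records)


cards = extract_cell_records(SAMPLE_TABLE)
print(sorted(cards["CL:0000653"]["biomarkers"]))
```

Grouping by the cell type identifier rather than the label keeps rows together even when different tables spell the label differently.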
The podocyte cell card data were generated manually. Going forward, cell card data can be generated more automatically using different methods.
One method of generating cell card data is to provide a standard spreadsheet template, which data submitters (or data-submission tools) can use to generate and submit data for a specific cell card. The populated sheet is then checked by a data validator.
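As a rough illustration of the template idea, the sketch below builds a CSV template from a list of fields. The field names and the `build_template` helper are hypothetical; the actual CellCards template may use different columns and a different file format.

```python
import csv
import io

# Hypothetical field list for a cell card template: (column name, required?).
TEMPLATE_FIELDS = [
    ("cell_type_label", True),
    ("cell_type_id", True),
    ("anatomical_structure_id", True),
    ("biomarker", False),
    ("evidence_pmid", False),
]


def build_template(fields):
    """Return CSV text with a header row and a row marking required columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([name for name, _ in fields])
    writer.writerow(["required" if required else "optional" for _, required in fields])
    return buffer.getvalue()


template_text = build_template(TEMPLATE_FIELDS)
print(template_text)
```

Keeping the field list in one place means the same list can later drive both the template and the checks a validator applies to the populated sheet.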
There are different ways to validate the submitted data. For example, the DataHarmonizer (Hsiao Lab) can be used to support data validation, driven internally by our LinkML-based schema described below.
The CellCards schema is designed in LinkML, whose schemas are easily shared across many platforms and communities. It defines the minimum information standards, the structure of the cell cards, and the mappings between cell card fields and ontology terms. This allows us to validate data that goes into the cell cards and to disseminate documentation to the community about the information standards required to produce CellCards. We will also use LinkML to drive the CellCards user interface. This will standardize the workflows, making them easier for other groups to use and ensuring rigor and reproducibility. We will also leverage the knowledge bases created in other resources such as ASCT+B.
More information about the CellCards Schema: https://cellcards.org/standards/cellcards_schema.php
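To make the role of the schema concrete, here is a minimal sketch of schema-driven validation. The dictionary is shaped loosely like a LinkML schema (a class whose attributes carry `required`, `pattern`, and `slot_uri`), but every field name is an assumption made for illustration; the real definitions are in cellcard.yaml. The hand-written `validate` function stands in for a real validator and is not how LinkML or the DataHarmonizer performs validation.

```python
import re

# A Python dict shaped loosely like a LinkML schema. All class and field
# names are hypothetical; the real definitions live in cellcard.yaml.
SCHEMA = {
    "classes": {
        "CellCard": {
            "attributes": {
                "cell_type_id": {
                    "required": True,
                    "pattern": r"^CL:\d{7}$",
                    "slot_uri": "dcterms:identifier",
                },
                "cell_type_label": {"required": True, "slot_uri": "rdfs:label"},
                "anatomical_structure_id": {
                    "required": False,
                    "pattern": r"^UBERON:\d{7}$",
                },
            }
        }
    }
}


def validate(record, class_name="CellCard"):
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    attributes = SCHEMA["classes"][class_name]["attributes"]
    for name, spec in attributes.items():
        value = record.get(name)
        if value in (None, ""):
            if spec.get("required"):
                errors.append(f"{name}: missing required value")
            continue
        pattern = spec.get("pattern")
        if pattern and not re.match(pattern, value):
            errors.append(f"{name}: '{value}' does not match {pattern}")
    return errors


good = {"cell_type_id": "CL:0000653", "cell_type_label": "podocyte"}
bad = {"cell_type_id": "podocyte"}
print(validate(good))
print(validate(bad))
```

Because the rules live in the schema rather than in the checking code, the same definitions can also be used to generate documentation or a submission form.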
Data Storage in Databases
The submitted and validated data can be stored on our CellCards server in different ways, facilitating public query, analysis, and downloading. We are using two types of databases:
- MySQL database:
Data ETL (extract, transform, and load) tool development
ETL is the process of extracting data from different sources, transforming the data into a form suited to a resource such as a relational database, and loading the data into systems that end users can access and use.
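The three ETL stages can be sketched in a few lines. This example uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a MySQL server, and the table name, column names, and sample rows are assumptions made for illustration rather than the CellCards database design.

```python
import csv
import io
import sqlite3

# Hypothetical submitted data with inconsistent spacing and capitalization.
RAW_SUBMISSION = """cell_type_id,cell_type_label,biomarker
CL:0000653, Podocyte ,NPHS1
CL:0000653,podocyte,NPHS2
"""


def extract(text):
    """Extract: read the raw rows from the submitted CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace and lowercase labels so rows are consistent."""
    return [
        (
            row["cell_type_id"].strip(),
            row["cell_type_label"].strip().lower(),
            row["biomarker"].strip(),
        )
        for row in rows
    ]


def load(conn, rows):
    """Load: insert the cleaned rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cell_biomarker "
        "(cell_type_id TEXT, cell_type_label TEXT, biomarker TEXT)"
    )
    conn.executemany("INSERT INTO cell_biomarker VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
load(conn, transform(extract(RAW_SUBMISSION)))

query = (
    "SELECT biomarker FROM cell_biomarker "
    "WHERE cell_type_label = 'podocyte' ORDER BY biomarker"
)
markers = [marker for (marker,) in conn.execute(query)]
print(markers)
```

Once loaded, the data can be queried with ordinary SQL, which is what makes it available to web query and analysis tools.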
Web query and analysis
We are :
Web links introduced above:
- CellCard Schema GitHub (https://github.com/CellCards/CellCard-Schema):
  - Schema cellcard.yaml: https://github.com/CellCards/CellCard-Schema/blob/main/src/linkml/cellcard.yaml
  - Data example (podocyte): https://github.com/CellCards/CellCard-Schema/blob/main/src/data/examples/podocyte-001.yaml
  - Jupyter notebook example (podocyte): https://github.com/CellCards/CellCard-Schema/blob/main/notebooks/podocyte-linkml-example.ipynb
- Schema Documentation:
  - CellCard Schema Documentation: https://cellcards.github.io/CellCard-Schema/CellCard/
  - LinkML tutorial: https://linkml.io/linkml/intro/tutorial.html
- Podocyte Cell Card: https://cellcards.org/podocyte.php
- DataHarmonizer GitHub: https://github.com/cidgoh/DataHarmonizer
More information will be provided later. Stay tuned ...