Annotations
Triticum aestivum Chinese Spring
IWGSC RefSeq Annotations:
Under the leadership of Frédéric Choulet and Hélène Rimbert (INRAE), and with funding from the French Government managed by the National Research Agency (ANR) under the Investment for the Future program (BreedWheat project ANR-10-BTBR-03), a new annotation, IWGSC Annotation v2.1, was completed to accompany RefSeq v2.1. First, the previous annotation was updated to IWGSC Annotation v1.2 by integrating a set of 117 novel genes and 81 microRNAs, many of which had been manually curated by the wheat community; this version was then used to annotate IWGSC RefSeq v2.1. The transposable elements (TEs) in the IWGSC RefSeq v2.1 assembly were reannotated, and the gene annotation was updated by transferring the previously known gene models (v1.1) using a fine-tuned, dedicated strategy implemented in the Marker-Assisted Gene Annotation Transfer for Triticeae (MAGATT) pipeline. The newly released IWGSC RefSeq Annotation v2.1 contains 266,753 genes, comprising 106,913 high-confidence (HC) genes and 159,840 low-confidence (LC) genes.
Article: Zhu et al., Optical maps refine the bread wheat Triticum aestivum cv Chinese Spring genome assembly, Plant J. 2021 Apr 24, https://doi.org/10.1111/tpj.15289
cf. IWGSC announcement.
The corresponding assembly is also available in open access: IWGSC RefSeq v2.1
- News:
- Last files added are:
- functional annotation
- all correspondences between IWGSC RefSeq v1.0, v1.1, v2.1 and the survey sequence.
- IWGSC Annotation v2.1, including NCBI coding protein, is available for display in Wheat Apollo.
- IWGSC Annotation v2.1 and other wheat genome annotations are available in Pretzel.
- The IWGSC RefSeq v1.0 annotation is publicly available for download and for display in a browser and in an InterMine instance.
The IWGSC RefSeq v1.0 annotation includes gene models generated by integrating predictions made by INRA-GDEC using Triannot and by PGSB using their customised pipeline (previously the MIPS pipeline). The integration was undertaken by the Earlham Institute (EI), which also added UTRs to the gene models where supporting data were available. Gene models have been assigned to high confidence (HC) or low confidence (LC) classes based on completeness, similarity to genes represented in protein and DNA databases, and repeat content. The automated functional annotation of genes was generated by PGSB based on AHRD parameters.
In addition, annotated transposable elements (TEs), non-coding RNAs, varietal SNPs, RH maps, GBS maps and optical maps are available.
The syntenic gene pairs are available for download.
More information about these data is provided in the README file.
- The IWGSC RefSeq v1.1 annotation is publicly available for download and for display in a browser and in an InterMine instance.
This is a new version of the gene annotation, which refers to the same assembly. It includes genes and RNA-seq mapping.
Compared with the v1.0 annotation, three modifications were made:
- genes wrongly removed during the integration were added back
- LC genes that overlap manually curated genes were removed (IWGSC_v1.1_LC_removed.ids)
- the IDs of TE-related LC genes coming from the HC set were updated to fit the LC naming and numbering (IWGSC_v1.1_LC.correspondanceTEHC.txt).
More information about these data is provided in the README file.
How to access the data?
All these data are now in open access. While scientists may freely publish using the IWGSC data, IWGSC does request that the source of the data be properly acknowledged.
>>> The corresponding assembly is accessible here. <<<
These data should be displayed in Ensembl Plants and GrainGenes.
Warning:
Note that some bioinformatics tools (e.g. GATK) require the chromosomes to be split into chunks of at most 512 Mbp.
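As a minimal sketch of how such a split could be planned, the following Python snippet computes chunk intervals of at most 512 Mbp for each chromosome. The function names, chromosome names and lengths are hypothetical placeholders, and 512 Mbp is interpreted here as 512,000,000 bp, which is an assumption; please check the exact limit of the tool you use.

```python
# Assumption: "512 Mbp" is taken as 512,000,000 bp; adjust if your tool differs.
MAX_CHUNK_BP = 512_000_000


def chunk_intervals(chrom_lengths, max_chunk_bp=MAX_CHUNK_BP):
    """Return (chrom, start, end) tuples, 0-based half-open, each <= max_chunk_bp."""
    intervals = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, max_chunk_bp):
            end = min(start + max_chunk_bp, length)
            intervals.append((chrom, start, end))
    return intervals


def to_chromosome_position(chunk_start, position_in_chunk):
    """Map a 0-based position inside a chunk back to the full chromosome."""
    return chunk_start + position_in_chunk


if __name__ == "__main__":
    # Hypothetical lengths for illustration only, not taken from the assembly.
    example_lengths = {"chrA": 1_200_000_000, "chrB": 400_000_000}
    for chrom, start, end in chunk_intervals(example_lengths):
        print(f"{chrom}\t{start}\t{end}")
```

The intervals are printed as tab-separated, BED-like lines. Positions reported on a chunk must be shifted by the chunk start to recover coordinates on the full chromosome.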
IWGSC Survey sequence annotations
Versions 1 and 2:
- Gene models produced by the MIPS plant group (K. Mayer) are publicly available.
Major changes in v2.2 are:
a.) We renamed the genome assembly scaffolds from the old identifiers (e.g. ">10") to new identifiers (e.g. ">ta_iwgsc_1al_v2_10") in the FASTA files of the CLEANED and repeat-masked genome sequences, and adapted the IDs in the annotation GTF files accordingly.
b.) We fixed an issue with missing stop codons in the gene prediction FASTA and GTF files.
No structural changes were made between the v2.1 and v2.2 annotations and all gene identifiers remain stable, so this update can be considered cosmetic and mainly improves user convenience.
Renamed genome assembly: genome_assembly/genome_arm_assemblies_CLEANED/ and genome_assembly/genome_arm_assemblies_CLEANED_REPMASKED/
Gene predictions, incl. changelog, README, ...: genePrediction_v2.2/
- The Genome Zipper produced by the MIPS plant group (K. Mayer) is publicly available.
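The scaffold renaming described in change a.) above can be illustrated with a short Python sketch. It prepends a prefix such as "ta_iwgsc_1al_v2_" (taken from the example identifier above) to each FASTA header and to the first column of each GTF line, so that both files stay consistent. The function names, the sample records and the assumption that a new identifier is simply the prefix plus the old identifier are illustrative only; this is not the script used to produce the released files.

```python
def rename_fasta_headers(lines, prefix):
    """Yield FASTA lines with `prefix` prepended to each sequence identifier."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            # Split the identifier from an optional free-text description.
            parts = header.split(None, 1)
            old_id = parts[0] if parts else ""
            description = f" {parts[1]}" if len(parts) > 1 else ""
            yield f">{prefix}{old_id}{description}\n"
        else:
            yield line  # sequence lines are left untouched


def rename_gtf_seqnames(lines, prefix):
    """Yield GTF lines with `prefix` prepended to column 1 (the sequence name)."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line  # keep comments and blank lines unchanged
        else:
            seqname, sep, rest = line.partition("\t")
            yield f"{prefix}{seqname}{sep}{rest}"


if __name__ == "__main__":
    prefix = "ta_iwgsc_1al_v2_"  # example prefix from the text above
    # Hypothetical sample records, for illustration only.
    fasta = [">10\n", "ACGTACGT\n"]
    gtf = ['10\tsrc\texon\t1\t8\t.\t+\t.\tgene_id "g1";\n']
    print("".join(rename_fasta_headers(fasta, prefix)), end="")
    print("".join(rename_gtf_seqnames(gtf, prefix)), end="")
```

Applying the same prefix to both files matters because downstream tools match the GTF sequence name against the FASTA identifier.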
Version 3:
Gene models produced by the National Research Council Canada and the University of Saskatchewan (A. Sharpe, D. Konkin and C. Pozniak) are publicly available.
Triticum aestivum Renan
The Renan RefSeq v2.0 and v2.1 annotations are available in open access for download.
The corresponding assembly is available for download.
Article: Aury et al. Long-read and chromosome-scale assembly of the hexaploid wheat genome achieves high resolution for research and breeding, GigaScience, Volume 11, 2022, giac034, https://doi.org/10.1093/gigascience/giac034