Summary
•Long noncoding RNAs (lncRNAs) are transcripts at least 200 nucleotides in length
that have no apparent protein-coding capacity and are involved in various
biological regulatory processes. To date, no systematic identification of lncRNAs
has been reported in cotton (Gossypium spp.).
•Here, we describe the identification of 30 550 long intergenic noncoding RNA
(lincRNA) loci (50 566 transcripts) and 4718 long noncoding natural antisense
transcript (lncNAT) loci (5826 transcripts). LncRNAs are rich in repetitive
sequences and are preferentially expressed in a tissue-specific manner. The
abundance of genome-specific and/or lineage-specific lncRNAs indicates that
they are weakly conserved during evolution. Approximately 76% of homoeologous
lncRNAs exhibited expression biased towards either the At or the Dt subgenome.
Compared with protein-coding genes, lncRNAs showed higher overall methylation
levels, and their expression was less affected by gene body methylation.
•Expression validation in different cotton accessions and coexpression network
construction helped to identify several candidate functional lncRNAs involved in
cotton fibre initiation and elongation. Analysis of the integrated expression,
from both subgenomes, of miR397-generating lncRNAs and of miR397 targets, a
consequence of genome polyploidization, indicated that these transcripts play
pivotal roles in regulating lignin metabolism in the fibres of domesticated
tetraploid cotton.
•This study provides the first comprehensive identification of lncRNAs in
Gossypium.