您好,登錄后才能下訂單哦!
小編給大家分享一下如何使用perl實現ncbi基因組數據格式轉換,希望大家閱讀完這篇文章之后都有所收獲,下面讓我們一起去探討吧!
ncbi基因組數據格式轉換
大家都知道從ncbi下載的基因組數據與正常的基因組數據有所不同,所有的基因轉錄本CDS的ID都是重新起的,用起來非常不方便,為此寫了一perl腳本將其ID轉換回來。
用法如下:
Usage Forced parameter: -gff genoma gff file <infile> must be given -fa genoma fasta file <infile> must be given -o output gff file <outfile> must be given -out output fasta file <outfile> must be given Other parameter: -h Help document
代碼如下:
use Getopt::Long; use strict; use Bio::SeqIO; use Bio::Seq; #get opts my %opts; GetOptions(\%opts, "gff=s", "o=s", "fa=s", "out=s","h"); if(!defined($opts{gff}) || !defined($opts{o}) || !defined($opts{fa}) || !defined($opts{out}) || defined($opts{h})) { print <<"Usage End."; Usage Forced parameter: -gff genoma gff file <infile>must be given -fa genoma fasta file <infile>must be given -o output gff file <outfile>must be given -out output fasta file <outfile>must be given Other parameter: -h Help document Usage End. exit; } open(IN,"$opts{gff}") || die "open $opts{gff} failed\n"; open(OUT,">$opts{o}") || die "open $opts{o} failed\n"; my %chr; my %gene; my %gene_mrna_num; my %mrna; my %mrna_exon_num; while(<IN>){ if(/^#/){ print OUT $_; next; } chomp; my @line = split("\t"); ########################################################################### if($line[2] eq "region"){ if($line[8] =~/chromosome/){ $line[8] =~ /chromosome=([^;]*)/; my $chromosome = $1; $chr{$line[0]} = $chromosome; }else{ $chr{$line[0]} = $line[0]; } } ########################################################################### if($line[2] eq "gene"){ $line[8] =~ /ID=([^;]*);Name=([^;]*)/; my $geneid = $1; my $genename = $2; $line[8] =~ s/$geneid/$genename/; $gene{$geneid} = $genename; $gene_mrna_num{$geneid} = 0; } ########################################################################### if($line[2] eq "mRNA"){ $line[8] =~ /ID=([^;]*);Parent=([^;]*)/; my $mrnaid = $1; my $geneid = $2; $line[8] =~ s/$geneid/$gene{$geneid}/; $gene_mrna_num{$geneid}++; $line[8] =~ s/$mrnaid/$gene{$geneid}\.$gene_mrna_num{$geneid}/; $mrna{$mrnaid} = "$gene{$geneid}.$gene_mrna_num{$geneid}"; $mrna_exon_num{$mrnaid} = 0; } ########################################################################### if($line[2] eq "exon"){ $line[8] =~ /ID=([^;]*);Parent=([^;]*)/; my $exonid = $1; my $mrnaid = $2; $mrna_exon_num{$mrnaid}++; $line[8] =~ s/$mrnaid/$mrna{$mrnaid};Name=$mrna{$mrnaid}\.exon$mrna_exon_num{$mrnaid}/; $line[8] =~ s/ID=$exonid;//; } ########################################################################### if($line[2] eq "CDS"){ $line[8] =~ /ID=([^;]*);Parent=([^;]*)/; my $cdsid = $1; my $mrnaid = $2; $line[8] =~ s/$mrnaid/$mrna{$mrnaid}/; $line[8] =~ s/$cdsid/$mrna{$mrnaid}/; } $line[0] = $chr{$line[0]}; print OUT join("\t",@line)."\n"; } close(IN); close(OUT); my $in = Bio::SeqIO->new(-file => "$opts{fa}" , -format => 'Fasta'); my $out = Bio::SeqIO->new(-file => ">$opts{out}" , -format => 'fasta'); while ( my $seq = $in->next_seq() ) { #my($id,$sequence)=($seq->id,$seq->seq); my($id,$sequence,$desc)=($seq->id,$seq->seq,$seq->desc); my $newSeqobj = Bio::Seq->new(-seq => $sequence, -desc => $desc, -id => $chr{$id}, ); $out->write_seq($newSeqobj); }
看完了這篇文章,相信你對“如何使用perl實現ncbi基因組數據格式轉換”有了一定的了解,如果想了解更多相關知識,歡迎關注億速云行業資訊頻道,感謝各位的閱讀!
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。