VeggiePharm: American Gut Project Data Mining Tutorial

Thursday, February 12, 2015

Sorry, non-geeky types...most boring blog post ever! I'm putting this up as a resource for anyone who wants to try it out. If you do, and you find any shortcuts, please post them in the comments and I'll amend the post later.

Special thanks to Anonymous Commenter "B" and The Self-Taught Author as well as Richard Sprague for the links, guidance and patience as I worked my way through all these metagenomic tools. I feel we learn a bit more every day, even if it's not what we hoped to learn!

I wrote these instructions out as I went through analyzing some of my own gut results. Let me know how they work if you try.

Step 1

First, go to the European Nucleotide Archive (ENA) biosample page, enter your 9-digit AmGut kit number (example: 000016449), and search by 'samples' in the dropdown (the default).
One of the results should say "American Gut Project Stool sample" in the description.
Click the link under 'Accession' and make sure this is your sample.
Scroll to the bottom and click the link next to "databases" (ex: ERS577362).
Scroll down and look for a column labeled "Fastq files (ftp)", then click on File 1.
When prompted, save this file to your desktop or somewhere handy, and change the name from the default (ERR667803.fastq.gz) to something more descriptive (AmGut2.fastq.gz).
You are done with Step 1.
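
If you want to make sure the download isn't truncated or corrupted before uploading it, here is a minimal Python sketch that opens the compressed file and counts the reads. It assumes a standard 4-line-per-read FASTQ layout; the function name count_fastq_reads is my own for illustration and is not part of any AmGut or MG-RAST tool.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzip-compressed FASTQ file, checking the 4-line record layout."""
    reads = 0
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if header == "":
                break  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines between records
            sequence = handle.readline().rstrip("\r\n")
            separator = handle.readline()
            quality = handle.readline().rstrip("\r\n")
            # Each record is: @header, sequence, + separator, quality string
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"Malformed FASTQ record near read {reads + 1}")
            if len(sequence) != len(quality):
                raise ValueError(f"Sequence/quality length mismatch at read {reads + 1}")
            reads += 1
    return reads
```

Calling count_fastq_reads("AmGut2.fastq.gz") on the file you renamed above should return the number of reads, or raise an error if the file is damaged.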

Step 2

Log into MG-RAST. Create an account if needed.
Click the green UP arrow in the box in the upper right-hand corner to upload the file you saved in Step 1.
Skip their step 1 and go to step 2: find the fastq.gz file you just saved and upload it. Do not generate a webkey; it isn't needed. Don't do the MD5 check either; just close it when prompted.
Go to step 3, manage inbox. Highlight your new file, then select 'unpack selected' (it takes a few minutes to decompress).
Click 'update inbox'; the fastq.gz file changes to .fastq. Highlight the new file and check that there are no errors.
Click Step 1 under Data Submission, check the "I do not want to supply metadata" box, and click 'select'.
Step 2: create a new project with a unique name, or use an existing one.
Step 3: you should see your file here; check the box and then click 'select'.
Step 4: leave everything as default and click 'select'.
Step 5: click the very top box for highest priority (otherwise it will take days!), submit the job, and make a note of the files in the popup box for future reference. Click OK, then you'll get a "Successful Job" popup with another number; write that down, too!
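
If 'unpack selected' seems to hang in the inbox, one workaround suggested in the comments is to decompress the file on your own computer before uploading. Here is a rough Python sketch of that, assuming MG-RAST will accept the uncompressed .fastq file; the helper name decompress_gz is just for illustration.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(gz_path):
    """Decompress a .gz file next to the original and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"Expected a .gz file, got {gz_path.name}")
    out_path = gz_path.with_suffix("")  # AmGut2.fastq.gz -> AmGut2.fastq
    # Stream the data so large files do not have to fit in memory
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, decompress_gz("AmGut2.fastq.gz") writes AmGut2.fastq into the same folder and leaves the original .gz file untouched.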

Step 3

To check the progress of new jobs, click on the globe icon in the upper right corner.
In the "Browse Metagenomes" section, you'll see a listing of your jobs and files. "In progress" shows how many you are waiting on.
Click the little linked number next to 'available for analysis', and you'll get a table with all of your files.
Click the link under the 'name' column and you'll get a good analysis of your file, but you can also analyze the data using the procedure in Step 4.

Step 4

Click the 'barchart' icon in the top right corner.
Under 'data selection', expand 'metagenomes'. This will give a list of everything you have put into MG-RAST.
Select a metagenome from the dropdown.
Use the arrows to slide it over to the box on the right. Once you get the hang of it, you can select 2 samples and compare them.
Change 'annotation sources' from M5NR to Greengenes. Greengenes is what AmGut uses. It's a reference database of 16S rRNA gene sequences, essentially a master list of microbes. Remember, they can't see what microbe they are looking at; they can only compare it to a known microbe in a 16S rRNA library! These sources are the different libraries.
Leave all other parameters alone.
You can play with the other options later, but for now choose 'Table' and click 'generate'.
Group the table by Species and hit 'change'.
You'll have hundreds of line items, 15 or so per page. Scroll through and you'll see what was in the sample.
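
For anyone curious what "group by Species" amounts to, here is a small Python sketch of the idea: add up the counts for every row that shares a species label, then sort from most to least abundant. It assumes each row is a (species label, abundance) pair; the function name and the rows below are placeholders for illustration, not real sample data and not MG-RAST's own code.

```python
from collections import defaultdict


def group_by_species(rows):
    """Sum abundance counts per species label, sorted from most to least abundant."""
    totals = defaultdict(int)
    for species, abundance in rows:
        totals[species] += abundance
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Placeholder rows (species label, abundance); not real sample data.
example_rows = [
    ("species A", 120),
    ("species B", 45),
    ("species A", 30),
]
print(group_by_species(example_rows))
```

The same approach works for comparing 2 samples: group each one separately and line the totals up side by side.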

Too easy!

20 comments:

Jin, February 20, 2015 at 5:35 PM
Sadly, I was intimidated by the instructions for analyzing uBiome samples at Richard's site.
Jin, February 21, 2015 at 3:19 PM
Looks like I've got 8 fastq.gz files from uBiome. I got them uploaded to MG-Rast. I see them in the "manage inbox" section but they have been "decompressing" for almost a half hour now. Is that normal?
Tim Steele, February 21, 2015 at 3:49 PM
Something is wrong.

Maybe you need to unzip what you got from uBiome first? Seems I had that problem. It should only take 30 seconds to decompress!
Anonymous, February 21, 2015 at 3:59 PM
Yes, unpack selected, then refresh inbox. .gz will be converted to just fastq. Try selecting a single file. Press "Unpack Selected", then "refresh inbox".
Tim Steele, February 21, 2015 at 5:15 PM
Great! Thanks for helping, Bar.
Ilaine, March 1, 2015 at 8:11 AM
I can't create an MG-RAST account. It says I've entered an incorrect reCAPTCHA, but there is no reCAPTCHA anywhere on the page. I've tried several times.