Thursday, February 12, 2015

American Gut Project Data Mining Tutorial

Sorry, non-geeky types...most boring blog post ever!  I'm putting this up as a resource for anyone who wants to try it out.  If you do, and you find any shortcuts, please post them in the comments and I'll amend the post later.

Special thanks to Anonymous Commenter "B" and The Self-Taught Author as well as Richard Sprague for the links, guidance and patience to work your way through all these metagenomic tools. I feel we learn a bit more every day, even if it's not what we hoped to learn!

I wrote these instructions out as I went through analyzing some of my own gut results.  Let me know how they work if you try.

Step 1

  • First, go to the European Nucleotide Archive biosample page and enter your 9-digit AmGut kit number (for example, 000016449), then search by 'samples' in the dropdown (the default).
  • One of the results you get should say in the description "American Gut Project Stool sample"
  • Click the link under 'Accession' and make sure this is your sample.
  • Scroll to bottom and click the link next to "databases" (ex: ERS577362)
  • Scroll down and look for a column labeled "Fastq files (ftp)", click on File 1.
  • When prompted, save this file to your desktop or somewhere handy, change the name from the default (ERR667803.fastq.gz) to something more appropriate (AmGut2.fastq.gz).
  • You are done with Step 1
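
Before moving on, it can be worth checking that the download isn't truncated or corrupted. Here is a minimal sketch in Python, assuming the file is a plain 4-lines-per-read FASTQ that has been gzipped (the function name is my own, and files with wrapped sequence lines would need a real parser):

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzip-compressed FASTQ file.

    Assumes the simple layout of 4 lines per read:
    @header, sequence, +, quality string.
    """
    reads = 0
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines
            sequence = handle.readline().rstrip()
            plus = handle.readline()
            quality = handle.readline().rstrip()
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"read {reads + 1} does not look like FASTQ")
            if len(sequence) != len(quality):
                raise ValueError(f"read {reads + 1}: sequence and quality lengths differ")
            reads += 1
    return reads
```

Calling count_fastq_reads("AmGut2.fastq.gz") should return the number of reads in the file; a gzip error or a ValueError suggests the download is worth repeating.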

Step 2

  • Log into MG-RAST. Create an account if needed.
  • Click the green UP arrow in the upper-right corner box to upload the file you just saved in Step 1.
  • Skip their step 1 and go to step 2. Find the fastq.gz file you just saved and upload it. Do not generate a webkey; it's not needed. Don't do the MD5 check; just close it when prompted.
  • Go to step 3, manage Inbox. Highlight your new file, then select 'unpack selected' (takes a few minutes to decompress).
  • Click 'update inbox'; the fastq.gz file changes to .fastq. Highlight the new file and check that there are no errors.
  • Click Step 1 under Data Submission, click the "I do not want to supply metadata" box, and select.
  • Step 2, create a new project with unique name, or use existing.
  • Step 3, You should see your file here, check the box and then 'select'.
  • Step 4, leave all as default, click 'select'.
  • Step 5, click the very top box, for highest priority (otherwise it will take days!), Submit the job, and make a note of the files in the popup box for future reference. Click OK, then you'll get a "Successful Job" popup with another number, write that down, too!
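
The 'unpack selected' step above is ordinary gzip decompression, and an MD5 check is just a file checksum comparison. If the inbox seems stuck, or you'd like to see what those steps do, here is a rough sketch of doing both on your own machine. The function names are mine, not MG-RAST's, and I'm assuming (not promising) that the inbox treats an already-unpacked .fastq the same as one it unpacked itself.

```python
import gzip
import hashlib
import shutil


def unpack_gz(src, dst):
    """Decompress a .gz file to dst, streaming so big files don't fill memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 checksum of a file as a hex string, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, unpack_gz("AmGut2.fastq.gz", "AmGut2.fastq") writes the uncompressed file next to the original, and md5_of("AmGut2.fastq") gives a checksum you could compare against whatever the site reports.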

Step 3

  • To check progress of new jobs, click on the Globe Icon in upper right corner
  • In the "Browse Metagenomes" section, you'll see a listing of your jobs and files. "In progress" shows how many you are waiting on.
  • Click the little linked number 'available for analysis', and you'll get a table with all of your files.
  • Click the link under 'name' column and you'll get a good analysis of your file, but you can also analyze the data using Step 4 procedures.

Step 4

  • Click the 'barchart' icon in top right corner.
  • Under 'data selection', expand 'metagenomes'. This will give a list of everything you have put into MG-RAST.
  • Select a Genome from the dropdown
  • Use the arrows to slide it over to the box on the right. Once you get the hang of it, you can select 2 samples and compare them.
  • Change 'annotation sources' from M5NR to Greengenes. Greengenes is what AmGut uses. It's a master list of microbes. Remember, they can't see what microbe they are looking at; they can only compare it to a known microbe in a 16S rRNA library! These sources are the different libraries.
  • Leave all other parameters alone.
  • You can play with the other options later, but for now choose 'Table' and click 'generate'.
  • Group table by Species, and hit change.
  • You'll have hundreds of line items, 15 or so per page. Scroll through and you'll see what was in the sample.
Too easy!
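
If paging through hundreds of rows 15 at a time gets old, the same grouping can be done offline. This is only a sketch: it assumes you have copied or exported the table as tab-separated text, and the column names "species" and "abundance" are placeholders of mine, so change them to match whatever headers your table actually has.

```python
import csv
import io
from collections import Counter


def abundance_by_species(table_text, species_col="species", count_col="abundance"):
    """Sum the count column per species and return (species, total) pairs,
    largest total first. Column names are placeholders, not MG-RAST's."""
    totals = Counter()
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        totals[row[species_col]] += int(row[count_col])
    return totals.most_common()


# Made-up rows for illustration only, not real results.
example = (
    "species\tabundance\n"
    "Bacteroides fragilis\t12\n"
    "Faecalibacterium prausnitzii\t30\n"
    "Bacteroides fragilis\t5\n"
)
for name, total in abundance_by_species(example):
    print(name, total)
```

Sorting by total puts the most abundant species at the top, which is usually what you want to see first.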

20 comments:

  1. Sadly, I was intimidated by the instructions for analyzing uBiome samples at Richard's site.

    1. Give these a try Jin. You can skip the first step and get your files from uBiome. If you have any problems, someone should be able to help you get the data up and running.

      B

  2. Looks like I've got 8 fastq.gz files from uBiome. I got them uploaded to MG-Rast. I see them in the "manage inbox" section but they have been "decompressing" for almost a half hour now. Is that normal?

    1. You have to refresh your browser. Or click update inbox.

      Skip their step 1 and go to step 2. Find the fastq.gz file you just saved and upload it. Do not generate a webkey; it's not needed. Don't do the MD5 check; just close it when prompted.

      Go to step 3, manage Inbox. Highlight your new file, then select 'unpack selected' (takes a few minutes to decompress).

      Click 'update inbox'; the fastq.gz file changes to .fastq. Highlight the new file and check that there are no errors.

    2. It's been another 20 minutes (after selecting "unpack selected") and it still indicates the files are decompressing. Still decompressing after refreshing browser. Still decompressing after selecting "update inbox".

      We'll see!

  3. Something is wrong.

    Maybe you need to unzip what you got from uBiome first? Seems I had that problem. It should only take 30 seconds to decompress!

  4. Yes, unpack selected, then refresh inbox. .gz will be converted to just fastq. Try selecting a single file. Press "Unpack Selected", then "refresh inbox".

    1. That you, "B"?

      I just tried downloading from uBiome, it's a 50MB zipped file, what's in it?

    2. LOL. Lots more data than AmGut, remember. BTW, mine was much smaller than yours. You have too much diversity. I thought they all were as small as my file.

    3. But when you unzip it, what do you get? fastq.gz files? Or do you just put what you download from uBiome straight into MG-RAST without unzipping?

      Jin said there were 8 files.

    4. Assumed 8 = 8 runs. The .fastq.gz is zipped (compressed). Unpack converts it to .fastq

    5. I'm trying again. This time I clicked on all eight of the fastq.gz files downloaded from uBiome first (this unzips them?), then uploaded those to MG-Rast. I'm still waiting for them to upload.

    6. 7 appear to have loaded in the inbox (turned black), but one is still hanging there, uploading, in light grey.

    7. What's the file extension? .fastq.gz? If so, they need to be unpacked. Try one at a time. The gray one may be from before, the one that seemed to be stuck decompressing.

    8. I'm not sure how I ended up freeing the one that was hung up, but all 8 files were uploaded and my data is in the pipeline! Cheers!

    9. That was my first experience too! Maybe we all did something wrong, or maybe that's just something built in to the system :) This next part can take some time, so be prepared to wait a little. Enjoy playing around with the data.

    10. I will probably need some hand holding for the next step too, but psyched I got this far.

  5. I can't create an MG-RAST account. It says I've entered an incorrect reCaptcha, but there is no reCaptcha anywhere on the page. I've tried several times.

    1. Which browser are you using? I use Chrome and could find a way to see it with and without the reCaptcha :) Try this link which has a slightly different format than the page generated if you press the Register link on the home page. Not sure if it will fully work as I only tested to see if the reCaptcha would show or not.
