NSCI0007 Practice Exam 1

Specimen Answers and Mark Scheme

  • The specimen code below demonstrates one way to correctly answer the questions.

  • Full marks will be awarded if the candidate has implemented another suitable method and the code behaves as specified in the question.

  • If the candidate’s code produces an error, or does not behave as specified in the question, partial credit will be awarded as described in the mark scheme.

  • Where a candidate has used a different method to below, partial credit will be awarded in an analogous way.

Question 1 [7]

def overlap(x, y):
    n = min(len(x), len(y))
    for i in range(n, 0, -1):
        if x[-i:] == y[:i]:
            return i
    return 0

n1 = overlap("XXXABC", "ABCYYY")
n2 = overlap("ABCYYY", "XXXABC")
n3 = overlap("XXXABC", "ABC")
print(n1, n2, n3)

# [2] find minimum of length of two strings
# [1] appropriate looping construct
# [2] if statement with correct string indexing
# [2] tests pass and function behaves as specified
3 0 3

Question 2 [5]

def merge(x, y):
    i = overlap(x, y)
    return x + y[i:]

s1 = merge("XXXABC", "ABCYYY")
s2 = merge("ABCYYY", "XXXABC")
s3 = merge("XXXABC", "ABC")
print(s1, s2, s3)

# [1] call overlap function
# [2] calculate merged string 
# [2] tests pass and function behaves as specified 
XXXABCYYY ABCYYYXXXABC XXXABC

Question 3 [10]

def longest_overlap(sequences):
    max_overlap = 0
    max_i = 0
    max_j = 0
    for i in range(len(sequences)):
        for j in range(len(sequences)):
            if i != j:
                d = overlap(sequences[i], sequences[j])
                if d > max_overlap:
                    max_overlap = d
                    max_i = i
                    max_j = j
    
    return [max_i, max_j, max_overlap]

i, j, k = longest_overlap(["XXXABC", "ABCYYY", "BC"])
print(i, j, k)

# [1] declare max variables
# [2] two nested for loops
# [1] test for i=j
# [1] call overlap function
# [1] check for maximum
# [1] update max values
# [1] return list of values
# [2] tests pass and function behaves as specified
0 1 3

Question 4 [10]

def identify_strand(sequences, n):
    i, j, d = longest_overlap(sequences)
    while d >= n:
        z = merge(sequences[i], sequences[j])
        del sequences[max(i, j)]
        del sequences[min(i, j)]
        sequences.append(z)
        i, j, d = longest_overlap(sequences)
    return sequences

# [2] suitable looping construct with correct condition for termination
# [1] call merge function
# [3] remove two items in correct order
# [1] append merged string to list
# [1] call longest_overlap function
# [2] tests pass and function behaves as specified

sequences = ['tgaaaattcctttctattttaggccc', 'tgaaaattcctttctattttaggcccatgcaat', 'ggcattagggcggttaa', 'atgcaatggcattagggcggttaa', 'ggttaa', 'tgaaaattcctttctattt', 'taggcccatgcaatggcattagggc']
identify_strand(sequences, 4)
['tgaaaattcctttctattttaggcccatgcaatggcattagggcggttaa']

Question 5 [8]

sequence_list = []
with open("dna_fragments/strand_100.fasta") as f:
    for line in f:
        if line[0] != ">":
            sequence_list.append(line.strip())
            
s = identify_strand(sequence_list, 4)
print(s)
['CCCAGGGAGACCACTGACCCATCAACCTGTACGGGAACCTTCTGTATCGTTCTCGGACGGAGAGATAACTACAGTGCCGCTTACAGCCCCTCTGTCGTCG']
sequence_list = []
with open("dna_fragments/strand_200.fasta") as f:
    for line in f:
        if line[0] != ">":
            sequence_list.append(line.strip())
            
s = identify_strand(sequence_list, 4)
print(s)
print(s[-1]) # longest string is last one in list
['GTGTAGGTTCTGACCGATTCGTGC', 'CCGACGTCTGTAATGTAGCCTCATTGTGATTCCACCCTATTGAGGCATTGACTGATGCGGGAAGAGATCTGAAATGAACTGGTCTATGCGACAGAAACTGTGCAGCTACCTAATCTCCTTAGTGTAGGTTCTGACCGATTCGTGCTTCGTTGAGAACTCACAATTTAACAACAGAGGACATAAGCCCTACGCCCATGATC']
CCGACGTCTGTAATGTAGCCTCATTGTGATTCCACCCTATTGAGGCATTGACTGATGCGGGAAGAGATCTGAAATGAACTGGTCTATGCGACAGAAACTGTGCAGCTACCTAATCTCCTTAGTGTAGGTTCTGACCGATTCGTGCTTCGTTGAGAACTCACAATTTAACAACAGAGGACATAAGCCCTACGCCCATGATC
sequence_list = []
with open("dna_fragments/strand_500.fasta") as f:
    for line in f:
        if line[0] != ">":
            sequence_list.append(line.strip())
            
s = identify_strand(sequence_list, 4)
print(s[1]) #longest string is last one in list
AATCTTTTTCACTGACAGTCATATTGGGGTGCTCCTAAGCTTTTCCACTTGGCTGGGTCTGCTAGGCCTCCGTGCCCGGAGTTTCGGCGCTGTGCTGCCGAGAGCCGGCCATTGTCATTGGGGCCTCACTTGAGGATACCCCGACCTATTTTGTCGGGACCACTCGGGGTAGTCGTTGGGCTTATGCACCGTAAAGTCCTCCGCCGGCCTCCCCGCTACAGAAGATGATAAGCTCCGGCAAGCAATTATGAACAACGCAAGGATCGGCGATATAAACAGAGAAACGGCTGATTACACTTGTTCGTGTGGTATCGCTAAATAGCCTCGCGGAGCCTTATGCCATACTCGTCCGCGGAGCACTCTGGTAACGCTTATGGTCCATAGGACATTCATCGCTTCCGGGTATGCGCTCTATTTGACGATCCTTTGGCGCACAGATGCTGGCCACGAGCTAAATTAGAGCGACTGCACAACTGTAAGGTCCGTCACGCAGACGACGG
# [1] correctly open file
# [1] loop over lines
# [2] form list of strands ommiting lines starting '>'
# [1] call identify_strand
# [1] identify longest one (OK to do this by eye but must be commented or otherwise identified)
# [2] repeat for the other two files (could be loop or repeated code)