Check if whitespace or otherwise separable string contains N consecutive words

Dr Edin Hamzić

This is the first post in a series I am planning to write on small Python tasks that are useful in bioinformatics and genomics.


These posts will be short and follow a simple layout:


  • Task or problem: A plain-language explanation of the problem that needs to be solved.

  • Solution: A textual explanation of the solution together with Python code, usually in the form of a function.



Problem: Check if whitespace or otherwise separable string contains N consecutive words


As the title says, the task is to check whether an input string, whose elements are separated by whitespace or some other separator, contains N consecutive elements that are words rather than integers. Here a "word" is any element that does not consist solely of digits, so ACT1 and BCL-ABL count as words while 123 and 01 do not. Example inputs:


string_a = "ACT1 123 BCL-ABL 01 20 30 BRCA1 BRCA2 MET RET"
string_b = "C,A,G,1,T,A,A,0,0,0,"
string_c = "AC|CG|10|BC|AC|01|11|"
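To make the problem concrete, here is a small sketch of what these inputs look like once they are split, and which elements a plain isdigit() check would flag as words. The variable names (tokens_a, flags_a and so on) are only illustrative. Note that string_b and string_c end with a separator, so splitting them yields a trailing empty string, which is an edge case any solution has to decide how to treat.

```python
string_a = "ACT1 123 BCL-ABL 01 20 30 BRCA1 BRCA2 MET RET"
string_b = "C,A,G,1,T,A,A,0,0,0,"
string_c = "AC|CG|10|BC|AC|01|11|"

# Split each string on its own separator
tokens_a = string_a.split(" ")
tokens_b = string_b.split(",")
tokens_c = string_c.split("|")

# True where the element is NOT made only of digits
flags_a = [not t.isdigit() for t in tokens_a]

print(tokens_a)
print(flags_a)

# The trailing separator produces an empty last element,
# and "".isdigit() is False, so a naive check treats it as a word
print(repr(tokens_b[-1]), not tokens_b[-1].isdigit())
```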

Solution


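I am sure there are other ways to write Python code that solves this problem, but my focus was on not using any third-party libraries and on using Python comprehensions. The function below splits the string on the separator, flags each element as a word or not, slides a window of size n over those flags, and returns True if any window consists only of words.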




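def check_n_consecutive_words(input_string, n, sep=None):
    # sep=None splits on any run of whitespace
    tokens = [el.strip() for el in input_string.split(sep)]
    # A token is a word if it is non-empty and not made only of digits;
    # empty tokens come from a trailing separator such as "A,B,"
    is_word = [bool(el) and not el.isdigit() for el in tokens]
    # Every full window of size n; the last one starts at len(is_word) - n
    windows = [is_word[i:i + n] for i in range(len(is_word) - n + 1)]
    # True if at least one window contains only words
    return any(all(window) for window in windows)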
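As a cross-check on the sliding-window idea, here is a sketch of a different way to answer the same question using only the standard library: measure the longest run of consecutive words with itertools.groupby and compare it with n. The function names longest_word_run and has_n_consecutive_words are hypothetical and not part of the original solution, and the sketch assumes that empty elements (produced by a trailing separator) should not count as words.

```python
from itertools import groupby


def longest_word_run(input_string, sep=None):
    """Return the length of the longest run of consecutive non-digit elements."""
    # sep=None splits on any run of whitespace
    tokens = [t.strip() for t in input_string.split(sep)]
    # Assumption: an empty element is not a word
    flags = [bool(t) and not t.isdigit() for t in tokens]
    # groupby collapses consecutive equal flags into groups; keep the True groups
    runs = [sum(1 for _ in group) for is_word, group in groupby(flags) if is_word]
    return max(runs, default=0)


def has_n_consecutive_words(input_string, n, sep=None):
    """Return True if the string contains at least n consecutive words."""
    return longest_word_run(input_string, sep) >= n


string_a = "ACT1 123 BCL-ABL 01 20 30 BRCA1 BRCA2 MET RET"
string_b = "C,A,G,1,T,A,A,0,0,0,"
string_c = "AC|CG|10|BC|AC|01|11|"

print(longest_word_run(string_a))                # 4: BRCA1 BRCA2 MET RET
print(longest_word_run(string_b, ","))           # 3: C A G (and T A A)
print(has_n_consecutive_words(string_c, 3, "|")) # False: longest run is 2
```

Computing the longest run once is convenient when the same string needs to be checked against several values of n.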







