One of Rosalind problems deals with finding the “majority element” of a vector, which is defined as the value that appears in more than half of the entries. Note that this is a little different from what we usually call the mode in statistics. This is a pretty simple task to run using, e.g., Python or R.
Here is a solution in Python:
from collections import Counter def maj(lst): """" >>> items = [8, 7, 7, 7, 1, 7, 3, 7] >>> print(maj(items)) 7 """ c = Counter(lst) val, cnt = c.most_common() if cnt > len(lst)/2: return int(val) else: return -1
We could probably use numpy and/or scipy as well, but let’s keep it simple. I really like the
Counter idiom in Python, which allows to keep a record of the item index that is being processed much like
enumerate does. I missed this in Racket. It is quite easy to tabulate a list of values using a
for/list pattern, as the following code illustrates:
(define (counter x lst) (if (null? lst) 0 (if (= x (car lst)) (+ 1 (counter x (cdr lst))) (counter x (cdr lst))))) (define items (list 8 7 7 7 1 7 3 7)) (apply max (for/list ([i (remove-duplicates items)]) (counter i items)))
If we want to return both the item value and the associated counts, there are two options (and probably more): either we rely on a hash table, or we return a list (or a pair) of values and counts, like Python’s
enumerate; this could be something like replacing
(counter i items) with
(list i (counter i items)), but we will need to update the function used when calling
max will not accept a tuple of values.
Here is a little example of using a hash instead of plain lists:
;; Credit: https://stackoverflow.com/a/5741004 ;; See also math/statistics::samples->hash (define (bagify xs) (for/fold ((ht #hash())) ((key (in-list xs))) (hash-update ht key add1 0))) (define (which-max ht) (car (argmax cdr (hash->list ht)))) (define table (bagify items)) (which-max table)
This also (sort of) solves the problem of the joint use of
max described above:
(argmax cadr (for/list ([i (remove-duplicates items)]) (list i (counter i items))))
The same approach could obviously serve the purpose of finding the mode in a list of values, e.g.:
(define (mode xs) (cond [(empty? xs) '()] [else (car (argmax cdr (hash->list (bagify xs))))]))
Note that the above procedure will not handle the case of multiple modes.
Finally, what about
enumerate like idiom in Racket? Here’s one solution:
(define zip (lambda (x y) (map list x y))) (define (enumerate xs) (zip (range (length xs)) xs))