I have a string of characters:
abcdefghijklmnopqrstuvwxyz_
I want to take this string of characters and sort them by the number of times they appear in a large block of characters. For example:
cwrxwzbgickpjbp_svnudntddwdqbfgzyiqpuxddmpvyfquosmicfzkjekxzchngpqaksafulateukuwomdrwza_n_ptzktjzcuibnebe_tqessrzqewgkadrkvtyznaupodanwazopg_fijcoojojbsolr_ejesukzc_quochdnmti_lkvrsegyieqlqysuxdvetkqtkhxaiypfdiddztlicjurnllriopdtuuzpryrsepfydyeg_xkr_ruxp_lgqesysidfsygztwrba_ay_gaqqklbrvr_lbhawjraqujfxptmuvqfzklfodgaqrnhjravksjwemoosdlxtvw_qspxmlvqryusfixzlkb_p_c_tepzozzwnokvqspkizygoqpbhjnsxopchzgapctowbrletrunlgnvzpfwrqgedo_s_ygkxz_mpncnve_gfpbotupawevhfxvqhwlerupjfibosbvhiijrodigzyhy_iijes_xsqorshhdzkjqitpljsftpitjetwmzqiabyiewgtbjaddtsjkckcxxvlyrchloetluxkohn_uihkdjpcqgvejanslakmwendgkmvmayknvjjnr_kdapnumwvz__lsimxdtrflyleykxejl_jbkhexpcyreoapelqzzyriyrbxdgbgwrrxlj_pt_mpwubvbveakxfsbfgj___
Once the characters are sorted, I also want to drop the underscore and every character that comes after it.
Is recursion the right approach here?
EDIT
Example of what the output might look like:
afiskjweocnsdkspwjrhfg
Basically, the characters are simply sorted by their frequency and output on a single line.
<?php
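$text = 'ahugechunkofatext';
// Count occurrences of each byte; mode 1 lists only bytes that actually occur.
$charCounts = count_chars($text, 1);
// Sort by count, most frequent first, keeping the byte-value keys.
arsort($charCounts);
// Convert the byte values back into characters.
$chars = array_map('chr', array_keys($charCounts));
$chars = array_filter($chars, function ($char) {
    return !in_array($char, ['_']); // A list of chars that you don't want
});
echo implode('', $chars) . PHP_EOL;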
You could use collections.Counter to count the characters in the big string:
import collections
walloftext = """cwrxwzbgickpjbp_svnudntddwdqbfgzyiqpuxddmpvyfquosmicfzkjekxzchngpqaksafulateukuwomdrwza_n_ptzktjzcuibnebe_tqessrzqewgkadrkvtyznaupodanwazopg_fijcoojojbsolr_ejesukzc_quochdnmti_lkvrsegyieqlqysuxdvetkqtkhxaiypfdiddztlicjurnllriopdtuuzpryrsepfydyeg_xkr_ruxp_lgqesysidfsygztwrba_ay_gaqqklbrvr_lbhawjraqujfxptmuvqfzklfodgaqrnhjravksjwemoosdlxtvw_qspxmlvqryusfixzlkb_p_c_tepzozzwnokvqspkizygoqpbhjnsxopchzgapctowbrletrunlgnvzpfwrqgedo_s_ygkxz_mpncnve_gfpbotupawevhfxvqhwlerupjfibosbvhiijrodigzyhy_iijes_xsqorshhdzkjqitpljsftpitjetwmzqiabyiewgtbjaddtsjkckcxxvlyrchloetluxkohn_uihkdjpcqgvejanslakmwendgkmvmayknvjjnr_kdapnumwvz__lsimxdtrflyleykxejl_jbkhexpcyreoapelqzzyriyrbxdgbgwrrxlj_pt_mpwubvbveakxfsbfgj___"""
wallcount = collections.Counter(walloftext)
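To illustrate what the counter holds, here is a minimal sketch using a short made-up string (the `sample` value is hypothetical, not the data from the question). It shows that looking up a character that never occurs gives 0 rather than an error, which is why every letter of the alphabet can be looked up safely.

```python
import collections

# Hypothetical short sample, standing in for the large block of text.
sample = "banana_band"
counts = collections.Counter(sample)

# Counter maps each character to the number of times it occurs.
a_count = counts["a"]

# A character that never occurs yields 0 instead of raising KeyError.
z_count = counts["z"]

print(a_count, z_count)
```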
Then use these counts to sort the original alphabet:
alphabet = "abcdefghijklmnopqrstuvwxyz_"
sortedalph = sorted(alphabet, key=lambda c: wallcount[c])
(This sorts by increasing frequency: the result has the least-frequent letter first. If you want it the other way around, throw in a - before wallcount in the lambda.)
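As a rough sketch of the two orderings, again on a short made-up string and a shortened alphabet (both hypothetical): negating the count in the key puts the most frequent character first. Passing `reverse=True` to `sorted` should give the same order here, since Python's sort keeps tied characters in their original order either way.

```python
import collections

# Hypothetical sample text and a shortened alphabet, for illustration only.
sample = "banana_band"
alphabet = "abdn_"
counts = collections.Counter(sample)

# Least frequent first (ties keep their order from `alphabet`).
ascending = "".join(sorted(alphabet, key=lambda c: counts[c]))

# Most frequent first, by negating the count in the key.
descending = "".join(sorted(alphabet, key=lambda c: -counts[c]))

# Alternative spelling of the descending sort.
descending_alt = "".join(sorted(alphabet, key=lambda c: counts[c], reverse=True))

print(ascending, descending, descending_alt)
```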
Finally, join the sorted alphabet back into a string and chop off the underscore and everything after it:
finalalph = "".join(sortedalph).split("_")[0]
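Putting the steps together, one possible way to wrap them in a single function is sketched below. The function name `sort_by_frequency` and its parameters `stop` and `most_frequent_first` are made up for this illustration, and the test strings are not the data from the question.

```python
import collections


def sort_by_frequency(alphabet, text, stop="_", most_frequent_first=False):
    """Order `alphabet` by how often each character occurs in `text`,
    then cut the result at the first `stop` character.

    Hypothetical helper: the name and parameters are assumptions
    made for this sketch.
    """
    counts = collections.Counter(text)
    ordered = sorted(
        alphabet,
        key=lambda c: counts[c],
        reverse=most_frequent_first,
    )
    # split() returns the whole string as its first element when
    # `stop` is absent, so nothing is lost in that case.
    return "".join(ordered).split(stop)[0]


# Made-up data: x occurs 4 times, y and _ 3 times each, z once.
rare_first = sort_by_frequency("xyz_", "xxxx__yyy_z")
common_first = sort_by_frequency("xyz_", "xxxx__yyy_z", most_frequent_first=True)
print(rare_first, common_first)
```

Characters that tie with the underscore on frequency stay on whichever side of it they occupy in the original alphabet, so the cut-off point depends on that order as well as on the counts.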