{"id":34,"date":"2020-06-04T11:32:15","date_gmt":"2020-06-04T11:32:15","guid":{"rendered":"https:\/\/michaeljohnsteiner.com\/?p=34"},"modified":"2021-01-19T13:26:21","modified_gmt":"2021-01-19T13:26:21","slug":"chisquared-cs","status":"publish","type":"post","link":"https:\/\/michaeljohnsteiner.com\/index.php\/2020\/06\/04\/chisquared-cs\/","title":{"rendered":"ChiSquared.cs"},"content":{"rendered":"\n<p>Chi Squared Data\/Byte\/Text Test<\/p>\n\n\n\n<p>Updated: Jan-19,2021<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"csharp\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">using System;\nusing System.Collections.Concurrent;\nusing System.Collections.Generic;\nusing System.Globalization;\nusing System.IO;\nusing System.Linq;\npublic static class ChiSquared\n{\n    \/\/\/ &lt;summary>\n    \/\/\/     Calculated from an English word dictionary containing over 466,000 words.\n    \/\/\/ &lt;\/summary>\n    private static readonly float[] _expectedPercentages = {.0846f, .0189f, .0420f, .0353f, .1098f, .0125f, .0243f, .0274f, .0864f, .0018f, .0089f, .0574f, .0292f, .0715f, .0709f, .0310f, .0019f, .0704f, .0705f, .0647f, .0363f, .0099f, .0085f, .0028f, .0192f, .0041f};\n    \/\/\/ &lt;summary>\n    \/\/\/     Not accurate 100% all of the time.\n    \/\/\/ &lt;\/summary>\n    \/\/\/ &lt;param name=\"path\">&lt;\/param>\n    public static bool IsFileCompressed(this string path)\n    {\n        var arr = File.ReadAllBytes(path);\n        var r1  = arr.ChiSquaredTest();\n        return r1.isRandom;\n    }\n    \/\/\/ &lt;summary>\n    \/\/\/     Tests a buffer for randomness. Returns chi squared values.\n    \/\/\/     isRandom - is the buffer a random sequence.\n    \/\/\/     Quality - Less than 1 or greater than 1 is off target. 
Observed is off expected.\n    \/\/\/     Entropy - The 8-bit entropy of the buffer as a percentage of perfect disorder (100%).\n    \/\/\/     ExpectedChiSq - The expected chi squared value.\n    \/\/\/     LowLimit - (R - (2*sqrt(R)))\n    \/\/\/     chiSqValue - The observed chi squared value.\n    \/\/\/     UpperLimit - (R + (2*sqrt(R)))\n    \/\/\/ &lt;\/summary>\n    \/\/\/ &lt;param name=\"bArr\">The byte array to test&lt;\/param>\n    public static (bool isRandom, float Quality, float Entropy, int ExpectedChiSq, float LowLimit, float chiSqValue, float UpperLimit) ChiSquaredTest(this byte[] bArr)\n    {\n        if (bArr != null)\n        {\n            var iArr = Ia(bArr);\n            var ent  = Entropy(bArr);\n            if (ent &lt; 80)\n                return (false, 0, ent, 0, 0, 0, 0);\n            var aLen = iArr.Length;\n            var rLim = aLen \/ 10;\n            var n    = aLen;\n            var r    = rLim - 1;\n            var freq = new ConcurrentDictionary&lt;int, int>();\n            iArr.AsParallel().WithDegreeOfParallelism(2).ForAll(I =>\n            {\n                \/\/ Widen to long first: Math.Abs(int.MinValue) throws OverflowException.\n                var iT = (int) Math.Abs(Math.Abs((long) I) % rLim - rLim);\n                \/\/ AddOrUpdate increments atomically; a check-then-increment sequence loses counts under parallel execution.\n                freq.AddOrUpdate(iT, 1, (_, v) => v + 1);\n            });\n            var t  = freq.Sum(e => (float) Math.Pow(e.Value, 2));\n            var cS = Math.Abs(r * t \/ n - n);\n            var fL = r - 2.0f * (float) Math.Sqrt(r);\n            var fH = r + 2.0f * (float) Math.Sqrt(r);\n            var iR = (fL &lt; cS) &amp; (fH > cS);\n            var q  = cS \/ r;\n            var nfL = 0;\n            var nfH = fH - fL;\n            var ncS = cS - fL;\n            return (iR, q, ent, (int)(r-fL), (int)nfL, (int)ncS, (int)nfH);\n        }\n        return default;\n    }\n    private static int[] Ia(byte[] ba)\n    {\n        var bal        = ba.Length;\n        var dWordCount = bal \/ 4 + (bal % 4 == 0 ? 
0 : 1);\n        var arr        = new int[dWordCount];\n        Buffer.BlockCopy(ba, 0, arr, 0, bal);\n        return arr;\n    }\n    private static float Entropy(byte[] s)\n    {\n        float len = s.Length;\n        var   map = new int[256];\n        for (var i = 0; i &lt; (int) len; i++)\n            map[s[i]]++;\n        var result = 0f;\n        for (var idx = 0; idx &lt; map.Length; idx++)\n        {\n            var frequency = map[idx] \/ len;\n            if (frequency > 0)\n                result -= frequency * (float) Math.Log(frequency, 2);\n        }\n        return result \/ 8f * 100f;\n    }\n    public static int ChiSquaredCount(this byte[] s, byte b)\n    {\n        float len = s.Length;\n        var   map = new int[256];\n        for (var i = 0; i &lt; (int) len; i++)\n            map[s[i]]++;\n        return map[b];\n    }\n    public static int ChiSquaredCount(this string s, char b)\n    {\n        float len = s.Length;\n        \/\/ One slot per UTF-16 code unit; a 256-entry map throws on any char above 255.\n        var   map = new int[char.MaxValue + 1];\n        for (var i = 0; i &lt; (int) len; i++)\n            map[s[i]]++;\n        return map[b];\n    }\n    public static float ChiSquaredAsPercent(this string s, char b)\n    {\n        float len = s.Length;\n        \/\/ One slot per UTF-16 code unit; a 256-entry map throws on any char above 255.\n        var   map = new int[char.MaxValue + 1];\n        for (var i = 0; i &lt; (int) len; i++)\n            map[s[i]]++;\n        return map[b] \/ len;\n    }\n    \/\/\/ &lt;summary>\n    \/\/\/     Computes the relative frequency of each letter a-z in the given text.\n    \/\/\/     Use a large English language text block for accurate results.\n    \/\/\/ &lt;\/summary>\n    \/\/\/ &lt;param name=\"s\">String that contains the large English text&lt;\/param>\n    public static KeyValuePair&lt;char, float>[] ChiSquaredTextAsPercent(this string s)\n    {\n        float len = s.Length;\n        \/\/ Invariant culture keeps 'I' mapping to 'i' regardless of the machine locale.\n        s = s.ToLower(CultureInfo.InvariantCulture);\n        var lst = new Dictionary&lt;char, float>();\n        var map = new int[256];\n        for (var i = 0; i &lt; (int) len; i++)\n            if (s[i] &lt; map.Length &amp;&amp; char.IsLetter(s[i]))\n       
         map[s[i]]++;\n        var t = map.Sum(e => e);\n        foreach (var l in \"abcdefghijklmnopqrstuvwxyz\")\n            lst.Add(l, map[l] \/ (float) t);\n        \/\/ Return the letters ordered from least to most frequent.\n        var nlst = lst.OrderBy(e => e.Value).ToArray();\n        return nlst;\n    }\n    public static float ChiSquaredTextTest(this string s)\n    {\n        var realLen = 0;\n        \/\/ Invariant culture keeps 'I' mapping to 'i' regardless of the machine locale.\n        s = s.ToLower(CultureInfo.InvariantCulture);\n        var observed = new Dictionary&lt;char, int>();\n        foreach (var c in s)\n            if (char.IsLetter(c))\n            {\n                if (!observed.ContainsKey(c))\n                    observed.Add(c, 1);\n                else\n                    observed[c]++;\n                realLen++;\n            }\n        var expected = new Dictionary&lt;char, float>();\n        for (var i = 0; i &lt; 26; i++)\n            expected.Add((char) ('a' + i), _expectedPercentages[i] * realLen);\n        var cSList = new List&lt;float>();\n        foreach (var item in expected)\n        {\n            var c = item.Key;\n            if (observed.ContainsKey(c))\n                cSList.Add((float) Math.Pow(observed[c] - expected[c], 2) \/ expected[c]);\n        }\n        return cSList.Sum(e => e) \/ realLen * 100f;\n    }\n    \/\/\/ &lt;summary>\n    \/\/\/     The value of 10 as a combined chi-squared total distance\n    \/\/\/     percentage threshold is subjective. Determined from\n    \/\/\/     about 40 test runs of over 1 million mixed files. 
Most\n    \/\/\/     non-text files have readings in the 100's\n    \/\/\/ &lt;\/summary>\n    \/\/\/ &lt;param name=\"path\">Path to the file to test&lt;\/param>\n    public static bool IsTextFile(this string path)\n    {\n        return File.ReadAllText(path).ChiSquaredTextTest() &lt; 10;\n    }\n    \n}<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Chi Squared Data\/Byte\/Text Test Updated: Jan-19,2021<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/posts\/34"}],"collection":[{"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/comments?post=34"}],"version-history":[{"count":2,"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/posts\/34\/revisions"}],"predecessor-version":[{"id":377,"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/posts\/34\/revisions\/377"}],"wp:attachment":[{"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/media?parent=34"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/categories?post=34"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michaeljohnsteiner.com\/index.php\/wp-json\/wp\/v2\/tags?post=34"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}