如何通过尽可能地减少数学公式来简化数学公式

I will strive to be objective in the question.

I have a database with millions of mathematical formulas created by an application, these are formulas that respect a logic:

1) Only 5 mathematical operations involved: addition, subtraction, multiplication, division and power (+ - * / ^)

2) Each operation is delimited/grouped by parentheses.

The "parameters" can be simple, such as a constant or a variable.

Example: (x + 15)

In the compound case, I may have:

(x * (x + 15))

It is important to remember that the correct order of operations was guaranteed with the use of parentheses.

Example:

((x * (x + 15)) / 15)

My challenge is to reduce these formulas.

They are not useful or repeat themselves a lot when the operations are the same and no matter the order of the factors.

Example:

((((x + 4) + 8) - x) / 12)

That equals 1, it's not a formula, it's a constant, and I have to ignore it.

And duplicities like ((x + 4) + 8) and ((8 + x) + 4) need to be eliminated.

That's why I need to reduce.

Just to be aware, all formulas are standardized with the parentheses and spaces between operations.

I created a routine in PHP using regular expressions to make substitutions, I managed to make huge progress reducing in a specific formula from 8 thousand possibilities to less than three hundred.

However, as the size (not complexity because they are not complex) of the formulas increases, my routine can no longer be efficient.

I need an algorithm, not a routine, to apply the mathematical reduction we learned in school.

I think it is possible since the formulas are standardized and limited to 5 basic mathematical operations.

I am using the EvalMath class obtained in GitHub to aid in the reduction and execution of the formulas.

To give you a better idea, the formulas are abstract and each "@" is replaced in real time by constants and variables.

(@ + @)
(@ / @)
(@ ** @)
((@ + @) + @)
((@ + @) ** @)
((@ - @) + @)
((@ - @) / @)
((@ - @) ** @)
((@ * @) + @)
((@ * @) / @)
((@ * @) ** @)
((@ / @) / @)

Here is a snippet of PHP code that is part of my reduction routine.

Within a WHILE (TRUE) I've grouped rules that repeat themselves until no substitutions are made.

In the end I have an array with numerous duplicities obtained with some reduction and reordering of the elements of the formula, something that an array_unique () solves.

I really need help, my brain is exploding with this problem.

Thank you.

<?php
    $Math_Formulas = array(
        '(((x + 7) ** 9) - 9)',
        '(((x ^ 3) - 9) - 5)',
        '(((2 + x) + x) * x)',
        '(((x + 3) / 6) / 8)',
        '(((x - 5) + 6) ** 2)',
        '(1024 ^ (x / 5))',
        '((3 - (x + 6)) + 3)',
        '(((x ^ 3) + 9) * 6)',
    );

    while (TRUE)
    {
        $changed = FALSE;

        // Rule 1: (x - x) = 0
        for ($i = 0; $i < count($Math_Formulas); $i++)
        {
            $Formula = trim(preg_replace_callback('/([^-]?)([a-z]+?) [-] ([a-z]+?)/', function($matches) {
                $Expression = $matches[0];
                if ($matches[2] == $matches[3])
                {
                    $Expression = $matches[1] . '0';
                }
                return($Expression);
            }, $Math_Formulas[$i]));
            if ($Formula != $Math_Formulas[$i])
            {
                $changed = TRUE;
                $Math_Formulas[$i] = $Formula;
            }
        }

        // Rule 2: ...

        if (!$changed)
        {
            break;
        }
    }

    $Math_Formulas = array_values(array_unique($Math_Formulas));
?>

UPDATE 1:

I think if the "reverse polish notation" had been used in the creation of the formulas everything would be much easier, but with the formulas that I have, I need to rearrange the parameters (in ascending or descending alphabetical order) to be able to compare them.

In RPN:

(x + 4) + 5) becomes "x 4 5 +"

(X + 5) + 4) becomes "x 5 4 +"

How to compare both? And what about larger functions?

I think I made a mistake by not detailing the technique used in the 14 "regular expressions" that I am applying to simplify, as much as possible, these formulas. There is more than regular expression in the process:

Original Formula: (((4 - 5) + x) + 8)

Step 1: Addition (or subtraction or multiplication) of the two-to-two constants and reduction of the expression, without the parentheses.

Formula: ((-1 + x) + 8)

Step 2: Delete the parentheses for ((n +- n) +- n) or (n +- (n +- n)).

Formula: (-1 + x + 8)

Step 3: Reorder the parameters in descending alphabetical order.

Formula: (x + 8 - 1)

Step 4: In the loop, Step 1 is executed again.

Final Formula: (x + 7)

There are more transformations, for example (x + x + x) becomes (3 * x), (-x + x) becomes 0.

All of this was getting beautiful, but when I came across functions such as ((x * 9) * (x * 5)) / 9), that logic lost efficiency. I would have to create at least one more 14 other nested rules.

So the general scheme would be to first convert the expression into an expression tree format. This is tree structure where each node represents either a mathematical operation, a number or a variable. You can use the shunting yard algorithm to do the conversion and might find evalmath could be modified to produce a tree rather than simply evaluate it. There are probably other Python libraries which can do more than evalmath.

Once you have a tree structure it becomes much easier to manipulate. You can manipulate small sub-trees to produce simplier forms, applying some of the mathematical rules you learnt in high school.

For example you might want to expand brackets. The expression (x+3)*(x+5) would be represented as a tree as

              *
        +           +
     x     3     x     5 

this tree could be rewritten as x^2 + 8 x + 15

              +
        ^                +
     x     2        *        15
                 x     8

The general scheme would probably be expanding all your brackets. Then collect like terms and simplify.

Allowing only the four operations (+ - * /), all these expressions are so-called rational fractions, i.e. the ratio of two polynomials.

Indeed, the following rules apply:

p/q + p'/q'   = (pq' + p'q)/pq
p/p.p'/'q     = (pp')/(qq')
(p/p')/(q/q') = (pq')/(p'q)

so that the combination of two rational fractions is also a rational fraction.

You can implement a system where every expression is described as a pair of polynomials (with integer or rational coefficients), and integrate the above formulas. This essentially takes polynomial addition and multiplication.

Sometimes, the numerator and denominator will have common factors, to be simplified. You can spot these by the Eucidean algorithm on polynomials, giving a polynomial GCD. (Then you need polynomial division as well.)

By this system, simplifications will come on their own. Furthermore, the expression will get normalized so that you can spot duplicates.

If there is a single variable, the polynomials will be univariate and their representation is simple. For several variables, it becomes a little more involved.

Now, if you allow exponentiation, two cases to distinguish:

  • exponents can only be positive integers: then you can expand p^n and q^n to other polynomials by the multinomial rule (but the size will explode quickly);

  • exponents can be rational constants or functions of a variable: the whole method breaks down.

Final remark:

In the end, you will have to produce the simplified evaluation rule, which is the ratio of two polynomials. The efficient way to express such evaluation is Horner's scheme. In some cases, this will bring you back to where you started from !

Example:

Simplify

(1 - x / y) (x + y) + (x * x) / y

Steps:

1 - x / y                 => (y - x) / y
x + y                     => (x + y) / 1
(y - x) / y . (x + y) / 1 => (y² - x²) / y
x * x                     => x² / 1
y                         => y / 1
(x * x) / y               => x² / y
(y² - x²) / y + x² / y    => y³ / y²
y³ / y²                   => y / 1