Week 3

[code]Note that whenever I do ket notation of something, I start with the character | but the arrow doesn't show up, sorry.[/code] [h1]Quantum Algorithms[/h1] A huge advantage in quantum computing is the ability to use entangled particles to expose patterns in data, and as such finding factors for large numbers has a super-polynomial improvement over doing so in a classical machine. The most general quantum algorithm is Lov Grover's search algorithm. (It does use entanglement and interference). [h2]Grover's Algorithm[/h2] Grover's algorithm allows a person to easily do problems such as Find a value of x, where f(x) = k, for some number k. In chess, for example, k would be a boolean value, true if won, false if lose. And x would be the sequence of moves. In a trip planning example, x could be the combination of landmarks to go past to get to a destination in an optimal manner. In the chess example you'd be able to, for example, put all of the first 20 moves for both sides, put them in superposition, (10^120 combinations) and then calculate winning or losing scenarios. It's virtually impossible to directly derive any meaningful information form that, so the algorithm increases the quantum probability amplitude of moves that lead to a win, and decrease those that lead to a loss. However, to do this, you'd need to run through the program multiple times, increasing/decreasing the probability amplitudes slightly through each run, until the probability for a good run is relatively high. It changes the amplitude through use of interference and entanglement, and allows for a polynomial speed up of virtually any kind of search problem like these, from an average of N/2 tries to find the answer on a classical machine and about sqrt(N) tries from a classical machine. --------------------------------------------------------- If for example, x, only tells you if it matches k, i.e, it was true or false, then it can be impossible to find a pattern that leads to a better solution if one wasn't found, especially when x, is large, and as such you can only keep trying different values of x. These kinds of problems are called a randomised database search problems. These sort of unstructured problems allow no ability to direct your searches, so you can expect to find a solution (TRUE/FALSE) after trying about half of all possibilities, i.e, N/2. That was just reiterating what was said in the previous section basically, but here's what's important: * To find the winning solution(s), you'd need to increase the amplitude of desirable outcomes many times until it is high enough to have a likelihood of finding a desirable outcome. * So, you'd first have a mark routine, where answers are labelled as good/bad * And then a diffusion operator, which spreads that probability amplitude to surrounding dials/probabilities, so dials around that would be more probable [h2]The Mark Routine[/h2] So basically it puts the states in a superposition and then flips the phase of the of the states where f(x)= k. Though it needs to be able to recognise the answer in such a manner. [h2]The Diffusion Operator[/h1] So this basically gets the mean value for all of the states in superposition, and then (imagine the dials representing the states), they are flipped about the average, leading to the marked states (the one shifted by pi due to the mark routine), being flipped from phase pi and some value between 0 and 1, to phase of 0 and precisely 1, whereas the non flipped values become 0. All of this happens while they are in superposition, so at the same time, so then they just measure the system and the one with the highest probability (the ones which were marked and flipped), are the ones with the highest probability of being measured at that time. [img src="https://res.cloudinary.com/deylrqt2d/image/upload/v1542410351/Capture_pyfo1n.png"] Also keep in mind that Grover's algorithm offers polynomial speedup rather than exponential, and can do so in problems without any specific sort of pattern/heuristic to go by to speed up the algorithm. Basically it's made for problems that would be brute forced using a classical machine. It's very useful, and much faster than a classical machine. [h2]Factoring[/h2] In computer science, the two most important classes of problems are P and NP. The P class stands for polynomial, that is, we can solve the problem in order of N to the k time (O(N^k)) where N represents the size of the problem and k is some constant number. NP is non-deterministic polynomial. NP problems and problems believed to be outside of P are exponential, and factoring is somewhere between polynomial and exponential. Shor's algorithm can factor numbers in polynomial time, so offers huge speedup. Quantum algorithms are more like hybrids between classical and quantum ones in truth. And as such they involve classical processing. [highlight]The quantum machine essentially serves as a coprocessor to the classical machine.[/highlight] The quantum subrouting of Shor's algorithm is called the [b]Quantum Fourier Transform[/b], or [b]QFT[/b], which creates interference that gives us the period. Then the classical algorithm known as [b]Euclid's Algorithm[/b] is known to find the prime factors of the number N. [img src='https://res.cloudinary.com/deylrqt2d/image/upload/v1542500388/2_7_shorsalgorithm_v2_bzsd05.jpg'] [h3]Modulo Arithmetic[/h3] Doing arithmetic modulo ten is just like keeping track of only the last digit of any addition or multiplication operation, so for example 99mod10 is just 1, rather than 81. Consider a function that adds 4 every step. If you do modulus to it at the same time, it will go as follows: [code]0,4,1,5,2,6,3,0[/code] [h3]The Period of a function[/h3] Many mathematical functions are periodic, that is, if you evaluate x and get k, then after some value n, your function will evalutate to k yet again, The time it takes to get back to k is the period, just like for waves. And in fact, going back to the earlier example, where the increments each step are 4, and modulus 7 is applied, like: [img src='https://res.cloudinary.com/deylrqt2d/image/upload/v1542500913/Screenshot-2018-11-18-00_27_57_bbwqkl.png'] So it repeats like: [code]0,4,1,5,2,6,3,0[/code] The value 0 is [b]7[/b] values apart, so the [i]period[/i] is [b]7[/b]. And f(x+7) = f(x). [h3]Quantum Fourier Transform[/h3] Let's start off by having 2 constants and 1 variable: a and N as constants, x as the variable. If your make a functions [b]f(x) = a^x mod N[/b], then consider the following examples: [highlight] f(x)=a^x mod N where N = 5, and x increases from 0: [ul] [li]When a = 2: 1,2,4,3,1 period of 3[/li] [li]When a = 3: 1,3,4,3,1 period of 3[/li] [li]When a = 4: 1,4,1,4,1 period of 2[/li] [/ul] [/highlight] So the simples way to find the period, as demonstrated above, is to simply try f(x) with lineraly increasing numbers. However this is incredibly inefficient so that's not really used. In classical computing the [b]Fourier Transform[/b] is used. The Fourier transform is used to find the frequence of a signal, which we can easily invert to find the period of it, and, recall, that the function described above is similar to a wave, as in, it has a period, so the Fourier transform can be used. However you first need to collect a sequence of data of the wave like you would by linearly iterating through x, so that's kinda useless. However with a quantum computer that's more practical. If you can take all the possible values of x in superposition, then calculate f(x), you have a superposition of all of the results of the function. If you can pick two values of x out of the superposition, then the period can be found, however that cannot be done directly. Instead the Quantum Fourier Transform can be applied (QFT), which will create interference among all the values of x, in a way which allows one to directly read the period of the function. The modular exponentiation and the QFT both turn out to have a cost that is polynomial in the length of the number in bits, a profound reduction compared to the exponential effort necessary to find the period classically. This is the core of Shor's algorithm. [h3]Euclid's Algorithm[/h3] The final classical step is take r and use it to find the factors. This is done on a classical computer, using Euclids algorithm for finding the greatest common denominator (gcd) of two numbers. Euclids algorithm is polynomial in the size of the numbers, and indeed is fairly fast on a classical computer, so there is no need to be concerned about the difficulty of this step. Assuming we have correctly found the period r, calculating the gcd of N with the numbers [b]a^(r/2) + 1[/b] and [b]a^(r/2) 1[/b] will give us two prime factors of the number N. So in summary, this image: [img src='https://res.cloudinary.com/deylrqt2d/image/upload/v1542550948/Screenshot-2018-11-18-14_22_10_mnflrz.png'] Remember it is based on probabilities, so rerunning may result in a different outcome. [h2]Euclid's Algorithm - properly[/h2] The algorithm is best explained by example, so lets look for the GCD of 126 and 72. Begin by subtracting the smaller number from the larger one, and repeating until we reach zero: 126-72=54 72-54=18 54-18=36 36-18=18 18-18=0 and the last non-zero number is our GCD. [highlight]The algorithm has a beautiful visualization. To find gcd(x,y) , create a rectangle that is xy. Begin filling it with the largest squares that will fit (one or more squares the size of the smaller of x and y). In the remaining rectangle, perform the same operation. Repeat until the original rectangle is filled. The edge length of the smallest size of square is the GCD.[/highlight] So basically keep going until you get to zero, and out of the three numbers present in the subtraction, (x-y=z, so x,y and z), pick the smallest two, and subtract the smaller one of the two from the bigger one of the two (until you get zero), in which case the smallest non-zero number is the GCD (as stated above). [h2] Fourier Transform - properly[/h2] Basically converts spatial/temporal data into frequency data. If for example the sine wave were plotted in frequency space, it would be a single line, as the frequency doesn't change for it. Similarly, when plotting a musical note, say, C on a piano, it would be a single bar, like: [img src='https://res.cloudinary.com/deylrqt2d/image/upload/v1542581771/frequency-amplitude.001_myfeed.jpg'] More complex signals that vary over time even can be represented as well, it's better seen than explained, watch the videoin the link below: [link href='https://www.youtube.com/watch?v=ykNtIbtCR-8']Fourier Transforms[/link] [h3]The Discrete Fourier Transform[/h3] The example above was a continuous Fourier transform, as it uses continuous data, however when working with computers we generally use samples of continuous data, i.e discrete data. There you may have a pattern that repeats itself every 4 bits, then every 8 bits, then every 12 bits and so on, like this: [code]10001000100010001000100010001000...[/code] The result of the transform rises linearly, so the frequency space representation increases infinitely: [img src='https://res.cloudinary.com/deylrqt2d/image/upload/v1542583324/frequency-amplitude.002_u6pxvq.jpg'] This kind of graph can be more properly called the [i]frequency domain representation[/i] of the signal (the signal being the function). [h2] Modular Exponentiation[/h2] [h3]The State[/h3] The state we need is the superposition of |x|f(x) for all of the possible values of x. For reasons related to cleanly finding the period, we need the number of qubits in |x to be twice the number of bits in the number N. If we want to find the factors of a 1,024-bit number (substantially larger than the 768-bit number that is the best solved classically), we should use 2,048 qubits to represent |x. Fortunately, the math works out perfectly, if you take the modulo while calculating the numbers to store, rather than storing all the numbers, this makes it much more feasible to do such problems. So, to write down our state, if N is a 1,024-bit number, we need 2,048 qubits to hold |x and 1,024 to hold |f(x). We will also need various temporary variables during the computation, so we might need a total of five or six thousand qubits in the fastest, most straightforward form of the computation. [h3]The Algorithmic Idea[/h3] Working out say, 3^1024 would normaly be done by doing 3x3x3x3.... 1024 times, instead you could just square the last result. i.e, 3^2, then do (3^2)^2, and so forth, so 3^4, 3^8, 3^16 when you square it each time (multiply by itself). Much faster. This will require 2n repetitions for n bits. The best way to describe the next bit is by an image: [img src='https://res.cloudinary.com/deylrqt2d/image/upload/v1542584225/Screenshot-2018-11-18-23_36_51_mek74f.png'] And by the way, this is what's meant by the partial products etc: [img src='https://res.cloudinary.com/deylrqt2d/image/upload/v1542584301/Screenshot-2018-11-18-23_38_08_yu1kzj.png'] For all this computation, consider parallel computation, which means you need to consider the structure of botht the algorithm and the quantum machine itself. These can all help make the process more efficient and fast. [h3]Quantum Fourier Transform - more[/h3] Like the classical Fourier transform, quantum Fourier transform (QFT) takes data from the original signal representation to the frequency domain representation. The QFT differs from the classical Fourier transform in that it operates on a superposition state and produces a different superposition state as the output. QFT works using interference, which as we saw in the first week makes the signal either stronger or weaker. The components interfere either constructively or destructively, depending on their amplitude and phase. [h3]Shor's Algorithm - speed factors[/h3] [highlight] [ul] [li]The clock speed (gates per second for example)[/li] [li]The architecture of the sysyem itself[/li] [li]The algorithm being used[/li] [/ul] [/highlight] [h2]Machine learning[/h2] Machine learning involves creating a classifier that will tell us whether a datum belongs to a particular class. We can make a distinction between the strategy used to create the classifier, and the details of the problem being solved. Classifiers learn how to do their work. They use training data to find the best set of parameters for the problem at hand, then the training data is discarded before the classifier is used for production work. Supervised learning uses data that someone else has already classified (e.g., this is malware, that isnt) and learns what characteristics of the data define a class. Unsupervised learning instead looks at the data and figures out how many different groups, or clusters, there are in the data. Reinforcement learning is more iterative, with the program learning how to be rewarded in real-world situations. There are many mathematical classification techniques. k -nearest neighbor, support vector machines (SVM), and k -means clustering are relatively straightforward, and all can be partially solved or supported using a quantum algorithm known as HHL. [h2]The HHL Algorithm[/h2] In 2009, Aram Harrow, Avinatan Hassidim and Seth Lloyd found an algorithm (called, naturally enough, HHL) that helps us prepare a state |x=A1|b, where A1 is the inverse of the matrix A. At first glance, it might appear that it will solve for x, as we are often told to do in math problems, but there is a catch involving the data representation. In these linear algebra problems, the natural data representation is to write down the data as a large vector. We might have, for example, a billion data items, which we can write as a vector with a billion entries. Writing down this vector naturally requires a billion time steps. HHL, instead, encodes each of those data elements in the amplitude of a single quantum value in a register, using a superposition of all one billion elements. Because 230>109 , we can store all of our data in 30 qubits, total, instead of a billion separate memories. Of course, the register has to be normalized. Then we can calculate on all billion values at the same time, in superposition. In the above equation, using HHL, we start with our data in the superposition of |b, and end with the superposition of all results in |x . The catch, as in all quantum algorithms, is that we cant extract all of the data values. Instead, once again, we design our algorithm to create interference in a pattern that tells us something about the original data distribution. For example, we can learn that the 1,117th entry in the vector x is the largest data element. Or, treating the whole data set as a vector in an n-dimensional space (in this case, a 30-dimensional space), we can find the distance between |x and some other vector |z . If we use HHL properly for certain tasks, it is exponentionally faster than the best known classical algorithm for the same task. However, there are caveats to achieving that performance, and researchers have not yet been able to rule out that efficient classical algorithms exist for the same tasks when all of the constraints are taken into account. [h2]QRAM[/h2] An important caveat is the preparation of the vector |b containing our original data. If it is a superposition of data values we have stored in a classical computer, then it can take a billion time steps to create a superposition of all billion values, and we have discarded our quantum algorithms performance advantage. This is the basis of our assertion in the first week that quantum computers in general arent good at big data problems. However, Giovannetti, Lloyd and Maccone proposed piece of hardware that, if developed, could be used to prepare |b quickly: a quantum random access memory, or QRAM. In a normal (classical) memory (RAM), we give the memory an address, and it returns the value stored at that address. A QRAM is designed to hold classical data but allow quantum queries. If we can give it an address in superposition, and it returns us the data also in superposition, then we can create our superposition for |b using only a few Hadamard gates and a single query to the QRAM. Of course, the creation of the data itself requires O(N) time for N data items, but then we can run many programs repeatedly against the same data, amortizing that cost, as opposed to a purely quantum memory, where the superposition is destroyed on every run and we must recreate it. [h2]Other algorithms[/h2] HHL planted the seed, but the more practical algorithms have been in follow-on work and in other work. Lloyd, Mohseni and Rebentrost created algorithms for supervised and unsupervised machine learning. Ambainis extended HHL to be more practical with complex datasets. HHLs running time depends in part on the precision you need in the answer; Childs, Kothari and Somma showed how to reduce this dependence dramatically. Within the field of quantum algorithms, quantum machine learning and more broadly quantum artificial intelligence, including such topics as quantum neural networks, is perhaps the fastest-moving area. Given the importance of the problems being attacked, if the dependence on large data sets can be resolved, quantum AI has the potential to drive the broad adoption of quantum computers. [h2]Quantum Chemistry Algorithms[/h2] You can basically take a system, i.e, a set of molecules, proteins etc, then map it to a quantum computational device, and then simulate it responding to light/thermal engergy etc, and then extracting the response frequencies using the Quantum Fourier Transform. These sort of algorithms can be called 'model spectroscopy'. The Hamiltonian is a linear operator that takes vectors and transforms them into new vectors. (In physics, an operator is a function over a space of physical states to another space of physical states). If a transformation applied to a vector results in the same vector, it is called the eigenvector. The length of the vector may be changed, not the direction though, the change in length is the eigenvalue. [highlight]One can describe quantum chemistry as being similar to crash test dummies: rather than solving lots of equations, some important quantum chemistry algorithms model the behavior of molecules very directly. We assign qubits to behave like parts of a quantum mechanical system, then crash them (apply the conditions we want to learn about), then measure the results to see what happened, just like sensors attached to crash test dummies help us learn about the dynamics of an automobile crash. Intuitively, it might seem that to model a molecule, we might assign a qubit to each electron, or to each point in space. However, the states in quantum chemistry algorithms are a little bit different. Instead, with only a few qubits, we can get a very low-precision answer to a question, whereas with more qubits, we can get a higher-precision answer.[/highlight] In problems where really high precision is needed, efficiency is a challenge, and to tackle the problem, two of the most important algorithms are the variational quantum eigensolver (VQE) and the quantum approximate optimization algorithm (QAOA).