As I’ve been doing the last couple of years, I will be going to the
O’Reilly MySQL Conference & Expo. In addition to
the tutorial and the replication sessions that I will be holding
together with Lars,
I will be holding a session about the binary log together with Chuck
from the Backup team, which the Replication team normally works very
closely with.
This year, O’Reilly also has a Friend of the Speaker
discount of 25% that you can use when you register using the code mys10fsp.
The sessions that we are going to hold are listed below. Note that I
am using Microformats, which will
allow you to easily extract and add the events to your calendar using,
for example, the Operator
plugin for Firefox.
See you there!
I’ve been immersed in the world of automated deployment systems for quite a while. Because I like Python, I’ve been using Fabric, but I also dabbled in Puppet. When people are asked about alternatives to Puppet in the Python world, many mention Fabric, but in fact these two systems are very different. Their main difference is the topic of this blog post.
Fabric is what I consider a ‘push’ automated deployment system: you install Fabric on a server, and from there you push deployments by running remote commands via ssh on a set of servers. In the Ruby world, an example of a push system is Capistrano.
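To make the ‘push’ idea concrete, here is a minimal sketch in plain Python. It is not Fabric's API: the function names, the BatchMode option and the injectable runner are my own assumptions, chosen only to show one control server looping over hosts and running a command on each via ssh.

```python
import subprocess


def build_ssh_command(host, remote_command, user=None):
    """Return the argv list that runs remote_command on host over ssh."""
    target = f"{user}@{host}" if user else host
    # BatchMode makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", target, remote_command]


def push(hosts, remote_command, user=None, runner=subprocess.run):
    """Run remote_command on every host in turn and collect exit codes.

    runner is injectable so the loop can be exercised without real servers.
    """
    results = {}
    for host in hosts:
        argv = build_ssh_command(host, remote_command, user)
        completed = runner(argv, capture_output=True, text=True)
        results[host] = completed.returncode
    return results
```

Calling push(["web1", "web2"], "uptime", user="deploy") would contact each host in order; the host names and user here are placeholders.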
The main advantages of a ‘push’ system are:
The main disadvantages of a ‘push’ system are:
Puppet is what I consider a ‘pull’ automated deployment system (actually, to be more precise, it is a configuration management system). In such a system, you have a server which acts as a master, and clients which contact the master to find out what they need to do, thus pulling their configuration information from the master. In Puppet, configuration files are called manifests. They are written in a specific language and they are declarative, i.e. they tell each client what to do, not how to do it. The Puppet client software running on each server knows how to interpret the manifest files and how to translate them into actions specific to the operating system of that server. For example, you specify in your manifest file that you want a user created and you don’t need to say ‘run the adduser command on server X’. Other examples of ‘pull’ deployment/configuration management systems are bcfg2 (Python), Chef (Ruby) and slack (Perl). A newcomer in the Python world is a port of Chef called kokki (it looks like it’s very much in its infancy still, but I hope the author will continue to actively develop it).
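The ‘what, not how’ idea can be illustrated with a toy Python sketch. This is not Puppet's manifest language or its client; the resource dictionary, the OS family names and the command table are assumptions made up for the example. The point is only that the same declaration yields different commands per platform, and none at all when the host is already in the desired state.

```python
# Hypothetical table: how each OS family creates a user.
USER_ADD_COMMANDS = {
    "debian": ["adduser", "--disabled-password", "--gecos", ""],
    "redhat": ["useradd", "--create-home"],
}


def plan_user(resource, os_family, existing_users):
    """Translate a declarative 'user' resource into commands for one host.

    Returns a list of argv lists; an empty list means nothing needs to change.
    """
    name = resource["name"]
    ensure = resource.get("ensure", "present")
    if ensure == "present":
        if name in existing_users:
            return []  # already in the desired state
        return [USER_ADD_COMMANDS[os_family] + [name]]
    if ensure == "absent":
        if name not in existing_users:
            return []
        return [["userdel", name]]
    raise ValueError(f"unknown ensure value: {ensure!r}")
```

The declaration {"name": "alice"} says only that the user should exist; the client decides whether adduser or useradd is the right tool.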
The main advantages of a ‘pull’ system are:
The main disadvantages of a ‘pull’ system are:
My particular preference is to use a ‘pull’ system for the initial configuration of a server, including all the packages necessary to deploy my application (for example tornado). For the actual application code deployment, I prefer to use a ‘push’ system, because it gives me more control over how exactly I do the deployment. I can take a server out of the load balancer, deploy, test, then put it back, rinse and repeat.
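The ‘take out, deploy, test, put back’ loop might look roughly like the sketch below. The four callables are placeholders for whatever your load balancer and deployment tooling actually provide; none of them are real Fabric or HAProxy functions.

```python
def rolling_deploy(servers, remove_from_lb, deploy, smoke_test, add_to_lb):
    """Deploy to one server at a time, keeping the rest in the pool.

    Stops at the first server whose smoke test fails and leaves that
    server out of the load balancer so it never serves bad code.
    """
    done = []
    for server in servers:
        remove_from_lb(server)
        deploy(server)
        if not smoke_test(server):
            raise RuntimeError(f"smoke test failed on {server}")
        add_to_lb(server)
        done.append(server)
    return done
```

Because only one server is out of rotation at any moment, a bad release is caught after touching a single machine.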
In discussions with Holger Krekel at PyCon, I realized that execnet might be a good replacement for Fabric for my needs. It already provides remote command execution via ssh, and an rsync-like file transfer protocol. All it needs is a small library of functions on top to do common system administration tasks such as running commands as sudo, etc. I also want to look into kokki as a replacement for Puppet in my deployment architecture.
A parting thought: my colleague Dan Mesh suggested using a queuing mechanism for the client-server protocol in a ‘pull’ system. In fact, I am becoming more and more convinced that as far as scalability is concerned, when in doubt, use a queuing mechanism. In this deployment architecture, the master would post tasks to be done by a specific client to a central queue. The client would check the queue periodically for a task assigned to it, execute it, then send a report back to the server when done. Of course, you need to worry about authentication in this scenario, but it seems that it would solve a lot of the scalability issues that both push and pull systems exhibit. Who knows, we may build it at Evite and open source it…so stay tuned.
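As a thought experiment, the queue-based protocol could be sketched like this. It is an in-memory toy, not a design for the real thing: the class and function names are invented, there is no authentication, and a production version would sit on top of an actual message queue.

```python
from collections import defaultdict, deque


class CentralQueue:
    """In-memory stand-in for the central queue between master and clients."""

    def __init__(self):
        self._tasks = defaultdict(deque)  # client id -> pending tasks
        self.reports = []                 # (client id, task, result) tuples

    def post(self, client_id, task):
        """Master side: queue a task for one specific client."""
        self._tasks[client_id].append(task)

    def fetch(self, client_id):
        """Client side: take the next task for this client, or None."""
        pending = self._tasks.get(client_id)
        return pending.popleft() if pending else None

    def report(self, client_id, task, result):
        """Client side: send the outcome back for the master to read."""
        self.reports.append((client_id, task, result))


def poll_once(queue, client_id, execute):
    """One polling cycle: fetch a task, run it, report. True if work was done."""
    task = queue.fetch(client_id)
    if task is None:
        return False
    queue.report(client_id, task, execute(task))
    return True
```

Each client would call poll_once on a timer; the master never needs to open a connection to any client.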
Hello everybody,
One month earlier than planned, we have the great pleasure of announcing that the company called FromDual goes operational today!
We are excited about this step. It is a new era in our personal evolution to get back into full contact with customers and solve their real-life, day-to-day MySQL problems.
So we are happy to hear from you and to help you solve your individual MySQL problems…
You can find us at FromDual or you can drop us a line.
Regards,
Oli Sennhauser (aka Shinguz)
Senior MySQL Consultant at FromDual
FromDual provides neutral and vendor-independent MySQL consulting, training and other services around MySQL and its derivatives. The company concentrates on the individual needs of its customers and, in close cooperation with them, achieves the best results for their problems.
Our consultants have worked on many projects in Europe. We were involved with small start-ups, medium-sized enterprises and huge, globally operating top-500 companies. We solved their performance problems, developed architecture and design studies with them, answered their operational questions, and reviewed their backup/recovery concepts.
FromDual does on-site and remote consulting, remote emergency aid and helps its customers to fill MySQL staff gaps if needed.
The company is privately owned. Its HQ is close to Zurich in Switzerland.
This is a note for myself, but maybe it will be useful to other people too.
I’ve been using Fabric version 1.0a lately, and it’s been working very well, with an important exception: when launching remote processes that get daemonized, the ‘run’ Fabric command which launches those processes hangs, and needs to be forcefully killed on the server where I run the ‘fab’ commands.
I remembered vaguely reading on the Fabric mailing list something about the ssh channel not being closed properly, so I hacked the source code (operations.py) to close the channel before waiting for the stdout/stderr capture threads to exit.
Here’s my diff:
git diff fabric/operations.py
diff --git a/fabric/operations.py b/fabric/operations.py
index 9c567c1..fe12450 100644
--- a/fabric/operations.py
+++ b/fabric/operations.py
@@ -498,12 +498,13 @@ def _run_command(command, shell=True, pty=False, sudo=False, user=None):
     # Close when done
     status = channel.recv_exit_status()
 
+    # Close channel
+    channel.close()
+
     # Wait for threads to exit so we aren't left with stale threads
     out_thread.join()
     err_thread.join()
-    # Close channel
-    channel.close()
 
     # Assemble output string
     out = _AttributeString("".join(capture_stdout).strip())

I realize this is a hack, but it solved my particular problem… If you’ve seen this and have found a different solution, please leave a comment.
A few years ago MySQL+memcached and PostgreSQL+memcached were the only choices for high-scale applications. That has changed with the arrival of NoSQL. Change is good. Open-source monopolies are not much better than closed-source ones from the perspective of an end user. I expect MySQL to focus much more on the needs of high-scale applications to remain relevant. I also expect it to play better with others as it is no longer the only persistent data store for high-scale applications.
I think that MySQL+memcached is still the default choice and I don’t think it is going away in the high-scale market. But some high-scale applications either don’t need all of the features of a SQL RDBMS or are willing to go without those features to scale. This isn’t a blanket endorsement of NoSQL as the definition of NoSQL is weak. I am referring to the NoSQL systems that support high-scale.
I don’t believe all of the bad press that MySQL receives from high-scale applications. I know that some problems with MySQL are self-inflicted (seriously, I know this). It is hard to diagnose many problems for which the primary symptom is a slow MySQL server, so it is also hard to identify self-inflicted problems. I also don’t think that some NoSQL systems will provide a different scale-out experience than MySQL, given that some of them scale out by sharding (just like MySQL) and that I can deploy MySQL like NoSQL (disallow joins and secondary indexes, use HANDLER statements).
I also wonder whether affordable SSD/Flash reduces the need to migrate from MySQL to NoSQL. Many MySQL deployments were IO bound when it was difficult to get more than a few thousand IOPS on a commodity server. They can now get 10,000 to 100,000 IOPS in that server at commodity prices.
MySQL and NoSQL are also at significantly different stages. MySQL is mature and maturity has its benefits. MySQL has amazing support and documentation. There are client libraries for almost every language that you should use. There are even bindings for languages you shouldn’t use. The MySQL C API is easy to use. The JDBC driver is awesome, even if support for JDBC makes it much more complex than needed. There is a lot of MySQL expertise that can be hired or rented (MySQL, Monty Program, Percona, Open Query, Pythian, FromDual) and there is some innovation (not enough companies, but they are doing amazing things) from third-parties such as InfiniDB, InfoBright and TokuDB.
What happened?
NoSQL systems are improving faster than MySQL, MySQL has focused on features for the enterprise RDBMS market in the past two releases, and the changes we need from MySQL are hard to implement. Change is hard because MySQL is a complex server that supports many features. Change is also much harder than it should be because of the MySQL coding style. Parts of it are not modular and features are entangled. Some of the difficulty could be overcome were there interest from external contributors. There are external contributors willing and able to improve server code, but they are working on other projects like NoSQL. The MySQL effort is also split (or diluted) between official MySQL, Drizzle and MariaDB.
What really happened?
I don’t know. It may have been better for the business of MySQL to focus on the enterprise market. I can describe some of the problems that need to be fixed in MySQL to make things easier for me. I think other high-scale applications share these problems:
What is NoSQL?
Do your homework when evaluating a NoSQL system as they differ greatly from each other:
I was using the latest version of HAProxy 1.3 and was load balancing backend MySQL servers while also checking their ports, so if one server went down it would be taken out of the load balancing pool. However, since the port checks in HAProxy happen at the TCP level, the MySQL instance being hit by the port checks wasn’t happy, because these weren’t proper MySQL connections. As a result, after a number of checks, MySQL refused to allow clients to connect, with a message like this:
OperationalError: (1129, “Host ‘myhost’ is blocked because of many connection errors; unblock with ‘mysqladmin flush-hosts’”)
Solution: upgrade HAProxy to the newly released version 1.4 (at the date of this writing, the exact version is 1.4.0). Documentation is here.
For MySQL specific checks, you can specify ‘option mysql-check’ in a backend or ‘listen’ section of the configuration file. For example, I have something similar to this in my HAProxy configuration file:
listen mysql 0.0.0.0:3306
    mode tcp
    option mysql-check
    balance roundrobin
    server mysql1 10.0.1.1:3306 check port 3306
    server mysql2 10.0.1.2:3306 check port 3306 backup
Martin Scholl (@zeit_geist) has started a new project based on the PBXT storage engine: EPBXT – Embedded PBXT! In his first blog he describes how you can easily build the latest version: Building Embedded PBXT from bzr.
The interesting thing about this project is that it exposes the “raw” power of the engine. Some basic performance tests show this really is the case.
At the lowest level, PBXT does not impose any format on the data stored in tables and indexes. When running as a MySQL storage engine it uses the MySQL native row and index formats. Theoretically it would be possible to expose this in an embedded API. The work Martin is doing goes in at this level. The wrapper around the engine determines the data types, data sizes, row and index format. Comparison operations for the data types are also supplied by the embedded code or user program.
This flexibility will make it possible for an application to store its own data very efficiently. As Martin suggested, it would also be possible to use Google’s protobufs for the row format. This would eliminate the need to use an ALTER TABLE for many types of changes to a table’s definition!
Of course, EPBXT is still some way from realizing this vision, and Martin has some very specific problems he wants to solve with the development. However, judging by his command of the code within such a short time, this is going to be a project to watch in the future!
First, let’s check whether the write cache is disabled for the zvol rpool/iscsi/vol1:
milek@r600:~/progs# ./zvol_wce /dev/zvol/rdsk/rpool/iscsi/vol1
Write Cache: disabled
Now let’s issue 1000 writes:
milek@r600:~/progs# ptime ./sync_file_create_loop /dev/zvol/rdsk/rpool/iscsi/vol1 1000
real 12.013566363
user 0.003144874
sys 0.104826470
Now let’s enable the write cache, confirm the setting, and issue the same 1000 writes again:
milek@r600:~/progs# ./zvol_wce /dev/zvol/rdsk/rpool/iscsi/vol1 1
milek@r600:~/progs# ./zvol_wce /dev/zvol/rdsk/rpool/iscsi/vol1
Write Cache: enabled
milek@r600:~/progs# ptime ./sync_file_create_loop /dev/zvol/rdsk/rpool/iscsi/vol1 1000
real 0.239360231
user 0.000949655
sys 0.019019552
Worked fine.
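The sync_file_create_loop program itself is not shown in this post. As a rough stand-in, and assuming it simply issues N synchronous writes, a timing loop could look like the Python sketch below. The function name and the 512-byte block size are my own choices, and the fsync after every write is only an approximation of whatever the original does. Be careful: pointing it at a device overwrites data on that device.

```python
import os
import time


def timed_sync_writes(path, count, block=b"\0" * 512):
    """Write block to path count times, forcing each write to stable storage.

    Returns the elapsed wall-clock time in seconds.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # do not return until the data is on stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

With the write cache disabled, each fsync has to wait for the disk, which is the kind of gap the two ptime runs above show.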
The zvol_wce program is not idiot-proof and it doesn’t check whether the operation succeeded. You should be able to compile it by issuing: gcc -o zvol_wce zvol_wce.c
milek@r600:~/progs# cat zvol_wce.c
/* Robert Milkowski
   http://milek.blogspot.com
*/

#include <stdio.h>      /* printf */
#include <stdlib.h>     /* atoi, exit */
#include <unistd.h>
#include <stropts.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/dkio.h>

int main(int argc, char **argv)
{
    char *path;
    int wce = 0;
    int rc;
    int fd;

    path = argv[1];

    if ((fd = open(path, O_RDONLY|O_LARGEFILE)) == -1)
        exit(2);

    if (argc > 2) {
        /* Second argument given: set the write cache (non-zero enables it) */
        wce = atoi(argv[2]) ? 1 : 0;
        rc = ioctl(fd, DKIOCSETWCE, &wce);
    } else {
        /* No second argument: report the current setting */
        rc = ioctl(fd, DKIOCGETWCE, &wce);
        printf("Write Cache: %s\n", wce ? "enabled" : "disabled");
    }

    close(fd);
    exit(0);
}
Brian Aker, a brilliant, helpful duder who I learn a lot from, gives a great talk explaining what NoSQL is in a way that makes sense to database guys. I warn you, there are some points in this video where you can’t hear Brian due to the audience “participation”, but you should still get the content.